Category

Genomic Interval Manipulation


Usage

bedtools window [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>


Manual

This tool is part of the bedtools suite and is also known as windowBed.

Similar to bedtools intersect, window searches for overlapping features in A and B. However, window adds a specified number (1000, by default) of base pairs upstream and downstream of each feature in A. In effect, this allows features in B that are near features in A to be detected.
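
The padding logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the bedtools implementation: the function name window_hits, the tuple layout, and the clamping of the window start at 0 are assumptions made for this sketch. Coordinates are treated as 0-based, half-open intervals, as in BED.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED
Interval = Tuple[str, int, int]


def window_hits(a: List[Interval], b: List[Interval],
                w: int = 1000) -> List[Tuple[Interval, Interval]]:
    """Pair each A feature with every B feature overlapping A padded by w bp."""
    hits = []
    for a_chrom, a_start, a_end in a:
        # Assumption for this sketch: the window start never goes below 0.
        win_start = max(0, a_start - w)
        win_end = a_end + w
        for b_chrom, b_start, b_end in b:
            # Half-open intervals overlap when each starts before the other ends.
            if a_chrom == b_chrom and b_start < win_end and b_end > win_start:
                hits.append(((a_chrom, a_start, a_end),
                             (b_chrom, b_start, b_end)))
    return hits


a = [("chr1", 100, 200)]
b = [("chr1", 500, 1000), ("chr1", 1300, 2000)]

print(window_hits(a, b))          # 1000 bp window: only the first B feature
print(window_hits(a, b, w=5000))  # 5000 bp window: both B features
```

The nested loop is quadratic and is meant only to make the semantics visible; a real tool would index B rather than scan it for every A feature.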

Schematic summary of the functionality

A:         =========
                    |~~~4kb~~~|
                    |~~~~~~~~~9kb~~~~~~~~~|     
B:           ===               =====       =======

intersect    ===

window       ===               =====
-w 5000

window       ===               =====       =======
-w 10000

Required arguments

  • -a <bed/gff/vcf>: File A. Each feature in A is compared to B in search of overlaps. Use stdin if passing A with a UNIX pipe.
  • -b <bed/gff/vcf>: File B. Use stdin if passing B with a UNIX pipe. -b may be followed by multiple databases and/or wildcard (*) character(s). Multiple databases are supported only from version 2.21.0 onward.

Options

  • -abam: The A input file is in BAM format. Output will be BAM as well. Replaces -a.
  • -ubam: Write uncompressed BAM output. Default writes compressed BAM.
  • -bed: When using BAM input (-abam), write output as BED. The default is to write output in BAM when using -abam.
  • -w <int>: Base pairs added upstream and downstream of each entry in A when searching for overlaps in B. It creates symmetrical windows around A. Default is 1000 bp.
  • -l <int>: Base pairs added upstream (left) of each entry in A when searching for overlaps in B. It allows one to define asymmetrical windows. Default is 1000 bp.
  • -r <int>: Base pairs added downstream (right) of each entry in A when searching for overlaps in B. It allows one to define asymmetrical windows. Default is 1000 bp.
  • -sw: Define -l and -r based on strand. For example, if used, -l 500 for a negative-stranded feature will add 500 bp downstream (that is, to the right, toward higher genomic coordinates). Not enabled by default.
  • -sm: Only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
  • -Sm: Only report hits in B that overlap A on the opposite strand. By default, overlaps are reported without respect to strand.
  • -u: Write the original A entry once if any overlaps found in B. In other words, just report the fact >=1 hit was found.
  • -c: For each entry in A, report the number of overlaps with B. It reports 0 for A entries that have no overlap with B. Overlaps are restricted by -w, -l, and -r.
  • -v: Only report those entries in A that have no overlaps with B. Similar to grep -v.
  • -header: Print the header from the A file prior to results.
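
The difference between the -u, -c, and -v reporting modes listed above can be illustrated with a small Python sketch. It is a simplified model rather than the bedtools code: the helper names count_hits and report, the mode strings, and the second A feature (chr1 50000 50100) are hypothetical and were added only so that one A entry has no nearby B feature.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED
Interval = Tuple[str, int, int]


def count_hits(feature: Interval, b: Iterable[Interval], w: int) -> int:
    """Count B features overlapping the A feature padded by w bp on each side."""
    chrom, start, end = feature
    # Assumption for this sketch: the window start never goes below 0.
    win_start, win_end = max(0, start - w), end + w
    return sum(1 for c, s, e in b
               if c == chrom and s < win_end and e > win_start)


def report(a: List[Interval], b: List[Interval],
           w: int = 1000, mode: str = "u") -> list:
    """Mimic the -u, -c and -v reporting styles (mode is 'u', 'c' or 'v')."""
    out = []
    for feature in a:
        n = count_hits(feature, b, w)
        if mode == "u" and n > 0:
            out.append(feature)          # A entry once if at least one hit
        elif mode == "c":
            out.append(feature + (n,))   # A entry plus hit count, 0 included
        elif mode == "v" and n == 0:
            out.append(feature)          # A entry only if there are no hits
    return out


a = [("chr1", 100, 200), ("chr1", 50000, 50100)]  # second entry is hypothetical
b = [("chr1", 500, 1000), ("chr1", 1300, 2000)]

print(report(a, b, mode="u"))
print(report(a, b, mode="c"))
print(report(a, b, mode="v"))
```

Note that the -u and -v style outputs partition A: every A entry appears in exactly one of the two.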

Note: By default, the -l and -r options ignore strand. If you want to define upstream and downstream based on strand, use the -sw option with the -l and -r options.

Examples

By default, bedtools window adds 1000 bp upstream and downstream of each A feature and searches for features in B that overlap this “window”. If an overlap is found in B, both the original A feature and the original B feature are reported.

$ cat A.bed
chr1  100  200

$ cat B.bed
chr1  500  1000
chr1  1300 2000

$ bedtools window -a A.bed -b B.bed
chr1  100  200  chr1  500  1000

Defining a custom window size

Instead of using the default window size of 1000bp, one can define a custom, symmetric window around each feature in A using the -w option. One should specify the window size in base pairs. For example, a window of 5kb should be defined as -w 5000.

For example (note that in contrast to the default behavior, the second B entry is reported):

$ cat A.bed
chr1  100  200

$ cat B.bed
chr1  500  1000
chr1  1300 2000

$ bedtools window -a A.bed -b B.bed -w 5000
chr1  100  200  chr1  500   1000
chr1  100  200  chr1  1300  2000

Defining asymmetric windows

One can also define asymmetric windows where a differing number of bases are added upstream and downstream of each feature using the -l (upstream) and -r (downstream) options.

For example (note the difference between -l 200 and -l 300):

$ cat A.bed
chr1  1000  2000

$ cat B.bed
chr1  500   800
chr1  10000 20000

$ bedtools window -a A.bed -b B.bed -l 200 -r 20000
chr1  1000   2000  chr1  10000  20000

$ bedtools window -a A.bed -b B.bed -l 300 -r 20000
chr1  1000   2000  chr1  500    800
chr1  1000   2000  chr1  10000  20000

Defining asymmetric windows based on strand

Especially when dealing with gene annotations or RNA-seq experiments, you may want to define asymmetric windows based on “strand”. For example, you may want to screen for overlaps that occur within 5000 bp upstream of a gene (e.g. a promoter region) while screening only 1000 bp downstream of the gene. By enabling the -sw (stranded windows) option, the windows are added upstream or downstream according to strand. For example, imagine one specifies -l 5000 -r 1000 as well as the -sw option. In this case, forward stranded (+) features will screen 5000 bp to the left (that is, lower genomic coordinates) and 1000 bp to the right (that is, higher genomic coordinates). By contrast, reverse stranded (-) features will screen 5000 bp to the right (that is, higher genomic coordinates) and 1000 bp to the left (that is, lower genomic coordinates).

For example (note that the forward and reverse stranded A features, which share the same coordinates, report different B entries):

$ cat A.bed
chr1  10000  20000  A.forward  1  +
chr1  10000  20000  A.reverse  1  -

$ cat B.bed
chr1  1000   8000   B1
chr1  24000  32000  B2

$ bedtools window -a A.bed -b B.bed -l 5000 -r 1000 -sw
chr1  10000  20000  A.forward  1  +  chr1  1000   8000   B1
chr1  10000  20000  A.reverse  1  -  chr1  24000  32000  B2
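
The effect of -l, -r, and -sw on the search window can be sketched as follows. This is a simplified model, not the bedtools source: the names window_bounds and stranded_hits, the tuple layout, and the clamping of the window start at 0 are assumptions made for illustration. The input values are taken from the example above.

```python
from typing import List, Tuple


def window_bounds(start: int, end: int, strand: str,
                  left: int, right: int,
                  stranded: bool = False) -> Tuple[int, int]:
    """Return the padded (start, end) window for one A feature."""
    # With stranded windows, "left" and "right" swap for minus-strand features.
    if stranded and strand == "-":
        left, right = right, left
    # Assumption for this sketch: the window start never goes below 0.
    return max(0, start - left), end + right


def stranded_hits(a: List[tuple], b: List[tuple],
                  left: int, right: int, stranded: bool) -> List[Tuple[str, str]]:
    """Return (A name, B name) pairs whose windows overlap."""
    pairs = []
    for a_chrom, a_start, a_end, a_name, strand in a:
        win_start, win_end = window_bounds(a_start, a_end, strand,
                                           left, right, stranded)
        for b_chrom, b_start, b_end, b_name in b:
            if a_chrom == b_chrom and b_start < win_end and b_end > win_start:
                pairs.append((a_name, b_name))
    return pairs


a = [("chr1", 10000, 20000, "A.forward", "+"),
     ("chr1", 10000, 20000, "A.reverse", "-")]
b = [("chr1", 1000, 8000, "B1"),
     ("chr1", 24000, 32000, "B2")]

print(window_bounds(10000, 20000, "+", 5000, 1000, stranded=True))  # (5000, 21000)
print(window_bounds(10000, 20000, "-", 5000, 1000, stranded=True))  # (9000, 25000)
print(stranded_hits(a, b, 5000, 1000, stranded=True))
print(stranded_hits(a, b, 5000, 1000, stranded=False))
```

With stranded=False both A features get the same window (5000 to 21000), so under this model both would pair with B1 only; enabling the stranded behaviour is what makes the reverse-stranded feature reach B2 instead.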

File formats this tool works with

BED, GFF, GTF, VCF
