Category

Genomic Interval Manipulation


Usage

bedtools map [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>


Manual

This tool is part of the bedtools suite.

bedtools map summarizes values from the intervals in input_file_B that overlap each interval in input_file_A and appends the result to that A interval as a new column. This is useful for annotating genomic features, computing summary statistics, or associating values with genomic regions. Both input files must be sorted by chromosome and then by start position (for BED files, sort -k1,1 -k2,2n).

Required arguments

  • -a input_file_A: Path to the input feature file A in BED, GFF, or VCF format.
  • -b input_file_B: Path to the input feature file B in BED, GFF, or VCF format.

Options

  • -c [int,...]: Specify columns from the B file to map onto intervals in A. Default: 5. As of version 2.19.1, multiple columns can be specified in a comma-delimited list.
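  • -o [str,...]: Specify the operation to apply to the columns given with -c. Multiple operations can be specified in a comma-delimited list. If there is one column and several operations, every operation is applied to that column; if there is one operation and several columns, that operation is applied to every column. Otherwise, the number of columns must match the number of operations, and they are paired in order. Valid operations:
    • sum (default)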
    • min
    • max
    • absmin
    • absmax
    • mean
    • median
    • mode
    • antimode
    • stdev
    • sstdev
    • collapse
    • distinct
    • distinct_sort_num
    • distinct_sort_num_desc
    • distinct_only
    • count
    • count_distinct
    • first
    • last
  • -delim str: Specify a custom delimiter for the collapse operations. Example: -delim "|". Default: ",".
  • -prec int: Sets the decimal precision for output. Default: 5.
  • -null str: Records in A that have no overlap will, by default, return . for the computed value from B. This can be changed with this option.
  • -s: Require same strandedness. Only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
  • -S: Require different strandedness. Only report hits in B that overlap A on the opposite strand. By default, overlaps are reported without respect to strand.
  • -f: Minimum overlap required as a fraction of A. Default: 1E-9 (i.e., 1 bp).
  • -F: Minimum overlap required as a fraction of B. Default: 1E-9 (i.e., 1 bp).
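  • -r: Require that the fraction of overlap be reciprocal for A and B. For example, with -f 0.90 and -r, B must overlap 90% of A and A must also overlap 90% of B.
  • -e: Require that the minimum fraction be satisfied for A or B. For example, with -f 0.90 -F 0.10 and -e, either 90% of A or 10% of B must be covered. Without -e, both fractions must be satisfied.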
  • -split: Treat "split" BAM or BED12 entries as distinct BED intervals.
  • -g str: Provide a genome file to enforce consistent chromosome sort order across input files. Only applies when used with -sorted option.
  • -nonamecheck: For sorted data, don't throw an error if the file has different naming conventions for the same chromosome.
  • -bed: If using BAM input, write output as BED.
  • -header: Print the header from the A file prior to results.
  • -nobuf: Disable buffered output. Each line of output is printed as it's generated, rather than saved in a buffer.
  • -iobuf: Specify amount of memory to use for input buffer. Takes an integer argument. Optional suffixes K/M/G supported.
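
The interaction of -f, -F, -r, and -e is easier to see in code. The following Python sketch models one reading of the option descriptions above; it is not taken from the bedtools source. The function names `overlap_length` and `passes` are hypothetical, intervals are plain (start, end) pairs in BED's 0-based, half-open convention, and each interval is assumed to have a positive length.

```python
# Illustrative model of the -f / -F / -r / -e overlap filters.
# All names here are assumptions made for this sketch, not bedtools internals.

def overlap_length(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals (0 if they do not overlap)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def passes(a, b, f=1e-9, F=1e-9, reciprocal=False, either=False):
    """Decide whether B counts as a hit for A under the given fraction options."""
    shared = overlap_length(*a, *b)
    if shared == 0:
        return False
    if reciprocal:
        # -r: the fraction given with -f must hold for B as well as for A.
        F = f
    a_ok = shared / (a[1] - a[0]) >= f  # fraction of A that is covered
    b_ok = shared / (b[1] - b[0]) >= F  # fraction of B that is covered
    # -e: either fraction is enough; otherwise both must be satisfied.
    return (a_ok or b_ok) if either else (a_ok and b_ok)


a1 = (10, 20)  # a1 from the examples: 10 bp long
b1 = (12, 14)  # b1 from the examples: 2 bp long, fully inside a1

print(passes(a1, b1))                            # defaults: any 1 bp overlap counts
print(passes(a1, b1, f=0.5))                     # only 20% of a1 is covered
print(passes(a1, b1, F=0.5))                     # 100% of b1 is covered
print(passes(a1, b1, f=0.5, reciprocal=True))    # fails on the A side
print(passes(a1, b1, f=0.5, F=0.5, either=True)) # the B side alone is enough
```

With the default thresholds, any shared base makes B a hit, which is why the defaults are described as 1 bp.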

Examples

Compute the sum of the score column for all overlaps

By default, map computes the sum of the 5th column (the score field for BED format) for all intervals in B that overlap each interval in A.

$ cat a.bed
chr1        10      20      a1      1       +
chr1        50      60      a2      2       -
chr1        80      90      a3      3       -

$ cat b.bed
chr1        12      14      b1      2       +
chr1        13      15      b2      5       -
chr1        16      18      b3      5       +
chr1        82      85      b4      2       -
chr1        85      87      b5      3       +

$ bedtools map -a a.bed -b b.bed
chr1        10      20      a1      1       +       12
chr1        50      60      a2      2       -       .
chr1        80      90      a3      3       -       5
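
The default behaviour shown above can be modelled in a few lines of Python. This is a sketch of the semantics only, not of how bedtools is implemented: the names `overlaps` and `map_sum` and the tuple layout are assumptions made for illustration, and the linear scan over B ignores performance entirely.

```python
# Illustrative model of `bedtools map -a a.bed -b b.bed` (sum of column 5).
# Function names and the tuple layout are assumptions made for this sketch.

A = [
    ("chr1", 10, 20, "a1", 1, "+"),
    ("chr1", 50, 60, "a2", 2, "-"),
    ("chr1", 80, 90, "a3", 3, "-"),
]
B = [
    ("chr1", 12, 14, "b1", 2, "+"),
    ("chr1", 13, 15, "b2", 5, "-"),
    ("chr1", 16, 18, "b3", 5, "+"),
    ("chr1", 82, 85, "b4", 2, "-"),
    ("chr1", 85, 87, "b5", 3, "+"),
]


def overlaps(a, b):
    """BED intervals are 0-based and half-open: they overlap when each starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def map_sum(a_records, b_records, column=5, null="."):
    """Return one row per A record with the summed B column appended."""
    rows = []
    for a in a_records:
        # Columns are 1-based on the command line, 0-based in the tuple.
        values = [b[column - 1] for b in b_records if overlaps(a, b)]
        rows.append(a + (sum(values) if values else null,))
    return rows


result = map_sum(A, B)
for row in result:
    print("\t".join(str(field) for field in row))
```

The `null` parameter plays the role of the -null option: A records without any overlapping B record receive that placeholder instead of a computed value.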

Compute the mean of a column from overlapping intervals

$ cat a.bed
chr1        10      20      a1      1       +
chr1        50      60      a2      2       -
chr1        80      90      a3      3       -

$ cat b.bed
chr1        12      14      b1      2       +
chr1        13      15      b2      5       -
chr1        16      18      b3      5       +
chr1        82      85      b4      2       -
chr1        85      87      b5      3       +

$ bedtools map -a a.bed -b b.bed -c 5 -o mean
chr1        10      20      a1      1       +       4
chr1        50      60      a2      2       -       .
chr1        80      90      a3      3       -       2.5

List each value of a column from overlapping intervals

$ bedtools map -a a.bed -b b.bed -c 5 -o collapse
chr1        10      20      a1      1       +       2,5,5
chr1        50      60      a2      2       -       .
chr1        80      90      a3      3       -       2,3

List each unique value of a column from overlapping intervals

$ bedtools map -a a.bed -b b.bed -c 5 -o distinct
chr1        10      20      a1      1       +       2,5
chr1        50      60      a2      2       -       .
chr1        80      90      a3      3       -       2,3
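
The mean, collapse, and distinct examples differ only in the function applied to the list of overlapping values. As a rough illustration, the Python sketch below applies a subset of the -o operations to the column 5 values seen for a1 (2, 5, 5) and a3 (2, 3). The `OPERATIONS` table and helper names are hypothetical, and `distinct` here keeps first-seen order, which may not match the ordering bedtools uses on other data.

```python
# Illustrative subset of the -o operations, applied to plain Python lists.
# The OPERATIONS table and helpers are assumptions made for this sketch.
from statistics import mean, median


def collapse(values, delim=","):
    """Join every value, duplicates included (the delimiter mirrors -delim)."""
    return delim.join(str(v) for v in values)


def distinct(values, delim=","):
    """Join each value once; first-seen order is an assumption of this sketch."""
    return delim.join(str(v) for v in dict.fromkeys(values))


OPERATIONS = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": mean,
    "median": median,
    "collapse": collapse,
    "distinct": distinct,
    "count": len,
    "count_distinct": lambda values: len(set(values)),
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
}

a1_scores = [2, 5, 5]  # column 5 of b1, b2, b3
a3_scores = [2, 3]     # column 5 of b4, b5

summary = {name: op(a1_scores) for name, op in OPERATIONS.items()}
for name, value in summary.items():
    print(f"{name}\t{value}")
```

Calling `collapse(a1_scores, delim="|")` gives `2|5|5`, which corresponds to passing -delim "|" on the command line.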

Only include intervals that overlap on the same strand

$ bedtools map -a a.bed -b b.bed -c 5 -o collapse -s
chr1        10      20      a1      1       +       2,5
chr1        50      60      a2      2       -       .
chr1        80      90      a3      3       -       2

Only include intervals that overlap on the opposite strand

$ bedtools map -a a.bed -b b.bed -c 5 -o collapse -S
chr1        10      20      a1      1       +       5
chr1        50      60      a2      2       -       .
chr1        80      90      a3      3       -       3
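
The two strand examples can be reproduced with a small filter placed in front of the collapse step. The Python sketch below is an assumption-laden model, not bedtools code: `strand_ok` and `map_collapse` are hypothetical names, and records with an unknown strand (".") are not handled.

```python
# Illustrative model of -s (same strand) and -S (opposite strand) with -o collapse.
# Names and the tuple layout are assumptions made for this sketch.

A = [
    ("chr1", 10, 20, "a1", 1, "+"),
    ("chr1", 50, 60, "a2", 2, "-"),
    ("chr1", 80, 90, "a3", 3, "-"),
]
B = [
    ("chr1", 12, 14, "b1", 2, "+"),
    ("chr1", 13, 15, "b2", 5, "-"),
    ("chr1", 16, 18, "b3", 5, "+"),
    ("chr1", 82, 85, "b4", 2, "-"),
    ("chr1", 85, 87, "b5", 3, "+"),
]


def overlaps(a, b):
    """Half-open interval overlap on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def strand_ok(a_strand, b_strand, mode):
    """mode is 'same' (-s), 'opposite' (-S), or None (strand ignored)."""
    if mode == "same":
        return a_strand == b_strand
    if mode == "opposite":
        return a_strand != b_strand
    return True


def map_collapse(a_records, b_records, mode=None, null="."):
    """Collapse column 5 of the B records that overlap each A record."""
    out = []
    for a in a_records:
        hits = [b[4] for b in b_records
                if overlaps(a, b) and strand_ok(a[5], b[5], mode)]
        out.append(",".join(str(v) for v in hits) if hits else null)
    return out


print(map_collapse(A, B))                   # strand ignored
print(map_collapse(A, B, mode="same"))      # like -s
print(map_collapse(A, B, mode="opposite"))  # like -S
```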

Multiple operations and columns at the same time

$ bedtools map -a a.bed -b b.bed -c 5,5,5,5 -o min,max,median,collapse

Or, apply the same function to multiple columns:

$ bedtools map -a a.bed -b b.bed -c 3,4,5,6 -o mean
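
How the -c and -o lists line up can be sketched in Python. The pairing rule modelled here (a single column or a single operation is repeated to match the other list, otherwise the lengths must agree) is an assumption based on the two commands above and on my understanding of bedtools behaviour; `pair_columns_and_operations` is a hypothetical helper, not part of bedtools.

```python
# Illustrative model of how -c columns and -o operations are paired.
# The helper name and the small OPERATIONS table are assumptions for this sketch.
from statistics import median

OPERATIONS = {
    "min": min,
    "max": max,
    "median": median,
    "collapse": lambda values: ",".join(str(v) for v in values),
}


def pair_columns_and_operations(columns, operations):
    """Return (column, operation) pairs in the order they would be applied."""
    if len(columns) == 1:
        columns = columns * len(operations)      # one column, many operations
    elif len(operations) == 1:
        operations = operations * len(columns)   # one operation, many columns
    elif len(columns) != len(operations):
        raise ValueError("number of columns must match number of operations")
    return list(zip(columns, operations))


# -c 5,5,5,5 -o min,max,median,collapse
pairs = pair_columns_and_operations([5, 5, 5, 5], ["min", "max", "median", "collapse"])

# -c 3,4,5,6 -o mean
broadcast = pair_columns_and_operations([3, 4, 5, 6], ["mean"])

# The B records that overlap a1 in the earlier examples.
overlapping_b = [
    ("chr1", 12, 14, "b1", 2, "+"),
    ("chr1", 13, 15, "b2", 5, "-"),
    ("chr1", 16, 18, "b3", 5, "+"),
]

# One appended output column per (column, operation) pair.
appended = [OPERATIONS[op]([row[col - 1] for row in overlapping_b]) for col, op in pairs]
print(appended)
print(broadcast)
```

Under this model, the first command appends four columns to each A record, one per pair, in the order given.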

File formats this tool works with
BED, GFF, GTF, VCF
