Category
Genomic Interval Manipulation
Usage
bedtools map [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
Manual
This tool is part of the bedtools suite.
The main purpose of bedtools map is to summarize information from the intervals in input_file_B that overlap each interval in input_file_A and to append the result to that A interval as additional columns. This is particularly useful for annotating genomic features, calculating summary statistics, or associating specific values with genomic regions. Both input files must be sorted by chromosome and then by start position.
Required arguments
- -a input_file_A: Path to the input feature file A in BED, GFF, or VCF format.
- -b input_file_B: Path to the input feature file B in BED, GFF, or VCF format.
Options
- -c [int,...]: Specify columns from the B file to map onto intervals in A. Default: 5. As of version 2.19.1, multiple columns can be specified in a comma-delimited list.
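- -o [str,...]: Specify the operation(s) to apply to the column(s) given with -c. Multiple operations can be specified in a comma-delimited list. If a single operation is given with multiple columns, it is applied to every column; if a single column is given with multiple operations, every operation is applied to that column. Otherwise, the number of columns must match the number of operations, and they are paired in order. Valid operations: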
  - sum (default)
  - min
  - max
  - absmin
  - absmax
  - mean
  - median
  - mode
  - antimode
  - stdev
  - sstdev
  - collapse
  - distinct
  - distinct_sort_num
  - distinct_sort_num_desc
  - distinct_only
  - count
  - count_distinct
  - first
  - last
- -delim str: Specify a custom delimiter for the collapse operations. Example: -delim "|". Default: ",".
- -prec int: Sets the decimal precision for output. Default: 5.
- -null str: The value to report for records in A that have no overlap in B. Default: "." (a single period).
- -s: Require same strandedness. Only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
- -S: Require different strandedness. Only report hits in B that overlap A on the opposite strand. By default, overlaps are reported without respect to strand.
- -f: Minimum overlap required as a fraction of A. Default: $10^{-9}$ (i.e., 1bp).
- -F: Minimum overlap required as a fraction of B. Default: $10^{-9}$ (i.e., 1bp).
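- -r: Require that the overlap fraction be reciprocal for A and B. For example, with -f 0.90 -r, B must overlap at least 90% of A and A must overlap at least 90% of B.
- -e: Require that the minimum fraction be satisfied for A or B. For example, with -e -f 0.90 -F 0.10, a hit is reported if either 90% of A or 10% of B is covered. Without -e, both fractions must be satisfied.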
- -split: Treat "split" BAM or BED12 entries as distinct BED intervals.
- -g str: Provide a genome file to enforce consistent chromosome sort order across input files. Only applies when used with -sorted option.
- -nonamecheck: For sorted data, don't throw an error if the file has different naming conventions for the same chromosome.
- -bed: If using BAM input, write output as BED.
- -header: Print the header from the A file prior to results.
- -nobuf: Disable buffered output. Each line of output is printed as it's generated, rather than saved in a buffer.
- -iobuf: Specify amount of memory to use for input buffer. Takes an integer argument. Optional suffixes K/M/G supported.
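The interaction between -f, -F, -r, and -e is easier to see as a predicate. The following is a minimal Python sketch of the rule as the option list describes it, not bedtools' own code. The function name passes_overlap and its parameters are hypothetical, and intervals are assumed to be half-open (start, end) pairs of nonzero length on the same chromosome.

```python
def passes_overlap(a, b, f=1e-9, F=1e-9, reciprocal=False, either=False):
    """Decide whether B counts as a hit for A under -f / -F / -r / -e.

    a, b: (start, end) half-open intervals on the same chromosome.
    f, F: minimum overlap as a fraction of A and of B, respectively.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])  # number of shared bases
    if overlap <= 0:
        return False
    if reciprocal:
        F = f  # -r: the fraction given for A must also hold for B
    ok_a = overlap / (a[1] - a[0]) >= f
    ok_b = overlap / (b[1] - b[0]) >= F
    # -e: either fraction is enough; otherwise both are required
    return (ok_a or ok_b) if either else (ok_a and ok_b)


a, b = (10, 20), (12, 14)  # 2 shared bases: 20% of A, 100% of B
print(passes_overlap(a, b))                            # True
print(passes_overlap(a, b, f=0.5))                     # False
print(passes_overlap(a, b, f=0.5, F=0.5, either=True)) # True
```

With the defaults, any single shared base satisfies both fractions, which matches the "1bp" note on -f and -F.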
Examples
Compute the sum of the score column for all overlaps
By default, map computes the sum of the 5th column (the score field for BED format) for all intervals in B that overlap each interval in A.
$ cat a.bed
chr1 10 20 a1 1 +
chr1 50 60 a2 2 -
chr1 80 90 a3 3 -
$ cat b.bed
chr1 12 14 b1 2 +
chr1 13 15 b2 5 -
chr1 16 18 b3 5 +
chr1 82 85 b4 2 -
chr1 85 87 b5 3 +
$ bedtools map -a a.bed -b b.bed
chr1 10 20 a1 1 + 12
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 5
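The default behavior can be modelled in a few lines of Python. This is a sketch of the semantics only, assuming half-open BED coordinates and a simple all-against-all comparison; it is not how bedtools processes sorted input. The names overlaps and map_sum are hypothetical.

```python
def overlaps(a, b):
    """True when two BED records share a chromosome and at least one base."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def map_sum(a_rows, b_rows, col=5, null="."):
    """For each A record, sum the given 1-based column over overlapping B records."""
    results = []
    for a in a_rows:
        values = [b[col - 1] for b in b_rows if overlaps(a, b)]
        results.append(sum(values) if values else null)
    return results


# Same toy records as a.bed and b.bed above
A = [("chr1", 10, 20, "a1", 1, "+"),
     ("chr1", 50, 60, "a2", 2, "-"),
     ("chr1", 80, 90, "a3", 3, "-")]
B = [("chr1", 12, 14, "b1", 2, "+"),
     ("chr1", 13, 15, "b2", 5, "-"),
     ("chr1", 16, 18, "b3", 5, "+"),
     ("chr1", 82, 85, "b4", 2, "-"),
     ("chr1", 85, 87, "b5", 3, "+")]

print(map_sum(A, B))  # [12, '.', 5]
```

The three values correspond to the last column of the output above: a1 overlaps b1, b2, and b3 (2 + 5 + 5), a2 overlaps nothing, and a3 overlaps b4 and b5 (2 + 3).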
Compute the mean of a column from overlapping intervals
$ cat a.bed
chr1 10 20 a1 1 +
chr1 50 60 a2 2 -
chr1 80 90 a3 3 -
$ cat b.bed
chr1 12 14 b1 2 +
chr1 13 15 b2 5 -
chr1 16 18 b3 5 +
chr1 82 85 b4 2 -
chr1 85 87 b5 3 +
$ bedtools map -a a.bed -b b.bed -c 5 -o mean
chr1 10 20 a1 1 + 4
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 2.5
List each value of a column from overlapping intervals
$ bedtools map -a a.bed -b b.bed -c 5 -o collapse
chr1 10 20 a1 1 + 2,5,5
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 2,3
List each unique value of a column from overlapping intervals
$ bedtools map -a a.bed -b b.bed -c 5 -o distinct
chr1 10 20 a1 1 + 2,5
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 2,3
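The difference between collapse and distinct in the two examples above comes down to whether repeated values are kept. A small Python sketch, with hypothetical helper names, is shown here. It assumes distinct values are reported in order of first appearance, which is consistent with the output above but may not match the ordering bedtools uses in general.

```python
def collapse(values, delim=","):
    """Join every value, keeping duplicates."""
    return delim.join(str(v) for v in values)


def distinct(values, delim=","):
    """Join each value once (order of first appearance is an assumption)."""
    return delim.join(str(v) for v in dict.fromkeys(values))


a1_hits = [2, 5, 5]  # column 5 of b1, b2, b3, which overlap a1
print(collapse(a1_hits))             # 2,5,5
print(distinct(a1_hits))             # 2,5
print(collapse(a1_hits, delim="|"))  # 2|5|5
```

The delim parameter plays the role of the -delim option described earlier.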
Only include intervals that overlap on the same strand
$ bedtools map -a a.bed -b b.bed -c 5 -o collapse -s
chr1 10 20 a1 1 + 2,5
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 2
Only include intervals that overlap on the opposite strand
$ bedtools map -a a.bed -b b.bed -c 5 -o collapse -S
chr1 10 20 a1 1 + 5
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 3
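The two strand examples above can be reproduced with a small filter applied before the values are collapsed. This Python sketch is illustrative only: the names strand_ok and map_collapse and the mode argument are hypothetical, and only "+" and "-" strands are considered.

```python
def overlaps(a, b):
    """True when two BED records share a chromosome and at least one base."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def strand_ok(a, b, mode=None):
    """mode "s" keeps same-strand hits, "S" opposite-strand hits, None keeps all."""
    if mode == "s":
        return a[5] == b[5]
    if mode == "S":
        return a[5] != b[5]
    return True


def map_collapse(a_rows, b_rows, mode=None, col=5, null="."):
    """Collapse a 1-based B column over hits that pass the strand filter."""
    out = []
    for a in a_rows:
        vals = [str(b[col - 1]) for b in b_rows
                if overlaps(a, b) and strand_ok(a, b, mode)]
        out.append(",".join(vals) if vals else null)
    return out


A = [("chr1", 10, 20, "a1", 1, "+"),
     ("chr1", 50, 60, "a2", 2, "-"),
     ("chr1", 80, 90, "a3", 3, "-")]
B = [("chr1", 12, 14, "b1", 2, "+"),
     ("chr1", 13, 15, "b2", 5, "-"),
     ("chr1", 16, 18, "b3", 5, "+"),
     ("chr1", 82, 85, "b4", 2, "-"),
     ("chr1", 85, 87, "b5", 3, "+")]

print(map_collapse(A, B, mode="s"))  # ['2,5', '.', '2']
print(map_collapse(A, B, mode="S"))  # ['5', '.', '3']
```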
Multiple operations and columns at the same time
$ bedtools map -a a.bed -b b.bed -c 5,5,5,5 -o min,max,median,collapse
Or, apply the same function to multiple columns:
$ bedtools map -a a.bed -b b.bed -c 3,4,5,6 -o mean
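How the -c and -o lists line up in the two commands above can be sketched as a pairing rule. The Python below is a hedged illustration with a hypothetical function name. It treats a single operation as applying to every column, as in the second command, and assumes the symmetric case (one column, several operations) is handled the same way.

```python
def pair_columns_and_ops(cols, ops):
    """Return (column, operation) pairs in the order they would be reported."""
    if len(cols) == len(ops):
        return list(zip(cols, ops))          # paired one-to-one, in order
    if len(ops) == 1:
        return [(c, ops[0]) for c in cols]   # one operation for every column
    if len(cols) == 1:
        return [(cols[0], o) for o in ops]   # every operation on one column (assumed)
    raise ValueError("number of columns must match number of operations")


print(pair_columns_and_ops([5, 5, 5, 5], ["min", "max", "median", "collapse"]))
# [(5, 'min'), (5, 'max'), (5, 'median'), (5, 'collapse')]
print(pair_columns_and_ops([3, 4, 5, 6], ["mean"]))
# [(3, 'mean'), (4, 'mean'), (5, 'mean'), (6, 'mean')]
```

Each pair adds one column to the output, so both commands above append four values to every A record.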