Category

Genomic Interval Manipulation


Usage

bedtools tag [OPTIONS] -i <BAM> -files FILE1 .. FILEn -labels LAB1 .. LABn


Manual

This tool is part of the bedtools suite and is also known as tagBed.

Required arguments

  • -i BAM: Input BAM file to be annotated.
  • -files FILE1 .. FILEn: List of annotation files (BED/GFF/VCF) for overlap.
  • -labels LAB1 .. LABn: Labels for annotation files in the same order as files.

Options

  • -s: Require overlaps on the same strand.
  • -S: Require overlaps on the opposite strand.
  • -f FLOAT: Minimum overlap required as a fraction of the alignment (default: 1E-9, i.e., effectively any overlap of at least 1 bp).
  • -tag STRING: Name of the tag to add (default: YB).
  • -names: Use the name field of the annotation files as the tag value, instead of the -labels values.
  • -scores: Use the score field of the annotation files as the tag value, instead of the -labels values.
  • -intervals: Use the full interval (including name, score, and strand) to populate tags. Requires the -labels option to identify which file the interval came from.
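The overlap rules behind -f, -s and -S can be illustrated with a small shell function. This is a hypothetical helper (passes_overlap is not part of bedtools), written as a sketch under stated assumptions: both intervals are given as 0-based, half-open BED-style coordinates, and the overlap is measured as a fraction of the alignment's length, as the -f description says. It is not bedtools' own implementation and ignores details such as spliced alignments.

```shell
# Hypothetical helper, NOT part of bedtools: decide whether one alignment
# would be tagged by one region under -f / -s / -S style rules.
# Arguments: aln_start aln_end aln_strand reg_start reg_end reg_strand min_frac mode
#   coordinates are assumed 0-based, half-open (BED style)
#   mode is "none", "same" (like -s) or "opposite" (like -S)
# Prints "yes" or "no".
passes_overlap() {
  awk -v a_beg="$1" -v a_end="$2" -v a_str="$3" \
      -v r_beg="$4" -v r_end="$5" -v r_str="$6" \
      -v min_frac="$7" -v mode="$8" 'BEGIN {
    # Shared segment: from the later start to the earlier end.
    ov_end = (a_end < r_end) ? a_end : r_end
    ov_beg = (a_beg > r_beg) ? a_beg : r_beg
    ov = ov_end - ov_beg
    if (ov <= 0) { print "no"; exit }
    # Strand requirements, mirroring -s and -S.
    if (mode == "same" && a_str != r_str) { print "no"; exit }
    if (mode == "opposite" && a_str == r_str) { print "no"; exit }
    # -f style check: overlap as a fraction of the alignment length.
    if (ov / (a_end - a_beg) >= min_frac) print "yes"; else print "no"
  }'
}
```

For a 30 bp alignment, a threshold of 0.5 requires at least 15 overlapping bases under this sketch: passes_overlap 100 130 + 115 200 + 0.5 none prints yes, while the same call with 0.6 prints no.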

Examples

Add tags to alignments based on their overlap with regions defined in bed files

In the following example, we add a tag (YB by default; this can be overridden with the -tag option) to alignments in the BAM file test.bam that overlap regions in the BED files test1.bed or test2.bed (as specified by the -files option). The tag value records the source of the overlapping region. For example, if an alignment overlaps a region defined in test1.bed and we label this file as s1 (via the -labels option), the alignment receives the tag YB:Z:s1. An alignment that overlaps regions from both files receives both labels separated by a semicolon (YB:Z:s1;s2), and an alignment that overlaps no region is left untagged.

$ bedtools tag -i test.bam -files test1.bed test2.bed -labels s1 s2 > tagged.bam
$ samtools view tagged.bam | head
# this alignment doesn't overlap with any regions in test1.bed or test2.bed
example1.41109452	16	chr1	16223	255	30M	*	0	0	GACAGTCTCAGTTGCACACACGAGCCAGCA	GHGIGF>IGIIIHGFEDFFFFHFFFFFCC@	NH:i:1	HI:i:1	AS:i:29	nM:i:0	NM:i:0	MD:Z:30	jM:B:c,-1	jI:B:i,-1
# this alignment overlaps with regions in test1.bed
example2.40005100	0	chr1	139013	255	30M	*	0	0	GAGTAAGTTTTGGGCCCGGAGATGATGTCC	BBCDDDDEHHHHHJJJJJJIJIJIJJIJJJ	NH:i:1	HI:i:1	AS:i:29	nM:i:0	NM:i:0	MD:Z:30	jM:B:c,-1	jI:B:i,-1	YB:Z:s1
# this alignment overlaps with regions in test2.bed
example3.17421922	0	chr1	804895	255	30M	*	0	0	AGAAAACACCGGGGAAGTCCAGCCTGCACG	CCCFFFFFHHHHHJJJGIJJJJJJJJJJJJ	NH:i:1	HI:i:1	AS:i:29	nM:i:0	NM:i:0	MD:Z:30	jM:B:c,-1	jI:B:i,-1	YB:Z:s2
# this alignment overlaps with regions in both test1.bed and test2.bed
example4.1423869	16	chr1	267979	255	30M	*	0	0	TTTCTCCTCAGTTTCTCTGTGCAGCACCAG	GJIJIJJJJJJJIGHFFBFGHHFFDDF@C@	NH:i:1	HI:i:1	AS:i:29	nM:i:0	NM:i:0	MD:Z:30	jM:B:c,-1	jI:B:i,-1	YB:Z:s1;s2
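To check the result at a glance, the tag values can be tallied from the SAM text that samtools view prints. A minimal sketch, assuming the default YB tag; count_yb is a hypothetical helper name, and a multi-valued tag such as s1;s2 is counted as its own category:

```shell
# Hypothetical helper: tally alignments by YB tag value from SAM text on stdin.
# Alignments without a YB tag are counted as "untagged".
count_yb() {
  awk -F'\t' '
    /^@/ { next }                    # skip header lines, if any
    {
      tag = "untagged"
      for (i = 12; i <= NF; i++)     # optional SAM tags start at column 12
        if ($i ~ /^YB:Z:/) tag = substr($i, 6)
      n[tag]++
    }
    END { for (t in n) print t "\t" n[t] }' | LC_ALL=C sort
}
```

Usage: $ samtools view tagged.bam | count_yb. For just the four alignments shown above, it would report a count of 1 for each of s1, s1;s2, s2 and untagged.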

Use regions' names or scores as tag values

In the above example, we assigned two labels (s1 and s2) to regions defined in test1.bed and test2.bed, respectively. To use the regions' names (the fourth column of a BED file) as tag values instead, use the -names option:

$ bedtools tag -i test.bam -files test1.bed test2.bed -names | samtools view - | head
example2.40005100	0	chr1	139013	255	30M	*	0	0	GAGTAAGTTTTGGGCCCGGAGATGATGTCC	BBCDDDDEHHHHHJJJJJJIJIJIJJIJJJ	NH:i:1	HI:i:1	AS:i:29	nM:i:0	NM:i:0	MD:Z:30	jM:B:c,-1	jI:B:i,-1	YB:Z:EH38E2776521

Similarly, to use the regions' scores (the fifth column of a BED file) as tag values, use the -scores option instead.
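Whichever values populate the tag (labels, names or scores), it can be convenient to flatten tagged output into one read/value pair per line for downstream tables. A minimal sketch, assuming the default YB tag and assuming that multiple values are joined with ";" as in the YB:Z:s1;s2 example; yb_pairs is a hypothetical helper name:

```shell
# Hypothetical helper: print one "read<TAB>value" line per YB value from
# SAM text on stdin, splitting multi-valued tags on ";".
# Untagged alignments produce no output.
yb_pairs() {
  awk -F'\t' '
    /^@/ { next }                         # skip header lines, if any
    {
      for (i = 12; i <= NF; i++) {        # optional SAM tags start at column 12
        if ($i !~ /^YB:Z:/) continue
        k = split(substr($i, 6), vals, ";")
        for (j = 1; j <= k; j++) print $1 "\t" vals[j]
      }
    }'
}
```

Usage: $ samtools view tagged.bam | yb_pairs > read_values.tsv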


File formats this tool works with
BED, BAM, GFF, GTF, VCF
