Report overlaps between a BEDPE file and a BED/GFF/VCF file.
bedtools pairtobed [OPTIONS] -a <bedpe> -b <bed/gff/vcf>
This tool is part of the bedtools suite and is also known as pairToBed.
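When the input is in BAM format (-abam), pairToBed requires that the BAM file be sorted or grouped by read name. This allows pairToBed to extract correct alignment coordinates for each end based on their respective CIGAR strings. It also assumes that the alignments for a given pair come in groups of two. There is not yet a standard method for reporting multiple alignments using BAM, so pairToBed will fail if an aligner does not report alignments in pairs.

By default, any overlap between B and either end of A is reported. A minimum overlap, expressed as a fraction of A, can be required with -f, for example -f 0.5, and below shows how the results change: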
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE/BAM A *****.................................*****
BED File B ^^ ^^^^^^
Result (-f 0.5)
BEDPE/BAM A *****.................................*****
BED File B ^^^^ ^^^^^^
Result =====.................................=====
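The fraction test illustrated above can be sketched in a few lines of Python. This is only an illustration of the idea, not the bedtools implementation: the function names are hypothetical, intervals are assumed to be 0-based and half-open as in BED, and the default threshold of 1e-9 (effectively 1 bp) is taken from the bedtools help text.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two 0-based, half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def end_passes(a_start, a_end, b_start, b_end, min_frac=1e-9):
    """True if B covers at least min_frac of one end of A (hypothetical helper)."""
    a_len = a_end - a_start
    if a_len <= 0:
        return False
    shared = overlap_bp(a_start, a_end, b_start, b_end)
    return shared > 0 and shared / a_len >= min_frac


# One 5 bp end of A at [100, 105), mirroring the ***** block in the diagram.
short_hit = end_passes(100, 105, 101, 103, min_frac=0.5)  # B covers 2 of 5 bp
long_hit = end_passes(100, 105, 101, 105, min_frac=0.5)   # B covers 4 of 5 bp
print(short_hit, long_hit)
```

With a threshold of 0.5, the 2 bp overlap is rejected and the 4 bp overlap is accepted, which matches the two cases drawn above.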
The strandedness options -s (same strand) and -S (opposite strand) are not applicable with -type inspan or -type outspan.

Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^ ^^^^^^
Result       =====.................................=====
Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^ ^^^^^^
Result

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^ ^^^^^^
Result       =====.................................=====

Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^ ^^^^^^
Result

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^ ^^^^^^
Result       =====.................................=====

Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^ ^^^^^^
Result       =====.................................=====

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^ ^^^^^^
Result

Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^ ^^^^^^
Result       =====.................................=====

BEDPE/BAM A  *****.................................*****
BED File B   ^^^ ^^^^^^
Result       =====.................................=====

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^ ^^^^^^
Result
Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inner span        |-------------------------------|
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^
Result       =====.................................=====

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^
Result
Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Outer span   |-----------------------------------------|
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^^^^^
Result       =====.................................=====

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^
Result

Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inner span        |-------------------------------|
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^
Result

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^
Result       =====.................................=====

Chromosome   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Outer span   |-----------------------------------------|
BEDPE/BAM A  *****.................................*****
BED File B   ^^^^^^^^^^^^
Result

BEDPE/BAM A  *****.................................*****
BED File B   ^^^^
Result       =====.................................=====
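The diagrams above compare B against the two ends of A, or against the inner or outer span of the pair. The Python sketch below shows one way to express those comparisons; it is an illustration under stated assumptions, not the bedtools source. The function names and the dictionary keys are hypothetical labels (they are not the exact -type spellings accepted on the command line), both ends are assumed to lie on the same chromosome with end 1 to the left of end 2, and intervals are 0-based and half-open.

```python
def overlaps(start1, end1, start2, end2):
    """True if two 0-based, half-open intervals share at least one base."""
    return min(end1, end2) > max(start1, start2)


def classify(pair, features):
    """Evaluate each overlap mode for one pair against a list of B intervals.

    pair is ((s1, e1), (s2, e2)); features is a list of (start, end) tuples.
    Assumes end 1 lies to the left of end 2 on the same chromosome.
    """
    (s1, e1), (s2, e2) = pair
    hit1 = any(overlaps(s1, e1, bs, be) for bs, be in features)
    hit2 = any(overlaps(s2, e2, bs, be) for bs, be in features)
    # Inner span: from the end of end 1 to the start of end 2.
    inner = any(overlaps(e1, s2, bs, be) for bs, be in features)
    # Outer span: from the start of end 1 to the end of end 2.
    outer = any(overlaps(s1, e2, bs, be) for bs, be in features)
    return {
        "either": hit1 or hit2,
        "neither": not (hit1 or hit2),
        "both": hit1 and hit2,
        "xor": hit1 != hit2,
        "notboth": not (hit1 and hit2),
        "inner_span": inner,
        "outer_span": outer,
        "not_inner_span": not inner,
        "not_outer_span": not outer,
    }


# A pair with 5 bp ends at [100, 105) and [138, 143).
pair = ((100, 105), (138, 143))
one_end = classify(pair, [(101, 109)])   # B touches end 1 and the inner span
gap_only = classify(pair, [(110, 120)])  # B sits entirely between the ends
print(one_end)
print(gap_only)
```

The sketch ignores strand and the -f fraction, so it only captures the presence or absence of overlap in each region.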
Assume we have a BEDPE file that stores chromatin loops identified by Hi-C (intactHiC_loops.bedpe):
$ head intactHiC_loops.bedpe
chr10 102835000 102836000 chr10 102901000 102902000 . . . . 0,255,255 16.0 2.5453029 2.0566912 2.896359 2.6027875 0.0 0.0 5.9604645E-8 0.0 2 102835000 102901500 500 102834600 102835200 102901400 102901700 102834700 102901500 4.0 2.17173181723318E-4 0
chr10 123583000 123584000 chr10 123967000 123968000 . . . . 0,255,255 17.0 1.2294405 1.126373 1.5320965 2.968846 0.0 0.0 0.0 0.0 2 123583000 123967500 500 NA NA NA NA NA NA NA NA NA
chr10 60780000 60782000 chr10 60828000 60830000 . . . . 0,255,255 16.0 3.9354546 3.6036625 4.087633 2.7198699 3.993511E-6 1.3113022E-6 6.377697E-6 5.9604645E-8 1 60781000 60829000 0 NA NA NA NA NA NA NA NA NA
chr10 33050000 33051000 chr10 33067000 33068000 . . . . 0,255,255 11.0 1.8586187 1.8435045 1.302612 2.001136 4.23193E-6 3.9339066E-6 1.1920929E-7 8.34465E-6 1 33050500 33067500 0 NA NA NA NA NA NA NA NA NA
chr10 11412000 11414000 chr10 11472000 11474000 . . . . 0,255,255 27.0 5.0568876 4.13146 3.918301 7.767662 0.0 0.0 0.0 5.9604645E-8 1 11413000 11473000 0 11412000 11412500 11471700 11472700 11412200 11472100 10.0 0.008861431153317945 0
chr10 45005000 45010000 chr10 45465000 45470000 . . . . 0,255,255 16.0 3.658383 2.8294744 6.2742825 2.340569 1.6093254E-6 5.9604645E-8 8.097887E-4 0.0 2 45007500 45465000 2500 NA NA NA NA NA NA NA NA NA
chr10 120420000 120430000 chr10 120600000 120610000 . . . . 0,255,255 29.0 9.291821 9.322407 13.77734 8.561912 1.7881393E-7 1.7881393E-7 2.3120642E-4 5.9604645E-8 3 120421666 120595000 12018 NA NA NA NA NA NA NA NA NA
chr10 62437000 62438000 chr10 63158000 63159000 . . . . 0,255,255 13.0 1.353705 0.9746326 1.4709425 1.4360862 0.0 0.0 0.0 0.0 1 62437500 63158500 0 NA NA NA NA NA NA NA NA NA
chr10 91006000 91008000 chr10 91296000 91298000 . . . . 0,255,255 12.0 2.4448104 1.9931997 3.1885839 2.2351904 1.013279E-5 1.3113022E-6 1.2511015E-4 4.172325E-6 2 91007000 91296000 1000 NA NA NA NA NA NA NA NA NA
chr10 16952000 16954000 chr10 17256000 17258000 . . . . 0,255,255 13.0 2.43382 2.2646153 2.7904801 2.2059083 1.7881393E-6 8.34465E-7 7.6293945E-6 5.9604645E-7 1 16953000 17257000 0 NA NA NA NA NA NA NA NA NA
We also have a BED file that defines all promoter regions:
$ head promoters_1kb_tss_centered.bed
chr1 68590 69590 OR4F5 promoter + 69090 70008 102,194,165
chr1 181892 182892 FO538757.3 promoter + 182392 184158 102,194,165
chr1 194911 195911 FO538757.2 promoter - 184922 195411 102,194,165
chr1 194911 195911 FO538757.2 promoter - 184924 195411 102,194,165
chr1 199822 200822 FO538757.2 promoter - 184926 200322 102,194,165
chr1 451178 452178 OR4F29 promoter - 450739 451678 102,194,165
chr1 686154 687154 OR4F16 promoter - 685715 686654 102,194,165
chr1 924379 925379 SAMD11 promoter + 924879 939291 102,194,165
chr1 924649 925649 SAMD11 promoter + 925149 935793 102,194,165
chr1 925237 926237 SAMD11 promoter + 925737 944575 102,194,165
We can run the following command to report every promoter region that overlaps either anchor of a Hi-C loop:
$ bedtools pairtobed -a ENCFF256ZMD.bedpe -b promoters_1kb_tss_centered.bed | head
chr10 60780000 60782000 chr10 60828000 60830000 . . . . 0,255,255 16.0 3.9354546 3.6036625 4.087633 2.7198699 3.993511E-6 1.3113022E-6 6.377697E-6 5.9604645E-8 1 60781000 60829000 0 NA NA NA NA NA NA NA NA NA chr10 60779652 60780652 CDK1 promoter + 60780152 60794852 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 67884168 67885168 SIRT1 promoter + 67884668 67918390 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 67884680 67885680 SIRT1 promoter + 67885180 67891496 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 67884680 67885680 SIRT1 promoter + 67885180 67918389 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 67884683 67885683 SIRT1 promoter + 67885183 67906879 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 69315284 69316284 HK1 promoter + 69315784 69369624 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 69315353 69316353 HK1 promoter + 69315853 69401879 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 69318343 69319343 HK1 promoter + 69318843 69401882 102,194,165
chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA chr10 69318610 69319610 HK1 promoter + 69319110 69376970 102,194,165
chr10 110300000 110310000 chr10 112310000 112320000 . . . . 0,255,255 13.0 2.389684 2.4640477 2.5979018 3.2360432 1.4901161E-6 2.026558E-6 3.5762787E-6 3.4868717E-5 2 110310000 112315000 5000 NA NA NA NA NA NA NA NA NA chr10 110304451 110305451 SMNDC1 promoter - 110290729 110304951 102,194,165
Each output line consists of the complete record from the input BEDPE file followed by the overlapping record from the query BED file (here, the final nine columns).
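Because every output line is a BEDPE record with a BED record appended, downstream scripts usually need to split the two apart. The following Python sketch shows one way to do that and to collect promoter gene names per loop. It assumes the BED file has nine columns, as in the promoter file above, with the gene name in the fourth; the function names are hypothetical, and the sample lines are copied from the output shown earlier.

```python
from collections import defaultdict


def split_record(line, n_bed_cols=9):
    """Split one pairtobed output line into (bedpe_fields, bed_fields)."""
    fields = line.split()
    return fields[:-n_bed_cols], fields[-n_bed_cols:]


def genes_per_loop(lines, n_bed_cols=9):
    """Map each loop (the first six BEDPE columns) to the set of gene names hit."""
    genes = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        bedpe, bed = split_record(line, n_bed_cols)
        genes[tuple(bedpe[:6])].add(bed[3])  # BED column 4 holds the gene name
    return genes


loop_a = ("chr10 60780000 60782000 chr10 60828000 60830000 . . . . 0,255,255 16.0 "
          "3.9354546 3.6036625 4.087633 2.7198699 3.993511E-6 1.3113022E-6 "
          "6.377697E-6 5.9604645E-8 1 60781000 60829000 0 NA NA NA NA NA NA NA NA NA ")
loop_b = ("chr10 67885000 67890000 chr10 69315000 69320000 . . . . 0,255,255 9.0 "
          "0.46407583 0.5072134 0.5800615 0.53166974 0.0 0.0 0.0 0.0 "
          "1 67887500 69317500 0 NA NA NA NA NA NA NA NA NA ")
sample = [
    loop_a + "chr10 60779652 60780652 CDK1 promoter + 60780152 60794852 102,194,165",
    loop_b + "chr10 67884168 67885168 SIRT1 promoter + 67884668 67918390 102,194,165",
    loop_b + "chr10 69315284 69316284 HK1 promoter + 69315784 69369624 102,194,165",
]

result = genes_per_loop(sample)
for loop, names in result.items():
    print(loop, sorted(names))
```

Counting columns from the right avoids hard-coding the width of the BEDPE record, which varies between loop callers.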
$ pairToBed -a sv.bedpe -b genes.bed > sv.genes
$ pairToBed -abam reads.bam -b SSRs.bed -type neither > reads.noSSRs.bam