Category

Genomic Interval Manipulation


Usage

bedtools pairtobed [OPTIONS] -a <bedpe> -b <bed/gff/vcf>


Manual

This tool is part of the bedtools suite and it's also known as pairToBed.

Required arguments

  • -a <bedpe>: The A input file in BEDPE format.
  • -b <bed/gff/vcf>: The B input file, which must be in BED, GFF, or VCF format.

Options

  • -abam: Indicates the A input file is in BAM format; replaces -a. Output will also be in BAM. Requires that the BAM file is sorted/grouped by read name, which allows pairToBed to extract the correct alignment coordinates for each end based on their respective CIGAR strings. It also assumes that the alignments for a given pair come in groups of two. There is not yet a standard method for reporting multiple alignments using BAM, so pairToBed will fail if an aligner does not report alignments in pairs.
  • -ubam: Write uncompressed BAM output. Default writes compressed BAM. This option is used when the output format is BAM from -abam.
  • -bedpe: If using BAM input (-abam), write output as BEDPE. The default is to write output in BAM when using -abam.
  • -ed: Use BAM total edit distance (NM tag) for the BEDPE score. By default, the BEDPE score is the minimum of the two mapping qualities for the pair. When this option is used, the total edit distance from the two mates is reported as the score.
  • -f float: Specifies the minimum overlap required as a fraction of A (e.g., 0.05). Default is $10^{-9}$ (effectively 1 bp). For example, to report A only if at least 50% of one of its two ends is overlapped by B, specify -f 0.5; the diagram below shows how the results change:
    Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    BEDPE/BAM A         *****.................................*****
    BED File B         ^^                                           ^^^^^^
    Result (-f 0.5)
    
    BEDPE/BAM A         *****.................................*****
    BED File B         ^^^^                                         ^^^^^^
    Result              =====.................................=====
  • -s: Requires the same strandedness when finding overlaps. Default is to ignore strandedness. Not applicable with -type ispan or -type ospan.
  • -S: Requires different strandedness when finding overlaps. Default is to ignore strandedness. Not applicable with -type ispan or -type ospan.
  • -type string: Defines the approach to reporting overlaps between BEDPE and BED. Valid options:
    • either: Report overlaps if either end of A overlaps B. This is the default setting.
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^^^^^                                          ^^^^^^
      Result              =====.................................=====
    • neither: Report A if neither end of A overlaps B.
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^^^^^                                          ^^^^^^
      Result
      
      BEDPE/BAM A         *****.................................*****
      BED File B   ^^^^                                                  ^^^^^^
      Result              =====.................................=====
    • both: Report overlaps if both ends of A overlap B.
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^^^^^                                          ^^^^^^
      Result
      
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^^^^^                                   ^^^^^^
      Result              =====.................................=====
    • xor: Report overlaps if one and only one end of A overlaps B.
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^^^^^                                          ^^^^^^
      Result              =====.................................=====
      
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^                                   ^^^^^^
      Result
    • notboth: Report A if neither end or only one end of A overlaps B. Equivalent to xor + neither.
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^^^^^                                          ^^^^^^
      Result              =====.................................=====
      
      BEDPE/BAM A         *****.................................*****
      BED File B     ^^^                                               ^^^^^^
      Result              =====.................................=====
      
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^                                   ^^^^^^
      Result
    • ispan: Report overlaps between the inner span $[\text{end}_1, \text{start}_2]$ of A and B. Entries where $\text{chrom}_1 \neq \text{chrom}_2$ are ignored (applicable only to intra-chromosomal features).
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    Inner span |-------------------------------|
      BEDPE/BAM A         *****.................................*****
      BED File B                         ^^^^^^^^
      Result              =====.................................=====
      
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^
      Result
    • ospan: Report overlaps between the outer span $[\text{start}_1, \text{end}_2]$ of A and B. Entries where $\text{chrom}_1 \neq \text{chrom}_2$ are ignored (applicable only to intra-chromosomal features).
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              Outer span  |-----------------------------------------|
      BEDPE/BAM A         *****.................................*****
      BED File B             ^^^^^^^^^^^^
      Result              =====.................................=====
      
      BEDPE/BAM A         *****.................................*****
      BED File B     ^^^^
      Result
    • notispan: Report A if the inner span of A doesn't overlap B. Entries where $\text{chrom}_1 \neq \text{chrom}_2$ are ignored (applicable only to intra-chromosomal features).
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    Inner span |-------------------------------|
      BEDPE/BAM A         *****.................................*****
      BED File B                         ^^^^^^^^
      Result
      
      BEDPE/BAM A         *****.................................*****
      BED File B         ^^^^
      Result              =====.................................=====
    • notospan: Report A if the outer span of A doesn't overlap B. Entries where $\text{chrom}_1 \neq \text{chrom}_2$ are ignored (applicable only to intra-chromosomal features).
      Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              Outer span  |-----------------------------------------|
      BEDPE/BAM A         *****.................................*****
      BED File B             ^^^^^^^^^^^^
      Result
      
      BEDPE/BAM A         *****.................................*****
      BED File B     ^^^^
      Result              =====.................................=====
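
The reporting rules above can be modelled in a few lines. The following is a minimal Python sketch, not bedtools' implementation: the `Interval`/`Pair` types and the `is_reported` helper are hypothetical, it only answers whether an A record would be reported at all (not which output lines are written), it ignores strand (-s/-S), it assumes an end passes -f when the overlapped fraction is greater than or equal to f, and it applies no -f threshold to the span modes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """A BED-style interval: 0-based start, exclusive end (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


@dataclass
class Pair:
    """One BEDPE record reduced to its two ends (hypothetical helper type)."""
    end1: Interval
    end2: Interval


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def end_is_hit(end: Interval, bs: List[Interval], f: float) -> bool:
    """True if any B covers at least fraction f of this end (assumed >= comparison)."""
    length = max(1, end.end - end.start)  # guard against zero-length ends
    return any(
        overlap_bp(end, b) > 0 and overlap_bp(end, b) / length >= f for b in bs
    )


def is_reported(pair: Pair, bs: List[Interval], type_: str = "either",
                f: float = 1e-9) -> bool:
    """Model of whether -type <type_> would report this A record at all."""
    if type_ in ("ispan", "ospan", "notispan", "notospan"):
        if pair.end1.chrom != pair.end2.chrom:
            return False  # inter-chromosomal pairs are ignored by the span modes
        if type_ in ("ispan", "notispan"):
            span = Interval(pair.end1.chrom, pair.end1.end, pair.end2.start)  # inner span
        else:
            span = Interval(pair.end1.chrom, pair.end1.start, pair.end2.end)  # outer span
        hit = any(overlap_bp(span, b) > 0 for b in bs)
        return hit if type_ in ("ispan", "ospan") else not hit

    hit1 = end_is_hit(pair.end1, bs, f)
    hit2 = end_is_hit(pair.end2, bs, f)
    rules = {
        "either": hit1 or hit2,
        "neither": not hit1 and not hit2,
        "both": hit1 and hit2,
        "xor": hit1 != hit2,
        "notboth": not (hit1 and hit2),  # xor + neither
    }
    return rules[type_]


# Usage: mirror the -f 0.5 diagram (ends of 5 bp each).
a = Pair(Interval("chr1", 20, 25), Interval("chr1", 58, 63))
print(is_reported(a, [Interval("chr1", 19, 21)], f=0.5))  # 1 of 5 bp overlapped -> False
print(is_reported(a, [Interval("chr1", 19, 23)], f=0.5))  # 3 of 5 bp overlapped -> True
```

Note that "neither", "xor" and "notboth" are evaluated against all B features together, which is why the sketch takes a list of B intervals rather than a single one.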

Examples

Get HiC loops that overlap with gene promoters

Assume we have a bedpe file which stores chromatin loops identified by HiC (intactHiC_loops.bedpe):

$ head intactHiC_loops.bedpe
chr10    102835000    102836000    chr10    102901000    102902000    .    .    .    .    0,255,255    16.0    2.5453029    2.0566912    2.896359    2.6027875    0.0    0.0    5.9604645E-8    0.0    2    102835000    102901500    500    102834600    102835200    102901400    102901700    102834700    102901500    4.0    2.17173181723318E-4    0
chr10    123583000    123584000    chr10    123967000    123968000    .    .    .    .    0,255,255    17.0    1.2294405    1.126373    1.5320965    2.968846    0.0    0.0    0.0    0.0    2    123583000    123967500    500    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    60780000    60782000    chr10    60828000    60830000    .    .    .    .    0,255,255    16.0    3.9354546    3.6036625    4.087633    2.7198699    3.993511E-6    1.3113022E-6    6.377697E-6    5.9604645E-8    1    60781000    60829000    0    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    33050000    33051000    chr10    33067000    33068000    .    .    .    .    0,255,255    11.0    1.8586187    1.8435045    1.302612    2.001136    4.23193E-6    3.9339066E-6    1.1920929E-7    8.34465E-6    1    33050500    33067500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    11412000    11414000    chr10    11472000    11474000    .    .    .    .    0,255,255    27.0    5.0568876    4.13146    3.918301    7.767662    0.0    0.0    0.0    5.9604645E-8    1    11413000    11473000    0    11412000    11412500    11471700    11472700    11412200    11472100    10.0    0.008861431153317945    0
chr10    45005000    45010000    chr10    45465000    45470000    .    .    .    .    0,255,255    16.0    3.658383    2.8294744    6.2742825    2.340569    1.6093254E-6    5.9604645E-8    8.097887E-4    0.0    2    45007500    45465000    2500    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    120420000    120430000    chr10    120600000    120610000    .    .    .    .    0,255,255    29.0    9.291821    9.322407    13.77734    8.561912    1.7881393E-7    1.7881393E-7    2.3120642E-4    5.9604645E-8    3    120421666    120595000    12018    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    62437000    62438000    chr10    63158000    63159000    .    .    .    .    0,255,255    13.0    1.353705    0.9746326    1.4709425    1.4360862    0.0    0.0    0.0    0.0    1    62437500    63158500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    91006000    91008000    chr10    91296000    91298000    .    .    .    .    0,255,255    12.0    2.4448104    1.9931997    3.1885839    2.2351904    1.013279E-5    1.3113022E-6    1.2511015E-4    4.172325E-6    2    91007000    91296000    1000    NA    NA    NA    NA    NA    NA    NA    NA    NA
chr10    16952000    16954000    chr10    17256000    17258000    .    .    .    .    0,255,255    13.0    2.43382    2.2646153    2.7904801    2.2059083    1.7881393E-6    8.34465E-7    7.6293945E-6    5.9604645E-7    1    16953000    17257000    0    NA    NA    NA    NA    NA    NA    NA    NA    NA

And we have a bed file which defines all promoter regions:

$ head promoters_1kb_tss_centered.bed
chr1    68590    69590    OR4F5    promoter    +    69090    70008    102,194,165
chr1    181892    182892    FO538757.3    promoter    +    182392    184158    102,194,165
chr1    194911    195911    FO538757.2    promoter    -    184922    195411    102,194,165
chr1    194911    195911    FO538757.2    promoter    -    184924    195411    102,194,165
chr1    199822    200822    FO538757.2    promoter    -    184926    200322    102,194,165
chr1    451178    452178    OR4F29    promoter    -    450739    451678    102,194,165
chr1    686154    687154    OR4F16    promoter    -    685715    686654    102,194,165
chr1    924379    925379    SAMD11    promoter    +    924879    939291    102,194,165
chr1    924649    925649    SAMD11    promoter    +    925149    935793    102,194,165
chr1    925237    926237    SAMD11    promoter    +    925737    944575    102,194,165
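
The rows above appear to be 1 kb windows centred on each TSS: for "+" transcripts the centre is column 7 (OR4F5: 69090 - 500 = 68590), and for "-" transcripts it is column 8 (FO538757.2: 195411 - 500 = 194911). Assuming that reading of the columns, a file in this layout could be produced with a short Python sketch like the one below; the `promoter_row` helper and its argument order are hypothetical, and clamping the start at 0 is an added safeguard.

```python
from typing import Tuple


def promoter_row(chrom: str, tx_start: int, tx_end: int, name: str,
                 strand: str, half_width: int = 500,
                 rgb: str = "102,194,165") -> Tuple:
    """Build one TSS-centred promoter row in the 9-column layout shown above.

    Hypothetical helper: the TSS is taken as tx_start on '+' and tx_end on '-',
    which reproduces the coordinates in the example file.
    """
    tss = tx_start if strand == "+" else tx_end
    start = max(0, tss - half_width)  # never go below the chromosome start
    end = tss + half_width
    return (chrom, start, end, name, "promoter", strand, tx_start, tx_end, rgb)


# Reproduces the OR4F5 row (tab-separated): chr1 68590 69590 OR4F5 promoter + ...
print("\t".join(map(str, promoter_row("chr1", 69090, 70008, "OR4F5", "+"))))
```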

We can run the following command to get all promoter regions that overlap either anchor of each HiC loop:

$ bedtools pairtobed -a intactHiC_loops.bedpe -b promoters_1kb_tss_centered.bed | head
chr10    60780000    60782000    chr10    60828000    60830000    .    .    .    .    0,255,255    16.0    3.9354546    3.6036625    4.087633    2.7198699    3.993511E-6    1.3113022E-6    6.377697E-6    5.9604645E-8    1    60781000    60829000    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    60779652    60780652    CDK1    promoter    +    60780152    60794852    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    67884168    67885168    SIRT1    promoter    +    67884668    67918390    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    67884680    67885680    SIRT1    promoter    +    67885180    67891496    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    67884680    67885680    SIRT1    promoter    +    67885180    67918389    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    67884683    67885683    SIRT1    promoter    +    67885183    67906879    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    69315284    69316284    HK1    promoter    +    69315784    69369624    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    69315353    69316353    HK1    promoter    +    69315853    69401879    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    69318343    69319343    HK1    promoter    +    69318843    69401882    102,194,165
chr10    67885000    67890000    chr10    69315000    69320000    .    .    .    .    0,255,255    9.0    0.46407583    0.5072134    0.5800615    0.53166974    0.0    0.0    0.0    0.0    1    67887500    69317500    0    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    69318610    69319610    HK1    promoter    +    69319110    69376970    102,194,165
chr10    110300000    110310000    chr10    112310000    112320000    .    .    .    .    0,255,255    13.0    2.389684    2.4640477    2.5979018    3.2360432    1.4901161E-6    2.026558E-6    3.5762787E-6    3.4868717E-5    2    110310000    112315000    5000    NA    NA    NA    NA    NA    NA    NA    NA    NA    chr10    110304451    110305451    SMNDC1    promoter    -    110290729    110304951    102,194,165

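In each output line, the first 33 columns are the input BEDPE record (A) and the last 9 columns are the overlapping promoter record from the BED file (B). An A record is repeated once for every B feature it overlaps; for example, the loop anchored at chr10:67885000-67890000 and chr10:69315000-69320000 appears eight times, once for each SIRT1 and HK1 promoter it overlaps.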
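
Because each loop is repeated per overlapping promoter, it is often convenient to collapse the output to one entry per loop. The following Python sketch is one way to do that; `genes_per_loop` is a hypothetical helper, and it assumes tab-separated output (as bedtools writes) with 33 BEDPE columns followed by the 9-column promoter record, whose 4th field is the gene name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

N_BEDPE_COLS = 33  # columns contributed by the HiC loop file in this example


def genes_per_loop(lines: Iterable[str],
                   n_a_cols: int = N_BEDPE_COLS) -> Dict[Tuple[str, ...], Set[str]]:
    """Collapse pairtobed output to {loop anchors: set of promoter gene names}.

    The loop key is the first six BEDPE fields (chrom1, start1, end1,
    chrom2, start2, end2); the gene name is field 4 of the appended BED record.
    """
    loops: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= n_a_cols + 3:
            continue  # skip lines that lack a full BED record
        anchors = tuple(fields[:6])
        loops[anchors].add(fields[n_a_cols + 3])
    return dict(loops)
```

For example, if the command's output were redirected to a file (the name `loops_promoters.tsv` is hypothetical), `genes_per_loop(open("loops_promoters.tsv"))` would reduce the eight lines shown for the chr10:67885000-67890000 / chr10:69315000-69320000 loop to the set {"SIRT1", "HK1"}.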

Return all structural variants (in BEDPE format) that overlap with genes on either end

$ pairToBed -a sv.bedpe -b genes.bed > sv.genes

Retain only paired-end BAM alignments where neither end overlaps simple sequence repeats

$ pairToBed -abam reads.bam -b SSRs.bed -type neither > reads.noSSRs.bam

File formats this tool works with
BED, BEDPE