clusterBed [OPTIONS] -i <BED/GFF/VCF>
This tool is part of the bedtools suite.
Similar to bedtools merge (aka mergeBed), clusterBed (also known as bedtools cluster) reports each set of overlapping or book-ended features in an interval file.
In contrast to bedtools merge, clusterBed does not flatten the cluster of intervals into a new meta-interval; instead, it assigns a unique cluster ID to each record in each cluster (a new column of cluster IDs is appended to the end of the output).
This is useful when you want fine control over how sets of overlapping intervals in a single interval file are combined.
clusterBed requires that you presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files).
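
Before clustering, it can help to confirm that a file is already in the required order. The sketch below is one way to do that with the standard sort -c check mode; the function name is_bed_sorted is a hypothetical helper, not part of bedtools, and the result depends on the current locale. Records that tie on both keys may fall back to sort's whole-line comparison, so treat a failure as a prompt to re-sort rather than as proof of a problem.

```shell
# is_bed_sorted: hypothetical helper name, not a bedtools command.
# Returns 0 if the file is ordered by chromosome, then numeric start.
is_bed_sorted() {
  # sort -c only checks the order; it writes no sorted output.
  sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Usage (in.bed is a placeholder file name):
#   is_bed_sorted in.bed || sort -k1,1 -k2,2n in.bed > in.sorted.bed
```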
By default, clusterBed collects overlapping (by at least 1 bp) and/or bookended intervals into distinct clusters. In the example below, the 4th column is the cluster ID.
$ cat A.bed
chr1 100 200
chr1 180 250
chr1 250 500
chr1 501 1000

$ clusterBed -i A.bed
chr1 100 200 1
chr1 180 250 1
chr1 250 500 1
chr1 501 1000 2
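
To make the default rule concrete, here is a minimal awk sketch of the behavior described above. It is not bedtools' own implementation: the function name cluster_sorted_bed is hypothetical, and it assumes 3-column input that is already sorted by chromosome and start. A record joins the current cluster when it is on the same chromosome and its start is no greater than the largest end seen so far in that cluster, which covers both overlapping and bookended intervals.

```shell
# cluster_sorted_bed: hypothetical illustration of the default clustering
# rule, not a bedtools command. Input must be sorted by chrom, then start.
cluster_sorted_bed() {
  awk 'BEGIN { OFS = "\t" }
  {
    start = $2 + 0; end = $3 + 0
    if ($1 != chrom || start > max_end) {
      # New chromosome, or a gap after the current cluster: open a new cluster.
      id++
      max_end = end
    } else if (end > max_end) {
      # Overlapping or bookended: extend the current cluster.
      max_end = end
    }
    chrom = $1
    print $1, $2, $3, id
  }' "$@"
}
```

Run on the four records of A.bed above, this sketch assigns cluster IDs 1, 1, 1 and 2, matching the clusterBed output: the record starting at 250 is bookended with the previous end of 250, while the record starting at 501 lies past the cluster's largest end of 500.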