Category

Reads Manipulation


Usage

bowtie-build [options]* <reference_in> <ebwt_base>


Manual

Main arguments

<reference_in>

A comma-separated list of FASTA files containing the reference sequences to be aligned to, or, if -c is specified, the sequences themselves. For example, <reference_in> might be chr1.fa,chr2.fa,chrX.fa,chrY.fa, or, if -c is specified, it might be GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA.

<ebwt_base>

The basename of the index files to write. By default, bowtie-build writes files named NAME.1.ebwt, NAME.2.ebwt, NAME.3.ebwt, NAME.4.ebwt, NAME.rev.1.ebwt, and NAME.rev.2.ebwt, where NAME is <ebwt_base>.
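As a rough illustration of how the two main arguments fit together, the Python sketch below assembles a bowtie-build argument list and names the six index files described above. The helper names (build_command, expected_index_files) and the basename hg_subset are hypothetical, chosen only for this example; the sketch builds the command but does not run the tool.

```python
def build_command(reference_in, ebwt_base, sequences_on_command_line=False):
    """Assemble an argv list for bowtie-build (hypothetical helper).

    reference_in is a list of FASTA file names, or a list of literal
    sequences when sequences_on_command_line is True (the -c option).
    """
    cmd = ["bowtie-build"]
    if sequences_on_command_line:
        cmd.append("-c")
    # bowtie-build expects one comma-separated argument, not one per file.
    cmd.append(",".join(reference_in))
    cmd.append(ebwt_base)
    return cmd


def expected_index_files(ebwt_base):
    """List the index file names the manual says are written by default."""
    suffixes = ["1", "2", "3", "4", "rev.1", "rev.2"]
    return [f"{ebwt_base}.{suffix}.ebwt" for suffix in suffixes]


if __name__ == "__main__":
    print(" ".join(build_command(["chr1.fa", "chr2.fa"], "hg_subset")))
    for name in expected_index_files("hg_subset"):
        print(name)
```

The resulting list could be passed to subprocess.run on a machine where bowtie-build is installed.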

Options

-f

The reference input files (specified as <reference_in>) are FASTA files (usually having extension .fa, .mfa, .fna or similar).

-c

The reference sequences are given on the command line. That is, <reference_in> is a comma-separated list of sequences rather than a list of FASTA files.

-C/--color

Build a colorspace index, to be queried using bowtie -C.

-a/--noauto

Disable the default behavior whereby bowtie-build automatically selects values for the --bmax, --dcv and --packed parameters according to available memory. Instead, the user may specify values for those parameters. If memory is exhausted during indexing, an error message will be printed; it is up to the user to try new parameters.

-p/--packed

Use a packed (2-bits-per-nucleotide) representation for DNA strings. This saves memory but makes indexing 2-3 times slower. Default: off. This is configured automatically by default; use -a/--noauto to configure manually.

--bmax <int>

The maximum number of suffixes allowed in a block. Allowing more suffixes per block makes indexing faster, but increases peak memory usage. Setting this option overrides any previous setting for --bmax or --bmaxdivn. Default (in terms of the --bmaxdivn parameter) is --bmaxdivn 4. This is configured automatically by default; use -a/--noauto to configure manually.

--bmaxdivn <int>

The maximum number of suffixes allowed in a block, expressed as a fraction of the length of the reference. Setting this option overrides any previous setting for --bmax or --bmaxdivn. Default: --bmaxdivn 4. This is configured automatically by default; use -a/--noauto to configure manually.
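The relationship between the two block-size options can be sketched as simple arithmetic: a divisor of 4 means a block may hold up to a quarter of the reference's suffixes. The function name below is hypothetical, and the use of integer (floor) division is an assumption; the exact rounding used by bowtie-build is not stated here.

```python
def bmax_from_divisor(reference_length, bmaxdivn=4):
    """Translate a --bmaxdivn value into the equivalent --bmax value.

    Assumes floor division; the tool's own rounding may differ.
    """
    if bmaxdivn <= 0:
        raise ValueError("bmaxdivn must be a positive integer")
    return reference_length // bmaxdivn


if __name__ == "__main__":
    # A reference of 1,000,000 bases with the default divisor of 4.
    print(bmax_from_divisor(1_000_000))
```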

--dcv <int>

Use <int> as the period for the difference-cover sample. A larger period yields less memory overhead, but may make suffix sorting slower, especially if repeats are present. Must be a power of 2 no greater than 4096. Default: 1024. This is configured automatically by default; use -a/--noauto to configure manually.
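The constraint on the period (a power of 2, no greater than 4096) can be checked before launching a long indexing run. The sketch below is a hypothetical pre-flight check, not part of bowtie-build; it tests only the two conditions stated above, so very small powers of 2 pass even if the tool itself enforces a lower bound.

```python
def is_valid_dcv(period):
    """Return True if period is a power of 2 no greater than 4096."""
    if period <= 0 or period > 4096:
        return False
    # A power of 2 has exactly one bit set, so n & (n - 1) is zero.
    return (period & (period - 1)) == 0


if __name__ == "__main__":
    for candidate in (1024, 1000, 4096, 8192):
        print(candidate, is_valid_dcv(candidate))
```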

--nodc

Disable use of the difference-cover sample. Suffix sorting becomes quadratic-time in the worst case (where the worst case is an extremely repetitive reference). Default: off.

-r/--noref

Do not build the NAME.3.ebwt and NAME.4.ebwt portions of the index, which contain a bitpacked version of the reference sequences and are used for paired-end alignment.

-3/--justref

Build only the NAME.3.ebwt and NAME.4.ebwt portions of the index, which contain a bitpacked version of the reference sequences and are used for paired-end alignment.

-o/--offrate <int>

To map alignments back to positions on the reference sequences, it's necessary to annotate ("mark") some or all of the Burrows-Wheeler rows with their corresponding location on the genome. -o/--offrate governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for the human genome, annotations occupy about 340 megabytes).
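The trade-off described above can be sketched numerically. The marking interval follows directly from the text (2 raised to the offrate). The memory estimate is only a back-of-the-envelope figure: the 4 bytes per marked row is an assumption made for this example, and both function names are hypothetical.

```python
def marking_interval(offrate=5):
    """Number of Burrows-Wheeler rows per marked row: 2^offrate."""
    return 2 ** offrate


def estimated_annotation_bytes(num_rows, offrate=5, bytes_per_mark=4):
    """Rough annotation size, ASSUMING bytes_per_mark bytes per marked row."""
    return (num_rows // marking_interval(offrate)) * bytes_per_mark


if __name__ == "__main__":
    print(marking_interval(5))  # default offrate
    # Lowering the offrate by 1 doubles the number of marked rows.
    print(estimated_annotation_bytes(1_000_000, offrate=5))
    print(estimated_annotation_bytes(1_000_000, offrate=4))
```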

-t/--ftabchars <int>

The ftab is the lookup table used to calculate an initial Burrows-Wheeler range with respect to the first <int> characters of the query. A larger <int> yields a larger lookup table but faster query times. The ftab has size 4^(<int>+1) bytes. The default setting is 10 (ftab is 4MB).
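The size formula can be checked with a few lines of Python. The function name is hypothetical; the formula 4^(n+1) bytes is taken from the description above, and the default of 10 works out to 4 MB when a megabyte is counted as 1024 x 1024 bytes.

```python
def ftab_size_bytes(ftabchars=10):
    """Size of the ftab lookup table in bytes: 4^(ftabchars + 1)."""
    return 4 ** (ftabchars + 1)


if __name__ == "__main__":
    for n in (8, 10, 12):
        size = ftab_size_bytes(n)
        print(f"ftabchars={n}: {size} bytes ({size / (1024 * 1024):g} MB)")
```

Each increase of 1 in the setting multiplies the table size by 4.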

--ntoa

Convert Ns in the reference sequence to As before building the index. By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them.

--big --little

Endianness to use when serializing integers to the index file. Default: little-endian (recommended for Intel- and AMD-based architectures).

--seed <int>

Use <int> as the seed for the pseudo-random number generator.

-q/--quiet

bowtie-build is verbose by default. With this option bowtie-build will print only error messages.

-h/--help

Print usage information and quit.

--version

Print version information and quit.

