cd-hit-dup is a simple tool for removing duplicate sequencing reads, with an optional step to detect and remove chimeric reads.
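The basic idea of duplicate removal can be sketched in Python as shown here. This is a minimal illustration under stated assumptions, not cd-hit-dup's own implementation. Reads are given as (R1, R2) sequence tuples, with an empty R2 for single-end data. A pair counts as a duplicate only when both mates match exactly, and the first occurrence is kept as the representative. `dedup_reads` and `prefix_len` are hypothetical names, and `prefix_len` only loosely mirrors the -u option. Mismatch tolerance (-e) is not modelled.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical read type: (R1 sequence, R2 sequence); R2 is "" for single-end data.
Read = Tuple[str, str]


def dedup_reads(reads: Iterable[Read], prefix_len: int = 0) -> Dict[Read, int]:
    """Collapse exact duplicates and count how often each one occurs.

    Assumptions (not taken from cd-hit-dup):
    - a pair is a duplicate only if both mates match;
    - prefix_len == 0 compares full reads, otherwise only the first
      prefix_len bases of each mate are compared (loosely like -u);
    - the first read seen for each key is kept as the representative.
    """
    def trim(seq: str) -> str:
        return seq if prefix_len == 0 else seq[:prefix_len]

    representative: Dict[Read, Read] = {}  # comparison key -> first read seen
    abundance: Dict[Read, int] = {}        # representative -> number of copies
    for r1, r2 in reads:
        key = (trim(r1), trim(r2))
        if key not in representative:
            representative[key] = (r1, r2)
            abundance[(r1, r2)] = 0
        abundance[representative[key]] += 1
    return abundance
```

For example, `dedup_reads([("ACGTACGT", ""), ("ACGTACGT", ""), ("ACGTTTTT", "")])` keeps two distinct reads with abundances 2 and 1, while passing `prefix_len=4` collapses all three into one representative because they share the prefix `ACGT`.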
Usage: cd-hit-dup -i R1.fq -i2 R2.fq -o output-R1.fq -o2 output-R2.fq [other options]

Options:
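-i Input file (FASTQ or FASTA); the R1 file for paired-end data;
-i2 Second input file (FASTQ or FASTA); the R2 file for paired-end data;
-o Output file; the R1 output for paired-end data;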
-o2 Output file for R2;
-d Description length (default 0, which truncates the description at the first whitespace character);
-u Length of prefix to be used in the analysis (default 0, for full/maximum length);
-m Match length (true/false, default true);
-e Maximum number/percent of mismatches allowed;
-f Filter out chimeric clusters (true/false, default false);
-s Minimum length of common sequence shared between a chimeric read
   and each of its parents (default 30, minimum 20);
-a Abundance cutoff (default 1 without chimeric filtering, 2 with chimeric filtering);
-b Abundance ratio between a parent read and a chimeric read (default 1);
-p Dissimilarity control for chimeric filtering (default 1);
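How the -s and -b thresholds constrain chimera detection can be illustrated with a deliberately simplified Python sketch. This is not cd-hit-dup's algorithm. It assumes a two-parent chimera of the same length as its parents, where `read[:k]` matches one parent exactly and `read[k:]` matches the same positions of another. Both shared segments must be at least `min_common` bases long (loosely like -s), and each parent must be at least `abundance_ratio` times as abundant as the read (loosely like -b). `find_chimera_parents`, `min_common` and `abundance_ratio` are hypothetical names. Mismatches and the -p dissimilarity control are not modelled.

```python
from typing import Dict, Optional, Tuple


def find_chimera_parents(
    read: str,
    abundance: Dict[str, int],
    min_common: int = 30,
    abundance_ratio: float = 1.0,
) -> Optional[Tuple[str, str, int]]:
    """Return (left_parent, right_parent, breakpoint) if `read` fits a simple
    two-parent chimera model, otherwise None.

    Simplified model (an assumption, not cd-hit-dup's method):
    - candidate parents have the same length as the read;
    - a parent needs abundance >= abundance_ratio * abundance of the read;
    - read[:k] equals left_parent[:k] and read[k:] equals right_parent[k:],
      with both segments at least min_common bases long.
    """
    own = abundance.get(read, 1)
    parents = [
        p for p, n in abundance.items()
        if p != read and len(p) == len(read) and n >= abundance_ratio * own
    ]
    # Try every breakpoint that leaves min_common bases on each side.
    for k in range(min_common, len(read) - min_common + 1):
        left = [p for p in parents if p[:k] == read[:k]]
        right = [p for p in parents if p[k:] == read[k:]]
        for lp in left:
            for rp in right:
                if lp != rp:
                    return lp, rp, k
    return None
```

With `abundance = {"AAAACCCC": 10, "GGGGTTTT": 8, "AAAATTTT": 1}` and `min_common=4`, the low-abundance read `AAAATTTT` is reported as a chimera of `AAAACCCC` and `GGGGTTTT` with the breakpoint at position 4. A read whose would-be parents are less abundant than the ratio allows is not flagged.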