Summary: The community-level analysis of samples containing diverse genetic material, via metabarcoding and metagenomic approaches, is increasingly popular. While the production of sequence data for such studies has become straightforward, questions remain about how best to analyze and taxonomically characterize sequence data. For many sequence classification approaches, an important component of the workflow involves the curation of reference sequences. Ideally, this involves trimming away extraneous sequence at the 3′ and 5′ ends of the target marker of interest, as well as the removal of reference sequence duplicates. Here, we present MetaCurator, a software package written in Python, designed for automated reference sequence curation and highly generalizable across markers and study systems. MetaCurator is organized in a modular fashion, so users can implement tools individually in addition to utilizing the automated and flexible MetaCurator parental code. Aside from modules used to organize and format taxonomic lineage data, MetaCurator contains two signature tools. IterRazor utilizes profile hidden Markov models and an iterative search framework to exhaustively identify and extract the precise amplicon marker of interest from available reference sequence data. DerepByTaxonomy then facilitates sequence dereplication using a taxonomically aware approach, removing duplicates only when they belong to the same taxon.
This is important for cases of incomplete lineage sorting between species and for highly conserved markers, such as plant rbcL and trnL, which often display no sequence divergence across taxa, even at the genus level.

Availability and implementation: MetaCurator is supported on OSX and Linux (RedHat/CentOS) and is freely available under a GPL v3.0 license at https://github.com/RTRichar/[email protected]

Supplementary information: Code associated with this work is available at https://github.com/RTRichar/MetabarcodeDBsV2 and additional analysis is presented in supplementary files.
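
The taxonomically aware dereplication described in the Summary can be illustrated with a minimal sketch. This is not MetaCurator's DerepByTaxonomy code: the function name `derep_by_taxonomy`, the `(seq_id, lineage, sequence)` tuple layout, the example lineages and the use of exact, case-insensitive sequence matching are all assumptions made for illustration, and the actual tool may define duplicates differently.

```python
def derep_by_taxonomy(records):
    """Drop duplicate sequences only when they share the same lineage.

    records: iterable of (seq_id, lineage, sequence) tuples (hypothetical layout).
    Returns the retained records in their original order.
    """
    seen = set()   # (lineage, sequence) pairs already retained
    kept = []
    for seq_id, lineage, sequence in records:
        key = (lineage, sequence.upper())  # exact match, case-insensitive (assumption)
        if key in seen:
            continue  # same sequence AND same taxon: a true duplicate
        seen.add(key)
        kept.append((seq_id, lineage, sequence))
    return kept


# Hypothetical example data: identical sequences across two species of one genus.
records = [
    ("r1", "Rosaceae;Rosa;Rosa canina", "ACGT"),
    ("r2", "Rosaceae;Rosa;Rosa canina", "acgt"),  # duplicate within the same taxon
    ("r3", "Rosaceae;Rosa;Rosa rugosa", "ACGT"),  # identical sequence, different taxon
    ("r4", "Rosaceae;Rosa;Rosa canina", "ACGA"),  # same taxon, different sequence
]

kept = derep_by_taxonomy(records)
kept_ids = [seq_id for seq_id, _, _ in kept]
print(kept_ids)
```

Keying on the lineage together with the sequence is what retains `r3` here: a taxonomy-blind dereplication would discard it, and the reference set would no longer record that the two species share an identical marker sequence.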