1. Large-scale, long-term biodiversity monitoring is essential for meeting conservation and land management goals and for identifying threats to biodiversity. However, multispecies surveys are prone to several types of observation error, including false-positive and false-negative detections and misclassification, in which a species is encountered but its identity is recorded incorrectly. Previous methods assume that an imperfect classifier produces species-level classifications, but in practice, particularly with human observers, classifications may be extraspecific, including "unknown", morphospecies designations, and taxonomic identifications coarser than species. Disregarding these types of misclassification in biodiversity monitoring datasets can bias estimates of ecologically important quantities such as demographic rates, occurrence, and species richness.
2. Here we develop an occupancy model that accounts for species non-detection and misclassification. Our framework accommodates extinction and colonization dynamics, allows the imperfect species classifications to include additional uncertain 'morphospecies' designations, and makes use of individual specimens with known species identities in a semi-supervised setting. Using a withheld test set, we compare the classification performance of our joint classification-occupancy model with that of a reduced classification model that discards information about occupancy and encounter rates. We illustrate our model with an empirical case study of the carabid beetle (Carabidae) community at the National Ecological Observatory Network Niwot Ridge Mountain Research Station, west of Boulder, CO, USA, and quantify taxonomist identification error by estimating classification probabilities.
3. Species occupancy varied through time and across sites and species. The model estimated generally high probabilities that the imperfect classifier's label matched the true species, with medians ranging from 30 to 92\%. The classification model informed by occupancy and encounter rates outperformed the model that was not, and these differences were most pronounced for abundant species.
4. Our probabilistic framework can be applied to datasets with imperfect species detection and classification. The model can identify commonly misclassified species, helping biodiversity monitoring organizations systematically prioritize which samples need expert validation. Our Bayesian approach propagates classification uncertainty, offering an alternative to making conservation decisions based on point estimates.
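The observation process summarized in paragraph 2 can be illustrated with a minimal Python sketch. It rests on our own simplifying assumptions, not necessarily the authors' exact specification: each species has a latent occupancy state per site-year that evolves through colonization (`gamma`) and extinction (`eps`) probabilities; occupied sites yield Poisson counts of encountered individuals with species-specific rates `lam`; and each individual receives a label drawn from row k of a row-stochastic classification matrix `theta`, whose columns beyond the K species stand for extra labels such as "unknown" or morphospecies. All function and parameter names are hypothetical. By Poisson thinning, label counts are then independent Poisson variables whose mean is the sum over species of `z_k * lam_k * theta[k, j]`, and specimens with expert-verified identities contribute `log theta[true, label]` terms directly, which is one way to express the semi-supervised component.

```python
import math
import numpy as np


def simulate(n_sites, n_years, psi1, gamma, eps, lam, theta, rng):
    """Simulate latent occupancy z[i, t, k] and label counts y[i, t, j].

    Hypothetical sketch: theta is K x L and row-stochastic, theta[k, j] =
    P(label j | true species k); columns K..L-1 are extra labels
    such as 'unknown' or morphospecies.
    """
    K, L = theta.shape
    z = np.zeros((n_sites, n_years, K), dtype=int)
    y = np.zeros((n_sites, n_years, L), dtype=int)
    for i in range(n_sites):
        z[i, 0] = rng.random(K) < psi1
        for t in range(1, n_years):
            # Occupied: persists with prob 1 - eps; unoccupied: colonized with prob gamma.
            p = np.where(z[i, t - 1] == 1, 1 - eps, gamma)
            z[i, t] = rng.random(K) < p
        for t in range(n_years):
            n = rng.poisson(lam * z[i, t])  # individuals encountered per true species
            for k in range(K):
                # Each encountered individual gets one label drawn from row k of theta.
                y[i, t] += rng.multinomial(n[k], theta[k])
    return z, y


def label_loglik(y_it, z_it, lam, theta):
    """Poisson log-likelihood of one site-year's label counts given occupancy z_it."""
    mu = (z_it * lam) @ theta  # expected count of each label (Poisson thinning)
    ll = 0.0
    for yj, mj in zip(y_it.tolist(), mu.tolist()):
        if mj == 0.0:
            if yj > 0:
                return -math.inf  # label recorded, but no occupied species can produce it
            continue
        ll += yj * math.log(mj) - mj - math.lgamma(yj + 1)
    return ll


def validation_loglik(true_sp, labels, theta):
    """Semi-supervised term: expert-verified specimens contribute log theta[true, label]."""
    return float(np.sum(np.log(theta[true_sp, labels])))


theta = np.array([[0.8, 0.1, 0.1],    # species 0
                  [0.2, 0.7, 0.1]])   # species 1; column 2 = 'unknown'
lam = np.array([5.0, 3.0])
rng = np.random.default_rng(1)
z, y = simulate(5, 4, 0.6, 0.2, 0.1, lam, theta, rng)
```

Summing `label_loglik` over site-years, adding the occupancy-dynamics terms and `validation_loglik`, would give a joint log-likelihood that could be embedded in an MCMC sampler; that step is not shown here.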
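One way to see why occupancy and encounter rates can sharpen classification (paragraph 3) is to compute the posterior probability of an individual's true species given its recorded label. Assuming, for illustration, that species k produces labelled individuals at rate `z_k * lam_k` in a site-year, Bayes' rule gives P(true = k | label j) proportional to `z_k * lam_k * theta[k, j]`. Here a reduced model that ignores occupancy and encounter rates is approximated, as an assumption, by equal weights across species; the function name and all numbers are hypothetical.

```python
import numpy as np


def identity_posterior(label, theta, weights=None):
    """P(true species k | observed label) by Bayes' rule.

    weights[k] acts as the rate at which species k produces labelled individuals
    (e.g. z_k * lam_k at a site-year). weights=None uses equal weights, a stand-in
    for a reduced model that ignores occupancy and encounter rates.
    """
    K = theta.shape[0]
    w = np.ones(K) if weights is None else np.asarray(weights, dtype=float)
    unnorm = w * theta[:, label]
    return unnorm / unnorm.sum()


theta = np.array([[0.6, 0.3, 0.1],   # hypothetical classification matrix;
                  [0.3, 0.6, 0.1]])  # column 2 is an 'unknown' label
lam = np.array([20.0, 1.0])          # species 0 abundant, species 1 rare
z = np.array([1, 1])                 # both species occupy this site-year

reduced = identity_posterior(1, theta)         # individual labelled 'species 1'
joint = identity_posterior(1, theta, z * lam)
print(reduced.round(3), joint.round(3))        # [0.333 0.667] [0.909 0.091]
```

With encounter information, an individual labelled as the rare species is more likely to be the abundant one, whereas the label-only calculation favours the rare species. This illustrates one way the two models can diverge most for abundant species. If the abundant species were known to be absent (`z = [0, 1]`), the weighted posterior would place all its mass on species 1.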
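The use described in paragraph 4, flagging commonly misclassified species while carrying uncertainty forward instead of relying on point estimates, can be sketched as a summary of posterior draws of a classification matrix. In this sketch the draws are simulated from Dirichlet distributions purely as stand-ins for MCMC output. The function name, the threshold of 0.2, and the ranking rule (posterior probability that a species' misclassification rate exceeds the threshold) are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def misclassification_summary(theta_draws, threshold=0.2):
    """Summarise per-species misclassification from posterior draws.

    theta_draws has shape (S, K, L): S draws of a K x L classification matrix
    whose first K columns are the species labels, in the same order as the rows.
    """
    K = theta_draws.shape[1]
    idx = np.arange(K)
    miss = 1.0 - theta_draws[:, idx, idx]        # (S, K): P(label != true species)
    med = np.median(miss, axis=0)
    lo, hi = np.quantile(miss, [0.025, 0.975], axis=0)
    p_exceed = (miss > threshold).mean(axis=0)   # posterior P(error rate > threshold)
    order = np.argsort(-p_exceed)                # validation priority, highest first
    return order, med, lo, hi, p_exceed


rng = np.random.default_rng(0)
alpha = np.array([[50.0, 2.0, 2.0, 1.0],   # species 0: usually labelled correctly
                  [5.0, 5.0, 5.0, 1.0],    # species 1: often confused
                  [2.0, 2.0, 40.0, 1.0]])  # species 2; column 3 = 'unknown'
theta_draws = np.stack([rng.dirichlet(a, size=2000) for a in alpha], axis=1)
order, med, lo, hi, p_exceed = misclassification_summary(theta_draws)
```

Ranking by `p_exceed` uses the whole posterior, so a species whose error rate is uncertain but plausibly high can still be prioritized for expert validation.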