Efficient design of maximally active and specific nucleic acid diagnostics for thousands of viruses
Abstract
Harnessing genomic data and predictive models will provide activity-informed diagnostic assays for thousands of viruses and offer rapid design for novel ones. Here we develop and extensively validate new algorithms that design nucleic acid assays with maximal predicted detection activity over a virus’s full genomic diversity and with stringent specificity. Focusing on CRISPR-Cas13a detection, we test a library of ~19,000 guide-target pairs and construct a convolutional neural network that predicts Cas13a detection activity better than other techniques. We link these methods by building ADAPT, an end-to-end system that automatically leverages the latest viral genome data. We designed optimal species-specific assays for the 1,933 vertebrate-infecting viral species, taking under 2 hours for most species and under 24 hours for all but 3. ADAPT’s designs are sensitive and specific down to the lineage level for the range of taxa we tested, including taxa whose genomic diversity and specificity requirements make design challenging. They also exhibit significantly higher fluorescence and lower limits of detection, across a virus’s full spectrum of genomic diversity, than designs from standard techniques. ADAPT is available in an accessible software package and can be applied to other detection technologies to enhance critically needed viral diagnostic and surveillance efforts.
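To make the phrase "maximal predicted detection activity over a virus’s full genomic diversity with stringent specificity" concrete, the following is a minimal, hypothetical sketch and not ADAPT’s actual algorithm. It assumes that a predictive model has already produced an activity score for every (guide, genome) pair, that a guide set detects a genome with the activity of its best guide, that the design objective is the mean of that activity across genomes, and that guides flagged as non-specific (for example, ones predicted to hit other taxa) are simply excluded. All names here (`set_activity`, `mean_activity`, `greedy_design`, `nonspecific`) are illustrative, and greedy selection is a simple stand-in rather than the optimization the authors use.

```python
def set_activity(guides, genome, activity):
    """Activity of a guide set against one genome: the best guide's score.

    `activity[guide][genome]` is an assumed precomputed predicted score;
    an empty set detects nothing (score 0.0).
    """
    return max((activity[g][genome] for g in guides), default=0.0)


def mean_activity(guides, genomes, activity):
    """Objective: average detection activity over all target genomes."""
    return sum(set_activity(guides, v, activity) for v in genomes) / len(genomes)


def greedy_design(candidates, genomes, activity, max_guides, nonspecific=frozenset()):
    """Greedily pick up to `max_guides` guides that most raise the objective.

    Candidates in `nonspecific` are skipped entirely, a crude stand-in
    for a specificity constraint.
    """
    chosen = []
    for _ in range(max_guides):
        current = mean_activity(chosen, genomes, activity)
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen or c in nonspecific:
                continue
            gain = mean_activity(chosen + [c], genomes, activity) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining guide improves coverage
            break
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy predicted activities for three guides against three genomes.
    activity = {
        "gA": {"v1": 0.9, "v2": 0.1, "v3": 0.8},
        "gB": {"v1": 0.2, "v2": 0.7, "v3": 0.1},
        "gC": {"v1": 0.5, "v2": 0.5, "v3": 0.5},
    }
    genomes = ["v1", "v2", "v3"]
    print(greedy_design(["gA", "gB", "gC"], genomes, activity, max_guides=2))
```

In this toy data, `gA` is chosen first because it has the highest mean activity. `gB` is chosen second even though its mean activity alone is low, because it covers genome `v2`, which `gA` detects poorly. This illustrates why optimizing over the full genomic diversity can favor different guides than ranking each guide in isolation.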