Define protein variant functions with high-complexity mutagenesis libraries and enhanced mutation detection software
Open reading frame (ORF) variant libraries have advanced our ability to query the functions of a large number of variants of a protein simultaneously. A variant library targeting a full-length ORF typically consists of all possible single-amino-acid substitutions, plus a stop codon, at each amino-acid position. Because only a single codon variation separates each variant from the template ORF, variant quantification presents the most profound challenge. Efforts such as dividing a library into sub-libraries for direct sequencing or using a tag-directed subassembly approach are practical only with small ORFs. Our approach, in contrast, features single-pool libraries for all genes up to 3,600 bp (EGFR) and an enhanced variant-detection toolkit. Having been used to process screens of ~20 ORF variant libraries, this toolkit calls variants reliably and also presents variant annotations in data files. These annotations enable analyses that have in turn reshaped our strategies for library design, screen deconvolution, sequencing, and sequencing data analysis.
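To make the library definition and the detection problem concrete, the following is a minimal Python sketch, not the toolkit described here. It assumes the standard genetic code, uppercase A/C/G/T sequences, and reads that span the full ORF in frame; the function names `planned_variants` and `call_variant` are hypothetical and chosen only for illustration. The first function enumerates the intended variants (19 substitutions plus a stop at each position), and the second calls a variant only when a read differs from the template at exactly one codon.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(orf):
    """Split an in-frame ORF into a list of 3-base codons."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def planned_variants(template):
    """Yield (position, wild_type_aa, variant_aa) for every intended variant.

    Assumption: every codon position is mutagenized to all other
    amino acids and to a stop ('*').
    """
    residues = sorted(set(AMINO_ACIDS))  # 20 amino acids plus '*'
    for pos, codon in enumerate(split_codons(template), start=1):
        wild_type = CODON_TABLE[codon]
        for aa in residues:
            if aa != wild_type:
                yield pos, wild_type, aa


def call_variant(template, read):
    """Return (position, wild_type_aa, variant_aa) for a single-codon variant.

    Returns None when the read matches the template, differs at more than
    one codon, or has a different length. A synonymous codon change is
    reported with identical wild-type and variant residues.
    """
    if len(read) != len(template):
        return None
    diffs = [
        (pos, t, r)
        for pos, (t, r) in enumerate(
            zip(split_codons(template), split_codons(read)), start=1
        )
        if t != r
    ]
    if len(diffs) != 1:
        return None
    pos, t, r = diffs[0]
    return pos, CODON_TABLE[t], CODON_TABLE[r]
```

Under the stated assumption that every codon is mutagenized, a 3,600 bp ORF has 1,200 codons and this enumeration would list 24,000 planned variants, each differing from the template at one codon. Real sequencing reads are shorter than the ORF and carry sequencing errors, so a practical caller would also need read alignment and quality filtering, which this sketch omits.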