DeCoDe: degenerate codon design for complete protein-coding DNA libraries

2020 ◽  
Vol 36 (11) ◽  
pp. 3357-3364 ◽  
Author(s):  
Tyler C Shimko ◽  
Polly M Fordyce ◽  
Yaron Orenstein

Abstract Motivation High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein-screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity. Results We introduce a novel algorithm for total DC library optimization, degenerate codon design (DeCoDe), based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states. Availability and implementation github.com/OrensteinLab/DeCoDe. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
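To make the degenerate-codon idea in the abstract above concrete, the following minimal Python sketch expands a DC into the concrete codons and amino acids it encodes, and shows how choosing DCs independently at covarying positions produces undesired variants. It is an illustration only and is not the DeCoDe algorithm (which, per the abstract, is based on integer linear programming). The function names and the two-residue example targets are hypothetical; the IUPAC nucleotide codes and the standard genetic code are the only external facts used.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes: each symbol stands for a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_degenerate_codon(dc):
    """Return every concrete codon that a three-letter degenerate codon can yield."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in dc.upper()))]


def encoded_amino_acids(dc):
    """Return the set of amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_degenerate_codon(dc)}


def protein_variants(dcs):
    """Return all protein variants produced by one degenerate codon per position."""
    per_position = [sorted(encoded_amino_acids(dc)) for dc in dcs]
    return {"".join(residues) for residues in product(*per_position)}


if __name__ == "__main__":
    # NNK is a widely used DC: 32 codons covering all 20 amino acids plus one stop.
    print(len(expand_degenerate_codon("NNK")), sorted(encoded_amino_acids("NNK")))

    # Hypothetical targets "AK" and "GE" covary at both positions. Encoding each
    # position with its own DC also yields the undesired variants "AE" and "GK".
    print(sorted(protein_variants(["GSC", "RAA"])))
```

In this toy case, half of the encoded library is off-target. This is the kind of blow-up the abstract describes for sequences that covary at multiple positions.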

2019 ◽  
Author(s):  
Tyler C. Shimko ◽  
Polly M. Fordyce ◽  
Yaron Orenstein

Abstract Motivation High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more non-functional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity. Results We introduce a novel algorithm for total DC library optimization, DeCoDe, based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states. Availability github.com/OrensteinLab/[email protected]


2020 ◽  
Vol 36 (16) ◽  
pp. 4383-4388 ◽  
Author(s):  
Xiaoqiong Wei ◽  
Chengxin Zhang ◽  
Peter L Freddolino ◽  
Yang Zhang

Abstract Motivation Many protein function databases are built on automated or semi-automated curations and can contain various annotation errors. The correction of such misannotations is critical to improving the accuracy and reliability of the databases. Results We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations by comparing the ratio of annotation rates (RAR) for the same GO term across different taxonomic groups, where those with a relatively low RAR usually correspond to incorrect annotations. As an illustration, we applied the approach to 20 commonly studied species in two recent UniProt-GOA releases and identified 250 potential misannotations in the 2018-11-6 release, where only 25% of them were corrected in the 2019-6-3 release. Importantly, 56% of the misannotations are ‘Inferred from Biological aspect of Ancestor (IBA)’ which is in contradiction with previous observations that attributed misannotations mainly to ‘Inferred from Sequence or structural Similarity (ISS)’, probably reflecting an error source shift due to the new developments of function annotation databases. The results demonstrated a simple but efficient misannotation detection approach that is useful for large-scale comparative protein function studies. Availability and implementation https://zhanglab.ccmb.med.umich.edu/RAR. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (8) ◽  
pp. 2429-2437 ◽  
Author(s):  
Xiaoqiang Huang ◽  
Wei Zheng ◽  
Robin Pearce ◽  
Yang Zhang

Abstract Motivation Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. Results We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. Availability and implementation Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe. 
Supplementary information Supplementary data are available at Bioinformatics online.
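The SSIPe abstract above reports two agreement metrics between estimated and experimental ΔΔGbind: the Pearson correlation coefficient and the root-mean-square error. As a minimal sketch of how these two standard metrics are computed, the following Python uses only the standard library. The function names and the sample values are hypothetical and are not taken from the SSIPe benchmark.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical ddG values in kcal/mol, for illustration only.
    estimated = [0.5, 1.2, -0.3, 2.1, 0.0]
    experimental = [0.8, 1.0, 0.1, 2.9, -0.4]
    print(f"r = {pearson_r(estimated, experimental):.2f}")
    print(f"RMSE = {rmse(estimated, experimental):.2f} kcal/mol")
```

The two metrics are complementary. Correlation measures whether the ranking and trend of the predictions follow the experiments, while RMSE measures the absolute size of the errors in the same units as ΔΔGbind.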


2015 ◽  
Author(s):  
Nicole E. Wheeler ◽  
Lars Barquist ◽  
Robert A. Kingsley ◽  
Paul P. Gardner

Abstract Motivation Next generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics. Results We present a hidden Markov model based approach we call delta-bitscore (DBS) for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to impact biological function. We benchmark this approach with several widely used datasets and apply it to a proof-of-concept study of orthologous proteomes in an investigation of host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant genetic changes occurring in recently diverged organisms. Availability A program implementing DBS for pairwise genome comparisons is freely available at: https://github.com/UCanCompBio/[email protected], [email protected] Supplementary information Supplementary data are available at BioRxiv online.


2016 ◽  
Author(s):  
Andrian Yang ◽  
Michael Troup ◽  
Peijie Lin ◽  
Joshua W. K. Ho

Abstract Summary Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework that enables parallelisation of existing RNA-seq processing pipelines using the big data technologies Apache Hadoop and Apache Spark to perform massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq data sets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6–145.4 times faster using Falco than running on a highly optimised single node analysis. Falco also allows users to utilise low-cost spot instances of Amazon Web Services (AWS), providing a 65% reduction in cost of analysis. Availability Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/[email protected] Supplementary information Supplementary data are available at BioRXiv online.


1987 ◽  
Vol 19 (5-6) ◽  
pp. 701-710 ◽  
Author(s):  
B. L. Reidy ◽  
G. W. Samson

A low-cost wastewater disposal system was commissioned in 1959 to treat domestic and industrial wastewaters generated in the Latrobe River valley in the province of Gippsland, within the State of Victoria, Australia (Figure 1). The Latrobe Valley is the centre for large-scale generation of electricity and for the production of pulp and paper. In addition other industries have utilized the brown coal resource of the region e.g. gasification process and char production. Consequently, industrial wastewaters have been dominant in the disposal system for the past twenty-five years. The mixed industrial-domestic wastewaters were to be transported some eighty kilometres to be treated and disposed of by irrigation to land. Several important lessons have been learnt during twenty-five years of operating this system. Firstly the composition of the mixed waste stream has varied significantly with the passage of time and the development of the industrial base in the Valley, so that what was appropriate treatment in 1959 is not necessarily acceptable in 1985. Secondly the magnitude of adverse environmental impacts engendered by this low-cost disposal procedure was not imagined when the proposal was implemented. As a consequence, clean-up procedures which could remedy the adverse effects of twenty-five years of impact are likely to be costly. The question then may be asked - when the total costs including rehabilitation are considered, is there really a low-cost solution for environmentally safe disposal of complex wastewater streams?


BMC Biology ◽  
2019 ◽  
Vol 17 (1) ◽  
Author(s):  
Amrita Srivathsan ◽  
Emily Hartop ◽  
Jayanthi Puniamoorthy ◽  
Wan Ting Lee ◽  
Sujatha Narayanan Kutty ◽  
...  

Abstract Background More than 80% of all animal species remain unknown to science. Most of these species live in the tropics and belong to animal taxa that combine small body size with high specimen abundance and large species richness. For such clades, using morphology for species discovery is slow because large numbers of specimens must be sorted based on detailed microscopic investigations. Fortunately, species discovery could be greatly accelerated if DNA sequences could be used for sorting specimens to species. Morphological verification of such “molecular operational taxonomic units” (mOTUs) could then be based on dissection of a small subset of specimens. However, this approach requires cost-effective and low-tech DNA barcoding techniques because well-equipped, well-funded molecular laboratories are not readily available in many biodiverse countries. Results We here document how MinION sequencing can be used for large-scale species discovery in a specimen- and species-rich taxon like the hyperdiverse fly family Phoridae (Diptera). We sequenced 7059 specimens collected in a single Malaise trap in Kibale National Park, Uganda, over the short period of 8 weeks. We discovered > 650 species which exceeds the number of phorid species currently described for the entire Afrotropical region. The barcodes were obtained using an improved low-cost MinION pipeline that increased the barcoding capacity sevenfold from 500 to 3500 barcodes per flowcell. This was achieved by adopting 1D sequencing, resequencing weak amplicons on a used flowcell, and improving demultiplexing. Comparison with Illumina data revealed that the MinION barcodes were very accurate (99.99% accuracy, 0.46% Ns) and thus yielded very similar species units (match ratio 0.991). Morphological examination of 100 mOTUs also confirmed good congruence with morphology (93% of mOTUs; > 99% of specimens) and revealed that 90% of the putative species belong to the neglected, megadiverse genus Megaselia. 
We demonstrate for one Megaselia species how the molecular data can guide the description of a new species (Megaselia sepsioides sp. nov.). Conclusions We document that one field site in Africa can be home to an estimated 1000 species of phorids and speculate that the Afrotropical diversity could exceed 200,000 species. We furthermore conclude that low-cost MinION sequencers are very suitable for reliable, rapid, and large-scale species discovery in hyperdiverse taxa. MinION sequencing could quickly reveal the extent of the unknown diversity and is especially suitable for biodiverse countries with limited access to capital-intensive sequencing facilities.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Xiao Li Ma ◽  
Guang Tao Fei ◽  
Shao Hui Xu

Abstract In this study, polyaniline (PANI) is prepared by chemical oxidative polymerization and loaded directly onto a modified fiber ball (m-FB) to obtain a macroscale polyaniline/modified fiber ball (PANI/m-FB) composite, and its ability to remove Cr(VI) is then investigated. The effects of different parameters such as contact time, pH value and initial concentration on Cr(VI) removal efficiency are discussed. The experimental results show that the favorable pH value is 5.0 and the maximum removal capacity is measured to be 293.13 mg g⁻¹. In addition, PANI/m-FB composites can be regenerated and reused after being treated with strong acid. The kinetic study indicates that the adsorption process is mainly controlled by chemical adsorption. More importantly, the macroscale size of the composites can efficiently avoid secondary pollution. Benefiting from their low cost, easy large-scale preparation, environmental friendliness, excellent recycling performance and high removal ability, PANI/m-FB composites show potential for removing Cr(VI) from industrial wastewater. Graphic Abstract The polyaniline (PANI) was coated on a modified fiber ball (m-FB) to remove Cr(VI) from wastewater, and PANI/m-FB composites of this kind can efficiently avoid secondary pollution owing to their macrostructure. Furthermore, the removal capacity can reach 291.13 mg/g, and the composites can be reused multiple times.


2020 ◽  
Vol 9 (1) ◽  
pp. 751-759 ◽  
Author(s):  
Xinxin Lian ◽  
Yuanjiang Lv ◽  
Haoliang Sun ◽  
David Hui ◽  
Guangxin Wang

Abstract Ag nanoparticles/Mo–Ag alloy films with different Ag contents were prepared on polyimide by magnetron sputtering. The effects of Ag content on the microstructure of self-grown Ag nanoparticles/Mo–Ag alloy films were investigated using XRD, FESEM, EDS and TEM. The Ag content plays an important role in the size and number of the uniformly distributed Ag nanoparticles that form spontaneously on the Mo–Ag alloy film surface, and the morphology of the self-grown Ag nanoparticles changes significantly with it. Additionally, it is worth noting that the Ag nanoparticles/Mo–Ag alloy films covered by a thin Ag film exhibit highly sensitive surface-enhanced Raman scattering (SERS) performance. The electric field distributions were calculated using finite-difference time-domain analysis to further show that the SERS enhancement of the films is mainly determined by “hot spots” in the interparticle gaps between Ag nanoparticles. The detection limit of the Ag film/Ag nanoparticles/Mo–Ag alloy film for Rhodamine 6G probe molecules was 5 × 10⁻¹⁴ mol/L. Therefore, this novel type of Ag film/Ag nanoparticles/Mo–Ag alloy film can be used as an ideal SERS-active substrate for low-cost and large-scale production.

