The use of consensus sequence information to engineer stability and activity in proteins

Author(s):  
Matt Sternke ◽  
Katherine W. Tripp ◽  
Doug Barrick


2021 ◽  
Author(s):  
Yanli Chen ◽  
Qiongwen Wu ◽  
Guiman Li ◽  
Hongzhe Li ◽  
Wenlong Li ◽  
...  

Abstract Human norovirus, an RNA virus of the family Caliciviridae, is a common viral pathogen causing acute gastroenteritis in all age groups worldwide. To date, tens of thousands of norovirus genome sequences have been uploaded to the NCBI database, and more than half of them are from epidemic strains of the GII.4 or GII.17 genotype. However, the non-epidemic norovirus strains remain poorly studied at the sequence level. In this study, an uncommon norovirus genotype, GIX.1[GII.P15], was isolated using Raji cells and the full-genome sequence of the strain was extensively characterized. Norovirus particles with a diameter of approximately 30 nm and a spherical, lace-like appearance were observed by electron microscopy. Viral genome replication in Raji cells was confirmed by real-time quantitative reverse transcription-PCR in viral replication kinetics and passaging experiments with the primary virus. Phylogenetic analysis showed that the strain (KMN1) belonged to the GIX.1[GII.P15] genotype and indicated that no recombination has occurred in this strain thus far. Further comparative analysis of the full genome sequence against the consensus sequence of GIX.1[GII.P15] genomes revealed a total of 81 nucleotide substitutions (53 in ORF1, 20 in ORF2, and 8 in ORF3) across the genome, but only 6 substitutions resulted in amino acid changes (3 in ORF1, 1 in ORF2, and 2 in ORF3). Moreover, one amino acid substitution at amino acid position 302 (P302S) was observed in the P2 domain of the capsid protein, and the site was near one of the predicted conformational epitopes on the VP1 protein structure. The genomic information obtained from the novel strain may extend the understanding of the non-epidemic GIX.1[GII.P15] strains.


Genome ◽  
1987 ◽  
Vol 29 (5) ◽  
pp. 770-781 ◽  
Author(s):  
Michael Lassner ◽  
Olin Anderson ◽  
Jan Dvořák

A ribosomal RNA gene (rDNA) unit from the Nor-D3 locus (D genome) of Triticum aestivum L. was cloned and the "nontranscribed spacer" (NTS) was sequenced. The DNA sequence was compared with previously reported Nor-B2 locus (B genome) NTS sequences to study the molecular basis of evolution of these repeated genes and to look for evidence of homogenization between B- and D-genome rDNA. The NTS has seven subrepeats with a modal repeat length of 120 nucleotides; the subrepeats are shorter than Nor-B2 subrepeats owing to loss of one element of a 12-bp duplication present in Nor-B2 subrepeats. This 12-nucleotide sequence or its permutation, whose consensus sequence is CACGTACACGGA, is found at all sites where the B- and D-genome rDNA spacers differ by insertions or deletions longer than two nucleotides. The DNA sequence information was used to identify restriction sites unique to each locus that could be used to search for conversions between the B- and D-genome rDNA loci. Despite the coexistence of rDNA of the B and D genomes in the same nucleus for a minimum of 8000 years, no evidence for frequent interchromosomal conversion events between chromosomes 1B or 6B and 5D was found. Key words: Triticum, rDNA, concerted evolution, spacer.
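
As an illustration of how a short consensus element like the one quoted above might be located in a spacer sequence, here is a minimal Python sketch. It assumes that "permutation" means a circular rotation of the 12-nucleotide consensus, which the abstract does not state explicitly. The function names and the toy sequence are hypothetical and are not taken from the paper.

```python
# Consensus element quoted in the abstract.
CONSENSUS = "CACGTACACGGA"

def rotations(motif):
    """Return the set of all circular rotations of a motif.

    Assumption: "permutation" in the abstract is read as circular rotation.
    """
    return {motif[i:] + motif[:i] for i in range(len(motif))}

def find_rotated_motif(sequence, motif=CONSENSUS):
    """List (0-based start, matched window) for every window of the
    sequence that equals the motif or one of its rotations."""
    variants = rotations(motif)
    k = len(motif)
    seq = sequence.upper()
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] in variants
    ]

# Hypothetical toy sequence: one exact copy and one rotated copy.
toy_spacer = "TT" + "CACGTACACGGA" + "GG" + "GTACACGGACAC" + "TT"
hits = find_rotated_motif(toy_spacer)
```

Exact matching is used here for simplicity; real spacer sequences diverge from the consensus, so an approximate matcher would likely be needed in practice.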


2013 ◽  
pp. 1667-1675
Author(s):  
Ren-Xiang Yan ◽  
Jing Liu ◽  
Yi-Min Tao

Profile-profile alignment may be the most sensitive and useful computational resource for identifying remote homologies and recognizing protein folds. However, profile-profile alignment is usually much more complex and slower than sequence-sequence or profile-sequence alignment. The profile, or PSSM (position-specific scoring matrix), represents the mutational variability at each sequence position of a protein as a vector of amino acid substitution frequencies, making it a much richer encoding of a protein sequence. The consensus sequence, which can be considered a simplified profile, was used in early work to improve sequence alignment accuracy. Recently, several studies have improved PSI-BLAST’s fold recognition performance by using consensus sequence information. There are several ways to compute a consensus sequence. Based on these considerations, we propose in this chapter a method that combines the information from different types of consensus sequences with the assistance of support vector machine learning. Benchmark results suggest that our method can further improve PSI-BLAST’s fold recognition performance.
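
To make the relationship between a profile and a consensus sequence concrete, the following Python sketch computes per-column residue frequencies from a toy alignment and then reduces them to a majority-rule consensus. This is only one of the "several ways" the abstract mentions and is not the authors' method. The alignment, the function names, and the alphabetical tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

def column_frequencies(alignment):
    """Return one {residue: frequency} dict per alignment column.

    Gap characters ("-") are ignored when counting.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values())
        profile.append({r: n / total for r, n in counts.items()} if total else {})
    return profile

def majority_consensus(alignment, gap="-"):
    """Pick the most frequent residue in each column.

    Ties are broken alphabetically (an arbitrary choice for determinism);
    all-gap columns yield the gap character.
    """
    consensus = []
    for freqs in column_frequencies(alignment):
        if not freqs:
            consensus.append(gap)
        else:
            consensus.append(min(freqs, key=lambda r: (-freqs[r], r)))
    return "".join(consensus)

# Hypothetical toy alignment of four short sequences.
toy_alignment = ["MKTAY", "MKSAY", "MRTAF", "MKT-Y"]
profile = column_frequencies(toy_alignment)
consensus = majority_consensus(toy_alignment)
```

The profile keeps the full frequency vector for every column, whereas the consensus keeps only the top residue, which is why it can be viewed as a simplified profile.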


2015 ◽  
Vol 53 (7) ◽  
pp. 2049-2059 ◽  
Author(s):  
Charlotte Hedskog ◽  
Krishna Chodavarapu ◽  
Karin S. Ku ◽  
Simin Xu ◽  
Ross Martin ◽  
...  

Hepatitis C virus (HCV) exhibits a high genetic diversity and is classified into 6 genotypes, which are further divided into 66 subtypes. Current sequencing strategies require prior knowledge of the HCV genotype and subtype for efficient amplification, making it difficult to sequence samples with a rare or unknown genotype and/or subtype. Here, we describe a subtype-independent full-genome sequencing assay based on a random amplification strategy coupled with next-generation sequencing. HCV genomes from 17 patient samples with both common subtypes (1a, 1b, 2a, 2b, and 3a) and rare subtypes (2c, 2j, 3i, 4a, 4d, 5a, 6a, 6e, and 6j) were successfully sequenced. On average, 3.7 million reads were generated per sample, with 15% showing HCV specificity. The assembled consensus sequences covered 99.3% to 100% of the HCV coding region, and the average coverage was 6,070 reads/position. The accuracy of the generated consensus sequence was estimated to be >99% based on results from in vitro HCV replicon amplification, with the same extrapolated amount of input RNA molecules as that for the patient samples. Taken together, the HCV genomes from 17 patient samples were successfully sequenced, including samples with subtypes that have limited sequence information. This method has the potential to sequence any HCV patient sample, independent of genotype or subtype. It may be especially useful in confounding cases, like those with rare subtypes, intergenotypic recombination, or multiple genotype infections, and may allow greater insight into HCV evolution, its genetic diversity, and drug resistance development.


2010 ◽  
Vol 84 (18) ◽  
pp. 9557-9574 ◽  
Author(s):  
Laurent Dacheux ◽  
Nicolas Berthet ◽  
Gabriel Dissard ◽  
Edward C. Holmes ◽  
Olivier Delmas ◽  
...  

ABSTRACT The rapid and accurate identification of pathogens is critical in the control of infectious disease. To this end, we analyzed the capacity for viral detection and identification of a newly described high-density resequencing microarray (RMA), termed PathogenID, which was designed for multiple pathogen detection using database similarity searching. We focused on one of the largest and most diverse viral families described to date, the family Rhabdoviridae. We demonstrate that this approach has the potential to identify both known and related viruses for which precise sequence information is unavailable. In particular, we demonstrate that a strategy based on consensus sequence determination for analysis of RMA output data enabled successful detection of viruses exhibiting up to 26% nucleotide divergence from the closest sequence tiled on the array. Using clinical specimens obtained from rabid patients and animals, this method also shows a high species-level concordance with standard reference assays, indicating that it is amenable to the development of diagnostic assays. Finally, 12 animal rhabdoviruses which were currently unclassified, unassigned, or assigned as tentative species within the family Rhabdoviridae were successfully detected. These new data allowed an unprecedented phylogenetic analysis of 106 rhabdoviruses and further suggest that the principles and methodology developed here may be used for the broad-spectrum surveillance and the broader-scale investigation of biodiversity in the viral world.


2005 ◽  
Vol 187 (14) ◽  
pp. 4928-4934 ◽  
Author(s):  
Jos Boekhorst ◽  
Mark W. H. J. de Been ◽  
Michiel Kleerebezem ◽  
Roland J. Siezen

ABSTRACT Surface proteins of gram-positive bacteria often play a role in adherence of the bacteria to host tissue and are frequently required for virulence. A specific subgroup of extracellular proteins contains the cell wall-sorting motif LPxTG, which is the target for cleavage and covalent coupling to the peptidoglycan by enzymes called sortases. A comprehensive set of putative sortase substrates was identified by in silico analysis of 199 completely sequenced prokaryote genomes. A combination of detection methods was used, including secondary structure prediction, pattern recognition, sequence homology, and genome context information. With the hframe algorithm, putative substrates were identified that could not be detected by other methods due to errors in open reading frame calling, frameshifts, or sequencing errors. In total, 732 putative sortase substrates encoded in 49 prokaryote genomes were identified. We found striking species-specific variation for the LPxTG motif. A hidden Markov model (HMM) based on putative sortase substrates was created, which was subsequently used for the automatic detection of sortase substrates in recently completed genomes. A database was constructed, LPxTG-DB (http://bamics3.cmbi.kun.nl/sortase_substrates), containing for each genome a list of putative sortase substrates, sequence information of these substrates, the organism-specific HMMs based on the consensus sequence of the sortase recognition motif, and a graphic representation of this consensus.
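
The pattern-recognition step mentioned in this abstract can be illustrated with a simple regular-expression scan for the LPxTG motif, where x is any residue. This Python sketch covers only the literal motif match. It does not reproduce the paper's combined pipeline (secondary structure prediction, homology, genome context, HMMs) or the species-specific motif variants it reports. The function name and the toy protein sequence are hypothetical.

```python
import re

# L-P-any residue-T-G; the lookahead lets overlapping matches be reported.
LPXTG = re.compile(r"(?=(LP[A-Z]TG))")

def find_lpxtg(sequence):
    """Return (0-based start, matched motif) for each LPxTG occurrence."""
    return [(m.start(), m.group(1)) for m in LPXTG.finditer(sequence.upper())]

# Hypothetical toy protein sequence with two motif instances.
toy_protein = "MKKLLPETGEEAAALPQTGK"
matches = find_lpxtg(toy_protein)
```

A bare motif match like this would produce many false positives on real proteomes, which is presumably why the authors combine it with other evidence.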


2005 ◽  
Author(s):  
An Oskarsson ◽  
Reid Hastie ◽  
Gary H. McClelland ◽  
Leaf Van Boven

2019 ◽  
Vol 45 (4) ◽  
pp. 689-699 ◽  
Author(s):  
Tanya R. Jonker ◽  
Jeffrey D. Wammes ◽  
Colin M. MacLeod
