RHIVDB: A Freely Accessible Database of HIV Amino Acid Sequences and Clinical Data of Infected Patients

Human immunodeficiency virus (HIV) infection remains one of the most severe problems for humanity, particularly due to the development of HIV resistance. To evaluate an association between viral sequence data and drug combinations and to estimate an effect of a particular drug combination on the treatment results, collection of the most representative drug combinations used to cure HIV and the biological data on amino acid sequences of HIV proteins is essential. We have created a new, freely available web database containing 1,651 amino acid sequences of HIV structural proteins [reverse transcriptase (RT), protease (PR), integrase (IN), and envelope protein (ENV)], treatment history information, and CD4+ cell count and viral load data available by the user’s query. Additionally, the biological data on new HIV sequences and treatment data can be stored in the database by any user followed by an expert’s verification. The database is available on the web at http://www.way2drug.com/rhivdb.

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
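
To make the idea of tree length concrete, the sketch below scores a fixed topology with Fitch's small-parsimony count. This is a standard textbook technique and is not the construction method of the paper. The taxon names, toy sequences, and function names are illustrative assumptions.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a nested tuple of taxon names; `states` maps taxon -> character.
    """
    if isinstance(tree, str):  # leaf: its state is known, no change needed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    shared = left_set & right_set
    if shared:  # children agree on at least one state
        return shared, left_changes + right_changes
    # children disagree: one substitution is required on this node
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Sum the Fitch change counts over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, three aligned sites.
ALIGNMENT = {"A": "AAG", "B": "AAG", "C": "ATG", "D": "CTG"}
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))
```

Comparing `tree_length(TREE_1, ALIGNMENT)` with `tree_length(TREE_2, ALIGNMENT)` shows which of the two candidate topologies needs fewer substitutions; a minimal tree is one for which no topology scores lower.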

Molecular characterization of the coat protein gene revealed considerable diversity of viral species complex in Garlic (Allium sativum L.)

10.1101/2020.12.03.409680 ◽

2020 ◽

Author(s):

Abel Debebe Mitiku ◽

Dawit Tesfaye Degefu ◽

Adane Abraham ◽

Desta Mejan ◽

Pauline Asami ◽

...

Keyword(s):

Amino Acid ◽

Coat Protein ◽

Species Complex ◽

Sequence Data ◽

Amino Acid Sequences ◽

Viral Gene ◽

Planting Material ◽

Virus C ◽

Garlic Virus ◽

Material Exchange

Garlic is one of the most important Allium vegetables used for seasoning food. It has many medicinal and nutritional benefits; however, its production is highly constrained by both biotic and abiotic challenges. Among these, viral infections are the most prevalent factors affecting crop productivity around the globe. This experiment was conducted on eleven selected garlic accessions and three improved varieties collected from different garlic-growing agro-climatic regions of Ethiopia. The study aimed to identify and characterize the isolated garlic viruses using the coat protein (CP) gene and to determine their phylogenetic relatedness. RNA was extracted from fresh young leaves of thirteen-day-old seedlings that showed yellowing, mosaic, and stunting symptoms. Pairwise molecular diversity for CP nucleotide and amino acid sequences was calculated using MEGA5. Maximum likelihood trees of CP nucleotide sequence data of Allexivirus and Potyvirus were constructed using PhyML, while a neighbor-joining tree was constructed for the amino acid sequence data using MEGA5. Five garlic viruses were identified, viz. Garlic virus C (78.6%), Garlic virus D (64.3%), Garlic virus X (78.6%), Onion yellow dwarf virus (OYDV) (100%), and Leek yellow stripe virus (LYSV) (78.6%). The study revealed the presence of complex mixtures of viruses, with 42.9% of the samples co-infected with a species complex of Garlic virus C, Garlic virus D, Garlic virus X, OYDV, and LYSV. Pairwise comparisons of the isolated Potyvirus and Allexivirus species revealed high identity with the known members of their respective species. As an exception, lower within-species identity was observed among Garlic virus C isolates as compared with the known members of the species. Finally, our results highlight the need to set up a working framework for virus-free garlic planting material exchange in the country, which could reduce viral gene flow across the country.
Author Summary: Garlic viruses are among the most devastating diseases of garlic, a crop that is highly vulnerable because of its vegetative mode of propagation. Currently, garlic viruses are the foremost production constraint in Ethiopia. However, very little is known so far about the identification, diversity, and dissemination of garlic-infecting viruses in the country. Here we explore the prevalence, genetic diversity, and presence of mixed infection of garlic viruses in Ethiopia using a next-generation sequencing platform. Analysis of nucleotide and amino acid sequences of coat protein genes from infected samples revealed the association of three species from Allexivirus and two species from Potyvirus in a complex mixture. The article concludes that it is high time to set up a working framework for a virus-free garlic planting material exchange platform, which could reduce viral gene flow across the country.
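
As an illustration of a pairwise comparison of aligned sequences, the sketch below computes percent identity over ungapped positions. It is a simplified stand-in, not MEGA5's procedure, whose gap handling and distance models may differ. The function name and the gap convention are assumptions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumed
    convention; other tools may count gaps differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared
```

The same function works for nucleotide or amino acid alignments, since it only compares characters position by position.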

Alignment of Amino Acid and DNA Sequences of Human Proline-rich Proteins

Critical Reviews in Oral Biology & Medicine ◽

10.1177/10454411930040030501 ◽

1993 ◽

Vol 4 (3) ◽

pp. 287-292 ◽

Cited By ~ 12

Author(s):

D.L. Kauffman ◽

P.J. Keller ◽

A. Bennick ◽

M. Blum

Keyword(s):

Amino Acid ◽

DNA Sequences ◽

Sequence Data ◽

Gel Filtration ◽

Exchange Chromatography ◽

Amino Acid Sequences ◽

Secreted Proteins ◽

DNA Encoding ◽

Protein Amino Acid ◽

Primary Gene

Human proline-rich proteins (PRPs) constitute a complex family of salivary proteins that are encoded by a small number of genes. The primary gene product is cleaved by proteases, thereby giving rise to about 20 secreted proteins. To determine the genes for the secreted PRPs, therefore, it is necessary to obtain sequences of both the secreted proteins and the DNA encoding these proteins. We have sequenced most PRPs from one donor (D.K.) and aligned the protein sequences with available DNA sequences from unrelated individuals. Partial sequence data have now been obtained for an additional PRP from D.K. named II-1. This protein was purified from parotid saliva by gel filtration and ion-exchange chromatography. Peptides were obtained by cleavage with trypsin, clostripain, and N-bromosuccinimide, followed by column chromatography. The peptides were sequenced on a gas-phase protein sequenator. Overlapping peptide sequences were obtained for most of II-1 and aligned with translated DNA sequences. The best fit was obtained with clones containing sequences for the allele PRB4" (Lyons et al., 1988). However, there was not complete identity of the protein amino acid sequence and the DNA-derived sequences, indicating that II-1 is not encoded by PRB4". Other PRPs isolated from D.K. also fail to conform to any DNA structure so far reported. This shows the need to obtain amino acid sequences and corresponding DNA sequences from the same person to assign genes for the PRPs and to determine the location of the postribosomal cleavage points in the primary translation product.

ESTIMATION OF TIME OF DIVERGENCE FROM PHYLOGENETIC STUDIES

Canadian Journal of Genetics and Cytology ◽

10.1139/g77-024 ◽

1977 ◽

Vol 19 (2) ◽

pp. 217-223 ◽

Cited By ~ 35

Author(s):

Ranajit Chakraborty

Keyword(s):

Amino Acid ◽

Sequence Data ◽

Amino Acid Sequences ◽

Evolutionary Significance ◽

Simultaneous Estimation ◽

Homologous Proteins ◽

Phylogenetic Structure ◽

Phylogenetic Studies ◽

Base Sequences ◽

Time Of Divergence

Recent studies with comparative data on base sequences of homologous DNAs or amino acid sequences of homologous proteins indicate that simultaneous estimation of phylogenetic structure and time of divergence is often cumbersome and time consuming. On the other hand, when the topology of an evolutionary tree is known, it is shown in this paper that the least squares theory may be applied to obtain simple estimates of the relative time lengths for each segment of the tree under the assumption of uniform random substitutions in each segment. The method is illustrated with amino acid sequence data on various globin molecules and cytochrome c. The evolutionary significance of some of the estimates is also discussed.
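
The least-squares idea can be sketched as follows, assuming a known unrooted four-taxon topology ((A,B),(C,D)) and modelling each pairwise distance as the sum of the segments on the path between the two taxa. This is a generic ordinary least-squares illustration and not the paper's exact estimator; the design matrix, distances, and function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(design, observed):
    """Ordinary least squares via the normal equations (A^T A) x = A^T d."""
    n_params = len(design[0])
    rows = range(len(design))
    normal = [
        [sum(design[k][i] * design[k][j] for k in rows) for j in range(n_params)]
        for i in range(n_params)
    ]
    moment = [sum(design[k][i] * observed[k] for k in rows) for i in range(n_params)]
    return solve_linear(normal, moment)


# Columns: segments leading to A, B, C, D, then the internal segment.
# Rows: taxon pairs AB, AC, AD, BC, BD, CD (1 = segment lies on the path).
DESIGN = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
DISTANCES = [3, 9, 10, 10, 11, 7]  # hypothetical, exactly additive distances
```

With six distances and five unknown segments the system is overdetermined, which is why a least-squares fit is used rather than a direct solution. A singular design (for example, a topology whose segments cannot be told apart) would make `solve_linear` fail with a division by zero.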

The United States Swine Pathogen Database: integrating veterinary diagnostic laboratory sequence data to monitor emerging pathogens of swine

10.1101/2021.04.16.439882 ◽

2021 ◽

Author(s):

Tavis K Anderson ◽

Blake K Inderski ◽

Diego G Diel ◽

Benjamin M Hause ◽

Elizabeth Porter ◽

...

Keyword(s):

United States ◽

Sequence Data ◽

The United States ◽

Content Management ◽

Biological Data ◽

Amino Acid Sequences ◽

Control Measures ◽

Clinical Samples ◽

Respiratory Syndrome Virus ◽

Genomic Information

Veterinary diagnostic laboratories annually derive thousands of nucleotide sequences from clinical samples of swine pathogens such as porcine reproductive and respiratory syndrome virus (PRRSV), Senecavirus A, and swine enteric coronaviruses. In addition, next generation sequencing has resulted in the rapid production of full-length genomes. Presently, sequence data are released to diagnostic clients for the purposes of informing control measures, but are not publicly available as data may be associated with sensitive information. However, public sequence data can be used to objectively design field-relevant vaccines; determine when and how pathogens are spreading across the landscape; identify virus transmission hotspots; and are a critical component in genomic surveillance for pandemic preparedness. We have developed a centralized sequence database that integrates a selected set of previously private clinical data, using PRRSV data as an exemplar, alongside publicly available genomic information. We implemented the Tripal toolkit, using the open source Drupal content management system and the Chado database schema. Tripal consists of a collection of Drupal modules that are used to manage, visualize, and disseminate biological data stored within Chado. Hosting is provided by Amazon Web Services (AWS) EC2 cloud instance with resource scaling. New sequences sourced from diagnostic labs contain at a minimum four data items: genomic information; date of collection; collection location (state or province level); and a unique identifier. Users can download annotated genomic sequences from the database using a customized search interface that incorporates data mined from published literature; search for similar sequences using BLAST-based tools; and explore annotated reference genomes. 
Additionally, because the bulk of data presently are PRRSV sequences, custom curation and annotation pipelines have determined PRRSV genotype (Type 1 or 2), the location of open reading frames and nonstructural proteins, generated amino acid sequences, the occurrence of putative frame shifts, and restriction fragment length polymorphism (RFLP) classification of GP5 genes. Genomic data from seven major swine pathogens have been curated and annotated. The resource provides researchers timely access to sequences discovered by veterinary diagnosticians, allowing for epidemiological and comparative virology studies. The result will be a better understanding of the emergence of novel swine viruses in the United States (US), and of how these novel strains are disseminated in the US and abroad.
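
The four minimum data items required of a new diagnostic-lab sequence can be pictured as a simple record with a completeness check. The sketch below is hypothetical: the class name, field names, and example values are assumptions and do not reflect the database's actual Chado schema or Tripal modules.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DiagnosticSubmission:
    """Hypothetical record holding the four minimum items named in the abstract."""
    genomic_sequence: Optional[str] = None
    collection_date: Optional[str] = None
    collection_location: Optional[str] = None  # state or province level
    unique_identifier: Optional[str] = None


def missing_items(record: DiagnosticSubmission) -> List[str]:
    """List the required items that are absent or empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

A submission would be accepted in this sketch only when `missing_items` returns an empty list.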

Phylogenetic relationships of the nematode subfamily Phascolostrongylinae from macropodid and vombatid marsupials inferred using mitochondrial protein sequence data

Parasites & Vectors ◽

10.1186/s13071-021-05028-2 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Tanapan Sukee ◽

Ian Beveridge ◽

Anson V. Koehler ◽

Ross Hall ◽

Robin B. Gasser ◽

...

Keyword(s):

Amino Acid ◽

Phylogenetic Relationships ◽

Sequence Data ◽

Mitochondrial Protein ◽

Amino Acid Sequences ◽

Internal Transcribed Spacers ◽

Nuclear Ribosomal DNA ◽

Phylogenetic Position ◽

Data Sets ◽

Sister Relationship

Background: The subfamily Phascolostrongylinae (Superfamily Strongyloidea) comprises nematodes that are parasitic in the gastrointestinal tracts of macropodid (Family Macropodidae) and vombatid (Family Vombatidae) marsupials. Currently, nine genera and 20 species have been attributed to the subfamily Phascolostrongylinae. Previous studies using sequence data sets for the internal transcribed spacers (ITS) of nuclear ribosomal DNA showed conflicting topologies between the Phascolostrongylinae and related subfamilies. Therefore, the aim of this study was to validate the phylogenetic relationships within the Phascolostrongylinae and its relationship with the families Chabertiidae and Strongylidae using mitochondrial amino acid sequences. Methods: The sequences of all 12 mitochondrial protein-coding genes were obtained by next-generation sequencing of individual adult nematodes (n = 8) representing members of the Phascolostrongylinae. These sequences were conceptually translated and the phylogenetic relationships within the Phascolostrongylinae and its relationship with the families Chabertiidae and Strongylidae were inferred from aligned, concatenated amino acid sequence data sets. Results: Within the Phascolostrongylinae, the wombat-specific genera grouped separately from the genera occurring in macropods. Two of the phascolostrongyline tribes were monophyletic, including Phascolostrongylinea and Hypodontinea, whereas the tribe Macropostrongyloidinea was paraphyletic. The tribe Phascolostrongylinea occurring in wombats was closely related to Oesophagostomum spp., also from the family Chabertiidae, which formed a sister relationship with the Phascolostrongylinae. Conclusion: The current phylogenetic relationship within the subfamily Phascolostrongylinae supports findings from a previous study based on ITS sequence data. This study also contributes to the understanding of the phylogenetic position of the subfamily Phascolostrongylinae within the Chabertiidae. Future studies investigating the relationships between the Phascolostrongylinae and Cloacininae from macropodid marsupials may advance our knowledge of the phylogeny of strongyloid nematodes in marsupials.

SignalP 6.0 predicts all five types of signal peptides using protein language models

Nature Biotechnology ◽

10.1038/s41587-021-01156-3 ◽

2022 ◽

Author(s):

Felix Teufel ◽

José Juan Almagro Armenteros ◽

Alexander Rosenberg Johansen ◽

Magnús Halldór Gíslason ◽

Silas Irby Pihl ◽

...

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Sequence Data ◽

Amino Acid Sequences ◽

Language Models ◽

Metagenomic Data ◽

Signal Peptides ◽

Machine Learning Model ◽

Living Organisms ◽

Control Protein

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.

Amino acid sequences of α-helical segments from S-carboxymethylkerateine-A. Complete sequence of a type-I segment

Biochemical Journal ◽

10.1042/bj1730373 ◽

1978 ◽

Vol 173 (2) ◽

pp. 373-385 ◽

Cited By ~ 34

Author(s):

K H Gough ◽

A S Inglis ◽

W G Crewther

Keyword(s):

Amino Acid ◽

Sequence Data ◽

Coiled Coil ◽

Alpha Helix ◽

Protein S ◽

Amino Acid Sequences ◽

Type I ◽

Type II ◽

Sequencing Data ◽

Helical Segment

The amino acid sequence of a type-I helical segment from the low-sulphur protein (S-carboxymethylkerateine-A) of wool was determined by combining automatic and manual-sequencing data. Whereas in the type-II helical segment most of the cationic groups occur in pairs, 11 of the 22 anionic residues in the sequence of the type-I segment were situated next to a second anionic residue. This suggests possible interactions between type-I and type-II helical segments in alpha-keratin. As observed with the sequence of a type-II helical segment, a model constructed on 3.6 residues per turn of helix shows a line of hydrophobic residues along the helix, thereby supporting the physicochemical evidence that the molecule is predominantly helical and forms part of a coiled-coil structure. Examination of the sequence data by predictive methods indicates the possibility of extensive sections of alpha-helix interspersed with discontinuities. The molecule contains a number of regions with peptide sequences identical with those found by other workers after enzymic digestion of fractions from oxidized wool.

A comparative study of repeated sequences in the SM50 gene of some sea urchins

Zygote ◽

10.1017/s0967199400130424 ◽

1999 ◽

Vol 8 (S1) ◽

pp. S75-S75

Author(s):

Masayuki Goto ◽

Masahiro Matsumoto ◽

Takashi Kitajima ◽

Akiya Hino

Keyword(s):

Amino Acid ◽

Sea Urchin ◽

Sequence Data ◽

Sea Urchins ◽

Rabbit Antiserum ◽

PCR Primers ◽

Amino Acid Sequences ◽

Immunofluorescent Staining ◽

Upstream Region ◽

Amino Acid Region

Spicule matrix proteins of the sea urchin embryo are specific products of the micromere / primary mesenchyme cell (PMC) lineage, and are considered to be involved in spicule formation (Wilt, 1999). One of these proteins, SM50, has been described for three species: Strongylocentrotus purpuratus (Sp), Lytechinus pictus (Lp) and Hemicentrotus pulcherrimus (Hp) (for references see Wilt, 1999). The nucleotide and amino acid sequences are well conserved in these species. SM50 proteins of these species have repetitive amino acid sequences in the carboxyl-terminal half of the proteins. Therefore, examination of SM50 sequences, especially the repetitive sequence region, in various species will aid understanding of the process of sea urchin ontogeny and evolution. In this study we tried to amplify, by PCR, the SM50 sequences of species for which no sequence data are reported. Total DNA was extracted from the sperm of sea urchins by standard procedures. The purified DNA was subjected to PCR to amplify the repetitive amino acid region and its upstream region. The primers were designed based on the highly conserved sequences in the reported SM50 as Consensus-Degenerate Hybrid Oligonucleotide Primers (Rose et al., 1997). The amplified products were gel-purified and sequenced on an ABI PRISM 310 Genetic Analyzer using the PCR primers. The determined nucleotide sequences were translated into amino acid sequences and compared among species with a phylogenetic tree constructed by the neighbour-joining method. For indirect immunofluorescent staining, embryos were fixed with 70% methanol and reacted with rabbit antiserum against recombinant SM50 protein.
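
The translation step mentioned above can be sketched with the standard genetic code. The sketch assumes the reading frame starts at the first base and marks any codon it cannot resolve as X; the function and constant names are illustrative and are not taken from the study.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by BASES at positions 1, 2, 3.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides):
    """Translate a DNA or RNA string in frame 1; '*' is stop, 'X' is unknown."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

The resulting amino acid strings are what would then be aligned and compared among species.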

Isolation, by partial pepsin digestion, of the three collagen-like regions present in subcomponent C1q of the first component of human complement

Biochemical Journal ◽

10.1042/bj1550005 ◽

1976 ◽

Vol 155 (1) ◽

pp. 5-17 ◽

Cited By ~ 104

Author(s):

K B M Reid

Keyword(s):

Amino Acid ◽

Sodium Dodecyl Sulphate ◽

Sequence Data ◽

Polyacrylamide Gel Electrophoresis ◽

Sodium Dodecyl ◽

Amino Acid Sequences ◽

Disulphide Bond ◽

Small Peptides ◽

Molecular Weights ◽

A Chain

1. Digestion of human subcomponent C1q with pepsin at pH 4.45 for 20 h at 37 °C fragmented most of the non-collagen-like amino acid sequences in the molecule to small peptides, whereas the entire regions of collagen-like sequence that comprised 38% by weight of the subcomponent C1q were left intact. 2. The collagen-like fraction of the digest was eluted in the void volume of a Sephadex G-200 column, and was shown to be composed of two major fragments when examined by electrophoresis on polyacrylamide gels run in buffers containing sodium dodecyl sulphate. These fragments were separated on CM-cellulose at pH 4.9 in buffers containing 7.5M-urea. 3. Human subcomponent C1q on reduction and alkylation yields equimolar amounts of three chains, which have been designated A, B and C [Reid et al. (1972) Biochem. J. 130, 749-763]. One of the pepsin fragments was shown to be composed of the N-terminal 95 residues of the A chain linked, via residue A4, by a single disulphide bond to a residue in the sequence B2-B6 in the N-terminal 91 residues of the B chain. The second pepsin fragment was shown to be composed of a disulphide-linked dimer of the N-terminal 94 residues of the C chain, the only disulphide bond being located at residue C4. 4. The mol. wts. of the unoxidized and oxidized pepsin fragments were estimated from their amino acid compositions to be 20 000 and 18 200 for the A-B and C-C dimers and 11 400, 8800 and 9600 for the collagen-like fragments of the A, B and C chains respectively. Estimation of the molecular weights of the peptic fragments by polyacrylamide-gel electrophoresis run in the presence of sodium dodecyl sulphate gave values that were approx. 50% higher than expected from the amino acid sequence data. This is probably due to the high collagen-like sequence content of these fragments.
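
Estimating a molecular weight from an amino acid composition amounts to summing residue masses and adding one water for the free termini. The sketch below uses average masses of the 20 standard residues; it is an illustration under that assumption, not the paper's calculation. Modified residues such as hydroxyproline, which collagen-like sequences often contain, are not in the table, and the function name is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(composition):
    """Molecular weight of one linear chain from residue counts.

    `composition` maps one-letter residue codes to counts; unknown codes
    raise KeyError rather than being silently ignored.
    """
    total = sum(RESIDUE_MASS[code] * count for code, count in composition.items())
    return total + WATER
```

For a disulphide-linked dimer such as the A-B or C-C fragments, the two chain weights would be added and about 2 Da subtracted for the hydrogens lost on forming the bond.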
