Ribosomal protein database profiling lends clarity to ribosomal protein evolution and mass distribution

New Protein

The FASTA file format is a common file type for distributing proteome information, especially proteomes obtained from UniProt. Although MATLAB can read FASTA files with the built-in function fastaread, important information such as protein name and organism name remains enmeshed in a character array. This makes it difficult to automatically extract protein names from a FASTA proteome file when building a database with fields for protein name and amino acid sequence. The objective of this work was to develop MATLAB software that automatically extracts protein name and amino acid sequence information from a FASTA proteome file and assigns them to a new database with fields for protein name, amino acid sequence, number of amino acid residues, molecular weight, and nucleotide sequence. The number of amino acid residues was obtained by applying the MATLAB built-in function length to the amino acid sequence of each protein. The final two fields were provided by the MATLAB built-in functions molweight and aa2nt, respectively. Molecular weight is useful for a variety of applications, while the nucleotide sequence is essential for gene synthesis in molecular cloning. Finally, the software includes an error check function that detects letters in the amino acid sequence that are not among the 20 natural amino acids. Sequences containing such letters would be erroneous inputs to molweight and aa2nt and are not processed.
Collectively, the resulting protein database, with the number of amino acid residues, molecular weight and nucleotide sequence of each protein calculated using built-in functions, offers improved functionality that could support a variety of biology workflows ranging from sequence alignment to molecular cloning.
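The workflow described in this abstract can be sketched in Python for readers without MATLAB. This is a minimal illustration and not the author's software: the header pattern assumes the usual UniProt layout (`>db|accession|entry name OS=organism OX=...`), the function names `read_fasta` and `build_record` are hypothetical, the molecular weight is computed from standard average residue masses rather than by calling molweight, and the back-translation step (aa2nt) is left out. The demo headers and sequences are made up.

```python
import re

# Standard average residue masses in daltons; one water is added per chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Assumed UniProt-style header: >db|accession|entry_name protein name OS=organism OX=...
HEADER = re.compile(
    r"^>?\w+\|(?P<acc>[^|]+)\|\S+\s+(?P<name>.+?)\s+OS=(?P<org>.+?)(?:\s+[A-Z]{2}=|$)"
)


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_record(header, sequence):
    """Build one database record; sequences with unknown letters get no mass."""
    seq = sequence.upper()
    match = HEADER.match(header)
    invalid = sorted(set(seq) - set(RESIDUE_MASS))  # the error check step
    weight = None
    if not invalid:
        weight = round(sum(RESIDUE_MASS[aa] for aa in seq) + WATER, 2)
    return {
        "accession": match.group("acc") if match else None,
        "name": match.group("name") if match else header.lstrip(">"),
        "organism": match.group("org") if match else None,
        "sequence": seq,
        "length": len(seq),
        "invalid_letters": invalid,
        "mol_weight": weight,
    }


# Made-up demo input: the second sequence contains letters outside the 20 amino acids.
DEMO = """>sp|X00000|DEMO1_TEST Example protein one OS=Example organism OX=0 GN=demo PE=1 SV=1
GA
>sp|X00001|DEMO2_TEST Example protein two OS=Example organism OX=0 PE=1 SV=1
GAXB
"""

records = [build_record(h, s) for h, s in read_fasta(DEMO)]
for rec in records:
    print(rec["name"], rec["length"], rec["mol_weight"], rec["invalid_letters"])
```

Flagging invalid letters in the record, instead of raising an error, keeps the rest of the proteome processable, which mirrors the abstract's statement that such sequences are simply not processed.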

MATLAB software for extracting protein name and sequence information from FASTA formatted proteome file

10.7287/peerj.preprints.27856v2 ◽ 2019

Author(s): Wenfa Ng

Keyword(s): Molecular Weight ◽ Amino Acid ◽ Nucleotide Sequence ◽ Amino Acid Sequence ◽ Sequence Information ◽ Amino Acid Residues ◽ Protein Database ◽ Matlab Software ◽ New Protein

MATLAB software for extracting protein name and sequence information from FASTA formatted proteome file

10.7287/peerj.preprints.27856v1 ◽ 2019

Author(s): Wenfa Ng

Keyword(s): Molecular Weight ◽ Amino Acid ◽ Nucleotide Sequence ◽ Amino Acid Sequence ◽ Sequence Information ◽ Amino Acid Residues ◽ Protein Database ◽ Matlab Software ◽ New Protein

Theoretical ribosomal protein mass fingerprint database

10.7287/peerj.preprints.26878v1 ◽ 2018

Author(s): Wenfa Ng

Keyword(s): Amino Acid ◽ Amino Acid Sequence ◽ Ribosomal Protein ◽ Molecular Mass ◽ Ribosomal Proteins ◽ Sequence Information ◽ Protein Mass ◽ And Function ◽ Fingerprint Database

Ribosomes are highly conserved macromolecular machines whose critical function is protein synthesis. However, the same type of ribosomal protein has a unique molecular mass in individual species within the same domain of life, which raises an interesting question about the interaction between natural selection and the conservation of structure and function of ribosomal proteins. Given that the molecular mass and sequence of ribosomal proteins differ across species, the structures of ribosomes are correspondingly differentiated, even though the general structure and function of the macromolecular machine are conserved across species in the same domain of life. The collection of molecular masses of all ribosomal proteins in the large and small ribosome subunits can be understood as the ribosomal protein mass fingerprint of a species. It is useful for gaining fundamental knowledge of ribosomal proteins and as a tool for species identification through comparison of ribosomal protein mass spectra. This preprint introduces the Theoretical Ribosomal Protein Mass Fingerprint database, which comprises the theoretical molecular mass of all ribosomal proteins of a species, calculated from available amino acid sequence information. Using amino acid sequence information from the Ribosomal Protein Gene Database, the Theoretical Ribosomal Protein Mass Fingerprint database ( https://ngwenfa.wordpress.com/database/ ) spans species from cyanobacteria, fungi, bacteria, archaea, nematodes, diatoms, micro-algae, and various model organisms. The database should be useful as a resource for understanding the mass distribution of ribosomal proteins of a species, or as a limited reference database for identifying species by comparing the experimental ribosomal protein mass fingerprint of an unknown species against theoretically calculated fingerprints of known species.
Future expansion of the database will aim to catalogue the theoretical ribosomal protein mass fingerprints of more microbial species using amino acid sequence information from UniProt.

Theoretical ribosomal protein mass fingerprint database

10.7287/peerj.preprints.26878 ◽ 2018

Author(s): Wenfa Ng

Keyword(s): Amino Acid ◽ Amino Acid Sequence ◽ Ribosomal Protein ◽ Molecular Mass ◽ Ribosomal Proteins ◽ Sequence Information ◽ Protein Mass ◽ And Function ◽ Fingerprint Database

Amino acid sequence information in proteins and complex proteinaceous material revealed by pyrolysis-capillary gas chromatography-low and high resolution mass spectrometry

Journal of Analytical and Applied Pyrolysis ◽ 10.1016/0165-2370(87)85038-6 ◽ 1987 ◽ Vol 11 ◽ pp. 313-327 ◽ Cited By ~ 75

Author(s): Jaap J. Boon ◽ J.W. De Leeuw

Keyword(s): Mass Spectrometry ◽ Gas Chromatography ◽ Amino Acid ◽ High Resolution ◽ Amino Acid Sequence ◽ Capillary Gas Chromatography ◽ High Resolution Mass Spectrometry ◽ Sequence Information ◽ Resolution Mass

Amino acid sequence determination by g.l.c.-mass spectrometry of permethylated peptides. Optimization of the formation of chemical derivatives at the 2-10 nmol level

Biochemical Journal ◽ 10.1042/bj2150261 ◽ 1983 ◽ Vol 215 (2) ◽ pp. 261-272 ◽ Cited By ~ 22

Author(s): K Rose ◽ M G Simona ◽ R E Offord

Keyword(s): Mass Spectrometry ◽ Amino Acid ◽ Amino Acid Sequence ◽ Sodium Dodecyl ◽ Sequence Information ◽ Sequence Determination ◽ Chemical Derivatives ◽ Peptide Derivatives ◽ A New Technique

A new technique is described that permits the permethylation of acylated peptides at the 2-10 nmol level. The presence of up to 400 micrograms of sodium dodecyl sulphate per sample does not affect the reaction yields. The technique, which is a miniaturization of the widely used methyl iodide/dimethylsulphinyl carbanion procedure, employs a layer of hexane to exclude moisture and oxygen from the reaction mixture. Analysis of the peptide derivatives by combined g.l.c.-mass spectrometry permits amino acid sequence information to be obtained. In addition to studies of digests of a model substrate (glucagon), the new permethylation technique has been used to identify a peptide of interest from a digest of a cytochrome and to define the N-termini of two proteins at the 5 nmol level.

Existence of theoretical ribosomal protein mass fingerprints in bacteria, archaea and eukaryotes

10.7287/peerj.preprints.26511v1 ◽ 2018

Author(s): Wenfa Ng

Keyword(s): Mass Spectrometry ◽ Ribosomal Protein ◽ Molecular Mass ◽ Ribosomal Proteins ◽ General Structure ◽ Structure And Function ◽ Microbial Identification ◽ Protein Mass ◽ Domains Of Life ◽ And Function

Ribosomes are highly conserved given the importance of protein synthesis to cell survival. Although small differences in structure and function exist in ribosomes from different species of bacteria, archaea and eukaryotes, the general structure and function remain conserved across species in the same domain of life. Thus, are the ribosomal proteins that constitute ribosomes highly conserved between species in the same domain, or do they possess sufficient sequence variation to help identify individual species? Differentiated sequences would mean that ribosomal proteins might account for differences in the structure and function of ribosomes in different species. Using ribosomal protein amino acid sequence information from the Ribosomal Protein Gene Database to calculate the molecular mass of ribosomal proteins, this study sought to determine whether the molecular masses of a set of ribosomal proteins from a species could constitute a unique ribosomal protein mass fingerprint. In addition, the question of whether unique ribosomal protein mass fingerprints exist for different species in the three domains of life was examined. Results revealed that the distinct molecular masses of individual ribosomal proteins could aggregate into a unique ribosomal protein mass fingerprint for individual bacterial, archaeal and eukaryotic species. Such ribosomal protein mass fingerprints could potentially find use in microbial identification through gel-free matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) profiling of solubilized ribosomal proteins. The ribosomal protein mass spectrum obtained could be compared with those catalogued in a reference database of known microorganisms, where pattern recognition algorithms could determine a match.
Additionally, the existence of theoretical ribosomal protein mass fingerprints across species in the three domains of life pointed to small differences in the structure and function of both the large and small ribosome subunits. Such differences could reveal differentiated ribosomal structure and function in different species, even though the general structure and function of the ribosome are conserved across species. Collectively, the distinct molecular masses of individual ribosomal proteins pointed to a unique ribosomal protein mass fingerprint for each species that could find use in microbial identification through gel-free mass spectrometry analysis of solubilized ribosomal proteins. Differences in the mass of ribosomal proteins across species also highlighted the existence of ribosomes of differentiated structure and function, even though the general structure and function of the ribosome remain highly conserved.
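The comparison of an observed mass spectrum against a reference database of fingerprints can be illustrated with a small Python sketch. It is only one simple scoring scheme, not the pattern recognition algorithm the abstract has in mind: the function names `match_score` and `best_match`, the 500 ppm tolerance, and all species names and mass values are made up for illustration. Real spectra would also need handling of protonated ions and post-translational modifications, which are ignored here.

```python
def match_score(peaks, fingerprint, tol_ppm=500.0):
    """Return the fraction of theoretical masses matched by an observed peak.

    peaks: observed masses in daltons.
    fingerprint: mapping of ribosomal protein name to theoretical mass.
    tol_ppm: relative mass tolerance in parts per million (assumed value).
    """
    masses = list(fingerprint.values())
    if not masses:
        return 0.0
    hits = sum(
        any(abs(peak - mass) <= mass * tol_ppm / 1e6 for peak in peaks)
        for mass in masses
    )
    return hits / len(masses)


def best_match(peaks, reference, tol_ppm=500.0):
    """Score every reference species and return (best species, all scores)."""
    scores = {
        species: match_score(peaks, fingerprint, tol_ppm)
        for species, fingerprint in reference.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical reference fingerprints; the masses are invented for this example.
reference = {
    "species_A": {"L29": 7273.4, "S19": 10430.1, "L30": 6542.8},
    "species_B": {"L29": 7310.9, "S19": 10299.6, "L30": 6410.5},
}

# Hypothetical observed peak list from an unknown sample.
observed = [7273.9, 10430.0, 6800.0]

best, scores = best_match(observed, reference)
print(best, scores)
```

Scoring by the fraction of matched theoretical masses tolerates missing peaks, since not every ribosomal protein is expected to appear in a given spectrum.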

Existence of theoretical ribosomal protein mass fingerprints in bacteria, archaea and eukaryotes

10.7287/peerj.preprints.26511 ◽ 2018

Author(s): Wenfa Ng

Keyword(s): Mass Spectrometry ◽ Ribosomal Protein ◽ Molecular Mass ◽ Ribosomal Proteins ◽ General Structure ◽ Structure And Function ◽ Microbial Identification ◽ Protein Mass ◽ Domains Of Life ◽ And Function

Faculty Opinions recommendation of A study of archaeal enzymes involved in polar lipid synthesis linking amino acid sequence information, genomic contexts and lipid composition.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.1028632.342399 ◽ 2005

Author(s): Robert Michell

Keyword(s): Amino Acid ◽ Amino Acid Sequence ◽ Polar Lipid ◽ Lipid Composition ◽ Lipid Synthesis ◽ Sequence Information