SSEalign: accurate function prediction of bacterial unannotated protein, based on effective training dataset

Mapping Intimacies ◽

10.1101/200915 ◽

2017 ◽

Author(s):

Zhiyuan Yang ◽

Stephen Kwok-Wing Tsui

Keyword(s):

Search Algorithm ◽

Hypothetical Protein ◽

Training Dataset ◽

Prediction Methods ◽

Test Accuracy ◽

Bacterial Proteins ◽

Urgent Task ◽

Backtracking Line Search ◽

Effective Training ◽

Line Search Algorithm

Abstract: The functions of numerous bacterial proteins remain unknown because of the variety of their sequences. Existing prediction methods perform poorly on these proteins, which leaves many of them annotated as “hypothetical protein” in the NCBI database. Elucidating the functions of these unannotated proteins is an urgent task in computational biology. We report SSEalign, a secondary structure element alignment method based on an effective training dataset extracted from 20 well-studied bacterial genomes. Experimentally validated identical genes in different species were selected as training positives, while different genes in different species were selected as training negatives. Moreover, SSEalign used a set of well-defined basic alignment elements with the backtracking line search algorithm to derive the best parameters for accurate prediction. Experimental results showed that SSEalign achieved 91.2% test accuracy, better than existing prediction methods. SSEalign was subsequently applied to identify the functions of unannotated proteins in the recently published minimal bacterial genome JCVI-syn3.0. Results indicated that at least 99 of the 149 unannotated proteins in the JCVI-syn3.0 genome could be annotated by SSEalign. In conclusion, our method is effective for the identification of protein homology and the annotation of uncharacterized proteins in the genome.
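As a rough illustration of the backtracking line search named in the abstract, the sketch below uses the standard Armijo sufficient-decrease rule inside plain gradient descent. The abstract does not give SSEalign's actual objective or parameters, so the `loss` function, its gradient, and all constants (`alpha0`, `rho`, `c`) are hypothetical stand-ins.

```python
def backtracking_line_search(f, grad, x, direction,
                             alpha0=1.0, rho=0.5, c=1e-4, max_iter=50):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    # Directional derivative of f at x along the search direction.
    slope = sum(g * d for g, d in zip(grad(x), direction))
    alpha = alpha0
    for _ in range(max_iter):
        candidate = [xi + alpha * di for xi, di in zip(x, direction)]
        if f(candidate) <= fx + c * alpha * slope:
            return alpha
        alpha *= rho  # step too long: backtrack
    return alpha


def gradient_descent(f, grad, x0, tol=1e-8, max_steps=1000):
    """Steepest descent where each step length comes from backtracking."""
    x = list(x0)
    for _ in range(max_steps):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        d = [-gi for gi in g]
        alpha = backtracking_line_search(f, grad, x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x


# Hypothetical smooth objective standing in for a parameter-fitting loss.
def loss(p):
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


def loss_grad(p):
    return [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]


best = gradient_descent(loss, loss_grad, [0.0, 0.0])
first_step = backtracking_line_search(loss, loss_grad, [0.0, 0.0], [2.0, -40.0])
```

Starting from a full step and halving keeps each iteration cheap while still guaranteeing that the objective decreases, which is the usual reason to prefer backtracking over a fixed step size.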


Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences

Bioinformatics ◽

10.1093/bioinformatics/btv674 ◽

2015 ◽

Vol 32 (6) ◽

pp. 821-827 ◽

Cited By ~ 19

Author(s):

Enrique Audain ◽

Yassel Ramos ◽

Henning Hermjakob ◽

Darren R. Flower ◽

Yasset Perez-Riverol

Keyword(s):

Machine Learning ◽

Isoelectric Point ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Basis Set ◽

Superior Performance ◽

Supplementary Information ◽

Training Dataset ◽

Accurate Estimation ◽

Prediction Methods

Abstract Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge (and thus the electrophoretic mobility) of the ampholyte sums to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analysis. While pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance will depend strongly on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.
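To make the definition of pI concrete, here is a minimal sketch of an iterative calculation: sum the Henderson-Hasselbalch charge of each ionizable group and bisect on pH until the net charge is zero. The pKa values are illustrative assumptions rather than any basis set evaluated in the paper, and the function names are hypothetical; this is not the pIR implementation.

```python
# Illustrative pKa values (an assumption). Published scales differ, and the
# abstract reports that results are highly sensitive to this choice.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisect on pH; net charge decreases monotonically as pH rises."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


acidic_pi = isoelectric_point("DDDD")
basic_pi = isoelectric_point("KKKK")
mixed_pi = isoelectric_point("ACDEFGHIK")
```

Swapping in a different pKa table changes the predicted pI without touching the solver, which is one way to see why the choice of basis set matters so much in the benchmark.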


Sensory Descriptor Analysis of Whisky Lexicons through the Use of Deep Learning

Foods ◽

10.3390/foods10071633 ◽

2021 ◽

Vol 10 (7) ◽

pp. 1633

Author(s):

Chreston Miller ◽

Leah Hamilton ◽

Jacob Lahne

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Descriptive Analysis ◽

Food Products ◽

Training Dataset ◽

Test Accuracy ◽

Language Structure ◽

Deep Learning Model ◽

Descriptor Analysis ◽

Descriptive Language

This paper is concerned with extracting relevant terms from a text corpus on whisk(e)y. “Relevant” terms are usually contextually defined in their domain of use. Arguably, every domain has a specialized vocabulary used for describing things. For example, the field of Sensory Science, a sub-field of Food Science, investigates human responses to food products and differentiates “descriptive” terms for flavors from “ordinary”, non-descriptive language. Within the field, descriptors are generated through Descriptive Analysis, a method wherein a human panel of experts tastes multiple food products and defines descriptors. This process is both time-consuming and expensive. However, one could leverage existing data to identify and build a flavor language automatically. For example, there are thousands of professional and semi-professional reviews of whisk(e)y published on the internet, providing abundant descriptors interspersed with non-descriptive language. The aim, then, is to be able to automatically identify descriptive terms in unstructured reviews for later use in product flavor characterization. We created two systems to perform this task. The first is an interactive visual tool that can be used to tag examples of descriptive terms from thousands of whisky reviews. This creates a training dataset that we use to perform transfer learning using GloVe word embeddings and a Long Short-Term Memory deep learning model architecture. The result is a model that can accurately identify descriptors within a corpus of whisky review texts with a train/test accuracy of 99% and precision, recall, and F1-scores of 0.99. We tested for overfitting by comparing the training and validation loss for divergence. Our results show that the language structure for descriptive terms can be programmatically learned.
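For readers less familiar with the reported metrics, the following sketch computes precision, recall, and F1 for a binary descriptor/non-descriptor tagging task. The label lists are made-up toy data, not results from the paper, and the function name is hypothetical.

```python
def precision_recall_f1(true_labels, predicted_labels):
    """Binary metrics where 1 marks a descriptive term and 0 anything else."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy token-level labels (hypothetical): 3 true positives, 1 false positive,
# 1 false negative.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
guess = [1, 1, 0, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(truth, guess)
```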


Energy Conservation Using Cost Minimization for Household Space Heating

Volume 4: 19th Design for Manufacturing and the Life Cycle Conference; 8th International Conference on Micro- and Nanosystems ◽

10.1115/detc2014-35202 ◽

2014 ◽

Author(s):

Mayank Pareek ◽

Rupal Vikas Srivastava ◽

Sara Behdad

Keyword(s):

Line Search ◽

Search Algorithm ◽

Cost Minimization ◽

Insulation Material ◽

Insulation Materials ◽

Building Insulation ◽

Temperature Heat ◽

Quasi Newton ◽

Line Search Algorithm ◽

Decision Making Tool

Building insulation is considered a solution for reducing energy costs in both residential and commercial buildings. However, determining the combination of insulation materials that results in the lowest total ownership cost is becoming a bigger challenge. Various factors influence the efficiency of heat transfer within a room, including the geometry and size of the room, ambient temperature, heat sources and sinks present inside the building, type of insulation materials, etc. The aim of this paper is to develop an optimization-based decision-making tool to help house owners select the best combination of given insulation materials considering all these factors. The design approach adopted in this paper seeks to minimize total ownership cost while providing the required heating in the building. An SQP, Quasi-Newton, line-search algorithm was used to obtain the optimized thermal conductivity values for the combination of insulation materials to be used in the walls, floor, ceiling, window and door of a room, along with the width of the air gap to be kept. The results help in deciding which combination of insulation materials will achieve the required heating for the house owner while keeping the total cost incurred to a minimum.


A primal–dual augmented Lagrangian penalty-interior-point filter line search algorithm

Mathematical Methods of Operations Research ◽

10.1007/s00186-017-0625-x ◽

2017 ◽

Vol 87 (3) ◽

pp. 451-483 ◽

Cited By ~ 4

Author(s):

Renke Kuhlmann ◽

Christof Büskens

Keyword(s):

Interior Point ◽

Line Search ◽

Augmented Lagrangian ◽

Search Algorithm ◽

Primal Dual ◽

Line Search Algorithm


An improved time line search algorithm for manufacturing decision-making

International Journal of Production Research ◽

10.1080/00207543.2013.839892 ◽

2013 ◽

Vol 52 (4) ◽

pp. 1116-1132 ◽

Cited By ~ 3

Author(s):

Miguel Mujica Mota ◽

Miquel Angel Piera

Keyword(s):

Decision Making ◽

Line Search ◽

Search Algorithm ◽

Time Line ◽

Line Search Algorithm


Adaptive line search algorithm for packet classification

Proceedings 10th IEEE International Conference on Networks (ICON 2002). Towards Network Superiority (Cat. No.02EX588) ◽

10.1109/icon.2002.1033310 ◽

2003 ◽

Author(s):

Pau-Chuan Ting ◽

Yung-Sheng Hsu ◽

Tsern-Huei Lee

Keyword(s):

Line Search ◽

Search Algorithm ◽

Packet Classification ◽

Line Search Algorithm


Study on Construction Process of Suspension Bridge Based on a Co-Rotational Framework

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.238.709 ◽

2012 ◽

Vol 238 ◽

pp. 709-713

Author(s):

Bing Jian Wang ◽

Jian Yong Song ◽

Jian Ming Lu

Keyword(s):

Search Algorithm ◽

Nonlinear Behavior ◽

Suspension Bridge ◽

Suspension Bridges ◽

Element Formulation ◽

Structural System ◽

Convergence Criteria ◽

Modified Method ◽

Energy Convergence ◽

Line Search Algorithm

Based on a co-rotational (CR) framework, a 2-noded 3D truss element formulation was presented and used for accurate modeling of suspension bridges with large displacements and rotations. The CR framework accounts for the out-of-plane stiffness through the geometric stiffness, which makes it applicable to the analysis of 3D cable bridges. Using the co-rotational truss together with the energy convergence criteria and the Newton method with a line search algorithm, the nonlinear behavior of the 3D cable structural system was simulated conveniently and accurately. Therefore, the traditional truss elements based on the elastic modulus modified method and complex catenary elements were avoided. To simulate the hanging of the girder and the changes in the structural system during construction, element killing and activation were implemented with the modulus modified method.
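The combination of Newton iteration, line search, and an energy convergence criterion can be sketched on a single degree of freedom. The example below is only an assumption-laden illustration: the hardening-spring residual, its constants, and the function names are hypothetical, and the paper's co-rotational truss element is not reproduced. In a multi-DOF model the energy measure would be the dot product of the displacement increment and the residual vector.

```python
def newton_with_line_search(residual, tangent, u0,
                            energy_tol=1e-16, max_iter=50):
    """Solve residual(u) = 0 with Newton steps damped by a halving line search.

    Convergence uses an energy measure |du * r| relative to its first value.
    """
    u = u0
    reference_energy = None
    for iteration in range(max_iter):
        r = residual(u)
        du = -r / tangent(u)            # full Newton increment
        energy = abs(du * r)
        if reference_energy is None:
            reference_energy = energy
        if energy <= energy_tol * reference_energy:
            return u, iteration
        step = 1.0
        # Halve the step while the trial residual has not decreased.
        while step > 1e-4 and abs(residual(u + step * du)) >= abs(r):
            step *= 0.5
        u += step * du
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical hardening spring: internal force k*u + b*u**3 against load F.
K_LINEAR, B_CUBIC, LOAD = 1.0, 0.5, 3.0


def spring_residual(u):
    return K_LINEAR * u + B_CUBIC * u ** 3 - LOAD


def spring_tangent(u):
    return K_LINEAR + 3.0 * B_CUBIC * u ** 2


displacement, iterations = newton_with_line_search(
    spring_residual, spring_tangent, 0.0)
```

From the starting guess of zero, the full Newton step overshoots and increases the residual, so the line search halves it once before the iteration settles into ordinary Newton convergence.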
