Machine learning approach to segment Saccharomyces cerevisiae yeast cells

Author(s): Mohamed Tleis, Fons J. Verbeek
2015, Vol 12 (3), pp. 44-64

Summary: In biological research, Saccharomyces cerevisiae yeast cells are used to study the behaviour of proteins. Such studies are time-consuming and not completely objective. Hence, image analysis platforms have been developed to address these problems and to offer analysis per cell as well. The robust segmentation algorithms implemented in such platforms enable us to apply a machine learning approach to the measured cells. Such an approach is based on a set of relevant individual cell features extracted from the microscope images of the yeast cells. In this paper, we composed a set of features that represent intensity and morphology characteristics in a more sophisticated way. These features are based on first- and second-order histograms and wavelet-based texture measurement. To show the discriminative power of these features, we built a classification model to discriminate between different groups. Building it involved evaluating a set of classification systems, data sampling techniques, data normalization schemes and attribute selection algorithms. The results show a significant ability to discriminate between different cell strains and conditions, and thereby demonstrate the benefits of a classification model based on the introduced features. This model is promising for revealing subtle patterns in future high-throughput yeast studies.
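The abstract does not give the exact feature definitions, so the following is only a minimal sketch assuming common textbook definitions: first-order features summarise the grey-level histogram of a cell's pixels, and second-order features are statistics of a grey-level co-occurrence matrix (GLCM), i.e. a histogram of grey-level pairs at a fixed pixel offset. The function names, the single horizontal offset, and the chosen statistics (mean, variance, entropy, contrast, energy, homogeneity) are assumptions for illustration, not the authors' feature set; the wavelet-based texture measure is not covered.

```python
import numpy as np


def first_order_features(pixels, levels=256):
    """First-order (grey-level histogram) statistics of a cell's pixels.

    `pixels` holds integer grey levels in [0, levels), e.g. image[cell_mask].
    """
    hist = np.bincount(np.ravel(pixels), minlength=levels).astype(float)
    p = hist / hist.sum()                      # normalised histogram
    g = np.arange(levels)                      # grey levels
    mean = float((g * p).sum())
    variance = float(((g - mean) ** 2 * p).sum())
    nz = p[p > 0]                              # skip empty bins (0 * log 0 := 0)
    entropy = float(-(nz * np.log2(nz)).sum())
    return {"mean": mean, "variance": variance, "entropy": entropy}


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix (second-order
    histogram) for the non-negative pixel offset (dy, dx)."""
    h, w = image.shape
    a = image[0:h - dy, 0:w - dx]              # reference pixels
    b = image[dy:h, dx:w]                      # neighbours at the offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)    # count grey-level pairs
    m = m + m.T                                # count each pair in both directions
    return m / m.sum()


def glcm_features(p):
    """Haralick-style texture statistics of a normalised GLCM `p`."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "energy": float((p ** 2).sum()),
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
    }


# Toy 3-level "cell" image, purely illustrative.
img = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 2]])
print(first_order_features(img, levels=3))
print(glcm_features(glcm(img, levels=3)))
```

In practice each segmented cell would yield one such feature vector, and these vectors would then go through normalization, sampling, attribute selection and classification as the abstract describes.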


2021
Author(s): Brittany M Berger, Wayland Yeung, Arnav Goyal, Zhongliang Zhou, Emily R Hildebrandt, ...

Protein prenylation by farnesyltransferase (FTase) is often described as the targeting of a cysteine-containing motif (CaaX) that is enriched for aliphatic amino acids at the a1 and a2 positions, while quite flexible at the X position. Prenylation prediction methods often rely on these features despite emerging evidence that FTase has broader target specificity than previously considered. Using a machine learning approach and training sets based on canonical (prenylated, proteolyzed, and carboxymethylated) and recently identified shunted motifs (prenylation only), this study aims to improve prenylation predictions with the goal of determining the full scope of prenylation potential among the 8000 possible Cxxx sequence combinations. Further, this study aims to subdivide the prenylated sequences as either shunted (i.e., uncleaved) or cleaved (i.e., canonical). Predictions were determined for Saccharomyces cerevisiae FTase and compared to results derived using currently available prenylation prediction methods. In silico predictions were further evaluated using in vivo methods coupled to two yeast reporters, the yeast mating pheromone a-factor and Hsp40 Ydj1p, that represent proteins with canonical and shunted CaaX motifs, respectively. Our machine learning based approach expands the repertoire of predicted FTase targets and provides a framework for functional classification.
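The figure of 8000 possible Cxxx combinations follows from 20 standard amino acids at each of the three variable positions (20 × 20 × 20 = 8000). The sketch below enumerates them, encodes each one as a one-hot vector that a classifier could take as input, and applies a naive rule of the kind the abstract says prediction methods often rely on. The aliphatic residue set, the one-hot encoding, and the `naive_caax_rule` helper are hypothetical choices for illustration; this is not the study's model, which was trained on canonical and shunted motif sets.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
# Assumption for illustration only: a simple "aliphatic" set for the naive rule.
ALIPHATIC = set("AVLI")


def all_cxxx_motifs():
    """Every C-x-x-x tetrapeptide: 20 ** 3 = 8000 sequences."""
    return ["C" + "".join(t) for t in product(AMINO_ACIDS, repeat=3)]


def one_hot(motif):
    """Encode the three variable positions (a1, a2, X) as a 60-dim 0/1 vector."""
    n = len(AMINO_ACIDS)
    vec = np.zeros(3 * n, dtype=np.int8)
    for pos, aa in enumerate(motif[1:]):       # skip the fixed cysteine
        vec[pos * n + AMINO_ACIDS.index(aa)] = 1
    return vec


def naive_caax_rule(motif):
    """Hypothetical rule-based baseline: aliphatic residues at a1 and a2, any X."""
    return motif[1] in ALIPHATIC and motif[2] in ALIPHATIC


motifs = all_cxxx_motifs()
X = np.stack([one_hot(m) for m in motifs])     # feature matrix, shape (8000, 60)
baseline = [m for m in motifs if naive_caax_rule(m)]
print(len(motifs), X.shape, len(baseline))
```

A learned model scores all 8000 rows of `X` rather than accepting only the motifs that pass a fixed rule, which is what lets it flag targets outside the canonical CaaX pattern.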


Diabetes, 2020, Vol 69 (Supplement 1), pp. 1552-P
Author(s): KAZUYA FUJIHARA, MAYUKO H. YAMADA, YASUHIRO MATSUBAYASHI, MASAHIKO YAMAMOTO, TOSHIHIRO IIZUKA, ...

2020
Author(s): Clifford A. Brown, Jonny Dowdall, Brian Whiteaker, Lauren McIntyre

2017
Author(s): Sabrina Jaeger, Simone Fulle, Samo Turk

Inspired by natural language processing techniques, we introduce Mol2vec, an unsupervised machine learning approach to learning vector representations of molecular substructures. Like Word2vec models, where vectors of closely related words lie close together in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can then be encoded as vectors by summing the vectors of their individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. Its prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained using Morgan fingerprints as the reference compound representation. Mol2vec can easily be combined with ProtVec, which applies the same Word2vec concept to protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also be used easily for proteins with low sequence similarity.
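The encoding step described above, where a compound vector is the sum of its substructure vectors, can be sketched as follows. The substructure identifiers and the 3-dimensional embeddings are hypothetical placeholders; producing identifiers from real molecules and training the embeddings on a compound corpus are out of scope, and skipping unseen identifiers is an assumption of this sketch rather than documented Mol2vec behaviour.

```python
import numpy as np


def compound_vector(substructure_ids, embeddings, dim):
    """Mol2vec-style compound vector: the sum of its substructure vectors.

    Identifiers absent from `embeddings` are simply skipped here; this is an
    assumption of the sketch, not necessarily how Mol2vec treats them.
    """
    vec = np.zeros(dim)
    for sid in substructure_ids:
        if sid in embeddings:
            vec += embeddings[sid]
    return vec


def cosine(u, v):
    """Cosine similarity: how closely two vectors point in the same direction."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical 3-dimensional embeddings for placeholder substructure ids.
emb = {
    "sub_a": np.array([1.0, 0.0, 0.0]),
    "sub_b": np.array([0.9, 0.1, 0.0]),   # "chemically related" to sub_a
    "sub_c": np.array([0.0, 0.0, 1.0]),
}
cpd1 = compound_vector(["sub_a", "sub_c"], emb, dim=3)
cpd2 = compound_vector(["sub_b", "sub_c", "sub_unknown"], emb, dim=3)
print(cpd1, cpd2, round(cosine(cpd1, cpd2), 3))
```

Because related substructures point in similar directions, compounds built from them end up with similar summed vectors, and these dense vectors can be passed directly to a supervised model.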

