PhotoModPlus: A web server for photosynthetic protein prediction from genome neighborhood features

PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248682
Author(s):  
Apiwat Sangphukieo ◽  
Teeraphan Laomettachit ◽  
Marasri Ruengjitchatchawalya

A new web server called PhotoModPlus is presented as a platform for predicting photosynthetic proteins via genome neighborhood networks (GNN) and genome neighborhood-based machine learning. GNN enables users to visualize the overview of the conserved neighboring genes from multiple photosynthetic prokaryotic genomes and provides functional guidance on the query input. In the platform, we also present a new machine learning model utilizing genome neighborhood features for predicting photosynthesis-specific functions based on 24 prokaryotic photosynthesis-related GO terms, namely PhotoModGO. The new model performed better than the sequence-based approaches with an F1 measure of 0.872, based on nested five-fold cross-validation. Finally, we demonstrated the applications of the webserver and the new model in the identification of novel photosynthetic proteins. The server is user-friendly, compatible with all devices, and available at bicep.kmutt.ac.th/photomod.

2020 ◽  
Author(s):  
Apiwat Sangphukieo ◽  
Teeraphan Laomettachit ◽  
Marasri Ruengjitchatchawalya

Abstract Identification of photosynthetic proteins and their functions is essential for understanding and improving photosynthetic efficiency. We present here a new webserver called PhotoModPlus as a platform to predict photosynthetic proteins via genome neighborhood networks (GNN) and a machine learning method. GNN lets users visualize an overview of the conserved neighboring genes from multiple photosynthetic prokaryotic genomes and provides functional guidance on the query input. We also integrated into the webserver a newly developed machine learning model, PhotoModGO, for predicting photosynthesis-specific functions based on 24 prokaryotic photosynthesis-related GO terms. The new model was developed using a multi-label classification approach and genome neighborhood features. It achieved an F1 measure of up to 0.872, which was better than the sequence-based approaches, as evaluated by nested five-fold cross-validation. Finally, we demonstrated the applications of the webserver and the new model in the identification of novel photosynthetic proteins. The server is user-friendly, compatible with all devices, and available at http://bicep.kmutt.ac.th/photomod or http://bicep2.kmutt.ac.th/photomod.


2021 ◽  
Vol 3 ◽  
Author(s):  
Paul Buchmann ◽  
Timothy DelSole

This paper shows that skillful week 3–4 predictions of a large-scale pattern of 2 m temperature over the US can be made based on the Nino3.4 index alone, where skillful is defined to be better than climatology. To find more skillful regression models, this paper explores various machine learning strategies (e.g., ridge regression and lasso), including those trained on observations and on climate model output. It is found that regression models trained on climate model output yield more skillful predictions than regression models trained on observations, presumably because of the larger training sample. Nevertheless, the skill of the best machine learning models is only modestly better than that of ordinary least squares based on the Nino3.4 index. Importantly, this fact is difficult to infer from the parameters of the machine learning model because very different parameter sets can produce virtually identical predictions. For this reason, attempts to interpret the source of predictability from the machine learning model can be very misleading. The skill of machine learning models is also compared to that of a fully coupled dynamical model, CFSv2. The results depend on the skill measure: for mean square error, the dynamical model is slightly worse than the machine learning models; for correlation skill, the dynamical model is only modestly better than machine learning models or the Nino3.4 index. In summary, the best predictions of the large-scale pattern come from machine learning models trained on long climate simulations, but the skill is only modestly better than predictions based on the Nino3.4 index alone.


2020 ◽  
Author(s):  
Adrián Garcia-Recio ◽  
José Carlos Gómez-Tamayo ◽  
Iker Reina ◽  
Mercedes Campillo ◽  
Arnau Cordomí ◽  
...  

Abstract The massive amount of data generated from genome sequencing has given rise to several mutation predictor tools, although no mutation database or predictor tool has been developed specifically for the transmembrane region of membrane proteins. We present TMSNP, a database that currently contains information from 2624 pathogenic and 195964 non-pathogenic reported mutations located on the TM region of membrane proteins. The computed conservation parameters and annotations on these mutations were used to train a machine-learning model that classifies TM mutations as pathogenic or non-pathogenic. The presented tool considerably improves the prediction power of commonly used mutation predictors and additionally represents the first mutation prediction tool specific for TM mutations. TMSNP is available at http://lmc.uab.es/tmsnp/.


Author(s):  
Christian Kapuku ◽  
Seung-Young Kho ◽  
Dong-Kyu Kim ◽  
Shin-Hyung Cho

New shared mobility services have become increasingly common in many cities and shown potential to address urban transportation challenges. This study aims to analyze the mobility performance of integrating bike-sharing into multimodal transport systems and develop a machine learning model to predict the performance of intermodal trips with bike-sharing compared with those without bike-sharing for a given trip using transit smart card data and bike-sharing GPS data from the city of Seoul. The results suggest that using bike-sharing in the intermodal trips where it performs better than buses could enhance the mobility performance by providing up to 34% savings in travel time per trip compared with the scenarios in which bus is used exclusively for the trips and up to 33% savings when bike-sharing trips are used exclusively. The results of the machine learning models suggest that the random forest classifier outperformed three other classifiers with an accuracy of 90% in predicting the performance of bike-sharing and intermodal transit trips. Further analysis and applications of the mobility performance of bike-sharing in Seoul are presented and discussed.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Adrián Garcia-Recio ◽  
José Carlos Gómez-Tamayo ◽  
Iker Reina ◽  
Mercedes Campillo ◽  
Arnau Cordomí ◽  
...  

Abstract The massive amount of data generated from genome sequencing brings tons of newly identified mutations, whose pathogenic/non-pathogenic effects need to be evaluated. This has given rise to several mutation predictor tools that, in general, do not consider the specificities of the various protein groups. We aimed to develop a predictor tool dedicated to membrane proteins, under the premise that their specific structural features and environment would give different responses to mutations compared to globular proteins. For this purpose, we created TMSNP, a database that currently contains information from 2624 pathogenic and 196 705 non-pathogenic reported mutations located in the transmembrane region of membrane proteins. By computing various conservation parameters on these mutations in combination with annotations, we trained a machine-learning model able to classify mutations as pathogenic or not. TMSNP (freely available at http://lmc.uab.es/tmsnp/) improves considerably the prediction power of commonly used mutation predictors trained with globular proteins.


2018 ◽  
Vol 20 (5) ◽  
pp. 1131-1147 ◽  
Author(s):  
N. Caradot ◽  
M. Riechel ◽  
M. Fesneau ◽  
N. Hernandez ◽  
A. Torres ◽  
...  

Abstract Deterioration models can be successfully deployed only if decision-makers trust the modelling outcomes and are aware of model uncertainties. Our study aims to address this issue by developing a set of clearly understandable metrics to assess the performance of sewer deterioration models from an end-user perspective. The developed metrics are used to benchmark the performance of a statistical model, namely, GompitZ based on survival analysis and Markov-chains, and a machine learning model, namely, Random Forest, an ensemble learning method based on decision trees. The models have been trained with the extensive CCTV dataset of the sewer network of Berlin, Germany (115,258 inspections). At network level, both models give satisfactory outcomes with deviations between predicted and inspected condition distributions below 5%. At pipe level, the statistical model does not perform better than a simple random model, which attributes randomly a condition class to each inspected pipe, whereas the machine learning model provides satisfying performance. 66.7% of the pipes inspected in bad condition have been predicted correctly. The machine learning approach shows a strong potential for supporting operators in the identification of pipes in critical condition for inspection programs whereas the statistical approach is more adapted to support strategic rehabilitation planning.


Author(s):  
Celestine Iwendi ◽  
Ebuka Ibeke ◽  
Harshini Eggoni ◽  
Sreerajavenkatareddy Velagala ◽  
Gautam Srivastava

The creation of digital marketing has enabled companies to adopt personalized item recommendations for their customers. This process keeps them ahead of the competition. One of the techniques used in item recommendation is known as item-based recommendation system or item–item collaborative filtering. Presently, item recommendation is based entirely on ratings such as 1–5 and does not draw on the comment section, where users or customers express their feelings and thoughts about products or services. This paper proposes a machine learning model system where 0, 2, 4 are used to rate products: 0 is negative, 2 is neutral, 4 is positive. This will be in addition to the existing review system that takes care of the users’ reviews and comments, without disrupting it. We have implemented this model using the Keras, Pandas and scikit-learn libraries. The proposed approach improved prediction with [Formula: see text] accuracy for Yelp datasets of businesses across 11 metropolitan areas in four countries, along with a mean absolute error (MAE) of [Formula: see text], precision at [Formula: see text], recall at [Formula: see text] and F1-Score at [Formula: see text]. Our model shows a scalability advantage and how organizations can revolutionize their recommender systems to attract possible customers and increase patronage. Also, the proposed similarity algorithm was compared to conventional algorithms to estimate its performance and accuracy in terms of its root mean square error (RMSE), precision and recall. Results of this experiment indicate that the similarity recommendation algorithm performs better than the conventional algorithm and enhances recommendation accuracy.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zihao Chen ◽  
Maoli Wang ◽  
Rudy Leon De Wilde ◽  
Ruifa Feng ◽  
Mingqiang Su ◽  
...  

Background: Immune checkpoint blockade (ICB) has been approved for the treatment of triple-negative breast cancer (TNBC), since it significantly improved the progression-free survival (PFS). However, only about 10% of TNBC patients could achieve the complete response (CR) to ICB because of the low response rate and potential adverse reactions to ICB.
Methods: Open datasets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were downloaded to perform an unsupervised clustering analysis to identify the immune subtype according to the expression profiles. The prognosis, enriched pathways, and the ICB indicators were compared between immune subtypes. Afterward, samples from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset were used to validate the correlation of immune subtype with prognosis. Data from patients who received ICB were selected to validate the correlation of the immune subtype with ICB response. Machine learning models were used to build a visual web server to predict the immune subtype of TNBC patients requiring ICB.
Results: A total of eight open datasets including 931 TNBC samples were used for the unsupervised clustering. Two novel immune subtypes (referred to as S1 and S2) were identified among TNBC patients. Compared with S2, S1 was associated with higher immune scores, higher levels of immune cells, and a better prognosis for immunotherapy. In the validation dataset, S1 samples had a better prognosis than S2 samples in both overall survival (OS) (p = 0.00036) and relapse-free survival (RFS) (p = 0.0022). Bioinformatics analysis identified 11 hub genes (LCK, IL2RG, CD3G, STAT1, CD247, IL2RB, CD3D, IRF1, OAS2, IRF4, and IFNG) related to the immune subtype. A robust machine learning model based on the random forest algorithm was established from the 11 hub genes, and it performed reasonably well, with an area under the receiver operating characteristic curve (AUC) of 0.76. An open and free web server based on the random forest model, named triple-negative breast cancer immune subtype (TNBCIS), was developed and is available from https://immunotypes.shinyapps.io/TNBCIS/.
Conclusion: TNBC open datasets allowed us to stratify samples into distinct immunotherapy response subgroups according to gene expression profiles. Based on two novel subtypes, candidates for ICB with a higher response rate and better prognosis could be selected by using the free visual online web server that we designed.


2017 ◽  
Author(s):  
Meng Zhang

The tongue, the primary taste organ in the mouth, can reflect the whole body's health conditions according to Traditional Chinese Medical (TCM) theories. Watching the tongue is one of the most common, essential and reliable methods for the TCM doctor to make diagnoses. In this thesis, a new health system is introduced based on tongue image analysis. The technologies adopted in this system range from tongue image processing algorithms to machine learning applications. The tongue image algorithms used in this work include image segmentation, tongue recognition and tongue image classification. Image segmentation was used to remove unrelated parts, such as lip, face and neck, while keeping only the tongue. Then two recognition methods were applied to check whether the segmented result is a tongue or not. For different tongue patterns, the Support Vector Machine is applied to train a machine learning model and make predictions to classify the tongue into different labeled groups. An app named 'iTongue' is designed to monitor the body status by taking and processing tongue images on smartphones. The app provides a user-friendly, fast and powerful health tool based on TCM theories. The whole system is implemented in a web-based environment. An advanced portal was developed to connect the users and the TCM doctors. The users will not only obtain the analysis label of tongue images, but also get some lifestyle recommendations based on the tongue image analysis. This portal helps users understand more about their body status and guides them to adopt a more suitable diet and improve exercise.


Author(s):  
Abdullah Sani Abd Rahman ◽  
Suraya Masrom ◽  
Rahayu Abdul Rahman ◽  
Roslina Ibrahim

Researchers have acknowledged that machine learning is useful in many different domains of complex real-life problems. However, implementing a complete machine learning model involves some technical hurdles, such as the steep learning curve, the programming skills required, the complexities of hyper-parameters, and the lack of a user-friendly platform for the implementation. This paper provides an insight into a rapid software framework for implementing machine learning. This paper also demonstrates the empirical research results of machine learning classification models from the rapid software framework. Additionally, this paper compares the results between two platforms: the proposed software and a Python program. The machine learning models in the two platforms were tested on breast cancer and tax avoidance datasets with the Decision Tree algorithm. The results indicated that although the software framework is easier to use than the programming platform for implementing the machine learning model, the results from the software framework were still highly accurate and reliable. Keywords: Software framework, rapid, implementation, machine learning

