An improved catalogue of putative synaptic genes defined by their temporal transcription profiles through an ensemble machine learning approach

2019 ◽  
Author(s):  
Flavio Pazos ◽  
Pablo Soto ◽  
Martín Palazzo ◽  
Gustavo Guerberoff ◽  
Patricio Yankilevich ◽  
...  

Abstract
Background. Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Previously, we trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes that was postulated to be highly enriched in genes with still undocumented synaptic functions. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we used these new empirical data to evaluate the predictive power of the catalogue. We then implemented a series of improvements to the training scheme and the ensemble rules of our model and added the new synaptic genes to the training set to obtain a new, enhanced catalogue of putative synaptic genes.
Results. The retrospective analysis demonstrated that our original catalogue was indeed highly enriched in genes with unknown synaptic function. The changes to the training scheme and the ensemble rules resulted in a catalogue with better predictive power. Finally, by training this improved model with an updated training set that includes all the new synaptic genes, we obtained a new, enhanced catalogue of putative synaptic genes. We present it here and announce a regularly updated version that will be available online at http://synapticgenes.bnd.edu.uy
Conclusions. We show that training a machine learning model solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data, we validated our original approach, improved our model and obtained a better catalogue. The utility of this approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.
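
The abstract does not state which ensemble rules were used, so the following is only a minimal sketch of one common choice, soft voting, in which the per-model probabilities for each gene are averaged and genes above a cutoff enter the catalogue. The gene names, probabilities, threshold and function names are all hypothetical and are not taken from the paper.

```python
from statistics import mean

def ensemble_probability(model_probs):
    """Average the per-model probabilities for one gene (soft voting)."""
    return mean(model_probs)

def build_catalogue(gene_probs, threshold=0.5):
    """Return (gene, score) pairs whose averaged probability meets the
    threshold, ranked from highest to lowest score.

    The threshold is an assumed value, not the one used by the authors.
    """
    scored = {gene: ensemble_probability(p) for gene, p in gene_probs.items()}
    selected = [(g, s) for g, s in scored.items() if s >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

# Hypothetical genes, each scored by three hypothetical classifiers.
example = {
    "geneA": [0.9, 0.8, 0.7],
    "geneB": [0.2, 0.4, 0.3],
    "geneC": [0.6, 0.5, 0.7],
}
catalogue = build_catalogue(example)
for gene, score in catalogue:
    print(f"{gene}\t{score:.2f}")
```

Other ensemble rules, such as requiring a minimum number of classifiers to agree, would change only `ensemble_probability`.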

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Flavio Pazos Obregón ◽  
Martín Palazzo ◽  
Pablo Soto ◽  
Gustavo Guerberoff ◽  
Patricio Yankilevich ◽  
...  

Abstract
Background. Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Previously, we trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes which we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, demonstrate that this improves its predictive power. Finally, we added the new synaptic genes to the training set and trained a new model, obtaining a new, enhanced catalogue of putative synaptic genes.
Results. The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme with a training set including the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes. Here we present this new catalogue and announce that a regularly updated version will be available online at http://synapticgenes.bnd.edu.uy
Conclusions. We show that training an ensemble of machine learning classifiers solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction. This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function.
Availability. http://synapticgenes.bnd.edu.uy
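
The abstract reports significant enrichment but does not say which test was used. A minimal sketch of one standard option, a one-sided hypergeometric test, is given below. The catalogue size (893) and the number of newly identified synaptic genes (79) come from the abstract; the size of the gene universe and the overlap count are placeholders chosen purely for illustration and are not results from the paper.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K of them 'hits', n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def fold_enrichment(k, N, K, n):
    """Observed hit rate in the catalogue divided by the background hit rate."""
    return (k / n) / (K / N)

CATALOGUE_SIZE = 893   # from the abstract
NEW_SYNAPTIC = 79      # from the abstract
N_GENES = 13_000       # ASSUMPTION: size of the candidate gene universe
OVERLAP = 20           # ASSUMPTION: placeholder, not a reported result

p_value = hypergeom_sf(OVERLAP, N_GENES, NEW_SYNAPTIC, CATALOGUE_SIZE)
fold = fold_enrichment(OVERLAP, N_GENES, NEW_SYNAPTIC, CATALOGUE_SIZE)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.3g}")
```

Replacing the two placeholder values with the real universe size and overlap would give the actual enrichment for a given catalogue.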





IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 145968-145983 ◽  
Author(s):  
Amirhosein Mosavi ◽  
Ataollah Shirzadi ◽  
Bahram Choubin ◽  
Fereshteh Taromideh ◽  
Farzaneh Sajedi Hosseini ◽  
...  

2020 ◽  
Author(s):  
Chunbo Kang ◽  
Xubin Li ◽  
Xiaoqian Chi ◽  
Yabin Yang ◽  
Haifeng Shan ◽  
...  

Abstract
BACKGROUND. Accurate preoperative prediction of complicated appendicitis (CA) could help in selecting the optimal treatment and reducing the risk of postoperative complications. The study aimed to develop a machine learning model based on clinical symptoms and laboratory data for preoperatively predicting CA.
METHODS. A total of 136 patients with a clinicopathological diagnosis of acute appendicitis were retrospectively included in the study. The dataset was randomly divided (94:42) into training and testing sets. Predictive models using individual and combined selected clinical and laboratory data features were built separately. Three combined models were constructed using logistic regression (LR), support vector machine (SVM) and random forest (RF) algorithms. The CA prediction performance was evaluated with Receiver Operating Characteristic (ROC) analysis, using the area under the curve (AUC), sensitivity, specificity and accuracy.
RESULTS. The features abdominal pain time, nausea and vomiting, highest temperature, high-sensitivity CRP (hs-CRP) and procalcitonin (PCT) showed significant differences in CA prediction (P<0.001). The ability of any individual feature to predict CA was low (AUC<0.8). Prediction with combined features was significantly improved. The AUCs of the three models (LR, SVM and RF) were 0.805, 0.888 and 0.908 in the training set and 0.794, 0.895 and 0.761 in the testing set, respectively. The SVM-based model showed better performance for CA prediction. RF had a higher AUC in the training set, but its poor performance in the testing set indicated poor generalization ability.
CONCLUSIONS. The SVM machine learning model using clinical and laboratory data can predict CA well preoperatively, which could assist diagnosis in resource-limited settings.
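
The evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the study's code: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the function names are hypothetical. Only the training and testing AUCs in the last step are taken from the abstract, where they are used to show the train/test gap behind the generalization remark about RF.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity and accuracy at an assumed threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy

# Made-up example: 1 = complicated appendicitis, 0 = uncomplicated.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print("AUC:", round(roc_auc(labels, scores), 3))
print("sens, spec, acc:", confusion_metrics(labels, scores))

# AUCs reported in the abstract; a large positive gap suggests overfitting.
train_auc = {"LR": 0.805, "SVM": 0.888, "RF": 0.908}
test_auc = {"LR": 0.794, "SVM": 0.895, "RF": 0.761}
gap = {m: round(train_auc[m] - test_auc[m], 3) for m in train_auc}
print("train - test AUC gap:", gap)
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a cohort of this size.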

