Identifying Protein Features and Pathways Responsible for Toxicity using Machine learning, CANDO, and Tox21 datasets: Implications for Predictive Toxicology

2021 ◽  
Author(s):  
Lama Moukheiber ◽  
William Mangione ◽  
Saeed Maleki ◽  
Zackary Falls ◽  
Mingchen Gao ◽  
...  

Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning methods have been used to predict toxicity-related biological activities using chemical structure descriptors. However, proteomic features have not been fully investigated. In this study, we construct a computational model that uses machine learning to select the protein features most important for predicting the toxicity of the compounds in the Tox21 dataset, using the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) platform for therapeutic discovery. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For our computational model, we employed a random forest (RF) combined with the Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method, known together as SMOTE+ENN, a resampling method that balances the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR), a toxicity-mediating transcription factor, and the mitochondrial membrane potential (SR-MMP) were among the top-performing of the twelve toxicity endpoints, with AUROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were passed into enrichment analysis to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR through a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with NR-AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between the protein-compound interactions computed using CANDO, the biological pathways to which the proteins belong, and twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also to elucidate therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
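SMOTE+ENN first oversamples the minority (active) class by interpolation and then removes samples whose label disagrees with their neighbours. Below is a minimal pure-Python sketch of both steps. It assumes numeric feature tuples (for example, compound-protein interaction scores), binary labels, and at least two minority samples. The function names are hypothetical. The ENN step is the classic majority-vote variant applied to every class, and library implementations such as imbalanced-learn's `SMOTEENN` expose more options. This is not the authors' pipeline.

```python
import math
import random
from collections import Counter

# Sketch under assumptions: numeric feature tuples, binary labels,
# and at least two minority samples (hypothetical helper names).


def nearest_indices(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(minority, i, k))
        gap = rng.random()  # fraction of the way from sample i to sample j
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(X, y, k=3):
    """Wilson's Edited Nearest Neighbours: keep a sample only if its label
    matches the majority label among its k nearest neighbours."""
    kept_X, kept_y = [], []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest_indices(X, i, k))
        if votes.most_common(1)[0][0] == y[i]:
            kept_X.append(X[i])
            kept_y.append(y[i])
    return kept_X, kept_y


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class up to the majority size, then clean with ENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_other = len(y) - len(minority)
    new = smote(minority, max(0, n_other - len(minority)), k=k_smote, seed=seed)
    return enn(list(X) + new, list(y) + [minority_label] * len(new), k=k_enn)
```

SMOTE only interpolates along segments between minority neighbours, so synthetic points stay inside the region the minority class already occupies. ENN then trims any points, original or synthetic, that sit among the other class.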

Author(s):  
Yu Shao ◽  
Xinyue Wang ◽  
Wenjie Song ◽  
Sobia Ilyas ◽  
Haibo Guo ◽  
...  

With the increasing aging population in modern society, falls and fall-induced injuries in elderly people have become one of the major public health problems. This study proposes a classification framework that uses floor vibrations to detect fall events and to distinguish different fall postures. A scaled 3D-printed model with twelve fully adjustable joints that can simulate human body movement was built to generate human fall data. The proportion of total body mass taken up by each body segment was carefully studied and reflected in the model. Object-drop and human-fall tests were carried out, and the vibration signatures generated in the floor were recorded for analysis. Machine learning algorithms, including the K-means and K-nearest neighbor algorithms, were used in the classification process. Three classifiers (human walking versus human fall, human fall versus object drop, and human falls from different postures) were developed in this study. Results showed that the three proposed classifiers achieved accuracies of 100%, 85%, and 91%, respectively. This paper develops a framework that uses floor vibration to build a pattern recognition system for detecting human falls based on a machine learning approach.
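K-means, one of the two algorithms named above, groups unlabeled events. It assigns each feature vector to the nearest centroid, moves each centroid to the mean of its members, and repeats. The sketch below is plain Lloyd's algorithm in Python. The two features per event (a peak floor acceleration and an event duration) and their values are hypothetical stand-ins for whatever vibration descriptors the study actually extracted.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, move each
    centroid to the mean of its members, repeat until assignments settle."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct events as starting centroids
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no event changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Hypothetical events: (peak floor acceleration, event duration in seconds).
events = [(0.2, 0.5), (0.3, 0.6), (0.25, 0.55), (3.0, 1.5), (3.2, 1.4), (3.1, 1.6)]
centroids, labels = kmeans(events, k=2)
print(labels)
```

K-means only returns cluster indices. Mapping a cluster to "fall" or "object drop" still requires labeled examples, which is where a supervised method such as KNN comes in.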


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1274
Author(s):  
Daniel Bonet-Solà ◽  
Rosa Ma Alsina-Pagès

Acoustic event detection and analysis have been widely developed in the last few years for their valuable applications in monitoring elderly or dependent people, in surveillance, in multimedia retrieval, and even in biodiversity metrics for natural environments. For these purposes, sound source identification is key to providing a smart technological answer to all the aforementioned applications. Diverse types of sounds and varied environments, together with a number of application-specific challenges, broaden the range of artificial intelligence algorithms that could be proposed. This paper presents a comparative study combining several feature extraction algorithms (Mel Frequency Cepstrum Coefficients (MFCC), Gammatone Cepstrum Coefficients (GTCC), and Narrow Band (NB)) with a group of machine learning algorithms (k-Nearest Neighbor (kNN), Neural Networks (NN), and Gaussian Mixture Model (GMM)), tested over five different acoustic environments. The goal of this work is to detail a best-practice method and to evaluate the reliability of this general-purpose algorithm for all the classes. Preliminary results show that most combinations of feature extraction and machine learning yield acceptable results on most of the described corpora. Nevertheless, one combination outperforms the others: GTCC together with kNN, whose results are further analyzed for all the corpora.
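MFCC and GTCC differ mainly in the filterbank applied before the cepstral step. MFCC spaces triangular filters evenly on the Mel scale. GTCC uses gammatone filters whose centre frequencies are usually spaced on the equivalent rectangular bandwidth (ERB) rate scale. The sketch below uses the common formulas m = 2595 log10(1 + f/700) and E = 21.4 log10(1 + 0.00437 f) to compute evenly spaced centre frequencies on each scale. The 50 to 8000 Hz range and the filter count are illustrative assumptions, not the settings of this study.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_erb_rate(f):
    # Glasberg & Moore ERB-rate (ERB-number) scale
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def centre_frequencies(f_min, f_max, n, to_scale, from_scale):
    """n filter centre frequencies (Hz) evenly spaced on a perceptual scale."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n - 1)
    return [from_scale(lo + i * step) for i in range(n)]


# Illustrative range (50-8000 Hz) and filter count, not the study's settings.
mel_centres = centre_frequencies(50.0, 8000.0, 10, hz_to_mel, mel_to_hz)
erb_centres = centre_frequencies(50.0, 8000.0, 10, hz_to_erb_rate, erb_rate_to_hz)
for m, e in zip(mel_centres, erb_centres):
    print(f"{m:8.1f} Hz (Mel)   {e:8.1f} Hz (ERB)")
```

Both warpings place filters more densely at low frequencies in Hz. The cepstral coefficients are then computed from the log energies of whichever filterbank is chosen.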


2021 ◽  
Vol 20 ◽  
pp. 117693512110092
Author(s):  
Abicumaran Uthamacumaran ◽  
Narjara Gonzalez Suarez ◽  
Abdoulaye Baniré Diallo ◽  
Borhane Annabi

Background: Vasculogenic mimicry (VM) is an adaptive biological phenomenon wherein cancer cells spontaneously self-organize into 3-dimensional (3D) branching network structures. This emergent behavior is considered central in conferring an invasive, metastatic, and therapy-resistant molecular signature on cancer cells. The quantitative analysis of such complex phenotypic systems could require computational approaches, including machine learning algorithms originating from complexity science. Procedures: In vitro 3D VM was performed with SKOV3 and ES2 ovarian cancer cells cultured on Matrigel. The disruption of VM by diet-derived catechins was monitored at 24 hours using pictures taken with an inverted microscope. Three computational algorithms for extracting complex features relevant to 3D VM (2D wavelet analysis, fractal dimension, and percolation clustering scores) were assessed in combination with machine learning classifiers. Results: These algorithms demonstrated the structure-to-function impact of the galloyl moiety on VM for each of the gallated catechins tested and proved applicable to quantifying the drug-mediated structural changes in VM processes. Conclusions: Our study provides evidence that appropriate 3D VM compression and feature extractors, coupled with classification/regression methods, could be efficient for studying in vitro drug-induced perturbation of complex processes. Such approaches could be exploited in the development and characterization of drugs targeting VM.
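Fractal dimension, one of the three feature extractors named above, is commonly estimated by box counting. The image is covered with boxes of side s, the boxes N(s) containing any foreground pixel are counted, and the dimension is the slope of log N(s) against log(1/s). Below is a minimal sketch. It assumes the microscope image has already been thresholded into a 0/1 grid. The segmentation step is not shown, and the box sizes and toy images are illustrative.

```python
import math


def box_count(image, size):
    """Number of size x size boxes that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(image[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) versus log(1/s); image must be non-empty."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Assumes a pre-thresholded 0/1 grid; these toy images are illustrative only.
filled = [[1] * 16 for _ in range(16)]
line = [[1] * 16] + [[0] * 16 for _ in range(15)]
print(box_counting_dimension(filled), box_counting_dimension(line))
```

A filled square gives a slope of 2 and a single straight line gives 1. Planar branching networks typically fall between these two values.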


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, so they are limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects, from a hackers’ community on Twitter, a list of users and their followers who share posts with similar interests. The list is built from a set of suggested keywords that hackers commonly use in their tweets. After that, a complex network is generated for all users to find the relations among them in terms of network centrality, closeness, and betweenness. After these values are extracted, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is then used in a machine learning process by applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Accuracy, measured as the proportion of correctly classified instances and averaged over the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, was about 90% for cross-validation and 88% for percentage split. 
Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
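The network step above ranks users by centrality. As a sketch, the code below computes two measures on a small follower graph. Closeness is the number of reachable nodes minus one, divided by the sum of shortest-path distances. Betweenness uses Brandes' algorithm. Treating follower links as undirected and unweighted, and the user names themselves, are simplifying assumptions rather than details from the paper.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph):
    """(reachable nodes - 1) / sum of shortest-path distances; 0 if isolated."""
    scores = {}
    for v in graph:
        d = bfs_distances(graph, v)
        total = sum(d.values())
        scores[v] = (len(d) - 1) / total if total else 0.0
    return scores


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}  # each pair counted from both ends


# Hypothetical follower links, treated as undirected.
follows = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(closeness(follows), betweenness(follows))
```

In a star-shaped community, the hub lies on every shortest path between the other users. It therefore scores highest on both measures, which is the intuition behind selecting "influential" users.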


Author(s):  
Sandy C. Lauguico ◽  
Ronnie S. Concepcion II ◽  
Jonnel D. Alejandrino ◽  
Rogelio Ruzcko Tobias ◽  
...  

The growing problem of food scarcity drives innovation in urban farming. One urban farming method is smart aquaponics. However, for a smart aquaponics system to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way to achieve this is to use vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To this end, a comparative analysis of three machine learning estimators, Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM), was conducted. Each algorithm was modeled on features extracted by machine vision from images of lettuce raised in a smart aquaponics setup. Each model was optimized to increase its cross-validation and hold-out validation accuracy. The results showed that KNN with the tuned hyperparameters n_neighbors=24, weights='distance', algorithm='auto', and leaf_size=10 was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.
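In scikit-learn, `weights='distance'` makes each of the `n_neighbors` nearest training points vote with a weight equal to the inverse of its distance. The `algorithm` and `leaf_size` parameters only control how neighbours are searched. A dependency-free sketch of that prediction rule follows. The two-dimensional features and class labels are hypothetical, since the abstract does not specify the extracted lettuce features or classes.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, n_neighbors=24):
    """Distance-weighted k-NN: each of the n_neighbors closest training points
    votes for its label with weight 1 / distance."""
    ranked = sorted(
        ((math.dist(query, x), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:n_neighbors]
    exact = [label for d, label in ranked if d == 0]
    if exact:  # 1/0 is undefined, so points identical to the query decide alone
        return max(set(exact), key=exact.count)
    votes = defaultdict(float)
    for d, label in ranked:
        votes[label] += 1.0 / d
    return max(votes, key=votes.get)


# Hypothetical 2-D image features and class labels.
X = [(0.0, 0.0), (3.0, 0.0), (3.2, 0.0)]
y = ["a", "b", "b"]
print(knn_predict(X, y, (0.1, 0.0), n_neighbors=3))
```

Here one very close neighbour of class "a" outweighs two farther neighbours of class "b", so the call prints `a`. An unweighted majority vote over the same three neighbours would pick "b".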


Author(s):  
Robert Ancuceanu ◽  
Marilena Viorica Hovanet ◽  
Adriana Iuliana Anghel ◽  
Florentina Furtunescu ◽  
Monica Neagu ◽  
...  

Drug-induced liver injury (DILI) remains one of the challenges in the safety profile of both authorized and candidate drugs. Predicting hepatotoxicity from the chemical structure of a substance remains a challenge worth pursuing, and it is consistent with the current trend toward replacing non-clinical tests with in vitro or in silico alternatives. In 2016, a group of researchers from the FDA published an improved annotated list of drugs with respect to their DILI risk, constituting “the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans”, DILIrank. This paper is one of the few attempting to predict liver toxicity using the DILIrank dataset. Molecular descriptors were computed with the Dragon 7.0 software, and a variety of feature selection and machine learning algorithms were implemented in the R computing environment. Nested (double) cross-validation was used to externally validate the selected models. A total of 78 models with reasonable performance were selected and stacked through several approaches, including the building of multiple meta-models. The performance of the stacked models was slightly superior to that of other published models. The models were applied in a virtual screening exercise on over 100,000 compounds from the ZINC database, and about 20% of them were predicted to be non-hepatotoxic.
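Nested (double) cross-validation keeps hyperparameter selection and performance estimation on separate data. An inner loop picks the configuration using only the outer training folds, and the outer loop scores that choice on a fold it never saw. The sketch below illustrates this loop structure with a hypothetical one-dimensional k-NN whose k is tuned. The study itself used Dragon descriptors, several learners, and model stacking in R, none of which is reproduced here.

```python
# Toy stand-in for the study's models: a 1-D k-NN with k as the only hyperparameter.


def folds(indices, k):
    """Interleaved k-fold split of a list of sample indices."""
    return [indices[f::k] for f in range(k)]


def knn_accuracy(k, X, y, train_idx, test_idx):
    """Accuracy of a 1-D majority-vote k-NN fitted on train_idx, scored on test_idx."""
    correct = 0
    for i in test_idx:
        nearest = sorted(train_idx, key=lambda j: abs(X[i] - X[j]))[:k]
        pred = 1 if 2 * sum(y[j] for j in nearest) > len(nearest) else 0
        correct += pred == y[i]
    return correct / len(test_idx)


def nested_cv(X, y, candidates, outer_k=5, inner_k=4):
    """Outer loop estimates performance; inner loop picks k on outer-training data only."""
    indices = list(range(len(X)))
    outer_scores, chosen = [], []
    for test_fold in folds(indices, outer_k):
        train = [i for i in indices if i not in test_fold]

        def inner_score(k):
            scores = []
            for val_fold in folds(train, inner_k):
                fit = [i for i in train if i not in val_fold]
                scores.append(knn_accuracy(k, X, y, fit, val_fold))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)  # selection never sees test_fold
        chosen.append(best)
        outer_scores.append(knn_accuracy(best, X, y, train, test_fold))
    return sum(outer_scores) / len(outer_scores), chosen


# Hypothetical 1-D descriptor with two well-separated classes (0 = non-toxic, 1 = toxic).
X = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
y = [0] * 10 + [1] * 10
score, chosen_k = nested_cv(X, y, candidates=(1, 3, 5))
print(score, chosen_k)
```

The outer score estimates how the whole selection procedure performs, not any single fitted model. This is why nested cross-validation is used as an external check.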


2020 ◽  
Author(s):  
Vagner Seibert ◽  
Ricardo Araújo ◽  
Richard McElligott

Guaranteeing high indoor air quality is an increasingly important task. Sensors measure pollutants in the air and allow air quality to be monitored and controlled. However, all sensors are susceptible to failures, either permanent or transitory, that can yield incorrect readings. Automatically detecting such faulty readings is therefore crucial to guaranteeing sensor reliability. In this paper, we evaluate three machine learning (ML) algorithms applied to the task of classifying a single sensor reading as faulty or not, comparing them to standard statistical approaches. We show that all tested ML methods (Multi-layer Perceptron, K-Nearest Neighbor, and Random Forest) outperform their statistical counterparts, both by allowing better separation boundaries and by allowing the use of contextual information. We further show that this result does not depend on the amount of data, although the ML methods continue to improve as more data become available.
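The abstract does not say which statistical approaches served as baselines. One common rule of that kind, shown here as an assumption, flags a reading as faulty when it lies more than z standard deviations from the mean of the preceding readings. The window size, threshold, and sample CO2 values are illustrative.

```python
import statistics


def flag_faulty(readings, window=10, z=3.0):
    """Assumed statistical baseline: flag reading i when it lies more than z
    standard deviations from the mean of the preceding `window` readings."""
    flags = []
    for i, x in enumerate(readings):
        history = readings[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # too little context to judge
            continue
        mu = statistics.fmean(history)
        sigma = statistics.stdev(history)
        if sigma == 0:
            flags.append(x != mu)  # any change from a perfectly flat history
        else:
            flags.append(abs(x - mu) > z * sigma)
    return flags


# Hypothetical CO2 readings (ppm) with one transient spike.
co2 = [400, 402, 398, 401, 399, 400, 403, 397, 401, 400, 2000, 401]
print([i for i, bad in enumerate(flag_faulty(co2)) if bad])
```

Once the spike enters the window, it inflates the standard deviation of the following windows. That can mask nearby faults, a limitation of fixed-threshold rules of this kind.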


Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology can take objects, such as images, as input to find new observations or to show items based on user interest. The main discussion here concerns supervised learning, in which the computer learns from input (training) data and predicts results based on that experience. We also discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to Malgenome and Drebin, two Android malware datasets. Android is an operating system that is gaining popularity, and the rising demand for Android devices has been accompanied by a rise in Android malware. The traditional methods used to detect malware were unable to detect unknown applications. We ran these datasets through different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.
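Android malware datasets such as Drebin and Malgenome are often represented as binary indicators, for example whether an app requests a given permission or calls a given API. A Bernoulli Naive Bayes classifier is a natural fit for such vectors. The sketch below uses Laplace smoothing. Its three features and six training apps are hypothetical, not taken from either dataset.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Per class: log prior and P(feature = 1 | class) with Laplace smoothing."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
        model[c] = (math.log(len(rows) / len(X)), probs)
    return model


def predict_bernoulli_nb(model, x):
    """Class with the highest log posterior; absent features count as evidence too."""
    def log_posterior(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p if xj else 1.0 - p) for p, xj in zip(probs, x))
    return max(model, key=log_posterior)


# Hypothetical binary features per app: [requests SEND_SMS, reads contacts, uses reflection]
apps = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
labels = ["malware"] * 3 + ["benign"] * 3
model = train_bernoulli_nb(apps, labels)
print(predict_bernoulli_nb(model, [1, 1, 0]), predict_bernoulli_nb(model, [0, 0, 0]))
```

Working in log space avoids numerical underflow when the feature vectors contain hundreds of indicators. Smoothing also keeps unseen feature values from zeroing out a class.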


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 33-34 ◽  
Author(s):  
Yazan Rouphail ◽  
Nathan Radakovich ◽  
Jacob Shreve ◽  
Sudipto Mukherjee ◽  
Babal K. Jha ◽  
...  

Background: Multi-omic analysis can identify unique signatures that correlate with cancer subtypes. While clinically meaningful molecular subtypes of AML have been defined based on the status of single genes such as NPM1 and FLT3, such categories remain heterogeneous, and further work is needed to characterize their genetic and transcriptomic diversity on a truly individualized basis. Further, patients (pts) with NPM1+/FLT3-ITD- AML have better overall survival than patients with NPM1-/FLT3-ITD+, suggesting that these pts could have different transcriptomic signatures that impact phenotype, pathophysiology, and outcomes. Many current transcriptome analytic techniques use clustering analysis to aggregate samples and examine relationships on a cohort-wide basis, building transcriptomic signatures that correlate with phenotype or outcome. Such approaches can obscure the heterogeneity of gene expression among pts with the same signatures. In this study, we took advantage of state-of-the-art machine learning algorithms to identify unique transcriptomic signatures that correlate with AML genomic phenotype. Methods: Genomic (whole exome sequencing and targeted deep sequencing) and transcriptomic data from 451 AML pts included in the Beat AML study (publicly available data) were used to build transcriptomic signatures that are specific for AML patients with NPM1+/FLT3-ITD+ compared to NPM1+/FLT3-ITD- and NPM1-/FLT3-ITD-. We chose these AML phenotypes because they have been described extensively and correlate with clinical outcomes. Results: A total of 242 patients (54%) were NPM1-/FLT3-, 35 (8%) were NPM1+/FLT3-, and 47 (10%) were NPM1+/FLT3+. Our algorithm identified 20 genes that are highly specific for the NPM1/FLT3-ITD phenotype: HOXB-AS3, SCRN1, LMX1B, PCBD1, DNAJC15, HOXA3, NPTXq, RP11-1055B8, ABDH128, HOXB8, SOCS2, HOXB3, HOXB9, MIR503HG, FAM221B, NRP1, NDUFAF3, MEG3, CCDC136, and HIST1H2BC. 
Interestingly, several of those genes were overexpressed or underexpressed in specific phenotypes. For example, SCRN1, LMX1B, RP11-1055B8, ABDH128, HOXB8, MIR503HG, and NRP1 are overexpressed or underexpressed only in patients with NPM1-/FLT3-, while PCBD1, NDUFAF3, and FAM221B are overexpressed or underexpressed in pts with NPM1+/FLT3+. These genes affect several important pathways that regulate cell differentiation, proliferation, mitochondrial oxidative phosphorylation, histone modification, and lipid metabolism. All these genes had previously been reported as having altered expression in genomic studies of AML, confirming our approach's ability to identify biologically meaningful relationships. Further, our algorithm can provide a personalized explanation of the overexpressed and underexpressed genes specific to a given patient, thus identifying targetable pathways for each pt. Figure 1 below shows three pts with the same genotype (NPM1+/FLT3-ITD+) but different transcriptomic patterns of overexpression or underexpression that affect different biological pathways. Conclusions: We describe the use of a state-of-the-art explainable machine learning approach to define transcriptomic signatures that are specific to individual pts. In addition to correctly distinguishing AML subtypes based on specific transcriptomic signatures, our model accurately identified upregulated and downregulated genes that affect several important biological pathways in AML and can summarize these pathways at an individual level. Such an approach can be used to provide personalized treatment options that target the activated pathways at an individual level. 
Disclosures: Mukherjee: Partnership for Health Analytic Research, LLC (PHAR, LLC): Honoraria; Novartis: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; EUSA Pharma: Consultancy; Celgene/Acceleron: Membership on an entity's Board of Directors or advisory committees; Bristol Myers Squibb: Honoraria; Aplastic Anemia and MDS International Foundation: Honoraria; Celgene: Consultancy, Honoraria, Research Funding. Maciejewski: Alexion, BMS: Speakers Bureau; Novartis, Roche: Consultancy, Honoraria. Sekeres: BMS: Consultancy; Takeda/Millennium: Consultancy; Pfizer: Consultancy. Nazha: Jazz: Research Funding; Incyte: Speakers Bureau; Novartis: Speakers Bureau; MEI: Other: Data monitoring Committee.
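The abstract describes per-patient explanations of over- and underexpressed genes but does not detail the explainable model behind them. The sketch below is a much simpler, explicitly different stand-in. It flags genes whose expression in one patient deviates from the cohort mean by more than a chosen number of standard deviations. The gene names, expression values, and threshold are hypothetical.

```python
import statistics


def patient_outliers(expression, genes, patient, threshold=2.0):
    """Simple z-score stand-in (not the study's model): genes whose expression in
    `patient` lies more than `threshold` cohort standard deviations above (over)
    or below (under) the cohort mean."""
    over, under = [], []
    for g, gene in enumerate(genes):
        values = [profile[g] for profile in expression.values()]
        mu, sd = statistics.fmean(values), statistics.stdev(values)
        if sd == 0:
            continue  # constant across the cohort: no signal
        z = (expression[patient][g] - mu) / sd
        if z > threshold:
            over.append((gene, z))
        elif z < -threshold:
            under.append((gene, z))
    return over, under


# Hypothetical cohort: patient ID -> expression values aligned with `genes`.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
cohort = {f"p{i}": [1.0, 5.0, 3.0, 4.0 + 2.0 * (i % 2)] for i in range(1, 10)}
cohort["p0"] = [10.0, 0.0, 3.0, 5.0]
over, under = patient_outliers(cohort, genes, "p0")
print(over, under)
```

The patient is included in the cohort statistics, so |z| cannot exceed (n - 1)/sqrt(n) for a cohort of n. Small cohorts therefore need a lower threshold or a leave-one-out reference.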

