Application of Machine Learning and Weighted Gene Co-expression Network Algorithm to Explore the Hub Genes in the Aging Brain

2021 · Vol 13
Author(s): Keping Chai, Jiawei Liang, Xiaolin Zhang, Panlong Cao, Shufang Chen, ...

Aging is a major risk factor contributing to neurodegeneration and dementia. However, it remains unclear how aging promotes these diseases. Here, we use machine learning and weighted gene co-expression network analysis (WGCNA) to explore the relationship between aging and gene expression in the human frontal cortex and to reveal potential biomarkers and therapeutic targets of aging-related neurodegeneration and dementia. Transcriptional profiling data of the human frontal cortex from individuals aged 26 to 106 years were obtained from the GEO database at NCBI. A Self-Organizing Feature Map (SOM) was used to find clusters in which gene expression is downregulated with aging. For the WGCNA analysis, co-expressed genes were first clustered into modules, and modules of interest were identified by calculating the correlation coefficient between each module and the phenotypic trait (age). Next, the genes overlapping between the differentially expressed genes (DEGs, young versus aged group) and the genes in the module of interest were identified. A Random Forest classifier was then applied to find the most significant genes among the overlapping genes, which were further examined through network analysis. The WGCNA analysis found the greenyellow module to be highly negatively correlated with age and to function mainly in long-term potentiation and calcium signaling pathways. Through step-by-step filtering of the module genes, first by overlap with the DEGs downregulated in the aged group and then by Random Forest classifier analysis, we found that MAPT, KLHDC3, RAP2A, RAP2B, ELAVL2, and SYN1 were co-expressed and highly correlated with aging.
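The final filtering step described in the abstract, ranking candidate genes with a Random Forest classifier, can be sketched as below. This is a minimal illustration with a synthetic expression matrix, not the study's data; only the gene names come from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
genes = ["MAPT", "KLHDC3", "RAP2A", "RAP2B", "ELAVL2", "SYN1"]

# Synthetic expression matrix: 60 samples x 6 genes, with the first
# gene downregulated in the "aged" half of the cohort.
X = rng.normal(size=(60, len(genes)))
y = np.array([0] * 30 + [1] * 30)   # 0 = young, 1 = aged
X[y == 1, 0] -= 2.0                 # simulate age-related downregulation

# Rank genes by Random Forest feature importance.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(genes, clf.feature_importances_),
                 key=lambda g: g[1], reverse=True)
top_gene = ranking[0][0]
```

In practice the feature-importance ranking (or an importance threshold) selects the "most significant" subset passed on to network analysis.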

2021 · Vol 11 (1)
Author(s): Elisabeth Sartoretti, Thomas Sartoretti, Michael Wyss, Carolin Reischauer, Luuk van Smoorenburg, ...

We sought to evaluate the utility of radiomics for Amide Proton Transfer weighted (APTw) imaging by assessing its value in differentiating brain metastases from high- and low-grade glial brain tumors. We retrospectively identified 48 treatment-naïve patients (10 WHO grade 2, 1 WHO grade 3, and 10 WHO grade 4 primary glial brain tumors, and 27 metastases) who had undergone APTw MR imaging. After image analysis with radiomics feature extraction and post-processing, machine learning algorithms (a multilayer perceptron and a random forest classifier) with stratified tenfold cross-validation were trained on the features and used to differentiate the brain neoplasms. The multilayer perceptron achieved an AUC of 0.836 (receiver operating characteristic curve) in differentiating primary glial brain tumors from metastases. The random forest classifier achieved an AUC of 0.868 in differentiating WHO grade 4 from WHO grade 2/3 primary glial brain tumors. For the differentiation of WHO grade 4 tumors from grade 2/3 tumors and metastases, an average AUC of 0.797 was achieved. Our results indicate that the use of radiomics for APTw imaging is feasible and that primary glial brain tumors can be differentiated from metastases with a high degree of accuracy.
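The evaluation scheme described (a random forest with stratified tenfold cross-validation, scored by AUC) follows a standard scikit-learn pattern, sketched below. The feature matrix is synthetic, standing in for the extracted radiomics features; the real study had only 48 patients, so a slightly larger balanced sample is used here to keep every fold populated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the radiomics feature table.
X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                           random_state=42)

# Stratified tenfold cross-validation, scored by ROC AUC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
auc_scores = cross_val_score(RandomForestClassifier(random_state=42),
                             X, y, cv=cv, scoring="roc_auc")
mean_auc = auc_scores.mean()
```

Stratification keeps the class ratio stable across folds, which matters with small, imbalanced clinical cohorts like this one.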


2017 · Vol 25 (3) · pp. 811-827
Author(s): Dimitris Spathis, Panayiotis Vlamos

This study examines clinical decision support systems in healthcare, with a particular focus on the prevention, diagnosis, and treatment of respiratory diseases such as asthma and chronic obstructive pulmonary disease (COPD). The empirical pulmonology study of a representative sample (n = 132) attempts to identify the major factors that contribute to the diagnosis of these diseases. Machine learning results show that in the case of COPD, the Random Forest classifier outperforms other techniques with 97.7 per cent precision, while the most prominent attributes for diagnosis are smoking, forced expiratory volume in 1 second (FEV1), age, and forced vital capacity. In asthma's case, the best precision, 80.3 per cent, is again achieved with the Random Forest classifier, while the most prominent attribute is MEF2575.
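The precision figures reported per disease can be reproduced with the usual scikit-learn workflow, sketched below on synthetic data; the four columns merely mimic the named predictors (smoking, FEV1, age, FVC) and are not the study's sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pulmonology sample (n = 132, four predictors).
X, y = make_classification(n_samples=132, n_features=4, n_informative=3,
                           n_redundant=0, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=7)

# Precision = TP / (TP + FP): the fraction of positive diagnoses
# that were correct.
clf = RandomForestClassifier(random_state=7).fit(X_tr, y_tr)
precision = precision_score(y_te, clf.predict(X_te))
```

Precision is a sensible headline metric here because a false positive triggers unnecessary follow-up for a healthy patient.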


In universities, student dropout is a major concern that reflects on a university's quality. Certain characteristics cause students to drop out, and a high dropout rate affects both the university's reputation and students' future careers. Dropout analysis is therefore needed to enhance academic planning and management, reduce student dropout, and improve the quality of the higher education system. Machine learning techniques provide powerful methods for the analysis and prediction of dropout, and are increasingly used in data mining diagnostics. This study uses a dataset from a representative university to develop a model for predicting student dropout. After examining prior studies, we observed that dropout detection can be done with several methods, and we applied five detection models: Decision Tree, Naive Bayes, Random Forest Classifier, SVM, and KNN. Analyzing the data with these machine learning techniques, we found that the Random Forest classifier is the most promising for predicting dropout, with a training accuracy of 94% and a testing accuracy of 86%.
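Comparing the five named models on a common train/test split can be sketched as below; the student dataset is not public, so a synthetic one stands in, and all hyperparameters are scikit-learn defaults rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student records (1 = dropped out).
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=1),
    "SVM": SVC(random_state=1),
    "KNN": KNeighborsClassifier(),
}
# Train each model and record its accuracy on the held-out split.
test_accuracy = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
                 for name, m in models.items()}
```

Reporting training and testing accuracy separately, as the abstract does (94% vs 86%), is what exposes overfitting: a large gap between the two would argue against the model.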


Author(s): Amy Marie Campbell, Marie-Fanny Racault, Stephen Goult, Angus Laurenson

Oceanic and coastal ecosystems have undergone complex environmental changes in recent years, amid a context of climate change. These changes are also reflected in the dynamics of water-borne diseases, as some of the causative agents of these illnesses are ubiquitous in the aquatic environment and their survival rates are impacted by changes in climatic conditions. Previous studies have established strong relationships between essential climate variables and the coastal distribution and seasonal dynamics of the bacterium Vibrio cholerae, pathogenic types of which are responsible for human cholera disease. In this study we provide a novel exploration of the potential of a machine learning approach to forecast environmental cholera risk in coastal India, home to more than 200 million inhabitants, utilising atmospheric, terrestrial, and oceanic satellite-derived essential climate variables. A Random Forest classifier model is developed, trained, and tested on a cholera outbreak dataset over the period 2010–2018 for districts along coastal India. The random forest classifier model has an accuracy of 0.99, an F1 score of 0.942, and a sensitivity score of 0.895, meaning that 89.5% of outbreaks are correctly identified. Spatio-temporal patterns emerged in the model's performance across seasons and coastal locations. Further analysis of the specific contribution of each essential climate variable to the model outputs shows that chlorophyll-a concentration, sea surface salinity, and land surface temperature are the strongest predictors of the cholera outbreaks in the dataset used. The study reveals the promising potential of random forest classifiers and remotely-sensed essential climate variables for the development of environmental cholera-risk applications.
Further exploration of the present random forest model and associated essential climate variables is encouraged on cholera surveillance datasets in other coastal areas affected by the disease, to determine the model's transferability and applicative value for cholera forecasting systems.
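The three metrics the abstract reports (accuracy, F1 score, sensitivity) are computed as below; the outbreak labels here are a small hypothetical example chosen only to make the arithmetic visible, not the study's data.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical outbreak labels (1 = outbreak) and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)   # fraction of all correct calls
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision/recall
sensitivity = recall_score(y_true, y_pred)  # fraction of outbreaks flagged
```

With rare events like outbreaks, accuracy alone can look excellent while missing cases, which is why the abstract's sensitivity of 0.895 ("89.5% of outbreaks correctly identified") is the more informative number.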


2020
Author(s): Sonam Wangchuk, Tobias Bolch

Accurate detection and mapping of glacial lakes in Alpine regions such as the Himalayas, the Alps, and the Andes are challenged by many factors. These factors include 1) the small size of glacial lakes, 2) cloud cover in optical satellite images, 3) cast shadows from mountains and clouds, 4) seasonal snow in satellite images, 5) varying degrees of turbidity amongst glacial lakes, and 6) frozen glacial lake surfaces. In our study, we propose a fully automated approach that overcomes most of the above-mentioned challenges to detect and map glacial lakes accurately using multi-source data and machine learning techniques such as the random forest classifier algorithm. The multi-source data are Sentinel-1 Synthetic Aperture Radar data (radar backscatter), Sentinel-2 multispectral instrument data (NDWI), and the SRTM digital elevation model (slope). We use these data as inputs for rule-based segmentation of potential glacial lakes, where decision rules are implemented from the expert system. The potential glacial lake polygons are then classified either as glacial lakes or non-glacial lakes by the trained and tested random forest classifier algorithm. The performance of the method was assessed in eight test sites located across the Alpine regions of the world (e.g. the Boshula mountain range and Koshi basin in the Himalayas, the Tajik Pamir, the Swiss Alps, and the Peruvian Andes). We show that the proposed method performs efficiently irrespective of geographic, geologic, climatic, and glacial lake conditions.
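The two-stage design (rule-based segmentation of candidates, then random forest classification) can be sketched as below. All features, thresholds, and labels are synthetic and illustrative; the paper's actual expert-system rules and training data are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 500

# Synthetic per-polygon features mimicking the three inputs: Sentinel-1
# backscatter (dB), Sentinel-2 NDWI, and SRTM-derived slope (degrees).
backscatter = rng.normal(-15, 5, n)
ndwi = rng.uniform(-0.2, 0.8, n)
slope = rng.uniform(0, 40, n)

# Stage 1, rule-based segmentation: keep only water-like, flat polygons
# (illustrative thresholds, not the paper's decision rules).
candidates = (ndwi > 0.2) & (slope < 10)

# Synthetic ground truth: among candidates, low radar backscatter marks water.
labels = (backscatter < -15).astype(int)

# Stage 2: a random forest separates glacial lakes from false positives
# among the candidate polygons.
X = np.column_stack([backscatter, ndwi, slope])[candidates]
y = labels[candidates]
clf = RandomForestClassifier(random_state=3).fit(X, y)
train_accuracy = clf.score(X, y)
```

The segmentation stage keeps the classifier's workload small and its class balance manageable, which is the usual motivation for such a two-stage pipeline.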


2020 · Vol 184 · pp. 01011
Author(s): Sreethi Musunuru, Mahaalakshmi Mukkamala, Latha Kunaparaju, N V Ganapathi Raju

Banks hold an abundance of data on their customers, and it is not unusual for them to track customers' actions regularly to improve the services they offer and to understand why many of them choose to exit and shift to other banks. Analyzing customer behavior can be highly beneficial to banks, as they can reach out to their customers on a personal level and develop a business model that improves the pricing structure, communication, advertising, and benefits for their customers and themselves. Features such as the amount a customer credits every month, their annual salary, and their gender are used to classify customers with machine learning algorithms like the K Neighbors Classifier and the Random Forest Classifier. By classifying their customers, banks can get an idea of who will stay with them and who will leave in the near future. Our study aims to remove the features that are independent but not influential in determining the future status of customers, without loss of accuracy, and to improve the model to see whether this also increases the accuracy of the results.
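The feature-removal idea, dropping uninfluential columns and checking that accuracy holds, can be sketched with importance-based selection as below. The customer table is synthetic and the mean-importance threshold is one common choice, not necessarily the study's criterion.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer table (monthly credits, salary,
# gender, etc. mimicked by anonymous numeric columns; 1 = churned).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=4,
                           n_redundant=2, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=5)

# Baseline model on all 12 features.
full = RandomForestClassifier(random_state=5).fit(X_tr, y_tr)
acc_full = full.score(X_te, y_te)

# Drop features whose importance falls below the mean, then retrain.
selector = SelectFromModel(full, prefit=True, threshold="mean")
reduced = RandomForestClassifier(random_state=5).fit(
    selector.transform(X_tr), y_tr)
acc_reduced = reduced.score(selector.transform(X_te), y_te)
```

Comparing `acc_full` against `acc_reduced` is exactly the "without loss of accuracy" check the abstract describes.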


2020 · Vol 8 (6) · pp. 3912-3914

The main objective of this paper is to build a model that predicts stock market prices from previous years' data. The project starts with collecting the stock price data and pre-processing it. A 12-year dataset is used to train the model with the Random Forest classifier algorithm. Backtesting, the most important part of a quantitative strategy, is used to measure the accuracy of the model. Current data is then collected from Yahoo Finance and fed to the model, which predicts the stocks that are going to perform well based on its learning from the historical data. The model predicted stocks with high accuracy and can be used by stock market institutions to find promising stocks in an index.
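The backtesting step, training only on data before each test window, can be sketched with a walk-forward split as below. The return series, lag features, and labels are synthetic and illustrative; the paper's actual feature engineering is not described in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(11)

# Synthetic daily returns standing in for ~12 years of price history.
returns = rng.normal(0, 0.01, 3000)

# Row j holds returns j..j+4; the label is whether return j+5 is positive.
X = np.column_stack([returns[i:i + 2520] for i in range(5)])
y = (returns[5:5 + 2520] > 0).astype(int)

# Walk-forward backtest: each fold trains strictly on the past and
# evaluates on the window that follows it.
backtest_scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    clf = RandomForestClassifier(random_state=11).fit(X[train_idx], y[train_idx])
    backtest_scores.append(clf.score(X[test_idx], y[test_idx]))
mean_accuracy = float(np.mean(backtest_scores))
```

The chronological split is the crucial design choice: a random shuffle would leak future prices into training and inflate the backtest accuracy.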


Author(s): Aqilah Aini Zahra, Widyawan Widyawan, Silmi Fauziati

A Twitter bot is a Twitter account programmed to carry out social activities automatically by sending tweets through a scheduling program. Some bots disseminate useful information such as earthquake and weather reports. However, quite a few bots have a negative influence, broadcasting false news or spam, or serving as followers to inflate an account's popularity. This can change public sentiment about an issue, reduce user confidence, or even disturb the social order. An application is therefore needed to distinguish bot accounts from non-bot accounts. To address this problem, this paper develops a bot detection system using machine learning for multiclass classification. The classes are human, informative, spammer, and fake-follower accounts. The model was trained with supervised methods on labeled data. First, a dataset of 2,333 accounts was pre-processed to obtain a set of 28 features for classification, derived from analysis of user profiles, temporal analysis, and analysis of tweets with numeric values. The data was then partitioned and normalized with scaling, and a random forest classifier algorithm was applied. The features were then reselected down to a set of 17 features to obtain the highest accuracy achievable by the model. In the evaluation stage, the bot detection model achieved an accuracy of 96.79%, 97% precision, 96% recall, and an F1 score of 96%, so the detection model was classified as having high accuracy. The completed bot detection model was then implemented as a website and deployed to the cloud. In the end, this machine learning-based web application can be accessed and used by the public to detect Twitter bots.
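The core of the described workflow, scaling followed by four-class random forest classification, can be sketched as a scikit-learn pipeline. The account features are synthetic; min-max scaling is assumed here since the abstract names only "normalized with scaling".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 2,333 labelled accounts with 28 numeric
# features; the four classes mirror human / informative / spammer /
# fake-follower.
X, y = make_classification(n_samples=2333, n_features=28, n_informative=17,
                           n_classes=4, n_clusters_per_class=1,
                           random_state=9)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=9)

# Scaling and classification chained in one pipeline, so the scaler is
# fitted on training data only and reused as-is at prediction time.
model = make_pipeline(MinMaxScaler(), RandomForestClassifier(random_state=9))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```

Bundling the scaler into the pipeline also makes the model directly deployable behind a web endpoint, matching the paper's cloud deployment step.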


Author(s): Pedro Sobreiro, Pedro Guedes-Carvalho, Abel Santos, Paulo Pinheiro, Celina Gonçalves

The phenomenon of dropout is often found among customers of sports services. In this study we intend to evaluate the performance of machine learning algorithms in predicting dropout using available data about their historic use of facilities. The data relating to a sample of 5209 members was taken from a Portuguese fitness centre and included the variables registration data, payments and frequency, age, sex, non-attendance days, amount billed, average weekly visits, total number of visits, visits hired per week, number of registration renewals, number of members referrals, total monthly registrations, and total member enrolment time, which may be indicative of members’ commitment. Whilst the Gradient Boosting Classifier had the best performance in predicting dropout (sensitivity = 0.986), the Random Forest Classifier was the best at predicting non-dropout (specificity = 0.790); the overall performance of the Gradient Boosting Classifier was superior to the Random Forest Classifier (accuracy 0.955 against 0.920). The most relevant variables predicting dropout were “non-attendance days”, “total length of stay”, and “total amount billed”. The use of decision trees provides information that can be readily acted upon to identify member profiles of those at risk of dropout, giving also guidelines for measures and policies to reduce it.
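The sensitivity/specificity comparison between the two ensembles can be sketched as below; the member records are synthetic, and the class mix is only an illustrative imbalance, not the fitness centre's actual dropout rate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 5209 member records (1 = dropout).
X, y = make_classification(n_samples=5209, n_features=13, n_informative=6,
                           weights=[0.4, 0.6], random_state=13)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=13)

def sens_spec(clf):
    """Fit, then return (sensitivity, specificity) on the held-out split."""
    tn, fp, fn, tp = confusion_matrix(
        y_te, clf.fit(X_tr, y_tr).predict(X_te)).ravel()
    return tp / (tp + fn), tn / (tn + fp)

gb_sens, gb_spec = sens_spec(GradientBoostingClassifier(random_state=13))
rf_sens, rf_spec = sens_spec(RandomForestClassifier(random_state=13))
```

Reporting both numbers per model is what lets the study say one classifier is better at catching dropouts (sensitivity) while the other is better at clearing loyal members (specificity).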

