A Machine Learning Approach to the Recognition of Brazilian Atlantic Forest Parrot Species

Mapping Intimacies ◽

10.1101/2019.12.24.888180 ◽

2019 ◽

Author(s):

Bruno Tavares Padovese ◽

Linilson Rodrigues Padovese

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Bird Species ◽

Monitoring Systems ◽

Random Forest Algorithm ◽

Signal Segmentation ◽

Detection And Identification ◽

Landscape Monitoring ◽

Passive Acoustic

Avian survey is a time-consuming and challenging task, often conducted in remote and sometimes inhospitable locations. In this context, the development of automated acoustic landscape monitoring systems for bird survey is essential. We conducted a comparative study between two machine learning methods for the detection and identification of 2 endangered Brazilian bird species from the Psittacidae family, Amazona brasiliensis and Amazona vinacea. Specifically, we focus on the identification of these 2 species in an acoustic landscape where similar vocalizations from other Psittacidae species are present. A 3-step approach is presented, composed of signal segmentation and filtering, feature extraction, and classification. In the feature extraction step, Mel-Frequency Cepstrum Coefficient features were extracted and fed to the Random Forest algorithm and the Multilayer Perceptron for training and classifying acoustic samples. The experiments showed promising results, particularly for the Random Forest algorithm, which achieved accuracy of up to 99%. Applying signal segmentation and filtering before the feature extraction step greatly improved the experimental results. Additionally, the results show that the proposed approach is robust and flexible enough to be adopted in passive acoustic monitoring systems.


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life: with polarity, many people can find out whether a given document has positive or negative sentiment, which helps in choosing and making decisions. Sentiment analysis is usually done manually; therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis with the Amazon food review case. This research explores several feature extraction methods, such as Bag of Words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation such as removing HTML tags, punctuation and special characters, and using Snowball stemming, TF-IDF with SVM was found suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
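To make the TF-IDF feature extraction mentioned in this abstract concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed `log(N/df)` inverse document frequency; the abstract does not say which TF-IDF variant or library the authors used, and the function name `tf_idf` and the sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (tf * log(N / df))."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews, not data from the study.
reviews = ["great taste great price", "bad taste", "great coffee"]
scores = tf_idf(reviews)
```

Terms that occur in fewer documents receive larger weights, which is why "bad" outweighs "taste" in the second review. The resulting vectors would then be passed to a classifier such as an SVM.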


Classification and photometric redshift estimation of quasars in photometric surveys

Proceedings of the International Astronomical Union ◽

10.1017/s1743921320001829 ◽

2020 ◽

Vol 15 (S359) ◽

pp. 40-41

Author(s):

L. M. Izuti Nakazono ◽

C. Mendes de Oliveira ◽

N. S. T. Hirata ◽

S. Jeram ◽

A. Gonzalez ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbour ◽

Random Forest Algorithm ◽

Photometric Redshift ◽

Using Data

We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. In terms of quasar classification, we achieved 95.49% for precision and 95.26% for recall using a Random Forest algorithm. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.
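The precision and recall figures quoted above follow the standard per-class definitions, which can be sketched as below. The labels and predictions are made-up toy values, not S-PLUS data, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical three-class example (quasar, star, galaxy).
y_true = ["qso", "qso", "star", "galaxy", "qso", "star"]
y_pred = ["qso", "star", "star", "qso", "qso", "star"]
precision, recall = precision_recall(y_true, y_pred, "qso")
```

Precision answers "of the objects labelled quasar, how many really are quasars", while recall answers "of the true quasars, how many were found".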


Effects of air quality on the health of Mediterranean forests

10.5194/egusphere-egu21-16171 ◽

2021 ◽

Author(s):

Adrián García Bruzón ◽

Patricia Arrogante Funes ◽

Laura Muñoz Moral

Keyword(s):

Climate Change ◽

Machine Learning ◽

Random Forest ◽

Aridity Index ◽

Plant Health ◽

Mediterranean Forests ◽

Random Forest Algorithm ◽

The Mediterranean ◽

Heterogeneous Variables ◽

Peninsular Spain

Climate change has turned out to be a determining factor in the development of forests in Spain. Production systems have emitted polluting gases and other particles into the atmosphere, for which some plants have not yet developed adaptation systems. Among the most harmful pollutants for the environment are nitrous oxides, ozone, and particulate matter. However, this condition is not uniform across Peninsular Spain and the Balearic Islands, since plant composition and bioclimatic, topographic, and anthropic characteristics differ across the territory. Monitoring vegetation with sufficient spatial and temporal resolution and studying the variables that condition plant health is a challenge because of the nature of the variables and the amount of data to be handled. The Mediterranean forest is one of the ecosystems most affected by climate change because it usually experiences long periods of drought that, in combination with increased temperatures, can drastically reduce the photosynthetic activity of trees and therefore the biomass of forests. That is why environmental technologies are applied that are based on Remote Sensing (which provides plant health indices from passive sensors on satellite platforms and other variables of interest), Geographic Information Systems (to integrate, process, and analyze spatial and temporal data) and machine learning models (which facilitate extracting relationships between variables and conditioning factors, and predicting patterns). In this regard, this work's objective is to evaluate the possible effect that different pollutants have on the health of the vegetation, measured from the annual values of the Normalized Difference Vegetation Index (NDVI), in the Mediterranean forests of Peninsular Spain. To achieve this, we used machine learning techniques, specifically the Random Forest algorithm. The study also included various climatic, topographic, and anthropic variables that characterize the forest. The results showed that certain variables, such as the aridity index, drove the NDVI values and therefore plant development, while others are limiting factors, such as the concentration of certain pollutants and the direct relationship between particulates and NOx. This study shows that the Random Forest algorithm offers reliable results, even when working with heterogeneous variables.
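For readers unfamiliar with the index, NDVI is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows that formula together with a simple annual mean; the averaging step and the reflectance values are assumptions for illustration, since the abstract does not describe how the annual values were derived.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when NIR + red is zero")
    return (nir - red) / total


def annual_ndvi(observations):
    """Mean NDVI over (nir, red) pairs; an assumed way to get an annual value."""
    values = [ndvi(nir, red) for nir, red in observations]
    return sum(values) / len(values)


# Hypothetical reflectance pairs for one pixel over a year.
pixel_year = [(0.50, 0.10), (0.40, 0.20), (0.30, 0.30)]
yearly_value = annual_ndvi(pixel_year)
```

Values near 1 indicate dense, healthy vegetation, while values near 0 indicate bare soil or stressed canopy.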


Prediction of novel mouse TLR9 agonists using a random forest approach

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0241-0 ◽

2019 ◽

Vol 20 (S2) ◽

Author(s):

Varun Khanna ◽

Lei Li ◽

Johnson Fung ◽

Shoba Ranganathan ◽

Nikolai Petrovsky

Keyword(s):

Machine Learning ◽

Random Forest ◽

Correlation Coefficient ◽

Matthews Correlation Coefficient ◽

Learning Algorithms ◽

Ensemble Classifier ◽

Innate Immune ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm

Background: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening for potential TLR9 activity via traditional structure-based virtual screening approaches of CpG ODNs is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results: Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples were 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed that common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity. Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
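The two evaluation metrics named in this abstract, and the down-sampling idea, can be sketched in plain Python as follows. This is an illustrative sketch only: the labels are toy values, the helper names are hypothetical, and `down_sample` shows one common way to balance classes (drawing a majority subset equal in size to the minority class), which may differ from the authors' exact procedure.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def down_sample(majority, minority, rng):
    """Pair the minority class with an equally sized random majority subset."""
    return rng.sample(majority, len(minority)) + list(minority)


# Toy imbalanced example: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
balanced = down_sample(list(range(10)), [100, 101, 102], random.Random(0))
```

Repeating `down_sample` with different seeds and training one model per draw gives an ensemble in the spirit of the 20 random forest models described above.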


Research on machine learning framework based on random forest algorithm

10.1063/1.4977376 ◽

2017 ◽

Cited By ~ 5

Author(s):

Qiong Ren ◽

Hui Cheng ◽

Hai Han

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forest Algorithm ◽

Learning Framework


Forest Fire Prediction using Machine Learning Models based on DC, Wind and RH

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f1026.0386s20 ◽

2020 ◽

Vol 8 (6S) ◽

pp. 142-143

Keyword(s):

Machine Learning ◽

Random Forest ◽

Forest Fire ◽

Random Forest Algorithm ◽

Learning Models ◽

Learning Classifier ◽

Machine Learning Models ◽

Classifier Algorithms

The paper addresses forest fire prediction using machine learning models based on three variables, viz. DC, wind and RH. Of the several machine learning classifier algorithms considered, the random forest algorithm yields the best accuracy (99.61%).


A Machine Learning-Based Prediction Model for Cardiovascular Risk in Women With Preeclampsia

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.736491 ◽

2021 ◽

Vol 8 ◽

Author(s):

Guan Wang ◽

Yanbo Zhang ◽

Sijin Li ◽

Jun Zhang ◽

Dongkui Jiang ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Cardiovascular Risk ◽

Random Forest ◽

Prediction Model ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Machine Learning Algorithms ◽

Brier Score ◽

Random Forest Algorithm

Objective: Preeclampsia affects 2–8% of women and doubles the risk of cardiovascular disease in women after preeclampsia. This study aimed to develop a model based on machine learning to predict postpartum cardiovascular risk in preeclamptic women. Methods: Collecting demographic characteristics and clinical serum markers associated with preeclampsia during pregnancy of 907 preeclamptic women retrospectively, we predicted the cardiovascular risk (ischemic heart disease, ischemic cerebrovascular disease, peripheral vascular disease, chronic kidney disease, metabolic system disease or arterial hypertension). The study samples were divided into training sets and test sets randomly in the ratio of 8:2. The prediction model was developed by 5 different machine learning algorithms, including Random Forest. 10-fold cross-validation was performed on the training set, and the performance of the model was evaluated on the test set. Results: Cardiovascular disease risk occurred in 186 (20.5%) of these women. By weighing area under the curve (AUC), the Random Forest algorithm presented the best performance (AUC = 0.711 [95% CI: 0.697–0.726]) and was adopted in the feature selection and the establishment of the prediction model. The most important variables in the Random Forest algorithm included systolic blood pressure, urea nitrogen, neutrophil count, glucose, and D-dimer. The Random Forest algorithm was well calibrated (Brier score = 0.133) in the test group, and obtained the highest net benefit in the decision curve analysis. Conclusion: Based on the general situation of patients and clinical variables, a new machine learning algorithm was developed and verified for the individualized prediction of cardiovascular risk in post-preeclamptic women.
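The Brier score used above as a calibration measure is the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better and 0 is perfect. A minimal sketch, with made-up outcomes and probabilities rather than the study's data:

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = event occurred) and predicted risks.
outcomes = [1, 0, 0, 1, 0]
predicted = [0.8, 0.1, 0.3, 0.6, 0.2]
score = brier_score(outcomes, predicted)
```

As a reference point, a model that always predicts 0.5 scores 0.25 regardless of the outcomes.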


Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7879.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1623-1630

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Random Forest ◽

Naive Bayes ◽

Early Stage ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Data Set ◽

Wide Range

As huge amounts of data accumulate, extracting the required data from the available information becomes a challenge. Machine learning contributes to various fields. The fast-growing population has led to the emergence of a wide range of diseases. This in turn has created the need for machine learning models that use patients' datasets. Analysis of datasets from different sources shows that cancer is among the most hazardous diseases and may cause the death of the sufferer. The conducted surveys indicate that cancer can nearly be cured in its initial stages, whereas in later stages it may cause the death of the affected person. One of the major types of cancer is lung cancer. Its prediction depends heavily on past data, and detection is required in the early stages. The proposed work is based on a machine learning algorithm that groups individual details into categories to predict, at an early stage, whether a person is likely to develop cancer. The random forest algorithm is implemented and achieves a higher efficiency of 97% compared to KNN and Naive Bayes. Further, the KNN algorithm does not learn anything from the training data but uses it directly for classification, and Naive Bayes yields inaccurate predictions. The proposed system predicts the chances of lung cancer by displaying three levels, namely low, medium, and high. Thus, mortality rates can be reduced significantly.


A machine learning framework for the evaluation of myocardial rotation in patients with noncompaction cardiomyopathy

PLoS ONE ◽

10.1371/journal.pone.0260195 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0260195

Author(s):

Marcelo Dantas Tavares de Melo ◽

Jose de Arimatéia Batista Araujo-Filho ◽

José Raimundo Barbosa ◽

Camila Rocon ◽

Carlos Danilo Miranda Regis ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ejection Fraction ◽

Area Under The Curve ◽

Random Forest Algorithm ◽

Noncompaction Cardiomyopathy ◽

2D Echocardiography ◽

Specific Strain ◽

Lv Ejection Fraction ◽

Sensitivity Specificity

Aims: Noncompaction cardiomyopathy (NCC) is considered a genetic cardiomyopathy with unknown pathophysiological mechanisms. We propose to evaluate echocardiographic predictors for rigid body rotation (RBR) in NCC using a machine learning (ML) based model. Methods and results: Forty-nine outpatients with NCC diagnosis by echocardiography and magnetic resonance imaging (21 men, 42.8±14.8 years) were included. A comprehensive echocardiogram was performed. The layer-specific strain was analyzed from the apical two-, three-, and four-chamber views, short axis, and focused right ventricle views using 2D echocardiography (2DE) software. RBR was present in 44.9% of patients, and this group presented increased indexed LV mass (118±43.4 vs. 94.1±27.1 g/m², P = 0.034), LV end-diastolic and end-systolic volumes (P < 0.001), E/e’ (12.2±8.68 vs. 7.69±3.13, P = 0.034), and decreased LV ejection fraction (40.7±8.71 vs. 58.9±8.76%, P < 0.001) when compared to patients without RBR. Also, patients with RBR presented a significant decrease of global longitudinal, radial, and circumferential strain. When an ML model based on a random forest algorithm and a neural network model was applied, it found that twist, NC/C, torsion, LV ejection fraction, and diastolic dysfunction were the strongest predictors of RBR, with accuracy, sensitivity, specificity, and area under the curve of 0.93, 0.99, 0.80, and 0.88, respectively. Conclusion: In this study, a random forest algorithm was capable of selecting the best echocardiographic predictors of the RBR pattern in NCC patients, which was consistent with worse systolic, diastolic, and myocardium deformation indices. Prospective studies are warranted to evaluate the role of this tool for NCC risk stratification.


Analysing the Capability of the Catchment's Spectral Signature for the Regionalization of Hydrological Parameters

10.22541/au.162100995.56312514/v1 ◽

2021 ◽

Author(s):

Laura Fragoso-Campón ◽

Pablo Durán-Barroso ◽

Elia Rosado

Keyword(s):

Machine Learning ◽

Random Forest ◽

Physical Properties ◽

Spectral Response ◽

Spectral Signature ◽

Spectral Approach ◽

Hydrological Response ◽

Random Forest Algorithm ◽

Hydrological Parameters ◽

Climatic Environment

Water resource management in ungauged catchments is complex due to the uncertainties around the hydrological parameters that dominate the streamflow behaviour. These parameters are usually defined by regionalization approaches in which hydrological response patterns are transferred from gauged to ungauged basins. Regression-based methods using physical properties derived from cartographic data sources are widely used. The current remote sensing techniques offer us new standpoints in regionalisation processing since the hydrological response depends on the physical attributes related to the spectral responses of the territory. Moreover, machine learning approaches have not been specifically applied to the regionalization of hydrologic parameters. This work studies the capability of a catchment’s spectral response based on Sentinel-1 and Sentinel-2 data to address a regression-based regionalization of hydrological parameters using a machine learning approach. Hydrological modelling was conducted by the HBV-light model. We tested the random forest algorithm in several regionalization scenarios: the new approach using the catchments’ spectral signature, the traditional method using physical properties and a fusion of them. The calibration results were excellent (median KGE = 0.83), and the regionalized parameters obtained with the random forest algorithm achieved good performance in which the three scenarios showed almost the same goodness of fit (median KGE = 0.45 to 0.50). We found that the effectiveness depends on the climatic environment and that predictions in humid catchments exhibited better performance than those in the driest catchments. The physical approach (median KGE= 0.71) exhibited better performance than the spectral approach (median KGE= 0.64) in humid catchments, whereas spectral regionalization (median KGE= 0.33) outperformed the physical scenario in the driest catchments (median KGE= 0.25). 
Herein, our results confirm that regionalization is still challenging in Mediterranean climate variants where the new spectral approach showed promising results and time series of satellite data could improve seasonal regionalization methodologies.
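The KGE values reported in this abstract refer to the Kling-Gupta efficiency, which combines correlation, variability ratio, and bias ratio between simulated and observed streamflow. The sketch below assumes the original formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2); the abstract does not state which KGE variant was used, and the flow values are invented for illustration.

```python
import math
import statistics


def kge(observed, simulated):
    """Kling-Gupta efficiency (assumed original form); 1.0 is a perfect match."""
    mu_o = statistics.mean(observed)
    mu_s = statistics.mean(simulated)
    sd_o = statistics.pstdev(observed)
    sd_s = statistics.pstdev(simulated)
    # Population covariance, matching the population standard deviations above.
    cov = sum((o - mu_o) * (s - mu_s)
              for o, s in zip(observed, simulated)) / len(observed)
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical streamflow series for three catchments: (observed, simulated).
catchments = [
    ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    ([2.0, 4.0, 1.0, 3.0], [2.5, 3.0, 1.5, 3.5]),
    ([5.0, 6.0, 7.0, 9.0], [4.0, 6.5, 7.5, 8.0]),
]
median_kge = statistics.median(kge(obs, sim) for obs, sim in catchments)
```

Summarizing per-catchment scores with a median, as the abstract does, keeps a few poorly modelled catchments from dominating the overall figure.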
