scholarly journals Modeling and Theoretical Analysis of GNSS-R Soil Moisture Retrieval Based on the Random Forest and Support Vector Machine Learning Approach

2020 ◽  
Vol 12 (22) ◽  
pp. 3679
Author(s):  
Yan Jia ◽  
Shuanggen Jin ◽  
Patrizia Savi ◽  
Qingyun Yan ◽  
Wenmei Li

Global Navigation Satellite System-Reflectometry (GNSS-R) as a microwave remote sensing technique can retrieve the Earth’s surface parameters using the GNSS reflected signal from the surface. These reflected signals convey the surface features and therefore can be utilized to detect certain physical properties of the reflecting surface such as soil moisture content (SMC). Up to now, a serial of electromagnetic models (e.g., bistatic radar and Fresnel equations, etc.) are employed and solved for SMC retrieval. However, due to the uncertainty of the physical characteristics of the sites, complexity, and nonlinearity of the inversion process, etc., it is still challenging to accurately retrieve the soil moisture. The popular machine learning (ML) methods are flexible and able to handle nonlinear problems. It can dig out and model the complex interactions between input and output and ultimately make good predictions. In this paper, two typical ML methods, specifically, random forest (RF) and support vector machine (SVM), are employed for SMC retrieval from GNSS-R data of self-designed experiments (in situ and airborne). A comprehensive simulated dataset involving different types of soil is constructed firstly to represent the complex interactions between the variables (reflectivity, elevation angle, dielectric constant, and SMC) for the requirement of training ML regression models. Correspondingly, the main task of soil moisture retrieval (regression) is addressed. Specifically, the post-processed data (reflectivity and elevation angle) from sensor acquisitions are used to make predictions by these two adopted ML methods and compared with the commonly used GNSS-R retrieval method (electromagnetic models). The results show that the RF outperforms the SVM method, and it is more suitable for handling the inversion problem. Moreover, the RF regression model built by the comprehensive dataset demonstrates satisfactory accuracy and strong universality, especially when the soil type is not uniform or unknown. Furthermore, the typical task of detecting water/soil (classification) is discussed. The ML algorithms demonstrate a high potential and efficiency in SMC retrieval from GNSS-R data.

RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.


Witheverypassingsecondsocialnetworkcommunityisgrowingrapidly,becauseofthat,attackershaveshownkeeninterestinthesekindsofplatformsandwanttodistributemischievouscontentsontheseplatforms.Withthefocus on introducing new set of characteristics and features forcounteractivemeasures,agreatdealofstudieshasresearchedthe possibility of lessening the malicious activities on social medianetworks. This research was to highlight features for identifyingspammers on Instagram and additional features were presentedto improve the performance of different machine learning algorithms. Performance of different machine learning algorithmsnamely, Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)were evaluated on machine learning tools named, RapidMinerand WEKA. The results from this research tells us that RandomForest (RF) outperformed all other selected machine learningalgorithmsonbothselectedmachinelearningtools.OverallRandom Forest (RF) provided best results on RapidMiner. Theseresultsareusefulfortheresearcherswhoarekeentobuildmachine learning models to find out the spamming activities onsocialnetworkcommunities.


2020 ◽  
Vol 11 (40) ◽  
pp. 8-23
Author(s):  
Pius MARTHIN ◽  
Duygu İÇEN

Online product reviews have become a valuable source of information which facilitate customer decision with respect to a particular product. With the wealthy information regarding user's satisfaction and experiences about a particular drug, pharmaceutical companies make the use of online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models which facilitate decision making in various fields. In this manuscript we applied a drug review dataset used by (Gräβer, Kallumadi, Malberg,& Zaunseder, 2018), available freely from machine learning repository website of the University of California Irvine (UCI) to identify best machine learning model which provide a better prediction of the overall drug performance with respect to users' reviews. Apart from several manipulations done to improve model accuracy, all necessary procedures required for text analysis were followed including text cleaning and transformation of texts to numeric format for easy training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customer's reviews were summarized and visualized using a bar plot and word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only the sample of the dataset. We randomly sampled 15000 observations from the 161297 training dataset and 10000 observations were randomly sampled from the 53766 testing dataset. Several machine learning models were trained using 10 folds cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Spline (MARS), Support vector machine (SVM) with both radial and linear kernels and a classification tree using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. Support vector machine (SVM) with linear kernel was significantly best with an accuracy of 83% compared to the rest. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and Latent Semantic Analysis (LSA) technique to our TDM.


Author(s):  
Prathima P

Abstract: Fall is a significant national health issue for the elderly people, generally resulting in severe injuries when the person lies down on the floor over an extended period without any aid after experiencing a great fall. Thus, elders need to be cared very attentively. A supervised-machine learning based fall detection approach with accelerometer, gyroscope is devised. The system can detect falls by grouping different actions as fall or non-fall events and the care taker is alerted immediately as soon as the person falls. The public dataset SisFall with efficient class of features is used to identify fall. The Random Forest (RF) and Support Vector Machine (SVM) machine learning algorithms are employed to detect falls with lesser false alarms. The SVM algorithm obtain a highest accuracy of 99.23% than RF algorithm. Keywords: Fall detection, Machine learning, Supervised classification, Sisfall, Activities of daily living, Wearable sensors, Random Forest, Support Vector Machine


Author(s):  
Syaifulloh Amien Pandega Perdana ◽  
Teguh Bharata Aji ◽  
Ridi Ferdiana

Ulasan pelanggan merupakan opini terhadap kualitas barang atau jasa yang dirasakan konsumen. Ulasan pelanggan mengandung informasi yang berguna bagi konsumen maupun penyedia barang atau jasa. Ketersediaan ulasan pelanggan dalam jumlah besar pada website membutuhkan suatu framework untuk mengekstraksi sentimen secara otomatis. Sebuah ulasan pelanggan sering kali mengandung banyak aspek sehingga Aspect Based Sentiment Analysis (ABSA) harus digunakan untuk mengetahui polaritas masing-masing aspek. Salah satu tugas penting dalam ABSA adalah Aspect Category Detection. Metode machine learning untuk Aspect Category Detection sudah banyak dilakukan pada domain berbahasa Inggris, tetapi pada domain bahasa Indonesia masih sedikit. Makalah ini membandingkan kinerja tiga algoritme machine learning, yaitu Naïve Bayes (NB), Support Vector Machine (SVM), dan Random Forest (RF) pada ulasan pelanggan berbahasa Indonesia menggunakan Term Frequency–Inverse Document Frequency (TF-IDF) sebagai term weighting. Hasil menunjukkan bahwa RF memiliki kinerja paling unggul dibandingkan NB dan SVM pada tiga domain yang berbeda, yaitu restoran, hotel, dan e-commerce, dengan nilai f1-score untuk masing-masing domain adalah 84.3%, 85.7%, dan 89,3%.


2020 ◽  
Vol 24 (4) ◽  
pp. 533-554
Author(s):  
Arthur Lula Mota ◽  
Daniel Lima Miquelluti ◽  
Vitor Augusto Ozaki

O seguro agrícola tem ganho maior atenção no Brasil desde o início da década passada, com a implementação do Programa de Subvenção ao Prêmio do Seguro Rural. O presente estudo testou o desempenho de algoritmos de Machine Learning para as seguradoras anteciparem a ocorrência de sinistro, elaborando previsões por meio de dados de apólices e bases de dados climáticas entre os anos de 2006 e 2017. Foram testados os algoritmos Random Forest, Support Vector Machine e k-Nearest Neighbours. O segundo mostrou melhor performance preditiva de sinistros. No entanto, todos os métodos apresentaram baixa capacidade preditiva para a ocorrência de sinistros.


2021 ◽  
Vol 13 (16) ◽  
pp. 3203
Author(s):  
Won-Kyung Baek ◽  
Hyung-Sup Jung

It is well known that the polarization characteristics in X-band synthetic aperture radar (SAR) image analysis can provide us with additional information for marine target classification and detection. Normally, dual-and single-polarized SAR images are acquired by SAR satellites, and then we must determine how accurate the marine mapping performance from dual-polarized (pol) images is versus the marine mapping performance from the single-pol images in a given machine learning model. The purpose of this study is to compare the performance of single- and dual-pol SAR image classification achieved by the support vector machine (SVM), random forest (RF), and deep neural network (DNN) models. The test image is a TerraSAR-X dual-pol image acquired from the 2007 Kerch Strait oil spill event. For this, 824,026 pixels and 1,648,051 pixels were extracted from the image for the training and test, respectively, and sea, ship, oil, and land objects were classified from the image by using the three machine learning methods. The mean f1-scores of the SVM, RF, and DNN models resulting from the single-pol image were approximately 0.822, 0.882, and 0.889, respectively, and those from the dual-pol image were about 0.852, 0.908, and 0.898, respectively. The performance improvement achieved by dual-pol was about 3.6%, 2.9%, and 1% in SVM, RF, and DNN, respectively. The DNN model had the best performance (0.889) in the single-pol test while the RF model was best (0.908) in the dual-pol test. The performance improvement was approximately 2.1% and not noticeable. If the condition that dual-pol images have two-times lower spatial resolution versus single-pol images in the azimuth direction is considered, a small improvement may not be valuable. Therefore, the results show that the performance improvement by X-band dual-pol image may be not remarkable when classifying the sea, ships, oil spills, and sea and land surfaces.


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257069
Author(s):  
Jae-Geum Shim ◽  
Kyoung-Ho Ryu ◽  
Sung Hyun Lee ◽  
Eun-Ah Cho ◽  
Sungho Lee ◽  
...  

Objective To construct a prediction model for optimal tracheal tube depth in pediatric patients using machine learning. Methods Pediatric patients aged <7 years who received post-operative ventilation after undergoing surgery between January 2015 and December 2018 were investigated in this retrospective study. The optimal location of the tracheal tube was defined as the median of the distance between the upper margin of the first thoracic(T1) vertebral body and the lower margin of the third thoracic(T3) vertebral body. We applied four machine learning models: random forest, elastic net, support vector machine, and artificial neural network and compared their prediction accuracy to three formula-based methods, which were based on age, height, and tracheal tube internal diameter(ID). Results For each method, the percentage with optimal tracheal tube depth predictions in the test set was calculated as follows: 79.0 (95% confidence interval [CI], 73.5 to 83.6) for random forest, 77.4 (95% CI, 71.8 to 82.2; P = 0.719) for elastic net, 77.0 (95% CI, 71.4 to 81.8; P = 0.486) for support vector machine, 76.6 (95% CI, 71.0 to 81.5; P = 1.0) for artificial neural network, 66.9 (95% CI, 60.9 to 72.5; P < 0.001) for the age-based formula, 58.5 (95% CI, 52.3 to 64.4; P< 0.001) for the tube ID-based formula, and 44.4 (95% CI, 38.3 to 50.6; P < 0.001) for the height-based formula. Conclusions In this study, the machine learning models predicted the optimal tracheal tube tip location for pediatric patients more accurately than the formula-based methods. Machine learning models using biometric variables may help clinicians make decisions regarding optimal tracheal tube depth in pediatric patients.


2021 ◽  
Vol 12 (2) ◽  
pp. 28-55
Author(s):  
Fabiano Rodrigues ◽  
Francisco Aparecido Rodrigues ◽  
Thelma Valéria Rocha Rodrigues

Este estudo analisa resultados obtidos com modelos de machine learning para predição do sucesso de startups. Como proxy de sucesso considera-se a perspectiva do investidor, na qual a aquisição da startup ou realização de IPO (Initial Public Offering) são formas de recuperação do investimento. A revisão da literatura aborda startups e veículos de financiamento, estudos anteriores sobre predição do sucesso de startups via modelos de machine learning, e trade-offs entre técnicas de machine learning. Na parte empírica, foi realizada uma pesquisa quantitativa baseada em dados secundários oriundos da plataforma americana Crunchbase, com startups de 171 países. O design de pesquisa estabeleceu como filtro startups fundadas entre junho/2010 e junho/2015, e uma janela de predição entre junho/2015 e junho/2020 para prever o sucesso das startups. A amostra utilizada, após etapa de pré-processamento dos dados, foi de 18.571 startups. Foram utilizados seis modelos de classificação binária para a predição: Regressão Logística, Decision Tree, Random Forest, Extreme Gradiente Boosting, Support Vector Machine e Rede Neural. Ao final, os modelos Random Forest e Extreme Gradient Boosting apresentaram os melhores desempenhos na tarefa de classificação. Este artigo, envolvendo machine learning e startups, contribui para áreas de pesquisa híbridas ao mesclar os campos da Administração e Ciência de Dados. Além disso, contribui para investidores com uma ferramenta de mapeamento inicial de startups na busca de targets com maior probabilidade de sucesso.   


Sign in / Sign up

Export Citation Format

Share Document