scholarly journals Forest Vertical Structure Mapping Using Two-Seasonal Optic Images and LiDAR DSM Acquired from UAV Platform through Random Forest, XGBoost, and Support Vector Machine Approaches

2021 ◽  
Vol 13 (21) ◽  
pp. 4282
Author(s):  
Jin-Woo Yu ◽  
Young-Woong Yoon ◽  
Won-Kyung Baek ◽  
Hyung-Sup Jung

Research on the forest structure classification is essential, as it plays an important role in assessing the vitality and diversity of vegetation. However, classifying forest structure involves in situ surveying, which requires considerable time and money, and cannot be conducted directly in some instances; also, the update cycle of the classification data is very late. To overcome these drawbacks, feasibility studies on mapping the forest vertical structure from aerial images using machine learning techniques were conducted. In this study, we investigated (1) the performance improvement of the forest structure classification, using a high-resolution LiDAR-derived digital surface model (DSM) acquired from an unmanned aerial vehicle (UAV) platform and (2) the performance comparison of results obtained from the single-seasonal and two-seasonal data, using random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM). For the performance comparison, the UAV optic and LiDAR data were divided into three cases: (1) only used autumn data, (2) only used winter data, and (3) used both autumn and winter data. From the results, the best model was XGBoost, and the F1 scores achieved using this method were approximately 0.92 in the autumn and winter cases. A remarkable improvement was achieved when both two-seasonal images were used. The F1 score improved by 35.3% from 0.68 to 0.92. This implies that (1) the seasonal variation in the forest vertical structure can be more important than the spatial resolution, and (2) the classification performance achieved from the two-seasonal UAV optic images and LiDAR-derived DSMs can reach 0.9 with the application of an optimal machine learning approach.

2021 ◽  
Vol 12 (2) ◽  
pp. 28-55
Author(s):  
Fabiano Rodrigues ◽  
Francisco Aparecido Rodrigues ◽  
Thelma Valéria Rocha Rodrigues

Este estudo analisa resultados obtidos com modelos de machine learning para predição do sucesso de startups. Como proxy de sucesso considera-se a perspectiva do investidor, na qual a aquisição da startup ou realização de IPO (Initial Public Offering) são formas de recuperação do investimento. A revisão da literatura aborda startups e veículos de financiamento, estudos anteriores sobre predição do sucesso de startups via modelos de machine learning, e trade-offs entre técnicas de machine learning. Na parte empírica, foi realizada uma pesquisa quantitativa baseada em dados secundários oriundos da plataforma americana Crunchbase, com startups de 171 países. O design de pesquisa estabeleceu como filtro startups fundadas entre junho/2010 e junho/2015, e uma janela de predição entre junho/2015 e junho/2020 para prever o sucesso das startups. A amostra utilizada, após etapa de pré-processamento dos dados, foi de 18.571 startups. Foram utilizados seis modelos de classificação binária para a predição: Regressão Logística, Decision Tree, Random Forest, Extreme Gradiente Boosting, Support Vector Machine e Rede Neural. Ao final, os modelos Random Forest e Extreme Gradient Boosting apresentaram os melhores desempenhos na tarefa de classificação. Este artigo, envolvendo machine learning e startups, contribui para áreas de pesquisa híbridas ao mesclar os campos da Administração e Ciência de Dados. Além disso, contribui para investidores com uma ferramenta de mapeamento inicial de startups na busca de targets com maior probabilidade de sucesso.   


2021 ◽  
Vol 4 (2(112)) ◽  
pp. 58-72
Author(s):  
Chingiz Kenshimov ◽  
Zholdas Buribayev ◽  
Yedilkhan Amirgaliyev ◽  
Aisulyu Ataniyazova ◽  
Askhat Aitimov

In the course of our research work, the American, Russian and Turkish sign languages were analyzed. The program of recognition of the Kazakh dactylic sign language with the use of machine learning methods is implemented. A dataset of 5000 images was formed for each gesture, gesture recognition algorithms were applied, such as Random Forest, Support Vector Machine, Extreme Gradient Boosting, while two data types were combined into one database, which caused a change in the architecture of the system as a whole. The quality of the algorithms was also evaluated. The research work was carried out due to the fact that scientific work in the field of developing a system for recognizing the Kazakh language of sign dactyls is currently insufficient for a complete representation of the language. There are specific letters in the Kazakh language, because of the peculiarities of the spelling of the language, problems arise when developing recognition systems for the Kazakh sign language. The results of the work showed that the Support Vector Machine and Extreme Gradient Boosting algorithms are superior in real-time performance, but the Random Forest algorithm has high recognition accuracy. As a result, the accuracy of the classification algorithms was 98.86 % for Random Forest, 98.68 % for Support Vector Machine and 98.54 % for Extreme Gradient Boosting. Also, the evaluation of the quality of the work of classical algorithms has high indicators. The practical significance of this work lies in the fact that scientific research in the field of gesture recognition with the updated alphabet of the Kazakh language has not yet been conducted and the results of this work can be used by other researchers to conduct further research related to the recognition of the Kazakh dactyl sign language, as well as by researchers, engaged in the development of the international sign language


2020 ◽  
Vol 14 ◽  

Breast Cancer (BC) is amongst the most common and leading causes of deaths in women throughout the world. Recently, classification and data analysis tools are being widely used in the medical field for diagnosis, prognosis and decision making to help lower down the risks of people dying or suffering from diseases. Advanced machine learning methods have proven to give hope for patients as this has helped the doctors in early detection of diseases like Breast Cancer that can be fatal, in support with providing accurate outcomes. However, the results highly depend on the techniques used for feature selection and classification which will produce a strong machine learning model. In this paper, a performance comparison is conducted using four classifiers which are Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest on the Wisconsin Breast Cancer dataset to spot the most effective predictors. The main goal is to apply best machine learning classification methods to predict the Breast Cancer as benign or malignant using terms such as accuracy, f-measure, precision and recall. Experimental results show that Random forest is proven to achieve the highest accuracy of 99.26% on this dataset and features, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the least accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining tool platform.


2021 ◽  
pp. 289-301
Author(s):  
B. Martín ◽  
J. González–Arias ◽  
J. A. Vicente–Vírseda

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest –enhanced form of bootstrap. We also performed extreme gradient boosting –an enhanced form of radiant boosting– to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
vardhmaan jain ◽  
Vikram Sharma ◽  
Agam Bansal ◽  
Cerise Kleb ◽  
Chirag Sheth ◽  
...  

Background: Post-transplant major adverse cardiovascular events (MACE) are amongst the leading cause of death amongst orthotopic liver transplant(OLT) recipients. Despite years of guideline directed therapy, there are limited data on predictors of post-OLT MACE. We assessed if machine learning algorithms (MLA) can predict MACE and all-cause mortality in patients undergoing OLT. Methods: We tested three MLA: support vector machine, extreme gradient boosting(XG-Boost) and random forest with traditional logistic regression for prediction of MACE and all-cause mortality on a cohort of consecutive patients undergoing OLT at our center between 2008-2019. The cohort was randomly split into a training (80%) and testing (20%) cohort. Model performance was assessed using c-statistic or AUC. Results: We included 1,459 consecutive patients with mean ± SD age 54.2 ± 13.8 years, 32% female who underwent OLT. There were 199 (13.6%) MACE and 289 (20%) deaths at a mean follow up of 4.56 ± 3.3 years. The random forest MLA was the best performing model for predicting MACE [AUC:0.78, 95% CI: 0.70-0.85] as well as mortality [AUC:0.69, 95% CI: 0.61-0.76], with all models performing better when predicting MACE vs mortality. See Table and Figure. Conclusion: Random forest machine learning algorithms were more predictive and discriminative than traditional regression models for predicting major adverse cardiovascular events and all-cause mortality in patients undergoing OLT. Validation and subsequent incorporation of MLA in clinical decision making for OLT candidacy could help risk stratify patients for post-transplant adverse cardiovascular events.


RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.


Witheverypassingsecondsocialnetworkcommunityisgrowingrapidly,becauseofthat,attackershaveshownkeeninterestinthesekindsofplatformsandwanttodistributemischievouscontentsontheseplatforms.Withthefocus on introducing new set of characteristics and features forcounteractivemeasures,agreatdealofstudieshasresearchedthe possibility of lessening the malicious activities on social medianetworks. This research was to highlight features for identifyingspammers on Instagram and additional features were presentedto improve the performance of different machine learning algorithms. Performance of different machine learning algorithmsnamely, Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)were evaluated on machine learning tools named, RapidMinerand WEKA. The results from this research tells us that RandomForest (RF) outperformed all other selected machine learningalgorithmsonbothselectedmachinelearningtools.OverallRandom Forest (RF) provided best results on RapidMiner. Theseresultsareusefulfortheresearcherswhoarekeentobuildmachine learning models to find out the spamming activities onsocialnetworkcommunities.


2020 ◽  
Vol 11 (40) ◽  
pp. 8-23
Author(s):  
Pius MARTHIN ◽  
Duygu İÇEN

Online product reviews have become a valuable source of information which facilitate customer decision with respect to a particular product. With the wealthy information regarding user's satisfaction and experiences about a particular drug, pharmaceutical companies make the use of online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models which facilitate decision making in various fields. In this manuscript we applied a drug review dataset used by (Gräβer, Kallumadi, Malberg,& Zaunseder, 2018), available freely from machine learning repository website of the University of California Irvine (UCI) to identify best machine learning model which provide a better prediction of the overall drug performance with respect to users' reviews. Apart from several manipulations done to improve model accuracy, all necessary procedures required for text analysis were followed including text cleaning and transformation of texts to numeric format for easy training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customer's reviews were summarized and visualized using a bar plot and word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only the sample of the dataset. We randomly sampled 15000 observations from the 161297 training dataset and 10000 observations were randomly sampled from the 53766 testing dataset. Several machine learning models were trained using 10 folds cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Spline (MARS), Support vector machine (SVM) with both radial and linear kernels and a classification tree using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. Support vector machine (SVM) with linear kernel was significantly best with an accuracy of 83% compared to the rest. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and Latent Semantic Analysis (LSA) technique to our TDM.


Sign in / Sign up

Export Citation Format

Share Document