random forest classification
Recently Published Documents


Total documents: 427 (five years: 278)
H-index: 22 (five years: 7)

2022 ◽  
Vol 12 ◽  
Author(s):  
Meng Ren ◽  
Diao zhu Lin ◽  
Zhi Peng Liu ◽  
Kan Sun ◽  
Chuan Wang ◽  
...  

Background: Identifying the metabolite profile of individuals with prediabetes who progress to type 2 diabetes (T2D) may give novel insights into early T2D interception. The purpose of this study was to identify metabolic markers that predict the development of T2D from prediabetes in a Chinese population.

Methods: We used an untargeted metabolomics approach to investigate the associations between serum metabolites and the risk of progression from prediabetes to overt T2D (n=153, mean follow-up 5 years) in a Chinese population (REACTION study). Results were compared with matched controls who had prediabetes at baseline [age: 56 ± 7 years, body mass index (BMI): 24.2 ± 2.8 kg/m²] and at a 5-year follow-up [age: 61 ± 7 years, BMI: 24.5 ± 3.1 kg/m²]. Associations between metabolites and diabetes risk were evaluated with multivariate logistic regression analysis, adjusted for confounding factors. A random forest classification (RFC) model with 10-fold cross-validation was used to select the optimal metabolite panels for predicting the development of diabetes, and to internally validate the discriminatory capability of the selected metabolites beyond conventional clinical risk factors.

Findings: Metabolic alterations, including those associated with amino acid and lipid metabolism, were associated with an increased risk of prediabetes progressing to diabetes. The most important metabolites were inosine [odds ratio (OR) = 19.00; 95% confidence interval (CI): 4.23-85.37] and carvacrol (OR = 17.63; 95% CI: 4.98-62.34). Thirteen metabolites were found to improve T2D risk prediction beyond eight conventional T2D risk factors [area under the curve (AUC) was 0.98 for risk factors + metabolites vs 0.72 for risk factors alone, P < 0.05].

Interpretation: The metabolites identified in this study may help identify the patients with prediabetes who are at highest risk of progressing to diabetes.
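The AUC values quoted in this abstract (0.98 vs 0.72) can be read as the probability that a model scores a randomly chosen progressor above a randomly chosen non-progressor. The following is a minimal sketch of that definition only; the function name `auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = progressed to T2D, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
scores_risk_factors = [0.60, 0.40, 0.70, 0.50, 0.30, 0.65]  # made-up scores
scores_with_panel = [0.90, 0.70, 0.80, 0.40, 0.20, 0.75]    # made-up scores

print(round(auc(labels, scores_risk_factors), 3))
print(round(auc(labels, scores_with_panel), 3))
```

In a cross-validated setting such as the one described, a function like this would typically be applied to held-out predictions from each fold rather than to training predictions.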


2022 ◽  
Vol 8 (2) ◽  
pp. 127-136
Author(s):  
Rahmatia Susanti ◽  
S. Supriatna ◽  
R. Rokhmatulah ◽  
Masita Dwi Mandini Manessa ◽  
Aris Poniman ◽  
...  

National demand for staple foods keeps growing in line with population growth, as shown by the rise of rice consumption in Indonesia. Paddy productivity is influenced by the physical condition of the land, and a decline in those conditions can be detected from environmental vulnerability parameters. The purpose of this study was to build a spatial model of paddy productivity based on environmental vulnerability in each planting phase, using remote sensing and GIS approaches. The model combines the results of two models: a spatial model of the paddy planting phase and a model of paddy productivity. The planting phase model was obtained by analyzing vegetation indices from Sentinel-2A imagery with a random forest classification model. Its input variables are a combination of the vegetation indices NDVI, EVI, SAVI, and NDWI together with time variables. The planting phase model has an overall accuracy of 0.92 and divides the paddy planting phase into the initial planting phase, vegetative phase, generative phase, and fallow phase. The paddy productivity model was obtained from a GIS-based environmental vulnerability analysis using linear regression. Its variables are environmental vulnerability variables, consisting of hazards from floods, droughts, landslides, and rainfall. Estimation of paddy productivity from environmental vulnerability is most accurate in the vegetative phase (0.63) and the generative phase (0.61); it cannot be used in the initial planting phase, where the relationship is weak (accuracy of 0.35).
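The four indices named in this abstract can be computed per pixel from surface reflectance. The sketch below uses the commonly published formulas with their usual default coefficients; whether the authors used exactly these variants (in particular which NDWI definition) is not stated, so treat the formulas, function names, and sample reflectances as assumptions.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor=0.5 is the usual default."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR variant, an assumption)."""
    return (green - nir) / (green + nir)


# Hypothetical reflectances for one vegetated pixel (scaled to 0..1).
# On Sentinel-2 these would correspond to bands B2, B3, B4, and B8.
blue, green, red, nir = 0.05, 0.10, 0.10, 0.50

features = [
    ndvi(nir, red),
    evi(nir, red, blue),
    savi(nir, red),
    ndwi(green, nir),
]
print([round(value, 3) for value in features])
```

A feature vector like `features`, extended with a time variable, is the kind of per-pixel input a random forest classifier would receive in a workflow of this type.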


2021 ◽  
Vol 14 (6) ◽  
pp. 3225
Author(s):  
Juarez Antonio da Silva Júnior ◽  
Ubiratan Joaquim da Silva Júnior ◽  
Admilson Da Penha Pacheco

Accuracy analysis for mapping burnt areas using a 1 km VIIRS scene and Random Forest classification

The free availability of remote sensing data for areas affected by forest fires at the global scale offers the opportunity to systematically generate land products of medium spatial resolution, but the known accuracy limitations of these data are the subject of research worldwide. The objective of this study was to analyze the mapping accuracy of the Visible Infrared Imaging Radiometer Suite (VIIRS) sensor (1 km) on board the NOAA satellite, using the Random Forest (RF) classifier to detect burned areas at four points of the Chapada dos Veadeiros National Park, Goiás, in the Brazilian savanna. The classification was validated with the Sorensen-Dice coefficient (SD) in a stratified approach that allows the data to be sampled in space and time, using the burned-area products Aq30m, Fire_cci and MCD64A1 as reference. The RF models, configured with 400 trees and one attribute, gave an error rate below 4%. Mapping validated against the Aq30m product gave notable Sorensen-Dice estimates, while among the global models MCD64A1 was the most accurate (greater than 50%), especially for burned areas larger than 200 km². The data suggest that the quality of validation for burned-area mapping products depends on the minimum time interval to the date of the validation data and on the size of the area affected by fire. The results show that the RF algorithm applied to medium spatial resolution images is effective for fire detection in seasonally dry forests such as the Cerrado.

Keywords: Cerrado, fires, Random Forest.
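The Sorensen-Dice coefficient used for validation in this study compares two binary maps as twice their overlap divided by the sum of their sizes. A minimal sketch on flattened burned/unburned masks follows; the function name and the toy masks are illustrative assumptions, not the study's data.

```python
def sorensen_dice(predicted, reference):
    """Sorensen-Dice coefficient of two equal-length binary masks.

    SD = 2 * |A intersect B| / (|A| + |B|), where A and B are the sets of
    pixels flagged as burned in the prediction and in the reference.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    size_sum = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    # Two empty masks agree perfectly by convention.
    return 2 * overlap / size_sum if size_sum else 1.0


# Hypothetical flattened masks: 1 = burned, 0 = unburned.
rf_map = [1, 1, 0, 0, 1, 0]
reference_map = [1, 0, 0, 1, 1, 0]
print(round(sorensen_dice(rf_map, reference_map), 3))
```

A value of 1 means the classified map and the reference product agree on every burned pixel, and 0 means they share none.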


2021 ◽  
Vol 5 (6) ◽  
pp. 1083-1089
Author(s):  
Nur Ghaniaviyanto Ramadhan

News is information disseminated by newspapers, radio, television, the internet, and other media. According to survey results, news titles on many different topics are spread across the internet, which makes it difficult for readers to find the topic they want to read. This problem can be addressed by automated grouping, or classification. This study aims to classify several news topics in Indonesian using the KNN classification model, with word2vec, a word-embedding model, used to convert words into vectors that the classifier can process. The study also determines the optimal value of K. The combined word2vec and KNN model achieves an accuracy of 89.2% with K=7, and it outperforms the support vector machine, logistic regression, and random forest classification models.
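The KNN step described here amounts to a majority vote among the K training titles whose vectors are closest to the query. The sketch below assumes each title has already been turned into a single vector (for instance by averaging word vectors) and uses cosine similarity; both choices, along with the toy two-dimensional vectors and topic labels, are assumptions rather than details given in the abstract.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(train_vectors, train_labels, query, k=7):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(pair[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D "title embeddings"; real word2vec vectors have far more dimensions.
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["sport", "sport", "sport", "economy", "economy", "economy"]

print(knn_predict(train_vectors, train_labels, [0.7, 0.3], k=3))
```

Choosing K, as the study does, usually means repeating this prediction on held-out titles for several candidate values and keeping the one with the best accuracy.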


2021 ◽  
Vol 14 (4) ◽  
pp. 2277-2284
Author(s):  
AN. Nithyaa ◽  
Prem Kumar R ◽  
Gokul M. ◽  
Geetha Aananthi C.

This paper aims to automate the detection of cancer using digital image processing techniques in MATLAB. The analysis of white blood cells (WBC) is a powerful diagnostic tool for the prediction of leukemia, but automatic detection of leukemia is a challenging task that remains an unresolved problem in medical imaging. Automation in biological laboratories can be achieved by extracting features from blood film images taken with digital microscopes and processing them in MATLAB. The aim of this approach is to detect WBC cancer cells at an earlier stage and to reduce discrepancies in diagnosis by improving the system's learning methodology. The paper presents an algorithm intended to remove ambiguity when diagnosing cancers with similar symptoms. It concentrates on the major WBC cancers: Acute Lymphocytic Leukemia, Acute Myeloid Leukemia, Chronic Lymphocytic Leukemia, and Chronic Myeloid Leukemia. Because these are life-threatening diseases, rapid and precise differentiation is necessary in clinical settings. The cancers are characterized by segmentation and feature extraction and then classified using random forest classification (RFC), a decision tree learning method that uses predictors at each node to make its decisions.


2021 ◽  
Vol 9 ◽  
Author(s):  
Marina D. A. Scarpelli ◽  
Benoit Liquet ◽  
David Tucker ◽  
Susan Fuller ◽  
Paul Roe

High rates of biodiversity loss caused by human-induced changes in the environment require new methods for large scale fauna monitoring and data analysis. While ecoacoustic monitoring is increasingly being used and shows promise, analysis and interpretation of the big data produced remains a challenge. Computer-generated acoustic indices potentially provide a biologically meaningful summary of sound; however, temporal autocorrelation, difficulties in statistical analysis of multi-index data, and lack of consistency or transferability across terrestrial environments have hindered the application of those indices in different contexts. To address these issues we investigate the use of time-series motif discovery and random forest classification of multi-index data through two case studies. We use a semi-automated workflow combining time-series motif discovery and random forest classification of multi-index (acoustic complexity, temporal entropy, and events per second) data to categorize sounds in unfiltered recordings according to the main source of sound present (birds, insects, geophony). Our approach showed more than 70% accuracy in label assignment in both datasets. The categories assigned were broad, but we believe this is a great improvement on traditional single-index analysis of environmental recordings, as we can now give ecological meaning to recordings in a semi-automated way that does not require expert knowledge, and manual validation is only necessary for a small subset of the data. Furthermore, temporal autocorrelation, which is largely ignored by researchers, has been effectively eliminated through the time-series motif discovery technique applied here for the first time to ecoacoustic data. 
We expect that our approach will greatly assist researchers in the future as it will allow large datasets to be rapidly processed and labeled, enabling the screening of recordings for undesired sounds, such as wind, or target biophony (insects and birds) for biodiversity monitoring or bioacoustics research.


2021 ◽  
Vol 1 (2) ◽  
Author(s):  
Van Anh TRAN ◽  
Thi Le LE ◽  
Nhu Hung NGUYEN ◽  
Thanh Nghi LE ◽  
Hong Hanh TRAN

Vietnam is an Asian country with a hot and humid tropical climate throughout the year. Forests account for more than 40% of the total land area and have a very rich and diverse vegetation. Monitoring changes in the vegetation cover is important yet challenging, given such large areas and varying climatic conditions. A traditional remote sensing technique for monitoring vegetation cover uses optical satellite images. However, in the presence of cloud cover, analyses based on optical satellite images are not reliable. In such a scenario, radar images are a useful alternative because radar pulses can penetrate clouds, by day or night. In this study, we used multi-temporal C-band satellite images to monitor vegetation cover changes for an area in the Dau Tieng and Ben Cat districts of Binh Duong province, Mekong Delta, Vietnam. With a collection of 46 images acquired between March 2015 and February 2017, the changes in five land cover types, including vegetation loss and replanting in 2017, were analyzed in two cases: using 9 images from the dry seasons of the 3 years 2015, 2016, and 2017, and using all 46 images. In each case a Random Forest classifier was run with 100, 200, 300, and 500 trees. The model with nine images and 300 trees gave the best result, with an overall accuracy of 98.4% and a Kappa of 0.97. The results demonstrate that Sentinel-1 with VH polarization gives good accuracy for vegetation cover change. Therefore, Sentinel-1 can also be used to generate reliable land cover maps suitable for different applications.
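Overall accuracy and Kappa, the two figures reported in this abstract, are both derived from the confusion matrix of predicted versus reference classes. A minimal sketch of the standard formulas follows; the function name and the two-class matrix are made-up illustrations and do not reproduce the study's five-class result.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i that were
    assigned to class j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    size = len(matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance, given the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        return observed, 1.0  # only one class present; kappa is degenerate
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class matrix: rows = reference, columns = predicted.
confusion = [
    [50, 2],
    [3, 45],
]
overall_accuracy, kappa = accuracy_and_kappa(confusion)
print(round(overall_accuracy, 3), round(kappa, 3))
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance from the class proportions alone.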


2021 ◽  
Vol 13 (24) ◽  
pp. 5098
Author(s):  
Alexander M. Melancon ◽  
Andrew L. Molthan ◽  
Robert E. Griffin ◽  
John R. Mecikalski ◽  
Lori A. Schultz ◽  
...  

In response to Hurricane Florence of 2018, NASA JPL collected quad-pol L-band SAR data with the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) instrument, observing record-setting river stages across North and South Carolina. Fully-polarized SAR images allow for mapping of inundation extent at a high spatial resolution with a unique advantage over optical imaging, stemming from the sensor’s ability to penetrate cloud cover and dense vegetation. This study used random forest classification to generate maps of inundation from L-band UAVSAR imagery processed using the Freeman–Durden decomposition method. An average overall classification accuracy of 87% is achieved with this methodology, with areas of both under- and overprediction for the focus classes of open water and inundated forest. Fuzzy logic operations using hydrologic variables are used to reduce the number of small noise-like features and false detections in areas unlikely to retain water. Following postclassification refinement, estimated flood extents were combined to an event maximum for societal impact assessments. Results from the Hurricane Florence case study are discussed in addition to the limitations of available validation data for accuracy assessments.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260394
Author(s):  
Abdur R. Khan ◽  
Wisnu A. Wicaksono ◽  
Natalia J. Ott ◽  
Amisha T. Poret-Peterson ◽  
Greg T. Browne

Successive orchard plantings of almond and other Prunus species exhibit reduced growth and yield in many California soils. This phenomenon, known as Prunus replant disease (PRD), can be prevented by preplant soil fumigation or anaerobic soil disinfestation, but its etiology is poorly understood and its incidence and severity are hard to predict. We report here on relationships among physicochemical variables, microbial community structure, and PRD induction in 25 diverse replant soils from California. In a greenhouse bioassay, soil was considered to be “PRD-inducing” when growth of peach seedlings in it was significantly increased by preplant fumigation and pasteurization, compared to an untreated control. PRD was induced in 18 of the 25 soils, and PRD severity correlated positively with soil exchangeable-K, pH, %clay, total %N, and electrical conductivity. The structure of bacterial, fungal, and oomycete communities differed significantly between the PRD-inducing and non-inducing soils, based on PERMANOVA of Bray Curtis dissimilarities. Bacterial class MB-A2-108 of phylum Actinobacteria had high relative abundances among PRD-inducing soils, while Bacteroidia were relatively abundant among non-inducing soils. Among fungi, many ASVs classified only to kingdom level were relatively abundant among PRD-inducing soils whereas ASVs of Trichoderma were relatively abundant among non-inducing soils. Random forest classification effectively discriminated between PRD-inducing and non-inducing soils, revealing many bacterial ASVs with high explanatory values. Random forest regression effectively accounted for PRD severity, with soil exchangeable-K and pH having high predictive value. Our work revealed several biotic and abiotic variables worthy of further examination in PRD etiology.
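The Bray-Curtis dissimilarity underlying the PERMANOVA mentioned in this abstract compares two samples by the summed absolute differences in taxon counts, divided by the summed counts of both samples. A minimal sketch follows; the function name and the ASV counts are hypothetical and are not taken from the study.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    BC = sum(|a_i - b_i|) / sum(a_i + b_i); 0 means identical composition,
    1 means the samples share no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    total = sum(a + b for a, b in zip(sample_a, sample_b))
    if total == 0:
        raise ValueError("both samples are empty")
    return difference / total


# Hypothetical ASV counts for two soils (same three ASVs in each list).
soil_inducing = [10, 0, 5]
soil_non_inducing = [5, 5, 5]
print(round(bray_curtis(soil_inducing, soil_non_inducing), 3))
```

Computing this value for every pair of soils gives the dissimilarity matrix on which a PERMANOVA test between groups would operate.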

