Application of Random Forest Algorithm for Merging Multiple Satellite Precipitation Products across South Korea

2021 ◽  
Vol 13 (20) ◽  
pp. 4033
Author(s):  
Giang V. Nguyen ◽  
Xuan-Hien Le ◽  
Linh Nguyen Van ◽  
Sungho Jung ◽  
Minho Yeon ◽  
...  

Precipitation is a crucial component of the water cycle and plays a key role in hydrological processes. Recently, satellite-based precipitation products (SPPs) have provided gridded precipitation estimates that capture spatiotemporal variability. However, SPP estimates contain considerable uncertainty, and their spatial resolution is still relatively coarse. To overcome these limitations, this study aims to generate new gridded daily precipitation by combining rainfall observations with multiple SPPs for the period 2003–2017 across South Korea. A Random Forest (RF) machine-learning model was applied to produce a new merged precipitation product, and several statistical linear merging methods were adopted for comparison with the RF results. To evaluate RF, rainfall data from 64 Automated Synoptic Observing System (ASOS) stations were collected, and the accuracy of the products was analyzed with several continuous and categorical indicators. The merged precipitation generally shows higher accuracy than any single satellite rainfall product, and RF proves more effective than the statistical merging methods. These results suggest that the RF model can be applied to merge multiple satellite precipitation products, especially in data-sparse regions.
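
Categorical indicators for daily precipitation are usually computed from a rain/no-rain contingency table built from paired gauge and product values. The sketch below is illustrative only: the abstract does not list which categorical indicators were used, so POD, FAR and CSI are assumed as common choices, and the 1 mm/day rain threshold and the function name contingency_scores are hypothetical.

```python
def contingency_scores(est, obs, threshold=1.0):
    """Categorical rain/no-rain scores of an estimated series against gauge data.

    A day is treated as 'rain' when its value is >= threshold (mm/day, assumed).
    Returns POD (probability of detection), FAR (false alarm ratio)
    and CSI (critical success index).
    """
    hits = misses = false_alarms = 0
    for e, o in zip(est, obs):
        est_rain, obs_rain = e >= threshold, o >= threshold
        if est_rain and obs_rain:
            hits += 1
        elif obs_rain:          # gauge recorded rain, product did not
            misses += 1
        elif est_rain:          # product reported rain, gauge did not
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return {"POD": pod, "FAR": far, "CSI": csi}


# Hypothetical daily values (mm/day) at a single gauge location.
gauge = [0.0, 5.2, 12.0, 0.3, 3.1, 0.0]
merged = [0.5, 4.0, 0.2, 2.5, 6.0, 0.0]
print(contingency_scores(merged, gauge))
```

Correct negatives (dry on both sides) do not enter any of the three scores, which is why CSI is often preferred over plain accuracy for a rare event such as daily rain.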

Atmosphere ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 552
Author(s):  
Bu-Yo Kim ◽  
Joo Wan Cha ◽  
Ki-Ho Chang ◽  
Chulkyu Lee

In this study, visibility in South Korea was predicted (VISRF) using a random forest (RF) model based on ground observation data from the Automated Synoptic Observing System (ASOS) and air pollutant data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Copernicus Atmosphere Monitoring Service (CAMS) model. Visibility was predicted and evaluated using a training set for the period 2017–2018 and a test set for 2019. The VISRF results were compared with visibility data from the ASOS (VISASOS) and from the Unified Model (UM) Local Data Assimilation and Prediction System (LDAPS) (VISLDAPS), both operated by the Korea Meteorological Administration (KMA). The bias, root mean square error (RMSE), and correlation coefficient (R) between the VISASOS and VISLDAPS datasets were 3.67 km, 6.12 km, and 0.36, respectively, compared to 0.14 km, 2.84 km, and 0.81, respectively, between the VISASOS and VISRF datasets. Based on these comparisons, the applied RF model offers significantly better predictive performance and more accurate visibility data (VISRF) than the currently available VISLDAPS outputs. Authorities could implement this modeling approach to estimate visibility accurately and thereby reduce accidents, risks to public health, and economic losses, as well as to inform urban development policies and environmental regulations.
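
The three verification statistics quoted above can be reproduced for any pair of series with a few lines of code. This is a minimal sketch, assuming bias is defined as the mean of (prediction minus observation) and R is the Pearson correlation; the abstract does not state the sign convention, and the function name verify is hypothetical.

```python
import math


def verify(pred, obs):
    """Return (bias, RMSE, Pearson R) of predictions against observations."""
    n = len(obs)
    # Bias: mean signed error, positive when predictions overestimate (assumed convention).
    bias = sum(p - o for p, o in zip(pred, obs)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return bias, rmse, cov / (sd_p * sd_o)


# Hypothetical visibilities in km.
observed = [10.0, 20.0, 5.0, 15.0]
predicted = [12.0, 18.0, 6.0, 16.0]
print(verify(predicted, observed))
```

Bias and RMSE keep the units of the data (km here), while R is dimensionless, which matches how the abstract reports them.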


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sunhae Kim ◽  
Hye-Kyung Lee ◽  
Kounseok Lee

The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) is a widely used tool for the early detection of psychological maladjustment and for assessing the level of adaptation of large groups in clinical settings, schools, and corporations. This study aims to evaluate the utility of the MMPI-2 in assessing suicidal risk using MMPI-2 results and suicidal risk evaluations. A total of 7,824 datasets collected from college students were analyzed. The MMPI-2-Restructured Clinical Scales (MMPI-2-RF) and the responses to each question of the Mini International Neuropsychiatric Interview (MINI) suicidality module were used. For statistical analysis, random forest and K-Nearest Neighbors (KNN) techniques were applied, with suicidal ideation and suicide attempt as dependent variables and 50 MMPI-2 scale scores as predictors. With the random forest method, the accuracies for suicidal ideation and suicide attempts were 92.9% and 95%, respectively, and the areas under the ROC curve (AUCs) were 0.844 and 0.851, respectively. With the KNN method, the accuracies were 91.6% and 94.7%, respectively, and the AUCs were 0.722 and 0.639, respectively. The study confirmed that machine learning using the MMPI-2 in a large group provides reliable accuracy in classifying and predicting subjects' suicidal ideation and past suicide attempts.
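
Accuracy and AUC measure different things, which is why the study reports both. The sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), which is equivalent to the area under the ROC curve. The function names, the 0.5 decision threshold, and the example data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Hypothetical rare outcome: 1 positive among 20 cases, uninformative constant scores.
labels = [1] + [0] * 19
scores = [0.0] * 20
print(accuracy(labels, scores), roc_auc(labels, scores))
```

In this toy case the uninformative classifier reaches 95% accuracy simply by predicting "no" for everyone, while its AUC is 0.5. This illustrates, in general terms, why AUC is a useful complement to accuracy when the outcome is rare.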


Water ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1237
Author(s):  
Vanesa Mateo Pérez ◽  
José Manuel Mesa Fernández ◽  
Joaquín Villanueva Balsera ◽  
Cristina Alonso Álvarez

The content of fats, oils, and greases (FOG) in wastewater, resulting from food preparation in homes and in various commercial and industrial activities, is a growing problem. In addition to causing blockages in sewer networks, FOG also hinders the performance of wastewater treatment plants (WWTP), increasing energy and maintenance costs and degrading downstream treatment processes. The pretreatment stage of these facilities is responsible for removing most of the FOG to avoid these problems. So far, however, optimization has been limited to correct design and initial sizing of the installation. Management of this initial stage is left to operators, who rely on experience to adjust the process when the characteristics of the inlet wastewater change. The main difficulty is the large number of factors influencing these changes. In this work, a model for predicting the FOG content of the inlet water is presented. The model correctly predicts 98.45% of the cases in training and 72.73% in testing, within a relative error of 10%. It was developed using random forest (RF), and the good results obtained (R2 = 0.9348 and RMSE = 0.089 on the test set) will make it possible to improve operations in this initial stage. To date, this machine learning algorithm had not been used to model pretreatment parameters. This novel approach should improve the overall performance of such facilities by allowing early adjustments to the pretreatment process to remove the maximum amount of FOG.
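
The "correctly predicted within a relative error of 10%" figure can be read as a hit rate: the share of cases whose prediction lies within 10% of the observed value. This sketch assumes that reading, and it also assumes R2 is the coefficient of determination, 1 - SS_res / SS_tot. The abstract does not define either metric, and the function names and data are hypothetical.

```python
def hit_rate(pred, obs, tol=0.10):
    """Fraction of cases with |pred - obs| <= tol * |obs| (assumed definition)."""
    hits = sum(abs(p - o) <= tol * abs(o) for p, o in zip(pred, obs))
    return hits / len(obs)


def r_squared(pred, obs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalised FOG values for four test cases.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.05, 2.5, 2.9, 4.0]
print(hit_rate(predicted, observed), r_squared(predicted, observed))
```

The two metrics can diverge. A single large miss lowers the hit rate by one case but can pull R2 down sharply, so reporting both gives a fuller picture.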


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Prabina Kumar Meher ◽  
Anil Rai ◽  
Atmakuri Ramakrishna Rao

Background: Localization of messenger RNAs (mRNAs) plays a crucial role in the growth and development of cells; in particular, it plays a major role in regulating spatiotemporal gene expression. In situ hybridization is a promising experimental technique for determining the localization of mRNAs, but it is costly and laborious. Moreover, a single mRNA can be present in more than one location, whereas existing computational tools can predict only a single location for such mRNAs. A high-end computational tool is therefore required for reliable and timely prediction of the multiple subcellular locations of mRNAs, and we developed the present computational model for this purpose. Results: mRNA sequences from 9 different localizations were considered. Each sequence was first transformed into a numeric feature vector of size 5460 based on k-mer features of sizes 1–6. Of these 5460 k-mer features, 1812 important features were selected by the Elastic Net statistical model. The Random Forest supervised learning algorithm was then employed to predict the localizations from the selected features. Five-fold cross-validation accuracies of 70.87, 68.32, 68.36, 68.79, 96.46, 73.44, 70.94, 97.42 and 71.77% were obtained for the cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior and ribosome, respectively. On an independent test set, accuracies of 65.33, 73.37, 75.86, 72.99, 94.26, 70.91, 65.53, 93.60 and 73.45% were obtained for the respective localizations. The developed approach also achieved higher accuracies than existing localization prediction tools. Conclusions: This study presents a novel computational tool for predicting multiple localizations of mRNAs. Based on the proposed approach, an online prediction server “mLoc-mRNA” is accessible at http://cabgrid.res.in:8080/mlocmrna/.
The developed approach is believed to supplement the existing tools and techniques for the localization prediction of mRNAs.
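
The feature vector size of 5460 follows directly from counting all k-mers over a four-letter nucleotide alphabet for k = 1 to 6: 4 + 16 + 64 + 256 + 1024 + 4096 = 5460. Below is a minimal sketch of such an encoding. It assumes each k-mer count is normalised by the number of windows of that length and that U is mapped to T. Neither choice is stated in the abstract, the helper names are hypothetical, and the Elastic Net selection and Random Forest steps are not shown.

```python
from itertools import product


def kmer_vocabulary(max_k=6, alphabet="ACGT"):
    """All k-mers for k = 1..max_k, ordered by k and then lexicographically."""
    return ["".join(p) for k in range(1, max_k + 1)
            for p in product(alphabet, repeat=k)]


def kmer_features(seq, max_k=6, alphabet="ACGT"):
    """Normalised k-mer frequency vector with one entry per vocabulary k-mer."""
    seq = seq.upper().replace("U", "T")  # assumption: RNA may be given as ACGU
    index = {kmer: i for i, kmer in enumerate(kmer_vocabulary(max_k, alphabet))}
    vec = [0.0] * len(index)
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        for i in range(max(n_windows, 0)):
            j = index.get(seq[i:i + k])
            if j is not None:  # windows containing N or other symbols are skipped
                vec[j] += 1.0 / n_windows  # frequency within all k-length windows
    return vec


features = kmer_features("AUGGCUACGUUAGC")  # hypothetical short mRNA fragment
print(len(features))
```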


2021 ◽  
Vol 13 (11) ◽  
pp. 2211
Author(s):  
Shuo Xu ◽  
Jie Cheng ◽  
Quan Zhang

Land surface temperature (LST) is an important parameter reflecting the water–heat exchange and balance at the Earth’s surface. Passive microwave (PMW) LST can compensate for gaps in thermal infrared (TIR) LST caused by cloud contamination, but its spatial resolution is relatively low. In this study, we developed a TIR and PMW LST fusion method based on the random forest (RF) machine learning algorithm to obtain all-weather LST at high spatial resolution. Since LST is closely related to land cover (LC) type, terrain, vegetation conditions, moisture conditions, and solar radiation, these variables were selected as candidate auxiliary variables to establish the best model, which was then used to produce fused LST over mainland China for 2010. In general, the fused LST had higher spatial integrity than the MODIS LST and higher accuracy than the downscaled AMSR-E LST. Additionally, the magnitude of the fused LST was consistent with the general spatiotemporal variations of LST. Compared with in situ observations, the RMSEs of clear-sky and cloudy-sky fused LST were 2.12–4.50 K and 3.45–4.89 K, respectively. By combining the RF method with the DINEOF (Data Interpolating Empirical Orthogonal Functions) method, a complete all-weather LST with a spatial resolution of 0.01° can be obtained.
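
At its simplest, the fusion keeps TIR LST wherever it is cloud-free and falls back to a microwave-based estimate under cloud. Accuracy is then reported separately for clear-sky and cloudy-sky pixels. The sketch below shows only that gap-filling and evaluation logic. The pmw_estimate values stand in for the RF prediction from downscaled PMW LST and auxiliary variables, which is not implemented here, and the DINEOF step is omitted. All names and numbers are hypothetical.

```python
import math


def fuse_lst(tir, pmw_estimate):
    """Keep TIR LST where available (None marks cloud); otherwise use the PMW-based estimate."""
    fused, cloudy = [], []
    for t, p in zip(tir, pmw_estimate):
        cloudy.append(t is None)
        fused.append(p if t is None else t)
    return fused, cloudy


def rmse_by_sky(fused, cloudy, in_situ):
    """RMSE (K) against in situ LST, split into clear-sky and cloudy-sky pixels."""
    groups = {"clear": [], "cloudy": []}
    for f, c, o in zip(fused, cloudy, in_situ):
        groups["cloudy" if c else "clear"].append((f - o) ** 2)
    return {k: math.sqrt(sum(v) / len(v)) for k, v in groups.items() if v}


tir = [300.0, None, 295.0, None]       # K; None = cloud-contaminated pixel
pmw = [301.0, 290.0, 296.0, 288.0]     # stand-in for the RF-based estimate
site = [301.0, 292.0, 295.0, 288.0]    # in situ observations
fused, cloudy = fuse_lst(tir, pmw)
print(fused, rmse_by_sky(fused, cloudy, site))
```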


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1400
Author(s):  
Sun Park ◽  
JongWon Kim

The strawberry market in South Korea is the largest among horticultural crops. Strawberry cultivation in South Korea has shifted from field cultivation to facility cultivation in order to increase production. However, labor shortages caused by an aging farm workforce are increasing the demand for automation in strawberry cultivation. Predicting the strawberry harvest is an important research topic, as harvesting is the most labor-intensive part of strawberry production. In addition, as hydroponic cultivation of strawberries increases, the growing environment has a great influence on production. In this paper, we design and implement an integrated system, based on the IoT-Edge-AI-Cloud concept, that monitors hydroponic strawberry environmental data and determines when to harvest. The proposed monitoring system collects, stores, and visualizes strawberry growing environment data. The proposed harvest decision system uses a deep learning algorithm to classify strawberry maturity levels in images. The monitoring and analysis results are visualized in an integrated interface that provides a variety of basic data for strawberry cultivation. Because it is built on virtualized containers following the IoT-Edge-AI-Cloud concept, the proposed system can be expanded easily and flexibly as the cultivation area grows. The monitoring system was verified by monitoring a hydroponic strawberry environment for 4 months, and the harvest decision system was verified using strawberry images acquired from Smart Berry Farm.


2018 ◽  
Vol 25 (1) ◽  
pp. 129-143 ◽  
Author(s):  
Guo-Yuan Lien ◽  
Daisuke Hotta ◽  
Eugenia Kalnay ◽  
Takemasa Miyoshi ◽  
Tse-Chun Chen

To successfully assimilate data from a new observing system, it is necessary to develop appropriate data selection strategies that assimilate only the generally useful data. This development work is usually done by trial and error using observing system experiments (OSEs), which are very time- and resource-consuming. This study proposes a new, efficient methodology that accelerates this development using ensemble forecast sensitivity to observations (EFSO). First, non-cycled assimilation of the new observation data is conducted to compute EFSO diagnostics for each observation within a large sample. Second, EFSO values are averaged conditionally on various factors. Third, potential data selection criteria are designed from these non-cycled EFSO statistics and tested in cycled OSEs to verify the actual assimilation impact. The usefulness of this method is demonstrated with the assimilation of satellite precipitation data. It is shown that the EFSO-based method can efficiently suggest data selection criteria that significantly improve the assimilation results.
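
The second and third steps amount to grouping per-observation EFSO values by a sampling factor, averaging within each group, and keeping the groups that help on average. This is a minimal sketch under the usual EFSO sign convention, in which a negative value means the observation reduced the forecast error. The factor categories (precipitation-intensity bins), the values, and the function names are hypothetical, and the cycled OSE verification step is not shown.

```python
from collections import defaultdict


def conditional_mean_efso(efso, factor):
    """Average per-observation EFSO within each category of a sampling factor."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, key in zip(efso, factor):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def select_categories(mean_efso):
    """Candidate criterion: keep categories whose mean EFSO is negative (beneficial)."""
    return sorted(key for key, m in mean_efso.items() if m < 0.0)


# Hypothetical EFSO values for six observations, binned by rain intensity.
efso = [-0.3, -0.1, 0.2, 0.4, -0.2, 0.1]
bins = ["light", "light", "heavy", "heavy", "light", "moderate"]
means = conditional_mean_efso(efso, bins)
print(means, select_categories(means))
```

Because these statistics come from a single non-cycled run, any selected criterion is only a candidate. It still has to be confirmed in cycled experiments, as the third step above describes.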


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, much of which arrives as streams. Processing such data in real time requires careful consideration from several perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams, because it requires learners to adapt to dynamic changes. Random forest is an ensemble approach widely used in classical, non-streaming machine learning applications, while the Adaptive Random Forest (ARF) is a stream learning algorithm that has shown promising results in terms of accuracy and its ability to deal with various types of drift. Because the stream of incoming instances is effectively unbounded, the binomial distribution that governs how often each instance is drawn during resampling can be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each with a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value of ρ. By comparing the standard ARF with its tuned variants, we show that ARF performance can be enhanced by tackling this aspect. Finally, we present three case studies from different contexts to test the proposed enhancement and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that the proposed enhancement yields considerable improvement in most situations.
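
In online bagging, each ensemble member sees every incoming instance k times, where k is drawn from Poisson(λ). Classic online bagging uses λ = 1, and the study tunes λ for ARF. The sketch below draws these weights with Knuth's multiplication method using only the standard library. It does not implement ARF itself or the ρ measure, whose formula the abstract does not give, and the function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def resampling_weights(n_learners, lam, rng):
    """For one incoming instance: how many times each ensemble member trains on it."""
    return [poisson(lam, rng) for _ in range(n_learners)]


rng = random.Random(42)
for lam in (1.0, 6.0):
    draws = [poisson(lam, rng) for _ in range(20000)]
    print(lam, sum(draws) / len(draws))  # empirical mean should be close to lam
print(resampling_weights(10, 1.0, rng))
```

Each learner performs λ training updates per instance on average, so raising λ increases execution time. This is the accuracy-versus-time trade-off that a measure such as ρ is meant to balance.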


2021 ◽  
Vol 11 (13) ◽  
pp. 6237
Author(s):  
Azharul Islam ◽  
KyungHi Chang

Unstructured data from the internet are a large source of information that needs to be formatted in a user-friendly way. This research develops a model that classifies unstructured data obtained through data mining into labeled data, and builds an informational and decision-making support system (DMSS). Information mined from various sources is often heterogeneous, and the key challenge is extracting valuable information from it. We observe substantial improvements in classification accuracy on our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a COVID corpus processed with a long short-term memory (LSTM) network. Furthermore, tests on large datasets from the Disaster corpus yielded an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, achieves a reasonable accuracy of 84%. The main objective of this research is to increase the application’s robustness by integrating intelligence into the developed DMSS, which provides insight into the user’s intent despite the noisy dataset. Our model selects between the random forest and stochastic gradient descent (SGD) algorithms using the F1 score; the RF method outperforms a conventional method, improving accuracy by 2 percentage points (from 81% to 83%).
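
The comparison above relies on the F1 score, the harmonic mean of precision and recall. Below is a minimal sketch of per-class and macro-averaged F1 for multi-class text labels. Macro averaging is an assumption, since the abstract does not say how F1 was aggregated across classes, and the label names are hypothetical.

```python
def f1_scores(y_true, y_pred):
    """Per-class F1 and its unweighted (macro) average."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical labels for five documents.
truth = ["covid", "covid", "other", "other", "covid"]
preds = ["covid", "other", "other", "other", "covid"]
print(f1_scores(truth, preds))
```

Unlike accuracy, F1 ignores true negatives for each class. This makes it more informative when some classes are much smaller than others, as is common in mined text corpora.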

