Detection of Precipitation and Fog Using Machine Learning on Backscatter Data from Lidar Ceilometer

2020 ◽  
Vol 10 (18) ◽  
pp. 6452 ◽  
Author(s):  
Yong-Hyuk Kim ◽  
Seung-Hyun Moon ◽  
Yourim Yoon

The lidar ceilometer estimates cloud height by analyzing backscatter data. This study examines weather detectability using a lidar ceilometer by making an unprecedented attempt at detecting weather phenomena through the application of machine learning techniques to the backscatter data obtained from a lidar ceilometer. This study investigates the weather phenomena of precipitation and fog, which are expected to greatly affect backscatter data. In this experiment, the backscatter data obtained from the lidar ceilometer, CL51, installed in Boseong, South Korea, were used. For validation, the data from the automatic weather station for precipitation and visibility sensor PWD20 for fog, installed at the same location, were used. The experimental results showed potential for precipitation detection, which yielded an F1 score of 0.34. However, fog detection was found to be very difficult and yielded an F1 score of 0.10.
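As a reading aid for the F1 scores reported above, the following is a minimal sketch of how an F1 score is computed for a binary detection task such as rain versus no rain. The label sequences and function names are hypothetical and are not taken from the study; only the F1 definition (harmonic mean of precision and recall) is standard.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 1 = precipitation observed/predicted, 0 = none.
observed = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]

tp, fp, fn = confusion_counts(observed, predicted)
score = f1_score(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} F1={score:.3f}")
```

Because F1 ignores true negatives, it stays low when a rare event such as fog is either missed often or over-predicted, even if overall accuracy looks high.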

2018 ◽  
Vol 7 (3) ◽  
pp. 1136
Author(s):  
V Devasekhar ◽  
P Natarajan

Data mining is the extraction of important knowledge from various databases using different kinds of approaches. In multi-agent distributed mining, knowledge aggregation is one of the challenging tasks. This paper attempts to optimize the aggregation problem and arrives at a solution derived from the machine learning statistical features of each agent. A novel optimization algorithm called Multi-Agent Based Data Mining Aggregation (MABDA) is proposed for present-day scenarios. The MABDA algorithm has agents that collect extracted knowledge and summarize the various levels of the agents' cluster data into an aggregation with maximum accuracy. To demonstrate the effectiveness of the proposed algorithm, the experimental results are compared with those of existing methods.


Risks ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 32
Author(s):  
Jaewon Park ◽  
Minsoo Shin ◽  
Wookjae Heo

The purpose of this study is to find the most important variables that represent the future projections of the Bank for International Settlements’ (BIS) capital adequacy ratio, which is the index of financial soundness in a bank as a comprehensive and important measure of capital adequacy. This study analyzed the past 12 years of data from all domestic banks in South Korea. The research data include all financial information, such as key operating indicators, major business activities, and general information of the Financial Supervisory Service of South Korea from 2008 to 2019. In this study, machine learning techniques, Random Forest Boruta algorithms, Random Forest Recursive Feature Elimination, and Bayesian Regularization Neural Networks (BRNN) were utilized. Among 1929 variables, this study identified the 38 most important variables for representing the BIS capital adequacy ratio. An additional comparison was executed to confirm the statistical validity of future prediction performance between BRNN and ordinary least squares (OLS) models. BRNN predicted the BIS capital adequacy ratio more robustly and accurately than the OLS models. We believe our findings would appeal to the readership of your journal such as the policymakers, managers and practitioners in the bank-related fields because this study highlights the key findings from the data-driven approaches using machine learning techniques.


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 528
Author(s):  
David Opeoluwa Oyewola ◽  
Emmanuel Gbenga Dada ◽  
Sanjay Misra ◽  
Robertas Damaševičius

The application of machine learning techniques to the epidemiology of COVID-19 is a necessary measure that can be exploited to curtail the further spread of the disease. Conventional techniques used to determine the epidemiology of COVID-19 are slow and costly, and data are scarce. We investigate the effects of noise filters on the performance of machine learning algorithms on the COVID-19 epidemiology dataset. Noise filter algorithms are used to remove noise from the datasets utilized in this study. We applied nine machine learning techniques to classify the epidemiology of COVID-19, which are bagging, boosting, support vector machine, bidirectional long short-term memory, decision tree, naïve Bayes, k-nearest neighbor, random forest, and multinomial logistic regression. Data from patients who contracted coronavirus disease were collected from the Kaggle database between 23 January 2020 and 24 June 2020. Noisy and filtered data were used in our experiments. As a result of denoising, machine learning models have produced high results for the prediction of COVID-19 cases in South Korea. For isolated cases after performing noise filtering operations, machine learning techniques achieved an accuracy between 98% and 100%. The results indicate that filtering noise from the dataset can improve the accuracy of COVID-19 case prediction algorithms.


Author(s):  
Aleksandar Haber ◽  
Francesco Pecora ◽  
Mobin Uddin Chowdhury ◽  
Melvin Summerville

Identification, estimation, and control of temperature dynamics are ubiquitous and challenging control engineering problems. The main challenges originate from the fact that the temperature dynamics is usually infinite dimensional, nonlinear, and coupled with other physical processes. Furthermore, the dominant system time constants are often long, and due to various time constraints that limit the measurement time, we are only able to collect a relatively small number of input-output data samples. Motivated by these challenges, in this paper we present experimental results of identifying the temperature dynamics using subspace and machine learning techniques. We have developed an experimental setup consisting of an aluminum bar whose temperature is controlled by four heat actuators and sensed by seven thermocouples. We address noise reduction, experiment design, model structure selection, and overfitting problems. Our experimental results show that the temperature dynamics of the experimental setup can be relatively accurately represented by low-order models.


2021 ◽  
Author(s):  
Luciano V. B. Espiridião ◽  
Laura L. Dias ◽  
Anderson A. Ferreira

Author name ambiguity is one of the most challenging issues that can compromise the information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results showed scenarios where improvements are possible in the author name disambiguation task. The proposed data augmentation outperforms another data augmentation approach and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.


2021 ◽  
Vol 13 (9) ◽  
pp. 4986
Author(s):  
Imatitikua D. Aiyanyo ◽  
Hamman Samuel ◽  
Heuiseok Lim

In this study, we qualitatively and quantitatively examine the effects of COVID-19 on classrooms, students, and educators. Using a new Twitter dataset specific to South Korea during the pandemic, we sample the sentiment and strain on students and educators using applied machine learning techniques in order to identify various topical pain points emerging during the pandemic. Our contributions include a novel and open source geo-fenced dataset on student and educator opinion within South Korea that we are making available to other researchers as well. We also identify trends in sentiment and polarity over the pandemic timeline, as well as key drivers behind the sentiments. Moreover, we provide a comparative analysis of two widely used pre-trained sentiment analysis approaches, TextBlob and VADER, using statistical significance tests. Ultimately, we analyze how public opinion shifted on the pandemic in terms of positive sentiments about accessing course materials, online support communities, access to classes, and creativity, to negative sentiments about mental fatigue, job loss, student concerns, and overwhelmed institutions. We also open initial discussions about the concept of actionable sentiment analysis by overlapping polarity with the concept of trigger management to assist users in coping with negative emotions. We hope that insights from this preliminary study can promote further utilization of social media datasets to evaluate government messaging, population sentiment, and multi-dimensional analysis of pandemics.


Drones ◽  
2021 ◽  
Vol 5 (2) ◽  
pp. 31
Author(s):  
Bonggeun Song ◽  
Kyunghun Park

Since outdoor compost piles (OCPs) contain large amounts of nitrogen and phosphorus, they act as a major pollutant that degrades water quality, causing problems such as eutrophication and green algae, when they enter rivers during rainfall. In South Korea, OCPs are frequently used, but surveying their current status consumes considerable manpower and budget, so a more efficient way to investigate OCPs is needed. This study compared the accuracy of various machine learning techniques for the efficient detection and management of OCPs, a non-point pollution source in agricultural areas in South Korea, using unmanned aerial vehicle (UAV) images. RGB, multispectral, and thermal infrared UAV images were taken in August and October 2019. Additionally, vegetation indices (NDVI, NDRE, ENDVI, and GNDVI) and surface temperature were considered. Four machine learning techniques, including support vector machine (SVM), decision tree (DT), random forest (RF), and k-NN, were implemented, and the machine learning technique with the highest accuracy was identified by adjusting several variables. The accuracy of all machine learning techniques was very high, reaching values of up to 0.96. Particularly, the accuracy of the RF method with the number of estimators set to 10 was highest, reaching 0.989 in August and 0.987 in October. The proposed method allows for the prediction of OCP location and area over large regions, thereby foregoing the need for OCP field measurements. Therefore, our findings provide highly useful data for the improvement of OCP management strategies and water quality.
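The vegetation indices named above are normalized band differences. The sketch below shows their commonly cited definitions for a single pixel; the reflectance values and function names are hypothetical, and the abstract does not state which exact formulas or band definitions the authors used, so the ENDVI form in particular should be read as an assumption.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def vegetation_indices(nir, red, green, blue, red_edge):
    """Per-pixel indices from band reflectances (commonly cited forms)."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
        "GNDVI": normalized_difference(nir, green),
        # Assumed form: ((NIR + Green) - 2*Blue) / ((NIR + Green) + 2*Blue)
        "ENDVI": normalized_difference(nir + green, 2 * blue),
    }


# Hypothetical reflectances for one pixel (not data from the study).
indices = vegetation_indices(nir=0.6, red=0.2, green=0.3, blue=0.1, red_edge=0.4)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Each index falls in the range from -1 to 1 for non-negative reflectances, which makes the values convenient as per-pixel input features for classifiers such as the ones listed in the abstract.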


2021 ◽  
Vol 30 (3) ◽  
pp. 1-38
Author(s):  
Yanjie Zhao ◽  
Li Li ◽  
Haoyu Wang ◽  
Haipeng Cai ◽  
Tegawendé F. Bissyandé ◽  
...  

Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias recorded experimental results and insights. In this article, we perform extensive experiments to measure the performance gap that occurs when datasets are de-duplicated. Our experimental results reveal that duplication in published datasets has a limited impact on supervised malware classification models. This observation contrasts with the finding of Allamanis on the general case of machine learning bias for big code. Our experiments, however, show that sample duplication more substantially affects unsupervised learning models (e.g., malware family clustering). Nevertheless, we argue that our fellow researchers and practitioners should always take sample duplication into consideration when performing machine-learning-based (via either supervised or unsupervised learning) Android malware detections, no matter how significant the impact might be.
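To illustrate what de-duplicating a sample set can look like in practice, here is a minimal sketch that drops byte-identical samples by content hash before any training or evaluation. The abstract does not say which duplication criterion the authors applied, so exact-content SHA-256 matching and the sample names below are assumptions for illustration only.

```python
import hashlib


def deduplicate(samples):
    """Keep the first occurrence of each distinct content, compared by SHA-256 digest.

    `samples` is an iterable of (name, content_bytes) pairs.
    """
    seen = set()
    unique = []
    for name, content in samples:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, content))
    return unique


# Hypothetical dataset: the first and third entries are byte-identical.
dataset = [
    ("app_a.apk", b"\x01\x02\x03"),
    ("app_b.apk", b"\x04\x05\x06"),
    ("app_a_copy.apk", b"\x01\x02\x03"),
]

unique_samples = deduplicate(dataset)
print([name for name, _ in unique_samples])
```

Running the same experiment on both `dataset` and `unique_samples` is one simple way to measure the kind of performance gap the abstract describes.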


2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang
