Lightning Strike Location Identification Based on 3D Weather Radar Data

Lightning is an instantaneous, intense convective weather phenomenon with great destructive power that can easily cause serious economic losses and casualties. It occurs in convective storms, which have small spatial scales and short life cycles. Weather radar is one of the best operational instruments for monitoring the detailed 3D structure of convective storms at high spatial and temporal resolution. Automatically extracting lightning-related features from 3D weather radar data to identify lightning strike locations would therefore significantly benefit future lightning prediction. This article attempts to use three-dimensional radar data to identify lightning strike locations, laying the foundation for subsequent accurate, real-time prediction of lightning locations. First, the identification task is formulated as a binary classification problem. Then, a dataset for recognizing lightning strike locations from 3D radar data is constructed for training and evaluation. Finally, experiments are carried out with four machine learning methods: a convolutional neural network (CNN), logistic regression, a random forest, and k-nearest neighbors. The results show that the CNN performs best at identifying lightning strike locations, followed by the random forest and k-nearest neighbors, while logistic regression performs worst.
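
The abstract does not say how an individual sample is defined. One hypothetical way to cast strike-location identification as binary classification is to treat every horizontal grid cell as a sample whose features are the vertical reflectivity profile above it, labeled 1 when a strike falls in that cell. The grid layout, the `column_samples` name, and the assumption that strikes are already mapped to grid indices are illustrative choices, not the paper's method.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: volume[z][y][x] holds radar reflectivity (dBZ) on a
# regular 3D grid; strikes are (y, x) grid indices already mapped from lat/lon.
Volume = List[List[List[float]]]


def column_samples(
    volume: Volume, strikes: Sequence[Tuple[int, int]]
) -> Tuple[List[List[float]], List[int]]:
    """Turn a 3D radar volume into (features, labels) for binary classification.

    Each horizontal cell (y, x) becomes one sample whose features are the
    vertical profile volume[0..nz-1][y][x]; the label is 1 if at least one
    strike falls in that cell and 0 otherwise.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    struck = set(strikes)  # several strikes in one cell still yield label 1
    features: List[List[float]] = []
    labels: List[int] = []
    for y in range(ny):
        for x in range(nx):
            features.append([volume[z][y][x] for z in range(nz)])
            labels.append(1 if (y, x) in struck else 0)
    return features, labels


if __name__ == "__main__":
    # Toy 2-level, 2x2 volume with one strike in cell (0, 1).
    vol = [
        [[10.0, 45.0], [5.0, 0.0]],
        [[8.0, 40.0], [3.0, 0.0]],
    ]
    X, y = column_samples(vol, [(0, 1)])
    print(X)  # [[10.0, 8.0], [45.0, 40.0], [5.0, 3.0], [0.0, 0.0]]
    print(y)  # [0, 1, 0, 0]
```

A CNN would more plausibly consume a 3D patch centered on each cell rather than a single column; the column profile is simply the most compact feature vector that logistic regression, a random forest, and k-nearest neighbors could share.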

Convolutional Neural Network for Convective Storm Nowcasting Using 3-D Doppler Weather Radar Data

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2019.2948070 ◽

2020 ◽

Vol 58 (2) ◽

pp. 1487-1495 ◽

Author(s):

Lei Han ◽

Juanzhen Sun ◽

Wei Zhang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Weather Radar ◽

Radar Data ◽

Doppler Weather Radar ◽

Convective Storm

Comparative Analysis of SVM, KNN, and CNN Algorithms for Weather Image Classification

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021824553 ◽

2021 ◽

Vol 8 (2) ◽

pp. 311

Author(s):

Mohammad Farid Naufal

Keyword(s):

Neural Network ◽

Machine Learning ◽

Computer Vision ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Cross Validation ◽

Nearest Neighbors ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbors

Weather is an important factor in many kinds of decision making. Manual weather classification by humans is time-consuming and inconsistent. Computer vision is the branch of computer science concerned with recognizing or classifying images. It can help develop self-autonomous machines that do not depend on an internet connection and can perform their own computations in real time. Popular image classification algorithms include K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). KNN and SVM are machine learning classification algorithms, whereas CNN is a deep neural network classification algorithm. This study compares the performance of these three algorithms to quantify the performance gap between them. The experiments use 5-fold cross-validation, and several parameters are used to configure the KNN, SVM, and CNN algorithms. In the experiments, CNN performed best, with an accuracy of 0.942, a precision of 0.943, a recall of 0.942, and an F1 score of 0.942.
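
The reported accuracy and recall are identical (0.942). That is exactly what happens when per-class recall is averaged with each class weighted by its support, because weighted recall reduces to accuracy. Whether the study used weighted averaging is an assumption, since the abstract does not say. The sketch below shows an unshuffled 5-fold split and support-weighted metrics; the function names and weather labels are hypothetical, and a real evaluation would usually shuffle or stratify the folds.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index pairs.

    The first n % k folds get one extra sample so every index is tested once.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def weighted_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus precision, recall, and F1 averaged with weights n_c / n."""
    support = Counter(y_true)
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    prec = rec = f1 = 0.0
    for c, n_c in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)
        p_c = tp / predicted_c if predicted_c else 0.0
        r_c = tp / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = n_c / n  # support weight; these weights make `rec` equal `acc`
        prec += w * p_c
        rec += w * r_c
        f1 += w * f_c
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```

For example, `weighted_scores(["sunny", "sunny", "rain", "cloudy"], ["sunny", "rain", "rain", "cloudy"])` gives accuracy and weighted recall of 0.75 each, while weighted precision differs (0.875), which mirrors the pattern in the reported scores.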

Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach

Journal of Medical Internet Research ◽

10.2196/20268 ◽

2020 ◽

Vol 22 (9) ◽

pp. e20268

Author(s):

Adrienne Kline ◽

Theresa Kline ◽

Zahra Shakeri Hossein Abad ◽

Joon Lee

Keyword(s):

Neural Network ◽

Machine Learning ◽

Intensive Care Unit ◽

Logistic Regression ◽

Intensive Care ◽

Item Response ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Linear Discriminant ◽

Case Based

Background: Supervised machine learning (ML) is increasingly featured in the health care literature, with study results frequently reported using metrics such as accuracy, sensitivity, specificity, recall, or F1 score. Although each metric provides a different perspective on performance, they remain overall measures for the whole sample, discounting the uniqueness of each case or patient. Intuitively, we know that not all cases are equal, but current evaluation approaches do not take case difficulty into account.

Objective: A more case-based, comprehensive approach is warranted to assess supervised ML outcomes, and this forms the rationale for this study. This study aims to demonstrate how item response theory (IRT) can be used to stratify the data by how difficult each case is to classify, independent of the outcome measure of interest (eg, accuracy). This stratification allows the evaluation of ML classifiers to take the form of a distribution rather than a single scalar value.

Methods: Two large, public intensive care unit data sets, Medical Information Mart for Intensive Care III and electronic intensive care unit, were used to showcase this method in predicting mortality. For each data set, a balanced sample (n=8078 and n=21,940, respectively) and an imbalanced sample (n=12,117 and n=32,910, respectively) were drawn. A 2-parameter logistic model was used to provide scores for each case. Several ML algorithms were used to classify cases based on their health-related features: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes, and a neural network. Generalized linear mixed model analyses were used to assess the effects of case difficulty strata, ML algorithm, and their interaction in predicting accuracy.

Results: There were significant effects (P<.001) of case difficulty strata, ML algorithm, and their interaction in predicting accuracy. All classifiers performed better on easier-to-classify cases, and overall the neural network performed best. The significant interactions suggest that cases in the most difficult strata should be handled by logistic regression, linear discriminant analysis, decision tree, or neural network, but not by naive Bayes or K-nearest neighbors. Conventional ML classification metrics are also reported for methodological comparison.

Conclusions: This demonstration shows that IRT is a viable method for understanding the data provided to ML algorithms, independent of outcome measures, and highlights how well classifiers differentiate cases of varying difficulty. The method explains which features are indicative of healthy states and why. It enables end users to choose the classifier appropriate to the difficulty level of each patient for personalized medicine.
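
In the standard 2-parameter logistic (2PL) IRT model, the probability of a correct response is P = 1 / (1 + exp(-a(θ - b))), where a is the item's discrimination, b its difficulty, and θ the respondent's ability. The abstract does not describe how cases and classifiers are mapped onto items and respondents, so the sketch below simply assumes each case already has an estimated difficulty b, bins cases into difficulty strata with hypothetical cut points, and reports a classifier's accuracy per stratum. All function names and cut points are illustrative.

```python
import math
from typing import Dict, List, Sequence


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def stratify_by_difficulty(
    difficulties: Sequence[float], cut_points: Sequence[float]
) -> Dict[int, List[int]]:
    """Group case indices into strata using sorted cut points.

    Stratum 0 holds b < cut_points[0], stratum s holds
    cut_points[s-1] <= b < cut_points[s], and the last stratum holds
    b >= cut_points[-1].
    """
    strata: Dict[int, List[int]] = {s: [] for s in range(len(cut_points) + 1)}
    for i, b in enumerate(difficulties):
        s = sum(b >= c for c in cut_points)  # number of cut points at or below b
        strata[s].append(i)
    return strata


def accuracy_per_stratum(
    correct: Sequence[bool], strata: Dict[int, List[int]]
) -> Dict[int, float]:
    """Share of correctly classified cases in each non-empty stratum."""
    return {
        s: sum(correct[i] for i in idx) / len(idx) for s, idx in strata.items() if idx
    }
```

With difficulties `[-1.5, -0.2, 0.4, 2.0]` and cut points `[-1.0, 1.0]`, the cases fall into strata `{0: [0], 1: [1, 2], 2: [3]}`, so one classifier's performance becomes a small distribution over strata instead of a single number.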

Predicting a Football Player's Position Using Machine Learning Algorithms

Zbornik radova Fakulteta tehničkih nauka u Novom Sadu ◽

10.24867/13be31skiljevic ◽

2021 ◽

Vol 36 (07) ◽

pp. 1267-1270

Author(s):

Aleksandar Kovačević ◽

Dragan Škiljević

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Naive Bayes ◽

Nearest Neighbors ◽

Naïve Bayes ◽

K Nearest Neighbors

Football is a team sport played between two teams of eleven players each. Although players play in a predetermined position, they can easily switch to another one. In this paper, a player's best position is predicted from their physical and mental attributes. The main motivation is to make the work of professional football experts easier. The solution would greatly help coaches whose clubs face many injuries and therefore need to change the team's formation frequently, and it would help make the most of each player's potential. To determine the position in which a given player should play, this paper uses a dataset with 65 attributes per player, from which the position is predicted by training the following models: Multinomial Logistic Regression, K-Nearest Neighbors, Random Forest, Gaussian Naive Bayes, and Support Vector Machine.
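
One of the listed models, Gaussian Naive Bayes, is small enough to sketch from scratch: it fits a per-class mean and variance for each attribute, then predicts the position with the highest log prior plus summed Gaussian log-likelihoods. This is a minimal sketch, not the paper's implementation; the two attributes and the "FW"/"DF" position labels in the usage example are hypothetical stand-ins for the real 65-attribute dataset.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class feature means/variances plus priors."""

    def __init__(self, var_smoothing: float = 1e-9) -> None:
        self.var_smoothing = var_smoothing  # small constant so no variance is zero
        self.stats: Dict[str, List[Tuple[float, float]]] = {}
        self.log_prior: Dict[str, float] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNB":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            cols = list(zip(*rows))  # one tuple of values per attribute
            means = [sum(c) / len(c) for c in cols]
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + self.var_smoothing
                for c, m in zip(cols, means)
            ]
            self.stats[label] = list(zip(means, variances))
            self.log_prior[label] = math.log(len(rows) / len(X))
        return self

    def predict(self, row: Sequence[float]) -> str:
        def log_posterior(label: str) -> float:
            # Log prior plus the sum of independent Gaussian log-densities.
            ll = self.log_prior[label]
            for v, (m, var) in zip(row, self.stats[label]):
                ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            return ll

        return max(self.stats, key=log_posterior)
```

Usage with hypothetical (pace, tackling) attributes: fitting on `[[90, 30], [85, 35], [40, 88], [45, 85]]` with labels `["FW", "FW", "DF", "DF"]` and calling `predict([88, 32])` returns `"FW"`. Working in log space avoids underflow when many attributes are multiplied together.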

Development of a weed detection system using machine learning and neural network algorithms

Eastern-European Journal of Enterprise Technologies ◽

10.15587/1729-4061.2021.246706 ◽

2021 ◽

Vol 6 (2 (114)) ◽

Author(s):

Baydaulet Urmashev ◽

Zholdas Buribayev ◽

Zhazira Amirgaliyeva ◽

Aisulu Ataniyazova ◽

Mukhtar Zhassuzak ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Detection System ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Plant Diseases ◽

Weed Detection ◽

K Nearest Neighbors

Detecting weeds during cultivation is very important for detecting and preventing plant diseases and avoiding significant crop losses. Traditional methods for this task require large costs and human resources and also expose workers to the risk of contamination by harmful chemicals. To address these problems, and to reduce herbicide and pesticide use and obtain environmentally friendly products, a program for detecting agricultural pests is proposed that uses the classical K-Nearest Neighbors, Random Forest, and Decision Tree algorithms as well as the YOLOv5 neural network. After analyzing the country's geographical areas, a proprietary database of collected weed images was formed, with more than 1000 images per class. A brief review is given of researchers' published methods for identifying, classifying, and discriminating weeds with machine learning algorithms, convolutional neural networks, and deep learning. As a result of the research, a weed detection system based on the YOLOv5 architecture was developed, and quality estimates were obtained for the algorithms above. The weed detection accuracy of the K-Nearest Neighbors, Random Forest, and Decision Tree classifiers was 83.3%, 87.5%, and 80%, respectively. Because the weed images of each species differ in resolution and illumination, the neural network's per-class scores range from 0.82 to 0.92. Quantitative results on real data demonstrate that the proposed approach can classify low-resolution weed images well.
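
Of the classical classifiers compared, K-Nearest Neighbors is the simplest to show: a query image's feature vector is assigned the majority label among its k closest training vectors. The sketch below assumes images have already been reduced to numeric feature vectors (the abstract does not say which features were used) and uses Euclidean distance; `knn_predict`, the two-dimensional features, and the "weed"/"crop" labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Classify `query` by majority vote among its k nearest training points."""
    # Indices of training points sorted by Euclidean distance to the query.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With training vectors `[[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]` labeled `["weed", "weed", "crop", "crop"]`, the query `[0.15, 0.85]` is classified as `"weed"`. An odd k avoids vote ties in a two-class setting, and `math.dist` requires Python 3.8 or later.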

Cloud Computing and Machine Learning for Analyzing Large Volumes of Educational Data

10.5753/eniac.2020.12117 ◽

2020 ◽

Author(s):

Francisco Neto ◽

Romero Silva ◽

Roberta Gouveia ◽

Maria Batista ◽

Igor Oliveira

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbors ◽

Apache Spark ◽

K Nearest Neighbors

This article describes the application of supervised and unsupervised machine learning to large volumes of open government data from INEP, using the K-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest, and K-means algorithms. The methodology is based on the CRISP-DM and KDD processes and required the Databricks cloud platform, together with Hadoop cluster technologies and Apache Spark. These technologies provided the high processing power needed to run the experiments, which made it possible to evaluate model performance and to discover knowledge about Brazilian basic education.
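
The unsupervised part of the study uses K-means. The study ran on Spark, whose MLlib library provides a distributed k-means; the sketch below is only a single-machine illustration of the underlying algorithm (Lloyd's iterations), not the authors' Spark code. The function name, random initialization from the data, and the toy points are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

On the toy points `[(0, 0), (0, 1), (10, 10), (10, 11)]` with `k=2`, the two nearby pairs end up in separate clusters with centroids `(0.0, 0.5)` and `(10.0, 10.5)`. On real educational data, features would normally be scaled first, since k-means relies on Euclidean distance.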

A Machine Learning-based System for Financial Fraud Detection

10.5753/eniac.2021.18250 ◽

2021 ◽

Author(s):

João Paulo A. Andrade ◽

Leonardo S. Paulucio ◽

Thiago M. Paixão ◽

Rodrigo F. Berriel ◽

Teresa Cristina Janes Carneiro ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Nearest Neighbors ◽

Financial Data ◽

Support Vector ◽

Financial Fraud ◽

K Nearest Neighbors ◽

Governmental Agencies ◽

A Company

Companies created for money laundering or tax evasion are harmful to a country's economy and society. Governmental agencies usually tackle this problem by having officials pore over companies' financial data to single out those that exhibit fraudulent behavior, work that tends to be slow and tedious. This paper proposes a machine learning-based system capable of classifying whether a company is likely to be involved in fraud. Using financial and tax data from various companies, four classifiers (k-Nearest Neighbors, Random Forest, Support Vector Machine (SVM), and a Neural Network) were trained and then used to flag fraud. The best-performing model, Random Forest, achieved a macro-averaged F1-score of 92.98%.
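
Macro-averaged F1 computes F1 separately for each class and then takes the unweighted mean, so a small "fraud" class counts as much as the much larger "legitimate" class. A minimal sketch, assuming binary integer labels (1 = fraud, 0 = legitimate, a hypothetical encoding):

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 for one class, using the identity F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```

For ten companies with two frauds where one fraud is missed, accuracy is 0.9, but macro F1 is (16/17 + 2/3) / 2, roughly 0.80, because the missed fraud pulls the fraud-class F1 down to 2/3.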

Financial Fraud Detection in Healthcare Using Machine Learning and Deep Learning Techniques

Security and Communication Networks ◽

10.1155/2021/9293877 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Abolfazl Mehbodniya ◽

Izhar Alam ◽

Sagar Pande ◽

Rahul Neware ◽

Kantilal Pitambar Rane ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Nearest Neighbor ◽

Credit Cards ◽

Healthcare Sector ◽

K Nearest Neighbor

The healthcare sector is one of the prominent sectors in which large amounts of data can be collected, not only about health but also about finances. With the continuous growth of electronic payments, major frauds in the healthcare sector arise from credit card use, and monitoring credit card fraud has been a financial challenge for service providers. Fraud detection systems therefore need continuous improvement. Various fraud scenarios occur continuously and cause massive financial losses. Techniques such as phishing and Trojan-like viruses are commonly used to collect sensitive information about credit cards and their owners, so efficient technology is needed to identify the different types of fraudulent credit card activity. In this paper, various machine learning and deep learning approaches are applied to credit card fraud detection: Naive Bayes, Logistic Regression, K-Nearest Neighbor (KNN), Random Forest, and a Sequential Convolutional Neural Network are trained on the features of normal and abnormal transactions. Publicly available data are used to evaluate model accuracy. The accuracies obtained were 96.1%, 94.8%, 95.89%, 97.58%, and 92.3% for Naive Bayes, Logistic Regression, KNN, Random Forest, and the Sequential Convolutional Neural Network, respectively. The comparative analysis showed that the KNN algorithm produces better results than the other approaches.

Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach (Preprint)

10.2196/preprints.20268 ◽

2020 ◽

Author(s):

Adrienne Kline ◽

Theresa Kline ◽

Zahra Shakeri Hossein Abad ◽

Joon Lee

Keyword(s):

Neural Network ◽

Machine Learning ◽

Intensive Care Unit ◽

Logistic Regression ◽

Intensive Care ◽

Item Response ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Linear Discriminant ◽

Case Based

Obtaining a Data Mining Model Applied to University Dropout in the Systems Engineering Program at the Universidad de Cundinamarca

Revista Ontare ◽

10.21158/23823399.v7.n0.2019.2676 ◽

2020 ◽

Vol 7 ◽

Author(s):

Holmes Yesid Ayala-Yaguara ◽

Gina Maribel Valenzuela-Sabogal ◽

Alexander Espinosa-García

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Support Vector Machines ◽

Random Forest ◽

Nearest Neighbors ◽

Knowledge Discovery In Databases ◽

Support Vector ◽

K Nearest Neighbors ◽

Vector Machines ◽

Feature Importance

This article describes how a data mining model was obtained for the problem of university dropout in the Systems Engineering program at the Universidad de Cundinamarca, Facatativá campus. The model was structured with the KDD (knowledge discovery in databases) data mining methodology, using the Python programming language, the Pandas data processing library, and the Sklearn machine learning library. The process also addressed problems beyond the mining itself, such as high dimensionality, for which three variable selection methods were applied: univariate statistical selection, feature importance, and SelectFromModel (Sklearn). Five data mining techniques were selected for evaluation: k-nearest neighbors (KNN), decision trees (DT), random forests (RF), logistic regression (LR), and support vector machines (SVM). To select the final model, each model's results were evaluated using precision, the confusion matrix, and additional metrics derived from the confusion matrix. Finally, the selected model's parameters were tuned, and its generalization was evaluated by plotting its learning curve.
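
The abstract names univariate statistical selection but gives no details. As a hypothetical illustration of the univariate idea only, the sketch below scores each feature independently by its absolute Pearson correlation with the dropout label and keeps the top k. The correlation score is a swapped-in choice (scikit-learn's `SelectKBest` is commonly paired with tests such as ANOVA F or chi-squared), and the feature columns in the usage example are invented.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_k_best(X: Sequence[Sequence[float]], y: Sequence[float], k: int) -> List[int]:
    """Return the sorted indices of the k columns with the largest |r| against y."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]  # one score per feature
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

With hypothetical columns (grade average, failed courses, a constant) and dropout labels `[0, 0, 1, 1]`, the grade average scores about 0.97, failed courses about 0.95, and the constant column 0, so `select_k_best(X, y, 2)` keeps columns 0 and 1. Because each feature is scored in isolation, this kind of filter cannot detect features that matter only in combination, which is one reason model-based methods such as feature importance are also used.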