Intelligent methods for improving the accuracy of prediction of rare hazardous events in railway transportation

Aim. The paper examines approaches to improving the quality of prediction and classification on unbalanced data, with the goal of raising the accuracy of rare event classification. When predicting the onset of rare events using machine learning techniques, researchers face a mismatch between the measured quality of trained models and their actual ability to correctly predict the occurrence of a rare event. The paper examines model training under unbalanced initial data. The subject of research is the information on incidents and hazardous events at railway power supply facilities. The imbalance shows up as a marked difference between the numbers of instances of the observed event types. Methods. Depending on the nature of the problem at hand and on the quality and size of the initial data, various data science techniques are used to improve classification and prediction models trained on unbalanced data. Some of those methods focus on the attributes and parameters of classification models; they include FAST, CFS, fuzzy classifiers, GridSearchCV, etc. Another group of methods generates representative subsets, i.e., samples, from the initial datasets. Data sampling techniques allow examining the effect of class proportions on the quality of machine learning. In particular, this paper considers the NearMiss method in detail. Results. The problem of class imbalance in the analysis of the number of incidents at railway facilities has existed since 2015. Despite the decreasing share of hazardous events at railway power supply facilities in the three years since 2018, an increase in the number of such events cannot be ruled out. Monthly statistics of hazardous event distribution exhibit no trend in declines and peaks. In this context, the optimal period of observation of the number of incidents and hazardous events is a month.
A visualization of the class ratio showed no clear boundary between the members of the majority class (incidents) and those of the minority class (hazardous events). The class ratio was studied in two and three dimensions, both in actual values and using principal component analysis. Such “proximity” of classes is one of the causes of wrong predictions. The authors also analysed past research on improving the quality of machine learning on unbalanced data. The terms that describe the degree of class imbalance have been defined and clarified. The strengths and weaknesses of 50 methods of handling such data were studied and set out. From the methods that adjust the numbers of class members, the NearMiss method was chosen for the classification (prediction of the occurrence) of rare hazardous events in railway transportation. It allows experimenting with the ratios and methods of selecting class members. As a result of a series of experiments, the accuracy of rare hazardous event classification was improved from 0 to 70-90%.
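
To illustrate the sampling idea described in this abstract, the following is a minimal sketch of one common NearMiss variant (version 1): keep the majority-class points whose mean distance to their k nearest minority-class points is smallest. The function name `near_miss_1`, the toy coordinates, and the values of `k` and `n_keep` are assumptions made for illustration; the abstract does not state which NearMiss version, class ratios or features the authors used.

```python
import math

def near_miss_1(majority, minority, n_keep, k=3):
    """Undersample the majority class, NearMiss-1 style.

    Keeps the n_keep majority points whose mean Euclidean distance
    to their k nearest minority points is smallest.
    """
    def mean_knn_distance(point):
        # Distances from this majority point to every minority point.
        dists = sorted(math.dist(point, m) for m in minority)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    ranked = sorted(majority, key=mean_knn_distance)
    return ranked[:n_keep]

# Hypothetical 2-D toy data: "incidents" are the majority class,
# "hazardous" events are the minority class.
incidents = [(0.0, 0.0), (0.5, 0.2), (3.0, 3.0),
             (5.0, 5.0), (0.9, 1.1), (4.0, 0.5)]
hazardous = [(1.0, 1.0), (1.2, 0.8)]

# Reduce the majority class to the size of the minority class (1:1 ratio).
kept = near_miss_1(incidents, hazardous, n_keep=len(hazardous), k=2)
print(kept)
```

Changing `n_keep` is one way to experiment with class ratios, in the spirit of the experiments the abstract mentions. For real work, a maintained implementation such as the one in the imbalanced-learn package would normally be preferable to this sketch.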

Hybrid features prediction model of movie quality using Multi-machine learning techniques for effective business resource planning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201844 ◽

2021 ◽

Vol 40 (5) ◽

pp. 9361-9382 ◽

Cited By ~ 1

Author(s):

Naeem Iqbal ◽

Rashid Ahmad ◽

Faisal Jamil ◽

Do-Hyeun Kim

Keyword(s):

Machine Learning ◽

Social Media ◽

Resource Planning ◽

Experimental Results ◽

Quality Prediction ◽

Classification Models ◽

Hybrid Features ◽

Social Media Data ◽

Media Data

Quality prediction plays an essential role in the business outcome of a product, and because of this business interest it has been studied extensively in the last few years. With the advent of robust and sophisticated machine learning (ML) algorithms, there is a need to analyze the factors influencing the success of movies. This paper presents a hybrid features prediction model based on pre-released and social media data features, using multiple ML techniques to predict the quality of pre-released movies for effective business resource planning. This study aims to integrate pre-released and social media data features to form a hybrid features-based movie quality prediction (MQP) model. The proposed model comprises two experimental models: (i) predicting movie quality using the original set of features and (ii) developing a subset of features based on principal component analysis to predict the movie success class. This work implements several ML-based classification models, namely Decision Tree (DT), Support Vector Machines with linear and quadratic kernels (L-SVM and Q-SVM), Logistic Regression (LR), Bagged Tree (BT) and Boosted Tree (BOT), to predict the quality of the movies. Several performance measures are used to evaluate these classification models: Accuracy (AC), Precision (PR), Recall (RE), and F-Measure (FM). The experimental results show that the BT and BOT classifiers produced higher accuracy than the other classifiers (DT, LR, L-SVM, and Q-SVM). The BT and BOT classifiers achieved accuracies of 90.1% and 89.7%, respectively, which shows the efficiency of the proposed MQP model compared to other state-of-the-art techniques. The proposed work is also compared with existing prediction models, and the experimental results indicate that the proposed MQP model performed slightly better than the other models.
The experimental results will help the movie industry to plan business resources effectively, such as investment, number of screens, and release date.
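
The four performance measures named in this abstract (Accuracy, Precision, Recall, F-Measure) can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics` and the toy labels are assumptions for illustration and are not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {"AC": accuracy, "PR": precision, "RE": recall, "FM": f_measure}

# Hypothetical labels: 1 = "successful movie", 0 = "not successful".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, accuracy alone can look high while recall on the rare class is poor, which is why the abstract reports several measures side by side.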

ROC curve, lift chart and calibration plot

Advances in Methodology and Statistics ◽

10.51936/noqf3710 ◽

2006 ◽

Vol 3 (1) ◽

Author(s):

Miha Vuk ◽

Tomaž Curk

Keyword(s):

Machine Learning ◽

Data Mining ◽

Roc Curve ◽

Classification Accuracy ◽

Empirical Evaluation ◽

Calibration Plot ◽

Mathematical Framework ◽

Classification Models ◽

Classification Quality

This paper presents the ROC curve, lift chart and calibration plot, three well-known graphical techniques that are useful for evaluating the quality of classification models used in data mining and machine learning. Each technique, normally used and studied separately, defines its own measure of classification quality and its visualization. Here, we give a brief survey of the methods and establish a common mathematical framework which adds some new aspects, explanations and interrelations between these techniques. We conclude with an empirical evaluation and a few examples of how to use the presented techniques to boost classification accuracy.
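
As a small illustration of the first of the three techniques, the sketch below builds ROC points by sweeping a threshold over classifier scores and computes the area under the curve with the trapezoid rule. It follows the standard textbook construction rather than the paper's own framework; the function names and the toy scores are assumptions. Lift charts and calibration plots are not covered here.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), starting at (0, 0) and ending at (1, 1).

    labels must be 0/1 and contain at least one of each class.
    Examples with tied scores are added to the curve together.
    """
    pairs = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve, area)
```

The resulting area equals the fraction of (positive, negative) pairs that the scores rank correctly, which is a useful sanity check on small examples.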

Data Balancing Method for Training Segmentation Neural Networks

10.51130/graphicon-2020-2-4-19 ◽

2020 ◽

pp. short19-1-short19-9

Author(s):

Alexey Kochkarev ◽

Alexander Khvostikov ◽

Dmitry Korshunov ◽

Andrey Krylov ◽

Mikhail Boguslavskiy

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Training Data ◽

Learning Ability ◽

Unbalanced Data ◽

Distance Transform ◽

Machine Learning Model ◽

Overall Performance ◽

Medical Dataset

Data imbalance is a common problem in machine learning and image processing. The lack of training data for the rarest classes can lead to worse learning ability and negatively affect the quality of segmentation. In this paper, we focus on the problem of data balancing for the task of image segmentation. We review major trends in handling unbalanced data and propose a new method for data balancing, based on the Distance Transform. This method is designed for use in segmentation convolutional neural networks (CNNs), but it is universal and can be used with any patch-based segmentation machine learning model. The evaluation of the proposed data balancing method is performed on two datasets. The first is the medical dataset LiTS, containing CT images of the liver with tumor abnormalities. The second is a geological dataset consisting of photographs of polished sections of different ores. The proposed algorithm enhances the data balance between classes and improves the overall performance of the CNN model.

ON THE EXPERT’S RIGHT TO BE PRESENT AT LEGAL PROCEEDINGS

Theory and Practice of Forensic Science and Criminalistics ◽

10.32353/khrife.2015.19 ◽

2016 ◽

Vol 15 ◽

pp. 163-171

Author(s):

M. G. Shcherbakovskiy

Keyword(s):

Initial Data ◽

Legal Proceedings ◽

Reliable Assessment ◽

Data Acquiring ◽

Selection Of

The article discusses the reasons for an expert to participate in legal proceedings. The gnoseological reason is the poor quality of the materials subject to examination, which either renders the examination completely impossible or compromises an objective, reasoned and reliable assessment of the findings. The procedural reason is the prohibition on an expert collecting evidence himself or herself. The author investigates the ways in which an expert can participate in legal proceedings. If the defense invites an expert to participate in the proceedings, it is recommended that his or her involvement take place in the presence of attesting witnesses and be recorded in the protocol. In the course of the legal proceedings an expert has the following tasks: adding initial data, acquiring new initial data, understanding the situation of the incident, and acquiring new objects to be studied, including samples for examination. An expert’s participation in legal proceedings differs from the participation of a specialist and from an examination at the scene of the incident. The author describes the tasks that an expert solves in the course of legal proceedings, the peculiarities of investigative experiment practice, and the selection of samples for an examination, inspection and interrogation.

A Literature Review Study of Software Defect Prediction using Machine Learning Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i6.286 ◽

2018 ◽

Vol 6 (6) ◽

pp. 300 ◽

Cited By ~ 3

Author(s):

Feidu Akmel ◽

Ermiyas Birihanu ◽

Bahir Siraj

Keyword(s):

Machine Learning ◽

Software Metrics ◽

Quality Standard ◽

Machine Learning Techniques ◽

Software Systems ◽

Health Care Insurance ◽

Software Defect ◽

Learning Techniques ◽

Software Product

Software systems are software products or applications that support business domains such as manufacturing, aviation, health care, insurance and so on. Software quality is a means of measuring how software is designed and how well the software conforms to that design. Some of the variables considered for software quality are correctness, product quality, scalability, completeness and absence of bugs. However, the quality standard used by one organization differs from that of another; for this reason it is better to apply software metrics to measure the quality of software. Attributes gathered from source code through software metrics can be an input for a software defect predictor. Software defects are errors introduced by software developers and stakeholders. Finally, in this study we review the applications of machine learning to software defects gathered from previous research works.

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

Pollutants in Organic Chemistry and Medicinal Chemistry Education Laboratory. Experimental and Machine Learning Studies

Current Topics in Medicinal Chemistry ◽

10.2174/1568026620666200211110043 ◽

2020 ◽

Vol 20 (9) ◽

pp. 720-730

Author(s):

Iker Montes-Bageneta ◽

Urtzi Akesolo ◽

Sara López ◽

Maria Merino ◽

Eneritz Anakabe ◽

...

Keyword(s):

Organic Chemistry ◽

Machine Learning ◽

Chemistry Education ◽

Organic Waste ◽

Computational Modelling ◽

University Education ◽

Academic Factors ◽

Academic Year ◽

Statistical Analysis Software

Aims: Computational modelling may help us to detect the more important factors governing this process in order to optimize it. Background: The generation of hazardous organic waste in teaching and research laboratories poses a big problem that universities have to manage. Methods: In this work, we report on the experimental measurement of waste generation in the chemical education laboratories within our department. We measured the waste generated in the teaching laboratories of the Organic Chemistry Department II (UPV/EHU) in the second semester of the 2017/2018 academic year. Likewise, to identify the anthropogenic and social factors related to the generation of waste, a questionnaire was used. We focused on all students of the Experimentation in Organic Chemistry (EOC) and Organic Chemistry II (OC2) subjects. The questionnaire helped us to assess their prior knowledge about waste, their awareness of the problem of separating organic waste, and the correct use of the containers. These results, together with the volumetric data, were analyzed with statistical analysis software. We obtained two Perturbation-Theory Machine Learning (PTML) models including chemical, operational, and academic factors. The dataset analyzed included 6050 cases of laboratory practices vs. practices of reference. Results: These models predict the values of acetone waste with R2 = 0.88 and non-halogenated waste with R2 = 0.91. Conclusion: This work opens the door to the implementation of more sustainable techniques and a circular economy, with the aim of improving the quality of university education processes.
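
For readers unfamiliar with the R2 values quoted in this abstract, the sketch below shows the standard coefficient of determination, R2 = 1 - SS_res / SS_tot. It is only an illustration of the metric: the function name `r_squared` and the numbers are hypothetical, and the paper's PTML models and waste measurements are not reproduced.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions are from observations.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical waste volumes (arbitrary units) and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
print(round(score, 2))
```

A value near 1 means the predictions explain most of the variance in the observations; a value of 0 corresponds to always predicting the mean.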

Mimicking Anti-Viruses with Machine Learning and Entropy Profiles

Entropy ◽

10.3390/e21050513 ◽

2019 ◽

Vol 21 (5) ◽

pp. 513 ◽

Cited By ~ 4

Author(s):

Héctor D. Menéndez ◽

José Luis Llorente

Keyword(s):

Machine Learning ◽

Classification Algorithms ◽

Security Breach

The quality of anti-virus software relies on simple patterns extracted from binary files. Although these patterns have proven to work on detecting the specifics of software, they are extremely sensitive to concealment strategies, such as polymorphism or metamorphism. These limitations also make anti-virus software predictable, creating a security breach. Any black hat with enough information about the anti-virus behaviour can make its own copy of the software, without any access to the original implementation or database. In this work, we show how this is indeed possible by combining entropy patterns with classification algorithms. Our results, applied to 57 different anti-virus engines, show that we can mimic their behaviour with an accuracy close to 98% in the best case and 75% in the worst, applied on Windows’ disk resident malware.
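
One plausible way to obtain an "entropy pattern" from a binary file is to compute the Shannon entropy of successive fixed-size byte windows. The sketch below does that; the window size, the function names and the synthetic sample bytes are assumptions, and the abstract does not specify how the authors actually construct their entropy profiles or which classifiers they feed them to.

```python
import math
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    counts = Counter(chunk)
    # Sum of p * log2(1 / p) over the byte values that occur.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def entropy_profile(data: bytes, window: int = 256) -> list:
    """Entropy of each consecutive, non-overlapping window of the data."""
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data), window)]

# Synthetic example: a constant region followed by a maximally varied one.
sample = bytes(256) + bytes(range(256))
profile = entropy_profile(sample, window=256)
print(profile)
```

A profile like this could then serve as a feature vector for a classifier; packed or encrypted regions tend to show entropy close to 8 bits per byte, while padding shows values close to 0.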

Machine learning based accident prediction in secure IoT enable transportation system

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189743 ◽

2021 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Bhabendu Kumar Mohanta ◽

Debasish Jena ◽

Niva Mohapatra ◽

Somula Ramasubbareddy ◽

Bharat S. Rawal

Keyword(s):

Machine Learning ◽

Smart City ◽

Information Exchange ◽

Secure Communication ◽

Intelligent Transportation System ◽

Transportation System ◽

Information And Communications Technology ◽

Classification Models ◽

Architecture Model ◽

Accident Severity

Smart cities have come a long way with the development of emerging technologies such as Information and Communications Technology (ICT), the Internet of Things (IoT), Machine Learning (ML), blockchain and Artificial Intelligence. The Intelligent Transportation System (ITS) is an important application in a rapidly growing smart city. Prediction of automotive accident severity plays a crucial role in a smart transportation system. The main motivation behind this research is to determine the specific features that could affect vehicle accident severity. In this paper, several classification models, specifically Logistic Regression, Artificial Neural Network, Decision Tree, K-Nearest Neighbors, and Random Forest, have been implemented for predicting accident severity. All the models have been verified, and the experimental results show that these classification models attained considerable accuracy. The paper also describes a secure communication architecture model for secure information exchange among all the components associated with the ITS. Finally, the paper implements a web-based message alert system that alerts users through smart IoT devices.

Machine-Learning-Based Radiomics MRI Model for Survival Prediction of Recurrent Glioblastomas Treated with Bevacizumab

Diagnostics ◽

10.3390/diagnostics11071263 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1263

Author(s):

Samy Ammari ◽

Raoul Sallé de Chou ◽

Tarek Assi ◽

Mehdi Touat ◽

Emilie Chouzenoux ◽

...

Keyword(s):

Machine Learning ◽

Therapeutic Option ◽

Binary Classification ◽

Progression Free Survival ◽

Recurrent Glioblastoma ◽

Machine Learning Algorithms ◽

Survival Prediction ◽

Classification Models ◽

Angiogenic Therapy ◽

Recurrent Gbm

Anti-angiogenic therapy with bevacizumab is a widely used therapeutic option for recurrent glioblastoma (GBM). Nevertheless, the therapeutic response remains highly heterogeneous among GBM patients with discordant outcomes. Recent data have shown that radiomics, an advanced recent imaging analysis method, can help to predict both prognosis and therapy in a multitude of solid tumours. The objective of this study was to identify novel biomarkers, extracted from MRI and clinical data, which could predict overall survival (OS) and progression-free survival (PFS) in GBM patients treated with bevacizumab using machine-learning algorithms. In a cohort of 194 recurrent GBM patients (age range 18–80), radiomics data from pre-treatment T2 FLAIR and gadolinium-injected MRI images along with clinical features were analysed. Binary classification models for OS at 9, 12, and 15 months were evaluated. Our classification models successfully stratified the OS. The AUCs were equal to 0.78, 0.85, and 0.76 on the test sets (0.79, 0.82, and 0.87 on the training sets) for the 9-, 12-, and 15-month endpoints, respectively. Regressions yielded a C-index of 0.64 (0.74) for OS and 0.57 (0.69) for PFS. These results suggest that radiomics could assist in the elaboration of a predictive model for treatment selection in recurrent GBM patients.
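
The C-index reported in this abstract measures how well predicted risk scores order patients by survival time. The sketch below implements a simplified form of Harrell's concordance index for right-censored data: a pair is comparable when the patient with the shorter follow-up time had an observed event, and it is concordant when that patient also has the higher predicted risk. The function name and the toy cohort are hypothetical, pairs with tied times are ignored for simplicity, and this is not the authors' implementation.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have the shorter time and an observed event.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Hypothetical cohort: survival in months, event flags, model risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.64 and 0.57 test-set values quoted above.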
