Data Calibration Based on Multisensor Using Classification Analysis: A Random Forests Approach

This paper analyzes the problem of meaningless outliers in traffic detection data sets and studies the characteristics of data from single-source detectors and multisensor detectors, based on real-time highway data. Building on an analysis of the current random forests algorithm, a fast and highly accurate learning algorithm, the paper proposes new optimized random forests for filtering outliers from the sample, which combine a bagging strategy with a boosting strategy. Random forests with different numbers of trees are applied to classify the status of meaningless outliers in traffic detection data sets, based on three traffic parameters: traffic flow, spot mean speed, and roadway occupancy rate. The results show that the optimized random forest model filters meaningless outliers more accurately in traffic detection data collected from road intersections. With filtered data as input, a transportation information system can reduce the influence of erroneous data and improve highway traffic information services.
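
The bagging half of the approach described above can be sketched in a few lines. This is a minimal illustration, not the paper's model: it uses one-split decision stumps instead of full trees, omits the boosting step entirely, and every name (`fit_stump`, `fit_forest`, `predict_forest`) and every data value is made up for the example. Each row is assumed to be a (traffic flow, spot mean speed, occupancy rate) tuple.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    values = sorted({row[feature] for row in rows})
    majority = Counter(labels).most_common(1)[0][0]
    # (errors, threshold, label at or below threshold, label above threshold)
    best = (len(rows) + 1, None, majority, majority)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return feature, best[1], best[2], best[3]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None or row[feature] <= threshold:
        return left_label
    return right_label

def fit_forest(rows, labels, n_trees, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]

# Toy data, invented for illustration: (flow, spot mean speed, occupancy rate).
rows = [(40, 90, 10), (45, 85, 12), (50, 95, 15), (55, 88, 18), (60, 92, 20), (48, 97, 14),
        (300, 250, 95), (350, 240, 98), (400, 260, 99), (320, 255, 96), (380, 245, 97), (360, 270, 100)]
labels = ["valid"] * 6 + ["outlier"] * 6

for n_trees in (5, 25):  # forests with different numbers of trees
    forest = fit_forest(rows, labels, n_trees)
    print(n_trees, predict_forest(forest, (52, 91, 13)), predict_forest(forest, (330, 250, 97)))
```

An odd number of stumps avoids tied votes in the two-class case; a real forest would grow deeper trees and sample several features per split.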

Multi-Class Assessment Based on Random Forests

Education Sciences ◽

10.3390/educsci11030092 ◽

2021 ◽

Vol 11 (3) ◽

pp. 92

Author(s):

Mehdi Berriri ◽

Sofiane Djema ◽

Gaëtan Rey ◽

Christel Dartigues-Pallez

Keyword(s):

Higher Education ◽

Machine Learning ◽

Random Forests ◽

Learning Algorithm ◽

Teaching Staff ◽

Machine Learning Algorithm ◽

Process Data ◽

Training Courses ◽

Education Courses

Today, many students enroll in higher education courses that do not suit them and end up failing. The first purpose of this study is to give counselors better knowledge so that they can offer future students courses that match their profiles. The second is to allow teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is made possible by a machine learning algorithm called Random Forest, which classifies students according to their results. We had to process data, generate models using our algorithm, and combine the results obtained to reach a better final prediction. We tested our method on different use cases, from two classes to five classes. Each set of classes partitions the students' averages, which range from 0 to 20, into intervals. An accuracy of 75% was achieved with a set of five classes, and up to 85% with sets of two and three classes.
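
As an illustration of how an average between 0 and 20 can be turned into a class label for two to five classes, here is a minimal sketch. It assumes equal-width intervals, which the abstract does not state, and the function name `average_to_class` is hypothetical.

```python
def average_to_class(average, n_classes):
    """Map an average in [0, 20] to a class index in 0..n_classes-1.

    Assumption: the intervals are equal-width; the study may use other cut points.
    """
    if not 0 <= average <= 20:
        raise ValueError("average must lie between 0 and 20")
    width = 20 / n_classes
    # An average of exactly 20 belongs to the top class, not a class of its own.
    return min(int(average // width), n_classes - 1)

for n_classes in (2, 3, 5):
    print(n_classes, [average_to_class(a, n_classes) for a in (4, 9.9, 10, 15, 20)])
```

The resulting labels are what a classifier such as Random Forest would be trained to predict.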

Evaluating the Impact of Highway-Railway Grade Crossings on Travel Time Reliability on a Highway Network Level

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198118792756 ◽

2018 ◽

Vol 2672 (10) ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Zifeng Wu ◽

Laurence R. Rilett ◽

Yifeng Chen

Keyword(s):

Travel Time ◽

Traffic Information ◽

Area Network ◽

Travel Time Reliability ◽

Highway Traffic ◽

Time Data ◽

Highway Network ◽

Grade Crossings ◽

The Impact ◽

Series Systems

Highway-rail grade crossings (HRGCs) have a range of safety and operational impacts on highway traffic networks. This paper illustrates a methodology for evaluating travel-time reliability for the routes and networks affected by trains traveling through HRGCs. A sub-area network including three HRGCs is used as the study network, and a simulation model calibrated to local traffic conditions and signal preemption strategies using field data is used as the platform to generate travel time data for analysis. Time-dependent reliability intervals for route travel time are generated based on route travel-time means and standard deviations. OD level reliability is calculated using a generic reliability engineering approach for parallel and series systems. The route travel time reliability results can be provided as real-time traffic information to assist drivers’ route-choice decisions. The OD level reliability is a way to quantify the impact of HRGCs on highway network operation. This effort fills the gap of reliability research for HRGCs on the route and sub-area network level, and contributes to improving the efficiency of decision-making for both traffic engineers and drivers.
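
The two calculations named above can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's procedure: the interval is taken as mean plus or minus `z` standard deviations (with `z = 1.96` as an assumed default), components in a series system are assumed independent, and the function names and sample numbers are made up.

```python
from math import prod
from statistics import mean, stdev

def reliability_interval(travel_times, z=1.96):
    """Interval around the mean route travel time: mean +/- z * sample std dev."""
    m = mean(travel_times)
    s = stdev(travel_times)
    return m - z * s, m + z * s

def series_reliability(reliabilities):
    """A series system works only if every component works."""
    return prod(reliabilities)

def parallel_reliability(reliabilities):
    """A parallel system fails only if every component fails."""
    return 1 - prod(1 - r for r in reliabilities)

# Hypothetical travel times (minutes) for one route in one time interval.
print(reliability_interval([10, 12, 14], z=1))
# Hypothetical OD pair served by two alternative routes, each made of two links.
route_a = series_reliability([0.95, 0.90])
route_b = series_reliability([0.85, 0.80])
print(parallel_reliability([route_a, route_b]))
```

Treating the links of a route as a series system and the alternative routes of an OD pair as a parallel system is one plausible reading of the approach; the paper may define the components differently.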

Automatic Classification of Open-Ended Questions: Check-All-That-Apply Questions

Social Science Computer Review ◽

10.1177/0894439319869210 ◽

2019 ◽

pp. 089443931986921 ◽

Cited By ~ 1

Author(s):

Matthias Schonlau ◽

Hyukjun Gweon ◽

Marika Wenemark

Keyword(s):

Learning Algorithm ◽

Civil Disobedience ◽

Automatic Classification ◽

Average Error ◽

Data Sets ◽

Statistical Machine Learning ◽

Binary Relevance ◽

Label Correlations ◽

Classifier Chains

Text data from open-ended questions in surveys are challenging to analyze and are often ignored. Open-ended questions are important, though, because they do not constrain respondents’ answers. Where open-ended questions are necessary, often human coders manually code answers. When data sets are large, it is impractical or too costly to manually code all answer texts. Instead, text answers can be converted into numerical variables, and a statistical/machine learning algorithm can be trained on a subset of manually coded data. This statistical model is then used to predict the codes of the remainder. We consider open-ended questions where the answers are coded into multiple labels (all-that-apply questions). For example, in the open-ended question in our Happy example respondents are explicitly told they may list multiple things that make them happy. Algorithms for multilabel data take into account the correlation among the answer codes and may therefore give better prediction results. For example, when giving examples of civil disobedience, respondents talking about “minor nonviolent offenses” were also likely to talk about “crimes.” We compare the performance of two different multilabel algorithms (random k-labelsets [RAKEL], classifier chains [CC]) to the default method of binary relevance (BR) which applies single-label algorithms to each code separately. Performance is evaluated on data from three open-ended questions (Happy, Civil Disobedience, and Immigrant). We found weak bivariate label correlations in the Happy data (90th percentile: 7.6%), and stronger bivariate label correlations in the Civil Disobedience (90th percentile: 17.2%) and Immigrant (90th percentile: 19.2%) data. For the data with stronger correlations, we found both multilabel methods performed substantially better than BR using 0/1 loss (“at least one label is incorrect”) but made little difference when using Hamming loss (average error). For data with weak label correlations, we found no difference in performance between multilabel methods and BR. We conclude that automatic classification of open-ended questions that allow multiple answers may benefit from using multilabel algorithms for 0/1 loss. The degree of correlations among the labels may be a useful prognostic tool.
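
The difference between the two loss functions can be made concrete with a short sketch. The label matrices below are invented for illustration (one row per answer, one 0/1 column per code), and the function names are hypothetical.

```python
def zero_one_loss(y_true, y_pred):
    """Share of answers with at least one incorrect label."""
    wrong = sum(tuple(t) != tuple(p) for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def hamming_loss(y_true, y_pred):
    """Share of individual labels that are incorrect (average error)."""
    total = sum(len(t) for t in y_true)
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / total

# Three answers, three codes each; values are made up.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]

print(zero_one_loss(y_true, y_pred))  # two of three answers contain an error
print(hamming_loss(y_true, y_pred))   # two of nine labels are wrong
```

Because 0/1 loss penalizes an answer as soon as any one label is wrong, it rewards methods that get whole label combinations right, which is where modeling label correlations can help; Hamming loss scores each label on its own.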

Weather Prediction using Machine Learning and IOT

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d9130.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2094-2098

Keyword(s):

Machine Learning ◽

Weather Forecasting ◽

Learning Algorithm ◽

Weather Prediction ◽

Weather Conditions ◽

Data Sets ◽

Machine Learning Algorithm ◽

Time Data ◽

Weather Parameters ◽

Set Up

This project proposes a method for forecasting weather conditions and predicting rainfall by means of machine learning. There are two setups: one measures weather parameters such as temperature and humidity using sensors connected to an Arduino, and the other displays the current values (status) and the predicted rainfall based on the trained machine learning data sets. Forecasting and prediction are based on previously collected datasets, which are compared with the current values. The user need not keep a backup of huge amounts of data to predict rainfall; a machine learning algorithm suffices instead. Temperature and humidity sensor modules measure the weather parameters and are interfaced to an Arduino controller. The proposed setup compares the forecast value with real-time data and then predicts rainfall based on the dataset fed to the machine learning algorithm.

Comparative Analysis of Random Forests with Statistical and Machine Learning Methods in Predicting Fault-Prone Classes

Cross-Disciplinary Applications of Artificial Intelligence and Pattern Recognition - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-61350-429-1.ch023 ◽

2012 ◽

pp. 428-449 ◽

Cited By ~ 1

Author(s):

Ruchika Malhotra ◽

Arvinder Kaur ◽

Yogesh Singh

Keyword(s):

Machine Learning ◽

Random Forests ◽

Data Sets ◽

Classification Problems ◽

Learning Methods ◽

Machine Learning Methods ◽

Code Metrics ◽

Machine Learning Models ◽

Better Than

Metrics are available for predicting fault-prone classes, which may help software organizations plan and perform testing activities by properly allocating resources to the fault-prone parts of the design and code. The importance and usefulness of such metrics is therefore clear, but their empirical validation is always a great challenge. The Random Forest (RF) algorithm has been successfully applied to regression and classification problems in many applications. In this work, the authors predict faulty classes/modules using object-oriented metrics and static code metrics. This chapter evaluates the capability of the RF algorithm and compares its performance with nine statistical and machine learning methods in predicting fault-prone software classes. The authors applied RF to six case studies based on open source software, commercial software, and NASA data sets. The results indicate that the prediction performance of RF is generally better than that of the statistical and machine learning models. Further, the classification of faulty classes/modules using the RF method is better than that of the other methods on most of the data sets.

Aspect based feature extraction and sentiment classification of review data sets using Incremental machine learning algorithm

2017 Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB) ◽

10.1109/aeeicb.2017.7972395 ◽

2017 ◽

Cited By ~ 4

Author(s):

Rajalaxmi Hegde ◽

Seema S.

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Learning Algorithm ◽

Sentiment Classification ◽

Data Sets ◽

Machine Learning Algorithm

Optimal Classification of Hypersonic Inlet Start/Unstart Based on Manifold Learning

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.274.200 ◽

2013 ◽

Vol 274 ◽

pp. 200-203

Author(s):

Ri Sheng Zheng ◽

Jun Tao Chang ◽

Hui Xin He ◽

Fu Chen

Keyword(s):

Manifold Learning ◽

Learning Algorithm ◽

Operation Mode ◽

High Dimensional ◽

Classification Criterion ◽

Decision Tree Algorithm ◽

Time Data ◽

Hypersonic Inlet ◽

Optimal Classification

Inlet start/unstart detection has been a focus of hypersonic inlet research, and detecting the operation mode of the inlet is a prerequisite for unstart protection control of a scramjet. In practice, computational complexity and high-dimensional discrete experimental data both hinder the classification of real-time data. To solve this problem, first, a 2-D wind tunnel experiment is carried out and the inlet start/unstart phenomena are analyzed; second, the Isomap algorithm is introduced to reduce the dimensionality of the data, and the optimal classification method is obtained with a weighted embedded manifold learning algorithm; finally, the superiority of the classification criterion is verified with a decision tree algorithm.

Computer vision for automatic detection and classification of fabric defect employing deep learning algorithm

International Journal of Clothing Science and Technology ◽

10.1108/ijcst-11-2018-0135 ◽

2019 ◽

Vol 31 (4) ◽

pp. 510-521 ◽

Cited By ~ 12

Author(s):

Pandia Rajan Jeyaraj ◽

Edward Rajan Samuel Nadar

Keyword(s):

Defect Detection ◽

Classification Accuracy ◽

Learning Algorithm ◽

Data Sets ◽

Defect Classification ◽

Content Type ◽

Deep Learning Algorithm ◽

Average Accuracy ◽

Fabric Defect

Purpose: The purpose of this paper is to focus on the design and development of computer-aided fabric defect detection and classification employing an advanced learning algorithm. Design/methodology/approach: To make a fast and effective classification of fabric defects, the authors have considered a characteristic of texture, namely its colour. A deep convolutional neural network is formed to learn from the training phase of various defect data sets. In the testing phase, the authors have utilised a learning feature for defect classification. Findings: The improvement in the defect classification accuracy has been achieved by employing a deep learning algorithm. The authors have tested the defect classification accuracy on six different fabric materials and have obtained an average accuracy of 96.55 per cent with 96.4 per cent sensitivity and 0.94 success rate. Practical implications: The authors evaluated the method by using 20 different data sets collected from different raw fabrics. The authors have also tested the algorithm on a standard data set provided by the Ministry of Textile. In the testing task, the authors have obtained an average accuracy of 94.85 per cent, with six defects being successfully recognised by the proposed algorithm. Originality/value: The quantitative value of the performance index shows the effectiveness of the developed classification algorithm. Moreover, the computational time for different fabric processing was presented to verify the computational range of the proposed algorithm against conventional fabric processing techniques. Hence, the proposed computer vision-based fabric defect detection system can be used for accurate defect detection and as a computer-aided analysis system.
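
For readers unfamiliar with the reported metrics, accuracy and sensitivity are conventionally computed from confusion-matrix counts as in the sketch below. The counts are invented and are not the paper's data; the abstract does not define its "success rate", so that metric is left out.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Share of truly defective samples that were flagged as defective."""
    return tp / (tp + fn)

# Hypothetical counts for a defect / no-defect test set of 100 samples.
tp, tn, fp, fn = 45, 50, 3, 2
print(accuracy(tp, tn, fp, fn))
print(sensitivity(tp, fn))
```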

Unified classification of mouse retinal ganglion cells using function, morphology, and gene expression

10.1101/2021.06.10.447922 ◽

2021 ◽

Author(s):

Jilian Goetz ◽

Zachary F Jessen ◽

Anne Jacobi ◽

Adam Mani ◽

Sam Cooler ◽

...

Keyword(s):

Retinal Ganglion Cells ◽

Learning Algorithm ◽

Ganglion Cells ◽

Open Data ◽

Data Sets ◽

Retinal Ganglion ◽

Transcriptomic Data ◽

Molecular Features ◽

Neuronal Classification

Classification and characterization of neuronal types is critical for understanding their function and dysfunction. Neuronal classification schemes typically rely on measurements of electrophysiological, morphological and molecular features, but aligning these data sets has been challenging. Here, we present a unified classification of retinal ganglion cells (RGCs), the sole retinal output neurons. We used visually-evoked responses to classify 1777 mouse RGCs into 42 types. We also obtained morphological or transcriptomic data from subsets and used these measurements to align the functional classification to publicly available morphological and transcriptomic data sets. We created an online database that allows users to browse or download the data and to classify RGCs from their light responses using a machine-learning algorithm. This work provides a resource for studies of RGCs, their upstream circuits in the retina, and their projections in the brain, and establishes a framework for future efforts in neuronal classification and open data distribution.

Review for "Smart grid security enhancement by detection and classification of non‐technical losses employing deep learning algorithm"

10.1002/2050-7038.12521/v1/review4 ◽

2020 ◽

Keyword(s):

Deep Learning ◽

Smart Grid ◽

Learning Algorithm ◽

Grid Security ◽

Security Enhancement ◽

Smart Grid Security ◽

Deep Learning Algorithm
