Data Calibration Based on Multisensor Using Classification Analysis: A Random Forests Approach

2015, Vol 2015, pp. 1-8
Author(s): Xue Xing, Dexin Yu, Wei Zhang

This paper analyzes the problem of meaningless outliers in traffic detection data sets and examines the characteristics of data from monophyletic detectors and multisensor detectors, based on real-time highway data. Building on an analysis of the current random forests algorithm, a learning algorithm with high accuracy and fast speed, a new optimized random forests model for filtering outliers in the sample is proposed, which combines a bagging strategy with a boosting strategy. Random forests with different numbers of trees are applied to classify the status of meaningless outliers in traffic detection data sets, based respectively on the traffic parameters of traffic flow, spot mean speed, and roadway occupancy rate. The results show that the optimized random forest model is more accurate at filtering meaningless outliers in traffic detection data collected from road intersections. With filtered data for processing, a transportation information system can reduce the influence of erroneous data and improve highway traffic information services.
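The bagging idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' optimized model: it bags one-split decision stumps instead of full trees, it omits both the random feature selection of a true random forest and the boosting step, and all records, names, and thresholds are hypothetical.

```python
import random
from collections import Counter


def majority(labels, default=None):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels):
    """Fit a one-split tree: the (feature, threshold) with the fewest training errors."""
    overall = majority(labels)
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = majority(left, overall)
            right_label = majority(right, overall)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_bagged(rows, labels, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_bagged(forest, row):
    """Majority vote over all stumps in the ensemble."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical records: (flow, spot mean speed in km/h, occupancy in %).
# The "outlier" records carry implausibly high speeds.
valid = [(10 + 20 * i, 60 + 5 * i, 10 + 2 * i) for i in range(10)]
outliers = [(20 + 20 * i, 250 + 10 * i, 11 + 2 * i) for i in range(10)]
rows = valid + outliers
labels = ["valid"] * len(valid) + ["outlier"] * len(outliers)

forest = fit_bagged(rows, labels)
print(predict_bagged(forest, (120, 300, 15)))
print(predict_bagged(forest, (120, 50, 15)))
```

Varying `n_trees` mirrors the abstract's comparison of forests with different numbers of trees, although on toy data this clean the vote is unlikely to change.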

2021, Vol 11 (3), pp. 92
Author(s): Mehdi Berriri, Sofiane Djema, Gaëtan Rey, Christel Dartigues-Pallez

Today, many students move towards higher education courses that do not suit them and end up failing. The purpose of this study is to give counselors better knowledge so that they can offer future students courses that match their profile. The second objective is to allow the teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is possible thanks to a machine learning algorithm called Random Forest, which classifies the students according to their results. We had to process data, generate models using our algorithm, and cross-reference the results obtained to reach a better final prediction. We tested our method on different use cases, from two classes to five classes. Each set of classes divides the students' average, which ranges from 0 to 20, into intervals. An accuracy of 75% was achieved with a set of five classes, and up to 85% for sets of two and three classes.
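To make the class definition concrete, the sketch below maps an average on the 0 to 20 scale to a class index. The abstract does not give the interval boundaries, so equal-width intervals are an assumption here, and `grade_class` is a hypothetical helper name.

```python
def grade_class(average, n_classes, max_grade=20.0):
    """Map an average in [0, max_grade] to a class index in 0..n_classes-1.

    Assumes equal-width intervals, which the source does not confirm.
    """
    if not 0 <= average <= max_grade:
        raise ValueError("average must lie between 0 and max_grade")
    width = max_grade / n_classes
    # The top grade would fall just past the last interval, so clamp it.
    return min(int(average // width), n_classes - 1)


print(grade_class(9.9, 2))   # below the midpoint of a two-class split
print(grade_class(7.5, 5))   # second interval of a five-class split
```

With more classes each interval is narrower, which is consistent with the lower accuracy the abstract reports for the five-class case.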


Author(s): Zifeng Wu, Laurence R. Rilett, Yifeng Chen

Highway-rail grade crossings (HRGCs) have a range of safety and operational impacts on highway traffic networks. This paper illustrates a methodology for evaluating travel-time reliability for the routes and networks affected by trains traveling through HRGCs. A sub-area network including three HRGCs is used as the study network, and a simulation model calibrated to local traffic conditions and signal preemption strategies using field data is used as the platform to generate travel time data for analysis. Time-dependent reliability intervals for route travel time are generated based on route travel-time means and standard deviations. OD level reliability is calculated using a generic reliability engineering approach for parallel and series systems. The route travel time reliability results can be provided as real-time traffic information to assist drivers’ route-choice decisions. The OD level reliability is a way to quantify the impact of HRGCs on highway network operation. This effort fills the gap of reliability research for HRGCs on the route and sub-area network level, and contributes to improving the efficiency of decision-making for both traffic engineers and drivers.
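The series and parallel reliability calculation mentioned above can be sketched with the textbook formulas. This assumes independent components, treats the links of one route as a series system and the alternative routes of an OD pair as a parallel system, and uses hypothetical reliability values; the paper's actual network mapping may differ.

```python
from math import prod


def series_reliability(reliabilities):
    """A series system works only if every component works."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """A parallel system fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical OD pair with two alternative routes.
route_a = series_reliability([0.95, 0.90])  # two links in sequence
route_b = series_reliability([0.80])        # a single-link route
od_reliability = parallel_reliability([route_a, route_b])

print(round(route_a, 3), round(od_reliability, 3))
```

Under these assumptions the OD-level value is never lower than that of its most reliable route, since each extra alternative can only reduce the chance that all routes fail together.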


2019, pp. 089443931986921
Author(s): Matthias Schonlau, Hyukjun Gweon, Marika Wenemark

Text data from open-ended questions in surveys are challenging to analyze and are often ignored. Open-ended questions are important, though, because they do not constrain respondents’ answers. Where open-ended questions are necessary, human coders often code the answers manually. When data sets are large, it is impractical or too costly to manually code all answer texts. Instead, text answers can be converted into numerical variables, and a statistical/machine learning algorithm can be trained on a subset of manually coded data. This statistical model is then used to predict the codes of the remainder. We consider open-ended questions where the answers are coded into multiple labels (all-that-apply questions). For example, in the open-ended question in our Happy example, respondents are explicitly told they may list multiple things that make them happy. Algorithms for multilabel data take into account the correlation among the answer codes and may therefore give better prediction results. For example, when giving examples of civil disobedience, respondents talking about “minor nonviolent offenses” were also likely to talk about “crimes.” We compare the performance of two different multilabel algorithms (random k-labelsets [RAKEL], classifier chains [CC]) to the default method of binary relevance (BR), which applies single-label algorithms to each code separately. Performance is evaluated on data from three open-ended questions (Happy, Civil Disobedience, and Immigrant). We found weak bivariate label correlations in the Happy data (90th percentile: 7.6%), and stronger bivariate label correlations in the Civil Disobedience (90th percentile: 17.2%) and Immigrant (90th percentile: 19.2%) data. For the data with stronger correlations, we found that both multilabel methods performed substantially better than BR under 0/1 loss (“at least one label is incorrect”) but made little difference under Hamming loss (average error). For data with weak label correlations, we found no difference in performance between multilabel methods and BR. We conclude that automatic classification of open-ended questions that allow multiple answers may benefit from using multilabel algorithms for 0/1 loss. The degree of correlations among the labels may be a useful prognostic tool.
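The two evaluation measures named in this abstract can be computed as in the sketch below. The label matrices are hypothetical: each row is one answer text, and each column is one code (1 if the code applies).

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual labels predicted incorrectly (average error)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def zero_one_loss(y_true, y_pred):
    """Fraction of answers with at least one incorrect label."""
    wrong_rows = sum(true_row != pred_row
                     for true_row, pred_row in zip(y_true, y_pred))
    return wrong_rows / len(y_true)


# Hypothetical codes for four answers and three labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(hamming_loss(y_true, y_pred))   # 2 wrong labels out of 12
print(zero_one_loss(y_true, y_pred))  # 2 wrong answers out of 4
```

The same two label errors cost far more under 0/1 loss than under Hamming loss, which suggests why a method that gets whole label sets right can stand out on one measure and not the other.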


This project proposes a method for forecasting weather conditions and predicting rainfall by means of machine learning. There are two setups: one measures weather parameters such as temperature and humidity using sensors connected to an Arduino, and the other displays the current values (status) and the rainfall predicted from the trained machine learning data sets. Weather forecasting and prediction are based on older collected datasets, which are compared with the current values. The user need not keep a backup of huge amounts of data to predict rainfall; a machine learning algorithm suffices instead. Temperature and humidity sensor modules measure the weather parameters and are interfaced to an Arduino controller. The proposed setup compares the forecast value with real-time data and predicts rainfall based on the dataset fed to the machine learning algorithm.


Author(s): Ruchika Malhotra, Arvinder Kaur, Yogesh Singh

Metrics are available for predicting fault-prone classes, which may help software organizations plan and perform testing activities by allocating resources to the fault-prone parts of the design and code of the software. The importance and usefulness of such metrics is therefore understandable, but empirical validation of these metrics is always a great challenge. The Random Forest (RF) algorithm has been successfully applied to regression and classification problems in many applications. In this work, the authors predict faulty classes/modules using object-oriented metrics and static code metrics. This chapter evaluates the capability of the RF algorithm and compares its performance with nine statistical and machine learning methods in predicting fault-prone software classes. The authors applied RF to six case studies based on open source software, commercial software, and NASA data sets. The results indicate that the prediction performance of RF is generally better than that of the statistical and machine learning models. Further, the classification of faulty classes/modules using the RF method is better than the other methods in most of the data sets.


2013, Vol 274, pp. 200-203
Author(s): Ri Sheng Zheng, Jun Tao Chang, Hui Xin He, Fu Chen

Inlet start/unstart detection has been a focus of hypersonic inlet research, because detecting the operation mode of the inlet is the prerequisite for unstart protection control of a scramjet. In practice, computational complexity and high-dimensional discrete experimental data both work against real-time classification. To solve this problem, first, a 2-D wind tunnel experiment is carried out and the inlet start/unstart phenomena are analyzed. Second, the Isomap algorithm is introduced to reduce the dimensionality of the data, and the optimal classification method is obtained with the weighted embedded manifold learning algorithm. Finally, the superiority of the classification criterion is verified with a decision tree algorithm.


2019, Vol 31 (4), pp. 510-521
Author(s): Pandia Rajan Jeyaraj, Edward Rajan Samuel Nadar

Purpose: The purpose of this paper is to focus on the design and development of computer-aided fabric defect detection and classification employing an advanced learning algorithm.
Design/methodology/approach: To make a fast and effective classification of fabric defects, the authors have considered a characteristic of texture, namely its colour. A deep convolutional neural network is formed to learn from the training phase of various defect data sets. In the testing phase, the authors have utilised a learning feature for defect classification.
Findings: The improvement in defect classification accuracy has been achieved by employing a deep learning algorithm. The authors have tested the defect classification accuracy on six different fabric materials and have obtained an average accuracy of 96.55 per cent with 96.4 per cent sensitivity and 0.94 success rate.
Practical implications: The authors evaluated the method using 20 different data sets collected from different raw fabrics. The authors also tested the algorithm on a standard data set provided by the Ministry of Textile. In the testing task, the authors obtained an average accuracy of 94.85 per cent, with six defects being successfully recognised by the proposed algorithm.
Originality/value: The quantitative value of the performance index shows the effectiveness of the developed classification algorithm. Moreover, the computational time for different fabric processing was presented to compare the computational range of the proposed algorithm with conventional fabric processing techniques. Hence, the proposed computer vision-based fabric defect detection system can be used for accurate defect detection and computer-aided analysis.
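For readers unfamiliar with the two measures reported in the findings, the sketch below shows the standard definitions of accuracy and sensitivity for a binary defect/no-defect decision. The counts are hypothetical and are not taken from the paper, and the abstract's "success rate" is not defined there, so it is not reproduced here.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Share of truly defective samples that were flagged as defective."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts for 100 fabric samples.
tp, tn, fp, fn = 45, 50, 2, 3

print(accuracy(tp, tn, fp, fn))
print(sensitivity(tp, fn))
```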


2021
Author(s): Jilian Goetz, Zachary F Jessen, Anne Jacobi, Adam Mani, Sam Cooler, ...

Classification and characterization of neuronal types is critical for understanding their function and dysfunction. Neuronal classification schemes typically rely on measurements of electrophysiological, morphological and molecular features, but aligning these data sets has been challenging. Here, we present a unified classification of retinal ganglion cells (RGCs), the sole retinal output neurons. We used visually-evoked responses to classify 1777 mouse RGCs into 42 types. We also obtained morphological or transcriptomic data from subsets and used these measurements to align the functional classification to publicly available morphological and transcriptomic data sets. We created an online database that allows users to browse or download the data and to classify RGCs from their light responses using a machine-learning algorithm. This work provides a resource for studies of RGCs, their upstream circuits in the retina, and their projections in the brain, and establishes a framework for future efforts in neuronal classification and open data distribution.

