Cost-sensitive sparse linear regression for crowd counting with imbalanced training data

Author(s): Xiaolin Huang, Yuexian Zou, Yi Wang


2021, Vol 11 (6), pp. 2866
Author(s): Damheo Lee, Donghyun Kim, Seung Yun, Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, because the English pronunciation of Korean speakers exhibits characteristic phonetic variations, we built a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counteract the bias toward Korean caused by the imbalanced training data. The training data were AI Hub (1033 h) in Korean and LibriSpeech (960 h) in English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were used for LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. These results indicate that the proposed method is effective for CS speech recognition.
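The abstract does not specify how semantic similarity to the target domain is measured, so the following is only a minimal sketch of that selection step, assuming a TF-IDF cosine-similarity criterion; the function name and `top_k` parameter are illustrative.

```python
# Hypothetical sketch: selecting code-switching (CS) sentences that are
# semantically close to a target domain, to be used for LM adaptation.
# The TF-IDF cosine-similarity measure is an assumption, not the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_similar_sentences(cs_sentences, target_sentences, top_k=1000):
    """Return the top_k CS sentences most similar to the target-domain text."""
    vectorizer = TfidfVectorizer()
    # Fit on both pools so they share one vocabulary.
    vectorizer.fit(cs_sentences + target_sentences)
    cs_vecs = vectorizer.transform(cs_sentences)
    target_vecs = vectorizer.transform(target_sentences)
    # Score each CS sentence by its best match against any target-domain sentence.
    scores = cosine_similarity(cs_vecs, target_vecs).max(axis=1)
    ranked = sorted(zip(scores, cs_sentences), key=lambda p: p[0], reverse=True)
    return [sentence for _, sentence in ranked[:top_k]]
```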


Author(s): Wael H. Awad, Bruce N. Janson

Three different modeling approaches were applied to explain truck accidents at interchanges in Washington State during a 27-month period. Three models were developed for each ramp type: linear regression, neural networks, and a hybrid system combining fuzzy logic and neural networks. The study showed that linear regression was able to predict accident frequencies that fell within one standard deviation of the overall mean of the dependent variable; however, the coefficient of determination was very low in all cases. The two artificial intelligence (AI) approaches performed well at identifying different patterns of accidents in the training data and fit the data better than the regression model. However, their ability to predict test data not included in the training process was unsatisfactory.
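As an illustration of the regression baseline and the in-sample versus out-of-sample gap described above, here is a minimal sketch on synthetic data; the feature set (e.g., traffic volume, ramp length, grade) and all numbers are hypothetical, not the study's data.

```python
# Illustrative only: fit a linear regression to synthetic accident-frequency data
# and report training R^2 versus error on held-out observations.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # hypothetical features: volume, ramp length, grade
y = 2.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=1.5, size=200)  # synthetic frequencies

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("training R^2:", r2_score(y_train, model.predict(X_train)))
print("test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```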


2012, Vol 51 (01), pp. 39-44
Author(s): K. Matsuoka, K. Yoshino

Summary
Objectives: The aim of this study is to present a method of assessing psychological tension that is optimized for each individual on the basis of heart rate variability (HRV) data measured over a long period of daily life, so as to eliminate the influence of inter-individual variability.
Methods: HRV and body accelerations were recorded from nine normal subjects over two months of normal daily life. Fourteen HRV indices were calculated from the HRV data in the 512 seconds preceding each mental tension level report. The analysis was limited to data with body accelerations of 30 mG (0.294 m/s²) or lower. For both the mental tension score (Δtension) and the HRV index values (ΔHRVI), the differences from the reference values in the same time zone were calculated. A multiple linear regression model estimating Δtension from the principal component scores of ΔHRVI was then constructed for each individual. The data were divided into a training set and a test set according to twofold cross-validation: the regression coefficients were determined on the training set, and the generalization capability of the optimized model was checked on the test set.
Results: The subjects' mean Pearson correlation coefficient was 0.52 on the training set and 0.40 on the test set. The subjects' mean coefficient of determination was 0.28 on the training set and 0.11 on the test set.
Conclusion: We proposed a method of assessing psychological tension that is optimized for each individual based on HRV data measured over a long period of daily life.
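A minimal per-subject sketch of the described pipeline follows, assuming the 14 HRV-index differences (ΔHRVI) and tension-score differences (Δtension) have already been computed; the number of principal components retained is an assumption, and the data below are synthetic stand-ins.

```python
# Sketch: PCA scores of the HRV-index differences -> multiple linear regression,
# evaluated by twofold cross-validation with the Pearson correlation on test folds.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def fit_and_evaluate(d_hrvi, d_tension, n_components=5):
    """Return the mean test-fold Pearson correlation over a twofold split."""
    corrs = []
    for train_idx, test_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(d_hrvi):
        pca = PCA(n_components=n_components).fit(d_hrvi[train_idx])
        reg = LinearRegression().fit(pca.transform(d_hrvi[train_idx]), d_tension[train_idx])
        pred = reg.predict(pca.transform(d_hrvi[test_idx]))
        corrs.append(np.corrcoef(pred, d_tension[test_idx])[0, 1])
    return float(np.mean(corrs))

# Synthetic stand-in data: 200 reports x 14 HRV indices for one subject.
rng = np.random.default_rng(1)
d_hrvi = rng.normal(size=(200, 14))
d_tension = d_hrvi[:, 0] - 0.5 * d_hrvi[:, 1] + rng.normal(scale=0.5, size=200)
print(fit_and_evaluate(d_hrvi, d_tension))
```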


2020, Vol 4 (1), pp. 60-72
Author(s): Wen Li, Wei Wang, Wenjun Huo

Purpose: Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm, RegBoost, that uses multivariate linear regression as its weak predictor.
Design/methodology/approach: To achieve nonlinearity after combining all the linear predictors, the training data are divided into two branches according to the prediction results of the current weak predictor, and linear regression modeling is executed recursively in both branches. In the test phase, each test sample is routed to a specific branch to continue with the next weak predictor; the final result is the sum of all weak predictors along the entire path.
Findings: Comparison experiments show that RegBoost can achieve performance similar to that of the gradient boosted decision tree (GBDT) and is far more effective than plain linear regression.
Originality/value: This paper designs a novel regression algorithm, RegBoost, with reference to GBDT. To the best of our knowledge, RegBoost is the first algorithm to use linear regression as the weak predictor and combine it with gradient boosting to build an ensemble.
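The following is a rough sketch of the RegBoost idea as described above, not the authors' implementation: each node fits a linear regression to the current residuals, splits its samples into two branches, and recurses; predictions sum the weak predictors along each sample's path. The split rule (thresholding the node's own predictions at their median, so it is applicable to test samples), the depth, and the minimum branch size are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class RegBoostNode:
    """One linear weak predictor; children refine the residuals of its two branches."""

    def __init__(self, depth=0, max_depth=4, min_samples=20):
        self.depth, self.max_depth, self.min_samples = depth, max_depth, min_samples
        self.model = None
        self.threshold = None
        self.low = self.high = None

    def fit(self, X, y):
        self.model = LinearRegression().fit(X, y)
        pred = self.model.predict(X)
        residual = y - pred
        # Assumed split rule: branch on whether the current prediction is below
        # or above its median, so the same rule can route unseen test samples.
        self.threshold = np.median(pred)
        mask = pred <= self.threshold
        if self.depth + 1 < self.max_depth:
            if mask.sum() >= self.min_samples:
                self.low = RegBoostNode(self.depth + 1, self.max_depth,
                                        self.min_samples).fit(X[mask], residual[mask])
            if (~mask).sum() >= self.min_samples:
                self.high = RegBoostNode(self.depth + 1, self.max_depth,
                                         self.min_samples).fit(X[~mask], residual[~mask])
        return self

    def predict(self, X):
        pred = self.model.predict(X)
        out = pred.copy()
        mask = pred <= self.threshold
        # Route each sample to one branch and add that branch's residual prediction,
        # so the result is the sum of weak predictors along the sample's path.
        if self.low is not None and mask.any():
            out[mask] += self.low.predict(X[mask])
        if self.high is not None and (~mask).any():
            out[~mask] += self.high.predict(X[~mask])
        return out

# Example usage on synthetic nonlinear data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
model = RegBoostNode().fit(X, y)
print("train MSE:", np.mean((model.predict(X) - y) ** 2))
```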


2020, Vol 124, pp. 103611
Author(s): Elias Martins Guerra Prado, Carlos Roberto de Souza Filho, Emmanuel John M. Carranza, João Gabriel Motta

2020, Vol 11
Author(s): Joram Soch

When predicting a subject-level variable (e.g., age in years) from measured biological data (e.g., structural MRI scans), the decoding algorithm does not always preserve the distribution of the variable to be predicted. In such a situation, distributional transformation (DT), i.e., mapping the predicted values onto the variable's distribution in the training data, might improve decoding accuracy. Here, we tested the potential of DT within the 2019 Predictive Analytics Competition (PAC), which aimed at predicting the chronological age of adult human subjects from structural MRI data. In a low-dimensional setting, i.e., with fewer features than observations, we applied multiple linear regression, support vector regression, and deep neural networks for out-of-sample prediction of subject age. We found that (i) when the number of features is low, no method outperforms linear regression; and (ii) except when using deep regression, distributional transformation increases decoding performance, reducing the mean absolute error (MAE) by about half a year. We conclude that DT can be advantageous when predicting variables that are not controlled but have an underlying distribution in healthy or diseased populations.
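A minimal sketch of distributional transformation as described above: each predicted value is replaced by the training-target value at the matching empirical quantile. Rank-based quantile matching is one common way to realize DT; the exact interpolation scheme used in the paper may differ.

```python
# Map predictions onto the empirical distribution of the training targets.
import numpy as np

def distributional_transformation(y_pred, y_train):
    """Quantile-match predicted values to the training-set target distribution."""
    ranks = np.argsort(np.argsort(y_pred))        # rank of each prediction (0..n-1)
    quantiles = (ranks + 0.5) / len(y_pred)       # its empirical quantile in (0, 1)
    return np.quantile(y_train, quantiles)        # matching training-set value

# Example: predictions compressed toward the mean get stretched back out
# toward the training distribution (all numbers are illustrative).
y_train = np.array([20, 25, 30, 35, 40, 60, 70, 80.0])  # training ages
y_pred = np.array([38, 41, 44, 47.0])                   # over-smoothed predictions
print(distributional_transformation(y_pred, y_train))
```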


Author(s): Bing Li, Casey Jones, Vikas Tomar

Abstract
This work focuses on the use of linear regression analysis-based machine learning to predict the end of discharge of a commercial prismatic lithium-ion (Li-ion) cell. The cell temperature was recorded during cycling, and the relation between the open circuit voltage and cell temperature was used to develop the linear regression-based machine learning algorithm, with the peak temperature selected as the indicator of battery end of discharge. A battery management system built around a pyboard microcontroller was constructed to monitor the temperature of the cell under test and to control a MOSFET that acted as a switch to disconnect the cell from the circuit. The method used an initial 10 charge and discharge cycles at a rate of 1C as the training data and another charge and discharge cycle as the testing data. During the test cycle, the discharge was continued beyond the cutoff voltage to initiate an overdischarge while the cell temperature was continuously monitored. The experiment was performed on three different cells, and the overdischarge for each was contained within 0.1 V of the cutoff voltage. These results show that a linear regression-based analysis can detect an overdischarge condition of a cell based on the anticipated peak temperature during discharge.
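The abstract does not spell out the exact regression targets, so the following is an illustrative sketch only: it assumes the peak discharge temperature is regressed on a voltage feature from the training cycles and then used as the cutoff threshold for the disconnect switch. All data values, the function name, and the single-feature choice are hypothetical.

```python
# Illustrative sketch of a peak-temperature cutoff learned by linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data from 10 cycles at 1C: open-circuit voltage at the
# start of discharge and the peak cell temperature observed during that discharge.
ocv_start = np.array([[4.18], [4.17], [4.19], [4.18], [4.16],
                      [4.17], [4.18], [4.19], [4.17], [4.18]])
peak_temp = np.array([33.1, 33.4, 32.9, 33.2, 33.6, 33.3, 33.0, 32.8, 33.5, 33.2])

model = LinearRegression().fit(ocv_start, peak_temp)

def should_disconnect(current_temp_c, ocv_v):
    """Open the MOSFET switch once the cell reaches its predicted peak temperature."""
    predicted_peak = model.predict([[ocv_v]])[0]
    return current_temp_c >= predicted_peak

print(should_disconnect(33.4, 4.18))
```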


2012, Vol 433-440, pp. 7479-7486
Author(s): Rui Kong, Qiong Wang, Gu Yu Hu, Zhi Song Pan

Support Vector Machines (SVMs) have been extensively studied and have shown remarkable success in many applications. However, their success is very limited when they are applied to the problem of learning from imbalanced datasets, in which negative instances heavily outnumber the positive instances (e.g., in medical diagnosis and credit card fraud detection). In this paper, we propose FASVM, a fuzzy asymmetric algorithm that augments SVMs for imbalanced training data by combining fuzzy memberships with the different error costs (DEC) algorithm. We compare the performance of our algorithm against the DEC algorithm and the regular SVM and show that FASVM outperforms both.
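This is not the authors' FASVM implementation, but a sketch of the two ingredients it combines on a toy imbalanced dataset: asymmetric per-class error costs (DEC-style, via `class_weight`) and per-sample fuzzy memberships (via `sample_weight`). The membership function used here (distance to the class mean) is an assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced toy data: 500 negatives, 25 positives.
X_neg = rng.normal(loc=0.0, size=(500, 2))
X_pos = rng.normal(loc=2.0, size=(25, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 500 + [1] * 25)

# Fuzzy memberships: samples far from their class mean get smaller weights,
# so likely outliers contribute less to the decision boundary.
memberships = np.empty(len(y))
for label in (0, 1):
    d = np.linalg.norm(X[y == label] - X[y == label].mean(axis=0), axis=1)
    memberships[y == label] = 1.0 - d / (d.max() + 1e-8)

# Different error costs: misclassifying the rare positive class costs more.
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 500 / 25})
clf.fit(X, y, sample_weight=memberships)
print("support vectors per class:", clf.n_support_)
```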

