Cost-sensitive sparse linear regression for crowd counting with imbalanced training data

Author(s): Xiaolin Huang, Yuexian Zou, Yi Wang


2021, Vol 11 (6), pp. 2866
Author(s): Damheo Lee, Donghyun Kim, Seung Yun, Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, because the English pronunciation of Korean speakers exhibits characteristic phonetic variations, we built a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counteract the bias toward Korean caused by the imbalanced training data. The training data were AI Hub (1033 h) in Korean and LibriSpeech (960 h) in English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were used for LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. These results indicate that the proposed method is effective for CS speech recognition.
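The abstract does not specify how semantic similarity to the target domain is measured, so the following is only a minimal sketch of that selection step, assuming a TF-IDF cosine-similarity criterion; the function name and `top_k` parameter are illustrative.

```python
# Hypothetical sketch: selecting code-switching (CS) sentences that are
# semantically close to a target domain, to be used for LM adaptation.
# The TF-IDF cosine-similarity measure is an assumption, not the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_similar_sentences(cs_sentences, target_sentences, top_k=1000):
    """Return the top_k CS sentences most similar to the target-domain text."""
    vectorizer = TfidfVectorizer()
    # Fit on both pools so they share one vocabulary.
    vectorizer.fit(cs_sentences + target_sentences)
    cs_vecs = vectorizer.transform(cs_sentences)
    target_vecs = vectorizer.transform(target_sentences)
    # Score each CS sentence by its best match against any target-domain sentence.
    scores = cosine_similarity(cs_vecs, target_vecs).max(axis=1)
    ranked = sorted(zip(scores, cs_sentences), key=lambda p: p[0], reverse=True)
    return [sentence for _, sentence in ranked[:top_k]]
```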


Author(s): Wael H. Awad, Bruce N. Janson

Three different modeling approaches were applied to explain truck accidents at interchanges in Washington State during a 27-month period. Three models were developed for each ramp type: linear regression, neural networks, and a hybrid system combining fuzzy logic and neural networks. The study showed that linear regression was able to predict accident frequencies that fell within one standard deviation of the overall mean of the dependent variable; however, the coefficient of determination was very low in all cases. The two artificial intelligence (AI) approaches performed well at identifying different patterns of accidents in the training data and fit the data better than the regression model. However, their ability to predict test data not included in the training process was unsatisfactory.
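As an illustration of the regression baseline and the in-sample versus out-of-sample gap described above, here is a minimal sketch on synthetic data; the feature set (e.g., traffic volume, ramp length, grade) and all numbers are hypothetical, not the study's data.

```python
# Illustrative only: fit a linear regression to synthetic accident-frequency data
# and report training R^2 versus error on held-out observations.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # hypothetical features: volume, ramp length, grade
y = 2.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=1.5, size=200)  # synthetic frequencies

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("training R^2:", r2_score(y_train, model.predict(X_train)))
print("test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```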


2012, Vol 51 (01), pp. 39-44
Author(s): K. Matsuoka, K. Yoshino

Summary
Objectives: The aim of this study is to present a method of assessing psychological tension that is optimized for each individual on the basis of heart rate variability (HRV) data measured over a long period of daily life, so as to eliminate the influence of inter-individual variability.
Methods: HRV and body accelerations were recorded from nine normal subjects over two months of normal daily life. Fourteen HRV indices were calculated from the HRV data in the 512 seconds preceding each mental tension level report. The analysis was limited to data with body accelerations of 30 mG (0.294 m/s²) or lower. For both the mental tension score (Δtension) and the HRV index values (ΔHRVI), the differences from the reference values in the same time zone were calculated. A multiple linear regression model estimating Δtension from the principal component scores of ΔHRVI was then constructed for each individual. The data were divided into a training set and a test set according to twofold cross-validation: the regression coefficients were determined on the training set, and the generalization capability of the optimized model was checked on the test set.
Results: The subjects' mean Pearson correlation coefficient was 0.52 on the training set and 0.40 on the test set. The subjects' mean coefficient of determination was 0.28 on the training set and 0.11 on the test set.
Conclusion: We proposed a method of assessing psychological tension that is optimized for each individual based on HRV data measured over a long period of daily life.
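A minimal per-subject sketch of the described pipeline follows, assuming the 14 HRV-index differences (ΔHRVI) and tension-score differences (Δtension) have already been computed; the number of principal components retained is an assumption, and the data below are synthetic stand-ins.

```python
# Sketch: PCA scores of the HRV-index differences -> multiple linear regression,
# evaluated by twofold cross-validation with the Pearson correlation on test folds.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def fit_and_evaluate(d_hrvi, d_tension, n_components=5):
    """Return the mean test-fold Pearson correlation over a twofold split."""
    corrs = []
    for train_idx, test_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(d_hrvi):
        pca = PCA(n_components=n_components).fit(d_hrvi[train_idx])
        reg = LinearRegression().fit(pca.transform(d_hrvi[train_idx]), d_tension[train_idx])
        pred = reg.predict(pca.transform(d_hrvi[test_idx]))
        corrs.append(np.corrcoef(pred, d_tension[test_idx])[0, 1])
    return float(np.mean(corrs))

# Synthetic stand-in data: 200 reports x 14 HRV indices for one subject.
rng = np.random.default_rng(1)
d_hrvi = rng.normal(size=(200, 14))
d_tension = d_hrvi[:, 0] - 0.5 * d_hrvi[:, 1] + rng.normal(scale=0.5, size=200)
print(fit_and_evaluate(d_hrvi, d_tension))
```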


2020, Vol 4 (1), pp. 60-72
Author(s): Wen Li, Wei Wang, Wenjun Huo

Purpose: Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm, RegBoost, that uses multivariate linear regression as its weak predictor.
Design/methodology/approach: To achieve nonlinearity after combining all the linear predictors, the training data are divided into two branches according to the prediction results of the current weak predictor, and linear regression modeling is executed recursively in both branches. In the test phase, each test sample is routed to a specific branch to continue with the next weak predictor; the final result is the sum of all weak predictors along the entire path.
Findings: Comparison experiments show that RegBoost can achieve performance similar to that of the gradient boosted decision tree (GBDT) and is far more effective than plain linear regression.
Originality/value: This paper designs a novel regression algorithm, RegBoost, with reference to GBDT. To the best of our knowledge, RegBoost is the first algorithm to use linear regression as the weak predictor and combine it with gradient boosting to build an ensemble.
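The following is a rough sketch of the RegBoost idea as described above, not the authors' implementation: each node fits a linear regression to the current residuals, splits its samples into two branches, and recurses; predictions sum the weak predictors along each sample's path. The split rule (thresholding the node's own predictions at their median, so it is applicable to test samples), the depth, and the minimum branch size are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class RegBoostNode:
    """One linear weak predictor; children refine the residuals of its two branches."""

    def __init__(self, depth=0, max_depth=4, min_samples=20):
        self.depth, self.max_depth, self.min_samples = depth, max_depth, min_samples
        self.model = None
        self.threshold = None
        self.low = self.high = None

    def fit(self, X, y):
        self.model = LinearRegression().fit(X, y)
        pred = self.model.predict(X)
        residual = y - pred
        # Assumed split rule: branch on whether the current prediction is below
        # or above its median, so the same rule can route unseen test samples.
        self.threshold = np.median(pred)
        mask = pred <= self.threshold
        if self.depth + 1 < self.max_depth:
            if mask.sum() >= self.min_samples:
                self.low = RegBoostNode(self.depth + 1, self.max_depth,
                                        self.min_samples).fit(X[mask], residual[mask])
            if (~mask).sum() >= self.min_samples:
                self.high = RegBoostNode(self.depth + 1, self.max_depth,
                                         self.min_samples).fit(X[~mask], residual[~mask])
        return self

    def predict(self, X):
        pred = self.model.predict(X)
        out = pred.copy()
        mask = pred <= self.threshold
        # Route each sample to one branch and add that branch's residual prediction,
        # so the result is the sum of weak predictors along the sample's path.
        if self.low is not None and mask.any():
            out[mask] += self.low.predict(X[mask])
        if self.high is not None and (~mask).any():
            out[~mask] += self.high.predict(X[~mask])
        return out

# Example usage on synthetic nonlinear data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
model = RegBoostNode().fit(X, y)
print("train MSE:", np.mean((model.predict(X) - y) ** 2))
```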


2020, Vol 124, pp. 103611
Author(s): Elias Martins Guerra Prado, Carlos Roberto de Souza Filho, Emmanuel John M. Carranza, João Gabriel Motta

2020, Vol 11
Author(s): Joram Soch

When predicting a subject-level variable (e.g., age in years) from measured biological data (e.g., structural MRI scans), the decoding algorithm does not always preserve the distribution of the variable to be predicted. In such a situation, distributional transformation (DT), i.e., mapping the predicted values onto the variable's distribution in the training data, might improve decoding accuracy. Here, we tested the potential of DT within the 2019 Predictive Analytics Competition (PAC), which aimed at predicting the chronological age of adult human subjects from structural MRI data. In a low-dimensional setting, i.e., with fewer features than observations, we applied multiple linear regression, support vector regression, and deep neural networks for out-of-sample prediction of subject age. We found that (i) when the number of features is low, no method outperforms linear regression; and (ii) except when using deep regression, distributional transformation increases decoding performance, reducing the mean absolute error (MAE) by about half a year. We conclude that DT can be advantageous when predicting variables that are not controlled but have an underlying distribution in healthy or diseased populations.
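A minimal sketch of distributional transformation as described above: each predicted value is replaced by the training-target value at the matching empirical quantile. Rank-based quantile matching is one common way to realize DT; the exact interpolation scheme used in the paper may differ.

```python
# Map predictions onto the empirical distribution of the training targets.
import numpy as np

def distributional_transformation(y_pred, y_train):
    """Quantile-match predicted values to the training-set target distribution."""
    ranks = np.argsort(np.argsort(y_pred))        # rank of each prediction (0..n-1)
    quantiles = (ranks + 0.5) / len(y_pred)       # its empirical quantile in (0, 1)
    return np.quantile(y_train, quantiles)        # matching training-set value

# Example: predictions compressed toward the mean get stretched back out
# toward the training distribution (all numbers are illustrative).
y_train = np.array([20, 25, 30, 35, 40, 60, 70, 80.0])  # training ages
y_pred = np.array([38, 41, 44, 47.0])                   # over-smoothed predictions
print(distributional_transformation(y_pred, y_train))
```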


Author(s): Bing Li, Casey Jones, Vikas Tomar

Abstract
This work focuses on the use of linear regression analysis-based machine learning to predict the end of discharge of a commercial prismatic lithium-ion (Li-ion) cell. The cell temperature was recorded during cycling, and the relation between the open circuit voltage and cell temperature was used to develop the linear regression-based machine learning algorithm, with the peak temperature selected as the indicator of battery end of discharge. A battery management system built around a pyboard microcontroller was constructed to monitor the temperature of the cell under test and to control a MOSFET that acted as a switch to disconnect the cell from the circuit. The method used an initial 10 charge and discharge cycles at a rate of 1C as the training data and another charge and discharge cycle as the testing data. During the test cycle, the discharge was continued beyond the cutoff voltage to initiate an overdischarge while the cell temperature was continuously monitored. The experiment was performed on three different cells, and the overdischarge for each was contained within 0.1 V of the cutoff voltage. These results show that a linear regression-based analysis can detect an overdischarge condition of a cell based on the anticipated peak temperature during discharge.
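The abstract does not spell out the exact regression targets, so the following is an illustrative sketch only: it assumes the peak discharge temperature is regressed on a voltage feature from the training cycles and then used as the cutoff threshold for the disconnect switch. All data values, the function name, and the single-feature choice are hypothetical.

```python
# Illustrative sketch of a peak-temperature cutoff learned by linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data from 10 cycles at 1C: open-circuit voltage at the
# start of discharge and the peak cell temperature observed during that discharge.
ocv_start = np.array([[4.18], [4.17], [4.19], [4.18], [4.16],
                      [4.17], [4.18], [4.19], [4.17], [4.18]])
peak_temp = np.array([33.1, 33.4, 32.9, 33.2, 33.6, 33.3, 33.0, 32.8, 33.5, 33.2])

model = LinearRegression().fit(ocv_start, peak_temp)

def should_disconnect(current_temp_c, ocv_v):
    """Open the MOSFET switch once the cell reaches its predicted peak temperature."""
    predicted_peak = model.predict([[ocv_v]])[0]
    return current_temp_c >= predicted_peak

print(should_disconnect(33.4, 4.18))
```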


2012, Vol 433-440, pp. 7479-7486
Author(s): Rui Kong, Qiong Wang, Gu Yu Hu, Zhi Song Pan

Support Vector Machines (SVMs) have been extensively studied and have shown remarkable success in many applications. However, their success is very limited when they are applied to the problem of learning from imbalanced datasets, in which negative instances heavily outnumber the positive instances (e.g., in medical diagnosis and credit card fraud detection). In this paper, we propose FASVM, a fuzzy asymmetric algorithm that augments SVMs for imbalanced training data by combining fuzzy memberships with the different error costs (DEC) algorithm. We compare the performance of our algorithm against the DEC algorithm and the regular SVM and show that FASVM outperforms both.
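This is not the authors' FASVM implementation, but a sketch of the two ingredients it combines on a toy imbalanced dataset: asymmetric per-class error costs (DEC-style, via `class_weight`) and per-sample fuzzy memberships (via `sample_weight`). The membership function used here (distance to the class mean) is an assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced toy data: 500 negatives, 25 positives.
X_neg = rng.normal(loc=0.0, size=(500, 2))
X_pos = rng.normal(loc=2.0, size=(25, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 500 + [1] * 25)

# Fuzzy memberships: samples far from their class mean get smaller weights,
# so likely outliers contribute less to the decision boundary.
memberships = np.empty(len(y))
for label in (0, 1):
    d = np.linalg.norm(X[y == label] - X[y == label].mean(axis=0), axis=1)
    memberships[y == label] = 1.0 - d / (d.max() + 1e-8)

# Different error costs: misclassifying the rare positive class costs more.
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 500 / 25})
clf.fit(X, y, sample_weight=memberships)
print("support vectors per class:", clf.n_support_)
```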

