Analysis of Factors Contributing to Vehicle-Pedestrian Crash Severity Incorporating Data Imbalance Treatment

2021
Author(s): Jinming Liu, Yuanqing Wang, Bei Zhou

2021, Vol 22 (1)
Author(s): Yingxi Yang, Hui Wang, Wen Li, Xiaobo Wang, Shizhao Wei, ...

Abstract
Background: Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of in-depth study and analysis of PTMs in proteins.
Method: We proposed MultiLyGAN, a new multi-class machine learning pipeline to identify seven types of lysine modification sites. Using eight sequence-based and five structure-based feature construction methods, 1497 valid features remained after filtering by Pearson correlation coefficient. To address data imbalance, two influential deep generative methods, the Conditional Generative Adversarial Network (CGAN) and the Conditional Wasserstein Generative Adversarial Network (CWGAN), were used and compared to generate new samples for the types with fewer samples. Finally, a random forest algorithm was used to predict the seven categories.
Results: In tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better addressed the data imbalance and stabilized the training error. In addition, an accumulated feature importance analysis showed that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN.
Conclusions: CWGAN greatly improved predictive performance in all experiments. Features derived from the CKSAAP, PWM and structural schemes were the most informative and contributed most to PTM prediction.
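The abstract reports MCC for a seven-class problem but does not say which multi-class generalization of MCC was used. A common choice, and the one scikit-learn's `matthews_corrcoef` computes, is Gorodkin's R_K statistic, built from the number of correctly classified samples and the per-class true and predicted counts. The plain-Python sketch below assumes that definition; the function names and the toy labels `A`, `B`, `C` are hypothetical and are not taken from MultiLyGAN.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multi-class MCC (Gorodkin's R_K).

    Assumption: this is the MCC variant behind the reported scores;
    the abstract does not specify it.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # true count per class
    p_k = Counter(y_pred)                             # predicted count per class
    classes = set(t_k) | set(p_k)                     # Counter returns 0 for absent classes
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    if cov_pp == 0 or cov_tt == 0:
        # Degenerate case: every prediction (or every label) falls in one class.
        return 0.0
    return cov_tp / sqrt(cov_pp * cov_tt)


if __name__ == "__main__":
    y_true = ["A", "A", "B", "B", "C", "C"]
    y_pred = ["A", "B", "B", "B", "C", "A"]
    print(f"{accuracy(y_true, y_pred):.3f} {multiclass_mcc(y_true, y_pred):.3f}")  # prints "0.667 0.522"
```

For two classes this formula reduces to the familiar binary MCC, so multi-class values such as 0.8376 sit on the same -1 to 1 scale, and MCC is typically lower than accuracy when errors concentrate in a few classes.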


2021, Vol 21 (S2)
Author(s): Kun Zeng, Yibin Xu, Ge Lin, Likeng Liang, Tianyong Hao

Abstract
Background: Eligibility criteria are the primary means of screening target participants for a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning can improve recruitment efficiency and thereby reduce the cost of clinical research. However, existing methods suffer from poor classification performance owing to the complexity and imbalance of eligibility criteria text data.
Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address data imbalance. Metric learning is employed to train the embedding of each base model so that features of different categories are better distinguished. Soft voting produces the final classification of the ensemble. The dataset, from evaluation task 3 of the 5th China Health Information Processing Conference, contains 38,341 eligibility criteria texts in 44 categories.
Results: Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. A standard t-test on the performance improvement gave a p-value of 2.152e-07, indicating that the improvement is statistically significant.
Conclusions: A model for classifying clinical trial eligibility criteria text based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and focal loss reduced the impact of data imbalance on model performance.
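Focal loss and soft voting are the two mechanisms in this abstract that most directly handle imbalance and ensembling. The sketch below shows both in plain Python for a single example: focal loss scales cross-entropy by (1 - p_t)^γ so that well-classified examples contribute less, and soft voting averages the class-probability vectors of the base models before taking the argmax. The abstract does not give γ, per-class weights α, or model weights, so γ = 2, optional α, and equal model weights are assumptions, and the function names are hypothetical. In the actual system the loss would be computed on logits from the pre-trained encoders inside a deep learning framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    Assumptions: gamma = 2 default and optional per-class alpha list;
    the abstract does not report the values used.
    """
    p_t = softmax(logits)[target]
    a_t = 1.0 if alpha is None else alpha[target]
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists):
    """Average per-model class probabilities (equal weights assumed); return (argmax, averages)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[k] for p in prob_lists) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


if __name__ == "__main__":
    # With gamma = 0 and no alpha, focal loss equals ordinary cross-entropy.
    print(focal_loss([2.0, 0.0], target=0, gamma=0.0))
    # With gamma = 2 the same confident, correct example is strongly down-weighted.
    print(focal_loss([2.0, 0.0], target=0))
    # Two of three models favour class 0, but the averaged probabilities favour class 1.
    label, avg = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]])
    print(label, [round(p, 3) for p in avg])  # prints 1 [0.483, 0.517]
```

The last example shows why soft voting can differ from hard majority voting: a majority vote over the three models would pick class 0, while averaging probabilities lets the one confident model tip the decision to class 1.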


2021, Vol 8 (1)
Author(s): Michael Bauer, Leah Hines, Emilia Pawlowski, Jin Luo, Anne Scott, ...

Abstract
Background: In New York State (NYS), motor vehicle (MV) injury to child passengers is a leading cause of hospitalization and emergency department (ED) visits among children aged 0–12 years. NYS laws require appropriate child restraints for ages 0–7 years and safety belts for ages 8 and up while traveling in a private passenger vehicle, but do not specify a seating position.
Methods: Factors associated with injury in front-seated (n = 11,212) compared with rear-seated (n = 93,092) passengers aged 0–12 years were examined by age group (0–3, 4–7 and 8–12 years) using the 2012–2014 NYS Crash Outcome Data Evaluation System (CODES). CODES consists of Department of Motor Vehicles (DMV) crash reports linked to ED visits and hospitalizations. The front seat was defined as row 1 and the rear seat as rows 2–3. Vehicle towed from the scene and air bag deployment served as proxies for crash severity. Injury was dichotomized as a Maximum Abbreviated Injury Scale (MAIS) score greater than zero versus zero. Multivariable logistic regression (odds ratios (OR) with 95% CIs) was used to examine factors predictive of injury for the total population and for each age group.
Results: Front-seated children were injured more frequently than rear-seated children (8.46% vs. 4.92%, p < 0.0001). Children in child restraints had fewer medically treated injuries than seat-belted or unrestrained children (3.80%, 6.50% and 13.62%, respectively; p < 0.0001). A higher proportion of children traveling with an unrestrained driver than with a restrained driver were injured (14.50% vs. 5.26%, p < 0.0001). After controlling for crash severity, multivariable-adjusted predictors of injury for children aged 0–12 years included riding in the front seat (OR 1.20, 95% CI 1.10–1.31), being unrestrained vs. in a child restraint (2.13, 1.73–2.62), being restrained in a seat belt vs. a child restraint (1.20, 1.11–1.31), and traveling in a car vs. another vehicle type (1.21, 1.14–1.28). Protective factors included traveling with a restrained driver (0.61, 0.50–0.75), a driver aged < 25 years (0.91, 0.82–0.99), and occupying a vehicle of a later model year, 2005–2008 (0.68, 0.53–0.89) or 2009–2015 (0.55, 0.42–0.71), compared with older model years (1970–1993).
Conclusions: Compared with front-seated children, rear-seated children and children in age-appropriate restraints had lower adjusted odds of medically treated injury.
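The results in this abstract are odds ratios with 95% confidence intervals from multivariable logistic regression. The sketch below assumes the usual Wald interval, OR = exp(β) with CI exp(β ± 1.96·SE), which the abstract does not state explicitly. It uses the reported front-seat estimate (1.20, 1.10–1.31) to back out an approximate standard error and rebuild the interval, and computes the crude odds ratio implied by the unadjusted injury proportions (8.46% vs. 4.92%). The function names are hypothetical, and because the published CI bounds are rounded, the recovered SE is only approximate.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a logistic-regression coefficient and its SE to (OR, lower, upper).

    Assumption: Wald interval on the log-odds scale, exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_reported_ci(lower, upper, z=Z_95):
    """Approximate SE of log(OR) from a reported 95% CI (symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def crude_odds_ratio(p_exposed, p_unexposed):
    """Unadjusted OR from two outcome proportions (no covariate adjustment)."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


if __name__ == "__main__":
    # Front seat vs. rear seat, adjusted OR 1.20 (95% CI 1.10-1.31) as reported.
    se = se_from_reported_ci(1.10, 1.31)
    or_, lo, hi = odds_ratio_ci(math.log(1.20), se)
    print(f"SE(log OR) ~ {se:.4f}; OR {or_:.2f} ({lo:.2f}-{hi:.2f})")  # OR 1.20 (1.10-1.31)
    # Crude OR from the unadjusted injury proportions, 8.46% vs. 4.92%.
    print(f"crude OR, front vs rear: {crude_odds_ratio(0.0846, 0.0492):.2f}")
```

The crude odds ratio comes out near 1.79, larger than the adjusted 1.20; that gap is consistent with part of the raw front/rear difference being accounted for by the covariates in the model, although the abstract alone cannot show which ones.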

