Linear Regularization-based Analysis and Prediction of Human Mobility in the U.S. during the COVID-19 Pandemic

2020 ◽  
Author(s):  
Meghna Chakraborty ◽  
Shakir Mahmud ◽  
Timothy Gates ◽  
Subhrajit Sinha

With the increasing spread of COVID-19 in the U.S., which currently has the highest number of confirmed cases and deaths in the world, most states in the nation have enforced travel restrictions, resulting in drastic reductions in mobility and travel. However, the overall impact and long-term implications of this crisis for mobility remain uncertain. To this end, this study develops an analytical framework that determines the most significant factors impacting human mobility and travel in the U.S. during the pandemic. In particular, we use the Least Absolute Shrinkage and Selection Operator (LASSO) to identify the significant variables influencing human mobility and utilize linear regularization algorithms, including Ridge, LASSO, and Elastic Net modeling techniques, to model and predict human mobility and travel. State-level data were obtained from various open-access sources for the period from January 1, 2020 to June 13, 2020. The entire data set was divided into a training data set and a test data set, and the variables selected by LASSO were used to train four different models by ordinary linear regression, Ridge regression, LASSO, and Elastic Net regression algorithms, using the training data set. Finally, the prediction accuracy of the developed models was examined on the test data. The results indicate that, among all models, Ridge regression provides the best performance with the least error, while both LASSO and Elastic Net performed better than the ordinary linear model.

Author(s):  
Meghna Chakraborty ◽  
Md Shakir Mahmud ◽  
Timothy J. Gates ◽  
Subhrajit Sinha

Since the United States started grappling with the COVID-19 pandemic, with the highest number of confirmed cases and deaths in the world as of August 2020, most states have enforced travel restrictions, resulting in drastic reductions in mobility and travel. However, the long-term implications of this crisis for mobility remain uncertain. To this end, this study proposes an analytical framework that determines the most significant factors affecting human mobility in the United States during the early days of the pandemic. In particular, the study uses least absolute shrinkage and selection operator (LASSO) regularization to identify the most significant variables influencing human mobility and uses linear regularization algorithms, including ridge, LASSO, and elastic net modeling techniques, to predict human mobility. State-level data were obtained from various sources for the period from January 1, 2020 to June 13, 2020. The entire data set was divided into a training and a test data set, and the variables selected by LASSO were used to train models with the linear regularization algorithms, using the training data set. Finally, the prediction accuracy of the developed models was examined on the test data. The results indicate that several factors, including the number of new cases, social distancing, stay-at-home orders, domestic travel restrictions, mask-wearing policy, socioeconomic status, unemployment rate, transit mode share, percentage of the population working from home, and percentages of the older (60+ years), African American, and Hispanic populations, among others, significantly influence daily trips. Moreover, among all models, ridge regression provides the best performance with the least error, whereas both LASSO and elastic net performed better than the ordinary linear model.
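To make the workflow concrete, the following is a minimal sketch, not the authors' code: cross-validated LASSO selects variables, and ordinary linear, ridge, LASSO, and elastic net models are then compared on a held-out test set. The synthetic regression data stand in for the state-level mobility features, and all hyperparameters are illustrative assumptions.

```python
# Sketch of the two-step workflow: LASSO variable selection, then regularized
# regression models compared against ordinary linear regression on test data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, Ridge, Lasso, ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for state-level daily-trip data and candidate predictors.
X, y = make_regression(n_samples=500, n_features=30, n_informative=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 1: variable selection with cross-validated LASSO (nonzero coefficients kept).
selector = LassoCV(cv=5, random_state=0).fit(X_train, y_train)
selected = np.flatnonzero(selector.coef_)

# Step 2: train competing models on the selected variables only.
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test[:, selected])))
    print(f"{name}: test RMSE = {rmse:.2f}")
```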


2021 ◽  
Author(s):  
Hye-Won Hwang ◽  
Jun-Ho Moon ◽  
Min-Gyu Kim ◽  
Richard E. Donatelli ◽  
Shin-Jae Lee

ABSTRACT Objectives: To compare an automated cephalometric analysis based on the latest deep learning method for automatically identifying cephalometric landmarks (AI) with previously published AI, according to the test style of the worldwide AI challenges at the International Symposium on Biomedical Imaging conferences held by the Institute of Electrical and Electronics Engineers (IEEE ISBI). Materials and Methods: This latest AI was developed using a total of 1983 cephalograms as training data. In the training procedure, a modification of a contemporary deep learning method, the YOLO version 3 algorithm, was applied. Test data consisted of 200 cephalograms. To follow the same test style as the AI challenges at IEEE ISBI, a human examiner manually identified the 19 IEEE ISBI-designated cephalometric landmarks in both the training and test data sets, and these were used as references for comparison. Then, the latest AI and another human examiner independently detected the same landmarks in the test data set. The test results were compared by the measures used at IEEE ISBI: the success detection rate (SDR) and the success classification rate (SCR). Results: The SDR of the latest AI in the 2-mm range was 75.5% and the SCR was 81.5%. These were greater than those of any previous AI. Compared with the human examiners, the AI showed a superior success classification rate in some cephalometric analysis measures. Conclusions: This latest AI appears to have superior performance compared with previous AI methods. It also appears to perform cephalometric analysis comparably to human examiners.
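For illustration, here is a minimal sketch (not the study's code) of how a 2-mm success detection rate can be computed from predicted and reference landmark coordinates; the coordinates below are made-up placeholders.

```python
# Success detection rate (SDR): fraction of landmarks whose predicted position
# lies within a given radius (e.g., 2 mm) of the reference position.
import numpy as np

def success_detection_rate(pred_mm, true_mm, radius_mm=2.0):
    """pred_mm, true_mm: arrays of shape (n_landmarks, 2), coordinates in millimetres."""
    errors = np.linalg.norm(pred_mm - true_mm, axis=1)
    return np.mean(errors <= radius_mm)

# Hypothetical example with 19 landmarks on one cephalogram.
rng = np.random.default_rng(0)
true = rng.uniform(0, 200, size=(19, 2))
pred = true + rng.normal(0, 1.5, size=(19, 2))   # simulated detection errors
print(f"SDR (2 mm range): {success_detection_rate(pred, true):.1%}")
```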


Author(s):  
Yanxiang Yu ◽  
Chicheng Xu ◽  
Siddharth Misra ◽  
Weichang Li ◽  
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. The SPWLA PDDA SIG hosted a contest, which started on March 1, 2020 and concluded on May 7, 2020, aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total of 20,525 data points with half-foot resolution from three wells were collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs: caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density, as well as the two sonic logs (DTC and DTS) as the targets. A separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of each model was evaluated using the root mean square error (RMSE) as the metric, shown in the equation below: RMSE = \sqrt{\frac{1}{2m}\sum_{i=1}^{m}\left[\left(DTC_{pred}^{i}-DTC_{true}^{i}\right)^{2}+\left(DTS_{pred}^{i}-DTS_{true}^{i}\right)^{2}\right]}. In the benchmark model (Yu et al., 2020), we used a Random Forest regressor and applied minimal preprocessing to the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in RMSE score. In this paper, we review these five solutions, including preprocessing techniques and different machine-learning models, including neural networks, long short-term memory (LSTM), and ensemble trees. We found that data cleaning and clustering were critical for improving the performance of all models.
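A minimal sketch of how this pooled RMSE can be computed, assuming the predictions and ground truth are available as equal-length arrays; the values in the example are made up.

```python
# Contest metric: a single RMSE pooled over both target logs (DTC and DTS)
# across m test samples.
import numpy as np

def contest_rmse(dtc_pred, dtc_true, dts_pred, dts_true):
    dtc_pred, dtc_true = np.asarray(dtc_pred), np.asarray(dtc_true)
    dts_pred, dts_true = np.asarray(dts_pred), np.asarray(dts_true)
    m = len(dtc_true)
    squared_errors = (dtc_pred - dtc_true) ** 2 + (dts_pred - dts_true) ** 2
    return np.sqrt(squared_errors.sum() / (2 * m))

# Example with made-up traveltime values (us/ft).
print(contest_rmse([70, 80], [72, 78], [120, 130], [118, 133]))
```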


2021 ◽  
Author(s):  
Louise Bloch ◽  
Christoph M. Friedrich

Abstract Background: The prediction of whether subjects with Mild Cognitive Impairment (MCI) will prospectively develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects for therapy studies. Machine Learning (ML) is suitable for improving early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data. Methods: An ML workflow was developed and trained for a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The validation was executed for an independent ADNI test data set and for the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanation of individual predictions. Results: The XGBoost models that excluded 116 of the 467 subjects from the training data set based on their Logistic Regression (LR) data Shapley values outperformed the models trained on the entire training data set, which reached a mean classification accuracy of 58.54%, by 14.13% (8.27 percentage points) on the independent ADNI test data set. The XGBoost models trained on the entire training data set reached a mean accuracy of 60.35% for the AIBL data set. An improvement of 24.86% (15.00 percentage points) could be reached for the XGBoost models if the 72 subjects with the smallest RF data Shapley values were excluded from the training data set. Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data were associated with the number of ApoE ε4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
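The following is a minimal, illustrative sketch of the general data-valuation idea: Monte Carlo data Shapley with a logistic-regression utility, followed by retraining after dropping the lowest-valued training samples. It is not the authors' workflow; the synthetic data, model choices, chance-level baseline, and exclusion quantile are all assumptions.

```python
# Monte Carlo data-Shapley valuation of training samples, then retraining a
# classifier without the lowest-valued samples. Everything here is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

def utility(idx):
    """Validation accuracy of a logistic regression trained on the subset idx."""
    if len(np.unique(y_fit[idx])) < 2:
        return 0.5  # assumed chance level when only one class is present
    clf = LogisticRegression(max_iter=1000).fit(X_fit[idx], y_fit[idx])
    return clf.score(X_val, y_val)

def monte_carlo_shapley(n_perm=5, seed=0):
    """Average marginal utility contribution of each sample over random permutations."""
    rng = np.random.default_rng(seed)
    n = len(y_fit)
    values = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev = 0.5  # utility of the empty training set (chance level)
        for k in range(1, n + 1):
            u = utility(perm[:k])
            values[perm[k - 1]] += u - prev
            prev = u
    return values / n_perm

values = monte_carlo_shapley()
keep = values > np.quantile(values, 0.25)   # exclude the lowest-valued quarter (illustrative)
model = RandomForestClassifier(random_state=0).fit(X_fit[keep], y_fit[keep])
print("test accuracy after data valuation:", model.score(X_test, y_test))
```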


2012 ◽  
Vol 51 (01) ◽  
pp. 39-44 ◽  
Author(s):  
K. Matsuoka ◽  
K. Yoshino

Summary Objectives: The aim of this study is to present a method of assessing psychological tension that is optimized to each individual on the basis of heart rate variability (HRV) data which, to eliminate the influence of inter-individual variability, are measured over a long period during daily life. Methods: HRV and body accelerations were recorded from nine normal subjects over two months of normal daily life. Fourteen HRV indices were calculated from the HRV data in the 512 seconds prior to the time of each mental tension level report. Data to be analyzed were limited to those with body accelerations of 30 mG (0.294 m/s²) or lower. Further, the differences from the reference values in the same time zone were calculated for both the mental tension score (Δtension) and the HRV index values (ΔHRVI). A multiple linear regression model that estimates Δtension from the scores of the principal components of ΔHRVI was then constructed for each individual. The data were divided into a training data set and a test data set in accordance with the twofold cross-validation method. Multiple linear regression coefficients were determined using the training data set, and the generalization capability of the optimized model was checked using the test data set. Results: The subjects’ mean Pearson correlation coefficient was 0.52 with the training data set and 0.40 with the test data set. The subjects’ mean coefficient of determination was 0.28 with the training data set and 0.11 with the test data set. Conclusion: We proposed a method of assessing psychological tension that is optimized to each individual based on HRV data measured over a long period of daily life.
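As an illustration of the modeling step, here is a minimal sketch, not the study's code, of regressing Δtension on principal components of the ΔHRVI features, with a single 50/50 split standing in for the twofold cross-validation; the simulated arrays are placeholders for one subject's data, and the number of components is an assumption.

```python
# Principal components of the HRV-index differences regressed onto the
# tension-score difference, evaluated with Pearson correlation on held-out data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
d_hrvi = rng.normal(size=(200, 14))                               # 14 HRV-index differences (ΔHRVI)
d_tension = d_hrvi[:, :3].sum(axis=1) + rng.normal(size=200)      # simulated Δtension scores

X_train, X_test, y_train, y_test = train_test_split(d_hrvi, d_tension, test_size=0.5, random_state=0)

model = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X_train, y_train)
r_train, _ = pearsonr(y_train, model.predict(X_train))
r_test, _ = pearsonr(y_test, model.predict(X_test))
print(f"Pearson r: train {r_train:.2f}, test {r_test:.2f}")
```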


2020 ◽  
Vol 58 (8) ◽  
pp. 1667-1679
Author(s):  
Benedikt Franke ◽  
J. Weese ◽  
I. Waechter-Stehle ◽  
J. Brüning ◽  
T. Kuehne ◽  
...  

Abstract The transvalvular pressure gradient (TPG) is commonly estimated using the Bernoulli equation. However, the method is known to be inaccurate. Therefore, an adjusted Bernoulli model for accurate TPG assessment was developed and evaluated. Numerical simulations were used to calculate TPGCFD in patient-specific geometries of aortic stenosis as ground truth. Geometries, aortic valve areas (AVA), and flow rates were derived from computed tomography scans. Simulations were divided into a training data set (135 cases) and a test data set (36 cases). The training data were used to fit an adjusted Bernoulli model as a function of AVA and flow rate. The model-predicted TPGModel was evaluated using the test data set and also compared against the common Bernoulli equation (TPGB). TPGB and TPGModel both correlated well with TPGCFD (r > 0.94), but significantly overestimated it. The average difference between TPGModel and TPGCFD was much lower: 3.3 mmHg vs. 17.3 mmHg between TPGB and TPGCFD. Also, the standard error of estimate was lower for the adjusted model: SEEModel = 5.3 mmHg vs. SEEB = 22.3 mmHg. The adjusted model was more accurate than the conventional Bernoulli equation. The model might help to improve non-invasive assessment of TPG.
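As a sketch of the fitting step, the code below fits an adjusted Bernoulli-type relation TPG ≈ a·v² + b, with v = Q/AVA, to training pressure gradients and evaluates it on held-out cases. The functional form, the synthetic "CFD" data, and the 4·v² reference are illustrative assumptions, not the authors' implementation.

```python
# Fit an adjusted Bernoulli-type model to training TPG data and evaluate it on
# a held-out test set, alongside the classical simplified Bernoulli estimate.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
ava = rng.uniform(0.6, 2.0, size=171)                    # aortic valve area, cm^2
flow = rng.uniform(100, 400, size=171)                   # flow rate, mL/s
velocity = flow / (ava * 100.0)                          # mean jet velocity, m/s
tpg_cfd = 3.5 * velocity ** 2 + rng.normal(scale=2.0, size=171)  # synthetic "CFD" gradients, mmHg

def adjusted_bernoulli(x, a, b):
    ava, flow = x
    v = flow / (ava * 100.0)
    return a * v ** 2 + b

train, test = slice(0, 135), slice(135, None)            # 135 training / 36 test cases
params, _ = curve_fit(adjusted_bernoulli, (ava[train], flow[train]), tpg_cfd[train])

pred = adjusted_bernoulli((ava[test], flow[test]), *params)
resid = pred - tpg_cfd[test]
print("adjusted model: mean error (mmHg):", resid.mean())
print("adjusted model: SEE (mmHg):", np.sqrt(np.sum(resid ** 2) / (len(resid) - 2)))
print("classical 4*v^2: mean error (mmHg):", (4.0 * velocity[test] ** 2 - tpg_cfd[test]).mean())
```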


Heart ◽  
2018 ◽  
Vol 104 (23) ◽  
pp. 1921-1928 ◽  
Author(s):  
Ming-Zher Poh ◽  
Yukkee Cheung Poh ◽  
Pak-Hei Chan ◽  
Chun-Ka Wong ◽  
Louise Pun ◽  
...  

Objective: To evaluate the diagnostic performance of a deep learning system for automated detection of atrial fibrillation (AF) in photoplethysmographic (PPG) pulse waveforms. Methods: We trained a deep convolutional neural network (DCNN) to detect AF in 17 s PPG waveforms using a training data set of 149 048 PPG waveforms constructed from several publicly available PPG databases. The DCNN was validated using an independent test data set of 3039 smartphone-acquired PPG waveforms from adults at high risk of AF at a general outpatient clinic against ECG tracings reviewed by two cardiologists. Six established AF detectors based on handcrafted features were evaluated on the same test data set for performance comparison. Results: In the validation data set (3039 PPG waveforms) consisting of three sequential PPG waveforms from 1013 participants (mean (SD) age, 68.4 (12.2) years; 46.8% men), the prevalence of AF was 2.8%. The area under the receiver operating characteristic curve (AUC) of the DCNN for AF detection was 0.997 (95% CI 0.996 to 0.999) and was significantly higher than that of all the other AF detectors (AUC range: 0.924–0.985). The sensitivity of the DCNN was 95.2% (95% CI 88.3% to 98.7%), specificity was 99.0% (95% CI 98.6% to 99.3%), positive predictive value (PPV) was 72.7% (95% CI 65.1% to 79.3%) and negative predictive value (NPV) was 99.9% (95% CI 99.7% to 100%) using a single 17 s PPG waveform. Using the three sequential PPG waveforms in combination (<1 min in total), the sensitivity was 100.0% (95% CI 87.7% to 100%), specificity was 99.6% (95% CI 99.0% to 99.9%), PPV was 87.5% (95% CI 72.5% to 94.9%) and NPV was 100% (95% CI 99.4% to 100%). Conclusions: In this evaluation of PPG waveforms from adults screened for AF in a real-world primary care setting, the DCNN had high sensitivity, specificity, PPV and NPV for detecting AF, outperforming other state-of-the-art methods based on handcrafted features.
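For reference, a minimal sketch of how the reported evaluation metrics (AUC, sensitivity, specificity, PPV, NPV) can be computed from a classifier's probability outputs against reference labels; the labels, scores, and threshold below are made-up placeholders, not study data.

```python
# Diagnostic-performance metrics from predicted probabilities and reference labels.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])                 # 1 = AF per ECG reference
y_score = np.array([0.1, 0.2, 0.05, 0.3, 0.9, 0.8, 0.4, 0.7, 0.15, 0.25])
y_pred = (y_score >= 0.5).astype(int)                              # illustrative decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC:", roc_auc_score(y_true, y_score))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp))
print("NPV:", tn / (tn + fn))
```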


Jurnal Segara ◽  
2020 ◽  
Vol 16 (3) ◽  
Author(s):  
Arip Rahman

Shallow water bathymetry estimation from remote sensing data has become increasingly widespread as an alternative to traditional bathymetry measurement, which is hampered by technical and logistical problems. Bathymetry data were derived from Sentinel-2A images at visible wavelengths (blue, green, and red) with 10 m spatial resolution around the waters of Kemujan Island, Karimunjawa National Park, Central Java. A total of 1,280 points were used as the training data set and 854 points as the test data set, both produced from sounding. Dark Object Subtraction (DOS) was used to atmospherically correct the Sentinel-2A images. Several algorithms were applied to derive bathymetry data, including the linear transform, the ratio transform, and support vector machine (SVM) regression. The highest correlation between predicted and observed depths resulted from the SVM algorithm, with a coefficient of determination (R²) of 0.71 (training data) and 0.56 (test data). In the assessment of the accuracy of the three methods using RMSE and MAE values, the SVM algorithm has the smallest values (< 1 m). This indicates that the SVM algorithm has a high accuracy compared to the other two methods. The bathymetry map derived from Sentinel-2A imagery cannot be used as a reference for navigation.
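The following is a minimal sketch, not the study's workflow, of support vector regression of depth on visible-band reflectances, evaluated with R², RMSE, and MAE on a held-out set; the reflectance and depth values are synthetic placeholders, and the kernel and regularization settings are assumptions.

```python
# SVM regression of depth on three visible bands, with R2, RMSE, and MAE on test data.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
bands = rng.uniform(0.01, 0.2, size=(2134, 3))                     # blue, green, red reflectance
depth = 1.0 + 40 * bands[:, 0] + 10 * bands[:, 1] + rng.normal(scale=0.5, size=2134)  # metres

# 1,280 training points and 854 test points, as in the study's split sizes.
X_train, X_test, y_train, y_test = train_test_split(bands, depth, train_size=1280, random_state=0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)).fit(X_train, y_train)
pred = model.predict(X_test)
print("R2:", r2_score(y_test, pred))
print("RMSE (m):", np.sqrt(mean_squared_error(y_test, pred)))
print("MAE (m):", mean_absolute_error(y_test, pred))
```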


Processes ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. 731 ◽  
Author(s):  
Sanghyuk Lee ◽  
Jaehoon Cha ◽  
Moon Keun Kim ◽  
Kyeong Soo Kim ◽  
Van Huy Pham ◽  
...  

The importance of neural network (NN) modelling is evident from its performance benefits in a myriad of applications, where, unlike conventional techniques, NN modelling provides superior performance without relying on complex filtering and/or time-consuming parameter tuning specific to applications and their wider ranges of conditions. In this paper, we employ NN modelling with training data generation based on sensitivity analysis for the prediction of building energy consumption to improve performance and reliability. Unlike our previous work, where insignificant input variables are successively screened out based on their mean impact values (MIVs) during the training process, we use the receiver operating characteristic (ROC) plot to generate reliable data from a conservative or progressive point of view, which overcomes the data-insufficiency issue of the MIV method: by properly setting boundaries for input variables based on the ROC plot and their statistics, instead of completely screening them out as in the MIV-based method, we can generate new training data that maximize true positive and false negative numbers from the partial data set. A NN model is then constructed and trained with the generated training data using Levenberg–Marquardt back propagation (LM-BP) to perform electricity prediction for commercial buildings. The performance of the proposed data generation methods is compared with that of the MIV method through experiments, whose results show that data generation using the successive and cross patterns provides satisfactory performance, following energy consumption trends with good phase. Between the two data generation options, i.e., the successive pattern and the two-data combination, the successive option shows a lower root mean square error (RMSE) than the combination one by around 400–900 kWh (i.e., 30%–75%).
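For orientation, here is a minimal sketch of training a feed-forward NN to predict building electricity consumption and reporting RMSE on held-out data. scikit-learn's MLPRegressor with the L-BFGS solver is used as a stand-in because Levenberg–Marquardt back propagation is not available in that library, and the inputs, targets, and network size are synthetic, illustrative assumptions.

```python
# Feed-forward NN regression of electricity consumption with a held-out RMSE.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 6))                                    # e.g., weather and schedule inputs
y = 500 + 2000 * X[:, 0] + 800 * X[:, 1] ** 2 + rng.normal(scale=50, size=1000)  # kWh

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
nn = MLPRegressor(hidden_layer_sizes=(32, 16), solver="lbfgs", max_iter=2000, random_state=0)
nn.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, nn.predict(X_test)))
print(f"test RMSE: {rmse:.1f} kWh")
```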


2008 ◽  
Vol 34 (1) ◽  
pp. 127-134 ◽  
Author(s):  
Natalie Cvijanovich ◽  
Thomas P. Shanley ◽  
Richard Lin ◽  
Geoffrey L. Allen ◽  
Neal J. Thomas ◽  
...  

We previously generated genome-wide expression data (microarray) from children with septic shock having the potential to lead the field into novel areas of investigation. Herein we seek to validate our data through a bioinformatic approach centered on a validation patient cohort. Forty-two children with a clinical diagnosis of septic shock and 15 normal controls served as the training data set, while 30 separate children with septic shock and 14 separate normal controls served as the test data set. Class prediction modeling using the training data set and the previously reported genome-wide expression signature of pediatric septic shock correctly identified 95–100% of controls and septic shock patients in the test data set, depending on the class prediction algorithm and the gene selection method. Subjecting the test data set to a filtering strategy identical to that used for the training data set demonstrated 75% concordance between the two gene lists. Subjecting the test data set to a purely statistical filtering strategy, with highly stringent correction for multiple comparisons, demonstrated <50% concordance with the previous gene filtering strategy. However, functional analysis of this statistics-based gene list demonstrated similar functional annotations and signaling pathways as those seen in the training data set. In particular, we validated that pediatric septic shock is characterized by large-scale repression of genes related to zinc homeostasis and lymphocyte function. These data demonstrate that the previously reported genome-wide expression signature of pediatric septic shock is applicable to a validation cohort of patients.
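As a schematic illustration of class prediction on expression data, not the authors' pipeline, the sketch below applies univariate gene filtering on the training cohort, trains a simple classifier, and evaluates it on an independent test cohort; the expression matrices, cohort sizes, and informative-gene block are synthetic placeholders.

```python
# Class prediction: univariate gene filtering fit on the training cohort,
# a nearest-centroid classifier, and evaluation on an independent test cohort.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.normal(size=(57, 5000))                  # 42 septic shock + 15 controls (synthetic)
y_train = np.array([1] * 42 + [0] * 15)
X_train[y_train == 1, :50] += 1.0                       # a block of "informative" genes
X_test = rng.normal(size=(44, 5000))                    # 30 septic shock + 14 controls (synthetic)
y_test = np.array([1] * 30 + [0] * 14)
X_test[y_test == 1, :50] += 1.0

clf = make_pipeline(SelectKBest(f_classif, k=100), NearestCentroid()).fit(X_train, y_train)
print("test-cohort accuracy:", clf.score(X_test, y_test))
```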

