Solving musculoskeletal biomechanics with machine learning

PeerJ Computer Science, DOI: 10.7717/peerj-cs.663, 2021, Vol 7, pp. e663

Author(s): Yaroslav Smirnov, Denys Smirnov, Anton Popov, Sergiy Yakovenko

Keyword(s): Machine Learning, Degrees Of Freedom, Muscle Length, Gradient Boosting, Computational Technique, Model Errors, Ann Model, Moment Arms, Hand Model, Light Gradient

Deep learning is a relatively new computational technique for describing musculoskeletal dynamics. The experimental relationships of muscle geometry in different postures are high-dimensional spatial transformations that can be approximated by relatively simple functions, which opens the opportunity for machine learning (ML) applications. In this study, we challenged general ML algorithms with the problem of approximating the posture-dependent moment arm and muscle length relationships of the human arm and hand muscles. We used two types of algorithms, light gradient boosting machine (LGB) and fully connected artificial neural network (ANN), to solve the wrapping kinematics of 33 muscles spanning up to six degrees of freedom (DOF) each for the arm and hand model with 18 DOFs. The input-output training and testing datasets, where joint angles were the input and the muscle length and moment arms were the output, were generated by our previous phenomenological model based on the autogenerated polynomial structures. Both models achieved a similar level of errors: ANN model errors were 0.08 ± 0.05% for muscle lengths and 0.53 ± 0.29% for moment arms, and the LGB model made similar errors: 0.18 ± 0.06% and 0.13 ± 0.07%, respectively. The LGB model reached the training goal with only 10^3 samples, while the ANN required 10^6 samples; however, LGB models were about 39 times slower than ANN models in evaluation. The sufficient performance of the developed models demonstrates the future applicability of ML for musculoskeletal transformations in a variety of applications, such as in advanced powered prosthetics.

Solving musculoskeletal biomechanics with machine learning

DOI: 10.1101/2020.08.24.263962, 2020

Author(s): Yaroslav Smirnov, Denis Smirnov, Anton Popov, Sergiy Yakovenko

Keyword(s): Machine Learning, Degrees Of Freedom, Machine Learning Algorithms, High Dimensional, Gradient Boosting, Model Errors, Input Output, Moment Arms, Light Gradient, Artificial Neural

Abstract Deep learning is a relatively new computational technique for describing musculoskeletal dynamics. The experimental relationships of muscle geometry in different postures are high-dimensional spatial transformations that can be approximated by relatively simple functions, which opens the opportunity for machine learning applications. In this study, we challenged general machine learning algorithms with the problem of approximating the posture-dependent moment arm and muscle length relationships of the human arm and hand muscles. We used two types of algorithms, light gradient boosting machine (LGB) and fully connected artificial neural network (ANN), to solve the wrapping kinematics of 33 muscles spanning up to six degrees of freedom (DOF) each for the arm and hand model with 18 DOFs. The input-output training and testing datasets were generated by our previous phenomenological model based on the autogenerated polynomial structures (Sobinov et al., 2019). Both models achieved a similar level of errors: ANN model errors were 0.08±0.05% for muscle lengths and 0.53±0.29% for moment arms, and the LGB model made similar errors: 0.18±0.06% and 0.13±0.07%, respectively. The LGB model reached the training goal with only 10^3 samples, while the ANN required 10^6 samples; however, LGB models were about 39 times slower than ANN models in evaluation. The sufficient performance of the developed models demonstrates the future applicability of machine learning for musculoskeletal transformations in a variety of applications, such as in advanced powered prosthetics.

Author Summary The accurate decoding of arm and hand motor intent from biological signals remains a key challenge. Solving this task with machine learning requires vast posture- and task-dependent data for identifying structural and functional parameters within dynamic musculoskeletal relationships. This problem is related to the curse of dimensionality, where the processing complexity grows exponentially with the number of degrees of freedom described by the model. Here, we developed a tool based on artificial neural networks (ANN) to solve the kinematic transformation from posture to muscle path length and muscle moment arms. We used an accurate model of posture-dependent muscle moment arms and length to train and test the ability of ANN to solve this high-dimensional and computationally intense transformation and compare it to the boosted decision tree approach. We demonstrated that model-driven training is an efficient method to handle the encoding of high-dimensional musculoskeletal relationships. Adding muscles to the transformation, which increases the input-output complexity, does not reduce the prediction accuracy and does not require an increase in the number of elements within the network, demonstrating the viability of this approach for applications using musculoskeletal biomechanics.
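
The exponential growth described in the Author Summary can be illustrated with a small sketch. It assumes, purely for illustration, that a posture space is sampled on a uniform grid; the function name `grid_points` and the choice of 9 samples per DOF are hypothetical and are not taken from the paper, which does not state its sampling scheme here.

```python
def grid_points(samples_per_dof: int, dofs: int) -> int:
    """Number of postures on a uniform grid with the given resolution per DOF."""
    return samples_per_dof ** dofs

# Hypothetical resolution of 9 samples per degree of freedom.
for dofs in (1, 3, 6, 18):
    print(f"{dofs:2d} DOF -> {grid_points(9, dofs):,} grid postures")
```

Under this assumption, a muscle spanning six DOFs already needs several hundred thousand grid postures, which is why exhaustive tabulation stops being practical as the model grows.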

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology, DOI: 10.1186/s12871-021-01343-4, 2021, Vol 21 (1)

Author(s): Jong Ho Kim, Haewon Kim, Ji Su Jang, Sung Mi Hwang, So Young Lim, ...

Keyword(s): Machine Learning, Random Forest, Confidence Interval, Neck Circumference, Difficult Laryngoscopy, Gradient Boosting, Test Set, Equal Distribution, Light Gradient, Extreme Gradient Boosting

Abstract Background Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and combination of models.
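
The 80%/20% stratified split described in the Methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function name `stratified_split`, the seed, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical labels: 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps the share of difficult cases the same in both sets, which matters when the positive class is rare.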

Boosting Algorithm Choice in Predictive Machine Learning Models for Fracturing Applications

DOI: 10.2118/205642-ms, 2021

Author(s): Abdul Muqtadir Khan

Keyword(s): Machine Learning, Data Science, Oil And Gas, Oil And Gas Industry, Injection Rate, Model Construction, Gradient Boosting, Light Gradient, Fracture Damage, Boosting Technique

Abstract With the advancement in machine learning (ML) applications, some recent research has been conducted to optimize fracturing treatments. There are a variety of models available using various objective functions for optimization and different mathematical techniques. There is a need to extend the ML techniques to optimize the choice of algorithm. For fracturing treatment design, the literature for comparative algorithm performance is sparse. The research predominantly shows that compared to the most commonly used regressors and classifiers, some sort of boosting technique consistently outperforms on model testing and prediction accuracy. A database was constructed for a heterogeneous reservoir. Four widely used boosting algorithms were used on the database to predict the design only from the output of a short injection/falloff test. Feature importance analysis was done on eight output parameters from the falloff analysis, and six were finalized for the model construction. The outputs selected for prediction were fracturing fluid efficiency, proppant mass, maximum proppant concentration, and injection rate. Extreme gradient boost (XGBoost), categorical boost (CatBoost), adaptive boost (AdaBoost), and light gradient boosting machine (LGBM) were the algorithms finalized for the comparative study. The sensitivity was done for a different number of classes (four, five, and six) to establish a balance between accuracy and prediction granularity. The results showed that the best algorithm choice was between XGBoost and CatBoost for the predicted parameters under certain model construction conditions. The accuracy for all outputs for the holdout sets varied between 80 and 92%, showing robust significance for a wider utilization of these models. Data science has contributed to various oil and gas industry domains and has tremendous applications in the stimulation domain. 
The research and review conducted in this paper add a valuable resource for the user to build digital databases and use the appropriate algorithm without much trial and error. Implementing this model reduced the complexity of the proppant fracturing treatment redesign process, enhanced operational efficiency, and reduced fracture damage by eliminating minifrac steps with crosslinked gel.

RegioML: Predicting the regioselectivity of electrophilic aromatic substitution reactions using machine learning

DOI: 10.33774/chemrxiv-2021-l2fvl, 2021

Author(s): Nicolai Ree, Andreas H. Göller, Jan H. Jensen

Keyword(s): Machine Learning, Tight Binding, Reaction Centers, Gradient Boosting, Electrophilic Aromatic Substitution, Aromatic Substitution, Substitution Reactions, Test Set, Light Gradient, Out Of Sample

We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with the ensemble decision tree variant light gradient boosting machine (LightGBM). The model is trained and tested on 21,201 bromination reactions with 101K reaction centers, which are split into training, test, and out-of-sample datasets with 58K, 15K, and 27K reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to the graph-based WLN method developed by Struble et al. (React. Chem. Eng. 2020, 5, 896), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules are used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (J. Cheminform. 2021, 13:10), where the precision is only 75%. Even for the out-of-sample dataset, RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions where there is little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.
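
The abstract reports both accuracy and precision, which can differ noticeably for the same predictions. A minimal sketch of the two definitions follows; the confusion-matrix counts are hypothetical and chosen only to show the effect, not taken from the paper.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)

# Hypothetical counts for 100 candidate sites, most of them negatives.
tp, fp, tn, fn = 8, 2, 85, 5
print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")
print(f"precision = {precision(tp, fp):.2f}")
```

In this made-up example the many correctly rejected negatives lift accuracy above precision, which is one reason the two numbers are worth reporting side by side.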

Interpretable Machine Learning for Early Neurological Deterioration Prediction in Atrial Fibrillation-Related Stroke

DOI: 10.21203/rs.3.rs-446890/v1, 2021

Author(s): Seong Hwan Kim, Eun-Tae Jeon, Sungwook Yu, Kyungmi O, Chi Kyung Kim, ...

Keyword(s): Machine Learning, Atrial Fibrillation, Neurological Deterioration, Gradient Boosting, Support Vector, Light Gradient, Interpretable Machine Learning, Extreme Gradient Boosting, Early Neurological Deterioration, Feature Importance

Abstract We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778; 95% CI, 0.726–0.830). The feature importance analysis revealed that fasting glucose level and the National Institutes of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.
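
The area under the receiver operating characteristic curve used to compare the models equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that pairwise definition is shown below; the function name `roc_auc` and the toy labels and scores are hypothetical, and the quadratic loop is meant for illustration rather than for large registries.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = END) and model scores.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))
```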

Development of a Diabetes Melitus Detection and Prediction Model Using Light Gradient Boosting Machine and K-Nearest Neighbour

DOI: 10.36108/ujees/1202.30.0160, 2021, Vol 3 (1)

Author(s): B. A Omodunbi

Keyword(s): Diabetes Mellitus, Machine Learning, Hybrid Model, Learning Model, Experimental Result, Gradient Boosting, Light Gradient, Machine Learning Model, Gradient Boosting Machine, Receiver Operating

Diabetes mellitus is a health disorder that occurs when the blood sugar level becomes extremely high due to body resistance in producing the required amount of insulin. The ailment is among the major causes of death in Nigeria and the world at large. This study was carried out to detect diabetes mellitus by developing a hybrid model that comprises two machine learning models, namely Light Gradient Boosting Machine (LGBM) and K-Nearest Neighbor (KNN). This research is aimed at developing a machine learning model for detecting the occurrence of diabetes in patients. The performance metrics employed in evaluating the findings of this study are the Receiver Operating Characteristic (ROC) curve, five-fold cross-validation, precision, and accuracy score. The proposed system had an accuracy of 91% and the area under the Receiver Operating Characteristic Curve was 93%. The experimental result shows that the prediction accuracy of the hybrid model is better than traditional machine learning

Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Scientific Reports, DOI: 10.1038/s41598-021-03643-8, 2021, Vol 11 (1)

Author(s): Seyed Ali Madani, Mohammad-Reza Mohammadi, Saeid Atashrouz, Ali Abedi, Abdolhossein Hemmati-Sarapardeh, ...

Keyword(s): Machine Learning, Molecular Weight, Oil Recovery, Equations Of State, Coefficient Of Determination, Gradient Boosting, Operating Pressure, Normal Alkanes, Light Gradient, Extreme Gradient Boosting

Abstract Accurate prediction of the solubility of gases in hydrocarbons is a crucial factor in designing enhanced oil recovery (EOR) operations by gas injection as well as separation and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N2) solubility in normal alkanes as the major constituents of crude oil was modeled using five representative machine learning (ML) models, namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was utilized to establish the models for predicting N2 solubility in normal alkanes as a function of pressure, temperature, and molecular weight of normal alkanes over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight range of normal alkanes was from 16 to 507 g/mol. Also, five equations of state (EOSs) including Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT) were used comparatively with the ML models to estimate N2 solubility in normal alkanes. Results revealed that the CatBoost model is the most precise model in this work with a root mean square error of 0.0147 and coefficient of determination of 0.9943. ZJ EOS also provided the best estimates for the N2 solubility in normal alkanes among the EOSs. Lastly, the results of relevancy factor analysis indicated that pressure has the greatest influence on N2 solubility in normal alkanes and that the N2 solubility increases with increasing molecular weight of normal alkanes.
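
The two regression metrics quoted in the Results, root mean square error and coefficient of determination, can be written out as a short sketch. These are the standard textbook definitions, not the authors' code, and the sample values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

RMSE is expressed in the units of the target, while the coefficient of determination is unitless, so the two are usually read together.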

Integrated Model for COVID-19 Diagnosis Based on Computed Tomography AI, and Clinical Features: A Multicenter Cohort Study

DOI: 10.21203/rs.3.rs-979599/v1, 2021

Author(s): Yuki Kataoka, Yuya Kimura, Tatsuyoshi Ikenoue, Yoshinori Matsuoka, Junji Kumasawa, ...

Keyword(s): Machine Learning, Computed Tomography, Cohort Study, Clinical Features, Tertiary Care, Gradient Boosting, Diagnostic Model, Full Model, Light Gradient, Better Than

Abstract Background We developed and validated a machine learning diagnostic model for novel coronavirus (COVID-19) disease, integrating artificial-intelligence-based computed tomography (CT) imaging and clinical features. Methods We conducted a retrospective cohort study in 11 Japanese tertiary care facilities that treated COVID-19 patients. Participants were tested using both real-time reverse transcription polymerase chain reaction (RT-PCR) and chest CT between January 1 and May 30, 2020. We chronologically split the dataset in each hospital into training and test sets, containing patients in a 7:3 ratio. Light Gradient Boosting Machine model was used for analysis. Results A total of 703 patients were included with two models — the full model and the A-blood model — developed for their diagnosis. The A-blood model included eight variables (the Ali-M3 confidence, along with seven clinical features of blood counts and biochemistry markers). The areas under the receiver-operator curve of both models (0.91, 95% confidence interval (CI), 0.86 to 0.95 for the full model and 0.90, 95% CI, 0.86 to 0.94 for the A-blood model) were better than that of the Ali-M3 confidence (0.78, 95% CI, 0.71 to 0.83) in the test set. Conclusions The A-blood model, a COVID-19 diagnostic model developed in this study, combines machine-learning and CT evaluation with blood test data and is better than the Ali-M3 framework existing for this purpose. This would significantly aid physicians in making a quicker diagnosis of COVID-19.
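
The chronological 7:3 split within each hospital described in the Methods can be sketched as below. This is an illustration under stated assumptions rather than the study's procedure: the record fields `hospital` and `test_date`, the function name `chronological_split`, and the toy records are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def chronological_split(records, train_fraction=0.7):
    """Per hospital, put the earliest patients in train and the latest in test."""
    by_hospital = defaultdict(list)
    for rec in records:
        by_hospital[rec["hospital"]].append(rec)
    train, test = [], []
    for recs in by_hospital.values():
        recs.sort(key=lambda r: r["test_date"])  # oldest first
        cut = round(len(recs) * train_fraction)
        train.extend(recs[:cut])
        test.extend(recs[cut:])
    return train, test

# Hypothetical records: two hospitals, ten patients each, one per day.
records = [
    {"hospital": h, "test_date": date(2020, 1, 1) + timedelta(days=d)}
    for h in ("A", "B")
    for d in range(10)
]
train, test = chronological_split(records)
print(len(train), len(test))
```

Splitting by time rather than at random means the test set contains only patients seen after those used for training, which is closer to how such a model would be used prospectively.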

Cubical homology-based Image Classification - A Comparative Study

DOI: 10.36939/ir.202112231202, 2021

Author(s): Seungho Choe

Keyword(s): Machine Learning, Image Classification, Digital Image, Persistent Homology, Topological Data Analysis, Connected Components, Gradient Boosting, Topological Features, Light Gradient, Cubical Homology

Persistent homology is a powerful tool in topological data analysis (TDA) to efficiently compute, study, and encode multi-scale topological features and is being increasingly used in digital image classification. The topological features represent the number of connected components, cycles, and voids that describe the shape of data. Persistent homology extracts the birth and death of these topological features through a filtration process. The lifespan of these features can be represented using persistence diagrams (topological signatures). Cubical homology is a more efficient method for extracting topological features from a 2D image and uses a collection of cubes to compute the homology, which fits the digital image structure of grids. In this research, we propose a cubical homology-based algorithm for extracting topological features from 2D images to generate their topological signatures. Additionally, we propose a score, which measures the significance of each of the sub-simplices in terms of persistence. Also, gray level co-occurrence matrix (GLCM) and contrast limited adaptive histogram equalization (CLAHE) are used as supplementary methods for extracting features. Machine learning techniques are then employed to classify images using the topological signatures. Among the eight tested algorithms with six published image datasets with varying pixel sizes, classes, and distributions, our experiments demonstrate that cubical homology-based machine learning with deep residual network (ResNet 1D) and Light Gradient Boosting Machine (lightGBM) shows promise with the extracted topological features.

Protein pKa prediction by tree-based machine learning

DOI: 10.26434/chemrxiv-2021-4d420, 2021

Author(s): Ada Y. Chen, Juyong Lee, Ana Damjanovic, Bernard R. Brooks

Keyword(s): Machine Learning, Machine Learning Algorithms, Gradient Boosting, Pka Prediction, Light Gradient, Structure Database, Gradient Boosting Machine, Extreme Gradient Boosting, Better Than, Protein Pka

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA. The overall RMSE for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys and Tyr), and 0.63 when considering Asp, Glu, His and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observe that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
