Generating Pseudo-Data to Enhance the Performance of Classification-Based Engineering Design: A Preliminary Investigation

Author(s):  
Xianping Du ◽  
Onur Bilgen ◽  
Hongyi Xu

Abstract Machine learning for classification has been used widely in engineering design, for example, in feasible domain recognition and hidden pattern discovery. Training an accurate machine learning model requires a large dataset; however, high computational or experimental costs are major obstacles to obtaining a large dataset for real-world problems. One possible solution is to generate a large pseudo dataset with surrogate models, which are established with a smaller set of real training data. However, it is not well understood whether the pseudo dataset benefits the classification model by providing more information or degrades the machine learning performance due to the prediction errors and uncertainties introduced by the surrogate model. This paper presents a preliminary investigation of this research question. A classification-and-regression-tree model is employed to recognize the design subspaces to support design decision-making. It is implemented on the geometric design of a vehicle energy-absorbing structure based on finite element simulations. Based on a small set of real-world data obtained by simulations, a surrogate model based on Gaussian process regression is employed to generate pseudo datasets for training. The results showed that the tree-based method could help recognize feasible design domains efficiently. Furthermore, the additional information provided by the surrogate model enhances the accuracy of classification. One important conclusion is that the accuracy of the surrogate model determines the quality of the pseudo dataset and, hence, the improvements in the machine learning model.
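The pseudo-data workflow described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the one-dimensional response `expensive_simulation`, the feasibility limit `LIMIT`, the kernel length scale, and the sample sizes are all hypothetical, the Gaussian process is reduced to its posterior mean with a fixed RBF kernel, and a one-split threshold classifier stands in for the classification-and-regression-tree model.

```python
import math

def expensive_simulation(x):
    # Hypothetical stand-in for one finite element run (not from the paper).
    return x * x

def rbf(a, b, length=0.3):
    # Squared-exponential kernel with an assumed, fixed length scale.
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_gp(xs, ys, noise=1e-6):
    # Zero-mean GP regression: return the posterior mean as a function.
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return lambda x: sum(w * rbf(x, xi) for w, xi in zip(alpha, xs))

def fit_stump(xs, labels):
    # Pick the threshold t that minimizes errors of "feasible if x <= t".
    pairs = sorted(zip(xs, labels))
    best_t, best_err = None, len(pairs) + 1
    for (x0, _), (x1, _) in zip(pairs, pairs[1:]):
        t = 0.5 * (x0 + x1)
        err = sum((x <= t) != lab for x, lab in pairs)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

LIMIT = 1.5  # hypothetical feasibility limit on the response

# Small "real" dataset from the expensive model.
real_x = [0.4 * i for i in range(6)]
real_y = [expensive_simulation(x) for x in real_x]

# Surrogate trained on the real data, then queried densely for pseudo-data.
surrogate = fit_gp(real_x, real_y)
pseudo_x = [0.01 * i for i in range(201)]
pseudo_labels = [surrogate(x) <= LIMIT for x in pseudo_x]

# Classifier trained on real data only versus on surrogate-labelled pseudo-data.
t_real = fit_stump(real_x, [y <= LIMIT for y in real_y])
t_pseudo = fit_stump(pseudo_x, pseudo_labels)
print("boundary from real data:", t_real)
print("boundary from pseudo-data:", t_pseudo)
```

Whether `t_pseudo` lands closer to the true boundary than `t_real` depends on how well the surrogate interpolates between the real samples, which mirrors the abstract's conclusion that surrogate accuracy determines the quality of the pseudo dataset.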

2020 ◽  
Vol 32 ◽  
pp. 03032
Author(s):  
Sahil Parab ◽  
Piyush Rathod ◽  
Durgesh Patil ◽  
Vishwanath Chikkareddi

Diabetes detection has been one of the many challenges faced by the medical and technological communities. Machine learning algorithms are used to detect the likelihood that a patient is diabetic based on glucose concentration, insulin levels, and other medically required test reports. The basic diabetes detection model uses a Bayesian classification algorithm; although it is able to detect diabetes, its efficiency is not always acceptable because of the drawbacks of relying on a single algorithm. A hybrid machine learning model is used to overcome those drawbacks. The hybrid model is constructed by combining multiple applicable machine learning algorithms, such as an SVM model and a Bayesian classification model, or other models, so that each compensates for the drawbacks of the others while contributing its own strengths. In the ideal case, the new hybrid model will be more efficient than the original Bayesian classification model.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0240200
Author(s):  
Miguel Marcos ◽  
Moncef Belhassen-García ◽  
Antonio Sánchez-Puente ◽  
Jesús Sampedro-Gomez ◽  
Raúl Azibeiro ◽  
...  

Background Efficient and early triage of hospitalized Covid-19 patients to detect those with higher risk of severe disease is essential for appropriate case management. Methods We trained, validated, and externally tested a machine-learning model for early identification of patients who will die or require mechanical ventilation during hospitalization, using clinical and laboratory features obtained at admission. A development cohort with 918 Covid-19 patients was used for training and internal validation, and 352 patients from another hospital were used for external testing. Performance of the model was evaluated by calculating the area under the receiver-operating-characteristic curve (AUC), sensitivity and specificity. Results A total of 363 of 918 (39.5%) and 128 of 352 (36.4%) Covid-19 patients from the development and external testing cohort, respectively, required mechanical ventilation or died during hospitalization. In the development cohort, the model obtained an AUC of 0.85 (95% confidence interval [CI], 0.82 to 0.87) for predicting severity of disease progression. Variables ranked according to their contribution to the model were the peripheral blood oxygen saturation (SpO2)/fraction of inspired oxygen (FiO2) ratio, age, estimated glomerular filtration rate, procalcitonin, C-reactive protein, updated Charlson comorbidity index and lymphocytes. In the external testing cohort, the model achieved an AUC of 0.83 (95% CI, 0.81 to 0.85). This model is deployed in an open source calculator, in which Covid-19 patients at admission are individually stratified as being at high or non-high risk for severe disease progression. Conclusions This machine-learning model, applied at hospital admission, predicts risk of severe disease progression in Covid-19 patients.
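The three evaluation metrics named in this abstract can be computed as in the sketch below. The risk scores, outcome labels, and the 0.5 cutoff are made-up illustration values, not data from the study; AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one).

```python
def auc(scores, labels):
    # Fraction of positive/negative pairs ranked correctly; ties count half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, cutoff):
    # Classify as high risk when the score reaches the cutoff.
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model outputs; label 1 = died or required ventilation.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

model_auc = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, cutoff=0.5)
print(model_auc, sens, spec)
```

The pairwise loop is quadratic in the cohort size; for cohorts of the size reported here a sort-based rank computation would be the more usual choice.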


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 2051-2051
Author(s):  
Jeffrey J. Kirshner ◽  
Kelly Cohn ◽  
Steven Dunder ◽  
Karri Donahue ◽  
Madeline Richey ◽  
...  

2051 Background: Efforts to facilitate patient identification for clinical trials in routine practice, such as automating electronic health record (EHR) data reviews, are hindered by the lack of information on metastatic status in structured format. We developed a machine learning tool that infers metastatic status from unstructured EHR data, and we describe its real-world implementation. Methods: This machine learning model scans EHR documents, extracting features from text snippets surrounding key words (i.e., ‘Metastatic’, ‘Progression’, ‘Local’). A regularized logistic regression model was trained and used to classify patients across 5 metastatic status inference categories: highly-likely and likely positive, highly-likely and likely negative, and unknown. The model accuracy was characterized using the Flatiron Health EHR-derived de-identified database of patients with solid tumors, where manually abstracted information served as the reference standard. We assessed model accuracy using sensitivity and specificity (patients in the ‘unknown’ category omitted from numerator), negative and positive predictive values (NPV, PPV; patients ‘unknown’ included in denominator), and its performance in a real-world dataset. In a separate validation, we evaluated the accuracy gained upon additional user review of the model outputs after integration of this tool into workflows. Results: This metastatic status inference model was characterized using a sample of 66,532 patients. The model sensitivity and specificity (95% CI) were 82.% (82, 83) and 95% (95, 96), respectively; PPV was 89% (89, 90) and NPV was 94% (94, 94). In the validation sample (N = 200, originating from 5 distinct care sites), and after user review of model outputs, values increased to 97% (85, 100) for sensitivity, 98% (95, 100) for specificity, 92% (78, 98) for PPV and 99% (97, 100) for NPV. The model assigned 163/200 patients to the highly-likely categories, which were deemed not to require further EHR review by users. The prevalence of errors was 4% without user review, and 2% after user review. Conclusions: This machine learning model infers metastatic status from unstructured EHR data with high accuracy. The tool assigns metastatic status with high confidence in more than 75% of cases without requiring additional manual review, allowing more efficient identification of clinical trial candidates and clinical trial matching, thus mitigating a key barrier for clinical trial participation in community clinics.
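One way to picture the five inference categories is as bands over the classifier's predicted probability. The sketch below assumes that reading; the cutoffs 0.1, 0.4, 0.6 and 0.9 and the function name `inference_category` are hypothetical, since the abstract does not state how the categories are derived from the logistic regression output. The final line reproduces the reported share of validation patients placed in the highly-likely categories.

```python
def inference_category(prob_metastatic):
    # Map a predicted probability to one of five categories.
    # The cutoffs are illustrative assumptions, not values from the abstract.
    if prob_metastatic >= 0.9:
        return "highly-likely positive"
    if prob_metastatic >= 0.6:
        return "likely positive"
    if prob_metastatic <= 0.1:
        return "highly-likely negative"
    if prob_metastatic <= 0.4:
        return "likely negative"
    return "unknown"

# Hypothetical predicted probabilities for five patients.
examples = [0.95, 0.70, 0.50, 0.30, 0.05]
categories = [inference_category(p) for p in examples]
print(categories)

# Share of validation patients in the highly-likely categories (163 of 200),
# i.e. those deemed not to need further manual EHR review.
auto_resolved_fraction = 163 / 200
print(f"{auto_resolved_fraction:.1%}")
```

Under this reading, widening the two outer bands raises the share of patients resolved without review at the cost of more errors in those bands, which is the trade-off the validation figures describe.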


2020 ◽  
Author(s):  
Ka Man Fong ◽  
Shek Yin Au ◽  
George Wing Yiu Ng ◽  
Anne Kit Hung Leung

Abstract Background: Researchers have long been struggling to improve disease severity scores for mortality prediction in the ICU. The digitalization of medical health records and advances in computational power have promoted the use of machine learning in critical care. This study aimed to develop an interpretable machine learning model using datasets from multiple centers, and to compare it with APACHE IV in predicting hospital mortality of patients admitted to the ICU. Method: The datasets were assembled from the eICU database, including 136145 patients across 208 hospitals throughout the U.S., and from 5 ICUs in Hong Kong, including 10909 patients. The two datasets were first combined into one large dataset before an 80:20 stratified split into the training set and the test set. The XGBoost machine learning algorithm was chosen to predict hospital mortality. The variables in the model were the same as those included in the APACHE IV score. The discrimination and calibration of the model were assessed. The model was interpreted using Shapley Additive Explanations values. Results: Of the 147054 patients in the whole cohort, the hospital mortality was 9.3%. The area under the precision-recall curve was 0.57 for the XGBoost algorithm and 0.49 for APACHE IV. Similarly, XGBoost reached an area under the receiver operating characteristic curve (AUROC) of 0.90, while APACHE IV had an AUROC of 0.87. Additionally, the XGBoost algorithm showed better calibration than APACHE IV. The three most important variables were age, heart rate, and whether the patient was on a ventilator. Conclusions: The severity score developed by a machine learning model using multicenter datasets outperformed APACHE IV in predicting hospital mortality for patients admitted to the ICU.
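An 80:20 stratified split keeps the outcome rate roughly equal in the training and test sets, which matters when the positive class is rare (9.3% mortality here). The sketch below shows one plain-Python way to do it; the function name `stratified_split`, the seed, and the 1000-patient toy cohort are illustrative assumptions, and the study's own tooling is not stated in the abstract.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    # Split indices so each class contributes about test_fraction to the test set.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Toy cohort of 1000 patients with 9.3% mortality (label 1 = died in hospital).
labels = [1] * 93 + [0] * 907
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```

With an unstratified random split the test-set mortality could drift noticeably from 9.3% by chance, which would make the precision-recall comparison between models harder to interpret.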


Author(s):  
Himanshu Bajpai

Providing support for rolled-out applications and services is one of the major factors in increasing customer satisfaction, which in turn increases customer retention. In an era of automation, where most day-to-day jobs are handled or facilitated by technology, there is a need to reduce the manual effort of triaging support tickets and thereby help the person on call close tickets on time with proper remediation. The machine learning model produced by this work will not only help classify tickets but also, where applicable, suggest the best possible remediation, thereby reducing the manual effort and time needed to provide a solution. The objectives of the work are as follows: a) Understand the data present in the tickets and derive basic insights such as categories of issues and trends. b) Prepare the data so that it is ready for applying different classification algorithms. d) Identify the best machine learning model for classifying a new incident with the highest accuracy. e) Prepare a machine learning model that can suggest the best possible remediation for a ticket. f) Integrate the best classification model and the solution recommender model and wrap them as an API that can be used by the end user.


2021 ◽  
Vol 11 (24) ◽  
pp. 11735
Author(s):  
Seungheon Chae ◽  
Ahnryul Choi ◽  
Hyunwoo Jung ◽  
Tae Hyong Kim ◽  
Kyungran Kim ◽  
...  

Accurately measuring the lower extremity and L5/S1 moments is important since L5/S1 moments are the principal parameters that measure the risk of musculoskeletal diseases during lifting. In this study, a protocol that predicts lower extremity and L5/S1 moments with an insole sensor was proposed to replace prior methods that have spatial constraints. The protocol is hierarchically composed of a classification model and a regression model to predict joint moments. Additionally, a single LSTM model was developed for comparison with the proposed protocol. To optimize the hyperparameters of the machine learning model and the input features, a Bayesian optimization method was adopted. As a result, the proposed protocol showed a relative root mean square error (rRMSE) of 8.06~13.88%, while the single LSTM showed an rRMSE of 9.30~18.66%. The protocol in this research is expected to be a starting point for developing a system for estimating the lower extremity and L5/S1 moments during lifting that can replace the complex prior method and be adapted to workplace environments. This novel study has the potential to support the precise design of an iterative feedback control system for an exoskeleton that generates appropriate actuator torque.
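For readers unfamiliar with rRMSE, the sketch below shows one common way to compute it: the root mean square error divided by the range of the measured signal, expressed as a percentage. The abstract does not give the exact normalization used in the study, so the range-based form here is an assumption, and the joint-moment values are made up for illustration.

```python
import math

def rrmse_percent(predicted, measured):
    # RMSE normalized by the range of the measured signal, in percent.
    # Range normalization is an assumed convention, not taken from the abstract.
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured signal has zero range")
    return 100.0 * math.sqrt(mse) / span

# Hypothetical joint-moment curve over one lift (arbitrary units).
measured = [0.0, 50.0, 100.0, 50.0, 0.0]
predicted = [5.0, 45.0, 95.0, 55.0, 5.0]

error = rrmse_percent(predicted, measured)
print(f"rRMSE = {error:.2f}%")
```

Other normalizations (for example by the mean of the measured signal) give different percentages for the same predictions, so figures such as 8.06~13.88% are only comparable across studies that share a definition.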

