Software Engineering for Machine Learning in Health Informatics

2019 ◽  
Author(s):  
Mohammed Moreb ◽  
Oguz Ata

Abstract Background We propose a novel framework for health informatics: a framework and methodology of Software Engineering for Machine Learning in Health Informatics (SEMLHI). The framework's features allow users to study and analyze requirements, determine the functions of objects related to the system, and select the machine learning algorithms to be applied to the dataset. Methods The study is based on original data collected from a Palestinian government hospital over the past three years. The data were first validated and all outliers removed, then analyzed using the developed framework to compare how well machine learning serves patients in real time. Our proposed module was compared across three systems engineering methodologies: Vee, Agile, and SEMLHI. The results were used to implement a prototype system that requires a machine learning algorithm; after the development phase, a questionnaire was delivered to the developers to rate the results obtained with the three methodologies. The SEMLHI framework is composed of four components: software, the machine learning model, machine learning algorithms, and health informatics data. The machine learning algorithm component uses five algorithms to evaluate the accuracy of the machine learning models.Results We compared our approach with previously published systems in terms of performance, evaluating the accuracy of the machine learning models. Applying the different algorithms to 750 cases, linear SVC achieved an accuracy of about 0.57 compared with the KNeighbors classifier, logistic regression, multinomial NB, and random forest classifier.
This research investigates the interaction between software engineering (SE) and machine learning (ML) within the context of health informatics. Our proposed framework defines a methodology for developers to analyze and develop software for health informatics models, and creates a space in which software engineering and ML experts can work on the ML model lifecycle at both the disease level and the subtype level.Conclusions This article is an ongoing effort toward defining and translating an existing research pipeline into four integrated modules, a framework that uses healthcare datasets to reduce cost estimation through a newly suggested methodology. The framework is available as open source software, licensed under the GNU General Public License Version 3, to encourage others to contribute to its future development.
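As a rough illustration of the accuracy comparison described above, the sketch below fits the same five classifier families with scikit-learn. The hospital dataset is not public, so the features, labels, and resulting scores here are synthetic placeholders, not the study's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the 750-case dataset (10 features);
# MultinomialNB requires non-negative features, so counts are used.
X = rng.poisson(3.0, size=(750, 10))
y = (X[:, :3].sum(axis=1) + rng.normal(0, 1, 750) > 9).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "Linear SVC": LinearSVC(dual=False),
    "KNeighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial NB": MultinomialNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
}
# Held-out accuracy for each model, as in the paper's comparison step
accuracies = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
              for name, m in models.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

The printed accuracies will differ from the paper's 0.57 figure, since the data here are fabricated for illustration.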

10.2196/23458 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e23458
Author(s):  
Kenji Ikemura ◽  
Eran Bellin ◽  
Yukako Yagi ◽  
Henny Billett ◽  
Mahmoud Saada ◽  
...  

Background During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. Objective In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients’ chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. Methods Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients’ data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. Results Data from 4313 patients were used to develop the models. 
The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). Conclusions We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning–based clinical decision support tools.
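The AUPRC evaluation used above can be reproduced in outline with scikit-learn. The cohort here is a synthetic, imbalanced stand-in (the MIMIC-style clinical data are not reproduced), and a single gradient boosting model replaces the autoML leaderboard, so the score is illustrative only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

# Imbalanced synthetic cohort: ~15% positive class, mimicking a mortality label
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.85, 0.15], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

clf = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
# Average precision summarizes the precision-recall curve (the AUPRC metric)
auprc = average_precision_score(y_te, proba)
print(f"AUPRC: {auprc:.3f}")
```

AUPRC is preferred over AUROC for imbalanced outcomes because its baseline equals the positive-class prevalence rather than 0.5.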


Author(s):  
Pratyush Kaware

In this paper, a cost-effective sensor has been implemented to read finger-bend signals by attaching the sensor to a finger, so as to classify them based on the degree of bend as well as the joint about which the finger was being bent. This was done by testing various machine learning algorithms to find the most accurate and consistent classifier. We found that the Support Vector Machine was the algorithm best suited to classify our data; using it, we were able to predict the live state of a finger, i.e., the degree of bend and the joints involved. The live voltage values from the sensor were transmitted using a NodeMCU microcontroller, converted to digital form, and uploaded to a database for analysis.
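A minimal sketch of the SVM classification step, assuming a single flex-sensor voltage channel and three hypothetical bend classes; the real feature set, class labels, and voltage ranges are not given in the abstract, so everything below is a placeholder:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Hypothetical mean sensor voltage per class: slight / medium / full bend
centers = np.array([1.2, 2.1, 3.0])
labels = rng.integers(0, 3, 600)
voltages = centers[labels] + rng.normal(0, 0.15, 600)  # simulated readings
X = voltages.reshape(-1, 1)

# Scale then fit an RBF-kernel SVM, as the paper's chosen classifier family
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(svm, X, labels, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

In a live system, the same fitted pipeline would be applied to each digitized voltage sample arriving from the NodeMCU via `svm.predict`.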


The aim of this research is to perform risk modelling after analyzing Twitter posts using sentiment analysis. In this research, we analyze the posts of several users, or of a particular user, to check whether they could be a cause of concern to society. Each sentiment (happy, sad, anger, and other emotions) provides a severity scale in the final table, on which the machine learning algorithm is applied. The data fed into the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.
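The severity-scaling step described above can be sketched as follows. The emotion lexicon, severity weights, and concern threshold are illustrative assumptions, not values from the study:

```python
# Hypothetical severity weight per detected emotion (higher = more concerning)
SEVERITY = {"happy": 0, "sad": 1, "anger": 3, "fear": 2}

def post_severity(post_emotions):
    """Average severity of the emotions detected in a single post."""
    scores = [SEVERITY.get(e, 0) for e in post_emotions]
    return sum(scores) / len(scores) if scores else 0.0

def user_risk(posts, threshold=2.0):
    """Mean severity across a user's monitored posts vs. a cut-off."""
    mean = sum(post_severity(p) for p in posts) / len(posts)
    return mean, mean >= threshold

# Emotions per post would come from an upstream sentiment classifier
posts = [["anger", "fear"], ["anger"], ["sad", "anger"]]
mean, concern = user_risk(posts)
print(f"mean severity {mean:.2f}, cause of concern: {concern}")
```

The resulting per-user severity table is what a downstream machine learning model would be trained on.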


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

Machine learning algorithms are capable of managing multi-dimensional data in dynamic environments. Despite their many valuable features, some challenges remain to be overcome. Machine learning algorithms still require additional mechanisms or procedures to predict a large number of new classes while preserving privacy. These deficiencies show that the reliable use of a machine learning algorithm depends on human experts, because raw data may complicate the learning process and generate inaccurate results. Interpreting outcomes therefore requires expertise in machine learning mechanisms, which is itself a significant challenge. Machine learning techniques also suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplicity. A central problem of machine learning algorithms is their vulnerability to errors, and machine learning techniques have also been found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by finding how to make predictions using an improved algorithm.


2022 ◽  
pp. 34-46
Author(s):  
Amtul Waheed ◽  
Jana Shafi ◽  
Saritha V.

In today's world of advanced technologies in IoT and ITS in smart city scenarios, there are many projections, such as improved data propagation on smart roads and cooperative transportation networks, autonomous and continuously connected vehicles, and low-latency applications in high-capacity environments with heterogeneous connectivity and speed. This chapter presents the performance of machine learning methods in predicting the speed of vehicles on roadways. The input variables for each learning algorithm are density, measured as vehicles per mile, and volume, measured as vehicles per hour; the output variable is speed, measured in miles per hour, which represents the performance of each algorithm. The performance of the machine learning algorithms is calculated by comparing the speed predictions made by the different algorithms against the true speed using histograms. The results suggest that speed varies according to the histogram.
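A minimal sketch of the prediction task described above: learn speed (mph) from density (vehicles/mile) and volume (vehicles/hour). The generating relation below is a Greenshields-style assumption for illustration, not the chapter's actual traffic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
density = rng.uniform(10, 120, 1000)  # vehicles per mile
# Assumed linear speed-density relation plus noise (free-flow 65 mph)
speed_true = 65 * (1 - density / 160) + rng.normal(0, 3, 1000)
volume = density * speed_true         # vehicles per hour (flow = k * v)
X = np.column_stack([density, volume])

X_tr, X_te, y_tr, y_te = train_test_split(X, speed_true, random_state=7)
model = RandomForestRegressor(random_state=7).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(f"MAE: {mae:.2f} mph")
```

Comparing histograms of `model.predict(X_te)` against `y_te` would reproduce the chapter's evaluation approach.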


2020 ◽  
Vol 17 (9) ◽  
pp. 4294-4298
Author(s):  
B. R. Sunil Kumar ◽  
B. S. Siddhartha ◽  
S. N. Shwetha ◽  
K. Arpitha

This paper uses distinct machine learning algorithms and explores their multiple features. The primary advantage of machine learning is that an algorithm can perform its work automatically by learning what to do with the information. This paper presents the concept of machine learning and its algorithms, which can be used for different applications such as health care, sentiment analysis, and many more. Programmers are sometimes unsure which algorithm to apply to their application. This paper provides guidance on selecting an algorithm on the basis of how accurately it fits: given the collected data, an algorithm can be chosen according to its pros and cons. Using the data set, a base model is developed, trained, and tested; the trained model is then ready for prediction and can be deployed on the basis of feasibility.


Author(s):  
Shuangxia Ren ◽  
Jill Zupetic ◽  
Mehdi Nouraie ◽  
Xinghua Lu ◽  
Richard D. Boyce ◽  
...  

Abstract Background The partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FIO2) ratio is the reference standard for assessment of hypoxemia in mechanically ventilated patients. Non-invasive monitoring with the peripheral saturation of oxygen (SpO2) is increasingly utilized to estimate PaO2 because it does not require invasive sampling. Several equations have been reported to impute PaO2/FIO2 from SpO2/FIO2. However, machine learning algorithms to impute PaO2 from SpO2 have not been compared to published equations. Research Question How do machine learning algorithms perform at predicting PaO2 from SpO2 compared to previously published equations? Methods Three machine learning algorithms (neural network, regression, and kernel-based methods) were developed using 7 clinical variable features (n=9,900 ICU events) and subsequently 3 features (n=20,198 ICU events) as input into the models, from data available in mechanically ventilated patients from the Medical Information Mart for Intensive Care (MIMIC) III database. As a regression task, the machine learning models were used to impute PaO2 values. As a classification task, the models were used to predict patients with moderate-to-severe hypoxemic respiratory failure based on a clinically relevant cut-off of PaO2/FIO2 ≤ 150. The accuracy of the machine learning models was compared to published log-linear and non-linear equations. An online imputation calculator was created. Results Compared to seven features, three features (SpO2, FIO2, and PEEP) were sufficient to impute the PaO2/FIO2 ratio using a large dataset. All of the tested machine learning models imputed PaO2/FIO2 from SpO2/FIO2 with lower error and greater accuracy in predicting PaO2/FIO2 ≤ 150 than the published equations.
Using three features, the machine learning models showed superior performance in imputing PaO2 across the entire span of SpO2 values, including those ≥ 97%. Interpretation The improved performance shown for the machine learning algorithms suggests a promising framework for future use in large datasets.
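The three-feature regression-plus-classification setup can be sketched as below. The generating relation, coefficients, and noise level are invented for illustration (MIMIC-III data cannot be reproduced here), and a single gradient boosting regressor stands in for the three model families compared in the study:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 3000
spo2 = rng.uniform(80, 100, n)    # peripheral saturation, %
fio2 = rng.uniform(0.21, 1.0, n)  # fraction of inspired oxygen
peep = rng.uniform(5, 18, n)      # positive end-expiratory pressure, cmH2O
# Hypothetical generating relation: PaO2 rises with SpO2, falls with PEEP
pao2 = 30 + 2.5 * (spo2 - 80) - 0.8 * peep + rng.normal(0, 6, n)
X = np.column_stack([spo2, fio2, peep])

# Regression task: impute PaO2 from the three features
X_tr, X_te, y_tr, y_te = train_test_split(X, pao2, random_state=3)
reg = GradientBoostingRegressor(random_state=3).fit(X_tr, y_tr)
pao2_hat = reg.predict(X_te)

# Classification task: moderate-to-severe hypoxemia at PaO2/FIO2 <= 150
fio2_te = X_te[:, 1]
pred_severe = (pao2_hat / fio2_te) <= 150
true_severe = (y_te / fio2_te) <= 150
acc = (pred_severe == true_severe).mean()
print(f"P/F <= 150 accuracy: {acc:.3f}")
```

The classification is derived from the imputed PaO2 rather than trained separately, mirroring how the study evaluates the imputation against the clinical cut-off.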


2021 ◽  
Author(s):  
Catherine Ollagnier ◽  
Claudia Kasper ◽  
Anna Wallenbeck ◽  
Linda Keeling ◽  
Siavash A Bigdeli

Tail biting is a detrimental behaviour that impacts the welfare and health of pigs. Early detection of tail biting precursor signs allows for preventive measures to be taken, thus avoiding the occurrence of the tail biting event. This study aimed to build a machine-learning algorithm for real-time detection of upcoming tail biting outbreaks, using feeding behaviour data recorded by an electronic feeder. Prediction capacities of seven machine learning algorithms (e.g., random forest, neural networks) were evaluated from daily feeding data collected from 65 pens originating from 2 herds of grower-finisher pigs (25-100 kg), in which 27 tail biting events occurred. Data were divided into training and testing data, either by randomly splitting data into 75% (training set) and 25% (testing set), or by randomly selecting pens to constitute the testing set. The random forest algorithm was able to predict 70% of the upcoming events with an accuracy of 94%, when predicting events in pens for which it had previous data. The detection of events for unknown pens was less sensitive, and the neural network model was able to detect 14% of the upcoming events with an accuracy of 63%. A machine-learning algorithm based on ongoing data collection should be considered for implementation into automatic feeder systems for real time prediction of tail biting events.
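The "unknown pens" evaluation described above amounts to a grouped train/test split, which scikit-learn supports directly. The feeding features, outbreak labels, and pen structure below are synthetic stand-ins for the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(5)
# Hypothetical daily features per pen-day: visits, duration, occupation rate
n_pens, days = 65, 40
pen_id = np.repeat(np.arange(n_pens), days)
X = rng.normal(0, 1, (n_pens * days, 3))
# Assume pre-outbreak days show reduced feeder occupation (feature 2)
y = (X[:, 2] + rng.normal(0, 0.5, len(pen_id)) < -1.2).astype(int)

# Leave whole pens out of training, mirroring the unknown-pen evaluation
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=5)
train_idx, test_idx = next(gss.split(X, y, groups=pen_id))
clf = RandomForestClassifier(random_state=5).fit(X[train_idx], y[train_idx])
acc = clf.score(X[test_idx], y[test_idx])
print(f"Unknown-pen accuracy: {acc:.3f}")
```

Splitting by pen rather than by row prevents the model from seeing any history of a test pen, which is why the study reports lower sensitivity in this setting.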


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Thérence Nibareke ◽  
Jalal Laassiri

Abstract Introduction Nowadays, large data volumes are generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing, and the tools and models have to be optimized. In this paper we applied and compared machine learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Furthermore, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. We predict diabetes using three machine learning models and then compare their performance. We also analyzed flight delays and produced a dashboard that can help managers of flight companies get a 360° view of their flights and take strategic decisions. Case description We applied three machine learning algorithms to predict diabetes and compared their performance to see which model gives the best results. We performed analytics on flight datasets to support decision making and predict flight delays. Discussion and evaluation The experiment shows that Linear Regression, Naïve Bayes, and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the other two models with the highest score (1) and the smallest error (0). For the flight delay analytics, the model could show, for example, the airport that recorded the most flight delays. Conclusions Several tools and machine learning models for big data analytics have been discussed in this paper. We conclude that, for the same dataset, the model used for prediction must be chosen carefully. In future work, we will test different models in other fields (climate, banking, insurance, etc.).
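The three-model comparison above can be sketched with scikit-learn. A synthetic dataset stands in for the diabetes data, logistic regression is used as the classification counterpart of the paper's "Linear Regression", and Gaussian Naïve Bayes is one plausible variant, so the scores are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes dataset (768 rows, 8 features)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
# Held-out accuracy per model, the paper's comparison metric
results = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Note that a training-set score of 1 with zero error, as reported for the decision tree, typically indicates overfitting; held-out accuracy is the fairer comparison.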


2018 ◽  
Vol 10 (8) ◽  
pp. 76 ◽  
Author(s):  
Marcio Teixeira ◽  
Tara Salman ◽  
Maede Zolanvari ◽  
Raj Jain ◽  
Nader Meskin ◽  
...  

This paper presents the development of a Supervisory Control and Data Acquisition (SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank’s control system, which is a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed. During the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes and KNN. Then, the trained machine learning models were built and deployed in the network, where new tests were made using online network traffic. The performance obtained during the training and testing of the machine learning models was compared to the performance obtained during the online deployment of these models in the network. The results show the efficiency of the machine learning models in detecting the attacks in real time. The testbed provides a good understanding of the effects and consequences of attacks on real SCADA environments.
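The offline-training, online-deployment workflow described above can be outlined as follows. The traffic features and attack labels are synthetic placeholders (the testbed capture is not public), and a single random forest stands in for the five classifiers compared in the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for extracted SCADA traffic features (attack vs normal)
X, y = make_classification(n_samples=5000, n_features=12,
                           weights=[0.9, 0.1], random_state=1)
# Offline phase: train on captured traffic; hold out a "live" stream
X_tr, X_live, y_tr, y_live = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

detector = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
# "Online" phase: score each incoming flow one at a time, as in deployment
alerts = sum(int(detector.predict(flow.reshape(1, -1))[0]) for flow in X_live)
print(f"Flagged {alerts} of {len(X_live)} live flows as attacks")
```

Comparing per-flow online predictions against the offline test metrics, as the paper does, reveals whether the feature distribution shifts between capture and deployment.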

