Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alan Brnabic ◽  
Lisa M. Hess

Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated into applications that inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented, and studies meeting the eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods were decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies utilized one algorithm; only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist were not met by more than 50% of the published studies. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are conducted, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
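The review's recommendation, comparing multiple algorithms under a clearly defined internal validation strategy rather than committing to one model, can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn; the dataset, the three algorithms, and the fold count are assumptions for the sketch, not the review's protocol:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient-level real-world dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Compare several algorithms under the same internal (cross-) validation.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```

External validation, which the review found in only two studies, would additionally require scoring the selected model on a dataset drawn from a different source population.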

2019 ◽  
Vol 24 (34) ◽  
pp. 3998-4006
Author(s):  
Shijie Fan ◽  
Yu Chen ◽  
Cheng Luo ◽  
Fanwang Meng

Background: Riding the tide of big data, machine learning is coming into its own. Given the huge amounts of epigenetic data coming from biological experiments and the clinic, machine learning can help in detecting epigenetic features in the genome, finding correlations between phenotypes and modifications in histones or genes, accelerating the screening of lead compounds targeting epigenetic diseases, and many other aspects of the study of epigenetics, which consequently advances the hope of precision medicine. Methods: In this minireview, we focus on the fundamentals and applications of the machine learning methods that are regularly used in the epigenetics field and explain their features. Their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: In order to make full use of machine learning algorithms, one should be familiar with their pros and cons, so that the most suitable method(s) can be chosen for the big data at hand.


2020 ◽  
Vol 7 (2) ◽  
pp. 129-134
Author(s):  
Takudzwa Fadziso

In modern times, collecting data is not a big deal, but using it in a meaningful way is a challenging task. Different organizations are using artificial intelligence and machine learning to collect and utilize data. These should also be used in the medical field, because many diseases require prediction. One such disease is asthma, which is continuously increasing and affecting more and more people. A major issue is that it is difficult to diagnose in children. Machine learning algorithms can help in diagnosing it early so that doctors can start treatment early. Because machine learning algorithms can perform this prediction, this study will be helpful for both doctors and patients. Several machine learning predictive algorithms are available and have been used for this purpose.


2021 ◽  
Author(s):  
Dhairya Vyas

In terms of machine learning, the majority of data can be grouped into four categories: numerical data, categorical data, time-series data, and text. Different learning paradigms, such as supervised, unsupervised, and reinforcement learning, suit different data properties. Each category has its own classifiers; we have tested almost all machine learning methods and present a comparative analysis among them.
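The supervised/unsupervised distinction the abstract draws can be illustrated on the same numerical data. This is a generic sketch, not the author's analysis; the dataset and the particular estimators (k-nearest neighbors, k-means) are chosen only for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Numerical data with known class labels.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Supervised: learn a mapping from labeled examples.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Unsupervised: discover group structure without using the labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(knn.predict(X[:3]))  # predicted classes for the first three points
print(km.labels_[:3])      # cluster assignments for the same points
```

Reinforcement learning, the third paradigm mentioned, differs in that the model learns from rewards obtained by acting in an environment rather than from a fixed dataset, so it is not shown here.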


2019 ◽  
Author(s):  
Levi John Wolf ◽  
Elijah Knaap

Dimension reduction is one of the oldest concerns in geographical analysis. Despite significant, longstanding attention to it in geographical problems, recent advances in non-linear techniques for dimension reduction, called manifold learning, have not been adopted in classic data-intensive geographical problems. More generally, machine learning methods for geographical problems often focus on applying standard machine learning algorithms to geographic data, rather than on true "spatially-correlated learning," in the words of Kohonen. As such, we suggest a general way to incentivize geographical learning in machine learning algorithms, and link it to many past methods that introduced geography into statistical techniques. We develop a specific instance of this by specifying two geographical variants of Isomap, a non-linear dimension reduction, or "manifold learning," technique. We also provide a method for assessing what is added by incorporating geography and for estimating the manifold's intrinsic geographic scale. To illustrate the concepts and provide interpretable results, we conduct a dimension reduction on the geographical and high-dimensional structure of social and economic data on Brooklyn, New York. Overall, this paper's main endeavor--defining and explaining a way to "geographize" many machine learning methods--yields interesting and novel results for manifold learning and for the estimation of intrinsic geographic scale in unsupervised learning.
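One simple way to read "geographizing" Isomap is to blend attribute dissimilarity with geographic distance before learning the manifold. The convex combination below is an illustrative assumption of this sketch, not the paper's two specific variants, and the coordinates and attributes are random placeholders:

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))  # placeholder geographic coordinates
attrs = rng.normal(size=(n, 8))           # placeholder social/economic attributes

# Blend normalized attribute and geographic distances; alpha controls how
# strongly geography shapes the learned manifold (alpha=0 is plain Isomap).
alpha = 0.5
d_attr = pairwise_distances(attrs)
d_geo = pairwise_distances(coords)
d_mixed = (1 - alpha) * d_attr / d_attr.max() + alpha * d_geo / d_geo.max()

# Isomap accepts a precomputed dissimilarity matrix via metric="precomputed".
iso = Isomap(n_neighbors=15, n_components=2, metric="precomputed")
embedding = iso.fit_transform(d_mixed)
print(embedding.shape)  # (200, 2)
```

Sweeping alpha and observing how the embedding changes is one crude way to probe the geographic scale at which structure appears, though the paper's own estimation method is more principled.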


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1521 ◽  
Author(s):  
Tomasz Rymarczyk ◽  
Grzegorz Kłosowski ◽  
Edward Kozłowski ◽  
Paweł Tchórzewski

The main goal of this work was to compare selected machine learning methods with a classic deterministic method in the industrial application of electrical impedance tomography. The research focused on the development and comparison of algorithms and models for the analysis and reconstruction of data using electrical tomography. The novelty was the use of original machine learning algorithms. Their characteristic feature is the use of many separately trained subsystems, each of which generates a single pixel of the output image. Artificial Neural Network (ANN), LARS, and Elastic net methods were used to solve the inverse problem. These algorithms were modified by correspondingly multiplying the number of equations for electrical impedance tomography on the finite element method grid. The Gauss-Newton method was used as a reference for the machine learning methods. The algorithms were trained using training data obtained through computer simulation based on real models. The results of the experiments showed that in the considered cases the best quality of reconstructions was achieved by ANN. At the same time, ANN was the slowest in terms of both the training process and the speed of image generation. The other machine learning methods were comparable with the deterministic Gauss-Newton method and with each other.
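The paper's characteristic design, many separately trained subsystems with each one producing a single pixel of the reconstruction, can be sketched roughly as below. The dimensions and data are synthetic placeholders, and a per-pixel Elastic net stands in for the authors' full set of methods:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_samples, n_measurements, n_pixels = 300, 64, 100  # placeholder dimensions

# Synthetic stand-in for simulated boundary measurements and target images.
X = rng.normal(size=(n_samples, n_measurements))
true_W = rng.normal(size=(n_measurements, n_pixels))
Y = X @ true_W + 0.1 * rng.normal(size=(n_samples, n_pixels))

# One separately trained model per output pixel.
models = [
    ElasticNet(alpha=0.1, max_iter=5000).fit(X, Y[:, j])
    for j in range(n_pixels)
]

# Reconstruct an image for a new measurement vector, pixel by pixel.
x_new = rng.normal(size=(1, n_measurements))
image = np.array([m.predict(x_new)[0] for m in models])
print(image.shape)  # (100,)
```

Training one small model per pixel trades memory and training time for the ability to tune each output independently, which is the design choice the abstract highlights as the novelty.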


Author(s):  
Hong Cui

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools to structure large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora, helping to improve markup performance, especially for elements that have sparse training examples.
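The MARTT system itself is not shown here, but the general idea, learning to assign XML elements to description sentences, can be sketched with a simple bag-of-words classifier. The element names and example sentences below are invented for illustration and are far simpler than the floras the paper uses:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training sentences labeled with hypothetical description elements.
sentences = [
    "Leaves opposite, ovate, 3-7 cm long",
    "Petals five, white, obovate",
    "Stems erect, branched, glabrous",
    "Flowers solitary, axillary",
    "Leaf margins serrate, apex acute",
    "Stem bark smooth, gray",
]
labels = ["leaf", "flower", "stem", "flower", "leaf", "stem"]

clf = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
clf.fit(sentences, labels)

# Wrap a new sentence in the predicted element tag.
new_sentence = "Leaves alternate, lanceolate"
element = clf.predict([new_sentence])[0]
print(f"<{element}>{new_sentence}</{element}>")
```

The paper's contribution goes beyond this kind of classifier by layering learned domain rules on top, which is what allows rules from one flora to transfer to another.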


2019 ◽  
Vol 21 (9) ◽  
pp. 693-699 ◽  
Author(s):  
A. Alper Öztürk ◽  
A. Bilge Gündüz ◽  
Ozan Ozisik

Aims and Objectives: Solid Lipid Nanoparticles (SLNs) are pharmaceutical delivery systems that have advantages such as controlled drug release, long-term stability, etc. Particle Size (PS) is one of the important characteristics of SLNs, affecting drug release rate, bio-distribution, etc. In this study, the formulation of SLNs using a high-speed homogenization technique has been evaluated. The main emphasis of the work is to study whether the effect of mixing time and formulation ingredients on PS can be modeled. For this purpose, different machine learning algorithms have been applied and evaluated using the mean absolute error metric. Materials and Methods: SLNs were prepared by high-speed homogenization. PS, size distribution, and zeta potential measurements were performed on freshly prepared samples. In order to model the formulation of the particles in terms of mixing time and formulation ingredients and to evaluate the predictability of PS from these parameters, different machine learning algorithms were applied to the prepared dataset and their performances were evaluated. Results: The PS of the SLNs obtained was in the range of 263-498 nm. The results show that the PS of SLNs can be best estimated by decision tree based methods, among which Random Forest has the least mean absolute error value, 0.028. The estimates of the machine learning algorithms demonstrate that particle size can be predicted by both decision rule-based machine learning methods and function-fitting machine learning methods. Conclusion: Our findings show that machine learning methods can be highly useful for determining formulation parameters for further research.
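The modeling task described, predicting particle size from mixing time and formulation ingredients and scoring by mean absolute error, can be sketched as follows. The feature names, value ranges, and the synthetic size relationship are assumptions of this sketch, not the study's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Hypothetical formulation variables: mixing time and two ingredient amounts.
X = np.column_stack([
    rng.uniform(1, 30, n),     # mixing time (min)
    rng.uniform(0.5, 5.0, n),  # lipid amount (g)
    rng.uniform(0.1, 2.0, n),  # surfactant amount (g)
])
# Synthetic particle size (nm), loosely in the paper's reported range.
y = 500 - 5 * X[:, 0] - 20 * X[:, 1] + 10 * X[:, 2] + rng.normal(0, 10, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE: {mae:.1f} nm")
```

Comparing this against a function-fitting method (e.g. a linear model or small neural network) on the same split mirrors the study's decision-tree versus function-fitting comparison.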


SPE Journal ◽  
2020 ◽  
Vol 25 (03) ◽  
pp. 1241-1258 ◽  
Author(s):  
Ruizhi Zhong ◽  
Raymond L. Johnson ◽  
Zhongwei Chen

Summary Accurate coal identification is critical in coal seam gas (CSG) (also known as coalbed methane or CBM) developments because it determines well completion design and directly affects gas production. Density logging using radioactive source tools is the primary tool for coal identification, adding well trips to condition the hole and additional well costs for logging runs. In this paper, machine learning methods are applied to identify coals from drilling and logging-while-drilling (LWD) data to reduce overall well costs. Machine learning algorithms include logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGBoost). The precision, recall, and F1 score are used as evaluation metrics. Because coal identification is an imbalanced data problem, the performance on the minority class (i.e., coals) is limited. To enhance the performance on coal prediction, two data manipulation techniques [naive random oversampling (NROS) technique and synthetic minority oversampling technique (SMOTE)] are separately coupled with machine learning algorithms. Case studies are performed with data from six wells in the Surat Basin, Australia. For the first set of experiments (single-well experiments), both the training data and test data are in the same well. The machine learning methods can identify coal pay zones for sections with poor or missing logs. It is found that rate of penetration (ROP) is the most important feature. The second set of experiments (multiple-well experiments) uses the training data from multiple nearby wells, which can predict coal pay zones in a new well. The most important feature is gamma ray. After placing slotted casings, all wells have coal identification rates greater than 90%, and three wells have coal identification rates greater than 99%. This indicates that machine learning methods (either XGBoost or ANN/RF with NROS/SMOTE) can be an effective way to identify coal pay zones and reduce coring or logging costs in CSG developments.
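The naive random oversampling (NROS) step, which balances the minority coal class before training, is simple enough to sketch directly. The synthetic features below are placeholders for the drilling/LWD logs, and a random forest stands in for the several algorithms the paper compares:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Imbalanced synthetic data: roughly 10% "coal" (class 1), 90% non-coal.
n = 1000
y = (rng.uniform(size=n) < 0.1).astype(int)
X = rng.normal(size=(n, 5)) + y[:, None] * 1.5  # coal samples shifted

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Naive random oversampling: resample minority-class rows with replacement
# until both classes are equally represented in the training set.
minority = np.flatnonzero(y_train == 1)
majority = np.flatnonzero(y_train == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
X_bal, y_bal = X_train[idx], y_train[idx]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, clf.predict(X_test), average="binary"
)
print(f"precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
```

SMOTE, the paper's other oversampling technique, differs by synthesizing new minority samples by interpolating between minority neighbors rather than duplicating existing rows.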

