Machine Learning Methods in Precision Medicine Targeting Epigenetic Diseases

2019 ◽  
Vol 24 (34) ◽  
pp. 3998-4006
Author(s):  
Shijie Fan ◽  
Yu Chen ◽  
Cheng Luo ◽  
Fanwang Meng

Background: Riding the tide of big data, machine learning is coming into its own. Given the huge volumes of epigenetic data generated by biological experiments and the clinic, machine learning can help detect epigenetic features in the genome, find correlations between phenotypes and histone or gene modifications, accelerate the screening of lead compounds targeting epigenetic diseases, and support many other aspects of epigenetics research, thereby advancing the promise of precision medicine. Methods: In this minireview, we focus on the fundamentals and applications of machine learning methods regularly used in the epigenetics field and explain their characteristic features; their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: To make full use of machine learning algorithms, one should become familiar with their pros and cons, so that big data can be exploited by choosing the most suitable method(s).
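As a hedged illustration of one workflow this minireview surveys, the sketch below trains a random forest to flag active compounds from fingerprint-style features, the kind of model used in screening leads against an epigenetic target. The data shape, feature width, and labels are placeholders, not taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's method): activity
# classification on binary fingerprint features. In real use, X would be
# molecular fingerprints of candidate compounds against an epigenetic
# target; random data stands in here for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 1024))   # stand-in 1024-bit fingerprints
y = rng.integers(0, 2, size=500)           # stand-in active/inactive labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```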

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alan Brnabic ◽  
Lisa M. Hess

Abstract Background Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated into applications that inform patient-provider decision making. Methods This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
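The internal/external validation distinction the review emphasizes can be sketched as follows. Internal validation holds out part of the development cohort; external validation scores the same fitted model on data from a different site or period. The datasets below are synthetic placeholders.

```python
# Sketch of internal vs. external validation with scikit-learn; the
# "cohorts" are simulated and stand in for real-world databases.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)   # development cohort
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("internal AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# External validation scores the *same* fitted model on an independent
# cohort (here simulated with a second draw from a different seed).
X_ext, y_ext = make_classification(n_samples=500, random_state=1)
print("external AUC:", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```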


2022 ◽  
Author(s):  
Marcus Kubsch ◽  
Christina Krist ◽  
Joshua Rosenberg

Machine learning has become commonplace in educational research and science education research, especially to support assessment efforts. Such applications of machine learning have shown their promise in replicating and scaling human-driven codes of students’ work. Despite this promise, we and other scholars argue that machine learning has not achieved its transformational potential. We argue that this is because our field is currently lacking frameworks for supporting creative, principled, and critical endeavors to use machine learning in science education research. To offer considerations for science education researchers’ use of ML, we present a framework, Distributing Epistemic Functions and Tasks (DEFT), that highlights the functions and tasks that pertain to generating knowledge that can be carried out by either trained researchers or machine learning algorithms. Such considerations are critical decisions that should occur alongside those about, for instance, the type of data or algorithm used. We apply this framework to two cases, one that exemplifies the cutting-edge use of machine learning in science education research and another that offers a wholly different means of using machine learning and human-driven inquiry together. We conclude with strategies for researchers to adopt machine learning and call for the field to rethink how we prepare science education researchers in an era of great advances in computational power and access to machine learning methods.
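The "replicate and scale human-driven codes" use of machine learning mentioned above can be illustrated with a minimal text classifier: train on a small human-coded sample of student responses, then apply the model to the uncoded remainder. The texts, codes, and model choice below are invented for illustration, not drawn from the paper's cases.

```python
# Hypothetical sketch: scaling human coding of student responses with a
# supervised text classifier. Labels and examples are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

coded_texts = ["the ice melted because heat flowed in",
               "it just happens when it gets warm",
               "energy transfers from the warmer air to the ice",
               "I don't know why"]
codes = ["mechanistic", "descriptive", "mechanistic", "descriptive"]

coder = make_pipeline(TfidfVectorizer(), LogisticRegression())
coder.fit(coded_texts, codes)
print(coder.predict(["heat energy moves into the cube and it melts"]))
```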


Author(s):  
Qifang Bi ◽  
Katherine E Goodman ◽  
Joshua Kaminsky ◽  
Justin Lessler

Abstract Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on “Big Data,” it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.
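To make the ensemble-based approaches mentioned above concrete, the sketch below contrasts a single decision tree with three common ensemble strategies (bagging, boosting, stacking) on synthetic data. The specific algorithms and data are illustrative assumptions, not the ones catalogued in the article.

```python
# Hedged sketch: a single learner vs. three ensemble approaches, scored
# by cross-validated accuracy on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, random_state=0)
models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                    ("lr", LogisticRegression())]),
}
for name, m in models.items():
    print(name, cross_val_score(m, X, y, cv=5).mean().round(3))
```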


2021 ◽  
Author(s):  
Dhairya Vyas

In machine learning, the majority of data can be grouped into four categories: numerical data, categorical data, time-series data, and text. Different learning paradigms suit different data properties, namely supervised, unsupervised, and reinforcement learning, and each paradigm has its own classifiers. We have tested almost all common machine learning methods and present a comparative analysis of them.
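A sketch of how three of these data categories might be handled in a single supervised pipeline appears below; the columns, values, and preprocessing choices are assumptions for illustration (time-series data would typically get lag features or a sequence model instead).

```python
# Illustrative sketch: routing numerical, categorical, and text columns
# through type-appropriate preprocessing before one classifier.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [34, 51, 29], "city": ["a", "b", "a"],
                   "note": ["mild pain", "no symptoms", "severe pain"]})
y = [0, 1, 0]

pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),          # numerical data
    ("cat", OneHotEncoder(), ["city"]),          # categorical data
    ("txt", TfidfVectorizer(), "note"),          # text (string column name)
])
clf = make_pipeline(pre, LogisticRegression()).fit(df, y)
```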


2019 ◽  
Author(s):  
Levi John Wolf ◽  
Elijah Knaap

Dimension reduction is one of the oldest concerns in geographical analysis. Despite significant, longstanding attention to geographical problems, recent advances in non-linear techniques for dimension reduction, called manifold learning, have not been adopted in classic data-intensive geographical problems. More generally, machine learning methods for geographical problems often focus on applying standard machine learning algorithms to geographic data rather than on true "spatially-correlated learning," in the words of Kohonen. As such, we suggest a general way to incentivize geographical learning in machine learning algorithms and link it to many past methods that introduced geography into statistical techniques. We develop a specific instance of this by specifying two geographical variants of Isomap, a non-linear dimension reduction, or "manifold learning," technique. We also provide a method for assessing what is added by incorporating geography and for estimating the manifold's intrinsic geographic scale. To illustrate the concepts and provide interpretable results, we conduct a dimension reduction on the geographical and high-dimensional structure of social and economic data on Brooklyn, New York. Overall, this paper's main endeavor--defining and explaining a way to "geographize" many machine learning methods--yields interesting and novel results for manifold learning and for the estimation of intrinsic geographical scale in unsupervised learning.
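One simple way to "geographize" manifold learning, in the spirit of the geographical Isomap variants above, is to blend attribute distance with geographic distance before embedding. The sketch below is a loose illustration under that assumption, not the authors' exact formulation; the mixing weight, data, and sizes are placeholders.

```python
# Hedged sketch: Isomap on a precomputed distance matrix that mixes
# attribute-space and geographic distances.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))    # stand-in tract centroids
attrs = rng.normal(size=(200, 10))     # stand-in social/economic variables

d_attr = cdist(attrs, attrs)
d_geo = cdist(coords, coords)
alpha = 0.5                            # mixing weight: 0 = purely aspatial
d_mix = (1 - alpha) * d_attr / d_attr.max() + alpha * d_geo / d_geo.max()

emb = Isomap(n_neighbors=10, n_components=2,
             metric="precomputed").fit_transform(d_mix)
```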


2021 ◽  
Vol 19 (3) ◽  
pp. 55-64
Author(s):  
K. N. Maiorov

The paper examines the life cycle of field development and analyzes the processes of the field-development design stage for the application of machine learning methods. For each process, relevant problems are highlighted, existing solutions based on machine learning methods are surveyed, and ideas are proposed for problems that could be solved effectively by machine learning. For most of the processes, examples of solutions are briefly described and the advantages and disadvantages of the approaches are identified. The most common solution method is the feed-forward neural network; subject to preliminary normalization of the input data, it is the most versatile algorithm for regression and classification problems. However, in the problem of selecting wells for hydraulic fracturing, a whole ensemble of machine learning models was used, in which, in addition to a neural network, there were a random forest, gradient boosting, and linear regression. For the problem of optimizing the placement of a grid of oil wells, the disadvantages of existing solutions based on a neural network and on a simple reinforcement learning approach built on a Markov decision process are identified. A deep reinforcement learning algorithm called AlphaZero is proposed, which has previously shown significant results as an artificial intelligence for games. This algorithm is a tree search guided by a neural network: only those branches that receive the best estimates from the neural network are explored more thoroughly. The paper highlights the similarities between the tasks for which AlphaZero was previously used and the task of optimizing the placement of a grid of oil-producing wells. Conclusions are drawn about the possibility of using and modifying the algorithm for the optimization problem being solved. An approach is proposed to take symmetric states in the Monte Carlo tree into account to reduce the number of required simulations.
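The symmetry idea at the end of the abstract can be sketched as follows: map each well-placement grid state to a canonical representative so that symmetric states share one node, and hence one set of visit statistics, in the Monte Carlo tree. The square grid, its eight symmetries, and the node layout below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: sharing Monte Carlo tree statistics across the 8
# symmetries (4 rotations x optional reflection) of a square grid state.
import numpy as np

def canonical(state: np.ndarray) -> bytes:
    """Return a canonical key over the 8 symmetries of a square grid."""
    variants = []
    for k in range(4):                           # rotations by 90 degrees
        r = np.rot90(state, k)
        variants.append(r.tobytes())
        variants.append(np.fliplr(r).tobytes())  # plus reflections
    return min(variants)                         # lexicographically smallest

tree_stats: dict[bytes, dict] = {}               # one entry per equivalence class

def node_for(state: np.ndarray) -> dict:
    """Look up (or create) the tree node shared by all symmetric states."""
    key = canonical(state)
    return tree_stats.setdefault(key, {"visits": 0, "value_sum": 0.0})
```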


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1521 ◽  
Author(s):  
Tomasz Rymarczyk ◽  
Grzegorz Kłosowski ◽  
Edward Kozłowski ◽  
Paweł Tchórzewski

The main goal of this work was to compare selected machine learning methods with a classic deterministic method in the industrial field of electrical impedance tomography. The research focused on the development and comparison of algorithms and models for the analysis and reconstruction of data using electrical tomography. The novelty was the use of original machine learning algorithms, whose characteristic feature is the use of many separately trained subsystems, each of which generates a single pixel of the output image. Artificial neural network (ANN), LARS, and elastic net methods were used to solve the inverse problem. These algorithms were adapted to electrical impedance tomography by correspondingly multiplying the number of equations to match the finite element method grid. The Gauss-Newton method was used as a reference for the machine learning methods. The algorithms were trained using learning data obtained through computer simulation based on real models. The results of the experiments showed that in the considered cases the best quality of reconstructions was achieved by the ANN. At the same time, the ANN was the slowest in terms of both the training process and the speed of image generation. The other machine learning methods were comparable with the deterministic Gauss-Newton method and with each other.
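The "one separately trained subsystem per pixel" idea can be sketched with an elastic net per output pixel, as below. The sizes and data are placeholders (in the study, training data came from FEM-based simulations), so this is only a structural illustration of the approach.

```python
# Hedged sketch: one elastic net model per pixel of the reconstruction.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 96))          # stand-in boundary measurements
Y = rng.normal(size=(1000, 32 * 32))     # stand-in conductivity images, flattened

# Train a separate subsystem for each output pixel.
models = [ElasticNet(alpha=0.1).fit(X, Y[:, j]) for j in range(Y.shape[1])]

def reconstruct(x: np.ndarray) -> np.ndarray:
    """Assemble the output image pixel by pixel from the per-pixel models."""
    return np.array([m.predict(x[None, :])[0] for m in models]).reshape(32, 32)
```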


Author(s):  
Hong Cui

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools to structure large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve the markup performance, especially for elements that have sparse training examples.
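A hypothetical sketch of ML-driven markup in the spirit of MARTT (not the actual system) appears below: classify each clause of a description into an element type, then wrap it in the corresponding XML tag. The training pairs, tag set, and model are invented for illustration.

```python
# Hypothetical sketch: tagging clauses of a taxonomic description and
# emitting XML markup from the predicted element type.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

clauses = ["leaves opposite, ovate", "flowers white, 5-merous",
           "stems erect, pubescent", "petals pink"]
labels = ["leaf", "flower", "stem", "flower"]

tagger = make_pipeline(CountVectorizer(), MultinomialNB()).fit(clauses, labels)

def markup(clause: str) -> str:
    """Wrap a clause in the XML element predicted for it."""
    tag = tagger.predict([clause])[0]
    return f"<{tag}>{clause}</{tag}>"

print(markup("leaves alternate, lanceolate"))
```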


2019 ◽  
Vol 21 (9) ◽  
pp. 693-699 ◽  
Author(s):  
A. Alper Öztürk ◽  
A. Bilge Gündüz ◽  
Ozan Ozisik

Aims and Objectives: Solid lipid nanoparticles (SLNs) are pharmaceutical delivery systems with advantages such as controlled drug release and long-term stability. Particle size (PS) is one of the important characteristics of SLNs, affecting drug release rate, bio-distribution, and more. In this study, the formulation of SLNs using a high-speed homogenization technique was evaluated. The main emphasis of the work is to study whether the effect of mixing time and formulation ingredients on PS can be modeled. For this purpose, different machine learning algorithms were applied and evaluated using the mean absolute error metric. Materials and Methods: SLNs were prepared by high-speed homogenization. PS, size distribution, and zeta potential measurements were performed on freshly prepared samples. In order to model the formulation of the particles in terms of mixing time and formulation ingredients, and to evaluate the predictability of PS from these parameters, different machine learning algorithms were applied to the prepared dataset and their performances were evaluated. Results: The PS of the SLNs obtained was in the range of 263-498 nm. The results show that the PS of SLNs is best estimated by decision-tree-based methods, among which Random Forest has the lowest mean absolute error value, 0.028. The results also demonstrate that particle size can be estimated by both decision-rule-based and function-fitting machine learning methods. Conclusion: Our findings show that machine learning methods can be highly useful for determining formulation parameters in further research.
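The modeling task described above reduces to a small tabular regression scored by mean absolute error, sketched below. The feature columns and values are illustrative placeholders, not the study's dataset; only the evaluation metric and the Random Forest choice follow the abstract.

```python
# Minimal sketch: predict particle size from mixing time and formulation
# ingredient amounts, scored with cross-validated mean absolute error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 4))          # e.g., mixing time, lipid, surfactant, drug
y = 263 + 235 * rng.uniform(size=60)   # stand-in PS values spanning 263-498 nm

reg = RandomForestRegressor(n_estimators=300, random_state=0)
mae = -cross_val_score(reg, X, y, cv=5,
                       scoring="neg_mean_absolute_error").mean()
print(f"MAE: {mae:.3f}")   # the paper reports 0.028 for Random Forest
```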


SPE Journal ◽  
2020 ◽  
Vol 25 (03) ◽  
pp. 1241-1258 ◽  
Author(s):  
Ruizhi Zhong ◽  
Raymond L. Johnson ◽  
Zhongwei Chen

Summary Accurate coal identification is critical in coal seam gas (CSG) (also known as coalbed methane or CBM) developments because it determines well completion design and directly affects gas production. Density logging using radioactive source tools is the primary method of coal identification, but it adds well trips to condition the hole and additional well costs for logging runs. In this paper, machine learning methods are applied to identify coals from drilling and logging-while-drilling (LWD) data to reduce overall well costs. The machine learning algorithms include logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGBoost). Precision, recall, and the F1 score are used as evaluation metrics. Because coal identification is an imbalanced data problem, performance on the minority class (i.e., coals) is limited. To enhance performance on coal prediction, two data manipulation techniques [the naive random oversampling (NROS) technique and the synthetic minority oversampling technique (SMOTE)] are separately coupled with the machine learning algorithms. Case studies are performed with data from six wells in the Surat Basin, Australia. For the first set of experiments (single-well experiments), both the training data and test data come from the same well. The machine learning methods can identify coal pay zones for sections with poor or missing logs, and rate of penetration (ROP) is found to be the most important feature. The second set of experiments (multiple-well experiments) uses training data from multiple nearby wells to predict coal pay zones in a new well; here, the most important feature is gamma ray. After placing slotted casings, all wells have coal identification rates greater than 90%, and three wells have coal identification rates greater than 99%. This indicates that machine learning methods (either XGBoost or ANN/RF with NROS/SMOTE) can be an effective way to identify coal pay zones and reduce coring or logging costs in CSG developments.
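Coupling oversampling with a classifier for the imbalanced coal/non-coal problem follows the pattern sketched below (oversample only the training split, then score precision/recall/F1 on untouched test data). The features and data are placeholders for the drilling and LWD inputs such as ROP and gamma ray; NROS corresponds to imblearn's RandomOverSampler.

```python
# Hedged sketch: SMOTE-resampled training for an imbalanced binary
# coal-identification problem, evaluated with precision/recall/F1.
import numpy as np
from imblearn.over_sampling import SMOTE   # NROS ~ imblearn's RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))                   # stand-in ROP, gamma ray, ...
y = (rng.uniform(size=2000) < 0.1).astype(int)   # ~10% minority class (coal)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # training split only

clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
print(precision_recall_fscore_support(y_te, clf.predict(X_te), average="binary"))
```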

