Machine Learning With R

Author(s):  
Kumar Abhishek Gaurav ◽  
Ladly Patel

In this chapter, the authors explain the importance of the R language in machine learning and the steps for installing R in different environments such as Windows and Linux. They also describe the basic concepts of R, such as its syntax, data types, variables, functions, and operators, with detailed examples. In the section on advanced R, the authors explain how to chart data using plotting functions such as barplot, and how related graphs such as histograms and pie charts can be drawn; they also show how to label the axes of a graph and how to plot in different colors. The chapter further contains some basic R programming examples, such as a program that implements a calculator and one that checks for Armstrong numbers, and it describes the steps and process for installing TensorFlow.

2020 ◽  
Vol 15 ◽  
Author(s):  
Deeksha Saxena ◽  
Mohammed Haris Siddiqui ◽  
Rajnish Kumar

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined in such a way that the representations can be refined from a lower to a more abstract level. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times the two are used individually as well. DL can be a better platform than conventional machine learning, as it does not require an intermediate feature-extraction step and works well with larger datasets. DL is currently one of the most discussed fields among scientists and researchers for diagnosing and solving various biological problems. However, deep learning models need further refinement and experimental validation to become more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We analyzed the frequently used DL methods and data types and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights about DL methods, data types, and the selection of DL models for disease diagnosis.
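To make the feature-extraction point concrete, here is a minimal sketch of a deep learning classifier that consumes raw inputs directly and learns its own representation hierarchy; the layer sizes, input width, and data are invented for illustration, not taken from the review:

```python
# Minimal sketch: a small feed-forward network that learns its own feature
# hierarchy from raw inputs, with no hand-crafted feature-extraction step.
# Layer sizes, input width, and data are hypothetical stand-ins.
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 64).astype("float32")   # toy "raw" biomedical features
y = np.random.randint(0, 2, size=1000)           # toy binary disease labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),   # lower-level representation
    tf.keras.layers.Dense(32, activation="relu"),    # more abstract representation
    tf.keras.layers.Dense(1, activation="sigmoid"),  # diagnosis probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```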


Author(s):  
Dhamanpreet Kaur ◽  
Matthew Sobiesk ◽  
Shubham Patil ◽  
Jin Liu ◽  
Puran Bhagat ◽  
...  

Objective: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize that the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data. Materials and Methods: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data. Results: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules. Discussion: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools. Conclusion: We conclude that the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
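As a rough sketch of this pipeline (the paper does not prescribe a toolkit; pgmpy is one open-source option, and the toy table below merely stands in for a discretized patient dataset such as UCI heart disease):

```python
# Sketch: learn a Bayesian network structure from "real" tabular data, fit its
# conditional probability tables, and forward-sample synthetic records.
# The data generated here is a toy stand-in, not an actual patient dataset.
import numpy as np
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch, MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.sampling import BayesianModelSampling

rng = np.random.default_rng(0)
n = 1000
chest_pain = rng.integers(0, 4, n)
real = pd.DataFrame({
    "age_band":   rng.integers(0, 4, n),           # discretized age
    "sex":        rng.integers(0, 2, n),
    "chest_pain": chest_pain,
    # disease correlates with chest pain, plus 10% label noise
    "disease":    np.where(rng.random(n) < 0.1,
                           rng.integers(0, 2, n),
                           (chest_pain >= 2).astype(int)),
})

# 1) Learn the graphical structure from the real data (score-based search).
dag = HillClimbSearch(real).estimate(scoring_method=BicScore(real))

# 2) Fit conditional probability tables on the learned structure.
model = BayesianNetwork()
model.add_nodes_from(real.columns)
model.add_edges_from(dag.edges())
model.fit(real, estimator=MaximumLikelihoodEstimator)

# 3) Simulate synthetic patient records by forward-sampling the network.
synthetic = BayesianModelSampling(model).forward_sample(size=1000)
print(synthetic.head())
```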


2020 ◽  
Vol 12 (7) ◽  
pp. 1218
Author(s):  
Laura Tuşa ◽  
Mahdi Khodadadzadeh ◽  
Cecilia Contreras ◽  
Kasra Rafiezadeh Shahi ◽  
Margret Fuchs ◽  
...  

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time consuming and depends on the observer's experience and background. Hyperspectral short-wave infrared (SWIR) data are used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid, non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. In this way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by coupling physics-based data acquisition (hyperspectral imaging) with mathematical models (machine learning). The procedure is tested on five drill-core samples with varying training data, using random forest, support vector machine, and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.
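A minimal sketch of the upscaling step might look as follows, assuming scikit-learn (the paper does not name a library) and invented array shapes and toy data: a regressor is trained on the small regions where SWIR spectra and SEM-MLA abundances are co-registered, then applied to every hyperspectral pixel of the core:

```python
# Sketch: upscale quantitative SEM-MLA mineral abundances to drill-core scale
# by regressing them on co-registered hyperspectral SWIR spectra.
# Band counts, mineral counts, and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

n_bands, n_minerals = 256, 4

# Training pixels: spectra with co-registered SEM-MLA abundances (toy data).
X_train = np.random.rand(500, n_bands)                    # SWIR reflectance
y_train = np.random.dirichlet(np.ones(n_minerals), 500)   # abundances sum to 1

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)        # multi-output regression, one output per mineral

# Upscale: quasi-quantitative abundance map over the whole drill-core scan.
core_pixels = np.random.rand(10_000, n_bands)
abundance_map = rf.predict(core_pixels)   # shape (10000, n_minerals)
```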


Geophysics ◽  
2019 ◽  
Vol 84 (2) ◽  
pp. O39-O47 ◽  
Author(s):  
Ryan Smith ◽  
Tapan Mukerji ◽  
Tony Lupo

Predicting well production in unconventional oil and gas settings is challenging due to the combined influence of engineering, geologic, and geophysical inputs on well productivity. We have developed a machine-learning workflow that incorporates geophysical and geologic data, as well as engineering completion parameters, into a model for predicting well production. The study area is in southwest Texas in the lower Eagle Ford Group. We use a time-series method known as functional principal component analysis to summarize the well-production time series. Next, we use random forests, a machine-learning regression technique, in combination with our summarized well data to predict the full time series of well production. The inputs to this model are geologic, geophysical, and engineering data. We are then able to predict the well-production time series with 65%–76% accuracy. This method incorporates disparate data types into a robust model that predicts well production in unconventional resources.
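As a simplified sketch of this workflow (ordinary PCA on the sampled curves stands in here for functional PCA, and all data, dimensions, and hyperparameters are invented):

```python
# Sketch: summarize production curves with component scores, regress the
# scores on geologic/geophysical/engineering inputs, then reconstruct a
# full predicted time series. Plain PCA is a stand-in for functional PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

n_wells, n_months, n_features = 300, 48, 12
production = np.random.rand(n_wells, n_months)   # monthly production per well
inputs = np.random.rand(n_wells, n_features)     # geologic, geophysical, completion

# 1) Summarize each curve with a few principal component scores.
pca = PCA(n_components=3)
scores = pca.fit_transform(production)

# 2) Learn the mapping from well inputs to the summary scores.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(inputs, scores)

# 3) Predict scores for new wells and reconstruct full production curves.
new_inputs = np.random.rand(10, n_features)
predicted_curves = pca.inverse_transform(rf.predict(new_inputs))  # (10, 48)
```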


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 27389-27400 ◽  
Author(s):  
Wilson Castro ◽  
Jimy Oblitas ◽  
Miguel De-La-Torre ◽  
Carlos Cotrina ◽  
Karen Bazan ◽  
...  

2021 ◽  
Vol 11 (8) ◽  
pp. 785
Author(s):  
Quentin Miagoux ◽  
Vidisha Singh ◽  
Dereck de Mézquita ◽  
Valerie Chaudru ◽  
Mohamed Elati ◽  
...  

Rheumatoid arthritis (RA) is a multifactorial, complex autoimmune disease that involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this work, we aim to unravel mechanisms governing the regulation of key transcription factors in RA and derive patient-specific models to gain more insights into disease heterogeneity and the response to treatment. We first use publicly available transcriptomic datasets (peripheral blood) related to RA, together with machine learning, to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched with signalling cascades and upstream regulators using a state-of-the-art, RA-specific molecular map. The integrative network is then used as a template to analyse patients' data regarding their response to anti-TNF treatment and to identify master regulators and upstream cascades affected by the treatment. Finally, we use the Boolean formalism to simulate in silico subparts of the integrated network and identify combinations and conditions that can switch the identified TFs on or off, mimicking the effects of single and combined perturbations.
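To illustrate the Boolean formalism of the final step (the four-node network, its names, and its rules below are invented for the sketch and are not the paper's RA model), nodes are ON/OFF, inputs are clamped, and the network is updated synchronously until it settles:

```python
# Toy Boolean network: synchronous updates with clamped inputs, scanning all
# input conditions to see which switch the downstream TF on or off.
from itertools import product

rules = {
    "TNF":      lambda s: s["TNF"],                       # external input, clamped
    "ANTI_TNF": lambda s: s["ANTI_TNF"],                  # treatment, clamped
    "NFKB":     lambda s: s["TNF"] and not s["ANTI_TNF"],
    "TF_X":     lambda s: s["NFKB"],                      # downstream TF of interest
}

def simulate(state, max_steps=10):
    """Apply synchronous updates until a fixed point (or max_steps)."""
    for _ in range(max_steps):
        nxt = {node: bool(rule(state)) for node, rule in rules.items()}
        if nxt == state:
            break
        state = nxt
    return state

for tnf, anti in product([False, True], repeat=2):
    init = {"TNF": tnf, "ANTI_TNF": anti, "NFKB": False, "TF_X": False}
    final = simulate(init)
    print(f"TNF={tnf!s:5} anti-TNF={anti!s:5} -> TF_X={final['TF_X']}")
```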


2021 ◽  
Vol 8 ◽  
Author(s):  
Sita Karki ◽  
Ricardo Bermejo ◽  
Robert Wilkes ◽  
Michéal Mac Monagail ◽  
Eve Daly ◽  
...  

Graphical Abstract: Overall research workflow showing data types, study area, model development, and biomass results.


2021 ◽  
Vol 14 (11) ◽  
pp. 2483-2490
Author(s):  
Maximilian Bandle ◽  
Jana Giceva

A wealth of technology has evolved around relational databases over decades and has been successfully tried and tested in many settings and use cases. Yet the majority of it remains overlooked in the pursuit of performance (e.g., NoSQL) or new functionality (e.g., graph data or machine learning). In this paper, we argue that a wide range of techniques readily available in databases are crucial to tackling the challenges the IT industry faces in terms of managing hardware trends, growing workloads, and the overall complexity of a rapidly changing application and platform landscape. However, to be truly useful, these techniques must be freed from the legacy component of database engines: relational operators. We therefore argue that making databases more flexible as platforms, and extending their functionality to new data types and operations, requires exposing a lower level of abstraction: instead of working with SQL, it would be desirable for database engines to compile, optimize, and run a collection of sub-operators for manipulating and managing data, offering them as an external interface. In this paper, we discuss the advantages of this approach, provide an initial list of such sub-operators, and show how they can be used in practice.
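As an illustrative sketch only (not the paper's actual sub-operator set or interface), a query can be expressed by composing sub-operators that each consume and produce a stream of rows, with no SQL layer involved:

```python
# Illustrative sub-operator pipeline: scan -> filter -> hash aggregate.
# Each sub-operator consumes and yields an iterator of rows, so they compose
# freely and could be reused for non-relational data types and operations.
from typing import Callable, Iterator

Row = dict

def scan(table: list[Row]) -> Iterator[Row]:
    yield from table                        # storage-access sub-operator

def filter_(pred: Callable[[Row], bool], rows: Iterator[Row]) -> Iterator[Row]:
    return (r for r in rows if pred(r))     # selection sub-operator

def hash_aggregate(key: str, rows: Iterator[Row]) -> Iterator[Row]:
    groups: dict = {}                       # grouping/counting sub-operator
    for r in rows:
        groups[r[key]] = groups.get(r[key], 0) + 1
    for k, count in groups.items():
        yield {key: k, "count": count}

orders = [{"status": "open", "qty": 3}, {"status": "done", "qty": 1},
          {"status": "open", "qty": 7}]

pipeline = hash_aggregate("status", filter_(lambda r: r["qty"] > 2, scan(orders)))
print(list(pipeline))   # [{'status': 'open', 'count': 2}]
```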


2021 ◽  
Author(s):  
Florian Wellmann ◽  
Miguel de la Varga ◽  
Nilgün Güdük ◽  
Jan von Harten ◽  
Fabian Stamm ◽  
...  

Geological models, as 3-D representations of subsurface structures and property distributions, are used in many economic, scientific, and societal decision processes. These models are built on prior assumptions and imperfect information, and they often result from an integration of geological and geophysical data types with varying quality. These aspects result in uncertainties about the predicted subsurface structures and property distributions, which will affect the subsequent decision process.

We discuss approaches to evaluate uncertainties in geological models and to integrate geological and geophysical information in combined workflows. A first step is the consideration of uncertainties in prior model parameters on the basis of uncertainty propagation (forward uncertainty quantification). When applied to structural geological models with discrete classes, these methods result in a class probability for each point in space, often represented in tessellated grid cells. These results can then be visualized or forwarded to process simulations. Another option is to add risk functions for subsequent decision analyses. In recent work, these geological uncertainty fields have also been used as an input to subsequent geophysical inversions.

A logical extension to these existing approaches is the integration of geological forward operators into inverse frameworks, to enable a full flow of inference for a wider range of relevant parameters. We investigate here specifically the use of probabilistic machine learning tools in combination with geological and geophysical modeling. Challenges exist due to the hierarchical nature of the probabilistic models, but modern sampling strategies allow for efficient sampling in these complex settings. We showcase the application with examples combining geological modeling and geophysical potential field measurements in an integrated model for improved decision making.
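A minimal sketch of the forward uncertainty propagation step (a 1-D, two-unit column with a single uncertain interface depth, purely illustrative):

```python
# Monte Carlo forward uncertainty propagation: sample uncertain prior
# parameters, run a (here trivial) geological forward model per draw, and
# accumulate per-cell class probabilities. Geometry and priors are invented.
import numpy as np

rng = np.random.default_rng(42)
depths = np.linspace(0.0, 100.0, 101)       # 1-D grid of cell depths (m)
n_draws = 2000

counts = np.zeros((len(depths), 2))         # class counts: [unit_A, unit_B]
for _ in range(n_draws):
    interface = rng.normal(50.0, 5.0)       # uncertain interface-depth prior
    is_unit_b = depths > interface          # forward model: two layered units
    counts[np.arange(len(depths)), is_unit_b.astype(int)] += 1

class_prob = counts / n_draws               # per-cell class probability field
print(class_prob[55])                       # P(unit_A), P(unit_B) at 55 m depth
```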


Web Services ◽  
2019 ◽  
pp. 105-126
Author(s):  
N. Nawin Sona

This chapter aims to give an overview of the wide range of Big Data approaches and technologies available today. The data features of Volume, Velocity, and Variety are examined against new database technologies. It explores the complexity of data types; methodologies of storage, access, and computation; current and emerging trends in data analysis; and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems, and it highlights the methods through which actionable insights can be built into public sector domains, such as machine learning, data mining, predictive analytics, and others.

