Quantitative Demonstration of a High-Fidelity Oil-Based Mud Resistivity Imager Using a Controlled Experiment

The objective of this paper is to describe and validate a new approach to image acquisition that provides both qualitative and quantitative information on formation electrical properties using a high-resolution, oil-based mud imager (HROBMI) tool. This new multifrequency imaging tool can operate at high frequencies (in the MHz range) in oil-based muds. To allow quantitative estimation of formation and mud properties from the HROBMI data, a hybrid machine-learning/inversion approach was implemented. In this hybrid approach, machine-learning models corresponding to different candidate mud properties are trained, and the resulting regression functions are stored. For a given measurement data set, the predictions of these models are used to quickly identify an optimum mud candidate. This information is then fed into an inversion algorithm that provides accurate quantitative information on the logging environment of the HROBMI. The accuracy of this algorithm has been verified using a test fixture that allows formation properties to be changed in different mud environments. The HROBMI measurements are a function of the formation properties (resistivity and permittivity), frequency, and mud properties. The hybrid algorithm can untangle HROBMI data from multiple frequencies to obtain true formation resistivity images independent of the other parameters that affect the tool measurements. In addition, the algorithm provides formation permittivity images as well as a standoff image. Results are presented from both the controlled experiments in the test fixture and field logs.


An Intelligent Multicriteria Model for Diagnosing Dementia in People Infected with Human Immunodeficiency Virus

Applied Sciences ◽

10.3390/app112110457 ◽

2021 ◽

Vol 11 (21) ◽

pp. 10457

Author(s):

Luana I. C. C. Pinheiro ◽

Maria Lúcia D. Pereira ◽

Evandro C. de Andrade ◽

Luciano C. Nunes ◽

Wilson C. de Abreu ◽

...

Keyword(s):

Machine Learning ◽

Human Immunodeficiency Virus ◽

Mental Disorders ◽

Neurological Disorders ◽

Hybrid Approach ◽

International Classification Of Diseases ◽

Machine Learning Algorithms ◽

World Health ◽

Data Set ◽

Immunodeficiency Virus

Hybrid models to detect dementia based on Machine Learning can provide accurate diagnoses in individuals with neurological disorders and cognitive complications caused by Human Immunodeficiency Virus (HIV) infection. This study proposes a hybrid approach, using Machine Learning algorithms associated with the multicriteria method of Verbal Decision Analysis (VDA). Dementia, which affects many HIV-infected individuals, refers to neurodevelopmental and mental disorders. Some manuals standardize the information used in the correct detection of neurological disorders with cognitive complications. Among the most commonly used manuals are the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th edition) of the American Psychiatric Association and the International Classification of Diseases, 10th edition (ICD-10), the latter published by the World Health Organization (WHO). The model is designed to explore the predictive value of specific data. Furthermore, a well-defined data set improves and optimizes the diagnostic models sought in the research.


Fault Diagnosis via Neural Ordinary Differential Equations

Applied Sciences ◽

10.3390/app11093776 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3776

Author(s):

Luis Enciso-Salas ◽

Gustavo Pérez-Zuñiga ◽

Javier Sotomayor-Moriano

Keyword(s):

Machine Learning ◽

Fault Diagnosis ◽

Differential Equations ◽

Ordinary Differential Equations ◽

Data Augmentation ◽

Hybrid Approach ◽

Training Data ◽

Data Series ◽

Learning Approaches ◽

Data Set

Implementation of model-based fault diagnosis systems can be a difficult task due to the complex dynamics of most systems. An appealing alternative that avoids modeling is to use machine learning-based techniques, whose implementation is more affordable nowadays. However, the latter approach often requires extensive data processing. In this paper, a hybrid approach using recent developments in neural ordinary differential equations is proposed. This approach enables us to combine a natural deep learning technique with an estimated model of the system, making the training simpler and more efficient. To evaluate this methodology, a nonlinear benchmark system is used, with simulated faults in actuators, sensors, and process. Simulation results show that the proposed methodology requires less processing for training than conventional machine learning approaches, since the data set is taken directly from the measurements and inputs. Furthermore, since the model used in the essay is only a structural approximation of the plant, no advanced modeling is required. This approach can also alleviate some pitfalls of training on data series, such as complicated data augmentation methodologies and the need for large amounts of data.


Automatically Generating 60,000 CAD Variants for Big Data Applications

Volume 1: 39th Computers and Information in Engineering Conference ◽

10.1115/detc2019-97378 ◽

2019 ◽

Author(s):

Satchit Ramnath ◽

Payam Haghighi ◽

Ji Hoon Kim ◽

Duane Detwiler ◽

Michael Berry ◽

...

Keyword(s):

Machine Learning ◽

Computer Aided Design ◽

Structural Integrity ◽

Hybrid Approach ◽

Machine Learning Algorithms ◽

Data Sets ◽

Design Data ◽

Data Set ◽

Large Computer ◽

Cad Models

Machine learning is opening up new ways of optimizing designs, but it requires large data sets for training and verification. While such data sets already exist for financial, sales and business applications, this is not the case for engineering product design data. This paper discusses our efforts in curating a large Computer Aided Design (CAD) data set with desired variety and validity for automotive body structural compositions. Manual creation of 60,000 CAD variants is obviously not viable, so we examine several approaches that can be automated with commercial CAD systems, such as Parametric Design, Feature Based Design, Design Tables/Catalogs of Variants and Macros. We discuss the pros and cons of each method and how we devised a combination of these approaches. This hybrid approach was used in association with DOE tables. Since the geometric configurations and characteristics need to be correlated to performance (structural integrity), the paper also demonstrates automated workflows to perform FEA on the generated CAD models. Key simulation results can then be associated with CAD geometry and, for example, processed using machine learning algorithms for both supervised and unsupervised learning. The information obtained from applying such methods to historical CAD models may help to understand the reasoning behind experiential design decisions. With the increase in computing power and network speed, such data sets, together with novel machine learning methods, could assist in generating better designs, which could potentially be obtained by combining existing ones, or might provide insights into completely new design concepts meeting or exceeding the performance requirements.


A comparative evaluation of shear stress modeling based on machine learning methods in small streams

Journal of Hydroinformatics ◽

10.2166/hydro.2015.142 ◽

2015 ◽

Vol 17 (5) ◽

pp. 805-816 ◽

Cited By ~ 5

Author(s):

Onur Genç ◽

Bilal Gonen ◽

Mehmet Ardıçlıoğlu

Keyword(s):

Machine Learning ◽

Shear Stress ◽

Stress Distribution ◽

Measurement Data ◽

Machine Learning Algorithms ◽

Shear Stress Distribution ◽

Classification And Regression Tree ◽

Data Set ◽

Small Streams ◽

Artificial Neural

Predicting shear stress distribution has proved to be a critical problem to solve. Hence, the basic objective of this paper is to predict shear stress distribution using machine learning algorithms, including artificial neural networks, classification and regression trees, and generalized linear models. The data set, which is large and feature-rich, is used to improve machine learning-based predictive models and to extract the most important predictive factors. A 10-fold cross-validation approach was used to determine the performance of the prediction methods. The predictive performances of the proposed models were found to be very close to each other. However, the results indicated that the artificial neural network, with an R value of 0.92 ± 0.03, achieved the best overall performance on the 10-fold holdout sample. The predictions of all machine learning models were well correlated with measurement data.
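
The 10-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names (`k_fold_indices`, `cross_validate`), the mean-predicting placeholder model, the mean-absolute-error score, and the synthetic data are all assumptions made for the example. It shows how the data are split into ten held-out folds and how a per-fold score is summarized as a mean and a spread, the form in which a figure such as 0.92 ± 0.03 is commonly reported.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives fold sizes differing by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, score, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in held_out]
        scores.append(score([ys[i] for i in held_out], preds))
    return scores


# Placeholder model (assumption for illustration): always predict the training mean.
def fit_mean(train_x, train_y):
    return statistics.mean(train_y)


def predict_mean(model, x):
    return model


def mean_abs_error(y_true, y_pred):
    return statistics.mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Synthetic data, invented for the example.
xs = list(range(50))
ys = [2.0 * x + 1.0 for x in xs]
fold_scores = cross_validate(xs, ys, fit_mean, predict_mean, mean_abs_error)
print(f"{statistics.mean(fold_scores):.2f} ± {statistics.stdev(fold_scores):.2f}")
```

Any model can be substituted by passing different `fit`, `predict`, and `score` callables; the splitting logic stays the same.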


Identification of NLOS and Multi-Path Conditions in UWB Localization Using Machine Learning Methods

10.20944/preprints202004.0503.v1 ◽

2020 ◽

Author(s):

Cung Lian Sang ◽

Bastian Steinhagen ◽

Jonas Dominik Homburg ◽

Michael Adams ◽

Marc Hesse ◽

...

Keyword(s):

Machine Learning ◽

Indoor Localization ◽

Measurement Data ◽

Ultra Wideband ◽

Line Of Sight ◽

Support Vector ◽

Case Scenario ◽

Worst Case ◽

Data Set ◽

Worst Case Scenario

In Ultra-wideband (UWB)-based wireless ranging or distance measurement, differentiation between line-of-sight (LOS), non-line-of-sight (NLOS), and multi-path (MP) conditions is important for precise indoor localization. This is because the accuracy of the reported measured distance in UWB ranging systems is directly affected by the measurement conditions (LOS, NLOS or MP). However, the major contributions in the literature only address the binary classification between LOS and NLOS in UWB ranging systems. The MP condition is usually ignored. In fact, the MP condition also has a significant impact on the ranging errors of the UWB compared to the direct LOS measurement results, although the magnitude of the error in MP conditions is generally lower than in completely blocked NLOS scenarios. This paper addresses machine learning techniques for identification of these three classes (LOS, NLOS, and MP) in a UWB indoor localization system using an experimental data set. The data set was collected under different conditions in different indoor scenarios. Using the collected real measurement data, we compare the performance of three machine learning (ML) classifiers: support vector machine (SVM), random forest (RF) based on an ensemble learning method, and multilayer perceptron (MLP) based on a deep artificial neural network. The results show that applying ML methods in UWB ranging systems is effective in identifying the three classes. Specifically, the overall accuracy reaches up to 91.9% in the best-case scenario and 72.9% in the worst-case scenario. The F1-score is 0.92 in the best case and 0.69 in the worst case. For reproducible results and further exploration, we (will) provide the publicly accessible experimental research data discussed in this paper at PUB - Publications at Bielefeld University.
The three classifiers are evaluated using the open-source Python machine learning library scikit-learn.
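
The two metrics quoted above, overall accuracy and F1-score, can be computed for a three-class problem as in the following sketch. It is written in plain Python rather than scikit-learn so that it is self-contained. The label sequences are invented for illustration, and macro-averaging the per-class F1 is an assumption, since the abstract does not state which averaging was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN), treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented labels for illustration only; not data from the paper.
labels = ["LOS", "NLOS", "MP"]
y_true = ["LOS", "LOS", "NLOS", "NLOS", "MP", "MP"]
y_pred = ["LOS", "MP", "NLOS", "NLOS", "MP", "LOS"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred, labels):.3f}")
```

Reporting both numbers is useful when the classes are imbalanced, because accuracy alone can hide poor performance on a minority class such as MP.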


Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.


Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.


Hybrid Approach to Sentiment Analysis based on Syntactic Analysis and Machine Learning

Language and Information ◽

10.29403/li.14.2.9 ◽

2010 ◽

Vol 14 (2) ◽

pp. 159-181

Author(s):

MUNPYO HONG ◽

MIYOUNG SHIN ◽

Shinhye Park ◽

Hyungmin Lee

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Hybrid Approach


In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional data sets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
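
The R² and MSE figures reported above can be computed as in the following sketch. It assumes R² means the coefficient of determination, 1 - SS_res/SS_tot; the abstract does not say whether that definition or the squared correlation coefficient was used. The numeric values are invented for illustration and are not data from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"MSE = {mse(observed, predicted):.3f}")
print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

Computing both metrics separately on the training and test sets, as the abstract does, gives a quick check for overfitting: a large gap between the two sets would suggest the model does not generalize.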


Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that serves as a bridge between businesses and data science. With data science involved, the business goal is to extract valuable insights from the available data. Bollywood, a multi-million dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop. To this end, machine learning techniques (classification and prediction) are applied. The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage then form a model that can predict future trends in different types of organizations. Methods: Classification and prediction techniques, including Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN, are applied to find efficient and effective results. All of these can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The learning stage, in which the training data set is used to train the model with some technique or algorithm, yields rules that form a model for predicting future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to its predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; the relationships found in this way help solve various business problems and predict forthcoming trends. Such predictions can help production houses with advertisement propaganda and cost planning, and by securing these factors they can make a movie more profitable.
