FEASIBILITY OF USING GROUP METHOD OF DATA HANDLING (GMDH) APPROACH FOR HORIZONTAL COORDINATE TRANSFORMATION

2020 ◽  
Vol 46 (2) ◽  
pp. 55-66
Author(s):  
Bernard Kumi-Boateng ◽  
Yao Yevenyo Ziggah

Machine learning algorithms have brought a paradigm shift to geoscience computations and applications. The present study aims to assess the suitability of the Group Method of Data Handling (GMDH) for coordinate transformation. The data used for the coordinate transformation come from the Ghana national triangulation network, which is referenced to the two horizontal geodetic datums (Accra 1929 and Leigon 1977) utilised for geospatial applications in Ghana. The GMDH results were compared with those of other standard methods such as the Backpropagation Neural Network (BPNN), the Radial Basis Function Neural Network (RBFNN), the 2D conformal transformation, and the 2D affine transformation. It was observed that the proposed GMDH approach is very efficient in transforming coordinates from the Leigon 1977 datum to the official mapping datum of Ghana, i.e. the Accra 1929 datum. It was also found that GMDH produces results comparable and satisfactory to those of the widely used BPNN and RBFNN. However, the classical transformation methods (2D affine and 2D conformal) performed poorly compared with the machine learning models (GMDH, BPNN and RBFNN). The computational strength of the machine learning models is attributed to their self-adaptive capability to detect patterns in the data without assuming a predefined functional relationship between the input and output variables. To this end, the proposed GMDH model could be used as a supplementary computational tool to the existing transformation procedures used in the Ghana geodetic reference network.
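As a point of reference for the classical baselines mentioned above, the sketch below fits a 2D conformal (four-parameter) transformation by linear least squares. The coordinate pairs are illustrative placeholders, not the Ghana triangulation data, and the snippet is a minimal sketch rather than the authors' implementation.

```python
import numpy as np

# Minimal sketch: estimate a 2D conformal (4-parameter) transformation
#   E' = a*E - b*N + tE,   N' = b*E + a*N + tN
# by linear least squares from common points known in both datums.
# The coordinate values below are illustrative placeholders only.
src = np.array([[1000.0, 2000.0], [1500.0, 2600.0], [900.0, 2900.0], [2000.0, 2200.0]])
dst = np.array([[1102.3, 2051.7], [1603.9, 2649.8], [1001.1, 2952.4], [2101.8, 2248.9]])

n = len(src)
A = np.zeros((2 * n, 4))
L = dst.reshape(-1)                      # [E'_1, N'_1, E'_2, N'_2, ...]
A[0::2, 0] = src[:, 0]; A[0::2, 1] = -src[:, 1]; A[0::2, 2] = 1.0   # Easting rows
A[1::2, 0] = src[:, 1]; A[1::2, 1] = src[:, 0];  A[1::2, 3] = 1.0   # Northing rows

params, *_ = np.linalg.lstsq(A, L, rcond=None)
a, b, tE, tN = params

def conformal(e, n_):
    """Transform one (Easting, Northing) pair with the fitted parameters."""
    return a * e - b * n_ + tE, b * e + a * n_ + tN

residuals = L - A @ params
print("scale:", np.hypot(a, b), "rotation (rad):", np.arctan2(b, a))
print("RMSE:", np.sqrt(np.mean(residuals ** 2)))
```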


2020 ◽  
Author(s):  
Michael Fortunato ◽  
Connor W. Coley ◽  
Brian Barnes ◽  
Klavs F. Jensen

This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation in computer-aided synthesis planning software. Machine learning models designed to prioritize reaction templates or molecular transformations are often evaluated on high accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The templates selected for inclusion in these machine learning models have previously been limited to those that appear frequently in the reaction databases, excluding potentially useful transformations. By augmenting open-access datasets of organic reactions with artificially calculated template applicability and pretraining a template relevance neural network on this augmented applicability dataset, we report an increase in template applicability recall and an increase in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network a larger set of templates that could theoretically lead to successful reactions for a given target. Even on a small dataset of well-curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating that these strategies can be very useful for small datasets.
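A minimal sketch of the two-stage training idea (multi-label pretraining on template applicability, then fine-tuning on the recorded templates) is given below. The fingerprint size, template count and random tensors are placeholders; the authors' actual architecture and data pipeline are not reproduced here.

```python
import torch
import torch.nn as nn

# Sketch under assumed shapes: products are featurized as fixed-length
# fingerprints, and each of n_templates templates has (i) an "applicability"
# label (does its substructure pattern match the product?) and (ii) at most
# one "recorded" label (the template extracted from the recorded reaction).
n_fp, n_templates = 2048, 1000           # hypothetical sizes

model = nn.Sequential(
    nn.Linear(n_fp, 512), nn.ReLU(),
    nn.Linear(512, n_templates),         # one logit per template
)

pretrain_loss = nn.BCEWithLogitsLoss()   # stage 1: dense multi-label applicability
finetune_loss = nn.CrossEntropyLoss()    # stage 2: sparse single-label relevance
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

fps = torch.rand(32, n_fp)                                   # placeholder fingerprints
applicable = torch.randint(0, 2, (32, n_templates)).float()  # placeholder applicability
recorded = torch.randint(0, n_templates, (32,))              # placeholder recorded templates

for _ in range(2):   # pretraining steps (toy)
    opt.zero_grad(); pretrain_loss(model(fps), applicable).backward(); opt.step()
for _ in range(2):   # fine-tuning steps (toy)
    opt.zero_grad(); finetune_loss(model(fps), recorded).backward(); opt.step()
```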


2021 ◽  
Author(s):  
Shuangxia Ren ◽  
Jill A. Zupetic ◽  
Mohammadreza Tabary ◽  
Rebecca DeSensi ◽  
Mehdi Nouraie ◽  
...  

Abstract We created an online calculator using machine learning algorithms to impute the partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ratio from the non-invasive peripheral saturation of oxygen (SpO2), and compared the accuracy of the machine learning models we developed to previously published equations. We generated three machine learning algorithms (neural network, regression, and kernel-based methods) using seven clinical features (N = 9,900 ICU events) and subsequently three features (N = 20,198 ICU events) as input to the models. Data from mechanically ventilated ICU patients were obtained from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) database and used for analysis. Compared to seven features, three features (SpO2, FiO2 and PEEP) were sufficient to impute PaO2 from the SpO2. All of the tested machine learning models imputed PaO2 from the SpO2 with lower error, and with greater accuracy in predicting PaO2/FiO2 < 150, than the previously published log-linear and non-linear equations. Imputation using data from an independent validation cohort of ICU patients (N = 133) from two hospitals within the University of Pittsburgh Medical Center (UPMC) showed greater accuracy with the neural network and kernel-based machine learning models than with the previously published non-linear equation.
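The snippet below sketches the three-feature imputation idea with a scikit-learn neural network on synthetic data; the toy generative rule for PaO2 is purely illustrative and does not stand in for the MIMIC-III cohort or the published models.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: SpO2 (%), FiO2 (fraction), PEEP (cmH2O).
rng = np.random.default_rng(0)
n = 2000
spo2 = rng.uniform(80, 100, n)
fio2 = rng.uniform(0.21, 1.0, n)
peep = rng.uniform(0, 20, n)
# Toy rule standing in for the true physiology, for illustration only.
pao2 = 0.9 * spo2 + 60 * fio2 - 0.5 * peep + rng.normal(0, 5, n)

X = np.column_stack([spo2, fio2, peep])
X_tr, X_te, y_tr, y_te = train_test_split(X, pao2, random_state=0)

nn_model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
nn_model.fit(X_tr, y_tr)

pao2_hat = nn_model.predict(X_te)
pf_ratio = pao2_hat / X_te[:, 1]            # imputed PaO2/FiO2 ratio
print("share flagged as PaO2/FiO2 < 150:", np.mean(pf_ratio < 150))
```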


Author(s):  
Diwakar Naidu ◽  
Babita Majhi ◽  
Surendra Kumar Chandniha

This study focuses on modelling changes in rainfall patterns in different agro-climatic zones (ACZs) due to climate change through statistical downscaling of large-scale climate variables using machine learning approaches. The potential of three machine learning algorithms, the multilayer artificial neural network (MLANN), the radial basis function neural network (RBFNN), and the least squares support vector machine (LS-SVM), has been investigated. The large-scale climate variables are obtained from the National Centers for Environmental Prediction (NCEP) reanalysis product and used as predictors for model development. The proposed machine learning models are then applied to generate projected time series of rainfall for the period 2021-2050 using the Hadley Centre coupled model (HadCM3) B2 emission scenario data as predictors. An increasing trend in anticipated rainfall is observed during 2021-2050 in all the ACZs of Chhattisgarh State. Among the machine learning models, the RBFNN was found to be the most feasible technique for modelling monthly rainfall in this region.
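As an illustration of the downscaling setup, the sketch below maps synthetic large-scale predictors to monthly rainfall with an RBF-kernel support vector regressor, used here as a readily available stand-in for LS-SVM; the predictor count, data and hyperparameters are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic placeholders for the NCEP predictors and observed zone rainfall.
rng = np.random.default_rng(1)
n_months, n_predictors = 360, 26        # e.g. 30 years x 12 months, 26 predictor variables
X = rng.normal(size=(n_months, n_predictors))
rain = 50 + 10 * X[:, 0] - 5 * X[:, 3] + rng.normal(0, 8, n_months)  # toy rainfall (mm)

X_tr, X_te, y_tr, y_te = train_test_split(X, rain, random_state=1)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(X_tr, y_tr)
print("test RMSE (mm):", mean_squared_error(y_te, model.predict(X_te)) ** 0.5)
```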


2020 ◽  
Vol 12 (6) ◽  
pp. 117-131
Author(s):  
Tran Hoang Hai ◽  
Le Huy Hoang ◽  
Eui-nam Huh

Today's Internet and enterprise networks provide multimedia and e-commerce services to millions of users in daily life. As a result, security has become a challenging problem, often framed as cyberwar, in which attackers can launch Distributed Denial of Service (DDoS) attacks to take down the operation of enterprise intranets. Applying an Intrusion Detection System (IDS) is therefore very important for enterprise networks. In this paper, we propose a smarter solution for detecting network anomalies using a stacking technique that combines three popular machine learning models: the k-nearest neighbor algorithm (KNN), Adaptive Boosting (AdaBoost), and Random Decision Forests (Random Forest). Our proposed scheme uses the logistic regression method to automatically search for better parameters for the stacking model. We evaluate the performance of our proposed scheme on the latest NSL-KDD 2019 dataset and compare the achieved results with individual machine learning models, showing that our proposed model achieves much higher accuracy than previous works.
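The stacking layout described above can be expressed compactly with scikit-learn, as in the sketch below; a synthetic binary dataset stands in for the NSL-KDD features, and the hyperparameters are illustrative rather than those tuned in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for NSL-KDD; real use would first encode the categorical
# protocol/service/flag fields.
X, y = make_classification(n_samples=3000, n_features=40, n_informative=15, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("ada", AdaBoostClassifier(n_estimators=100, random_state=7)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=7)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,
)
stack.fit(X_tr, y_tr)
print("stacked accuracy:", stack.score(X_te, y_te))
```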


Author(s):  
Prince Nathan S

Abstract: Cryptocurrency has grown drastically in recent years, and Bitcoin (BTC) is a very popular cryptocurrency, now used in many sectors for trading, transactions, bookings, etc. In this paper, we aim to predict the change in Bitcoin prices by applying machine learning techniques to data from Investing.com. We interpret the output and accuracy rate of various machine learning models. To decide whether to buy or sell Bitcoin, we perform exploratory data analysis on a year of data and predict the change over the next five days using machine learning models such as logistic regression, logistic regression with PCA (Principal Component Analysis), and a neural network. Keywords: Data Science, Machine Learning, Regression, PCA, Neural Network, Data Analysis
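A minimal sketch of one of the listed models (logistic regression with PCA) on a synthetic price series is shown below; the look-back window, horizon and data are assumptions for illustration, not the Investing.com dataset used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic price series; features are recent daily log returns and the label
# is whether the price is higher 5 days later.
rng = np.random.default_rng(42)
prices = 30000 * np.exp(np.cumsum(rng.normal(0, 0.02, 400)))
returns = np.diff(np.log(prices))

lookback, horizon = 10, 5
idx = np.arange(lookback, len(returns) - horizon)          # "today" index into prices
X = np.array([returns[i - lookback:i] for i in idx])
y = (prices[idx + horizon] > prices[idx]).astype(int)

split = int(0.8 * len(X))                                  # time-ordered split, no shuffling
model = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())
model.fit(X[:split], y[:split])
print("directional accuracy:", model.score(X[split:], y[split:]))
```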


2021 ◽  
Vol 14 (3) ◽  
pp. 119
Author(s):  
Fabian Waldow ◽  
Matthias Schnaubelt ◽  
Christopher Krauss ◽  
Thomas Günter Fischer

In this paper, we demonstrate how a well-established machine learning-based statistical arbitrage strategy can be successfully transferred from equity to futures markets. First, we preprocess futures time series comprised of front months to render them suitable for our returns-based trading framework and compile a data set of 60 futures covering nearly 10 trading years. Next, we train several machine learning models to predict whether the h-day-ahead return of each future out- or underperforms the corresponding cross-sectional median return. We then enter long/short positions in the top/flop-k futures for a duration of h days and assess the financial performance of the resulting portfolio in an out-of-sample testing period. We find the machine learning models to yield statistically significant out-of-sample break-even transaction costs of 6.3 bp, a clear challenge to the semi-strong form of market efficiency. Finally, we discuss sources of profitability and the robustness of our findings.
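The cross-sectional labelling step can be illustrated as below on a synthetic panel of 60 futures; the single momentum feature, the classifier and all parameters are simplifying assumptions, not the paper's feature set or models.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# For each date, a future is labelled 1 if its h-day-ahead return beats the
# cross-sectional median of all futures on that date (synthetic returns).
rng = np.random.default_rng(3)
h, k = 5, 5
dates = pd.date_range("2015-01-01", periods=500, freq="B")
rets = pd.DataFrame(rng.normal(0, 0.01, (500, 60)), index=dates,
                    columns=[f"fut_{i}" for i in range(60)])

fwd = rets.rolling(h).sum().shift(-h)                    # h-day-ahead returns
label = fwd.gt(fwd.median(axis=1), axis=0).astype(int)   # beat the median?
feat = rets.rolling(20).sum()                            # one toy feature: 20-day momentum

train = dates[:350]
X_tr = feat.loc[train].stack().dropna()
y_tr = label.loc[train].stack().loc[X_tr.index]
clf = RandomForestClassifier(n_estimators=200, random_state=3)
clf.fit(X_tr.to_frame("mom20"), y_tr)

# On the last date with a defined feature, go long the top-k / short the flop-k.
last = feat.dropna().index[-1]
scores = pd.Series(clf.predict_proba(feat.loc[last].to_frame("mom20"))[:, 1],
                   index=feat.columns)
print("long:", list(scores.nlargest(k).index))
print("short:", list(scores.nsmallest(k).index))
```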


2021 ◽  
pp. 1-15
Author(s):  
O. Basturk ◽  
C. Cetek

ABSTRACT In this study, the prediction of aircraft Estimated Time of Arrival (ETA) using machine learning algorithms is proposed. Accurate prediction of ETA is important for the management of delays and air traffic flow, runway assignment, gate assignment, collaborative decision making (CDM), coordination of ground personnel and equipment, and optimisation of the arrival sequence. Machine learning is able to learn from experience and make predictions with weak assumptions or no assumptions at all. In the proposed approach, general flight information, trajectory data and weather data were obtained from different sources in various formats. Raw data were converted to tidy data and inserted into a relational database. To obtain the features for training the machine learning models, the data were explored, cleaned and transformed into convenient features, and new features were also derived from the available data. Random forests and deep neural networks were used to train the machine learning models. Both models can predict the ETA with a mean absolute error (MAE) of less than 6 min after departure, and of less than 3 min after terminal manoeuvring area (TMA) entrance. Additionally, a web application was developed to dynamically predict the ETA using the proposed models.
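The random-forest variant can be sketched as follows on synthetic stand-ins for the engineered features; the feature names and the toy ETA rule are assumptions for illustration only, not the study's feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for flight, trajectory and weather features.
rng = np.random.default_rng(5)
n = 5000
great_circle_km = rng.uniform(200, 1500, n)
ground_speed_kt = rng.uniform(350, 500, n)
headwind_kt = rng.normal(0, 30, n)
tma_congestion = rng.integers(0, 15, n)          # aircraft already in the TMA

# Toy ETA target (minutes after departure), for illustration only.
eta_min = (great_circle_km / ((ground_speed_kt - headwind_kt) * 1.852 / 60)
           + 0.8 * tma_congestion + rng.normal(0, 2, n))

X = np.column_stack([great_circle_km, ground_speed_kt, headwind_kt, tma_congestion])
X_tr, X_te, y_tr, y_te = train_test_split(X, eta_min, random_state=5)

rf = RandomForestRegressor(n_estimators=300, random_state=5)
rf.fit(X_tr, y_tr)
print("MAE (min):", mean_absolute_error(y_te, rf.predict(X_te)))
```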


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 252
Author(s):  
Laura M. Bergner ◽  
Nardus Mollentze ◽  
Richard J. Orton ◽  
Carlos Tello ◽  
Alice Broos ◽  
...  

The contemporary surge in metagenomic sequencing has transformed knowledge of viral diversity in wildlife. However, evaluating which newly discovered viruses pose sufficient risk of infecting humans to merit detailed laboratory characterization and surveillance remains largely speculative. Machine learning algorithms have been developed to address this imbalance by ranking the relative likelihood of human infection based on viral genome sequences, but are not yet routinely applied to viruses at the time of their discovery. Here, we characterized viral genomes detected through metagenomic sequencing of feces and saliva from common vampire bats (Desmodus rotundus) and used these data as a case study in evaluating zoonotic potential using molecular sequencing data. Of 58 detected viral families, including 17 which infect mammals, the only known zoonosis detected was rabies virus; however, additional genomes were detected from the families Hepeviridae, Coronaviridae, Reoviridae, Astroviridae and Picornaviridae, all of which contain human-infecting species. In phylogenetic analyses, novel vampire bat viruses most frequently grouped with other bat viruses that are not currently known to infect humans. In agreement, machine learning models built from only phylogenetic information ranked all novel viruses similarly, yielding little insight into zoonotic potential. In contrast, genome composition-based machine learning models estimated different levels of zoonotic potential, even for closely related viruses, categorizing one out of four detected hepeviruses and two out of three picornaviruses as having high priority for further research. We highlight the value of evaluating zoonotic potential beyond ad hoc consideration of phylogeny and provide surveillance recommendations for novel viruses in a wildlife host which has frequent contact with humans and domestic animals.
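The genome-composition approach can be illustrated with a toy k-mer classifier, as sketched below; random sequences and labels stand in for real genomes and the published training data, so the output score is purely illustrative.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Represent each viral genome by its k-mer frequencies and train a classifier
# that outputs a ranked probability of human infectivity (toy data throughout).
rng = np.random.default_rng(11)
K = 2
kmers = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_freqs(seq, k=K):
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(len(seq) - k + 1, 1)
    return np.array([counts[m] / total for m in kmers])

def random_genome(length=3000):
    return "".join(rng.choice(list("ACGT"), size=length))

genomes = [random_genome() for _ in range(200)]
labels = rng.integers(0, 2, 200)            # 1 = known human-infecting (toy label)

X = np.vstack([kmer_freqs(g) for g in genomes])
clf = GradientBoostingClassifier(random_state=11).fit(X, labels)

novel = random_genome()
print("zoonotic-potential score:", clf.predict_proba(kmer_freqs(novel)[None, :])[0, 1])
```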


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract Background In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques and of algorithmic and implementation optimizations is a necessity to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
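A plaintext (non-secure) sketch of the kind of model being trained is given below, with the sigmoid replaced by a commonly used clipped-ReLU approximation; the exact activation and update rule of the paper, as well as the MPC layer itself (secret sharing, correlated randomness, the new activation protocol), are not reproduced here.

```python
import numpy as np

# Logistic-regression-like gradient descent where the sigmoid is replaced by a
# clipped ReLU, act(z) = min(max(z + 0.5, 0), 1), a common MPC-friendly choice
# because it avoids exponentials and divisions (assumed form, toy data).
rng = np.random.default_rng(13)
n, d = 500, 20
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + rng.normal(0, 0.5, n) > 0).astype(float)

def clipped_relu(z):
    return np.clip(z + 0.5, 0.0, 1.0)

w = np.zeros(d)
lr = 0.1
for _ in range(200):
    pred = clipped_relu(X @ w)
    grad = X.T @ (pred - y) / n      # same update form as logistic regression
    w -= lr * grad

acc = np.mean((clipped_relu(X @ w) > 0.5) == (y > 0.5))
print("training accuracy:", acc)
```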

