Profitable Algorithmic Trading Strategy

Abstract Background: In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic-regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
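To make the "logistic-regression-like model with a clipped ReLU activation" concrete, here is a minimal plaintext sketch with no cryptography involved. The exact shape of the activation (0 below -0.5, 1 above 0.5, linear in between), the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration; the abstract does not specify them.

```python
def clipped_relu(z):
    """Piecewise-linear stand-in for the sigmoid (assumed form):
    0 for z <= -0.5, 1 for z >= 0.5, and z + 0.5 in between."""
    return min(1.0, max(0.0, z + 0.5))


def train(X, y, lr=0.5, epochs=300):
    """Batch gradient descent using the logistic-regression update rule,
    with the sigmoid replaced by clipped_relu. Plaintext only."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = clipped_relu(z) - yi  # prediction error for this sample
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict class 1 when the activation reaches 0.5 (i.e. z >= 0)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if clipped_relu(z) >= 0.5 else 0


# Hypothetical one-feature toy data, not taken from the paper.
TOY_X = [[0.0], [0.2], [0.8], [1.0]]
TOY_Y = [0, 0, 1, 1]
weights, bias = train(TOY_X, TOY_Y)
```

Because the activation is built only from additions and clipping, it avoids the exponentiation a sigmoid would need, which is presumably part of why it suits a secure protocol.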

Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

10.26434/chemrxiv.5883157.v2 ◽

2018 ◽

Cited By ~ 1

Author(s):

Daniel Elton ◽

Zois Boukouvalas ◽

Mark S. Butrico ◽

Mark D. Fuge ◽

Peter W. Chung

Keyword(s):

Machine Learning ◽

Energetic Materials ◽

Molecular Structures ◽

Machine Learning Techniques ◽

Small Data ◽

Detonation Pressure ◽

Learning Models ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Models

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, bag of bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with 309 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.
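The "sum over bonds (bond counting)" featurization named above can be sketched as follows. This is an illustrative reading of the idea, assuming each molecule is already available as a list of bond-type labels; the labels and the two toy molecules are made up and are not from the paper's dataset.

```python
from collections import Counter


def sum_over_bonds(molecules):
    """Map each molecule (a list of bond-type labels) to a vector that
    counts how often each bond type in the shared vocabulary occurs."""
    vocabulary = sorted({bond for bonds in molecules for bond in bonds})
    vectors = []
    for bonds in molecules:
        counts = Counter(bonds)
        vectors.append([counts[bond] for bond in vocabulary])
    return vocabulary, vectors


# Hypothetical bond lists for two made-up molecules.
toy_molecules = [
    ["C-H", "C-H", "C-H", "C-N", "N=O", "N-O"],
    ["C-C", "C-H", "C-H", "N=O", "N=O"],
]
vocab, features = sum_over_bonds(toy_molecules)
```

The resulting fixed-length count vectors are the kind of input a regression model such as kernel ridge regression could be trained on.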

Statistical and machine learning models for classification of human wear and delivery days in accelerometry data

10.1101/2020.12.31.424867 ◽

2021 ◽

Author(s):

Ryan Moore ◽

Kristin R. Archer ◽

Leena Choi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Human Activity ◽

Recurrent Neural Network ◽

Learning Models ◽

Learning Context ◽

Machine Learning Models

Abstract Purpose: Accelerometers are increasingly utilized in healthcare research to assess human activity. Accelerometry data are often collected by mailing accelerometers to participants, who wear the accelerometers to collect data on their activity. The devices are then mailed back to the laboratory for analysis. We develop models to classify days in accelerometry data as activity from actual human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery. Methods: For the classification of delivery days in accelerometry data, we developed statistical and machine learning models in a supervised learning context using a large human activity and delivery labeled accelerometry dataset. We extracted several features, which were used to develop random forest, logistic regression, mixed effects regression, and multilayer perceptron models, while convolutional neural network, recurrent neural network, and hybrid convolutional recurrent neural network models were developed without feature extraction. Model performances were assessed using Monte Carlo cross-validation. Results: We found that a hybrid convolutional recurrent neural network performed best in the classification task with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance with F1 scores of 0.951 and 0.957, respectively. Conclusion: The models developed in this study can be used to classify days in accelerometry data as either human or delivery activity. An analyst can weigh the larger computational cost and greater performance of the convolutional recurrent neural network against the faster but slightly less powerful random forest or logistic regression. The best performing models for classification of delivery data are publicly available in the open-source R package PhysicalActivity.
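Monte Carlo cross-validation scored by F1 can be sketched as below. This is a generic illustration, not the study's pipeline: the threshold classifier, the single "daily activity" feature, the labels (1 = delivery day), the split fraction, and the number of repetitions are all hypothetical.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def monte_carlo_cv(X, y, fit, predict, n_splits=20, test_fraction=0.25, seed=0):
    """Repeat random train/test splits and average the test F1 scores."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(f1_score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def fit_threshold(X, y):
    """Hypothetical classifier: midpoint between the two class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    """Label a day as delivery (1) when activity falls below the threshold."""
    return 1 if x < threshold else 0


# Made-up daily activity totals; 1 marks a delivery day.
activity = [5, 8, 12, 9, 110, 95, 130, 105]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
mean_f1 = monte_carlo_cv(activity, labels, fit_threshold, predict_threshold)
```

Unlike k-fold cross-validation, the random splits here may overlap across repetitions, which is the defining trait of the Monte Carlo variant.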

Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

10.26434/chemrxiv.5883157.v1 ◽

2018 ◽

Author(s):

Daniel Elton ◽

Zois Boukouvalas ◽

Mark S. Butrico ◽

Mark D. Fuge ◽

Peter W. Chung

Keyword(s):

Machine Learning ◽

Energetic Materials ◽

Molecular Structures ◽

Machine Learning Techniques ◽

Small Data ◽

Detonation Pressure ◽

Learning Models ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Models

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, Bag of Bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with 309 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.

Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

10.26434/chemrxiv.5883157 ◽

2018 ◽

Author(s):

Daniel Elton ◽

Zois Boukouvalas ◽

Mark S. Butrico ◽

Mark D. Fuge ◽

Peter W. Chung

Keyword(s):

Machine Learning ◽

Energetic Materials ◽

Molecular Structures ◽

Machine Learning Techniques ◽

Small Data ◽

Detonation Pressure ◽

Learning Models ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Models

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, bag of bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with 309 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.

FEASIBILITY OF USING GROUP METHOD OF DATA HANDLING (GMDH) APPROACH FOR HORIZONTAL COORDINATE TRANSFORMATION

Geodesy and Cartography ◽

10.3846/gac.2020.10486 ◽

2020 ◽

Vol 46 (2) ◽

pp. 55-66

Author(s):

Bernard Kumi-Boateng ◽

Yao Yevenyo Ziggah

Keyword(s):

Neural Network ◽

Machine Learning ◽

Coordinate Transformation ◽

Machine Learning Algorithms ◽

Group Method ◽

Data Handling ◽

Learning Models ◽

Data Set ◽

Functional Relationships ◽

Machine Learning Models

Machine learning algorithms have emerged as a new paradigm in geoscience computations and applications. The present study aims to assess the suitability of the Group Method of Data Handling (GMDH) in coordinate transformation. The data used for the coordinate transformation constitute the Ghana national triangulation network, which is based on the two horizontal geodetic datums (Accra 1929 and Leigon 1977) utilised for geospatial applications in Ghana. The GMDH result was compared with other standard methods such as Backpropagation Neural Network (BPNN), Radial Basis Function Neural Network (RBFNN), 2D conformal, and 2D affine. It was observed that the proposed GMDH approach is very efficient in transforming coordinates from the Leigon 1977 datum to the official mapping datum of Ghana, i.e. the Accra 1929 datum. It was also found that GMDH could produce comparable and satisfactory results, just like the widely used BPNN and RBFNN. However, the classical transformation methods (2D affine and 2D conformal) performed poorly when compared with the machine learning models (GMDH, BPNN and RBFNN). The computational strength of the machine learning models is attributed to their self-adaptive capability to detect patterns in a data set without considering the existence of functional relationships between the input and output variables. To this end, the proposed GMDH model could be used as a supplementary computational tool to the existing transformation procedures used in the Ghana geodetic reference network.
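For readers unfamiliar with the classical baseline, a 2D conformal (four-parameter similarity) transformation can be fitted by least squares in closed form. The sketch below assumes the common parameterisation x' = a*x - b*y + tx, y' = b*x + a*y + ty; the function names and the control points are hypothetical and unrelated to the Ghana network data.

```python
def fit_conformal_2d(source, target):
    """Least-squares fit of x' = a*x - b*y + tx, y' = b*x + a*y + ty
    from matched control points, using centred coordinates."""
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    ux = sum(p[0] for p in target) / n
    uy = sum(p[1] for p in target) / n
    num_a = num_b = denom = 0.0
    for (x, y), (u, v) in zip(source, target):
        xc, yc, uc, vc = x - sx, y - sy, u - ux, v - uy
        num_a += xc * uc + yc * vc
        num_b += xc * vc - yc * uc
        denom += xc * xc + yc * yc
    a, b = num_a / denom, num_b / denom
    # Translations follow from requiring the centroids to map onto each other.
    tx = ux - a * sx + b * sy
    ty = uy - b * sx - a * sy
    return a, b, tx, ty


def apply_conformal_2d(params, point):
    """Transform one (x, y) point with fitted parameters (a, b, tx, ty)."""
    a, b, tx, ty = params
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)


# Made-up control points generated with a=2, b=1, tx=10, ty=-5.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(10.0, -5.0), (12.0, -4.0), (9.0, -3.0), (11.0, -2.0)]
params = fit_conformal_2d(src, dst)
```

The model has only a scale, a rotation, and two shifts, so it cannot absorb local distortions in a network; this rigidity is one plausible reason a data-driven model could fit better.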

Classification Comparative Analysis for Detection of Brain Tumor Using Neural Network, Logistic Regression & KNN Classifier with VGG19 Convolution Neural Network Feature Extraction

10.21467/proceedings.114.6 ◽

2021 ◽

Author(s):

Vijaya Kamble ◽

Rohin Daruwala

Keyword(s):

Neural Network ◽

Machine Learning ◽

Feature Extraction ◽

Logistic Regression ◽

Brain Tumor ◽

Medical Image Analysis ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Data Set ◽

Knn Classifier

In recent years, owing to advancements in digital imaging, machine learning techniques have been used in medical image analysis for the prognosis and diagnosis of various abnormalities in the human body. Various machine learning algorithms, including convolutional and deep neural networks, are used for the classification, detection and prediction of brain tumors. The proposed approach is a comparative classification analysis based on three classifiers: a KNN classifier, logistic regression, and a neural network. It relies on a deep learning feature extraction technique using VGG19, a 19-layer image recognition model trained on ImageNet. Generally, MRI data sequences are analyzed in terms of different modalities, and every modality contains rich tissue information, so feature extraction from MRI sequences is a very important task for brain tumor classification. Our approach demonstrated fair classification on the BRATS 2018 benchmark data set with different modalities and image sizes; the results were obtained without any human annotations. All of the selected classifiers give accuracy above 90%, which compares well with other state-of-the-art methods.
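The classification stage described above, a KNN classifier applied to pre-extracted feature vectors, can be sketched as follows. The two-dimensional vectors merely stand in for VGG19 features, which would have far more dimensions; the values, labels, and the choice k=3 are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest
    training vectors under Euclidean distance."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors standing in for extracted image features.
toy_features = [
    [0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
    [0.9, 0.8], [1.0, 0.7], [0.8, 0.9],
]
toy_labels = ["no_tumor", "no_tumor", "no_tumor", "tumor", "tumor", "tumor"]
```

Calling `knn_predict(toy_features, toy_labels, [0.9, 0.9])` votes among the three closest vectors; the feature extractor itself is outside the scope of this sketch.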

Comparison of Multivariable Logistic Regression and Machine Learning Models for Predicting Bronchopulmonary Dysplasia or Death in Very Preterm Infants

Frontiers in Pediatrics ◽

10.3389/fped.2021.759776 ◽

2021 ◽

Vol 9 ◽

Author(s):

Faiza Khurshid ◽

Helen Coo ◽

Amal Khalil ◽

Jonathan Messiha ◽

Joseph Y. Ting ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Bronchopulmonary Dysplasia ◽

Prediction Models ◽

Neural Network Ensemble ◽

Learning Models ◽

K Nearest Neighbor ◽

Accurate Identification ◽

Machine Learning Models

Bronchopulmonary dysplasia (BPD) is the most prevalent and clinically significant complication of prematurity. Accurate identification of at-risk infants would enable ongoing intervention to improve outcomes. Although postnatal exposures are known to affect an infant's likelihood of developing BPD, most existing BPD prediction models do not allow risk to be evaluated at different time points, and/or are not suitable for use in ethno-diverse populations. A comprehensive approach to developing clinical prediction models avoids assumptions as to which method will yield the optimal results by testing multiple algorithms/models. We compared the performance of machine learning and logistic regression models in predicting BPD/death. Our main cohort included infants <33 weeks' gestational age (GA) admitted to a Canadian Neonatal Network site from 2016 to 2018 (n = 9,006) with all analyses repeated for the <29 weeks' GA subcohort (n = 4,246). Models were developed to predict, on days 1, 7, and 14 of admission to neonatal intensive care, the composite outcome of BPD/death prior to discharge. Ten-fold cross-validation and a 20% hold-out sample were used to measure area under the curve (AUC). Calibration intercepts and slopes were estimated by regressing the outcome on the log-odds of the predicted probabilities. The model AUCs ranged from 0.811 to 0.886. Model discrimination was lower in the <29 weeks' GA subcohort (AUCs 0.699–0.790). Several machine learning models had a suboptimal calibration intercept and/or slope (k-nearest neighbor, random forest, artificial neural network, stacking neural network ensemble). The top-performing algorithms will be used to develop multinomial models and an online risk estimator for predicting BPD severity and death that does not require information on ethnicity.
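Two quantities used in the evaluation above can be illustrated briefly: the area under the ROC curve and the log-odds of a predicted probability (the regressor for calibration intercept and slope). This is a generic sketch with made-up predictions; it does not reproduce the study's cross-validation or calibration fitting.

```python
import math


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def log_odds(p):
    """Logit of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical outcomes (1 = event) and predicted probabilities.
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
```

A well-calibrated model would give an intercept near 0 and a slope near 1 when the outcome is regressed on `log_odds` of its predictions.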

High Performance Logistic Regression for Privacy-Preserving Genome Analysis

10.21203/rs.3.rs-26375/v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Martine De Cock ◽

Rafael Dowsley ◽

Anderson C.A. Nascimento ◽

Davis Railsback ◽

Jianwei Shen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Genome Analysis ◽

Local Area Network ◽

Local Area ◽

Activation Function ◽

Area Network ◽

Learning Models ◽

Data Set ◽

Machine Learning Models

Abstract Background: In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training Machine Learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from Machine Learning and cryptography. When collaboratively training Machine Learning models with the cryptographic technique named secure Multi-Party Computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of Machine Learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure Machine Learning over distributed data sets. Such optimizations can be tailored to the kind of data and Machine Learning problem at hand. Methods: Our setup involves secure Two-Party Computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic regression model, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao's garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 seconds in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure Multi-Party Computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
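A "secure multiplication" between two parties, helped by a trusted initializer that hands out correlated randomness, is commonly built on additive secret sharing and multiplication (Beaver) triples. The sketch below simulates both parties in one process and assumes that standard technique and a ring of size 2^64; the abstract does not confirm either detail, and all function names are hypothetical.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size, chosen for illustration


def share(value):
    """Split value into two additive shares modulo MODULUS."""
    share0 = secrets.randbelow(MODULUS)
    return share0, (value - share0) % MODULUS


def reconstruct(share0, share1):
    """Recombine two additive shares."""
    return (share0 + share1) % MODULUS


def trusted_initializer_triple():
    """Trusted initializer: sample a and b, compute c = a*b, and
    return additive shares of each for the two parties."""
    a = secrets.randbelow(MODULUS)
    b = secrets.randbelow(MODULUS)
    return share(a), share(b), share((a * b) % MODULUS)


def secure_multiply(x_shares, y_shares):
    """Compute shares of x*y from shares of x and y using one triple."""
    (a0, a1), (b0, b1), (c0, c1) = trusted_initializer_triple()
    # Each party masks its shares; only d = x - a and e = y - b are opened.
    d = reconstruct((x_shares[0] - a0) % MODULUS, (x_shares[1] - a1) % MODULUS)
    e = reconstruct((y_shares[0] - b0) % MODULUS, (y_shares[1] - b1) % MODULUS)
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by party 0 only.
    z0 = (c0 + d * b0 + e * a0 + d * e) % MODULUS
    z1 = (c1 + d * b1 + e * a1) % MODULUS
    return z0, z1


product_shares = secure_multiply(share(6), share(7))
```

Since a and b are uniformly random, the opened values d and e reveal nothing about x or y, and each multiplication consumes one fresh triple.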

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
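Judging a scoring function by its "correlation with experimental data" amounts to comparing predicted and measured affinities. The sketch below uses the Pearson coefficient as one possible measure; the abstract does not say which coefficient was used, and the numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical experimental affinities and scores from a made-up model.
experimental = [5.1, 6.3, 7.0, 7.8, 8.4]
predicted = [5.4, 6.0, 7.2, 7.5, 8.9]
r = pearson_r(experimental, predicted)
```

A value of `r` close to 1 would indicate that the scoring function ranks and scales ligands much like the experiment does; a value near 0 would match the failure reported for the classical functions.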
