Effect of river flow on the quality of estuarine and coastal waters using machine learning models

This article describes an application of high-throughput fingerprints (HTSFP) built upon industrial data accumulated over the years. The fingerprint was used to build machine learning models (multi-task deep learning + SVM) for compound activity predictions towards a panel of 131 targets. Quality of the predictions and the scaffold hopping potential of the HTSFP were systematically compared to traditional structural descriptors ECFP.

Cyber-Physical LPG Debutanizer Distillation Columns: Machine Learning-Based Soft Sensors for Product Quality Monitoring

10.20944/preprints202110.0364.v1 ◽

2021 ◽

Author(s):

Jože M. Rožanec ◽

Elena Trajkova ◽

Jinzhi Lu ◽

Nikolaos Sarantinoudis ◽

Georgios Arampatzis ◽

...

Keyword(s):

Machine Learning ◽

Product Quality ◽

Learning Models ◽

State Monitoring ◽

Soft Sensors ◽

Distillation Columns ◽

Operational Conditions ◽

Equipment State ◽

Machine Learning Models

Refineries execute a series of interlinked processes, where the product of one unit serves as the input to another process. Potential failures within these processes affect the quality of the end products, operational efficiency, and revenue of the entire refinery. In this context, implementation of a real-time cognitive module, referring to predictive machine learning models, enables the provision of equipment state monitoring services and the generation of decision-making for equipment operations. In this paper, we propose two machine learning models: 1) to forecast the amount of pentane (C5) content in the final product mixture; 2) to identify if C5 content exceeds the specification thresholds for the final product quality. We validate our approach using a use case from a real-world refinery. In addition, we develop a visualization to assess which features are considered most important during feature selection, and later by the machine learning models. Finally, we provide insights on the sensor values in the dataset, which help to identify the operational conditions for using such machine learning models.

Cyber-Physical LPG Debutanizer Distillation Columns: Machine-Learning-Based Soft Sensors for Product Quality Monitoring

Applied Sciences ◽

10.3390/app112411790 ◽

2021 ◽

Vol 11 (24) ◽

pp. 11790

Author(s):

Jože Martin Rožanec ◽

Elena Trajkova ◽

Jinzhi Lu ◽

Nikolaos Sarantinoudis ◽

George Arampatzis ◽

...

Keyword(s):

Machine Learning ◽

Product Quality ◽

Learning Models ◽

State Monitoring ◽

Soft Sensors ◽

Distillation Columns ◽

Operational Conditions ◽

Equipment State ◽

Machine Learning Models

Refineries execute a series of interlinked processes, where the product of one unit serves as the input to another process. Potential failures within these processes affect the quality of the end products, operational efficiency, and revenue of the entire refinery. In this context, implementation of a real-time cognitive module, referring to predictive machine learning models, enables the provision of equipment state monitoring services and the generation of decision-making for equipment operations. In this paper, we propose two machine learning models: (1) to forecast the amount of pentane (C5) content in the final product mixture; (2) to identify if C5 content exceeds the specification thresholds for the final product quality. We validate our approach using a use case from a real-world refinery. In addition, we develop a visualization to assess which features are considered most important during feature selection, and later by the machine learning models. Finally, we provide insights on the sensor values in the dataset, which help to identify the operational conditions for using such machine learning models.
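
The two tasks described above (forecasting C5 content, then flagging a specification breach) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the single input feature (a column-top temperature), the toy readings, the `SPEC_LIMIT` value, and the use of one-variable least squares with a thresholded prediction in place of the paper's actual models, which the abstract does not specify.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x with a single feature."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict_c5(model, temperature):
    """Task 1: forecast C5 content from one sensor reading."""
    a, b = model
    return a + b * temperature


def exceeds_spec(model, temperature, threshold):
    """Task 2 (simplified): flag a breach by thresholding the forecast."""
    return predict_c5(model, temperature) > threshold


# Hypothetical readings: column-top temperature (deg C) vs. measured C5 (mol %).
temps = [60.0, 62.0, 64.0, 66.0, 68.0]
c5 = [0.5, 0.9, 1.3, 1.7, 2.1]
SPEC_LIMIT = 1.5  # hypothetical specification threshold

model = fit_line(temps, c5)
print(predict_c5(model, 65.0))
print(exceeds_spec(model, 67.0, SPEC_LIMIT))
```

A real soft sensor would use many sensor inputs and a separately trained classifier; the sketch only shows how a regression output and a specification check relate.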

18 Methodology and reporting quality of studies using machine learning models for medical diagnosis: a methodological systematic review

10.1136/bmjebm-2019-ebmlive.99 ◽

2019 ◽

Author(s):

Mohamed Yusuf ◽

Ignacio Atal ◽

Jacques Li ◽

Phil Smith ◽

Philippe Ravaud ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Medical Diagnosis ◽

Reporting Quality ◽

Learning Models ◽

Machine Learning Models

DeepSatData: Building large scale datasets of satellite images for training machine learning models

10.36227/techrxiv.16558482.v1 ◽

2021 ◽

Author(s):

Michael Tarasiou

Keyword(s):

Machine Learning ◽

Large Scale ◽

Ground Truth ◽

Semantic Segmentation ◽

Point Of View ◽

Learning Models ◽

Ground Truth Data ◽

Machine Learning Models ◽

Sentinel 2

This paper presents DeepSatData, a pipeline for automatically generating satellite imagery datasets for training machine learning models. We also discuss design considerations with emphasis on dense classification tasks, e.g. semantic segmentation. The implementation presented makes use of freely available Sentinel-2 data, which allows the generation of the large-scale datasets required for training deep neural networks (DNN). We discuss issues faced from the point of view of DNN training and evaluation, such as checking the quality of ground truth data, and comment on the scalability of the approach.

DATA QUALITY CONSIDERATIONS FOR PETROPHYSICAL MACHINE LEARNING MODELS

10.30632/spwla-2021-0036 ◽

2021 ◽

Author(s):

Andrew McDonald ◽

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Quality ◽

Input Data ◽

Well Log ◽

Learning Models ◽

Log Data ◽

Quality Issues ◽

Machine Learning Models

Decades of subsurface exploration and characterisation have led to the collation and storage of large volumes of well-related data. The amount of data gathered daily continues to grow rapidly as technology and recording methods improve. With the increasing adoption of machine learning techniques in the subsurface domain, it is essential that the quality of the input data is carefully considered when working with these tools. If the input data is of poor quality, the impact on precision and accuracy of the prediction can be significant. Consequently, this can impact key decisions about the future of a well or a field. This study focuses on well log data, which can be highly multi-dimensional, diverse and stored in a variety of file formats. Well log data exhibits key characteristics of Big Data: Volume, Variety, Velocity, Veracity and Value. Well data can include numeric values, text values, waveform data, image arrays, maps, volumes, etc., all of which can be indexed by time or depth in a regular or irregular way. A significant portion of time can be spent gathering data and quality checking it prior to carrying out petrophysical interpretations and applying machine learning models. Well log data can be affected by numerous issues causing a degradation in data quality. These include missing data, ranging from single data points to entire curves; noisy data from tool-related issues; borehole washout; processing issues; incorrect environmental corrections; and mislabelled data. Having vast quantities of data does not mean it can all be passed into a machine learning algorithm with the expectation that the resultant prediction is fit for purpose. It is essential that the most important and relevant data is passed into the model through appropriate feature selection techniques. Not only does this improve the quality of the prediction, it also reduces computational time and can provide a better understanding of how the models reach their conclusion.
This paper reviews data quality issues typically faced by petrophysicists when working with well log data and deploying machine learning models. First, an overview of machine learning and Big Data is covered in relation to petrophysical applications. Secondly, data quality issues commonly faced with well log data are discussed. Thirdly, methods are suggested on how to deal with data issues prior to modelling. Finally, multiple case studies are discussed covering the impacts of data quality on predictive capability.
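
One of the issues listed above, missing data ranging from single points to entire curves, can be screened for before modelling. The following is a minimal sketch under stated assumptions: the curve names and values are invented, `-999.25` is assumed as the null marker (a common convention in LAS well log files), and the 20% cut-off in `quality_report` is an arbitrary illustrative choice, not a recommendation from the paper.

```python
import math

NULL_VALUE = -999.25  # assumed null marker, as commonly used in LAS files


def missing_fraction(curve, null_value=NULL_VALUE):
    """Fraction of samples that are None, NaN or the null marker."""
    if not curve:
        return 1.0  # an empty curve is treated as entirely missing
    missing = 0
    for v in curve:
        if v is None or v == null_value:
            missing += 1
        elif isinstance(v, float) and math.isnan(v):
            missing += 1
    return missing / len(curve)


def quality_report(curves, max_missing=0.2):
    """Map each curve name to (missing fraction, usable flag)."""
    report = {}
    for name, values in curves.items():
        frac = missing_fraction(values)
        report[name] = (frac, frac <= max_missing)
    return report


# Hypothetical curves: gamma ray, bulk density, sonic.
logs = {
    "GR": [45.2, 50.1, -999.25, 61.0, 58.3],
    "RHOB": [-999.25, -999.25, -999.25, 2.45, 2.47],
    "DT": [],
}

for name, (frac, usable) in quality_report(logs).items():
    print(f"{name}: {frac:.0%} missing, usable={usable}")
```

A check like this only covers missing values; noise, washout and mislabelled curves need separate, domain-specific tests.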

Quality of Different Machine Learning Models in Error Discovery for Parallel Genome Sequencing

10.7546/crabs.2018.07.08 ◽

2018 ◽

Author(s):

Milko Krachunov ◽

Milena Sokolova ◽

Valeriya Simeonova ◽

Maria Nisheva ◽

Irena Avdjieva ◽

...

Keyword(s):

Machine Learning ◽

Genome Sequencing ◽

Learning Models ◽

Machine Learning Models

Spatial prediction and mapping of water quality of Owabi reservoir from satellite imageries and machine learning models

The Egyptian Journal of Remote Sensing and Space Science ◽

10.1016/j.ejrs.2021.06.006 ◽

2021 ◽

Author(s):

Yvonne Yeboah Adusei ◽

Jonathan Quaye-Ballard ◽

Albert Amatey Adjaottor ◽

Alex Appiah Mensah

Keyword(s):

Machine Learning ◽

Water Quality ◽

Spatial Prediction ◽

Learning Models ◽

Machine Learning Models

Securing Your Relationship: Quality of Intimate Relationships During the COVID-19 Pandemic Can Be Predicted by Attachment Style

Frontiers in Psychology ◽

10.3389/fpsyg.2021.647956 ◽

2021 ◽

Vol 12 ◽

Author(s):

Stephanie J. Eder ◽

Andrew A. Nicholson ◽

Michal M. Stefanczyk ◽

Michał Pieniak ◽

Judit Martínez-Molina ◽

...

Keyword(s):

Machine Learning ◽

Relationship Quality ◽

Attachment Style ◽

Linear Models ◽

Self Report ◽

Learning Models ◽

Anxious Attachment ◽

Major Predictor ◽

Machine Learning Models

The COVID-19 pandemic, along with the restrictions that were introduced within Europe starting in spring 2020, allows for the identification of predictors for relationship quality during unstable and stressful times. The present study began as strict measures were enforced in response to the rising spread of the COVID-19 virus within Austria, Poland, Spain and the Czech Republic. Here, we investigated quality of romantic relationships among 313 participants as movement restrictions were implemented and subsequently phased out cross-nationally. Participants completed self-report questionnaires over a period of 7 weeks, where we predicted relationship quality and change in relationship quality using machine learning models that included a variety of potential predictors related to psychological, demographic and environmental variables. On average, our machine learning models predicted 29% (linear models) and 22% (non-linear models) of the variance with regard to relationship quality. Here, the most important predictors consisted of attachment style (anxious attachment being more influential than avoidant), age, and number of conflicts within the relationship. Interestingly, environmental factors such as the local severity of the pandemic did not exert a measurable influence with respect to predicting relationship quality. As opposed to overall relationship quality, the change in relationship quality during lockdown restrictions could not be predicted accurately by our machine learning models when utilizing our selected features. In conclusion, we demonstrate cross-culturally that attachment security is a major predictor of relationship quality during COVID-19 lockdown restrictions, whereas fear, pathogenic threat, sexual behavior, and the severity of governmental regulations did not significantly influence the accuracy of prediction.
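
If the share of variance predicted is read as the coefficient of determination (an assumption; the abstract does not name the metric), it is defined as below, where $y_i$ are the observed relationship-quality scores, $\hat{y}_i$ the model predictions and $\bar{y}$ the observed mean. Under that reading, the reported 29% for linear models would correspond to $R^2 = 0.29$.

```latex
R^2 = 1 - \frac{\sum_{i} \left( y_i - \hat{y}_i \right)^2}{\sum_{i} \left( y_i - \bar{y} \right)^2}
```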

COMPARATIVE ANALYSIS OF INFORMATIVE FEATURES QUANTITY AND COMPOSITION SELECTION METHODS FOR THE COMPUTER ATTACKS CLASSIFICATION USING THE UNSW-NB15 DATASET

T-Comm ◽

10.36724/2072-8735-2020-14-10-53-60 ◽

2020 ◽

Vol 14 (10) ◽

pp. 53-60

Author(s):

Oleg I. Sheluhin ◽

Valentina P. Ivannikova ◽

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Comparative Analysis ◽

Attack Detection ◽

Features Selection ◽

Data Preparation ◽

Learning Models ◽

Data Set ◽

Machine Learning Models

A comparative analysis of statistical and model-based methods for selecting the number and composition of informative features was performed using the UNSW-NB15 dataset to train machine learning models for attack detection. Feature selection is one of the most important steps in data preparation for machine learning tasks. It improves the quality of machine learning models by reducing the size of the fitted models, the training time and the probability of overfitting. The research was conducted using Python libraries: scikit-learn, which includes various machine learning models and functions for data preparation and model evaluation, and FeatureSelector, which contains functions for statistical data analysis. Numerical results are provided for both statistical feature selection methods and methods based on machine learning models. As a result, a reduced set of features is obtained, which improves the quality of classification by removing noise features that have little effect on the final result and reduces the number of informative features in the data set from 41 to 17. It is shown that the most effective of the analyzed feature selection methods is the statistical method SelectKBest with the chi2 function, which yields a reduced feature set providing a classification accuracy as high as 90%, compared with 74% for the full set.
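
The abstract names scikit-learn's SelectKBest with chi2; the sketch below is not that library code but a standard-library re-implementation of the same idea, so that it runs without dependencies. It scores each non-negative feature by comparing its per-class sum with the sum expected if the feature were independent of the class, then keeps the `k` highest-scoring columns. The helper names `chi2_scores` and `select_k_best` and the toy data are hypothetical; the data is not drawn from UNSW-NB15.

```python
def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against the class."""
    classes = sorted(set(y))
    n = len(y)
    n_features = len(X[0])
    class_prob = {c: sum(1 for label in y if label == c) / n for c in classes}
    feature_total = [sum(row[j] for row in X) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * feature_total[j]
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return the indices of the k best-scoring features and the reduced data."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Toy data: 6 samples, 4 count-like features, 2 classes (0 = normal, 1 = attack).
X = [
    [5, 1, 2, 0],
    [6, 0, 2, 1],
    [7, 1, 2, 0],
    [0, 1, 2, 6],
    [1, 0, 2, 5],
    [0, 1, 2, 7],
]
y = [0, 0, 0, 1, 1, 1]

keep, X_reduced = select_k_best(X, y, k=2)
print(keep)
print(X_reduced[0])
```

Features 1 and 2 are distributed identically across the two classes and score zero, so they are the ones dropped; the study applies the same principle to reduce 41 features to 17.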
