Gaining New Insight into Machine-Learning Datasets via Multiple Binary-Feature Frequency Ranks with a Mobile Benign/Malware Apps Example

2021, Vol 8 (2), pp. 103-121
Author(s): Gurol Canbek
Energies, 2021, Vol 14 (4), pp. 930
Author(s): Fahimeh Hadavimoghaddam, Mehdi Ostadhassan, Ehsan Heidaryan, Mohammad Ali Sadri, Inna Chapanova, et al.

Dead oil viscosity is a critical parameter for solving numerous reservoir engineering problems and one of the most unreliable properties to predict with classical black oil correlations. Determining dead oil viscosity experimentally is expensive and time-consuming, so an accurate and fast prediction model is needed. This paper implements six machine learning models to predict dead oil viscosity: random forest (RF), LightGBM, XGBoost, a multilayer perceptron (MLP) neural network, stochastic real-valued (SRV), and SuperLearner. More than 2000 pressure–volume–temperature (PVT) data points were used for developing and testing these models, covering a wide range of viscosities from light and intermediate to heavy oils. In this study, we give insight into the performance of different functional forms that have been used in the literature to formulate dead oil viscosity. The results show that the functional form f(γAPI, T) has the best performance, and additional correlating parameters might be unnecessary. Furthermore, SuperLearner outperformed the other machine learning (ML) algorithms as well as common correlations, according to the metric analysis. The SuperLearner model can potentially replace empirical models for viscosity prediction across a wide range of viscosities (any oil type). Ultimately, the proposed model is capable of reproducing the true physical trend of dead oil viscosity with variations in oil API gravity, temperature, and shear rate.
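As a rough illustration of the stacking idea behind SuperLearner, the sketch below combines a small set of base regressors under a cross-validated meta-learner to predict dead oil viscosity from API gravity and temperature. It is a minimal sketch, not the authors' implementation: the file name, column names, and the particular base learners shown here are assumptions.

```python
# Minimal sketch of a SuperLearner-style stacking ensemble for dead oil
# viscosity prediction, assuming a PVT table with hypothetical columns
# "api_gravity", "temperature" and "dead_oil_viscosity".
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

pvt = pd.read_csv("pvt_data.csv")                      # hypothetical file
X = pvt[["api_gravity", "temperature"]]                # functional form f(gamma_API, T)
y = pvt["dead_oil_viscosity"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base learners are blended by a cross-validated meta-learner (stacking),
# which is the core idea behind the SuperLearner approach.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=300, random_state=42)),
        ("mlp", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=42)),
    ],
    final_estimator=RidgeCV(),
    cv=5,
)
stack.fit(X_train, y_train)
print("R^2 on held-out PVT samples:", stack.score(X_test, y_test))
```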


The Analyst, 2021
Author(s): Barnaby Ellis, Conor A Whitley, Safaa Al Jedani, Caroline Smith, Philip Gunning, et al.

A novel machine learning algorithm is shown to accurately discriminate between oral squamous cell carcinoma (OSCC) nodal metastases and surrounding lymphoid tissue on the basis of a single metric, the...
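Since the classification reportedly rests on a single metric, a minimal single-feature classifier conveys the setting. The abstract is truncated and does not name the metric, so the file, column names, and choice of logistic regression below are all assumptions for illustration.

```python
# Minimal sketch of discriminating two tissue classes from one spectral metric,
# assuming a table with a numeric column "metric" and a binary label column
# "is_metastasis" (both hypothetical).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("tissue_metrics.csv")                 # hypothetical file
X, y = df[["metric"]], df["is_metastasis"]

clf = LogisticRegression()
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```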


AI Magazine, 2012, Vol 33 (2), pp. 55
Author(s): Nisarg Vyas, Jonathan Farringdon, David Andre, John Ivo Stivoric

In this article we provide insight into the BodyMedia FIT armband system — a wearable multi-sensor technology that continuously monitors physiological events related to energy expenditure for weight management using machine learning and data modeling methods. Since becoming commercially available in 2001, more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system’s success.
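To make the multi-sensor fusion idea concrete, the sketch below regresses energy expenditure on features from several armband sensors. This is a minimal sketch under assumptions: the file, the column names, the reference target, and the gradient-boosting model are illustrative, not the BodyMedia system's actual pipeline.

```python
# Minimal sketch of multi-sensor data fusion for energy-expenditure estimation,
# assuming per-minute sensor features (accelerometer, heat flux, skin
# temperature, galvanic skin response) and a calorimetry reference column.
# File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

data = pd.read_csv("armband_minutes.csv")              # hypothetical file
features = ["accel_mad", "heat_flux", "skin_temp", "gsr"]
X, y = data[features], data["energy_expenditure_kcal"]

model = GradientBoostingRegressor(random_state=0)
print("Cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
```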


2020, Vol 12 (11), pp. 4753
Author(s): Viju Raghupathi, Jie Ren, Wullianallur Raghupathi

Corporations have embraced the idea of corporate environmental, social, and governance (ESG) under the general framework of sustainability. Studies have measured and analyzed the impact of internal sustainability efforts on the performance of individual companies, policies, and projects. This exploratory study attempts to extract useful insight from shareholder sustainability resolutions using machine learning-based text analytics. Prior research has studied corporate sustainability disclosures from public reports. By studying shareholder resolutions, we gain insight into the shareholders’ perspectives and objectives. The primary source for this study is the Ceres sustainability shareholder resolution database, with 1737 records spanning 2009–2019. The study utilizes a combination of text analytic approaches (e.g., word clouds, co-occurrence, row similarities, clustering, and classification) to extract insights. These are novel methods of transforming textual data into useful knowledge about corporate sustainability endeavors. This study demonstrates that stakeholders, such as shareholders, can influence corporate sustainability via resolutions. The incorporation of text analytic techniques offers insight to researchers who study vast collections of unstructured bodies of text, improving the understanding of shareholder resolutions and reaching a wider audience.
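A minimal sketch of the kind of text-analytic pipeline described above is shown below: resolutions are vectorized with TF-IDF and grouped by k-means, then each cluster is summarized by its top terms. The file name, the "resolution_text" column, and the cluster count are hypothetical; the study's own tool chain may differ.

```python
# Minimal sketch: TF-IDF representation of shareholder resolutions followed by
# clustering, assuming a CSV with a "resolution_text" column (hypothetical).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

resolutions = pd.read_csv("ceres_resolutions.csv")     # hypothetical file
tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X = tfidf.fit_transform(resolutions["resolution_text"])

kmeans = KMeans(n_clusters=8, random_state=0, n_init=10).fit(X)
resolutions["cluster"] = kmeans.labels_

# Inspect the top terms per cluster to interpret the sustainability themes.
terms = tfidf.get_feature_names_out()
for k, centroid in enumerate(kmeans.cluster_centers_):
    top = [terms[i] for i in centroid.argsort()[-5:][::-1]]
    print(f"cluster {k}:", ", ".join(top))
```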


2021, Vol 6 (11), pp. 157
Author(s): Gonçalo Pereira, Manuel Parente, João Moutinho, Manuel Sampaio

Decision support and optimization tools used in construction often require an accurate estimation of cost variables to maximize their benefit. Heavy machinery is traditionally one of the greatest costs to consider, mainly due to fuel consumption. The fuel consumption of these typically diesel-powered machines varies greatly with the utilization scenario. This paper describes the creation of a framework for estimating the fuel consumption of construction trucks as a function of the carried load, the slope, the distance, and the pavement type. A more accurate estimation will increase the benefit of these optimization tools. The fuel consumption estimation model was developed using Machine Learning (ML) algorithms supported by data gathered through several sensors in a specially designed datalogger with wireless communication and opportunistic synchronization, in a real-context experiment. The results demonstrated the viability of the method, providing important insight into the advantages of combining sensorization with machine learning models in a real-world construction setting. Ultimately, this study constitutes a significant step towards IoT implementation from a Construction 4.0 viewpoint, especially considering its potential for real-time and digital twin applications.
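As a rough illustration of such an estimator, the sketch below one-hot encodes the categorical pavement type and fits a tree ensemble on the four predictors named in the abstract. The file, column names, and the random-forest choice are assumptions; the paper's own model may differ.

```python
# Minimal sketch of a fuel-consumption estimator for construction trucks from
# carried load, slope, distance and pavement type, assuming a datalogger export
# with these hypothetical column names.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

trips = pd.read_csv("datalogger_trips.csv")            # hypothetical file
X = trips[["load_t", "slope_pct", "distance_km", "pavement_type"]]
y = trips["fuel_l"]

pipe = Pipeline([
    ("prep", ColumnTransformer(
        [("pavement", OneHotEncoder(handle_unknown="ignore"), ["pavement_type"])],
        remainder="passthrough",
    )),
    ("model", RandomForestRegressor(n_estimators=300, random_state=0)),
])
print("Cross-validated R^2:", cross_val_score(pipe, X, y, cv=5).mean())
```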


2020
Author(s): Leonoor E.M. Tideman, Lukasz G. Migas, Katerina V. Djambazova, Nathan Heath Patterson, Richard M. Caprioli, et al.

Abstract
The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species’ biomarker potential, our workflow delivers spatially localized explanations of the classification model’s decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative. Our automated approach to estimating a molecular species’ potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and yields a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.

Highlights
Our workflow automates the discovery of biomarker candidates in imaging mass spectrometry data by using state-of-the-art machine learning methodology to produce a shortlist of molecular species that are differentially expressed with regard to a user-provided biological class.
A model interpretability method called Shapley additive explanations (SHAP), with observational Shapley values, enables us to quantify the local and global predictive importance of molecular species with respect to recognizing a user-provided biological class.
By providing spatially localized explanations for a classification model’s decision-making process, SHAP maps deliver insight into the spatial specificity of biomarker candidates and enable one to determine whether (and where) the relationship between a biomarker candidate and the class of interest is correlative or anticorrelative.
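The sketch below illustrates the SHAP-based ranking and mapping idea on synthetic stand-in data: a pixel classifier is trained on an intensity matrix, per-pixel SHAP values are aggregated into a global feature ranking, and one feature's values are reshaped onto the tissue grid. The data, the XGBoost classifier, and the 20 x 25 pixel layout are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of SHAP-based feature ranking for IMS pixels, using a
# synthetic intensity matrix X (pixels x m/z features) and synthetic labels y.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((500, 200))          # 500 pixels, 200 m/z features (synthetic stand-in)
y = (X[:, 10] > 0.5).astype(int)    # synthetic binary class labels for illustration

model = xgb.XGBClassifier(n_estimators=200).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # one value per pixel and per m/z feature

# Global ranking: mean absolute SHAP value per m/z feature.
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("Top candidate feature indices:", ranking[:10])

# A "SHAP map" reshapes one feature's per-pixel SHAP values back onto the
# tissue grid (here a hypothetical 20 x 25 pixel layout).
shap_map = shap_values[:, ranking[0]].reshape(20, 25)
```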


Author(s): Emir Demirovic, Peter J. Stuckey, James Bailey, Jeffrey Chan, Christopher Leckie, et al.

We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. Our contributions are two-fold: 1) we provide theoretical insight into the properties and computational complexity of predict+optimise problems in general, and 2) develop a novel framework that, in contrast to related work, guarantees to compute the optimal parameters for a linear learning function given any ranking optimisation problem. We illustrate the applicability of our framework for the particular case of the unit-weighted knapsack predict+optimise problem and evaluate on benchmarks from the literature.
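For orientation, the sketch below shows the predict+optimise setting for the unit-weighted knapsack: item values are estimated from features, and the decision is simply to select the C highest predicted values. This is only a two-stage baseline on synthetic data to illustrate the problem and its regret; it is not the paper's framework, which directly computes optimal parameters of the linear learning function for the ranking optimisation problem.

```python
# Minimal sketch of predict+optimise for a unit-weighted knapsack:
# estimate item values, pick the C highest, and measure regret against
# the best selection under the true values. Data and C are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
features = rng.random((50, 4))                  # 50 items, 4 descriptive features
true_values = features @ np.array([3.0, 1.0, 0.5, 2.0]) + rng.normal(0, 0.1, 50)
C = 10                                          # knapsack capacity (all weights equal 1)

model = LinearRegression().fit(features, true_values)   # the prediction step
predicted = model.predict(features)

chosen = np.argsort(predicted)[-C:]             # the optimisation step: top-C items
best = np.argsort(true_values)[-C:]
regret = true_values[best].sum() - true_values[chosen].sum()
print("Regret of the predicted solution:", regret)
```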


2021, Vol 2113 (1), pp. 012074
Author(s): Qiwei Ke

Abstract
The volume of data has grown explosively since the arrival of the new information era. How to protect information privacy and detect threats whenever an intrusion happens has become a hot topic. In this essay, we look into the latest machine learning techniques (including deep learning) that are applicable to intrusion detection, malware detection, and vulnerability detection, and the comparison between traditional and novel methods is demonstrated in detail. In particular, we examine the whole experimental process of representative examples from recent research projects to give better insight into how the models function and cooperate. In addition, some potential problems and improvements are illustrated at the end of each section.
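As a minimal example of the kind of ML-based intrusion detector surveyed in such work, the sketch below trains a classifier on flow-level features. The file, its columns, and the random-forest choice are hypothetical and only indicative of the general approach.

```python
# Minimal sketch of an ML-based intrusion detector, assuming a flow-level
# dataset with numeric features and a "label" column marking benign vs.
# attack traffic (file and columns are hypothetical).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

flows = pd.read_csv("network_flows.csv")               # hypothetical file
X, y = flows.drop(columns=["label"]), flows["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```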

