Gaining New Insight into Machine-Learning Datasets via Multiple Binary-Feature Frequency Ranks with a Mobile Benign/Malware Apps Example

2021, Vol 8 (2), pp. 103-121
Author(s): Gurol Canbek
Energies, 2021, Vol 14 (4), pp. 930
Author(s): Fahimeh Hadavimoghaddam, Mehdi Ostadhassan, Ehsan Heidaryan, Mohammad Ali Sadri, Inna Chapanova, et al.

Dead oil viscosity is a critical parameter for solving numerous reservoir engineering problems and one of the most unreliable properties to predict with classical black oil correlations. Determining dead oil viscosity experimentally is expensive and time-consuming, so an accurate and fast prediction model is needed. This paper implements six machine learning models to predict dead oil viscosity: random forest (RF), LightGBM, XGBoost, a multilayer perceptron (MLP) neural network, stochastic real-valued (SRV), and SuperLearner. More than 2000 pressure–volume–temperature (PVT) data points were used for developing and testing these models, covering a wide range of viscosities from light and intermediate to heavy oils. In this study, we give insight into the performance of different functional forms that have been used in the literature to formulate dead oil viscosity. The results show that the functional form f(γAPI, T) has the best performance, and additional correlating parameters might be unnecessary. Furthermore, SuperLearner outperformed the other machine learning (ML) algorithms as well as common correlations, according to the metric analysis. The SuperLearner model can potentially replace empirical models for viscosity prediction across a wide range of viscosities (any oil type). Ultimately, the proposed model is capable of reproducing the true physical trend of dead oil viscosity with variations in oil API gravity, temperature, and shear rate.
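As a rough illustration of the stacking idea behind SuperLearner, the sketch below combines a small set of base regressors under a cross-validated meta-learner to predict dead oil viscosity from API gravity and temperature. It is a minimal sketch, not the authors' implementation: the file name, column names, and the particular base learners shown here are assumptions.

```python
# Minimal sketch of a SuperLearner-style stacking ensemble for dead oil
# viscosity prediction, assuming a PVT table with hypothetical columns
# "api_gravity", "temperature" and "dead_oil_viscosity".
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

pvt = pd.read_csv("pvt_data.csv")                      # hypothetical file
X = pvt[["api_gravity", "temperature"]]                # functional form f(gamma_API, T)
y = pvt["dead_oil_viscosity"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base learners are blended by a cross-validated meta-learner (stacking),
# which is the core idea behind the SuperLearner approach.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=300, random_state=42)),
        ("mlp", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=42)),
    ],
    final_estimator=RidgeCV(),
    cv=5,
)
stack.fit(X_train, y_train)
print("R^2 on held-out PVT samples:", stack.score(X_test, y_test))
```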


The Analyst, 2021
Author(s): Barnaby Ellis, Conor A Whitley, Safaa Al Jedani, Caroline Smith, Philip Gunning, et al.

A novel machine learning algorithm is shown to accurately discriminate between oral squamous cell carcinoma (OSCC) nodal metastases and surrounding lymphoid tissue on the basis of a single metric, the...
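Since the classification reportedly rests on a single metric, a minimal single-feature classifier conveys the setting. The abstract is truncated and does not name the metric, so the file, column names, and choice of logistic regression below are all assumptions for illustration.

```python
# Minimal sketch of discriminating two tissue classes from one spectral metric,
# assuming a table with a numeric column "metric" and a binary label column
# "is_metastasis" (both hypothetical).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("tissue_metrics.csv")                 # hypothetical file
X, y = df[["metric"]], df["is_metastasis"]

clf = LogisticRegression()
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```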


AI Magazine, 2012, Vol 33 (2), pp. 55
Author(s): Nisarg Vyas, Jonathan Farringdon, David Andre, John Ivo Stivoric

In this article we provide insight into the BodyMedia FIT armband system — a wearable multi-sensor technology that continuously monitors physiological events related to energy expenditure for weight management using machine learning and data modeling methods. Since becoming commercially available in 2001, more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system’s success.
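To make the multi-sensor fusion idea concrete, the sketch below regresses energy expenditure on features from several armband sensors. This is a minimal sketch under assumptions: the file, the column names, the reference target, and the gradient-boosting model are illustrative, not the BodyMedia system's actual pipeline.

```python
# Minimal sketch of multi-sensor data fusion for energy-expenditure estimation,
# assuming per-minute sensor features (accelerometer, heat flux, skin
# temperature, galvanic skin response) and a calorimetry reference column.
# File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

data = pd.read_csv("armband_minutes.csv")              # hypothetical file
features = ["accel_mad", "heat_flux", "skin_temp", "gsr"]
X, y = data[features], data["energy_expenditure_kcal"]

model = GradientBoostingRegressor(random_state=0)
print("Cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
```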


2020, Vol 12 (11), pp. 4753
Author(s): Viju Raghupathi, Jie Ren, Wullianallur Raghupathi

Corporations have embraced the idea of corporate environmental, social, and governance (ESG) under the general framework of sustainability. Studies have measured and analyzed the impact of internal sustainability efforts on the performance of individual companies, policies, and projects. This exploratory study attempts to extract useful insight from shareholder sustainability resolutions using machine learning-based text analytics. Prior research has studied corporate sustainability disclosures from public reports. By studying shareholder resolutions, we gain insight into the shareholders’ perspectives and objectives. The primary source for this study is the Ceres sustainability shareholder resolution database, with 1737 records spanning 2009–2019. The study utilizes a combination of text analytic approaches (e.g., word clouds, co-occurrence, row similarities, clustering, and classification) to extract insights. These are novel methods of transforming textual data into useful knowledge about corporate sustainability endeavors. This study demonstrates that stakeholders, such as shareholders, can influence corporate sustainability via resolutions. The incorporation of text analytic techniques offers insight to researchers who study vast collections of unstructured bodies of text, improving the understanding of shareholder resolutions and reaching a wider audience.
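A minimal sketch of the kind of text-analytic pipeline described above is shown below: resolutions are vectorized with TF-IDF and grouped by k-means, then each cluster is summarized by its top terms. The file name, the "resolution_text" column, and the cluster count are hypothetical; the study's own tool chain may differ.

```python
# Minimal sketch: TF-IDF representation of shareholder resolutions followed by
# clustering, assuming a CSV with a "resolution_text" column (hypothetical).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

resolutions = pd.read_csv("ceres_resolutions.csv")     # hypothetical file
tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X = tfidf.fit_transform(resolutions["resolution_text"])

kmeans = KMeans(n_clusters=8, random_state=0, n_init=10).fit(X)
resolutions["cluster"] = kmeans.labels_

# Inspect the top terms per cluster to interpret the sustainability themes.
terms = tfidf.get_feature_names_out()
for k, centroid in enumerate(kmeans.cluster_centers_):
    top = [terms[i] for i in centroid.argsort()[-5:][::-1]]
    print(f"cluster {k}:", ", ".join(top))
```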


2021, Vol 6 (11), pp. 157
Author(s): Gonçalo Pereira, Manuel Parente, João Moutinho, Manuel Sampaio

Decision support and optimization tools used in construction often require an accurate estimation of cost variables to maximize their benefit. Heavy machinery is traditionally one of the greatest costs to consider, mainly due to fuel consumption. The fuel consumption of these typically diesel-powered machines varies greatly with the utilization scenario. This paper describes the creation of a framework for estimating the fuel consumption of construction trucks as a function of the carried load, the slope, the distance, and the pavement type. A more accurate estimation will increase the benefit of these optimization tools. The fuel consumption estimation model was developed using Machine Learning (ML) algorithms supported by data gathered through several sensors in a specially designed datalogger with wireless communication and opportunistic synchronization, in a real-context experiment. The results demonstrated the viability of the method, providing important insight into the advantages of combining sensorization with machine learning models in a real-world construction setting. Ultimately, this study constitutes a significant step towards IoT implementation from a Construction 4.0 viewpoint, especially considering its potential for real-time and digital twin applications.
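As a rough illustration of such an estimator, the sketch below one-hot encodes the categorical pavement type and fits a tree ensemble on the four predictors named in the abstract. The file, column names, and the random-forest choice are assumptions; the paper's own model may differ.

```python
# Minimal sketch of a fuel-consumption estimator for construction trucks from
# carried load, slope, distance and pavement type, assuming a datalogger export
# with these hypothetical column names.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

trips = pd.read_csv("datalogger_trips.csv")            # hypothetical file
X = trips[["load_t", "slope_pct", "distance_km", "pavement_type"]]
y = trips["fuel_l"]

pipe = Pipeline([
    ("prep", ColumnTransformer(
        [("pavement", OneHotEncoder(handle_unknown="ignore"), ["pavement_type"])],
        remainder="passthrough",
    )),
    ("model", RandomForestRegressor(n_estimators=300, random_state=0)),
])
print("Cross-validated R^2:", cross_val_score(pipe, X, y, cv=5).mean())
```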


2020
Author(s): Leonoor E.M. Tideman, Lukasz G. Migas, Katerina V. Djambazova, Nathan Heath Patterson, Richard M. Caprioli, et al.

Abstract
The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species’ biomarker potential, our workflow delivers spatially localized explanations of the classification model’s decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative. Our automated approach to estimating a molecular species’ potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and yields a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.

Highlights
Our workflow automates the discovery of biomarker candidates in imaging mass spectrometry data by using state-of-the-art machine learning methodology to produce a shortlist of molecular species that are differentially expressed with regard to a user-provided biological class.
A model interpretability method called Shapley additive explanations (SHAP), with observational Shapley values, enables us to quantify the local and global predictive importance of molecular species with respect to recognizing a user-provided biological class.
By providing spatially localized explanations for a classification model’s decision-making process, SHAP maps deliver insight into the spatial specificity of biomarker candidates and enable one to determine whether (and where) the relationship between a biomarker candidate and the class of interest is correlative or anticorrelative.
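The sketch below illustrates the SHAP-based ranking and mapping idea on synthetic stand-in data: a pixel classifier is trained on an intensity matrix, per-pixel SHAP values are aggregated into a global feature ranking, and one feature's values are reshaped onto the tissue grid. The data, the XGBoost classifier, and the 20 x 25 pixel layout are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of SHAP-based feature ranking for IMS pixels, using a
# synthetic intensity matrix X (pixels x m/z features) and synthetic labels y.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((500, 200))          # 500 pixels, 200 m/z features (synthetic stand-in)
y = (X[:, 10] > 0.5).astype(int)    # synthetic binary class labels for illustration

model = xgb.XGBClassifier(n_estimators=200).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # one value per pixel and per m/z feature

# Global ranking: mean absolute SHAP value per m/z feature.
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("Top candidate feature indices:", ranking[:10])

# A "SHAP map" reshapes one feature's per-pixel SHAP values back onto the
# tissue grid (here a hypothetical 20 x 25 pixel layout).
shap_map = shap_values[:, ranking[0]].reshape(20, 25)
```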


Author(s): Emir Demirovic, Peter J. Stuckey, James Bailey, Jeffrey Chan, Christopher Leckie, et al.

We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. Our contributions are two-fold: 1) we provide theoretical insight into the properties and computational complexity of predict+optimise problems in general, and 2) develop a novel framework that, in contrast to related work, guarantees to compute the optimal parameters for a linear learning function given any ranking optimisation problem. We illustrate the applicability of our framework for the particular case of the unit-weighted knapsack predict+optimise problem and evaluate on benchmarks from the literature.
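For orientation, the sketch below shows the predict+optimise setting for the unit-weighted knapsack: item values are estimated from features, and the decision is simply to select the C highest predicted values. This is only a two-stage baseline on synthetic data to illustrate the problem and its regret; it is not the paper's framework, which directly computes optimal parameters of the linear learning function for the ranking optimisation problem.

```python
# Minimal sketch of predict+optimise for a unit-weighted knapsack:
# estimate item values, pick the C highest, and measure regret against
# the best selection under the true values. Data and C are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
features = rng.random((50, 4))                  # 50 items, 4 descriptive features
true_values = features @ np.array([3.0, 1.0, 0.5, 2.0]) + rng.normal(0, 0.1, 50)
C = 10                                          # knapsack capacity (all weights equal 1)

model = LinearRegression().fit(features, true_values)   # the prediction step
predicted = model.predict(features)

chosen = np.argsort(predicted)[-C:]             # the optimisation step: top-C items
best = np.argsort(true_values)[-C:]
regret = true_values[best].sum() - true_values[chosen].sum()
print("Regret of the predicted solution:", regret)
```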


2021, Vol 2113 (1), pp. 012074
Author(s): Qiwei Ke

Abstract
The volume of data has grown explosively since the arrival of the new information era. How to protect information privacy and detect threats whenever an intrusion happens has become a hot topic. In this essay, we look into the latest machine learning techniques (including deep learning) that are applicable to intrusion detection, malware detection, and vulnerability detection, and the comparison between traditional and novel methods is demonstrated in detail. In particular, we examine the whole experimental process of representative examples from recent research projects to give better insight into how the models function and cooperate. In addition, some potential problems and improvements are illustrated at the end of each section.
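As a minimal example of the kind of ML-based intrusion detector surveyed in such work, the sketch below trains a classifier on flow-level features. The file, its columns, and the random-forest choice are hypothetical and only indicative of the general approach.

```python
# Minimal sketch of an ML-based intrusion detector, assuming a flow-level
# dataset with numeric features and a "label" column marking benign vs.
# attack traffic (file and columns are hypothetical).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

flows = pd.read_csv("network_flows.csv")               # hypothetical file
X, y = flows.drop(columns=["label"]), flows["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```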

