Defining the extent of gene function using ROC curvature

Mapping Intimacies ◽

10.1101/2021.09.03.458825 ◽

2021 ◽

Author(s):

Stephan Fischer ◽

Jesse Gillis

Keyword(s):

Functional Characterization ◽

Roc Curves ◽

Equivalence Classes ◽

Data Driven ◽

Functional Equivalence ◽

Functional Annotations ◽

High Throughput Data ◽

Wide Range ◽

Performance Curves ◽

Straight Lines

Machine learning in genomics plays a key role in leveraging high-throughput data, but assessing the generalizability of performance has been a persistent challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves. We identify Functional Equivalence Classes (FECs), uniform subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves. FECs are widespread across modalities and methods, and can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10 to 50 genes), and tissue-specific secondary markers (100 to 500 genes). In addition, FECs are compatible with a wide range of functional encodings, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in analysis.
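
The idea of looking for straight lines in a ROC curve can be sketched in a few lines. This is a minimal illustration, not the authors' statistical procedure: the function names, the toy ranking, and the use of maximum vertical distance from a chord as a straightness measure are all assumptions made for this example.

```python
def roc_points(labels):
    """ROC points for genes already sorted from highest to lowest score.

    labels[i] is 1 for an annotated (positive) gene and 0 otherwise.
    Returns a list of (fpr, tpr) pairs starting at (0, 0).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in labels:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def max_chord_deviation(points, start, end):
    """Largest vertical distance between the curve and the chord joining
    points[start] and points[end]. Small values suggest a straight segment."""
    (x0, y0), (x1, y1) = points[start], points[end]
    if x1 == x0:
        return 0.0  # purely vertical run: trivially straight
    slope = (y1 - y0) / (x1 - x0)
    return max(abs(y - (y0 + slope * (x - x0)))
               for x, y in points[start:end + 1])


# Hypothetical ranking: 4 top-ranked positives, then a uniformly mixed
# block of positives and negatives, then 4 negatives.
toy_labels = [1] * 4 + [1, 0] * 4 + [0] * 4
toy_curve = roc_points(toy_labels)
mixed_block = max_chord_deviation(toy_curve, 4, 12)
whole_curve = max_chord_deviation(toy_curve, 0, len(toy_labels))
```

In this toy ranking the mixed block stays much closer to its chord than the curve as a whole does, which is the visual signature the abstract describes for a uniform subset of genes.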

Predicting specific impulse distributions for spherical explosives in the extreme near-field using a Gaussian function

International Journal of Protective Structures ◽

10.1177/2041419621993492 ◽

2021 ◽

pp. 204141962199349

Author(s):

Jordan J Pannell ◽

George Panoutsos ◽

Sam B Cooke ◽

Dan J Pope ◽

Sam E Rigby

Keyword(s):

Scaling Laws ◽

Near Field ◽

Specific Impulse ◽

Gaussian Function ◽

Data Driven ◽

Blast Load ◽

Structural Deformation ◽

Load Prediction ◽

Validation Data ◽

Wide Range

Accurate quantification of the blast load arising from detonation of a high explosive has applications in transport security, infrastructure assessment and defence. In order to design efficient and safe protective systems in such aggressive environments, it is of critical importance to understand the magnitude and distribution of loading on a structural component located close to an explosive charge. In particular, peak specific impulse is the primary parameter that governs structural deformation under short-duration loading. Within this so-called extreme near-field region, existing semi-empirical methods are known to be inaccurate, and high-fidelity numerical schemes are generally hampered by a lack of available experimental validation data. As such, the blast protection community is not currently equipped with a satisfactory fast-running tool for load prediction in the near-field. In this article, a validated computational model is used to develop a suite of numerical near-field blast load distributions, which are shown to follow a similar normalised shape. This forms the basis of the data-driven predictive model developed herein: a Gaussian function is fit to the normalised loading distributions, and a power law is used to calculate the magnitude of the curve according to established scaling laws. The predictive method is rigorously assessed against the existing numerical dataset, and is validated against new test models and available experimental data. High levels of agreement are demonstrated throughout, with typical variations of <5% between experiment/model and prediction. The new approach presented in this article allows the analyst to rapidly compute the distribution of specific impulse across the loaded face of a wide range of target sizes and near-field scaled distances and provides a benchmark for data-driven modelling approaches to capture blast loading phenomena in more complex scenarios.
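
The two-part structure described above, a Gaussian shape scaled by a power-law magnitude, can be sketched as follows. This is an illustration under stated assumptions, not the published model: the variable names (`x` for a normalised position on the target, `z` for scaled distance), the coefficient values, and the synthetic data are all hypothetical.

```python
import math


def fit_gaussian_width(xs, ys):
    """Least-squares width sigma of y = exp(-x^2 / (2 sigma^2)).

    Fits ln(y) against x^2 with the line forced through the origin,
    so that the peak of the normalised curve stays at 1.
    """
    num = sum((x * x) * math.log(y) for x, y in zip(xs, ys))
    den = sum((x * x) ** 2 for x in xs)
    slope = num / den  # slope equals -1 / (2 sigma^2)
    return math.sqrt(-1.0 / (2.0 * slope))


def fit_power_law(zs, peaks):
    """Least-squares fit of peak = a * z**b as a straight line in log-log space."""
    lx = [math.log(z) for z in zs]
    ly = [math.log(p) for p in peaks]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


def predict_impulse(x, z, sigma, a, b):
    """Power-law peak magnitude multiplied by the normalised Gaussian shape."""
    return a * z ** b * math.exp(-x * x / (2.0 * sigma * sigma))


# Synthetic data generated from assumed parameters (not measured values).
TRUE_SIGMA, TRUE_A, TRUE_B = 0.4, 2.0, -1.5
xs = [0.0, 0.2, 0.4, 0.6, 0.8]
ys = [math.exp(-x * x / (2.0 * TRUE_SIGMA ** 2)) for x in xs]
zs = [0.1, 0.2, 0.4]
peaks = [TRUE_A * z ** TRUE_B for z in zs]

sigma_hat = fit_gaussian_width(xs, ys)
a_hat, b_hat = fit_power_law(zs, peaks)
```

Fitting in log space keeps both steps linear, which is convenient for a sketch. A fit on real data might instead use nonlinear least squares so that small, noisy values in the tails do not dominate.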

Combustion Tuning for a Gas Turbine Power Plant Using Data-Driven and Machine Learning Approach

Journal of Engineering for Gas Turbines and Power ◽

10.1115/1.4050020 ◽

2021 ◽

Vol 143 (3) ◽

Author(s):

Suhui Li ◽

Huaxin Zhu ◽

Min Zhu ◽

Gang Zhao ◽

Xiaofeng Wei

Keyword(s):

Machine Learning ◽

Power Plant ◽

Gas Turbine ◽

Operating Conditions ◽

Data Driven ◽

Ann Model ◽

Promising Alternative ◽

Combustion Performance ◽

Wide Range ◽

Gas Turbine Power Plant

Conventional physics-based or experiment-based approaches to gas turbine combustion tuning are time-consuming and costly. Recent advances in data analytics provide an alternative. In this paper, we present a cross-disciplinary study on the combustion tuning of an F-class gas turbine that combines machine learning with physical understanding. An artificial neural network (ANN) model is developed to predict the combustion performance (outputs), including NOx emissions, combustion dynamics, combustor vibrational acceleration, and turbine exhaust temperature. The inputs of the ANN model are identified by analyzing the key operating variables that affect combustion performance, such as the pilot and premixed fuel flows and the inlet guide vane angle. The ANN model is trained on field data from an F-class gas turbine power plant. The trained model describes the combustion performance with acceptable accuracy over a wide range of operating conditions. In combination with a genetic algorithm, the model is applied to optimize the combustion performance of the gas turbine. Results demonstrate that the data-driven method offers a promising alternative for combustion tuning, with low cost and fast turnaround.

Rule following in functional equivalence classes

European Journal of Behavior Analysis ◽

10.1080/15021149.2002.11434201 ◽

2002 ◽

Vol 3 (1) ◽

pp. 21-29 ◽

Cited By ~ 7

Author(s):

Sean McGuigan ◽

Mickey Keenan

Keyword(s):

Equivalence Classes ◽

Functional Equivalence ◽

Rule Following

A Recommender for Choosing Data Systems based on Application Profiling and Benchmarking

10.5753/sbbd.2021.17883 ◽

2021 ◽

Author(s):

Elton Figueiredo de Souza Soares ◽

Renan Souza ◽

Raphael Melo Thiago ◽

Marcelo de Oliveira Costa Machado ◽

Leonardo Guerreiro Azevedo

Keyword(s):

Empirical Evidence ◽

Informed Decision ◽

Data System ◽

Data Driven ◽

Data Systems ◽

Ongoing Work ◽

Wide Range

In our data-driven society, there are hundreds of possible data systems on the market with a wide range of configuration parameters, making it very hard for enterprises and users to choose the most suitable one. There is a lack of representative empirical evidence to help users make an informed decision. Using benchmark results is a widely adopted practice, but just as there are many data systems, there are many benchmarks. This ongoing work presents the architecture and methods of a system that supports the recommendation of the most suitable data system for an application. We also illustrate how the recommendation would work in a fictitious scenario.

Building clone-consistent ecosystem models

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008635 ◽

2021 ◽

Vol 17 (2) ◽

pp. e1008635

Author(s):

Gerrit Ansmann ◽

Tobias Bollenbach

Keyword(s):

Functional Analysis ◽

Building Blocks ◽

Data Driven ◽

Ecological Studies ◽

Ecosystem Models ◽

Linear Combinations ◽

Model Consistency ◽

Wide Range ◽

Critical Requirement ◽

Two Populations

Many ecological studies employ general models that can feature an arbitrary number of populations. A critical requirement imposed on such models is clone consistency: If the individuals from two populations are indistinguishable, joining these populations into one shall not affect the outcome of the model. Otherwise a model produces different outcomes for the same scenario. Using functional analysis, we comprehensively characterize all clone-consistent models: We prove that they are necessarily composed from basic building blocks, namely linear combinations of parameters and abundances. These strong constraints enable a straightforward validation of model consistency. Although clone consistency can always be achieved with sufficient assumptions, we argue that it is important to explicitly name and consider the assumptions made: They may not be justified or limit the applicability of models and the generality of the results obtained with them. Moreover, our insights facilitate building new clone-consistent models, which we illustrate for a data-driven model of microbial communities. Finally, our insights point to new relevant forms of general models for theoretical ecology. Our framework thus provides a systematic way of comprehending ecological models, which can guide a wide range of studies.
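
Clone consistency can be checked numerically for a concrete model. The sketch below uses a generalised Lotka-Volterra model as an assumed example of a model built from linear combinations of parameters and abundances. The function names and all parameter values are hypothetical and are not taken from the paper.

```python
def glv_rates(x, r, A):
    """Growth rates dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def split_first_population(x, r, A, frac):
    """Split population 0 into two indistinguishable clones.

    The clones share the original growth rate and interaction
    coefficients. Their abundances are frac and (1 - frac) of the original.
    """
    x_split = [frac * x[0], (1.0 - frac) * x[0]] + x[1:]
    r_split = [r[0]] + r
    old_index = [0] + list(range(len(x)))  # new index -> original index
    A_split = [[A[i][j] for j in old_index] for i in old_index]
    return x_split, r_split, A_split


# Hypothetical two-population system.
x = [2.0, 1.0]
r = [1.0, 0.5]
A = [[-0.5, -0.2],
     [0.3, -1.0]]

original = glv_rates(x, r, A)
split = glv_rates(*split_first_population(x, r, A, frac=0.25))

# Joining the two clones again should reproduce the original rates.
merged = [split[0] + split[1]] + split[2:]
```

Here `merged` agrees with `original` up to rounding, which is the consistency property stated above: treating one population as two identical ones does not change the outcome.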

Multidimensional Approximation of Nonlinear Dynamical Systems

Journal of Computational and Nonlinear Dynamics ◽

10.1115/1.4043148 ◽

2019 ◽

Vol 14 (6) ◽

Cited By ~ 5

Author(s):

Patrick Gelß ◽

Stefan Klus ◽

Jens Eisert ◽

Christof Schütte

Keyword(s):

Dynamical Systems ◽

Nonlinear Dynamical Systems ◽

Measurement Data ◽

Data Driven ◽

High Dimensional ◽

Data Sets ◽

Nonlinear Dynamical ◽

Tensor Network ◽

Wide Range ◽

Multidimensional Approximation

A key task in the field of modeling and analyzing nonlinear dynamical systems is the recovery of unknown governing equations from measurement data only. There is a wide range of application areas for this important instance of system identification, ranging from industrial engineering and acoustic signal processing to stock market models. In order to find appropriate representations of underlying dynamical systems, various data-driven methods have been proposed by different communities. However, if the given data sets are high-dimensional, then these methods typically suffer from the curse of dimensionality. To significantly reduce the computational costs and storage consumption, we propose multidimensional approximation of nonlinear dynamical systems (MANDy), a method that combines data-driven methods with tensor network decompositions. The efficiency of the introduced approach is illustrated with the aid of several high-dimensional nonlinear dynamical systems.
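
The underlying task, recovering governing equations from measurements, can be illustrated in one dimension with ordinary least squares on a small dictionary of basis functions. This is deliberately not MANDy: it uses no tensor network decomposition, and the assumed model form `dx/dt = a*x + b*x**3`, the function name, and the synthetic data are all hypothetical.

```python
def recover_coefficients(xs, dxs):
    """Least-squares coefficients (a, b) for dx/dt = a*x + b*x**3.

    Solves the 2x2 normal equations for the basis functions x and x^3
    with Cramer's rule.
    """
    s11 = sum(x ** 2 for x in xs)
    s12 = sum(x ** 4 for x in xs)
    s22 = sum(x ** 6 for x in xs)
    t1 = sum(x * d for x, d in zip(xs, dxs))
    t2 = sum(x ** 3 * d for x, d in zip(xs, dxs))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Synthetic measurements generated from assumed coefficients.
TRUE_A, TRUE_B = 1.0, -0.5
states = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
derivatives = [TRUE_A * x + TRUE_B * x ** 3 for x in states]

a_hat, b_hat = recover_coefficients(states, derivatives)
```

With many state variables the number of candidate basis functions grows combinatorially, which is the curse of dimensionality that the abstract says tensor network decompositions are meant to address.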

Plugging Small RNAs into the Network

mSystems ◽

10.1128/msystems.00422-20 ◽

2020 ◽

Vol 5 (3) ◽

Author(s):

Lars Barquist

Keyword(s):

Small Rnas ◽

Posttranscriptional Regulation ◽

Network Inference ◽

Functional Characterization ◽

The State ◽

Diverse Range ◽

Regulatory Interactions ◽

Link Type ◽

Wide Range

Small RNAs (sRNAs) have been discovered in every bacterium examined and have been shown to play important roles in the regulation of a diverse range of behaviors, from metabolism to infection. However, despite a wide range of available techniques for discovering and validating sRNA regulatory interactions, only a minority of these molecules have been well characterized. In part, this is due to the nature of posttranscriptional regulation: the activity of an sRNA depends on the state of the transcriptome as a whole, so characterization is best carried out under the conditions in which it is naturally active. In this issue of mSystems, Arrieta-Ortiz and colleagues (M. L. Arrieta-Ortiz, C. Hafemeister, B. Shuster, N. S. Baliga, et al., mSystems 5:e00057-20, 2020, https://doi.org/10.1128/mSystems.00057-20) present a network inference approach based on estimating sRNA activity across transcriptomic compendia. This shows promise not only for identifying new sRNA regulatory interactions but also for pinpointing the conditions in which these interactions occur, providing a new avenue toward functional characterization of sRNAs.

Analysis of Measurements in Vaned Diffusers of Centrifugal Compressors

Journal of Turbomachinery ◽

10.1115/1.3262156 ◽

1988 ◽

Vol 110 (1) ◽

pp. 115-121 ◽

Cited By ~ 1

Author(s):

W. Stein ◽

M. Rautenberg

Keyword(s):

Flow Field ◽

Flow Structure ◽

Calculation Method ◽

Geometric Parameters ◽

Variable Geometry ◽

Centrifugal Compressors ◽

Wide Range ◽

Performance Curves ◽

Different Types ◽

Flow Phenomena

In vaned diffusers of centrifugal compressors, many different flow phenomena interfere with one another, and different geometric parameters influence the flow field. Variations of these parameters allow the designer to optimize the diffuser for a certain application or to use a variable geometry for controlling the stage over a wide range. Two vaned diffusers that differ only in their passage widths are investigated using different types of measuring techniques, in order to analyze the flow structure and to use it to verify a calculation method that allows detailed predictions of flow field parameters inside the diffuser by taking geometric variations into account. Using this method, predictions of the flow field of a variable geometry diffuser are made and compared with the measured performance curves of the stage.

Friction Torque and Leakage Based Data-Driven Approach for Rotary Seal Prognostics in Manufacturing Industry

Volume 1: Additive Manufacturing; Manufacturing Equipment and Systems; Bio and Sustainable Manufacturing ◽

10.1115/msec2019-2819 ◽

2019 ◽

Author(s):

Madhumitha Ramachandran ◽

Zahed Siddique

Keyword(s):

Manufacturing Industry ◽

Friction Torque ◽

Operating Conditions ◽

Oil Field ◽

Data Driven ◽

Sensor Technology ◽

Physical Parameters ◽

Direct Estimation ◽

Rotary Seal ◽

Wide Range

Rotary seals are found in many kinds of manufacturing equipment and machinery, used for various applications under a wide range of operating conditions. Rotary seal failure can be catastrophic and can lead to costly downtime and large expenses, so it is extremely important to assess the degradation of rotary seals to avoid fatal breakdown of machinery. Physics-based rotary seal prognostics require direct estimation of different physical parameters to assess the degradation of seals. Unlike the physics-based approach, data-driven prognostics utilizing sensor technology and computational capabilities can aid in the indirect estimation of a rotary seal's running condition. An important aspect of data-driven prognostics is collecting appropriate data, in order to reduce the cost and time associated with data collection, storage and computation. Seals in machinery operate in harsh conditions. In the oil field especially, seals are exposed to harsh environments and aggressive fluids that gradually reduce their elastic modulus and hardness, resulting in lower friction torque and excessive leakage. Therefore, in this study we implement a data-driven prognostics approach that uses friction torque and leakage signals, with a Multilayer Perceptron as the classifier, to compare the performance of the two metrics in classifying the running condition of rotary seals. Friction torque was found to perform better than leakage in differentiating the running condition of rotary seals throughout their service life. Although this approach was designed for seals in the oil and gas industry, it can be implemented in any manufacturing industry with similar applications.
