Defining the extent of gene function using ROC curvature

2021 ◽  
Author(s):  
Stephan Fischer ◽  
Jesse Gillis

Machine learning in genomics plays a key role in leveraging high-throughput data, but assessing the generalizability of performance has been a persistent challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves. We identify Functional Equivalence Classes (FECs), uniform subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves. FECs are widespread across modalities and methods, and can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10 to 50 genes), and tissue-specific secondary markers (100 to 500 genes). In addition, FECs are compatible with a wide range of functional encodings, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in analysis.
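The link between straight ROC segments and uniform gene subsets can be illustrated with a toy ranking. The sketch below is not the authors' statistical procedure. It only builds a ROC curve from ranked binary labels and measures how far a stretch of the curve departs from the chord joining its endpoints. The function names and the toy ranking are assumptions made for illustration.

```python
def roc_points(ranked_labels):
    """Return ROC points (FPR, TPR) for labels sorted by decreasing score."""
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for label in ranked_labels:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def max_chord_deviation(points, start, end):
    """Largest vertical gap between points[start..end] and the chord joining the two ends."""
    (x0, y0), (x1, y1) = points[start], points[end]
    if x1 == x0:
        return 0.0  # purely vertical stretch: every point lies on the chord
    slope = (y1 - y0) / (x1 - x0)
    return max(abs(y - (y0 + slope * (x - x0))) for x, y in points[start:end + 1])


# Hypothetical ranking: 4 clear positives, then 8 genes in which positives and
# negatives are evenly interleaved, then 8 clear negatives.
ranked = [1] * 4 + [1, 0] * 4 + [0] * 8
points = roc_points(ranked)

segment_dev = max_chord_deviation(points, 4, 12)               # interleaved block only
overall_dev = max_chord_deviation(points, 0, len(points) - 1)  # whole curve vs diagonal
print(segment_dev, overall_dev)
```

In this toy case the interleaved block stays close to its chord, while the curve as a whole bends far from the diagonal. That is the visual signature the abstract describes, here reduced to a single number per segment.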

2021 ◽  
pp. 204141962199349
Author(s):  
Jordan J Pannell ◽  
George Panoutsos ◽  
Sam B Cooke ◽  
Dan J Pope ◽  
Sam E Rigby

Accurate quantification of the blast load arising from detonation of a high explosive has applications in transport security, infrastructure assessment and defence. In order to design efficient and safe protective systems in such aggressive environments, it is of critical importance to understand the magnitude and distribution of loading on a structural component located close to an explosive charge. In particular, peak specific impulse is the primary parameter that governs structural deformation under short-duration loading. Within this so-called extreme near-field region, existing semi-empirical methods are known to be inaccurate, and high-fidelity numerical schemes are generally hampered by a lack of available experimental validation data. As such, the blast protection community is not currently equipped with a satisfactory fast-running tool for load prediction in the near-field. In this article, a validated computational model is used to develop a suite of numerical near-field blast load distributions, which are shown to follow a similar normalised shape. This forms the basis of the data-driven predictive model developed herein: a Gaussian function is fit to the normalised loading distributions, and a power law is used to calculate the magnitude of the curve according to established scaling laws. The predictive method is rigorously assessed against the existing numerical dataset, and is validated against new test models and available experimental data. High levels of agreement are demonstrated throughout, with typical variations of <5% between experiment/model and prediction. The new approach presented in this article allows the analyst to rapidly compute the distribution of specific impulse across the loaded face of a wide range of target sizes and near-field scaled distances and provides a benchmark for data-driven modelling approaches to capture blast loading phenomena in more complex scenarios.
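The structure of such a predictor (a Gaussian shape scaled by a power-law magnitude) can be sketched as follows. This is a minimal illustration of the functional form only. The coefficients `a`, `b` and `sigma`, the meaning of `position`, and all numeric values are hypothetical placeholders, not the fitted values or variable definitions from the article.

```python
import math


def peak_specific_impulse(scaled_distance, a, b):
    """Power-law magnitude: peak = a * Z**(-b). a and b are hypothetical fit coefficients."""
    return a * scaled_distance ** (-b)


def specific_impulse(position, scaled_distance, a, b, sigma):
    """Gaussian-shaped distribution across the loaded face, scaled by the power-law peak.

    position is a lateral coordinate measured from the point of peak loading
    (an assumption made for this sketch); sigma is a hypothetical spread parameter.
    """
    peak = peak_specific_impulse(scaled_distance, a, b)
    return peak * math.exp(-position ** 2 / (2.0 * sigma ** 2))


# Placeholder values chosen only to exercise the functions.
A, B, SIGMA = 1.0, 1.5, 0.4
profile = [specific_impulse(x / 10, 0.2, A, B, SIGMA) for x in range(-10, 11)]
print(max(profile) == specific_impulse(0.0, 0.2, A, B, SIGMA))
```

Separating the shape from the magnitude means each can be fitted independently: the Gaussian to the normalised distributions, and the power law to the peaks.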


2021 ◽  
Vol 143 (3) ◽  
Author(s):  
Suhui Li ◽  
Huaxin Zhu ◽  
Min Zhu ◽  
Gang Zhao ◽  
Xiaofeng Wei

Conventional physics-based or experiment-based approaches for gas turbine combustion tuning are time-consuming and cost-intensive. Recent advances in data analytics provide an alternative method. In this paper, we present a cross-disciplinary study on the combustion tuning of an F-class gas turbine that combines machine learning with physics understanding. An artificial neural network (ANN) model is developed to predict the combustion performance (outputs), including NOx emissions, combustion dynamics, combustor vibrational acceleration, and turbine exhaust temperature. The inputs of the ANN model are identified by analyzing the key operating variables that impact the combustion performance, such as the pilot and the premixed fuel flow, and the inlet guide vane angle. The ANN model is trained on field data from an F-class gas turbine power plant. The trained model is able to describe the combustion performance with acceptable accuracy over a wide range of operating conditions. In combination with a genetic algorithm, the model is applied to optimize the combustion performance of the gas turbine. Results demonstrate that the data-driven method offers a promising alternative for combustion tuning at low cost and with fast turnaround.


2021 ◽  
Author(s):  
Elton Figueiredo de Souza Soares ◽  
Renan Souza ◽  
Raphael Melo Thiago ◽  
Marcelo de Oliveira Costa Machado ◽  
Leonardo Guerreiro Azevedo

In our data-driven society, there are hundreds of data systems on the market with a wide range of configuration parameters, making it very hard for enterprises and users to choose the most suitable one. There is a lack of representative empirical evidence to help users make an informed decision. Using benchmark results is a widely adopted practice, but just as there are many data systems, there are many benchmarks. This ongoing work presents the architecture and methods of a system that supports the recommendation of the most suitable data system for an application. We also illustrate how the recommendation would work in a fictitious scenario.


2021 ◽  
Vol 17 (2) ◽  
pp. e1008635
Author(s):  
Gerrit Ansmann ◽  
Tobias Bollenbach

Many ecological studies employ general models that can feature an arbitrary number of populations. A critical requirement imposed on such models is clone consistency: If the individuals from two populations are indistinguishable, joining these populations into one shall not affect the outcome of the model. Otherwise a model produces different outcomes for the same scenario. Using functional analysis, we comprehensively characterize all clone-consistent models: We prove that they are necessarily composed from basic building blocks, namely linear combinations of parameters and abundances. These strong constraints enable a straightforward validation of model consistency. Although clone consistency can always be achieved with sufficient assumptions, we argue that it is important to explicitly name and consider the assumptions made: They may not be justified or limit the applicability of models and the generality of the results obtained with them. Moreover, our insights facilitate building new clone-consistent models, which we illustrate for a data-driven model of microbial communities. Finally, our insights point to new relevant forms of general models for theoretical ecology. Our framework thus provides a systematic way of comprehending ecological models, which can guide a wide range of studies.
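Clone consistency can be checked numerically on a small example. The sketch below assumes a generalized Lotka-Volterra form, in which per-capita growth is a linear combination of parameters and abundances, and contrasts it with a hypothetical variant in which each abundance enters squared. Both models, the parameter values, and the function names are illustrative choices, not taken from the article.

```python
def linear_rates(x, r, A):
    """Generalized Lotka-Volterra: dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def squared_rates(x, r, A):
    """Hypothetical variant in which each abundance enters the interaction term squared."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] ** 2 for j in range(n))) for i in range(n)]


# Split description: populations 0 and 1 are clones (same parameters), 2 is distinct.
x_split = [0.3, 0.2, 0.5]
r_split = [1.0, 1.0, 0.5]
A_split = [[-1.0, -1.0, -0.5],
           [-1.0, -1.0, -0.5],
           [-0.2, -0.2, -1.0]]

# Merged description: the two clones are joined into one population of abundance 0.5.
x_merged = [0.5, 0.5]
r_merged = [1.0, 0.5]
A_merged = [[-1.0, -0.5],
            [-0.2, -1.0]]


def clone_growth_gap(rates):
    """Difference between the summed clone growth (split) and the merged growth."""
    split = rates(x_split, r_split, A_split)
    merged = rates(x_merged, r_merged, A_merged)
    return abs((split[0] + split[1]) - merged[0])


print(clone_growth_gap(linear_rates), clone_growth_gap(squared_rates))
```

For these values the linear model gives the same total growth whether or not the clones are merged, while the squared variant does not, because (x1 + x2)^2 differs from x1^2 + x2^2.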


Author(s):  
Patrick Gelß ◽  
Stefan Klus ◽  
Jens Eisert ◽  
Christof Schütte

A key task in the field of modeling and analyzing nonlinear dynamical systems is the recovery of unknown governing equations from measurement data only. There is a wide range of application areas for this important instance of system identification, ranging from industrial engineering and acoustic signal processing to stock market models. In order to find appropriate representations of underlying dynamical systems, various data-driven methods have been proposed by different communities. However, if the given data sets are high-dimensional, then these methods typically suffer from the curse of dimensionality. To significantly reduce the computational costs and storage consumption, we propose multidimensional approximation of nonlinear dynamical systems (MANDy), a method that combines data-driven methods with tensor network decompositions. The efficiency of the introduced approach will be illustrated with the aid of several high-dimensional nonlinear dynamical systems.
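The underlying idea of recovering governing equations from data can be shown in one dimension. The sketch below is not MANDy and uses no tensor network decomposition. It is plain least squares over a two-function library, solved through the normal equations. The toy system dx/dt = -2x + 0.5x^3, the library, and the function name are assumptions made for illustration, and the derivative data are generated exactly rather than measured.

```python
def fit_two_term_model(xs, dxs, basis):
    """Least-squares coefficients (c0, c1) such that dx ~ c0*f(x) + c1*g(x)."""
    f, g = basis
    # Entries of the 2x2 normal equations (Theta^T Theta) c = Theta^T dx.
    s_ff = sum(f(x) * f(x) for x in xs)
    s_fg = sum(f(x) * g(x) for x in xs)
    s_gg = sum(g(x) * g(x) for x in xs)
    b_f = sum(f(x) * d for x, d in zip(xs, dxs))
    b_g = sum(g(x) * d for x, d in zip(xs, dxs))
    # Solve the 2x2 system with Cramer's rule.
    det = s_ff * s_gg - s_fg ** 2
    c0 = (b_f * s_gg - b_g * s_fg) / det
    c1 = (b_g * s_ff - b_f * s_fg) / det
    return c0, c1


# Synthetic, noise-free samples of the toy system dx/dt = -2x + 0.5x^3.
xs = [i / 10 for i in range(-10, 11)]
dxs = [-2.0 * x + 0.5 * x ** 3 for x in xs]

library = (lambda x: x, lambda x: x ** 3)  # candidate terms: x and x^3
c_linear, c_cubic = fit_two_term_model(xs, dxs, library)
print(c_linear, c_cubic)
```

With many state variables the number of candidate library terms grows combinatorially, and that growth is the cost the abstract's tensor-based formulation is meant to reduce.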


mSystems ◽  
2020 ◽  
Vol 5 (3) ◽  
Author(s):  
Lars Barquist

Small RNAs (sRNAs) have been discovered in every bacterium examined and have been shown to play important roles in the regulation of a diverse range of behaviors, from metabolism to infection. However, despite a wide range of available techniques for discovering and validating sRNA regulatory interactions, only a minority of these molecules have been well characterized. In part, this is due to the nature of posttranscriptional regulation: the activity of an sRNA depends on the state of the transcriptome as a whole, so characterization is best carried out under the conditions in which it is naturally active. In this issue of mSystems, Arrieta-Ortiz and colleagues (M. L. Arrieta-Ortiz, C. Hafemeister, B. Shuster, N. S. Baliga, et al., mSystems 5:e00057-20, 2020, https://doi.org/10.1128/mSystems.00057-20) present a network inference approach based on estimating sRNA activity across transcriptomic compendia. This shows promise not only for identifying new sRNA regulatory interactions but also for pinpointing the conditions in which these interactions occur, providing a new avenue toward functional characterization of sRNAs.


1988 ◽  
Vol 110 (1) ◽  
pp. 115-121 ◽  
Author(s):  
W. Stein ◽  
M. Rautenberg

In vaned diffusers of centrifugal compressors many different flow phenomena interfere with one another, and different geometric parameters influence the flow field. Variations of these parameters allow the designer to optimize the diffuser for a certain application or to use a variable geometry for controlling the stage over a wide range. Two vaned diffusers that differ only in their passage widths are investigated using different types of measuring techniques, in order to analyze the flow structure and to use it as a verification of a calculation method that allows detailed predictions of flow field parameters inside the diffuser by taking geometric variations into account. Using this method, predictions of the flow field of a variable-geometry diffuser are made and compared with the measured performance curves of the stage.


Author(s):  
Madhumitha Ramachandran ◽  
Zahed Siddique

Rotary seals are found in many types of manufacturing equipment and machines, used for various applications under a wide range of operating conditions. Rotary seal failure can be catastrophic and can lead to costly downtime and large expenses, so it is extremely important to assess the degradation of rotary seals to avoid fatal breakdown of machinery. Physics-based rotary seal prognostics require direct estimation of different physical parameters to assess the degradation of seals. Unlike the physics-based approach, data-driven prognostics utilizing sensor technology and computational capabilities can aid in the indirect estimation of rotary seals’ running condition. An important aspect of data-driven prognostics is collecting appropriate data in order to reduce the cost and time associated with data collection, storage and computation. Seals in machinery operate in harsh conditions. In the oil field especially, seals are exposed to harsh environments and aggressive fluids that gradually reduce the elastic modulus and hardness of the seals, resulting in lower friction torque and excessive leakage. Therefore, in this study we implement a data-driven prognostics approach that uses friction torque and leakage signals, with a Multilayer Perceptron as the classifier, to compare the performance of the two metrics in classifying the running condition of rotary seals. Friction torque was found to perform better than leakage in differentiating the running condition of rotary seals throughout their service life. Although this approach was designed for seals in the oil and gas industry, it can be implemented in any manufacturing industry with similar applications.


2020 ◽  
Vol 6 (3) ◽  
pp. 573-581 ◽  
Author(s):  
Zhong-Hui Shen ◽  
Yang Shen ◽  
Xiao-Xing Cheng ◽  
Han-Xing Liu ◽  
Long-Qing Chen ◽  
...  
