Using set theory to reduce redundancy in pathway sets

Abstract

Background: The consolidation of pathway databases, such as KEGG [1], Reactome [2] and ConsensusPathDB [3], has generated widespread biological interest; however, pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway overlap or merging pathways, but the resulting pathways may be of heterogeneous sizes and cover multiple biological functions. Efforts have also been made to deal with redundancy in pathway data by consolidating enriched pathways into a number of clusters or concepts. We present an alternative approach, which generates pathway subsets capable of covering all of the genes present within either pathway databases or enrichment results, generating substantial reductions in redundancy.

Results: We propose a method that uses set cover to reduce pathway redundancy without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, control of pathway size and coverage of the gene set. By applying set cover to the ConsensusPathDB dataset we were able to produce a reduced set of pathways representing 100% of the genes in the original data set with 74% less redundancy, or 95% of the genes with 88% less redundancy. We also developed an algorithm to simplify enrichment data and applied it to a set of enriched osteoarthritis pathways, revealing that within the top ten pathways, five were redundant subsets of more enriched pathways. Applying set cover to the enrichment results removed these redundant pathways, allowing more informative pathways to take their place.

Conclusion: Our method provides an alternative approach for handling pathway redundancy, while ensuring that the pathways are of homogeneous size and gene coverage is maximised. Pathways are not altered from their original form, allowing biological knowledge regarding the data set to be directly applicable. We demonstrate the ability of the algorithms to prioritise redundancy reduction, pathway size control or gene set coverage. The application of set cover to pathway enrichment results produces an optimised summary of the pathways that best represent the differentially regulated gene set.
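The set cover idea can be illustrated with a minimal sketch. It uses the textbook greedy heuristic, which is an assumption made here for illustration and not necessarily the authors' algorithm; in particular it does not control pathway size. The function name `greedy_set_cover`, the `coverage` parameter and the toy pathways are all hypothetical.

```python
def greedy_set_cover(pathways, coverage=1.0):
    """Select pathway names until `coverage` (a fraction) of all genes is covered.

    pathways: dict mapping a pathway name to a set of gene identifiers.
    Plain greedy heuristic: repeatedly take the pathway that adds the most
    genes not yet covered. Pathways are never merged or altered.
    """
    universe = set().union(*pathways.values())
    target = coverage * len(universe)
    covered = set()
    chosen = []
    while len(covered) < target:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(pathways), key=lambda name: len(pathways[name] - covered))
        gain = pathways[best] - covered
        if not gain:
            break  # nothing left to gain
        chosen.append(best)
        covered |= gain
    return chosen


# Toy data (hypothetical): pathway "B" is a redundant subset of "A".
pathways = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2"},
    "C": {"g4", "g5"},
    "D": {"g5", "g6"},
}
selected = greedy_set_cover(pathways)
print(selected)
```

Lowering `coverage` below 1.0 trades gene coverage for fewer selected pathways, which mirrors the 100% versus 95% trade-off reported in the abstract.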


Smoothing Gene Expression Data with Network Information Improves Consistency of Regulated Genes

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1618 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 6

Author(s):

Guro Dørum ◽

Lars Snipen ◽

Margrete Solheim ◽

Solve Saebo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Simulated Data ◽

Real Data ◽

Biological Knowledge ◽

Expression Data ◽

Data Set ◽

Gene Set ◽

Network Information

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation and more conformity in the results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose using a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
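As a rough illustration of smoothing genewise statistics over a network, the sketch below blends each gene's statistic with the mean statistic of its direct neighbours. This simple neighbour averaging is a stand-in chosen for illustration; the abstract does not specify the smoothing operator, and the function name `smooth_statistics`, the weight `alpha` and the toy network are all hypothetical.

```python
def smooth_statistics(stats, neighbors, alpha):
    """Blend each gene's test statistic with the mean of its neighbours.

    stats: dict gene -> test statistic.
    neighbors: dict gene -> list of adjacent genes in the network.
    alpha: degree of smoothing in [0, 1]; 0 leaves the statistics unchanged.
    """
    smoothed = {}
    for gene, value in stats.items():
        linked = [stats[n] for n in neighbors.get(gene, []) if n in stats]
        if linked:
            smoothed[gene] = (1 - alpha) * value + alpha * sum(linked) / len(linked)
        else:
            smoothed[gene] = value  # isolated genes are left as they are
    return smoothed


# Toy network (hypothetical): "a" is linked to "b" and "c"; "d" is isolated.
stats = {"a": 4.0, "b": 0.0, "c": 2.0, "d": 3.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
result = smooth_statistics(stats, neighbors, alpha=0.5)
print(result)
```

In this sketch `alpha` plays the role of the degree of smoothing; the abstract proposes choosing that degree with a criterion based on the correlation between the network and the data, which is not reproduced here.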


Developmental Toxicity Risk Assessment: A Rough Sets Approach

Methods of Information in Medicine ◽

10.1055/s-0038-1634890 ◽

1993 ◽

Vol 32 (01) ◽

pp. 47-54 ◽

Cited By ~ 12

Author(s):

F. R. Jelovsek ◽

M. Razzaghi ◽

R. R. Hashemi

Keyword(s):

Discriminant Analysis ◽

Rough Sets ◽

Human Subjects ◽

Developmental Toxicity ◽

Original Data ◽

Original Form ◽

Data Set ◽

Predictive Values ◽

Study Results ◽

Toxicity Risk

Abstract: A rough-sets approach was applied to a data set consisting of animal study results and other compound characteristics to generate local and global (certain/possible) sets of rules for prediction of developmental toxicity in human subjects. A modified version of the rough-sets approach is proposed to allow the construction of an approximate set of rules to use for prediction in a manner similar to that of discriminant analysis. The modified rough-sets approach is superior in predictability to the original form of rough-sets methodology. In comparison to discriminant analysis, modified rough sets (approximate rules) appear to be better in overall classification, sensitivity, and positive and negative predictive values. The findings were supported by applying the modified rough sets and discriminant analysis to a test data set generated from the original data set by using a resampling plan.
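The distinction between certain and possible rules rests on the lower and upper approximations of standard rough-set theory. The sketch below computes both for a toy table; it covers only the classical construction, not the modified approach proposed in the paper, and the attribute names and compound labels are hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximations of `target`.

    objects: dict id -> dict of attribute values.
    attributes: attribute names used to decide indiscernibility.
    target: set of ids belonging to the concept (e.g. toxic compounds).
    """
    # Objects with identical values on `attributes` are indiscernible.
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(obj_id)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:
            lower |= members  # certainly in the concept
        if members & target:
            upper |= members  # possibly in the concept
    return lower, upper


# Toy table (hypothetical): c1 and c2 look identical but differ in outcome.
compounds = {
    "c1": {"animal_effect": "yes", "dose": "low"},
    "c2": {"animal_effect": "yes", "dose": "low"},
    "c3": {"animal_effect": "yes", "dose": "high"},
    "c4": {"animal_effect": "no", "dose": "low"},
}
toxic = {"c1", "c3"}
lower, upper = approximations(compounds, ["animal_effect", "dose"], toxic)
print(sorted(lower), sorted(upper))
```

Certain rules are read off the lower approximation and possible rules off the upper approximation; objects in the gap between the two cannot be classified from the chosen attributes alone.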


Rotational Characteristics of the Green Solar Corona: 1947-1991

International Astronomical Union Colloquium ◽

10.1017/s0252921100025197 ◽

1994 ◽

Vol 144 ◽

pp. 139-141 ◽

Cited By ~ 2

Author(s):

J. Rybák ◽

V. Rušin ◽

M. Rybanský

Keyword(s):

Solar Corona ◽

World Wide ◽

Rotation Period ◽

Original Data ◽

Coronal Emission ◽

Time Intervals ◽

Data Set ◽

The World ◽

Coronal Emission Line ◽

Homogeneous Data

Abstract: Fe XIV 530.3 nm coronal emission line observations have been used to estimate the rotation of the green solar corona. A homogeneous data set, created from measurements of the world-wide coronagraphic network, has been examined with the help of correlation analysis to reveal the averaged synodic rotation period as a function of latitude and time over the epoch from 1947 to 1991. The values of the synodic rotation period obtained for this epoch for the whole range of latitudes and for a latitude band of ±30° are 27.52±0.12 days and 26.95±0.21 days, respectively. A differential rotation of the green solar corona, with local period maxima around ±60° and a minimum of the rotation period at the equator, was confirmed. No clear cyclic variation of the rotation has been found for the examined epoch, but some monotonic trends for some time intervals are presented. A detailed investigation of the original data and their correlation functions has shown that the existence of sufficiently reliable tracers is not evident for the whole set of examined data. This should be taken into account in future, more precise estimations of the green corona rotation period.
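One common way to extract a rotation period by correlation analysis is to locate the lag at which a daily intensity series correlates best with itself. The sketch below does this on a synthetic signal; whether the authors used this exact estimator is not stated in the abstract, and the function names, lag window and test signal are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a given integer lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def estimate_period(series, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Synthetic daily signal (hypothetical) with a 27-day recurrence.
signal = [math.sin(2 * math.pi * day / 27.0) for day in range(540)]
period = estimate_period(signal, 20, 35)
print(period)
```

Real coronal data are noisy and the tracers may not persist over a full rotation, which is the reliability concern raised at the end of the abstract; an integer-lag search like this one also cannot resolve fractions of a day without interpolating around the peak.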


Electing the Senate

10.23943/princeton/9780691163161.001.0001 ◽

2017 ◽

Cited By ~ 1

Author(s):

Wendy J. Schiller ◽

Charles Stewart III

Keyword(s):

Original Data ◽

State Legislators ◽

Internal Conflict ◽

Data Set ◽

Senate Elections ◽

Political Actors ◽

The Public ◽

Seventeenth Amendment ◽

The People ◽

Election Process

From 1789 to 1913, U.S. senators were not directly elected by the people—instead the Constitution mandated that they be chosen by state legislators. This radically changed in 1913, when the Seventeenth Amendment to the Constitution was ratified, giving the public a direct vote. This book investigates the electoral connections among constituents, state legislators, political parties, and U.S. senators during the age of indirect elections. The book finds that even though parties controlled the partisan affiliation of the winning candidate for Senate, they had much less control over the universe of candidates who competed for votes in Senate elections and the parties did not always succeed in resolving internal conflict among their rank and file. Party politics, money, and personal ambition dominated the election process, in a system originally designed to insulate the Senate from public pressure. The book uses an original data set of all the roll call votes cast by state legislators for U.S. senators from 1871 to 1913 and all state legislators who served during this time. Newspaper and biographical accounts uncover vivid stories of the political maneuvering, corruption, and partisanship—played out by elite political actors, from elected officials, to party machine bosses, to wealthy business owners—that dominated the indirect Senate elections process. The book raises important questions about the effectiveness of Constitutional reforms, such as the Seventeenth Amendment, that promised to produce a more responsive and accountable government.


Simultaneous spatiotemporal super-resolution and multi-parametric fluorescence microscopy

Nature Communications ◽

10.1038/s41467-021-22002-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Jagadish Sankaran ◽

Harikrushnan Balasubramanian ◽

Wai Hoh Tang ◽

Xue Wen Ng ◽

Adrian Röllin ◽

...

Keyword(s):

Single Molecule ◽

Temporal Dynamics ◽

Growth Factor Receptor ◽

Super Resolution ◽

Actin Binding ◽

Biological Knowledge ◽

Data Set ◽

Epidermal Growth ◽

Molecular Brightness ◽

Super Resolution Microscopy

Abstract: Super-resolution microscopy and single molecule fluorescence spectroscopy require mutually exclusive experimental strategies optimizing either temporal or spatial resolution. To achieve both, we implement a GPU-supported, camera-based measurement strategy that highly resolves spatial structures (~100 nm), temporal dynamics (~2 ms), and molecular brightness from the exact same data set. Simultaneous super-resolution of spatial and temporal details leads to an improved precision in estimating the diffusion coefficient of the actin binding polypeptide Lifeact and corrects structural artefacts. Multi-parametric analysis of epidermal growth factor receptor (EGFR) and Lifeact suggests that the domain partitioning of EGFR is primarily determined by EGFR-membrane interactions, possibly sub-resolution clustering and inter-EGFR interactions but is largely independent of EGFR-actin interactions. These results demonstrate that pixel-wise cross-correlation of parameters obtained from different techniques on the same data set enables robust physicochemical parameter estimation and provides biological knowledge that cannot be obtained from sequential measurements.


Styles of Representation in Constituencies in the Homeland and Abroad: The Case of Italy

Parliamentary Affairs ◽

10.1093/pa/gsaa063 ◽

2020 ◽

Author(s):

Eva Østergaard-Nielsen ◽

Stefano Camatarri

Keyword(s):

Original Data ◽

Role Orientation ◽

International Mobility ◽

Data Set ◽

Democratic Representation ◽

Public Debates ◽

At Home

Abstract: The role orientation of political representatives and candidates is a longstanding concern in studies of democratic representation. The growing trend in countries to allow citizens abroad to stand as candidates in homeland elections from afar provides an interesting opportunity for understanding how international mobility and context influence ideas of representation among these emigrant candidates. In public debates, emigrant candidates are often portrayed as delegates of the emigrant constituencies. However, drawing on the paradigmatic case of Italy and an original data set comprising emigrant candidates, we show that the perceptions of styles of representation abroad are more complex. Systemic differences between electoral districts at home and abroad are relevant for explaining why and how candidates develop a trustee or delegate orientation.


Do the Heterogeneous Determinants of Repayment Affect Differently across Borrowers of Diverse Credit Sources in Rural Assam? A Double Hurdle Approach

Journal of Development Policy and Practice ◽

10.1177/24551333211031667 ◽

2021 ◽

pp. 245513332110316

Author(s):

Tiken Das ◽

Pradyut Guha ◽

Diganta Das

Keyword(s):

Rural Areas ◽

Probit Model ◽

Original Data ◽

Community Based ◽

Data Set ◽

Brahmaputra Valley ◽

Double Hurdle ◽

Instrumental Variable Probit ◽

Instrumental Variable Probit Model

This study attempts to answer the question: do the heterogeneous determinants of repayment affect the borrowers of diverse credit sources differently? The study is based on data collected from 240 households in three districts of the lower Brahmaputra valley of Assam through a carefully designed primary survey. It uses the double hurdle approach and the instrumental variable probit model to reduce possible selection bias. It observes better repayment performance among formal borrowers, followed by semiformal borrowers, while by occupation it is most prominent among organised employees. In general, household characteristics, loan characteristics and location-specific characteristics significantly affect the repayment performance of borrowers. However, the nature of the impact of the factors influencing repayment performance is remarkably different across credit sources. The analysis does not account for the role of traditional community-based organisations in rural Assam when examining the determinants of repayment performance. The study also recommends ensuring productive opportunities and efficient market linkages in rural areas of Assam. The study is based on an original data set collected specifically to examine this question in the lower Brahmaputra valley of Assam, which has not been studied before.


Data Augmentation Using Generative Adversarial Network for Automatic Machine Fault Detection Based on Vibration Signals

Applied Sciences ◽

10.3390/app11052166 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2166

Author(s):

Van Bui ◽

Tung Lam Pham ◽

Huy Nguyen ◽

Yeong Min Jang

Keyword(s):

Fault Detection ◽

Data Augmentation ◽

Model Performance ◽

Original Data ◽

Fault Classification ◽

Training Process ◽

Generative Adversarial Network ◽

Data Set ◽

Adversarial Network ◽

Machine Fault

In the last decade, predictive maintenance has attracted a lot of attention in industrial factories because of its wide use of the Internet of Things and artificial intelligence algorithms for data management. However, in the early phases, when abnormal and faulty machines rarely appeared in factories, only limited sets of machine fault samples were available. With limited fault samples, it is difficult to train a fault classification model because the input data are imbalanced. Therefore, data augmentation is required to increase the accuracy of the learning model. However, there are few methods to generate and evaluate data for this kind of analysis. In this paper, we introduce a method that uses a generative adversarial network to augment fault signals and enrich the data set. The enhanced data set can increase the accuracy of the machine fault detection model in the training process. We also performed fault detection using a variety of preprocessing approaches and classification models to evaluate the similarity between the generated data and the authentic data. The generated fault data are highly similar to the original data and significantly improve the accuracy of the model. The accuracy of machine fault detection reaches 99.41% with 20% of the original machine fault data set and 93.1% with 0% of the original machine fault data set (generated data only). Based on this, we conclude that the generated data can be mixed with original data to improve model performance.


Vielfalt and diversité: how local actors in France and Germany evaluate immigration and socio-cultural heterogeneity

Comparative Migration Studies ◽

10.1186/s40878-020-00205-1 ◽

2020 ◽

Vol 8 (1) ◽

Author(s):

Maria Schiller ◽

Christine Lang ◽

Karen Schönwälder ◽

Michalis Moutselos

Keyword(s):

Local Politics ◽

Original Data ◽

National Politics ◽

Data Set ◽

Large Cities ◽

Cultural Heterogeneity ◽

Societal Actors ◽

Local Actors ◽

Urban Level ◽

And Migration

Abstract: In both Germany and France, perceptions of immigration, diversity and their societal consequences have undergone important transformations in the past two decades. However, existing research has only partially captured such processes. The “grand narratives” of national approaches, while still influential, no longer explain contemporary realities. Further, analyses of national politics and discourses may not sufficiently reflect the realities across localities and society more broadly. While emerging in different national contexts, little is known about how diversity is actually perceived by political stakeholders at the urban level. Given the key role of immigration and diversity in current conflicts over Europe’s future, it is imperative to assess present-day conceptualisations of migration-related diversity among important societal actors. This article investigates perceptions and evaluations of socio-cultural heterogeneity by important societal actors in large cities. We contribute to existing literature by capturing an unusually broad set of actors from state and civil society. We also present data drawn from an unusually large number of cities. How influential is the perception of current society as heterogeneous, and what forms of heterogeneity are salient? And is socio-cultural and migration-related heterogeneity evaluated as threatening or rather as beneficial? Based on an original data set, this study explores the shared and contested ideas, the cognitive roadmaps of state and non-state actors involved in local politics. We argue that, in both German and French cities, socio-cultural heterogeneity is nowadays widely recognized as marking cities and often positively connoted. At the same time, perceptions of the main features of diversity and of the benefits and challenges attached to it vary. We find commonalities between French and German local actors, but also clear differences. In concluding, we suggest how and why national contexts importantly shape evaluations of diversity.


How local personal vote-earning attributes affect the aggregate party vote share: Evidence from the Belgian flexible-list PR system (2003–2014)

Politics ◽

10.1177/0263395718811969 ◽

2018 ◽

Vol 39 (4) ◽

pp. 464-479

Author(s):

Gert-Jan Put ◽

Jef Smulders ◽

Bart Maddens

Keyword(s):

Regression Models ◽

Empirical Test ◽

Original Data ◽

Ordinary Least Squares ◽

Vote Share ◽

District Level ◽

National Party ◽

Data Set ◽

Geographically Distributed ◽

Party Vote Shares

This article investigates the effect of candidates exhibiting local personal vote-earning attributes (PVEA) on the aggregate party vote share at the district level. Previous research has often assumed that packing ballot lists with localized candidates increases the aggregate party vote and seat shares. We present a strict empirical test of this argument by analysing the relative electoral swing of ballot lists at the district level, a measure of change in party vote shares which controls for the national party trend and previous party results in the district. The analysis is based on data on 7527 candidacies during six Belgian regional and federal election cycles between 2003 and 2014, which are aggregated into an original data set of 223 ballot lists. The ordinary least squares (OLS) regression models do not show a significant effect of candidates exhibiting local PVEA on the relative electoral swing of ballot lists. However, the results suggest that ballot lists do benefit electorally if candidates with local PVEA are geographically distributed over different municipalities in the district.
