Antibody performance in ChIP-sequencing assays: From quality scores of public data sets to quantitative certification

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 54 ◽  
Author(s):  
Marco-Antonio Mendoza-Parra ◽  
Vincent Saravaki ◽  
Pierre-Etienne Cholley ◽  
Matthias Blum ◽  
Benjamin Billoré ◽  
...  

We have established a certification system for antibodies to be used in chromatin immunoprecipitation assays coupled to massively parallel sequencing (ChIP-seq). This certification comprises a standardized ChIP procedure and the attribution of a numerical quality control indicator (QCi) to biological replicate experiments. The QCi computation is based on a universally applicable quality assessment that quantitates the global deviation of randomly sampled subsets of a ChIP-seq dataset from the original genome-aligned sequence reads. Comparison with a QCi database of >28,000 ChIP-seq assays was used to attribute quality grades (ranging from 'AAA' to 'DDD') to a given dataset. In the present report we used this numerical QC system to assess the factors influencing the quality of ChIP-seq assays, including the nature of the target, the sequencing depth and the commercial source of the antibody. We used this approach specifically to certify monoclonal and polyclonal antibodies from Active Motif directed against the histone modification marks H3K4me3, H3K27ac and H3K9ac for ChIP-seq; the antibodies received grades ranging from AAA to BBC (www.ngs-qc.org). We propose that such quantitative grading be applied to all antibodies carrying the label "ChIP-seq grade".
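The quality indicator rests on a simple intuition: in a robust ChIP-seq profile, the read-density landscape should be largely recovered even when only a random fraction of the aligned reads is kept. The Python sketch below illustrates that intuition only; it is not the NGS-QC implementation, and the bin size, subsampling fractions and tolerance are hypothetical parameters.

```python
import numpy as np

def qc_deviation(read_positions, bin_size=500, subsample_fracs=(0.9, 0.7, 0.5),
                 tolerance=0.1, seed=0):
    """Toy estimate of how robustly binned read density is recovered when only
    a fraction of the aligned reads is kept (the intuition behind the QCi).

    read_positions : 1-D array of genomic coordinates of aligned reads.
    Returns, per subsampling fraction, the share of bins whose subsampled count
    deviates by more than `tolerance` * 100 % from the scaled original count.
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(read_positions)
    bins = np.arange(0, positions.max() + bin_size, bin_size)
    full_counts, _ = np.histogram(positions, bins=bins)

    deviations = {}
    for frac in subsample_fracs:
        keep = rng.random(positions.size) < frac           # random read subset
        sub_counts, _ = np.histogram(positions[keep], bins=bins)
        expected = full_counts * frac                       # scaled original profile
        nonzero = expected > 0
        rel_dev = np.abs(sub_counts[nonzero] - expected[nonzero]) / expected[nonzero]
        deviations[frac] = float(np.mean(rel_dev > tolerance))
    return deviations

# Example: a synthetic "enriched" profile (peak + uniform background).
reads = np.concatenate([np.random.default_rng(1).normal(50_000, 300, 20_000),
                        np.random.default_rng(2).uniform(0, 100_000, 5_000)]).clip(0)
print(qc_deviation(reads))
```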


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables, yet data sets with mixed types of variables are common in data mining. In this paper we introduce a weighted self-organizing map for the clustering, analysis and visualization of mixed (continuous/binary) data. The weights and the prototypes are learned simultaneously, ensuring an optimized clustering of the data. The higher a variable's weight, the more the clustering algorithm takes into account the information conveyed by that variable. The learning of these topological maps is thus combined with a weighting of the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show good topological ordering and homogeneous clustering.
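The core mechanism, a per-variable weight inside the self-organizing map's distance and update rule, can be sketched as follows. In this simplified illustration the weights are fixed from the inverse feature variances rather than learned simultaneously with the prototypes as in the paper, so it is only a rough sketch of the idea.

```python
import numpy as np

def weighted_som(X, grid=(5, 5), epochs=20, lr=0.5, sigma=1.5, seed=0):
    """Minimal self-organizing map in which each variable carries a weight.

    Simplifying assumption: the variable weights are the inverse feature
    variances (noisy variables count less); the paper instead learns the
    weights jointly with the prototypes.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = 1.0 / (X.var(axis=0) + 1e-9)           # per-variable weights
    w /= w.sum()
    rows, cols = grid
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    proto = X[rng.choice(n, rows * cols)]       # prototype initialisation

    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)
        for x in X[rng.permutation(n)]:
            dist = ((proto - x) ** 2 * w).sum(axis=1)   # weighted squared distance
            bmu = dist.argmin()                          # best matching unit
            # Gaussian neighbourhood on the 2-D grid
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            proto += alpha * h[:, None] * (x - proto)
    return proto, w

# Usage on a small toy matrix with continuous and binary columns.
X = np.hstack([np.random.default_rng(1).normal(size=(100, 3)),
               np.random.default_rng(2).integers(0, 2, size=(100, 2))])
prototypes, weights = weighted_som(X)
print(weights.round(3))
```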


Author(s):  
Liming Li ◽  
Xiaodong Chai ◽  
Shuguang Zhao ◽  
Shubin Zheng ◽  
Shengchao Su

This paper proposes an effective method for improving the performance of saliency detection via iterative bootstrap learning, which consists of two tasks: saliency optimization and saliency integration. Specifically, multiscale segmentation and feature extraction are first performed on the input image. Second, prior saliency maps are generated using existing saliency models and are used to build the initial saliency map. Third, the prior maps are fed into a saliency regressor: training samples are collected from the prior maps at multiple scales, and a random forest regressor is learned from these training data. The initial saliency map and the output of the saliency regressor are then integrated to generate a coarse saliency map. Finally, to further improve the quality of the saliency map, both the initial and the coarse saliency maps are fed into the saliency regressor, and its output is integrated with the initial and coarse saliency maps to produce the final saliency map. Experimental results on three public data sets demonstrate that the proposed method consistently achieves the best performance and that a significant improvement is obtained when the method is applied to existing saliency models.
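The regression step, learning a random forest regressor from features assembled out of prior saliency maps, can be sketched as follows. The pixel-level features, the pseudo ground truth and the parameters are simplifying assumptions for illustration, not the authors' training procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def refine_saliency(prior_maps, n_trees=100, seed=0):
    """Regress a refined saliency map from a stack of prior maps.

    prior_maps : list of 2-D arrays in [0, 1] produced by existing models.
    The pseudo ground truth used here is simply the thresholded mean of the
    priors, a stand-in for the bootstrap samples used in the paper.
    """
    stack = np.stack(prior_maps, axis=-1)            # H x W x n_priors
    h, w, k = stack.shape
    X = stack.reshape(-1, k)                         # one sample per pixel
    y = (X.mean(axis=1) > 0.5).astype(float)         # pseudo labels (assumption)

    reg = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    reg.fit(X, y)
    refined = reg.predict(X).reshape(h, w)
    return np.clip(refined, 0.0, 1.0)

# Usage with three synthetic prior maps.
rng = np.random.default_rng(0)
priors = [np.clip(rng.random((64, 64)) * 0.3 + i * 0.2, 0, 1) for i in range(3)]
print(refine_saliency(priors).shape)   # (64, 64)
```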


Author(s):  
MUSTAPHA LEBBAH ◽  
YOUNÈS BENNANI ◽  
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for the topographic clustering, analysis and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which each cell is represented by a Bernoulli distribution: a cell is characterized by a prototype with the same binary coding as the data space, together with the probability of differing from this prototype. The proposed learning algorithm, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show good topological ordering and homogeneous clustering.
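The probabilistic core of this model, Bernoulli components fitted by EM, can be sketched as follows. This is a simplified mixture-model illustration: the actual Bernoulli self-organizing map additionally smooths the responsibilities over a 2-D grid of cells (the topological neighbourhood), which is omitted here.

```python
import numpy as np

def bernoulli_mixture_em(X, n_cells=9, n_iter=50, seed=0):
    """EM for a mixture of Bernoulli components over binary data.

    Simplified sketch: the topological neighbourhood smoothing of the
    Bernoulli self-organizing map is omitted.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.uniform(0.25, 0.75, size=(n_cells, d))   # P(x_j = 1 | cell)
    pi = np.full(n_cells, 1.0 / n_cells)                 # cell priors

    for _ in range(n_iter):
        # E-step: responsibilities from Bernoulli log-likelihoods
        log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update priors and Bernoulli parameters
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-3, 1 - 1e-3)
    return theta, pi, resp

# Usage on toy binary data.
X = (np.random.default_rng(1).random((200, 16)) > 0.5).astype(float)
theta, pi, resp = bernoulli_mixture_em(X)
print(theta.shape, pi.round(2))
```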


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiaolan Chen ◽  
Hui Yang ◽  
Guifen Liu ◽  
Yong Zhang

Background: Nucleosome organization is involved in many regulatory activities in various organisms. However, studies integrating nucleosome organization across mammalian genomes are very limited, mainly because of the lack of comprehensive data quality control (QC) assessment and the uneven quality of public data sets. Results: NUCOME is a database focused on filtering qualified nucleosome organization referenced landscapes covering various cell types in human and mouse on the basis of QC metrics. The filtering strategy guarantees the quality of the nucleosome organization referenced landscapes and exempts users from redundant data set selection and processing. The NUCOME database provides a standardized, qualified data source and informative nucleosome organization features at the whole-genome scale and at the level of individual loci. Conclusions: NUCOME provides valuable data resources for integrative analyses focused on nucleosome organization. NUCOME is freely available at http://compbio-zhanglab.org/NUCOME.
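The practical value of such a resource lies in the QC-based filtering step. The sketch below shows how a filter of this kind could be applied to a metadata table with pandas; the column names, thresholds and example rows are hypothetical and do not reflect NUCOME's schema.

```python
import pandas as pd

# Hypothetical metadata table; column names and thresholds are assumptions,
# not the NUCOME schema.
samples = pd.DataFrame({
    "dataset_id":  ["GSM001", "GSM002", "GSM003", "GSM004"],
    "cell_type":   ["ESC", "ESC", "liver", "liver"],
    "organism":    ["mouse", "mouse", "human", "human"],
    "qc_score":    [0.92, 0.55, 0.81, 0.40],     # e.g. a 0-1 quality indicator
    "read_depth":  [180e6, 60e6, 220e6, 45e6],
})

def qualified_landscapes(df, min_qc=0.8, min_depth=100e6):
    """Keep only data sets passing both the QC-score and read-depth cut-offs."""
    mask = (df["qc_score"] >= min_qc) & (df["read_depth"] >= min_depth)
    return df.loc[mask].sort_values("qc_score", ascending=False)

print(qualified_landscapes(samples))
```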


2012 ◽  
pp. 24-47
Author(s):  
V. Gimpelson ◽  
G. Monusova

Using different cross-country data sets and simple econometric techniques, we study public attitudes towards the police. More positive attitudes are more likely to emerge in countries that have better-functioning democratic institutions, are less prone to corruption, and enjoy more transparent and accountable police activity. These factors have a stronger impact on public opinion (trust and attitudes) than objective crime rates or the density of policemen. Citizens tend to place more trust in police with whom they share common values and over whom they have some control; the latter is a function of democracy. In authoritarian countries ("police states") this tendency may not work directly: when we move from semi-authoritarian countries to openly authoritarian ones, survey-measured trust in the police can also rise. As a result, trust appears to be U-shaped along the quality-of-government axis. This phenomenon can be explained by two simple facts. First, publicly available information about police activity in authoritarian countries is tightly controlled; second, the police itself is more tightly controlled by authoritarian regimes, which fear a (for them) dangerous erosion of this institution.
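The U-shaped relationship reported here is the kind of pattern usually tested by adding a quadratic term to a regression. The snippet below illustrates that test on purely synthetic data; the index, coefficients and noise level are invented for illustration and are not the authors' data or specification.

```python
import numpy as np

# Synthetic illustration only: trust as a U-shaped function of a
# quality-of-government index, plus noise. Not the authors' data.
rng = np.random.default_rng(0)
qog = rng.uniform(-1, 1, 500)                     # quality-of-government index
trust = 0.6 * qog**2 - 0.1 * qog + 0.3 + rng.normal(0, 0.05, 500)

# Fit trust = b2*qog^2 + b1*qog + b0; a positive b2 indicates a U-shape.
b2, b1, b0 = np.polyfit(qog, trust, deg=2)
print(f"quadratic term: {b2:.3f} (positive => U-shaped relationship)")
print(f"turning point at qog = {-b1 / (2 * b2):.3f}")
```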


Nutrients ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 2667
Author(s):  
Kevin B. Comerford ◽  
Yanni Papanikolaou ◽  
Julie Miller Jones ◽  
Judith Rodriguez ◽  
Joanne Slavin ◽  
...  

Carbohydrate-containing crops provide the bulk of dietary energy worldwide. In addition to their various carbohydrate forms (sugars, starches, fibers) and ratios, these foods may also contain varying amounts and combinations of proteins, fats, vitamins, minerals, phytochemicals, prebiotics, and anti-nutritional factors that may impact diet quality and health. Currently, there is no standardized or unified way to assess the quality of carbohydrate foods for the overall purpose of improving diet quality and health outcomes, creating an urgent need for the development of metrics and tools to better define and classify high-quality carbohydrate foods. The present report is based on a series of expert panel meetings and a scoping review of the literature focused on carbohydrate quality indicators and metrics produced over the last 10 years. The report outlines various approaches to assessing food quality, and proposes next steps and principles for developing improved metrics for assessing carbohydrate food quality. The expert panel concluded that a composite metric based on nutrient profiling methods featuring inputs such as carbohydrate–fiber–sugar ratios, micronutrients, and/or food group classification could provide useful and informative measures for guiding researchers, policymakers, industry, and consumers towards a better understanding of carbohydrate food quality and overall healthier diets. The identification of higher quality carbohydrate foods could improve evidence-based public health policies and programming—such as the 2025–2030 Dietary Guidelines for Americans.
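To make the idea of a composite carbohydrate quality metric concrete, the sketch below scores foods from a carbohydrate-to-fiber ratio, a free-sugar share and a micronutrient count. The weights, cut-offs, class name and example values are invented for illustration and do not represent a metric proposed by the expert panel.

```python
from dataclasses import dataclass

@dataclass
class CarbFood:
    """Per-100 g nutrient values for a carbohydrate-containing food."""
    name: str
    carbs_g: float
    fiber_g: float
    free_sugars_g: float
    micronutrients_present: int   # count of qualifying vitamins/minerals

def carb_quality_score(food: CarbFood) -> float:
    """Hypothetical composite score in [0, 1]; weights and cut-offs are
    illustrative only, not a metric endorsed by the expert panel."""
    fiber_ratio = min(food.fiber_g / max(food.carbs_g, 1e-9) * 10, 1.0)   # ~10:1 carb:fiber idea
    sugar_penalty = min(food.free_sugars_g / max(food.carbs_g, 1e-9), 1.0)
    micro_bonus = min(food.micronutrients_present / 10, 1.0)
    return round(0.5 * fiber_ratio + 0.3 * (1 - sugar_penalty) + 0.2 * micro_bonus, 3)

# Usage: whole-grain bread vs. a sugary snack (illustrative numbers).
print(carb_quality_score(CarbFood("whole-grain bread", 43, 7, 4, 8)))
print(carb_quality_score(CarbFood("sugary snack", 65, 1, 40, 1)))
```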


2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlations not only provide useful information but also impose significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the feature embedding space is created on the basis of a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), combines label-specific features, label correlation and a weighted ensemble principle into one learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. MULFE then utilizes the label correlations to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
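The first step of MULFE, building features customized to each label by clustering that label's positive and negative instances, can be sketched as follows. This is a rough, LIFT-style illustration of the idea, not the authors' implementation; the clustering ratio and the distance-based mapping are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

def label_specific_features(X, Y, ratio=0.1, seed=0):
    """Build one feature space per label by clustering its positive and
    negative instances and mapping each instance to its distances from the
    cluster centres (a rough sketch of MULFE's first step).

    X : (n_samples, n_features) array; Y : (n_samples, n_labels) binary array.
    Returns a list with one transformed feature matrix per label.
    """
    spaces = []
    for k in range(Y.shape[1]):
        pos, neg = X[Y[:, k] == 1], X[Y[:, k] == 0]
        m = max(1, int(ratio * min(len(pos), len(neg))))
        centers = []
        for part in (pos, neg):
            if len(part) >= m:
                km = KMeans(n_clusters=m, n_init=10, random_state=seed).fit(part)
                centers.append(km.cluster_centers_)
            elif len(part) > 0:
                centers.append(part)            # too few instances: use them directly
        centers = np.vstack(centers)
        spaces.append(cdist(X, centers))        # distance-based label-specific features
    return spaces

# Usage on a small synthetic multi-label problem (3 labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
Y = (rng.random((120, 3)) > 0.6).astype(int)
print([s.shape for s in label_specific_features(X, Y)])
```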


2017 ◽  
Vol 23 (S1) ◽  
pp. 1266-1267 ◽  
Author(s):  
Barbara Armbruster ◽  
Christopher Booth ◽  
Stuart Searle ◽  
Michael Cable ◽  
Ronald Vane

2021 ◽  
pp. 016555152199863
Author(s):  
Ismael Vázquez ◽  
María Novo-Lourés ◽  
Reyes Pavón ◽  
Rosalía Laza ◽  
José Ramón Méndez ◽  
...  

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that the results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following in-demand functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate the introduced functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at https://rdata.4spam.group to facilitate understanding of this study.
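Functionality (3), shipping software that reproduces the pre-processing steps outside the repository, can be illustrated with a small self-contained pipeline such as the sketch below. The steps, parameters and class name are generic examples and do not reflect STRep's actual export format.

```python
import re
from dataclasses import dataclass, field

@dataclass
class PreprocessingPipeline:
    """Self-contained, declarative pre-processing steps so the exact recipe
    can be re-run outside any repository (a generic example, not STRep's
    export format)."""
    lowercase: bool = True
    strip_urls: bool = True
    strip_non_alpha: bool = True
    stopwords: set = field(default_factory=lambda: {"the", "a", "an", "and", "or", "to"})

    def transform(self, text: str) -> list[str]:
        if self.lowercase:
            text = text.lower()
        if self.strip_urls:
            text = re.sub(r"https?://\S+", " ", text)     # drop links
        if self.strip_non_alpha:
            text = re.sub(r"[^a-z\s]", " ", text)          # keep lowercase letters only
        return [t for t in text.split() if t not in self.stopwords]

# Usage: the same declared pipeline yields the same tokens wherever it is run.
pipeline = PreprocessingPipeline()
print(pipeline.transform("WIN a FREE prize!! Visit http://spam.example.com now"))
```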

