NUCOME: A comprehensive database of nucleosome organization referenced landscapes in mammalian genomes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiaolan Chen ◽  
Hui Yang ◽  
Guifen Liu ◽  
Yong Zhang

Abstract
Background: Nucleosome organization is involved in many regulatory activities in various organisms. However, studies integrating nucleosome organization in mammalian genomes are very limited, mainly due to the lack of comprehensive data quality control (QC) assessment and the uneven quality of public data sets.
Results: The NUCOME is a database focused on filtering qualified nucleosome organization referenced landscapes covering various cell types in human and mouse based on QC metrics. The filtering strategy guarantees the quality of the nucleosome organization referenced landscapes and exempts users from redundant data set selection and processing. The NUCOME database provides a standardized, qualified data source and informative nucleosome organization features at the whole-genome scale and at the level of individual loci.
Conclusions: The NUCOME provides a valuable data resource for integrative analyses focused on nucleosome organization. The NUCOME is freely available at http://compbio-zhanglab.org/NUCOME.
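As an illustration of the QC-based filtering idea described above, here is a minimal sketch in Python. The column names ("sample", "cell_type", "qc_score") and the threshold are hypothetical; the actual NUCOME QC metrics and cutoffs are defined by the database itself.

```python
# A minimal sketch of QC-based filtering of nucleosome landscapes, assuming a
# hypothetical metadata table with columns "sample", "cell_type" and "qc_score".
import pandas as pd

def filter_qualified(metadata: pd.DataFrame, qc_threshold: float = 0.8) -> pd.DataFrame:
    """Keep samples meeting the QC threshold, one landscape per cell type."""
    qualified = metadata[metadata["qc_score"] >= qc_threshold]
    # Keep the best-scoring landscape per cell type to avoid redundant data sets.
    return qualified.sort_values("qc_score", ascending=False).drop_duplicates("cell_type")
```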

2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring optimized data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process over the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show good topological ordering and homogeneous clustering.
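A minimal sketch of one training step of a variable-weighted self-organizing map follows, assuming continuous variables are standardized and binary variables are 0/1-coded. The update rule and parameter names are illustrative; the paper's simultaneous weight/prototype learning rule may differ.

```python
# One weighted SOM step: best-matching unit under a variable-weighted distance,
# then a Gaussian-neighborhood prototype update.
import numpy as np

def weighted_som_step(x, prototypes, weights, grid, lr=0.1, sigma=1.0):
    """x: (d,) sample; prototypes: (k, d); weights: (d,) per-variable weights;
    grid: (k, 2) map coordinates of the k cells."""
    # Best-matching unit under the variable-weighted squared distance.
    d2 = ((prototypes - x) ** 2 * weights).sum(axis=1)
    bmu = int(np.argmin(d2))
    # Gaussian neighborhood on the map grid around the BMU.
    g2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    h = np.exp(-g2 / (2.0 * sigma ** 2))
    # Move prototypes toward the sample, scaled by the neighborhood.
    prototypes += lr * h[:, None] * (x - prototypes)
    return bmu, prototypes
```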


Author(s):  
MUSTAPHA LEBBAH ◽  
YOUNÈS BENNANI ◽  
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by a Bernoulli distribution. Each cell is characterized by a prototype with the same binary coding as used in the data space and by the probability of differing from this prototype. The proposed learning algorithm, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show good topological ordering and homogeneous clustering.
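The Bernoulli cell model can be written down compactly: each cell carries a binary prototype w and a probability eps of a component differing from it. The sketch below shows only the per-cell log-likelihood; the EM responsibilities and map updates of the actual algorithm are omitted.

```python
# Log-likelihood of a binary vector under one Bernoulli cell (prototype w,
# mismatch probability eps in (0, 1)); an illustrative fragment only.
import numpy as np

def bernoulli_cell_loglik(x, w, eps):
    """x, w: 0/1 arrays of equal length; eps: probability of a mismatch."""
    mismatches = np.sum(x != w)
    matches = x.size - mismatches
    return mismatches * np.log(eps) + matches * np.log(1.0 - eps)
```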


2021 ◽  
Author(s):  
Rishabh Deo Pandey ◽  
Itu Snigdh

Abstract Data quality became significant with the emergence of data warehouse systems. While accuracy is an intrinsic aspect of data quality, the validity of data presents a wider perspective that is more representational and contextual in nature. In this article we present a different perspective on data collection and collation. We focus on faults experienced in data sets and present validity as a function of allied parameters such as completeness, usability, availability and timeliness for determining data quality. We also analyze the applicability of these metrics and modify them to suit IoT applications. Another major focus of this article is to verify these metrics on aggregated data sets instead of separate data values. This work focuses on using the different validation parameters to determine the quality of data generated in a pervasive environment. The analysis approach presented is simple and can be employed to test the validity of collected data, isolate faults in the data set, and measure the suitability of data before applying analysis algorithms.
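Two of the allied parameters named above, completeness and timeliness, lend themselves to simple aggregate scores. The sketch below assumes a hypothetical table of sensor readings with "value" and "timestamp" columns; the article's exact definitions and any weighting of the parameters may differ.

```python
# Aggregate validity scores over a batch of IoT readings (illustrative only).
import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    """Fraction of non-missing values in the aggregated data set."""
    return 1.0 - df["value"].isna().mean()

def timeliness(df: pd.DataFrame, max_age_s: float, now: pd.Timestamp) -> float:
    """Fraction of readings no older than max_age_s seconds at time `now`."""
    age = (now - pd.to_datetime(df["timestamp"])).dt.total_seconds()
    return float((age <= max_age_s).mean())
```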


2021 ◽  
Vol 35 (6) ◽  
pp. 137-146
Author(s):  
Haeyoon Lee ◽  
Muheon Jeong ◽  
Inseon Park

The purpose of this study is to obtain implications through a comparative analysis of the disaster safety datasets and services of representative public data portals in Korea and Japan. Comparative standards were established first. Then, dataset weight analysis of disaster-type and safety-management-stage components, trend analysis through text mining of dataset descriptions, and analysis of data quality and portal services were performed. As a result, public data sets were fewer in Korea, both numerically and proportionally, than in Japan. In terms of disaster safety management, Japan had a high proportion of disaster preparation and recovery datasets, while Korea had a high proportion of prevention datasets. In addition, in terms of disaster response collaboration, Korea's datasets mostly concern material management and resource support, whereas Japan has a high proportion of datasets on emergency recovery and situation management of damaged facilities. In terms of data quality, Japan has many datasets with a four-star Berners-Lee rating, whereas Korea has a high proportion of datasets with a three-star Berners-Lee rating; however, Korea has a better data format for big-data utilization. Portal services are mainly centered on natural disasters in Japan, but in Korea they are centered on social disasters. The results of this study provide a reference for the future direction of disaster safety public data portals in Korea.
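The data-quality comparison above rests on tallying datasets by their Berners-Lee open-data star level. A minimal sketch of such a tally is shown below; the portal metadata column name and the format-to-star mapping are assumptions following the usual 5-star scheme, not the study's actual coding.

```python
# Count datasets per Berners-Lee star level from a hypothetical "format" column.
import pandas as pd

STAR_BY_FORMAT = {"pdf": 1, "xls": 2, "xlsx": 2, "csv": 3, "xml": 3, "rdf": 4, "json-ld": 5}

def star_distribution(metadata: pd.DataFrame) -> pd.Series:
    """Return the number of datasets at each star level."""
    stars = metadata["format"].str.lower().map(STAR_BY_FORMAT).fillna(1).astype(int)
    return stars.value_counts().sort_index()
```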


2003 ◽  
Vol 36 (3) ◽  
pp. 931-939 ◽  
Author(s):  
Henning Osholm Sørensen ◽  
Sine Larsen

The influence of the different experimental parameters on the quality of the diffraction data collected on tetrafluoroterephthalonitrile (TFT) with a Nonius KappaCCD instrument has been examined. Data sets measured with different scan widths (0.25°, 0.50°, 1.0°) and scan times (70 s/° and 140 s/°) were compared with a highly redundant data set collected with an Enraf–Nonius CAD4 point detector diffractometer. As part of this analysis it was investigated how the parameters employed during the data reduction performed with the EvalCCD and SORTAV programs affect the quality of the data. The KappaCCD data sets did not show any significant contamination from λ/2 radiation and possess good internal consistency with low Rint values. Decreasing the scan width seems to increase the standard uncertainties, which conversely are improved by an increase in the scan time. The suitability of the KappaCCD instrument to measure data to be used in charge density studies was also examined by performing a charge density data collection with the KappaCCD instrument. The same multipole model was used in the refinement of these data and of the CAD4 data. The two refinements gave almost identical parameters and residual electron densities. The topological analysis of the resulting static electron densities shows that the bond critical points have the same characteristics.
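The internal-consistency statistic mentioned above is the standard merging residual Rint = Σ|I − ⟨I⟩| / ΣI taken over groups of symmetry-equivalent reflections. A minimal sketch of that calculation, with an assumed input of (equivalence key, intensity) pairs, is:

```python
# Rint over groups of symmetry-equivalent reflection intensities.
from collections import defaultdict

def r_int(reflections):
    """reflections: iterable of (equivalence_key, intensity) pairs."""
    groups = defaultdict(list)
    for key, intensity in reflections:
        groups[key].append(intensity)
    num = den = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += sum(intensities)
    return num / den if den else 0.0
```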


2014 ◽  
Vol 11 (2) ◽  
Author(s):  
Pavol Král’ ◽  
Lukáš Sobíšek ◽  
Mária Stachová

Data quality can be seen as a very important factor for the validity of information extracted from data sets using statistical or data mining procedures. In this paper we propose a description of data quality that allows us to characterize the data quality of a whole data set, as well as the data quality of particular variables and individual cases. On the basis of the proposed description, we define a distance-based measure of data quality for individual cases as the distance of a case from the ideal one. Such a measure can be used as additional information for the preparation of a training data set, fitting models, decision making based on the results of analyses, etc. It can be utilized in different ways, ranging from a simple weighting function to belief functions.
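A minimal sketch of the distance-based case measure follows, assuming each case already has per-variable quality scores in [0, 1] with 1 meaning ideal; the paper's exact per-variable scoring and aggregation may differ. Smaller distances can then be turned into case weights when fitting models.

```python
# Distance of each case from the all-ones "ideal" case in quality space.
import numpy as np

def case_quality_distance(quality_matrix: np.ndarray) -> np.ndarray:
    """quality_matrix: (n_cases, n_variables) per-variable quality scores in [0, 1]."""
    ideal = np.ones(quality_matrix.shape[1])
    return np.linalg.norm(quality_matrix - ideal, axis=1)
```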


2009 ◽  
Vol 14 (9) ◽  
pp. 1054-1066 ◽  
Author(s):  
Keith A. Houck ◽  
David J. Dix ◽  
Richard S. Judson ◽  
Robert J. Kavlock ◽  
Jian Yang ◽  
...  

The complexity of human biology has made prediction of health effects as a consequence of exposure to environmental chemicals especially challenging. Complex cell systems, such as the Biologically Multiplexed Activity Profiling (BioMAP) primary, human, cell-based disease models, leverage cellular regulatory networks to detect and distinguish chemicals with a broad range of target mechanisms and biological processes relevant to human toxicity. Here the authors use the BioMAP human cell systems to characterize effects relevant to human tissue and inflammatory disease biology following exposure to the 320 environmental chemicals in the Environmental Protection Agency’s (EPA’s) ToxCast phase I library. The ToxCast chemicals were assayed at 4 concentrations in 8 BioMAP cell systems, with a total of 87 assay endpoints resulting in more than 100,000 data points. Within the context of the BioMAP database, ToxCast compounds could be classified based on their ability to cause overt cytotoxicity in primary human cell types or according to toxicity mechanism class derived from comparisons to activity profiles of BioMAP reference compounds. ToxCast chemicals with similarity to inducers of mitochondrial dysfunction, cAMP elevators, inhibitors of tubulin function, inducers of endoplasmic reticulum stress, or NFκB pathway inhibitors were identified based on this BioMAP analysis. This data set is being combined with additional ToxCast data sets for development of predictive toxicity models at the EPA. (Journal of Biomolecular Screening 2009:1054-1066)
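The mechanism-class assignment described above compares each chemical's vector of assay-endpoint responses with reference-compound activity profiles. The sketch below illustrates that idea using Pearson correlation as the similarity; the actual BioMAP similarity metric and thresholds are not specified here.

```python
# Assign a mechanism class by highest correlation with reference profiles.
import numpy as np

def classify_by_reference(profile: np.ndarray, references: dict) -> str:
    """profile: (n_endpoints,) responses; references: class name -> reference profile."""
    best_class, best_r = None, -np.inf
    for name, ref in references.items():
        r = np.corrcoef(profile, ref)[0, 1]
        if r > best_r:
            best_class, best_r = name, r
    return best_class
```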


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose: Current popular image processing technologies based on convolutional neural networks involve heavy computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the high real-time performance and accuracy that industrial applications require under limited computing and storage resources. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve the above problems.
Design/methodology/approach: On the one hand, this study performs multi-dimensional compression of the feature extraction network of YOLOv4 to simplify the model and improves the model's feature extraction ability through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure, which improves detection performance for tiny defects.
Findings: The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, and on a steel ingot data set collected in an actual industrial field. The experimental results demonstrate that the proposed YOLOv4-Defect method greatly improves recognition efficiency and accuracy and reduces the size and computation cost of the model.
Originality/value: This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection, which is conducive to application in various industrial scenarios with limited storage and computing resources and meets requirements for high real-time performance and precision.
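A minimal sketch of the knowledge-distillation idea mentioned in the approach is given below: the compressed student network is trained to match the temperature-softened outputs of the full YOLOv4 teacher. The paper's actual distillation targets (e.g. feature maps or detection heads) and loss weighting may differ.

```python
# Soft-target distillation loss between a teacher and a compressed student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """KL divergence between softened teacher and student class distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)
```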


2018 ◽  
Author(s):  
Brian Hie ◽  
Bryan Bryson ◽  
Bonnie Berger

Abstract Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems1–4 and every cell type in the human body.5 Leveraging this data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist6,7, they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, inspired by algorithms for panorama stitching, which overcomes the limitations of existing methods to enable accurate, heterogeneous scRNA-seq data set integration. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.
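The pairwise matching idea behind panorama-style integration can be illustrated with mutual nearest neighbours: cells in two data sets are linked only when each is among the other's nearest neighbours, so data sets with no cell types in common produce few or no links. The sketch below is an illustration of the concept, not Scanorama's actual implementation or API.

```python
# Mutual nearest-neighbour pairs between two reduced-dimension cell profiles.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mutual_nearest_neighbors(A: np.ndarray, B: np.ndarray, k: int = 5):
    """A: (n_a, d), B: (n_b, d). Returns (i, j) pairs that are mutual k-NNs."""
    nn_ab = NearestNeighbors(n_neighbors=k).fit(B).kneighbors(A, return_distance=False)
    nn_ba = NearestNeighbors(n_neighbors=k).fit(A).kneighbors(B, return_distance=False)
    pairs = set()
    for i, neighbors in enumerate(nn_ab):
        for j in neighbors:
            if i in nn_ba[j]:
                pairs.add((i, int(j)))
    return sorted(pairs)
```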


2017 ◽  
Vol 6 (3) ◽  
pp. 71 ◽  
Author(s):  
Claudio Parente ◽  
Massimiliano Pepe

The purpose of this paper is to investigate the impact of weights in pan-sharpening methods applied to satellite images. Different sets of weights have been considered and compared in the IHS and Brovey methods. The first set assigns the same weight to each band, while the second uses weights obtained from the spectral radiance response; these two sets are the most common in pan-sharpening applications. The third set results from a new method: it computes the first-order moment of inertia of each band, taking the spectral response into account. To test the impact of the weights of the different sets, WorldView-3 satellite images have been considered. In particular, two different scenes (the first an urban landscape, the second a rural landscape) have been investigated. The quality of the pan-sharpened images has been analysed with three quality indexes: root mean square error (RMSE), relative average spectral error (RASE) and Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS).
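Of the three quality indexes named above, ERGAS combines the per-band RMSE values into a single score: ERGAS = 100 · (h/l) · sqrt(mean_k[(RMSE_k / μ_k)²]), where h/l is the ratio of panchromatic to multispectral pixel size and μ_k is the mean of reference band k. A minimal sketch:

```python
# ERGAS quality index for a pan-sharpened image against a reference.
import numpy as np

def ergas(reference: np.ndarray, fused: np.ndarray, ratio: float) -> float:
    """reference, fused: (bands, rows, cols); ratio: pan/MS pixel-size ratio (e.g. 0.25)."""
    bands = reference.shape[0]
    acc = 0.0
    for k in range(bands):
        rmse_k = np.sqrt(np.mean((reference[k] - fused[k]) ** 2))
        acc += (rmse_k / reference[k].mean()) ** 2
    return 100.0 * ratio * np.sqrt(acc / bands)
```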

