Data quality of Whole Genome Bisulfite Sequencing on Illumina platforms

2017 ◽  
Author(s):  
Amanda Raine ◽  
Ulrika Liljedahl ◽  
Jessica Nordlund

Abstract. The powerful HiSeq X sequencers with their patterned flowcell technology and fast turnaround times are instrumental for many large-scale genomic and epigenomic studies. However, assessment of DNA methylation by sodium bisulfite treatment results in sequencing libraries of low diversity, which may impact data quality and yield. In this report we assess the quality of whole genome bisulfite sequencing (WGBS) data generated on the HiSeq X system in comparison with data generated on the HiSeq 2500 system and the newly released NovaSeq system. We report a systematic issue with low basecall quality scores assigned to guanines in the second read of WGBS data when using certain Real Time Analysis (RTA) software versions on the HiSeq X sequencer, reminiscent of an issue previously reported with certain HiSeq 2500 software versions. However, with the HD.3.4.0/RTA 2.7.7 software upgrade for the HiSeq X system, we observed an overall improvement in the quality and yield of the WGBS data generated, which in turn empowers cost-effective and high-quality DNA methylation studies.
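A guanine-specific quality drop of this kind can be spotted with a simple per-nucleotide quality summary of read 2 FASTQ files. The sketch below is illustrative only: the function name and the toy reads are invented, real WGBS QC would use a dedicated tool such as FastQC, and standard Phred+33 quality encoding is assumed.

```python
from collections import defaultdict
from io import StringIO

def mean_quality_by_base(fastq_handle):
    """Mean Phred quality per nucleotide across all reads in a FASTQ stream."""
    totals = defaultdict(int)   # summed Phred scores, keyed by nucleotide
    counts = defaultdict(int)   # number of basecalls, keyed by nucleotide
    while True:
        header = fastq_handle.readline()
        if not header:
            break
        seq = fastq_handle.readline().strip()
        fastq_handle.readline()               # '+' separator line
        qual = fastq_handle.readline().strip()
        for base, q in zip(seq, qual):
            totals[base] += ord(q) - 33       # Phred+33 encoding
            counts[base] += 1
    return {b: totals[b] / counts[b] for b in counts}

# Toy read 2 file in which guanines get very low quality scores
# ('#' = Phred 2, 'I' = Phred 40)
fastq = StringIO("@read1\nGATC\n+\n#III\n@read2\nGGAT\n+\n##II\n")
print(mean_quality_by_base(fastq))  # G averages 2.0, the other bases 40.0
```

A much lower mean for G than for A, C, and T in read 2, as in this toy input, is the signature of the software issue described above.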

2021 ◽  
Vol 23 (06) ◽  
pp. 1011-1018
Author(s):  
Aishrith P Rao ◽  
Raghavendra J C ◽  
Dr. Sowmyarani C N ◽  
Dr. Padmashree T ◽  
...  

With the advancement of technology and the large volume of data produced, processed, and stored, it is becoming increasingly important to maintain the quality of data in a cost-effective and productive manner. The most important aspects of Big Data (BD) are storage, processing, privacy, and analytics. The Big Data community has identified data quality as a critical aspect of BD maturity; quality management should be adopted early in the lifecycle and gradually extended to the other primary processes. Companies rely heavily on, and derive profits from, the huge amounts of data they collect. When data consistency deteriorates, the ramifications are unpredictable and may lead to entirely wrong conclusions. In the context of BD, determining data quality is difficult, but it is essential to uphold data quality before proceeding with any analytics. In this paper, we investigate data quality during the data gathering, preprocessing, data repository, and evaluation/analysis stages of BD processing. Solutions are also suggested based on an elaboration and review of the identified problems.


2020 ◽  
Vol 15 (5) ◽  
pp. 580-585
Author(s):  
Dae-Young Um ◽  
R. Nandi ◽  
Jeong-Hun Yang ◽  
Jin-Soo Kim ◽  
Jong-Woong Kim ◽  
...  

Recently, molybdenum diselenide (MoSe2) has attracted considerable research attention for potential applications in electronic and optoelectronic devices due to its unique properties, including a tunable bandgap, strong photoluminescence, and a large exciton binding energy. However, the synthesis of reproducible, controlled, and large-scale MoSe2 films is still a great challenge. Here, we have investigated the morphology, structure, and crystalline quality of MoSe2 films synthesized by the selenization of Mo metal films. Mo metal films of different thicknesses were deposited at room temperature by direct current sputtering. Subsequently, MoSe2 films were prepared by selenization of the sputtered Mo films at 550 °C for 20 minutes. The obtained MoSe2 films are polycrystalline with a hexagonal crystal structure. The crystalline quality of the MoSe2 films improves with increasing Mo metal film thickness. The MoSe2 films are found to be n-type in nature and reasonably stoichiometric (Mo/Se ratio ∼1:1.9). This study provides an experimental demonstration of an alternative, cost-effective direct synthesis of MoSe2 films on SiO2/Si for semiconductor device applications.


2001 ◽  
Vol 29 (3) ◽  
pp. 311-332 ◽  
Author(s):  
Norma Morrison

In the present climate of limited resources and long waiting lists, it is not surprising that there is more emphasis on making sure that psychological treatments are not only clinically sound but also cost-effective. One solution to this is to provide time-limited, focused interventions such as cognitive therapy. Another obvious solution is to deliver treatment in groups rather than individually. However, what evidence is there that therapy can be delivered as effectively in groups as individually? This review will look at which different formats have been tried, what the advantages and disadvantages of those formats might be, which client groups have been targeted for cognitive-behavioural group therapy (CBGT), and whether a group format in general offers any advantages over individual CBT. Outcome studies and their implications for the use of CBGT are considered. Results suggest that, in most client groups, there is little difference in efficacy between group and individual CBT, although there is some evidence that results for some types of patient can be disappointing in CBGT. It may be that the best compromise in terms of cost-effectiveness between quality of therapy and quantity of patients treated is offered by large-scale psychoeducational didactic group therapy.


2017 ◽  
Author(s):  
Giancarlo Bonora ◽  
Liudmilla Rubbi ◽  
Marco Morselli ◽  
Constantinos Chronis ◽  
Kathrin Plath ◽  
...  

Abstract. Whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for measuring DNA methylation levels on a genome-wide scale (1). Both methods have limitations: WGBS is expensive and prohibitive for most large-scale projects; RRBS only interrogates 6-12% of the CpGs in the human genome (16,19). Here, we introduce methylation-sensitive restriction enzyme bisulfite sequencing (MREBS), which has the reduced sequencing requirements of RRBS but significantly expands the coverage of CpG sites in the genome. We built a multiple regression model that combines the two features of MREBS: the bisulfite conversion ratios of single cytosines (as in WGBS and RRBS) and the number of reads that cover each locus (as in MRE-seq (12)). This combined approach allowed us to estimate differential methylation across 60% of the genome using read count data alone, and where counts were sufficiently high in both samples (about 1.5% of the genome), our estimates were significantly improved by the single-CpG conversion information. We show that differential DNA methylation values based on MREBS data correlate well with those based on WGBS and RRBS. This newly developed technique combines the sequencing cost of RRBS with DNA methylation estimates over a portion of the genome similar to WGBS, making it ideal for large-scale projects on mammalian genomes.
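The two MREBS signals can be combined, in spirit, as follows: where both samples have enough bisulfite coverage at a CpG, the conversion ratios give a direct methylation estimate; elsewhere, MRE-seq-style read counts serve as a proxy, since MRE enzymes cut only unmethylated sites and more reads therefore imply lower methylation. The toy function below sketches that fallback logic; it is not the authors' regression model, and all names and thresholds are invented.

```python
import math

def diff_methylation(cov1, meth1, cov2, meth2, reads1, reads2, min_cov=10):
    """Toy per-locus differential methylation estimate in the spirit of MREBS.

    cov*/meth*: bisulfite coverage and methylated-read counts (WGBS/RRBS-like).
    reads*:     MRE-seq-style read counts; MRE enzymes cut only unmethylated
                CpGs, so a higher count implies lower methylation.
    """
    if cov1 >= min_cov and cov2 >= min_cov:
        # Both samples well covered: use single-CpG conversion ratios directly.
        return meth1 / cov1 - meth2 / cov2
    # Otherwise fall back to read counts alone.  The sign is flipped because
    # high MRE counts mean low methylation; +1 is a pseudocount against log(0).
    return -math.log((reads1 + 1) / (reads2 + 1))

# Well-covered locus: estimate comes from the conversion ratios (0.9 - 0.2)
print(diff_methylation(20, 18, 25, 5, 3, 4))
# Poorly covered locus: estimate falls back to the (sign-flipped) count ratio
print(diff_methylation(2, 1, 3, 1, 10, 2))
```

The actual method fits a regression over both features genome-wide; this sketch only conveys why read counts alone already carry differential methylation signal.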


Author(s):  
Romualdas Vaisvila ◽  
V. K. Chaithanya Ponnaluri ◽  
Zhiyi Sun ◽  
Bradley W. Langhorst ◽  
Lana Saleh ◽  
...  

Abstract. Bisulfite sequencing is widely used to detect 5mC and 5hmC at single-base resolution. However, bisulfite treatment damages DNA, resulting in fragmentation, loss of DNA, and biased sequencing data. To overcome this, we developed Enzymatic Methyl-seq (EM-seq), an enzyme-based approach that uses as little as 100 pg of DNA. EM-seq outperformed bisulfite-converted libraries in all metrics examined, including coverage, duplication, sensitivity, and nucleotide composition. EM-seq libraries displayed an even GC distribution, improved correlation across input amounts, and increased representation of genomic features. These data indicate that EM-seq is more accurate and reliable than whole genome bisulfite sequencing (WGBS).


2013 ◽  
Vol 10 (11) ◽  
pp. 14535-14555
Author(s):  
L. Chen ◽  
Y. Zhong ◽  
G. Wei ◽  
Z. Shen

Abstract. The identification of priority management areas (PMAs) is essential for the control of non-point source (NPS) pollution, especially in a large-scale watershed. However, previous studies have typically focused on small-scale catchments adjacent to specific assessment points; thus, the interactions between multiple river points remain poorly understood. In this study, a multiple-assessment-point PMA (MAP-PMA) framework was proposed by integrating the upstream sources and the downstream transport aspects of NPS pollution. Based on the results, integrating upstream input changes was vital for the final PMA map, especially for downstream areas. Contrary to conventional wisdom, this research recommends that NPS pollutants are best controlled in the upstream high-level PMAs when the aim is to protect the water quality of the entire watershed. The MAP-PMA framework provides a more cost-effective tool for the establishment of conservation practices, especially in a large-scale watershed.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Ehsan Khodadadi ◽  
Leila Fahmideh ◽  
Ehsaneh Khodadadi ◽  
Sounkalo Dao ◽  
Mehdi Yousefi ◽  
...  

DNA methylation is one of the epigenetic modifications that plays a major role in regulating gene expression and, thus, many biological processes and diseases. There are several methods for determining the methylation of DNA samples; however, selecting the most appropriate method for answering a given biological question can be a challenging task. Early DNA methylation methods focused on identifying the methylation state of the examined genes and determining the total amount of 5-methylcytosine. The study of DNA methylation at a genome-wide scale became possible with the use of microarray hybridization technology. New-generation sequencing platforms now allow the preparation of genomic maps of DNA methylation at the single-nucleotide level. This review covers the majority of methods available to date, introducing the most widely used: bisulfite treatment, biological identification, and chemical cleavage, along with their advantages and disadvantages. The techniques are then scrutinized according to their robustness, high-throughput capabilities, and cost.


PLoS ONE ◽  
2018 ◽  
Vol 13 (4) ◽  
pp. e0195972 ◽  
Author(s):  
Amanda Raine ◽  
Ulrika Liljedahl ◽  
Jessica Nordlund

2017 ◽  
Author(s):  
Aaron Taudt ◽  
David Roquis ◽  
Amaryllis Vidalis ◽  
René Wardenaar ◽  
Frank Johannes ◽  
...  

Abstract. Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost-prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage. Here we present METHimpute, a Hidden Markov Model (HMM) based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of all cytosines in the genome regardless of coverage. Application of METHimpute to maize, rice, and Arabidopsis shows that the algorithm infers cytosine-resolution methylomes with high accuracy from coverage as low as 6X when benchmarked against 60X data, making it a cost-effective solution for large-scale studies. Although METHimpute has been extensively tested in plants, it should be broadly applicable to other species.
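The core idea of HMM-based imputation can be sketched with a minimal two-state model: each cytosine is either methylated or unmethylated, neighbouring cytosines tend to share a state, and uncovered positions inherit information from their neighbours through the transition structure. The code below is an illustrative toy, not the METHimpute algorithm; the state space, emission model, and all parameter values are invented for the example.

```python
def posterior_methylated(observations, p_stay=0.9, p_meth=0.9, p_unmeth=0.1):
    """Posterior P(methylated) per cytosine under a toy 2-state HMM.

    observations: list of (methylated_reads, total_reads) per cytosine;
    total_reads == 0 means no coverage, so the state is imputed purely
    from the neighbours via the transition structure.
    """
    states = (0, 1)                      # 0 = unmethylated, 1 = methylated
    emit_p = (p_unmeth, p_meth)          # chance a read supports methylation

    def emission(state, m, n):
        if n == 0:
            return 1.0                   # no reads: emission is uninformative
        p = emit_p[state]
        return (p ** m) * ((1 - p) ** (n - m))   # binomial kernel

    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    T = len(observations)

    # Forward pass (normalised at each position for numerical stability).
    fwd = [[0.0, 0.0] for _ in range(T)]
    for s in states:
        fwd[0][s] = 0.5 * emission(s, *observations[0])
    for t in range(1, T):
        for s in states:
            fwd[t][s] = emission(s, *observations[t]) * sum(
                fwd[t - 1][r] * trans[r][s] for r in states)
        z = sum(fwd[t])
        fwd[t] = [v / z for v in fwd[t]]

    # Backward pass.
    bwd = [[1.0, 1.0] for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(trans[s][r] * emission(r, *observations[t + 1]) * bwd[t + 1][r]
                            for r in states)
        z = sum(bwd[t])
        bwd[t] = [v / z for v in bwd[t]]

    # Combine: P(state = methylated | all data) at each position.
    return [fwd[t][1] * bwd[t][1] /
            (fwd[t][0] * bwd[t][0] + fwd[t][1] * bwd[t][1]) for t in range(T)]

# Middle cytosine has no reads; its status is borrowed from its neighbours.
obs = [(5, 5), (0, 0), (4, 5)]
print([round(p, 3) for p in posterior_methylated(obs)])
```

The uncovered middle position receives a high methylation posterior because both flanking cytosines are strongly methylated; this neighbour-borrowing is what lets an HMM complete a sparsely sequenced methylome.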


2017 ◽  
Author(s):  
Mohamed Reda Bouadjenek ◽  
Karin Verspoor ◽  
Justin Zobel

Abstract. Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues, including errors, discrepancies, redundancies, ambiguities, incompleteness, and inconsistencies with the published literature. In this work, we investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of records that are inconsistent with the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature and then applying query quality predictors. We then carry out an analysis showing that the proposed quality indicators and the quality of the records have a mutual relationship, in which one depends on the other. We propose to represent record-literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using Principal Component Analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we show that 1 record out of 4 is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful signal for identifying suspicious records.
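The idea of treating a record as a query into the literature can be illustrated with a single, deliberately simple consistency indicator: the bag-of-words cosine similarity between a record's description and the text of its linked publication. The paper's 24 indicators are query-quality predictors rather than this toy measure, and the names and example texts below are invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two short texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented GenBank-style definition line and two candidate paper titles.
record    = "homo sapiens insulin receptor mRNA complete cds"
linked    = "cloning of the human insulin receptor mRNA sequence"
unrelated = "soil bacterial community diversity in arctic tundra"

# A record should resemble its own literature more than unrelated text;
# a low score on an indicator like this flags the record for curation.
print(cosine_similarity(record, linked) > cosine_similarity(record, unrelated))  # True
```

A vector of many such indicators per record, projected with PCA, is what lets inconsistent records cluster visibly as described above.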

