Data quality of Whole Genome Bisulfite Sequencing on Illumina platforms

2017 ◽  
Author(s):  
Amanda Raine ◽  
Ulrika Liljedahl ◽  
Jessica Nordlund

Abstract. The powerful HiSeq X sequencers with their patterned flowcell technology and fast turnaround times are instrumental for many large-scale genomic and epigenomic studies. However, assessment of DNA methylation by sodium bisulfite treatment results in sequencing libraries of low diversity, which may impact data quality and yield. In this report we assess the quality of whole genome bisulfite sequencing (WGBS) data generated on the HiSeq X system in comparison with data generated on the HiSeq 2500 system and the newly released NovaSeq system. We report a systematic issue with low basecall quality scores assigned to guanines in the second read of WGBS data when using certain Real Time Analysis (RTA) software versions on the HiSeq X sequencer, reminiscent of an issue previously reported with certain HiSeq 2500 software versions. However, with the HD.3.4.0/RTA 2.7.7 software upgrade for the HiSeq X system, we observed an overall improvement in the quality and yield of the WGBS data generated, which in turn empowers cost-effective and high-quality DNA methylation studies.
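A guanine-specific quality drop of this kind can be spotted with a simple per-nucleotide quality summary of read 2 FASTQ files. The sketch below is illustrative only: the function name and the toy reads are invented, real WGBS QC would use a dedicated tool such as FastQC, and standard Phred+33 quality encoding is assumed.

```python
from collections import defaultdict
from io import StringIO

def mean_quality_by_base(fastq_handle):
    """Mean Phred quality per nucleotide across all reads in a FASTQ stream."""
    totals = defaultdict(int)   # summed Phred scores, keyed by nucleotide
    counts = defaultdict(int)   # number of basecalls, keyed by nucleotide
    while True:
        header = fastq_handle.readline()
        if not header:
            break
        seq = fastq_handle.readline().strip()
        fastq_handle.readline()               # '+' separator line
        qual = fastq_handle.readline().strip()
        for base, q in zip(seq, qual):
            totals[base] += ord(q) - 33       # Phred+33 encoding
            counts[base] += 1
    return {b: totals[b] / counts[b] for b in counts}

# Toy read 2 file in which guanines get very low quality scores
# ('#' = Phred 2, 'I' = Phred 40)
fastq = StringIO("@read1\nGATC\n+\n#III\n@read2\nGGAT\n+\n##II\n")
print(mean_quality_by_base(fastq))  # G averages 2.0, the other bases 40.0
```

A much lower mean for G than for A, C, and T in read 2, as in this toy input, is the signature of the software issue described above.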

2021 ◽  
Vol 23 (06) ◽  
pp. 1011-1018
Author(s):  
Aishrith P Rao ◽  
Raghavendra J C ◽  
Dr. Sowmyarani C N ◽  
Dr. Padmashree T ◽  
...  

With the advancement of technology and the large volume of data produced, processed, and stored, it is becoming increasingly important to maintain the quality of data in a cost-effective and productive manner. The most important aspects of Big Data (BD) are storage, processing, privacy, and analytics. The Big Data community has identified data quality as a critical aspect of BD maturity; quality management should be adopted early in the lifecycle and gradually extended to the other primary processes. Companies rely heavily on, and derive profits from, the huge amounts of data they collect. When data consistency deteriorates, the ramifications are unpredictable and may lead to entirely wrong conclusions. In the context of BD, determining data quality is difficult, but it is essential to uphold data quality before proceeding with any analytics. In this paper, we investigate data quality during the data gathering, preprocessing, data repository, and evaluation/analysis stages of BD processing. Solutions are also suggested based on an elaboration and review of the identified problems.


2020 ◽  
Vol 15 (5) ◽  
pp. 580-585
Author(s):  
Dae-Young Um ◽  
R. Nandi ◽  
Jeong-Hun Yang ◽  
Jin-Soo Kim ◽  
Jong-Woong Kim ◽  
...  

Recently, molybdenum diselenide (MoSe2) has attracted considerable research attention for potential applications in electronic and optoelectronic devices due to its unique properties, including a tunable bandgap, strong photoluminescence, and a large exciton binding energy. However, the synthesis of reproducible, controlled, and large-scale MoSe2 films is still a great challenge. Here, we have investigated the morphology, structure, and crystalline quality of MoSe2 films synthesized by the selenization of Mo metal films. Mo metal films of different thicknesses were deposited at room temperature by direct current sputtering. Subsequently, MoSe2 films were prepared by selenization of the sputtered Mo films at 550 °C for 20 minutes. The obtained MoSe2 films are polycrystalline with a hexagonal crystal structure. The crystalline quality of the MoSe2 films improves with increasing Mo metal film thickness. The MoSe2 films are found to be n-type in nature and reasonably stoichiometric (Mo/Se ratio ∼1:1.9). This study provides an experimental demonstration of an alternative, cost-effective direct synthesis of MoSe2 films on SiO2/Si for semiconductor device applications.


2001 ◽  
Vol 29 (3) ◽  
pp. 311-332 ◽  
Author(s):  
Norma Morrison

In the present climate of limited resources and long waiting lists, it is not surprising that there is more emphasis on making sure that psychological treatments are not only clinically sound but also cost-effective. One solution to this is to provide time-limited, focused interventions such as cognitive therapy. Another obvious solution is to deliver treatment in groups rather than individually. However, what evidence is there that therapy can be delivered as effectively in groups as individually? This review will look at which different formats have been tried, what the advantages and disadvantages of those formats might be, which client groups have been targeted for cognitive-behavioural group therapy (CBGT), and whether a group format in general offers any advantages over individual CBT. Outcome studies and their implications for the use of CBGT are considered. Results suggest that, in most client groups, there is little difference in efficacy between group and individual CBT, although there is some evidence that results for some types of patient can be disappointing in CBGT. It may be that the best compromise in terms of cost-effectiveness between quality of therapy and quantity of patients treated is offered by large-scale psychoeducational didactic group therapy.


2017 ◽  
Author(s):  
Giancarlo Bonora ◽  
Liudmilla Rubbi ◽  
Marco Morselli ◽  
Constantinos Chronis ◽  
Kathrin Plath ◽  
...  

Abstract. Whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for measuring DNA methylation levels on a genome-wide scale (1). Both methods have limitations: WGBS is expensive and prohibitive for most large-scale projects; RRBS only interrogates 6-12% of the CpGs in the human genome (16,19). Here, we introduce methylation-sensitive restriction enzyme bisulfite sequencing (MREBS), which has the reduced sequencing requirements of RRBS but significantly expands the coverage of CpG sites in the genome. We built a multiple regression model that combines the two features of MREBS: the bisulfite conversion ratios of single cytosines (as in WGBS and RRBS) and the number of reads that cover each locus (as in MRE-seq (12)). This combined approach allowed us to estimate differential methylation across 60% of the genome using read count data alone, and where counts were sufficiently high in both samples (about 1.5% of the genome), our estimates were significantly improved by the single-CpG conversion information. We show that differential DNA methylation values based on MREBS data correlate well with those based on WGBS and RRBS. This newly developed technique combines the sequencing cost of RRBS with DNA methylation estimates over a portion of the genome similar to WGBS, making it ideal for large-scale projects on mammalian genomes.
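The two MREBS signals can be combined, in spirit, as follows: where both samples have enough bisulfite coverage at a CpG, the conversion ratios give a direct methylation estimate; elsewhere, MRE-seq-style read counts serve as a proxy, since MRE enzymes cut only unmethylated sites and more reads therefore imply lower methylation. The toy function below sketches that fallback logic; it is not the authors' regression model, and all names and thresholds are invented.

```python
import math

def diff_methylation(cov1, meth1, cov2, meth2, reads1, reads2, min_cov=10):
    """Toy per-locus differential methylation estimate in the spirit of MREBS.

    cov*/meth*: bisulfite coverage and methylated-read counts (WGBS/RRBS-like).
    reads*:     MRE-seq-style read counts; MRE enzymes cut only unmethylated
                CpGs, so a higher count implies lower methylation.
    """
    if cov1 >= min_cov and cov2 >= min_cov:
        # Both samples well covered: use single-CpG conversion ratios directly.
        return meth1 / cov1 - meth2 / cov2
    # Otherwise fall back to read counts alone.  The sign is flipped because
    # high MRE counts mean low methylation; +1 is a pseudocount against log(0).
    return -math.log((reads1 + 1) / (reads2 + 1))

# Well-covered locus: estimate comes from the conversion ratios (0.9 - 0.2)
print(diff_methylation(20, 18, 25, 5, 3, 4))
# Poorly covered locus: estimate falls back to the (sign-flipped) count ratio
print(diff_methylation(2, 1, 3, 1, 10, 2))
```

The actual method fits a regression over both features genome-wide; this sketch only conveys why read counts alone already carry differential methylation signal.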


Author(s):  
Romualdas Vaisvila ◽  
V. K. Chaithanya Ponnaluri ◽  
Zhiyi Sun ◽  
Bradley W. Langhorst ◽  
Lana Saleh ◽  
...  

Abstract. Bisulfite sequencing is widely used to detect 5mC and 5hmC at single-base resolution. However, bisulfite treatment damages DNA, resulting in fragmentation, loss of DNA, and biased sequencing data. To overcome this, we developed Enzymatic Methyl-seq (EM-seq), an enzyme-based approach that uses as little as 100 pg of DNA. EM-seq outperformed bisulfite-converted libraries in all metrics examined, including coverage, duplication, sensitivity, and nucleotide composition. EM-seq libraries displayed an even GC distribution, improved correlation across input amounts, and increased representation of genomic features. These data indicate that EM-seq is more accurate and reliable than whole genome bisulfite sequencing (WGBS).


2013 ◽  
Vol 10 (11) ◽  
pp. 14535-14555
Author(s):  
L. Chen ◽  
Y. Zhong ◽  
G. Wei ◽  
Z. Shen

Abstract. The identification of priority management areas (PMAs) is essential for the control of non-point source (NPS) pollution, especially in a large-scale watershed. However, previous studies have typically focused on small-scale catchments adjacent to specific assessment points; thus, the interactions between multiple river points remain poorly understood. In this study, a multiple-assessment-point PMA (MAP-PMA) framework was proposed by integrating the upstream sources and the downstream transport aspects of NPS pollution. Based on the results, integrating upstream input changes was vital for the final PMA map, especially for downstream areas. Contrary to conventional wisdom, this research recommends that NPS pollutants are best controlled in the upstream high-level PMAs when the aim is to protect the water quality of the entire watershed. The MAP-PMA framework provides a more cost-effective tool for the establishment of conservation practices, especially in a large-scale watershed.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Ehsan Khodadadi ◽  
Leila Fahmideh ◽  
Ehsaneh Khodadadi ◽  
Sounkalo Dao ◽  
Mehdi Yousefi ◽  
...  

DNA methylation is one of the epigenetic modifications that plays a major role in regulating gene expression and, thus, many biological processes and diseases. There are several methods for determining the methylation of DNA samples; however, selecting the most appropriate method for answering a given biological question can be a challenging task. Early DNA methylation methods focused on identifying the methylation state of the examined genes and determining the total amount of 5-methylcytosine. The study of DNA methylation at a genome-wide scale became possible with the use of microarray hybridization technology. New-generation sequencing platforms now allow the preparation of genomic maps of DNA methylation at the single-nucleotide level. This review covers the majority of methods available to date, introducing the most widely used: bisulfite treatment, biological identification, and chemical cleavage, along with their advantages and disadvantages. The techniques are then scrutinized according to their robustness, high-throughput capabilities, and cost.


PLoS ONE ◽  
2018 ◽  
Vol 13 (4) ◽  
pp. e0195972 ◽  
Author(s):  
Amanda Raine ◽  
Ulrika Liljedahl ◽  
Jessica Nordlund

2017 ◽  
Author(s):  
Aaron Taudt ◽  
David Roquis ◽  
Amaryllis Vidalis ◽  
René Wardenaar ◽  
Frank Johannes ◽  
...  

Abstract. Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost-prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage. Here we present METHimpute, a Hidden Markov Model (HMM) based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of all cytosines in the genome regardless of coverage. Application of METHimpute to maize, rice, and Arabidopsis shows that the algorithm infers cytosine-resolution methylomes with high accuracy from coverage as low as 6X when benchmarked against 60X data, making it a cost-effective solution for large-scale studies. Although METHimpute has been extensively tested in plants, it should be broadly applicable to other species.
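The core idea of HMM-based imputation can be sketched with a minimal two-state model: each cytosine is either methylated or unmethylated, neighbouring cytosines tend to share a state, and uncovered positions inherit information from their neighbours through the transition structure. The code below is an illustrative toy, not the METHimpute algorithm; the state space, emission model, and all parameter values are invented for the example.

```python
def posterior_methylated(observations, p_stay=0.9, p_meth=0.9, p_unmeth=0.1):
    """Posterior P(methylated) per cytosine under a toy 2-state HMM.

    observations: list of (methylated_reads, total_reads) per cytosine;
    total_reads == 0 means no coverage, so the state is imputed purely
    from the neighbours via the transition structure.
    """
    states = (0, 1)                      # 0 = unmethylated, 1 = methylated
    emit_p = (p_unmeth, p_meth)          # chance a read supports methylation

    def emission(state, m, n):
        if n == 0:
            return 1.0                   # no reads: emission is uninformative
        p = emit_p[state]
        return (p ** m) * ((1 - p) ** (n - m))   # binomial kernel

    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    T = len(observations)

    # Forward pass (normalised at each position for numerical stability).
    fwd = [[0.0, 0.0] for _ in range(T)]
    for s in states:
        fwd[0][s] = 0.5 * emission(s, *observations[0])
    for t in range(1, T):
        for s in states:
            fwd[t][s] = emission(s, *observations[t]) * sum(
                fwd[t - 1][r] * trans[r][s] for r in states)
        z = sum(fwd[t])
        fwd[t] = [v / z for v in fwd[t]]

    # Backward pass.
    bwd = [[1.0, 1.0] for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(trans[s][r] * emission(r, *observations[t + 1]) * bwd[t + 1][r]
                            for r in states)
        z = sum(bwd[t])
        bwd[t] = [v / z for v in bwd[t]]

    # Combine: P(state = methylated | all data) at each position.
    return [fwd[t][1] * bwd[t][1] /
            (fwd[t][0] * bwd[t][0] + fwd[t][1] * bwd[t][1]) for t in range(T)]

# Middle cytosine has no reads; its status is borrowed from its neighbours.
obs = [(5, 5), (0, 0), (4, 5)]
print([round(p, 3) for p in posterior_methylated(obs)])
```

The uncovered middle position receives a high methylation posterior because both flanking cytosines are strongly methylated; this neighbour-borrowing is what lets an HMM complete a sparsely sequenced methylome.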


2017 ◽  
Author(s):  
Mohamed Reda Bouadjenek ◽  
Karin Verspoor ◽  
Justin Zobel

Abstract. Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues, including errors, discrepancies, redundancies, ambiguities, incompleteness, and inconsistencies with the published literature. In this work, we investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of records that are inconsistent with the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature and then applying query quality predictors. We then carry out an analysis showing that the proposed quality indicators and the quality of the records have a mutual relationship, in which one depends on the other. We propose to represent record-literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using Principal Component Analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we show that 1 record out of 4 is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful signal for identifying suspicious records.
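The idea of treating a record as a query into the literature can be illustrated with a single, deliberately simple consistency indicator: the bag-of-words cosine similarity between a record's description and the text of its linked publication. The paper's 24 indicators are query-quality predictors rather than this toy measure, and the names and example texts below are invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two short texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented GenBank-style definition line and two candidate paper titles.
record    = "homo sapiens insulin receptor mRNA complete cds"
linked    = "cloning of the human insulin receptor mRNA sequence"
unrelated = "soil bacterial community diversity in arctic tundra"

# A record should resemble its own literature more than unrelated text;
# a low score on an indicator like this flags the record for curation.
print(cosine_similarity(record, linked) > cosine_similarity(record, unrelated))  # True
```

A vector of many such indicators per record, projected with PCA, is what lets inconsistent records cluster visibly as described above.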

