custom script
Recently Published Documents



2021 ◽  
Author(s):  
Michal Franek ◽  
Agata Kilar ◽  
Petr Fojtík ◽  
Marie Olšinová ◽  
Aleš Benda ◽  
...  

Analysis of histone variants and epigenetic marks is dominated by genome-wide approaches in the form of chromatin immunoprecipitation-sequencing (ChIP-seq) and related methods. While their value for single-copy genes is uncontested, mapping the chromatin of DNA repeats is problematic for biochemical techniques that average over cell populations or over the high number of repeats in a single cell. Extending chromatin and DNA fibers allows us to study the epigenetics of individual repeats in their specific chromosomal context and thus constitutes an important tool for a holistic understanding of the epigenetic organization of genomes. We show that an optimized fiber extension protocol is essential to obtain more reproducible data with minimal clustering of fibers. We also demonstrate that super-resolution microscopy is important to reliably evaluate the distribution of histone modifications on individual fibers. Furthermore, we introduce a custom script to analyse methylation levels on DNA fibers and apply it to map the methylation of telomeres, ribosomal genes and centromeres.


2021 ◽  
Vol 9 ◽  
Author(s):  
Emily D. Fountain ◽  
Li-Chen Zhou ◽  
Alyssa Karklus ◽  
Qun-Xiu Liu ◽  
James Meyers ◽  
...  

Microarrays can be a cost-effective alternative to high-throughput sequencing for discovering novel single-nucleotide polymorphisms (SNPs). Illumina’s iScan platform dominates the market, but its commercial microarray products are designed for model organisms. Further, the platform outputs data in a proprietary format that cannot be easily converted to human-readable genotypes or merged with pre-existing data. To address this, we present and validate a novel pipeline to facilitate data analysis from cross-species application of Illumina microarrays. The pipeline generates a compatible VCF from iScan data and merges it with a second VCF comprising genotypes derived from other samples and sources. Our pipeline includes a custom script, iScanVCFMerge (presented as a Python package), which we validate using iScan data from three great ape genera. We conclude that cross-species application of microarrays can be a rapid, cost-effective approach for SNP discovery in non-model organisms. Our pipeline surmounts the common challenges of integrating iScan genotypes with pre-existing data.


GigaScience ◽  
2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Holly C Beale ◽  
Jacquelyn M Roger ◽  
Matthew A Cattle ◽  
Liam T McKay ◽  
Drew K A Thompson ◽  
...  

Abstract Background The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. Findings In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]). Conclusions Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
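The nested denominators in the abstract above (unmapped reads as a share of all reads, duplicates as a share of mapped reads, non-exonic reads as a share of mapped, non-duplicate reads) can be combined into a MEND fraction with simple arithmetic. The following Python sketch illustrates that arithmetic only. It is not the authors' custom script, and the function names `mend_fraction` and `mend_depth` are hypothetical.

```python
def mend_fraction(unmapped_frac, duplicate_frac, non_exonic_frac):
    """Fraction of total reads that are Mapped, Exonic and Non-Duplicate.

    Assumed denominators, following the abstract:
      unmapped_frac   -- share of all reads
      duplicate_frac  -- share of mapped reads
      non_exonic_frac -- share of mapped, non-duplicate reads
    """
    for name, value in (
        ("unmapped_frac", unmapped_frac),
        ("duplicate_frac", duplicate_frac),
        ("non_exonic_frac", non_exonic_frac),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    mapped = 1.0 - unmapped_frac                      # share of all reads
    mapped_non_dup = mapped * (1.0 - duplicate_frac)  # share of all reads
    return mapped_non_dup * (1.0 - non_exonic_frac)   # MEND share of all reads


def mend_depth(total_reads, unmapped_frac, duplicate_frac, non_exonic_frac):
    """Sequencing depth expressed in MEND reads rather than total reads."""
    return total_reads * mend_fraction(unmapped_frac, duplicate_frac, non_exonic_frac)
```

For an illustrative sample with 3% unmapped, 27% duplicate and 25% non-exonic reads, `mend_fraction(0.03, 0.27, 0.25)` gives about 0.53, so roughly half of the sequenced reads would count toward depth in MEND units.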


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0245172
Author(s):  
Meghan Maguire ◽  
Julie A. Kase ◽  
Dwayne Roberson ◽  
Tim Muruvanda ◽  
Eric W. Brown ◽  
...  

Shiga toxin-producing Escherichia coli (STEC) contamination of agricultural water might be an important factor in recent foodborne illness and outbreaks involving leafy greens. Closed bacterial genomes from whole genome sequencing play an important role in source tracking. We aimed to determine the limits of detection and classification of STECs by qPCR and nanopore sequencing using 24-hour enriched irrigation water artificially contaminated with E. coli O157:H7 (EDL933). We determined the limit of STEC detection by qPCR to be 30 CFU/reaction, which is equivalent to 10⁵ CFU/ml in the enrichment. By using Oxford Nanopore’s EPI2ME WIMP workflow and de novo assembly with Flye followed by taxon classification with a k-mer analysis software (Kraken2), E. coli O157:H7 could be detected at 10³ CFU/ml (68 reads) and a complete, fragmented E. coli O157:H7 metagenome-assembled genome (MAG) was obtained at 10⁵–10⁸ CFU/ml. Using a custom script to extract the E. coli reads, a completely closed MAG was obtained at 10⁷–10⁸ CFU/ml and a complete, fragmented MAG was obtained at 10⁵–10⁶ CFU/ml. In silico virulence detection for E. coli MAGs for 10⁵–10⁸ CFU/ml showed that the virulotype was indistinguishable from the spiked E. coli O157:H7 strain. We further identified the bacterial species in the un-spiked enrichment, including antimicrobial resistance genes, which could have important implications for food safety. We propose this workflow provides proof of concept for faster detection and complete genomic characterization of STECs from a complex microbial sample compared to current reporting protocols and could be applied to determine the limit of detection and assembly of other foodborne bacterial pathogens.


2020 ◽  
Vol 145 (6) ◽  
pp. 363-373
Author(s):  
Anna Underhill ◽  
Cory Hirsch ◽  
Matthew Clark

Grape (Vitis vinifera) cluster compactness is an important trait due to its effect on disease susceptibility, but visual evaluation of compactness relies on human judgement and an ordinal scale that is not appropriate for all populations. We developed an image analysis pipeline and used it to quantify cluster compactness traits in a segregating hybrid wine grape (Vitis sp.) population for 2 years. Images were collected from grape clusters immediately after harvest, segmented by color, and analyzed using a custom script. Both automated and conventional phenotyping methods were used, and the two methods were compared. A partial least squares (PLS) model was constructed to evaluate the prediction of physical cluster compactness using image-derived measurements. Quantitative trait loci (QTL) on chromosomes 4, 9, 12, 16, and 17 were associated with both image-derived and conventionally phenotyped traits within years, which demonstrated the ability of image-derived traits to identify loci related to cluster morphology and cluster compactness. QTL for 20-berry weight were observed between years on chromosomes 11 and 17. Additionally, the automated method of cluster length measurement was highly accurate, with a deviation of less than 10 mm (r = 0.95) compared with measurements obtained with a hand caliper. A remaining challenge is the use of color-based image segmentation in a population that segregates for fruit color, which makes it difficult to differentiate the stem from the fruit when the two are similarly colored in non-noir fruit. Overall, this research demonstrates the validity of image-based phenotyping for quantifying cluster compactness and for identifying QTL for the advancement of grape breeding efforts.


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1158
Author(s):  
Jair Antonio Tenorio Castaño ◽  
Ignacio Hernández-Gonzalez ◽  
Natalia Gallego ◽  
Carmen Pérez-Olivares ◽  
Nuria Ochoa Parra ◽  
...  

Pulmonary arterial hypertension (PAH) is a very infrequent disease with variable etiology and clinical expressivity, which sometimes makes clinical diagnosis a challenge. Current classification based on clinical features does not reflect the underlying molecular profiling of these groups. Advances in massively parallel sequencing in PAH have allowed the description of several new causative and susceptibility genes related to PAH, improving overall patient diagnosis. In order to address the molecular diagnosis of patients with PAH we designed, validated, and routinely applied a custom panel including 21 genes. Three hundred patients from the National Spanish PAH Registry (REHAP) were included in the analysis. A custom script was developed to annotate and filter the variants. Variant classification was performed according to the ACMG guidelines. Pathogenic and likely pathogenic variants were found in 15% of the patients, with 12% of variants of unknown significance (VUS). We found variants in patients with connective tissue disease (CTD) and congenital heart disease (CHD). In addition, in a small proportion of patients (1.75%), we observed a possible digenic mode of inheritance. These results highlight the importance of genetic testing of patients with associated forms of PAH (i.e., CHD and CTD) in addition to the classical IPAH and HPAH forms. Molecular confirmation of the presumptive clinical diagnosis is required in cases with high clinical overlap in order to carry out proper management and follow-up of individuals with the disease.


2020 ◽  
Author(s):  
András Zlinszky ◽  
Gergely Padányi-Gulyás

Sampling-based water quality monitoring networks are inherently spatially sparse. In locations or times where no in-situ water quality data are available, satellite imagery is an essential source of information. Satellite remote sensing can provide high spatial or temporal resolution imagery and has provided a breakthrough for oceanography, but so far, applications for coastal and inland water have been limited by data resolution. Recently established satellite systems provide significant advances: Sentinel-2 delivers imagery with 20 m resolution, suitable for viewing even small rivers and ponds. Sentinel-3 delivers daily imagery with 300 m pixel size, which for lakes and coastal seas allows tracking water quality processes at the speed they happen. Information on suspended sediment and chlorophyll concentrations in water can be derived from optical images using simple calculations. The accuracy of these operations will vary across locations and can only be assessed through calibration and validation with in situ data. In the absence of such data for all lakes globally, the Ulyssys Water Quality Viewer (UWQV) is based on a small set of algorithms that have been verified on several optically complex water systems to have a close to linear correlation with chlorophyll or suspended sediment concentration. Suspended sediment visualization is based on radiances observed in the 620 or 700 nm spectral bands, while chlorophyll visualization uses fluorescence-based indicators: Fluorescence Line Height, Reflectance Line Height and Maximum Chlorophyll Index. Since remote sensing based chlorophyll retrieval in sediment-laden waters with low transparency is hardly possible, chlorophyll concentrations are not visualized in such cases. The viewer runs as a Custom Script in the Sentinel-Hub EO Browser, which is a global, near real-time satellite data viewing and algorithm testing framework. The JavaScript code is open source and enables users to easily tune visualization parameters and select different algorithms for cloud and water masking and chlorophyll and suspended sediment visualization.
Wherever in-situ water quality measurements are available, UWQV contributes significant added value by complementing water sample or instrument-based data, providing a map view or even a timelapse of maps; by providing an early warning system for water quality deterioration; by supporting optimization of sampling times and locations based on spatially and temporally explicit information; and by enabling cross-validation of water quality information from different sources to reduce uncertainty or identify implausible measurements. Additionally, data-driven spatially explicit models can be verified and tuned based on similarity of their output to situations observed on satellite imagery.
UWQV has all the advantages and drawbacks of a global solution: it will never be more accurate than a locally tuned water quality remote sensing algorithm; however, we hope that it will encourage water quality authorities and stakeholders to initiate the development of locally optimized satellite-based monitoring. By providing easy to read visualizations in a framework accessible to the general public, UWQV can democratize water quality information and raise public awareness of water quality processes and problems.
The first version of the algorithm is available in the Sentinel-Hub Custom Script Repository under the following link: https://github.com/sentinel-hub/custom-scripts/tree/master/sentinel-2/ulyssys_water_quality_viewer
An interactive test example of the visualization can be accessed here: tinyurl.com/UWQV-example
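The line-height indicators named in this abstract (Fluorescence Line Height, Reflectance Line Height, Maximum Chlorophyll Index) share one general form: the signal at a central band minus a linear baseline drawn between two neighbouring bands. The Python sketch below shows that general form only. It is not the UWQV JavaScript code, the function name `line_height` is hypothetical, and the band wavelengths are supplied by the caller because the abstract does not state which bands each indicator uses.

```python
def line_height(r_left, r_peak, r_right, wl_left, wl_peak, wl_right):
    """Height of the central band above a linear baseline.

    r_*  -- radiance or reflectance in the left, central and right bands
    wl_* -- the corresponding band wavelengths (nm), strictly increasing
    """
    if not wl_left < wl_peak < wl_right:
        raise ValueError("wavelengths must satisfy wl_left < wl_peak < wl_right")

    # Position of the central band between its neighbours, from 0 to 1.
    weight = (wl_peak - wl_left) / (wl_right - wl_left)

    # Baseline value linearly interpolated at the central wavelength.
    baseline = r_left + (r_right - r_left) * weight

    return r_peak - baseline
```

A positive result means the central band rises above the baseline, which is the kind of peak these indicators associate with chlorophyll. Turning a line height into a concentration still needs the local calibration the abstract describes.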


Author(s):  
Y. X. Lin ◽  
S. T. Wang

Abstract. The display and recognition of geographical features based on KML and Google Earth/Google Maps provide possibilities for the visualization and analysis of earthquake disasters. In this paper, point and polygon KML files of the disaster information of Guangxi Zhuang Autonomous Region in January are compiled and generated through a Visual Basic custom script. The KML files enable information display, query and analysis in Google Earth and are combined with the topography, elevation and image information of feature points from ArcGIS for multi-source analysis, supporting the interpretation and analysis of the current earthquake situation, which can provide reference information for the monitoring and evaluation of earthquakes.
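To make the KML generation step concrete, here is a minimal sketch of writing one point feature as a KML Placemark. The paper uses a Visual Basic script. This sketch swaps in Python with the standard-library `xml.etree.ElementTree` module, and the function name `point_kml` and any coordinates passed to it are hypothetical, not taken from the paper.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # KML 2.2 namespace


def point_kml(name, lon, lat, description=""):
    """Return a KML document string containing a single point Placemark."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    placemark = ET.SubElement(document, "Placemark")
    ET.SubElement(placemark, "name").text = name
    ET.SubElement(placemark, "description").text = description
    point = ET.SubElement(placemark, "Point")
    # KML orders coordinates as longitude,latitude,altitude.
    ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

Writing the returned string to a `.kml` file should give a file that Google Earth can open. ElementTree escapes special characters in the name and description, which hand-built string concatenation would not do.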


Author(s):  
András Zlinszky ◽  
Gergely Padányi-Gulyás

Easy to use satellite-based water quality visualizations are needed for monitoring and understanding coastal and inland waters, but to date, no publicly accessible real-time global visualization system has been in place. Here we introduce the Ulyssys Water Quality Viewer (UWQV), a Sentinel Hub EO Browser custom script designed for qualitative views of aquatic chlorophyll and suspended sediment concentrations. The viewer avoids unmixing of the chlorophyll and suspended sediment spectral signal by visualizing these parameters together, with high concentrations of suspended sediment obscuring chlorophyll if present. Cloud masking uses the Hollstein and Braaten algorithms (existing EO Browser custom script code); additionally, water surfaces are masked using the Normalized Differential Water Index. Chlorophyll is estimated using reflectance line height-based indicators such as fluorescence line height and maximum chlorophyll index. Suspended sediment is visualized based on single-band reflectances at 620 or 700 nm. Data sources are Sentinel-2 and Sentinel-3 images, allowing either 20 m spatial resolution or up to daily imaging. This visualization system is easy to operate and interpret, and combined with the data service capacity of the Sentinel Hub, it is expected that UWQV will contribute to monitoring of remote water bodies and to our overall understanding of physical limnology and aquatic ecology.
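The water-masking step can be sketched as follows, assuming the common green/near-infrared form of the water index (usually abbreviated NDWI) and a threshold of zero. The abstract specifies neither the bands nor the threshold, so both are assumptions here, and the function names `ndwi` and `is_water` are hypothetical. This is Python for illustration, not the viewer's own JavaScript.

```python
def ndwi(green, nir):
    """Water index from green and near-infrared reflectance.

    Assumes the green/NIR formulation: (green - nir) / (green + nir).
    """
    total = green + nir
    if total == 0:
        return 0.0  # no signal in either band; treat as neutral
    return (green - nir) / total


def is_water(green, nir, threshold=0.0):
    """Classify a pixel as water when the index exceeds the threshold.

    The default threshold of 0.0 is an assumed value that would need
    local tuning.
    """
    return ndwi(green, nir) > threshold
```

Water tends to reflect more green than near-infrared light, so open water usually gives a positive index while vegetation and bare soil give a negative one.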


2019 ◽  
Author(s):  
Holly C. Beale ◽  
Jacquelyn M. Roger ◽  
Matthew A. Cattle ◽  
Liam T. McKay ◽  
Drew K. A. Thomson ◽  
...  

Abstract Background The accuracy of gene expression as measured by RNA sequencing (RNA-Seq) is dependent on the amount of sequencing performed. However, some types of reads are not informative for determining this accuracy. Unmapped and non-exonic reads do not contribute to gene expression quantification. Duplicate reads can be the product of high gene expression or technical errors. Findings We surveyed bulk RNA-Seq datasets from 2179 tumors in 48 cohorts to determine the fractions of uninformative reads. Total sequence depth was 0.2-668 million reads (median (med.) 61 million; interquartile range (IQR) 53 million). Unmapped reads constitute 1-77% of all reads (med. 3%; IQR 3%); duplicate reads constitute 3-100% of mapped reads (med. 27%; IQR 30%); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (med. 25%; IQR 21%). Informative reads, that is, Mapped, Exonic, Non-duplicate (MEND) reads, constitute 0-79% of total reads (med. 50%; IQR 31%). Further, we find that MEND read counts have a 0.22 Pearson correlation to the number of genes expressed above 1 Transcript Per Million, while total reads have a correlation of −0.05. Conclusions Since the fraction of uninformative reads varies, we propose using only definitively informative reads, MEND reads, for the purposes of asserting the accuracy of gene expression measured in a bulk RNA-Seq experiment. We provide a Docker image containing 1) the existing required tools (RSeQC, sambamba and samblaster) and 2) a custom script. We recommend that all results, sensitivity studies and depth recommendations use MEND units.

