A flexible simulation toolkit for designing and evaluating ChIP-sequencing experiments

Abstract A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. We present Tulip, a toolkit for rapidly simulating ChIP-seq data using statistical models of the experimental steps. Tulip may be used for a range of applications, including power analysis for experimental design, benchmarking of analysis tools, and modeling effects of processes such as replication on ChIP-seq signals.

A flexible ChIP-sequencing simulation toolkit

BMC Bioinformatics ◽

10.1186/s12859-021-04097-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

An Zheng ◽

Michael Lamkin ◽

Yutong Qiu ◽

Kevin Ren ◽

Alon Goren ◽

...

Keyword(s):

Ground Truth ◽

Peak Calling ◽

Simulation Framework ◽

Experimental Conditions ◽

Ground Truth Data ◽

Chip Sequencing ◽

Genome Wide ◽

Experimental Parameters ◽

Differential Binding ◽

The Impact

Abstract Background A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. Results We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips. Conclusions ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.
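
The idea of simulating ChIP-seq data with a known ground truth can be illustrated with a deliberately simplified toy model. The sketch below is not the statistical model implemented in ChIPs; it assumes a genome divided into equal-sized bins, a known set of bound bins, and a single hypothetical parameter `frac_in_peaks` for the fraction of reads falling in bound bins. All function and parameter names are illustrative.

```python
import numpy as np


def simulate_bin_counts(n_bins, peak_bins, n_reads, frac_in_peaks, seed=0):
    """Toy ChIP-seq simulation: distribute reads over genome bins.

    Assumes at least one peak bin and at least one background bin.
    A fraction `frac_in_peaks` of the read probability mass is spread
    evenly over the peak bins; the rest is spread over the background.
    Returns (counts, is_peak), where is_peak is the ground-truth mask.
    """
    rng = np.random.default_rng(seed)
    is_peak = np.zeros(n_bins, dtype=bool)
    is_peak[np.asarray(peak_bins)] = True

    probs = np.empty(n_bins)
    probs[is_peak] = frac_in_peaks / is_peak.sum()
    probs[~is_peak] = (1.0 - frac_in_peaks) / (~is_peak).sum()

    # One multinomial draw assigns every read to exactly one bin.
    counts = rng.multinomial(n_reads, probs)
    return counts, is_peak


counts, is_peak = simulate_bin_counts(
    n_bins=1000, peak_bins=range(100, 110), n_reads=10_000, frac_in_peaks=0.2
)
print("mean reads per peak bin:", counts[is_peak].mean())
print("mean reads per background bin:", counts[~is_peak].mean())
```

Because the mask `is_peak` is known by construction, the output of a peak caller run on such counts could be scored directly against it, which is the role ground truth plays in benchmarking.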

Integrating hierarchical statistical models and machine-learning algorithms for ground-truthing drone images of the vegetation: taxonomy, abundance and population ecological models

10.1101/491381 ◽

2018 ◽

Cited By ~ 1

Author(s):

Christian Damgaard

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Learning Algorithms ◽

Plant Competition ◽

Image Data ◽

Ground Truth ◽

Ecological Models ◽

Machine Learning Algorithms ◽

Ground Truth Data ◽

Ground Truthing

Abstract In order to fit population ecological models, e.g. plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.

Integrating Hierarchical Statistical Models and Machine-Learning Algorithms for Ground-Truthing Drone Images of the Vegetation: Taxonomy, Abundance and Population Ecological Models

Remote Sensing ◽

10.3390/rs13061161 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1161

Author(s):

Christian Damgaard

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Learning Algorithms ◽

Plant Competition ◽

Image Data ◽

Ground Truth ◽

Ecological Models ◽

Machine Learning Algorithms ◽

Ground Truth Data ◽

Ground Truthing

In order to fit population ecological models, e.g., plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.

Some combinatorial structures in experimental design: overview, statistical models and applications

Biometrics & Biostatistics International Journal ◽

10.15406/bbij.2018.07.00228 ◽

2018 ◽

Vol 7 (4) ◽

Author(s):

Petya Valcheva

Keyword(s):

Experimental Design ◽

Statistical Models ◽

Combinatorial Structures

Application of experimental design for selection of optimal modes of electro-deformation cladding with a flexible tool

10.36652/0042-4633-2020-5-71-76 ◽

2020 ◽

pp. 71-76

Author(s):

M.A. Levantsevich ◽

E.V. Pilipchuk ◽

N.N. Maksimchenko ◽

L.S. Belevskiy ◽

R.R. Dema

Keyword(s):

Composite Material ◽

Experimental Design ◽

Statistical Models ◽

Experiment Planning ◽

Flexible Tool ◽

Tool Coating ◽

Coating Composite ◽

Selection Of

Experimental-statistical models of the process of forming composite chromium coatings by electrodeformation cladding with a flexible tool are developed; they make it possible to determine the regime parameters for obtaining coatings of the required thickness and roughness. Keywords: electrodeformation cladding, flexible tool, coating, composite material, experiment planning, noncompositional plan, thickness, roughness.

Assessing Wildfire Burn Severity and Its Relationship with Environmental Factors: A Case Study in Interior Alaska Boreal Forest

Remote Sensing ◽

10.3390/rs13101966 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1966

Author(s):

Christopher W Smith ◽

Santosh K Panda ◽

Uma S Bhatt ◽

Franz J Meyer ◽

Anushree Badola ◽

...

Keyword(s):

Boreal Forest ◽

Ground Truth ◽

Burn Severity ◽

Classification Methods ◽

Spectral Indices ◽

Ground Truth Data ◽

Burn Scar ◽

Interior Alaska ◽

Remote Sensing Methods ◽

The Relationship

In recent years, there have been rapid improvements in both remote sensing methods and satellite image availability that have the potential to massively improve burn severity assessments of the Alaskan boreal forest. In this study, we utilized recent pre- and post-fire Sentinel-2 satellite imagery of the 2019 Nugget Creek and Shovel Creek burn scars located in Interior Alaska to both assess burn severity across the burn scars and test the effectiveness of several remote sensing methods for generating accurate map products: Normalized Difference Vegetation Index (NDVI), Normalized Burn Ratio (NBR), and Random Forest (RF) and Support Vector Machine (SVM) supervised classification. We used 52 Composite Burn Index (CBI) plots from the Shovel Creek burn scar and 28 from the Nugget Creek burn scar for training classifiers and product validation. For the Shovel Creek burn scar, the RF and SVM machine learning (ML) classification methods outperformed the traditional spectral indices that use linear regression to separate burn severity classes (RF and SVM accuracy, 83.33%, versus NBR accuracy, 73.08%). However, for the Nugget Creek burn scar, the NDVI product (accuracy: 96%) outperformed the other indices and ML classifiers. In this study, we demonstrated that when sufficient ground truth data is available, the ML classifiers can be very effective for reliable mapping of burn severity in the Alaskan boreal forest. Since the performance of ML classifiers depends on the quantity of ground truth data, the ML classification methods would be better at assessing burn severity when sufficient ground truth data is available, whereas with limited ground truth data the traditional spectral indices would be better suited. We also looked at the relationship between burn severity, fuel type, and topography (aspect and slope) and found that the relationship is site-dependent.
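
The two spectral indices named in this abstract have standard definitions: NDVI = (NIR − Red) / (NIR + Red) and NBR = (NIR − SWIR) / (NIR + SWIR). The sketch below computes both from reflectance arrays, plus the pre-fire minus post-fire difference (often called dNBR). The abstract does not say which Sentinel-2 bands or which change metric the study used, so the band choice is left to the caller and `dnbr` is included only as a common convention, not as the authors' method.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with 0 where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def nbr(nir, swir):
    """Normalized Burn Ratio."""
    return normalized_difference(nir, swir)


def dnbr(pre_nir, pre_swir, post_nir, post_swir):
    """Pre-fire NBR minus post-fire NBR; larger values suggest more severe burn."""
    return nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)


# Hypothetical reflectance values for two pixels (illustration only).
pre_nir, pre_swir = np.array([0.4, 0.5]), np.array([0.1, 0.1])
post_nir, post_swir = np.array([0.2, 0.45]), np.array([0.3, 0.12])
print(dnbr(pre_nir, pre_swir, post_nir, post_swir))
```

In this made-up example the first pixel, where near-infrared reflectance drops and shortwave-infrared reflectance rises after the fire, receives the larger difference value.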

Automatic Evaluation of Wheat Resistance to Fusarium Head Blight Using Dual Mask-RCNN Deep Learning Frameworks in Computer Vision

Remote Sensing ◽

10.3390/rs13010026 ◽

2020 ◽

Vol 13 (1) ◽

pp. 26

Author(s):

Wen-Hao Su ◽

Jiajing Zhang ◽

Ce Yang ◽

Rae Page ◽

Tamas Szinyei ◽

...

Keyword(s):

Fusarium Head Blight ◽

Ground Truth ◽

Wheat Breeding ◽

Head Blight ◽

Detection Rates ◽

Ground Truth Data ◽

Resistant Cultivars ◽

Feature Pyramid ◽

Rater Error ◽

Wheat Lines

In many regions of the world, wheat is vulnerable to severe yield and quality losses from the fungal disease Fusarium head blight (FHB). The development of resistant cultivars is one means of ameliorating the devastating effects of this disease, but the breeding process requires the evaluation of hundreds of lines each year for reaction to the disease. These field evaluations are laborious, expensive, time-consuming, and prone to rater error. A phenotyping cart that can quickly capture images of the spikes of wheat lines and their level of FHB infection would greatly benefit wheat breeding programs. In this study, a mask region convolutional neural network (Mask-RCNN) allowed for reliable identification of the symptom location and the disease severity of wheat spikes. Within a wheat line planted in the field, color images of individual wheat spikes and their corresponding diseased areas were labeled and segmented into sub-images. Images with annotated spikes and sub-images of individual spikes with labeled diseased areas were used as ground truth data to train Mask-RCNN models for automatic image segmentation of wheat spikes and FHB diseased areas, respectively. The feature pyramid network (FPN) based on the ResNet-101 network was used as the backbone of Mask-RCNN for constructing the feature pyramid and extracting features. After generating mask images of wheat spikes from full-size images, Mask-RCNN was applied to predict diseased areas on each individual spike. This protocol enabled the rapid recognition of wheat spikes and diseased areas with detection rates of 77.76% and 98.81%, respectively. A prediction accuracy of 77.19% was achieved, calculated as the ratio of the predicted wheat FHB severity value to the ground truth. This study demonstrates the feasibility of rapidly determining levels of FHB in wheat spikes, which will greatly facilitate the breeding of resistant cultivars.

Classification of Cattle Behaviours Using Neck-Mounted Accelerometer-Equipped Collars and Convolutional Neural Networks

Sensors ◽

10.3390/s21124050 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4050

Author(s):

Dejan Pavlovic ◽

Christopher Davison ◽

Andrew Hamilton ◽

Oskar Marko ◽

Robert Atkinson ◽

...

Keyword(s):

Neural Network ◽

Model Performance ◽

Ground Truth ◽

Practical Implementation ◽

Ground Truth Data ◽

Battery Lifetime ◽

Implementation Challenges ◽

Memory Footprint ◽

Commercial Farms ◽

Using Data

Monitoring cattle behaviour is core to the early detection of health and welfare issues and to optimise the fertility of large herds. Accelerometer-based sensor systems that provide activity profiles are now used extensively on commercial farms and have evolved to identify behaviours such as the time spent ruminating and eating at an individual animal level. Acquiring this information at scale is central to informing on-farm management decisions. The paper presents the development of a Convolutional Neural Network (CNN) that classifies cattle behavioural states ('rumination', 'eating' and 'other') using data generated from neck-mounted accelerometer collars. During three farm trials in the United Kingdom (Easter Howgate Farm, Edinburgh, UK), 18 steers were monitored to provide raw acceleration measurements, with ground truth data provided by muzzle-mounted pressure sensor halters. A range of neural network architectures are explored and rigorous hyper-parameter searches are performed to optimise the network. The computational complexity and memory footprint of CNN models are not readily compatible with deployment on low-power processors which are both memory and energy constrained. Thus, progressive reductions of the CNN were executed with minimal loss of performance in order to address the practical implementation challenges, defining the trade-off between model performance and computational complexity and memory footprint to permit deployment on micro-controller architectures. The proposed methodology achieves a compression of 14.30 compared to the unpruned architecture but is nevertheless able to accurately classify cattle behaviours with an overall F1 score of 0.82 for both FP32 and FP16 precision while achieving a reasonable battery lifetime in excess of 5.7 years.

A machine learning approach to estimate the strain energy absorption in expanded polystyrene foams

Journal of Cellular Plastics ◽

10.1177/0021955x211021014 ◽

2021 ◽

pp. 0021955X2110210

Author(s):

Alejandro E Rodríguez-Sánchez ◽

Héctor Plascencia-Mora

Keyword(s):

Neural Network ◽

Energy Absorption ◽

Mechanical Energy ◽

Compressive Loading ◽

Ground Truth ◽

Expanded Polystyrene ◽

Polystyrene Foam ◽

Stress Strain ◽

Ground Truth Data ◽

Expanded Polystyrene Foam

Traditional modeling of mechanical energy absorption due to compressive loadings in expanded polystyrene foams involves mathematical descriptions that are derived from stress/strain continuum mechanics models. Nevertheless, most of those models are constrained to using the strain as the only variable at large deformation regimes, and they usually neglect important parameters for energy absorption properties such as the material density or the rate of the applied load. This work presents a neural-network-based approach that produces models capable of mapping the compressive stress response and energy absorption parameters of an expanded polystyrene foam by considering its deformation, compressive loading rates, and different densities. The models are trained with ground-truth data obtained in compressive tests. Two methods to select neural network architectures are also presented, one of which is based on a Design of Experiments strategy. The results show that it is possible to obtain a single artificial neural network model that can abstract stress and energy absorption solution spaces for the conditions studied in the material. Additionally, such a model is compared with a phenomenological model, and the results show that the neural network model outperforms it in terms of prediction capabilities, since errors around 2% of experimental data were obtained. In this sense, it is demonstrated that by following the presented approach it is possible to obtain a model capable of reproducing compressive polystyrene foam stress/strain data and, consequently, of simulating its energy absorption parameters.
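
The link between a stress/strain curve and absorbed energy can be made concrete: the strain energy absorbed per unit volume up to a given strain is the area under the stress/strain curve, W = ∫ σ dε. The sketch below approximates that integral with the trapezoidal rule on sampled data. It is a generic numerical illustration, not the neural-network model of the paper, and the sample curve is hypothetical.

```python
import numpy as np


def absorbed_energy_density(strain, stress):
    """Approximate W = integral of stress d(strain) with the trapezoidal rule.

    `strain` must be increasing; with stress in MPa and dimensionless
    strain, the result is in MJ/m^3.
    """
    strain = np.asarray(strain, dtype=float)
    stress = np.asarray(stress, dtype=float)
    if strain.shape != stress.shape or strain.ndim != 1:
        raise ValueError("strain and stress must be 1-D arrays of equal length")
    # Average stress over each strain interval, times the interval width.
    return float(np.sum(0.5 * (stress[1:] + stress[:-1]) * np.diff(strain)))


# Hypothetical linear-elastic curve: stress = 10 MPa * strain, up to 50% strain.
strain = np.linspace(0.0, 0.5, 51)
stress = 10.0 * strain
print(absorbed_energy_density(strain, stress))
```

For this linear curve the exact area is 0.5 × 10 × 0.5² = 1.25 MJ/m³, which the trapezoidal rule reproduces up to floating-point rounding; measured foam curves are nonlinear, so the accuracy there depends on how densely the curve is sampled.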

Multi-Temporal Arable Land Monitoring in Arid Region of Northwest China Using a New Extraction Index

Sustainability ◽

10.3390/su13095274 ◽

2021 ◽

Vol 13 (9) ◽

pp. 5274

Author(s):

Xinyang Yu ◽

Younggu Her ◽

Xicun Zhu ◽

Changhe Lu ◽

Xuefei Li

Keyword(s):

Arable Land ◽

Ground Truth ◽

Northwest China ◽

Hexi Corridor ◽

Ground Truth Data ◽

Land Protection ◽

Promising Tool ◽

Study Results ◽

Multi Temporal ◽

The Mean

Development of a high-accuracy method to extract arable land using effective data sources is crucial to detect and monitor arable land dynamics, which supports land protection and sustainable development. In this study, a new arable land extraction index (ALEI) based on spectral analysis was proposed, examined by ground truth data, and then applied to the Hexi Corridor in northwest China. The arable land and its change patterns during 1990–2020 were extracted and identified using 40 Landsat TM/OLI images acquired in 1990, 2000, 2010, and 2020. The results demonstrated that the proposed method can distinguish arable land areas accurately, with the User’s (Producer’s) accuracy and overall accuracy (kappa coefficient) exceeding 0.90 (0.88) and 0.89 (0.87), respectively. The mean relative error calculated using field survey data obtained in 2012 and 2020 was 0.169 and 0.191, respectively, indicating the feasibility of the ALEI method for arable land extraction. The study found that the arable land area in the Hexi Corridor was 13217.58 km² in 2020, a significant increase of 25.33% compared to 1990. At 10-year intervals, the arable land experienced different change patterns. The study results indicate that the ALEI is a promising tool for effectively extracting arable land in arid areas.
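
The accuracy measures reported in this abstract (user's accuracy, producer's accuracy, overall accuracy, and the kappa coefficient) are all derived from a confusion matrix of mapped classes against ground truth. The sketch below shows the standard definitions, assuming rows hold the reference (ground truth) classes and columns the mapped classes. The example matrix is hypothetical and is not taken from the study.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Compute map accuracy measures from a square confusion matrix.

    Rows = reference (ground truth) classes, columns = mapped classes.
    Assumes every class has at least one reference and one mapped sample.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    ref_totals = cm.sum(axis=1)     # samples per reference class
    map_totals = cm.sum(axis=0)     # samples per mapped class

    overall = diag.sum() / total
    producers = diag / ref_totals   # 1 - omission error, per class
    users = diag / map_totals       # 1 - commission error, per class

    # Agreement expected by chance, from the marginal totals.
    expected = (ref_totals * map_totals).sum() / total**2
    kappa = (overall - expected) / (1.0 - expected)
    return {"overall": overall, "producers": producers,
            "users": users, "kappa": kappa}


# Hypothetical 2-class example: arable vs. non-arable, 100 validation points.
metrics = accuracy_metrics([[45, 5],
                            [10, 40]])
print(metrics)
```

In this example the overall accuracy is 85/100 = 0.85 and the chance agreement is 0.5, so kappa comes out at about 0.7; kappa is lower than overall accuracy because it discounts the agreement that would occur by chance.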
