Efficient ancestry and mutation simulation with msprime 1.0

Genetics ◽  
2021 ◽  
Author(s):  
Franz Baumdicker ◽  
Gertjan Bisschop ◽  
Daniel Goldstein ◽  
Graham Gower ◽  
Aaron P Ragsdale ◽  
...  

Abstract Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
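msprime's own API is far richer than any short example could show (recombination, demography, tree sequences via tskit); purely as a conceptual illustration, here is a minimal pure-Python sketch of the Kingman coalescent that simulators of this kind generalise. The function name and parameters are our own, not msprime's:

```python
import random

def coalescent_times(n, Ne, seed=None):
    """Draw cumulative coalescence times for a sample of n lineages
    under the Kingman coalescent with diploid population size Ne.
    While k lineages remain, the waiting time to the next merger is
    exponential with rate k*(k-1)/(4*Ne), in generations."""
    rng = random.Random(seed)
    times, k, t = [], n, 0.0
    while k > 1:
        rate = k * (k - 1) / (4 * Ne)
        t += rng.expovariate(rate)   # waiting time for this merger
        times.append(t)
        k -= 1                       # two lineages coalesce into one
    return times

# n lineages need n-1 mergers; total height is of order 4*Ne*(1 - 1/n).
times = coalescent_times(10, 1000, seed=1)
```

A real simulator additionally tracks which lineages merge and, with recombination, maintains many correlated trees along the genome, which is where the succinct tree sequence structure pays off.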



Geosciences ◽  
2018 ◽  
Vol 8 (12) ◽  
pp. 446 ◽  
Author(s):  
Evangelos Alevizos ◽  
Jens Greinert

This study presents a novel approach, based on high-dimensionality hydro-acoustic data, for improving the performance of angular response analysis (ARA) on multibeam backscatter data in terms of acoustic class separation and spatial resolution. This approach is based on the hyper-angular cube (HAC) data structure, which offers the possibility of extracting one angular response from each cell of the cube. The HAC consists of a finite number of backscatter layers, each representing backscatter values corresponding to single-incidence angle ensonifications. The construction of the HAC layers can be achieved either by interpolating dense soundings from highly overlapping multibeam echo-sounder (MBES) surveys (interpolated HAC, iHAC) or by producing several backscatter mosaics, each normalized at a different incidence angle (synthetic HAC, sHAC). The latter approach can be applied to multibeam data with standard overlap, thus minimizing the cost of data acquisition. The sHAC is as efficient as the iHAC produced by actual soundings, providing distinct angular responses for each seafloor type. The HAC data structure increases acoustic class separability between different acoustic features. Moreover, the results of angular response analysis are applied at a fine spatial scale (cell dimensions), offering more detailed acoustic maps of the seafloor. Considering that angular information is expressed through high-dimensional backscatter layers, we further applied three machine learning algorithms (random forest, support vector machine, and artificial neural network) and one pattern recognition method (sum of absolute differences) for supervised classification of the HAC, using a limited amount of ground truth data (one sample per seafloor type). Results from supervised classification were compared with results from an unsupervised method for inter-comparison of the supervised algorithms.
All algorithms (for both the iHAC and the sHAC) produced very similar results, in good agreement (kappa > 0.5) with the unsupervised classification. Only the artificial neural network required the full amount of ground truth data to produce results comparable to the remaining algorithms.
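The sum-of-absolute-differences pattern matching mentioned above is simple to sketch: each HAC cell holds one backscatter value per incidence-angle layer, and the cell is assigned to the reference seafloor class whose angular-response curve it matches most closely. The reference curves and cell values below are hypothetical, not data from the study:

```python
def sad(a, b):
    """Sum of absolute differences between two angular-response curves."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify_cell(response, references):
    """Assign a cell's angular response to the reference class whose
    curve minimizes the SAD distance."""
    return min(references, key=lambda label: sad(response, references[label]))

# Hypothetical reference curves: backscatter (dB) at four incidence angles.
refs = {"mud": [-32, -34, -37, -40], "sand": [-20, -24, -28, -33]}
cell = [-21, -25, -29, -32]
label = classify_cell(cell, refs)  # nearest reference curve by SAD
```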


2019 ◽  
Author(s):  
Pakhrur Razi

Located in a mountainous area of West Sumatra, Indonesia, the Kelok Sembilan flyover area has a long history of land deformation, so continuous monitoring and analysis are necessary to minimize its impact. Land deformation occurs along this area notably in the rainy season. The zone is critical as a transportation hub in central Sumatra. The Quasi-Persistent Scatterer (Q-PS) Interferometry technique was applied to extract information on land deformation in the field over time. The method not only performs well in detecting land deformation but also increases the number of PS points, especially in non-urban areas. This research was supported by 90 scenes of Sentinel-1A (C-band) imagery acquired from October 2014 to November 2017 in ascending and descending orbits, with VV and VH polarization at 5 × 20 m (range × azimuth) resolution. Both satellite orbits detected two critical locations of land deformation, namely zone A and zone B, located on steep positive slopes with more than 500 mm of movement in the Line of Sight (LOS) during the acquisition period. Deformations in the vertical and horizontal directions for the two zones are 778.9 mm and 795.7 mm, and 730.5 mm and 751.7 mm, respectively. Finally, the results were confirmed against ground truth data from Unmanned Aerial Vehicle (UAV) observations.
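Combining ascending and descending LOS measurements to recover vertical and horizontal displacement, as done above, amounts to solving a small linear system. The sketch below assumes a simplified geometry (negligible north-south motion, with the east-west LOS projection flipping sign between the two viewing directions); the function name and angle values are illustrative, not taken from the paper:

```python
import math

def decompose_los(d_asc, d_dsc, inc_asc, inc_dsc):
    """Solve the 2x2 system
        d_asc = d_up*cos(inc_asc) - d_east*sin(inc_asc)
        d_dsc = d_up*cos(inc_dsc) + d_east*sin(inc_dsc)
    for vertical (d_up) and east-west (d_east) displacement,
    via Cramer's rule."""
    a = [[math.cos(inc_asc), -math.sin(inc_asc)],
         [math.cos(inc_dsc),  math.sin(inc_dsc)]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    d_up = (d_asc * a[1][1] - d_dsc * a[0][1]) / det
    d_east = (a[0][0] * d_dsc - a[1][0] * d_asc) / det
    return d_up, d_east
```

With realistic Sentinel-1 incidence angles (roughly 30-45 degrees), the system is well conditioned, since the determinant equals sin(inc_asc + inc_dsc).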


Sensor Review ◽  
2019 ◽  
Vol 39 (2) ◽  
pp. 288-306 ◽  
Author(s):  
Guan Yuan ◽  
Zhaohui Wang ◽  
Fanrong Meng ◽  
Qiuyan Yan ◽  
Shixiong Xia

Purpose Currently, ubiquitous smartphones embedded with various sensors provide a convenient way to collect raw sequence data. These data bridge the gap between human activity and multiple sensors. Human activity recognition has been widely used in many aspects of daily life, such as medical security, personal safety and living assistance. Design/methodology/approach To provide an overview, the authors survey and summarize some important technologies and key issues of human activity recognition, including activity categorization, feature engineering and typical algorithms presented in recent years. The authors first introduce the characteristics of embedded sensors and discuss their features, and survey some data labeling strategies for obtaining ground truth labels. Then, following the process of human activity recognition, they discuss methods and techniques for raw data preprocessing and feature extraction, and summarize some popular algorithms used in model training and activity recognition. Third, they introduce some interesting application scenarios of human activity recognition and provide available data sets as ground truth data for validating proposed algorithms. Findings The authors summarize their viewpoints on human activity recognition, discuss the main challenges and point out potential research directions. Originality/value It is hoped that this work will serve as a stepping stone for those interested in advancing human activity recognition.
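A typical pipeline of the kind surveyed here segments the raw sensor sequence into sliding windows and computes per-window summary features before training a classifier. A minimal sketch with made-up accelerometer magnitudes (window size, step and feature choices are illustrative):

```python
import statistics

def windows(seq, size, step):
    """Split a raw sensor sequence into fixed-size sliding windows."""
    for start in range(0, len(seq) - size + 1, step):
        yield seq[start:start + size]

def features(window):
    """Simple per-window statistics commonly used in HAR pipelines."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "range": max(window) - min(window),
    }

# Hypothetical accelerometer magnitudes; a burst of motion in the middle.
accel = [0.1, 0.3, 0.2, 1.1, 1.4, 1.2, 0.2, 0.1]
feats = [features(w) for w in windows(accel, size=4, step=2)]
```

The resulting feature vectors, paired with ground truth labels, are what the surveyed training algorithms consume.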


2020 ◽  
Vol 14 (1) ◽  
pp. 99-108
Author(s):  
Jinhwan Jang

Background: As wireless communication technologies evolve, probe-based travel-time collection systems are becoming popular around the globe. However, two problems generally arise in probe-based systems: outliers and time lag. To resolve these problems, methods for outlier removal and travel-time prediction need to be applied. Methods: In this study, data processing methods for addressing the two issues are proposed. After investigating the characteristics of travel times on the test section, the modified z-score was used to censor outliers in probe travel times. To mitigate the time-lag phenomenon, a recurrent neural network, a class of deep learning models that naturally handles temporal sequence data, was applied to predict travel times. Results: In an evaluation against ground-truth data obtained through test-car runs, the proposed methods showed enhanced performance, with prediction errors lower than 13% on average compared to current practices. Conclusion: The suggested methods can help drivers better arrange their trip schedules using real-time travel-time information with improved accuracy.
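The modified z-score used above is a standard robust statistic (Iglewicz and Hoaglin): 0.6745 times the deviation from the median, scaled by the median absolute deviation (MAD), with |score| > 3.5 commonly flagged as an outlier. A sketch with hypothetical probe travel times; the threshold and data are illustrative:

```python
import statistics

def modified_z_scores(data):
    """Modified z-score: 0.6745 * (x - median) / MAD.
    Robust to outliers, unlike the mean/std-based z-score."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    return [0.6745 * (x - med) / mad for x in data]

def censor_outliers(data, threshold=3.5):
    """Drop observations whose |modified z-score| exceeds the threshold."""
    return [x for x, z in zip(data, modified_z_scores(data))
            if abs(z) <= threshold]

travel_times = [61, 63, 62, 60, 64, 62, 180]  # seconds; 180 is a stop-over
clean = censor_outliers(travel_times)
```

Because both the center and the scale are medians, a single extreme probe (e.g. a vehicle that parked mid-section) cannot inflate the threshold the way it would with a mean-and-standard-deviation z-score.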


2017 ◽  
Author(s):  
Anthony Bolger ◽  
Alisandra Denton ◽  
Marie Bolger ◽  
Björn Usadel

Abstract Recent massive growth in the production of sequencing data necessitates matching improvements in bioinformatics tools to effectively utilize it. Existing tools suffer from limitations in both scalability and applicability which are inherent to their underlying algorithms and data structures. We identify the key requirements for the ideal data structure for sequence analyses: it should be informationally lossless, locally updatable, and memory efficient; requirements which are not met by data structures underlying the major assembly strategies Overlap Layout Consensus and De Bruijn Graphs. We therefore propose a new data structure, the LOGAN graph, which is based on a memory efficient Sparse De Bruijn Graph with routing information. Innovations in storing routing information and careful implementation allow sequence datasets for Escherichia coli (4.6Mbp, 117x coverage), Arabidopsis thaliana (135Mbp, 17.5x coverage) and Solanum pennellii (1.2Gbp, 47x coverage) to be loaded into memory on a desktop computer in seconds, minutes, and hours respectively. Memory consumption is competitive with state of the art alternatives, while losslessly representing the reads in an indexed and updatable form. Both Second and Third Generation Sequencing reads are supported. Thus, the LOGAN graph is positioned to be the backbone for major breakthroughs in sequence analysis such as integrated hybrid assembly, assembly of exceptionally large and repetitive genomes, as well as assembly and representation of pan-genomes.
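As a rough illustration of the baseline structure LOGAN builds on (without its sparseness or the routing information that makes it lossless), a node-centric de Bruijn graph links k-mers that overlap by k-1 bases:

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a node-centric de Bruijn graph: an edge joins each k-mer
    to the next k-mer in the read (overlap of k-1 bases). This toy
    version keeps only the topology; read identity is lost, which is
    exactly the information LOGAN's routing annotations preserve."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            edges[read[i:i + k]].add(read[i + 1:i + 1 + k])
    return edges

g = de_bruijn(["ACGTAC", "CGTACG"], k=3)
```

In this tiny example the two reads collapse into a single cycle of four 3-mers, showing why a plain de Bruijn graph cannot reconstruct the original reads without extra routing data.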


Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2296
Author(s):  
Hyun-Tae Choi ◽  
Byung-Woo Hong

The development of convolutional neural networks for deep learning has significantly contributed to image classification and segmentation. High performance in supervised image segmentation requires a large amount of ground-truth data, which is costly to produce, so unsupervised approaches are actively being studied. The Mumford–Shah and Chan–Vese models are well-known unsupervised image segmentation models. However, because they are based on pixel intensities, they cannot separate the foreground and background of an image. In this paper, we propose a weakly supervised model for image segmentation that combines these segmentation models (the Mumford–Shah and Chan–Vese models) with classification. The segmentation model (i.e., the Mumford–Shah or Chan–Vese model) finds a base image mask for classification, and the classification network uses the mask from the segmentation model. With the classification network, the output mask of the segmentation model changes in the direction that increases the performance of the classification network. In addition, the mask can naturally distinguish the foreground and background of images. Our experiments show that our segmentation model, integrated with a classifier, can segment the input image into foreground and background using only the image's class label, i.e., an image-level label.
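The piecewise-constant (Chan–Vese) idea the model builds on can be sketched in a few lines: compute the mean intensity inside and outside the current mask, then reassign each pixel to the nearer mean. The toy image below is our own, and real implementations add curve-length regularization and iterate a level-set evolution rather than this plain reassignment:

```python
def chan_vese_step(image, mask):
    """One piecewise-constant update on a 2D intensity image:
    c1/c2 are the mean intensities inside/outside the mask;
    each pixel is then reassigned to the nearer mean."""
    rows, cols = len(image), len(image[0])
    inside = [image[r][c] for r in range(rows)
              for c in range(cols) if mask[r][c]]
    outside = [image[r][c] for r in range(rows)
               for c in range(cols) if not mask[r][c]]
    c1 = sum(inside) / len(inside)
    c2 = sum(outside) / len(outside)
    return [[abs(v - c1) < abs(v - c2) for v in row] for row in image]

# Bright region (≈0.8-0.9) against a dark background (≈0.1-0.2).
image = [[0.1, 0.2, 0.9], [0.1, 0.8, 0.9], [0.2, 0.9, 0.8]]
mask = [[False, False, True], [False, True, True], [False, True, True]]
new_mask = chan_vese_step(image, mask)
```

Because the update depends only on intensities, a bright background object would be pulled into the "foreground" region just as readily, which is the limitation the paper's classification loss is meant to correct.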


2021 ◽  
Vol 13 (10) ◽  
pp. 1966
Author(s):  
Christopher W Smith ◽  
Santosh K Panda ◽  
Uma S Bhatt ◽  
Franz J Meyer ◽  
Anushree Badola ◽  
...  

In recent years, rapid improvements in both remote sensing methods and satellite image availability have created the potential to massively improve burn severity assessments of the Alaskan boreal forest. In this study, we utilized recent pre- and post-fire Sentinel-2 satellite imagery of the 2019 Nugget Creek and Shovel Creek burn scars located in Interior Alaska to both assess burn severity across the burn scars and test the effectiveness of several remote sensing methods for generating accurate map products: the Normalized Difference Vegetation Index (NDVI), the Normalized Burn Ratio (NBR), and Random Forest (RF) and Support Vector Machine (SVM) supervised classification. We used 52 Composite Burn Index (CBI) plots from the Shovel Creek burn scar and 28 from the Nugget Creek burn scar for training classifiers and product validation. For the Shovel Creek burn scar, the RF and SVM machine learning (ML) classification methods outperformed the traditional spectral indices that use linear regression to separate burn severity classes (RF and SVM accuracy, 83.33%, versus NBR accuracy, 73.08%). However, for the Nugget Creek burn scar, the NDVI product (accuracy: 96%) outperformed the other indices and the ML classifiers. Since the performance of ML classifiers depends on the quantity of ground truth data, ML classification methods are better suited for reliably mapping burn severity in the Alaskan boreal forest when sufficient ground truth data are available, whereas with limited ground truth data the traditional spectral indices are better suited. We also examined the relationship between burn severity, fuel type, and topography (aspect and slope) and found that it is site-dependent.
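The two spectral indices used above are simple normalized band ratios: NDVI contrasts near-infrared against red reflectance, while NBR contrasts near-infrared against shortwave infrared, and the pre- minus post-fire difference (dNBR) indicates burn severity. The reflectance values below are illustrative, not measurements from the study:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def nbr(nir, swir):
    """Normalized Burn Ratio."""
    return (nir - swir) / (nir + swir)

# dNBR = pre-fire NBR minus post-fire NBR; larger values indicate
# higher burn severity (vegetation loss drops NIR, char raises SWIR).
pre, post = nbr(0.45, 0.15), nbr(0.20, 0.30)
dnbr = pre - post
```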


2020 ◽  
Vol 13 (1) ◽  
pp. 26
Author(s):  
Wen-Hao Su ◽  
Jiajing Zhang ◽  
Ce Yang ◽  
Rae Page ◽  
Tamas Szinyei ◽  
...  

In many regions of the world, wheat is vulnerable to severe yield and quality losses from the fungal disease Fusarium head blight (FHB). The development of resistant cultivars is one means of ameliorating the devastating effects of this disease, but the breeding process requires the evaluation of hundreds of lines each year for reaction to the disease. These field evaluations are laborious, expensive, time-consuming, and prone to rater error. A phenotyping cart that can quickly capture images of the spikes of wheat lines and their level of FHB infection would greatly benefit wheat breeding programs. In this study, the mask region convolutional neural network (Mask-RCNN) allowed for reliable identification of the symptom location and disease severity of wheat spikes. Within a wheat line planted in the field, color images of individual wheat spikes and their corresponding diseased areas were labeled and segmented into sub-images. Images with annotated spikes and sub-images of individual spikes with labeled diseased areas were used as ground truth data to train Mask-RCNN models for automatic image segmentation of wheat spikes and FHB diseased areas, respectively. The feature pyramid network (FPN) based on the ResNet-101 network was used as the backbone of Mask-RCNN for constructing the feature pyramid and extracting features. After generating mask images of wheat spikes from full-size images, Mask-RCNN was applied to predict the diseased areas on each individual spike. This protocol enabled the rapid recognition of wheat spikes and diseased areas with detection rates of 77.76% and 98.81%, respectively. A prediction accuracy of 77.19% was achieved by calculating the ratio of the predicted wheat FHB severity value to the ground truth. This study demonstrates the feasibility of rapidly determining levels of FHB in wheat spikes, which will greatly facilitate the breeding of resistant cultivars.
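Given the two predicted masks, per-spike severity reduces to a pixel ratio: the fraction of spike pixels that the disease mask marks as symptomatic. A minimal sketch with toy binary masks; the function name and mask layout are our own, and the paper's ground-truth severity is computed analogously from labeled areas:

```python
def fhb_severity(spike_mask, disease_mask):
    """Fraction of spike pixels that are also marked diseased.
    Both masks are 2D 0/1 grids of the same shape."""
    spike = diseased = 0
    for srow, drow in zip(spike_mask, disease_mask):
        for s, d in zip(srow, drow):
            spike += s
            diseased += s and d
    return diseased / spike

# Toy masks: 6 spike pixels, 3 of them diseased -> severity 0.5.
spike = [[1, 1, 1, 1], [0, 1, 1, 0]]
disease = [[0, 1, 1, 0], [0, 0, 1, 0]]
sev = fhb_severity(spike, disease)
```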


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4050
Author(s):  
Dejan Pavlovic ◽  
Christopher Davison ◽  
Andrew Hamilton ◽  
Oskar Marko ◽  
Robert Atkinson ◽  
...  

Monitoring cattle behaviour is core to the early detection of health and welfare issues and to optimising the fertility of large herds. Accelerometer-based sensor systems that provide activity profiles are now used extensively on commercial farms and have evolved to identify behaviours such as the time spent ruminating and eating at an individual animal level. Acquiring this information at scale is central to informing on-farm management decisions. This paper presents the development of a Convolutional Neural Network (CNN) that classifies cattle behavioural states ('rumination', 'eating' and 'other') using data generated from neck-mounted accelerometer collars. During three farm trials in the United Kingdom (Easter Howgate Farm, Edinburgh, UK), 18 steers were monitored to provide raw acceleration measurements, with ground truth data provided by muzzle-mounted pressure sensor halters. A range of neural network architectures is explored and rigorous hyper-parameter searches are performed to optimise the network. The computational complexity and memory footprint of CNN models are not readily compatible with deployment on low-power processors, which are both memory and energy constrained. Thus, progressive reductions of the CNN were executed with minimal loss of performance in order to address the practical implementation challenges, defining the trade-off between model performance and computational complexity and memory footprint to permit deployment on microcontroller architectures. The proposed methodology achieves a compression factor of 14.30 compared to the unpruned architecture, yet is still able to accurately classify cattle behaviours with an overall F1 score of 0.82 for both FP32 and FP16 precision, while achieving a battery lifetime in excess of 5.7 years.
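Progressive network reduction of the kind described typically relies on magnitude pruning: drop the smallest-magnitude weights first, then fine-tune and repeat. A toy sketch on a flat weight list; real pipelines prune structured CNN filters or channels rather than individual weights:

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest
    absolute value, keeping the rest unchanged."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    kept = set(order[int(len(weights) * fraction):])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]

# Prune the three smallest-magnitude weights out of six (fraction 0.5);
# the surviving nonzeros define the compression achieved.
weights = [0.5, -0.01, 0.3, 0.02, -0.7, 0.1]
pruned = prune_by_magnitude(weights, 0.5)
```

Repeating this step with increasing fractions, while checking the F1 score after each round, traces out the performance-versus-footprint trade-off the paper explores.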

