Automatic generation of ground truth data for the evaluation of clonal grouping methods in B-cell populations

2020 ◽  
Author(s):  
Nika Abdollahi ◽  
Anne de Septenville ◽  
Frédéric Davi ◽  
Juliana S. Bernardes

Motivation: The adaptive B-cell response is driven by the expansion, somatic hypermutation, and selection of B-cell clones. Their number, size, and sequence diversity are essential characteristics of B-cell populations. Identifying clones in B-cell populations is central to several repertoire studies, such as statistical analysis, repertoire comparisons, and clonal tracking. Several clonal grouping methods have been developed to group sequences from B-cell immune repertoires. Such methods have been evaluated mainly on simulated benchmarks, since experimental data containing clonally related sequences can be difficult to obtain. However, experimental data may contain multiple sources of sequence variability that are hard to reproduce artificially. Therefore, generating high-precision ground truth data that preserve real repertoire distributions is necessary to accurately evaluate clonal grouping methods. Results: We proposed a novel methodology to generate ground truth data sets from real repertoires. Our procedure requires V(D)J annotations to obtain the initial clones and iteratively applies an optimisation step that moves sequences among clones to increase their cohesion and separation. We first showed that our method accurately identified clonally related sequences in simulated repertoires with high mutation rates. Next, by comparing the performance of a widely used clonal grouping algorithm on several generated benchmarks, we demonstrated that real benchmarks (generated by our method) are challenging for clonal grouping methods. Our method can be used to generate a large number of benchmarks and can contribute to the construction of more accurate clonal grouping tools. Availability and implementation: The source code and generated data sets are freely available at github.com/NikaAb/BCR_GTG
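The abstract describes the procedure only at a high level. The sketch below is a hypothetical illustration of the general idea, not the BCR_GTG code. Initial clones are formed from V(D)J annotations, assumed here to mean sequences that share V gene, J gene, and CDR3 length. Each sequence is then moved to whichever clone with matching CDR3 length has the lowest mean CDR3 Hamming distance to it. That rule is a simple stand-in for the cohesion and separation criterion. The record layout, the function names, and the reassignment rule are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def initial_clones(records):
    """Group (seq_id, v_gene, j_gene, cdr3) records by V gene, J gene and CDR3 length."""
    groups = defaultdict(set)
    for seq_id, v_gene, j_gene, cdr3 in records:
        groups[(v_gene, j_gene, len(cdr3))].add(seq_id)
    return list(groups.values())


def mean_distance(seq_id, clone, cdr3):
    """Mean CDR3 Hamming distance from seq_id to the other members of clone (0 if none)."""
    others = [m for m in clone if m != seq_id]
    if not others:
        return 0.0
    return mean(hamming(cdr3[seq_id], cdr3[m]) for m in others)


def refine(records, clones, max_iter=10):
    """Reassign sequences to the clone with the lowest mean CDR3 distance until stable.

    Only clones whose CDR3 length matches the sequence are candidates, and a
    sequence moves only if another clone is strictly closer than its own.
    """
    cdr3 = {seq_id: c for seq_id, _, _, c in records}
    clones = [set(c) for c in clones]
    for _ in range(max_iter):
        moved = False
        for seq_id in sorted(cdr3):
            home = next(c for c in clones if seq_id in c)
            candidates = [
                c for c in clones
                if c and len(cdr3[next(iter(c))]) == len(cdr3[seq_id])
            ]
            best = min(candidates, key=lambda c: mean_distance(seq_id, c, cdr3))
            if mean_distance(seq_id, best, cdr3) < mean_distance(seq_id, home, cdr3):
                home.remove(seq_id)
                best.add(seq_id)
                moved = True
        clones = [c for c in clones if c]  # drop clones emptied by moves
        if not moved:
            break
    return clones


if __name__ == "__main__":
    # Toy records (hypothetical): s4 carries a different V gene but a CDR3 close to s1/s2.
    records = [
        ("s1", "IGHV1-2", "IGHJ4", "CARDYW"),
        ("s2", "IGHV1-2", "IGHJ4", "CARDFW"),
        ("s3", "IGHV3-23", "IGHJ4", "CAKGYW"),
        ("s4", "IGHV3-23", "IGHJ4", "CARDYF"),
        ("s5", "IGHV3-23", "IGHJ4", "CAKGFW"),
    ]
    clones = refine(records, initial_clones(records))
    print(sorted(sorted(c) for c in clones))  # [['s1', 's2', 's4'], ['s3', 's5']]
```

Averaging over all other members, rather than comparing against a single medoid, keeps the example short. It also makes each decision depend on the whole clone rather than on one representative sequence.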

Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 212
Author(s):  
Youssef Skandarani ◽  
Pierre-Marc Jodoin ◽  
Alain Lalande

Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large amount of annotated data so that a trained network can generalize well. Unfortunately, having a large number of images manually curated by medical experts is both slow and very expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for creating annotated data sets on which machine learning models can be successfully trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert ground truth for cardiac cine-MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. The results reveal that the generalization performance of a segmentation neural network trained on non-expert ground truth data is, for all practical purposes, as good as that of one trained on expert ground truth data, particularly when the non-expert receives a decent level of training. This highlights an opportunity for the efficient and cost-effective creation of annotations for cardiac data sets.
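The two segmentation metrics and one of the clinical measurements named above can be written down compactly. The sketch below is a minimal illustration, not the paper's evaluation code. Masks are assumed to be sets of (row, column) pixel coordinates. The Hausdorff distance is computed by brute force in pixel units, assuming isotropic pixels. The ejection fraction uses the usual definition (EDV − ESV) / EDV from end-diastolic and end-systolic volumes. All function names are hypothetical.

```python
import math


def dice(a: set, b: set) -> float:
    """Dice index 2|A ∩ B| / (|A| + |B|) for two masks given as sets of pixel coordinates."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a: set, b: set) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets (Euclidean, pixels)."""
    def directed(p, q):
        # For every point in p, distance to its nearest point in q; keep the worst case.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


if __name__ == "__main__":
    pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
    truth = {(0, 0), (0, 1), (1, 0), (2, 2)}
    print(dice(pred, truth))                 # 0.75
    print(round(hausdorff(pred, truth), 3))  # 1.414
    print(ejection_fraction(150, 60))        # 60.0
```

Dice rewards overall overlap, while the Hausdorff distance is driven by the single worst boundary error. This is why the two are usually reported together.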


Author(s):  
Ning Niu ◽  
He Jin

China’s urban villages have distinct characteristics compared with those in western countries. Identifying urban villages provides a basis for policymakers to evaluate and improve the effectiveness of urban planning in China and other developing countries. However, perhaps partly due to limitations in data acquisition, few urban studies have successfully identified urban villages at the building level. To fill this research gap, this paper fuses multiple data sources and uses a three-stage model to identify urban villages in Haizhu District (Guangzhou, China). The first stage discriminates residential buildings, offices, shops, and restaurants based on the different peak times of bike trajectories associated with each building type. However, the first stage cannot distinguish regular residential buildings in the city from residential buildings within urban villages, because human activity in the two is similar. A second stage therefore identifies residential buildings within urban villages based on building area, height, and density. In the third stage, we used correction rules to distinguish mixed-use from single-use buildings within urban villages. The results showed that urban villages were mainly concentrated in the western and central parts of Haizhu District, and most were adjacent to shopping buildings or high-rise residential buildings. Building height and density played critical roles in characterizing residential buildings in urban villages. The accuracy was around 85% when verified against ground-truth data.
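The first stage rests on when bike trips start or end near each building. The following is a minimal sketch, assuming two things the abstract does not specify: trip endpoints have already been matched to buildings, and they are given as ISO 8601 timestamps. The sketch computes a building's 24-hour activity profile and its peak hours. How peak hours map to residential, office, shop, or restaurant use is study-specific and is not reproduced here. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps):
    """Share of bike-trip endpoints falling in each hour of the day (index 0-23)."""
    counts = Counter(datetime.fromisoformat(t).hour for t in timestamps)
    total = sum(counts.values())
    return [counts[h] / total for h in range(24)]


def peak_hours(profile, top=2):
    """Hours with the largest shares, highest first (ties broken by the earlier hour)."""
    return sorted(range(24), key=lambda h: (-profile[h], h))[:top]


if __name__ == "__main__":
    # Hypothetical trip endpoints recorded near one building.
    trips = [
        "2019-05-06T08:05:00", "2019-05-06T08:40:00", "2019-05-06T18:10:00",
        "2019-05-07T08:15:00", "2019-05-07T18:30:00", "2019-05-07T12:00:00",
    ]
    profile = hourly_profile(trips)
    print(peak_hours(profile))  # [8, 18]
```

Normalizing counts into shares lets buildings with very different traffic volumes be compared on the timing of their activity alone.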


Author(s):  
N. Soyama ◽  
K. Muramatsu ◽  
M. Daigo ◽  
F. Ochiai ◽  
N. Fujiwara

Validating the accuracy of land cover products using a reliable reference dataset is an important task. A reliable reference dataset is produced with information derived from ground truth data. Recently, the amount of ground truth data derived from information collected by volunteers has been increasing globally, and volunteer-based reference data show great potential. However, the information provided by volunteers contains only limited vegetation information for producing a complete reference dataset based on plant functional types (PFT) with five specialized forest classes. In this study, we examined the availability and applicability of FLUXNET information for producing reference data with higher reliability. Compared with the reference dataset built from volunteer-provided information, FLUXNET information was especially useful for interpreting forest classes.


2020 ◽  
Vol 12 (1) ◽  
pp. 9-12
Author(s):  
Arjun G. Koppad ◽  
Syeda Sarfin ◽  
Anup Kumar Das

This study performed land use and land cover (LULC) classification using SAR data, examining ALOS-2 PALSAR L-band quad-pol (HH, HV, VH, and VV) SAR data. The SAR data were first pre-processed. Pre-processing included multilooking, radiometric calibration, geometric correction, speckle filtering, and SAR polarimetry and decomposition. A supervised Random Forest classifier was used for LULC classification of the ALOS-2 PALSAR data sets, with training samples selected with the help of ground truth data. The area was classified into 7 classes: dense forest, moderate dense forest, scrub/sparse forest, plantation, agriculture, water body, and settlements. Dense forest covered the largest area (108,647 ha), followed by horticulture plantation (57,822 ha) and scrub/sparse forest (49,238 ha). Moderate dense forest covered the smallest area (11,589 ha). Accuracy was assessed after classification: the overall accuracy was 80.36% and the Kappa coefficient was 0.76. Different land use classes were identified from SAR backscatter based on single-bounce, double-bounce, and volume scattering mechanisms.
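Overall accuracy and the Kappa coefficient are both computed from the classification's confusion matrix. The sketch below uses the standard Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement (overall accuracy), and p_e is the agreement expected by chance from the row and column totals. The two-class matrix is a made-up illustration, not the study's 7-class result, and the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa (p_o - p_e) / (1 - p_e), with p_e from row and column totals."""
    total = sum(map(sum, cm))
    p_o = overall_accuracy(cm)
    rows = [sum(r) for r in cm]               # reference (ground truth) totals
    cols = [sum(col) for col in zip(*cm)]     # predicted totals
    p_e = sum(r * c for r, c in zip(rows, cols)) / total**2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    cm = [[50, 10],
          [5, 35]]
    print(overall_accuracy(cm))        # 0.85
    print(round(cohen_kappa(cm), 3))   # 0.694
```

Kappa is lower than overall accuracy whenever chance agreement is non-trivial. This is why studies such as the one above report both numbers.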


2016 ◽  
Author(s):  
Roshni Cooper ◽  
Shaul Yogev ◽  
Kang Shen ◽  
Mark Horowitz

Abstract Motivation: Microtubules (MTs) are polarized polymers that are critical for cell structure and axonal transport. They form a bundle in neurons, but beyond that, their organization is relatively unstudied. Results: We present MTQuant, a method for quantifying MT organization using light microscopy, which distills three parameters from MT images: the spacing of MT minus-ends, their average length, and the average number of MTs in a cross-section of the bundle. This method allows robust and rapid in vivo analysis of MTs, making it more practical and more widely applicable than commonly used electron microscopy reconstructions. MTQuant was successfully validated with three ground truth data sets and applied to over 3000 images of MTs in a C. elegans motor neuron. Availability: MATLAB code is available at http://roscoope.github.io/MTQuant Supplementary information: Supplementary data are available at Bioinformatics online.
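The three MTQuant parameters are related by a simple coverage argument. Suppose minus-ends are spaced s apart on average along the bundle and MTs have mean length L. Then, away from the bundle ends, the mean number of MTs crossing a cross-section is roughly N ≈ L / s. This idealization is an assumption for illustration and is not necessarily how MTQuant computes its estimates. The sketch treats each MT as a 1-D segment starting at its minus-end and checks the relation numerically. The function name and inputs are hypothetical.

```python
def bundle_summary(minus_ends, lengths, sample_points):
    """Mean minus-end spacing, mean MT length and mean MT count per cross-section.

    Each MT is idealized as a 1-D segment [start, start + length) along the bundle axis.
    """
    starts = sorted(minus_ends)
    spacing = (starts[-1] - starts[0]) / (len(starts) - 1)  # mean gap between minus-ends
    mean_length = sum(lengths) / len(lengths)
    coverage = sum(
        sum(1 for s, length in zip(minus_ends, lengths) if s <= x < s + length)
        for x in sample_points
    ) / len(sample_points)
    return spacing, mean_length, coverage


if __name__ == "__main__":
    # Minus-ends every 2 units, all MTs 6 units long: expect about 6 / 2 = 3 MTs per section.
    ends = [0, 2, 4, 6, 8, 10]
    lengths = [6] * 6
    print(bundle_summary(ends, lengths, [6.5, 7.5, 8.5, 9.5]))  # (2.0, 6.0, 3.0)
```

Sample points are taken from the interior of the bundle. Near either end, fewer segments overlap and the L / s estimate would overcount.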


BMC Genomics ◽  
2020 ◽  
Vol 21 (S9) ◽  
Author(s):  
Xingyu Yang ◽  
Christopher M. Tipton ◽  
Matthew C. Woodruff ◽  
Enlu Zhou ◽  
F. Eun-Hyung Lee ◽  
...  

Abstract Background: B cell affinity maturation enables B cells to generate high-affinity antibodies. This process involves somatic hypermutation of B cell immunoglobulin receptor (BCR) genes and selection by their ability to bind antigens. Lineage trees are used to describe this microevolution of B cell immunoglobulin genes. In a lineage tree, each node is one BCR sequence that mutated from the germinal center, and each directed edge represents a single base mutation, insertion, or deletion. In BCR sequencing data, the observed data contain only a subset of the BCR sequences in this microevolution process. Therefore, reconstructing the lineage tree from experimental data requires algorithms that build the tree from partially observed tree nodes. Results: We developed a new algorithm named Grow Lineages along Minimum Spanning Tree (GLaMST), which efficiently reconstructs the lineage tree given observed BCR sequences that correspond to a subset of the tree nodes. In comparisons using simulated and real data, GLaMST outperforms existing algorithms in simulations with high rates of mutation, insertion, and deletion. It also generates lineage trees that are smaller and closer to the ground truth according to tree features that are highly correlated with selection pressure. Conclusions: GLaMST outperforms the state of the art in reconstructing BCR lineage trees in both efficiency and accuracy. Integrating it into existing BCR sequencing analysis frameworks can significantly improve the lineage tree reconstruction aspect of the analysis.
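As its name suggests, GLaMST grows lineages along a minimum spanning tree. The sketch below shows only that MST starting point, not GLaMST itself. Edit (Levenshtein) distance counts single-base substitutions, insertions, and deletions, and Prim's algorithm grows a directed tree outward from a root sequence. The root is assumed here to be a known germline sequence. An MST edge with distance greater than 1 would still need unobserved intermediate nodes to match the single-edit edges described above. The sketch does not attempt that step. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-base substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def prim_mst(root: str, seqs: list[str]) -> list[tuple[str, str, int]]:
    """Directed MST edges (parent, child, distance) grown from root with Prim's algorithm."""
    in_tree = [root]
    remaining = [s for s in seqs if s != root]
    edges = []
    while remaining:
        # Cheapest edge from any sequence already in the tree to any sequence outside it.
        parent, child, d = min(
            ((p, c, levenshtein(p, c)) for p in in_tree for c in remaining),
            key=lambda e: e[2],
        )
        edges.append((parent, child, d))
        in_tree.append(child)
        remaining.remove(child)
    return edges


if __name__ == "__main__":
    for edge in prim_mst("AAAA", ["AAAT", "AATT", "ATTT"]):
        print(edge)
    # ('AAAA', 'AAAT', 1)
    # ('AAAT', 'AATT', 1)
    # ('AATT', 'ATTT', 1)
```

This brute-force version recomputes distances at every step, which is fine for a handful of sequences but would need caching for real repertoires.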


2019 ◽  
Vol 13 (1) ◽  
pp. 120-126
Author(s):  
K. Bhavanishankar ◽  
M. V. Sudhamani

Objective: Lung cancer has proven to be one of the deadliest diseases affecting humanity in recent years. Timely detection of lung nodules can improve the survival rate. This paper focuses on classifying candidate lung nodules in a patient's CT scan as nodules or non-nodules, using a deep learning approach, the autoencoder. Investigation/Methodology: Candidate lung nodule patches obtained from lung segmentation are used as input to the autoencoder model. Ground truth data from the LIDC repository are prepared and submitted to the autoencoder training module. After a series of experiments, a 4-stacked autoencoder was chosen. The model is trained on over 600 LIDC cases and tested on the remaining data sets. Results: The classification results are evaluated using sensitivity, specificity, and accuracy. The results are also compared with other related works, and the proposed approach was found to be better by 6.2% in accuracy. Conclusion: In this paper, a deep learning approach, the autoencoder, has been used to classify candidate lung nodules as nodules or non-nodules. The proposed approach achieved a sensitivity of 82.6%, a specificity of 91.3%, and an accuracy of 87.0%. Compared with existing related works, this represents an improvement of 6.2% in accuracy.
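The three reported measures follow directly from the counts of true and false positives and negatives. A minimal sketch, with hypothetical counts that are not the paper's data, shows the standard definitions. When both classes contain the same number of samples, accuracy equals the mean of sensitivity and specificity. In the example below, (0.8 + 0.9) / 2 = 0.85. The function name is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and accuracy for a nodule / non-nodule classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true nodules that are detected
        "specificity": tn / (tn + fp),  # share of non-nodules correctly rejected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts: 50 nodules and 50 non-nodules.
    print(binary_metrics(tp=40, fn=10, tn=45, fp=5))
    # {'sensitivity': 0.8, 'specificity': 0.9, 'accuracy': 0.85}
```

Reporting sensitivity and specificity alongside accuracy matters because, with imbalanced classes, a high accuracy can hide a low detection rate for the rarer class.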

