scholarly journals Fast lightweight accurate xenograft sorting

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Jens Zentgraf ◽  
Sven Rahmann

Abstract Motivation With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. Results We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further. Availability Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.

2021 ◽  
Author(s):  
Jens Zentgraf ◽  
Sven Rahmann

Abstract Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species' (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. Results: We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further. Availability: Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workows for hash table construction and dataset processing.


2020 ◽  
Author(s):  
Jens Zentgraf ◽  
Sven Rahmann

AbstractMotivationWith an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.ResultsWe show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy.AvailabilityOur software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with Snakemake workflows for hash table construction and dataset [email protected]


2021 ◽  
Vol 52 (1) ◽  
Author(s):  
Fabienne Archer ◽  
Alexandra Bobet-Erny ◽  
Maryline Gomes

AbstractThe number and severity of diseases affecting lung development and adult respiratory function have stimulated great interest in developing new in vitro models to study lung in different species. Recent breakthroughs in 3-dimensional (3D) organoid cultures have led to new physiological in vitro models that better mimic the lung than conventional 2D cultures. Lung organoids simulate multiple aspects of the real organ, making them promising and useful models for studying organ development, function and disease (infection, cancer, genetic disease). Due to their dynamics in culture, they can serve as a sustainable source of functional cells (biobanking) and be manipulated genetically. Given the differences between species regarding developmental kinetics, the maturation of the lung at birth, the distribution of the different cell populations along the respiratory tract and species barriers for infectious diseases, there is a need for species-specific lung models capable of mimicking mammal lungs as they are of great interest for animal health and production, following the One Health approach. This paper reviews the latest developments in the growing field of lung organoids.


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Yifan Shao ◽  
Haoru Li ◽  
Jinghang Gu ◽  
Longhua Qian ◽  
Guodong Zhou

Abstract Extraction of causal relations between biomedical entities in the form of Biological Expression Language (BEL) poses a new challenge to the community of biomedical text mining due to the complexity of BEL statements. We propose a simplified form of BEL statements [Simplified Biological Expression Language (SBEL)] to facilitate BEL extraction and employ BERT (Bidirectional Encoder Representation from Transformers) to improve the performance of causal relation extraction (RE). On the one hand, BEL statement extraction is transformed into the extraction of an intermediate form—SBEL statement, which is then further decomposed into two subtasks: entity RE and entity function detection. On the other hand, we use a powerful pretrained BERT model to both extract entity relations and detect entity functions, aiming to improve the performance of two subtasks. Entity relations and functions are then combined into SBEL statements and finally merged into BEL statements. Experimental results on the BioCreative-V Track 4 corpus demonstrate that our method achieves the state-of-the-art performance in BEL statement extraction with F1 scores of 54.8% in Stage 2 evaluation and of 30.1% in Stage 1 evaluation, respectively. Database URL: https://github.com/grapeff/SBEL_datasets


1998 ◽  
Vol 08 (01) ◽  
pp. 21-66 ◽  
Author(s):  
W. M. P. VAN DER AALST

Workflow management promises a new solution to an age-old problem: controlling, monitoring, optimizing and supporting business processes. What is new about workflow management is the explicit representation of the business process logic which allows for computerized support. This paper discusses the use of Petri nets in the context of workflow management. Petri nets are an established tool for modeling and analyzing processes. On the one hand, Petri nets can be used as a design language for the specification of complex workflows. On the other hand, Petri net theory provides for powerful analysis techniques which can be used to verify the correctness of workflow procedures. This paper introduces workflow management as an application domain for Petri nets, presents state-of-the-art results with respect to the verification of workflows, and highlights some Petri-net-based workflow tools.


2021 ◽  
Vol 7 (4) ◽  
pp. 1-24
Author(s):  
Douglas Do Couto Teixeira ◽  
Aline Carneiro Viana ◽  
Jussara M. Almeida ◽  
Mrio S. Alvim

Predicting mobility-related behavior is an important yet challenging task. On the one hand, factors such as one’s routine or preferences for a few favorite locations may help in predicting their mobility. On the other hand, several contextual factors, such as variations in individual preferences, weather, traffic, or even a person’s social contacts, can affect mobility patterns and make its modeling significantly more challenging. A fundamental approach to study mobility-related behavior is to assess how predictable such behavior is, deriving theoretical limits on the accuracy that a prediction model can achieve given a specific dataset. This approach focuses on the inherent nature and fundamental patterns of human behavior captured in that dataset, filtering out factors that depend on the specificities of the prediction method adopted. However, the current state-of-the-art method to estimate predictability in human mobility suffers from two major limitations: low interpretability and hardness to incorporate external factors that are known to help mobility prediction (i.e., contextual information). In this article, we revisit this state-of-the-art method, aiming at tackling these limitations. Specifically, we conduct a thorough analysis of how this widely used method works by looking into two different metrics that are easier to understand and, at the same time, capture reasonably well the effects of the original technique. We evaluate these metrics in the context of two different mobility prediction tasks, notably, next cell and next distinct cell prediction, which have different degrees of difficulty. Additionally, we propose alternative strategies to incorporate different types of contextual information into the existing technique. Our evaluation of these strategies offer quantitative measures of the impact of adding context to the predictability estimate, revealing the challenges associated with doing so in practical scenarios.


2020 ◽  
Vol 20 (9&10) ◽  
pp. 747-765
Author(s):  
F. Orts ◽  
G. Ortega ◽  
E.M. E.M. Garzon

Despite the great interest that the scientific community has in quantum computing, the scarcity and high cost of resources prevent to advance in this field. Specifically, qubits are very expensive to build, causing the few available quantum computers are tremendously limited in their number of qubits and delaying their progress. This work presents new reversible circuits that optimize the necessary resources for the conversion of a sign binary number into two's complement of N digits. The benefits of our work are two: on the one hand, the proposed two's complement converters are fault tolerant circuits and also are more efficient in terms of resources (essentially, quantum cost, number of qubits, and T-count) than the described in the literature. On the other hand, valuable information about available converters and, what is more, quantum adders, is summarized in tables for interested researchers. The converters have been measured using robust metrics and have been compared with the state-of-the-art circuits. The code to build them in a real quantum computer is given.


1967 ◽  
Vol 71 (677) ◽  
pp. 342-343
Author(s):  
F. H. East

The Aviation Group of the Ministry of Technology (formerly the Ministry of Aviation) is responsible for spending a large part of the country's defence budget, both in research and development on the one hand and production or procurement on the other. In addition, it has responsibilities in many non-defence fields, mainly, but not exclusively, in aerospace.Few developments have been carried out entirely within the Ministry's own Establishments; almost all have required continuous co-operation between the Ministry and Industry. In the past the methods of management and collaboration and the relative responsibilities of the Ministry and Industry have varied with time, with the type of equipment to be developed, with the size of the development project and so on. But over the past ten years there has been a growing awareness of the need to put some system into the complex business of translating a requirement into a specification and a specification into a product within reasonable bounds of time and cost.


2021 ◽  
Vol 11 (23) ◽  
pp. 11241
Author(s):  
Ling Li ◽  
Fei Xue ◽  
Dong Liang ◽  
Xiaofei Chen

Concealed objects detection in terahertz imaging is an urgent need for public security and counter-terrorism. So far, there is no public terahertz imaging dataset for the evaluation of objects detection algorithms. This paper provides a public dataset for evaluating multi-object detection algorithms in active terahertz imaging. Due to high sample similarity and poor imaging quality, object detection on this dataset is much more difficult than on those commonly used public object detection datasets in the computer vision field. Since the traditional hard example mining approach is designed based on the two-stage detector and cannot be directly applied to the one-stage detector, this paper designs an image-based Hard Example Mining (HEM) scheme based on RetinaNet. Several state-of-the-art detectors, including YOLOv3, YOLOv4, FRCN-OHEM, and RetinaNet, are evaluated on this dataset. Experimental results show that the RetinaNet achieves the best mAP and HEM further enhances the performance of the model. The parameters affecting the detection metrics of individual images are summarized and analyzed in the experiments.


Author(s):  
Yizhen Chen ◽  
Haifeng Hu

Most existing segmentation networks are built upon a “ U -shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in improving segmentation performance, there are two main drawbacks. On the one hand, the introduction of low-level features brings a significant increase in calculations without an obvious performance gain. On the other hand, general strategies of feature aggregation such as addition and concatenation fuse features without considering the usefulness of each feature vector, which mixes the useful information with massive noises. In this article, we abandon the traditional “ U -shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it only aggregates the high-level features with low-resolution and utilizes the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as the combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) to adaptively integrate features from different receptive fields by assigning specific channelwise weights, where the weights are input-dependent. Our Y-Net is capable of breaking through the limit of singe-branch network and attaining higher performance with less computational cost than “ U -shaped” structure. The proposed CSD can better integrate useful information and suppress interference noises. Comprehensive experiments are carried out on three public datasets to evaluate the effectiveness of our method. Eventually, our Y-Net achieves state-of-the-art performance on PASCAL VOC 2012, PASCAL Person-Part, and ADE20K dataset without pre-training on extra datasets.


Sign in / Sign up

Export Citation Format

Share Document