Joint DNA-based disaster victim identification

2021
Vol 11 (1)
Author(s):
Magnus D. Vigeland
Thore Egeland

Abstract
We address computational and statistical aspects of DNA-based identification of victims in the aftermath of disasters. Current methods and software for such identification typically consider each victim individually, leading to suboptimal power of identification and potential inconsistencies in the statistical summary of the evidence. We resolve these problems by performing joint identification of all victims, using the complete genetic data set. Individual identification probabilities, conditional on all available information, are derived from the joint solution in the form of posterior pairing probabilities. A closed formula is obtained for the a priori number of possible joint solutions to a given DVI problem. This number increases quickly with the numbers of victims and missing persons, posing computational challenges for brute-force approaches. We address this complexity with a preparatory sequential step aimed at reducing the search space. The examples show that realistic cases are handled efficiently. User-friendly implementations of all methods are provided in the R package dvir, freely available on all platforms.
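The rapid growth of the solution space can be illustrated with a short R sketch. This is not the paper's formula or the dvir implementation; it assumes, for illustration only, that a joint solution is a partial injective assignment of victims to missing persons (each victim matched to at most one missing person and vice versa), and the function name is made up.

```r
# Illustrative sketch (not the paper's code): a priori number of joint
# solutions, assuming a solution is a partial injective assignment of
# victims to missing persons.
n_joint_solutions <- function(n_victims, n_missing) {
  k <- 0:min(n_victims, n_missing)   # number of victims actually matched
  sum(choose(n_victims, k) * choose(n_missing, k) * factorial(k))
}

n_joint_solutions(3, 2)    # 3 victims, 2 missing persons -> 13 joint solutions
n_joint_solutions(10, 10)  # grows very quickly with problem size
```

Even under this simplified counting, the number of candidate solutions explodes as victims and missing persons are added, which is why a brute-force search quickly becomes infeasible.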


2021
Author(s):  
Jin Kim

This article presents Exploratory Only, an intuitive tool for conducting large-scale exploratory analyses easily and quickly. Available in three forms (a web application, a standalone program, and an R package) and launched as a point-and-click interface, Exploratory Only allows researchers to conduct all possible correlation, moderation, and mediation analyses among selected variables in their data set with minimal effort and time. Compared to a popular alternative, SPSS, Exploratory Only is shown to be orders of magnitude easier and faster at conducting exploratory analyses. The article demonstrates how to use Exploratory Only and discusses the caveats of using it. As long as researchers use Exploratory Only as intended—to discover novel hypotheses to investigate in follow-up studies, rather than to confirm nonexistent a priori hypotheses (i.e., p-hacking)—Exploratory Only can promote progress in behavioral science by encouraging more exploratory analyses and therefore more discoveries.
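A rough R sketch of the kind of exhaustive exploration the tool automates follows. This is plain R, not Exploratory Only's interface or code; the function name, the choice of mtcars, and the selected variables are made up for illustration.

```r
# Illustrative sketch (not Exploratory Only itself): run all pairwise
# correlations among selected variables and collect them in one table.
explore_correlations <- function(data, vars = names(data)) {
  pairs <- t(combn(vars, 2))   # every unordered pair of variables
  results <- lapply(seq_len(nrow(pairs)), function(i) {
    test <- cor.test(data[[pairs[i, 1]]], data[[pairs[i, 2]]])
    data.frame(x = pairs[i, 1], y = pairs[i, 2],
               r = unname(test$estimate), p_value = test$p.value)
  })
  do.call(rbind, results)
}

# Example on a built-in data set.
explore_correlations(mtcars, c("mpg", "hp", "wt", "qsec"))
```

Output of this kind is hypothesis-generating only, echoing the article's caveat that such results should feed follow-up studies rather than be reported as confirmatory tests.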


1999
Vol 09 (03)
pp. 195-202
Author(s):
JOSÉ ALFREDO FERREIRA COSTA
MÁRCIO LUIZ DE ANDRADE NETTO

Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics, and biology. A set of n patterns in a p-dimensional feature space must be partitioned such that patterns in a given cluster are more similar to each other than to patterns in other clusters. As there are approximately K^n/K! possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space grows further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, automating knowledge discovery with the SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by the SOM. Mathematical morphology is applied to identify regions of neurons that are similar. The number of regions and their labels are found automatically, and they correspond to the number of clusters in a multivariate data set. New data can be classified by labeling them according to the best-matching neuron. Simulations using data sets drawn from finite mixtures of p-variate normal densities are presented, along with advantages and drawbacks of the method.
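A compact, self-contained R sketch of the overall idea follows. It is not the paper's implementation: the U-matrix is computed here as the mean distance from each codebook vector to its grid neighbours, and a simple quantile threshold plus 4-connected flood fill stands in for the mathematical-morphology step; `trained_codes`, the grid dimensions, and the 0.5 quantile in the usage comment are hypothetical.

```r
# Illustrative sketch (not the paper's code): U-matrix from a SOM codebook
# laid out on an x-by-y grid, then labelling of connected low-distance regions.
umatrix <- function(codes, xdim, ydim) {
  u <- matrix(NA_real_, ydim, xdim)
  idx <- function(x, y) (y - 1) * xdim + x          # assumed row-major neuron layout
  for (x in 1:xdim) for (y in 1:ydim) {
    nb <- rbind(c(x - 1, y), c(x + 1, y), c(x, y - 1), c(x, y + 1))
    nb <- nb[nb[, 1] >= 1 & nb[, 1] <= xdim & nb[, 2] >= 1 & nb[, 2] <= ydim, , drop = FALSE]
    d  <- apply(nb, 1, function(n) sqrt(sum((codes[idx(x, y), ] - codes[idx(n[1], n[2]), ])^2)))
    u[y, x] <- mean(d)                              # mean distance to grid neighbours
  }
  u
}

label_regions <- function(mask) {                   # 4-connected flood fill on a logical matrix
  lab <- matrix(0L, nrow(mask), ncol(mask)); current <- 0L
  for (i in seq_len(nrow(mask))) for (j in seq_len(ncol(mask))) {
    if (mask[i, j] && lab[i, j] == 0L) {
      current <- current + 1L; queue <- list(c(i, j))
      while (length(queue) > 0) {
        p <- queue[[1]]; queue <- queue[-1]
        if (p[1] < 1 || p[2] < 1 || p[1] > nrow(mask) || p[2] > ncol(mask)) next
        if (!mask[p[1], p[2]] || lab[p[1], p[2]] != 0L) next
        lab[p[1], p[2]] <- current
        queue <- c(queue, list(p + c(1, 0)), list(p - c(1, 0)),
                   list(p + c(0, 1)), list(p - c(0, 1)))
      }
    }
  }
  lab
}

# Usage (hypothetical): low U-matrix values indicate homogeneous regions.
# u <- umatrix(trained_codes, xdim = 10, ydim = 10)
# clusters <- label_regions(u < quantile(u, 0.5))
```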


2017
Vol 1 (2)
pp. 13
Author(s):
Eva López-Tello
Salvador Mandujano

Abstract
The camera trap is a method that has become popular in the last decade because technological development has made this equipment more affordable. One advantage of the method is that it yields a large amount of information on different species in a short time. However, few programs facilitate organizing and extracting the information contained in large numbers of images. The R package camtrapR, recently made freely available, extracts image metadata, creates tables of independent records, builds presence/absence records for occupancy analysis, and produces spatial plots. To demonstrate the functionality of the package, this article presents six examples of its main functions, using a set of images obtained with 10 camera traps at a locality in the Tehuacán-Cuicatlán Biosphere Reserve. camtrapR was applied to the following tasks: organization and management of photos, classification by species, individual identification, extraction of metadata by species and/or individual, data exploration and visualization, and export of data for occupancy analysis. The R code used in this work is freely available online. Based on our results, camtrapR is an efficient package that facilitates and shortens the extraction of image metadata, and it produces independent records without omissions or duplicated data. In addition, it creates *.csv files that can then be analyzed with other R packages or programs for other purposes.
Key words: capture histories, database, metadata, organization, R.
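The "independent record" logic can be sketched in a few lines of base R. This is not camtrapR's API (camtrapR derives such tables directly from image metadata); the function name, the 60-minute rule, and the station/species/timestamp example data are all hypothetical.

```r
# Illustrative sketch (generic R, not camtrapR's functions): keep only
# "independent" detections per station and species (records at least
# 60 minutes apart) and summarise them per station.
independent_records <- function(records, min_delta_min = 60) {
  records <- records[order(records$station, records$species, records$datetime), ]
  keep <- unlist(lapply(split(records, list(records$station, records$species), drop = TRUE),
    function(g) {
      sel <- rep(FALSE, nrow(g)); last <- NULL
      for (i in seq_len(nrow(g))) {
        if (is.null(last) || difftime(g$datetime[i], last, units = "mins") >= min_delta_min) {
          sel[i] <- TRUE; last <- g$datetime[i]
        }
      }
      rownames(g)[sel]
    }))
  records[rownames(records) %in% keep, ]
}

# Hypothetical example data in place of metadata extracted from images.
records <- data.frame(
  station  = c("C01", "C01", "C01", "C02"),
  species  = c("Odocoileus virginianus", "Odocoileus virginianus", "Lynx rufus", "Lynx rufus"),
  datetime = as.POSIXct(c("2016-05-01 10:00", "2016-05-01 10:20",
                          "2016-05-01 22:15", "2016-05-02 03:40"))
)
indep <- independent_records(records)
table(indep$station, indep$species)   # presence/absence style summary per station
```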


2020
Author(s):
József Bukszár
Edwin JCG van den Oord

Abstract
The large number of existing databases provides a freely available independent source of information with considerable potential to increase the likelihood of identifying genes for complex diseases. We developed a flexible framework for integrating such heterogeneous databases into novel large-scale genetic studies and implemented the methods in a freely available, user-friendly R package called MIND. For each marker, MIND computes the posterior probability that the marker has an effect in the novel data collection, based on the information in all available data. MIND (1) relies on a very general model, (2) is based on mathematical formulas that provide the exact value of the posterior probability, and (3) has good estimation properties because of its very efficient parameterization. For an existing data set, only the ranks of the markers are needed, and ties among the ranks are allowed. Through simulations, cross-validation analyses involving 18 GWAS, and an independent replication study of 6,544 SNPs in 6,298 samples, we show that MIND (1) is accurate, (2) outperforms marker selection for follow-up studies based on p-values, and (3) identifies effects that would otherwise require replication of over 20 times as many markers.
Author summary
The large number of existing databases provides a freely available independent source of information with considerable potential to increase the likelihood of identifying genes for complex diseases. We developed a flexible framework for integrating such heterogeneous databases into novel large-scale genetic studies and implemented the methods in a freely available, user-friendly R package called MIND. For each marker, MIND computes an estimate of the (posterior) probability that the marker has an effect in the novel data collection, based on the information in all available data. For an existing data set, only the ranks of the markers need to be known, and ties among the ranks are allowed. MIND (1) relies on a realistic model that takes confounding effects into account, (2) is based on mathematical formulas that provide the exact value of the posterior probability, and (3) has good estimation properties because of its very efficient parameterization. Simulation, validation, and a replication study in independent samples show that MIND is accurate and greatly outperforms marker selection without using existing data sets.
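The marker-level posterior can be illustrated with a toy Bayes calculation in R. This is emphatically not MIND's model: the prior fraction of true effects (`pi1`), the uniform rank distribution under the null, and the geometric-style rank distribution under the alternative are all invented for illustration.

```r
# Toy sketch (not MIND): posterior probability that a marker has an effect,
# given its rank among m markers in an existing data set.
# Assumptions (hypothetical): a fraction pi1 of markers have effects, null
# ranks are uniform on 1..m, and ranks of true effects decay geometrically.
posterior_effect <- function(rank, m, pi1 = 0.01, decay = 0.995) {
  f0 <- 1 / m                                    # rank density under "no effect"
  f1 <- (1 - decay) * decay^(rank - 1) /         # normalised decaying density
        (1 - decay^m)                            # under "has effect"
  pi1 * f1 / (pi1 * f1 + (1 - pi1) * f0)         # Bayes' rule
}

posterior_effect(rank = 5,      m = 500000)      # top-ranked marker: high posterior
posterior_effect(rank = 250000, m = 500000)      # mid-ranked marker: essentially zero
```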


2020
Author(s):
Xiaoyu Lu
Szu-Wei Tu
Wennan Chang
Changlin Wan
Jiashi Wang
...  

Abstract
Deconvolution of mouse transcriptomic data is challenged by the fact that mouse models carry various genetic and physiological perturbations, making it questionable to assume fixed cell types and cell type marker genes across data sets. We developed a Semi-Supervised Mouse data Deconvolution (SSMD) method to study the mouse tissue microenvironment (TME). SSMD features (i) a novel non-parametric method to discover data set-specific cell type signature genes; (ii) a community detection approach for fixing cell types and their marker genes; and (iii) a constrained matrix decomposition method to solve cell type relative proportions that is robust to diverse experimental platforms. In summary, SSMD addresses several key challenges in the deconvolution of mouse tissue data, including (1) varied cell types and marker genes caused by the highly divergent genotypic and phenotypic conditions of mouse experiments, (2) the diverse experimental platforms of mouse transcriptomics data, and (3) small sample sizes and limited training data; in addition, it can estimate the proportions of 35 cell types in blood, inflammatory, central nervous, or hematopoietic systems. In silico and experimental validation demonstrated SSMD's high sensitivity and accuracy in identifying (sub) cell types and predicting cell proportions compared to state-of-the-art methods. A user-friendly R package and a web server of SSMD are released via https://github.com/xiaoyulu95/SSMD.
Key points
We provide a novel tissue deconvolution method, SSMD, specifically designed for mouse data to handle the variation caused by different mouse strains, genetic and phenotypic backgrounds, and experimental platforms.
SSMD can detect data set- and tissue microenvironment-specific cell markers for more than 30 cell types in mouse blood, inflammatory tissue, cancer, and the central nervous system.
SSMD achieves much improved performance in estimating the relative proportions of these cell types compared with state-of-the-art methods.
The semi-supervised setting enables the application of SSMD to transcriptomics, DNA methylation, and ATAC-seq data.
A user-friendly R package and an R Shiny-based web server for SSMD have also been developed.
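The proportion-estimation step can be sketched with a much simpler stand-in for SSMD's constrained decomposition: non-negative least squares followed by rescaling. This is not SSMD's code; the signature matrix `S`, bulk vector `b`, and the toy 70/30 mixture are hypothetical, and the nnls package is used only as a convenient solver.

```r
# Simplified stand-in for a constrained decomposition step (not SSMD's code):
# given a signature matrix S (genes x cell types) and a bulk expression
# vector b (genes), estimate non-negative cell-type proportions summing to 1.
library(nnls)

estimate_proportions <- function(S, b) {
  fit <- nnls(S, b)             # non-negative least squares: min ||S x - b||, x >= 0
  x <- fit$x
  x / sum(x)                    # rescale so the proportions sum to one
}

# Hypothetical toy example: 3 marker genes, 2 cell types, a 70/30 mixture.
S <- cbind(typeA = c(10, 1, 5), typeB = c(1, 8, 4))
b <- 0.7 * S[, "typeA"] + 0.3 * S[, "typeB"]
estimate_proportions(S, b)      # approximately c(0.7, 0.3)
```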


Author(s):  
W. Karel
M. Doneus
C. Briese
G. Verhoeven
N. Pfeifer

We present a method for the automatic geo-referencing of archaeological photographs captured aboard unmanned aerial vehicles (UAVs), termed UPs. We do so with the help of pre-existing ortho-photo maps (OPMs) and digital surface models (DSMs). Typically, these pre-existing data sets are based on data captured at a widely different point in time. This renders the detection (and hence the matching) of homologous feature points in the UPs and OPMs infeasible, mainly due to temporal variations of vegetation and illumination. Facing this difficulty, we opt for the normalized cross-correlation coefficient of perspectively transformed image patches as the measure of image similarity. Applying a threshold to this measure, we detect candidates for homologous image points, resulting in a distinctive, but computationally intensive, method. In order to lower computation times, we reduce the dimensionality and extent of the search space by making use of a priori knowledge of the data sets. By assigning terrain heights interpolated in the DSM to the image points found in the OPM, we generate control points. We introduce the respective observations into a bundle block, from which gross errors, i.e. false matches, are eliminated during its robust adjustment. A test of our approach on a UAV image data set demonstrates its potential and raises hope for successfully processing large image archives.
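The similarity measure itself is simple to state. A minimal R sketch follows; it is not the authors' implementation, and the function name and the toy patches are made up for illustration.

```r
# Minimal sketch: normalized cross-correlation coefficient between two image
# patches of equal size, used as a similarity measure.
ncc <- function(patch_a, patch_b) {
  a <- as.vector(patch_a) - mean(patch_a)
  b <- as.vector(patch_b) - mean(patch_b)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))   # ranges from -1 to 1
}

# Hypothetical patches: identical structure, different brightness and contrast.
p1 <- matrix(runif(25), 5, 5)
p2 <- 2 * p1 + 10
ncc(p1, p2)   # equals 1, illustrating invariance to linear radiometric changes
```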


2021
Vol 22 (3)
pp. 1399
Author(s):
Salim Ghannoum
Waldir Leoncio Netto
Damiano Fantini
Benjamin Ragan-Kelley
Amirabbas Parizadeh
...  

The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context, using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate, in a sequential narrative, R code with explanatory text, output data, and images. R users can use the notebooks to understand the different steps of the pipeline, and the notebooks will guide them in exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows execution of the pipeline without the need to download R, Jupyter, or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to do meaningful scRNA-seq analyses, all users need to understand the implemented methods and their possible options and limitations.
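The decision-tree biomarker idea can be sketched in a few lines of R. This is not DIscBIO's code: the toy expression table, gene names, cluster labels, and the use of rpart here are purely illustrative.

```r
# Illustrative sketch (not DIscBIO's code): after clustering, a decision tree
# can point to genes whose expression separates one cell sub-population from
# the rest, i.e. candidate biomarkers.
library(rpart)

set.seed(1)
# Hypothetical toy data: 60 cells, 5 genes, cluster 2 driven by geneB.
expr <- data.frame(geneA = rnorm(60),
                   geneB = c(rnorm(30), rnorm(30, mean = 3)),
                   geneC = rnorm(60), geneD = rnorm(60), geneE = rnorm(60))
expr$cluster <- factor(rep(c(1, 2), each = 30))

tree <- rpart(cluster ~ ., data = expr, method = "class")
print(tree)   # splits on geneB, flagging it as a candidate marker of cluster 2
```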


2014
Vol 08 (04)
pp. 515-544
Author(s):
Pavlos Fafalios
Panagiotis Papadakos
Yannis Tzitzikas

The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach during searching which aims at enriching the responses of non-semantic search systems with semantic information, i.e. Linked Open Data (LOD), and exploiting the outcome for offering advanced exploratory search services which provide an overview of the search space and allow users to explore the related LOD. We use named entities identified in the search results for automatically connecting search hits with LOD, and we consider a scenario where this entity-based integration is performed at query time, with no human effort and no a-priori indexing, which is beneficial in terms of configurability and freshness. However, the number of identified entities can be high, and the same is true for the semantic information about these entities that can be fetched from the available LOD. To this end, we propose a link analysis-based method for ranking (and thus selecting to show) the more important semantic information related to the search results. We report the results of a survey in the marine domain, which are promising, and comparative results that illustrate the effectiveness of the proposed (PageRank-based) ranking scheme. Finally, we report experimental results regarding efficiency, showing that the proposed functionality can be offered even at query time.
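The core of PageRank-based ranking is easy to show in a self-contained R sketch. This is not the authors' system; the function, damping factor, and toy four-node graph are made up for illustration.

```r
# Minimal sketch: PageRank by power iteration on a small adjacency matrix,
# as a way to rank items (e.g. entities fetched from LOD) by link structure.
pagerank <- function(adj, damping = 0.85, tol = 1e-8) {
  n <- nrow(adj)
  out_deg <- pmax(rowSums(adj), 1)              # avoid division by zero for dangling nodes
  M <- adj / out_deg                            # row-normalised transition matrix
  r <- rep(1 / n, n)
  repeat {
    r_new <- (1 - damping) / n + damping * as.vector(t(M) %*% r)
    if (sum(abs(r_new - r)) < tol) return(r_new)
    r <- r_new
  }
}

# Hypothetical 4-node graph: node 4 is pointed to by every other node.
adj <- rbind(c(0, 1, 0, 1), c(0, 0, 1, 1), c(1, 0, 0, 1), c(0, 0, 0, 0))
round(pagerank(adj), 3)   # node 4 receives the highest score
```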


2021
Vol 4 (1)
pp. 251524592095492
Author(s):
Marco Del Giudice
Steven W. Gangestad

Decisions made by researchers while analyzing data (e.g., how to measure variables, how to handle outliers) are sometimes arbitrary, without an objective justification for choosing one alternative over another. Multiverse-style methods (e.g., specification curve, vibration of effects) estimate an effect across an entire set of possible specifications to expose the impact of hidden degrees of freedom and/or obtain robust, less biased estimates of the effect of interest. However, if specifications are not truly arbitrary, multiverse-style analyses can produce misleading results, potentially hiding meaningful effects within a mass of poorly justified alternatives. So far, a key question has received scant attention: How does one decide whether alternatives are arbitrary? We offer a framework and conceptual tools for doing so. We discuss three kinds of a priori nonequivalence among alternatives—measurement nonequivalence, effect nonequivalence, and power/precision nonequivalence. The criteria we review lead to three decision scenarios: Type E decisions (principled equivalence), Type N decisions (principled nonequivalence), and Type U decisions (uncertainty). In uncertain scenarios, multiverse-style analysis should be conducted in a deliberately exploratory fashion. The framework is discussed with reference to published examples and illustrated with the help of a simulated data set. Our framework will help researchers reap the benefits of multiverse-style methods while avoiding their pitfalls.
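A tiny multiverse can be simulated directly in R. The sketch below is not taken from the article's materials; the simulated data and the two analysis decisions (an outlier-trimming rule and whether to adjust for a covariate) are hypothetical, chosen only to show how one estimate per specification is produced.

```r
# Minimal sketch: estimate the effect of x on y under every combination of
# two (hypothetical) analysis decisions, yielding one estimate per specification.
set.seed(42)
d <- data.frame(x = rnorm(200), z = rnorm(200))
d$y <- 0.3 * d$x + 0.2 * d$z + rnorm(200)

specs <- expand.grid(outliers = c("keep", "trim"), covariate = c(FALSE, TRUE),
                     stringsAsFactors = FALSE)

multiverse <- do.call(rbind, lapply(seq_len(nrow(specs)), function(i) {
  dat <- d
  if (specs$outliers[i] == "trim")                      # decision 1: drop extreme y values
    dat <- dat[abs(as.vector(scale(dat$y))) < 2.5, ]
  form <- if (specs$covariate[i]) y ~ x + z else y ~ x  # decision 2: adjust for z or not
  fit <- lm(form, data = dat)
  cbind(specs[i, ], estimate = coef(fit)[["x"]])
}))

multiverse   # plotting these estimates across specifications gives a specification curve
```

Whether the four specifications above are truly arbitrary is exactly the kind of Type E/N/U judgment the framework asks researchers to make before pooling them.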

