BIOTAS: BIOTelemetry Analysis Software, for the semi-automated removal of false positives from radio telemetry data

2022 ◽  
Vol 10 (1) ◽  
Author(s):  
K. Nebiolo ◽  
T. Castro-Santos

Abstract Introduction Radio telemetry, one of the most widely used techniques for tracking wildlife and fisheries populations, has a false-positive problem. Bias from false-positive detections can affect many important derived metrics, such as home range estimation, site occupation, survival, and migration timing. False-positive removal processes have relied upon simple filters and personal opinion. To overcome these shortcomings, we have developed BIOTAS (BIOTelemetry Analysis Software) to assist with false-positive identification, removal, and data management for large-scale radio telemetry projects. Methods BIOTAS uses a naïve Bayes classifier to identify and remove false-positive detections from radio telemetry data. The semi-supervised classifier uses spurious detections from unknown tags and study tags as training data. We tested BIOTAS on four scenarios: a wide-band receiver with a single Yagi antenna, a wide-band receiver that switched between two Yagi antennas, a wide-band receiver with a single dipole antenna, and a single-band receiver that switched between five frequencies. BIOTAS has built-in k-fold cross-validation and assesses model quality with sensitivity, specificity, positive and negative predictive value, false-positive rate, and precision-recall area under the curve. BIOTAS also assesses concordance with a traditional consecutive-detection filter using Cohen’s κ. Results Overall, BIOTAS performed equally well in all scenarios and was able to discriminate between known false-positive detections and valid study tag detections with low false-positive rates (< 0.001) as determined through cross-validation, even as receivers switched between antennas and frequencies. BIOTAS classified between 94 and 99% of study tag detections as valid. Conclusion As part of a robust data management plan, BIOTAS is able to discriminate between detections from study tags and known false positives. BIOTAS works with receivers from multiple manufacturers and accounts for receivers that switch between antennas and frequencies. BIOTAS provides the framework for transparent, objective, and repeatable telemetry projects for wildlife conservation surveys, and increases the efficiency of data processing.
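
The classification idea described above can be illustrated with a short, hypothetical sketch: detections attributed to tag codes that are not part of the study serve as known false positives, study-tag detections supply the putative-valid class, and a naïve Bayes model scores new detections. The feature names and functions below are illustrative assumptions, not the BIOTAS API.

```python
# Minimal sketch of the semi-supervised naive Bayes idea (not the BIOTAS API).
# Detections from tag codes that are not in the study are treated as known
# false positives; study-tag detections provide the putative-valid class.
# Feature names (hit_ratio, consecutive_hits, power) are illustrative.
import pandas as pd
from sklearn.naive_bayes import GaussianNB

FEATURES = ["hit_ratio", "consecutive_hits", "power"]  # illustrative predictors

def train_false_positive_filter(detections: pd.DataFrame, study_tags: set) -> GaussianNB:
    """Fit a naive Bayes classifier on receiver detection records."""
    known_noise = detections[~detections["tag_id"].isin(study_tags)]     # spurious codes -> 0
    presumed_valid = detections[detections["tag_id"].isin(study_tags)]   # study tags -> 1
    X = pd.concat([known_noise[FEATURES], presumed_valid[FEATURES]])
    y = [0] * len(known_noise) + [1] * len(presumed_valid)
    return GaussianNB().fit(X, y)

def classify(model: GaussianNB, detections: pd.DataFrame) -> pd.Series:
    """Return the posterior probability that each detection is valid."""
    p = model.predict_proba(detections[FEATURES])[:, 1]
    return pd.Series(p, index=detections.index, name="p_valid")
```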

2015 ◽  
Vol 42 (5) ◽  
pp. 437 ◽  
Author(s):  
Javan M. Bauder ◽  
David R. Breininger ◽  
M. Rebecca Bolt ◽  
Michael L. Legare ◽  
Christopher L. Jenkins ◽  
...  

Context Despite the diversity of available home range estimators, no single method performs equally well in all circumstances. It is therefore important to understand how different estimators perform for data collected under diverse conditions. Kernel density estimation is a popular approach for home range estimation. While many studies have evaluated different kernel bandwidth selectors, few studies have compared different formulations of the bandwidth matrix using wildlife telemetry data. Additionally, few studies have compared the performance of kernel bandwidth selectors using VHF radio-telemetry data from small-bodied taxa. Aims In this study, we used eight different combinations of bandwidth selectors and matrices to evaluate their ability to meet several criteria that could be potentially used to select a home range estimator. Methods We used handheld VHF telemetry data from two species of snake displaying non-migratory and migratory movement patterns. We used subsampling to estimate each estimator’s sensitivity to sampling duration and fix rate and compared home range size, the number of disjunct volume contours and the proportion of telemetry fixes not included in those contours among estimators. Key Results We found marked differences among bandwidth selectors with regards to our criteria but comparatively little difference among bandwidth matrices for a given bandwidth selector. Least-squares cross-validation bandwidths exhibited near-universal convergence failure whereas likelihood cross-validation bandwidths showed high sensitivity to sampling duration and fix rate. The reference, plug-in and smoothed cross-validation bandwidths were more robust to variation in sampling intensity, with the former consistently producing the largest estimates of home range size. Conclusions Our study illustrates the performance of multiple kernel bandwidth estimators for estimating home ranges with datasets typical of many small-bodied taxa. The reference and plug-in bandwidths with an unconstrained bandwidth matrix generally had the best performance. However, our study concurs with earlier studies indicating that no single home range estimator performs equally well in all circumstances. Implications Although we did not find strong differences between bandwidth matrices, we encourage the use of unconstrained matrices because of their greater flexibility in smoothing data not parallel to the coordinate axes. We also encourage researchers to select an estimator suited to their study objectives and the life history of their study organism.
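
As a rough illustration of the kernel approach the study evaluates, the sketch below computes an unconstrained reference (normal-rule) bandwidth matrix for bivariate fixes and evaluates a kernel density surface over them; it is a simplified, assumption-laden example, not the estimators or software used by the authors.

```python
# Minimal sketch of a bivariate kernel home-range surface with an unconstrained
# "reference" (normal-rule) bandwidth matrix, assuming fixes are projected x/y
# coordinates. Not the estimator code used in the study.
import numpy as np

def reference_bandwidth(xy: np.ndarray) -> np.ndarray:
    """Unconstrained reference bandwidth matrix H = n^(-2/(d+4)) * Sigma (exact for d = 2)."""
    n, d = xy.shape
    return n ** (-2.0 / (d + 4)) * np.cov(xy, rowvar=False)

def kde_surface(xy: np.ndarray, grid: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Evaluate the bivariate Gaussian-kernel density at each grid point (grid is (m, 2))."""
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (len(xy) * 2 * np.pi * np.sqrt(np.linalg.det(H)))
    dens = np.zeros(len(grid))
    for x in xy:                                # sum kernels centred on each telemetry fix
        diff = grid - x
        dens += np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, Hinv, diff))
    return norm * dens

# A 95% home-range isopleth is then the density contour enclosing 95% of the
# estimated probability mass (e.g. threshold the sorted grid densities).
```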


2013 ◽  
Vol 756-759 ◽  
pp. 3250-3253
Author(s):  
Zhi Guo Zhang ◽  
Lei Gong ◽  
Hong Ping Wang

At present, telemetry data are stored and released as binary or text files, and users cannot determine the data structure from the file itself, which makes querying and information sharing difficult. To address this problem, an XML-based telemetry data management method is proposed that takes advantage of the extensible markup language (XML). With this method, large volumes of telemetry data can be stored in sections, indexing efficiency is noticeably improved in practical system operation, and the ability to distribute telemetry data is improved as well.
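
A minimal sketch of the idea, assuming illustrative element and attribute names rather than the schema of the proposed system: telemetry frames are written into indexed sections of a self-describing XML document, so a consumer can query one section without knowing a binary layout.

```python
# Minimal sketch of writing telemetry records into sectioned, self-describing XML.
# Element and attribute names are illustrative only.
import xml.etree.ElementTree as ET

def write_sections(records, section_size=1000, path="telemetry.xml"):
    """records: iterable of dicts like {"time": ..., "parameter": ..., "value": ...}."""
    root = ET.Element("telemetry")
    section = None
    for i, rec in enumerate(records):
        if i % section_size == 0:                         # start a new indexed section
            section = ET.SubElement(root, "section", id=str(i // section_size))
        frame = ET.SubElement(section, "frame", time=str(rec["time"]))
        ET.SubElement(frame, "parameter", name=rec["parameter"]).text = str(rec["value"])
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Querying a single section (rather than scanning a flat binary file) then becomes
# a simple lookup, e.g. root.find("section[@id='3']").
```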


2020 ◽  
Vol 21 (1) ◽  
pp. 67-76 ◽  
Author(s):  
Lian Liu ◽  
Xiujuan Lei ◽  
Jia Meng ◽  
Zhen Wei

Introduction: N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays important roles in various biological processes, such as splicing, RNA localization and degradation, many of which are related to the functions of introns. Although a number of computational approaches have been proposed to predict m6A sites in different species, none of them has been optimized for intronic m6A sites. Because existing experimental data rely overwhelmingly on polyA selection in sample preparation, and intronic RNAs are usually underrepresented in the captured RNA library, the accuracy of general m6A site prediction approaches is limited for the intronic m6A site prediction task. Methodology: A computational framework, WITMSG, dedicated to the large-scale prediction of intronic m6A RNA methylation sites in humans is proposed here for the first time. Based on the random forest algorithm and using only known intronic m6A sites as the training data, WITMSG takes advantage of both conventional sequence features and a variety of genomic characteristics to improve the prediction of intron-specific m6A sites. Results and Conclusion: WITMSG outperformed competing approaches (trained with all m6A sites or with intronic m6A sites only) in 10-fold cross-validation (AUC: 0.940) and when tested on independent datasets (AUC: 0.946). WITMSG was also applied intronome-wide in humans to predict all possible intronic m6A sites, and the prediction results are freely accessible at http://rnamd.com/intron/.
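
The training and evaluation loop implied by the abstract can be sketched as follows, assuming a precomputed feature matrix that combines sequence encodings with genomic annotations; the placeholder data and parameters below are illustrative and not the WITMSG code.

```python
# Minimal sketch of training a random-forest site classifier with 10-fold
# cross-validated AUC. Feature construction (sequence encodings plus genomic
# annotations) is assumed to happen elsewhere; X and y here are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 60))        # placeholder feature matrix (sequence + genomic features)
y = rng.integers(0, 2, size=1000)      # placeholder labels (1 = intronic m6A site)

model = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
print(f"10-fold cross-validated AUC: {auc.mean():.3f}")
```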


2018 ◽  
Vol 48 (1) ◽  
pp. 1-33 ◽  
Author(s):  
John Muñoz ◽  
Cristobal Young

False positive findings are a growing problem in many research literatures. We argue that excessive false positives often stem from model uncertainty. There are many plausible ways of specifying a regression model, but researchers typically report only a few preferred estimates. This raises the concern that such research reveals only a small fraction of the possible results and may easily lead to nonrobust, false positive conclusions. It is often unclear how much the results are driven by model specification and how much the results would change if a different plausible model were used. Computational model robustness analysis addresses this challenge by estimating all possible models from a theoretically informed model space. We use large-scale random noise simulations to show (1) the problem of excess false positive errors under model uncertainty and (2) that computational robustness analysis can identify and eliminate false positives caused by model uncertainty. We also draw on a series of empirical applications to further illustrate issues of model uncertainty and estimate instability. Computational robustness analysis offers a method for relaxing modeling assumptions and improving the transparency of applied research.
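
A minimal sketch of the idea, under the assumption of a simple linear-regression setting with a handful of candidate controls: estimate the focal coefficient under every plausible specification and examine how much it moves. The variable names and the statsmodels-based implementation are illustrative, not the authors' software.

```python
# Minimal sketch of computational model robustness analysis: estimate the focal
# coefficient under every combination of plausible control variables and inspect
# the resulting distribution. Column names are illustrative.
from itertools import combinations
import statsmodels.formula.api as smf

def robustness_curve(df, outcome="y", focal="x", controls=("c1", "c2", "c3")):
    estimates = []
    for k in range(len(controls) + 1):
        for subset in combinations(controls, k):          # every plausible specification
            rhs = " + ".join((focal,) + subset)
            fit = smf.ols(f"{outcome} ~ {rhs}", data=df).fit()
            estimates.append((subset, fit.params[focal], fit.pvalues[focal]))
    return estimates    # the spread of estimates reveals how model-dependent the result is

# A coefficient that is significant in only a handful of these specifications is
# a candidate false positive driven by model uncertainty.
```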


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews &amp; Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the result of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid formation) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Despite this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine (SC) sites. However, the performance of these models has not been satisfactory, owing to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: Our motivation in this study is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique (SMOTE) is then employed to cope with the severe imbalance between SC sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with a rigorous 10-fold jackknife cross-validation technique for model validation and authentication. Results: With a strong discrete representation of the feature space, a capable machine learning engine, and an unbiased presentation of the underlying training data, the proposed framework yielded an excellent model that outperforms all existing studies. The proposed approach is 6% higher in MCC than the first-best method; on an independent dataset, the first-best study did not provide sufficient details for comparison. Compared with the second-best method, the model gained 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets in comparison with existing studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor of SC sites, called DeepSSPred. Empirical results on the training dataset and an independent validation dataset reveal the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, including novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through a tuned 2D-CNN classifier. We believe that this work provides insight into the further prediction of S-sulfenylation characteristics and functionalities, and we hope the developed predictor will be of significant help for the large-scale discrimination of unknown SC sites in particular and for the design of new pharmaceutical drugs in general.
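
The two key ingredients named in the methods, minority oversampling and a 2D CNN, can be sketched as below. The window size, encoding dimension, and layer sizes are assumptions for illustration and do not reproduce the DeepSSPred architecture.

```python
# Minimal sketch: SMOTE to balance SC vs. non-SC training windows, then a small
# 2D CNN classifier. Input shape and layer sizes are illustrative only.
import numpy as np
from imblearn.over_sampling import SMOTE
from tensorflow.keras import layers, models

def build_cnn(height=31, width=20):
    """Tiny 2D CNN over an encoded peptide window (e.g. 31 residues x 20-dim encoding)."""
    return models.Sequential([
        layers.Input(shape=(height, width, 1)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])

def train(X, y, height=31, width=20):
    """X: (n, height*width) flattened encodings; y: 0/1 labels with few positives."""
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)   # oversample the minority class
    model = build_cnn(height, width)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_res.reshape(-1, height, width, 1), np.asarray(y_res), epochs=10, batch_size=32)
    return model
```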


2021 ◽  
Vol 15 (3) ◽  
pp. 1-27
Author(s):  
Yan Liu ◽  
Bin Guo ◽  
Daqing Zhang ◽  
Djamal Zeghlache ◽  
Jingmin Chen ◽  
...  

Store site recommendation aims to predict the value of the store at candidate locations and then recommend the optimal location to the company for placing a new brick-and-mortar store. Most existing studies focus on learning machine learning or deep learning models based on large-scale training data of existing chain stores in the same city. However, the expansion of chain enterprises in new cities suffers from data scarcity issues, and these models do not work in the new city where no chain store has been placed (i.e., cold-start problem). In this article, we propose a unified approach for cold-start store site recommendation, Weighted Adversarial Network with Transferability weighting scheme (WANT), to transfer knowledge learned from a data-rich source city to a target city with no labeled data. In particular, to promote positive transfer, we develop a discriminator to diminish distribution discrepancy between source city and target city with different data distributions, which plays the minimax game with the feature extractor to learn transferable representations across cities by adversarial learning. In addition, to further reduce the risk of negative transfer, we design a transferability weighting scheme to quantify the transferability of examples in source city and reweight the contribution of relevant source examples to transfer useful knowledge. We validate WANT using a real-world dataset, and experimental results demonstrate the effectiveness of our proposed model over several state-of-the-art baseline models.
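
A compact PyTorch sketch of the adversarial scheme described above, under illustrative shapes and losses: a shared feature extractor and a domain discriminator play a minimax game via gradient reversal, and the discriminator's output is reused to down-weight source examples that look least transferable. This is a hypothetical reconstruction, not the WANT implementation.

```python
# Minimal sketch of adversarial transfer with example reweighting. Layer sizes,
# the loss forms, and the weighting rule are illustrative assumptions.
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None          # flip gradients flowing back to the extractor

feature = nn.Sequential(nn.Linear(64, 32), nn.ReLU())           # shared feature extractor
predictor = nn.Linear(32, 1)                                     # store-value regressor
discriminator = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())    # source (1) vs. target (0)

def training_step(x_src, y_src, x_tgt, lam=1.0):
    f_src, f_tgt = feature(x_src), feature(x_tgt)
    d_src = discriminator(GradReverse.apply(f_src, lam))
    d_tgt = discriminator(GradReverse.apply(f_tgt, lam))
    # Transferability weighting: source examples the discriminator finds
    # target-like (low d_src) contribute more to the prediction loss.
    w = (1.0 - d_src).detach()
    pred_loss = (w * (predictor(f_src) - y_src) ** 2).mean()
    dom_loss = -(torch.log(d_src + 1e-8).mean() + torch.log(1.0 - d_tgt + 1e-8).mean())
    return pred_loss + dom_loss               # backprop this and step the optimizers
```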


SoftwareX ◽  
2021 ◽  
Vol 15 ◽  
pp. 100747
Author(s):  
José Daniel Lara ◽  
Clayton Barrows ◽  
Daniel Thom ◽  
Dheepak Krishnamurthy ◽  
Duncan Callaway

2021 ◽  
Vol 13 (3) ◽  
pp. 364
Author(s):  
Han Gao ◽  
Jinhui Guo ◽  
Peng Guo ◽  
Xiuwan Chen

Recently, deep learning has become the most innovative trend for a variety of high-spatial-resolution remote sensing imaging applications. However, large-scale land cover classification via traditional convolutional neural networks (CNNs) with sliding windows is computationally expensive and produces coarse results. Additionally, although such supervised learning approaches have performed well, collecting and annotating datasets for every task is extremely laborious, especially for those fully supervised cases where the pixel-level ground-truth labels are dense. In this work, we propose a new object-oriented deep learning framework that leverages residual networks with different depths to learn adjacent feature representations by embedding a multibranch architecture in the deep learning pipeline. The idea is to exploit limited training data at different neighboring scales to make a tradeoff between weak semantics and strong feature representations for operational land cover mapping tasks. We draw from established geographic object-based image analysis (GEOBIA) as an auxiliary module to reduce the computational burden of spatial reasoning and optimize the classification boundaries. We evaluated the proposed approach on two subdecimeter-resolution datasets involving both urban and rural landscapes. It achieved better classification accuracy (88.9%) than traditional object-based deep learning methods and an excellent inference time (11.3 s/ha).
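
A minimal Keras sketch of the multibranch idea, assuming two neighbourhood scales and hypothetical patch sizes: object-centred crops at different scales pass through ResNet backbones of different depths and their features are fused into one per-object prediction. It is not the authors' network.

```python
# Minimal sketch of a two-branch, two-scale per-object classifier.
# Patch sizes, backbone choices, and class count are illustrative.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50, ResNet101

def build_multibranch(num_classes=6, small=64, large=128):
    p_small = layers.Input((small, small, 3))     # tight crop: strong object detail
    p_large = layers.Input((large, large, 3))     # wider crop: neighbourhood context
    f_small = layers.GlobalAveragePooling2D()(ResNet50(include_top=False, weights=None)(p_small))
    f_large = layers.GlobalAveragePooling2D()(ResNet101(include_top=False, weights=None)(p_large))
    fused = layers.Concatenate()([f_small, f_large])
    out = layers.Dense(num_classes, activation="softmax")(fused)
    return models.Model([p_small, p_large], out)

# A GEOBIA segmentation supplies the image objects; each object is labelled once
# and the label is painted back onto all of its pixels, avoiding dense
# sliding-window inference.
```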


Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 845
Author(s):  
Dongheun Han ◽  
Chulwoo Lee ◽  
Hyeongyeop Kang

The neural-network-based human activity recognition (HAR) technique is increasingly being used for activity recognition in virtual reality (VR) users. The major issue with such a technique is the collection of large-scale training datasets, which are key for deriving a robust recognition model. However, collecting large-scale data is a costly and time-consuming process. Furthermore, increasing the number of activities to be classified requires a much larger amount of training data. Since training a model with a sparse dataset can only provide limited features to the recognition model, it can cause problems such as overfitting and suboptimal results. In this paper, we present a data augmentation technique named gravity control-based data augmentation (GCDA) to alleviate the sparse data problem by generating new training data based on the existing data. The benefit of exploiting the symmetrical structure of the data is that it increases the amount of data while preserving its properties. The core concept of GCDA is two-fold: (1) decomposing the acceleration data obtained from the inertial measurement unit (IMU) into zero-gravity acceleration and gravitational acceleration, and augmenting them separately, and (2) exploiting gravity as a directional feature and controlling it to augment training datasets. Through comparative evaluations, we validated that applying GCDA to the training datasets yielded a larger improvement in classification accuracy (96.39%) than typical data augmentation methods (92.29%) and than no augmentation at all (85.21%).
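
A small sketch of the GCDA concept under simplifying assumptions (a first-order low-pass filter for the gravity estimate and a rotation about a single axis); the constants are illustrative, not the paper's settings.

```python
# Minimal sketch: split raw IMU acceleration into a gravity component (low-pass)
# and zero-gravity motion, then re-orient only the gravity component to
# synthesize additional training sequences. Filter constant and rotation range
# are illustrative.
import numpy as np

def split_gravity(acc: np.ndarray, alpha: float = 0.9) -> tuple:
    """acc: (T, 3) raw acceleration. Returns (gravity, zero_g), each (T, 3)."""
    gravity = np.zeros_like(acc)
    gravity[0] = acc[0]
    for t in range(1, len(acc)):                          # simple first-order low-pass filter
        gravity[t] = alpha * gravity[t - 1] + (1 - alpha) * acc[t]
    return gravity, acc - gravity

def rotate_about_z(v: np.ndarray, angle: float) -> np.ndarray:
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return v @ R.T

def gravity_control_augment(acc: np.ndarray, max_angle=np.pi / 12) -> np.ndarray:
    """Augment one sequence by re-orienting its gravity component only."""
    gravity, zero_g = split_gravity(acc)
    angle = np.random.uniform(-max_angle, max_angle)
    return zero_g + rotate_about_z(gravity, angle)        # recombine the two components
```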

