An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura)

2015 ◽  
Author(s):  
Daniel Portik ◽  
Lydia Smith ◽  
Ke Bi

Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers with targeted levels of informativeness in non-model organisms. Transcriptome-based exon capture utilizes transcript sequences to design capture probes, often with the aid of a reference genome to identify intron-exon boundaries and exclude shorter exons (< 200 bp). Here, we test an alternative approach that directly uses transcript sequences for probe design, which are often composed of multiple exons of varying lengths. Based on a selection of 1,260 orthologous transcripts, we conducted sequence captures across multiple phylogenetic scales for frogs, including species up to ~100 million years divergent from the focal group. After several conservative filtering steps, we recovered a large phylogenomic data set consisting of sequence alignments for 1,047 of the 1,260 transcriptome-based loci (~630,000 bp) and a large quantity of highly variable regions flanking the exons in transcripts (~70,000 bp). We recovered high numbers of both shorter (< 100 bp) and longer exons (> 200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and non-target depletion during captures, and observed differences in PCR duplication rates that can be attributed to the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on nuclear pairwise differences among samples. We provide recommendations for transcriptome-based exon capture design based on our results, and describe multiple pipelines for data assembly and analysis.
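The capture metrics tested above are commonly defined as follows: sensitivity is the fraction of targeted bases recovered, and specificity is the fraction of mapped reads falling on target. A minimal sketch with hypothetical per-sample counts (the numbers below are illustrative, not values from the study):

```python
def capture_sensitivity(target_bases_recovered, total_target_bases):
    """Fraction of targeted bases recovered in the final alignments."""
    return target_bases_recovered / total_target_bases

def capture_specificity(on_target_reads, total_mapped_reads):
    """Fraction of mapped reads that fall within targeted regions."""
    return on_target_reads / total_mapped_reads

# Hypothetical per-sample counts, for illustration only; the study reports
# how both metrics decline with nuclear pairwise divergence from the probes.
sens = capture_sensitivity(540_000, 630_000)
spec = capture_specificity(2_100_000, 3_000_000)
```

Tracking both metrics against pairwise divergence, as the study does, gives a baseline for predicting capture performance on increasingly distant taxa.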


2021 ◽  
Author(s):  
Zaynab Shaik ◽  
Nicola Georgina Bergh ◽  
Bengt Oxelman ◽  
Anthony George Verboom

We applied species delimitation methods based on the Multi-Species Coalescent (MSC) model to 500+ loci derived from genotyping-by-sequencing on the South African Seriphium plumosum (Asteraceae) species complex. The loci were represented either as multiple sequence alignments or single nucleotide polymorphisms (SNPs), and analysed by the STACEY and Bayes Factor Delimitation (BFD)/SNAPP methods, respectively. Both methods supported species taxonomies where virtually all of the 32 sampled individuals, each representing its own geographical population, were identified as separate species. Computational efforts required to achieve adequate mixing of MCMC chains were considerable, and the species/minimal cluster trees identified similar strongly supported clades in replicate runs. The resolution was, however, higher in the STACEY trees than in the SNAPP trees, which is consistent with the higher information content of full sequences. The computational efficiency, measured as effective sample sizes of likelihood and posterior estimates per time unit, was consistently higher for STACEY. A random subset of 56 alignments had similar resolution to the 524-locus SNP data set. The STRUCTURE-like sparse Non-negative Matrix Factorisation (sNMF) method was applied to six individuals from each of 48 geographical populations and 28,023 SNPs. Significantly fewer (13) clusters were identified as optimal by this analysis compared to the MSC methods. The sNMF clusters correspond closely to clades consistently supported by MSC methods, and showed evidence of admixture, especially in the western Cape Floristic Region. We discuss the significance of these findings, and conclude that it is important to consider a priori the kind of species one wants to identify when using genome-scale data, the assumptions behind the parametric models applied, and the potential consequences that model violations may have.
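The computational-efficiency comparison above rests on effective sample size (ESS) per unit time. A minimal sketch of one standard ESS estimator (the autocorrelation sum truncated at the first non-positive lag; tools such as Tracer use related but not identical estimators):

```python
import numpy as np

def effective_sample_size(x):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations),
    truncated at the first non-positive autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    acf = np.correlate(xc, xc, mode="full")[n - 1:]  # lags 0..n-1
    rho = (acf / acf[0])[1:]                         # normalised, drop lag 0
    cut = int(np.argmax(rho <= 0)) if np.any(rho <= 0) else rho.size
    return n / (1.0 + 2.0 * rho[:cut].sum())

# A strongly autocorrelated chain (AR(1), phi = 0.9) yields far fewer
# effective samples than an independent chain of the same length.
rng = np.random.default_rng(0)
iid = rng.normal(size=5000)
ar1 = np.empty(5000)
ar1[0] = 0.0
for i in range(1, 5000):
    ar1[i] = 0.9 * ar1[i - 1] + rng.normal()
ess_iid, ess_ar1 = effective_sample_size(iid), effective_sample_size(ar1)
```

Dividing such ESS values by wall-clock run time gives the per-time-unit efficiency used to compare STACEY and SNAPP.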



2021 ◽  
pp. bjophthalmol-2020-318188
Author(s):  
Shotaro Asano ◽  
Hiroshi Murata ◽  
Yuri Fujino ◽  
Takehiro Yamashita ◽  
Atsuya Miki ◽  
...  

Background/Aim To investigate the clinical validity of the Guided Progression Analysis definition (GPAD) and cluster-based definition (CBD) with the Humphrey Field Analyzer 10-2 test in diagnosing glaucomatous visual field (VF) progression, and to introduce a novel definition with optimised specificity by combining the ‘any-location’ and ‘cluster-based’ approaches (hybrid definition). Methods 64 400 stable glaucomatous VFs were simulated from 664 pairs of 10-2 tests (10 sets × 10 VF series × 664 eyes; data set 1). Using these simulated VFs, the specificity to detect progression and the effects of changing the parameters (number of test locations or consecutive VF tests, and percentile cut-off values) were investigated. The hybrid definition was designed as the combination where the specificity was closest to 95.0%. Subsequently, another 5000 actual glaucomatous 10-2 tests from 500 eyes (10 VFs each) were collected (data set 2), and their accuracy (sensitivity, specificity and false positive rate) and the time needed to detect VF progression were evaluated. Results The specificity values calculated using data set 1 with GPAD and CBD were 99.6% and 99.8%. Using data set 2, the hybrid definition had a higher sensitivity than GPAD and CBD, without detriment to the specificity or false positive rate. The hybrid definition also detected progression significantly earlier than GPAD and CBD (at 3.1 years vs 4.2 years and 4.1 years, respectively). Conclusions GPAD and CBD had specificities of 99.6% and 99.8%, respectively. A novel hybrid definition (with a specificity of 95.5%) had higher sensitivity and enabled earlier detection of progression.
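One plausible reading of the hybrid definition is a disjunction of the ‘any-location’ and ‘cluster-based’ criteria, sketched below. The thresholds and the cluster layout are purely illustrative; in the study, the actual rule and cut-offs are calibrated so that simulated specificity approaches 95%.

```python
# Hypothetical sketch: combining 'any-location' and 'cluster-based' criteria.
# n_points, n_cluster_points and the cluster layout are illustrative only.

def any_location_progression(sig_flags, n_points=3):
    """Progression if at least n_points locations anywhere show
    significant deterioration."""
    return sum(sig_flags) >= n_points

def cluster_progression(sig_flags, clusters, n_cluster_points=2):
    """Progression if any predefined cluster contains at least
    n_cluster_points significantly deteriorating locations."""
    return any(sum(sig_flags[i] for i in c) >= n_cluster_points
               for c in clusters)

def hybrid_progression(sig_flags, clusters):
    """Flag progression if either criterion is met."""
    return any_location_progression(sig_flags) or \
        cluster_progression(sig_flags, clusters)

flags = [True, True, False, False, False, False, False, False]  # hypothetical
clusters = [[0, 1, 2], [3, 4, 5], [6, 7]]                       # hypothetical
progressed = hybrid_progression(flags, clusters)
```

Here two deteriorating locations within one cluster trigger the cluster criterion even though the any-location count is not reached, illustrating how the disjunction raises sensitivity relative to either criterion alone.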



2016 ◽  
Vol 20 (8) ◽  
pp. 3183-3191 ◽  
Author(s):  
Wei Hu ◽  
Bing Cheng Si

Abstract. The scale-specific and localized bivariate relationships in geosciences can be revealed using bivariate wavelet coherence. The objective of this study was to develop a multiple wavelet coherence method for examining scale-specific and localized multivariate relationships. Stationary and non-stationary artificial data sets, generated with the response variable as the summation of five predictor variables (cosine waves) with different scales, were used to test the new method. Comparisons were also conducted using existing multivariate methods, including multiple spectral coherence and multivariate empirical mode decomposition (MEMD). Results show that multiple spectral coherence is unable to identify localized multivariate relationships, and underestimates the scale-specific multivariate relationships for non-stationary processes. The MEMD method was able to separate all variables into components at the same set of scales, revealing scale-specific relationships when combined with multiple correlation coefficients, but has the same weakness as multiple spectral coherence. Multiple wavelet coherence, by contrast, is able to identify scale-specific and localized multivariate relationships, as it is close to 1 at the scales and locations corresponding to those of the predictor variables. It therefore outperforms the other common multivariate methods. Multiple wavelet coherence was applied to a real data set and revealed the optimal combination of factors for explaining temporal variation of free water evaporation at the Changwu site in China at multiple scale-location domains. Matlab codes for multiple wavelet coherence were developed and are provided in the Supplement.
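The stationary artificial test data described above are straightforward to reproduce. A sketch with illustrative periods (the study's exact scales and sample length may differ):

```python
import numpy as np

# Five cosine predictors at distinct scales; the response is their sum.
# Periods and series length are illustrative choices, not the study's values.
n = 1024
t = np.arange(n)
periods = [8, 16, 32, 64, 128]
predictors = [np.cos(2 * np.pi * t / p) for p in periods]
response = np.sum(predictors, axis=0)
```

Multiple wavelet coherence between `response` and all five predictors should then be close to 1 at each of these scales across all locations, which is the diagnostic behaviour the study reports.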



2018 ◽  
Vol 30 (1) ◽  
pp. 116-128 ◽  
Author(s):  
Stephanie M. Smith ◽  
Ian Krajbich

When making decisions, people tend to choose the option they have looked at more. An unanswered question is how attention influences the choice process: whether it amplifies the subjective value of the looked-at option or instead adds a constant, value-independent bias. To address this, we examined choice data from six eye-tracking studies (Ns = 39, 44, 44, 36, 20, and 45, respectively) to characterize the interaction between value and gaze in the choice process. We found that the summed values of the options influenced response times in every data set and the gaze-choice correlation in most data sets, in line with an amplifying role of attention in the choice process. Our results suggest that this amplifying effect is more pronounced in tasks using large sets of familiar stimuli, compared with tasks using small sets of learned stimuli.
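The amplifying-versus-additive distinction above can be made concrete with a simple choice-probability sketch: under multiplicative discounting (in the spirit of attentional drift-diffusion models), the gaze advantage grows with the summed value of the options, whereas an additive bias is value-independent. The parameters theta and b below are illustrative, not fitted values:

```python
import math

def softmax_choice_prob(v_left, v_right, beta=1.0):
    """Probability of choosing left given effective values."""
    return 1.0 / (1.0 + math.exp(-beta * (v_left - v_right)))

def p_choose_left_multiplicative(v_left, v_right, gaze_left, theta=0.5):
    # Amplifying account: the unattended option's value is discounted
    # by theta, weighted by the fraction of gaze time on each option.
    wl = gaze_left * v_left + (1 - gaze_left) * theta * v_left
    wr = gaze_left * theta * v_right + (1 - gaze_left) * v_right
    return softmax_choice_prob(wl, wr)

def p_choose_left_additive(v_left, v_right, gaze_left, b=0.5):
    # Additive account: gaze contributes a constant bonus b,
    # split by gaze share, independent of the options' values.
    return softmax_choice_prob(v_left + gaze_left * b,
                               v_right + (1 - gaze_left) * b)
```

With equal-value options viewed 80% on the left, the multiplicative model's gaze advantage grows with summed value, while the additive model's advantage stays fixed: exactly the signature the summed-value analyses above test for.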



2015 ◽  
Author(s):  
Evan McCartney-Melstad ◽  
Genevieve G. Mount ◽  
H. Bradley Shaffer

Background Gathering genomic-scale data efficiently is challenging for non-model species with large, complex genomes. Transcriptome sequencing is accessible for even large-genome organisms, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively little data exist for exon capture experiments in large-genome non-model organisms. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for large-genome amphibians. Methods We enriched 53 genomic libraries from salamanders for a custom set of 8,706 exons under differing conditions. Libraries were prepared using pools of DNA from 3 different salamanders with approximately 30 gigabase genomes: California tiger salamander (Ambystoma californiense), barred tiger salamander (Ambystoma mavortium), and an F1 hybrid between the two. We enriched libraries using different amounts of c0t-1 blocker, individual input DNA, and total reaction DNA. Enriched libraries were sequenced with 150 bp paired-end reads on an Illumina HiSeq 2500, and the efficiency of target enrichment was quantified using unique read mapping rates and average depth across targets. The different enrichment treatments were evaluated to determine if c0t-1 and input DNA significantly impact enrichment efficiency in large-genome amphibians. Results Increasing the amounts of c0t-1 and individual input DNA both reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9%. We also found that post-enrichment DNA concentrations and qPCR enrichment verification were useful for predicting the success of enrichment.
Conclusions Increasing the amount of individual sample input DNA and the amount of c0t-1 blocker both increased the efficiency of target capture in large-genome salamanders. By reducing PCR duplication rates, the number of unique reads mapping to targets increased, making target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen large-genome vertebrate taxa including amphibians.



2016 ◽  
Vol 69 (5) ◽  
pp. 1143-1153 ◽  
Author(s):  
Marta Wlodarczyk–Sielicka ◽  
Andrzej Stateczny

An electronic navigational chart is a major source of information for the navigator. The component that contributes most significantly to the safety of navigation on water is the information on the depth of an area. For the purposes of this article, the authors use data obtained by the interferometric sonar GeoSwath Plus. The data were collected in the area of the Port of Szczecin. The samples constitute large sets of data. Data reduction is a procedure to reduce the size of a data set to make it easier and more effective to analyse. The main objective of the authors is the compilation of a new reduction algorithm for bathymetric data. The clustering of data is the first part of the search algorithm. The next step consists of generalisation of bathymetric data. This article presents a comparison and analysis of results of clustering bathymetric data using the following selected methods: the K-means clustering algorithm, traditional hierarchical clustering algorithms, and self-organising maps (using artificial neural networks).
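The two-stage reduction described above (clustering, then generalisation) can be sketched as follows. Keeping the shallowest sounding per cluster is an illustrative generalisation rule motivated by navigational safety, not necessarily the article's exact procedure, and the soundings below are synthetic:

```python
import numpy as np

# Synthetic soundings: x, y positions (m) and depths (m), for illustration.
rng = np.random.default_rng(42)
points = rng.uniform(0, 100, size=(500, 2))
depths = rng.uniform(2, 15, size=500)

def kmeans(xy, k, n_iter=20, seed=0):
    """Minimal K-means: assign points to nearest centre, update centres."""
    r = np.random.default_rng(seed)
    centres = xy[r.choice(len(xy), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((xy[:, None, :] - centres[None]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = xy[labels == j].mean(axis=0)
    return labels

labels = kmeans(points, k=25)

# Generalisation: keep one representative (shallowest) sounding per cluster,
# so the reduced chart never understates a navigational hazard.
reduced = [(points[labels == j][np.argmin(depths[labels == j])],
            depths[labels == j].min())
           for j in np.unique(labels)]
```

This reduces 500 soundings to at most 25 while preserving the minimum depth within each cluster; the hierarchical and self-organising-map alternatives compared in the article would replace only the clustering stage.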



2010 ◽  
Vol 3 ◽  
pp. BII.S3846 ◽  
Author(s):  
Ying Chen ◽  
Rebekah Wu ◽  
James Felton ◽  
David M. Rocke ◽  
Anu Chakicherla

Motivation Whole genome microarrays are increasingly becoming the method of choice to study responses in model organisms to disease, stressors or other stimuli. However, whole genome sequences are available for only some model organisms, and there are still many species whose genome sequences are not yet available. Cross-species studies, where arrays developed for one species are used to study gene expression in a closely related species, have been used to address this gap, with some promising results. Current analytical methods have included filtration of some probes or genes that showed low hybridization activities. But consensus filtration schemes are still not available. Results A novel masking procedure is proposed based on currently available target species sequences to filter out probes and study a cross-species data set using this masking procedure and gene-set analysis. Gene-set analysis evaluates the association of a priori defined gene groups with a phenotype of interest. Two methods, Gene Set Enrichment Analysis (GSEA) and Test of Test Statistics (ToTS), were investigated. The results showed that the masking procedure together with the ToTS method worked well in our data set. The results from an alternative way to study cross-species hybridization experiments without masking are also presented. We hypothesize that the multi-probe structure of Affymetrix microarrays makes it possible to aggregate the effects of both well-hybridized and poorly-hybridized probes to study a group of genes. The principles of gene-set analysis were applied to the probe-level data instead of gene-level data. The results showed that ToTS can give valuable information and thus can be used as a powerful technique for analyzing cross-species hybridization experiments. Availability Software in the form of R code is available at http://anson.ucdavis.edu/~ychen/cross-species.html Supplementary Data Supplementary data are available at http://anson.ucdavis.edu/~ychen/cross-species.html



Author(s):  
Snehal Chokhandre ◽  
Ahmet Erdemir

The tibiofemoral joint is a complex structure and its overall mechanical response is dictated by its numerous substructures at both macro and micro levels. An in-depth understanding of the mechanics of the joint is necessary to develop preventative measures and treatment options for pathological conditions and common injuries. Finite element (FE) analysis is a widely used tool in joint biomechanics studies focused on understanding the underlying mechanical behavior at joint, tissue and cell levels [1]. Studies, regardless of their purpose (descriptive or predictive), when employing FE analysis, require anatomical and mechanical data at single or multiple scales. It is also critical that FE representations are validated and closely represent the specifics of the joint of interest, anatomically and mechanically. This is particularly important if these models are intended to be used to support clinical decision making (in surgery or for rehabilitation) and for the development of implants.



Author(s):  
Zachary Clement ◽  
Fletcher Fields ◽  
Diana Bauer ◽  
Vincent Tidwell ◽  
Calvin Ray Shaneyfelt ◽  
...  

A new dataset released by the Energy Information Administration (EIA), which combines water withdrawal, electricity generation, and plant configuration data into a single database, enables detailed examination of cooling system operation at thermoelectric plants at multiple scales, most importantly at the unit level. This dataset was used to explore operations across the population of U.S. thermoelectric plants, leading to the conclusion that roughly 32% of all thermoelectric water withdrawal occurs while power plants are not generating electricity. Interviews with industry representatives indicate that a unit's location on the dispatch curve largely dictates how its cooling system is operated. Peaking plants and intermediate plants might keep their cooling system running to maintain dispatchability. Other considerations include minimizing wear and tear on the pumps and controlling water chemistry. This observation has implications for understanding water use at thermoelectric plants, policy analysis, and modeling. Previous studies have estimated water use as a function of cooling technology, fuel type, prime mover, pollution controls, and ambient climate (1) or by calculating the amount of water that is thermodynamically necessary for cooling (2). This, however, does not capture all the water a plant is withdrawing simply to maintain dispatchability. This paper uses the new data set from EIA and interviews with plant operators to illuminate the role cooling system operations play in determining the amount of water a plant withdraws.
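The 32% figure above comes from aggregating unit-level records; the calculation can be sketched with hypothetical monthly records (the real EIA data set has many more fields and far more units):

```python
# Hypothetical unit-month records, for illustration only:
# (unit_id, month, withdrawal_mgal, net_generation_mwh)
records = [
    ("U1", 1, 120.0, 5000.0),
    ("U1", 2,  90.0,    0.0),   # cooling system running, unit not dispatched
    ("U2", 1, 300.0, 8000.0),
    ("U2", 2, 150.0,    0.0),
]

# Share of total withdrawal occurring while no electricity was generated.
idle = sum(w for _, _, w, g in records if g == 0.0)
total = sum(w for _, _, w, _ in records)
idle_share = idle / total
```

Because withdrawal and generation are joined at the unit level, months in which a cooling system withdraws water without any generation are directly visible, which is what plant-level or technology-based estimates miss.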



2019 ◽  
Vol 15 (1) ◽  
Author(s):  
Görkem Sariyer ◽  
Ceren Öcal Taşar ◽  
Gizem Ersoy Cepe

Abstract Emergency departments (EDs) are the largest departments of hospitals and encounter a high variety of cases as well as high patient volumes. Thus, an efficient classification of those patients at the time of their registration is very important for operations planning and management. Using secondary data from the ED of an urban hospital, we examine the significance of factors in classifying patients according to their length of stay. Random Forest, Classification and Regression Tree, Logistic Regression (LR), and Multilayer Perceptron (MLP) algorithms were trained on the data set of July 2016 and tested on the data set of August 2016. Besides training and testing the algorithms on the whole data set, patients were grouped into 21 subgroups based on similarities in their diagnoses, and the algorithms were also run on these subgroups. Performances of the classifiers were evaluated based on sensitivity, specificity, and accuracy. The sensitivity, specificity, and accuracy values of the classifiers were similar, with LR and MLP attaining somewhat higher values. In addition, classifying patients within the subgroups outperformed classification on the whole data set, on average, for each of the classifiers.
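The evaluation metrics named above follow directly from a binary confusion matrix (treating, say, long-stay as the positive class). The counts below are illustrative, not from the study:

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for one classifier on one subgroup.
sens, spec, acc = classification_metrics(tp=80, fp=30, tn=170, fn=20)
```

Computing these per subgroup and averaging, as the study does, is what allows the subgroup models to be compared against a single model trained on the whole data set.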


