Sample Data Sets and Matlab Code

2017 ◽  
Vol 4 (1) ◽  
pp. 41-52
Author(s):  
Dedy Loebis

This paper presents the results of work undertaken to develop and test contrasting data analysis approaches for the detection of bursts/leaks and other anomalies within water supply systems at district meter area (DMA) level. This was conducted for Yorkshire Water (YW) sample data sets from the Harrogate and Dales (H&D), Yorkshire, United Kingdom water supply network as part of Project NEPTUNE (EP/E003192/1). A data analysis system based on Kalman filtering and a statistical approach has been developed and applied to the analysis of flow and pressure data. The system was tested on one data set and has shown the ability to detect anomalies in flow and pressure patterns by correlating with other information. It will be shown that the Kalman/statistical approach is promising for detecting subtle changes and higher-frequency features; because it has the potential to identify precursor features and smaller leaks, it could be useful for monitoring the development of leaks prior to a large-volume burst event.
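
The Kalman/statistical pairing can be conveyed with a short sketch. The MATLAB fragment below is a minimal, hypothetical illustration rather than the NEPTUNE system itself: a scalar random-walk Kalman filter tracks a flow series and a statistical gate flags samples whose innovation exceeds three standard deviations. All data and noise settings are invented.

% Minimal sketch: scalar Kalman filter tracking a DMA flow signal,
% flagging samples whose innovation exceeds a 3-sigma gate.
% 'flow' is a hypothetical vector of 15-min flow readings (L/s).
flow = 30 + randn(1, 500);            % placeholder data
flow(250) = 45;                       % injected anomaly (simulated burst)

q = 0.01;  r = 1.0;                   % assumed process / measurement noise
xhat = flow(1);  P = 1;               % state estimate and its variance
anomaly = false(size(flow));

for k = 2:numel(flow)
    % Predict (random-walk model: x_k = x_{k-1} + w, w ~ N(0,q))
    P = P + q;
    % Innovation and its variance
    nu = flow(k) - xhat;
    S  = P + r;
    if abs(nu) > 3*sqrt(S)            % statistical gate on the residual
        anomaly(k) = true;            % candidate burst/leak feature
    end
    % Update
    K    = P / S;
    xhat = xhat + K*nu;
    P    = (1 - K)*P;
end
fprintf('%d anomalous samples flagged\n', nnz(anomaly));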


1980 ◽  
Vol 102 (4) ◽  
pp. 1006-1012 ◽  
Author(s):  
M. E. Crawford ◽  
W. M. Kays ◽  
R. J. Moffat

Experimental research into heat transfer from full-coverage film-cooled surfaces with three injection geometries was described in Part I. This part has two objectives. The first is to present a simple numerical procedure for simulation of heat transfer with full-coverage film cooling. The second is to present some of the Stanton number data used in Part I of the paper. The data chosen for presentation are the low-Reynolds-number, heated-starting-length data for the three injection geometries with five-diameter hole spacing. Sample data sets with high blowing ratio and with ten-diameter hole spacing are also presented. The numerical procedure has been successfully applied to the Stanton number data sets.
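
For orientation on the quantities involved, the sketch below evaluates the classic flat-plate correlation St = 0.0287 Re_x^(-0.2) Pr^(-0.4) together with the standard unheated-starting-length correction. This is textbook background on uncooled-plate Stanton numbers, not the paper's full-coverage film-cooling procedure, and the flow conditions are assumed purely for illustration.

% Minimal sketch: baseline turbulent Stanton number on a flat plate
% with a starting-length correction. Illustrative conditions only.
U   = 10;        % free-stream velocity, m/s (assumed)
nu  = 1.6e-5;    % kinematic viscosity of air, m^2/s
Pr  = 0.7;       % Prandtl number
xi  = 0.1;       % unheated starting length, m (assumed)
x   = linspace(0.11, 1.0, 50);            % streamwise positions, m

Rex = U .* x ./ nu;
St0 = 0.0287 .* Rex.^(-0.2) .* Pr^(-0.4); % smooth-plate correlation
St  = St0 .* (1 - (xi ./ x).^(9/10)).^(-1/9);  % starting-length correction
plot(Rex, St); xlabel('Re_x'); ylabel('St');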


2015 ◽  
Vol 639 ◽  
pp. 21-30 ◽  
Author(s):  
Stephan Purr ◽  
Josef Meinhardt ◽  
Arnulf Lipp ◽  
Axel Werner ◽  
Martin Ostermair ◽  
...  

Data-driven quality evaluation in the stamping process of car body parts is promising because dependencies in the process have not yet been sufficiently researched. However, applying data mining methods to the process in stamping plants would require a large number of sample data sets. Acquiring these data remains a major challenge today, because the necessary data are inadequately measured, recorded or stored. Thus, the preconditions for sample data acquisition must first be created before any correlations can be investigated. In addition, the process conditions change over time due to wear mechanisms, so results do not remain valid and constant data acquisition is required. In this publication, the current situation in stamping plants regarding process robustness will first be discussed and the need for data-driven methods will be shown. Subsequently, the state of technology regarding the collection of sample data sets for quality analysis in the production of car body parts will be reviewed. At the end of this work, an overview will be provided of how this data collection was implemented at BMW and what potential can be expected.
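
As a schematic illustration of why constant data acquisition matters under wear-induced drift, the MATLAB sketch below applies an EWMA control chart to a simulated quality feature that drifts slowly over successive parts. This is not BMW's implementation; the feature, drift rate and chart parameters are all invented.

% Minimal sketch: EWMA monitoring of a stamped-part quality feature
% (e.g., a measured draw-in value) to detect slow drift from tool wear.
x      = 5 + 0.002*(1:1000) + 0.05*randn(1,1000);  % simulated slow drift
lambda = 0.1;                                      % EWMA smoothing weight
mu0    = 5;  sigma = 0.05;                         % assumed in-control stats
L      = 3;                                        % control-limit width

z = zeros(size(x));  z(1) = mu0;
for k = 2:numel(x)
    z(k) = lambda*x(k) + (1-lambda)*z(k-1);        % exponentially weighted mean
end
% Steady-state EWMA control limits
UCL = mu0 + L*sigma*sqrt(lambda/(2-lambda));
LCL = mu0 - L*sigma*sqrt(lambda/(2-lambda));
firstAlarm = find(z > UCL | z < LCL, 1);
fprintf('Drift signalled at part %d\n', firstAlarm);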


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

Abstract. We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others.

Author summary. Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
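
A minimal sketch of the core computation, under a deliberate simplification of the paper's model: a plain Dirichlet-multinomial marginal likelihood with a flat Dirichlet prior updated by training counts (standing in for the constructed priors and the Poisson layer). The class with the larger log-marginal likelihood wins. All counts below are hypothetical.

% Dirichlet-multinomial log-marginal likelihood as a two-class
% classifier for OTU count vectors (multinomial coefficient omitted,
% since it is identical for both classes given the same test vector).
logDM = @(x, a) gammaln(sum(a)) - gammaln(sum(x) + sum(a)) ...
              + sum(gammaln(x + a) - gammaln(a));

% Hypothetical training counts (rows = samples, cols = OTUs)
train0 = [10 2 1; 12 3 0];     % class 0
train1 = [1 9 11; 0 8 13];     % class 1
a0 = 1 + sum(train0, 1);       % posterior Dirichlet parameters, class 0
a1 = 1 + sum(train1, 1);       % posterior Dirichlet parameters, class 1

xtest = [9 2 2];               % new profile to classify
label = logDM(xtest, a1) > logDM(xtest, a0);   % 1 if class 1 more likely
fprintf('Predicted class: %d\n', label);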


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ting Hon ◽  
Kristin Mars ◽  
Greg Young ◽  
Yu-Chih Tsai ◽  
Joseph W. Karalius ◽  
...  

Abstract. The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single-nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, sample data sets are needed both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep-coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
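
For orientation, read accuracies map to Phred quality scores via Q = -10*log10(1 - accuracy), so the quoted 99.5% corresponds to roughly Q23. The trivial MATLAB snippet below performs that conversion.

% Converting read accuracy to Phred quality: Q = -10*log10(1 - accuracy)
acc = [0.990 0.995 0.999];
Q   = -10 .* log10(1 - acc);
fprintf('accuracy %.3f -> Q%.0f\n', [acc; Q]);   % e.g. 0.995 -> Q23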


2019 ◽  
Vol 15 ◽  
pp. 117693431984907 ◽  
Author(s):  
Tomáš Farkaš ◽  
Jozef Sitarčík ◽  
Broňa Brejová ◽  
Mária Lucká

Computing similarity between 2 nucleotide sequences is one of the fundamental problems in bioinformatics. Current methods are based mainly on 2 major approaches: (1) sequence alignment, which is computationally expensive, and (2) faster, but less accurate, alignment-free methods based on various statistical summaries, for example, short word counts. We propose a new distance measure based on mathematical transforms from the domain of signal processing. To tolerate large-scale rearrangements in the sequences, the transform is computed across sliding windows. We compare our method on several data sets with current state-of-the-art alignment-free methods. Our method compares favorably in terms of accuracy and outperforms other methods in running time and memory requirements. In addition, it is massively scalable up to dozens of processing units without loss of performance due to communication overhead. Source files and sample data are available at https://bitbucket.org/fiitstubioinfo/swspm/src
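
The paper's exact transform is not reproduced here, but the MATLAB sketch below (saved as specDist.m) conveys the general idea under assumed choices: encode each sequence numerically, take FFT magnitude spectra over sliding windows, and let each window of one sequence match its closest window in the other, which is what tolerates large-scale rearrangements. Window size, step and encoding are all placeholders.

% Schematic transform-based, alignment-free distance between two
% nucleotide sequences s1, s2 (character vectors), window length w,
% window step 'step'. Not the exact method of the paper.
function d = specDist(s1, s2, w, step)
    F1 = windowSpectra(s1, w, step);
    F2 = windowSpectra(s2, w, step);
    % Pairwise squared Euclidean distances between window spectra,
    % via the expansion |u-v|^2 = |u|^2 + |v|^2 - 2u.v
    D = sqrt(max(sum(F1.^2,2) + sum(F2.^2,2).' - 2*(F1*F2.'), 0));
    % Each window of s1 scores against its best match in s2
    d = mean(min(D, [], 2));
end

function F = windowSpectra(s, w, step)
    num = double(lower(s));                      % crude numeric encoding
    starts = 1:step:(numel(num) - w + 1);
    F = zeros(numel(starts), w);
    for i = 1:numel(starts)
        seg = num(starts(i):starts(i)+w-1);
        F(i,:) = abs(fft(seg - mean(seg)));      % magnitude spectrum
    end
end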


1980 ◽  
Vol 47 (2) ◽  
pp. 351-357 ◽  
Author(s):  
Charles D. Dziuban ◽  
Edwin C. Shirkey

Version Two of the Kaiser Measures of Sampling Adequacy was derived for a typical six-concept Semantic Differential. The overall indices indicated that both concept and total correlation matrices would lead to comparable decisions regarding the psychometric quality of the sample data sets. The individual measures, however, showed considerable variability for some scales, placing several in a range which would make them suspect psychometrically. It was recommended that the concept of psychometric adequacy be used in determining the efficacy of one's Semantic Differential data for factor analytic procedures.
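
As background, the sketch below computes a Kaiser-style Measure of Sampling Adequacy, overall and per variable, from a correlation matrix R using the anti-image correlations derived from inv(R). This is the standard textbook formulation, not necessarily the "Version Two" computation examined in the paper.

% Kaiser MSA from a correlation matrix R (p x p, positive definite).
function [msa, msaPerVar] = kaiserMSA(R)
    S = inv(R);
    Q = -S ./ sqrt(diag(S) * diag(S)');   % anti-image (partial) correlations
    mask = ~eye(size(R));                 % exclude the diagonal
    r2 = (R.^2) .* mask;
    q2 = (Q.^2) .* mask;
    msa       = sum(r2(:)) / (sum(r2(:)) + sum(q2(:)));   % overall index
    msaPerVar = sum(r2, 1) ./ (sum(r2, 1) + sum(q2, 1));  % per-variable MSA
end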


2008 ◽  
Vol 8 (2) ◽  
pp. 6409-6436 ◽  
Author(s):  
C. A. Cantrell

Abstract. The representation of data, whether geophysical observations, numerical model output or laboratory results, by a best-fit straight line is a routine practice in the geosciences and other fields. While the literature is full of detailed analyses of procedures for fitting straight lines to values with uncertainties, a surprising number of scientists blindly use the standard least-squares method, such as found on calculators and in spreadsheet programs, which assumes no uncertainties in the x values. Here, the available procedures for estimating the best-fit straight line to data are reviewed, including those applicable when uncertainties are present in both the x and y variables. Representative methods presented in the literature for bivariate weighted fits are compared using several sample data sets, and guidance is given as to when the somewhat more involved iterative methods are required, and when the standard least-squares procedure can be expected to be satisfactory. A spreadsheet-based template that employs one method for bivariate fitting is made available.
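
One representative iterative bivariate method is the York (1966) fit. The MATLAB sketch below, with invented data and uncorrelated x and y errors assumed, iterates the York slope equations from an ordinary least-squares starting guess and prints both results for comparison.

% York-style straight-line fit with uncertainties in both x and y,
% compared against ordinary least squares (which assumes error-free x).
x  = [1 2 3 4 5];              sx = 0.2*ones(1,5);   % x values, sigmas
y  = [2.1 3.9 6.2 7.8 10.1];   sy = 0.3*ones(1,5);   % y values, sigmas

wx = 1 ./ sx.^2;   wy = 1 ./ sy.^2;         % weights
p  = polyfit(x, y, 1);  b = p(1);           % OLS slope as starting guess
for it = 1:50
    W    = wx .* wy ./ (wx + b^2 .* wy);    % combined weights (r = 0)
    xbar = sum(W.*x)/sum(W);  ybar = sum(W.*y)/sum(W);
    U = x - xbar;  V = y - ybar;
    beta = W .* (U./wy + b.*V./wx);
    bNew = sum(W.*beta.*V) / sum(W.*beta.*U);
    if abs(bNew - b) < 1e-12, b = bNew; break; end
    b = bNew;
end
a = ybar - b*xbar;                          % intercept
fprintf('OLS : y = %.3f x + %.3f\n', p(1), p(2));
fprintf('York: y = %.3f x + %.3f\n', b, a);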

