Training Data Sets Construction from Large Data Set for PCB Character Recognition

Use of Enterobacteriaceae Analysis Results for Predicting Absence of Salmonella Serovars on Beef Carcasses

Previous work using a large data set (no. 1, n = 5,355) of carcass sponge samples from three large-volume beef abattoirs highlighted the potential use of binary (present or absent) Enterobacteriaceae results for predicting the absence of Salmonella on carcasses. Specifically, the absence of Enterobacteriaceae was associated with the absence of Salmonella. We tested the accuracy of this predictive approach by using another large data set (no. 2, n = 2,163 carcasses sampled before or after interventions) from the same three abattoirs as data set no. 1 over a later 7-month period. Similarly, the predictive approach was tested on smaller subsets from data set no. 2 (n = 1,087 and n = 405) and on a much smaller data set (no. 3, n = 100 postintervention carcasses) collected at a small-volume abattoir over 4 months. Of Enterobacteriaceae-negative data set no. 2 carcasses, >98% were Salmonella negative. Similarly accurate predictions were obtained in the two data subsets obtained from data set no. 2 and in data set no. 3. Of final postintervention carcass samples in data set nos. 2 and 3, 9 and 70%, respectively, were Enterobacteriaceae positive; mean Enterobacteriaceae values for the two data sets were −0.375 and 0.169 log CFU/100 cm² (detection limit = −0.204; Enterobacteriaceae-negative samples were assigned a value of −0.505 log CFU/100 cm²). Salmonella contamination rates for final postintervention beef carcasses in data set nos. 2 and 3 were 1.1 and 7.0%, respectively. Binary Enterobacteriaceae results may be useful in evaluating beef abattoir hygiene and intervention treatment efficacy.
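The accuracy figure reported above (the share of Enterobacteriaceae-negative carcasses that are also Salmonella negative) is a negative predictive value. The following is a minimal Python sketch of that calculation, assuming each sample is stored as a pair of booleans; the function name and the sample records are hypothetical and are not taken from the study's data.

```python
def negative_predictive_value(samples):
    """Fraction of indicator-negative samples that are also target-negative.

    samples: iterable of (indicator_positive, target_positive) booleans,
    e.g. (Enterobacteriaceae detected, Salmonella detected).
    """
    samples = list(samples)
    # Keep only the samples where the indicator organism was absent.
    indicator_negative = [target for indicator, target in samples if not indicator]
    if not indicator_negative:
        raise ValueError("no indicator-negative samples to evaluate")
    true_negatives = sum(1 for target in indicator_negative if not target)
    return true_negatives / len(indicator_negative)


# Hypothetical records, for illustration only.
records = [(False, False), (False, False), (False, False),
           (False, True), (True, True), (True, False)]
npv = negative_predictive_value(records)
```

Indicator-positive samples do not enter the calculation, which is why a binary (present or absent) result is sufficient for this kind of screening.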

Deep residual detection of radio frequency interference for FAST

Monthly Notices of the Royal Astronomical Society ◽ 10.1093/mnras/stz3521 ◽ 2020 ◽ Vol 492 (1) ◽ pp. 1421-1431 ◽ Cited By ~ 4

Author(s): Zhicheng Yang ◽ Ce Yu ◽ Jian Xiao ◽ Bo Zhang

Keyword(s): Radio Frequency ◽ Large Data ◽ High Sensitivity ◽ Original Data ◽ Training Data ◽ Radio Frequency Interference ◽ Data Sets ◽ Data Set ◽ Time Required ◽ Key Steps

Radio frequency interference (RFI) detection and excision are key steps in the data-processing pipeline of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Because of its high sensitivity and large data rate, FAST requires more accurate and efficient RFI flagging methods than its counterparts. In recent decades, approaches based upon artificial intelligence (AI), such as codes using convolutional neural networks (CNNs), have been proposed to identify RFI more reliably and efficiently. However, RFI flagging of FAST data with such methods has often proved to be erroneous, with further manual inspections required. In addition, network construction as well as preparation of training data sets for effective RFI flagging has imposed significant additional workloads. Therefore, rapid deployment and adjustment of AI approaches for different observations is impractical with existing algorithms. To overcome such problems, we propose a model called RFI-Net. Taking raw data as input without any preprocessing, RFI-Net can detect RFI automatically, producing corresponding masks without any alteration of the original data. Experiments with RFI-Net using simulated astronomical data show that our model outperforms existing methods in terms of both precision and recall. In addition, compared with other models, our method can obtain the same relative accuracy with less training data, thus reducing the effort and time required to prepare the training data set. Further, the training process of RFI-Net can be accelerated, with overfitting minimized, compared with other CNN codes. The performance of RFI-Net has also been evaluated with observing data obtained by FAST and the Bleien Observatory. Our results demonstrate the ability of RFI-Net to accurately identify RFI with fine-grained, high-precision masks that require no further modification.
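Precision and recall, the two metrics named in this abstract, can be computed per pixel by comparing a predicted RFI mask with a reference mask. The sketch below is an illustration only, assuming both masks are equally sized 2D lists of 0/1 flags; it is not the evaluation code used for RFI-Net, and the function name is hypothetical.

```python
def mask_precision_recall(predicted, truth):
    """Pixel-wise precision and recall of a binary RFI mask.

    predicted, truth: 2D lists of 0/1 flags with the same shape,
    where 1 marks a time-frequency pixel flagged as RFI.
    """
    true_pos = false_pos = false_neg = 0
    for pred_row, truth_row in zip(predicted, truth):
        for pred, actual in zip(pred_row, truth_row):
            if pred and actual:
                true_pos += 1
            elif pred and not actual:
                false_pos += 1
            elif actual and not pred:
                false_neg += 1
    # Report 0.0 when a denominator is empty rather than dividing by zero.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical 2x3 masks, for illustration only.
predicted_mask = [[1, 1, 0], [0, 1, 0]]
reference_mask = [[1, 0, 0], [0, 1, 1]]
precision, recall = mask_precision_recall(predicted_mask, reference_mask)
```

High precision means few clean pixels are discarded; high recall means little RFI is left in the data.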

Car-Following Described by Blending Data-Driven and Analytical Models: A Gaussian Process Regression Approach

Transportation Research Record: Journal of the Transportation Research Board ◽ 10.1177/03611981211032648 ◽ 2021 ◽ pp. 036119812110326

Author(s): Ignasi Echaniz Soldevila ◽ Victor L. Knoop ◽ Serge Hoogendoorn

Keyword(s): Gaussian Process Regression ◽ Large Data ◽ Driving Behavior ◽ Large Data Sets ◽ Training Data ◽ Data Driven ◽ Data Sets ◽ Data Set ◽ Car Following ◽ New Variables

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering small spatial regions, have become available thanks to technology improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model which can benefit from these new, challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the longitudinal driving behavior of individual drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver's acceleration given a set of variables. It uses the GPR to make predictions when a new input is correlated with the training data set. The data-driven model benefits from a large training data set to capture all longitudinal driver behavior, which would be difficult to fit in fixed parametric equation(s). The methodology allows us to train models with new variables without altering the model formulation. Importantly, the model also uses existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that a hybrid model performs better than a parametric model alone and suggests that traffic light status over time influences drivers' acceleration. This methodology can help engineers to use large data sets and to find new variables to describe traffic behavior.
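One common way to blend a parametric model with GPR is to use the parametric model as the prior mean of the Gaussian process, so that the prediction reverts to the parametric model when a new input is uncorrelated with the training data. The sketch below illustrates that idea only and assumes it as a stand-in; the abstract does not state the paper's exact formulation. The single input variable (gap to the leader), the linear `parametric_accel` rule, its parameters, and the kernel settings are all hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def parametric_accel(gap, desired_gap=20.0, gain=0.1):
    """Hypothetical linear car-following rule used as the GP prior mean."""
    return gain * (gap - desired_gap)


def fit_gp(gaps, accels, noise=1e-2):
    """Fit GP weights to the residuals left over by the parametric model."""
    residuals = [a - parametric_accel(g) for g, a in zip(gaps, accels)]
    cov = [[rbf_kernel(gi, gj) + (noise if i == j else 0.0)
            for j, gj in enumerate(gaps)] for i, gi in enumerate(gaps)]
    return solve(cov, residuals)


def predict_accel(gap, gaps, weights):
    """Parametric prediction plus the data-driven GP correction."""
    correction = sum(w * rbf_kernel(gap, g) for w, g in zip(weights, gaps))
    return parametric_accel(gap) + correction


# Hypothetical observations: gap in metres, acceleration in m/s^2.
train_gaps = [10.0, 12.0, 14.0]
train_accels = [-0.5, -0.6, -0.2]
gp_weights = fit_gp(train_gaps, train_accels)
near = predict_accel(12.0, train_gaps, gp_weights)   # follows the data
far = predict_accel(100.0, train_gaps, gp_weights)   # falls back to the rule
```

Near the training inputs the correction term dominates and the prediction tracks the observations; far from them the kernel values vanish and only the parametric rule remains.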

A Practical Robust and Efficient RBF Metamodel Method for Typical Engineering Problems

Volume 1: 34th Design Automation Conference, Parts A and B ◽ 10.1115/detc2008-49994 ◽ 2008 ◽ Cited By ~ 1

Author(s): Xingjie Fang ◽ Liping Wang ◽ Don Beeson ◽ Gene Wiggs

Keyword(s): Principal Component ◽ Large Data ◽ Large Data Sets ◽ Training Data ◽ Data Sets ◽ Dimensional Model ◽ Data Set ◽ Engineering Problems ◽ Processing Techniques ◽ Generalization Accuracy

Radial Basis Function (RBF) metamodels have recently attracted increased interest due to their significant advantages over other types of non-parametric metamodels. However, because of the interpolating nature of the RBF mathematics, the accuracy of the model may deteriorate dramatically if the training data set contains duplicate information, noise, or outliers. Also, constructing the metamodel may be time consuming whenever the training data sets are large or a high-dimensional model is required. In this paper, we propose a robust and efficient RBF metamodeling approach based on data pre-processing techniques that alleviate the accuracy and efficiency issues commonly encountered when RBF models are used in typical real engineering situations. These techniques include 1) the removal of duplicate training data information, 2) the generation of smaller uniformly distributed subsets of training data from large data sets, and 3) the quantification and identification of outliers by principal component analysis (PCA) and Hotelling statistics. Simulation results are used to validate the generalization accuracy and efficiency of the proposed approach.
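Techniques 1) and 3) above can be sketched for two-dimensional data as follows. This is an illustration under stated assumptions, not the authors' implementation: duplicates are taken to be exact repeats, and Hotelling's T² is computed directly as the squared Mahalanobis distance from the sample mean, which equals the sum of squared standardized PCA scores when all principal components are retained. The function names and the sample points are hypothetical, and the outlier threshold is left to the user.

```python
def remove_duplicates(points):
    """Drop exact repeats while keeping the first occurrence of each point."""
    seen = set()
    unique = []
    for point in points:
        key = tuple(point)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def hotelling_t2(points):
    """Hotelling T^2 score of each 2D point relative to the sample mean."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points for a 2D covariance")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Unbiased sample covariance entries.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    scores = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        scores.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


# Hypothetical training inputs: one repeated point and one suspicious point.
raw_points = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2),
              (4.0, 4.0), (2.0, 8.0), (1.0, 1.1)]
clean_points = remove_duplicates(raw_points)
t2_scores = hotelling_t2(clean_points)
most_suspect = clean_points[t2_scores.index(max(t2_scores))]
```

Points with the largest scores are the candidates for removal before the RBF model is fitted; in practice the cutoff is usually derived from an F distribution rather than chosen by eye.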

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽ 10.1017/pasa.2020.46 ◽ 2020 ◽ Vol 37

Author(s): Lior Shamir

Keyword(s): Large Scale ◽ Spiral Galaxies ◽ Hubble Space Telescope ◽ Gravitational Interaction ◽ Large Data ◽ Sloan Digital Sky Survey ◽ Data Sets ◽ Dipole Axis ◽ Data Set ◽ The Asymmetry

Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim 8.7\times10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.
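The idea of fitting spin directions to a cosine dependence can be illustrated for a single candidate axis: each galaxy's spin (+1 or -1) is regressed on the cosine of its angular distance from that axis. The sketch below is a simplified least-squares illustration, assuming spins coded as ±1 and coordinates in degrees; it is not the paper's statistical procedure, and the function names and the two sample galaxies are hypothetical.

```python
import math


def cos_angular_distance(ra1, dec1, ra2, dec2):
    """Cosine of the angle between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    return (math.sin(dec1) * math.sin(dec2)
            + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))


def dipole_amplitude(galaxies, axis_ra, axis_dec):
    """Least-squares amplitude A in spin ~ A * cos(phi) for one candidate axis.

    galaxies: list of (ra_deg, dec_deg, spin) with spin equal to +1 or -1;
    phi is the angular distance between the galaxy and the candidate axis.
    """
    numerator = denominator = 0.0
    for ra, dec, spin in galaxies:
        c = cos_angular_distance(ra, dec, axis_ra, axis_dec)
        numerator += spin * c
        denominator += c * c
    if denominator == 0.0:
        raise ValueError("all galaxies lie 90 degrees from the candidate axis")
    return numerator / denominator


# Two hypothetical galaxies: one on the candidate axis, one at its antipode.
sample = [(78.0, 47.0, +1), (258.0, -47.0, -1)]
amplitude = dipole_amplitude(sample, axis_ra=78.0, axis_dec=47.0)
```

Scanning `axis_ra` and `axis_dec` over a grid and keeping the axis with the best fit is the natural extension; judging significance would additionally require a comparison against randomly assigned spins.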

Climatology of nutrient distributions in the South China Sea based on a large data set derived from a new algorithm

Progress In Oceanography ◽ 10.1016/j.pocean.2021.102586 ◽ 2021 ◽ pp. 102586

Author(s): Chuanjun Du ◽ Ruoying He ◽ Zhiyu Liu ◽ Tao Huang ◽ Lifang Wang ◽ ...

Keyword(s): South China Sea ◽ South China ◽ Large Data ◽ The South China Sea ◽ The South ◽ Data Set ◽ China Sea ◽ Large Data Set

Spike detection: Inter-reader agreement and a statistical Turing test on a large data set

Clinical Neurophysiology ◽ 10.1016/j.clinph.2016.11.005 ◽ 2017 ◽ Vol 128 (1) ◽ pp. 243-250 ◽ Cited By ~ 55

Author(s): Mark L. Scheuer ◽ Anto Bagic ◽ Scott B. Wilson

Keyword(s): Large Data ◽ Turing Test ◽ Spike Detection ◽ Data Set ◽ Large Data Set

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽ 10.1017/dsj.2020.31 ◽ 2020 ◽ Vol 6

Author(s): Jaime de Miguel Rodríguez ◽ Maria Eugenia Villafañe ◽ Luka Piškorec ◽ Fernando Sancho Caparrini

Keyword(s): Machine Learning ◽ Neural Networks ◽ Large Data ◽ Learning Model ◽ Large Data Sets ◽ Data Sets ◽ Connectivity Map ◽ Data Set ◽ 3D Objects ◽ Machine Learning Model

This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
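Interpolated locations in a VAE's latent space are often obtained by stepping along the straight line between two encoded samples and decoding each intermediate point. The sketch below shows only that interpolation step, assuming linear interpolation and latent vectors stored as plain lists of floats; the abstract does not specify the interpolation scheme, and the encoder and decoder are outside this example. The function name and the vectors are hypothetical.

```python
def interpolate_latent(z_start, z_end, steps):
    """Evenly spaced latent vectors on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 at z_start to 1.0 at z_end
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical 2D latent codes of two building types.
latent_path = interpolate_latent([0.0, 2.0], [1.0, 0.0], steps=3)
```

Each vector in `latent_path` would then be passed to the trained decoder to reconstruct a hybrid geometry.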

Determining radial efficiency with a large data set by solving small-size linear programs

Annals of Operations Research ◽ 10.1007/s10479-015-1968-4 ◽ 2015 ◽ Vol 250 (1) ◽ pp. 147-166 ◽ Cited By ~ 5

Author(s): Wen-Chih Chen ◽ Sheng-Yung Lai

Keyword(s): Large Data ◽ Linear Programs ◽ Data Set ◽ Large Data Set

Main large data set features detection by a linear predictor model

10.1063/1.4897836 ◽ 2014 ◽ Cited By ~ 19

Author(s): Carlos Enrique Gutierrez ◽ Prof. Mohamad Reza Alsharif ◽ Mahdi Khosravy ◽ Prof. Katsumi Yamashita ◽ Prof. Hayao Miyagi ◽ ...

Keyword(s): Large Data ◽ Data Set ◽ Linear Predictor ◽ Large Data Set ◽ Predictor Model

Training Data Sets Construction from Large Data Set for PCB Character Recognition

Use of Enterobacteriaceae Analysis Results for Predicting Absence of Salmonella Serovars on Beef Carcasses
