Training Data Sets Construction from Large Data Set for PCB Character Recognition

2019 ◽  
Vol 6 (4) ◽  
pp. 225-234
Author(s):  
NDAYISHIMIYE Fabrice ◽  
Sumyung Gang ◽  
Joon Jae Lee
2009 ◽  
Vol 72 (2) ◽  
pp. 260-266 ◽  
Author(s):  
JOHN R. RUBY ◽  
STEVEN C. INGHAM

Previous work using a large data set (no. 1, n = 5,355) of carcass sponge samples from three large-volume beef abattoirs highlighted the potential use of binary (present or absent) Enterobacteriaceae results for predicting the absence of Salmonella on carcasses. Specifically, the absence of Enterobacteriaceae was associated with the absence of Salmonella. We tested the accuracy of this predictive approach by using another large data set (no. 2, n = 2,163 carcasses sampled before or after interventions) from the same three abattoirs as data set no. 1 over a later 7-month period. Similarly, the predictive approach was tested on smaller subsets from data set no. 2 (n = 1,087 and n = 405) and on a much smaller data set (no. 3, n = 100 postintervention carcasses) collected at a small-volume abattoir over 4 months. Of Enterobacteriaceae-negative data set no. 2 carcasses, >98% were Salmonella negative. Similarly accurate predictions were obtained in the two data subsets obtained from data set no. 2 and in data set no. 3. Of final postintervention carcass samples in data set nos. 2 and 3, 9 and 70%, respectively, were Enterobacteriaceae positive; mean Enterobacteriaceae values for the two data sets were −0.375 and 0.169 log CFU/100 cm² (detection limit = −0.204; Enterobacteriaceae-negative samples were assigned a value of −0.505 log CFU/100 cm²). Salmonella contamination rates for final postintervention beef carcasses in data set nos. 2 and 3 were 1.1 and 7.0%, respectively. Binary Enterobacteriaceae results may be useful in evaluating beef abattoir hygiene and intervention treatment efficacy.


2020 ◽  
Vol 492 (1) ◽  
pp. 1421-1431 ◽  
Author(s):  
Zhicheng Yang ◽  
Ce Yu ◽  
Jian Xiao ◽  
Bo Zhang

Radio frequency interference (RFI) detection and excision are key steps in the data-processing pipeline of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Because of its high sensitivity and large data rate, FAST requires more accurate and efficient RFI flagging methods than its counterparts. In recent decades, approaches based upon artificial intelligence (AI), such as codes using convolutional neural networks (CNNs), have been proposed to identify RFI more reliably and efficiently. However, RFI flagging of FAST data with such methods has often proved to be erroneous, with further manual inspections required. In addition, network construction as well as preparation of training data sets for effective RFI flagging has imposed significant additional workloads. Therefore, rapid deployment and adjustment of AI approaches for different observations is impractical with existing algorithms. To overcome such problems, we propose a model called RFI-Net. With raw, unprocessed data as input, RFI-Net can detect RFI automatically, producing corresponding masks without any alteration of the original data. Experiments with RFI-Net using simulated astronomical data show that our model outperforms existing methods in terms of both precision and recall. In addition, compared with other models, our method can obtain the same relative accuracy with fewer training data, thus reducing the effort and time required to prepare the training data set. Further, the training process of RFI-Net can be accelerated, with overfitting minimized, compared with other CNN codes. The performance of RFI-Net has also been evaluated with observing data obtained by FAST and the Bleien Observatory. Our results demonstrate the ability of RFI-Net to accurately identify RFI with fine-grained, high-precision masks that required no further modification.
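
The abstract compares flagging methods by precision and recall. As a minimal sketch of how those two scores can be computed for a predicted RFI mask against a reference mask, assuming both are binary arrays of the same shape (the function name and the toy masks are hypothetical and not taken from the paper):

```python
import numpy as np

def mask_precision_recall(pred, truth):
    """Pixel-wise precision and recall of a binary mask against a reference mask."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # flagged and truly RFI
    fp = np.sum(pred & ~truth)   # flagged but clean
    fn = np.sum(~pred & truth)   # RFI that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

# Toy 2 x 4 time-frequency masks (made-up values, for illustration only)
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0]])
pred = np.array([[0, 1, 0, 0],
                 [1, 1, 0, 0]])
precision, recall = mask_precision_recall(pred, truth)
```

Here two flagged pixels are correct, one is a false alarm, and one RFI pixel is missed, so both scores come out at 2/3.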


Author(s):  
Ignasi Echaniz Soldevila ◽  
Victor L. Knoop ◽  
Serge Hoogendoorn

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering small spatial regions, have become available thanks to technology improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model which can benefit from these new, challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the longitudinal driving behavior of individual drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver’s acceleration given a set of variables. It uses the GPR to make predictions when there is correlation between a new input and the training data set. The data-driven model benefits from a large training data set to capture all driver longitudinal behavior, which would be difficult to fit in fixed parametric equation(s). The methodology allows us to train models with new variables without altering the model formulation. Importantly, the model also uses existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that a hybrid model performs better than a parametric model alone and suggests that traffic light status over time influences drivers’ acceleration. This methodology can help engineers to use large data sets and to find new variables to describe traffic behavior.
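
One common way to combine a parametric model with GPR is to use the parametric model as the prior mean of the Gaussian process, so that predictions revert to it when a new input is far from all training inputs. The sketch below illustrates that idea only; the paper's exact formulation may differ. The linear car-following rule, its gain, the kernel settings, and the data are all hypothetical.

```python
import numpy as np

def parametric_accel(x):
    """Hypothetical stand-in for a car-following model.

    x[:, 0] is the gap to the leader (m), x[:, 1] the speed difference (m/s).
    The gain of 0.5 is an assumption made for this sketch.
    """
    return 0.5 * x[:, 1]

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def hybrid_predict(x_train, y_train, x_new, noise=1e-2):
    """GP posterior mean with the parametric model as prior mean."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_new, x_train)
    # The GP only learns what the parametric model gets wrong
    residual = y_train - parametric_accel(x_train)
    return parametric_accel(x_new) + k_star @ np.linalg.solve(K, residual)

# Made-up observations: the parametric model underestimates by 0.3 m/s^2
x_train = np.array([[10.0, 1.0], [12.0, -1.0], [15.0, 0.5]])
y_train = parametric_accel(x_train) + 0.3

pred_near = hybrid_predict(x_train, y_train, x_train[:1])             # seen situation
pred_far = hybrid_predict(x_train, y_train, np.array([[100.0, 2.0]])) # unseen situation
```

Near the training data the prediction follows the observations; for the unseen situation the kernel weights vanish and the prediction equals the parametric model's output.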


Author(s):  
Xingjie Fang ◽  
Liping Wang ◽  
Don Beeson ◽  
Gene Wiggs

Radial Basis Function (RBF) metamodels have recently attracted increased interest due to their significant advantages over other types of non-parametric metamodels. However, because of the interpolating nature of the RBF mathematics, the accuracy of the model may deteriorate dramatically if the training data set contains duplicate information, noise, or outliers. Also, constructing the metamodel may be time-consuming whenever the training data sets are large or a high-dimensional model is required. In this paper, we propose a robust and efficient RBF metamodeling approach based on data pre-processing techniques that alleviate the accuracy and efficiency issues commonly encountered when RBF models are used in typical real engineering situations. These techniques include 1) the removal of duplicate training data information, 2) the generation of smaller uniformly distributed subsets of training data from large data sets, and 3) the quantification and identification of outliers by principal component analysis (PCA) and Hotelling statistics. Simulation results are used to validate the generalization accuracy and efficiency of the proposed approach.
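
Two of the pre-processing steps listed in the abstract, duplicate removal and PCA-based outlier scoring with the Hotelling T² statistic, can be sketched as follows. This is a minimal illustration with NumPy on synthetic data, not the authors' implementation; the function names, the rounding tolerance, and the number of retained components are assumptions. It ranks points by T² rather than applying a statistical threshold.

```python
import numpy as np

def drop_duplicates(X, decimals=8):
    """Remove duplicate rows (after rounding), keeping first occurrences in order."""
    _, idx = np.unique(np.round(X, decimals), axis=0, return_index=True)
    return X[np.sort(idx)]

def hotelling_t2(X, n_components):
    """Hotelling T^2 of each row in the space of the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)
    return np.sum(scores ** 2 / variances, axis=1)

# Synthetic training set: 200 points, 5 exact duplicates, 1 gross outlier
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = np.vstack([X, X[:5], [[12.0, -12.0, 12.0]]])

X_unique = drop_duplicates(X)
t2 = hotelling_t2(X_unique, n_components=2)
outlier_index = int(np.argmax(t2))  # row with the largest T^2 score
```

In practice the T² scores would be compared against a chosen cutoff, and the flagged rows removed before fitting the RBF metamodel.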


Author(s):  
Lior Shamir

Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.


2021 ◽  
pp. 102586
Author(s):  
Chuanjun Du ◽  
Ruoying He ◽  
Zhiyu Liu ◽  
Tao Huang ◽  
Lifang Wang ◽  
...  

2017 ◽  
Vol 128 (1) ◽  
pp. 243-250 ◽  
Author(s):  
Mark L. Scheuer ◽  
Anto Bagic ◽  
Scott B. Wilson

2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.


2014 ◽  
Author(s):  
Carlos Enrique Gutierrez ◽  
Prof. Mohamad Reza Alsharif ◽  
Mahdi Khosravy ◽  
Prof. Katsumi Yamashita ◽  
Prof. Hayao Miyagi ◽  
...  
