noisy data
Recently Published Documents


TOTAL DOCUMENTS

1333
(FIVE YEARS 303)

H-INDEX

53
(FIVE YEARS 5)

2022 ◽  
Author(s):  
Tong Guo

In industrial deep learning applications, manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method that finds the noisy data and has humans re-label it, with the model predictions given as references during human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.
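The abstract does not say how noisy examples are located, so the following is only a minimal sketch under one assumption: an example is treated as suspect when a trained model's prediction disagrees with its manual label. The function names `find_suspect_labels` and `build_review_queue` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence


def find_suspect_labels(labels: Sequence[str], predictions: Sequence[str]) -> List[int]:
    """Return indices where the model prediction disagrees with the manual label.

    Assumption: disagreement is used as the noise signal; the paper may use
    a different criterion.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return [i for i, (y, p) in enumerate(zip(labels, predictions)) if y != p]


def build_review_queue(
    texts: Sequence[str], labels: Sequence[str], predictions: Sequence[str]
) -> List[Dict[str, object]]:
    """Pair each suspect example with its current label and the model's
    prediction, so a human annotator sees the prediction as a reference."""
    return [
        {
            "index": i,
            "text": texts[i],
            "current_label": labels[i],
            "model_suggestion": predictions[i],
        }
        for i in find_suspect_labels(labels, predictions)
    ]
```

The human annotator keeps the final say: the queue only narrows down what to re-check, and the model's suggestion is shown as a hint rather than applied automatically.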


2022 ◽  
Vol 71 ◽  
pp. 103237
Author(s):  
Xingang Fang ◽  
Julia Klawohn ◽  
Alexander De Sabatino ◽  
Harsh Kundnani ◽  
Jonathan Ryan ◽  
...  

2021 ◽  
Author(s):  
Tong Guo

In industrial deep learning applications, manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method that finds the noisy data and has humans re-label it, with the model predictions given as references during human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.


Author(s):  
Souad Azzouzi ◽  
Amal Hjouji ◽  
Jaouad EL- Mekkaoui ◽  
Ahmed EL Khalfi

The Fuzzy C-means (FCM) algorithm has been widely used in the field of clustering and classification but has encountered difficulties with noisy data and outliers. Other versions of algorithms related to possibilistic theory have given good results, such as Fuzzy C-Means (FCM), possibilistic C-means (PCM), fuzzy possibilistic C-means (FPCM) and the possibilistic fuzzy C-means algorithm (PFCM). This last algorithm works effectively in some environments but has shown more shortcomings with noisy databases. To solve this problem, we propose in this manuscript a new algorithm named Improved Possibilistic Fuzzy C-Means (ImPFCM), which combines the PFCM algorithm with a very powerful statistical method. The properties of this new ImPFCM algorithm show that it is applicable not only to clusters of spherical shape, but also to clusters of different sizes and densities. The results of the comparative study with very recent algorithms indicate the performance and the superiority of the proposed approach, which easily groups datasets in a high-dimensional space and can use not only the Euclidean distance but also more sophisticated norms capable of dealing with much more complicated problems. In addition, we have demonstrated that the ImPFCM algorithm is capable of detecting the cluster centers with high accuracy and performing satisfactorily in multiple environments with noisy data and outliers.
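For orientation, here is a minimal sketch of the standard FCM baseline that the abstract starts from. It is not ImPFCM: the possibilistic terms and the statistical method the authors combine with PFCM are not described in the abstract and are not reproduced here. The deterministic initialization (evenly spaced input points) and the parameter names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def fuzzy_c_means(
    points: Sequence[Sequence[float]],
    n_clusters: int,
    m: float = 2.0,       # fuzzifier, must be > 1
    n_iter: int = 100,
    tol: float = 1e-6,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard fuzzy C-means with the Euclidean distance.

    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, as computed in the final iteration.
    """
    dim = len(points[0])
    # Assumed initialization: evenly spaced points from the input.
    centers = [list(points[i * len(points) // n_clusters]) for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships: List[List[float]] = []
    for _ in range(n_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            )
        # Center update: mean of the points weighted by u_ki^m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Because every membership row is forced to sum to one, an outlier far from all centers still receives substantial membership in each cluster and pulls the centers toward it. This is the sensitivity to noise that possibilistic variants such as PCM and PFCM are designed to reduce.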


2021 ◽  
Vol 11 (24) ◽  
pp. 12062
Author(s):  
Reina Murakami ◽  
Valentin Grave ◽  
Osamu Fukuda ◽  
Hiroshi Okumura ◽  
Nobuhiko Yamaguchi

The appearance of products is important to companies, as it reflects the quality of their manufacturing to customers. Nowadays, visual inspection is conducted by human inspectors. This research attempts to automate this process using Convolutional AutoEncoders (CAE). Our models were trained using images of non-defective parts. Previous research on autoencoders has reported that the accuracy of image regeneration can be improved by adding noise to the training dataset, but no extensive analysis of the noise factor has been done. Therefore, our method compares the effects of two different noise patterns on the models' efficiency: Gaussian noise and noise made of a known structure. The test datasets were composed of “defective” parts. Across the experiments, the precision of the CAE mostly improved when noisy data was used during the training phases. The best results were obtained with structural noise, made of defined shapes randomly corrupting the training data. Furthermore, the models were able to process test data that had slightly different positions and rotations compared to those found in the training dataset. However, shortcomings appeared when “regular” spots (in the training data) and “defective” spots (in the test data) partially or totally overlapped.
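The two noise patterns compared above can be sketched as follows. This is only an illustration under stated assumptions: images are 2D lists of grayscale values in [0, 1], and the "defined shapes" are taken to be filled rectangles. The abstract does not specify the shapes, sizes, or noise levels actually used, so the function names and all parameter values are hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[float]]


def add_gaussian_noise(image: Image, sigma: float = 0.1,
                       rng: Optional[random.Random] = None) -> Image:
    """Add zero-mean Gaussian noise to each pixel, clipping to [0, 1]."""
    rng = rng or random.Random()
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def add_structural_noise(image: Image, n_shapes: int = 3, max_size: int = 4,
                         value: float = 1.0,
                         rng: Optional[random.Random] = None) -> Image:
    """Overwrite randomly placed rectangles with a constant value.

    Rectangles are an assumed stand-in for the paper's "defined shapes".
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]  # copy so the clean image is kept
    for _ in range(n_shapes):
        h = rng.randint(1, min(max_size, height))
        w = rng.randint(1, min(max_size, width))
        top = rng.randint(0, height - h)
        left = rng.randint(0, width - w)
        for r in range(top, top + h):
            for c in range(left, left + w):
                noisy[r][c] = value
    return noisy
```

In a denoising setup, the corrupted image would be the network input and the clean image the reconstruction target, which is why both functions return a copy instead of modifying the input.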


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Peter Monk ◽  
Virginia Selgas

Target signatures are discrete quantities computed from measured scattering data that could potentially be used to classify scatterers or give information about possible defects in the scatterer compared to an ideal object. Here, we study a class of modified interior transmission eigenvalues that are intended to provide target signatures for an inverse fluid–solid interaction problem. The modification is based on an auxiliary problem parametrized by an artificial diffusivity constant. This constant may be chosen strictly positive, or strictly negative. For both choices, we characterize the modified interior transmission eigenvalues by means of a suitable operator so that we can determine their location in the complex plane. Moreover, for the negative sign choice, we also show the existence and discreteness of these eigenvalues. Finally, no matter the choice of the sign, we analyze the approximation of the eigenvalues from far field measurements of the scattered fluid pressure and provide numerical results which show that, even with noisy data, some of the eigenvalues can be determined from far field data.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Rui Tao ◽  
Jian Liu ◽  
Yuqing Song ◽  
Rui Peng ◽  
Dali Zhang ◽  
...  

The traffic peak is an important parameter of modern transport systems. It can be used to calculate indices of road congestion, which has become a common problem worldwide. With accurate information about traffic peaks, transportation administrators can make better decisions to optimize traffic networks and thereby enhance the performance of transportation systems. We present a traffic peak detection method that constructs the Voronoi diagram of the input traffic flow data and uses the diagram to compute the prominence of candidate peak points. Salient peaks are selected based on their prominence. The algorithm takes O(n log n) time and linear space, where n is the size of the input time series. Compared with existing algorithms, our approach works directly on noisy data and detects salient peaks without a smoothing prestep. It thus avoids the dilemma of choosing an appropriate smoothing scale and prevents real peaks from being removed or degraded during smoothing. The prominence of candidate peaks gives subsequent analysis the flexibility to choose peaks at any scale. Experiments showed that the proposed method outperforms existing smoothing-based methods in sensitivity, positive predictivity, and accuracy.
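The idea of selecting peaks by prominence, with no smoothing step, can be illustrated with a brute-force sketch. It uses the usual topographic definition of prominence and runs in O(n^2) time in the worst case. It does not implement the paper's Voronoi-diagram construction, which is what gives the reported O(n log n) bound. The function names and the `min_prominence` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def peak_prominences(series: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (index, prominence) for every interior local maximum.

    A point counts as a local maximum if it is strictly higher than its left
    neighbour and at least as high as its right neighbour, so a flat plateau
    yields a single candidate at its left edge.
    """
    n = len(series)
    result = []
    for i in range(1, n - 1):
        if not (series[i] > series[i - 1] and series[i] >= series[i + 1]):
            continue
        peak = series[i]
        # Walk left until a strictly higher point (or the edge), tracking the minimum.
        left_min = peak
        j = i - 1
        while j >= 0 and series[j] <= peak:
            left_min = min(left_min, series[j])
            j -= 1
        # Same walk to the right.
        right_min = peak
        j = i + 1
        while j < n and series[j] <= peak:
            right_min = min(right_min, series[j])
            j += 1
        # Prominence: height above the higher of the two surrounding valleys.
        result.append((i, peak - max(left_min, right_min)))
    return result


def salient_peaks(series: Sequence[float], min_prominence: float) -> List[int]:
    """Indices of peaks whose prominence reaches the given threshold."""
    return [i for i, prom in peak_prominences(series) if prom >= min_prominence]
```

For a series such as `[0, 3, 2.8, 3.1, 0, 8, 1, 4, 0]`, the small bump at index 1 has a prominence of about 0.2 and is filtered out by a threshold of 2.0, while the peaks at indices 3, 5, and 7 are kept. Changing the threshold changes the scale at which peaks are reported without touching the data.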

