contaminated data
Recently Published Documents


TOTAL DOCUMENTS

72
(FIVE YEARS 21)

H-INDEX

9
(FIVE YEARS 2)

2021 ◽  
Vol 13 (17) ◽  
pp. 3475
Author(s):  
Yihuan Peng ◽  
Xuetong Xie ◽  
Mingsen Lin ◽  
Lishan Ran ◽  
Feng Yuan ◽  
...  

Rain degrades the wind measurement accuracy of Ku-band spaceborne scatterometers. To improve the quality of the retrieved wind field, rain-contaminated data must be identified and flagged. This study uses the HY-2A scatterometer to investigate rain identification. In addition to conventional parameters, such as the retrieved wind speed, the wind direction relative to the along-track direction, and the normalized beam difference, the experiment adds the mean deviation of the backscattering coefficient, the beam difference between fore and aft, and the node number of the wind vector cell (WVC) as sensitive parameters, based on the microwave scattering characteristics of rain and the actual measurement conditions of the HY-2A. A rain identification model for HY2 (HY2RRM) was then built with the K-Nearest Neighbor (KNN) algorithm. After several tests, the accuracy of the selected HY2RRM approach is found to be about 88%, and about 70% of rain-contaminated data can be accurately identified. The results help clarify the characteristics of microwave backscattering and offer a possible way to further improve the wind field retrieval accuracy of the HY-2A scatterometer and other Ku-band scatterometers.
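
As a rough illustration of the KNN step described in the abstract above, the sketch below flags a wind vector cell as rain-contaminated by majority vote among its nearest labeled neighbors. The feature values, the choice of three features, and the function name `knn_predict` are hypothetical and are not taken from HY2RRM.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    # Euclidean distance from x to every labeled training point, nearest first.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalized features per wind vector cell:
# (retrieved wind speed, normalized beam difference, mean sigma0 deviation).
train_X = [
    (0.20, 0.10, 0.05), (0.25, 0.15, 0.10), (0.30, 0.05, 0.08),  # rain-free
    (0.80, 0.90, 0.85), (0.75, 0.80, 0.90), (0.90, 0.85, 0.70),  # rain
]
train_y = ["clear", "clear", "clear", "rain", "rain", "rain"]

flag = knn_predict(train_X, train_y, (0.78, 0.82, 0.80), k=3)
```

In practice the features would need to be scaled to comparable ranges before computing distances, since KNN is sensitive to the units of each parameter.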


2021 ◽  
Author(s):  
Hao Chen ◽  
Hideki Mizunaga ◽  
Toshiaki Tanaka ◽  
Lei Zhou

Abstract: The magnetotelluric (MT) method is an electromagnetic geophysical method for inferring the earth's subsurface electrical conductivity from measurements of natural geomagnetic and geoelectric field variations at the earth's surface. The first step in MT data processing is to estimate the impedance tensor in the frequency domain from the measured time-series data. The basic MT response function estimator is based on least-squares theory and can be severely disturbed by cultural noise. When a small amount of the data is intermittently contaminated, the estimate can be improved by the remote reference technique, a robust procedure, or a combination of the two. When a large amount of the data is contaminated, estimation can still succeed if data analysis is used to remove the most contaminated data before the impedance tensor is estimated. The phase difference is an important parameter for analyzing the data in the frequency domain. In this paper, we investigate three parameters related to the phase difference (the predicted coherence, the remote coherence, and the polarization direction) for analyzing MT data. We demonstrate that a high predicted coherence can indicate either a high signal-to-noise ratio (SNR) or strong coherent noise. The polarization direction is useful for visualizing the background noise, and the remote coherence is a useful indicator of data quality. We first introduce a robust M-estimator and then show the effectiveness of applying the remote linear coherence to a selection strategy based on the M-estimator. With this selection strategy, the result can be improved dramatically in the presence of a large amount of intermittent noise.
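
To make the contrast between least squares and a robust M-estimator concrete, here is a minimal sketch that estimates a single complex transfer coefficient Z in E ≈ Z·H by iteratively reweighted least squares with Huber weights. It is an assumption-laden simplification: the abstract deals with the full impedance tensor and does not specify its weight function, and the synthetic data, the tuning constant `k`, and the function name `huber_impedance` are illustrative only.

```python
import statistics

def huber_impedance(H, E, k=1.5, iterations=20):
    """Estimate scalar Z in E ~ Z*H by iteratively reweighted least squares."""
    w = [1.0] * len(H)
    Z = 0j
    for _ in range(iterations):
        # Weighted least-squares solution for one complex coefficient.
        num = sum(wi * h.conjugate() * e for wi, h, e in zip(w, H, E))
        den = sum(wi * abs(h) ** 2 for wi, h in zip(w, H))
        Z = num / den
        # Residual magnitudes and a robust scale estimate (scaled median).
        resid = [abs(e - Z * h) for h, e in zip(H, E)]
        scale = 1.4826 * statistics.median(resid) or 1e-12
        # Huber weights: full weight for small residuals, down-weight large ones.
        w = [1.0 if r <= k * scale else k * scale / r for r in resid]
    return Z

# Synthetic spectra: clean apart from three strongly contaminated samples.
Z_true = 2 + 1j
H = [complex(1 + 0.1 * i, 0.5 - 0.05 * i) for i in range(20)]
E = [Z_true * h for h in H]
for i in (3, 10, 17):
    E[i] += 50  # simulated cultural-noise outliers

Z_ls = huber_impedance(H, E, iterations=1)  # first pass is plain least squares
Z_robust = huber_impedance(H, E)
```

Comparing `abs(Z_ls - Z_true)` with `abs(Z_robust - Z_true)` shows how down-weighting the contaminated samples pulls the estimate back toward the true value.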


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Tobias Hepp ◽  
Jakob Zierk ◽  
Manfred Rauh ◽  
Markus Metzler ◽  
Andreas Mayr

Abstract. Background: Medical decision making based on quantitative test results depends on reliable reference intervals, which represent the range of physiological test results in a healthy population. Current methods for the estimation of reference limits focus either on modelling the age-dependent dynamics of different analytes directly in a prospective setting or on the extraction of independent distributions from contaminated data sources, e.g. data with latent heterogeneity due to unlabeled pathologic cases. In this article, we propose a new method to estimate indirect reference limits with non-linear dependencies on covariates from contaminated datasets by combining the framework of mixture models and distributional regression. Results: Simulation results based on mixtures of Gaussian and gamma distributions suggest accurate approximation of the true quantiles that improves with increasing sample size and decreasing overlap between the mixture components. Due to the high flexibility of the framework, initialization of the algorithm requires careful consideration of appropriate starting weights. Estimated quantiles from the extracted distribution of healthy hemoglobin concentration in boys and girls provide clinically useful pediatric reference limits similar to solutions obtained using different approaches which require more samples and are computationally more expensive. Conclusions: Latent class distributional regression models represent the first method to estimate indirect non-linear reference limits from a single model fit, but the general scope of applications can be extended to other scenarios with latent heterogeneity.
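
The mixture-model idea in the abstract above can be sketched in a much reduced form: fit a two-component Gaussian mixture by EM to contaminated measurements, treat the dominant component as the healthy population, and read reference limits off its quantiles. This sketch omits the covariate dependence and distributional regression that the article is about. The simulated hemoglobin-like values, the 80/20 split, the percentile-based starting values, and the function name `fit_two_gaussians` are all assumptions made for illustration.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(data, iterations=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns a list of (weight, mean, std) tuples, one per component.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Starting values matter: place the means at the 10th and 90th percentiles.
    mu = [ordered[n // 10], ordered[9 * n // 10]]
    sigma = [statistics.pstdev(data)] * 2
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to component 0.
        resp0 = []
        for x in data:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp0.append(p0 / (p0 + p1))
        # M-step: re-estimate weight, mean and std from the responsibilities.
        for j, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            total = sum(resp)
            mu[j] = sum(r * x for r, x in zip(resp, data)) / total
            var = sum(r * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / total
            sigma[j] = math.sqrt(var)
            weight[j] = total / n
    return list(zip(weight, mu, sigma))

# Simulated contaminated sample: unlabeled mix of healthy and pathologic values.
rng = random.Random(0)
healthy = [rng.gauss(13.0, 1.0) for _ in range(800)]
pathologic = [rng.gauss(9.0, 1.5) for _ in range(200)]
components = fit_two_gaussians(healthy + pathologic)

# Assumption: the component with the largest weight is the healthy population.
w, m, s = max(components)
lower, upper = m - 1.96 * s, m + 1.96 * s  # 2.5th and 97.5th percentiles
```

The choice of starting means reflects the abstract's remark that initialization needs care; poorly separated starting values can lead EM to a different local solution.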


Author(s):  
Haniel Fernandes

Amid the COVID-19 pandemic, other diseases, including viral ones, continue to circulate according to their seasonality and risk factors for contagion. It is therefore of interest to assess how far other viruses, mainly respiratory viruses with similar symptoms, affect diagnoses of infection with the new coronavirus, based on epidemiological surveys by epidemiological week in Brazil. The question is to what extent contaminated data may cause confusion that harms the health system, with regard to the need for intensive care units and the control of viruses, and whether this helps or hinders the control of viruses in general.


Author(s):  
Yanghui Tan ◽  
Chunyang Niu ◽  
Hui Tian ◽  
Yejin Lin ◽  
Jundong Zhang

Author(s):  
Władysław Homenda ◽  
Agnieszka Jastrzȩbska ◽  
Witold Pedrycz ◽  
Fusheng Yu

Abstract: In this paper, we look closely at the issue of contaminated data sets, where apart from legitimate (proper) patterns we encounter erroneous patterns. In a typical scenario, the classification of a contaminated data set is negatively influenced by garbage patterns (referred to as foreign patterns). Ideally, we would like to remove them from the data set entirely. The paper is devoted to the comparison and analysis of three different models capable of classifying proper patterns while rejecting foreign patterns. It should be stressed that the studied models are constructed using proper patterns only, and no knowledge about the characteristics of foreign patterns is needed. The methods are illustrated with a case study of handwritten digit recognition, but the proposed approach itself is formulated in a general manner and can therefore be applied to different problems. We distinguish three structures: global, local, and embedded, all capable of eliminating foreign patterns while classifying proper patterns at the same time. A comparison of the proposed models shows that the embedded structure provides the best results but at the cost of relatively high model complexity. The local architecture provides satisfactory results and at the same time is relatively simple.
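
The general idea of classification with rejection, built from proper patterns only, can be illustrated with a small sketch. This is not one of the three structures (global, local, embedded) compared in the paper. It is a simple nearest-centroid classifier that rejects any input lying too far from every class seen in training. The two-dimensional toy data, the `slack` factor, and the function names are hypothetical.

```python
import math

def fit(proper_X, proper_y, slack=1.5):
    """Per class: centroid and rejection radius (slack * max training distance)."""
    model = {}
    for label in set(proper_y):
        pts = [x for x, y in zip(proper_X, proper_y) if y == label]
        centroid = tuple(sum(c) / len(pts) for c in zip(*pts))
        radius = slack * max(math.dist(p, centroid) for p in pts)
        model[label] = (centroid, radius)
    return model

def classify_or_reject(model, x):
    """Return the nearest class label, or None if x is outside that class's radius."""
    label, (centroid, radius) = min(
        model.items(), key=lambda kv: math.dist(kv[1][0], x)
    )
    return label if math.dist(centroid, x) <= radius else None

# Training uses proper patterns only; no foreign patterns are needed.
proper_X = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4),
            (5.0, 5.0), (5.4, 4.8), (4.7, 5.3)]
proper_y = ["0", "0", "0", "1", "1", "1"]
model = fit(proper_X, proper_y)

accepted = classify_or_reject(model, (5.1, 4.9))     # near class "1"
rejected = classify_or_reject(model, (20.0, -20.0))  # far from every class
```

The rejection radius is the only tuning choice here: a larger `slack` accepts more foreign patterns, while a smaller one rejects more proper patterns.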

