erroneous data
Recently Published Documents


Total documents: 100 (five years: 33)
H-index: 13 (five years: 2)

Symmetry ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2452
Author(s):  
Aleksandras Krylovas ◽  
Natalja Kosareva ◽  
Stanislav Dadelo

This article presents a methodology and tools for evaluating the reliability of quantitative sociological research data. The problem of filtering unreliable data is usually solved by statistical methods. This article proposes an improved method for filtering unreliable data, in which the statistical methods are applied not to the initial data but to the values of a distance function between two preferences. This makes it possible to reveal conflicting or erroneous data. The calculation of the distance between two preferences and the prioritisation of life goals are based on binary relation theory, where the properties of symmetry (antisymmetry) are very important. The article presents a case study on the evaluation and ranking of 11 life goals by Lithuanian and Chinese students. The study revealed that at least twice as much of the Chinese student data was filtered out as of the Lithuanian student data, i.e., the Chinese data are less reliable. The filtered data show that students of both countries ranked the most and the least important life goals in a very similar way, with only minimal deviations detected in the ranking results.
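As a rough illustration of filtering by a distance between two preferences, the sketch below counts the item pairs that two rankings order differently (a Kendall tau distance) and discards respondents whose two rankings disagree too much. The abstract does not specify the distance function or the filtering rule, so the choice of Kendall tau, the function names, the sample data, and the `max_distance` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs that two rankings order differently.

    Each ranking maps an item to its position (1 = most important).
    This is an assumed stand-in for the article's distance function.
    """
    items = sorted(rank_a)
    if items != sorted(rank_b):
        raise ValueError("both rankings must cover the same items")
    discordant = 0
    for x, y in combinations(items, 2):
        # A pair is discordant when the two rankings order it oppositely.
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            discordant += 1
    return discordant


def filter_respondents(responses, max_distance):
    """Keep respondents whose two rankings differ by at most max_distance."""
    return {
        name: pair
        for name, pair in responses.items()
        if kendall_tau_distance(*pair) <= max_distance
    }


# Hypothetical data: each respondent ranks the same three goals twice.
responses = {
    "consistent": ({"health": 1, "family": 2, "career": 3},
                   {"health": 1, "family": 3, "career": 2}),
    "conflicting": ({"health": 1, "family": 2, "career": 3},
                    {"health": 3, "family": 2, "career": 1}),
}
kept = filter_respondents(responses, max_distance=1)
```

Here the "conflicting" respondent reverses the whole ranking (distance 3 out of a possible 3) and is filtered out, while the "consistent" respondent swaps only one pair and is kept.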


2021 ◽  
Vol 31 (2) ◽  
pp. 212-252
Author(s):  
Erin Buzuvis ◽  
Sarah Litwin ◽  
Warren Zola

Sport is a vehicle for social change and should be leveraged as such in 2021 and beyond to address matters of equality. In recent years, the public has paid greater attention to transgender athletes participating in sport at all levels—high school, collegiate, professional, and Olympic—despite the fact that transgender athletes have been competing in sports for decades. Backlash has arisen in general but also more specifically in response to several recent Supreme Court cases that have both solidified and extended rights of lesbian, gay, bisexual, transgender, and other gender and sexual minorities. In turn, state laws that seek to limit the rights of transgender students to participate in sports have been drafted around the country. To be sure, these laws are often built on erroneous data, a misunderstanding of facts, and ignorance, but their existence continues to fuel the public debate on whether transgender athletes should be allowed to participate based on their gender identity or their sex as determined at birth.


2021 ◽  
Vol 2 (3) ◽  
pp. 1-29
Author(s):  
Mona Nashaat ◽  
Aindrila Ghosh ◽  
James Miller ◽  
Shaikh Quader

Error detection is a crucial preliminary phase in any data analytics pipeline. Existing error detection techniques typically target specific types of errors. Moreover, most of these detection models require either user-defined rules or ample hand-labeled training examples. Therefore, in this article, we present TabReformer, a model that learns bidirectional encoder representations for tabular data. The proposed model consists of two main phases. In the first phase, TabReformer follows an encoder architecture with multiple self-attention layers to model the dependencies between cells and capture tuple-level representations. The model also uses a Gaussian Error Linear Unit activation function with the Masked Data Model objective to achieve a deeper probabilistic understanding. In the second phase, the model parameters are fine-tuned for the task of erroneous data detection. The model applies a data augmentation module to generate more erroneous examples to represent the minority class. The experimental evaluation considers a wide range of databases with different types of errors and distributions. The empirical results show that our solution can improve recall by 32.95% on average compared with state-of-the-art techniques while reducing manual effort by up to 48.86%.
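For reference, the Gaussian Error Linear Unit mentioned above is commonly defined as GELU(x) = x · Φ(x), where Φ is the standard normal cumulative distribution function. The sketch below evaluates that exact form with the standard library. It shows only the activation function, not TabReformer itself, and the function name is illustrative.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x multiplied by the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Unlike ReLU, this function is smooth around zero: small negative inputs are damped rather than cut off, while large positive inputs pass through almost unchanged.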


2021 ◽  
Author(s):  
Julius Polz ◽  
Lennart Schmidt ◽  
Luca Glawion ◽  
Maximilian Graf ◽  
Christian Werner ◽  
...  

The number of well-maintained weather stations operated by meteorological services and governmental institutes is decreasing globally. At the same time, the volume of environmental sensor data is increasing through the use of opportunistic or remote sensing approaches. Overall, environmental sensor networks are moving strongly towards automated routines, especially for quality control (QC), to provide usable data in near real time. In a common QC scenario, data are flagged manually by humans using expert knowledge and visual inspection. To reduce this tedious process and to enable near-real-time data provision, machine-learning (ML) algorithms show high potential, as they can be designed to imitate the experts' actions.

Here we address three common challenges when applying ML for QC: 1) robustness to missing values in the input data; 2) availability of training data, i.e., manual quality flags that mark erroneous data points; and 3) generalization of the model regarding non-stationary behavior of one experimental system or changes in the experimental setup when applied to a different study area. We approach the QC problem and the related issues both as a supervised and as an unsupervised learning problem, using deep neural networks on the one hand and dimensionality reduction combined with clustering algorithms on the other.

We compare the different ML algorithms on two time-series datasets to test their applicability across scales and domains. One dataset consists of signal levels of 4000 commercial microwave links distributed all over Germany that can be used to monitor precipitation. The second dataset contains time series of soil moisture and temperature from 120 sensors deployed at a small-scale measurement plot at the TERENO site “Hohes Holz”.

First results show that supervised ML provides optimized QC performance for an experimental system that is not subject to change, at the cost of laborious preparation of the training data. The unsupervised approach is also able to separate valid from erroneous data with reasonable accuracy. In addition, it does not require manual flags and can thus be retrained more easily if the system undergoes significant changes.

In this presentation, we discuss the performance, advantages and drawbacks of the proposed ML routines for tackling the aforementioned challenges. We thus aim to provide a starting point for researchers in the promising field of ML application for automated QC of environmental sensor data.


2021 ◽  
Vol 27 (1) ◽  
pp. 60-70
Author(s):  
Wei He ◽  
Xinlong Liu ◽  
Xiumin Chu ◽  
Zhiyuan Wang ◽  
Pawel Fracz ◽  
...  

Affected by the inland waterway environment, an Automatic Identification System (AIS) collects a large amount of abnormal data, which significantly degrades inland river navigation based on AIS data. To address this, this paper aims to restore the AIS data by repairing the lost data points. By analysing a large volume of abnormal AIS data, the abnormal data were first divided into three types: erroneous data, short-time lost data, and long-time lost data. A cubic spline interpolation method was then employed to deal with the erroneous data and the short-time lost data, while a least squares support vector machine method was used to repair the long-time lost data. Finally, field experiments were carried out to validate the applicability of the proposed method, and they show that the fitting model can repair the AIS data with an accuracy of more than 90%.
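The short-gap repair step can be sketched with a natural cubic spline fitted through the valid reports on either side of a gap. The abstract does not say which boundary conditions or which variables the authors interpolate, so the natural boundary conditions, the function name, and the sample timestamps and latitudes below are assumptions for illustration only.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be a strictly increasing list with at least three points.
    """
    n = len(xs) - 1
    if n < 2 or len(ys) != len(xs):
        raise ValueError("need at least three points and matching lengths")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")

    # Right-hand side of the tridiagonal system for the coefficients c.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3 * (ys[i + 1] - ys[i]) / h[i]
                    - 3 * (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep; natural boundary conditions fix c[0] = c[n] = 0.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution for the per-interval coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x is outside the interpolation range")
        # Locate the interval [xs[j], xs[j + 1]] that contains x.
        j = min(bisect.bisect_right(xs, x) - 1, n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical AIS track: the report expected at t = 20 s is missing.
times = [0.0, 10.0, 30.0, 40.0]              # seconds
latitudes = [30.5000, 30.5010, 30.5030, 30.5040]  # degrees
repaired_latitude = natural_cubic_spline(times, latitudes)(20.0)
```

Interpolation of this kind is only plausible across short gaps, where the vessel's motion between valid reports is smooth. That is consistent with the abstract reserving a different method for long-time lost data.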


Zootaxa ◽  
2021 ◽  
Vol 4929 (1) ◽  
pp. 1-100
Author(s):  
PAOLO ROSA ◽  
POKKATTU GOPI ASWATHI ◽  
CHENTHAMARAKSHAN BIJOY

An illustrated and updated checklist of the Indian Chrysididae is presented, including synonyms and distributional summaries. The list includes 105 species in 20 genera. Six species are described as new: Elampus gladiator Rosa, sp. nov. (Himachal Pradesh, Jammu & Kashmir, and Uttar Pradesh), Chrysis aswathiae Rosa, sp. nov. (Tamil Nadu, elegans species group), Chrysis baldocki Rosa, sp. nov. (Tamil Nadu, smaragdula group), Chrysis bernasconii Rosa, sp. nov. (Tamil Nadu, subsinuata group), Chrysis polita Rosa, sp. nov. (West Bengal, Uttaranchal, Myanmar, ignita group), and Chrysis travancoriana Rosa, sp. nov. (Kerala and Tamil Nadu, praecipua group). Six species are newly recorded: Chrysis hecate Mocsáry, 1889; Chrysis jalala Nurse, 1902; Chrysis obscura Smith, 1860; Istiochrysis ziliolii Rosa & Xu, 2016; Praestochrysis furcifera (Bingham, 1903); Primeuchroeus siamensis (Bischoff, 1910). Two new synonymies are proposed: Chrysis abuensis Nurse, 1902, syn. nov. of Chrysis wroughtoni du Buysson, 1896b; Chrysis nursei Bingham, 1903 syn. nov. of Chrysis gujaratica Nurse, 1903a. Holopyga (Hedychridium) virescens Mocsáry, 1914 is transferred to the genus Hedychridium Abeille de Perrin, 1878; the name Hedychridium virescens (Mocsáry, 1914) thus becomes a secondary homonym of Hedychridium virescens du Buysson, 1908 and is here replaced with the new name Hedychridium mocsaryi Rosa, nom. nov. Chrysis cotesi du Buysson, 1893, sp. resurr. is here revalidated from its previous synonymy with Chrysis palliditarsis Spinola, 1838. Chrysis bahadur Nurse, 1903a is transferred from the ignita group to the splendidula group, Chrysis bhavanae Bingham, 1903 is transferred from the ignita group to the maculicornis group, and Chrysis thakur is transferred from the smaragdula group to the oculata group. Chrysis nila Bingham, 1903 and Chrysis variipes Mocsáry, 1911 are included in the newly established nila group. Spinolia kashmirae Kimsey in Kimsey & Bohart, 1991 is classified as an unnecessary replacement name. The name Parnopes oberthuri du Buysson, 1904 is here emended to Parnopes oberthueri (currently Cephaloparnops oberthueri). Potentially erroneous data, misidentifications and dubious distributional records that may exist in the literature are also identified. We examined almost all type specimens, excluding taxa described by Cameron and Smith. We provide a key to Indian genera, including those expected for the country and not yet recorded, and colour images of type and non-type specimens belonging to 82 species.


2021 ◽  
Vol 9 (2) ◽  
pp. 149
Author(s):  
Evelin Engler ◽  
Paweł Banyś ◽  
Hans-Georg Engler ◽  
Michael Baldauf ◽  
Frank Sill Torres

Collision avoidance is one of the main tasks on board ships to ensure safety at sea. To comply with this requirement, the ship's immediate environment, which is often modelled as the ship's domain, has to be kept free of other vessels and objects. This paper addresses the question of the extent to which inaccuracies in position (P), navigation (N), and timing (T) data affect the reliability of collision avoidance. Employing a simplified model of the ship domain, the determined error bounds are used to derive requirements for ship-side PNT data provision. For this purpose, vessel traffic data obtained in the western Baltic Sea from the automatic identification system (AIS) are analysed to extract all close encounters between ships, which are treated as real-world traffic situations with a potential risk of collision. This study assumes that in these situations, erroneous data can lead to an incorrect assessment of the situation with regard to existing collision risks. The size of the error determines whether collisions are detected, spatially incorrectly assigned, or not detected. Therefore, the non-recognition of collision risks ultimately determines the limits of tolerable errors in the PNT data. The results indicate that under certain conditions, the probability of non-recognition of existing collision risks can reach non-negligible values, e.g., more than 1%, even though position accuracies are better than 10 m.
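To make the link between position error and missed collision risks concrete, the sketch below predicts the closest point of approach of two vessels on straight courses and checks it against a circular ship domain. The abstract only mentions "a simplified model of the ship domain", so the circular shape, the 100 m radius, the constant-velocity assumption, the function names, and the encounter geometry are all hypothetical.

```python
import math


def closest_approach(pos_a, vel_a, pos_b, vel_b):
    """Return (time, distance) of the closest point of approach.

    Positions are (x, y) in metres and velocities in metres per second;
    both vessels are assumed to keep their course and speed.
    """
    rx, ry = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    vx, vy = vel_b[0] - vel_a[0], vel_b[1] - vel_a[1]
    speed_sq = vx * vx + vy * vy
    # Without relative motion the distance never changes. Otherwise the
    # minimum lies where the relative position is perpendicular to the
    # relative velocity; times in the past are clamped to "now".
    t = 0.0 if speed_sq == 0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def violates_domain(pos_a, vel_a, pos_b, vel_b, domain_radius):
    """True if vessel B is predicted to enter A's circular domain."""
    _, distance = closest_approach(pos_a, vel_a, pos_b, vel_b)
    return distance < domain_radius


# Hypothetical head-on encounter with a true lateral offset of 95 m.
own = ((0.0, 0.0), (5.0, 0.0))
target_true = ((2000.0, 95.0), (-5.0, 0.0))
# The same target reported with a 10 m position error away from own ship.
target_reported = ((2000.0, 105.0), (-5.0, 0.0))

risk_true = violates_domain(*own, *target_true, domain_radius=100.0)
risk_reported = violates_domain(*own, *target_reported, domain_radius=100.0)
```

In this constructed case the true passing distance is 95 m, inside the assumed 100 m domain, but a 10 m error in the reported position moves the predicted passing distance to 105 m, so the risk goes unrecognised.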


2021 ◽  
pp. 207-220
Author(s):  
Ze Shi Li ◽  
Visakha Phusamruat ◽  
Tony Clear ◽  
Daniela Damian

This chapter assesses whether the short-term benefits of using digital technology to suppress the Covid-19 pandemic justify the detrimental long-term consequences for privacy. It addresses this complex question through an inevitably incomplete discussion of privacy data protection laws, technology design, and trust in governments and technology providers as well as cultural understandings of privacy. After outlining the technology-assisted measures in various regions in Asia, the chapter highlights major privacy concerns and looks at a number of trade-offs that emerge from the use of technology to contain the spread of the virus. These trade-offs exemplify the risks of adoption of just-in-time software technologies for public health purposes without fully understanding their impact on users and of potentially erroneous data-driven decisions and the involuntary collection of personal data. They also raise important policy questions in the dynamic and fast-shifting context of the Covid-19 pandemic.


Author(s):  
Mohammad Javad MANSOURZADEH ◽  
Javad GHAZIMIRSAEID ◽  
Nadia MOTAMEDI ◽  
Ali NAJAFI ◽  
Auwal ABDULLAHI ABUBAKAR ◽  
...  

Background: Retraction is a mechanism for correcting the literature and a warning to readers about publications that contain serious flaws or erroneous data. The growth of Iranian publications over the last two decades has been accompanied by unethical behavior by some researchers, which has led to the retraction of their publications. We aimed to investigate Iranian retracted publications indexed in the PubMed database. Methods: All Iranian retracted publications indexed in PubMed up to Dec 2017 were retrieved. Bibliographic information of the retracted publications, the retraction notice, the time lag between the article publication date and the date of the retraction notice, the reasons for retraction, the issuer of the retraction, and the acknowledgment information of the retracted publication were recorded. Additionally, citation data of retracted publications published before 2013 were analyzed. Results: Overall, 164 Iranian retracted publications were identified. The mean time lag was 20.8 months. "Islamic Azad University" and "Tehran University of Medical Sciences (TUMS)" were the two affiliations with the highest number of retracted publications. The most frequent issuer of retractions was the editor-in-chief, and the most frequently mentioned reasons for retraction were authorship issues, plagiarism, and redundant publication. Thirty-three (20.12%) publications had received funds from various agencies. The citation study indicates that the retracted publications analyzed received 789 citations (citations per publication = 11.6). Conclusion: Although Iranian retracted publications represent a small portion of all Iranian publications, their number has increased. More than half of the retracted publications involved authorship issues or plagiarism, which calls for more attention from research ethics authorities.
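The reported figures can be cross-checked with simple arithmetic. The sketch below recomputes the funded share and derives the size of the subset used in the citation analysis. That subset size is an inference from the reported numbers, not a value stated in the abstract, and the variable names are illustrative.

```python
retracted_total = 164      # retracted publications identified
funded = 33                # retracted publications that reported funding
citations = 789            # citations received by the analysed subset
citations_per_pub = 11.6   # reported average for that subset

# Share of retracted publications that reported funding, in percent.
funded_share = round(100 * funded / retracted_total, 2)

# Inferred, not stated in the abstract: number of publications in the
# citation analysis (those published before 2013).
implied_subset_size = round(citations / citations_per_pub)
```

The funded share reproduces the reported 20.12%. The average of 11.6 citations implies that roughly 68 of the 164 publications entered the citation analysis, which fits the restriction to publications from before 2013.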


Author(s):  
Dhai Eddine Salhi ◽  
Abelkamel Tari ◽  
Mohand Tahar Kechadi

In the world of the Internet of Things (IoT), many connected objects generate an enormous amount of data. These data are used to analyze specific phenomena and make decisions about them. If an object generates wrong data, this affects the analysis of the collected data and the decisions that follow. A forensic analysis is therefore necessary to detect failing IoT nodes. This paper deals with the problem of detecting these nodes, which generate erroneous data. As a case study, temperature measurements from temperature sensors are collected on a cloud computing server; communication between the nodes is based on HIP (Host Identity Protocol). Detection is performed using a data mining classification technique that groups the connected objects according to the collected measurements. The authors report very good results, which opens the door to further studies.
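The idea of grouping nodes by their measurements can be sketched with a very small two-cluster k-means on each node's mean temperature, treating the minority cluster as suspect. The abstract does not name the technique the authors use, so k-means, the `min_gap` threshold, the function names, and the sample readings are stand-ins chosen for illustration.

```python
def two_means_1d(values, max_iterations=50):
    """Split numbers into a low and a high cluster; return both centroids."""
    low_c, high_c = min(values), max(values)
    for _ in range(max_iterations):
        low = [v for v in values if abs(v - low_c) <= abs(v - high_c)]
        high = [v for v in values if abs(v - low_c) > abs(v - high_c)]
        if not high:  # all values identical, nothing to split
            break
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (low_c, high_c):
            break  # centroids stopped moving
        low_c, high_c = new_low, new_high
    return low_c, high_c


def suspect_nodes(readings, min_gap=5.0):
    """Return ids of nodes in the smaller cluster.

    Nothing is flagged unless the two centroids are at least min_gap
    degrees apart; if both clusters have equal size, the high one is
    returned.
    """
    means = {node: sum(vals) / len(vals) for node, vals in readings.items()}
    low_c, high_c = two_means_1d(list(means.values()))
    if high_c - low_c < min_gap:
        return []
    low = [n for n, m in means.items() if abs(m - low_c) <= abs(m - high_c)]
    high = [n for n in means if n not in low]
    return sorted(low if len(low) < len(high) else high)


# Hypothetical temperature readings in degrees Celsius per node.
readings = {
    "node-1": [21.0, 21.4, 21.2],
    "node-2": [20.8, 21.1, 20.9],
    "node-3": [21.5, 21.3, 21.6],
    "node-4": [84.0, 86.5, 85.2],
}
flagged = suspect_nodes(readings)
```

With these readings only `node-4` is flagged. Treating the minority cluster as faulty assumes that most nodes work correctly and observe similar conditions, which would not hold for sensors deployed in very different environments.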

