database errors
Recently Published Documents

TOTAL DOCUMENTS: 12 (FIVE YEARS: 3)
H-INDEX: 6 (FIVE YEARS: 1)

2019 ◽  
Author(s):  
Thatcher Louis Collins

Unreproducibility stemming from a loss of data integrity can be prevented with hash functions, secure sketches, and Benford's Law, combined with the historical practice of the Pli Cacheté, in which scientific discoveries were archived with a third party so that the date of discovery could later be proven. Adding the distinct systems of preregistration and data provenance tracking yields the starting point for a complete ontology of scientific documentation. Such a system (ideally mandated) would rule out several forms of dishonesty, catch computational and database errors, catch honest mistakes, and allow automated data audits of large collaborative open science projects.
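As a rough illustration of two of the integrity checks named above, the sketch below fingerprints a data file with SHA-256 and measures how far a column of values strays from Benford's first-digit law. It is a minimal Python sketch, not the paper's system; the function names and the chi-squared statistic are assumptions, not from the source.

```python
import hashlib
import math
from collections import Counter

def sha256_fingerprint(path: str) -> str:
    """Hash a data file so later copies can be checked for integrity."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def benford_deviation(values) -> float:
    """Chi-squared distance between observed leading-digit frequencies
    and the Benford distribution; a large value flags suspect data."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        return 0.0
    observed = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford: P(d) = log10(1 + 1/d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2
```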


2016 ◽  
Vol 10 (4) ◽  
pp. 933-953 ◽  
Author(s):  
Fiorenzo Franceschini ◽  
Domenico Maisano ◽  
Luca Mastrogiacomo

Author(s):  
Pavel Hering ◽  
Pavel Kopunecz ◽  
Oto Hanuš ◽  
Martin Tomáška ◽  
Marcela Klimešová ◽  
...  

Milk recording (MR) is an essential measure for breeders, and its results are important for inheritance checks. Errors in the data may compromise the efficiency of dairy cow breeding. The aim was to explore how the incidence of MR database errors could be reduced. Frequency distributions of deviations in MR data from different sources were analysed, and limits of acceptable difference in milk recording were estimated. The results of MR control days from an in-parlour flowmeter (DMY) were paired with AVG7 results (the 7-day average) from the same flowmeter (n = 16,247 original recordings of complete lactations). Individual differences in milk yield indicators were calculated between successive MR control days (DMY − R, monthly interval, with the reference value R being the previous DMY) for the MR data file. The correlation coefficient between AVG7 and DMY was statistically significant at 0.935 (P < 0.001), higher than in a previous assessment under AMS (automatic milking system) conditions (0.898; P < 0.001). This means that 87.3% of the variability in the MR milk yield values (DMY) can be explained by variation in the AVG7 values, and vice versa. Difference tests confirmed significant differences (P < 0.001) of 0.76 and 0.55 kg between DMY (in MR) and AVG7 for the original and the refined data files, respectively. Although statistically significant, these differences correspond to only 2.96% and 2.15% in relative terms. The use of a multi-day milk yield average from the electronic flowmeter is therefore an equivalent alternative to the record from a single MR control day. The results are used in MR practice.
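The abstract's step from r = 0.935 to roughly 87% explained variance is the usual identity R² = r². A minimal Python sketch, with made-up paired yields standing in for the DMY/AVG7 data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired milk yields (kg): one MR control-day value (DMY)
# and the 7-day flowmeter average (AVG7) for the same cows.
dmy = [24.1, 30.5, 18.9, 27.3, 22.0]
avg7 = [23.4, 29.8, 19.6, 26.5, 21.7]
r = pearson_r(dmy, avg7)
print(f"r = {r:.3f}, explained variance r^2 = {r * r:.1%}")
# With the paper's r = 0.935, r^2 = 0.874, i.e. roughly 87% shared variability.
```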


2013 ◽  
Vol 49 (1) ◽  
pp. 155-165 ◽  
Author(s):  
Fiorenzo Franceschini ◽  
Domenico Maisano ◽  
Luca Mastrogiacomo

2013 ◽  
Vol 39 (4) ◽  
pp. 497-538 ◽  
Author(s):  
Sharona Hoffman ◽  
Andy Podgurski

Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is "big data" necessarily better data?

This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The paper also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.


2013 ◽  
Vol 19 (4) ◽  
pp. 320 ◽  
Author(s):  
Michael C Calver ◽  
Stephen J Beatty ◽  
Kate A Bryant ◽  
Christopher R Dickman ◽  
Brendan C Ebner ◽  
...  

Assessments of scientists' research records through citations are becoming increasingly important in management and in bibliometric research, but the databases available may contain errors that reduce the reliability of assessments. We investigated this by profiling our personal records in five databases: Scopus, Web of Knowledge, Web of Science, the Cited Reference Search within Web of Science, and the freeware Publish or Perish, followed by correction in CleanPoP. We documented disparities between the results and our CVs, noting implications for bibliometric analyses from our perspective as conservation biologists. No database provided a complete, accurate record for anyone. Sometimes publications were out of range or missing, especially if they were books and book chapters. Other errors included mistakes in the order of authors or year of publication, as well as misattribution of publications. The Hirsch index (h) was robust across databases, but other metrics were more volatile. Nevertheless, all metrics except median citations/paper gave high correlations of 0.78 or greater for the rank order of authors across databases. Profiling researchers' records without knowledge of their CVs will likely result in inaccurate assessments. Reliance on one database compounds the problem if the database does not encompass the researcher's full output, especially books and book chapters. Coverage may be particularly important for conservation biologists, who sometimes publish material of local relevance in local journals not abstracted in some of the databases. Administrators and researchers seeking citation profiles should query multiple databases to obtain a more complete picture of research output and cross-check against a full CV when possible. It may be unjustified to assume that discrepancies between database and CV indicate mistakes made by the researcher; verification from the original publication is necessary. Furthermore, citations are but one of many measures available for assessing the quality, use or impact of research, and their sole use, irrespective of possible errors, may be misleading.
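For readers unfamiliar with the one metric the authors found robust, the Hirsch index is the largest h such that h of an author's papers have at least h citations each. A minimal Python sketch (the citation counts are invented) also shows why dropping one lightly cited item, such as a book chapter missing from a database, often leaves h unchanged:

```python
def h_index(citations):
    """Hirsch index: the largest h such that the author has h papers
    with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# The same hypothetical record as reported by two databases; the second
# is missing the 2-citation item, yet h is unaffected.
print(h_index([25, 18, 12, 7, 6, 4, 2]))  # 5
print(h_index([25, 18, 12, 7, 6, 4]))     # 5
```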


2010 ◽  
Vol 1 (1) ◽  
pp. 2 ◽
Author(s):  
Ali M. Al-Ghamdi

This paper highlights cartographic considerations relevant to quantifying generalization uncertainties, defined here as the Generalization Factor (GF). The paper adds to current research on map and spatial database errors and uncertainties, focusing on the complex nature of the quantification of generalization uncertainty. Three main cartographic aspects, or contexts, are discussed: feature complexity, map sources, and map purposes. The paper discusses the difficulty of producing a universal index such as the GF that accounts satisfactorily for generalization uncertainty. There is therefore a need for a thorough study that accounts for all types of generalization uncertainty for each feature according to the cartographic considerations discussed here, although these contexts are not exhaustive. The study suggests that uncertainty measures should yield a value that can be attached to each feature in the database, especially for detailed databases designed for analysis purposes. It also suggests that generalization uncertainty may become easier to quantify once generalization is performed automatically or even semi-automatically, especially with the advent of new generalization tools.
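One way to read the paper's suggestion of a per-feature uncertainty value is as an extra attribute on each feature record. The sketch below is a hypothetical Python data structure, not the author's design; every field name is an assumption for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    """A spatial-database feature carrying a per-feature
    Generalization Factor (GF) as an uncertainty attribute.
    All field names are illustrative, not from the paper."""
    feature_id: int
    feature_class: str   # e.g. "road", "building", "coastline"
    source_scale: int    # denominator of the source map scale
    target_scale: int    # denominator of the generalized map scale
    gf: float            # generalization uncertainty, e.g. in [0, 1]

# A road generalized from 1:10,000 to 1:50,000 with a modest GF.
road = Feature(feature_id=101, feature_class="road",
               source_scale=10_000, target_scale=50_000, gf=0.12)
```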


2005 ◽  
Vol 52 (5) ◽  
pp. 35-42 ◽  
Author(s):  
H. Korving ◽  
F. Clemens

Assessments of sewer performance are usually based on a single computation of CSO (combined sewer overflow) volumes, using a rainfall time series as the system load. A shortcoming of this method is that uncertainties in knowledge of sewer system dimensions are not taken into account. Moreover, sewer models are rarely calibrated. This paper presents the impacts of database errors and model calibration on the return periods of calculated CSO volumes. The impact of uncertainties is illustrated with two examples, with the variability of calculated CSO volumes estimated using Monte Carlo simulations. The results show that calculated CSO volumes vary considerably due to database errors, especially uncertain dimensions of the catchment area. Furthermore, event-based calibration of a sewer model does not result in more reliable predictions, because the calibrated parameters have low portability. It does, however, enable the removal of database errors, harmonising model predictions with 'reality'.
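To make the Monte Carlo idea concrete, the sketch below propagates an uncertain catchment area through a deliberately crude single-event overflow balance. The balance model and all numbers are illustrative assumptions, not the authors' sewer model.

```python
import random
import statistics

def cso_volume(area_ha, runoff_coeff, rain_mm, storage_m3):
    """Single-event CSO volume (m3) from a simple lumped balance:
    inflow is runoff from the paved catchment; overflow is the inflow
    exceeding in-sewer storage. Highly simplified."""
    inflow = area_ha * 10_000 * runoff_coeff * rain_mm / 1000.0  # m3
    return max(0.0, inflow - storage_m3)

random.seed(1)
# Uncertain catchment area: nominal 50 ha with a 10% standard deviation,
# standing in for the database errors discussed in the paper.
volumes = [
    cso_volume(random.gauss(50.0, 5.0), 0.8, 20.0, 4000.0)
    for _ in range(10_000)
]
print(f"mean CSO volume: {statistics.mean(volumes):.0f} m3, "
      f"std dev: {statistics.stdev(volumes):.0f} m3")
```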

