Synthetic data generator for testing record linkage routines in Brazil.

Author(s):  
Vitor Trentin ◽  
Valeria Bastos ◽  
Myrian Costa ◽  
Kenneth Camargo ◽  
Rejane Sobrino ◽  
...  

Introduction: Record linkage has been increasingly used in Brazil. However, only a few studies report the quality of the linkage process. Synthetic test data can be used to evaluate the quality of data linkage. Objectives and Approach: To develop a synthetic data generator that creates test datasets with attributes and error characteristics similar to those found in Brazilian databases. We analyzed the 2013 mortality database from Rio de Janeiro State to determine the characteristics and frequency distributions of the database attributes (name, mother's name, sex, date of birth and address). We used Python and C++ to customize and add routines to GeCo (http://dlrep.org/dataset/GeCo), a personal data generation tool developed by Tran et al. (DOI:10.1145/2505515.2508207). Results: Brazilian names have specific characteristics that distinguish them from other countries' patterns: multiple family names are usual, as are composite first names, and, despite that, homonyms are frequent. Family names may include all or only part of the father's family name, the mother's family name, or both, so progeny family names vary widely and family members do not necessarily share a common family name. Conclusion/Implications: Because of these national particularities of name formation in Brazil, modeling synthetic data is particularly challenging and requires more flexible rules in order to generate databases that actually allow assessing the quality of data linkage processes.
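The name-formation rules above translate naturally into generator logic. The following Python fragment is a minimal sketch of that idea, not the authors' GeCo extension: the name lists and sampling choices are invented placeholders, whereas the real tool derives its frequency distributions from the 2013 Rio de Janeiro mortality database.

```python
import random

# Placeholder name pools; the actual generator uses frequency tables
# extracted from the Rio de Janeiro mortality database.
FIRST_NAMES = ["Maria", "Jose", "Ana", "Joao", "Francisco", "Antonia"]
FAMILY_NAMES = ["Silva", "Santos", "Oliveira", "Souza", "Pereira", "Costa"]

def person_name():
    # Composite first names (e.g. "Maria Jose") are common in Brazil,
    # as are multiple family names.
    given = random.sample(FIRST_NAMES, k=random.choice([1, 2]))
    family = random.sample(FAMILY_NAMES, k=random.choice([1, 2, 3]))
    return given, family

def child_family_names(father_family, mother_family):
    # A child may inherit all or only some of either parent's family
    # names, so siblings need not share a common surname.
    pool = list(dict.fromkeys(father_family + mother_family))  # dedupe
    k = random.randint(1, min(3, len(pool)))
    return random.sample(pool, k)
```

Homonyms then arise naturally: with heavily skewed frequency tables, independent draws often produce identical full names, which is exactly the property that stresses record linkage.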

2021 ◽  
Vol 15 (4) ◽  
pp. 1-20
Author(s):  
Georg Steinbuss ◽  
Klemens Böhm

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contain outliers with varied and unknown characteristics. Fully synthetic data usually consist of outliers and regular instances with clear characteristics and thus, in principle, allow for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of an outlier or to the difficulty of arriving at good coverage of different domains with synthetic data. In this work, we propose a generic process for generating datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We then describe three instantiations of this generic process that generate outliers with specific characteristics, such as local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of the data reconstructed in this way. Besides showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to those in the real data. Summing up, we propose and validate a new and practical process for benchmarking unsupervised outlier detection.
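As one concrete (and deliberately simple) instantiation of such a process, a generative model can be fitted to the real benchmark data to reconstruct regular instances, with local outliers produced by pushing sampled points away from their mixture component. The Gaussian mixture and the perturbation scheme below are illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def make_benchmark(X_real, n_regular=1000, n_outliers=50, scale=3.0, seed=0):
    """Sample regular instances from a model of the real data and inject
    local outliers with known labels (0 = regular, 1 = outlier)."""
    rng = np.random.default_rng(seed)
    gmm = GaussianMixture(n_components=5, random_state=seed).fit(X_real)
    X_reg, _ = gmm.sample(n_regular)
    # Local outliers: sample from a component, then shift each point a few
    # component standard deviations away from that component's mean.
    X_out, comp = gmm.sample(n_outliers)
    for i, c in enumerate(comp):
        std = np.sqrt(np.diag(gmm.covariances_[c]))
        X_out[i] += rng.choice([-1.0, 1.0], size=X_out.shape[1]) * scale * std
    X = np.vstack([X_reg, X_out])
    y = np.r_[np.zeros(len(X_reg)), np.ones(len(X_out))]
    return X, y
```

Because the labels are known by construction, any unsupervised detector can be scored directly against y, e.g. by ROC AUC.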


2014 ◽  
Vol 12 (2) ◽  
pp. 93-106 ◽  
Author(s):  
Tobias Matzner

Purpose – Ubiquitous computing and “big data” have been widely recognized as requiring new concepts of privacy and new mechanisms to protect it. While improved concepts of privacy have been suggested, the paper argues that people acting in full conformity with those privacy norms can still infringe the privacy of others in the context of ubiquitous computing and “big data”. Design/methodology/approach – New threats to privacy are described. Helen Nissenbaum's concept of “privacy as contextual integrity” is reviewed with regard to its capability to grasp these problems. The argument is based on the assumption that the technologies work, and that persons are fully informed and capable of deciding according to advanced privacy considerations. Findings – Big data and ubiquitous computing enable privacy threats for persons whose data are only indirectly involved, and even for persons about whom no data have been collected and processed. These new problems are intrinsic to the functionality of the new technologies and need to be addressed on a social and political level. Furthermore, a concept of data minimization in terms of the quality of the data is proposed. Originality/value – The use of personal data as a threat to the privacy of others is established. This new perspective is used to reassess and recontextualize Helen Nissenbaum's concept of privacy. Data minimization in terms of data quality is proposed as a new concept.


Author(s):  
Hoon Kim ◽  
Kangwook Lee ◽  
Gyeongjo Hwang ◽  
Changho Suh

Developing a computer vision-based algorithm for identifying dangerous vehicles requires a large amount of labeled accident data, which is difficult to collect in the real world. To tackle this challenge, we first develop a synthetic data generator built on top of a driving simulator. We then observe that the synthetic labels generated from simulation results are very noisy, resulting in poor classification performance. To improve the quality of the synthetic labels, we propose a new label adaptation technique that first extracts the internal states of vehicles from the underlying driving simulator and then refines the labels by predicting the future paths of vehicles with a well-studied motion model. Via real-data experiments, we show that our dangerous vehicle classifier can reduce the missed detection rate by at least 18.5% compared with classifiers trained on real data when the time-to-collision is between 1.6 s and 1.8 s.
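A minimal sketch of this kind of motion-model-based label refinement, assuming a constant-velocity model (the paper does not specify which well-studied model it uses) and illustrative thresholds:

```python
import numpy as np

def time_to_collision(p_ego, v_ego, p_other, v_other):
    """Time of closest approach under a constant-velocity motion model."""
    dp = np.asarray(p_other, float) - np.asarray(p_ego, float)
    dv = np.asarray(v_other, float) - np.asarray(v_ego, float)
    denom = dv @ dv
    if denom < 1e-9:          # no relative motion
        return np.inf
    t = -(dp @ dv) / denom    # consider future approaches only
    return t if t > 0 else np.inf

def refine_label(p_ego, v_ego, p_other, v_other, ttc_max=1.8, radius=2.0):
    # Keep the "dangerous" label only if the predicted closest approach
    # happens soon and within a collision radius (thresholds illustrative).
    t = time_to_collision(p_ego, v_ego, p_other, v_other)
    if not np.isfinite(t) or t > ttc_max:
        return 0
    gap = (np.asarray(p_other) + t * np.asarray(v_other)) \
        - (np.asarray(p_ego) + t * np.asarray(v_ego))
    return int(np.linalg.norm(gap) < radius)
```

Replacing raw simulator collision flags with such model-based predictions is what removes much of the label noise.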


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5923
Author(s):  
Borja Saez-Mingorance ◽  
Antonio Escobar-Molero ◽  
Javier Mendez-Gomez ◽  
Encarnacion Castillo-Morales ◽  
Diego P. Morales-Santos

This work studies the feasibility of a novel two-step algorithm for infrastructure and object positioning using pairwise distances. The proposal is based on two optimization algorithms: Scaling by Majorizing a Complicated Function (SMACOF) and limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). A qualitative evaluation of these algorithms is performed for 3D positioning. As the final stage, smoothing filter techniques are applied to estimate the trajectory from the previously obtained positions. This approach can also be used as a synthetic gesture data generator framework. The framework is independent of the hardware and can simulate the estimation of trajectories from noisy distances gathered with a wide range of sensors by modifying the noise properties of the initial distances. The framework is validated using a system of ultrasound transceivers. The results show it to be an efficient and simple positioning and filtering approach that accurately reconstructs the real path followed by the mobile object while maintaining low latency. Furthermore, these capabilities can be exploited by using the proposed algorithms for synthetic data generation, as demonstrated in this work, where synthetic ultrasound gesture data are generated.
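A compact sketch of such a two-step pipeline is possible with off-the-shelf routines: scikit-learn's metric MDS implements SMACOF for the initial embedding, and SciPy's L-BFGS-B then refines the raw stress. The cost function and solver settings below are assumptions for illustration; the paper's implementation may differ.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.manifold import MDS

def positions_from_distances(D, seed=0):
    """Estimate 3D positions from a symmetric pairwise distance matrix D."""
    # Step 1: SMACOF initialisation via metric MDS.
    mds = MDS(n_components=3, dissimilarity="precomputed", random_state=seed)
    X0 = mds.fit_transform(D)

    n = D.shape[0]
    i, j = np.triu_indices(n, k=1)

    def stress(x):
        X = x.reshape(n, 3)
        d = np.linalg.norm(X[i] - X[j], axis=1)
        return np.sum((d - D[i, j]) ** 2)

    # Step 2: L-BFGS refinement of the embedding.
    res = minimize(stress, X0.ravel(), method="L-BFGS-B")
    return res.x.reshape(n, 3)
```

Since pairwise distances are invariant to rigid motions, the recovered coordinates are defined only up to rotation, translation, and reflection; anchoring a few known infrastructure nodes resolves this ambiguity before the smoothing filter estimates the trajectory.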


2020 ◽  
Vol 13 (1) ◽  
pp. 140-162
Author(s):  
Viktoras Justickis

Abstract The role of balancing in the development and application of European data protection is enormous. European courts use it widely; it is the basis for the harmonization of pan-European and national laws and plays a crucial role in everyday data protection. The correctness of a huge number of critical decisions in the EU therefore depends on the soundness of the balancing method. However, the real ability of the balancing method to cope with this mission has been subjected to intense criticism in the scientific literature. This criticism has highlighted its imperfections and cast doubt on its suitability for optimizing the relation between competing rights. Paradoxically, the everyday practice of balancing tends to ignore this criticism. The limitations of the balancing method are typically not discussed and not taken into account when considering legal cases and solving practical issues. Thus, it is tacitly assumed that the shortcomings and limitations of the balancing method, which the criticism points out, are irrelevant when making real-life decisions. This article discusses the scope of this phenomenon, its manifestations, and its impact on the quality of data protection decisions based on the balancing method: the sub-optimality of these decisions, their opacity, public dissatisfaction with the legal regulation, and its instability and low authority. It also considers ways of bridging the gap between the practice of balancing and science, and of having practice take broader account of the shortcomings of the balancing method identified in scientific discussions.


Author(s):  
B. L. Armbruster ◽  
B. Kraus ◽  
M. Pan

One goal in electron microscopy of biological specimens is to improve the quality of data to equal the resolution capabilities of modern transmission electron microscopes. Radiation damage and beam-induced movement caused by charging of the sample, low image contrast at high resolution, and sensitivity to external vibration and drift in side-entry specimen holders limit the effective resolution one can achieve. Several methods have been developed to address these limitations: cryomethods are widely employed to preserve and stabilize specimens against some of the adverse effects of the vacuum and electron beam irradiation, spot-scan imaging reduces charging and associated beam-induced movement, and energy-filtered imaging removes the “fog” caused by inelastic scattering of electrons, which is particularly pronounced in thick specimens. Although most cryoholders can easily achieve a 3.4 Å resolution specification, information perpendicular to the goniometer axis may be degraded due to vibration. Absolute drift after mechanical and thermal equilibration, as well as drift after movement of a holder, may cause loss of resolution in any direction.


Author(s):  
Elena A. Beigel ◽  
Natalya G. Kuptsova ◽  
Elena V. Katamanova ◽  
Oksana V. Ushakova ◽  
Oleg L. Lakhman

Introduction. Occupational chronic obstructive pulmonary disease (COPD) is one of the leading nosological forms of occupational respiratory disease. Numerous studies have shown the high effectiveness of the indacaterol/glycopyrronium combination (Ultibro® Breezhaler®) on clinical and functional indicators in the treatment of COPD in general practice. The aim of the investigation was to analyze cases of occupational COPD, examining the dynamics of functional indicators, exercise tolerance, and quality of life in aluminum production workers treated with the indacaterol/glycopyrronium combination. Materials and methods. Random sampling yielded 20 male aluminum production workers, aged 40 to 60 years, with an established diagnosis of occupational COPD. Patients were surveyed using the Borg scale, the modified Medical Research Council (mMRC) scale, and the COPD Assessment Test (CAT). Functional studies included spirometry, body plethysmography, electrocardiography (ECG), and the six-minute stepper test (6-MST). Results. After 8 weeks of therapy, forced expiratory volume in 1 second (FEV1) increased by 14.7%, reaching 67.90% of predicted values, and forced vital capacity (FVC) increased by 11.3%, reaching 76.95% of predicted. Body plethysmography (BPG) showed a decrease in residual lung volume of 13.4% on average and a reduction of static hyperinflation, confirmed by an 18.8% decrease in functional residual volume (FRV). Physical activity of patients increased over the study period: the average difference in distance traveled in the six-minute stepper test before and after treatment was 58.8 m. Analysis of personal data showed that patients' quality of life improved: the total CAT score at the beginning of the study was 16.9 points and after 8 weeks had fallen to 10.7 points (63% of the baseline value). Conclusions: The results indicate a positive effect of combination therapy with indacaterol/glycopyrronium on the course and progression of occupational COPD.


Author(s):  
Nur Maimun ◽  
Jihan Natassa ◽  
Wen Via Trisna ◽  
Yeye Supriatin

Accuracy in assigning diagnosis codes is an important matter for medical recorders, and data quality is central to health information management. This study aims to assess coder competency regarding the accuracy and precision of ICD-10 use at X Hospital in Pekanbaru. The study used a qualitative method with a case study design involving five informants. The results show that medical personnel (doctors) have never received training in coding; doctors' handwriting is hard to read; diagnosis or procedure codes are sometimes assigned incorrectly; doctors use non-standard abbreviations; some officers do not understand nomenclature or have not mastered anatomical pathology; and the available facilities and infrastructure support coding accuracy and precision. Coding errors keep occurring because of human error. Accuracy and precision in coding strongly influence INA-CBGs costs; the medical committee did most of the work in cases of severity level III, while the medical record unit played a role in monitoring and evaluating coding implementation. If a resume is unclear, the case-mix team checks the medical record file to verify the diagnosis or code. Keywords: coder competency, accuracy and precision of coding, ICD 10


2017 ◽  
Vol 4 (1) ◽  
pp. 25-31 ◽  
Author(s):  
Diana Effendi

The Information Product Approach (IP Approach) is an information management approach. It can be used to manage product information and to analyze data quality. Organizations can use the IP-Map to facilitate the management of knowledge in collecting, storing, maintaining, and using data in an organized way. The data management process for academic activities at X University has not yet used the IP approach. X University has not given attention to managing the quality of its information; so far it has focused only on the system applications used to support the automation of data management in academic activities. The IP-Map presented in this paper can be used as a basis for analyzing the quality of data and information. With the IP-Map, X University is expected to know which parts of the process need improvement in data and information quality management. Index terms: IP Approach, IP-Map, information quality, data quality.


2019 ◽  
Author(s):  
Bogdan Corneliu Andor ◽  
Dionisio Franco Barattini ◽  
Dumitru Emanuel Dogaru ◽  
Simone Guadagna ◽  
Serban Rosu

BACKGROUND Osteoarthritis (OA) is one of the top five most disabling conditions, and it affects more than one third of persons over 65 years of age. Currently, 80% of persons affected by OA report some movement limitation, 20% are not able to perform major activities of daily living, and about 11% of the affected population need personal care. In 2014, the European Society for Clinical and Economic Aspects of Osteoporosis and Osteoarthritis (ESCEO) suggested, as the first step of pharmacological treatment for knee OA, background therapy with chronic symptomatic slow-acting drugs for osteoarthritis (SYSADOAs), such as glucosamine sulphate, chondroitin sulphate, and hyaluronic acid (HA). In studies with oral HA, symptoms of OA are often measured using subjective parameters such as the visual analog scale (VAS) or quality of life (QoL) questionnaires, while objective measurements such as ultrasonography (US) or range of motion (ROM) are employed in very few trials. This affects the quality of data in the literature. OBJECTIVE The primary objective of this work is to assess the feasibility of implementing US and ROM as objective measurements to correlate the improvement of knee mobility with pain reduction, evaluated with a subjective scale (VAS), in patients taking a nutraceutical containing HA. The secondary objective is to evaluate the enrollment rate over one month to verify the feasibility, in time and budget, of the planned future main study. The explorative objective of the trial is to obtain preliminary data on the efficacy of the tested product. METHODS This open-label pilot trial is performed in an orthopedic clinic (Timisoara, Romania). Male and female subjects (50 to 70 years old) diagnosed with symptomatic OA of the knee and mild joint discomfort for at least 6 months are included. Following the protocol, 8 patients receive Syalox® 300 Plus (River Pharma, Italy), a product based on HA of high molecular weight, for 8 weeks. Baseline and final visit assessments include orthopedic assessment, US, the Knee injury and Osteoarthritis Outcome Score (KOOS) questionnaire, VAS, and ROM of the knee. RESULTS Data collection occurred between February 2018 and June 2018. All results are expected to be available by the end of 2018. CONCLUSIONS This pilot trial will be the first study to analyze the potential correlation between subjective evaluations (VAS, KOOS questionnaire) and objective measurements (US, ROM, and actigraphy). The data from this study will assess the feasibility of the planned monthly recruitment rate and the necessary time and budget, and should provide preliminary information on the efficacy of the tested product. CLINICALTRIAL ClinicalTrials.gov (NCT number: NCT03421054).

