Exceptional Data Quality using Intelligent Matching and Retrieval

AI Magazine ◽  
2010 ◽  
Vol 31 (1) ◽  
pp. 65 ◽  
Author(s):  
Clint R. Bidlack ◽  
Michael P Wellman

Recent advances in enterprise web-based software have created a need for sophisticated yet user-friendly data quality solutions. This article discusses a new category of data quality solutions that fills this need using intelligent matching and retrieval algorithms. The solutions focus on customer and sales data and include real-time inexact search, batch processing, and data migration. Users are empowered to maintain higher-quality data, resulting in more efficient sales and marketing operations. Sales managers spend more time with customers and less time managing data.
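
To make "inexact search" concrete, the following is a minimal sketch of approximate matching over customer names. It is not the algorithm described in the article, which the abstract does not specify; it assumes a plain string-similarity ratio from Python's standard `difflib` module, and the function names, threshold, and sample records are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def inexact_search(query: str, records: list[str], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (record, score) pairs scoring at or above the threshold, best first."""
    scored = [(record, similarity(query, record)) for record in records]
    matches = [(record, score) for record, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical customer names, for illustration only.
customers = ["Acme Corporation", "Acme Corp.", "Apex Industries", "Zenith Ltd"]

# A misspelled query still retrieves the intended record first.
results = inexact_search("ACME Corporatoin", customers, threshold=0.6)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

A production system would likely combine several signals (phonetic codes, token reordering, address fields) rather than a single character-level ratio; the threshold here is an arbitrary starting point to tune against real data.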

2021 ◽  
Author(s):  
Victoria Leong ◽  
Kausar Raheel ◽  
Sim Jia Yi ◽  
Kriti Kacker ◽  
Vasilis M. Karlaftis ◽  
...  

Background. The global COVID-19 pandemic has triggered a fundamental reexamination of how human psychological research can be conducted both safely and robustly in a new era of digital working and physical distancing. Online web-based testing has risen to the fore as a promising solution for rapid mass collection of cognitive data without requiring human contact. However, a long-standing debate exists over the data quality and validity of web-based studies. Here, we examine the opportunities and challenges afforded by the societal shift toward web-based testing, highlight an urgent need to establish a standard data quality assurance framework for online studies, and develop and validate a new supervised online testing methodology, remote guided testing (RGT). Methods. A total of 85 healthy young adults were tested on 10 cognitive tasks assessing executive functioning (flexibility, memory and inhibition) and learning. Tasks were administered either face-to-face in the laboratory (N=41) or online using remote guided testing (N=44), delivered using identical web-based platforms (CANTAB, Inquisit and i-ABC). Data quality was assessed using detailed trial-level measures (missed trials, outlying and excluded responses, response times), as well as overall task performance measures. Results. Across all measures of data quality and performance, RGT data were statistically equivalent to data collected in person in the lab. Moreover, RGT participants outperformed the lab group on measured verbal intelligence, which could reflect test environment differences, including possible effects of mask-wearing on communication. Conclusions. These data suggest that the RGT methodology could help to ameliorate concerns regarding online data quality and, particularly for studies involving high-risk or rare cohorts, offer an alternative for collecting high-quality human cognitive data without requiring in-person physical attendance.
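
As an illustration of what trial-level data quality measures can look like, here is a minimal sketch that summarises one participant's response times. The abstract does not state how outlying responses were defined, so the median-absolute-deviation rule, the cutoff of 3, the function name, and the sample values below are all assumptions made for this example.

```python
from statistics import mean, median


def trial_quality(response_times_ms, mad_cutoff=3.0):
    """Summarise trial-level quality; a value of None marks a missed trial."""
    if not response_times_ms:
        raise ValueError("no trials supplied")
    answered = [rt for rt in response_times_ms if rt is not None]
    if not answered:
        raise ValueError("all trials were missed")

    # Proportion of trials with no response at all.
    missed_rate = 1 - len(answered) / len(response_times_ms)

    # Assumed outlier rule: more than mad_cutoff MADs away from the median.
    med = median(answered)
    mad = median([abs(rt - med) for rt in answered])
    kept = [rt for rt in answered if mad == 0 or abs(rt - med) <= mad_cutoff * mad]

    return {
        "missed_rate": missed_rate,
        "outlier_rate": 1 - len(kept) / len(answered),
        "mean_rt_ms": mean(kept),
    }


# Hypothetical response times in milliseconds; None is a missed trial.
summary = trial_quality([520, 480, None, 510, 2400, 495])
print(summary)
```

Computing the same summary for each participant in both groups yields per-measure distributions that can then be compared between lab and remote testing.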


2020 ◽  
Author(s):  
João Pedro Marques ◽  
Ana Luísa Carvalho ◽  
José Henriques ◽  
Joaquim Neto Murta ◽  
Jorge Saraiva ◽  
...  

Abstract BACKGROUND The development of multicenter patient registries promotes the generation of scientific knowledge by using real-world data. A country-wide, web-based registry for inherited retinal dystrophies (IRDs) empowers patients and community organizations, while supporting formal partnerships with investigators and stakeholders in the global aim to develop high-value, high-utility research. We aim to describe the design, development and deployment of a country-wide, web-based, user-friendly and interoperable registry for IRDs – the IRD-PT. RESULTS The IRD-PT is a clinical/genetic research registry included in the retina.pt platform (http://www.retina.com.pt), which was developed by the Portuguese Retina Study Group. The retina.pt platform collects data on individuals diagnosed with retinal diseases, from several sites across Portugal, with over 1800 participants and over 30,000 consultations to date. The IRD-PT module interacts with the retina.pt core system, which provides a range of basic functions for patient data management, while the IRD-PT module allows data capture for the specific purpose of IRDs. All IRDs are coded according to the International Statistical Classification of Diseases and Related Health Problems (ICD) 9, ICD 10, ICD 11, and Orphanet Rare Disease Ontology (ORPHA codes) to make the IRD-PT interoperable with other IRD registries across the world. Furthermore, the genes are coded according to the Ontology of Genes and Genomes and Online Mendelian Inheritance in Man, whereas signs and symptoms are coded according to the Human Phenotype Ontology. The IRD-PT module was pre-launched at Centro Hospitalar e Universitário de Coimbra, the largest reference center for IRDs in Portugal. As of April 1st 2020, finalized data from 537 participants were available for this preliminary analysis.
CONCLUSIONS In the specific field of rare diseases, the use of registries increases research accessibility for individuals, while providing clinicians/investigators with a coherent data ecosystem necessary to boost research. Appropriate design and implementation of patient registries enables rapid decision making and ongoing data mining, ultimately leading to improved patient outcomes. We have described here the principles behind the design, development and deployment of a web-based, user-friendly and interoperable software tool aimed at generating important knowledge and collecting high-quality data on the epidemiology, genomic landscape and natural history of IRDs in Portugal.


2021 ◽  
Author(s):  
Victoria Leong ◽  
Kausar Raheel ◽  
Jia Yi Sim ◽  
Kriti Kacker ◽  
Vasilis M Karlaftis ◽  
...  

BACKGROUND The global COVID-19 pandemic has triggered a fundamental reexamination of how human psychological research can be conducted both safely and robustly in a new era of digital working and physical distancing. Online web-based testing has risen to the fore as a promising solution for rapid mass collection of cognitive data without requiring human contact. However, a long-standing debate exists over the data quality and validity of web-based studies. OBJECTIVE Here, we examine the opportunities and challenges afforded by the societal shift toward web-based testing, highlight an urgent need to establish a standard data quality assurance framework for online studies, and develop and validate a new supervised online testing methodology, remote guided testing (RGT). METHODS A total of 85 healthy young adults were tested on 10 cognitive tasks assessing executive functioning (flexibility, memory and inhibition) and learning. Tasks were administered either face-to-face in the laboratory (N=41) or online using remote guided testing (N=44), delivered using identical web-based platforms (CANTAB, Inquisit and i-ABC). Data quality was assessed using detailed trial-level measures (missed trials, outlying and excluded responses, response times), as well as overall task performance measures. RESULTS Across all measures of data quality and performance, RGT data were statistically equivalent to data collected in person in the lab. Moreover, RGT participants outperformed the lab group on measured verbal intelligence, which could reflect test environment differences, including possible effects of mask-wearing on communication. CONCLUSIONS These data suggest that the RGT methodology could help to ameliorate concerns regarding online data quality and, particularly for studies involving high-risk or rare cohorts, offer an alternative for collecting high-quality human cognitive data without requiring in-person physical attendance. CLINICALTRIAL N.A.


2016 ◽  
Author(s):  
Alfred Enyekwe ◽  
Osahon Urubusi ◽  
Raufu Yekini ◽  
Iorkam Azoom ◽  
Oloruntoba Isehunwa

ABSTRACT Significant emphasis is placed on the quality of real-time drilling data for the optimization of drilling operations and of logging data for quality lithological and petrophysical description of a field. This is evidenced by the huge sums spent on real-time MWD/LWD tools, broadband services, wireline logging tools, etc. However, a lot more needs to be done to harness quality data for future workover and/or abandonment operations, where the data being relied on may have been entered decades ago and where costs and time spent are critically linked to already known and certified information. In some cases, the data relied on have been migrated across different data management platforms, during which relevant data might have been lost, misinterpreted or misplaced. Another common cause of wrong data is improperly documented well intervention operations, which were completed in such a short time that there was no pressure to document the operation properly. This leads to confusion over simple issues such as at what depth a plug was set, or what junk was left in hole. The relative lack of emphasis on this type of data quality has led to high costs of workover and abandonment operations. In some cases, well control incidents and process safety incidents have arisen. This paper looks at over 20 workover operations carried out in a span of 10 years. An analysis is done on the wells' original timeline of operation. The data management system is analyzed in general terms, and a categorization of the issues experienced during the workover operations is outlined. Bottlenecks in data management are defined, and solutions currently being implemented to manage these problems are listed as recommended good practices.


Author(s):  
Nigel W.T. Quinn ◽  
Roberta Tassey ◽  
Jun Wang

This chapter describes a new approach to environmental decision support for salinity management in the San Joaquin Basin that focuses on Web-based data sharing using tools such as YSI Econet and continuous data quality management using an enterprise-level software tool, WISKI. These tools offer real-time Web access to sensor data as well as providing the owner full control over the way the data is visualized. The same websites use GIS to superimpose the monitoring site locations on maps of local hydrography and allow point-and-click access to the data collected at each environmental monitoring site. This information technology suite of software and hardware works together with a watershed simulation model, WARMF-SJR, to provide timely, reliable, and high-quality data and forecasts of river salinity that can be used by stakeholder decision makers to ensure compliance with state water quality objectives.


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
João Pedro Marques ◽  
Ana Luísa Carvalho ◽  
José Henriques ◽  
Joaquim Neto Murta ◽  
Jorge Saraiva ◽  
...  

Abstract Background The development of multicenter patient registries promotes the generation of scientific knowledge by using real-world data. A country-wide, web-based registry for inherited retinal dystrophies (IRDs) empowers patients and community organizations, while supporting formal partnerships with investigators and stakeholders in the global aim to develop high-value, high-utility research. We aim to describe the design, development and deployment of a country-wide, web-based, user-friendly and interoperable registry for IRDs, the IRD-PT. Results The IRD-PT is a clinical/genetic research registry included in the retina.pt platform (https://www.retina.com.pt), which was developed by the Portuguese Retina Study Group. The retina.pt platform collects data on individuals diagnosed with retinal diseases, from several sites across Portugal, with over 1800 participants and over 30,000 consultations to date. The IRD-PT module interacts with the retina.pt core system, which provides a range of basic functions for patient data management, while the IRD-PT module allows data capture for the specific purpose of IRDs. All IRDs are coded according to the International Statistical Classification of Diseases and Related Health Problems (ICD) 9, ICD 10, ICD 11, and Orphanet Rare Disease Ontology (ORPHA codes) to make the IRD-PT interoperable with other IRD registries across the world. Furthermore, the genes are coded according to the Ontology of Genes and Genomes and Online Mendelian Inheritance in Man, whereas signs and symptoms are coded according to the Human Phenotype Ontology. The IRD-PT module was pre-launched at Centro Hospitalar e Universitário de Coimbra, the largest reference center for IRDs in Portugal. As of April 1st 2020, finalized data from 537 participants were available for this preliminary analysis.
Conclusions In the specific field of rare diseases, the use of registries increases research accessibility for individuals, while providing clinicians/investigators with a coherent data ecosystem necessary to boost research. Appropriate design and implementation of patient registries enables rapid decision making and ongoing data mining, ultimately leading to improved patient outcomes. We have described here the principles behind the design, development and deployment of a web-based, user-friendly and interoperable software tool aimed at generating important knowledge and collecting high-quality data on the epidemiology, genomic landscape and natural history of IRDs in Portugal.


2021 ◽  
Vol 40 (4) ◽  
pp. 713-727
Author(s):  
F.M. Dahunsi ◽  
A.J. Joseph ◽  
O.A. Sarumi ◽  
O.O. Obe

The evaluation of mobile crowdsourcing activities and reports requires a large volume of viable data. These data are gathered in real time from a large number of paid or unpaid volunteers over a period. A high volume of quality data from smartphones or mobile devices is pivotal to the accuracy and validity of the results. Therefore, there is a need for a robust and scalable database structure that can effectively manage and store the large volumes of data collected from various volunteers without compromising the integrity of the data. An in-depth review of various database designs is presented to select the one most suitable for a real-time, robust, large-volume volunteer data handling system. A non-relational database, Google Cloud Firestore, was proposed for the mobile-end database, specifically because of its support for mobile client implementation. Although it is not as popular as some other database services, this choice also makes the integration of data from the mobile end-users to the cloud-hosted database relatively easy, since all proposed services are part of the Google Cloud Platform. Separate comparative reviews of Database Management System (DBMS) performance demonstrated that MongoDB (a non-relational database) performed better when reading large datasets and performing full-text queries, while MySQL (relational) and Cassandra (non-relational) performed much better for data insertion. Google BigQuery was proposed as an appropriate data warehouse solution. It will provide continuity and direct integration with Cloud Firestore and its Application Programming Interface (API) for data migration from Cloud Firestore to BigQuery and to the local server. Google BigQuery also provides machine learning support for data analytics.


2021 ◽  
Vol 54 (1) ◽  
Author(s):  
Raúl Arias-Carrasco ◽  
Jeevan Giddaluru ◽  
Lucas E. Cardozo ◽  
Felipe Martins ◽  
Vinicius Maracaja-Coutinho ◽  
...  

Abstract The current COVID-19 pandemic has already claimed more than 3.7 million victims and it will cause more deaths in the coming months. Tools that track the number and locations of cases are critical for surveillance and help in making policy decisions for controlling the outbreak. However, the current web-based surveillance dashboards run on proprietary platforms, which are often expensive and require specific computational knowledge. We developed a user-friendly web tool, named OUTBREAK, that facilitates epidemic surveillance by showing in an animated graph the timeline and geolocations of cases of an outbreak. It permits even non-specialist users to input data conveniently and track outbreaks in real time. We applied our tool to visualize the SARS 2003, MERS, and COVID-19 epidemics, and provided them as examples on the website. Through the zoom feature, it is also possible to visualize cases at city and even neighborhood levels. We made the tool freely available at https://outbreak.sysbio.tools/. OUTBREAK has the potential to guide and help health authorities to intervene and minimize the effects of outbreaks.

