data profiling
Recently Published Documents


TOTAL DOCUMENTS

103
(FIVE YEARS 36)

H-INDEX

9
(FIVE YEARS 1)

2022 ◽  
Vol 13 (2) ◽  
pp. 1-25
Author(s):  
Guangliang Gao ◽  
Zhifeng Bao ◽  
Jie Cao ◽  
A. K. Qin ◽  
Timos Sellis

Accurate house price prediction is of great significance to various real estate stakeholders such as house owners, buyers, and investors. We propose a location-centered prediction framework that differs from existing work in terms of data profiling and prediction model. Regarding data profiling, we make an important observation: besides in-house features such as floor area, location plays a critical role in house price prediction. Unfortunately, existing work either overlooked it or used a coarse-grained measurement of locations. We therefore define and capture a fine-grained location profile powered by a diverse range of location data sources, including a transportation profile, an education profile, a suburb profile based on census data, and a facility profile. Regarding the choice of prediction model, we observe that a variety of approaches either consider the entire data for modeling, or split the entire house data and model each partition independently. However, such modeling ignores the relatedness among partitions, and the latter approach may not have sufficient training samples per partition for every prediction scenario. We address this problem by conducting a careful study of exploiting the Multi-Task Learning (MTL) model. Specifically, we map the strategies for splitting the entire house data to the ways the tasks are defined in MTL, and select specific MTL-based methods with different regularization terms to capture and exploit the relatedness among tasks. Based on real-world house transaction data collected in Melbourne, Australia, we design extensive experimental evaluations, and the results indicate a significant superiority of MTL-based methods over state-of-the-art approaches. Meanwhile, we conduct an in-depth analysis of the impact of task definitions and method selections in MTL on the prediction performance, and demonstrate that the impact of task definitions on prediction performance far exceeds that of method selections.


Author(s):  
Wan Rozaini Sheik Osman ◽  
Hapini Awang ◽  
Abdullahi Hassan Abdullahi Hassan ◽  
...  

Digital-Health Tourism Innovation (DTI) worldwide is in its infancy due to the emergence of coronavirus disease (COVID-19). With the growth of open geometa data, government electronic services including electronic health (e-health), electronic commerce (e-commerce), and mobile health (m-health), and Artificial Intelligence (AI) and machine learning strategies, the health and primary healthcare sectors are currently adopting these innovations for socio-economic wellbeing. Digital health (also termed e-health) is part of digital tourism innovation. Adapting geometa data profiling to develop a digital-health tourism framework for Primary Healthcare Workers (PHWs) to use mobile health technologies in COVID-19 vaccination trials is the key challenge of this study. Nevertheless, digital health tourism skills have been launched in developing nations, creating thousands of jobs to protect digital tourism businesses from potential vulnerabilities. Despite the benefits of this novel innovation, its deployment and implementation have been hampered by inadequate ICT facilities, a lack of geometa data pre-processing to remove noise, data integrity issues, insufficient academic research funding, and the absence of a reliable research methodology beyond COVID-19 vaccination trials to highlight these aspects. Therefore, qualitative and quantitative research methods using a Precaution Adoption Model Process (PAMP) questionnaire are employed to enable new ways of pre-processing behavior intention factor items. Eight academic researchers who were conversant with digital health technology validated 28 behavior intention factors with average factor loading values of 50% to 75%. A pilot survey was conducted among 700 respondents from March 18, 2020, to September 10, 2021; among them were undergraduate students who may use this technology for research purposes. Pre-processed geometa data showed percentage frequency counts of 8% to 95% for internet access and other online services, 49% to 92% for adapted training factors, and 34% to 78.3% for factor items, supporting hypothesis generation towards the development of a digital health tourism framework for explaining COVID-19 economic challenges. Unless insights into behavior intention factors and factor items are known and mapped, the mobile health technology design process may result in poor conclusions. Thus, patients who have recovered from COVID-19 infection can still be infected again.


Author(s):  
Thomas Bläsius ◽  
Tobias Friedrich ◽  
Julius Lischeid ◽  
Kitty Meeks ◽  
Martin Schirneck

Author(s):  
Nishita Shewale

Abstract: Introducing unified information systems gives different establishments insight into how data-related activities take place and into their results, with assured quality. Because data accumulation, replication, missing entities, incorrect formatting, anomalies, and similar problems can come to light when data is collected in different information systems, and can cause an array of adverse effects on data quality, the subject of data quality deserves better treatment. This paper inspects the data quality problems in information systems and introduces new techniques that enable organizations to improve the quality of their data. Keywords: Information Systems (IS), Data Quality, Data Cleaning, Data Profiling, Standardization, Database, Organization


2021 ◽  
Author(s):  
Xuan C. Li ◽  
Yuelin Liu ◽  
Farid Rashidi ◽  
Salem Malikic ◽  
Stephen M. Mount ◽  
...  

Author(s):  
Hashim Mude

The 2013 general election marked the entry of data-driven campaigning into Kenyan politics as political parties began collecting and storing voter data. More sophisticated techniques were deployed in 2017 as politicians retained the services of data analytics firms such as Cambridge Analytica, accused of digital colonialism and undermining democracies. It is alleged that political parties engaged in regular targeting and more intrusive micro-targeting, facilitated by the absence of a data protection legal framework. The promulgation of the Data Protection Act, 2019, ostensibly remedied this gap. This paper analyses whether, and to what extent, political parties can rely on the same, or similar, regular targeting and micro-targeting techniques in subsequent elections. While regular targeting differs from micro-targeting as the latter operates at a more granular level, both comprise three steps: collecting a voter’s personal data, profiling them, and sending out targeted messages. This paper considers the legality of each of these steps in turn. It finds that going forward, such practices will likely require the consent of the data subject. However, the Act provides for several exceptions which political parties could abuse to circumvent this requirement. There are also considerable loopholes that allow open access to voter data in the electoral list as well as the personal data of the members of a rival political party. The efficacy of the Data Protection Act will largely rest on whether the Data Protection Commissioner will interpret it progressively and hold political parties to account.


2021 ◽  
Author(s):  
Xuan Cindy Li ◽  
Yuelin Liu ◽  
Farid Rashidi Mehrabadi ◽  
Salem Malikić ◽  
Stephen M. Mount ◽  
...  

Abstract: Recent studies on the heritability of methylation patterns in tumor cells suggest that tumor heterogeneity and progression can be studied through methylation changes. To elucidate methylation-based evolution trajectories in tumors, we introduce a novel computational framework for methylation phylogeny reconstruction, leveraging single-cell bisulfite-treated whole genome sequencing data (scBS-seq) and additionally incorporating copy number information inferred independently from matched single-cell RNA sequencing (scRNA-seq) data, when available. Our framework consists of three components: (i) noise-minimizing site selection, (ii) likelihood-based sequencing error correction, and (iii) pairwise expected distance calculation for cells, all designed to mitigate the effect of noise and uncertainty due to data sparsity commonly observed in scBS-seq data. We validate our approach with the scBS-seq data of multi-regionally sampled colorectal cancer cells, and demonstrate that the cell lineages constructed by our method strongly correlate with original sampling regions. Additionally, we show that the constructed phylogeny can be used to impute missing entries, which, in turn, may help reduce sparsity issues in scBS-seq data.


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Wilbert Serrano ◽  
Raul M. Olaechea ◽  
Luis Cerpa ◽  
Jose Herrera ◽  
Aldo Indacochea

ABSTRACT Hydrothermal vent activity is often associated with submarine volcanism. Here, we investigated the presence of microorganisms related to hydrothermal activity in the Orca seamount. Data profiling of the 16S rRNA gene amplicon sequences revealed a diversity pattern dominated mainly by the phyla Proteobacteria, Acidobacteria, Planctomycetes, and Bacteroidetes.


2021 ◽  
pp. 1-2
Author(s):  
Patrick Juola
