Data Profiling: Latest Research Papers

Location-Centered House Price Prediction: A Multi-Task Learning Approach

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3501806 ◽

2022 ◽

Vol 13 (2) ◽

pp. 1-25

Author(s):

Guangliang Gao ◽

Zhifeng Bao ◽

Jie Cao ◽

A. K. Qin ◽

Timos Sellis

Keyword(s):

Prediction Model ◽

House Price ◽

Prediction Performance ◽

Coarse Grained ◽

Diverse Range ◽

Price Prediction ◽

Task Learning ◽

Data Profiling ◽

Depth Analysis ◽

The Impact

Accurate house price prediction is of great significance to various real estate stakeholders such as house owners, buyers, and investors. We propose a location-centered prediction framework that differs from existing work in terms of both data profiling and the prediction model. Regarding data profiling, we make an important observation: besides in-house features such as floor area, location plays a critical role in house price prediction. Unfortunately, existing work either overlooked it or measured location only at a coarse-grained level. We therefore define and capture a fine-grained location profile powered by a diverse range of location data sources, including a transportation profile, an education profile, a suburb profile based on census data, and a facility profile. Regarding the choice of prediction model, we observe that existing approaches either model the entire house dataset at once, or split it into partitions and model each partition independently. However, the latter ignores the relatedness among partitions, and it may not have sufficient training samples per partition in every prediction scenario. We address this problem through a careful study of Multi-Task Learning (MTL). Specifically, we map the strategies for splitting the house data to the ways tasks are defined in MTL, and select specific MTL-based methods with different regularization terms to capture and exploit the relatedness among tasks. Using real-world house transaction data collected in Melbourne, Australia, we design extensive experimental evaluations, and the results indicate that MTL-based methods significantly outperform state-of-the-art approaches. We also conduct an in-depth analysis of how task definitions and method selection in MTL affect prediction performance, and show that the impact of task definitions far exceeds that of method selection.
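The abstract does not spell out which regularization terms the selected MTL methods use. As a minimal sketch of the general idea, assuming one common formulation (each task's weights split into a shared part plus a task-specific deviation, each with its own ridge penalty), the NumPy code below fits several related data partitions jointly. `fit_shared_plus_specific` and `predict` are hypothetical helpers, not the authors' code, and the data are synthetic.

```python
import numpy as np

def fit_shared_plus_specific(tasks, lam_shared=1.0, lam_task=10.0):
    """Regularized multi-task linear regression (hypothetical helper).

    Each task t (e.g. one data partition such as a suburb) gets weights
    w_t = w0 + v_t, where w0 is shared by all tasks and v_t is task-specific.
    Minimizes  sum_t ||X_t (w0 + v_t) - y_t||^2
               + lam_shared * ||w0||^2 + lam_task * sum_t ||v_t||^2
    by stacking all tasks into one ridge problem with block-wise penalties.
    """
    T = len(tasks)
    d = tasks[0][0].shape[1]
    blocks, targets = [], []
    for t, (X, y) in enumerate(tasks):
        Z = np.zeros((X.shape[0], d * (T + 1)))
        Z[:, :d] = X                          # columns for the shared w0
        Z[:, d * (t + 1): d * (t + 2)] = X    # columns for this task's v_t
        blocks.append(Z)
        targets.append(y)
    Z = np.vstack(blocks)
    y_all = np.concatenate(targets)
    # Separate ridge strengths for the shared block and the task-specific blocks.
    penalty = np.concatenate([np.full(d, lam_shared), np.full(d * T, lam_task)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y_all)
    w0 = theta[:d]
    V = theta[d:].reshape(T, d)               # row t holds v_t
    return w0, V

def predict(w0, V, t, X):
    """Predict targets for rows X that belong to task t."""
    return X @ (w0 + V[t])

# Synthetic example: three related tasks whose true weights differ slightly.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
tasks = []
for shift in (0.0, 0.3, -0.3):
    X = rng.normal(size=(40, 3))
    tasks.append((X, X @ (true_w + shift)))
w0, V = fit_shared_plus_specific(tasks, lam_shared=1e-3, lam_task=1.0)
```

A very large `lam_task` pushes every v_t toward zero and recovers a single pooled model over all data; a very small one lets each partition drift toward its own fit, approximating independent per-partition models. Intermediate values share statistical strength across partitions, which is the trade-off the abstract describes.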

Digital-Health Tourism Research-Methodology Coronavirus-Vaccination Trials: A Study Interpreting Geometa-Data Profiling to use Mobile-Health Technologies Nigeria

Emerging Advances in Integrated Technology ◽

10.30880/emait.2021.02.02.005 ◽

2021 ◽

Vol 2 (2) ◽

Author(s):

Wan Rozaini Sheik Osman ◽

Hapini Awang ◽

Abdullahi Hassan Abdullahi Hassan ◽

...

Keyword(s):

Learning Strategies ◽

Mobile Health ◽

Primary Healthcare ◽

Research Methodology ◽

Digital Health ◽

Health Technology ◽

Health Tourism ◽

Health Technologies ◽

Behavior Intention ◽

Data Profiling

Digital-Health Tourism Innovation (DTI) worldwide is in its infancy due to the emergence of the coronavirus disease (COVID-19). With the growth of open geometa data, government electronic services including electronic health (e-health), electronic commerce (e-commerce) and mobile health (m-health), and Artificial Intelligence (AI) and machine learning strategies, the health and primary healthcare sectors are currently adopting these innovations for socio-economic wellbeing. Digital health (also termed e-health) is part of digital tourism innovation. The key challenges of this study are adapting geometa data profiling to develop a digital-health tourism framework for Primary Healthcare Workers (PHWs) to use mobile health technologies in COVID-19 vaccination trials. Nevertheless, digital health tourism skills have been launched in developing nations and have created thousands of jobs to protect digital tourism businesses from potential vulnerabilities. Despite the benefits of this novel innovation, its deployment and implementation have been hampered by inadequate ICT facilities, a lack of geometa data pre-processing to remove noise, data integrity issues, insufficient academic research funding, and the absence of a reliable research methodology beyond COVID-19 vaccination trials to highlight these aspects. Therefore, qualitative and quantitative research methods using a Precaution Adoption Model Process (PAMP) questionnaire are employed to enable new ways of pre-processing behavior intention factor items. Eight academic researchers familiar with digital health technology validated 28 behavior intention factors, with average factor loading values of 50% to 75%. A pilot survey was conducted among 700 respondents from March 18, 2020, to September 10, 2021, including undergraduate students who may use this technology for research purposes.
The pre-processed geometa data showed percentage frequency counts of 8% to 95% for internet access and other online services, 49% to 92% for adapted training factors, and 34% to 78.3% for factor items, supporting hypothesis generation toward the development of a digital health tourism framework that addresses COVID-19 economic challenges. Unless behavior intention factors and factor item insights are known and mapped, the mobile health technology design process may lead to poor conclusions. Patients who have recovered from COVID-19 infection can still be reinfected.

Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling

Journal of Computer and System Sciences ◽

10.1016/j.jcss.2021.10.002 ◽

2021 ◽

Author(s):

Thomas Bläsius ◽

Tobias Friedrich ◽

Julius Lischeid ◽

Kitty Meeks ◽

Martin Schirneck

Keyword(s):

Hitting Sets ◽

Data Profiling
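No abstract is listed for this paper. For context, one well-known way hitting sets arise in data profiling: a set of columns is a unique column combination exactly when it intersects every "difference set" (the columns on which some pair of rows disagree), so the minimal unique column combinations are the minimal hitting sets of that hypergraph. The sketch below is a naive, exponential-time enumerator for illustration only; it is not the algorithm from the paper, and `minimal_hitting_sets` and `difference_sets` are hypothetical names.

```python
from itertools import combinations

def minimal_hitting_sets(edges):
    """Naively enumerate all minimal hitting sets of a hypergraph.

    edges: iterable of sets of vertices. A hitting set intersects every edge;
    it is minimal if no proper subset is also a hitting set. Candidates are
    tried in order of increasing size, so a hitting set is minimal exactly
    when it contains no previously found (smaller) minimal hitting set.
    Runs in exponential time; for illustration only.
    """
    edges = [set(e) for e in edges]
    vertices = sorted(set().union(*edges))
    found = []
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            s = set(cand)
            if all(s & e for e in edges) and not any(m <= s for m in found):
                found.append(s)
    return found

def difference_sets(rows):
    """For every pair of rows, the set of column indices where they differ."""
    n_cols = len(rows[0])
    return [{c for c in range(n_cols) if r1[c] != r2[c]}
            for r1, r2 in combinations(rows, 2)]

# Minimal unique column combinations of a tiny three-column table.
rows = [("a", 1, "x"), ("a", 2, "y"), ("b", 1, "y")]
print(minimal_hitting_sets(difference_sets(rows)))  # [{0, 1}, {0, 2}, {1, 2}]
```

If two rows are identical, their difference set is empty, no column set can hit it, and the function correctly returns an empty list (the table has no unique column combination).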

Enhancement of Data Quality in the Information Systems within Organization

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38226 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1490-1497

Author(s):

Nishita Shewale

Keyword(s):

Information Systems ◽

Adverse Effects ◽

Data Quality ◽

Data Cleaning ◽

Quality Data ◽

Quality Of Data ◽

New Techniques ◽

Data Profiling ◽

The Subject

Abstract: Unified information systems provide different establishments with insight into how data-related activities take place and deliver their results with assured quality. Because data accumulation, replication, missing entities, incorrect formatting, anomalies, and similar problems can arise when data are collected in different information systems, and can have an array of adverse effects on data quality, data quality deserves to be addressed more effectively. This paper examines data quality problems in information systems and introduces new techniques that enable organizations to improve the quality of their data. Keywords: Information Systems (IS), Data Quality, Data Cleaning, Data Profiling, Standardization, Database, Organization
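The abstract names the problems (replication, missing entities, incorrect formatting, anomalies) but not the techniques. As a minimal sketch of basic column profiling, assuming the data are held as plain Python lists, the standard-library code below counts missing and distinct values, flags entries that violate an expected format, and finds fully duplicated rows. The helper names and the e-mail pattern are hypothetical illustrations, not the paper's method.

```python
import re
from collections import Counter

def _is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def profile_column(values, pattern=None):
    """Summarize one column: missing count, distinct values, most common value,
    and values that do not fully match an optional regex (format anomalies)."""
    present = [v for v in values if not _is_missing(v)]
    counts = Counter(present)
    violations = []
    if pattern is not None:
        rx = re.compile(pattern)
        violations = [v for v in present if not rx.fullmatch(str(v))]
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
        "format_violations": violations,
    }

def duplicate_rows(rows):
    """Return each fully duplicated row with the number of times it occurs."""
    counts = Counter(tuple(r) for r in rows)
    return {row: n for row, n in counts.items() if n > 1}

emails = ["a@x.com", "", None, "b@x.com", "not-an-email", "a@x.com"]
print(profile_column(emails, pattern=r"[^@\s]+@[^@\s]+\.[a-z]+"))
# {'missing': 2, 'distinct': 3, 'most_common': ('a@x.com', 2), 'format_violations': ['not-an-email']}
```

Profiling like this only detects problems; deciding whether to impute, standardize, or drop the flagged values is a separate cleaning step.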

Abstract LB020: Epigenomic tumor evolution modeling with single-cell methylation data profiling

10.1158/1538-7445.am2021-lb020 ◽

2021 ◽

Author(s):

Xuan C. Li ◽

Yuelin Liu ◽

Farid Rashidi ◽

Salem Malikic ◽

Stephen M. Mount ◽

...

Keyword(s):

Single Cell ◽

Tumor Evolution ◽

Methylation Data ◽

Data Profiling ◽

Evolution Modeling

Political Micro-Targeting in Kenya: An Analysis of the Legality of Data-Driven Campaign Strategies under the Data Protection Act

Journal of Intellectual Property and Information Technology Law (JIPIT) ◽

10.52907/jipit.v1i1.61 ◽

2021 ◽

Vol 1 (1) ◽

pp. 7-36

Author(s):

Hashim Mude

Keyword(s):

Political Parties ◽

Data Protection ◽

Political Party ◽

Data Analytics ◽

Personal Data ◽

Data Driven ◽

Campaign Strategies ◽

Data Profiling ◽

Targeted Messages ◽

Data Subject

The 2013 general election marked the entry of data-driven campaigning into Kenyan politics as political parties began collecting and storing voter data. More sophisticated techniques were deployed in 2017 as politicians retained the services of data analytics firms such as Cambridge Analytica, accused of digital colonialism and undermining democracies. It is alleged that political parties engaged in regular targeting and more intrusive micro-targeting, facilitated by the absence of a data protection legal framework. The promulgation of the Data Protection Act, 2019, ostensibly remedied this gap. This paper analyses whether, and to what extent, political parties can rely on the same (or similar) regular targeting and micro-targeting techniques in subsequent elections. While regular targeting differs from micro-targeting in that the latter operates at a more granular level, both comprise three steps: collecting a voter's personal data, profiling them, and sending out targeted messages. This paper considers the legality of each of these steps in turn. It finds that going forward, such practices will likely require the consent of the data subject. However, the Act provides for several exceptions which political parties could abuse to circumvent this requirement. There are also considerable loopholes that allow open access to voter data in the electoral list as well as to the personal data of the members of a rival political party. The efficacy of the Data Protection Act will largely rest on whether the Data Protection Commissioner interprets it progressively and holds political parties to account.

Epigenomic tumor evolution modeling with single-cell methylation data profiling

10.1101/2021.03.22.436475 ◽

2021 ◽

Author(s):

Xuan Cindy Li ◽

Yuelin Liu ◽

Farid Rashidi Mehrabadi ◽

Salem Malikić ◽

Stephen M. Mount ◽

...

Keyword(s):

Single Cell ◽

Sequencing Error ◽

Whole Genome Sequencing Data ◽

Tumor Evolution ◽

Data Sets ◽

Sequencing Data ◽

Data Profiling ◽

Colorectal Cancer Cells ◽

Methylation Patterns ◽

Evolution Modeling

Recent studies on the heritability of methylation patterns in tumor cells suggest that tumor heterogeneity and progression can be studied through methylation changes. To elucidate methylation-based evolution trajectories in tumors, we introduce a novel computational framework for methylation phylogeny reconstruction, leveraging single-cell bisulfite-treated whole-genome sequencing data (scBS-seq) and additionally incorporating copy number information inferred independently from matched single-cell RNA sequencing (scRNA-seq) data, when available. Our framework consists of three components: (i) noise-minimizing site selection, (ii) likelihood-based sequencing error correction, and (iii) pairwise expected distance calculation for cells, all designed to mitigate the effect of noise and uncertainty due to the data sparsity commonly observed in scBS-seq data. We validate our approach with the scBS-seq data of multi-regionally sampled colorectal cancer cells, and demonstrate that the cell lineages constructed by our method strongly correlate with the original sampling regions. Additionally, we show that the constructed phylogeny can be used to impute missing entries, which, in turn, may help reduce sparsity issues in scBS-seq data.
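The abstract names a pairwise expected distance between cells but gives no formula. The sketch below shows one simple way such a quantity can be defined, assuming binary methylation calls (1 = methylated, 0 = unmethylated, None = site not covered), a single symmetric per-site error rate e, independent errors, and a uniform prior on the true state, so each observed call is correct with probability 1 - e. Under those assumptions, the true states of two cells differ with probability 2e(1 - e) when their observed calls agree, and (1 - e)^2 + e^2 when they disagree. This is not the authors' likelihood model, and `expected_distance` and `distance_matrix` are hypothetical names.

```python
import math

def expected_distance(cell_a, cell_b, error_rate=0.01):
    """Expected fraction of co-observed sites whose TRUE states differ.

    cell_a, cell_b: equal-length lists of 0/1 calls, with None for missing sites.
    Assumes each observed call is flipped independently with probability
    error_rate and that true states have a uniform prior.
    """
    e = error_rate
    p_diff_if_same = 2 * e * (1 - e)          # exactly one call was flipped
    p_diff_if_diff = (1 - e) ** 2 + e ** 2    # neither or both calls were flipped
    total, n = 0.0, 0
    for a, b in zip(cell_a, cell_b):
        if a is None or b is None:            # skip sites not covered in both cells
            continue
        total += p_diff_if_same if a == b else p_diff_if_diff
        n += 1
    return total / n if n else math.nan       # undefined when no site is shared

def distance_matrix(cells, error_rate=0.01):
    """Symmetric matrix of pairwise expected distances; the diagonal is set to 0."""
    n = len(cells)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = expected_distance(cells[i], cells[j], error_rate)
    return D
```

Averaging only over co-observed sites is a simple way to cope with sparse coverage, but pairs of cells that share few sites will get noisy distance estimates; a matrix like this could then feed a standard distance-based tree method.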

Bacterial Diversity Profiling around the Orca Seamount in the Bransfield Strait, Antarctica, Based on 16S rRNA Gene Amplicon Sequences

Microbiology Resource Announcements ◽

10.1128/mra.01290-20 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Wilbert Serrano ◽

Raul M. Olaechea ◽

Luis Cerpa ◽

Jose Herrera ◽

Aldo Indacochea

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Bacterial Diversity ◽

Hydrothermal Vent ◽

Hydrothermal Activity ◽

rRNA Gene ◽

Diversity Pattern ◽

Submarine Volcanism ◽

Data Profiling ◽

The 16S rRNA Gene

Hydrothermal vent activity is often associated with submarine volcanism. Here, we investigated the presence of microorganisms related to hydrothermal activity at the Orca seamount. Data profiling of the 16S rRNA gene amplicon sequences revealed a diversity pattern dominated mainly by the phyla Proteobacteria, Acidobacteria, Planctomycetes, and Bacteroidetes.
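The abstract reports which phyla dominate but not how abundances were summarized. A minimal sketch, assuming amplicon reads have already been classified to phylum and tallied as (phylum, read count) pairs, aggregates the counts into relative abundances and computes the Shannon diversity index. The function names are hypothetical, and the toy counts are invented for illustration; they are not data from the study.

```python
import math
from collections import defaultdict

def phylum_relative_abundance(asv_counts):
    """Aggregate (phylum, read_count) pairs into relative abundances,
    sorted from most to least abundant."""
    totals = defaultdict(int)
    for phylum, count in asv_counts:
        totals[phylum] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(((p, c / grand_total) for p, c in totals.items()),
                  key=lambda item: item[1], reverse=True)

def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over nonzero relative abundances."""
    return -sum(p * math.log(p) for _, p in abundances if p > 0)

# Toy counts for illustration only; NOT data from the Orca seamount study.
toy = [("Proteobacteria", 50), ("Acidobacteria", 20),
       ("Proteobacteria", 10), ("Bacteroidetes", 20)]
profile = phylum_relative_abundance(toy)
print(profile)  # [('Proteobacteria', 0.6), ('Acidobacteria', 0.2), ('Bacteroidetes', 0.2)]
print(shannon_index(profile))
```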

Data Profiling

Encyclopedia of Big Data ◽

10.1007/978-3-319-32001-4_424-1 ◽

2021 ◽

pp. 1-2

Author(s):

Patrick Juola

Keyword(s):

Data Profiling

Data Profiling over Big Data Area

Advances in Intelligent Systems and Computing - Intelligent Systems in Big Data, Semantic Web and Machine Learning ◽

10.1007/978-3-030-72588-4_8 ◽

2021 ◽

pp. 111-123

Author(s):

Bahaa Eddine Elbaghazaoui ◽

Mohamed Amnai ◽

Abdellatif Semmouri

Keyword(s):

Big Data ◽

Data Profiling
