data heterogeneity Latest Research Papers

Official Statistics, Building Censuses, and OpenStreetMap Completeness in Italy

The present study provides a simplified framework for verifying the coverage and completeness of settlement maps derived from the OpenStreetMap (OSM) database at the national scale, with a possible use in official statistics. Measuring the completeness of the objects (i.e., buildings) derived from the OpenStreetMap database supports its potential use in building and population censuses and other diachronic surveys, as well as in administrative sources such as the register of building permits and land-use cadasters. A series of measurements at different scales is proposed and tested for Italy, in line with earlier studies. While recognizing the potential of the OpenStreetMap database for official statistics, the present work underlines the urgent need for additional (spatially explicit) analysis to overcome the data heterogeneity and sub-optimal coverage of the OSM information source.
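A minimal sketch of one such measurement, written in Python and assuming that completeness is expressed as the ratio between the number of OSM building footprints and a reference building count (for example, from a building census) per spatial unit. The function names and the dictionary-based inputs are hypothetical; the abstract does not specify how the study's measurements are computed.

```python
from typing import Dict


def completeness_ratio(osm_counts: Dict[str, int],
                       reference_counts: Dict[str, int]) -> Dict[str, float]:
    """Per-unit completeness: OSM buildings divided by reference buildings.

    Units absent from the OSM counts are treated as having 0 mapped buildings.
    Units with no reference buildings are skipped because the ratio is undefined.
    """
    ratios: Dict[str, float] = {}
    for unit, ref in reference_counts.items():
        if ref <= 0:
            continue  # undefined ratio: leave the unit out
        ratios[unit] = osm_counts.get(unit, 0) / ref
    return ratios


def pooled_completeness(osm_counts: Dict[str, int],
                        reference_counts: Dict[str, int]) -> float:
    """National-scale figure: total OSM buildings over total reference buildings."""
    total_ref = sum(reference_counts.values())
    if total_ref <= 0:
        return float("nan")
    total_osm = sum(osm_counts.get(unit, 0) for unit in reference_counts)
    return total_osm / total_ref
```

Reporting both numbers matters: a pooled ratio close to 1 can hide units where OSM coverage is far below the reference, which is the kind of spatial heterogeneity the abstract asks to analyse explicitly. Ratios above 1 are possible when the two sources define or split buildings differently.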


View VULMA: Data Set for Training a Machine-Learning Tool for a Fast Vulnerability Analysis of Existing Buildings

Data · 10.3390/data7010004 · 2021 · Vol 7 (1) · pp. 4

Author(s): Angelo Cardellicchio, Sergio Ruggieri, Valeria Leggieri, Giuseppina Uva

Keyword(s): Machine Learning · Vulnerability Analysis · Data Availability · Training Data · Learning Tools · Existing Buildings · Data Set · Data Assessment · Data Heterogeneity · Evaluation Parameters

The paper presents View VULMA, a data set specifically designed for training machine-learning tools to perform fast vulnerability analysis of existing buildings. Such tools require supervised training on an extensive set of building images, for which several typological parameters must be defined and each sample must be given a proper label on a per-parameter basis. Defining an adequate training data set therefore plays a key role, and several aspects must be considered, such as data availability, preprocessing, augmentation, and balancing according to the selected labels. In this paper, we address all these issues and describe the strategies pursued to build a reliable data set. In particular, we provide a detailed description of both the requirements (e.g., scale and resolution of images, evaluation parameters, and data heterogeneity) and the steps followed to define View VULMA, starting from the data assessment (which reduced the initial sample of about 20,000 images to a subset of about 3,000 pictures), with the goal of training a transfer-learning-based automated tool for fast estimation of the vulnerability of existing buildings from single pictures.
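The abstract lists balancing according to the selected labels among the data set requirements but does not say how it is done. As a hedged illustration only, the Python sketch below applies random undersampling for a single typological parameter, so that every label keeps as many images as the rarest label. The function name and its inputs are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def undersample_by_label(samples: Sequence[str],
                         labels: Sequence[str],
                         seed: int = 0) -> List[str]:
    """Randomly drop samples so every label keeps as many as the rarest label."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    by_label: Dict[str, List[str]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed gives a reproducible subset
    balanced: List[str] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced
```

Because each image carries one label per parameter, balancing one parameter this way can unbalance another; augmenting the rarer classes instead of discarding images is a common alternative when the data set is small.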


Differences in Baseline Characteristics and Access to Treatment of Newly Diagnosed Patients With IPF in the EMPIRE Countries

Frontiers in Medicine · 10.3389/fmed.2021.729203 · 2021 · Vol 8

Author(s): Abigél Margit Kolonics-Farkas, Martina Šterclová, Nesrin Mogulkoc, Katarzyna Lewandowska, Veronika Müller, ...

Keyword(s): Czech Republic · Smoking Habit · Usual Interstitial Pneumonia · Patient Characteristics · Newly Diagnosed · Access To Treatment · Antifibrotic Therapy · Treatment Data · Data Heterogeneity · Baseline Characteristics

Idiopathic pulmonary fibrosis (IPF) is a rare lung disease with a poor prognosis. Diagnostic and treatment options depend on each country's health system, so comparisons among countries are difficult because of data heterogeneity. Our aim was to analyse patients with IPF in Central and Eastern Europe using uniform data from the European Multipartner IPF registry (EMPIRE), which at the time of analysis involved 10 countries. Newly diagnosed IPF patients (N = 2,492, between March 6, 2012 and May 12, 2020) from the Czech Republic (N = 971, 39.0%), Turkey (N = 505, 20.3%), Poland (N = 285, 11.4%), Hungary (N = 216, 8.7%), Slovakia (N = 149, 6.0%), Israel (N = 120, 4.8%), Serbia (N = 95, 3.8%), Croatia (N = 87, 3.5%), Austria (N = 55, 2.2%), and Bulgaria (N = 9, 0.4%) were included; Macedonia, although a member of the registry, was excluded from this analysis because of the low number of cases (N = 5) at this time point. Baseline characteristics, smoking habit, comorbidities, lung function values, CO diffusion capacity, high-resolution CT (HRCT) pattern, and treatment data were analysed. Patients were significantly older in Austria than in the Czech Republic, Turkey, Hungary, Slovakia, Israel, and Serbia. Ever smokers were most common in Croatia (84.1%) and least frequent in Serbia (39.2%) and Slovakia (42.6%). The baseline forced vital capacity (FVC) was >80% in 44.6% of the patients, between 50 and 80% in 49.3%, and <50% in 6.1%. The proportion of patients with FVC >80% was highest in Poland (63%) and lowest in Israel (25%). A typical usual interstitial pneumonia (UIP) pattern was present in 67.6% of all patients, ranging from 43.5% (Austria) to 77.2% (Poland). The majority of patients received antifibrotic therapy (64.5%): 37.4% used pirfenidone (range 7.4–39.8% between countries) and 34.9% used nintedanib (range 12.6–56.0% between countries). In 6.8% of the cases, a therapy switch between the two antifibrotic agents was initiated.
Significant differences in IPF patient characteristics and access to antifibrotic therapies exist among EMPIRE countries; these differences need further investigation, as well as strategies to improve and harmonize patient care and therapy availability in this region.
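The country shares quoted above are each country's N divided by the 2,492 included patients. The short Python check below recomputes them from the counts reported in the abstract; no other data are assumed. Running it gives a total of 2,492, and every rounded share matches the figure in the text.

```python
# Newly diagnosed IPF patients per country, as reported in the abstract.
counts = {
    "Czech Republic": 971,
    "Turkey": 505,
    "Poland": 285,
    "Hungary": 216,
    "Slovakia": 149,
    "Israel": 120,
    "Serbia": 95,
    "Croatia": 87,
    "Austria": 55,
    "Bulgaria": 9,
}

total = sum(counts.values())  # should equal the reported N = 2,492

# Share of the cohort per country, rounded to one decimal as in the abstract.
shares = {country: round(100 * n / total, 1) for country, n in counts.items()}

if __name__ == "__main__":
    print(f"total = {total}")
    for country, pct in shares.items():
        print(f"{country}: {pct}%")
```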


Faculty Opinions recommendation of Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature · 10.3410/f.738314733.793590400 · 2021

Author(s): Jason Flannick

Keyword(s): Summary Statistics · Data Heterogeneity


Data heterogeneity and oldness, two difficulties to overcome for the world seabed sediment mapping

Proceedings of the ICA · 10.5194/ica-proc-4-35-2021 · 2021 · Vol 4 · pp. 1-7

Author(s): Thierry Garlan, Isabelle Gabelotaud, Elodie Marchès, Edith Le Borgne, Sylvain Lucas

Keyword(s): Global Scale · Single Product · Seabed Sediment · Bathymetric Data · The World · Data Heterogeneity · Sediment Data · The University · Seabed Sediments

Abstract. A global seabed sediment map has been under development since 1995 to provide a tool that serves a variety of needs. The project is not entirely original, since a similar effort was already made in 1912, when the French Hydrographic Office and the University of Nancy produced sedimentary maps of the European and North American coasts. Seabed sediments are one of the last geographical domains that cannot benefit from satellite data. Without this contribution, sediment maps must combine very old data with new data to reach the goal of a global map. In general, sediment maps are made with the latest available techniques and are replaced after a few decades, generating new cartographic works as if all previous efforts had become useless. Such an approach underestimates the quality of past work and prevents the production of maps covering large areas. The present work proposes to standardize all kinds of sedimentary data from different periods and from very different acquisition systems and to integrate them into a single product. This process has already been carried out for the bathymetric data of marine charts; in this article, we discuss applying the method at a global scale to sediment data.
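The abstract argues for standardizing sediment data of different ages and acquisition systems into one product rather than discarding older surveys. One hedged way to picture that integration step is sketched below in Python: source-specific labels are mapped onto a common classification, and for each grid cell the most recent usable record is kept, while older records still fill cells that no newer survey covers. The label table, the grid representation, and all names are hypothetical and do not reproduce the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical lookup from source-specific labels to a common classification.
# Real schemes (including the authors' own) will differ; illustrative only.
COMMON_CLASS = {
    "sable": "sand", "sand": "sand",
    "vase": "mud", "mud": "mud",
    "gravier": "gravel", "gravel": "gravel",
}

Cell = Tuple[int, int]  # hypothetical grid cell index (row, col)


@dataclass(frozen=True)
class Record:
    cell: Cell
    label: str  # label as written in the source survey
    year: int   # survey year


def merge_surveys(records: Iterable[Record]) -> Dict[Cell, Tuple[str, int]]:
    """Return {cell: (common_class, year)}, keeping the newest usable record per cell.

    Older records are not discarded up front: they fill any cell that no
    newer survey covers. Records whose label cannot be mapped are skipped.
    """
    merged: Dict[Cell, Tuple[str, int]] = {}
    for rec in records:
        common = COMMON_CLASS.get(rec.label.strip().lower())
        if common is None:
            continue  # unknown label: needs manual review rather than a guess
        current = merged.get(rec.cell)
        if current is None or rec.year > current[1]:
            merged[rec.cell] = (common, rec.year)
    return merged
```

Keeping the survey year next to each class makes the age of the information explicit in the merged product, so data oldness can be mapped alongside the sediment classes.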


Matching Cyber Security Ontologies through Genetic Algorithm-Based Ontology Alignment Technique

Security and Communication Networks · 10.1155/2021/4856265 · 2021 · Vol 2021 · pp. 1-7

Author(s): Weiwei Lin, Reiko Haga

Keyword(s): Genetic Algorithm · Cyber Security · Shared Knowledge · Ontology Alignment · Knowledge Model · Semantic Relationships · Advantages And Disadvantages · Data Heterogeneity · Alignment Technique

A security ontology can be used to build a shared knowledge model for an application domain and thus overcome the data heterogeneity issue, but ontologies suffer from heterogeneity issues of their own. Ontology alignment, i.e., finding identical entities in two ontologies, is a solution. Selecting an effective similarity measure (SM) to distinguish heterogeneous entities is important. However, because of the complex semantic relationships among concepts, no single SM is guaranteed to be effective in all alignment tasks. How SMs are aggregated, so that their advantages and disadvantages complement each other, directly affects the quality of the alignments. In this work, we formally define this problem, discuss its challenges, and present a problem-specific genetic algorithm (GA) to address it effectively. We test our approach experimentally on the bibliographic tracks provided by OAEI and on five pairs of security ontologies. The results show that the GA can effectively address different heterogeneous ontology-alignment tasks and determine high-quality security ontology alignments.
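The abstract does not spell out its genetic algorithm, so the following Python code is a generic sketch rather than the authors' problem-specific GA. Each individual is a hypothetical weight vector over the similarity measures (SMs); candidate entity pairs whose weighted-average similarity reaches a threshold form the alignment; and fitness is the F-measure against a reference alignment, such as those provided with OAEI tracks. Function names, the threshold, and all GA settings (tournament selection, blend crossover, Gaussian mutation, elitism) are assumptions.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (entity in ontology 1, entity in ontology 2)


def aggregate(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the individual SM scores for one candidate pair."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total


def f_measure(found: Set[Pair], reference: Set[Pair]) -> float:
    """Harmonic mean of precision and recall against the reference alignment."""
    tp = len(found & reference)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(found), tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def fitness(weights: Sequence[float], candidates: Dict[Pair, List[float]],
            reference: Set[Pair], threshold: float) -> float:
    """Align every pair whose aggregated similarity reaches the threshold."""
    found = {p for p, s in candidates.items() if aggregate(s, weights) >= threshold}
    return f_measure(found, reference)


def evolve_weights(candidates: Dict[Pair, List[float]], reference: Set[Pair],
                   n_measures: int, threshold: float = 0.5, pop_size: int = 30,
                   generations: int = 50, mutation_sd: float = 0.1,
                   seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_measures)] for _ in range(pop_size)]

    def score(ind: List[float]) -> float:
        return fitness(ind, candidates, reference, threshold)

    def tournament() -> List[float]:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism: carry over the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]  # blend crossover
            # Gaussian mutation, clipped so every weight stays in [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation_sd))) for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)
```

Optimizing the F-measure requires a reference alignment; when none is available, a GA of this kind needs an unsupervised quality proxy as its fitness instead.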


Divergence time estimation of Galliformes based on the best gene shopping scheme of ultraconserved elements

BMC Ecology and Evolution · 10.1186/s12862-021-01935-1 · 2021 · Vol 21 (1)

Author(s): De Chen, Peter A. Hosner, Donna L. Dittmann, John P. O’Neill, Sharon M. Birks, ...

Keyword(s): Divergence Time · Time Estimation · Molecular Dating · Great Promise · Data Types · Ultraconserved Elements · Divergence Time Estimation · Dating Methods · Data Heterogeneity · Fossil Records

Background: Divergence time estimation is fundamental to understanding many aspects of the evolution of organisms, such as character evolution, diversification, and biogeography. With the development of sequencing technology, improved analytical methods, and knowledge of fossils for calibration, it is possible to obtain robust molecular dating results. However, while phylogenomic datasets show great promise in phylogenetic estimation, the best ways to leverage large amounts of data for divergence time estimation have not been well explored. A potential solution is to focus on a subset of the data for divergence time estimation, which can significantly reduce the computational burden and avoid problems with data heterogeneity that may bias results.
Results: In this study, we obtained thousands of ultraconserved elements (UCEs) from 130 extant galliform taxa, including representatives of all genera, to determine divergence times throughout galliform history. We tested the effects of different “gene shopping” schemes on divergence time estimation using a carefully and previously validated set of fossils. Our results show that commonly used clock-like schemes may not be suitable for UCE dating (or other data types) when some loci carry little information. We suggest that partitioning (e.g., with PartitionFinder) and the selection of tree-like partitions may be good strategies for selecting a subset of data for divergence time estimation from UCEs. Our galliform time tree is largely consistent with other molecular clock studies of mitochondrial and nuclear loci. With increased taxon sampling, a well-resolved topology, carefully vetted fossil calibrations, and suitable molecular dating methods, we obtained a high-quality galliform time tree.
Conclusions: We provide a robust galliform backbone time tree that can be combined with more fossil records to further our understanding of the evolution of Galliformes and can serve as a resource for comparative and biogeographic studies in this group.
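The point that some loci carry little information can be made concrete with a standard informativeness measure. The Python sketch below counts parsimony-informative sites per locus, where an alignment column is informative when at least two different character states each occur in at least two sequences, and keeps loci that reach a cut-off. This is a generic filter offered for illustration, not the authors' scheme, which relies on PartitionFinder and tree-like partitions; the function names and the cut-off are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Treated as missing data rather than character states. IUPAC ambiguity
# codes (R, Y, ...) are not handled specially in this simplified sketch.
GAP_OR_MISSING = set("-?N")


def parsimony_informative_sites(alignment: Sequence[str]) -> int:
    """Count columns where at least 2 states each appear in at least 2 sequences."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned (equal length)")
    informative = 0
    for column in zip(*alignment):
        states = Counter(c.upper() for c in column if c.upper() not in GAP_OR_MISSING)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


def select_loci(loci: Dict[str, Sequence[str]], min_informative: int) -> List[str]:
    """Names of loci with at least `min_informative` parsimony-informative sites."""
    return sorted(name for name, aln in loci.items()
                  if parsimony_informative_sites(aln) >= min_informative)
```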


Ontologies application in the sharing economy domain: a systematic review

Online Information Review · 10.1108/oir-11-2020-0497 · 2021 · Vol ahead-of-print (ahead-of-print)

Author(s): Ummul Hanan Mohamad, Mohammad Nazir Ahmad, Ahmad Mujahid Ubaidillah Zakaria

Keyword(s): Domain Knowledge · Industrial Revolution · Common Knowledge · Sharing Economy · Ontology Development · Comprehensive Overview · Content Type · Digital Platforms · Application Ontology · Data Heterogeneity

Purpose: This systematic literature review (SLR) presents an overview and analysis of existing ontology applications in the sharing economy (SE) domain. It discusses the main challenges of ontology development in this domain and highlights the key knowledge areas of the SE subdomains, providing a direction for developing ontology applications for the SE systematically. The SE is not as straightforward as the traditional economy: it transforms the existing economic ecosystem through peer-to-peer collaborations mediated by technology. Hence, the complexity of the SE domain accentuates the need to make SE domain knowledge more explicit.
Design/methodology/approach: For the review, the authors focus only on journal articles published from 2010 to 2020 that mention ontology as a solution to issues specific to the SE domain. The initial identification process produced 3,326 papers from 10 different databases.
Findings: After the inclusion and exclusion criteria were applied, a final set of 11 articles was analyzed and classified. In the SE, good ontology design and development are essential to manage digital platforms, deal with data heterogeneity, and govern the interoperability of SE systems. Yet the preference for building application ontologies, the lack of perdurant design, and minimal use of existing standards for building SE common knowledge are hindering ontology development in this domain. From this review, an anatomy of the key SE subdomain areas is visualized as a reference for further developing the SE domain ontology systematically.
Originality/value: With the arrival of the Fourth Industrial Revolution (4IR), the SE has become one of the important domains whose impact has been explosive, and its domain knowledge is complex. Yet a comprehensive overview and analysis of ontology applications in the SE domain is not available or well presented to the research community.


Non-Adhesive Liquid Embolic Agents in Extra-Cranial District: State of the Art and Review of the Literature

Journal of Clinical Medicine · 10.3390/jcm10214841 · 2021 · Vol 10 (21) · pp. 4841

Author(s): Filippo Piacentino, Federico Fontana, Marco Curti, Edoardo Macchi, Andrea Coppola, ...

Keyword(s): State Of The Art · Case Reports · Meta Analysis · Review Of The Literature · Time Period · Embolic Agents · Liquid Embolic · Data Heterogeneity · New Generation

This review focuses on the use of the “new” generation of non-adhesive liquid embolic agents (NALEAs). In the literature, NALEAs have mainly been used in the cerebral district; however, multiple papers describing their use in the extracranial district have been published recently, and the aim of this review is to explore and analyze this field of application. A few NALEAs, such as Onyx, Squid, and Phil, are currently available on the market; they are used mainly for arteriovenous malformations, endoleaks, visceral aneurysms or pseudoaneurysms, presurgical and hypervascular lesion embolization, and a niche of percutaneous approaches. These embolizing fluids can be used alone or in combination with other embolizing agents (such as coils or particles) to enhance their embolizing effect or compensate for their possible drawbacks. The primary purpose of this paper is to evaluate the use of NALEAs, predominantly used alone, in elective embolization procedures. We did not attempt a meta-analysis because of the data heterogeneity, the high number of case reports, and the lack of a consistent follow-up time period.


Utility Optimization of Federated Learning with Differential Privacy

Discrete Dynamics in Nature and Society · 10.1155/2021/3344862 · 2021 · Vol 2021 · pp. 1-14

Author(s): Jianzhe Zhao, Keming Mao, Chenxi Huang, Yuyang Zeng

Keyword(s): Differential Privacy · Learning Algorithm · Wasserstein Distance · Dynamic Allocation · Model Accuracy · Learning Framework · Utility Optimization · Data Heterogeneity · Cross Platform · The Impact

Secure and trusted cross-platform knowledge sharing is important for modern intelligent data analysis. To address the trade-off between privacy and utility in complex federated learning, a novel differentially private federated learning framework is proposed. First, the impact of participants' data heterogeneity on global model accuracy is analyzed quantitatively based on the 1-Wasserstein distance. Then, we design a multilevel, multiparticipant dynamic allocation method for the privacy budget that reduces the injected noise, so that utility can be improved efficiently. Finally, these components are integrated into a novel adaptive differentially private federated learning algorithm (A-DPFL). Comprehensive experiments are conducted on redefined non-IID MNIST and CIFAR-10 datasets, and the results demonstrate the superiority of A-DPFL in model accuracy, convergence, and robustness.
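The abstract quantifies participants' data heterogeneity with the 1-Wasserstein distance but does not give the exact construction. Below is a minimal Python sketch under two stated assumptions: heterogeneity is measured between each participant's label distribution and the pooled label distribution (label skew is a common way of building non-IID MNIST and CIFAR-10 splits), and the ground metric charges 1 for moving probability mass between any two distinct labels. Under that 0-1 metric, the 1-Wasserstein distance equals the total variation distance, which is what the code computes. Function names are hypothetical, and this is not the A-DPFL algorithm itself.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def label_distribution(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical probability of each label in one participant's data."""
    counts = Counter(labels)
    n = len(labels)
    return {k: c / n for k, c in counts.items()}


def w1_discrete(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """1-Wasserstein distance under the 0-1 ground metric (distinct labels cost 1).

    With this metric the optimal transport cost equals the total variation
    distance, i.e. half the L1 difference between the two histograms.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def participant_heterogeneity(client_labels: List[Sequence[Hashable]]) -> List[float]:
    """Distance of each participant's label distribution from the pooled one."""
    pooled = label_distribution([y for labels in client_labels for y in labels])
    return [w1_discrete(label_distribution(labels), pooled) for labels in client_labels]
```

For ordered, real-valued features, the one-dimensional W1 would instead integrate the absolute difference between the two cumulative distribution functions.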


data heterogeneity
Recently Published Documents

Official Statistics, Building Censuses, and OpenStreetMap Completeness in Italy

View VULMA: Data Set for Training a Machine-Learning Tool for a Fast Vulnerability Analysis of Existing Buildings

Differences in Baseline Characteristics and Access to Treatment of Newly Diagnosed Patients With IPF in the EMPIRE Countries

Faculty Opinions recommendation of Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors.

Data heterogeneity and oldness, two difficulties to overcome for the world seabed sediment mapping

Matching Cyber Security Ontologies through Genetic Algorithm-Based Ontology Alignment Technique

Divergence time estimation of Galliformes based on the best gene shopping scheme of ultraconserved elements

Ontologies application in the sharing economy domain: a systematic review

Non-Adhesive Liquid Embolic Agents in Extra-Cranial District: State of the Art and Review of the Literature

Utility Optimization of Federated Learning with Differential Privacy
