Open data for tourism: the case of Tourpedia

2019 ◽  
Vol 10 (3) ◽  
pp. 351-368
Author(s):  
Angelica Lo Duca ◽  
Andrea Marchetti

Purpose This paper aims to describe Tourpedia, a website about tourism built on open data provided by official government agencies. Tourpedia provides its data under a public license. Design/methodology/approach Tourpedia is built upon a modular architecture, which allows a developer to add a new source of data easily. This is achieved through a simple mapping language, namely the Tourpedia mapping language, which maps the original open data set model to the Tourpedia data model. Findings Tourpedia contains more than 70,000 accommodations, downloaded from open data provided by Italian, French and Spanish regions. Research limitations/implications Tourpedia presents some limitations. First, extracted data are not homogeneous and are often incomplete or incorrect. Second, Tourpedia contains only accommodations. Finally, at the moment Tourpedia covers only some Italian, French and Spanish regions. Practical implications The most important implication of Tourpedia concerns the construction of a single access point for all Italian, French and Spanish open data about accommodations. In addition, a simple mechanism for the integration of new sources of open data is defined. Social implications The current version of Tourpedia also opens the road to three possible new social scenarios. First, Tourpedia could be transformed into an open source of updated information about tourism. Second, Tourpedia could be empowered to support tours, which include some tourist attractions and/or events and suggest the nearest accommodations. Finally, Tourpedia may help tourists to discover unknown places. Originality/value Tourpedia constitutes a single access point, in the form of one website, for data set providers, application developers and tourists.
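
As a purely illustrative sketch of what a declarative source-to-Tourpedia mapping could look like (the actual Tourpedia mapping language is not shown in the abstract, so every field name below is an assumption):

```python
# Hypothetical sketch of a source-to-Tourpedia field mapping; all field names are assumptions.
REGION_MAPPING = {
    "denominazione": "name",     # source field -> assumed Tourpedia field
    "indirizzo": "address",
    "comune": "city",
    "latitudine": "lat",
    "longitudine": "lng",
}

def map_record(source_record: dict, mapping: dict) -> dict:
    """Translate one accommodation record from a regional open data set
    into the (assumed) Tourpedia data model, dropping unmapped fields."""
    return {target: source_record[src]
            for src, target in mapping.items() if src in source_record}

# Example usage with a made-up record from an Italian regional data set.
record = {"denominazione": "Hotel Roma", "comune": "Pisa",
          "latitudine": 43.7, "longitudine": 10.4}
print(map_record(record, REGION_MAPPING))
# {'name': 'Hotel Roma', 'city': 'Pisa', 'lat': 43.7, 'lng': 10.4}
```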

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose The primary aim of this study is to review the studies from different dimensions, including the type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness are addressed in the proposals. The review questions in this study are: (1) What are the ML-based imputation methods studied and proposed during 2010–2020? (2) How are the experimentation setup, characteristics of data sets and missingness employed in these studies? (3) What metrics were used for the evaluation of imputation methods? Design/methodology/approach The review process went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage did not describe an MVI technique relevant to this study. The papers were first screened by title for relevance, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers not eligible for this study were dropped. This resulted in 155 research papers suitable for full-text review, from which 117 papers were used in the assessment of the review questions. Findings This study shows that clustering- and instance-based algorithms are the most frequently proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced the data sets from publicly available repositories. A common approach is to take the complete data set as the baseline and evaluate the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while the missing data type and mechanism pertain to the capability of the imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge. Originality/value It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with the missingness based on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms, which are simple and easy to implement, are popular across various domains.
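
As an illustration of the common evaluation protocol described in the Findings (complete data set as baseline, artificially induced missingness, RMSE on the masked entries), here is a minimal sketch using kNN imputation; the data set, missingness ratio and mechanism are assumptions, not drawn from any reviewed paper:

```python
# Minimal sketch of the common MVI evaluation protocol: take a complete data set as baseline,
# induce missingness artificially, impute with kNN and score the imputation with RMSE
# computed only on the masked entries.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
X_complete = load_iris().data                      # complete baseline data set

# Induce 20% missingness completely at random (MCAR is one of several mechanisms reviewed).
mask = rng.random(X_complete.shape) < 0.20
X_missing = X_complete.copy()
X_missing[mask] = np.nan

X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)

rmse = np.sqrt(np.mean((X_imputed[mask] - X_complete[mask]) ** 2))
print(f"RMSE on masked entries: {rmse:.3f}")
```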


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose The current popular image processing technologies based on convolutional neural networks involve large computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the high real-time performance and accuracy required by industrial applications under limited computing and storage resources. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve the above problems. Design/methodology/approach On the one hand, this study performs multi-dimensional compression processing on the feature extraction network of YOLOv4 to simplify the model and improves the feature extraction ability of the model through knowledge distillation. On the other hand, a prediction scale with a more detailed receptive field is added to optimize the model structure, which can improve the detection performance for tiny defects. Findings The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, and on a steel ingot data set collected in an actual industrial field. The experimental results demonstrate that the proposed YOLOv4-Defect method can greatly improve recognition efficiency and accuracy and reduce the size and computational cost of the model. Originality/value This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection, which is conducive to application in various industrial scenarios with limited storage and computing resources and meets the requirements of high real-time performance and precision.
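
The abstract mentions improving the compressed feature extraction network through knowledge distillation. The following is a generic, minimal sketch of a response-based distillation loss (the temperature, loss weighting and class count are assumptions, not the paper's settings):

```python
# Generic knowledge-distillation loss sketch (not the paper's exact formulation).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Blend the hard-label loss with a soft-label loss that pushes the compressed
    student network towards the teacher's softened output distribution."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # rescale gradients, as in standard distillation
    return alpha * hard + (1 - alpha) * soft

# Usage with dummy logits for a 6-class defect task (e.g. NEU-CLS has six defect classes).
student = torch.randn(8, 6, requires_grad=True)
teacher = torch.randn(8, 6)
labels = torch.randint(0, 6, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```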


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Martin Lněnička ◽  
Renata Machova ◽  
Jolana Volejníková ◽  
Veronika Linhartová ◽  
Radka Knezackova ◽  
...  

Purpose The purpose of this paper was to draw on evidence from computer-mediated transparency and examine the argument that open government data and national data infrastructures, represented by open data portals, can help enhance transparency by providing various relevant features and capabilities for stakeholders' interactions. Design/methodology/approach The developed methodology consisted of a two-step strategy to investigate the research questions. First, a web content analysis was conducted to identify the most common features and capabilities provided by existing national open data portals. The second step involved performing a Delphi process, surveying domain experts to measure the diversity of their opinions on this topic. Findings The identified features and capabilities were classified into categories and ranked according to their importance. By formalizing these feature-related transparency mechanisms, through which stakeholders work with data sets, we provide recommendations on how to incorporate them into the design and development of open data portals. Social implications The creation of appropriate open data portals aims to fulfil the principles of open government and enables stakeholders to engage effectively in policy and decision-making processes. Originality/value By analyzing existing national open data portals and validating the feature-related transparency mechanisms, this paper fills a gap in the existing literature on designing and developing open data portals for transparency efforts.


2018 ◽  
Vol 20 (5) ◽  
pp. 434-448 ◽  
Author(s):  
Stuti Saxena

Purpose With the ongoing drives towards Open Government Data (OGD) initiatives across the globe, governments have been keen on pursuing their OGD policies to ensure transparency, collaboration and efficiency in administration. As a developing country, India has recently adopted the OGD policy (www.data.gov.in); however, the percolation of this policy in the States has remained slow. This paper aims to underpin the "asymmetry" in the OGD framework as far as the Indian States are concerned. Besides, the study also assesses the contribution of "Open Citizens" in furthering the OGD initiatives of the country. Design/methodology/approach An exploratory qualitative study following a case study approach informs the present work, using documentary analysis in which evidentiary support from five Indian States (Uttar Pradesh, Telangana, West Bengal, Sikkim and Gujarat) is drawn on to assess the nature and scope of the OGD framework. Further, a conceptualization of the "Open Citizen" framework is provided to emphasize the need for aware, informed and proactive citizens to spearhead the OGD initiatives in the country. Findings While the National OGD portal has a substantial number of data sets across different sectors, the States are lagging behind in the adoption and implementation of OGD policies; while Telangana and Sikkim have been the frontrunners in adopting OGD policies in a rudimentary manner, others are yet to catch up with them. Further, there is "asymmetry" in terms of the individual contribution of government bodies to the open data sets, where some government bodies are more reluctant to share their data sets than others. Practical implications The study concludes that governments need to institutionalize the OGD framework in the country, and all the States should appreciate the requirement of adopting a robust OGD policy for furthering transparency, collaboration and efficiency in administration. Social implications As "Open Citizens", it behooves citizens to be proactive and contribute towards the open data sets, which would go a long way in deriving social and economic value out of these data sets. Originality/value While there are many studies on OGD in the West, studies focused upon developing countries are starkly lacking. This study plugs this gap by attempting a comparative analysis of the OGD frameworks across Indian States. Besides, the study provides a conceptualization of the "Open Citizen" (OC), which may be tapped for further research in developing and developed countries to ascertain the linkage between OGD and OC.


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
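
As an illustration of computing structural indicators over RDF data (the paper's own redundancy and structural-pattern metrics are not reproduced here), a minimal rdflib sketch over a toy graph:

```python
# Illustrative sketch only: computes two generic structural indicators of an RDF graph
# (subject out-degree and predicate usage) with rdflib; the paper defines its own metrics.
from collections import Counter
from rdflib import Graph

TTL = """
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob ; ex:knows ex:carol ; ex:name "Alice" .
ex:bob   ex:name  "Bob" .
"""

g = Graph()
g.parse(data=TTL, format="turtle")

out_degree = Counter(s for s, p, o in g)          # triples per subject
predicate_usage = Counter(p for s, p, o in g)     # how often each predicate is used

print("triples:", len(g))
print("mean subject out-degree:", len(g) / len(out_degree))
print("most common predicate:", predicate_usage.most_common(1))
```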


2017 ◽  
Vol 24 (4) ◽  
pp. 1052-1064 ◽  
Author(s):  
Yong Joo Lee ◽  
Seong-Jong Joo ◽  
Hong Gyun Park

Purpose The purpose of this paper is to measure the comparative efficiency of 18 Korean commercial banks under the presence of negative observations and examine performance differences among them by grouping them according to their market conditions. Design/methodology/approach The authors employ two data envelopment analysis (DEA) models, namely a Banker, Charnes and Cooper (BCC) model and a modified slacks-based measure of efficiency (MSBM) model, which can handle negative data. The BCC model is proven to be translation invariant for inputs or outputs depending on output or input orientation, while the MSBM model is unit invariant in addition to being translation invariant. The authors compare results from both models and choose one for interpreting the results. Findings Most Korean banks recovered from their worst performance in 2011 and showed similar performance in recent years. Among the three groups (national banks, regional banks and special banks), most of the special banks demonstrated superb performance across models and years. In particular, the performance difference between the special banks and the regional banks was statistically significant. The authors conclude that the high performance of the special banks was due to their nationwide market access and ownership type. Practical implications This study demonstrates how to analyze and measure the efficiency of entities when variables contain negative observations, using a data set for Korean banks. The authors tried two major DEA models that are able to handle negative data and proposed a practical direction for future studies. Originality/value Although there are research papers measuring the performance of banks in Korea, all of the papers on this topic have studied efficiency or productivity using positive data sets. However, variables such as net incomes and growth rates frequently include negative observations in bank data sets. This is the first paper to investigate the efficiency of bank operations in the presence of negative data in Korea.
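
For readers unfamiliar with DEA, here is a minimal sketch of a standard input-oriented BCC (variable returns to scale) model solved as a linear program with made-up data; it is not the authors' MSBM formulation and, unlike the models used in the paper, it assumes nonnegative observations:

```python
# Minimal input-oriented BCC DEA sketch with toy data (not the paper's model or data).
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0, 3.0, 4.0, 5.0],      # inputs: rows = input variables, cols = DMUs (banks)
              [1.0, 2.0, 3.0, 2.0]])
Y = np.array([[1.0, 2.0, 3.0, 3.0]])     # outputs: rows = output variables, cols = DMUs

m, n = X.shape
s = Y.shape[0]

def bcc_efficiency(j0):
    """Efficiency of DMU j0: min theta s.t. X·lam <= theta*x0, Y·lam >= y0, sum(lam) = 1."""
    c = np.r_[1.0, np.zeros(n)]                             # minimize theta
    A_ub = np.vstack([
        np.c_[-X[:, [j0]], X],                              # X·lam - theta*x0 <= 0
        np.c_[np.zeros((s, 1)), -Y],                        # -Y·lam <= -y0
    ])
    b_ub = np.r_[np.zeros(m), -Y[:, j0]]
    A_eq = np.c_[0.0, np.ones((1, n))]                      # sum(lam) = 1 (VRS convexity)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.x[0]

for j in range(n):
    print(f"DMU {j}: efficiency = {bcc_efficiency(j):.3f}")
```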


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hendri Murfi

Purpose The aim of this research is to develop an eigenspace-based fuzzy c-means method for scalable topic detection. Design/methodology/approach The eigenspace-based fuzzy c-means (EFCM) method combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition. Fuzzy c-means is performed on the eigenspace to identify the centroids of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability by using two approaches, i.e. single-pass and online processing. We call the developed topic detection methods spEFCM and oEFCM. Findings Our simulation shows that both the oEFCM and spEFCM methods provide faster running times than EFCM for data sets that do not fit in memory. However, there is a decrease in the average coherence score. For data sets that both fit and do not fit into memory, the oEFCM method provides a tradeoff between running time and coherence score that is better than spEFCM. Originality/value This research produces a scalable topic detection method. Besides this scalability capability, the developed method also provides a faster running time for data sets that fit in memory.
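
A rough sketch of the EFCM pipeline as described (truncated SVD into an eigenspace, fuzzy c-means on the eigenspace, centroids mapped back to the nonnegative word space); the fuzzy c-means updates below are a generic textbook implementation and the corpus is a toy assumption, not the authors' code or data:

```python
# Rough EFCM-style sketch: TF-IDF -> truncated SVD eigenspace -> fuzzy c-means -> topics.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def fuzzy_c_means(Z, n_clusters=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)                      # random fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centroids = (W.T @ Z) / W.sum(axis=0)[:, None]     # weighted centroid update
        d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)                  # membership update
    return centroids

docs = ["open data portals improve transparency",
        "fuzzy clustering detects topics in text",
        "government publishes open data sets",
        "topic detection with clustering of documents"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)                                   # eigenspace representation

centroids = fuzzy_c_means(Z, n_clusters=2)
topics = np.clip(svd.inverse_transform(centroids), 0, None)  # back to nonnegative word space
terms = np.array(vec.get_feature_names_out())
for t in topics:
    print(terms[np.argsort(t)[::-1][:3]])                  # top words per topic
```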


Author(s):  
Liah Shonhe

The main focus of the study was to explore the practices of open data sharing in the agricultural sector, including establishing the research outputs concerning open data in agriculture. The study adopted a desktop research methodology based on a literature review and bibliographic data from the WoS database. Bibliometric indicators discussed include yearly productivity, most prolific authors and contributing countries. Study findings revealed that research activity in the field of agriculture and open access is very low. There were 36 OA articles, and only 6 publications had an open data badge. Most researchers do not yet embrace the need to openly publish their data sets despite the availability of numerous open data repositories. Unfortunately, most African countries are still lagging behind in the management of agricultural open data. The study therefore recommends that researchers should publish their research data sets as OA. African countries need to put more effort into establishing open data repositories and implementing the necessary policies to facilitate OA.
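
As a hypothetical illustration of computing two of the bibliometric indicators mentioned (yearly productivity and most prolific authors) from Web of Science bibliographic records; the toy records and any file name are assumptions, while "PY" and "AU" follow the usual WoS field tags:

```python
import pandas as pd

# Toy stand-in for a Web of Science export; a real export would be loaded with something
# like pd.read_csv("savedrecs.txt", sep="\t") (file name assumed), where "PY" and "AU"
# are the usual WoS field tags for publication year and authors.
records = pd.DataFrame({
    "PY": [2017, 2018, 2018, 2019],
    "AU": ["Smith, J; Doe, A", "Doe, A", "Lee, K; Doe, A", "Smith, J"],
})

yearly_productivity = records["PY"].value_counts().sort_index()          # articles per year
most_prolific = records["AU"].str.split(";").explode().str.strip().value_counts()

print(yearly_productivity)
print(most_prolific.head(10))
```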


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance. Design/methodology/approach A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from a meta-analysis. For data sets with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification. Findings The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second advantage rests on partitioning samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets. Research limitations/implications The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue. Practical implications Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques to real-world problems. Originality/value A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
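
A rough sketch of the kind of pipeline the abstract outlines, assuming standard scikit-learn components: factor analysis for feature transformation, clustering of factor scores to partition heterogeneous samples, and one classifier per partition; the authors' meta-analysis-based heterogeneity measurement is not reproduced here:

```python
# Sketch of a factor-analysis + clustering classification pipeline (assumed components).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

fa = FactorAnalysis(n_components=5, random_state=0)
F_tr = fa.fit_transform(X_tr)                    # factor scores: redundancy-reduced features

km = KMeans(n_clusters=3, n_init=10, random_state=0)
part_tr = km.fit_predict(F_tr)                   # partition samples by factor-score patterns

classifiers = {}
for k in np.unique(part_tr):
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    classifiers[k] = clf.fit(F_tr[part_tr == k], y_tr[part_tr == k])

# Route each test sample to the classifier of its nearest partition.
F_te = fa.transform(X_te)
part_te = km.predict(F_te)
pred = np.array([classifiers[k].predict(f.reshape(1, -1))[0]
                 for k, f in zip(part_te, F_te)])
print("accuracy:", (pred == y_te).mean())
```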


2019 ◽  
Vol 38 (2) ◽  
pp. 293-307
Author(s):  
Po-Yen Chen

Purpose This study attempts to use a new source of data collection, open government data sets, to identify potential academic social networks (ASNs) and define their collaboration patterns. The purpose of this paper is to propose a direction that may advance our current understanding of how or why ASNs are formed or motivated and how they influence research collaboration. Design/methodology/approach This study first reviews the open data sets in Taiwan, which is ranked first in the Global Open Data Index published by the Open Knowledge Foundation, to select the data sets that expose the government's R&D activities. Then, based on a review of research collaboration theory, potential ASNs in those data sets are identified and further generalized as various collaboration patterns. A research collaboration framework is used to present these patterns. Findings Project-based social networks, learning-based social networks and institution-based social networks are identified and linked to various collaboration patterns. Their collaboration mechanisms, e.g., team composition, motivation, relationship, measurement and benefit-cost, are also discussed and compared. Originality/value Traditionally, ASNs have usually been known as co-authorship networks or co-inventorship networks due to the limitations of data collection. This study identifies ASNs that may be formed before co-authorship networks or co-inventorship networks are formally built up and that may influence the outcomes of research collaborations. This information allows researchers to dive deeply into the structure of ASNs and resolve collaboration mechanisms.

