Semantic enrichment on large scanned collections through their “satellite texts”: the paradigm of Migne’s Patrologia Graeca

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Evagelos Varthis ◽  
Spyros Tzanavaris ◽  
Ilias Giarenis ◽  
Sozon Papavlasopoulos ◽  
Manolis Drakakis ◽  
...  

Purpose This paper aims to present a methodology for the semantic enrichment of the scanned collection of Migne’s Patrologia Graeca (PG). The goal is to make it easy to locate on the Web the scanned PG source whenever a reference to that source is described or commented on in another scanned or textual document, and to semantically enrich PG through related scanned or textual documents, named “satellite texts”, published by third parties. The present enrichment of PG uses as satellite texts Dorotheos Scholarios’s Synoptic Index (DSSI), which acts as metadata for PG. Design/methodology/approach The methodology consists of two parts. The first part addresses the transcription of DSSI via a purpose-built web tool. The second part is divided into two subsections: interlinking the printed column numbers of each scanned PG page with its actual filename, i.e. building a matching function, and building a web interface for PG based on the Uniform Resource Identifiers (URIs) generated in the first subsection. Findings The result of the implemented methodology is a Web portal capable of providing server-less search of topics with direct (single-click) navigation to sources. The produced system is static, scalable, easy to manage and requires minimal cost to complete and maintain. The produced data sets of the transcribed DSSI and the JavaScript Object Notation (JSON) matching functions are available for personal use by students and scholars under a Creative Commons license (CC-BY-NC-SA). Social implications Scholars, or anyone interested in a particular subject, can easily locate topics in PG and reference them using URIs that are easy to remember. This contributes significantly to the related scientific dialogue. Originality/value The methodology uses the transcribed satellite texts of DSSI, which act as metadata for PG, to semantically enrich the PG collection. Furthermore, the built PG Web interface can be used by other satellite texts as a reference basis to further enrich PG, as it provides direct identification of sources. The presented methodology is general and can be applied to any scanned collection using its own satellite texts.
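
Purely as an illustration of the matching-function idea described above (not the authors' actual data or portal), the sketch below maps a PG volume and printed column number to the scanned page filename through a small JSON table and builds a memorable URI; the volume number, filenames and base URL are hypothetical.

```python
import bisect
import json

# Hypothetical matching data: for each volume, the first printed column that
# appears on each scanned image, in ascending order, plus the image filenames.
matching_json = """
{
  "042": {
    "first_columns": [9, 13, 17, 21],
    "files": ["PG042_p0001.jpg", "PG042_p0002.jpg", "PG042_p0003.jpg", "PG042_p0004.jpg"]
  }
}
"""
MATCHING = json.loads(matching_json)
BASE_URL = "https://example.org/pg"  # hypothetical portal base URL

def column_to_page(volume: str, column: int):
    """Return (URI, scanned filename) for the page containing the printed column."""
    vol = MATCHING[volume]
    # index of the last page whose first printed column is <= the requested column
    idx = bisect.bisect_right(vol["first_columns"], column) - 1
    return f"{BASE_URL}/{volume}/{column}", vol["files"][idx]

uri, scan = column_to_page("042", 15)
print(uri, "->", scan)  # https://example.org/pg/042/15 -> PG042_p0002.jpg
```

Because the lookup data are plain JSON and the URIs encode only volume and column, such a mapping can be served as static files, which is consistent with the server-less, easily maintained portal the abstract describes.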

mSystems ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
Zhongyou Li ◽  
Katja Koeppen ◽  
Victoria I. Holden ◽  
Samuel L. Neff ◽  
Liviu Cengher ◽  
...  

ABSTRACT The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times more than previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was frequently differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses. IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation blocks a considerable amount of microbial transcriptomic data from being reused easily. Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and, thereby, make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species.
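
The following is a minimal, hypothetical sketch of the kind of sample group annotation that must exist before high-throughput reanalysis (it is not the GAUGE code): biological replicates are pooled into comparison groups by normalizing their GEO sample titles, after which standard differential-expression tools could be applied. The accessions and titles are invented.

```python
# Group GEO samples into comparison sets by shared tokens in their titles.
from collections import defaultdict
import re

samples = {
    "GSM001": "PAO1 wild-type biofilm rep1",
    "GSM002": "PAO1 wild-type biofilm rep2",
    "GSM003": "PA3923 mutant biofilm rep1",
    "GSM004": "PA3923 mutant biofilm rep2",
}

def group_key(title: str) -> str:
    """Strip replicate markers so biological replicates share one group key."""
    return re.sub(r"\brep\d+\b", "", title.lower()).strip()

groups = defaultdict(list)
for gsm, title in samples.items():
    groups[group_key(title)].append(gsm)

for key, members in groups.items():
    print(key, "->", members)
# Two groups emerge (wild-type vs. mutant); downstream tools such as limma or
# DESeq2 could then test the groups for differential expression.
```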


2004 ◽  
Vol 101 (Supplement3) ◽  
pp. 326-333 ◽  
Author(s):  
Klaus D. Hamm ◽  
Gunnar Surber ◽  
Michael Schmücking ◽  
Reinhard E. Wurm ◽  
Rene Aschenbach ◽  
...  

Object. Innovative new software solutions may enable image fusion to produce the desired data superposition for precise target definition and follow-up studies in radiosurgery/stereotactic radiotherapy in patients with intracranial lesions. The aim is to integrate the anatomical and functional information completely into the radiation treatment planning and to achieve an exact comparison for follow-up examinations. Special conditions and advantages of BrainLAB's fully automatic image fusion system are evaluated and described for this purpose. Methods. In 458 patients, the radiation treatment planning and some follow-up studies were performed using an automatic image fusion technique involving the use of different imaging modalities. Each fusion was visually checked and corrected as necessary. The computerized tomography (CT) scans for radiation treatment planning (slice thickness 1.25 mm), as well as stereotactic angiography for arteriovenous malformations, were acquired using head fixation with a stereotactic arc or, in the case of stereotactic radiotherapy, with a relocatable stereotactic mask. Different magnetic resonance (MR) imaging sequences (T1, T2, and fluid-attenuated inversion-recovery images) and positron emission tomography (PET) scans were obtained without head fixation. Fusion results and the effects on radiation treatment planning and follow-up studies were analyzed. The precision level of the results of the automatic fusion depended primarily on the image quality, especially the slice thickness and the field homogeneity when using MR images, as well as on patient movement during data acquisition. Fully automated image fusion of different MR, CT, and PET studies was performed for each patient. Only in a few cases was it necessary to correct the fusion manually after visual evaluation. These corrections were minor and did not materially affect treatment planning. High-quality fusion of thin slices of a region of interest with a complete head data set could be performed easily. The target volume for radiation treatment planning could be accurately delineated using multimodal information provided by CT, MR, angiography, and PET studies. The fusion of follow-up image data sets yielded results that could be successfully compared and quantitatively evaluated. Conclusions. Depending on the quality of the originally acquired image, automated image fusion can be a very valuable tool, allowing for fast (∼1–2 minutes) and precise fusion of all relevant data sets. Fused multimodality imaging improves the target volume definition for radiation treatment planning. High-quality follow-up image data sets should be acquired for image fusion to provide exactly comparable slices and volumetric results that will contribute to quality control.
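
As a generic illustration of the kind of fully automatic, intensity-based image fusion described above (not BrainLAB's proprietary system), the sketch below performs a rigid CT–MR registration with a Mattes mutual-information metric using SimpleITK; the file names are placeholders.

```python
import SimpleITK as sitk

fixed = sitk.ReadImage("planning_ct.nii.gz", sitk.sitkFloat32)   # placeholder paths
moving = sitk.ReadImage("t1_mr.nii.gz", sitk.sitkFloat32)

# Initialize with a geometry-centered rigid (6 degrees of freedom) transform.
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Euler3DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetMetricSamplingStrategy(reg.RANDOM)
reg.SetMetricSamplingPercentage(0.01)
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetInitialTransform(initial, inPlace=False)

transform = reg.Execute(fixed, moving)

# Resample the MR study into the CT planning grid so both can be overlaid for
# target delineation; as in the study, the result should still be checked visually.
fused_mr = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0,
                         moving.GetPixelID())
sitk.WriteImage(fused_mr, "t1_mr_in_ct_space.nii.gz")
```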


2019 ◽  
Vol 45 (9) ◽  
pp. 1183-1198
Author(s):  
Gaurav S. Chauhan ◽  
Pradip Banerjee

Purpose Recent papers on target capital structure show that debt ratios seem to vary widely in space and time, implying that the functional specifications of target debt ratios are of little empirical use. Further, target behavior cannot be adjudged correctly using debt ratios, as they could revert for mechanical reasons. The purpose of this paper is to develop an alternative testing strategy to test for target capital structure. Design/methodology/approach The authors use a major “shock” to the debt ratios as an event and treat a subsequent reversion as a movement toward a mean or target debt ratio. By doing this, the authors no longer need to specify target debt ratios as a function of firm-specific variables or any other rigid functional form. Findings Similar to the broad empirical evidence in developed economies, there is no perceptible and systematic mean reversion by Indian firms. However, unlike in developed countries, the proportionate use of debt to finance firms’ marginal financing deficits is extensive; equity is used rather sparingly. Research limitations/implications The trade-off theory could be convincingly refuted, at least for the emerging market of India. The paper should stimulate further research into the reasons for the specific financing behavior of emerging-market firms. Practical implications The results show that firms’ financing choices depend not only on their own firm-specific variables but also on the financial markets in which they operate. Originality/value This study attempts to assess mean reversion in debt ratios in a unique but reassuring manner. The results are confirmed by extensive calibration of the testing strategy using simulated data sets.
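
To make the testing idea concrete, here is a rough sketch (not the authors' exact procedure) that flags firm-years with a large debt-ratio shock and checks whether the ratio drifts back over the following years; the file name, column names and the 10-percentage-point shock threshold are hypothetical.

```python
import pandas as pd

panel = pd.read_csv("firm_panel.csv")          # hypothetical columns: firm, year, debt_ratio
panel = panel.sort_values(["firm", "year"])
panel["d_ratio"] = panel.groupby("firm")["debt_ratio"].diff()

# Treat a year-on-year change of 10 percentage points or more as a "shock".
shocks = panel[panel["d_ratio"].abs() >= 0.10].copy()

def post_shock_change(row, horizon=3):
    """Change in the debt ratio over `horizon` years after the shock; a sign
    opposite to the shock suggests reversion toward a mean or target ratio."""
    firm = panel[panel["firm"] == row["firm"]]
    future = firm[(firm["year"] > row["year"]) & (firm["year"] <= row["year"] + horizon)]
    if future.empty:
        return float("nan")
    return future["debt_ratio"].iloc[-1] - row["debt_ratio"]

shocks["reversion"] = shocks.apply(post_shock_change, axis=1)
# Share of shocks followed by a movement in the opposite direction.
print((shocks["reversion"] * shocks["d_ratio"] < 0).mean())
```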


2014 ◽  
Vol 6 (2) ◽  
pp. 211-233
Author(s):  
Thomas M. Bayer ◽  
John Page

Purpose – This paper aims to analyze the evolution of the marketing of paintings and related visual products from its nascent stages in England around 1700 to the development of the modern art market by 1900, with a brief discussion connecting to the present. Design/methodology/approach – Sources consist of a mixture of primary and secondary sources, as well as a series of econometric and statistical analyses of specifically constructed and unique data sets that list nearly 50,000 different sales of paintings during this period. One set records sales of paintings at various English auction houses during the eighteenth and nineteenth centuries; the second set consists of all purchases and sales of paintings recorded in the stock books of the late nineteenth-century London art dealer, Arthur Tooth, during the years 1870/1871. The authors interpret the data under a commoditization model, first introduced by Igor Kopytoff in 1986, which posits that markets and their participants evolve toward maximizing the efficiency of their exchange process within the prevailing exchange technology. Findings – The authors found that artists were largely responsible for a series of innovations in the art market that replaced the prevailing direct relationship between artist and patron with a modern market in which painters produced works on speculation to be sold by enterprising middlemen to an anonymous public. In this process, artists displayed remarkable creativity and a seemingly instinctive understanding of the principles of competitive marketing that should dispel the erroneous but persistent notion that artistic genius and business savvy are incompatible. Research limitations/implications – A similar marketing analysis could be done of the development of the art markets of other leading countries, such as France, Italy and Holland, as well as of current developments in the art market. Practical implications – The same process of development of the art market that occurred in England is now occurring in Latin America and China. Also, the commoditization process continues in the present, now using the Internet and worldwide art dealers. Originality/value – This is the first article to trace the historical development of the marketing of art in all of its components: artists, dealers, artist organizations, museums, curators, art critics, the media and art historians.


2016 ◽  
Vol 12 (2) ◽  
pp. 126-149 ◽  
Author(s):  
Masoud Mansoury ◽  
Mehdi Shajari

Purpose This paper aims to improve recommendation performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users: it uses the opinions of similar users to generate recommendations for an active user. As the similarity model, or neighbor selection function, is the key element in the effectiveness of CF, many variations of CF have been proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users). Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendation performance for cold-start users and controversial items. To show the validity of the similarity model, the authors conducted experiments demonstrating its effectiveness in calculating similarity values between users even when only a few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results. Findings Experiments on two real-world data sets were conducted and compared with several other CF techniques. The results show that the authors’ approach outperforms previous CF techniques in the coverage metric while preserving accuracy for cold-start users and controversial items. Originality/value The proposed approach addresses the conditions in which CF is unable to generate accurate recommendations. These conditions affect CF performance adversely, especially for cold-start users. The authors show that their similarity model overcomes CF’s weaknesses effectively and improves its performance even for cold-start users.
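
As an illustrative sketch only (not the authors' proposed similarity model), the snippet below computes a Pearson user-user similarity damped by the number of co-rated items, one common way to make neighbor selection more robust when cold-start users have few ratings; the damping constant and example ratings are assumptions.

```python
import numpy as np

def damped_pearson(ratings_u: dict, ratings_v: dict, damping: int = 25) -> float:
    """ratings_* map item ids to ratings; `damping` is an assumed shrinkage constant."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    u = np.array([ratings_u[i] for i in common], dtype=float)
    v = np.array([ratings_v[i] for i in common], dtype=float)
    if u.std() == 0 or v.std() == 0:
        return 0.0
    pearson = np.corrcoef(u, v)[0, 1]
    significance = min(len(common), damping) / damping  # shrink toward 0 when overlap is small
    return float(pearson * significance)

# Example: two users who co-rated only three items get a heavily shrunken similarity,
# so a sparse (cold-start) neighbor cannot dominate the prediction.
u = {"i1": 5, "i2": 3, "i3": 4}
v = {"i1": 4, "i2": 2, "i3": 5, "i4": 1}
print(damped_pearson(u, v))
```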


2015 ◽  
Vol 16 (1) ◽  
pp. 62-85 ◽  
Author(s):  
Cheri Jeanette Duncan ◽  
Genya Morgan O'Gara

Purpose – The purpose of this paper is to examine the development of a flexible collections assessment rubric comprising a suite of tools for more consistently and effectively evaluating and expressing a holistic value of library collections to a variety of constituents, from administrators to faculty and students. Particular emphasis is placed on using data already being collected at libraries to “take the temperature” of how responsive collections are in supporting institutional goals. Design/methodology/approach – Using a literature review, internal and external conversations, several collections pilot projects and a variety of other investigative mechanisms, this paper explores methods for creating a more flexible, holistic collection development and assessment model using both qualitative and quantitative data. Findings – The products of scholarship that academic libraries include in their collections are expanding exponentially and range from journals and monographs in all formats to databases, data sets, digital text and images, streaming media, visualizations and animations. Content is also being shared in new ways and on a variety of platforms. Yet the framework for evaluating this new landscape of scholarly output is in its infancy. So, how do libraries develop and assess collections in a consistent, holistic, yet agile manner? Libraries must employ a variety of mechanisms to ensure this goal, while remaining flexible in adapting to the shifting collections environment. Originality/value – As far as the authors are aware, this is the first paper to examine an agile, holistic approach to collections using both qualitative and quantitative data.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose The primary aim of this study is to review studies on data imputation, particularly in the machine learning (ML) area, along several dimensions, including the type of method, the experimentation setup and the evaluation metrics used in the proposed approaches. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness are addressed in the proposals. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010–2020? (2) How were the experimentation setup, the characteristics of the data sets and the missingness employed in these studies? (3) What metrics were used for the evaluation of the imputation methods? Design/methodology/approach The review went through the standard identification, screening and selection process. The initial search of electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage did not describe an MVI technique relevant to this study. The papers were first screened by title for relevancy, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers that were not eligible for this study were dropped. This resulted in 155 research papers suitable for full-text review, of which 117 were used to assess the review questions. Findings This study shows that clustering- and instance-based algorithms are the most commonly proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced their data sets from publicly available repositories. A common approach is to set the complete data set as the baseline and evaluate the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while the missing data type and mechanism pertain to the capability of the imputation method. Computational expense is a concern, and experimentation using large data sets appears to be a challenge. Originality/value It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with particular patterns of missingness, depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputation based on k-nearest neighbors (kNN) and clustering algorithms, which are simple and easy to implement, remains popular across various domains.
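
The snippet below is a small sketch of the common evaluation protocol the review describes: a complete data set serves as the baseline, missingness is induced artificially, a kNN imputer fills the gaps and RMSE is scored on the masked entries. It uses scikit-learn's KNNImputer on synthetic data; the 10% MCAR missingness rate is an assumption.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 8))                 # complete "ground truth" data set

mask = rng.random(complete.shape) < 0.10             # artificially induce 10% missingness
incomplete = complete.copy()
incomplete[mask] = np.nan

imputed = KNNImputer(n_neighbors=5).fit_transform(incomplete)

# Score the imputation only on the entries that were artificially removed.
rmse = np.sqrt(np.mean((imputed[mask] - complete[mask]) ** 2))
print(f"RMSE on artificially missing entries: {rmse:.3f}")
```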


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Gangadhar Ch ◽  
S. Jana ◽  
Sankararao Majji ◽  
Prathyusha Kuncha ◽  
Fantin Irudaya Raj E. ◽  
...  

Purpose For the first time in a decade, a new form of pneumonia-causing virus, the coronavirus COVID-19, appeared in Wuhan, China. To date, it has affected millions of people and resulted in thousands of deaths around the world. To stop the spread of the virus, infected people must be isolated. Computed tomography (CT) imaging is very accurate in revealing the details of the lungs and allows oncologists to detect COVID-19. However, the analysis of CT scans, which can include hundreds of images, may cause delays in hospitals. The main purpose of this work is to show how the use of artificial intelligence (AI) in radiology could help detect COVID-19-positive cases in this manner. Design/methodology/approach CT scanning is a medical imaging procedure that gives a three-dimensional (3D) representation of the lungs for clinical purposes. The volumetric 3D data sets can be regarded as axial, coronal and transverse data sets. By using AI, the presence of the virus can be diagnosed. Findings The paper discusses the use of AI for COVID-19 and the CT classification issue, and details of COVID-19 vaccination are also presented. Originality/value The originality of the work lies in the genuine collection of all the data and in the research being carried out using the authors' own methodology.
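
As a minimal sketch of the point about volumetric data, the snippet below shows how a 3D CT volume can be viewed along the three orthogonal directions named in the abstract simply by slicing the array; the volume here is synthetic, whereas a real study would load DICOM or NIfTI data.

```python
import numpy as np

volume = np.random.rand(64, 128, 128)   # synthetic CT volume: (slices, rows, cols)

axial = volume[32, :, :]        # fix the slice index   -> axial plane
coronal = volume[:, 64, :]      # fix the row index     -> coronal plane
transverse = volume[:, :, 64]   # fix the column index  -> third orthogonal plane

print(axial.shape, coronal.shape, transverse.shape)
# Each 2D plane could then be passed to a classifier (e.g. a CNN) for COVID-19 screening.
```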


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose The currently popular image processing technologies based on convolutional neural networks involve large computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the high real-time performance, high accuracy and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve the above problems. Design/methodology/approach On the one hand, this study performs multi-dimensional compression on the feature extraction network of YOLOv4 to simplify the model and improves the model's feature extraction ability through knowledge distillation. On the other hand, a prediction scale with a more detailed receptive field is added to optimize the model structure, which improves detection performance for tiny defects. Findings The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, as well as on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method greatly improves recognition efficiency and accuracy while reducing the size and computational cost of the model. Originality/value This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection, which is conducive to application in industrial scenarios with limited storage and computing resources and meets the requirements of high real-time performance and precision.
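
For readers unfamiliar with knowledge distillation, here is a generic loss sketch in PyTorch (not the paper's training code): the compressed "student" network is trained to match the temperature-softened outputs of the full "teacher" network in addition to the ground-truth labels. The temperature and weighting values are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend the soft-target KL divergence (scaled by T^2) with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example with random classification logits for a hypothetical 6-class defect problem.
student = torch.randn(8, 6)
teacher = torch.randn(8, 6)
labels = torch.randint(0, 6, (8,))
print(distillation_loss(student, teacher, labels))
```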


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Martin Lněnička ◽  
Renata Machova ◽  
Jolana Volejníková ◽  
Veronika Linhartová ◽  
Radka Knezackova ◽  
...  

Purpose The purpose of this paper was to draw on evidence from computer-mediated transparency and examine the argument that open government data and national data infrastructures, represented by open data portals, can help enhance transparency by providing various relevant features and capabilities for stakeholders' interactions. Design/methodology/approach The developed methodology consisted of a two-step strategy to investigate the research questions. First, a web content analysis was conducted to identify the most common features and capabilities provided by existing national open data portals. The second step involved a Delphi process, surveying domain experts to measure the diversity of their opinions on this topic. Findings The identified features and capabilities were classified into categories and ranked according to their importance. By formalizing these feature-related transparency mechanisms, through which stakeholders work with data sets, the authors provide recommendations on how to incorporate them into the design and development of open data portals. Social implications The creation of appropriate open data portals aims to fulfil the principles of open government and enables stakeholders to engage effectively in policy- and decision-making processes. Originality/value By analyzing existing national open data portals and validating the feature-related transparency mechanisms, this paper fills a gap in the existing literature on designing and developing open data portals for transparency efforts.
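
A small sketch of the second step described above (not the authors' instrument): Delphi-style expert ratings of portal features are aggregated into an importance ranking, and agreement is measured with Kendall's W. The feature names and scores are hypothetical.

```python
import pandas as pd

# rows = experts, columns = portal features, values = importance scores (1-5)
scores = pd.DataFrame(
    {"SPARQL endpoint": [5, 4, 5], "Data previews": [4, 4, 3],
     "API access": [5, 5, 4], "Feedback forms": [2, 3, 2]},
    index=["expert1", "expert2", "expert3"],
)

ranking = scores.mean().sort_values(ascending=False)   # features ranked by mean importance
print(ranking)

ranks = scores.rank(axis=1)          # each expert's ranking of the features
m, n = ranks.shape                   # m raters, n items
rank_sums = ranks.sum(axis=0)
S = ((rank_sums - rank_sums.mean()) ** 2).sum()
W = 12 * S / (m ** 2 * (n ** 3 - n))  # Kendall's W in [0, 1], ignoring tie correction
print(f"Kendall's W = {W:.2f}")
```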

