metadata generation
Recently Published Documents

TOTAL DOCUMENTS: 94 (FIVE YEARS: 12)
H-INDEX: 10 (FIVE YEARS: 1)

2021 ◽  
Author(s):  
Joel Pepper ◽  
Jane Greenberg ◽  
Yasin Bakis ◽  
Xiaojun Wang ◽  
Henry L Bart ◽  
...  

Metadata are key descriptors of research data, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata are often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means. This paper reports on research that applies machine-driven approaches to analyzing digitized fish images and extracting various important features from them. The digitized fish specimens are being analyzed as part of the Biology Guided Neural Networks (BGNN) initiative, which is developing a novel class of artificial neural networks using phylogenies and anatomy ontologies. Automatically generated metadata are crucial for identifying the high-quality images needed for the neural network's predictive analytics. Methods that combine ML and image informatics techniques allow us to rapidly enrich the existing metadata associated with the 7,244 images from the Illinois Natural History Survey (INHS) used in our study. Results show we can accurately generate many key metadata properties relevant to the BGNN project, as well as general image quality metrics (e.g. brightness and contrast). Results also show that we can accurately generate bounding boxes and segmentation masks for fish, which are needed for subsequent machine learning analyses. The automatic process outperforms humans in terms of time and accuracy, and provides a novel solution for leveraging digitized specimens in ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide.
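The abstract mentions generating general image quality metrics such as brightness and contrast. The paper does not spell out its exact formulas, but a minimal sketch, assuming mean luminance for brightness and RMS (standard-deviation) contrast on a grayscale array, could look like this:

```python
import numpy as np

def image_quality_metrics(gray):
    """Simple quality metrics for a grayscale image array with values 0-255.

    brightness = mean luminance; contrast = RMS contrast (std of luminance).
    These are illustrative choices, not necessarily the paper's definitions.
    """
    gray = np.asarray(gray, dtype=np.float64)
    return {"brightness": gray.mean(), "contrast": gray.std()}

# Hypothetical example: a tiny 2x2 checkerboard image
img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
metrics = image_quality_metrics(img)
```

Such per-image scores can then be stored alongside the specimen record and used to filter out low-quality images before training.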


2021 ◽  
Author(s):  
Felipe A. Ferreira ◽  
Bruno P. Oliveira ◽  
Rodrigo V. Kassick ◽  
Vinícius Furlan ◽  
Hélio Lopes

The last decade has seen a significant increase in the production and consumption of video content. Many entertainment companies, such as Globo, face challenges regarding video metadata generation. The objective of this paper is to present a suitable architecture for the Globo Group to automatically identify the actors that appear in each scene of a video stream, generating new metadata annotations that can be used by recommender systems, search engines, and other applications in this industry sector.
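The abstract describes an architecture for identifying actors per scene but does not detail the matching step. A common design is to compare a detected face's embedding against a labelled gallery of cast embeddings; the sketch below assumes cosine-similarity matching with an illustrative rejection threshold (the names, threshold, and embedding source are assumptions, not the paper's implementation):

```python
import numpy as np

def identify_actor(face_embedding, gallery, threshold=0.8):
    """Return the gallery actor whose reference embedding is most similar
    (cosine similarity) to the detected face, or None if no match is close
    enough. The embeddings would come from an upstream face-recognition
    model; here they are just small illustrative vectors."""
    best_name, best_score = None, -1.0
    for name, ref in gallery.items():
        score = float(np.dot(face_embedding, ref)
                      / (np.linalg.norm(face_embedding) * np.linalg.norm(ref)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical two-actor gallery
gallery = {"actor_a": np.array([1.0, 0.0]), "actor_b": np.array([0.0, 1.0])}
match = identify_actor(np.array([0.9, 0.1]), gallery)
```

Each per-scene match would then be written back as a metadata annotation for downstream recommender systems and search engines.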


2021 ◽  
Vol 11 (14) ◽  
pp. 6461
Author(s):  
Andy Pearce ◽  
Tim Brookes ◽  
Russell Mason

Brightness is one of the most common timbral descriptors used for searching audio databases, and is also the timbral attribute of recorded sound that is most affected by microphone choice, making a brightness prediction model desirable for automatic metadata generation. A model, sensitive to microphone-related as well as source-related brightness, was developed based on a novel combination of the spectral centroid and the ratio of the total magnitude of the signal above 500 Hz to that of the full signal. This model performed well on training data (r = 0.922). Validating it on new data showed a slight gradient error but good linear correlation across source types and overall (r = 0.955). On both training and validation data, the new model outperformed metrics previously used for brightness prediction.
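The two features named in the abstract, the spectral centroid and the ratio of magnitude above 500 Hz to total magnitude, are straightforward to compute from a magnitude spectrum. A minimal sketch (the paper's combination weights and any pre-processing are not reproduced here):

```python
import numpy as np

def brightness_features(signal, sr):
    """Compute the two spectral features behind the brightness model:
    the spectral centroid (Hz) and the ratio of summed magnitude at or
    above 500 Hz to the total summed magnitude."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = float((freqs * mag).sum() / mag.sum())
    ratio = float(mag[freqs >= 500.0].sum() / mag.sum())
    return centroid, ratio

# Sanity check: a pure 1 kHz tone at 8 kHz sampling should give a
# centroid near 1000 Hz and a high-frequency ratio near 1.
sr = 8000
t = np.arange(sr) / sr
centroid, ratio = brightness_features(np.sin(2 * np.pi * 1000 * t), sr)
```

How the model combines these two features into a single brightness score is described in the paper itself.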


Author(s):  
Han Yu ◽  
Hongming Cai ◽  
Zhiyuan Liu ◽  
Boyi Xu ◽  
Lihong Jiang
Keyword(s):  

2020 ◽  
Vol 39 (3) ◽  
Author(s):  
Sam Grabus

This research compares automatic subject metadata generation when the pre-1800s Long-S character is corrected to a standard ⟨s⟩. The test environment includes entries from the third edition of the Encyclopedia Britannica, and the HIVE automatic subject indexing tool. A comparative study of metadata generated before and after correction of the Long-S demonstrated that, on average, 26.51 percent of potentially relevant terms per entry were omitted from the results when the Long-S was not corrected. Results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. A relationship is also demonstrated between shorter entries and an increase in omitted terms when the Long-S is not corrected.
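The correction step described above amounts to normalizing the long-s glyph (U+017F, ſ) to a standard ⟨s⟩ before the text is passed to the indexing tool. A minimal sketch of that normalization (the sample sentence is illustrative, not from the Britannica corpus):

```python
def correct_long_s(text):
    """Replace the historical long-s character (U+017F) with a
    standard 's' so that downstream tokenizers and indexers match
    modern vocabulary terms."""
    return text.replace("\u017f", "s")

fixed = correct_long_s("Congreſs paſſed the firſt law")
```

Without this step, tokens such as "Congreſs" fail to match controlled-vocabulary entries, which is the mechanism behind the omitted terms the study measures.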


2019 ◽  
Vol 26 (7) ◽  
pp. 55-66
Author(s):  
O. E. Bashina ◽  
N. A. Komkova ◽  
L. V. Matraeva ◽  
V. E. Kosolapova

The article deals with challenges and prospects of implementing the Statistical Data and Metadata eXchange (SDMX) standard and using it in the international sharing of statistical data and metadata. The authors identified potential areas where this standard can be used and described a mechanism for data and metadata sharing according to the SDMX standard. Major issues, classified into three groups (general, statistical, and information technology), were outlined by applying both domestic and foreign experience of implementing the standard. These issues may arise at the national level (if the standard is implemented domestically), at the international level (when the standard is applied by international organizations), and at the national-international level (if information is exchanged between national statistical data providers and international organizations). General issues arise at the regulatory level and are associated with establishing boundaries of responsibility of counterpart organizations at all three levels of interaction, as well as with increasing the capacity to apply the SDMX standard. Issues of a statistical nature are most often encountered because the sharing of large amounts of data and metadata spans various thematic areas of statistics; there should therefore be a unified structure for data and metadata generation and transmission. As information sharing develops, challenges arise that are associated with continuously monitoring and expanding the SDMX code lists. At the same time, there is no universal data structure at the international level and, as a result, the existing data structures developed by international organizations are difficult to understand and apply at the national level. Information technology challenges relate to creating an IT infrastructure for data and metadata sharing using the SDMX standard.
The IT infrastructure (depending on the participant's status) includes the following elements: tools for the receiving organizations, tools for the sending organization, and infrastructure for IT professionals. For each of the outlined issues, the authors formulated practical recommendations based on the complexity principle as applied to the implementation of the international SDMX standard for the exchange of data and metadata.
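To make the data-sharing mechanism concrete, the sketch below parses a simplified fragment loosely modelled on SDMX data messages, in which observations are grouped into series keyed by dimension values. The element and attribute names are illustrative and do not reproduce the full SDMX-ML schema or its namespaces:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment in the spirit of an SDMX data message:
# a series identified by dimension attributes, containing observations.
MESSAGE = """
<DataSet>
  <Series FREQ="A" REF_AREA="RU">
    <Obs TIME_PERIOD="2019" OBS_VALUE="104.5"/>
    <Obs TIME_PERIOD="2020" OBS_VALUE="101.2"/>
  </Series>
</DataSet>
"""

def read_observations(xml_text):
    """Extract (period, value) pairs from every series in the message."""
    root = ET.fromstring(xml_text)
    out = []
    for series in root.iter("Series"):
        for obs in series.iter("Obs"):
            out.append((obs.get("TIME_PERIOD"), float(obs.get("OBS_VALUE"))))
    return out

observations = read_observations(MESSAGE)
```

A shared, machine-readable structure of this kind is what allows sending and receiving organizations to exchange data and metadata without bilateral format negotiations, which is the core benefit the article attributes to SDMX.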

