text searching
Recently Published Documents


TOTAL DOCUMENTS

121
(FIVE YEARS 19)

H-INDEX

16
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Yunxin Huang ◽  
Aiguo Song ◽  
Yafei Yang
Keyword(s):  

Author(s):  
Colleen Funkhouser ◽  
Constance Rinaldo ◽  
David Iggulden ◽  
Kelli Trei ◽  
Martin R. Kalfatovic ◽  
...  

The events of 2020, including the COVID-19 pandemic and social justice demonstrations around the world, helped the Biodiversity Heritage Library (BHL) refocus our priorities for implementing our 2020–2025 Strategic Plan, which was adopted in April 2020. BHL’s natural ecosystem relies on virtual coordination across our global community. For this reason, much of the work of the consortium could continue during lockdowns and telework mandates at many of our partner institutions. Though most digitization projects were placed on hold, staff were able to shift priority to metadata enhancements to improve access and discoverability of existing content. Long-planned improvements in the use of persistent identifiers were pushed to the forefront of technical development. Improvements to the data model and user interface to better support born-digital content and articles were also prioritized in the 2021 Technical Priorities. Back-end improvements to the BHL website, currently under development, will also facilitate easier metadata import and updates, making it faster and easier to add article-level metadata, which will improve search and discoverability for articles within the collection. These technical improvements will continue to define BHL as a tool to connect biodiversity data. Part of our collaborative resilience is our ability to adapt and adjust to changing priorities. Over the past year, BHL has been working to review and update our Collection Development Policy. Social justice movements and discussions concerning the harmful content in BHL’s collection are helping to inform the review of our collection development policy in ways we wouldn’t have anticipated prior to 2020. Our review now includes strategies to identify gaps in and improve representation of biodiversity information from underrepresented regions, languages, cultures, and perspectives. 
We are also working with colleagues to support larger efforts within the library profession to improve and broaden metadata such as subject headings. We will continue to address the problematic historical legacy of natural history collections through actions defined in our strategic plan. The isolation and work-from-home orders of 2020 and early 2021 provided opportunities for some to expand into deep research and contribute to collaborative projects as work assignments became digital and thus more versatile. For example, transcription projects were accelerated because staff had more time to contribute to these resource-intensive activities. One result of this work is that more handwritten materials such as field notes and correspondence are now available for full-text searching and taxonomic name recognition within BHL. Another strategic goal of BHL is to grow consortial partnerships and alliances to foster cross-institutional collaboration. BHL sought new partnerships that would integrate BHL data into existing and emerging biodiversity projects. This cross-institutional collaboration is another way to cultivate resilience and sow the seeds of sustainability. In this talk, we will describe how BHL shifted its focus to reflect on and respond to global events of 2020. We will share discussions around acknowledging and addressing the harmful legacy content in BHL collections, new technical developments, and examples of collaborative telework projects.


Author(s):  
Estelle Joubert

RIPM (Le Répertoire International de la Presse Musicale) is widely regarded as the most comprehensive resource offering electronic access to music periodicals from the early Romantic era to the twentieth century. Founded in 1980 by H. Robert Cohen, RIPM is the youngest of the so-called ‘4 R's of International Music Research’; its partner initiatives include RISM (Répertoire International des Sources Musicales), RILM (Répertoire International de Littérature Musicale), and RIdIM (Répertoire International d'Iconographie Musicale). Cohen's ambitious project was notably visionary in its use of technology: not only did it use computing from the start (beginning with DOS-based indexing systems), but it was also the first of the 4 R's to explore full-text searching. RIPM seeks to address ‘two main problems that have prevented these [historic music] journals from being systematically collected and examined: (1) the limited number of libraries possessing the journals, and (2) the difficulty encountered when one attempts to locate specific information within an available source’. The project has thus focused on collection building, curation, indexing and accessibility. This international cooperative's accomplishments are impressive: as of July 2020, the database contains 527 music periodicals, 430 of them available as complete full-text runs, totalling 996,000 annotated records and 1.47 million full-text pages of music periodicals.


2021 ◽  
pp. 54-64
Author(s):  
Jennifer Branch

The purpose of this research was to examine the information-seeking processes employed by junior high school students from Inuvik, Northwest Territories, Canada when using CD-ROM encyclopedias. The study revealed that participants needed both instruction and practice to develop the skills and strategies needed for full-text searching of CD-ROM encyclopedias. The participants tended to use search terms only from the original question, had difficulty selecting topics and articles from the retrieved list, and did not read long articles as carefully as short articles. Instruction related to information-seeking skills and strategies should focus on generating search terms, selecting topics from a retrieved list, and skimming and scanning through text to find the answer.


2021 ◽  
pp. 1-25
Author(s):  
Estelle Joubert

This article offers a series of experiments exploring the potential for ‘distant reading’ in French music criticism. ‘Distant reading’, a term first coined by literary theorist Franco Moretti, refers to quantitative approaches that allow for new insights into a large corpus of texts by aggregating data. While the main corpus employed here is the Revue et gazette musicale de Paris (1831–1877), I also use secondary corpora of reviews of Félicien David's Herculanum in 1859, Berlioz's reviews of Gluck and Beethoven in the Journal des débats and reviews that mention Gabriel Fauré in the Library of Congress’ Chronicling America database. My experiments employ a text analysis tool named Voyant, built by Geoffrey Rockwell and Stéfan Sinclair, thereby also offering a basic introduction to the range of visualizations employed in distant reading. My experiments focus on areas in which quantitative methods are particularly well suited to generating new knowledge: corpus-wide visualizations and queries, moving beyond traditional text searching, investigations of music critics’ authorial styles and detecting sentiment in reviews, and finally, to geographies of music criticism.


2020 ◽  
Vol 67 (1) ◽  
pp. 1-54 ◽  
Author(s):  
Travis Gagie ◽  
Gonzalo Navarro ◽  
Nicola Prezza

Doklady BGUIR ◽  
2020 ◽  
pp. 29-34
Author(s):  
S. Nasr ◽  
O. V. German

The paper presents a new text searching method, a modification of the Boyer–Moore algorithm, that lets a user find the places in a text where a given substring occurs, possibly with errors; that is, the string in the text and the query need not coincide exactly but are nevertheless treated as matching. The idea is to divide the search into two phases: in the first phase a fuzzy variant of the Boyer–Moore algorithm is performed; in the second phase the Dice metric is used. Compared with known methods that use a fixed number of allowed mistakes, the suggested technique 1) does not precompute an auxiliary table whose size is comparable to that of the original text and 2) captures the semantics of erroneous text substrings more flexibly, even for a large number of mistakes. This extends the capabilities of the Boyer–Moore method by admitting a larger number of possible mistakes in the text while preserving text semantics. The suggested method also regulates the upper bound on text mistakes more accurately, which distinguishes it from known methods in which the maximum number of mistakes is fixed and does not depend on the text size. Moreover, that fixed bound is defined as a Levenshtein distance, which is not suitable for evaluating the relevance of the found text to a query, whereas the Dice metric provides such a measure. Indeed, if the maximum Levenshtein distance is 3, how can one judge whether this value is large or small enough to ensure relevant search results? Consequently, the suggested method is more flexible and enables one to find relevant answers even when the text contains a large number of mistakes. The worst-case efficiency of the suggested method is O(nc), where the constant c defines the largest allowable number of mistakes.
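
The two-phase idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the details of the fuzzy Boyer–Moore variant, so phase one here is assumed to be a plain sliding window with a right-to-left comparison and an early exit once a mismatch budget is exceeded (no skip tables), and phase two is assumed to score candidates with the Dice coefficient over character bigrams. The names `dice`, `fuzzy_search`, `max_mismatches`, and `min_dice` are hypothetical.

```python
from collections import Counter


def bigrams(s: str) -> list[str]:
    """Return the overlapping character bigrams of s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient over character-bigram multisets."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    overlap = sum((ca & cb).values())
    return 2 * overlap / total


def within_budget(window: str, pattern: str, max_mismatches: int) -> bool:
    """Phase 1 (assumed): compare right to left, as Boyer-Moore does,
    and give up as soon as the mismatch budget is exceeded."""
    mismatches = 0
    for i in range(len(pattern) - 1, -1, -1):
        if window[i] != pattern[i]:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def fuzzy_search(text: str, pattern: str,
                 max_mismatches: int, min_dice: float):
    """Return (position, matched substring, Dice score) for each window
    that passes the mismatch filter and the Dice relevance threshold."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        if within_budget(window, pattern, max_mismatches):
            score = dice(window, pattern)  # Phase 2: relevance score
            if score >= min_dice:
                hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    for hit in fuzzy_search("the quick brovn fox", "brown", 1, 0.4):
        print(hit)
```

The Dice score is a ratio in [0, 1], so a threshold such as `min_dice` has the same meaning for short and long patterns, which is the property the abstract contrasts with a fixed Levenshtein bound.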


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Constance M Smith ◽  
James A Kadin ◽  
Richard M Baldarelli ◽  
Jonathan S Beal ◽  
Olin Blodgett ◽  
...  

The Gene Expression Database (GXD), an extensive community resource of curated expression information for the mouse, has developed an RNA-Seq and Microarray Experiment Search (http://www.informatics.jax.org/gxd/htexp_index). This tool allows users to quickly and reliably find specific experiments in ArrayExpress and the Gene Expression Omnibus (GEO) that study endogenous gene expression in wild-type and mutant mice. Standardized metadata annotations, curated by GXD, allow users to specify the anatomical structure, developmental stage, mutated gene, strain and sex of samples of interest, as well as the study type and key parameters of the experiment. These searches, powered by controlled vocabularies and ontologies, can be combined with free-text searching of experiment titles and descriptions. Search result summaries include link-outs to ArrayExpress and GEO, providing easy access to the expression data itself. Links to the PubMed entries for accompanying publications are also included. More information about this tool and GXD can be found at the GXD home page (http://www.informatics.jax.org/expression.shtml). Database URL: http://www.informatics.jax.org/expression.shtml
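
As a rough illustration of how controlled-vocabulary filters can be combined with free-text matching on titles and descriptions, the sketch below filters an in-memory list of records. It does not reflect GXD's actual implementation or API; the `Experiment` class, the field names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical record: free-text fields plus curated annotations."""
    accession: str
    title: str
    description: str
    # Maps an annotation field (e.g. "sex") to its set of controlled terms.
    annotations: dict = field(default_factory=dict)


def search(experiments, filters=None, free_text=""):
    """Keep experiments whose annotations contain every requested term
    and whose title or description contains every free-text word."""
    filters = filters or {}
    words = free_text.lower().split()
    results = []
    for exp in experiments:
        # Structured part: exact match against controlled terms.
        if not all(term in exp.annotations.get(name, set())
                   for name, term in filters.items()):
            continue
        # Free-text part: case-insensitive substring match on each word.
        haystack = f"{exp.title} {exp.description}".lower()
        if all(word in haystack for word in words):
            results.append(exp)
    return results


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Experiment("EXAMPLE-1", "Liver RNA-Seq time course", "Wild-type mice",
                   {"structure": {"liver"}, "sex": {"female"}}),
        Experiment("EXAMPLE-2", "Heart microarray", "Mutant mice",
                   {"structure": {"heart"}, "sex": {"male"}}),
    ]
    for exp in search(sample, {"structure": "liver"}, "rna-seq"):
        print(exp.accession)
```

Keeping the structured filter exact and the free-text match loose mirrors the division the abstract describes: curated annotations give precision, while free text catches wording that the vocabulary does not cover.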

