Semi-automatic Extraction of Plants Morphological Characters from Taxonomic Descriptions Written in Spanish

2018 ◽  
Vol 6 ◽  
pp. e21282 ◽  
Author(s):  
Maria Mora ◽  
José Araya

Taxonomic literature keeps records of the planet's biodiversity and gives access to the knowledge needed for its sustainable management. Unfortunately, most taxonomic information is available as free text in scientific publications. The volume of publications is very large, so processing them to obtain highly structured text would be complex and very expensive. Approaches like citizen science may help by selecting whole fragments of text dealing with morphological descriptions, but a deeper analysis, compatible with accepted ontologies, will require specialised tools. The Biodiversity Heritage Library (BHL) estimates that more than 120 million pages have been published in over 5.4 million books since 1469, plus about 800,000 monographs and 40,000 journal titles (12,500 of which are current titles). It is necessary to develop standards and software tools to extract, integrate and publish this information into existing free and open access repositories of biodiversity knowledge to support science, education and biodiversity conservation. This document presents an algorithm based on computational linguistics techniques to extract structured information from morphological descriptions of plants written in Spanish. The algorithm builds on the work of Dr. Hong Cui from the University of Arizona; it uses semantic analysis, ontologies and a repository of knowledge acquired from the descriptions themselves. The algorithm was applied to the books Trees of Costa Rica Volume III (TCRv3) and Trees of Costa Rica Volume IV (TCRv4) and to a subset of descriptions from the Manual of Plants of Costa Rica (MPCR), with very competitive results (more than 92.5% average performance). The system receives the morphological descriptions in tabular format and generates XML documents. The XML schema allows documenting structures, characters and relations between characters and structures.
Each extracted object is associated with attributes such as name, value, modifiers, restrictions and ontology term id, amongst others. The implemented tool is free software. It was developed in Java and integrates existing technologies such as FreeLing, the Plant Ontology (PO), the Plant Glossary, the Ontology Term Organizer (OTO) and the Flora Mesoamericana English-Spanish Glossary.
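
To make the kind of output described above more concrete, the following is a minimal sketch of how one structure and its characters might be serialised to XML. It is not the tool's actual schema: the element names (`description`, `structure`, `character`), the attribute names, the placeholder ontology id and the example sentence ("Hojas alternas, generalmente de 5-10 cm de largo") are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_description(taxon, structures):
    """Build an XML tree for one description (hypothetical element names)."""
    root = ET.Element("description", {"taxon": taxon})
    for structure in structures:
        s_el = ET.SubElement(root, "structure", {
            "name": structure["name"],
            "ontology_id": structure["ontology_id"],
        })
        for character in structure["characters"]:
            # Copy only the attributes that are present for this character.
            attrs = {k: v for k, v in character.items() if v is not None}
            ET.SubElement(s_el, "character", attrs)
    return root


# Hand-made input for "Hojas alternas, generalmente de 5-10 cm de largo".
# The ontology id is a placeholder, not a real Plant Ontology identifier.
description = build_description("Example taxon", [
    {
        "name": "hoja",
        "ontology_id": "PO:0000000",
        "characters": [
            {"name": "arrangement", "value": "alternas", "modifier": None},
            {"name": "length", "value": "5-10 cm", "modifier": "generalmente"},
        ],
    },
])

xml_text = ET.tostring(description, encoding="unicode")
print(xml_text)
```

In this sketch the characters are nested under the structure they describe, which is one simple way to record the relation between a character and its structure.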

Author(s):  
Mariya Dimitrova ◽  
Georgi Zhelezov ◽  
Teodor Georgiev ◽  
Lyubomir Penev

Introduction: Digitisation of biodiversity knowledge from collections, scholarly literature and various research documents is an ongoing mission of the Biodiversity Information Standards (TDWG) community. Organisations such as the Biodiversity Heritage Library make historical biodiversity literature openly available and develop tools to allow biodiversity data reuse and interoperability. For instance, Plazi transforms free text into machine-readable formats, extracts collection data and feeds it into the Global Biodiversity Information Facility (GBIF) and other aggregators. All of these digitisation workflows require a lot of effort to develop and implement in practice. In essence, what these digitisation activities entail is the mapping of free text to concepts from recognised vocabularies or ontologies in order to make the content understandable to computers.
Aim: We aim to address the problem of mapping free text to ontological terms ("strings to things") with our tool for text-to-ontology mapping: the Pensoft Annotator.
Methods & Implementation: The Annotator is a web application that performs direct text matching to terms from any ontology or vocabulary list given as input. The term 'ontology' is used loosely here and means a collection of terms and their synonyms, where terms are uniquely identified via a Uniform Resource Identifier (URI). The Annotator accepts several ontology formats (e.g. OBO, OWL, RDF/XML) but does not require the existence of a proper ontology structure (logical statements). We use the ROBOT command line tool to convert any of these formats to JSON. After the upload of a new ontology, the Annotator processes the ontology terms by normalising all exact synonyms and by removing all other synonyms (related, narrow and broad synonyms). This is done to limit the number of false positive matches and to preserve the semantic similarity between the matched ontology term and the text. After matching the words in the input text against the ontology term labels, the Pensoft Annotator returns a table of matched ontology terms with the following fields: the identifier of the ontology term, the ontology term label or the label of the synonym, the starting position of the matched term in the text, the term context (words surrounding the matched term in the text), the type of ontology term (class or property), the ontology from which the matched term originates and the number of times a given term is mentioned in the text. The Pensoft Annotator allows simultaneous annotation with multiple ontologies. To show which ontology a matching term comes from, the terms are highlighted in different colours depending on the ontology. The Pensoft Annotator is also accessible programmatically via an Application Programming Interface (API), documented at https://annotator.pensoft.net/api.
Discussion & Use Cases: The Pensoft Annotator provides functionalities that will aid the transformation of free text to collections of semantic resources. However, it still requires expert knowledge to use, as the ontologies need to be selected carefully. Some false positive matches are possible because we do not perform semantic analysis of the texts. False negatives are also possible, since word forms of an ontology term that differ from its label are not direct matches (e.g. 'wolf' and 'wolves'). For this reason, matched terms can be reviewed and removed from the results within the web interface of the Pensoft Annotator. Removed terms will not be present in the downloaded results. The Pensoft Annotator can be used to annotate biodiversity and taxonomic literature to help with the extraction of biodiversity knowledge (e.g. species habitat preferences, species interaction data, localities, biogeographic data). The existence of some domain and taxon-specific ontologies, such as the Hymenoptera Anatomy Ontology, provides further opportunities for context-specific annotation. Semantic analysis of unstructured texts could be applied in addition to ontology annotation to improve the accuracy of ontology term matching and to filter out mismatched terms. Annotation of structured or semi-structured text (e.g. tables) can be done with better success. A recent example demonstrates the use of the Annotator to extract biotic interactions from tables (Dimitrova et al. 2020). The Annotator could also be used for ontology analysis and comparison. Annotation of text can help to discover gaps in ontologies as well as inaccurate synonyms. For instance, a certain word could be recognised as an ontology term match because it is an exact synonym in the ontology, but in reality it might be more accurate to mark it as a related synonym. In addition, annotation with multiple ontologies can help to elucidate links between ontologies.
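
The direct matching step described above can be illustrated with a minimal sketch. It is not the Pensoft Annotator's implementation or API: the function `annotate`, the dictionary layout and the `example.org` identifiers are assumptions, and only some of the result fields listed in the abstract (identifier, label, start position, context, count) are reproduced.

```python
import re
from collections import Counter


def annotate(text, terms, context_words=3):
    """Match labels or exact synonyms (keys of `terms`) as whole words in `text`.

    `terms` maps a label to a term identifier. Both are hypothetical here.
    """
    matches = []
    for label, term_id in terms.items():
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            before = text[:m.start()].split()[-context_words:]
            after = text[m.end():].split()[:context_words]
            matches.append({
                "id": term_id,
                "label": label,
                "start": m.start(),  # character offset of the match
                "context": " ".join(before + [m.group(0)] + after),
            })
    # Number of times each term is mentioned in the text.
    counts = Counter(match["id"] for match in matches)
    for match in matches:
        match["count"] = counts[match["id"]]
    return sorted(matches, key=lambda match: match["start"])


text = "The wolf hunts at night. Wolves live in packs near the forest."
terms = {
    "wolf": "http://example.org/term/wolf",      # placeholder URI
    "forest": "http://example.org/term/forest",  # placeholder URI
}
results = annotate(text, terms)
for row in results:
    print(row)
```

Because matching is purely lexical, "Wolves" is not reported as a match for "wolf" in this sketch, which is the kind of false negative the abstract mentions.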


Author(s):  
Nicole Kearney

Wikipedia may have become the world’s principal source of information, but it is not a reliable source. Wikipedia itself is quite explicit on this point. The Wikipedia article entitled Wikipedia is not a reliable source clearly states that, because Wikipedia can be edited by anyone, at any time, “any information it contains at any particular time could be vandalism, a work in progress, or just plain wrong” (Wikipedia 2019a). Despite this, Wikipedia continues to gain status as a trusted authority on, well, everything. It does not, however, have authority on its own; it has authority because it links to authoritative sources. Wikipedia’s Verifiability policy (Wikipedia 2019b) states that: all material in its articles should be “attributable to reliable and published sources”; and all quotations and any material likely to be challenged “must be supported by inline citations”. This does not mean that Wikipedia is always right; rather (according to the Wikipedia article Wikipedia is wrong) that “the threshold for inclusion in Wikipedia is verifiability, not truth” (Wikipedia 2019c). What this does mean is that Wikipedia is riddled with citations to the primary literature. Thus, articles about the world’s species reference taxonomic descriptions (and subsequent revisions), as well as scientific papers about physiology, evolution, behaviour, ecology, conservation, etc. In order “to facilitate the verification of sourced statements”, Wikipedia’s Scientific Citation Guidelines encourage editors to, wherever possible, include links to scientific articles in the form of DOIs (Digital Object Identifiers) (Wikipedia 2019d). A DOI is a unique, permanent and persistent identifier that is assigned to a fixed piece of online content (usually) at the time of its publication.
The DOI system creates a reciprocal linked network of scholarly publications that allows researchers to click from article to article in a never-ending trail of knowledge (whether those articles are in scientific journals or on Wikipedia). This linked network functions seamlessly for modern scientific publications, because DOIs have been almost universally adopted by scientific publishers. But issues arise when it comes to linking to historic publications. Historic literature is the foundation upon which our understanding of biodiversity is based. If Wikipedia is the world’s gateway to that literature, Wikipedia editors must be able to find it and link to it. This presentation will discuss the complexities involved in linking from Wikipedia to the legacy scientific literature, particularly the availability of that literature online, the difference between easy and open access, and what the bioinformatics community can do to help.


Author(s):  
K. R. Ovchinnikova

The issue considered in this article is relevant because scientific publications confuse the concepts of “electronic educational materials” and “electronic educational resources”. The article discusses the concept of “electronic educational materials” from the perspective of general systems theory and demonstrates their systemic character. This allows them to be represented as a single complex of structured information on a specific subject area together with didactic materials. These didactic materials support the learning process at all stages of its didactic cycle, in accordance with the chosen learning technology and based on the didactic capabilities of information technologies. It is concluded that a system of electronic materials for higher education makes it possible to expand the scope of the teacher’s design activity, to manage the student’s thinking activity and to implement a competence-based approach to the learning process at university.


2012 ◽  
Vol 4 (2) ◽  
pp. 231-238
Author(s):  
Víctor Hugo Méndez-Estrada ◽  
Zaidett Barrientos Llosa

Computer and communication technologies (ICT) help to reduce attrition in distance graduate education programs.
Student dropout is a problem that higher education programs must address as a priority: it complicates university accreditation, frustrates students and wastes economic resources. The objective of this study was to determine whether the use of information and communication technologies, in the broadest sense, can reduce student dropout in the Master's Program in Natural Resources Management at the Universidad Estatal a Distancia (UNED), Costa Rica, and increase the program's quality and its linkage with the needs of society. We performed a comparative analysis of program indicators before and after establishing two measures: 1) creation of a research laboratory and 2) creation of databases to monitor the progress of student research projects. The periods analyzed were a) 2002 to late 2007 and b) cumulatively, 2002 to late 2011. The indicators relate to the academic activity of students and professors and to societal impact (measured by the theses and scientific publications produced during those periods). In the first period, 61% of graduates were inactive, whereas in the second period inactivity fell to 17%. The share of professors conducting research rose from zero to 36%. Societal impact increased from three to 18 theses, and from zero to 44 informative publications and 18 scientific papers. We conclude that the two measures do support teaching, since graduate attrition decreased and the program's linkage with society increased.


Phytotaxa ◽  
2016 ◽  
Vol 284 (2) ◽  
pp. 81
Author(s):  
PATRICIA ESPINOZA ◽  
EDUARDO CHACÓN-MADRIGAL ◽  
ETHEL SÁNCHEZ ◽  
JORGE GÓMEZ-LAURITO

We described the achenes of 21 species of the genus Scleria reported in Costa Rica using 16 morphological characters and developed a key based only on achene characteristics. Specimens deposited in herbaria in Costa Rica were analyzed. We observed the achenes using a stereoscope and a light microscope and took digital images that were used to measure them. The achenes were also observed using a scanning electron microscope (SEM). A cluster analysis using achene characteristics was performed to determine which species are morphologically similar. The intra-specific variation of the characteristics analyzed is very small for all the species. Using characteristics of the achene, we could differentiate species among four of the five traditional sections used to classify the species of the genus: Hypoporum, Ophryoscleria, Schizolepis and Scleria. The key allows differentiating among the 21 species of Scleria previously reported in Costa Rica using only achenes. Besides the key, we prepared an illustrated guide for the genus using pictures taken with the SEM and a stereoscope. The descriptions offer better information about the species that grow in Costa Rica.


Phytotaxa ◽  
2018 ◽  
Vol 367 (2) ◽  
pp. 101 ◽  
Author(s):  
ATENA ESLAMI FAROUJI ◽  
HAMED KHODAYARI ◽  
MOSTAFA ASSADI ◽  
BARIŞ ÖZÜDOĞRU ◽  
ÖZLEM ÇETIN ◽  
...  

Taxonomic descriptions of Iranian and Turkish Hesperis (Brassicaceae) species are generally insufficient and partly incomplete, which makes species delimitation ambiguous. In order to clarify species circumscription, we scored 57 morphological descriptors (MDs) in 121 operational taxonomic units (OTUs) of Hesperis from Iran and Turkey and performed a multivariate analysis. A dendrogram was created from a Gower distance matrix using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. The dendrogram clearly separates the 121 OTUs of Hesperis into five main phenons, which significantly deviate from the classical taxonomic treatment (sectional assignments) of the genus. A similarly distinct delineation among the five phenons was revealed by a Principal Coordinate Analysis (PCoA), highlighting the resolving power of multivariate analyses of quantitative and qualitative morphological characters. While there was significant variation among the OTUs for the 57 MDs, a detrended correspondence analysis (DCA) indicated that the most distinctive morphological descriptors delimiting the phenons were those of the fruit, petal, stem and leaf. We also present a comparative discussion between the classical taxonomy and the delimitation of taxa revealed in our study.
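
As a rough illustration of the clustering step named above, the sketch below computes Gower distances for mixed quantitative and qualitative descriptors and then builds a tree by average linkage (UPGMA). The four OTUs, their three descriptors and all values are invented for the example; the function names are hypothetical and the code is not the authors' analysis pipeline.

```python
from itertools import combinations


def descriptor_ranges(rows):
    """Range (max - min) per quantitative descriptor, None for qualitative ones."""
    ranges = []
    for column in zip(*rows):
        values = [v for v in column if v is not None]
        numeric = all(isinstance(v, (int, float)) for v in values)
        ranges.append(max(values) - min(values) if numeric else None)
    return ranges


def gower_distance(a, b, ranges):
    """Mean per-descriptor dissimilarity, skipping descriptors missing in a or b."""
    parts = []
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue
        if r is None:                      # qualitative: simple mismatch
            parts.append(0.0 if x == y else 1.0)
        else:                              # quantitative: range-scaled difference
            parts.append(abs(x - y) / r if r else 0.0)
    if not parts:
        raise ValueError("no shared descriptors")
    return sum(parts) / len(parts)


def upgma(names, d):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def average(c1, c2):
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        return sum(d[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Invented data: (petal length in mm, fruit surface, stem habit).
otus = {
    "OTU_A": (10.0, "hairy", "erect"),
    "OTU_B": (11.0, "hairy", "erect"),
    "OTU_C": (20.0, "glabrous", "erect"),
    "OTU_D": (19.0, "glabrous", "decumbent"),
}
names = list(otus)
rows = list(otus.values())
ranges = descriptor_ranges(rows)
d = [[gower_distance(p, q, ranges) for q in rows] for p in rows]
tree = upgma(names, d)
print(tree)
```

The cluster-to-cluster distance is recomputed from the original matrix at every step, which is simple to read but slow for large data sets; a production analysis would normally rely on an established statistics package.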


Zootaxa ◽  
2008 ◽  
Vol 1782 (1) ◽  
pp. 1 ◽  
Author(s):  
NICO M. FRANZ

Cotithene Voss, a previously monotypic genus of Neotropical derelomine flower weevils (Curculionidae: Derelomini), is revised, with provision of a key to the species, cladistic analysis and notes on its natural history. The following six new species are described: C. anaphalanta (Costa Rica), C. dicranopygia (Costa Rica), C. leptorhamphis (Costa Rica, Panama), C. melanoptera (Venezuela), C. stratiotricha (Costa Rica) and C. trigaea (Costa Rica). The monophyly of Cotithene is supported by the characters of a dorsomedially expanded, carinate rostrum, ventrally angulate head, long and anteriorly directed setation on the anterior margin of the prosternum and an apicodorsally expanded aedeagus with paired sclerites in the male, and subcontiguous to separated procoxal cavities in the female. Particularly the males of several species have intriguing and allometrically scaled modifications on the head (triangular projections, long setae) and pronotum (expansion, tumescences), which possibly play a role in male-to-male conflicts. Cotithene species are specialized to visit and reproduce on a narrow range of typically closely related species of Cyclanthaceae. The adults do not function as pollinators, and the herbivorous larvae develop in the fruiting organs of their hosts, frequently triggering the abortion of infructescences. An analysis of 12 taxa (5 outgroup, 7 ingroup) and 32 morphological characters yielded a single most parsimonious cladogram (L = 38, CI = 89, RI = 93) with the topology (C. dicranopygia, (C. stratiotricha, ((C. leptorhamphis, C. trigaea), (C. globulicollis Voss, (C. anaphalanta, C. melanoptera))))). The evolution of morphological traits and host shifts is examined in light of the proposed phylogeny.
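
For readers unfamiliar with the tree statistics quoted above, the standard definitions of the ensemble consistency index and retention index are given below, scaled by 100 as in the abstract. The symbols M, S and G are notation introduced here, not taken from the paper; S corresponds to the reported tree length L.

```latex
% M: minimum possible number of steps summed over all characters
% S: number of steps observed on the tree (the tree length L)
% G: maximum possible number of steps summed over all characters
\[
  \mathrm{CI} = 100 \cdot \frac{M}{S},
  \qquad
  \mathrm{RI} = 100 \cdot \frac{G - S}{G - M}
\]
```

If the reported CI was computed over all characters, then L = 38 and CI = 89 would correspond to M of roughly 0.89 × 38 ≈ 34 steps; this is an inference from the definitions, not a figure stated in the abstract.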


Zootaxa ◽  
2010 ◽  
Vol 2483 (1) ◽  
pp. 1 ◽  
Author(s):  
C. B. Cameron ◽  
C. Deland ◽  
T. H. Bullock

Here we describe five new North Eastern Pacific species of acorn worms in the genus Saccoglossus (S. porochordus, S. rhabdorhyncus, S. shumaginensis, S. sonorensis, and S. palmeri) on the basis of morphology. Notes on the habit and localization of each species are provided. A summary table lists the morphological characters defining the five new species and ten previously described ones in the genus Saccoglossus. Observation on the biogeography of Enteropneusta suggests that it is an ancient and declining group.


Zootaxa ◽  
2020 ◽  
Vol 4890 (3) ◽  
pp. 417-427
Author(s):  
JAN JEŽEK ◽  
JOZEF OBOŇA ◽  
FRANÇOIS LE PONT ◽  
JEAN-MICHEL MAES ◽  
EDDY MARTINEZ

The formerly monotypic genus Armillipora Quate, known only from Costa Rica and Panama, is redescribed, including the type species A. selvica Quate, this time collected on the Caribbean side of Nicaragua, RAAN department, and illustrated based on male morphological characters. The male of a new species, A. suapiensis sp. nov., from Bolivia, La Paz department, is described and figured here.


Nematology ◽  
2009 ◽  
Vol 11 (6) ◽  
pp. 869-881 ◽  
Author(s):  
Natsumi Kanzaki ◽  
Robin M. Giblin-Davis ◽  
Rudolf H. Scheffrahn ◽  
Barbara J. Center ◽  
Kerrie A. Davies

A species of aphelenchoidid nematode was isolated from a subterranean termite, Cylindrotermes macrognathus, during a survey of termite-associated nematodes in a conserved forest in La Selva, Costa Rica. The nematode was morphologically intermediate between the families Aphelenchidae and Aphelenchoididae, i.e., the nematode had a true bursa supported by bursal limb-like genital papillae but lacked a clear pharyngeal isthmus. The molecular phylogenetic status of the new nematode among tylenchid, cephalobid, panagrolaimid, aphelenchid and aphelenchoidid genera was analysed based on ca 1.2 kb of SSU ribosomal DNA sequence and the inferred position was basal to the family Aphelenchoididae. It was clearly not part of the clade containing the genus Aphelenchus (=Aphelenchidae). This nematode is described herein as Pseudaphelenchus yukiae n. gen., n. sp., and the family definition of Aphelenchoididae is emended to include the unique morphological characters of this new genus. The molecular phylogenetic analysis supported the paraphyly of the three Aphelenchoidinae genera Aphelenchoides, Laimaphelenchus and Schistonchus and the monophyly of Ektaphelenchinae, Seinura (Seinurinae) and Noctuidonema (Acugutturinae). However, many more representatives are needed to resolve the family-genus level phylogeny of Aphelenchoididae.

