Machine Learning for Species Identification: The Hebeloma Project from database to website

Author(s):  
Peter Bartlett ◽  
Ursula Eberhardt ◽  
Nicole Schütz ◽  
Henry Beker

Attempts to use machine learning (ML) for species identification of macrofungi have usually involved the use of image recognition to deduce the species from photographs, sometimes combining this with collection metadata. Our approach is different: we use a set of quantified morphological characters (for example, the average length of the spores) and locality (GPS coordinates). Using this data alone, the machine can learn to differentiate between species. Our case study is the genus Hebeloma, fungi within the order Agaricales, where species determination is renowned as a difficult problem. Whether it is as a result of recent speciation, the plasticity of the species, hybridization or stasis is a difficult question to answer. What is sure is that this has led to difficulties with species delimitation and consequently a controversial taxonomy. The Hebeloma Project—our attempt to solve this problem by rigorously understanding the genus—has been evolving for over 20 years. We began organizing collections in a database in 2003. The database now has over 10,000 collections, from around the world, with not only metadata but also morphological descriptions and photographs, both macroscopic and microscopic, as well as molecular data including at least an internal transcribed spacer (ITS) sequence (generally, but not universally, accepted as a DNA barcode marker for fungi (Schoch et al. 2012)), and in many cases sequences of several loci. Included within this set of collections are almost all type specimens worldwide. The collections on the database have been analysed and compared. The analysis uses both the morphological and molecular data as well as information about habitat and location. In this way, almost all collections are assigned to a species. This development has been enabled and assisted by citizen scientists from around the globe, collecting and recording information about their finds as well as preserving material. 
From this database, we have built a website, which updates as the database updates. The website (hebeloma.org) is currently undergoing beta testing prior to a public launch. It includes up-to-date species descriptions, which are generated by amalgamating the data from the collections of each species in the database. Additional tools allow the user to explore those species with similar habitat preferences, or those from a particular biogeographic area. The user is also able to compare a range of characters of different species via an interactive plotter. The ML-based species identifier is featured on the website. The standardised storage of the collection data on the database forms the backbone for the identifier. A portion of the collections on the database are (almost) randomly selected as a training set for the learning phase of the algorithm. The learning is “supervised” in the sense that collections in the training set have been pre-assigned to a species by expert analysis. With the learning phase complete, the remainder of the database collections may then be used for testing. To use the species identifier on the website, a user inputs the same small number of morphological characters used to train the tool and it promptly returns the most likely species represented, ranked in order of probability. As well as describing the neural network behind the species identifier tool, we will demonstrate it in action on the website, present the successful results it has had in testing to date and discuss its current limitations and possible generalizations.
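The workflow described above (a random split into training and test sets, supervised learning from expert-assigned labels, and a ranked list of candidate species) can be sketched roughly as follows. This is a minimal illustration only. The species names, measurements and coordinates are invented, and a simple nearest-centroid classifier with a softmax over distances stands in for the neural network mentioned in the abstract, whose architecture is not described here.

```python
import math
import random

# Hypothetical species and invented centres, NOT Hebeloma Project data:
# (mean spore length in µm, mean spore width in µm, latitude, longitude)
SPECIES_CENTRES = {
    "species_A": (10.0, 6.0, 50.0, 5.0),
    "species_B": (12.5, 7.0, 60.0, 15.0),
    "species_C": (9.0, 5.0, 40.0, -5.0),
}


def make_collections(n_per_species, rng):
    """Generate synthetic, expert-labelled collections around each centre."""
    collections = []
    for species, centre in SPECIES_CENTRES.items():
        for _ in range(n_per_species):
            features = tuple(rng.gauss(mu, 0.3) for mu in centre)
            collections.append((features, species))
    return collections


def split(collections, train_fraction, rng):
    """Randomly split the collections into a training set and a test set."""
    shuffled = collections[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(training_set):
    """Supervised step: average the features of each labelled species."""
    sums, counts = {}, {}
    for features, species in training_set:
        acc = sums.setdefault(species, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[species] = counts.get(species, 0) + 1
    return {s: tuple(v / counts[s] for v in acc) for s, acc in sums.items()}


def rank_species(features, centroids):
    """Return (species, probability) pairs, most likely first.

    Probabilities come from a softmax over negative squared distances.
    Features are left unscaled for brevity, which a real tool would not do.
    """
    scores = {
        s: -sum((a - b) ** 2 for a, b in zip(features, c))
        for s, c in centroids.items()
    }
    top = max(scores.values())
    weights = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(weights.values())
    ranked = [(s, w / total) for s, w in weights.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


rng = random.Random(0)
collections = make_collections(40, rng)
training_set, test_set = split(collections, 0.7, rng)
centroids = fit_centroids(training_set)

# Test phase: count how often the top-ranked species matches the expert label.
correct = sum(rank_species(f, centroids)[0][0] == s for f, s in test_set)
accuracy = correct / len(test_set)
```

Calling `rank_species` with a new set of measurements mirrors what a user of the website's identifier would do: supply a few characters and a location, and read off the candidate species in order of probability.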

Zootaxa ◽  
2019 ◽  
Vol 4657 (1) ◽  
pp. 177-182 ◽  
Author(s):  
MICHAL MOTYKA

Almost all net-winged beetles are members of Müllerian complexes and their similarity due to phenotypic coevolution sometimes complicates species identification and generic placement. Therefore, large specimen series, detailed exhaustive examination of morphological characters and molecular data are needed to clarify the taxonomic placement. Using mitochondrial DNA sequences, I investigated the sexual dimorphism and generic placement of the recently described species Calochromus pardus Kazantsev, 2018. I found that the species does not belong in Calochromus Guérin-Méneville, 1833 and all morphological characters and molecular analyses point to its placement in Micronychus Motschulsky, 1861. Therefore, Micronychus pardus (Kazantsev, 2018), comb. nov. is proposed. Additionally, the male is described here for the first time, showing the sexual dimorphism in the species. Unlike the females, the males do not superficially resemble members of Xylobanus Waterhouse, 1879 with brightly coloured elytral costae and black background, but mimic the sympatrically occurring yellow and black lycids in the genus Cautires Waterhouse, 1879.


2020 ◽  
Vol 20 (9) ◽  
pp. 671-679
Author(s):  
Dutrudi Panprommin ◽  
Kanyanat Soontornprasit ◽  
Siriluck Tuncharoen ◽  
Niti Iamchuen

The species identification of larval fish is very important for sustainable fishery resource management. However, identification based on morphological characters is very difficult, complex and error-prone. DNA barcoding with the sequence of the cytochrome c oxidase I (COI) gene was used to identify larval fish species from 10 stations in the tributaries of the lower Ing River. One hundred and six samples were collected between May 2016 and April 2017. The average length of the COI nucleotide sequences was approximately 640 bp. A total of 99 nucleotide sequences were assigned to 35 species, 31 genera, 19 families and 9 orders, with 97-100% identity with entries in both the GenBank and BOLD databases. The genetic distance within species ranged from 0.000 to 0.004. However, seven samples were identified only to the genus level because their sequences had not been reported in any databases. Based on IUCN conservation status, most species were classified as least concern (77.14%). Approximately 69.23% of all species were related to human uses in fisheries, aquaculture or aquariums, whereas 30.77% of species were not assessed. Trichopsis vittata (family Osphronemidae) had the highest frequency of occurrence (90%), followed by Oryzias minutillus (family Adrianichthyidae) (70%) and Trichopodus trichopterus (family Osphronemidae) (70%).


Zootaxa ◽  
2011 ◽  
Vol 3104 (1) ◽  
pp. 42 ◽  
Author(s):  
MICHELE CESARI ◽  
ILARIA GIOVANNINI ◽  
ROBERTO BERTOLANI ◽  
LORENA REBECCHI

We have in recent papers revealed that an integrative taxonomy approach helps to solve taxonomic problems in tardigrades. However, whole tardigrades are required for DNA work, which leaves no hologenophore voucher specimens with adult morphology. Using a novel methodology for the Tardigrada, we introduce the practice of collecting high quality maximum magnification light microscopy images of recently thawed animals to act as hologenophore voucher specimens of animals later used for DNA barcode sequencing. Within the framework of a DNA barcoding project on tardigrades, we collected a moss sample from the type locality of Macrobiotus terminalis Bertolani & Rebecchi, 1993 (Castelsantangelo, Central Apennines, Italy), a species of the “Macrobiotus hufelandi group”. Within the moss sample we found several animals and eggs with a morphology that corresponded to the original description of M. terminalis, while others were attributable to Macrobiotus macrocalix Bertolani & Rebecchi, 1993. In this study, molecular (cox1 mtDNA) analyses demonstrated no intraspecific variability in M. terminalis from the type locality but very large interspecific differences when compared with M. macrocalix and GenBank data for other species within the M. “hufelandi group”. There was also a large difference between our M. terminalis sequences and the GenBank data of a specimen attributed to the same species. The GenBank sequence originated from a population in the Northern Apennines, whose morphology appeared to be like that of the specimens of the locus typicus. This confirmed the importance in utilising material from the type locality for linking molecular data to the species’ morphological characters. Our paper underlines the importance of an integrative taxonomy in species diagnoses and demonstrates a scenario where morphological observations alone are not always sufficient. 
Lastly, this work adds reliable information to the sequence reference library that provides a useful building block for further studies on similar and related tardigrade taxa.


ZooKeys ◽  
2021 ◽  
Vol 1054 ◽  
pp. 85-93
Author(s):  
Ruiwen Wu ◽  
Xiongjun Liu ◽  
Takaki Kondo ◽  
Shan Ouyang ◽  
Xiaoping Wu

We diagnose and describe a new freshwater mussel species of the genus Inversidens, I. rentianensis sp. nov. from Jiangxi Province, China based on morphological characters and molecular data. This paper includes a morphological description and photograph of the holotype, and partial sequences of mitochondrial COI as DNA barcode data.


ZooKeys ◽  
2021 ◽  
Vol 1037 ◽  
pp. 57-71
Author(s):  
Zhaoyang Chen ◽  
Dengqing Li ◽  
Daiqin Li ◽  
Xin Xu

We diagnose and describe three new species of the primitively segmented spider genus Songthela from Guizhou Province, China, based on morphological characters and molecular data: S. liui sp. nov. (♂♀), S. tianzhu sp. nov. (♂♀), and S. yuping sp. nov. (♂♀). We provide the genetic distances within and among the three new species based on the DNA barcode gene, cytochrome c oxidase subunit I (COI) to support our descriptions. We also provide the COI GenBank accession codes for the three new species for future identification.
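As a rough illustration of what "genetic distances within and among species" can mean for aligned barcode sequences, the sketch below computes uncorrected p-distances (the proportion of differing sites). The abstract does not say which distance model was used, so p-distance is an assumption here, and the sequences and the labels `sp1` and `sp2` are invented toy data, not COI barcodes from the study.

```python
from itertools import combinations, product


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_within(seqs):
    """Mean pairwise distance among sequences of one species."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(seqs_x, seqs_y):
    """Mean pairwise distance between sequences of two species."""
    pairs = list(product(seqs_x, seqs_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Invented 10-base fragments for two hypothetical species.
barcodes = {
    "sp1": ["ACGTACGTAC", "ACGTACGTAT"],
    "sp2": ["ACGTTCGAAC", "ACGTTCGAAC"],
}

within_sp1 = mean_within(barcodes["sp1"])
within_sp2 = mean_within(barcodes["sp2"])
between = mean_between(barcodes["sp1"], barcodes["sp2"])
```

In barcoding studies a between-species distance that is clearly larger than the within-species distances is usually read as support for treating the groups as separate species. Published analyses often use a corrected model such as Kimura 2-parameter rather than the raw proportion shown here.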


Phytotaxa ◽  
2018 ◽  
Vol 340 (3) ◽  
pp. 201 ◽  
Author(s):  
NINA B. ALEXEEVA

Until now, as few as 2–4 species of the genus Iris sect. Psammiris (Iris bloudowii, I. humilis, I. mandshurica, and I. potaninii) have been reported in Russia in botanical publications. We have analysed the diagnostic value of morphological characters. At the series level, features of the root system, the shape of basal leaves, the height of the flowering scape, and the length of the perianth tube are most significant. The shape and size of spathes are also usable for species identification. In the present contribution, a synopsis of I. sect. Psammiris in Russia is presented, including the description of a new series, Vorobievia. In that country, the section comprises 7 species belonging to 3 series and occurring mainly in Siberia and the Far East, with one species extending to Eastern Europe. A key for species determination is compiled, and the distribution areas of the accepted species are specified. Types are indicated for all involved names, two of which (lectotypes) are designated here. Furthermore, previous results of molecular studies including taxa in this section are analysed and discussed, which demonstrate that I. sect. Psammiris is indeed monophyletic according to the morphological and molecular data available so far.


Diversity ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 85
Author(s):  
Lotanna Micah Nneji ◽  
Adeniyi Charles Adeola ◽  
Yun-Yu Wang ◽  
Adeyemi Mufutau Ajao ◽  
Okorie Anyaele ◽  
...  

Comprehensive biodiversity assessment of moths in Nigeria relies greatly on accurate species identification. While most of the Nigerian moths are identified effortlessly using their morphological traits, some taxa are morphologically indistinguishable, which makes taxon diagnosis difficult. We investigated the efficiency of the DNA barcode, a fragment of the mitochondrial Cytochrome C oxidase subunit I, as a tool for the identification of Nigerian moths. We barcoded 152 individuals comprising 18 morphospecies collected from one of the remaining and threatened rainforest blocks of Nigeria – the Cross River National Park. A phenetic neighbor-joining tree and a phylogenetic Maximum Likelihood approach were employed for the molecular-based species identification. Results showed that DNA barcodes enabled species-level identification of most of the individuals collected from the Park. Additionally, DNA barcoding unraveled the presence of at least six potential new and yet undescribed species: Amnemopsyche sp., Arctia sp., Deinypena sp., Hodebertia sp., Otroeda sp., and Palpita sp. The phylogenetic Maximum Likelihood analysis using the combined dataset of all the newly assembled sequences from Nigeria showed that all species formed unique clades. The phylogenetic analyses provided evidence of population divergence in Euchromia lethe, Nyctemera leuconoe, and Deinypena lacista. This study thus illustrates the efficacy of DNA barcoding for species identification and discovery of potential new species, which demonstrates its relevance in biodiversity documentation of Nigerian moths. Future work should, therefore, extend to the creation of an exhaustive DNA barcode reference library comprising all species of moths from Nigeria to have a comprehensive insight into the diversity of moths in the country. Finally, we propose integrated taxonomic methods that would combine morphological, ecological, and molecular data in the identification and diversity studies of moths in Nigeria.


2020 ◽  
Vol 20 (3) ◽  
Author(s):  
Bruno R. Sampieri ◽  
Tatiana M. Steiner ◽  
Priscila C. Baroni ◽  
Camila Fernanda da Silva ◽  
Marcos A. L. Teixeira ◽  
...  

Abstract: Polychaetes are common in coastal and estuarine environments worldwide and constitute one of the most complex groups of marine invertebrates. The morpho-physiology of the female reproductive system (FRS) can be understood by using histological tools to describe the reproductive cycle and gametogenesis paths and, among other purposes, to identify and differentiate polychaete species. However, this histology-based approach is rarely combined with molecular tools, which are known to delimit species accurately. In the same way, the description and understanding of oogenesis and vitellogenesis paths within polychaetes are lacking for most families, narrowing the range of its utility. Therefore, the present study aims to describe the oogenesis in three polychaete species common and abundant on the South American Atlantic coast (Laeonereis culveri, Scolelepis goodbodyi and Capitella biota) and investigate the utility of reproductive features and gametogenesis as relevant associated knowledge to discriminate species, particularly useful for putative cryptic species, integrated with morphological and molecular data. In a first attempt, the results obtained herein allow the authors to describe two new subtypes of oogenesis, dividing it into extraovarian oogenesis type I and II and intraovarian type I and II. The results also demonstrate that the following histological characters of the FRS can be relevant for the separation of related species: a) oogenesis type, b) occurrence or absence of a true ovary, c) ovary tissue organization, d) type of accessory cells present, and e) oocyte morphology. Additionally, these histological features of FRS, when compared with correlated species studied under this scope, converge with the genetic data. The analysis of cytochrome oxidase I (COI) barcode sequences differentiates between North and South American Atlantic populations of L. culveri (16.78% genetic distance), while in S. goodbodyi and C. biota it discriminates them from their congeneric species. These results highlight the importance of a multi-tool approach and show that both FRS histology and histo-physiology, and DNA barcoding can be used to identify and discriminate cryptic species, which is usually not possible when using morphological characters. Besides, these characters may also be useful in differentiating related species, and/or geographically distinct populations among polychaetes.


Zootaxa ◽  
2021 ◽  
Vol 5056 (1) ◽  
pp. 1-67
Author(s):  
ISABEL MUÑOZ ◽  
EVA GARCÍA-ISARCH ◽  
JOSE A. CUESTA

An updated checklist of Mozambican marine brachyuran crabs is generated based on an exhaustive revision of the existing literature, together with the additional records provided by the specimens collected throughout the three “MOZAMBIQUE” surveys carried out in Mozambican waters during three consecutive years (2007–2009) by the Instituto Español de Oceanografía (Spanish Institute of Oceanography, IEO). A total of 269 species, grouped in 15 superfamilies, 26 families and 172 genera are reported in the checklist, and a detailed inventory is produced with the list and remarks about the brachyuran species collected. Thirty-nine crab species belonging to 19 families were identified based on morphological characteristics and/or genetic tools. DNA barcode sequences (16S rRNA and/or COI) were obtained for 37 species, including 16S and COI sequences that are new for 26 and 14 species, respectively. Colour photographs of fresh specimens illustrate the comments about most species, and for some of them the original colour pattern is described for the first time. New records in Mozambican waters are reported for the species Paromolopsis boasi, Mursia aspera, Carcinoplax ischurodous, Tanaoa pustulosus, Euclosiana exquisita, Oxypleurodon difficilis, Naxioides robillardi, Samadinia galathea, Cyrtomaia gaillardi, Paramaja gibba, Pleistacantha ori, Parathranites granosus, Parathranites orientalis, Ovalipes iridescens and Charybdis smithii, and second records for Moloha alcocki, Samadinia pulchra and Charybdis africana. In addition, Raninoides crosnieri, S. galathea and P. ori were collected for the first time after their descriptions. The female of Samadinia galathea is described for the first time, and a potential new species of Mursia is reported. Some records expand the known bathymetric range of certain species and/or their general distribution. New molecular and morphological data suggest the necessity of the revision of P. boasi, R. crosnieri, C. africana and the genera Platymaia and Carcinoplax. The variability and taxonomic validity of some morphological characters in brachyuran systematics are discussed.


2018 ◽  
Vol 109 (2) ◽  
pp. 200-211 ◽  
Author(s):  
I. Kelnarova ◽  
E. Jendek ◽  
V.V. Grebennikov ◽  
L. Bocak

Abstract: All of the more than 3000 species of Agrilus beetles are phytophagous and some cause economically significant damage to trees and shrubs. Facilitated by international trade, Agrilus species regularly invade new countries and continents. This necessitates a rapid identification of Agrilus species, as the first step for subsequent protective measures. This study provides the first DNA reference library for ~100 Agrilus species from the Northern Hemisphere based on three mitochondrial markers: cox1–5′ (DNA barcode fragment), cox1–3′, and rrnL. All 329 Agrilus records available in the Barcode of Life Database format, including specimen images and geo data, are released through a public dataset ‘Agrilus1 329’ available at: dx.doi.org/10.5883/DS-AGRILUS1. All Agrilus species were identified using adult morphology and by using molecular phylogenetic trees, as well as distance- and tree-based algorithms. Most DNA-based species limits agree well with the morphology-based identification. Our results include cases of high intraspecific variability and multiple species para- and polyphyly. DNA barcoding is a powerful species identification tool in Agrilus, although it frequently fails to recover morphologically-delimited Agrilus species-groups. Even though the current three-gene database covers only ~3% of the known Agrilus diversity, it contains representatives of all principal lineages from the Northern Hemisphere and represents the most extensive dataset built for DNA-delimited species identification within this genus so far. Molecular data analyses can rapidly and cost-effectively identify an unknown sample, including immature stages and/or non-native taxa, or species not yet formally named.

