Classifying the Unknown: Identification of Insects by Deep Open-set Bayesian Learning

2021 ◽  
Author(s):  
Sarkhan Badirli ◽  
Christine J. Picard ◽  
George Mohler ◽  
Zeynep Akata ◽  
Murat Dundar

Insects represent a large majority of biodiversity on Earth, yet only 20% of the estimated 5.5 million insect species are currently described (1). While describing new species typically requires specific taxonomic expertise to identify morphological characters that distinguish it from other potential species, DNA-based methods have aided in providing additional evidence of separate species (2). Machine learning (ML) is emerging as a potential new approach in identifying new species, given that this analysis may be more sensitive to subtle differences humans may not process. Existing ML algorithms are limited by image repositories that do not include undescribed species. We developed a Bayesian deep learning method for the open-set classification of species. The proposed approach forms a Bayesian hierarchy of species around corresponding genera and uses deep embeddings of images and barcodes together to identify insects at the lowest level of abstraction possible. To demonstrate proof of concept, we used a database of 32,848 insect instances from 1,040 described species split into training and test data. The test data included 243 species not present in the training data. Our results demonstrate that using DNA sequences and images together, insect instances of described species can be classified with 96.66% accuracy while achieving accuracy of 81.39% in identifying genera of insect instances of undescribed species. The proposed deep open-set Bayesian model demonstrates a powerful new approach that can be used for the gargantuan task of identifying new insect species.

2021 ◽  
Author(s):  
Sarkhan Badirli ◽  
Christine J. Picard ◽  
George Mohler ◽  
Zeynep Akata ◽  
Murat Dundar

Abstract Insects represent a large majority of biodiversity on Earth, yet so few species are described. Describing new species typically requires specific taxonomic expertise to identify morphological characters that distinguish it from other known species, and DNA-based methods have aided in providing additional evidence of separate species. Machine learning (ML) provides a powerful method in identifying new species given its analytical processing is more sensitive to subtle physical differences in images humans may not process. Existing ML algorithms are limited by image repositories that only contain described species, leaving out the possibility of identifying new species. We develop a Bayesian deep learning method for zero-shot classification of species. The proposed approach forms a Bayesian hierarchy of species around corresponding genera and uses deep embeddings of images and DNA barcodes to identify insects to the lowest taxonomic level possible. To demonstrate this proof of concept, we use a database of 32,848 insect images from 1,040 described species split into training and test data wherein the test data includes 243 species not present in the training data. Our results demonstrate that using DNA sequences and images together, known insects can be classified with 96.66% accuracy while unknown (to the database) insects have an accuracy of 81.39% in identifying the correct genus. The proposed deep zero-shot Bayesian model demonstrates a powerful new approach that can be used for the gargantuan task of identifying new insect species.


Zootaxa ◽  
2021 ◽  
Vol 4926 (2) ◽  
pp. 151-188
Author(s):  
JAVIER FRESNEDA ◽  
VALERIA RIZZO ◽  
JORDI COMAS ◽  
IGNACIO RIBERA

We redefine the genus Troglocharinus Reitter, 1908 based on a phylogenetic analysis with a combination of mitochondrial and molecular data. We recovered the current Speonomites mengeli (Jeannel, 1910) and S. mercedesi (Zariquiey, 1922) as valid, separate species within the Troglocharinus clade, not directly related to Speonomites Jeannel, 1910, a finding corroborated by a detailed study of the male and female genitalia. In consequence, we reinstate Speonomus mercedesi Zariquiey, 1922 stat. nov. as a valid species, transfer both of them to the genus Troglocharinus, T. mengeli (Jeannel, 1910) comb. nov. and T. mercedesi (Zariquiey, 1922) comb. nov., and redescribe the genus. The study of new material from the distribution area of the former S. mengeli revealed the presence of two undescribed species, T. sendrai sp. nov. and T. fadriquei sp. nov., which we describe herein. We designate the lectotype of Speonomus vinyasi Escolà, 1971 to fix its identity, as among its syntypes there are two different species. In agreement with the results of the phylogenetic analyses we establish the synonymy between the genus Speonomites and Pallaresiella Fresneda, 1998 syn. nov.


2007 ◽  
Vol 21 (2) ◽  
pp. 173 ◽  
Author(s):  
Klaus Rützler ◽  
Manuel Maldonado ◽  
Carla Piantoni ◽  
Ana Riesgo

The systematics of tropical and subtropical western Atlantic species of Iotrochota is re-examined in light of the discovery of an undescribed species. Iotrochota birotulata (Higgin), the type species, is found to have more characters than previously recognised and is redefined with emphasis on a skeleton of spongin fibres containing stout, curved styles and strongyles (category I) and an interstitial spiculation consisting mainly of longer, slender and straight styles (II). Iotrochota bistylata Boury-Esnault is confirmed as a synonym of the above. The new species, named I. arenosa, sp. nov., differs in external morphology, strong mucus development, incorporation of sand and interstitial spicules that are mainly long, straight strongyles. Iotrochota atra (Whitfield), thought to be a synonym of I. birotulata, is recognised as a separate species occurring exclusively in the Bahamas and is found to be a senior synonym of I. imminuta Pulitzer-Finali; it is morphologically very similar to I. birotulata, but lacks birotulae and has a strongly reduced skeleton of megascleres (mostly one category of delicate strongyles). Iotrochota agglomerata Lehnert & van Soest is recognised as the fourth distinct species for its unusual colour (orange), thinly encrusting habit and special spiculation (styles with tylostylote modifications).


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
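The sparsity mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the index weights for one prediction individual minimize 0.5·b'Cb − g'b + λ‖b‖₁, where C is the training genomic relationship matrix plus a ridge term and g holds the relationships between that individual and the training individuals. The function names, the ridge value `theta`, and the simulated marker data are all hypothetical. Under this assumption, λ = 0 gives the dense solution of Cb = g (the G-BLUP-like case), and larger λ drives more weights to exactly zero, leaving a smaller set of support points.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def sparse_index_weights(C, g, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for min_b 0.5*b'Cb - g'b + lam*||b||_1.

    C must be symmetric positive definite. This objective is an assumed
    formulation used for illustration only.
    """
    b = np.zeros(len(g))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(g)):
            # Partial residual with the contribution of coordinate j removed.
            r = g[j] - C[j] @ b + C[j, j] * b[j]
            new_bj = soft_threshold(r, lam) / C[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b

# Simulated (hypothetical) data: 41 individuals, 200 centered markers.
rng = np.random.default_rng(0)
X = rng.normal(size=(41, 200))
G = X @ X.T / X.shape[1]          # simple genomic relationship matrix
theta = 1.0                       # assumed ratio of error to genetic variance
C = G[:40, :40] + theta * np.eye(40)
g = G[:40, 40]                    # relationships: training set vs. individual 40

b_dense = sparse_index_weights(C, g, lam=0.0)
b_sparse = sparse_index_weights(C, g, lam=0.5 * np.max(np.abs(g)))
print("non-zero weights, lam = 0:", np.count_nonzero(b_dense))
print("non-zero weights, lam > 0:", np.count_nonzero(b_sparse))
```

A prediction for the individual would then be the weighted sum of the centered training phenotypes, using only the individuals with non-zero weights.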


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Young Jae Kim ◽  
Jang Pyo Bae ◽  
Jun-Won Chung ◽  
Dong Kyun Park ◽  
Kwang Gi Kim ◽  
...  

Abstract Colorectal cancer occurs in the gastrointestinal tract and is the third most common of the 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the risk of developing colorectal cancer, so detected polyps need to be resected. This research improved the performance of polyp classification by fine-tuning a Network-in-Network (NIN) model pre-trained on the ImageNet database. Random shuffling was performed 20 times on 1000 colonoscopy images. Each shuffled data set was divided into 800 training images and 200 test images, and accuracy was evaluated on the 200 test images in each of the 20 experiments. Three comparison methods were constructed from AlexNet by transferring the weights trained on three different state-of-the-art databases. A normal AlexNet-based method without transfer learning was also compared. The accuracy of the proposed method was statistically significantly higher than that of the four other state-of-the-art methods, and showed an 18.9% improvement over the normal AlexNet-based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. An automatic algorithm with a high recall rate and accuracy can assist endoscopists in identifying adenomatous polyps. This system can enable the timely resection of polyps at an early stage.
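The evaluation protocol described above (20 random shuffles of 1000 images, each split 800/200) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the synthetic feature vectors stand in for colonoscopy images, and the nearest-centroid classifier is a hypothetical placeholder for the fine-tuned network. Only the shuffle-and-split structure mirrors the abstract.

```python
import numpy as np

def repeated_holdout(features, labels, fit_predict,
                     n_repeats=20, n_train=800, seed=0):
    """Shuffle the data n_repeats times; train on the first n_train
    samples of each shuffle and report accuracy on the remainder."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_repeats):
        order = rng.permutation(len(labels))
        train_idx, test_idx = order[:n_train], order[n_train:]
        predicted = fit_predict(features[train_idx], labels[train_idx],
                                features[test_idx])
        accuracies.append(np.mean(predicted == labels[test_idx]))
    return np.array(accuracies)

def nearest_centroid(x_train, y_train, x_test):
    """Placeholder classifier (assumption): assign each test sample to
    the class whose training mean is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic stand-in for 1000 images with two classes.
data_rng = np.random.default_rng(1)
labels = data_rng.integers(0, 2, size=1000)
features = data_rng.normal(size=(1000, 8)) + 1.5 * labels[:, None]

accs = repeated_holdout(features, labels, nearest_centroid)
print(f"accuracy over {len(accs)} runs: {accs.mean():.3f} ± {accs.std():.3f}")
```

Reporting the mean and standard deviation across the 20 runs matches the "value ± spread" form in which the abstract gives its area under the curve and recall.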


Diversity ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 290
Author(s):  
Andrew M. Hosie ◽  
Jane Fromont ◽  
Kylie Munyard ◽  
Diana S. Jones

The subfamily Acastinae contains a diverse group of barnacles that are obligate symbionts of sponges and alcyonacean and antipatharian corals. Integrating morphological and genetic (COI) data to compare against known species, this paper reports on nine species of sponge-inhabiting barnacles of the subfamily Acastinae, including three undescribed species (Acasta caveata sp. nov., Euacasta acutaflava sp. nov., and E. excoriatrix sp. nov.) and three species previously not recorded in Australian waters (A. sandwichi, Pectinoacasta cancellorum, and P. sculpturata). The new species are distinguished from similar species by a suite of morphological characters as well as genetic distances. A lectotype for Pectinoacasta cancellorum is designated. Sponge hosts were identified for all specimens where possible and are represented by 19 species from eight families and five orders.


2015 ◽  
Vol 46 (1) ◽  
pp. 1-36
Author(s):  
Alexey V. Solovyev

The genus Nirmides Hering, 1931 is revised. Nowadays it includes 17 species, known from Thailand, Vietnam and the Andamans to the Philippines. Eight species are described as new to science: N. siama sp. n. (Thailand), N. ihlei sp. n. (Thailand), N. diana sp. n. (Andaman Islands), N. samares sp. n. (Philippines, Samar), N. lourensi sp. n. (Philippines, Luzon), N. similis sp. n. (Philippines, Mindanao), N. kanlaonensis sp. n. (Philippines, Negros), and N. hollowayi sp. n. (Borneo). Lectotypes are designated for Susica basalis Walker, 1862 and Nirmides basalis f. fusca Hering, 1931. The taxon Nirma micron van Eecke, 1929 is removed from synonymy with Nirmides basalis (Walker, 1862) and restored to a separate species. A new synonymy is proposed: Nirmides micron (van Eecke, 1929) = Nirmides manwahi Holloway, 1990, syn. n. The homology of the sclerites of the male genitalia is discussed; the musculature of the male genitalia is examined. A key to species is given.


Phytotaxa ◽  
2015 ◽  
Vol 219 (2) ◽  
pp. 174
Author(s):  
Fabiana Firetti Leggieri ◽  
DIEGO DEMARCO ◽  
LÚCIA G. LOHMANN

The Atlantic Forest of Brazil harbors some of the highest species diversity and endemism on the planet, representing a priority for biodiversity conservation. A new species of Anemopaegma from the Atlantic Forest of Brazil is here described, illustrated and compared to its closest relatives. Anemopaegma nebulosum Firetti-Leggieri & L.G. Lohmann has been traditionally treated as a morph of Anemopaegma prostratum; however, additional morphological and anatomical studies indicated that A. nebulosum differs significantly from A. prostratum and is best treated as a separate species. More specifically, A. nebulosum is characterized by elliptic and coriaceous leaflets (vs. ovate to orbicular and membranaceous in A. prostratum), smaller leaflet blades (3.6–5.5 × 2.0–3.0 cm vs. 6.7–13.0 × 4.2–8.4 cm in A. prostratum), orbicular prophylls of the axillary buds (vs. no prophylls in A. prostratum), solitary flowers (vs. multi-flowered axillary racemes in A. prostratum) and a gibbous corolla (vs. infundibuliform corollas in A. prostratum). In addition, A. nebulosum differs from A. prostratum anatomically in having thicker leaflet blades composed of two to four layers of palisade parenchyma (vs. one to three layers in A. prostratum), and seven to eight layers in the spongy parenchyma (vs. six to eight layers in A. prostratum). A key for the identification of all species of Anemopaegma from the Atlantic Forest of Brazil is presented.


Phytotaxa ◽  
2013 ◽  
Vol 147 (2) ◽  
pp. 48 ◽  
Author(s):  
HAI-XIA MA ◽  
LARISSA VASILYEVA ◽  
YU LI

Xylaria fusispora, an undescribed species of Xylaria (Xylariales, Xylariaceae), is described and illustrated as a new species based on collections from Guizhou Province, China. Both morphology and phylogenetic analysis of nrDNA ITS sequences support the establishment of this new species. The fungus is characterized by its fusoid-equilateral ascospores and an ascus apical ring not bluing in Melzer’s reagent. The differences between the new species and the related fungi are discussed.


Author(s):  
Dirk Erpenbeck ◽  
Merrick Ekins ◽  
Nicole Enghuber ◽  
John N.A. Hooper ◽  
Helmut Lehnert ◽  
...  

Sponge species are infamously difficult for non-experts to identify due to their high morphological plasticity and the paucity of informative morphological characters. The use of molecular techniques certainly helps with species identification, but unfortunately it requires prior reference sequences. Holotypes constitute the best reference material for species identification; however, their usage in molecular systematics and taxonomy is scarce and frequently not even attempted, mostly due to their antiquity and preservation history. Here we provide case studies in which we demonstrate the importance of using holotype material to answer phylogenetic and taxonomic questions. We also demonstrate the possibility of sequencing DNA fragments out of century-old holotypes. Furthermore, we propose the deposition of DNA sequences in conjunction with new species descriptions.

