DeLUCS: Deep Learning for Unsupervised Classification of DNA Sequences

2021 ◽  
Author(s):  
Pablo Millan Arias ◽  
Fatemeh Alipour ◽  
Kathleen Hill ◽  
Lila Kari

We present a novel Deep Learning method for the Unsupervised Classification of DNA Sequences (DeLUCS) that does not require sequence alignment, sequence homology, or (taxonomic) identifiers. DeLUCS uses Chaos Game Representations (CGRs) of primary DNA sequences and generates “mimic” sequence CGRs to self-learn data patterns (genomic signatures) through the optimization of multiple neural networks. A majority voting scheme then determines the final cluster label for each sequence. DeLUCS clusters large and diverse datasets with accuracies ranging from 77% to 100%: 2,500 complete vertebrate mitochondrial genomes, at taxonomic levels from sub-phylum to genera; 3,200 randomly selected 400 kbp-long bacterial genome segments, into families; and three viral genome and gene datasets, averaging 1,300 sequences each, into virus subtypes. DeLUCS significantly outperforms two classic clustering methods for unlabelled data (K-means and Gaussian Mixture Models), by as much as 48%. It can classify datasets of unlabelled primary DNA sequences totalling over 1 billion bp, and it bypasses common limitations to classification that arise from the lack of sequence homology, variation in sequence length, and the absence or instability of sequence annotations and taxonomic identifiers. DeLUCS thus offers fast and accurate classification of previously unclassifiable DNA sequence datasets.
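A Chaos Game Representation can be summarized at a chosen k-mer resolution as a frequency CGR (FCGR), a 2^k × 2^k matrix in which each cell counts one k-mer. The sketch below is a minimal illustration under stated assumptions: the function name `fcgr`, the corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0), and skipping k-mers that contain ambiguous bases are choices made here, not DeLUCS's published code. It does not reproduce mimic-sequence generation or network training.

```python
import numpy as np

# Assumed corner convention, as (x, y) coordinates of the unit square.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> np.ndarray:
    """Frequency chaos game representation: 2^k x 2^k matrix of k-mer counts.

    Each k-mer maps to exactly one cell; row index = y, column index = x.
    """
    size = 2 ** k
    grid = np.zeros((size, size), dtype=np.int64)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        for i, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            # The chaos game moves halfway toward each base's corner, so the
            # last base of the k-mer sets the most significant bit (quadrant).
            x |= cx << i
            y |= cy << i
        grid[y, x] += 1
    return grid
```

For example, `fcgr(genome, 6)` yields a 64 × 64 count matrix; dividing it by its sum gives frequencies that can be compared across sequences of different lengths.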

2021 ◽  
Vol 11 (18) ◽  
pp. 8578
Author(s):  
Yi-Cheng Huang ◽  
Ting-Hsueh Chuang ◽  
Yeong-Lin Lai

Trap-neuter-return (TNR) has become an effective way to reduce the number of stray animals. Because of the no-culling policy for stray cats and dogs in effect since 2017, there is great demand for the sterilization of cats and dogs in Taiwan. In 2020, Heart of Taiwan Animal Care (HOTAC) handled more than 32,000 cases of neutered cats and dogs. For each neutered cat or dog, HOTAC must archive photographs of the ears and excised organs taken at different veterinary hospitals, and human professionals must carefully review whether the archived medical photos are correct, despite the differing shooting and imaging angles used by different hospitals. To reduce the cost of manual review, an ensemble of deep-learning YOLO models combined through a majority voting system is used to identify TNR surgical images, achieving a mean average precision (mAP) above 90%. Among the YOLO models, YOLOv4 provides the best feature extraction, with an mAP of 91.99%, and its results are integrated into the voting classification. Experimental results show that, compared with the previous manual work, the system reduces the workload by more than 80%.
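The voting step can be illustrated as a plurality vote over the labels that several detectors assign to one image. This is a minimal sketch under assumptions: the label strings, the name `majority_vote`, returning `None` on a tie, and the optional `min_votes` threshold (for example, to route uncertain images back to a human reviewer) are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[str], min_votes: Optional[int] = None) -> Optional[str]:
    """Return the label predicted by the most models, or None if undecided.

    None is returned for an empty input, a tie for first place, or when the
    winning label has fewer than `min_votes` votes.
    """
    if not predictions:
        return None
    ranked = Counter(predictions).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between the leading labels: no clear majority
    if min_votes is not None and top_count < min_votes:
        return None  # not enough agreement among the models
    return top_label
```

With three models voting `["ear", "ear", "organ"]`, the image is labelled `"ear"`; a 1-1 split yields `None`.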


Human Cell ◽  
2018 ◽  
Vol 31 (2) ◽  
pp. 102-105 ◽  
Author(s):  
Jun Miyake ◽  
Yuhei Kaneshita ◽  
Satoshi Asatani ◽  
Seiichi Tagawa ◽  
Hirohiko Niioka ◽  
...  

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4264 ◽  
Author(s):  
Gerardo Mendizabal-Ruiz ◽  
Israel Román-Godínez ◽  
Sulema Torres-Ramos ◽  
Ricardo A. Salido-Ruiz ◽  
Hugo Vélez-Pérez ◽  
...  

Genomic signal processing (GSP) methods, which convert DNA data into numerical values, have recently been proposed; they make it possible to apply existing digital signal processing methods to genomic data. One of the most widely used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach to the cluster analysis of DNA sequences that is based on GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the inspection and analysis of the results and of possible hidden behaviors. Our results support the feasibility of using the proposed method to find and easily visualize interesting features of sets of DNA data.
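The abstract does not name the GSP mapping, so the sketch below swaps in one common choice as an assumption: the Voss indicator mapping (one 0/1 signal per base) followed by a DFT power spectrum. The spectrum is pooled into a fixed number of frequency bands so that sequences of different lengths give vectors of equal size, and those vectors are clustered with a plain NumPy implementation of Lloyd's K-means. All function names, the band count and the normalization are illustrative.

```python
import numpy as np

BASES = "ACGT"


def spectral_features(seq: str, n_bins: int = 16) -> np.ndarray:
    """Voss indicator mapping + DFT power spectrum, pooled into n_bins bands.

    Band powers are divided by their total so that sequences of different
    lengths give comparable vectors (an assumed feature design).
    """
    seq = seq.upper()
    freqs = np.fft.rfftfreq(len(seq))  # normalized frequencies in [0, 0.5]
    power = np.zeros_like(freqs)
    for base in BASES:
        indicator = np.array([1.0 if s == base else 0.0 for s in seq])
        power += np.abs(np.fft.rfft(indicator)) ** 2
    bands, _ = np.histogram(freqs, bins=n_bins, range=(0.0, 0.5), weights=power)
    total = bands.sum()
    return bands / total if total > 0 else bands


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centre if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def cluster_sequences(seqs, k: int, seed: int = 0) -> np.ndarray:
    """Map each sequence to spectral features, then cluster with K-means."""
    X = np.vstack([spectral_features(s) for s in seqs])
    labels, _ = kmeans(X, k, seed=seed)
    return labels
```

Periodic sequences with different repeat units (for example, repeats of `"AAAATTTT"` versus `"ACGT"`) concentrate power in different bands, so they fall into separate clusters regardless of their lengths.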


Parasitology ◽  
2004 ◽  
Vol 128 (5) ◽  
pp. 511-519 ◽  
Author(s):  
I. D. Whittington ◽  
M. R. Deveney ◽  
J. A. T. Morgan ◽  
L. A. Chisholm ◽  
R. D. Adlard

Phylogenetic relationships within the Capsalidae (Monogenea) were examined using large subunit ribosomal DNA sequences from 17 capsalid species (representing 7 genera and 5 subfamilies), 2 outgroup taxa (Monocotylidae) plus Udonella caligorum (Udonellidae). Trees were constructed using maximum likelihood, minimum evolution and maximum parsimony algorithms. An initial tree, generated from sequences 315 bases long, suggests that the Capsalinae, Encotyllabinae, Entobdellinae and Trochopodinae are monophyletic, but that the Benedeniinae is paraphyletic. The analyses indicate that Neobenedenia, currently in the Benedeniinae, should perhaps be placed in a separate subfamily. An additional analysis omitted 3 capsalid taxa (for which only short sequences were available) and all outgroup taxa because of alignment difficulties; this increased the sequence length to 693 bases, and good branch support was achieved. The Benedeniinae was again paraphyletic. Higher-level classification of the Capsalidae, evolution of the Entobdellinae and issues of species identity in Neobenedenia are discussed.
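The maximum parsimony criterion scores a tree by the minimum number of character changes it requires. As an illustration only (not the software used in the study), the sketch below applies Fitch's small-parsimony algorithm to each column of a toy, ungapped alignment on a fixed rooted binary tree written as nested tuples. Tree search, gaps and ambiguity codes are not handled, and the taxon names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one alignment column: (state set, min. changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum of Fitch changes over all columns of an ungapped alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total
```

For the toy alignment `{"a": "AAG", "b": "AAG", "c": "GCT", "d": "GCT"}`, the topology `(("a", "b"), ("c", "d"))` needs 3 changes and `(("a", "c"), ("b", "d"))` needs 6, so parsimony prefers the first.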


Methods ◽  
2019 ◽  
Vol 166 ◽  
pp. 66-73 ◽  
Author(s):  
Neo Christopher Chung ◽  
Bilal Mirza ◽  
Howard Choi ◽  
Jie Wang ◽  
Ding Wang ◽  
...  

Author(s):  
Abdelmalek Amine ◽  
Zakaria Elberrichi ◽  
Michel Simonet ◽  
Ladjel Bellatreche ◽  
Mimoun Malki

The classification of textual documents has been the subject of many studies. Technologies such as the Web and digital libraries have driven the exponential growth of available documentation. Classifying textual documents is very important because it allows users to skim large corpora quickly and effectively and to understand their contents better. Most classification approaches rely on supervised training, which is better suited to small corpora and to settings where human experts are available to define the best classes for the training phase; this is not always feasible. Unsupervised classification, or “clustering”, methods bring out latent (hidden) classes automatically with minimal human intervention. There are many such methods, and Kohonen's Self-Organizing Map (SOM) is one unsupervised classification algorithm that gathers similar objects into groups without a priori knowledge. This chapter introduces the concept of unsupervised classification of textual documents and presents an experiment that combines a conceptual approach to text representation with Kohonen's method for clustering.
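A Kohonen SOM can be sketched in a few lines of NumPy: each map unit holds a weight vector, the best-matching unit (BMU) for an input is the unit with the closest weight vector, and the BMU and its grid neighbours are pulled toward the input with a learning rate and a Gaussian neighbourhood that shrink during training. The chapter's conceptual text representation is not reproduced here; the sketch assumes documents are already encoded as numeric vectors with values in [0, 1] (for instance, scaled term weights), and the linear decay schedules and parameter names are illustrative choices.

```python
import numpy as np


def best_matching_unit(weights: np.ndarray, x: np.ndarray) -> tuple:
    """Grid coordinates (row, col) of the unit whose weight vector is closest to x."""
    dists = ((weights - x) ** 2).sum(axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))


def train_som(X: np.ndarray, rows: int, cols: int, n_epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> np.ndarray:
    """Online Kohonen SOM training with linearly decaying rate and radius."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, X.shape[1]))
    # (rows, cols, 2) array holding each unit's grid coordinates.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    n_steps = n_epochs * len(X)
    step = 0
    for _ in range(n_epochs):
        for idx in rng.permutation(len(X)):
            x = X[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # keep a minimal neighbourhood
            bmu = np.array(best_matching_unit(weights, x))
            d2 = ((grid - bmu) ** 2).sum(axis=-1)  # squared grid distance to the BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood weights
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights
```

After training, each document is assigned to its BMU, e.g. `best_matching_unit(W, doc_vector)`; documents that share a unit (or neighbouring units) form a cluster.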


Author(s):  
Liyang Zhu ◽  
Jungang Han ◽  
Renwen Guo ◽  
Dong Wu ◽  
Qiang Wei ◽  
...  

Background: Osteonecrosis of the Femoral Head (ONFH) is a common complication in orthopaedics, in which femoral structures are usually damaged by the impairment or interruption of the blood supply to the femoral head. Aim: In this study, we propose an automatic deep learning approach for classifying early ONFH. Methods: We first use a Convolutional Neural Network (CNN) to classify all femoral CT slices according to their spatial location, dividing them into the upper, middle and lower segments of the femoral head. The femoral head area in each part is then segmented with a Conditional Generative Adversarial Network (CGAN). A Convolutional Autoencoder is employed to reduce dimensionality and extract features of the femoral head, and finally K-means clustering performs the unsupervised classification of early ONFH. Results: To validate the effectiveness of the proposed approach, we carried out experiments on a dataset of 120 patients. The segmentation accuracy is higher than 95%. The Convolutional Autoencoder reduces the dimensionality of the data, with Peak Signal-to-Noise Ratios (PSNRs) between inputs and outputs above 34 dB. The results also show high intra-category similarity and significant inter-category differences. Conclusion: Research on classifying early ONFH has valuable clinical merit and may help physicians provide more individualized treatment for patients.
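The reported PSNR compares autoencoder inputs with their reconstructions through PSNR = 10·log10(MAX² / MSE), where MAX is the dynamic range of the pixel values and MSE is the mean squared error. The minimal sketch below defaults to a range of 255 (8-bit images), which is an assumption that may not match the CT data in the study; under that assumption, 34 dB corresponds to a root-mean-square error of about 5 grey levels.

```python
import math

import numpy as np


def psnr(reference: np.ndarray, reconstruction: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between an input and its reconstruction."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    if ref.shape != rec.shape:
        raise ValueError("inputs must have the same shape")
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Higher values mean more faithful reconstructions; the `data_range` argument should match the intensity scaling actually applied to the images.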

