Deep learning for HGT insertion sites recognition

BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Chen Li ◽  
Jiaxing Chen ◽  
Shuai Cheng Li

Abstract Background: Horizontal Gene Transfer (HGT) refers to the sharing of genetic material between distant species that are not in a parent-offspring relationship. HGT insertion sites are important for understanding HGT mechanisms. Recent studies of the main agents of HGT, such as transposons and plasmids, demonstrate that insertion sites usually hold specific sequence features. This motivates us to find a method to infer HGT insertion sites according to sequence features. Results: In this paper, we propose a deep residual network, DeepHGT, to recognize HGT insertion sites. To train DeepHGT, we extracted about 1.55 million sequence segments as training instances from 262 metagenomic samples, where the ratio between positive and negative instances is about 1:1. These segments were randomly partitioned into three subsets: 80% as the training set, 10% as the validation set, and the remaining 10% as the test set. The training loss of DeepHGT is 0.4163 and the validation loss is 0.423. On the test set, DeepHGT achieved an area under the curve (AUC) value of 0.8782. To further evaluate the generalization of DeepHGT, we constructed an independent test set containing 689,312 sequence segments from another 147 gut metagenomic samples. DeepHGT achieved an AUC value of 0.8428, which approaches the previous test AUC value. As a comparison, the gradient boosting classifier model implemented in PyFeat achieved AUC values of 0.694 and 0.686 on the above two test sets, respectively. Furthermore, DeepHGT could learn discriminative sequence features; for example, it learned a sequence pattern of palindromic subsequences as a significant (P-value = 0.0182) local feature. Hence, DeepHGT is a reliable model for recognizing HGT insertion sites. Conclusion: DeepHGT is the first deep learning model that can accurately recognize HGT insertion sites on genomes according to the sequence pattern.
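As a rough illustration of the kind of model described above, the sketch below builds a small 1-D residual network over one-hot encoded DNA segments for binary insertion-site classification. The segment length, filter counts, and depth are illustrative assumptions, not the published DeepHGT architecture.

```python
# A minimal 1-D residual network over one-hot encoded DNA segments
# (binary classification: insertion site vs. background).
# SEG_LEN, filter sizes, and depth are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

SEG_LEN = 100  # assumed segment length in bp

def residual_block(x, filters=64):
    shortcut = x
    y = layers.Conv1D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv1D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)

inputs = layers.Input(shape=(SEG_LEN, 4))             # A/C/G/T one-hot channels
x = layers.Conv1D(64, 7, padding="same", activation="relu")(inputs)
x = residual_block(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)    # P(insertion site)

model = Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])       # AUC, as reported above
```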

2021 ◽  
Vol 10 (8) ◽  
pp. 1772
Author(s):  
Hyun-Doo Moon ◽  
Han-Gyeol Choi ◽  
Kyong-Joon Lee ◽  
Dong-Jun Choi ◽  
Hyun-Jin Yoo ◽  
...  

A weight-bearing whole-leg radiograph (WLR) is essential to assess lower limb alignment, such as the weight-bearing line (WBL) ratio. The purpose of this study was to develop a deep learning (DL) model that predicts the WBL ratio using a standing knee AP radiograph alone. A total of 3997 knee AP radiographs and WLRs were used. The WBL ratio was used for labeling and for analysis of prediction accuracy, and was divided into seven categories (0, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6). After training, the performance of the DL model was evaluated; final performance was assessed using 386 subjects as a test set. The cumulative score (CS) within an error range of 0.1 was adopted because it showed the maximum CS in the validation set (95% CI, 0.924–0.970). In the test set, the mean absolute error was 0.054 (95% CI, 0.048–0.061) and the CS was 0.951 (95% CI, 0.924–0.970). The developed DL algorithm could predict the WBL ratio from a standing knee AP radiograph alone, with accuracy comparable to the degree to which a primary physician can assess alignment. It can serve as the basis for an automated lower limb alignment assessment tool that can be used easily and cost-effectively in primary clinics.
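For concreteness, the snippet below computes the two evaluation metrics reported above, mean absolute error and the cumulative score within a 0.1 error band, on placeholder WBL-ratio values (not the study data).

```python
import numpy as np

y_true = np.array([0.4, 0.5, 0.3, 0.6, 0.2])       # ground-truth WBL ratios (placeholders)
y_pred = np.array([0.45, 0.48, 0.35, 0.50, 0.22])  # model predictions (placeholders)

errors = np.abs(y_pred - y_true)
mae = errors.mean()                  # mean absolute error
cs = np.mean(errors <= 0.1)          # cumulative score within error range 0.1
print(f"MAE = {mae:.3f}, CS(0.1) = {cs:.3f}")
```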


2018 ◽  
pp. 1-8 ◽  
Author(s):  
Okyaz Eminaga ◽  
Nurettin Eminaga ◽  
Axel Semjonow ◽  
Bernhard Breil

Purpose: The recognition of cystoscopic findings remains challenging for young colleagues and depends on the examiner’s skills. Computer-aided diagnosis tools using feature extraction and deep learning show promise as instruments to perform diagnostic classification. Materials and Methods: Our study considered 479 patient cases that represented 44 urologic findings. Image color was linearly normalized and was equalized by applying contrast-limited adaptive histogram equalization. Because these findings can be viewed via cystoscopy from every possible angle and side, we ultimately generated images rotated in 10-degree grades and flipped them vertically or horizontally, which resulted in 18,681 images. After image preprocessing, we developed deep convolutional neural network (CNN) models (ResNet50, VGG-19, VGG-16, InceptionV3, and Xception) and evaluated these models using F1 scores. Furthermore, we proposed two CNN concepts: 90%-previous-layer filter size and harmonic-series filter size. A training set (60%), a validation set (10%), and a test set (30%) were randomly generated from the study data set. All models were trained on the training set, validated on the validation set, and evaluated on the test set. Results: The Xception-based model achieved the highest F1 score (99.52%), followed by models based on ResNet50 (99.48%) and the harmonic-series concept (99.45%). All images with cancer lesions were correctly identified by these models. When the focus was on the images misclassified by the best-performing model, 7.86% of images showing bladder stones with an indwelling catheter and 1.43% of images showing bladder diverticulum were falsely classified. Conclusion: The results of this study show the potential of deep learning for the diagnostic classification of cystoscopic images. Future work will focus on the integration of artificial intelligence–aided cystoscopy into clinical routines and possibly expansion to other clinical endoscopy applications.
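The augmentation scheme described above (rotations in 10-degree grades plus vertical and horizontal flips) can be sketched as follows; the file path is hypothetical and the exact preprocessing order may differ from the authors' pipeline.

```python
from PIL import Image, ImageOps

def augment(img):
    """Rotate in 10-degree grades and add horizontal/vertical flips."""
    variants = []
    for angle in range(0, 360, 10):              # 36 rotations
        rotated = img.rotate(angle)
        variants.append(rotated)
        variants.append(ImageOps.mirror(rotated))   # horizontal flip
        variants.append(ImageOps.flip(rotated))     # vertical flip
    return variants

# Hypothetical usage:
# variants = augment(Image.open("cystoscopy_finding.png"))
```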


2019 ◽  
Author(s):  
Jungirl Seok ◽  
Jae-Jin Song ◽  
Ja-Won Koo ◽  
Hee Chan Kim ◽  
Byung Yoon Choi

Abstract Objectives: The purpose of this study was to create a deep learning model for the detection and segmentation of major structures of the tympanic membrane. Methods: A total of 920 stored tympanic endoscopic images were obtained retrospectively. We constructed a detection and segmentation model using Mask R-CNN with a ResNet-50 backbone, targeting three clinically meaningful structures: (1) the tympanic membrane (TM); (2) the malleus with the side of the tympanic membrane; and (3) the suspected perforation area. The images were randomly divided into three sets – training set, validation set, and test set – at a ratio of 0.6:0.2:0.2, resulting in 548, 187, and 185 images, respectively. After assignment, the 548 training-set images were augmented 50 times each, reaching 27,400 images. Results: At its most optimized point, the model achieved a mean average precision of 92.9% on the test set. When an Intersection over Union (IoU) score greater than 0.5 was used as the reference point, the tympanic membrane was detected in 100% of images, the accuracy of the side of the tympanic membrane based on the malleus segmentation was 88.6%, and the detection accuracy of suspected perforation was 91.4%. Conclusions: Anatomical segmentation may allow the inclusion of an explanation provided by deep learning as part of the results. This method is applicable not only to tympanic endoscopy but also to sinus endoscopy, laryngoscopy, and stroboscopy. Finally, it can be the starting point for the development of an automated medical record descriptor for endoscopic images.
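The IoU criterion used above can be illustrated with a short sketch that computes intersection over union between a predicted and a ground-truth binary mask and applies the 0.5 detection threshold; the toy masks are for illustration only.

```python
import numpy as np

def mask_iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

pred = np.zeros((64, 64)); pred[10:40, 10:40] = 1   # toy predicted mask
gt = np.zeros((64, 64));   gt[15:45, 15:45] = 1     # toy ground-truth mask
iou = mask_iou(pred, gt)
print(f"IoU = {iou:.2f}, counted as detected: {iou > 0.5}")
```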


Author(s):  
Jeremy Irvin ◽  
Pranav Rajpurkar ◽  
Michael Ko ◽  
Yifan Yu ◽  
Silviana Ciurea-Ilcus ◽  
...  

Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.
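Two of the simplest uncertainty-handling policies that can be applied to such labels, mapping the uncertain value to negative ("U-Zeros") or to positive ("U-Ones") before training, are sketched below. The column names follow the released CheXpert CSVs, but the data frame here is illustrative, and the paper evaluates additional policies as well.

```python
import pandas as pd

# 1.0 = positive, 0.0 = negative, -1.0 = uncertain, NaN = not mentioned
labels = pd.DataFrame({"Cardiomegaly": [1.0, 0.0, -1.0, None],
                       "Edema":        [-1.0, 1.0, 0.0, -1.0]})

u_zeros = labels.fillna(0.0).replace(-1.0, 0.0)   # U-Zeros: uncertain -> negative
u_ones  = labels.fillna(0.0).replace(-1.0, 1.0)   # U-Ones:  uncertain -> positive
print(u_ones)
```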


2021 ◽  
Vol 11 ◽  
Author(s):  
Yong Tang ◽  
Yingjun Zheng ◽  
Xinpei Chen ◽  
Weijia Wang ◽  
Qingxi Guo ◽  
...  

Background: Development and validation of a deep learning method to automatically segment the peri-ampullary (PA) region in magnetic resonance imaging (MRI) images. Methods: A group of patients with or without periampullary carcinoma (PAC) was included. The PA regions were manually annotated in MRI images by experts. Patients were randomly divided into a training set, a validation set, and a test set. Deep learning methods were developed to automatically segment the PA region in MRI images. The segmentation performance of the methods was compared on the validation set, and the model with the highest intersection over union (IoU) was evaluated on the test set. Results: The deep learning algorithm achieved optimal accuracies in the segmentation of the PA regions in both T1 and T2 MRI images. The IoU values were 0.68, 0.68, and 0.64 for T1, T2, and the combination of T1 and T2 images, respectively. Conclusions: The deep learning algorithm is promising, achieving segmentation of the PA region in MRI images concordant with manual human assessment. This automated, non-invasive method helps clinicians identify and locate the PA region on preoperative MRI scans.
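A minimal sketch of the model-selection step described above, picking the candidate with the highest mean validation IoU before evaluating it on the test set, might look like the following; the model names and scores are placeholders, not the study's results.

```python
import numpy as np

val_ious = {                      # per-case validation IoU for each candidate (placeholders)
    "model_t1":   np.array([0.70, 0.66, 0.69]),
    "model_t2":   np.array([0.71, 0.64, 0.68]),
    "model_t1t2": np.array([0.65, 0.62, 0.66]),
}
best = max(val_ious, key=lambda name: val_ious[name].mean())
print("selected for the test set:", best, round(val_ious[best].mean(), 3))
```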


Diagnostics ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1943
Author(s):  
Diego R. Cervera ◽  
Luke Smith ◽  
Luis Diaz-Santana ◽  
Meenakshi Kumar ◽  
Rajiv Raman ◽  
...  

The aim of this study was to develop and validate a deep learning-based system to detect diabetic peripheral neuropathy (DN) from retinal colour images in people with diabetes. Retinal images from 1561 people with diabetes were used to predict DN, diagnosed on vibration perception threshold. A total of 189 had diabetic retinopathy (DR), 276 had DN, and 43 had both DR and DN. 90% of the images were used for training and validation and 10% for testing. Deep neural networks, including SqueezeNet, Inception, and DenseNet, were utilized, and the architectures were tested with and without pre-trained weights. Random transforms of the images were applied during training. The algorithm was trained and tested using three sets of data: all retinal images, images without DR, and images with DR. The area under the ROC curve (AUC) was used to evaluate performance. The AUC for predicting DN on the whole cohort was 0.8013 (±0.0257) on the validation set and 0.7097 (±0.0031) on the test set. The AUC increased to 0.8673 (±0.0088) in the presence of DR. Retinal images can be used to identify individuals with DN and provide an opportunity to educate patients about their DN status when they attend DR screening.
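A minimal sketch of the transfer-learning setup described above, loading a DenseNet backbone with or without pre-trained weights and attaching a single DN/no-DN output, is shown below; the input size and the choice of DenseNet121 are assumptions, and the training loop is omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

def build_dn_classifier(pretrained=True):
    base = DenseNet121(weights="imagenet" if pretrained else None,
                       include_top=False, pooling="avg",
                       input_shape=(224, 224, 3))            # assumed input size
    output = layers.Dense(1, activation="sigmoid")(base.output)  # P(DN)
    return Model(base.input, output)

model = build_dn_classifier(pretrained=True)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```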


2004 ◽  
Vol 186 (21) ◽  
pp. 7280-7289 ◽  
Author(s):  
Danielle A. Garsin ◽  
Jonathan Urbach ◽  
Jose C. Huguet-Tapia ◽  
Joseph E. Peters ◽  
Frederick M. Ausubel

ABSTRACT Sequencing the insertion sites of 8,865 Tn917 insertions in Enterococcus faecalis strain OG1RF identified a hot spot in the replication terminus region corresponding to 6% of the genome where 65% of the transposons had inserted. In E. faecalis, Tn917 preferentially inserted at a 29-bp consensus sequence centered on TATAA, a 5-bp sequence that is duplicated during insertion. The regional insertion site preference at the chromosome terminus was not observed in another low-G+C gram-positive bacterium, Listeria monocytogenes, although the consensus insertion sequence was the same. The 8,865 Tn917 insertion sites sequenced in E. faecalis corresponded to only ∼610 different open reading frames, far fewer than the predicted number of 2,400, assuming random insertion. There was no significant preference in orientation of the Tn917 insertions with either transcription or replication. Even though OG1RF has a smaller genome than strain V583 (2.8 Mb versus 3.2 Mb), the only E. faecalis strain whose sequence is in the public domain, over 10% of the Tn917 insertions appear to be in an OG1RF-specific sequence, suggesting that there are significant genomic differences among E. faecalis strains.
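The hot-spot enrichment reported above follows from simple arithmetic on the stated figures (65% of 8,865 insertions falling in 6% of the genome), sketched below.

```python
total_insertions = 8865
frac_in_hotspot = 0.65      # fraction of insertions in the terminus hot spot
frac_of_genome = 0.06       # fraction of the genome the hot spot covers

observed = frac_in_hotspot * total_insertions        # ~5762 insertions observed
expected_uniform = frac_of_genome * total_insertions # ~532 expected under uniform insertion
fold_enrichment = frac_in_hotspot / frac_of_genome   # ~10.8-fold over random
print(round(observed), round(expected_uniform), round(fold_enrichment, 1))
```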


Biology ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1140
Author(s):  
Woohyuk Jang ◽  
Eui Chul Lee

Owing to climate change and indiscriminate human development, the populations of endangered species have been decreasing. To protect endangered species, many countries worldwide have adopted the CITES treaty to prevent the extinction of endangered plants and animals. Moreover, research has been conducted using diverse approaches, particularly deep learning-based animal and plant image recognition methods. In this paper, we propose an automated image classification method for 11 endangered parrot species included in CITES. The 11 species include subspecies that are very similar in appearance. Images were collected from the Internet and, in cooperation with Seoul Grand Park Zoo, used to build an indigenous database. The dataset for deep learning training consisted of a 70% training set, 15% validation set, and 15% test set. In addition, a data augmentation technique was applied to mitigate the limited data collection and prevent overfitting. The performance of various backbone CNN architectures (i.e., VGGNet, ResNet, and DenseNet) was compared using the SSD model. The trained models were evaluated on the test set images, and the results show that DenseNet18 had the best performance, with an mAP of approximately 96.6% and an inference time of 0.38 s.
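The 70/15/15 split described above can be reproduced with two consecutive stratified splits, as sketched below on hypothetical image paths (not the actual parrot dataset).

```python
from sklearn.model_selection import train_test_split

image_paths = [f"parrot_{i:04d}.jpg" for i in range(1000)]   # hypothetical files
species_ids = [i % 11 for i in range(1000)]                  # 11 species labels

train_x, rest_x, train_y, rest_y = train_test_split(
    image_paths, species_ids, test_size=0.30, stratify=species_ids, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.50, stratify=rest_y, random_state=42)

print(len(train_x), len(val_x), len(test_x))   # 700 / 150 / 150
```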


Cancers ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 913
Author(s):  
Johannes Fahrmann ◽  
Ehsan Irajizad ◽  
Makoto Kobayashi ◽  
Jody Vykoukal ◽  
Jennifer Dennison ◽  
...  

MYC is an oncogenic driver in the pathogenesis of ovarian cancer. We previously demonstrated that MYC regulates polyamine metabolism in triple-negative breast cancer (TNBC) and that a plasma polyamine signature is associated with TNBC development and progression. We hypothesized that a similar plasma polyamine signature may be associated with ovarian cancer (OvCa) development. Using mass spectrometry, four polyamines were quantified in plasma from 116 OvCa cases and 143 controls (71 healthy controls + 72 subjects with benign pelvic masses) (Test Set). Findings were validated in an independent plasma set from 61 early-stage OvCa cases and 71 healthy controls (Validation Set). Complementarity of polyamines with CA125 was also evaluated. The receiver operating characteristic area under the curve (AUC) of individual polyamines for distinguishing cases from healthy controls ranged from 0.74 to 0.88. A polyamine signature consisting of diacetylspermine + N-(3-acetamidopropyl)pyrrolidin-2-one in combination with CA125, developed in the Test Set, yielded improved sensitivity at >99% specificity relative to CA125 alone (73.7% vs. 62.2%; McNemar exact test, 2-sided P: 0.019) in the Validation Set and captured 30.4% of cases that were missed with CA125 alone. Our findings reveal a MYC-driven plasma polyamine signature associated with OvCa that complemented CA125 in detecting early-stage ovarian cancer.
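The "sensitivity at >99% specificity" comparison above can be illustrated as follows: choose the score threshold that keeps at least 99% of controls below it, then measure the fraction of cases above it. The scores below are simulated placeholders, not the study data.

```python
import numpy as np

def sens_at_spec(case_scores, control_scores, spec=0.99):
    thr = np.quantile(control_scores, spec)   # threshold keeping `spec` of controls negative
    return np.mean(case_scores > thr)

rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, 200)   # scores for healthy controls (simulated)
cases = rng.normal(1.5, 1.0, 100)      # scores for early-stage OvCa cases (simulated)
print(f"sensitivity at 99% specificity: {sens_at_spec(cases, controls):.2f}")
```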

