DeepHBV: A deep learning model to predict hepatitis B virus (HBV) integration sites.

2021 ◽  
Author(s):  
Canbiao Wu ◽  
Xiaofang Guo ◽  
Mengyuan Li ◽  
Xiayu Fu ◽  
Zeliang Hou ◽  
...  

Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by learning local genomic features automatically. We trained and tested DeepHBV using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 on the dataset. Adding repeat peaks and TCGA Pan Cancer peaks significantly improved model performance, with AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. On an independent validation dataset of HBV integration sites from VISDB, DeepHBV trained on HBV integration sequences plus TCGA Pan Cancer peaks (AUROC of 0.7603 and AUPR of 0.6189) performed better than DeepHBV trained on HBV integration sequences plus repeat peaks (AUROC of 0.6657 and AUPR of 0.5737). Next, we found that transcription factor binding sites (TFBS) were significantly enriched near the genomic positions that received attention from the convolutional neural network. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, and Foxo3 were highlighted by the DeepHBV attention mechanism in both the dsVIS and VISDB datasets, revealing the HBV integration preference. In summary, DeepHBV is a robust and explainable deep learning model, useful not only for predicting HBV integration sites but also for further mechanistic study of HBV-induced cancer.
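The two metrics quoted throughout this abstract, AUROC and AUPR, can be illustrated with a minimal pure-Python sketch. It is not taken from DeepHBV: the function names `auroc` and `average_precision` and the toy labels and scores are hypothetical. AUROC is computed as the fraction of positive/negative pairs ranked correctly, and AUPR is approximated by average precision, which is only one of several common estimators.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@k taken at the rank of each positive.

    Assumption: tied scores are ordered arbitrarily (no special tie handling).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy example (hypothetical data): 1 = integration site, 0 = background.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(toy_labels, toy_scores))              # 3 of 4 pairs ordered correctly
print(average_precision(toy_labels, toy_scores))  # mean of 1/1 and 2/3
```

An AUROC of 0.5 corresponds to random ranking, which is why the initial 0.6363 reads as weak and 0.9430 as strong.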

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Canbiao Wu ◽  
Xiaofang Guo ◽  
Mengyuan Li ◽  
Jingxian Shen ◽  
Xiayu Fu ◽  
...  

Background: The hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. HBV integration is one of the key steps in the virus-promoted malignant transformation. Results: An attention-based deep learning model, DeepHBV, was developed to predict HBV integration sites. By learning local genomic features automatically, DeepHBV was trained and tested using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 for the dataset. The integration of genomic features of repeat peaks and TCGA Pan-Cancer peaks significantly improved model performance, with AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. The transcription factor binding sites (TFBS) were significantly enriched near the genomic positions that were considered. The binding sites of the AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, and Foxo3 were highlighted by DeepHBV in both the dsVIS and VISDB datasets, revealing a novel integration preference for HBV. Conclusions: DeepHBV is a useful tool for predicting HBV integration sites, revealing novel insights into HBV integration-related carcinogenesis.
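The general idea behind an attention layer over a DNA sequence can be sketched as follows. This is an illustration only and does not reproduce the DeepHBV architecture, which the abstract does not specify: the helpers `one_hot`, `softmax`, and `attention_pool`, and the query vector, are hypothetical. Each position receives a score, the scores are normalized with a softmax, and the resulting weights indicate which positions the model emphasizes.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of length 4; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Dot-product attention: score each position against `query`, then pool.

    Returns (weights, pooled), where weights sum to 1 across positions and
    pooled is the attention-weighted average of the feature rows.
    """
    scores = [sum(f * q for f, q in zip(row, query)) for row in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * row[j] for w, row in zip(weights, features)) for j in range(dim)]
    return weights, pooled


# Hypothetical query that favors adenine; in a trained model it would be learned.
weights, pooled = attention_pool(one_hot("ACGT"), [2.0, 0.0, 0.0, 0.0])
print(weights)  # the first position (A) receives the largest weight
```

In a trained model the per-position weights are what allow attention-intensive positions to be compared against known transcription factor binding sites.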


Author(s):  
Rui Tian ◽  
Ping Zhou ◽  
Mengyuan Li ◽  
Jinfeng Tan ◽  
Zifeng Cui ◽  
...  

Human papillomavirus (HPV) integration into the human genome is the main cause of cervical carcinogenesis. The selection of HPV integration sites depends strongly on the local genomic environment, which makes it possible to predict HPV integration sites. However, no published bioinformatic tool has been available to date. Thus, we developed an attention-based deep learning model, DeepHPV, to predict HPV integration sites by learning environment features automatically. In total, 3608 known HPV integration sites were used to train the model, and 584 reviewed HPV integration sites were used as the testing dataset. DeepHPV showed an area under the receiver operating characteristic curve (AUROC) of 0.6336 and an area under the precision-recall curve (AUPR) of 0.5670. Adding RepeatMasker and TCGA Pan Cancer peaks improved the model performance to 0.8464 and 0.8501 in AUROC and 0.7985 and 0.8106 in AUPR, respectively. Next, we tested these trained models on the independent database VISDB and found that the model with added TCGA Pan Cancer peaks performed better (AUROC: 0.7175, AUPR: 0.6284) than the model with added RepeatMasker peaks (AUROC: 0.6102, AUPR: 0.5577). Moreover, we introduced an attention mechanism into DeepHPV and found that transcription factor binding sites, including BHLHA15, CHR, COUP-TFII, DMRTA2, E2A, HIC1, INR, NPAS, Nr5a2, RARa, SCL, Snail1, Sox10, Sox3, Sox4, Sox6, STAT6, Tbet, Tbx5, TEAD, Tgif2, ZNF189, and ZNF416, were enriched near attention-intensive sites. Together, these results show that DeepHPV is a robust and explainable deep learning model that provides new insights into HPV integration preference and mechanism. Availability: DeepHPV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepHPV.git.


2021 ◽  
Vol 27 (1) ◽  
pp. 207-218
Author(s):  
Jeong Won Jang ◽  
Jin Seoub Kim ◽  
Hye Seon Kim ◽  
Kwon Yong Tak ◽  
Heechul Nam ◽  
...  

Background/Aims: The role of hepatitis B virus (HBV) integration into the host genome in hepatocarcinogenesis following hepatitis B surface antigen (HBsAg) seroclearance remains unknown. Our study aimed to investigate and characterize HBV integration events in chronic hepatitis B (CHB) patients who developed hepatocellular carcinoma (HCC) after HBsAg seroclearance.
Methods: Using probe-based HBV capturing followed by next-generation sequencing technology, HBV integration was examined in 10 samples (seven tumors and three non-tumor tissues) from seven chronic carriers who developed HCC after HBsAg loss. Genomic locations and patterns of HBV integration were investigated.
Results: HBV integration was observed in six patients (85.7%) and in eight of the 10 tested samples (80.0%). HBV integration breakpoints were detected in all of the non-tumor samples (3/3, 100%) and in five of the seven tumor samples (71.4%), with an average number of breakpoints of 4.00 and 2.43, respectively. Despite the lower total number of tumoral integration breakpoints, HBV integration sites in the tumors were more enriched within the genic area. In contrast, non-tumor tissues more often showed intergenic integration. Regarding functions of the affected genes, tumoral genes with HBV integration were mostly associated with carcinogenesis. At enrollment, patients who did not remain under regular HCC surveillance after HBsAg seroclearance had a large HCC, while those on regular surveillance had a small HCC.
Conclusions: The biological functions of HBV integration are almost comparable between HBsAg-positive and HBsAg-serocleared HCCs, with continuing pro-oncogenic effects of HBV integration. Thus, ongoing HCC surveillance and clinical management should continue even after HBsAg seroclearance in patients with CHB.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yi Yin ◽  
Mingyue Xue ◽  
Lingen Shi ◽  
Tao Qiu ◽  
Derun Xia ◽  
...  

Objective. To establish a machine learning model for identifying patients coinfected with hepatitis B virus (HBV) and human immunodeficiency virus (HIV) through two sexual transmission routes in Jiangsu, China. Methods. A total of 14197 HIV cases transmitted by homosexual and heterosexual routes were recruited. After data processing, 12469 cases (HIV and HBV, 1033; HIV, 11436) were left for further analysis, including 7849 cases with homosexual transmission and 4620 cases with heterosexual transmission. Univariate logistic regression was used to select variables with a significant P value and odds ratio for multivariable analysis. In the homosexual and heterosexual transmission groups, 10 and 6 variables were selected, respectively. For identifying HIV individuals coinfected with HBV, a machine learning model was constructed with four algorithms: Decision Tree, Random Forest, AdaBoost with decision tree (AdaBoost), and extreme gradient boosting decision tree (XGBoost). The detective value of each variable was calculated using the optimal machine learning algorithm. Results. The AdaBoost algorithm showed the highest efficiency in both transmission groups (homosexual transmission group: accuracy = 0.928, precision = 0.915, recall = 0.944, F1 = 0.930, and AUC = 0.96; heterosexual transmission group: accuracy = 0.892, precision = 0.881, recall = 0.905, F1 = 0.893, and AUC = 0.98). As calculated by the AdaBoost algorithm, the detective value of PLA was the highest in the homosexual transmission group, followed by CR, AST, HB, ALT, TBIL, leucocyte, age, marital status, and treatment condition; in the heterosexual transmission group, the detective value of PLA was also the highest, followed by ALT, AST, TBIL, leucocyte, and symptom severity. Conclusions. Univariate logistic regression combined with the AdaBoost algorithm could accurately screen the risk factors of HBV in HIV coinfection without invasive testing. Further studies are needed to evaluate the utility and feasibility of this model in various settings.
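As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall values quoted in the abstract; the function name `f1_score` and the dictionary layout are illustrative, not from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract for AdaBoost.
reported = {
    "homosexual transmission group": (0.915, 0.944, 0.930),
    "heterosexual transmission group": (0.881, 0.905, 0.893),
}

for group, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{group}: recomputed F1 = {f1:.4f}, reported F1 = {f1_reported:.3f}")
```

Because the published precision and recall are themselves rounded to three decimals, a recomputed F1 can differ from the reported value in the last digit.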


2021 ◽  
Author(s):  
Jérôme Tubiana ◽  
Dina Schneidman-Duhovny ◽  
Haim J. Wolfson

Predicting the functional sites of a protein from its structure, such as the binding sites of small molecules, other proteins or antibodies sheds light on its function in vivo. Currently, two classes of methods prevail: Machine Learning (ML) models built on top of handcrafted features and comparative modeling. They are respectively limited by the expressivity of the handcrafted features and the availability of similar proteins. Here, we introduce ScanNet, an end-to-end, interpretable geometric deep learning model that learns features directly from 3D structures. ScanNet builds representations of atoms and amino acids based on the spatio-chemical arrangement of their neighbors. We train ScanNet for detecting protein-protein and protein-antibody binding sites, demonstrate its accuracy - including for unseen protein folds - and interpret the filters learned. Finally, we predict epitopes of the SARS-CoV-2 spike protein, validating known antigenic regions and predicting previously uncharacterized ones. Overall, ScanNet is a versatile, powerful, and interpretable model suitable for functional site prediction tasks. A webserver for ScanNet is available from http://bioinfo3d.cs.tau.ac.il/ScanNet/


1989 ◽  
Vol 9 (4) ◽  
pp. 1804-1809 ◽  
Author(s):  
R Ben-Levy ◽  
O Faktor ◽  
I Berger ◽  
Y Shaul

An 83-base-pair-long hepatitis B virus DNA fragment efficiently activates the transcription of the heterologous globin gene promoter. This fragment contains binding sites for at least four distinct cellular factors termed E, TGT3, EP, and NF-I. E is a positively acting factor, responsive to phorbol ester. EP is apparently identical to the factor EF-C that binds to the polyomavirus enhancer. The conservation of the binding site sequences for most of these factors in the genomes of other members of the hepadnavirus family suggests that these viruses share common enhancer elements.

