DeepHBV: a deep learning model to predict hepatitis B virus (HBV) integration sites

2021, Vol 21 (1)
Author(s): Canbiao Wu, Xiaofang Guo, Mengyuan Li, Jingxian Shen, Xiayu Fu, ...

Abstract
Background: Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. HBV integration into the host genome is one of the key steps in virus-promoted malignant transformation.
Results: An attention-based deep learning model, DeepHBV, was developed to predict HBV integration sites by automatically learning local genomic features. DeepHBV was trained and tested using HBV integration site data from the dsVIS database. Initially, DeepHBV achieved an AUROC of 0.6363 and an AUPR of 0.5471 on this dataset. Adding genomic features from repeat peaks and from TCGA Pan-Cancer peaks significantly improved model performance, yielding AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. Transcription factor binding sites (TFBS) were significantly enriched near the genomic positions highlighted by the model's attention mechanism. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, and Foxo3 were highlighted by DeepHBV in both the dsVIS and VISDB datasets, revealing a novel integration preference of HBV.
Conclusions: DeepHBV is a useful tool for predicting HBV integration sites and offers new insights into HBV integration-related carcinogenesis.
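The abstract reports performance as AUROC and AUPR. As a reminder of what these two numbers measure, here is a minimal pure-Python sketch of both metrics. It is a generic illustration, not DeepHBV's evaluation code; it estimates AUPR as average precision, which is one common estimator (the exact estimator used by the authors is not stated in the abstract), and the function names are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR estimated as average precision: mean precision at each positive hit.

    Assumes no tied scores; ties would need grouping for an exact value.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / n_pos
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.4, 0.1]`, three of the four positive/negative pairs are correctly ordered, so the AUROC is 0.75. A random classifier sits near 0.5 AUROC, whereas its AUPR tends toward the fraction of positives in the dataset.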

2021
Author(s): Canbiao Wu, Xiaofang Guo, Mengyuan Li, Xiayu Fu, Zeliang Hou, ...

Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by automatically learning local genomic features. We trained and tested DeepHBV using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 on this dataset. Adding repeat peaks and TCGA Pan-Cancer peaks significantly improved model performance, yielding AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. On an independent validation dataset of HBV integration sites from VISDB, DeepHBV with HBV integration sequences plus TCGA Pan-Cancer peaks (AUROC of 0.7603, AUPR of 0.6189) performed better than with HBV integration sequences plus repeat peaks (AUROC of 0.6657, AUPR of 0.5737). Next, we found that transcription factor binding sites (TFBS) were significantly enriched near the genomic positions attended to by the convolutional neural network. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, and Foxo3 were highlighted by the DeepHBV attention mechanism in both the dsVIS and VISDB datasets, revealing HBV integration preferences. In summary, DeepHBV is a robust and explainable deep learning model, useful not only for predicting HBV integration sites but also for further mechanistic study of HBV-induced cancer.
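The abstract credits an attention mechanism with highlighting the genomic positions near which TFBS are enriched. The sketch below shows generic soft attention pooling over sequence positions: unnormalised per-position scores are turned into weights with a softmax, the weights pool per-position feature vectors, and the highest-weight positions are the "attended" ones. This is an assumption-laden illustration, not DeepHBV's architecture (the abstract does not specify its attention layer), and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    features: Sequence[Sequence[float]], scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weight per-position feature vectors by softmax(scores) and sum them.

    features: one feature vector per sequence position (e.g. CNN outputs).
    scores:   one unnormalised relevance score per position.
    Returns the pooled vector and the attention weights.
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [0.0] * dim
    for w, vec in zip(weights, features):
        for j in range(dim):
            pooled[j] += w * vec[j]
    return pooled, weights


def top_positions(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k most-attended positions, highest weight first."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
```

In a workflow like the one described, the top-weighted positions would then be mapped back to genomic coordinates and tested for TFBS enrichment; that downstream step is not sketched here.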


2009, Vol 83 (17), pp. 8396-8408
Author(s): William S. Mason, Huey-Chi Low, Chunxiao Xu, Carol E. Aldrich, Catherine A. Scougall, ...

ABSTRACT During a hepadnavirus infection, viral DNA integrates at a low rate into random sites in the host DNA, producing unique virus-cell junctions detectable by inverse nested PCR (invPCR). These junctions serve as genetic markers of individual hepatocytes, providing a means to detect their subsequent proliferation into clones of two or more hepatocytes. A previous study suggested that the livers of 2.4-year-old woodchucks (Marmota monax) chronically infected with woodchuck hepatitis virus contained at least 100,000 clones of >1,000 hepatocytes (W. S. Mason, A. R. Jilbert, and J. Summers, Proc. Natl. Acad. Sci. USA 102:1139-1144, 2005). However, possible correlations between sites of viral-DNA integration and clonal expansion could not be explored because the woodchuck genome has not yet been sequenced. To investigate this issue further, we looked for similar clonal expansion of hepatocytes in the livers of chimpanzees chronically infected with hepatitis B virus (HBV). Liver samples for invPCR were collected from eight chimpanzees chronically infected with HBV for at least 20 years. Fifty clones ranging in size from ∼35 to 10,000 hepatocytes were detected using invPCR in 32 liver biopsy fragments (∼1 mg) containing, in total, ∼3 × 10⁷ liver cells. Based on searches of the analogous human genome, integration sites were found on all chromosomes except Y, ∼30% of them in known or predicted genes. However, no obvious association between the extent of clonal expansion and the integration site was apparent. This suggests that the integration site per se is not responsible for the outgrowth of large clones of hepatocytes.


2021, Vol 27 (1), pp. 207-218
Author(s): Jeong Won Jang, Jin Seoub Kim, Hye Seon Kim, Kwon Yong Tak, Heechul Nam, ...

Background/Aims: The role of hepatitis B virus (HBV) integration into the host genome in hepatocarcinogenesis following hepatitis B surface antigen (HBsAg) seroclearance remains unknown. Our study aimed to investigate and characterize HBV integration events in chronic hepatitis B (CHB) patients who developed hepatocellular carcinoma (HCC) after HBsAg seroclearance.
Methods: Using probe-based HBV capture followed by next-generation sequencing, HBV integration was examined in 10 samples (seven tumor and three non-tumor tissues) from seven chronic carriers who developed HCC after HBsAg loss. Genomic locations and patterns of HBV integration were investigated.
Results: HBV integration was observed in six of seven patients (85.7%) and in eight of 10 tested samples (80.0%). HBV integration breakpoints were detected in all non-tumor samples (3/3, 100%) and in five of the seven tumor samples (71.4%), with a mean of 4.00 and 2.43 breakpoints per sample, respectively. Despite the lower average number of tumoral integration breakpoints, HBV integration sites in tumors were more enriched within genic regions, whereas non-tumor tissues more often showed intergenic integration. Regarding the functions of the affected genes, tumoral genes with HBV integration were mostly associated with carcinogenesis. At enrollment, patients who had not remained under regular HCC surveillance after HBsAg seroclearance presented with large HCCs, whereas those under regular surveillance had small HCCs.
Conclusions: The biological functions of HBV integration are largely comparable between HBsAg-positive and HBsAg-serocleared HCCs, with continuing pro-oncogenic effects of HBV integration. Thus, HCC surveillance and clinical management should continue even after HBsAg seroclearance in patients with CHB.
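The comparison of genic versus intergenic integration comes down to an interval lookup: each breakpoint position is checked against gene coordinates on its chromosome. The sketch below shows that step in plain Python with a linear scan. The gene intervals and breakpoints in the usage example are hypothetical; the study used a real genome annotation, and its actual pipeline is not described in the abstract.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical gene annotation: chromosome -> list of half-open [start, end) intervals.
GeneIndex = Dict[str, List[Tuple[int, int]]]


def build_index(genes: Sequence[Tuple[str, int, int]]) -> GeneIndex:
    """Group (chrom, start, end) gene records by chromosome."""
    index: GeneIndex = {}
    for chrom, start, end in genes:
        index.setdefault(chrom, []).append((start, end))
    return index


def is_genic(index: GeneIndex, chrom: str, pos: int) -> bool:
    """True if pos falls inside any gene interval on chrom (linear scan)."""
    return any(start <= pos < end for start, end in index.get(chrom, []))


def classify_breakpoints(
    index: GeneIndex, breakpoints: Sequence[Tuple[str, int]]
) -> Dict[str, int]:
    """Count breakpoints that fall in genic versus intergenic regions."""
    counts = {"genic": 0, "intergenic": 0}
    for chrom, pos in breakpoints:
        counts["genic" if is_genic(index, chrom, pos) else "intergenic"] += 1
    return counts
```

Usage: with genes `[("chr1", 100, 200), ("chr2", 50, 80)]` and breakpoints `[("chr1", 150), ("chr1", 250), ("chr2", 60), ("chr3", 10)]`, two breakpoints are genic and two intergenic. For genome-scale annotations, a sorted interval tree would replace the linear scan.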


2021, Vol 2021, pp. 1-12
Author(s): Yi Yin, Mingyue Xue, Lingen Shi, Tao Qiu, Derun Xia, ...

Objective. To establish a machine learning model for identifying patients coinfected with hepatitis B virus (HBV) and human immunodeficiency virus (HIV) through two sexual transmission routes in Jiangsu, China.
Methods. A total of 14197 HIV cases transmitted by homosexual and heterosexual routes were recruited. After data processing, 12469 cases (HIV and HBV, 1033; HIV only, 11436) remained for further analysis, including 7849 cases with homosexual transmission and 4620 cases with heterosexual transmission. Univariate logistic regression was used to select variables with significant P values and odds ratios for multivariable analysis; 10 variables were selected in the homosexual transmission group and 6 in the heterosexual transmission group. To identify HIV-infected individuals coinfected with HBV, machine learning models were constructed with four algorithms: Decision Tree, Random Forest, AdaBoost with decision trees (AdaBoost), and extreme gradient boosting decision trees (XGBoost). The detective value of each variable was calculated using the best-performing algorithm.
Results. The AdaBoost algorithm performed best in both transmission groups (homosexual transmission group: accuracy = 0.928, precision = 0.915, recall = 0.944, F1 = 0.930, AUC = 0.96; heterosexual transmission group: accuracy = 0.892, precision = 0.881, recall = 0.905, F1 = 0.893, AUC = 0.98). As calculated by AdaBoost, PLA had the highest detective value in the homosexual transmission group, followed by CR, AST, HB, ALT, TBIL, leucocyte, age, marital status, and treatment condition; in the heterosexual transmission group, PLA again had the highest detective value, followed by ALT, AST, TBIL, leucocyte, and symptom severity.
Conclusions. Univariate logistic regression combined with the AdaBoost algorithm could accurately screen for risk factors of HBV coinfection in HIV-infected individuals without invasive testing. Further studies are needed to evaluate the utility and feasibility of this model in various settings.
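The reported accuracy, precision, recall, and F1 are all derived from a binary confusion matrix, and F1 is the harmonic mean of precision and recall. The short sketch below (hypothetical function names, standard definitions) makes those relationships explicit and lets the reported F1 values be cross-checked against the reported precision and recall.

```python
from typing import Dict


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of predicted coinfections, fraction correct
    recall = tp / (tp + fn)     # of true coinfections, fraction found
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }
```

Plugging in the reported values gives `f1_from_pr(0.881, 0.905)` ≈ 0.893, matching the heterosexual group, and `f1_from_pr(0.915, 0.944)` ≈ 0.929 for the homosexual group; the 0.001 gap from the reported 0.930 is within what rounding of the published precision and recall allows.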


2021
Author(s): Jérôme Tubiana, Dina Schneidman-Duhovny, Haim J. Wolfson

Predicting the functional sites of a protein from its structure, such as the binding sites of small molecules, other proteins, or antibodies, sheds light on its function in vivo. Currently, two classes of methods prevail: machine learning (ML) models built on top of handcrafted features, and comparative modeling. These are limited, respectively, by the expressivity of the handcrafted features and by the availability of similar proteins. Here, we introduce ScanNet, an end-to-end, interpretable geometric deep learning model that learns features directly from 3D structures. ScanNet builds representations of atoms and amino acids based on the spatio-chemical arrangement of their neighbors. We train ScanNet to detect protein-protein and protein-antibody binding sites, demonstrate its accuracy (including for unseen protein folds), and interpret the learned filters. Finally, we predict epitopes of the SARS-CoV-2 spike protein, validating known antigenic regions and predicting previously uncharacterized ones. Overall, ScanNet is a versatile, powerful, and interpretable model suitable for functional site prediction tasks. A webserver for ScanNet is available at http://bioinfo3d.cs.tau.ac.il/ScanNet/
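Representations "based on the spatio-chemical arrangement of their neighbors" start from a neighborhood query: for each atom or residue, find its nearest neighbors in 3D and express their positions relative to the center. The sketch below shows only that geometric first step, using k-nearest neighbors and translation to the center point. It is a hypothetical illustration, not ScanNet's implementation; a real model would also need to handle orientation and chemical identity, which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def k_nearest(points: Sequence[Point], center_idx: int, k: int) -> List[int]:
    """Indices of the k points closest to points[center_idx], excluding itself.

    Ties in distance are broken by the lower index.
    """
    center = points[center_idx]
    dists = [
        (math.dist(p, center), i) for i, p in enumerate(points) if i != center_idx
    ]
    dists.sort()
    return [i for _, i in dists[:k]]


def local_neighborhood(
    points: Sequence[Point], center_idx: int, k: int
) -> List[Point]:
    """Coordinates of the k nearest neighbors, re-expressed relative to the center."""
    cx, cy, cz = points[center_idx]
    return [
        (points[i][0] - cx, points[i][1] - cy, points[i][2] - cz)
        for i in k_nearest(points, center_idx, k)
    ]
```

Translating to the center makes the neighborhood independent of where the protein sits in space; making it independent of rotation as well requires a per-point reference frame, which is the harder design problem a structure-based model has to solve.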


1989, Vol 9 (4), pp. 1804-1809
Author(s): R Ben-Levy, O Faktor, I Berger, Y Shaul

An 83-bp hepatitis B virus DNA fragment efficiently activates transcription from the heterologous globin gene promoter. This fragment contains binding sites for at least four distinct cellular factors, termed E, TGT3, EP, and NF-I. E is a positively acting factor responsive to phorbol ester. EP is apparently identical to the factor EF-C, which binds to the polyomavirus enhancer. The conservation of the binding-site sequences for most of these factors in the genomes of other members of the hepadnavirus family suggests that these viruses share common enhancer elements.

