Current Bioinformatics
Latest Publications


TOTAL DOCUMENTS

1042
(FIVE YEARS 372)

H-INDEX

28
(FIVE YEARS 7)

Published By Bentham Science

1574-8936

2022 ◽  
Vol 17 ◽  
Author(s):  
Boyu Pan ◽  
Chen Huang ◽  
Yafei Xia ◽  
Cuicui Zhang ◽  
Bole Li ◽  
...  

Background: Non-small cell lung cancer (NSCLC) is a common and highly fatal malignancy worldwide. Identifying potential prognostic markers and therapeutic targets is therefore urgent for patients. Objective: This study aims to find hub targets associated with NSCLC using multiple databases. Methods: Differentially expressed genes (DEGs) from Gene Expression Omnibus (GEO) cohorts were subjected to enrichment analyses of Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Candidate key genes, filtered by the topological parameter 'Degree' and validated using The Cancer Genome Atlas (TCGA) cohort, were analyzed for their association with clinicopathological features and prognosis of NSCLC. Immunohistochemical cohort analyses and biological verification were also performed. Results: A total of 146 DEGs were identified following data preprocessing, and a protein-protein interaction (PPI) network was constructed from them. The top ten candidate core genes were extracted from this PPI network by 'Degree' value. Among them, COL1A1 was associated with overall survival (OS) in NSCLC by Kaplan-Meier analysis (p=0.028) and could serve as an independent prognostic factor for OS in NSCLC patients (HR, 0.814; 95% CI, 0.665-0.996; p=0.046). We then analyzed the clinical stages, PPI, mutations, potential biological functions and immune regulation of COL1A1 in NSCLC patients using multiple bioinformatics tools, including GEPIA, GeneMANIA, cBioPortal, GSEA and TISIDB. Finally, we experimentally validated the overexpression of COL1A1 in NSCLC samples and found that inhibition of COL1A1 expression moderately sensitized NSCLC cells to cisplatin. Conclusion: Our results show that COL1A1 may serve as a potential prognostic marker and therapeutic target in NSCLC.
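The 'Degree' filter described in this abstract can be illustrated with a minimal sketch. It assumes the PPI network is available as a list of undirected gene pairs; the function name `top_hub_genes` and the gene labels `G1` to `G5` are hypothetical placeholders, not data or code from the study.

```python
def top_hub_genes(edges, k=10):
    """Rank nodes of an undirected network by degree and return the top k."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        # sets collapse duplicate edges such as (G1, G2) and (G2, G1)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {gene: len(partners) for gene, partners in neighbours.items()}
    # sort by descending degree, then by name so ties are reproducible
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy network; labels are placeholders, not real interactions.
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G4", "G5"), ("G2", "G1"),
]
hubs = top_hub_genes(toy_edges, k=2)
print(hubs)
```

Degree is only one of several centrality measures; it is used here because it is the one the abstract names.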


2022 ◽  
Vol 17 ◽  
Author(s):  
Xinyi Liao ◽  
Xiaomei Gu ◽  
Dejun Peng

Background: Many malaria infections are caused by Plasmodium falciparum. Accurate classification of the proteins secreted by the malaria parasite is essential for the development of anti-malarial drugs. Objective: To accurately classify the proteins secreted by the malaria parasite. Methods: To improve the accuracy of predicting Plasmodium secreted proteins, we established a classification model, MGAP-SGD. MonodikGap features (k=7) of the secreted proteins were extracted, and the optimal features were then selected by the AdaBoost method. Finally, based on the optimal feature set, secreted proteins were predicted using the stochastic gradient descent (SGD) algorithm. Results: The SGD classifier was validated with 10-fold cross-validation and an independent test set, with accuracies of 98.5859% and 97.973%, respectively. Conclusion: These results indicate that the MGAP-SGD model is effective and robust enough to meet the prediction needs for Plasmodium secreted proteins.
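To make the SGD step concrete, here is a minimal sketch of a linear classifier trained by stochastic gradient descent on the logistic loss. It is not the MGAP-SGD implementation: the loss, learning rate, epoch count, and the two-feature toy data are all assumptions chosen for illustration, and the feature extraction and AdaBoost selection steps are omitted.

```python
import math
import random


def train_sgd_logistic(X, y, epochs=200, lr=0.5, seed=0):
    """Fit weights and bias with per-sample (stochastic) gradient updates."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y[i]                   # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is non-negative, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


# Hypothetical, linearly separable toy data (two features per sample).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_sgd_logistic(X, y)
print([predict(w, b, x) for x in X])
```

In practice the input vectors would be the selected sequence features rather than two hand-written numbers.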


2022 ◽  
Vol 17 (1) ◽  
pp. 1-1
Author(s):  
Chaeyoung Lee


2021 ◽  
Vol 17 ◽  
Author(s):  
Jingyu Lee ◽  
Myeong-Sang Yu ◽  
Dokyun Na

Background: Drug-induced liver injury (DILI) is a leading cause of drug failure, accounting for nearly 20% of drug withdrawals. Thus, there has been a great demand for in silico DILI prediction models for successful drug discovery. To date, various models have been developed for DILI prediction; however, building an accurate model for practical use in drug discovery remains challenging. Methods: We constructed an ensemble model composed of three high-performance DILI prediction models to utilize the unique advantages of each machine learning algorithm. Results: The ensemble model exhibited high predictive performance, with an area under the curve of 0.88, sensitivity of 0.83, specificity of 0.77, F1-score of 0.82, and accuracy of 0.80. When a test dataset collected from the literature was used to compare the performance of our model with publicly available DILI prediction models, our model achieved an accuracy of 0.77, sensitivity of 0.82, specificity of 0.72, and F1-score of 0.79, which were higher than those of the other DILI prediction models. As many published DILI prediction models are not available for public access, which hinders in silico drug discovery, we made our DILI prediction model publicly accessible (http://ssbio.cau.ac.kr/software/dili/). Conclusion: We expect that our ensemble model may facilitate advancements in drug discovery by providing a highly predictive model and reducing the drug withdrawal rate.
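The reported sensitivity, specificity, F1-score and accuracy all derive from a confusion matrix. The sketch below shows those definitions applied to a simple ensemble. The abstract does not say how the three models are combined, so averaging predicted probabilities (soft voting) is an assumption here, and the probabilities and labels are invented toy values.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-sample probabilities across models and threshold them."""
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical outputs of three models on eight compounds (1 = DILI-positive).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.1, 0.3]
model_b = [0.8, 0.6, 0.3, 0.9, 0.4, 0.7, 0.2, 0.1]
model_c = [0.7, 0.9, 0.2, 0.6, 0.3, 0.5, 0.3, 0.2]

y_pred = soft_vote([model_a, model_b, model_c])
metrics = binary_metrics(y_true, y_pred)
print(y_pred, metrics)
```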


2021 ◽  
Vol 17 ◽  
Author(s):  
Hui Zhang ◽  
Qidong Liu ◽  
Xiaoru Sun ◽  
Yaru Xu ◽  
Yiling Fang ◽  
...  

Background: The pathophysiology of Alzheimer's disease (AD) is still not fully understood. Objective: This study aimed to explore the differentially expressed key genes in AD and build a predictive model for diagnosis and treatment. Methods: Gene expression data of the entorhinal cortex of AD, asymptomatic AD, and control samples from the GEO database were analyzed to explore the relevant pathways and key genes in the progression of AD. Differentially expressed genes between AD and the other two groups in the module were selected to identify biological mechanisms in AD through KEGG and PPI network analysis in Metascape. Genes with a high connectivity degree in the PPI network analysis were then selected to build a predictive model using different machine learning algorithms. Model performance was tested with five-fold cross-validation to select the best-fitting model. Results: A total of 20 co-expression gene clusters were identified after the network was constructed. Module 1 (black) and module 2 (royal blue) were most positively and most negatively correlated with AD, respectively. A total of 565 genes in module 1 and 215 genes in module 2 overlapped with the two lists of differentially expressed genes. They were enriched in the G protein-coupled receptor signaling pathway and immune-related processes, among others. Eleven genes were selected by lasso logistic regression and were considered important for predicting AD samples. The support vector machine model built on these 11 genes showed the best performance. Conclusion: These results shed light on the diagnosis and treatment of AD.
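Five-fold cross-validation, used above for model selection, splits the samples into five disjoint folds and holds each one out in turn. The following is a minimal sketch of the index bookkeeping only; the function name `k_fold_indices`, the seed, and the sample count of 23 are illustrative assumptions, and no model from the study is fitted.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # striding by k gives folds whose sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(23, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each candidate algorithm would be trained on `train_idx` and scored on `test_idx`, with the mean score across folds used to compare models.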


2021 ◽  
Vol 17 ◽  
Author(s):  
Ke Yan ◽  
Hongwu Lv ◽  
Yichen Guo ◽  
Jie Wen ◽  
Bin Liu

Background: Therapeutic peptide prediction is critical for drug development and therapy. Several computational methods have been developed to identify different types of therapeutic peptides. Objective: Most predictors are specific to certain peptide types. Developing methods that predict multiple peptide types remains challenging, as does combining different features for therapeutic peptide prediction. Method: In this paper, we propose a new ensemble method, TP-MV, for general therapeutic peptide recognition. TP-MV is built on the stacking framework in conjunction with KNN, SVM, ET, RF, and XGB. TP-MV then constructs a multi-view learning model as the meta-classifier to extract discriminative features for different peptides. Results: In experiments, the proposed method outperforms existing methods on the benchmark datasets, indicating that it can predict multiple therapeutic peptides simultaneously. Conclusion: TP-MV is a useful tool for predicting therapeutic peptides.


2021 ◽  
Vol 16 ◽  
Author(s):  
Rania Hamdy ◽  
Yasser M.K. Omar ◽  
Fahima A. Maghraby

Background: Gene regulation is a complex and dynamic process that depends not only on the DNA sequence of genes but also on epigenetic mechanisms. Together with other factors, these mechanisms change the behavior of DNA. Although they cannot change the structure of DNA itself, they control its behavior by turning genes "on" or "off", which determines which proteins are produced. Objective: This paper focuses on histone modifications. Histones are the group of proteins that package DNA into structural units called nucleosomes (coils); how DNA wraps around these histone proteins determines whether a gene can be accessed for expression. When histones bind tightly to DNA, the gene cannot be expressed, and vice versa. It is important to understand the combinatorial patterns of histone modifications and how these patterns work together to control gene expression. Methods: In this paper, ConvChrome, a deep learning methodology, is proposed for predicting gene expression behavior from histone modification data. It uses more than one convolutional network model to recognize patterns in histone signals and to interpret their spatial arrangement on the chromatin structure, giving insight into the regulatory signatures of histone modifications. Results and Conclusion: The experimental results show that ConvChrome achieved an area under the curve (AUC) score of 88.741%, a notable improvement over the baseline for the gene expression classification task based on combinatorial interactions among five histone modifications across 56 different cell types.
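The AUC score quoted above has a simple rank interpretation: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way. It is a generic definition rather than the evaluation code of ConvChrome, and the labels and scores are invented toy values.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = highly expressed gene) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of samples; for large data sets a sort-based rank formula gives the same value more efficiently.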


2021 ◽  
Vol 16 ◽  
Author(s):  
Yanjuan Cao ◽  
Qiang Zhang ◽  
Zuwei Yan ◽  
Xiaoqing Zhao

Background: Introns are ubiquitous in pre-mRNA but are often overlooked, although they play an important role in the regulation of gene expression. Objective and Method: We use an improved Smith-Waterman local alignment approach to compare the optimal matching regions between introns and mRNA sequences in Caenorhabditis elegans (C. elegans) genes with high and low expression. Results: We found that the relative matching frequency distributions of all genes lie between those of highly and lowly expressed genes, indicating that introns in highly and lowly expressed genes have different biological functions. Highly expressed genes have higher matching strengths on mRNA sequences than genes expressed at lower levels; the most strongly matched regions appear in the UTRs, particularly in the 3'UTR. The optimal matching frequency distributions differ clearly between highly and lowly expressed genes in the functional regions around translation initiation and termination sites. The mRNA sequences with CpG islands tend to have stronger relative matching frequency distributions, especially in highly expressed genes. Additionally, the sequence characteristics of the optimal matched segments are consistent with those of miRNAs, and these segments are considered a type of functional RNA segment. Conclusion: Introns in highly and lowly expressed genes contribute to the recognition of translation initiation sites and translation termination sites. Moreover, our results suggest that the potential matching relationships between introns and mRNA sequences differ significantly between highly and lowly expressed genes and indicate that the matching strength correlates with the ability of introns to enhance gene expression.
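For readers unfamiliar with the alignment step, here is a minimal sketch of the standard Smith-Waterman local alignment with a linear gap penalty. The abstract refers to an improved variant whose modifications are not described, so this is the textbook algorithm only; the scoring values (match +2, mismatch -1, gap -1) and the short test sequences are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, seg_a, seg_b = smith_waterman("TTACGTT", "GGACGGG")
print(score, seg_a, seg_b)
```

Applied to an intron and an mRNA sequence, the returned segments would correspond to what the abstract calls the optimal matching region.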


2021 ◽  
Vol 16 ◽  
Author(s):  
Jun Wu ◽  
Guoping Yang ◽  
Lulu Qu ◽  
Nan Han

Background: As quality of life improves, people have more time and energy to attend to their own health. Diabetes, one of the most common and fastest-growing diseases, has attracted widespread attention from experts in bioinformatics. People of all ages worldwide suffer from diabetes, which can shorten patients' life spans. Because diabetes has a significant impact on human health, the accuracy of the initial diagnosis is essential. Diabetes can cause serious complications, especially in the elderly, such as cardiovascular and cerebrovascular diseases, stroke, and multiple organ damage. Early diagnosis of diabetes can reduce the likelihood of deterioration. Identifying and analyzing potential risk factors across different physical attributes can help assess the prevalence of diabetes. The more accurately prevalence is assessed, the more likely it is that the incidence of complications can be reduced. Methods: In this paper, we use the open-source NHANES data set to analyze and determine potential risk factors for diabetes using improved versions of Logistic Regression, SVM, and other machine learning algorithms. Results: Experimental results show that the improved version of Random Forest performs best, with a classification accuracy of 92%, and that age, blood-related diabetes, high blood pressure, cholesterol and BMI are the most important risk factors related to diabetes. Conclusion: With the proposed machine learning method, we can cope with the class imbalance and outlier detection problems.

