Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy

2019 ◽  
Vol 12 (S8) ◽  
Author(s):  
Zhixun Zhao ◽  
Hui Peng ◽  
Xiaocai Zhang ◽  
Yi Zheng ◽  
Fang Chen ◽  
...  

Abstract Background The early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawbacks: first, they rely on fixed statistical hypotheses and are not always effective; second, they cannot identify a definite expression level boundary when there is no obvious expression level gap between the control and experiment groups. Methods This paper proposes a novel approach to identify marker genes and gene expression level boundaries for lung cancer. By calculating a kernel maximum mean discrepancy, our method evaluates the expression differences between normal, normal adjacent to tumor (NAT), and tumor samples. For the potential marker genes, the expression level boundaries among the different groups are defined with the information entropy method. Results Compared with two conventional methods, the t-test and fold change, the top average-ranked genes selected by our method achieve better performance under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses are then conducted to explore the biological function of the top 100 ranked genes. Finally, we choose the top 10 average-ranked genes as lung cancer markers and calculate and report their expression boundaries. Conclusion The proposed approach is effective for identifying gene markers for lung cancer diagnosis. It is not only more accurate than conventional DEA methods but also provides a reliable way to identify gene expression level boundaries.
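The two steps named in the abstract above can be illustrated with a minimal sketch. The abstract does not state which kernel, bandwidth, estimator, or entropy criterion the authors use, so everything here is an assumption: a Gaussian (RBF) kernel with a hypothetical `gamma` parameter, the biased estimate of squared MMD, and a single threshold chosen to minimise the weighted class entropy of the two sides. The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between two 1-D arrays of expression values."""
    diff = a[:, None] - b[None, :]
    return np.exp(-gamma * diff ** 2)


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy.

    x, y: expression values of one gene in two sample groups.
    gamma: assumed kernel bandwidth parameter (not given in the abstract).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())


def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def entropy_boundary(values, labels):
    """Threshold that minimises the weighted class entropy of both sides.

    This is one common entropy-based split criterion, assumed here for
    illustration; the paper's exact procedure may differ.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(values)
    v, l = values[order], labels[order]
    n = len(v)
    best_t, best_h = None, np.inf
    for i in range(1, n):
        if v[i] == v[i - 1]:
            continue  # no threshold can separate identical values
        t = (v[i] + v[i - 1]) / 2.0
        h = (i * entropy(l[:i]) + (n - i) * entropy(l[i:])) / n
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h
```

Under these assumptions, genes would be ranked by `mmd2` between groups (larger means more different), and `entropy_boundary` would then be applied to each top-ranked gene to obtain a candidate expression level boundary.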

2013 ◽  
Vol 123 (12) ◽  
pp. 672-679 ◽  
Author(s):  
Daria Domańska ◽  
Adam Antczak ◽  
Dorota Pastuszak‑Lewandoska ◽  
Paweł Górski ◽  
Jacek Kordiak ◽  
...  

2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract Background In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in the specific epidemiology of a particular population. Results In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, and protein sequence), along with gene expression values, are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with no or minimal compromise of sample classification efficiency, and it determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed against five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results show the superiority of the proposed method. The reported results are also validated through a biological significance test and heatmap plotting.


2010 ◽  
Vol 27 ◽  
pp. S66
Author(s):  
M. Piechota ◽  
A. Banaszewska ◽  
E. Guzniczak ◽  
G. Rosinski ◽  
T. Siminiak ◽  
...  

Gene ◽  
2021 ◽  
pp. 145862
Author(s):  
Lu-Qiang Zhang ◽  
Jun-Jie Liu ◽  
Li Liu ◽  
Guo-Liang Fan ◽  
Yan-Nan Li ◽  
...  

Author(s):  
Rajnics P ◽  
Kellner A ◽  
Nagy F ◽  
Alföldi V ◽  
...  

Purpose: An elevated level of Lipocalin-2 (LCN2), a new acute phase adipokine, has been described after ischemic stroke. Some researchers suggest that LCN2 originates from infiltrating neutrophils and other cells in the brain after stroke. Others have measured elevated LCN2 expression in arteriosclerotic plaque. We therefore investigated the relative LCN2 gene expression level of blood neutrophil granulocytes in patients with ischemic stroke to assess whether elevated LCN2 is a cause or a consequence of ischemic stroke. Methods: Laboratory and anamnestic data that could play a role in the development of thrombo-embolic events were collected from patients with ischemic stroke. An RNA-based method was used to evaluate the relative gene expression level of LCN2. We calculated the Odds Ratio (OR) and Confidence Interval (CI) for the association between LCN2 and ischemic stroke. Results: 34 samples were available for evaluation. The relative LCN2 gene expression level was decreased in 12 cases. In this group, 91% of patients had Atrial Fibrillation (AF) at the time of hospitalisation. The mean relative LCN2 gene expression value was 64.25% (range: 34%-115%) in patients with AF. It was significantly lower than in patients with normal sinus rhythm (409.2%; range: 127%-1127%; p=0.0003). An elevated relative LCN2 gene expression level significantly (p=0.012) increased the risk of stroke (OR: 12.6) independently of other factors. Conclusions: A high LCN2 expression level seems to have strong positive predictive value for ischemic stroke and may be useful in thrombotic risk stratification of plaque vulnerability in these patients.
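As an illustration of the odds ratio and confidence interval calculation mentioned in the Methods, here is a minimal sketch for a 2x2 table using the standard log-based (Woolf) interval. The abstract does not report the underlying counts or the exact method, and an OR described as independent of other factors would normally come from an adjusted model, so this unadjusted version and the function name `odds_ratio_ci` are assumptions for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With hypothetical counts (not from the study), `odds_ratio_ci(10, 5, 5, 10)` gives an odds ratio of 4.0 with an interval around it; an interval that excludes 1 corresponds to a significant association at the chosen level.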

