A Novel Approach to Gene Selection of Leukemia Dataset Using Different Clustering Methods

Author(s):  
P. Prasath ◽  
K. Perumal ◽  
K. Thangavel ◽  
R. Manavalan
2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Rui-Yi Li ◽  
Jihong Guan ◽  
Shuigeng Zhou

Background: The rapid development of single-cell RNA sequencing (scRNA-seq) enables the exploration of cell heterogeneity, which is usually done by clustering scRNA-seq data. The essence of scRNA-seq data clustering is to group cells by measuring the similarities among the genes/transcripts of cells. The selection of features for evaluating cell similarity is therefore of great importance, as it significantly affects clustering effectiveness and efficiency. Results: In this paper, we propose a novel method called CaFew that selects genes based on cluster-aware feature weighting. By optimizing the clustering objective function, CaFew obtains a feature weight matrix, which is then used for feature selection: genes that have large weights in at least one cluster, or whose weights vary greatly across clusters, are selected. Experiments on 8 real scRNA-seq datasets show that CaFew clearly improves the clustering performance of existing scRNA-seq data clustering methods. In particular, the combination of CaFew with SC3 achieves state-of-the-art performance. Furthermore, CaFew also benefits the visualization of scRNA-seq data. Conclusion: CaFew is an effective scRNA-seq data clustering method owing to its gene selection mechanism based on cluster-aware feature weighting, and it is a useful tool for scRNA-seq data analysis.
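The selection rule stated in the abstract (keep a gene if its weight is large in at least one cluster, or if its weight varies greatly across clusters) can be illustrated with a small sketch. This is not the CaFew implementation: the weight matrix layout (one row per cluster, one column per gene), the use of population variance as the measure of variation, the function name `select_genes`, and both thresholds are assumptions made only for illustration, and the weights are made-up toy values.

```python
from statistics import pvariance


def select_genes(weights, gene_names, max_thresh=0.3, var_thresh=0.01):
    """Pick genes from a cluster-by-gene weight matrix.

    weights[k][j] is the (hypothetical) weight of gene j in cluster k.
    A gene is kept if its largest weight reaches max_thresh, or if the
    variance of its weights across clusters reaches var_thresh.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in weights]  # weights of gene j in every cluster
        if max(column) >= max_thresh or pvariance(column) >= var_thresh:
            selected.append(name)
    return selected


# Toy weight matrix: 2 clusters x 4 genes (made-up numbers).
weights = [
    [0.50, 0.10, 0.20, 0.05],  # cluster 0
    [0.05, 0.12, 0.43, 0.29],  # cluster 1
]
genes = ["g1", "g2", "g3", "g4"]

selected = select_genes(weights, genes)
print(selected)
```

In this toy matrix, g1 and g3 pass the "large weight in at least one cluster" test, g4 passes only the "varies greatly across clusters" test, and g2 is dropped because its weights are both small and nearly constant.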


Author(s):  
Behnam Jahangiri ◽  
Punyaslok Rath ◽  
Hamed Majidifard ◽  
William G. Buttlar

Various agencies have begun to research and introduce performance-related specifications (PRS) for the design of modern asphalt paving mixtures. The focus of most recent studies has been directed toward simplified cracking test development and evaluation. In some cases, development and validation of PRS has been performed, building on these new tests, often by comparison of test values to accelerated pavement test studies and/or to limited field data. This study describes the findings of a comprehensive research project conducted at Illinois Tollway, leading to a PRS for the design of mainline and shoulder asphalt mixtures. A novel approach was developed, involving the systematic establishment of specification requirements based on: 1) selection of baseline values based on minimally acceptable field performance thresholds; 2) elevation of thresholds to account for differences between short-term lab aging and expected long-term field aging; 3) further elevation of thresholds to account for variability in lab testing, plus variability in the testing of field cores; and 4) final adjustment and rounding of thresholds based on a consensus process. After a thorough evaluation of different candidate cracking tests in the course of the project, the Disk-shaped Compact Tension—DC(T)—test was chosen to be retained in the Illinois Tollway PRS and to be presented in this study for the design of crack-resistant mixtures. The DC(T) test was selected because of its high degree of correlation with field results and its excellent repeatability. Tailored Hamburg rut depth and stripping inflection point thresholds were also established for mainline and shoulder mixes.


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 64895-64905
Author(s):  
Essam H. Houssein ◽  
Diaa Salama Abdelminaam ◽  
Hager N. Hassan ◽  
Mustafa M. Al-Sayed ◽  
Emad Nabil

2017 ◽  
Vol 56 (5) ◽  
pp. 959-972 ◽  
Author(s):  
Christian Krogh ◽  
Mathias H. Jungersen ◽  
Erik Lund ◽  
Esben Lindgaard

2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Qin Chen ◽  
Shengping Qiu ◽  
Huanhuan Li ◽  
Chaolong Lin ◽  
Yong Luo ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Yuan Zhao ◽  
Zhao-Yu Fang ◽  
Cui-Xiang Lin ◽  
Chao Deng ◽  
Yun-Pei Xu ◽  
...  

In recent years, single-cell RNA-seq (scRNA-seq) has become increasingly popular in fields such as biology and medical research. Analyzing scRNA-seq data can uncover complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods for analyzing scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy, so it is important to select genes with cell type specificity; combined with clustering methods, such genes also improve cell type identification. Here, we propose RFCell, a supervised gene selection method based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells using the genes selected by each method. We found that the gene selection performance of RFCell was better than that of the other gene selection methods.
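The abstract does not describe how RFCell combines permutation with random forest classification, so the following is only a sketch of the general idea of permutation-based feature scoring in a supervised setting: shuffle one gene's values across cells and measure how much classification accuracy drops. To stay dependency-free, a nearest-centroid classifier is swapped in for the random forest; the function names and the toy expression values are hypothetical.

```python
import random


def centroids(X, y):
    """Mean expression vector per cell type."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(cents, row):
    """Assign a cell to the cell type with the closest centroid."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, cents[lab])))


def accuracy(cents, X, y):
    return sum(predict(cents, row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(X, y, n_repeats=20, seed=0):
    """Average accuracy drop when each gene's column is shuffled."""
    rng = random.Random(seed)
    cents = centroids(X, y)  # stand-in classifier (NOT a random forest)
    base = accuracy(cents, X, y)
    scores = []
    for j in range(len(X[0])):
        drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drop += base - accuracy(cents, X_perm, y)
        scores.append(drop / n_repeats)
    return scores


# Toy data: 6 cells x 2 genes. Gene 0 separates the two cell types,
# gene 1 is constant and therefore uninformative (made-up values).
X = [[0.0, 1.0], [0.2, 1.0], [0.1, 1.0], [5.0, 1.0], [5.2, 1.0], [5.1, 1.0]]
y = ["A", "A", "A", "B", "B", "B"]

scores = permutation_importance(X, y)
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranked)
```

Genes would then be ranked by their score and the top-ranked ones passed to a clustering method. With a real random forest, the same loop applies with the forest's predictions in place of `predict`.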


2021 ◽  
pp. 1-16
Author(s):  
Aikaterini Karanikola ◽  
Charalampos M. Liapis ◽  
Sotiris Kotsiantis

In short, clustering is the process of partitioning a given set of objects into groups containing highly related instances. This relation is determined by a specific distance metric with which the intra-cluster similarity is estimated. Finding an optimal number of such partitions is usually the key step in the entire process, yet a rather difficult one. Selecting an unsuitable number of clusters might lead to incorrect conclusions and, consequently, to wrong decisions: the term “optimal” is quite ambiguous. Furthermore, various inherent characteristics of the datasets, such as clusters that overlap or clusters containing subclusters, will most often increase the difficulty of the task. Thus, the methods used to detect similarities and the parameter selection of the partition algorithm have a major impact on the quality of the groups and the identification of their optimal number. Given that each dataset constitutes a rather distinct case, validity indices are indicators introduced to address the problem of selecting such an optimal number of clusters. In this work, an extensive set of well-known validity indices, based on the approach of the so-called relative criteria, is examined comparatively. A total of 26 cluster validation measures were investigated in two distinct case studies: one on real-world data and one on artificially generated data. To ensure a certain degree of difficulty, both real-world and generated data were selected to exhibit variations and inhomogeneity. Each index is deployed under 9 different clustering methods, which incorporate 5 different distance metrics. All results are presented in various explanatory forms.
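As a minimal sketch of how a relative validity index is used to choose the number of clusters, the example below scores several candidate partitions of a toy data set and keeps the one with the best score. The abstract does not list its 26 indices or 9 clustering methods; the silhouette coefficient and a simple single-linkage split of one-dimensional points are used here only as familiar stand-ins, and the data values and function names are made up.

```python
def gap_clusters(points, k):
    """Single-linkage clustering of 1-D points: cut at the k-1 largest gaps."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    cut_positions = sorted(
        range(1, len(order)),
        key=lambda p: points[order[p]] - points[order[p - 1]],
        reverse=True,
    )[: k - 1]
    cuts = set(cut_positions)
    labels = [0] * len(points)
    current = 0
    for pos, idx in enumerate(order):
        if pos in cuts:  # a large gap precedes this point: start a new cluster
            current += 1
        labels[idx] = current
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient with absolute difference as the distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(abs(p - q) for q in own) / (len(own) - 1)  # cohesion
        b = min(  # separation: mean distance to the nearest other cluster
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data with three visually obvious groups (made-up values).
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

scores = {k: silhouette(points, gap_clusters(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same pattern (cluster for each candidate k, score each partition, pick the extremum) applies to other relative indices; only the scoring function and whether a higher or lower value is better change.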


2019 ◽  
Vol 57 (1) ◽  
pp. 5-16 ◽  
Author(s):  
Anamarija Štafa ◽  
Andrea Pranklin ◽  
Ivan Krešimir Svetec ◽  
Božidar Šantek ◽  
Marina Svetec Miklenić ◽  
...  

Bioethanol production from lignocellulosic hydrolysates requires a producer strain that tolerates both the presence of growth and fermentation inhibitors and high ethanol concentrations. Therefore, we constructed heterozygous intraspecies hybrid diploids of Saccharomyces cerevisiae by crossing two natural S. cerevisiae isolates, YIIc17_E5 and UWOPS87-2421: the former is a good ethanol producer found in wine, and the latter is a strain from the flower of the cactus Opuntia megacantha that is resistant to inhibitors found in lignocellulosic hydrolysates. Hybrids grew faster than the parental strains both in the absence and in the presence of acetic and levulinic acids and 2-furaldehyde, inhibitors frequently found in lignocellulosic hydrolysates, and overexpression of the YAP1 gene increased their survival. Furthermore, although originating from the same parental strains, hybrids displayed different fermentative potential in a CO2 production test, suggesting genetic variability that could be used for further selection of desirable traits. Our results therefore suggest that the construction of intraspecies hybrids coupled with the use of genetic engineering techniques is a promising approach for the improvement or development of new biotechnologically relevant strains of S. cerevisiae. Moreover, the success of gene targeting (gene targeting fidelity) in natural S. cerevisiae isolates (YIIc17_E5α and UWOPS87-2421α) was strikingly lower than in laboratory strains, and the most frequent off-targeting event was targeted chromosome duplication.

