An Overview of Bioinformatics

This chapter presents a thorough background to, and an in-depth literature review of, the topic of study. It also defines the key concepts used throughout this investigation. It consists of ten sections: (1) a background on bioinformatics, (2) a discussion of colon cancer, (3) an overview of the microarray technology used to produce the dataset, (4) an overview of the colon-cancer dataset, (5) a review of the most prevalent algorithms for gene selection and cancer classification, (6) related work from the literature, (7) feature selection approaches and procedures, (8) machine learning (ML) concepts, (9) algorithm efficiency and time complexity analysis, and (10) open problems in the research area.

In general, classification accuracy is central to gene processing, gene selection and cancer classification, because it underpins better cancer treatment and more appropriate drug assignment. Time complexity analysis further increases the practical significance of such an application. To answer the research questions posed in Chapter 1, several case studies were implemented (see Chapters 4 and 5), each of which supports the methodologies discussed in Chapter 3. The study used a colon-cancer dataset comprising 2000 genes. The best-performing search algorithm, the genetic algorithm (GA), achieved high performance with efficient time complexity. Decision trees (DTs) and support vector machines (SVMs) made the best classification contribution in terms of both accuracy and time efficiency. Nevertheless, a completely fair comparison is difficult, because existing algorithms and methods were tested by different authors, each seeking to demonstrate the effectiveness and strength of their own methods.
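GA-based gene selection can be sketched as a search over binary masks, with one bit per gene. This is a minimal illustration under stated assumptions, not the implementation used in the case studies. The function name `genetic_gene_selection`, its parameter values and the toy fitness function are all hypothetical. In a real run on the 2000-gene colon-cancer data, the fitness would typically be the cross-validated accuracy of a DT or SVM trained on the selected genes, possibly minus a penalty on subset size.

```python
import random
from typing import Callable, List

Mask = List[int]  # one bit per gene: 1 = selected, 0 = dropped


def genetic_gene_selection(
    n_genes: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 50,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.01,
    seed: int = 0,
) -> Mask:
    """Return the fittest gene mask found by a simple generational GA (higher fitness is better)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(k: int = 3) -> Mask:
        # Draw k distinct individuals at random and return the fittest.
        return max(rng.sample(population, k), key=fitness)

    for _ in range(generations):
        offspring = [best[:]]  # elitism: the best mask so far survives unchanged
        while len(offspring) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # single-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Bit-flip mutation: each bit toggles with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            offspring.append(child)
        population = offspring
        best = max(population, key=fitness)
    return best


# Hypothetical toy fitness: genes 2, 5 and 11 are "informative"; every selected gene costs 0.05.
INFORMATIVE = {2, 5, 11}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.05 * sum(mask)


best_mask = genetic_gene_selection(n_genes=20, fitness=toy_fitness)
print([i for i, bit in enumerate(best_mask) if bit])
```

Each offspring costs a constant number of fitness evaluations. The running time therefore grows roughly as O(generations × pop_size × cost of one fitness evaluation). For this reason, the classifier used inside the fitness function dominates the time complexity of a GA wrapper.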


In this chapter, the design of each proposed case-study model mentioned in Chapter 3 is presented together with its experimental procedure. The chapter covers data preparation, parameter selection, data pre-processing and the detailed design of two case studies. Case 1 examines the accuracy and efficiency (time complexity) of high-performance gene selection and cancer classification algorithms. Case 2 presents a two-stage hybrid multi-filter feature selection method for high-accuracy colon-cancer classification. The chapter also describes the experimental setup and environment, including the hardware and software components used.
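The chapter's exact filters are not given here, so what follows is one possible reading of a two-stage hybrid multi-filter method, offered purely as an assumption. Stage 1 scores every gene with two filters (a two-class Fisher score and the absolute correlation with the class label) and keeps the `k` genes with the best combined rank. Stage 2 removes redundant genes whose expression is highly correlated with a gene already selected. The function names, the choice of filters and the `max_corr` threshold are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def fisher_score(x: Sequence[float], y: Sequence[int]) -> float:
    """Two-class Fisher criterion: squared mean difference over summed class variances."""
    a = [v for v, c in zip(x, y) if c == 0]
    b = [v for v, c in zip(x, y) if c == 1]
    denom = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def ranks(scores: List[float]) -> List[int]:
    """Rank 0 goes to the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    result = [0] * len(scores)
    for position, j in enumerate(order):
        result[j] = position
    return result


def two_stage_select(X: List[List[float]], y: List[int], k: int, max_corr: float = 0.9) -> List[int]:
    genes = list(zip(*X))  # one tuple of expression values per gene (column of X)
    # Stage 1: score each gene with two filters and aggregate their ranks.
    f_rank = ranks([fisher_score(g, y) for g in genes])
    c_rank = ranks([abs(pearson(g, y)) for g in genes])
    candidates = sorted(range(len(genes)), key=lambda j: f_rank[j] + c_rank[j])[:k]
    # Stage 2: visit candidates best-first; skip genes highly correlated with a kept gene.
    selected: List[int] = []
    for j in candidates:
        if all(abs(pearson(genes[j], genes[s])) < max_corr for s in selected):
            selected.append(j)
    return selected


y = [0, 0, 0, 1, 1, 1]
X = [  # rows = samples, columns = hypothetical genes g0..g3 (g1 is exactly 2 * g0)
    [1.0, 2.0, 0.5, 1.0],
    [1.2, 2.4, 2.0, 1.5],
    [0.9, 1.8, 1.0, 2.0],
    [3.0, 6.0, 1.5, 2.0],
    [3.1, 6.2, 0.7, 2.5],
    [2.9, 5.8, 1.8, 3.0],
]
selected = two_stage_select(X, y, k=3)
print(selected)
```

In this toy data, genes 0 and 1 are perfectly correlated copies, so stage 2 keeps only one of them alongside gene 3. The noisy gene 2 is already dropped in stage 1.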


2021, Vol. 2021, pp. 1-9
Author(s): Biying Zhou, Behdad Arandian

Skin cancer is one of the most common types of cancer and is sometimes difficult for doctors and experts to diagnose. Noninvasive dermatoscopy is a popular method for observing and diagnosing skin cancer. Because this method relies on visual inference, diagnosis by dermatologists is difficult, especially in the early stages of the disease. Artificial intelligence is a suitable complementary tool that experts can use to increase diagnostic accuracy. The present study introduces a new computer-aided method for the diagnosis of skin cancer. The method combines deep learning with a recently introduced metaheuristic, the Wildebeest Herd Optimization (WHO) algorithm. An Inception convolutional neural network performs the initial feature extraction. The WHO algorithm then selects the useful features, which reduces the time complexity of the analysis. The method is applied to the ISIC-2008 skin cancer dataset. The results of WHO-based feature selection are compared with those of three other algorithms and show good performance. Finally, the complete diagnosis system is compared with five other methods, and the results show that the proposed method outperforms them.
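Metaheuristic feature selectors, such as the WHO-based step described above, need a cost function that scores each candidate feature subset. A common choice in the wrapper literature is a weighted sum of the classification error and the fraction of features kept, with the weight `alpha` close to 1 so that accuracy dominates. This formulation is an assumption and is not taken from this paper. The function `wrapper_cost` and its default `alpha = 0.99` are illustrative.

```python
from typing import Sequence


def wrapper_cost(mask: Sequence[int], error_rate: float, alpha: float = 0.99) -> float:
    """Cost to minimize: alpha * error + (1 - alpha) * (features kept / total features)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be classified, so reject it outright
    return alpha * error_rate + (1.0 - alpha) * n_selected / len(mask)


# Same (hypothetical) error rate, different subset sizes.
small = wrapper_cost([1, 0, 1, 0, 0, 0, 0, 0], error_rate=0.10)
large = wrapper_cost([1, 1, 1, 1, 1, 1, 1, 1], error_rate=0.10)
print(small < large)  # prints True: fewer features give a lower cost at equal error
```

Any population-based optimizer, WHO included, can minimize this cost over binary masks. In practice, the error term would come from a classifier trained on the CNN features that the mask keeps.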


Author(s): Manpreet Kaur, Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from an educational dataset and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student dataset. It also sets out the problems defined in the problem formulation, which are left for future work. Furthermore, the paper aims to play a positive role in improving education quality and to guide new researchers in making academic interventions.
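Information gain is a concrete example of a filter-style FS algorithm. It scores a discrete feature by how much it reduces the entropy of the class label: IG(Y; X) = H(Y) − H(Y | X). The sketch below uses hypothetical student attributes (`attendance`, `seat_row`) and pass/fail outcomes. It is not the analysis performed in the paper.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a non-empty label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(labels) minus the label entropy conditioned on each feature value."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


outcome = ["pass", "pass", "pass", "fail", "fail", "fail"]
attendance = ["high", "high", "high", "low", "low", "low"]
seat_row = ["front", "back", "front", "back", "front", "back"]

scores = {
    "attendance": information_gain(attendance, outcome),
    "seat_row": information_gain(seat_row, outcome),
}
print(sorted(scores, key=scores.get, reverse=True))  # ['attendance', 'seat_row']
```

Features with near-zero gain, such as `seat_row` here, are the candidates an FS step would remove before training a classifier.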


2020, Vol. 21 (S18)
Author(s): Sudipta Acharya, Laizhong Cui, Yi Pan

Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes in high-dimensional gene expression datasets. In this context, an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer and other diseases, with applications in epidemiology for particular populations.
Results: In this article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To this end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important biological data resources (gene ontology, protein interaction data and protein sequence) are used together with gene expression values to design two different views. UMVMO-select aims to reduce the gene space without compromising sample classification efficiency, or while compromising it only minimally. It determines relevant and non-redundant gene markers from three benchmark cancer gene expression datasets.
Conclusion: A thorough comparative analysis was performed against five clustering methods and nine existing feature selection methods using several internal and external validity metrics. The results demonstrate the superiority of the proposed method. They are further validated through a biological significance test and heatmap plotting.
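UMVMO-select is a multi-objective method, and a basic building block of any such method is Pareto dominance. A solution stays on the front if no other solution is at least as good in every objective and strictly better in at least one. The sketch below extracts the non-dominated solutions when all objectives are minimized. The objective values are hypothetical (for example, one clustering-quality score per view), and the code is not the authors' algorithm.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (view-1 score, view-2 score) pairs for four candidate gene subsets.
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(points))  # [0, 1, 3]; (3.0, 3.0) is dominated by (2.0, 2.0)
```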


Sensors, 2021, Vol. 21 (11), pp. 3627
Author(s): Bo Jin, Chunling Fu, Yong Jin, Wei Yang, Shengbin Li, ...

Identifying the key tumor-related genes in gene expression data with a large number of features is important for accurate tumor classification and for specific treatment decisions. In recent years, unsupervised feature selection algorithms have attracted considerable attention in gene selection, because they can find the most discriminative gene subsets, that is, the latent information in biological data. Recent research also shows that preserving the important structure of the data is necessary for gene selection. However, most current feature selection methods capture only the local structure of the original data and ignore its global structure. We believe that the global and local structures of the original data are equally important, so the selected genes should preserve the essential structure of the original data as far as possible. In this paper, we propose a new adaptive, unsupervised feature selection scheme with two components. First, it reconstructs high-dimensional data in a low-dimensional space under a feature-distance-invariance constraint. Second, it uses the ℓ2,1-norm to give a matrix the ability to perform gene selection, embedded in a local manifold structure-learning framework. We also develop an effective algorithm to solve the resulting optimization problem. Comparative experiments against several classical schemes on real tumor datasets demonstrate the effectiveness of the proposed method.
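The role of the ℓ2,1-norm in such methods can be illustrated directly. Take a matrix W whose i-th row w^i holds the weights of gene i. Then ‖W‖_{2,1} = Σ_i ‖w^i‖_2, the sum of the Euclidean norms of the rows. Penalizing this quantity drives whole rows to zero, so genes can be ranked by their row norms and those with (near-)zero rows discarded. The weight matrix below is hypothetical. In the paper, W would come from solving the optimization problem, which this sketch does not attempt.

```python
from math import sqrt
from typing import List, Sequence


def row_norms(W: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean norm of each row; row i corresponds to gene i."""
    return [sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W: Sequence[Sequence[float]]) -> float:
    """||W||_{2,1}: the sum of the row norms."""
    return sum(row_norms(W))


def rank_genes(W: Sequence[Sequence[float]]) -> List[int]:
    """Gene indices ordered from largest to smallest row norm."""
    norms = row_norms(W)
    return sorted(range(len(norms)), key=lambda i: -norms[i])


W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]  # hypothetical weights for 3 genes, 2 latent dimensions
print(l21_norm(W))    # 6.0
print(rank_genes(W))  # [0, 2, 1]; gene 1 has a zero row and would be discarded
```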


2010, Vol. 9, pp. CIN.S3794
Author(s): Xiaosheng Wang, Osamu Gotoh

Gene selection is of vital importance in the molecular classification of cancer using high-dimensional gene expression data. Because specific cancerous gene expression profiles have distinct inherent characteristics, developing flexible and robust feature selection methods is crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended-degree-based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, reflecting the underlying biology of specific cancers.
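For reference, the canonical rough-set depended (dependency) degree that this work generalizes is γ_B(D) = |POS_B(D)| / |U|. It is the fraction of samples whose values on the attribute set B place them in an equivalence class containing a single decision class. The sketch computes only the canonical measure, not the authors' generalization. It runs on hypothetical discretized gene values ("up"/"down", "low"/"high"), and the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def dependency_degree(
    rows: Sequence[Sequence[Hashable]],
    attrs: Sequence[int],
    decisions: Sequence[Hashable],
) -> float:
    """gamma_B(D) = |POS_B(D)| / |U| for the attribute indices in attrs."""
    # Group samples into equivalence classes by their values on the chosen attributes.
    blocks: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    # The positive region is made up of the classes whose members all share one decision.
    positive = sum(len(m) for m in blocks.values() if len({decisions[i] for i in m}) == 1)
    return positive / len(rows)


rows = [("up", "low"), ("up", "high"), ("down", "low"), ("down", "low"), ("up", "low")]
decisions = ["tumour", "tumour", "normal", "normal", "normal"]
print(dependency_degree(rows, [0], decisions))     # 0.4
print(dependency_degree(rows, [1], decisions))     # 0.2
print(dependency_degree(rows, [0, 1], decisions))  # 0.6
```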


2014, Vol. 2014, pp. 1-9
Author(s): Hongyan Zhang, Lanzhi Li, Chao Luo, Congwei Sun, Yuan Chen, ...

In efforts to discover disease mechanisms and improve the clinical diagnosis of tumors, it is useful to mine expression profiles for informative genes with definite biological meaning and to build robust, high-precision classifiers. In this study, we developed a new method for tumor-gene selection, the chi-square test-based integrated rank gene and direct classifier (χ2-IRG-DC). First, we obtained a weighted integrated rank of gene importance from chi-square tests of single genes and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant ones, using leave-one-out cross-validation of the chi-square test-based Direct Classifier (χ2-DC) within the training set, to obtain informative genes. Finally, we determined the accuracy on independent test data by applying χ2-DC to the selected genes. Furthermore, we analyzed the robustness of χ2-IRG-DC by comparing the generalization performance of different models, the efficiency of different feature-selection methods and the accuracy of different classifiers. An independent test on ten multiclass tumor gene-expression datasets showed that χ2-IRG-DC could efficiently control overfitting and had higher generalization performance. The informative genes selected by χ2-IRG-DC could dramatically improve the independent-test precision of other classifiers. Likewise, the informative genes selected by other feature selection methods also performed well in χ2-DC.
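The single-gene chi-square score, one ingredient of χ2-IRG-DC, can be sketched for a gene whose expression has already been discretized. The statistic is χ² = Σ (O − E)² / E over the cells of the gene-by-class contingency table, with E = row total × column total / n. The discretization, the pairwise-interaction tests, the weighted rank integration and the direct classifier are all omitted. The function name and the toy data are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def chi_square(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of a discretized gene against the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    row_totals = Counter(feature)
    col_totals = Counter(labels)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # never zero: every level occurs at least once
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


labels = [0, 0, 1, 1]
print(chi_square(["hi", "hi", "lo", "lo"], labels))  # 4.0: the gene level determines the class
print(chi_square(["hi", "lo", "hi", "lo"], labels))  # 0.0: the gene level is independent of the class
```

Genes can then be ranked by this statistic, or by its p-value under a χ² distribution with (levels − 1)(classes − 1) degrees of freedom, before any redundancy removal.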


2020, Vol. 34 (03), pp. 2742-2749
Author(s): Ringo Baumann, Gerhard Brewka, Markus Ulbricht

In his seminal 1995 paper, Dung paved the way for abstract argumentation, by now a major research area in knowledge representation. He pointed out a problematic issue with self-defeating arguments that underlies all traditional semantics. A self-defeat occurs if an argument attacks itself, either directly or indirectly via an odd attack loop, unless the loop is broken by some argument attacking it from outside. Because such arguments represent self-contradictory or paradoxical reasoning, he asked for reasonable semantics that overcome the problem that such arguments may invalidate any argument they attack. This paper tackles this problem from scratch. More precisely, instead of continuing to use the concepts defined by Dung, we provide new foundations for abstract argumentation, namely weak admissibility and weak defense. After showing that these key concepts are compatible, as in the classical case, we introduce new versions of the classical Dung-style semantics, including complete, preferred and grounded semantics. We provide a rigorous study of these new concepts, including their interrelationships and their relations to their Dung-style counterparts. The newly introduced semantics overcome the issue with self-defeating arguments, and they are semantically insensitive to syntactic deletions of self-attacking arguments, a special case of self-defeat.
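The sketch below makes the self-defeat problem concrete by computing the grounded extension under Dung's classical semantics. The extension is the least fixed point of the characteristic function F(S) = {a | every attacker of a is attacked by some member of S}. The code implements only the classical semantics, not the weak-admissibility semantics proposed in the paper. The argument names are hypothetical.

```python
from typing import Hashable, Set, Tuple

Attack = Tuple[Hashable, Hashable]  # (attacker, target)


def grounded_extension(arguments: Set[Hashable], attacks: Set[Attack]) -> Set[Hashable]:
    """Iterate F from the empty set; F is monotone, so this reaches its least fixed point."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension: Set[Hashable] = set()
    while True:
        # F(S): the arguments all of whose attackers are attacked by some member of S.
        defended = {
            a
            for a in arguments
            if all(any((s, attacker) in attacks for s in extension) for attacker in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended


# a attacks itself and b: b is rejected even though its only attacker is self-defeating.
print(grounded_extension({"a", "b"}, {("a", "a"), ("a", "b")}))  # set()
# Chain a -> b -> c: a is unattacked and defends c.
print(sorted(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})))  # ['a', 'c']
```

In the first example, b is attacked only by the self-attacking argument a, yet the grounded extension is empty. A paradoxical argument still invalidates what it attacks, which is the issue that weak admissibility is designed to address.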

