Machine Learning in Cancer Research With Applications in Colon Cancer and Big Data Analysis - Advances in Medical Technologies and Clinical Practice

Published By IGI Global

9781799873167, 9781799873174

Classification accuracy is central to gene processing, gene selection, and cancer classification. It is needed to achieve better cancer treatments and to improve the assignment of medical drugs. Time complexity analysis further strengthens the practical significance of the application. To answer the research questions in Chapter 1, several case studies were implemented (see Chapters 4 and 5), each of which was essential to support the methodologies discussed in Chapter 3. The study used a colon-cancer dataset comprising 2000 genes. The best search algorithm, GA, showed high performance with efficient time complexity. Among the classifiers, DTs and SVMs made the best contribution in terms of accuracy and time efficiency. However, a completely fair comparative study is difficult to carry out, because existing algorithms and methods were tested by different authors to demonstrate the effectiveness and power of their own methods.


This chapter describes and discusses the combination of research methodologies (e.g., experimental, theoretical, and systems design) used in this research, which allows us to eliminate, as far as possible, the limitations of each individual method. For example, experimental research methodology is limited because the experiments are performed mainly in a controlled environment and might not properly reflect some practices performed 'in the wild'. Combining it with survey and prototype (system) design reduces such limitations. The knowledge gained from preliminary experimentation is used in the next chapter to design and model the Hybrid-AutoML system.


This chapter describes several methodologies and proposed models used to examine the accuracy and efficiency of high-performance colon-cancer feature selection and classification algorithms, in order to solve the problems identified in Chapter 2. It elaborates on the diverse gene/feature selection algorithms and the related classification algorithms implemented throughout this study. A prototypical methodology blueprint for each experiment is developed to answer the research questions in Chapter 1. Each system model is also presented, and the measures used to validate the performance of each model's outcome are discussed.


In this chapter, the design of each proposed case study model mentioned in Chapter 3 is presented together with its experimental procedures. The chapter covers data preparation, suitable parameters and data pre-processing, and the detailed design of two case studies. Case 1: examining the accuracy and efficiency (time complexity) of high-performance gene selection and cancer classification algorithms. Case 2: a two-stage hybrid multi-filter feature selection method for high colon-cancer classification. The chapter also describes the experimental setup and environment, including the hardware and software components used.


This chapter focuses on the results produced from each case study experiment. For case one, the experiments were conducted in three phases. Phase one implemented GA, PSO, and IG as the gene/feature selection algorithms over the entire dataset. Phase two used the original dataset to implement only the cancer classification algorithms, without any gene/feature selection algorithms. Four recognised classification algorithms were employed: SVM, NB, GP, and DT. Phase three implemented the combined approach of gene selection and cancer classification algorithms. The results of these phases are presented in the next subsections. For case two, the experiments were implemented in two phases. Phase one applied the classification algorithms to the features selected by the hybridised selection algorithms (GA+IG), whereas Phase two classified the features using the proposed two-stage multi-filter selection system. In this section, the results are presented as follows
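As an illustration of the IG-based gene selection mentioned above, the following is a minimal sketch in Python and not the authors' implementation. It assumes that gene expression values have already been discretised into categories; the function names (`entropy`, `information_gain`, `top_k_features`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Return the names of the k feature columns with the highest information gain.

    `columns` maps a feature name to its list of discretised values (an assumption
    made for this sketch; real expression data would need discretising first).
    """
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)
    return ranked[:k]


# Toy data (hypothetical): gene "g1" separates the classes perfectly, "g2" not at all.
toy_labels = ["tumour", "tumour", "normal", "normal"]
toy_columns = {
    "g1": ["high", "high", "low", "low"],
    "g2": ["high", "low", "high", "low"],
}
selected = top_k_features(toy_columns, toy_labels, 1)
```

In a pipeline like the one described, the selected gene names would then be passed to a classifier such as DT or SVM; that step is omitted here.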


This chapter presents a thorough background and deep literature review of the current topic of study. It also presents and defines the key concepts utilised throughout this investigation. It consists of ten sections: (1) a background on bioinformatics, (2) a discussion of colon cancer, (3) an overview of the microarray technology that is used to extract the dataset, (4) an overview of the colon cancer dataset, (5) a review of the most prevalent algorithms employed for gene selection and cancer classification, (6) a presentation of related works from the literature, (7) identification of feature selection approaches and procedures, (8) an investigation of the ML concept, (9) a review of algorithm efficiency and time complexity analysis, and (10) identification of current problems in the research area.


This chapter argues that the use cases demonstrate that the aim of this research has been achieved: to conceptualise, design, and develop a scalable and flexible toolkit for automatic big data ML mode and model selection on single or multi-varying datasets. A major benefit of the hybrid-autoML toolkit is that it reduces the time data scientists and researchers in the field spend searching through the algorithm selections and hyperparameter space. This advantage was discussed in Section 5.2, where the authors compared the hybrid-autoML tool with autoWeka on about 35 datasets using measures such as accuracy, mean absolute error (MAE), and time.


In this chapter, the authors use a set of use cases to evaluate how the hybrid autoML system achieves the goals set out in the aims and objectives of this research. The authors map each use case to the aims and contributions outlined in Section 1.3 of this research. A performance comparison is also made between autoWeka and the hybrid autoML system on 33 datasets. The comparison is based on three main evaluation metrics: the percentage accuracy (or correlation coefficient where applicable), the mean absolute error (MAE), and the time (in seconds) spent building the model on training data. It is observed that the hybrid autoML system fully outperforms autoWeka with regard to the time spent on building models or finding the best algorithms in the first instance.
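The three evaluation metrics named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's own code: the helper names (`accuracy`, `mean_absolute_error`, `timed_build`) are hypothetical, and `build_fn` stands in for whatever model-building routine is being timed.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted numeric values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def timed_build(build_fn, *args):
    """Call build_fn(*args) and return (result, elapsed_seconds).

    build_fn is a placeholder for a model-building routine; any callable works.
    """
    start = time.perf_counter()
    model = build_fn(*args)
    return model, time.perf_counter() - start


# Hypothetical usage with toy values.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
model, seconds = timed_build(sum, [1, 2, 3])  # `sum` stands in for a real trainer
```

Multiplying `accuracy` by 100 gives the percentage form used in the comparison; the correlation coefficient used for regression tasks is not shown here.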


The purpose of this chapter is to discuss and analyse the results produced in Chapter 5. To evaluate the proposed models, this chapter compares them with others from the literature. Additionally, the chapter discusses the evaluation measures used to validate the experimental results of Chapter 5. For example, in the experiments, GA/DT demonstrated the highest average accuracy (92%) for classifying colon cancer compared with the other algorithms. PSO/DT, PSO/SVM, and IG/DT each achieved 89%, demonstrating very good classification accuracy. PSO/NB (57%) and GA/NB (58%) showed considerably lower classification accuracy. Table 6.1 lists all accuracies resulting from the experiments of case study one, as applied to the full dataset. There are 45 algorithm combinations with accuracy above 80% when applied to the full dataset. One algorithm presents an accuracy of 92%. Nine others scored below 60%.


This chapter provides background information, highlights the motivations and the problems resolved, and then discusses the aims and contributions of the research conducted. Over the past decades, there has been an explosion in the volume, variety, and velocity of data. Offering effective solutions to the major problems this explosion brings has become ever more important. One such solution is big data machine learning (ML) classification or clustering. However, with the solutions offered, we are faced with several problems that include but are not limited to the context.

