breast cancer dataset
Recently Published Documents


TOTAL DOCUMENTS

177
(FIVE YEARS 118)

H-INDEX

9
(FIVE YEARS 4)

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

This study presents an intelligent information retrieval system that extracts useful information from breast cancer datasets and uses that information to build a classification model. The proposed model aims to reduce the missed cancer rate by providing comprehensive decision support to the radiologist. The model is built on two datasets: the Wisconsin Breast Cancer Dataset (WBCD) and 365 free-text mammography reports from a hospital. To prepare the data for learning, several pre-processing steps were applied: missing values were filled by regression, a Natural Language Processing (NLP) parser was developed to handle the free-text mammography reports, and the dataset was balanced with the Synthetic Minority Oversampling Technique (SMOTE). The most relevant features were selected using a filter method and tf-idf scores. The K-NN and SGD classifiers were optimized by choosing the best value of k for K-NN and by tuning the SGD hyperparameters with grid search.
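The search for a good value of k can be illustrated with a minimal sketch. It assumes leave-one-out validation and a small hypothetical two-feature dataset; the abstract does not state which validation scheme or candidate values the authors used, and the function names here are illustrative only.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def best_k(X, y, candidates):
    """Try every candidate k; ties go to the smallest k."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    return max(sorted(scores), key=lambda k: scores[k]), scores

# Hypothetical toy data: two features, labels 0 = benign, 1 = malignant.
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (1.3, 1.1),
     (3.0, 3.2), (3.2, 2.9), (2.9, 3.1), (3.1, 3.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
k, scores = best_k(X, y, [1, 3, 5])
```

On real data the candidate list would be wider and the scores would differ between values of k; the toy clusters here are cleanly separated, so every candidate scores the same and the tie rule decides.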


Author(s):  
Tsehay Admassu Assegie ◽  
Ravulapalli Lakshmi Tulasi ◽  
Vadivel Elanangai ◽  
Napa Komal Kumar

Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have worked on automating the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of a predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of a predictive model. Its purpose is to determine the features required for training a model and to remove irrelevant and duplicate features. A duplicate feature is one that is highly correlated with another feature. The objective of this study is to conduct experimental research on three feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented on a breast cancer diagnostic dataset, and their performance is compared on a test set. The experimental results show that sequential feature selection outperforms chi-square (χ²) statistics and embedded feature selection, achieving an accuracy of 98.3%.
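As an illustration of the chi-square idea only, the sketch below scores a categorical feature against the class labels with the textbook Pearson contingency-table statistic. This is an assumption about the general technique, not the authors' implementation, and library versions of chi-square selection may compute the statistic differently. The toy feature vectors are hypothetical.

```python
def chi_square_score(feature, labels):
    """Pearson chi-square statistic of a categorical feature against class labels."""
    n = len(feature)
    f_vals = sorted(set(feature))
    c_vals = sorted(set(labels))
    # Observed counts for every (feature value, class) cell.
    observed = {(f, c): 0 for f in f_vals for c in c_vals}
    for f, c in zip(feature, labels):
        observed[(f, c)] += 1
    f_tot = {f: sum(observed[(f, c)] for c in c_vals) for f in f_vals}
    c_tot = {c: sum(observed[(f, c)] for f in f_vals) for c in c_vals}
    score = 0.0
    for f in f_vals:
        for c in c_vals:
            # Count expected in this cell if feature and class were independent.
            expected = f_tot[f] * c_tot[c] / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

# Hypothetical toy data: one feature tracks the label, the other does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
ranking = sorted(
    {"informative": informative, "noise": noise}.items(),
    key=lambda item: chi_square_score(item[1], labels),
    reverse=True,
)
```

A filter method of this kind keeps the highest-scoring features; here the feature that tracks the label scores 8.0 and the unrelated one scores 0.0.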


Author(s):  
A. B Yusuf ◽  
R. M Dima ◽  
S. K Aina

Breast cancer is the second most commonly diagnosed cancer in women worldwide. It is on the rise, especially in developing countries, where the majority of cases are discovered late. Breast cancer develops when cancerous tumors form on the surface of the breast cells. The absence of accurate prognostic models to help physicians recognize symptoms early makes it difficult to develop a treatment plan that would help patients live longer. Machine learning techniques have recently been used to improve the accuracy and speed of breast cancer diagnosis, and the more accurate the model, the more useful it is for diagnosis. Nevertheless, the primary difficulty for systems that detect breast cancer with machine learning models is attaining the highest classification accuracy and selecting the most predictive features. As a result, breast cancer prognosis remains a challenge. This research addresses a weakness of existing techniques in classifying continuous-valued data, in particular their accuracy and their selection of optimal features for breast cancer prediction. To this end, the study examines the impact of outliers and feature reduction on the Wisconsin Diagnostic Breast Cancer Dataset, tested with seven different machine learning algorithms. The results show that the Logistic Regression, Random Forest, and AdaBoost classifiers achieved the highest accuracy, 99.12%, after outliers were removed from the dataset. With feature selection applied to the filtered dataset, the Random Forest and Gradient Boosting classifiers achieved accuracies of 100% and 99.12%, respectively. Compared with other state-of-the-art approaches, both proposed strategies outperformed the unfiltered data in terms of accuracy.
The suggested architecture might be a useful tool for radiologists to reduce the number of false negatives and false positives, and so increase the efficiency of breast cancer diagnosis.
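The abstract does not say how outliers were identified. One common choice, assumed here purely for illustration, is the interquartile-range (IQR) rule: a row is dropped when any of its features lies more than 1.5 IQRs outside that feature's quartiles. The rows and function names below are hypothetical.

```python
import statistics

def iqr_bounds(values, factor=1.5):
    """Lower and upper fences: quartiles widened by factor * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - factor * spread, q3 + factor * spread

def drop_outlier_rows(rows, factor=1.5):
    """Keep rows where every feature lies within the fences of its column."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, factor) for col in columns]
    return [
        row for row in rows
        if all(low <= v <= high for v, (low, high) in zip(row, bounds))
    ]

# Hypothetical rows of (mean radius, mean texture); the last radius is extreme.
rows = [
    (12.0, 18.0), (13.1, 19.0), (12.5, 17.5), (14.0, 20.0),
    (13.4, 18.5), (12.8, 19.5), (13.0, 18.2), (40.0, 19.1),
]
filtered = drop_outlier_rows(rows)
```

The filtered rows would then be passed to the classifiers; whether the reported gains depend on this particular rule cannot be told from the abstract.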


2021 ◽  
Vol 23 (11) ◽  
pp. 749-758
Author(s):  
Saranya N ◽  
Kavi Priya S ◽  

Breast cancer is one of the chronic diseases affecting people throughout the world. Early detection of this disease is the most promising way to improve patients' chances of survival. The strategy employed in this paper is to select the best features from various breast cancer datasets using a genetic algorithm and then to apply a machine learning algorithm to predict the outcomes. Two machine learning algorithms, Support Vector Machines and Decision Tree, are used along with the Genetic Algorithm. The proposed approach is evaluated on five datasets: the Wisconsin Breast Cancer-Diagnosis Dataset, the Wisconsin Breast Cancer-Original Dataset, the Wisconsin Breast Cancer-Prognosis Dataset, the ISPY1 Clinical Trial Dataset, and the Breast Cancer Dataset. The results show that SVM-GA achieves a higher accuracy (98.16%) than DT-GA (97.44%).
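Genetic-algorithm feature selection can be sketched as a search over bit masks, one bit per feature. The version below is a minimal illustration under several assumptions that are not from the paper: a leave-one-out 1-nearest-neighbour score stands in for the SVM or Decision Tree, and the population size, mutation rate, size penalty and toy data are all hypothetical.

```python
import math
import random

def fitness(mask, X, y):
    """Leave-one-out 1-NN accuracy on the selected columns, minus a small size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        point = [row[j] for j in cols]
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: math.dist(point, [X[k][j] for j in cols]),
        )
        hits += y[nearest] == y[i]
    return hits / len(X) - 0.01 * len(cols)

def select_features(X, y, pop_size=12, generations=15, mutation_rate=0.1, seed=0):
    """Evolve feature masks with truncation selection, crossover and mutation."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, X, y), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, X, y))

# Hypothetical toy data: only the first column separates the two classes.
X = [
    (1.0, 7.1, 0.3, 4.0), (1.2, 2.4, 0.9, 1.5), (0.9, 5.5, 0.1, 3.3), (1.1, 3.8, 0.7, 2.2),
    (5.0, 6.9, 0.2, 3.9), (5.2, 2.6, 0.8, 1.7), (4.9, 5.3, 0.4, 3.1), (5.1, 4.0, 0.6, 2.4),
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best_mask = select_features(X, y)
```

The size penalty nudges the search toward smaller feature sets when accuracy is tied. Swapping in a different classifier only requires replacing the body of `fitness`.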


Author(s):  
Yijia Hua ◽  
Mengzhu Yang

Megakaryocytic leukemia 1 (MKL1) acts as a transcription factor in the regulation of the immune system and is associated with cancer biology. However, its function in the infiltrating immune cells in breast cancer has not been explored. This study analyzed the expression of MKL1 in The Cancer Genome Atlas (TCGA) breast cancer dataset and evaluated the correlations between MKL1 expression, infiltrating immune cells, and immune control genes. Enriched signaling pathways and drug sensitivity analyses were also performed. Our results indicate that high MKL1 expression could predict better survival in breast cancer patients. MKL1 expression was associated with the expression and function of different immune cells, including T cells, B cells, natural killer (NK) cells, macrophages, neutrophils and dendritic cells (DCs). Chromatin-modifying enzymes, cellular senescence, epigenetic regulation of gene expression, estrogen-dependent gene expression, and chromosome maintenance were differentially enriched in the low MKL1 expression phenotype. Patients in the high MKL1 expression group showed sensitivity to paclitaxel, while those in the low expression group showed potential sensitivity to cisplatin and docetaxel. In conclusion, MKL1 might act as a potential biomarker of prognostic value for immune infiltration and drug sensitivity in breast cancer.


Author(s):  
Maad M. Mijwil ◽  
Israa Ezzat Salem ◽  
Rana A. Abttan

Chemical waste increases day after day, new types of it keep emerging, and toxic pollution is high. Together with the difficulty of daily life, growing psychological stress, and other factors, this has led to the emergence of many diseases that affect humans, including deadly ones such as COVID-19. Symptoms may or may not appear; some people know their condition, while others neglect their health due to lack of knowledge, which may lead to death or to lifelong chronic disease. In this regard, the authors apply machine learning techniques (Support Vector Machine, C5.0 Decision Tree, K-Nearest Neighbours, and Random Forest), chosen for their influence in the medical sciences, to identify the technique that gives the highest accuracy in detecting diseases. Such a technique can help to recognise symptoms and diagnose them correctly. This article covers datasets from the UCI machine learning repository, namely the Wisconsin Breast Cancer, Chronic Kidney Disease, Immunotherapy, Cryotherapy, Hepatitis and COVID-19 datasets. In the results section, the techniques are compared to determine which performs best and which performs worst on the dataset for each disease.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12202
Author(s):  
Chen Shuai ◽  
Fengyan Yuan ◽  
Yu Liu ◽  
Chengchen Wang ◽  
Jiansong Wang ◽  
...  

Background: In recent years, adjuvant therapy for ER+ and HER2- breast cancer, including chemotherapy and endocrine therapy, has made great progress. We found that the responsiveness to breast cancer treatment was related to the prognosis of patients. However, reliable prognostic signatures for ER+ and HER2- breast cancer and drug resistance-related prognostic markers have not been well confirmed. This study aimed to establish a drug resistance-related gene signature for risk stratification in ER+ and HER2- breast cancer. Methods: We used data from The Cancer Genome Atlas (TCGA) breast cancer dataset and the Gene Expression Omnibus (GEO) database, constructed a risk profile based on four drug resistance-related genes, and developed a nomogram to predict the survival of patients with stage I-III ER+ and HER2- breast cancer. We also analyzed the relationship between immune infiltration and the expression of these four genes or risk groups. Results: Four drug resistance genes (AMIGO2, LGALS3BP, SCUBE2 and WLS) were found to be promising tools for ER+ and HER2- breast cancer risk stratification. The nomogram, which combines genetic characteristics with known risk factors, produced better performance and net benefits in calibration and decision curve analysis. Similar results were validated in three separate GEO cohorts. All of these results showed that the model can be used as a prognostic classifier for clinical decision-making, individual prediction and treatment, as well as follow-up.


2021 ◽  
Author(s):  
Thao Vu ◽  
Julia Wrobel ◽  
Benjamin G. Bitler ◽  
Erin L. Schenk ◽  
Kimberly R. Jordan ◽  
...  

The tumor microenvironment (TME), which characterizes the tumor and its surroundings, plays a critical role in understanding cancer development and progression. Recent advances in imaging techniques enable researchers to study the spatial structure of the TME at a single-cell level. Investigating spatial patterns and interactions of cell subtypes within the TME provides useful insights into how cells with different biological purposes behave, which may consequently impact a subject's clinical outcomes. We utilize a class of well-known spatial summary statistics, the K-function and its variants, to explore inter-cell dependence as a function of distances between cells. Using techniques from functional data analysis, we introduce an approach to model the association between these summary spatial functions and subject-level outcomes, while controlling for other clinical scalar predictors such as age and disease stage. In particular, we leverage the additive functional Cox regression model (AFCM) to study the nonlinear impact of spatial interaction between tumor and stromal cells on overall survival in patients with non-small cell lung cancer, using multiplex immunohistochemistry (mIHC) data. The applicability of our approach is further validated using a publicly available Multiplexed Ion Beam Imaging (MIBI) triple-negative breast cancer dataset.
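The K-function mentioned above can be illustrated with a naive cross-type estimate: the number of (tumor, stromal) pairs within distance r, scaled by the image area and the two cell counts. The sketch below omits edge correction and uses hypothetical coordinates in a unit-area image, so it shows the idea rather than the estimator or data used in the paper.

```python
import math

def cross_k(points_a, points_b, radius, area):
    """Naive cross-type K estimate: area-scaled share of (a, b) pairs within radius."""
    close = sum(
        1
        for a in points_a
        for b in points_b
        if math.dist(a, b) <= radius
    )
    return area * close / (len(points_a) * len(points_b))

# Hypothetical cell centroids in a unit-area image.
tumor = [(0.2, 0.2), (0.8, 0.8)]
stromal = [(0.25, 0.2), (0.8, 0.75), (0.5, 0.5)]

radii = [0.1, 0.5, 1.0]
k_curve = [cross_k(tumor, stromal, r, area=1.0) for r in radii]
# Reference curve under complete spatial randomness: pi * r^2.
csr_curve = [math.pi * r ** 2 for r in radii]
```

Values of the estimate above the reference curve at a given radius suggest the two cell types sit closer together than chance would predict. Evaluated over a grid of radii, the curve is the kind of per-subject function that a functional regression model could take as input.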

