cancer dataset
Recently Published Documents


TOTAL DOCUMENTS

269
(FIVE YEARS 170)

H-INDEX

11
(FIVE YEARS 5)

Author(s):  
Tsehay Admassu Assegie ◽  
Ravulapalli Lakshmi Tulasi ◽  
Vadivel Elanangai ◽  
Napa Komal Kumar

Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have worked to automate the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of a predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of a predictive model. Its purpose is to determine the features required for training a model and to remove irrelevant and duplicate features. A duplicate feature is one that is highly correlated with another feature. The objective of this study is to conduct experimental research on three different feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented using a breast cancer diagnostic dataset, and their performance is compared on the test set. The experimental results show that sequential feature selection outperforms chi-square (χ²) statistics and embedded feature selection, achieving an accuracy of 98.3%.
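
The abstract defines a duplicate feature as one that is highly correlated with another feature. A minimal sketch of that idea follows, using only the Python standard library. The column layout, the toy values, and the 0.95 threshold are assumptions made for illustration and are not taken from the study; the study's sequential, embedded, and chi-square selectors are not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)

def drop_duplicate_features(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept.

    columns: dict mapping feature name -> list of values (hypothetical layout).
    threshold: assumed cut-off on |r|; not a value reported by the study.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data: "perimeter" is almost an exact multiple of "radius".
toy = {
    "radius": [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [62.8, 75.4, 88.0, 100.5, 113.1],
    "texture": [20.0, 15.0, 22.0, 14.0, 19.0],
}
selected = drop_duplicate_features(toy)
```

The filter is greedy: the first feature of a correlated pair is kept, so the result depends on column order. A real pipeline would normally combine such a filter with one of the selection methods the abstract compares.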


Author(s):  
Juergen Hench ◽  
Tatjana Vlajnic ◽  
Savas Deniz Soysal ◽  
Ellen C Obermann ◽  
Stephan Frank ◽  
...  

Fibroepithelial lesions (FL) of the breast, in particular Phyllodes tumors (PT) and fibroadenomas (FA), pose a significant diagnostic challenge. There are no generally accepted criteria that distinguish benign, borderline, and malignant PT from FA. Combined genome-wide DNA methylation and copy number variant (CNV) profiling is an emerging strategy to classify tumors. We compiled a series of patient-derived archival biopsy specimens reflecting the FL spectrum and histological mimickers including clinical follow-up data. DNA methylation and CNVs were determined by well-established microarrays. Comparison of the patterns with a pan-cancer dataset assembled from public resources including "The Cancer Genome Atlas" (TCGA) and "Gene Expression Omnibus" (GEO) suggests that FLs form a methylation class distinct from both control breast tissue and common breast cancers. Complex CNVs were enriched in clinically aggressive FLs. Subsequent fluorescence in situ hybridization (FISH) analysis detected the respective aberrations only in the neoplastic mesenchymal component of FLs, confirming that the epithelial component is non-neoplastic. Of note, our approach could lead to the elimination of the diagnostically problematic category of borderline PT and allow for optimized prognostic patient stratification. Furthermore, the identified recurrent genomic aberrations such as 1q gains (including MDM4), CDKN2a/b deletions and EGFR amplifications may inform therapeutic decision-making.


2021 ◽  
Author(s):  
Aiden Smith ◽  
Paul Lambert ◽  
Mark Rutherford

Background: A lack of availability of data and statistical code being published alongside journal articles provides a significant barrier to open scientific discourse, and reproducibility of research. Information governance restrictions inhibit the active dissemination of individual level data to accompany published manuscripts. Realistic, accurate time-to-event synthetic data can aid in the acceleration of methodological developments in survival analysis and beyond by enabling researchers to access and test published methods using data similar to that which they were developed on.
Methods: This paper presents methods to accurately replicate the covariate patterns and survival times found in real-world datasets using simulation techniques, without compromising individual patient identifiability. We model the joint covariate distribution of the original data using covariate specific sequential conditional regression models, then fit a complex flexible parametric survival model from which to simulate survival times conditional on individual covariate patterns. We recreate the administrative censoring mechanism using the last observed follow-up date information from the initial dataset. Metrics for evaluating the accuracy of the synthetic data, and the non-identifiability of individuals from the original dataset, are presented.
Results: We successfully create a synthetic version of an example colon cancer dataset consisting of 9064 patients which aims to show good similarity to both covariate distributions and survival times from the original data, without containing any exact information from the original data, therefore allowing them to be published openly alongside research.
Conclusions: We evaluate the effectiveness of the simulation methods for constructing synthetic data, as well as providing evidence that it is almost impossible that a given patient from the original data could be identified from their individual unique date information. Simulated datasets using this methodology could be made available alongside published research without breaching data privacy protocols, and allow for data and code to be made available alongside methodological or applied manuscripts to greatly improve the transparency and accessibility of medical research.
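
To make the administrative censoring step concrete, here is a deliberately simplified sketch. It swaps the paper's flexible parametric survival model for a constant-hazard (exponential) model and ignores covariates entirely; the function name, the parameter values, and the uniform entry dates are all assumptions for illustration.

```python
import random

def simulate_survival(n, hazard, entry_window, study_end, seed=0):
    """Simulate (observed_time, event) pairs with administrative censoring.

    Assumptions (not from the paper): a constant hazard, i.e. exponential
    survival times, and entry dates uniform on [0, entry_window].
    study_end must be larger than entry_window so follow-up is positive.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        entry = rng.uniform(0.0, entry_window)   # time of diagnosis
        true_time = rng.expovariate(hazard)      # latent survival time
        max_follow_up = study_end - entry        # time until the study closes
        observed = min(true_time, max_follow_up)
        event = 1 if true_time <= max_follow_up else 0  # 0 means censored
        records.append((observed, event))
    return records

synthetic = simulate_survival(n=1000, hazard=0.2, entry_window=5.0, study_end=10.0)
```

Each simulated patient is censored at the study end date rather than at a random time, which is what distinguishes administrative censoring from other censoring mechanisms.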


2021 ◽  
Vol 2129 (1) ◽  
pp. 012022
Author(s):  
Mohamad Faiz Dzulkalnine ◽  
Roselina Sallehuddin ◽  
Yusliza Yussof ◽  
Nor Haizan Mohd Radzi ◽  
Noorfa Haszlinna Binti Mustaffa ◽  
...  

In Malaysia, Colorectal Cancer (CRC) is one of the most common cancers in both men and women. Early detection is crucial: it can significantly increase patients' survival rate, whereas the disease can lead to death if left untreated. With the lack of high-quality CRC data, expert systems and machine learning analysis are burdened with irrelevant features, outliers, and noise, which can reduce classification accuracy. Accordingly, it is essential to find a reliable feature selection method that can identify and remove irrelevant features while being resistant to noise and outliers. In this paper, Fuzzy Principal Component Analysis (FPCA) was tested for the classification of a Malaysian CRC dataset. With the use of fuzzy membership in FPCA, the experimental results showed that the proposed method achieves accuracy almost 2% and 5% higher than PCA and SVM, respectively. Empirical results showed that FPCA is a reliable feature selection method that can find the most informative features in the CRC dataset and could assist medical practitioners in making informed decisions.


Author(s):  
D. Merlin ◽  
Dr. J. G. R. Sathiaseelan

The major purpose of this research is to predict cervical cancer, compare how well different algorithms perform, and then choose the best algorithm for predicting cervical cancer at an early stage. Cervical cancer classification can be automated using a machine learning system. This study evaluates multiple machine learning techniques for cervical cancer classification. For this classification, algorithms such as Decision Tree, Naive Bayes, KNN, SVM, and MLP are proposed and evaluated. The cervical cancer dataset, retrieved from the UCI machine learning data repository, was used to test these methods. With the help of scikit-learn, the algorithms' results were compared in terms of accuracy, sensitivity, and specificity. Scikit-learn is a free, Python-based machine learning package. Finally, the best model for predicting cervical cancer is developed.
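
The three metrics named in this abstract can be computed directly from the counts of a binary confusion matrix. The sketch below uses plain Python rather than scikit-learn; the function name and the toy labels are hypothetical and the values are not results from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy labels (1 = cancer, 0 = no cancer), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters for screening data, where the classes are often imbalanced and accuracy alone can hide a high miss rate.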


Diagnostics ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 2132
Author(s):  
Wan-Shan Li ◽  
Chih-I Chen ◽  
Hsin-Pao Chen ◽  
Kuang-Wen Liu ◽  
Chia-Jen Tsai ◽  
...  

Data mining of a public transcriptomic rectal cancer dataset (GSE35452) from the Gene Expression Omnibus, National Center for Biotechnology Information identified the melanophilin (MLPH) gene as the most significant intracellular protein transport-related gene (GO:0006886) associated with a poor response to preoperative chemoradiation. An MLPH immunostain was performed on biopsy specimens from 172 rectal cancer patients receiving preoperative chemoradiation; samples were divided into high- and low-expression groups by H-scores. Subsequently, the correlations between MLPH expression and clinicopathologic features, tumor regression grade, disease-specific survival (DSS), local recurrence-free survival (LRFS), and metastasis-free survival (MeFS) were analyzed. MLPH expression was significantly associated with CEA level (p = 0.001), pre-treatment tumor status (p = 0.022), post-treatment tumor status (p < 0.001), post-treatment nodal status (p < 0.001), vascular invasion (p = 0.028), and tumor regression grade (p < 0.001). After uni- and multi-variable analysis of five-year survival, MLPH expression was still associated with lower DSS (hazard ratio (HR), 10.110; 95% confidence interval (CI), 2.178–46.920; p = 0.003) and MeFS (HR, 5.621; 95% CI, 1.762–17.931; p = 0.004). In conclusion, identifying MLPH expression could help to predict the response to chemoradiation and survival, and aid in personal therapeutic modification.
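
As a reading aid for the hazard ratios above: if the intervals are Wald-type intervals computed on the log scale (an assumption, since the abstract does not state the method), the reported HR should sit at the geometric mean of its confidence bounds, and the standard error of log(HR) can be recovered from the interval width. The helper name below is hypothetical.

```python
from math import log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_scale_summary(lower, upper):
    """Geometric midpoint of a CI and the implied standard error of log(HR).

    Assumes a symmetric (Wald) interval on the log scale.
    """
    midpoint = sqrt(lower * upper)
    se_log_hr = (log(upper) - log(lower)) / (2 * Z_95)
    return midpoint, se_log_hr

# Values quoted in the abstract above.
dss_mid, dss_se = log_scale_summary(2.178, 46.920)    # reported HR 10.110
mefs_mid, mefs_se = log_scale_summary(1.762, 17.931)  # reported HR 5.621
```

Both midpoints agree with the reported hazard ratios to within rounding, which is consistent with the log-scale assumption. The wide interval for DSS also shows that the estimate, although significant, is imprecise.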


Author(s):  
Maad M. Mijwil ◽  
Israa Ezzat Salem ◽  
Rana A. Abttan

Chemical waste on our planet increases day after day. The emergence of new types of waste, high levels of toxic pollution, the difficulty of daily life, worsening psychological conditions, and other factors have all led to the emergence of many diseases that affect humans, including deadly ones such as COVID-19. Symptoms may or may not appear in a person; some people know their condition, while others neglect their health due to a lack of knowledge, which may lead to death or to a lifelong chronic disease. In this regard, the authors apply machine learning techniques (Support Vector Machine, C5.0 Decision Tree, K-Nearest Neighbours, and Random Forest), chosen for their influence in medical sciences, to identify the technique that gives the highest accuracy in detecting diseases. Such a technique will help to recognise symptoms and diagnose them correctly. This article covers datasets from the UCI machine learning repository, namely the Wisconsin Breast Cancer, Chronic Kidney Disease, Immunotherapy, Cryotherapy, Hepatitis, and COVID-19 datasets. In the results section, the techniques are compared to find out which performs best and which performs worst on the dataset of each disease.


2021 ◽  
Author(s):  
Li Zeng ◽  
Hongqiu Wang ◽  
Xin Wang ◽  
Miao Tian ◽  
Shaozhi Wu

Cervical cancer is one of the most common causes of cancer death in women. During the treatment of cervical cancer, a radiation plan must be made based on the clinical target volume (CTV) on the CT image. At present, the CTV is sketched manually by physicists, which is time-consuming and laborious. With the help of a deep learning model, a computer can accurately draw the outline of the CTV. In this paper, we propose CDBNet, a cascaded segmentation network based on double-branch boundary enhancement. First, a classification network determines whether a single image contains a region of interest (ROI), and then the segmentation network uses DBNet to segment more accurately at the ROI contour. CDBNet was verified on the cervical cancer dataset provided by the Department of Radiation Oncology, West China Hospital, Sichuan Province. The average Dice and 95HD of the delineation results are 86.12% and 2.51 mm, respectively. At the same time, the accuracy of classifying whether an image contains an ROI reaches 93.19%, and the average Dice of images containing an ROI reaches 70%.
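
The Dice score reported above measures the overlap between a predicted mask and a reference mask: Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch on toy binary masks follows; the nested-list mask layout and the convention for two empty masks are assumptions, and the 95HD boundary metric is not covered.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2 * intersection / total

# Toy 2x3 masks: two of the three foreground pixels overlap.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)
```

The empty-mask convention matters in a cascaded design like the one described, because slices that the classification stage marks as containing no ROI produce empty predictions, and how those slices are scored changes the average Dice.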

