Predicting Invasive Lobular Carcinoma Using Machine Learning

2021 ◽  
Author(s):  
Rucha P.

The early identification and prediction of cancer type should be a priority in cancer research, in order to assist and manage patients. The importance of classifying cancer patients into high- or low-risk groups has led many research groups from the biomedical and bioinformatics fields to study the use of artificial intelligence (AI) technologies. An approach based on logistic regression and multi-classifiers has been presented to predict breast cancer and to develop deep predictions in a new environment from breast cancer data. This article examines the data mining techniques based on classification that can be applied to breast cancer data to provide deeper predictions. In addition, this study identifies the best-performing model by evaluating the dataset with several classifiers. The breast cancer dataset was obtained from the UCI Machine Learning Repository and contains 569 instances with 31 attributes. The analysis begins with the simple logistic regression method, followed by IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision table, Decision Trees (DT), PART, Multi-Class Classifiers, and REP Tree. Ten-fold cross-validation is used, and training is carried out to build and test new models. The outputs are evaluated against a variety of criteria, including accuracy, root mean square error, sensitivity, specificity, F-measure, ROC curve area, and the Kappa statistic, as well as the time required to build the model. The analysis of the results reveals that, of all the classifiers, Simple Logistic Regression produces the deepest predictions and obtains the best model, yielding high and accurate results, followed by the other techniques: IBK (nearest-neighbor classifier), K-Star (instance-based classifier), and MLP (neural network). The other methods achieve lower accuracy than the logistic regression method.
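The ten-fold evaluation scheme described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the tooling used in the study; the function name `k_fold_indices` and the fixed seed are assumptions made for the example, and only the sample count (569) comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first n_samples % k folds
    # receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 569 instances, as reported for the dataset in the abstract.
folds = list(k_fold_indices(569, k=10))
```

Each sample lands in exactly one test fold, so every instance is used for testing once and for training in the remaining nine rounds.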

2018 ◽  
Vol 7 (4.20) ◽  
pp. 22 ◽  
Author(s):  
Jabeen Sultana ◽  
Abdul Khader Jilani ◽  
. .

The early identification and prediction of cancer type should be a priority in cancer research, in order to assist and supervise patients. The importance of classifying cancer patients into high- or low-risk clusters has led many research teams from the biomedical and bioinformatics fields to study and analyze the application of machine learning (ML) approaches. A logistic regression method and multi-classifiers have been proposed to predict breast cancer and to produce deep predictions in a new environment on breast cancer data. This paper explores the different data mining approaches using classification that can be applied to breast cancer data to build deep predictions. In addition, this study identifies the best model, yielding high performance, by evaluating the dataset on various classifiers. In this paper, the breast cancer dataset is collected from the UCI Machine Learning Repository and has 569 instances with 31 attributes. The dataset is pre-processed first and then fed to various classifiers: the Simple Logistic Regression method, IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision table, Decision Trees (DT), PART, Multi-Class Classifiers, and REP Tree. Ten-fold cross-validation is applied, and training is performed so that new models are developed and tested. The results obtained are evaluated on various parameters: accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, the Kappa statistic, and the time taken to build the model. Result analysis reveals that, among all the classifiers, Simple Logistic Regression yields the deepest predictions and obtains the best model, yielding high and accurate results, followed by other methods: IBK (nearest-neighbor classifier), K-Star (instance-based classifier), and MLP (neural network). The other methods obtained lower accuracy than the logistic regression method.
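Several of the evaluation measures listed above can be derived from a single binary confusion matrix. The sketch below shows one common way to compute them; the counts are invented for illustration (chosen only so that they sum to 569 instances) and are not results from the paper, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Illustrative counts only (assumption): 212 positive and 357 negative cases.
metrics = binary_metrics(tp=200, fp=10, fn=12, tn=347)
```

RMSE and ROC curve area are omitted here because they require predicted probabilities rather than hard class labels.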


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We tried to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort had eight centers in Japan from June 2015 to March 2018, and the test cohort had 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. Main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
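As a reminder of what the AUC figures above measure, the following sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as half). The labels and scores are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes; scores: predicted scores, same order.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example (assumption): two positive and two negative cases.
toy_auc = roc_auc(labels=[1, 0, 1, 0], scores=[0.9, 0.8, 0.6, 0.3])
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable for cohorts of several thousand patients.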


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jing Tian ◽  
Jianping Zhao ◽  
Chunhou Zheng

Abstract Background In recent years, various sequencing techniques have been used to collect biomedical omics datasets. It is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research, and it helps to reveal data structures from multiple collections. Nevertheless, clustering of omics data poses many challenges. The primary challenges in omics data analysis come from the high dimension of the data and the small sample size. Therefore, it is difficult to find a suitable integration method for structural analysis of multiple datasets. Results In this paper, a multi-view clustering based on Stiefel manifold method (MCSM) is proposed. The MCSM method comprises three core steps. Firstly, we established a binary optimization model for the simultaneous clustering problem. Secondly, we solved the optimization problem by a linear search algorithm based on the Stiefel manifold. Finally, we integrated the clustering results obtained from three omics by using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods that depend on the hypothesis that the underlying omics cluster class is the same. Conclusion In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.


2007 ◽  
Vol 86 (9) ◽  
pp. 852-856 ◽  
Author(s):  
M.T. John ◽  
W. Micheelis ◽  
J.G. Steele

Depression is associated with impaired health outcomes. This study investigated whether there is a significant association between depression and dissatisfaction with dentures in older adults. In a population-based study (1180 adults aged 65–74 yrs), depression was measured by an abbreviated Geriatric Depression Scale. Denture dissatisfaction was assessed with a five-point Likert-type question ("very dissatisfied" to "very satisfied"). The depression-denture dissatisfaction association was analyzed with simple (dissatisfied vs. not dissatisfied outcome) and ordinal logistic regression (based on outcome’s full range). For each unit increase on the 15-point depression scale, the probability of denture dissatisfaction increased by 24% [95% confidence interval, 15–34%, P < 0.001 (simple logistic regression)] and the probability for higher levels on the five-point dissatisfaction scale increased by 16% [95% CI, 11–22%, P < 0.001 (ordinal logistic regression)], adjusted for potential confounding variables. The likely causal association in older adults has major implications for the evaluation of treatment effects and the demand for prosthodontic therapy.
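To make the per-unit effect concrete, the sketch below assumes that the reported 24% increase corresponds to an odds ratio of 1.24 per scale point, which is the usual way logistic regression effects are reported; the abstract itself speaks of "probability", so this reading is an assumption. The baseline probability of 0.10 and both function names are hypothetical.

```python
import math


def odds_multiplier(odds_ratio_per_unit, units):
    """Odds ratios compound multiplicatively across unit increases."""
    return odds_ratio_per_unit ** units


def probability_after(baseline_probability, odds_ratio_per_unit, units):
    """Convert a baseline probability to odds, apply the effect, convert back."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_multiplier(odds_ratio_per_unit, units)
    return new_odds / (1 + new_odds)


# Assumed odds ratio of 1.24 per depression-scale point.
beta = math.log(1.24)                       # implied logistic coefficient
p_one_unit = probability_after(0.10, 1.24, units=1)
p_five_units = probability_after(0.10, 1.24, units=5)
```

Under this reading, a 24% increase in odds translates into a smaller absolute change in probability when the baseline probability is low, which is why odds and probabilities should not be used interchangeably.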


2021 ◽  
Vol 13 (5) ◽  
pp. 941
Author(s):  
Rong Lu ◽  
Jennifer L. Miskimins ◽  
Mikhail Zhizhin

In today’s oil industry, companies frequently flare the produced natural gas from oil wells. The flaring activities are extensive in some regions including North Dakota. Besides company-reported data, which are compiled by the North Dakota Industrial Commission, flaring statistics such as count and volume can be estimated via Visible Infrared Imaging Radiometer Suite nighttime observations. Following data gathering and preprocessing, Bayesian machine learning implemented with Markov chain Monte Carlo methods is performed to tackle two tasks: flaring time series analysis and distribution approximation. They help further understanding of the flaring profiles and reporting qualities, which are important for decision/policy making. First, although fraught with measurement and estimation errors, the time series provide insights into flaring approaches and characteristics. Gaussian processes are successful in inferring the latent flaring trends. Second, distribution approximation is achieved by unsupervised learning. The negative binomial and Gaussian mixture models are utilized to describe the distributions of field flare count and volume, respectively. Finally, a nearest-neighbor-based approach for company level flared volume allocation is developed. Potential discrepancies are spotted between the company reported and the remotely sensed flaring profiles.


Author(s):  
El-Housainy A. Rady ◽  
Mohamed R. Abonazel ◽  
Mariam H. Metawe’e

Goodness-of-fit (GOF) tests for logistic regression assess how well the model suits the data. The null hypothesis of all GOF tests is that the model fits. R, as a free software package, offers many GOF tests in different packages. A Monte Carlo simulation was conducted to study two situations: first, the ability of each test, under its default settings, to accept the null hypothesis when the model truly fits; second, the power of these tests when the assumption of a sufficient linear combination of the explanatory variables is violated (by omitting a linear covariate term, a quadratic term, or an interaction term). We also checked whether the same test in different R packages gave the same results. Because sample size is expected to affect simulation results, the pattern of change in GOF test results under different sample sizes and different model settings was estimated. All tests accepted the null hypothesis (in more than 95% of simulation trials) when the model truly fitted, except the modified Hosmer-Lemeshow test in the "LogisticDx" package under all model settings, and Osius and Rojek’s (OsRo) test when the true model had an interaction term between binary and categorical covariates. In addition, the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares (CHCH) test gave unexpectedly different results under different packages. Concerning the power study, all tests had very low power when a covariate was missing. Generally, Stukel’s test (package "LogisticDx") and the CHCH test (package "rms") reached a power greater than 80% in detecting a missing quadratic term at lower sample sizes, while the OsRo test (package "LogisticDx") was better at detecting a missing interaction term. Besides the simulation study, we evaluated the performance of GOF tests using the breast cancer dataset.
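For readers unfamiliar with this family of tests, the sketch below computes the classic Hosmer-Lemeshow statistic by grouping cases on their predicted probability and comparing observed with expected event counts. It is a plain-Python illustration of the textbook statistic, not the modified variant or any of the R package implementations compared in the study; the function name and the toy data are assumptions.

```python
def hosmer_lemeshow_statistic(y_true, p_pred, groups=10):
    """Hosmer-Lemeshow chi-square statistic for a fitted binary model.

    y_true: observed 0/1 outcomes; p_pred: predicted probabilities.
    Cases are sorted by predicted probability and split into equal-sized groups.
    """
    pairs = sorted(zip(p_pred, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)    # events seen in the group
        expected = sum(p for p, _ in chunk)    # events the model predicts
        mean_p = expected / len(chunk)
        variance = len(chunk) * mean_p * (1 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data (assumption): the model predicts 0.5 for every case.
hl_toy = hosmer_lemeshow_statistic([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5], groups=2)
```

The statistic is conventionally compared against a chi-square distribution with `groups - 2` degrees of freedom; that step is left out here to keep the example free of external dependencies.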


1993 ◽  
Vol 39 (12) ◽  
pp. 2495-2499 ◽  
Author(s):  
J P Corsetti ◽  
C Cox ◽  
T J Schulz ◽  
D A Arvan

Abstract Serum amylase and lipase measurements are often used to diagnose acute pancreatitis. This study addresses the question of whether it is advantageous to order serum amylase and lipase tests simultaneously. We evaluated performance of the two tests separately and in combination through a retrospective study of patients for whom both amylase and lipase determinations were ordered. Initial analysis of test performance was conducted with a uniformly applied criterion based on determination of optimal sensitivity-specificity pairs. Individual tests and combinations of tests, including the "AND" and "OR" rules and discriminant functions, were examined. Only the discriminant approach demonstrated better performance than the lipase test alone. This finding was subsequently confirmed by logistic regression analysis. We conclude that ordering both tests simultaneously can be advantageous in diagnosing acute pancreatitis when a bivariate approach is used; however, this must be weighed against the difficulties associated with clinical implementation of such approaches.
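The "AND" and "OR" combination rules mentioned above can be sketched as follows: under "AND" a patient is called positive only if both tests are positive, and under "OR" if either is. The case list is invented purely for illustration and does not reflect the study data; `evaluate_rule` is a hypothetical helper.

```python
def evaluate_rule(cases, rule):
    """Return (sensitivity, specificity) of a two-test combination rule.

    cases: list of (amylase_positive, lipase_positive, has_disease) booleans.
    rule:  "AND" (both tests positive) or "OR" (at least one positive).
    """
    combine = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}[rule]
    tp = fp = fn = tn = 0
    for amylase, lipase, diseased in cases:
        predicted = combine(amylase, lipase)
        if diseased and predicted:
            tp += 1
        elif diseased:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented cases (assumption): four with pancreatitis, four without.
cases = [
    (True, True, True), (False, True, True),
    (True, False, True), (False, False, True),
    (True, False, False), (False, True, False),
    (False, False, False), (False, False, False),
]
and_result = evaluate_rule(cases, "AND")
or_result = evaluate_rule(cases, "OR")
```

The toy data show the general trade-off: the "AND" rule favors specificity at the cost of sensitivity, and the "OR" rule does the reverse, which is one reason a discriminant or logistic approach may be needed to improve on a single test.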


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
David E. Booth ◽  
Venugopal Gopalakrishna-Remani ◽  
Matthew L. Cooper ◽  
Fiona R. Green ◽  
Margaret P. Rayman

Abstract We begin by arguing that the often-used algorithm for the discovery and use of disease risk factors, stepwise logistic regression, is unstable. We then argue that there are other algorithms available that are much more stable and reliable (e.g. the lasso and gradient boosting). We then propose a protocol for the discovery and use of risk factors using lasso or boosting variable selection. We then illustrate the use of the protocol with a set of prostate cancer data and show that it recovers known risk factors. Finally, we use the protocol to identify new and important SNP-based risk factors for prostate cancer and further seek evidence for or against the hypothesis of an anticancer function for selenium in prostate cancer. We find that the anticancer effect may depend on the SNP-SNP interaction and, in particular, which alleles are present.

