Does Deep Learning Work Well for Categorical Datasets with Mainly Nominal Attributes?

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1966
Author(s):  
Yoichi Hayashi

Given the complexity of real-world datasets, it is difficult to represent data structures using existing deep learning (DL) models. Most research to date has concentrated on datasets with only one type of attribute: categorical or numerical. Categorical data are common in datasets such as the German (categorical) credit scoring dataset, which contains numerical, ordinal, and nominal attributes. The heterogeneous structure of this dataset makes very high accuracy difficult to achieve. DL-based methods have achieved high accuracy (99.68%) for the Wisconsin Breast Cancer Dataset, whereas DL-inspired methods have achieved high accuracy (97.39%) for the Australian credit dataset. However, to our knowledge, no such method has been proposed to classify the German credit dataset. This study aimed to provide new insights into the reasons why DL-based and DL-inspired classifiers do not work well for categorical datasets consisting mainly of nominal attributes. We also discuss the problems associated with using nominal attributes to design high-performance classifiers. Considering the expanded utility of DL, this study's findings should aid in the development of a new type of DL that can handle categorical datasets consisting mainly of nominal attributes, which are commonly used in risk evaluation, finance, banking, and marketing.
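
As an aside for readers, one common deep-learning workaround for nominal attributes is to replace one-hot encoding with learned entity embeddings. Below is a minimal PyTorch sketch of that idea, not code from the paper; the attribute cardinalities and layer sizes are hypothetical, loosely modelled on the German credit dataset.

```python
import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    """Toy classifier: one embedding table per nominal attribute,
    concatenated with the numerical attributes and fed to an MLP."""
    def __init__(self, cardinalities, n_numerical, emb_dim=4):
        super().__init__()
        # One small embedding table per nominal attribute.
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, emb_dim) for card in cardinalities
        )
        in_dim = len(cardinalities) * emb_dim + n_numerical
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 2)
        )

    def forward(self, x_nom, x_num):
        # x_nom: (batch, n_nominal) integer codes; x_num: (batch, n_numerical)
        embedded = [emb(x_nom[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embedded + [x_num], dim=1))

# Hypothetical shapes: 13 nominal/ordinal attributes and 7 numerical ones.
model = EmbeddingMLP(
    cardinalities=[4, 5, 10, 5, 5, 4, 3, 4, 3, 3, 4, 2, 2], n_numerical=7
)
logits = model(torch.randint(0, 2, (8, 13)), torch.randn(8, 7))
```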

Author(s):  
Runpu Chen ◽  
Le Yang ◽  
Steve Goodison ◽  
Yijun Sun

Abstract

Motivation: Cancer subtype classification has the potential to significantly improve disease prognosis and enable individualized patient management. Existing methods are limited in their ability to handle extremely high-dimensional data and by the influence of misleading, irrelevant factors, resulting in ambiguous and overlapping subtypes.

Results: To address these issues, we proposed a novel approach to disentangling and eliminating irrelevant factors by leveraging the power of deep learning. Specifically, we designed a deep-learning framework, referred to as DeepType, that performs joint supervised classification, unsupervised clustering, and dimensionality reduction to learn cancer-relevant data representations with cluster structure. We applied DeepType to the METABRIC breast cancer dataset and compared its performance to state-of-the-art methods. DeepType significantly outperformed the existing methods, identifying more robust subtypes while using fewer genes. The new approach provides a framework for deriving more accurate and robust molecular cancer subtypes from increasingly complex, multi-source data.

Availability and implementation: An open-source software package for the proposed method is freely available at http://www.acsu.buffalo.edu/~yijunsun/lab/DeepType.html.

Supplementary information: Supplementary data are available at Bioinformatics online.
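
The joint objective DeepType is described as optimizing can be pictured as a weighted sum of a classification loss, a clustering loss, and a sparsity penalty. The following is a minimal sketch under that reading, not the authors' implementation; the architecture, coefficients, and shapes are assumptions.

```python
import torch
import torch.nn as nn

def deeptype_style_loss(encoder, classifier, x, y, centroids, alpha=1.0, beta=1e-4):
    """Sketch of a joint objective in the spirit of DeepType:
    supervised classification + k-means-style clustering in the learned
    representation + an L1 penalty that encourages using few genes."""
    z = encoder(x)                                    # low-dimensional representation
    ce = nn.functional.cross_entropy(classifier(z), y)
    # Squared distance of each sample to its nearest cluster centroid.
    cluster = (torch.cdist(z, centroids).min(dim=1).values ** 2).mean()
    # Assumes the encoder is an nn.Sequential whose first layer is Linear.
    sparsity = encoder[0].weight.abs().sum()
    return ce + alpha * cluster + beta * sparsity

# Toy usage with hypothetical sizes (2000 genes, 5 subtypes).
encoder = nn.Sequential(nn.Linear(2000, 64), nn.ReLU(), nn.Linear(64, 16))
classifier = nn.Linear(16, 5)
loss = deeptype_style_loss(encoder, classifier,
                           torch.randn(32, 2000), torch.randint(0, 5, (32,)),
                           centroids=torch.randn(5, 16))
```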


2014 ◽  
Vol 88 ◽  
pp. 60-64 ◽  
Author(s):  
Martin Schwentenwein ◽  
Peter Schneider ◽  
Johannes Homa

Albeit widely established in the plastics and metals industries, additive manufacturing technologies are still a rare sight in the field of ceramic manufacturing. This is mainly due to the requirements for high-performance ceramic parts, which no additive manufacturing process has been able to meet to date. Lithography-based Ceramic Manufacturing (LCM) technology enables the production of dense and precise ceramic parts by using a photocurable ceramic suspension that is hardened via a photolithographic process. This new technology not only provides very high accuracy, it also reaches high densities for the sintered parts. In the case of alumina, a relative density of over 99.4% and a 4-point bending strength of almost 430 MPa were realized. Thus, the achievable properties are similar to those of conventional manufacturing methods, making LCM technology an interesting complement for the ceramic industry.


Author(s):  
Vinod Jagannath Kadam ◽  
Shivajirao Manikrao Jadhav

Medical data classification is the process of transforming descriptions of medical diagnoses and procedures into universal medical code numbers. The diagnoses and procedures are usually taken from a variety of sources within the healthcare record, such as the transcription of the physician's notes, laboratory results, radiologic results, and other sources. However, many frequency distribution problems exist in these domains. Hence, this paper develops an advanced and precise medical data classification approach for diabetes and breast cancer datasets. With knowledge of the features of, and challenges faced by, state-of-the-art classification methods, a deep learning-based medical data classification methodology is proposed here. It is well known that deep learning networks learn directly from the data. In this paper, the medical data are dimensionally reduced using Principal Component Analysis (PCA). The dimensionally reduced data are transformed by multiplying by a weighting factor, which is optimized using the Whale Optimization Algorithm (WOA), to obtain the maximum distance between features. As a result, the data are transformed into a label-distinguishable plane, on which a Deep Belief Network (DBN) performs the deep learning process and the data classification. Further, the proposed WOA-based DBN (WOADBN) method is compared with Neural Network (NN), DBN, Genetic Algorithm-based NN (GANN), GA-based DBN (GADBN), PSO-based NN (PSONN), PSO-based DBN (PSODBN), and WOA-based NN (WOANN) techniques; the results show the superiority of the proposed algorithm over these conventional methods.
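
As a rough illustration of the pipeline described (and emphatically not the paper's code), the sketch below reduces the data with PCA, searches for a feature-weighting vector that maximizes classifier accuracy, and evaluates a classifier on the weighted features. Plain random search stands in for the Whale Optimization Algorithm, and scikit-learn's MLP stands in for the DBN, since neither ships with scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_red = PCA(n_components=10).fit_transform(X)   # dimensionality reduction

def fitness(w):
    # Score a candidate weighting vector by cross-validated accuracy,
    # mirroring the idea of maximizing separability of the features.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    return cross_val_score(clf, X_red * w, y, cv=3).mean()

rng = np.random.default_rng(0)
best_w, best_score = None, -np.inf
for _ in range(10):                             # random search in place of WOA
    w = rng.uniform(0.1, 2.0, size=X_red.shape[1])
    score = fitness(w)
    if score > best_score:
        best_w, best_score = w, score
print(f"best cross-validated accuracy: {best_score:.3f}")
```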


Transport ◽  
2010 ◽  
Vol 25 (4) ◽  
pp. 345-351 ◽  
Author(s):  
Vytautas Paulauskas

All ports and a number of waterways include straits, and investments in developing such systems must be optimized to reach maximum results with minimum expenditures. New high-accuracy port navigational systems enable high-precision ship positioning and can guarantee shipping safety in port waters at any time, which creates a good basis for optimizing port development. A new type of ship with good steering equipment, together with ship-steering knowledge and methods, very high-accuracy port navigational systems such as E-Sea Fix, and horizontal/vertical scanning of the port channel bottom that establishes the real bottom conditions, could allow a dramatic increase in the size of ships accepted at the port entrance while shipping safety remains guaranteed. With reference to straits, a theoretical study and experimental results obtained with simulators and on real ships under much the same conditions have delivered new knowledge on the limits of sailing big ships in straits and on the possibilities of increasing ship size under similar sailing conditions. The Klaipeda strait is taken as the case study for practical testing. The paper presents the results, conclusions, and recommendations of a theoretical and practical study of ships of increased size at strait ports.


2017 ◽  
Vol 7 (4) ◽  
pp. 265-286 ◽  
Author(s):  
Guido Bologna ◽  
Yoichi Hayashi

Abstract

Rule extraction from neural networks is a fervent research topic. In the last 20 years, many authors have presented a number of techniques showing how to extract symbolic rules from Multi Layer Perceptrons (MLPs). Nevertheless, very few were related to ensembles of neural networks, and even fewer to networks trained by deep learning. On several datasets we performed rule extraction from ensembles of Discretized Interpretable Multi Layer Perceptrons (DIMLPs) and from DIMLPs trained by deep learning. The results obtained on the Thyroid dataset and the Wisconsin Breast Cancer dataset show that the predictive accuracy of the extracted rules compares very favorably with state-of-the-art results. Finally, in the last classification problem, on digit recognition, rules generated from the MNIST dataset can be viewed as discriminatory features in particular digit areas. Qualitatively, with respect to rule complexity in terms of number of generated rules and number of antecedents per rule, deep DIMLPs and DIMLPs trained by arcing give similar results on a binary classification problem involving digits 5 and 8. On the whole MNIST problem, we showed that it is possible to determine the feature detectors created by neural networks and that the complexity of the extracted rulesets can be well balanced between accuracy and interpretability.
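
The DIMLP extraction procedure itself is not reproduced here, but the general idea of reading symbolic rules out of a trained network can be illustrated with a common surrogate-model shortcut: distill the network into a shallow decision tree and print its paths as if-then rules. This is a generic stand-in, not the DIMLP algorithm.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                    random_state=0).fit(X, y)

# Fit a shallow tree to the network's own predictions, then read each
# root-to-leaf path as a symbolic rule approximating the network.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, net.predict(X))
print(export_text(surrogate))
fidelity = (surrogate.predict(X) == net.predict(X)).mean()
print(f"rule fidelity to the network: {fidelity:.3f}")
```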


2020 ◽  
Author(s):  
Noor Ayesha ◽  
Saleha Yurf ◽  
Syed Mohammad Mehmood Abbas ◽  
Ali Haider Bangash ◽  
Adil Baloch ◽  
...  

NAFLD is reported to be the only hepatic ailment whose prevalence is increasing concurrently with both obesity and T2DM. In the wake of the massive strain on global health resources due to the COVID-19 pandemic, NAFLD is bound to be neglected and shelved. Abdominal ultrasonography, which carries a high monetary cost, is used for the NAFLD screening diagnosis. We present a deep learning model that requires only easy-to-measure anthropometric measures to produce a screening diagnosis for NAFLD with very high accuracy. Further studies are suggested to validate the generalization of the presented model.
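
Since the paper's architecture is not given in the abstract, the following is only a plausible minimal sketch of such a screening model: a small network mapping a handful of anthropometric measures to a NAFLD probability. The five input features shown are hypothetical examples.

```python
import torch
import torch.nn as nn

# Hypothetical inputs: age, BMI, waist circumference, hip circumference,
# waist-to-hip ratio. The paper's exact features and layers may differ.
screen_net = nn.Sequential(
    nn.Linear(5, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),    # probability of NAFLD
)
prob = screen_net(torch.tensor([[45.0, 31.2, 102.0, 98.0, 1.04]]))
```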


Breast cancer is one of the most dangerous diseases and a leading cause of death among women. Several kinds of cancer affect people, but breast cancer affects women most heavily. In medical practice, removal of the breast or major surgery is taken forward as the solution, yet the cancer can recur after surgery. The only way to save women from breast cancer is to identify and detect it at an early stage and provide the necessary treatment. Hence, various research works have focused on finding good solutions for diagnosing cancer and classifying its stages as benign, malignant, or severe malignant. Still, classification accuracy needs to be improved on complex breast cancer datasets. Some earlier research works have proposed machine learning algorithms, which are semi-automatic and whose accuracy is also not high. Thus, to provide a better solution, this paper applies a deep learning algorithm, the Convolutional Neural Network, to diagnose various kinds of breast cancer datasets. The experimental results show that the proposed deep learning algorithm outperforms the other algorithms.
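
The paper names a Convolutional Neural Network as the classifier; a minimal PyTorch sketch of a three-way CNN (benign, malignant, severe malignant) is shown below. The input size and layer widths are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

# Three-way breast cancer classifier; assumes 64x64 grayscale inputs.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 3),   # 16 channels on a 16x16 map after two pools
)
logits = cnn(torch.randn(4, 1, 64, 64))   # batch of four images
```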


2020 ◽  
Vol 34 (01) ◽  
pp. 997-1004
Author(s):  
Xinying Wang ◽  
Olamide Timothy Tawose ◽  
Feng Yan ◽  
Dongfang Zhao

The Kirchhoff law is one of the most widely used physical laws in many engineering principles, e.g., biomedical engineering, electrical engineering, and computer engineering. One challenge of applying the Kirchhoff law to real-world applications at scale lies in the high, if not prohibitive, computational cost to solve a large number of nonlinear equations. Despite recent advances in leveraging a convolutional neural network (CNN) to estimate the solutions of Kirchhoff equations, the low performance is still significantly hindering the broad adoption of CNN-based approaches. This paper proposes a high-performance deep-learning-based approach for Kirchhoff analysis, namely HDK. HDK employs two techniques to improve the performance: (i) early pruning of unqualified input candidates and (ii) parallelization of forward labelling. To retain high accuracy, HDK also applies various optimizations to the data such as randomized augmentation and dimension reduction. Collectively, the aforementioned techniques improve the analysis speed by 8× with accuracy as high as 99.6%.
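
For readers unfamiliar with the setting, the small example below shows the kind of system Kirchhoff analysis produces: applying Kirchhoff's current law to a resistive circuit yields G v = i for the node voltages. The paper targets large nonlinear systems, which is where a learned estimator pays off; the values here are illustrative only.

```python
import numpy as np

# Kirchhoff's current law at each node of a 3-node resistive circuit:
# G @ v = i, where G is the conductance matrix (siemens). Node 3 is
# tied to ground through 1 S, so the system is nonsingular.
G = np.array([[ 3.0, -1.0, -2.0],
              [-1.0,  4.0, -3.0],
              [-2.0, -3.0,  6.0]])
i = np.array([1.0, 0.0, -1.0])   # injected currents (amperes)
v = np.linalg.solve(G, i)        # node voltages
```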


2017 ◽  
Author(s):  
Evelina Gabasova ◽  
John Reid ◽  
Lorenz Wernisch

Abstract

Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared, consistent set of clusters across all datasets and that most of the data samples follow this structure. In practice, however, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others.

In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets while also extracting the global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels.

We evaluated the model on both simulated and real-world datasets. The simulated data exemplify datasets with varying degrees of common structure; in such a setting, Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation, and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on high-dimensional lung and kidney cancer TCGA datasets, again showing clinically significant results and the scalability of the algorithm.

Author Summary

Integrative clustering is the task of identifying groups of samples by combining information from several datasets. An example of this task is cancer subtyping, where we cluster tumour samples based on several datasets, such as gene expression, proteomics, and others. Most existing algorithms assume that all such datasets share a similar cluster structure, with samples outside these clusters treated as noise. The structure can, however, be much more heterogeneous: some meaningful clusters may appear only in some datasets.

In the paper, we introduce the Clusternomics algorithm, which identifies groups of samples across heterogeneous datasets. It models both the cluster structure of individual datasets and the global structure that appears as a combination of local structures. The algorithm uses probabilistic modelling to identify the groups and share information across the local and global levels. We evaluated the algorithm on both simulated and real-world datasets, where it found clinically significant clusters with different survival outcomes.
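
As a crude approximation of the local/global idea (not the Clusternomics model itself), one can fit a Dirichlet-process mixture to each dataset separately and then treat each distinct combination of local labels as a global cluster. The sketch below uses random stand-in data and scikit-learn's BayesianGaussianMixture.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))   # stand-in for gene expression
meth = rng.normal(size=(200, 5))   # stand-in for DNA methylation

# Local level: one Dirichlet-process mixture per dataset.
local_labels = [
    BayesianGaussianMixture(
        n_components=10, weight_concentration_prior_type="dirichlet_process",
        random_state=0).fit_predict(data)
    for data in (expr, meth)
]
# Global level: each distinct tuple of local assignments is one cluster.
combos = np.stack(local_labels, axis=1)
_, global_labels = np.unique(combos, axis=0, return_inverse=True)
```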

