Large-scale Data Classification based on K-means Clustering and Deep Learning

Nuntuschaporn Senawong; Supawadee Wichitchan; Orawich Kumphon

doi:10.14416/j.kmutnb.2021.03.012

Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

Nature Biotechnology ◽

10.1038/s41587-021-01094-0 ◽

2021 ◽

Author(s):

Noah F. Greenwald ◽

Geneva Miller ◽

Erick Moen ◽

Alex Kong ◽

Adam Kagel ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Cell Segmentation ◽

Whole Cell ◽

Data Annotation ◽

Large Scale Data ◽

Level Performance ◽

Scale Data

Download Full-text

Automatic large-scale data acquisition via crowdsourcing for crosswalk classification: A deep learning approach

Computers & Graphics ◽

10.1016/j.cag.2017.08.004 ◽

2017 ◽

Vol 68 ◽

pp. 32-42 ◽

Cited By ~ 18

Author(s):

Rodrigo F. Berriel ◽

Franco Schmidt Rossi ◽

Alberto F. de Souza ◽

Thiago Oliveira-Santos

Keyword(s):

Deep Learning ◽

Data Acquisition ◽

Large Scale ◽

Learning Approach ◽

Large Scale Data ◽

Scale Data

Download Full-text

Approximate kernel extreme learning machine for large scale data classification

Neurocomputing ◽

10.1016/j.neucom.2016.09.023 ◽

2017 ◽

Vol 219 ◽

pp. 210-220 ◽

Cited By ~ 23

Author(s):

Alexandros Iosifidis ◽

Anastasios Tefas ◽

Ioannis Pitas

Keyword(s):

Extreme Learning Machine ◽

Large Scale ◽

Data Classification ◽

Large Scale Data ◽

Kernel Extreme Learning Machine ◽

Learning Machine ◽

Scale Data

Download Full-text

Deep Learning Method for RNA Secondary Structure Prediction with Pseudoknots Based on Large-Scale Data

Journal of Healthcare Engineering ◽

10.1155/2021/6699996 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Bowen Shen ◽

Hao Zhang ◽

Cong Li ◽

Tianheng Zhao ◽

Yuanning Liu

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Large Scale ◽

Secondary Structure Prediction ◽

Learning Methods ◽

Rna Secondary Structure Prediction ◽

Large Scale Data ◽

Scale Data

Traditional machine learning methods are widely used in the field of RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods have more advantages than traditional machine learning methods. As the number of network layers increases in deep learning, there will often be problems such as increased parameters and overfitting. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary results. And from the perspective of the depth and width of the network, improvements are made based on the neural network model, which can effectively improve the computational efficiency while extracting more feature information. We process the existing real RNA data through experiments, use deep learning models to extract useful features from a large amount of RNA sequence data and structure data, and then predict the extracted features to obtain each base’s pairing probability. The characteristics of RNA secondary structure and dynamic programming methods are used to process the base prediction results, and the structure with the largest sum of the probability of each base pairing is obtained, and this structure will be used as the optimal RNA secondary structure. We, respectively, evaluated GoogLeNet and TCN models based on 5sRNA, tRNA data, and tmRNA data, and compared them with other standard prediction algorithms. The sensitivity and specificity of the GoogLeNet model on the 5sRNA and tRNA data sets are about 16% higher than the best prediction results in other algorithms. The sensitivity and specificity of the GoogLeNet model on the tmRNA dataset are about 9% higher than the best prediction results in other algorithms. As deep learning algorithms’ performance is related to the size of the data set, as the scale of RNA data continues to expand, the prediction accuracy of deep learning methods for RNA secondary structure will continue to improve.

Download Full-text

Large-scale data classification based on hierarchical clustering and re-sampling

Journal of Computer Applications ◽

10.3724/sp.j.1087.2013.02801 ◽

2013 ◽

Vol 33 (10) ◽

pp. 2801-2803

Author(s):

Yong ZHANG ◽

Panpan FU ◽

Yuting ZHANG

Keyword(s):

Hierarchical Clustering ◽

Large Scale ◽

Data Classification ◽

Large Scale Data ◽

Scale Data

Download Full-text

E-Commerce data classification in the cloud environment based on bayesian algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189421 ◽

2020 ◽

pp. 1-8

Author(s):

Bing Xu

Keyword(s):

Large Scale ◽

High Efficiency ◽

Feature Selection Method ◽

Data Classification ◽

Classification Algorithm ◽

Testing Time ◽

Bayesian Algorithm ◽

Large Scale Data ◽

Distributed Platform ◽

Scale Data

In the process of e-commerce transactions, a large amount of data will be generated, whose effective classification is one of current research hotspots. An improved feature selection method was proposed based on the characteristics of Bayesian classification algorithm. Due to the long training and testing time of modern large-scale data classification on a single computer, a data classification algorithm based on Naive Bayes was designed and implemented on the Hadoop distributed platform. The experimental results showed that the improved algorithm could effectively improve the accuracy of classification, and the designed parallel Bayesian data classification algorithm had high efficiency, which was suitable for the processing and analysis of massive data.

Download Full-text

High-Performance Machine Learning for Large-Scale Data Classification considering Class Imbalance

Scientific Programming ◽

10.1155/2020/1953461 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16

Author(s):

Yang Liu ◽

Xiang Li ◽

Xianbang Chen ◽

Xi Wang ◽

Huaqiang Li

Keyword(s):

Neural Network ◽

Machine Learning ◽

High Performance ◽

Large Scale ◽

Class Imbalance ◽

Data Classification ◽

Training Dataset ◽

Large Scale Data ◽

Input Layer ◽

Scale Data

Currently, data classification is one of the most important ways to analysis data. However, along with the development of data collection, transmission, and storage technologies, the scale of the data has been sharply increased. Additionally, due to multiple classes and imbalanced data distribution in the dataset, the class imbalance issue is also gradually highlighted. The traditional machine learning algorithms lack of abilities for handling the aforementioned issues so that the classification efficiency and precision may be significantly impacted. Therefore, this paper presents an improved artificial neural network in enabling the high-performance classification for the imbalanced large volume data. Firstly, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which potentially aims at improving the training of the back propagation neural network (BPNN), and then, zero-mean, batch-normalization, and rectified linear unit (ReLU) are further employed to optimize the input layer and hidden layers of BPNN. At last, the ensemble learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. Positive conclusions can be summarized according to the experimental results. Benefitting from Borderline-SMOTE, the imbalanced training dataset can be balanced, which improves the training performance and the classification accuracy. The improvements for the input layer and hidden layer also enhance the training performances in terms of convergence. The parallelization and the ensemble learning techniques enable BPNN to implement the high-performance large-scale data classification. The experimental results show the effectiveness of the presented classification algorithm.

Download Full-text

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Artificial Intelligence Review ◽

10.1007/s10462-018-09679-z ◽

2019 ◽

Vol 52 (1) ◽

pp. 77-124 ◽

Cited By ~ 70

Author(s):

Giang Nguyen ◽

Stefan Dlugolinsky ◽

Martin Bobák ◽

Viet Tran ◽

Álvaro López García ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Deep Learning ◽

Large Scale ◽

Large Scale Data ◽

Learning Frameworks ◽

Scale Data

Download Full-text

Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

10.1101/2021.03.01.431313 ◽

2021 ◽

Author(s):

Noah F. Greenwald ◽

Geneva Miller ◽

Erick Moen ◽

Alex Kong ◽

Adam Kagel ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Cell Segmentation ◽

Tissue Imaging ◽

Imaging Data ◽

Whole Cell ◽

Data Annotation ◽

Large Scale Data ◽

Level Performance ◽

Scale Data

AbstractUnderstanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.

Download Full-text

Prediction of residential gross yields by using a deep learning method on large scale data processing framework

Pressacademia ◽

10.17261/pressacademia.2018.801 ◽

2018 ◽

Vol 7 (1) ◽

pp. 125-130

Author(s):

Semra Erpolat Tasabat ◽

Olgun Aydin ◽

Ali Hepsen

Keyword(s):

Deep Learning ◽

Data Processing ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Large Scale Data Processing ◽

Scale Data ◽

Processing Framework

Download Full-text