Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

Abstract: Understanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.
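The abstract mentions extracting the subcellular localization of protein signal from paired nuclear and whole-cell segmentations. The following is a minimal sketch of one way such a feature could be computed. It is not Mesmer's code or API. The function name `nuclear_signal_fraction`, the nested-list mask layout, and the convention that a nucleus carries the same integer label as its cell (with 0 as background) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def nuclear_signal_fraction(
    cell_mask: List[List[int]],
    nuclear_mask: List[List[int]],
    signal: List[List[float]],
) -> Dict[int, float]:
    """Return, per cell label, the fraction of signal that lies in the nucleus.

    Assumption: both masks are label images of the same shape, 0 is
    background, and a nucleus shares the label of the cell that contains it.
    """
    total = defaultdict(float)    # summed signal over the whole cell
    nuclear = defaultdict(float)  # summed signal over the nuclear pixels

    for cell_row, nuc_row, sig_row in zip(cell_mask, nuclear_mask, signal):
        for cell_id, nuc_id, value in zip(cell_row, nuc_row, sig_row):
            if cell_id == 0:
                continue  # background pixel
            total[cell_id] += value
            if nuc_id == cell_id:
                nuclear[cell_id] += value

    # Cells with no signal at all are skipped to avoid division by zero.
    return {c: nuclear[c] / total[c] for c in total if total[c] > 0}


if __name__ == "__main__":
    cells = [[1, 1, 0], [1, 2, 2]]
    nuclei = [[0, 1, 0], [0, 2, 0]]
    marker = [[1.0, 2.0, 5.0], [1.0, 3.0, 1.0]]
    print(nuclear_signal_fraction(cells, nuclei, marker))
```

A fraction near 1 would suggest a mostly nuclear marker and a fraction near 0 a mostly cytoplasmic or membrane marker, under the labeling assumption stated above.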

Towards Large-Scale Data Annotation of Audio from Wearables: Validating Zooniverse Annotations of Infant Vocalization Types

2021 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt48900.2021.9383511 ◽

2021 ◽

Author(s):

Chiara Semenzin ◽

Lisa Hamrick ◽

Amanda Seidl ◽

Bridgette Kelleher ◽

Alejandrina Cristia

Keyword(s):

Large Scale ◽

Data Annotation ◽

Large Scale Data ◽

Infant Vocalization ◽

Scale Data

Automatic large-scale data acquisition via crowdsourcing for crosswalk classification: A deep learning approach

Computers & Graphics ◽

10.1016/j.cag.2017.08.004 ◽

2017 ◽

Vol 68 ◽

pp. 32-42 ◽

Cited By ~ 18

Author(s):

Rodrigo F. Berriel ◽

Franco Schmidt Rossi ◽

Alberto F. de Souza ◽

Thiago Oliveira-Santos

Keyword(s):

Deep Learning ◽

Data Acquisition ◽

Large Scale ◽

Learning Approach ◽

Large Scale Data ◽

Scale Data

Deep Learning Method for RNA Secondary Structure Prediction with Pseudoknots Based on Large-Scale Data

Journal of Healthcare Engineering ◽

10.1155/2021/6699996 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Bowen Shen ◽

Hao Zhang ◽

Cong Li ◽

Tianheng Zhao ◽

Yuanning Liu

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Large Scale ◽

Secondary Structure Prediction ◽

Learning Methods ◽

Rna Secondary Structure Prediction ◽

Large Scale Data ◽

Scale Data

Traditional machine learning methods are widely used for RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods have advantages over traditional machine learning methods. As the number of network layers in a deep learning model increases, problems such as parameter growth and overfitting often arise. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary structure. We improved the neural network models with respect to both depth and width, which effectively improves computational efficiency while extracting more feature information. In our experiments, we process existing real RNA data, use the deep learning models to extract useful features from a large amount of RNA sequence and structure data, and then use the extracted features to predict each base’s pairing probability. The characteristics of RNA secondary structure and dynamic programming methods are then used to process the base prediction results: the structure with the largest sum of base-pairing probabilities is selected as the optimal RNA secondary structure. We evaluated the GoogLeNet and TCN models on 5sRNA, tRNA, and tmRNA data and compared them with other standard prediction algorithms. The sensitivity and specificity of the GoogLeNet model on the 5sRNA and tRNA data sets are about 16% higher than the best results of the other algorithms. On the tmRNA data set, they are about 9% higher than the best results of the other algorithms. Because the performance of deep learning algorithms depends on the size of the data set, the prediction accuracy of deep learning methods for RNA secondary structure should continue to improve as the scale of RNA data expands.
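The dynamic programming step described in this abstract, choosing the structure with the largest sum of base-pairing probabilities, can be illustrated with a Nussinov-style recurrence. The sketch below is an assumption about the general technique, not the authors' implementation: it only considers nested (non-crossing) pairs, so it cannot produce pseudoknots, and the function name `best_structure`, the `min_loop` parameter, and the probability-matrix input format are hypothetical.

```python
from typing import List, Tuple


def _get(score: List[List[float]], a: int, b: int) -> float:
    """Score of the subsequence a..b, or 0.0 for an empty interval."""
    return score[a][b] if a <= b else 0.0


def best_structure(
    prob: List[List[float]], min_loop: int = 3
) -> Tuple[float, List[Tuple[int, int]]]:
    """Maximize the summed pairing probability over nested structures.

    prob[i][k] (i < k) is the predicted probability that bases i and k pair.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(prob)
    if n == 0:
        return 0.0, []
    score = [[0.0] * n for _ in range(n)]

    # Fill the table by increasing subsequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = score[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                cand = prob[i][k] + _get(score, i + 1, k - 1) + _get(score, k + 1, j)
                if cand > best:
                    best = cand
            score[i][j] = best

    # Trace back through the table to recover the chosen pairs.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            cand = prob[i][k] + _get(score, i + 1, k - 1) + _get(score, k + 1, j)
            if score[i][j] == cand:
                pairs.append((i, k))
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return score[0][n - 1], sorted(pairs)


if __name__ == "__main__":
    n = 8
    p = [[0.0] * n for _ in range(n)]
    p[0][7], p[1][6], p[0][4] = 0.9, 0.8, 0.5
    print(best_structure(p))
```

The table takes cubic time in the sequence length, which is the usual cost of this family of recurrences. Handling pseudoknots, as the paper's title indicates, would require a different formulation.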

Large-scale Data Classification based on K-means Clustering and Deep Learning

The Journal of King Mongkut's University of Technology North Bangkok ◽

10.14416/j.kmutnb.2021.03.012 ◽

2021 ◽

Vol 32 (4) ◽

Author(s):

Nuntuschaporn Senawong ◽

Supawadee Wichitchan ◽

Orawich Kumphon

Keyword(s):

Deep Learning ◽

Large Scale ◽

Data Classification ◽

Large Scale Data ◽

Scale Data

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Artificial Intelligence Review ◽

10.1007/s10462-018-09679-z ◽

2019 ◽

Vol 52 (1) ◽

pp. 77-124 ◽

Cited By ~ 70

Author(s):

Giang Nguyen ◽

Stefan Dlugolinsky ◽

Martin Bobák ◽

Viet Tran ◽

Álvaro López García ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Deep Learning ◽

Large Scale ◽

Large Scale Data ◽

Learning Frameworks ◽

Scale Data

Prediction of residential gross yields by using a deep learning method on large scale data processing framework

Pressacademia ◽

10.17261/pressacademia.2018.801 ◽

2018 ◽

Vol 7 (1) ◽

pp. 125-130

Author(s):

Semra Erpolat Tasabat ◽

Olgun Aydin ◽

Ali Hepsen

Keyword(s):

Deep Learning ◽

Data Processing ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Large Scale Data Processing ◽

Scale Data ◽

Processing Framework

TVD-MRDL: traffic violation detection system using MapReduce-based deep learning for large-scale data

Multimedia Tools and Applications ◽

10.1007/s11042-020-09714-8 ◽

2020 ◽

Author(s):

Shiva Asadianfam ◽

Mahboubeh Shamsi ◽

Abdolreza Rasouli Kenari

Keyword(s):

Deep Learning ◽

Large Scale ◽

Detection System ◽

Large Scale Data ◽

Violation Detection ◽

Traffic Violation ◽

Scale Data

A deep learning based approach for trajectory estimation using geographically clustered data

SN Applied Sciences ◽

10.1007/s42452-021-04556-x ◽

2021 ◽

Vol 3 (6) ◽

Author(s):

Aditya Shrivastava ◽

Jai Prakash V Verma ◽

Swati Jain ◽

Sanjay Garg

Keyword(s):

Deep Learning ◽

Prediction Accuracy ◽

Large Scale ◽

Average Distance ◽

Trajectory Data ◽

Distance Error ◽

Large Scale Data ◽

Route Prediction ◽

One Step ◽

Scale Data

Abstract: This study presents a novel approach to predicting the complete source-to-destination trajectory of a vehicle from a partial trajectory query. The proposed architecture scales to extremely large data sets on dense road networks. A Long Short-Term Memory (LSTM) deep learning model is used to analyze the temporal data and predict the complete trajectory. To handle the large amount of data, similar trajectories are clustered, which reduces the search space. Clusters based on geographical location and temporal values are used to train separate LSTM models. The proposed approach is compared with other published work on two measures: average distance error and one-step prediction accuracy. The one-step prediction accuracy reaches 81%, and the distance error is 0.33 km. The proposed approach, termed Clustered LSTM, outperforms other reported results on both measures. The proposed solution is a clustering-based predictive model that handles large-scale data effectively and accurately. The outcome of this study can help improve navigation systems, route prediction, traffic management, and location-based recommendation systems.
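The two evaluation measures named in this abstract, average distance error and one-step prediction accuracy, can be sketched as follows. The abstract does not define them precisely, so this is only one plausible reading: distance error is taken as the mean great-circle (haversine) distance between predicted and actual points, and one-step accuracy as the share of exactly matching next-step predictions. All function names and the input formats are hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def average_distance_error_km(predicted: Sequence[Point], actual: Sequence[Point]) -> float:
    """Mean haversine distance between paired predicted and actual points."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(haversine_km(p, a) for p, a in zip(predicted, actual)) / len(predicted)


def one_step_accuracy(predicted_next: Sequence[Hashable], actual_next: Sequence[Hashable]) -> float:
    """Fraction of next-step predictions (e.g. road segment IDs) that match."""
    if len(predicted_next) != len(actual_next) or not predicted_next:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == a for p, a in zip(predicted_next, actual_next))
    return hits / len(predicted_next)


if __name__ == "__main__":
    pred = [(23.03, 72.58), (23.04, 72.59)]
    true = [(23.03, 72.58), (23.04, 72.60)]
    print(average_distance_error_km(pred, true))
    print(one_step_accuracy(["s1", "s2", "s3"], ["s1", "s9", "s3"]))
```

The coordinates and segment IDs in the usage example are made-up inputs and do not come from the study's data.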
