Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

Mapping Intimacies ◽

10.1101/2021.03.01.431313 ◽

2021 ◽

Author(s):

Noah F. Greenwald ◽

Geneva Miller ◽

Erick Moen ◽

Alex Kong ◽

Adam Kagel ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Cell Segmentation ◽

Tissue Imaging ◽

Imaging Data ◽

Whole Cell ◽

Data Annotation ◽

Large Scale Data ◽

Level Performance ◽

Scale Data

AbstractUnderstanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.

Download Full-text

Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

Nature Biotechnology ◽

10.1038/s41587-021-01094-0 ◽

2021 ◽

Author(s):

Noah F. Greenwald ◽

Geneva Miller ◽

Erick Moen ◽

Alex Kong ◽

Adam Kagel ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Cell Segmentation ◽

Whole Cell ◽

Data Annotation ◽

Large Scale Data ◽

Level Performance ◽

Scale Data

Download Full-text

Towards Large-Scale Data Annotation of Audio from Wearables: Validating Zooniverse Annotations of Infant Vocalization Types

2021 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt48900.2021.9383511 ◽

2021 ◽

Author(s):

Chiara Semenzin ◽

Lisa Hamrick ◽

Amanda Seidl ◽

Bridgette Kelleher ◽

Alejandrina Cristia

Keyword(s):

Large Scale ◽

Data Annotation ◽

Large Scale Data ◽

Infant Vocalization ◽

Scale Data

Download Full-text

Automatic large-scale data acquisition via crowdsourcing for crosswalk classification: A deep learning approach

Computers & Graphics ◽

10.1016/j.cag.2017.08.004 ◽

2017 ◽

Vol 68 ◽

pp. 32-42 ◽

Cited By ~ 18

Author(s):

Rodrigo F. Berriel ◽

Franco Schmidt Rossi ◽

Alberto F. de Souza ◽

Thiago Oliveira-Santos

Keyword(s):

Deep Learning ◽

Data Acquisition ◽

Large Scale ◽

Learning Approach ◽

Large Scale Data ◽

Scale Data

Download Full-text

Opportunities for Understanding MS Mechanisms and Progression With MRI Using Large-Scale Data Sharing and Artificial Intelligence

Neurology ◽

10.1212/wnl.0000000000012884 ◽

2021 ◽

pp. 10.1212/WNL.0000000000012884

Author(s):

Hugo Vrenken ◽

Mark Jenkinson ◽

Dzung Pham ◽

Charles R.G. Guttmann ◽

Deborah Pareto ◽

...

Keyword(s):

Artificial Intelligence ◽

Image Analysis ◽

Data Sharing ◽

Large Scale ◽

Personal Data ◽

Human Observer ◽

Imaging Data ◽

Large Scale Data ◽

Scale Data

Multiple sclerosis (MS) patients have heterogeneous clinical presentations, symptoms and progression over time, making MS difficult to assess and comprehend in vivo. The combination of large-scale data-sharing and artificial intelligence creates new opportunities for monitoring and understanding MS using magnetic resonance imaging (MRI).First, development of validated MS-specific image analysis methods can be boosted by verified reference, test and benchmark imaging data. Using detailed expert annotations, artificial intelligence algorithms can be trained on such MS-specific data. Second, understanding disease processes could be greatly advanced through shared data of large MS cohorts with clinical, demographic and treatment information. Relevant patterns in such data that may be imperceptible to a human observer could be detected through artificial intelligence techniques. This applies from image analysis (lesions, atrophy or functional network changes) to large multi-domain datasets (imaging, cognition, clinical disability, genetics, etc.).After reviewing data-sharing and artificial intelligence, this paper highlights three areas that offer strong opportunities for making advances in the next few years: crowdsourcing, personal data protection, and organized analysis challenges. Difficulties as well as specific recommendations to overcome them are discussed, in order to best leverage data sharing and artificial intelligence to improve image analysis, imaging and the understanding of MS.

Download Full-text

Deep Learning Method for RNA Secondary Structure Prediction with Pseudoknots Based on Large-Scale Data

Journal of Healthcare Engineering ◽

10.1155/2021/6699996 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Bowen Shen ◽

Hao Zhang ◽

Cong Li ◽

Tianheng Zhao ◽

Yuanning Liu

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Large Scale ◽

Secondary Structure Prediction ◽

Learning Methods ◽

Rna Secondary Structure Prediction ◽

Large Scale Data ◽

Scale Data

Traditional machine learning methods are widely used in the field of RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods have more advantages than traditional machine learning methods. As the number of network layers increases in deep learning, there will often be problems such as increased parameters and overfitting. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary results. And from the perspective of the depth and width of the network, improvements are made based on the neural network model, which can effectively improve the computational efficiency while extracting more feature information. We process the existing real RNA data through experiments, use deep learning models to extract useful features from a large amount of RNA sequence data and structure data, and then predict the extracted features to obtain each base’s pairing probability. The characteristics of RNA secondary structure and dynamic programming methods are used to process the base prediction results, and the structure with the largest sum of the probability of each base pairing is obtained, and this structure will be used as the optimal RNA secondary structure. We, respectively, evaluated GoogLeNet and TCN models based on 5sRNA, tRNA data, and tmRNA data, and compared them with other standard prediction algorithms. The sensitivity and specificity of the GoogLeNet model on the 5sRNA and tRNA data sets are about 16% higher than the best prediction results in other algorithms. The sensitivity and specificity of the GoogLeNet model on the tmRNA dataset are about 9% higher than the best prediction results in other algorithms. As deep learning algorithms’ performance is related to the size of the data set, as the scale of RNA data continues to expand, the prediction accuracy of deep learning methods for RNA secondary structure will continue to improve.

Download Full-text

FlywheelTools: Data Curation and Manipulation on the Flywheel Platform

Frontiers in Neuroinformatics ◽

10.3389/fninf.2021.678403 ◽

2021 ◽

Vol 15 ◽

Author(s):

Tinashe M. Tapera ◽

Matthew Cieslak ◽

Max Bertolero ◽

Azeez Adebimpe ◽

Geoffrey K. Aguirre ◽

...

Keyword(s):

Large Scale ◽

Data Curation ◽

Imaging Data ◽

Large Scale Data ◽

Neuroscience Research ◽

Neuroimaging Data ◽

Brain Imaging Data ◽

Database Platform ◽

The Brain ◽

Scale Data

The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is the de facto standard for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are limited on Flywheel. To address these challenges, we developed “FlywheelTools,” a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.

Download Full-text

Large-scale Data Classification based on K-means Clustering and Deep Learning

The Journal of King Mongkut s University of Technology North Bangkok ◽

10.14416/j.kmutnb.2021.03.012 ◽

2021 ◽

Vol 32 (4) ◽

Author(s):

Nuntuschaporn Senawong ◽

Supawadee Wichitchan ◽

Orawich Kumphon

Keyword(s):

Deep Learning ◽

Large Scale ◽

Data Classification ◽

Large Scale Data ◽

Scale Data

Download Full-text

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Artificial Intelligence Review ◽

10.1007/s10462-018-09679-z ◽

2019 ◽

Vol 52 (1) ◽

pp. 77-124 ◽

Cited By ~ 70

Author(s):

Giang Nguyen ◽

Stefan Dlugolinsky ◽

Martin Bobák ◽

Viet Tran ◽

Álvaro López García ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Deep Learning ◽

Large Scale ◽

Large Scale Data ◽

Learning Frameworks ◽

Scale Data

Download Full-text

Prediction of residential gross yields by using a deep learning method on large scale data processing framework

Pressacademia ◽

10.17261/pressacademia.2018.801 ◽

2018 ◽

Vol 7 (1) ◽

pp. 125-130

Author(s):

Semra Erpolat Tasabat ◽

Olgun Aydin ◽

Ali Hepsen

Keyword(s):

Deep Learning ◽

Data Processing ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Large Scale Data Processing ◽

Scale Data ◽

Processing Framework

Download Full-text

Scan Once, Analyse Many: Using large open-access neuroimaging datasets to understand the brain

10.31234/osf.io/yrd27 ◽

2020 ◽

Author(s):

Christopher R Madan

Keyword(s):

Large Scale ◽

Use Cases ◽

Imaging Data ◽

Replication Sample ◽

Methods Development ◽

Large Scale Data ◽

Secondary Analyses ◽

Brain Imaging Data ◽

The Brain ◽

Scale Data

We are now in a time of readily available brain imaging data. Not only are researchers now sharing data more than ever before, but additionally large-scale data collecting initiatives are underway with the vision that many future researchers will use the data for secondary analyses. Here I provide an overview of available datasets and some example use cases. Example use cases include examining individual differences, more robust findings, reproducibility--both in public input data and availability as a replication sample, and methods development. I further discuss a variety of considerations associated with using existing data and the opportunities associated with large datasets. Suggestions for further readings on general neuroimaging and topic-specific discussions are also provided.

Download Full-text