A Large-Scale Fully Annotated Low-Cost Microscopy Image Dataset for Deep Learning Framework

Author(s):  
Sumona Biswas ◽  
Shovan Barma
2020 ◽  
Vol 7 (1) ◽  

Abstract We present a new large-scale, three-fold annotated microscopy image dataset that aims to advance plant cell biology research by exploring cell microstructures, including cell size and shape, cell wall thickness, intercellular space, etc., in a deep learning (DL) framework. The dataset includes 9,811 unstained and 6,127 stained (safranin-o, toluidine blue-o, and lugol’s-iodine) images with three-fold annotation comprising physical, morphological, and tissue grading based on weight, section area, and tissue zone, respectively. In addition, we prepared ground truth segmentation labels for three different tuber weights. We validated the pertinence of the annotations by performing multi-label cell classification with a convolutional neural network (CNN), VGG16, on unstained and stained images. The accuracy reaches up to 0.94, and the F2-score reaches up to 0.92. Furthermore, the ground truth labels were verified by a semantic segmentation algorithm using the UNet architecture, which yields a mean intersection over union of up to 0.70. Overall, the results show that the data are effective and could enrich the domain of microscopy plant cell analysis in the DL framework.
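
The two metrics reported above, F2-score and mean intersection over union, can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the function names `fbeta_score` and `mean_iou`, and the choice to average only over classes that appear in either mask, are assumptions made for the example.

```python
def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from raw counts; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union for flat lists of per-pixel class labels.

    Assumption: classes absent from both masks are skipped rather than
    counted as IoU 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For a segmentation mask, the label images would be flattened to one-dimensional lists before calling `mean_iou`.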


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Juncai Li ◽  
Xiaofei Jiang

Molecular property prediction is an essential task in drug discovery. Most computational approaches based on deep learning either design novel molecular representations or combine them with advanced models. However, researchers have paid less attention to the potential benefits of massive unlabeled molecular data (e.g., ZINC). The task becomes increasingly challenging because the scale of labeled data is limited. Motivated by recent advances in pretrained models for natural language processing, we note that a drug molecule can, to some extent, be viewed as language. In this paper, we investigate how to adapt the pretrained model BERT to extract useful molecular substructure information for molecular property prediction. We present a novel end-to-end deep learning framework, named Mol-BERT, that combines an effective molecular representation with a pretrained BERT model tailored for molecular property prediction. Specifically, a large-scale BERT model is pretrained to generate embeddings of molecular substructures, using four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). The pretrained BERT model can then be fine-tuned on various molecular property prediction tasks. To examine the performance of Mol-BERT, we conduct several experiments on 4 widely used molecular datasets. Compared with traditional and state-of-the-art baselines, the results show that Mol-BERT outperforms current sequence-based methods and achieves at least a 2% improvement in ROC-AUC score on the Tox21, SIDER, and ClinTox datasets.
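
The ROC-AUC score used for the comparison above equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; it is not taken from the Mol-BERT paper, the function name `roc_auc` is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For multi-task benchmarks such as Tox21, a common convention is to compute this per task and average the results, though the abstract does not state which averaging was used.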


2020 ◽  
Vol 10 (8) ◽  
pp. 2878 ◽  
Author(s):  
Jihyun Seo ◽  
Hanse Ahn ◽  
Daewon Kim ◽  
Sungju Lee ◽  
Yongwha Chung ◽  
...  

Automated pig monitoring is an important issue in the surveillance environment of a pig farm. For a large-scale pig farm in particular, practical issues such as monitoring cost should be considered but such consideration based on low-cost embedded boards has not yet been reported. Since low-cost embedded boards have more limited computing power than typical PCs and have tradeoffs between execution speed and accuracy, achieving fast and accurate detection of individual pigs for “on-device” pig monitoring applications is very challenging. Therefore, in this paper, we propose a method for the fast detection of individual pigs by reducing the computational workload of 3 × 3 convolution in widely-used, deep learning-based object detectors. Then, in order to recover the accuracy of the “light-weight” deep learning-based object detector, we generate a three-channel composite image as its input image, through “simple” image preprocessing techniques. Our experimental results on an NVIDIA Jetson Nano embedded board show that the proposed method can improve the integrated performance of both execution speed and accuracy of widely-used, deep learning-based object detectors, by a factor of up to 8.7.


Processes ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 649
Author(s):  
Yifeng Liu ◽  
Wei Zhang ◽  
Wenhao Du

Deep learning based on large amounts of high-quality data plays an important role in many industries. However, deep learning is hard to embed directly in a real-time system, because the system accumulates data only through real-time acquisition, while its analysis tasks must also be carried out in real time. This makes it impossible to complete the analysis by accumulating data over a long period. To address the problems of high-quality data accumulation, timeliness of data analysis, and the difficulty of embedding deep-learning algorithms directly in real-time systems, this paper proposes a new progressive deep-learning framework and conducts experiments on image recognition. The experimental results show that the proposed framework is effective and reaches conclusions similar to those of a deep-learning framework trained on large-scale data.


2019 ◽  
Author(s):  
Yair Fogel-Dror ◽  
Shaul R. Shenhav ◽  
Tamir Sheafer

The collaborative effort of theory-driven content analysis can benefit significantly from the use of topic analysis methods, which allow researchers to add more categories while developing or testing a theory. This additive approach enables the reuse of previous efforts of analysis or even the merging of separate research projects, thereby making these methods more accessible and increasing the discipline’s ability to create and share content analysis capabilities. This paper proposes a weakly supervised topic analysis method that uses both a low-cost unsupervised method to compile a training set and supervised deep learning as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing the results of the method after adding 200 categories to an initial number of 450. We show that the suggested method provides a foundation for a low-cost solution for large-scale topic analysis.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Runyu Jing ◽  
Tingke Wen ◽  
Chengxiang Liao ◽  
Li Xue ◽  
Fengjuan Liu ◽  
...  

Abstract Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs in a bacterium of interest. DeepT3 2.0 combines various deep learning architectures, including convolutional, recurrent, convolutional-recurrent and multilayer neural networks, to learn N-terminal representations of proteins specifically for T3SE prediction. Outcomes from the different models are processed and integrated to discriminate T3SEs from non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods on validation datasets. In addition, the features learned by the networks are analyzed and visualized to explain how the models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.


Author(s):  
Shihua Li ◽  
Kai Yu ◽  
Guandi Wu ◽  
Qingfeng Zhang ◽  
Panqin Wang ◽  
...  

Thiol groups on cysteines can undergo multiple post-translational modifications (PTMs), acting as a molecular switch to maintain redox homeostasis and regulate a series of cell signaling transductions. Identifying sophisticated protein cysteine modifications is crucial for dissecting their underlying regulatory mechanisms. As an alternative to time-consuming and labor-intensive experimental methods, various computational methods have attracted intense research interest due to their convenience and low cost. Here, we developed pCysMod, the first comprehensive deep learning based tool for predicting multiple protein cysteine modifications, including S-nitrosylation, S-palmitoylation, S-sulfenylation, S-sulfhydration, and S-sulfinylation. Experimentally verified cysteine sites curated from the literature and sites collected from other databases and prediction tools were integrated as the benchmark dataset. Several protein sequence features were extracted and combined in a deep learning model, and the hyperparameters were optimized by particle swarm optimization algorithms. Cross-validation indicated that our model is robust and outperforms existing tools, achieving average AUCs of 0.793, 0.807, 0.796, 0.793, and 0.876 for S-nitrosylation, S-palmitoylation, S-sulfenylation, S-sulfhydration, and S-sulfinylation, respectively, which demonstrates that pCysMod is stable and suitable for protein cysteine modification prediction. In addition, we built a comprehensive protein cysteine modification prediction web server based on this model to help researchers find potential modification sites in their proteins of interest; it can be accessed at http://pcysmod.omicsbio.info. This work will promote the study of protein cysteine modification and contribute to clarifying the biological regulation mechanisms of cysteine modification within and among cells.


2021 ◽  
Vol 3 (1) ◽  
pp. 29-59
Author(s):  
Yair Fogel-Dror ◽  
Shaul R. Shenhav ◽  
Tamir Sheafer

Abstract The collaborative effort of theory-driven content analysis can benefit significantly from the use of topic analysis methods, which allow researchers to add more categories while developing or testing a theory. This additive approach enables the reuse of previous efforts of analysis or even the merging of separate research projects, thereby making these methods more accessible and increasing the discipline’s ability to create and share content analysis capabilities. This paper proposes a weakly supervised topic analysis method that uses both a low-cost unsupervised method to compile a training set and supervised deep learning as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing the results of the method after adding 200 categories to an initial number of 450. We show that the suggested method provides a foundation for a low-cost solution for large-scale topic analysis.


2020 ◽  
Vol 10 (14) ◽  
pp. 4913
Author(s):  
Tin Kramberger ◽  
Božidar Potočnik

Currently, there is no adequate publicly available dataset for training Generative Adversarial Networks (GANs) on car images. All available car datasets differ in noise, pose, and zoom levels. The objective of this work was therefore to create an improved car image dataset better suited for GAN training. To improve the performance of the GAN, we combined the LSUN and Stanford car datasets. The merged dataset was then pruned to adjust zoom levels and reduce image noise. This process left fewer images for training, but of higher quality. The pruned dataset was evaluated by training StyleGAN with its original settings. Pruning the combined LSUN and Stanford datasets resulted in 2,067,710 images of cars with less noise and more consistent zoom levels. Training StyleGAN on the LSUN-Stanford car dataset proved superior to training on the LSUN dataset alone by 3.7%, using the Fréchet Inception Distance (FID) as the metric. The results indicate that the proposed LSUN-Stanford car dataset is more consistent and better suited for training GANs than other currently available large car datasets.
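
For reference, the Fréchet Inception Distance used as the metric above is conventionally defined on Inception feature statistics of real and generated images. The notation below is an assumption introduced for illustration: $\mu_r, \Sigma_r$ denote the feature mean and covariance of real images, and $\mu_g, \Sigma_g$ those of generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated images are statistically closer to the real ones, so an improvement corresponds to a decrease in FID.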

