Generation of a Large-Scale Line Image Dataset with Ground Truth Texts from Page-Level Autograph Documents

2021 ◽  
pp. 354-366
Author(s):  
Ayumu Nagai
2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Sumona Biswas ◽  
Shovan Barma

Abstract We present a new large-scale, three-fold annotated microscopy image dataset that aims to advance plant cell biology research by enabling the study of cell microstructures, including cell size and shape, cell wall thickness, and intercellular space, in a deep learning (DL) framework. The dataset includes 9,811 unstained and 6,127 stained (safranin O, toluidine blue O, and Lugol's iodine) images with three-fold annotation: physical, morphological, and tissue grading, based on weight, section area, and tissue zone, respectively. In addition, we prepared ground truth segmentation labels for three different tuber weights. We validated the annotations by performing multi-label cell classification of unstained and stained images with the VGG16 convolutional neural network (CNN), reaching an accuracy of up to 0.94 and an F2-score of 0.92. Furthermore, we verified the ground truth labels with a UNet-based semantic segmentation model, which achieves a mean intersection over union (mIoU) of up to 0.70. Overall, the results show that the dataset is effective and could enrich microscopy-based plant cell analysis in DL frameworks.
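The abstract reports an F2-score and a mean IoU without defining them. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. It assumes binary multi-label vectors with micro-averaging over all (image, label) pairs for F2 (the paper may average differently) and integer class masks for IoU; the function names are illustrative.

```python
from typing import List


def f_beta_micro(y_true: List[List[int]], y_pred: List[List[int]], beta: float = 2.0) -> float:
    """Micro-averaged F-beta over all (sample, label) pairs; beta=2 weights recall above precision."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    b2 = beta * beta
    # F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mean_iou(true_mask: List[List[int]], pred_mask: List[List[int]], num_classes: int) -> float:
    """Mean intersection over union across the classes that occur in either mask."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for t_row, p_row in zip(true_mask, pred_mask):
            for t, p in zip(t_row, p_row):
                inter += int(t == c and p == c)
                union += int(t == c or p == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `f_beta_micro([[1, 0], [1, 1]], [[1, 1], [0, 1]])` counts 2 true positives, 1 false positive, and 1 false negative, giving 10/15 ≈ 0.667.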


2020 ◽  
Vol 64 (5) ◽  
pp. 50411-1-50411-8
Author(s):  
Hoda Aghaei ◽  
Brian Funt

Abstract For research in the field of illumination estimation and color constancy, there is a need for ground-truth measurement of the illumination color at many locations within multi-illuminant scenes. A practical approach to obtaining such ground-truth illumination data is presented here. The proposed method uses a drone to carry a gray ball of known percent surface spectral reflectance through a scene while a calibrated camera photographs it frequently during the flight. The captured images are then post-processed, using machine vision techniques to detect the gray ball in each frame. The camera RGB of light reflected from the gray ball provides a measure of the illumination color at that location. In total, the dataset contains 30 scenes with an average of 100 illumination measurements per scene. The dataset is available for download free of charge.
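The step "the camera RGB of light reflected from the gray ball provides a measure of the illumination color" can be illustrated with a minimal sketch. It assumes the ball pixels have already been detected and are in linear (not gamma-encoded) camera RGB, and that the ball is spectrally neutral; the function name and the saturation cutoff are hypothetical, and ball detection itself is not shown.

```python
from typing import Iterable, Tuple

RGB = Tuple[float, float, float]


def illuminant_chromaticity(ball_pixels: Iterable[RGB], saturation: float = 1.0) -> RGB:
    """Estimate the illuminant's rgb chromaticity from linear RGB pixels on a neutral gray ball."""
    sums = [0.0, 0.0, 0.0]
    n = 0
    for pixel in ball_pixels:
        if max(pixel) >= saturation:
            continue  # clipped pixels no longer scale linearly with the incident light
        for k in range(3):
            sums[k] += pixel[k]
        n += 1
    if n == 0:
        raise ValueError("no usable (unsaturated) gray-ball pixels")
    total = sum(sums)
    if total <= 0:
        raise ValueError("gray-ball pixels are black")
    # A spectrally neutral surface scales R, G, B by the same reflectance, so that factor
    # (and the pixel count) cancels when normalizing to chromaticity r + g + b = 1.
    r, g, b = (s / total for s in sums)
    return (r, g, b)
```

Normalizing to chromaticity discards overall brightness, which depends on exposure and on the ball's known reflectance, and keeps only the illuminant's color.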


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems that involve people of varying skill and motivation in information processing are currently used in a wide range of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones. Purpose: To develop a method for assessing a contributor's expected quality in community tagging systems. The method should use only the generally unreliable and incomplete information provided by contributors (the ground truth tags are unknown). Results: A mathematical model of community image tagging (including a model of the contributor) is proposed, along with a method for assessing a contributor's expected quality. The method compares the tag sets provided by different contributors for the same images; it is a modification of the pairwise comparison method in which the preference relation is replaced by a special domination characteristic. Contributors' expected quality is evaluated as a positive eigenvector of the pairwise domination characteristic matrix. Simulation of community tagging confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors, provided that the contributors' behavior fits the proposed model. Practical relevance: The results can be used in developing systems based on the coordinated efforts of a community (primarily, community tagging systems).
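Evaluating quality "as a positive eigenvector" of a matrix can be sketched with power iteration, which converges to the positive (Perron) eigenvector when the matrix has strictly positive entries. This is a generic sketch, not the author's method: how the domination characteristic is computed is not given here, so the input matrix (entry `[i][j]` read as how strongly contributor i dominates contributor j) is a hypothetical placeholder.

```python
from typing import List


def perron_vector(matrix: List[List[float]], tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Power iteration for the positive eigenvector of a positive square matrix, normalized to sum 1."""
    n = len(matrix)
    v = [1.0 / n] * n  # start from uniform quality scores
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        if s <= 0:
            raise ValueError("matrix must have positive entries")
        w = [x / s for x in w]  # renormalize so scores stay comparable across iterations
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v
```

With the hypothetical matrix `[[2, 1], [1, 1]]`, the result is about `[0.618, 0.382]`: the first contributor would be ranked as the more reliable of the two.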


Author(s):  
Sharon E. Nicholson ◽  
Douglas Klotter ◽  
Adam T. Hartman

Abstract This article examined rainfall enhancement over Lake Victoria. Estimates of over-lake rainfall were compared with rainfall in the surrounding lake catchment. Four satellite products were initially tested against estimates based on gauges or water balance models: TRMM 3B43, IMERG V06 Final Run (IMERG-F), CHIRPS2, and PERSIANN-CDR. The satellite products agreed on catchment rainfall but differed widely on over-lake rainfall. IMERG-F was a clear outlier, exceeding the TRMM 3B43 estimate by 36%. The overestimation by IMERG-F was likely related to passive microwave assessments of strong convection, such as that which prevails over Lake Victoria. Overall, TRMM 3B43 showed the best agreement with the "ground truth" and was used in further analyses. Over-lake rainfall was found to be enhanced compared with catchment rainfall in all months. During the March-to-May long rains, the enhancement varied between 40% and 50%. During the October-to-December short rains, it varied between 33% and 44%. Even during the two dry seasons, the enhancement was at least 20% and exceeded 50% in some months. While the magnitude of enhancement varied from month to month, the seasonal cycle was essentially the same for over-lake and catchment rainfall, suggesting that the dominant influence on over-lake rainfall is the large-scale environment. The association with Mesoscale Convective Systems (MCSs) was also evaluated. The similarity between the monthly spatial patterns of rainfall and MCS counts suggested that MCSs produced a major share of the rainfall over the lake. Similarity in interannual variability further supported this conclusion.
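The enhancement percentages above read most naturally as the relative excess of over-lake rainfall over catchment rainfall. The sketch below assumes that definition (the abstract does not state it) and uses made-up rainfall totals; the same formula expresses how far one product's estimate exceeds another's.

```python
def percent_enhancement(over_lake_mm: float, catchment_mm: float) -> float:
    """Relative excess of over-lake rainfall over catchment rainfall, in percent (assumed definition)."""
    if catchment_mm <= 0:
        raise ValueError("catchment rainfall must be positive")
    return 100.0 * (over_lake_mm - catchment_mm) / catchment_mm


# Hypothetical monthly totals in mm, for illustration only (not data from the article).
print(percent_enhancement(150.0, 100.0))  # 50.0
```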


Author(s):  
Maggie Hess

Purpose: Intraventricular hemorrhage (IVH) affects nearly 15% of preterm infants and can lead to ventricular dilation and cognitive impairment. MR-guided focused ultrasound surgery (MRgFUS) is being investigated as a means of ablating IVH clots. This procedure requires accurate, fast, and consistent quantification of ventricle and clot volumes.  Methods: We developed a semi-autonomous segmentation (SAS) algorithm to measure changes in ventricle and clot volumes. Images are normalized, and ventricle and clot masks are then registered to the images. Voxels of the registered masks and voxels obtained by thresholding the normalized images serve as seed points for competitive region growing, which produces the final segmentation. After thresholding, the user selects the areas of interest for correspondence, and these selections become the final seeds for region growing. SAS was evaluated on a porcine IVH model.  Results: SAS was compared with ground truth manual segmentation (MS) for accuracy, efficiency, and consistency. Accuracy was determined by comparing the clot and ventricle volumes produced by SAS and MS; in a two one-sided test (TOST), SAS and MS were found to be statistically equivalent (p < 0.01). On average, SAS was 15 times faster than MS (p < 0.01). Consistency was determined by repeatedly segmenting the same image with both SAS and MS; SAS was significantly more consistent than MS (p < 0.05).  Conclusion: SAS is a viable method for quantifying the IVH clot and the lateral brain ventricles, and it is being used in a large-scale porcine study of MRgFUS treatment for IVH clot lysis.
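Competitive region growing from labeled seeds, as described in the Methods, can be sketched as a single priority queue shared by all labels. This is a simplified 2D illustration under stated assumptions, not the SAS implementation: the per-label reference intensity is the fixed mean of its seeds, connectivity is 4-neighbor, and the `max_diff` stopping threshold and the function name are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int]


def competitive_region_growing(
    image: List[List[float]], seeds: Dict[str, List[Pixel]], max_diff: float
) -> List[List[Optional[str]]]:
    """Grow all labels at once; each pixel goes to the label that reaches it at the lowest cost."""
    rows, cols = len(image), len(image[0])
    labels: List[List[Optional[str]]] = [[None] * cols for _ in range(rows)]
    # Reference intensity per label: mean of its seed intensities (kept fixed for simplicity).
    ref = {lab: sum(image[r][c] for r, c in pts) / len(pts) for lab, pts in seeds.items()}
    heap: List[Tuple[float, int, int, int, str]] = []
    counter = 0  # tie-breaker so the heap never has to compare label strings

    def push_neighbors(r: int, c: int, lab: str) -> None:
        nonlocal counter
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] is None:
                cost = abs(image[nr][nc] - ref[lab])
                heapq.heappush(heap, (cost, counter, nr, nc, lab))
                counter += 1

    for lab, pts in seeds.items():  # claim every seed before any growth starts
        for r, c in pts:
            labels[r][c] = lab
    for lab, pts in seeds.items():
        for r, c in pts:
            push_neighbors(r, c, lab)

    while heap:
        cost, _, r, c, lab = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue  # already claimed by a competing label at lower cost
        if cost > max_diff:
            break  # heap is ordered by cost, so every remaining candidate is too dissimilar
        labels[r][c] = lab
        push_neighbors(r, c, lab)
    return labels
```

For a single row `[0, 0, 5, 10, 10]` with a `"clot"` seed at the left end, a `"ventricle"` seed at the right end, and `max_diff=3`, each label claims its two similar pixels and the middle pixel stays unlabeled. Extending the sketch to 3D voxels only requires adding the two out-of-plane neighbors.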


2020 ◽  
Vol 1 (2) ◽  
pp. 101-123
Author(s):  
Hiroaki Shiokawa ◽  
Yasunori Futamura

This paper addresses the problem of finding clusters in graph-structured data such as Web graphs and social networks. Graph clustering is a fundamental technique for understanding the structure of such complex graphs, and in the Web and data mining communities, modularity-based graph clustering has been used successfully in many applications. However, modularity-based methods have difficulty finding fine-grained clusters hidden in large-scale graphs; they fail to reproduce the ground truth. In this paper, we present CAV, a novel modularity-based algorithm that produces better clustering results than the traditional algorithm. CAV incorporates cohesiveness-aware vector partitioning into graph spectral analysis to improve clustering accuracy. We also present P-CAV, an efficient extension of CAV that uses thread-based parallelization on a many-core CPU to further improve clustering speed. Our extensive experiments on synthetic and public datasets demonstrate that our approaches outperform state-of-the-art approaches.
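Modularity, the objective that modularity-based clustering methods optimize, can be computed directly for a given partition. The sketch below implements the standard Newman-Girvan definition, Q = Σ_c [L_c/m − (d_c/2m)²], for an undirected, unweighted graph with each edge listed once; it only scores a partition and is not the CAV or P-CAV algorithm.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity Q = sum_c [L_c/m - (d_c/(2m))^2] of a node-to-community assignment."""
    m = 0
    internal: Dict[int, int] = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum: Dict[int, int] = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        raise ValueError("graph has no edges")
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 2(3/7 − 1/4) = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0.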


2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Abstract Motivation Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as a pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across diverse research projects, and have not been validated on large-scale metagenome assemblies. Results We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED's accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information Supplementary data are available at Bioinformatics online.


1994 ◽  
Vol 50 (3) ◽  
pp. 271-274 ◽  
Author(s):  
Ru-xin Li ◽  
Pin-zhong Fan ◽  
Zhizhan Xu ◽  
Pei-xiang Lu ◽  
Zhengquan Zhang
Keyword(s):  
X Ray ◽  
