Arrow R-CNN for handwritten diagram recognition

Author(s):  
Bernhard Schäfer ◽  
Margret Keuper ◽  
Heiner Stuckenschmidt

Abstract We address the problem of offline handwritten diagram recognition. Recently, it has been shown that diagram symbols can be directly recognized with deep learning object detectors. However, object detectors are not able to recognize the diagram structure. We propose Arrow R-CNN, the first deep learning system for joint symbol and structure recognition in handwritten diagrams. Arrow R-CNN extends the Faster R-CNN object detector with an arrow head and tail keypoint predictor and a diagram-aware postprocessing method. We propose a network architecture and data augmentation methods targeted at small diagram datasets. Our diagram-aware postprocessing method addresses the insufficiencies of standard Faster R-CNN postprocessing. It reconstructs a diagram from a set of symbol detections and arrow keypoints. Arrow R-CNN improves the state of the art substantially: on a scanned flowchart dataset, we increase the rate of recognized diagrams from 37.7% to 78.6%.
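The diagram-aware postprocessing idea, attaching each arrow's tail and head keypoint to a detected symbol so that edges between symbols can be recovered, can be illustrated with a minimal sketch. The nearest-box-center heuristic and all names below are illustrative assumptions, not the paper's exact procedure:

```python
def box_center(box):
    """Center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def nearest_symbol(point, symbols):
    """Index of the symbol whose box center is closest to the keypoint."""
    px, py = point
    def sq_dist(i):
        cx, cy = box_center(symbols[i]["box"])
        return (px - cx) ** 2 + (py - cy) ** 2
    return min(range(len(symbols)), key=sq_dist)

def reconstruct_edges(symbols, arrows):
    """Map each arrow's tail/head keypoints to (source, target) symbol indices."""
    edges = []
    for arrow in arrows:
        src = nearest_symbol(arrow["tail"], symbols)
        dst = nearest_symbol(arrow["head"], symbols)
        edges.append((src, dst))
    return edges

symbols = [
    {"box": (0, 0, 10, 10)},   # symbol 0
    {"box": (40, 0, 50, 10)},  # symbol 1
]
arrows = [{"tail": (11, 5), "head": (39, 5)}]  # points from symbol 0 toward 1
edges = reconstruct_edges(symbols, arrows)
```

With this toy input the single arrow is resolved to the edge (0, 1), i.e. a connection from symbol 0 to symbol 1.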

2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and conventionally SMILES-randomization-augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as “attentional gain”: an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
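Levenshtein distance over SMILES strings, the similarity signal the augmentation is named after, can be computed with the classic dynamic-programming recurrence. This sketch shows only the distance itself, not the authors' pair-construction procedure:

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # deletion from a
                cur[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb)  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

# Reactant vs. product SMILES differing in one atom symbol:
d = levenshtein("CCO", "CCN")  # ethanol vs. ethylamine-like toy strings
```

For the toy pair above the distance is 1, reflecting the single substituted character.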


2021 ◽  
Vol 11 (15) ◽  
pp. 7148
Author(s):  
Bedada Endale ◽  
Abera Tullu ◽  
Hayoung Shi ◽  
Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions in both civilian and military sectors. Many of these missions require UAVs to acquire artificial intelligence about the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well-known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with a huge sacrifice in terms of time and computational resources. Collecting large input datasets, pre-training processes such as labeling training data, and the need for a high-performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission-specific input data augmentation techniques and the design of a lightweight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images in each class were used as input data, where 80% were for training the network and the remaining 20% were used for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.
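The term "sequential gradient descent" suggests updating the weights after each individual sample rather than after a full batch, which is why redundant samples cost little extra. A toy sketch of this per-sample scheme on a scalar least-squares problem (our interpretation, not the study's actual training code):

```python
def sequential_gd(samples, lr=0.1, epochs=50):
    """Fit scalar w minimizing sum of (w*x - y)^2, updating per sample."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:          # one gradient step per training sample
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

# Data generated by y = 2x; the fit should converge toward w = 2.
w = sequential_gd([(1.0, 2.0), (2.0, 4.0)])
```

Because each sample triggers its own update, duplicated (redundant) samples simply repeat cheap steps instead of inflating a batch-gradient computation.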


2021 ◽  
Vol 12 ◽  
Author(s):  
Prabhakar Maheswari ◽  
Purushothaman Raja ◽  
Orly Enrique Apolo-Apolo ◽  
Manuel Pérez-Ruiz

Smart farming employs intelligent systems for every domain of agriculture to obtain sustainable economic growth with the available resources using advanced technologies. Deep Learning (DL) is a sophisticated artificial neural network architecture that provides state-of-the-art results in smart farming applications. One of the main tasks in this domain is yield estimation. Manual yield estimation faces many hurdles: it is labor-intensive and time-consuming, and it produces imprecise results. These issues motivate the development of an intelligent fruit yield estimation system that offers more benefits to farmers in deciding on harvesting, marketing, etc. Semantic segmentation combined with DL yields promising results in fruit detection and localization by performing pixel-based prediction. This paper reviews the literature employing various techniques for fruit yield estimation using DL-based semantic segmentation architectures. It also discusses the challenging issues that occur during intelligent fruit yield estimation, such as sampling, collection, annotation and data augmentation, fruit detection, and counting. Results show that fruit yield estimation employing DL-based semantic segmentation techniques performs better than earlier techniques because of the human cognition incorporated into the architecture. Future directions, such as customizing DL architectures for smartphone applications to predict yield and developing more comprehensive models encompassing challenging situations like occlusion, overlapping, and illumination variation, are also discussed.
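Pixel-based segmentation output can be turned into a fruit count by labeling connected components of the predicted mask. This pure-Python sketch (illustrative, not taken from any reviewed paper) counts 4-connected foreground blobs:

```python
def count_fruits(mask):
    """Count 4-connected components of 1s in a binary segmentation mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new blob found
                stack = [(r, c)]
                seen[r][c] = True
                while stack:                    # flood-fill the whole blob
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

mask = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
n = count_fruits(mask)  # three separate blobs in this toy mask
```

In practice libraries such as scipy.ndimage provide the same labeling operation; occlusion and overlap, as the review notes, make real counts harder than this idealized case.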


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Xiaoming Yu ◽  
Yedan Shen ◽  
Yuan Ni ◽  
Xiaowei Huang ◽  
Xiaolong Wang ◽  
...  

Abstract Background Text Matching (TM) is a fundamental task of natural language processing widely used in many application systems such as information retrieval, automatic question answering, machine translation, dialogue systems, reading comprehension, etc. In recent years, a large number of deep learning neural networks have been applied to TM and have repeatedly refreshed TM benchmarks. Among deep learning neural networks, the convolutional neural network (CNN) is one of the most popular, but it suffers from difficulties in dealing with small samples and keeping the relative structures of features. In this paper, we propose a novel deep learning architecture based on capsule networks for TM, called CapsTM; the capsule network is a new type of neural network architecture proposed to address some of the shortcomings of CNNs that shows great potential in many tasks. Methods CapsTM is a five-layer neural network, including an input layer, a representation layer, an aggregation layer, a capsule layer and a prediction layer. In CapsTM, two pieces of text are first individually converted into sequences of embeddings and are further transformed by a highway network in the input layer. Then, a Bidirectional Long Short-Term Memory (BiLSTM) network is used to represent each piece of text, and an attention-based interaction matrix is used to represent the interactive information of the two pieces of text in the representation layer. Subsequently, the two kinds of representations are fused together by BiLSTM in the aggregation layer and are further represented with capsules (vectors) in the capsule layer. Finally, the prediction layer is a fully connected network used for classification. CapsTM is an extension of ESIM, adding a capsule layer before the prediction layer. Results We construct a corpus of Chinese medical question matching, which contains 36,360 question pairs. This corpus is randomly split into three parts: a training set of 32,360 question pairs, a development set of 2,000 question pairs and a test set of 2,000 question pairs. On this corpus, we conduct a series of experiments to evaluate the proposed CapsTM and compare it with other state-of-the-art methods. CapsTM achieves the highest F-score of 0.8666. Conclusion The experimental results demonstrate that CapsTM is effective for Chinese medical question matching and outperforms other state-of-the-art methods for comparison.
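The capsule layer's characteristic nonlinearity is the squash function, which scales a vector's norm into [0, 1) while preserving its direction. A minimal sketch of the standard capsule-network formulation (not CapsTM-specific code):

```python
import math

def squash(v, eps=1e-9):
    """Capsule squash: v * (|v|^2 / (1 + |v|^2)) / |v|."""
    norm2 = sum(x * x for x in v)
    norm = math.sqrt(norm2)
    scale = norm2 / (1.0 + norm2) / (norm + eps)  # shrinks norm into [0, 1)
    return [scale * x for x in v]

# A capsule output of norm 5 is squashed to norm 25/26 ~ 0.96,
# so long vectors saturate near 1 while direction is unchanged.
out = squash([3.0, 4.0])
```

The squashed norm acts as the probability that the entity represented by the capsule is present.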


2020 ◽  
Vol 34 (08) ◽  
pp. 13294-13299
Author(s):  
Hangzhi Guo ◽  
Alexander Woodruff ◽  
Amulya Yadav

Farmer suicides have become an urgent social problem which governments around the world are trying hard to solve. Most farmers are driven to suicide due to an inability to sell their produce at desired profit levels, which is caused by the widespread uncertainty/fluctuation in produce prices resulting from varying market conditions. To prevent farmer suicides, this paper takes the first step towards resolving the issue of produce price uncertainty by presenting PECAD, a deep learning algorithm for accurate prediction of future produce prices based on past pricing and volume patterns. While previous work presents machine learning algorithms for prediction of produce prices, they suffer from two limitations: (i) they do not explicitly consider the spatio-temporal dependence of future prices on past data; and as a result, (ii) they rely on classical ML prediction models which often perform poorly when applied to spatio-temporal datasets. PECAD addresses these limitations via three major contributions: (i) we gather real-world daily price and (produced) volume data of different crops over a period of 11 years from an official Indian government administered website; (ii) we pre-process this raw dataset via state-of-the-art imputation techniques to account for missing data entries; and (iii) PECAD proposes a novel wide and deep neural network architecture which consists of two separate convolutional neural network models (trained on pricing and volume data, respectively). Our simulation results show that PECAD outperforms existing state-of-the-art baseline methods by achieving a significantly lower root mean squared error (RMSE); PECAD achieves a ∼25% lower coefficient of variation than state-of-the-art baselines. Our work is done in collaboration with a non-profit agency that works on preventing farmer suicides in the Indian state of Jharkhand, and PECAD is currently being reviewed by them for potential deployment.
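The two reported error metrics can be computed as follows. Normalizing RMSE by the mean of the observed series is one common definition of the coefficient of variation of the RMSE, assumed here since the paper's exact formula is not reproduced in the abstract:

```python
import math

def rmse(pred, true):
    """Root mean squared error between predicted and observed series."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def coeff_of_variation(pred, true):
    """RMSE normalized by the mean of the observed values (assumed definition)."""
    mean = sum(true) / len(true)
    return rmse(pred, true) / mean

# Toy prices: predictions off by +/-1 around an observed mean of 10.
err = rmse([11.0, 9.0], [10.0, 10.0])        # 1.0
cv = coeff_of_variation([11.0, 9.0], [10.0, 10.0])  # 0.1
```

A scale-free metric like this makes errors comparable across crops whose price levels differ by orders of magnitude.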


2020 ◽  
Vol 2020 ◽  
pp. 1-11 ◽  
Author(s):  
Qinghe Zheng ◽  
Mingqiang Yang ◽  
Xinyu Tian ◽  
Nan Jiang ◽  
Deqiang Wang

Nowadays, deep learning has achieved remarkable results in many computer vision-related tasks, for which the support of big data is essential. In this paper, we propose a full-stage data augmentation framework to improve the accuracy of deep convolutional neural networks, which can also play the role of an implicit model ensemble without introducing additional model training costs. Simultaneous data augmentation during the training and testing stages can ensure network optimization and enhance its generalization ability. Augmentation in the two stages needs to be consistent to ensure the accurate transfer of specific domain information. Furthermore, this framework is universal for any network architecture and data augmentation strategy and therefore can be applied to a variety of deep learning-based tasks. Finally, experimental results on image classification on the coarse-grained dataset CIFAR-10 (93.41%) and the fine-grained dataset CIFAR-100 (70.22%) demonstrate the effectiveness of the framework by comparison with state-of-the-art results.
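Applying augmentation at the testing stage amounts to averaging the model's predictions over the same augmentations used in training, which is where the implicit-ensemble effect comes from. A generic sketch of this test-time ensembling (illustrative, not the paper's framework):

```python
def tta_predict(model, image, augmentations):
    """Average per-class scores of model over a set of augmentations."""
    scores = [model(aug(image)) for aug in augmentations]
    n = len(scores)
    return [sum(s[k] for s in scores) / n for k in range(len(scores[0]))]

# Toy stand-ins: the "model" returns its input as class scores, and the
# augmentations are the identity and a horizontal flip of the score list.
model = lambda x: x
augmentations = [lambda x: x, lambda x: list(reversed(x))]
avg = tta_predict(model, [1.0, 3.0], augmentations)  # averages the two views
```

Each augmented view acts as one ensemble member, so the averaging costs only extra forward passes, not extra training.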


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Zhehao He ◽  
Wang Lv ◽  
Jian Hu

Background. The differential diagnosis of subcentimetre lung nodules with a diameter of less than 1 cm has long been a challenge for imaging doctors and thoracic surgeons. We aim to create a deep learning model for the diagnosis of pulmonary nodules using a simple method. Methods. Image data and pathological diagnoses of patients came from the First Affiliated Hospital of Zhejiang University School of Medicine from October 1, 2016, to October 1, 2019. After data preprocessing and data augmentation, the training set was used to train the model, and the test set was used to evaluate the trained model. At the same time, clinicians also diagnosed the test set. Results. A total of 2,295 images of 496 lung nodules and their corresponding pathological diagnoses were selected as the training and test sets. After data augmentation, the number of training set images reached 12,510, including 6,648 malignant nodule images and 5,862 benign nodule images. The area under the P-R curve of the trained model is 0.836 for the classification of malignant and benign nodules. The area under the ROC curve of the trained model is 0.896 (95% CI: 78.96%~100.18%), which is higher than that of the three doctors, although the P value is not less than 0.05. Conclusion. With the help of an automatic machine learning system, clinicians can create a deep learning pulmonary nodule pathology classification model without the help of deep learning experts. The diagnostic efficiency of this model is not inferior to that of the clinicians.
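The reported area under the ROC curve can be read as the probability that a randomly chosen malignant nodule receives a higher model score than a randomly chosen benign one. A small sketch of this rank-based AUC computation (illustrative, not the authors' evaluation code):

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive scores higher than negative), ties counted as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy scores: every malignant nodule outranks every benign one -> AUC = 1.0.
auc = roc_auc([0.9, 0.8], [0.3, 0.7])
```

This pairwise formulation is equivalent to the Mann-Whitney U statistic and matches the trapezoidal area under the ROC curve.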


2020 ◽  
Author(s):  
Pedro Silva ◽  
Eduardo Luz ◽  
Guilherme Silva ◽  
Gladston Moreira ◽  
Rodrigo Silva ◽  
...  

Abstract Early detection and diagnosis are critical factors to control the spread of COVID-19. A number of deep learning-based methodologies have been recently proposed for COVID-19 screening in CT scans as a tool to automate and help with the diagnosis. To achieve these goals, in this work we propose a slice voting-based approach extending the EfficientNet family of deep artificial neural networks. We also design a specific data augmentation process and transfer learning for this task. Moreover, a cross-dataset study is performed on the two largest datasets to date. The proposed method presents results comparable to the state-of-the-art methods and the highest accuracy to date on both datasets (accuracy of 87.60% for the COVID-CT dataset and accuracy of 98.99% for the SARS-CoV-2 CT-scan dataset). The cross-dataset analysis showed that the generalization power of deep learning models is far from acceptable for the task, since accuracy drops from 87.68% to 56.16% in the best evaluation scenario. These results highlight that methods that aim at COVID-19 detection in CT images have to improve significantly to be considered a clinical option, and that larger and more diverse datasets are needed to evaluate the methods in a realistic scenario.
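Slice voting aggregates per-slice predictions into a single scan-level diagnosis. A minimal majority-vote sketch (the class labels and the plain-majority rule are assumptions; the paper's voting scheme may weight slices differently):

```python
from collections import Counter

def scan_diagnosis(slice_predictions):
    """Majority vote over per-slice class predictions for one CT scan."""
    votes = Counter(slice_predictions)
    return votes.most_common(1)[0][0]

# Two of three slices are classified as COVID-positive, so the scan is too.
label = scan_diagnosis(["covid", "covid", "normal"])
```

Voting makes the scan-level decision robust to a few misclassified slices.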


2016 ◽  
Vol 10 (03) ◽  
pp. 379-397 ◽  
Author(s):  
Hilal Ergun ◽  
Yusuf Caglar Akyuz ◽  
Mustafa Sert ◽  
Jianquan Liu

Visual concept recognition has been an active research field over the last decade. Reflecting this attention, deep learning architectures are showing great promise in various computer vision domains, including image classification, object detection, event detection and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures, both in terms of running time and accuracy. In our proposed visual concept recognition system, we first discuss various important properties of the popular convolutional network architectures under consideration. Then we describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners. Using these best practices, we propose efficient fusion mechanisms for both single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs at a low level. Our results show that these state-of-the-art results can be reached without using extensive data augmentation techniques.
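Fusing multiple network models at the score level can be sketched as a weighted average of per-class scores. The uniform default weights below are an assumption for illustration, not the study's tuned fusion mechanism:

```python
def late_fusion(score_lists, weights=None):
    """Weighted average of per-class score vectors from several networks."""
    n = len(score_lists)
    weights = weights or [1.0 / n] * n  # assume uniform weights by default
    num_classes = len(score_lists[0])
    return [sum(w * s[c] for w, s in zip(weights, score_lists))
            for c in range(num_classes)]

# Two networks scoring the same image over two concepts:
fused = late_fusion([[0.8, 0.2], [0.6, 0.4]])  # averages to [0.7, 0.3]
```

Per-model weights can instead be set from validation accuracy, letting stronger networks dominate the fused decision.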


2020 ◽  
pp. bjophthalmol-2020-317327
Author(s):  
Zhongwen Li ◽  
Chong Guo ◽  
Duoru Lin ◽  
Danyao Nie ◽  
Yi Zhu ◽  
...  

Background/Aims To develop a deep learning system for automated glaucomatous optic neuropathy (GON) detection using ultra-widefield fundus (UWF) images. Methods We trained, validated and externally evaluated a deep learning system for GON detection based on 22 972 UWF images from 10 590 subjects that were collected at 4 different institutions in China and Japan. The InceptionResNetV2 neural network architecture was used to develop the system. The area under the receiver operating characteristic curve (AUC), sensitivity and specificity were used to assess the performance of detecting GON by the system. The data set from the Zhongshan Ophthalmic Center (ZOC) was selected to compare the performance of the system to that of ophthalmologists who mainly conducted UWF image analysis in clinics. Results The system for GON detection achieved AUCs of 0.983–0.999 with sensitivities of 97.5–98.2% and specificities of 94.3–98.4% in four independent data sets. The most common reasons for false-negative results were confounding optic disc characteristics caused by high myopia or pathological myopia (n=39 (53%)). The leading cause of false-positive results was the presence of other fundus lesions (n=401 (96%)). The performance of the system on the ZOC data set was comparable to that of an experienced ophthalmologist (p>0.05). Conclusion Our deep learning system can accurately detect GON from UWF images in an automated fashion. It may be used as a screening tool to improve the accessibility of screening and promote the early diagnosis and management of glaucoma.
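The reported sensitivity and specificity follow directly from the confusion-matrix counts. A one-line sketch with hypothetical counts (the numbers below are illustrative, not from the study):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical evaluation: 95 of 100 GON eyes flagged, 94 of 100 normals cleared.
sens, spec = sensitivity_specificity(tp=95, fn=5, tn=94, fp=6)
```

For screening, a high sensitivity matters most, since a false negative means a missed glaucoma case.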

