Investigating the Effect of Stride and Step Training Schemes for Finger Detection with the Region-based Fully Convolutional Network (R-FCN) in Augmented Reality Technology

2021 ◽  
Vol 12 (2) ◽  
pp. 138
Author(s):  
Hashfi Fadhillah ◽  
Suryo Adhi Wibowo ◽  
Rita Purnamasari

Abstract  Combining the real world with the virtual world and modeling the result in 3D is the aim of Augmented Reality (AR) technology. Using fingers for computer operations across multiple devices makes a system more interactive. Marker-based AR is one type of AR that uses markers for detection. This study designed an AR system that detects fingertips as markers. The system uses the Region-based Fully Convolutional Network (R-FCN) deep learning method, which builds on detection results obtained from the Fully Convolutional Network (FCN). Detection results are integrated with a computer pointer for basic operations. This study uses predetermined step-training schemes, namely 25K, 50K, and 75K steps, to find the best IoU, precision, and accuracy. High precision keeps changes in the centroid point small; high accuracy improves AR performance under rapid movement and imperfect finger poses. The system was designed using a dataset of index-finger images with 10,800 training images and 3,600 test images. Each scheme was tested on video recorded at different distances, locations, and times. The study produced its best results with the 25K-step scheme: IoU of 69%, precision of 5.56, and accuracy of 96%.
Keywords: Augmented Reality, Region-based Convolutional Network, Fully Convolutional Network, Pointer, Step training
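The IoU metric used above to compare the step schemes, and the centroid that drives the computer pointer, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the box format (x1, y1, x2, y2) and the helper names are assumptions.

```python
def iou(box_a, box_b):
    # boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def centroid(box):
    # center of the detected fingertip box; a stable centroid means
    # the pointer does not jump between frames
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
```

A high-precision detector keeps successive centroids close together, which is exactly the property the abstract links to smooth pointer control.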

2018 ◽  
Vol 10 (11) ◽  
pp. 1827 ◽  
Author(s):  
Ahram Song ◽  
Jaewan Choi ◽  
Youkyung Han ◽  
Yongil Kim

Hyperspectral change detection (CD) can be performed effectively using deep-learning networks. Although these approaches require qualified training samples, it is difficult to obtain ground-truth data in the real world, and structural limitations make it difficult to preserve spatial information during training. To solve these problems, our study proposed a novel CD method for hyperspectral images (HSIs), including sample generation and a deep-learning network, called the recurrent three-dimensional (3D) fully convolutional network (Re3FCN), which merges the advantages of a 3D fully convolutional network (FCN) and a convolutional long short-term memory (ConvLSTM). Principal component analysis (PCA) and the spectral correlation angle (SCA) were used to generate training samples with high probabilities of being changed or unchanged; this strategy allows the network to be trained with fewer but more representative samples. The Re3FCN mainly comprises spectral–spatial and temporal modules. In particular, a spectral–spatial module with a 3D convolutional layer extracts spectral and spatial features from the HSIs simultaneously, while a temporal module with ConvLSTM records and analyzes multi-temporal HSI change information. The study first proposed a simple and effective method to generate samples for network training, which can be applied effectively to cases with no training samples. Re3FCN can perform end-to-end detection of binary and multiple changes, and can receive multi-temporal HSIs directly as input without separately learning the characteristics of multiple changes. Finally, the network extracts joint spectral–spatial–temporal features and preserves the spatial structure during learning through its fully convolutional structure. This study was the first to use a 3D FCN and a ConvLSTM for remote-sensing CD. To demonstrate the effectiveness of the proposed CD method, we performed binary and multi-class CD experiments.
Results revealed that the Re3FCN outperformed conventional methods such as change vector analysis, iteratively reweighted multivariate alteration detection, PCA-SCA, FCN, and the combination of 2D convolutional layers with fully connected LSTM.
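The pseudo-sample generation idea can be sketched with a simplified spectral-angle criterion: pixel pairs whose bi-temporal spectra are very similar become "unchanged" training samples, very dissimilar pairs become "changed", and the rest are excluded. This is an illustrative sketch only; the angle formulation (arccos of the Pearson correlation of mean-centred spectra), the thresholds, and the function names are assumptions, not the paper's exact PCA/SCA procedure.

```python
import numpy as np

def spectral_correlation_angle(a, b):
    # angle between mean-centred spectra; a small angle means the two
    # spectra are highly correlated (spectrally similar)
    a = a - a.mean()
    b = b - b.mean()
    r = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(r, -1.0, 1.0))

def pseudo_labels(img_t1, img_t2, low, high):
    # img_t1, img_t2: (rows, cols, bands) co-registered bi-temporal HSIs
    # returns 0 = likely unchanged, 1 = likely changed,
    # -1 = ambiguous (excluded from training)
    rows, cols, _ = img_t1.shape
    labels = np.full((rows, cols), -1, dtype=int)
    for i in range(rows):
        for j in range(cols):
            ang = spectral_correlation_angle(img_t1[i, j], img_t2[i, j])
            if ang < low:
                labels[i, j] = 0
            elif ang > high:
                labels[i, j] = 1
    return labels
```

Only the confidently labeled pixels feed the network, which matches the abstract's point about training with fewer but more representative samples.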


Author(s):  
C. Xiao ◽  
R. Qin ◽  
X. Huang ◽  
J. Li

Abstract. Individual tree detection and counting are critical for forest inventory management. In almost all methods based on remote sensing data, treetop detection is the essential first step. However, due to the diversity of tree attributes, such as crown size and branch distribution, it is hard to find a universal treetop detector, and most current detectors must be carefully designed from heuristics or prior knowledge. Hence, to find an efficient and versatile detector, we apply a deep neural network to extract and learn high-level semantic treetop features. In contrast to using manually labelled training data, we innovatively train the network with pseudo labels that come from the results of conventional unsupervised treetop detectors, which may not be robust across different scenarios. In this study, we use a DSM (Digital Surface Model) derived from multi-view high-resolution satellite imagery and a multispectral orthophoto as data, and apply the top-hat by reconstruction (THR) operation to find treetops as pseudo labels. The FCN (fully convolutional network) is adopted as a pixel-level classification network to segment the input image into treetop and non-treetop pixels. Our experiments show that the FCN-based treetop detector achieves a detection accuracy of 99.7% in the prairie area and 66.3% in the complicated town area, performing better than THR across the various scenarios. This study demonstrates that, without manual labels, the FCN treetop detector can be trained with pseudo labels generated by an unsupervised detector and achieve better, more robust results in different scenarios.
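The top-hat by reconstruction (THR) operation that supplies the pseudo labels can be sketched in plain numpy: subtract a height offset from the DSM, reconstruct that marker under the DSM by repeated geodesic dilation, and flag pixels that stand well above the reconstruction as treetops. This is a minimal sketch under assumed parameters (3x3 structuring element, offset `h`, threshold `min_height`); it is not the paper's tuned detector.

```python
import numpy as np

def dilate3x3(img):
    # grayscale dilation with a flat 3x3 structuring element (edge-padded)
    p = np.pad(img, 1, mode="edge")
    stack = [p[di:di + img.shape[0], dj:dj + img.shape[1]]
             for di in range(3) for dj in range(3)]
    return np.max(stack, axis=0)

def reconstruct(marker, mask):
    # morphological reconstruction by dilation: grow the marker under
    # the mask until the result stops changing
    prev = marker
    while True:
        cur = np.minimum(dilate3x3(prev), mask)
        if np.array_equal(cur, prev):
            return cur
        prev = cur

def treetop_mask(dsm, h, min_height):
    # top-hat by reconstruction: keeps peaks that rise ~h above their
    # surroundings, which is where treetops show up in a DSM
    tophat = dsm - reconstruct(dsm - h, dsm)
    return tophat >= min_height
```

The resulting binary mask plays the role of the pseudo ground truth on which the FCN is trained in place of manual labels.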


2021 ◽  
Vol 247 ◽  
pp. 03013
Author(s):  
Qian Zhang ◽  
Jinchao Zhang ◽  
Liang Liang ◽  
Zhuo Li ◽  
Tengfei Zhang

A deep-learning-based surrogate model is proposed to replace the conventional diffusion equation solver and predict the flux and power distribution of the reactor core. Using training data generated by the conventional diffusion equation solver, a specially designed convolutional neural network inspired by the FCN (Fully Convolutional Network) is trained on the deep learning platform TensorFlow. Numerical results show that the surrogate model is effective at estimating the flux and power distribution calculated by the diffusion method, meaning it can replace the conventional diffusion equation solver with a substantial efficiency gain.
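The key structural property an FCN-style surrogate relies on is that convolution preserves the spatial grid: a core cross-section map goes in, a flux map of the same resolution comes out. A minimal numpy sketch of one such layer follows; the kernel here is a hand-picked smoothing stand-in for weights that training against solver outputs would actually fit, and all values are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel):
    # 'same'-padded sliding-window product (cross-correlation, as
    # deep-learning "convolutions" are): output keeps the input resolution
    kh, kw = kernel.shape
    p = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * kernel)
    return out

# toy cross-section map of a core with one fueled region (values hypothetical)
core = np.zeros((8, 8))
core[3:5, 3:5] = 1.0

# a smoothing kernel stands in for trained weights
kernel = np.ones((3, 3)) / 9.0
flux_estimate = conv2d_same(core, kernel)
```

A trained network stacks many such layers with learned kernels, but the input-resolution-in, same-resolution-out contract shown here is what lets it emit a whole flux map in one forward pass instead of iterating a solver.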


2020 ◽  
pp. 147592172094295
Author(s):  
Homin Song ◽  
Yongchao Yang

Subwavelength defect imaging using guided waves has long been a difficult task, mainly due to the diffraction limit and the dispersion of guided waves. In this article, we present a noncontact super-resolution guided wave array imaging approach based on deep learning to visualize subwavelength defects in plate-like structures. The proposed approach is a novel hierarchical multiscale imaging approach that combines two distinct fully convolutional networks. The first, the global detection network, globally detects subwavelength defects in a raw low-resolution guided wave beamforming image. The second, the local super-resolution network, then locally resolves subwavelength-scale fine structural details of the detected defects. We conduct a series of numerical simulations and laboratory-scale experiments using a noncontact guided wave array, enabled by a scanning laser Doppler vibrometer, on aluminum plates with various subwavelength defects. The results demonstrate that the proposed approach not only locates subwavelength defects but also visualizes super-resolution fine structural details of these defects, enabling further estimation of the size and shape of the detected defects. We discuss several key aspects of the performance of our approach, compare it with an existing super-resolution algorithm, and make recommendations for its successful implementation.
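The detect-then-refine control flow of the hierarchical pipeline can be sketched with simple stand-ins: a threshold plays the global detection network, and nearest-neighbour upsampling plays the local super-resolution network. Both stand-ins, the parameter values, and the function names are illustrative assumptions; the point is only the two-stage structure, where stage 2 operates on patches that stage 1 flags.

```python
import numpy as np

def global_detect(beamform_img, thresh):
    # stage 1 stand-in: flag coarse pixels whose beamforming intensity
    # exceeds a threshold (a trained FCN would do this instead)
    return np.argwhere(beamform_img > thresh)

def local_super_resolve(patch, factor):
    # stage 2 stand-in: nearest-neighbour upsampling where the trained
    # network would predict fine structural detail
    return np.kron(patch, np.ones((factor, factor)))

def hierarchical_imaging(beamform_img, thresh=0.5, half=1, factor=4):
    # run stage 2 only on neighbourhoods of stage-1 detections
    results = []
    for i, j in global_detect(beamform_img, thresh):
        i0, j0 = max(i - half, 0), max(j - half, 0)
        patch = beamform_img[i0:i + half + 1, j0:j + half + 1]
        results.append(((i, j), local_super_resolve(patch, factor)))
    return results
```

Restricting the expensive refinement stage to detected regions is what makes the hierarchical design cheaper than super-resolving the full image.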


Author(s):  
V. R. S. Mani

In this chapter, the author paints a comprehensive picture of the deep learning models used in different multi-modal image segmentation tasks. The chapter is an introduction for those new to the field, an overview for those working in it, and a reference for those searching for literature on a specific application. Methods are classified according to the different types of multi-modal images and the corresponding types of convolutional neural networks used in the segmentation task. The chapter starts with an introduction to CNN topology and describes various models such as Hyper Dense Net, Organ Attention Net, UNet, VNet, the Dilated Fully Convolutional Network, and transfer learning approaches.


Author(s):  
S. Daniel ◽  
V. Dupont

Abstract. The benefit of autonomous vehicles in hydrography largely depends on the ability of these platforms to carry out survey campaigns in a fully autonomous manner. One solution is real-time processing onboard the survey vessel, and to meet this real-time goal, deep-learning-based models are favored. Although Artificial Intelligence (AI) is booming, most studies have been devoted to optical images and, more recently, to LIDAR point clouds; little attention has been paid to the underwater environment. In this paper, we present an investigation into the adaptation of deep neural networks to multi-beam echo-sounder (MBES) point clouds in order to classify sea-bottom morphology. More precisely, the paper investigates whether a fully convolutional network can be trained on the native 3D structure of the point cloud. A preprocessing approach is provided to overcome the lack of adequate training data. Results on the test data sets show the level of complexity of natural underwater terrain features: a classification accuracy no better than 65% is reached when two micro-topographic classes are used. Point density and resolution strongly affect the perceived seabed morphology, and thereby the classification scheme.
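A common first step when preparing MBES soundings for a network, and for checking the point-density issue the abstract raises, is binning the soundings into a regular depth grid with a per-cell density map. The sketch below is an illustrative assumption about such preprocessing, not the paper's pipeline; the cell size and function names are hypothetical.

```python
import numpy as np

def soundings_to_grid(points, cell, nx, ny):
    # points: (N, 3) array of (x, y, depth) MBES soundings
    # returns mean depth per cell (NaN where empty) and a count map,
    # the latter useful for spotting density-starved regions
    depth_sum = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    for x, y, z in points:
        i, j = int(y // cell), int(x // cell)
        if 0 <= i < ny and 0 <= j < nx:
            depth_sum[i, j] += z
            count[i, j] += 1
    mean_depth = np.where(count > 0, depth_sum / np.maximum(count, 1), np.nan)
    return mean_depth, count
```

Cells with low counts are exactly where micro-topographic classes become unreliable, so a density map like this helps explain accuracy ceilings of the kind reported here.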


Plant Methods ◽  
2019 ◽  
Vol 15 (1) ◽  
Author(s):  
Yu Jiang ◽  
Changying Li ◽  
Andrew H. Paterson ◽  
Jon S. Robertson

Abstract Background Plant population density is an important factor for agricultural production systems due to its substantial influence on crop yield and quality. Traditionally, plant population density is estimated by either field assessment or a germination-test-based approach. These approaches can be laborious and inaccurate. Recent advances in deep learning provide new tools for challenging computer vision tasks such as object detection, which can be used to detect and count plant seedlings in the field. The goal of this study was to develop a deep-learning-based approach to count plant seedlings in the field. Results Overall, the final detection model achieved F1 scores of 0.727 (at IoU_all) and 0.969 (at IoU_0.5) on the Seedling_All testing set, in which images had large variations, indicating the efficacy of the Faster RCNN model with the Inception ResNet v2 feature extractor for seedling detection. Ablation experiments showed that training data complexity substantially affected model generalizability, transfer learning efficiency, and the detection performance improvements gained from increased training sample size. Generally, seedling counts from the developed method were highly correlated (R^2 = 0.98) with those from human field assessment for 75 test videos collected in multiple locations over multiple years, indicating the accuracy of the developed approach. Further experiments showed that counting accuracy was largely determined by detection accuracy: the developed approach provided good counting performance on unknown datasets as long as the detection models generalized well to those datasets. Conclusion The developed deep-learning-based approach can accurately count plant seedlings in the field.
Seedling detection models trained in this study and the annotated images can be used by the research community and the cotton industry to further the development of solutions for seedling detection and counting.
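The F1-at-IoU-threshold metric reported above can be sketched as follows: predicted boxes are greedily matched one-to-one to ground-truth boxes at a given IoU cutoff, and precision/recall over the matches give F1. This is a generic sketch of the metric, not the study's evaluation code; the greedy matching order and box format are assumptions.

```python
def iou(a, b):
    # boxes are (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def f1_at_iou(preds, truths, thresh=0.5):
    # greedy one-to-one matching: each ground-truth box matches
    # at most one prediction above the IoU threshold
    matched, tp = set(), 0
    for p in preds:
        for k, t in enumerate(truths):
            if k not in matched and iou(p, t) >= thresh:
                matched.add(k)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

Sweeping `thresh` over 0.5:0.95 and averaging would give the stricter IoU_all-style score, while a single pass at 0.5 corresponds to the looser F1 the abstract reports.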


2020 ◽  
Vol 34 (07) ◽  
pp. 12104-12111
Author(s):  
Yi Tu ◽  
Li Niu ◽  
Weijie Zhao ◽  
Dawei Cheng ◽  
Liqing Zhang

Aesthetic image cropping is a practical but challenging task that aims to find the crops with the highest aesthetic quality in an image. Recently, many deep learning methods have been proposed to address this problem, but they do not reveal the intrinsic mechanism of aesthetic evaluation. In this paper, we propose an interpretable image cropping model to unveil the mystery. For each image, we use a fully convolutional network to produce an aesthetic score map, which is shared among all candidate crops during crop-level aesthetic evaluation. We then require the aesthetic score map to be both composition-aware and saliency-aware. In particular, the same region is assigned different aesthetic scores based on its relative position in different crops. Moreover, a visually salient region is expected to have more sensitive aesthetic scores, so that our network learns to place salient objects at more proper positions. Such an aesthetic score map can be used to localize aesthetically important regions in an image, which sheds light on the composition rules learned by our model. We show the competitive performance of our model on the image cropping task on several benchmark datasets, and also demonstrate its generality in real-world applications.
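The efficiency of sharing one score map among all candidate crops can be sketched simply: the FCN runs once per image, and each crop is then scored by aggregating the map over its region. The pooling choice (a plain mean) and the crop format here are illustrative assumptions; the paper's model additionally makes scores depend on a region's position within each crop, which this sketch omits.

```python
import numpy as np

def crop_score(score_map, crop):
    # crop = (top, left, height, width); aggregate the shared aesthetic
    # score map over the crop region
    t, l, h, w = crop
    return float(score_map[t:t + h, l:l + w].mean())

def best_crop(score_map, candidates):
    # one network pass produced score_map; ranking candidates is now
    # just cheap array slicing
    return max(candidates, key=lambda c: crop_score(score_map, c))
```

Because the expensive network pass happens once, even thousands of candidate crops can be ranked at negligible extra cost.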

