Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7950
Author(s):  
Radhakrishnan Gopalapillai ◽  
Deepa Gupta ◽  
Mohammed Zakariah ◽  
Yousef Ajami Alotaibi

Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area that uses depth information in addition to color (RGB) image data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images must integrate these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel's local surface normal (HHA) so that transfer learning can be applied using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for real-time scene prediction, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized, pretrained VGG16 network. We address the class imbalance seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) applied at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
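The feature-level SMOTE step lends itself to a short sketch. Below is a minimal, hedged example using imbalanced-learn's SMOTE on pooled CNN feature vectors; the feature dimension (4096, as for a VGG16 fully connected layer) and the toy class counts are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: SMOTE applied at the feature level, as the abstract
# describes, using imbalanced-learn on features from a pretrained CNN.
# The feature shape and class counts below are assumptions for illustration.
import numpy as np
from imblearn.over_sampling import SMOTE

def balance_features(features, labels, random_state=42):
    """Oversample minority scene classes on CNN feature vectors."""
    smote = SMOTE(random_state=random_state)
    return smote.fit_resample(features, labels)

# Toy example: N x 4096 feature vectors (VGG16-sized), imbalanced labels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(120, 4096)).astype("float32")
labels = np.array([0] * 100 + [1] * 20)
feats_bal, labels_bal = balance_features(feats, labels)
print(feats_bal.shape, np.bincount(labels_bal))  # balanced: 100 per class
```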

2021 ◽  
Author(s):  
Wenjia Ding ◽  
Huyin Zhang ◽  
Ralf Reulke ◽  
Yulin Wang

Abstract: In previous data hiding techniques, binary rules are usually used to guide fine-tuning of the values of basic objects in the host media to hide bits 0 and 1. In this paper, we propose a new data hiding technique for gray images based on querying a 256×256 information table. The information table is constructed by cloning a 3×3 basic block, which we call the seed block. Eight unsigned integer values between 0 and 7, i.e., 3-bit binary data, are assigned to the elements of the seed block. Each time, a pair of pixels is chosen from the host image, and their pixel values are used as the row and column numbers to look up the information table. If the element value obtained from the table is equal to the 3-bit binary data to be hidden, the values of the pixel pair remain unchanged. Otherwise, taking this element (which we call the focus element) as the central point, a 3×3 window is enclosed in the information table; the element in the window that equals the data to be hidden is then found, and the pixel values of the pair are updated with the row and column numbers of that element. Since the row and column numbers are in the range 0–255, the updated pixel values cannot overflow. In the proposed algorithm, a pair of pixels hides 3 bits of information, so the embedding capacity is very high. Since the adjustment of pixel values is constrained to a 3×3 window, the modification of pixel values is small. The proposed technique belongs to fragile digital watermarking, so it can be used for image authentication and tamper localization. Evaluated in terms of data hiding capacity, security, imperceptibility, computational cost, and extensibility, the algorithm is superior to existing information hiding techniques. It can also be applied to color image and audio data hiding.
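The embedding step is concrete enough to sketch. The following is a hedged Python illustration of the table lookup described above; the particular seed-block layout (values 0–7 spread over nine cells, so one value repeats) and the inward shift of the window at table edges are my assumptions, since the abstract does not fix them.

```python
# Hedged sketch of the table-lookup embedding step. Because the 256x256
# table tiles a 3x3 seed block, every 3x3 window contains all values 0-7,
# so a match for any 3-bit payload is always found nearby.
import numpy as np

SEED = np.array([[0, 1, 2],
                 [3, 4, 5],
                 [6, 7, 0]])  # assumed layout: values 0-7, with 0 repeated

TABLE = np.tile(SEED, (86, 86))[:256, :256]  # clone seed block to 256x256

def embed_3bits(p1, p2, data):
    """Hide a 3-bit value (0-7) in a pixel pair by table lookup."""
    if TABLE[p1, p2] == data:
        return p1, p2                     # pair already encodes the data
    # Enclose a 3x3 window around the focus element (shifted inward at
    # edges -- an assumption; the abstract does not cover edge handling).
    r0 = min(max(p1 - 1, 0), 253)
    c0 = min(max(p2 - 1, 0), 253)
    win = TABLE[r0:r0 + 3, c0:c0 + 3]
    dr, dc = np.argwhere(win == data)[0]  # tiling guarantees a match
    return r0 + dr, c0 + dc               # updated pixel values

print(embed_3bits(100, 200, 5))  # each pixel value changes by at most 1
```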


2022 ◽  
Vol 12 (2) ◽  
pp. 622
Author(s):  
Saadman Sakib ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Oh-Jin Kwon

The pedestrian attribute recognition task is becoming more popular daily because of its significant role in surveillance scenarios. With rapid technological advances, deep learning has come to the forefront of computer vision, and previous works have applied it in different ways to recognize pedestrian attributes. The results are satisfactory, but there is still scope for improvement. Transfer learning is increasingly popular because it reduces both computational cost and the impact of data scarcity. This paper proposes a framework for recognizing pedestrian attributes in surveillance scenarios. A Mask R-CNN object detector extracts the pedestrians. Additionally, we applied transfer learning to different CNN architectures, i.e., Inception ResNet v2, Xception, ResNet 101 v2, and ResNet 152 v2. The main contribution of this paper is fine-tuning the ResNet 152 v2 architecture by freezing layers (the last 4, 8, 12, 14, 20, none, and all). Moreover, a data balancing technique, i.e., oversampling, is applied to resolve the class imbalance problem of the dataset, and the usefulness of this technique is analyzed. Our proposed framework outperforms state-of-the-art methods, providing 93.41% mA and 89.24% mA on the RAP v2 and PARSE100K datasets, respectively.
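The layer-freezing sweep can be sketched briefly. Below is a hedged Keras example that freezes all but the last n layers of ResNet152V2 (the abstract is ambiguous about whether the listed layers are the ones frozen or the ones left trainable); the multi-label sigmoid head and the attribute count are illustrative assumptions.

```python
# Hedged sketch of transfer learning on ResNet152V2 with partial freezing.
# One reading of the sweep: train only the last n layers, freeze the rest.
import tensorflow as tf

def build_attribute_model(num_attributes, n_trainable_last=8):
    base = tf.keras.applications.ResNet152V2(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    base.trainable = True
    for layer in base.layers[:-n_trainable_last]:
        layer.trainable = False           # sweep: 4, 8, 12, 14, 20, ...
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(num_attributes, activation="sigmoid")(x)
    return tf.keras.Model(base.input, out)

model = build_attribute_model(num_attributes=54)  # assumed attribute count
model.compile(optimizer="adam", loss="binary_crossentropy")
```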


2021 ◽  
Vol 7 ◽  
pp. e557
Author(s):  
Priyal Sobti ◽  
Anand Nayyar ◽  
Niharika ◽  
Preeti Nagrath

Convolutional neural networks are widely used for image classification, typically through pretraining on ImageNet followed by fine-tuning, whereby the learned features are adapted to the target task. ImageNet is a large database of 15 million images belonging to 22,000 categories; images collected from the Web are labeled by human labelers using the Amazon Mechanical Turk crowd-sourcing tool. ImageNet is useful for transfer learning because of the sheer volume of its dataset and the number of object classes available. Transfer learning using pretrained models helps build computer vision models accurately and inexpensively: models pretrained on substantial datasets are reused and repurposed for our requirements. Scene recognition is a widely used application of computer vision in many communities and industries, such as tourism. This study shows multilabel scene classification using five architectures, namely VGG16, VGG19, ResNet50, InceptionV3, and Xception, with the ImageNet weights available in the Keras library, and comprehensively compares their performance. Finally, EnsemV3X is presented. The proposed model, with a reduced number of parameters, is superior to the state-of-the-art models Inception and Xception, demonstrating an accuracy of 91%.
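A soft-voting ensemble over Keras pretrained backbones can illustrate the idea behind EnsemV3X. The sketch below averages the softmax outputs of frozen InceptionV3 and Xception branches; the fusion scheme, head design, and six-class output are my assumptions, since the abstract does not specify the paper's exact architecture.

```python
# Hedged sketch: soft-voting ensemble of two Keras pretrained backbones
# with ImageNet weights; only the small classification heads are trained.
import tensorflow as tf

def branch(builder, num_classes=6):
    base = builder(include_top=False, weights="imagenet",
                   input_shape=(299, 299, 3), pooling="avg")
    base.trainable = False  # reuse ImageNet features unchanged
    return tf.keras.Sequential(
        [base, tf.keras.layers.Dense(num_classes, activation="softmax")])

inception = branch(tf.keras.applications.InceptionV3)
xception = branch(tf.keras.applications.Xception)

def ensemble_predict(images):
    """Average the two branches' softmax outputs (soft voting)."""
    return (inception.predict(images) + xception.predict(images)) / 2.0
```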


2019 ◽  
Vol 11 (24) ◽  
pp. 2908 ◽  
Author(s):  
Yakoub Bazi ◽  
Mohamad M. Al Rahhal ◽  
Haikel Alhichri ◽  
Naif Alajlan

The current literature on remote sensing (RS) scene classification shows that state-of-the-art results are achieved using feature extraction methods, where convolutional neural networks (CNNs) (mostly VGG16, with 138.36 M parameters) are used as feature extractors and simple-to-complex handcrafted modules are then added for additional feature learning and classification, thus returning to feature engineering. In this paper, we revisit the fine-tuning approach for deeper networks (GoogLeNet and beyond) and show that it has not been well exploited due to the negative effect of the vanishing gradient problem encountered when transferring knowledge to small datasets. The aim of this work is two-fold. Firstly, we provide best practices for fine-tuning pre-trained CNNs using the root-mean-square propagation (RMSprop) method. Secondly, we propose a simple yet effective solution for tackling the vanishing gradient problem by injecting gradients at an earlier layer of the network using an auxiliary classification loss function. Then, we fine-tune the resulting regularized network by optimizing both the primary and auxiliary losses. As pre-trained CNNs, we consider Inception-based networks and EfficientNets with small weights: GoogLeNet (7 M) and EfficientNet-B0 (5.3 M), and their deeper versions, Inception-v3 (23.83 M) and EfficientNet-B3 (12 M), respectively. The former networks have been used previously in the context of RS and yielded low accuracies compared to VGG16, while the latter are new state-of-the-art models. Extensive experimental results on several benchmark datasets reveal clearly that, if fine-tuning is done in an appropriate way, it can set new state-of-the-art results at low computational cost.
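The auxiliary-loss idea can be sketched compactly. Below is a hedged Keras example that taps an earlier EfficientNet-B0 block with a second classifier and optimizes both losses with RMSprop; the tap-in layer name, the 21-class head, and the loss weights are assumptions rather than the authors' exact configuration.

```python
# Hedged sketch: inject gradients at an earlier layer via an auxiliary
# classification loss, then fine-tune both heads jointly with RMSprop.
import tensorflow as tf

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

# Main head on the final feature map.
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
main_out = tf.keras.layers.Dense(21, activation="softmax", name="main")(x)

# Auxiliary head tapped at an earlier block (assumed tap point) so that
# useful gradients reach the early layers despite the network's depth.
early = base.get_layer("block4a_expand_activation").output
a = tf.keras.layers.GlobalAveragePooling2D()(early)
aux_out = tf.keras.layers.Dense(21, activation="softmax", name="aux")(a)

model = tf.keras.Model(base.input, [main_out, aux_out])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss={"main": "categorical_crossentropy",
                    "aux": "categorical_crossentropy"},
              loss_weights={"main": 1.0, "aux": 0.3})  # assumed weighting
```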


Author(s):  
Sarat Chandra Nayak ◽  
Subhranginee Das ◽  
Mohammad Dilsad Ansari

Background and Objective: Stock closing price prediction is enormously complicated. Artificial Neural Networks (ANNs) are excellent function approximators applied to this area. Several nature-inspired evolutionary optimization techniques have been proposed in the literature to search for the optimum parameters of ANN-based forecasting models. However, most of them require fine-tuning of several control parameters as well as algorithm-specific parameters to achieve optimal performance, and improper tuning of such parameters leads either to additional computational cost or to local optima. Methods: Teaching Learning Based Optimization (TLBO) is a recently proposed algorithm that does not require any algorithm-specific parameters. The intrinsic capability of the Functional Link Artificial Neural Network (FLANN) to capture the multifaceted nonlinear relationships present in historical stock data has made it popular and has led to wide application in stock market prediction. This article presents a hybrid model, termed Teaching Learning Based Optimization of Functional Link Neural Networks (TLBO-FLN), that combines the advantages of both TLBO and FLANN. Results and Conclusion: The model is evaluated by predicting the short-, medium-, and long-term closing prices of four emerging stock markets. The performance of the TLBO-FLN model is measured through the Mean Absolute Percentage Error (MAPE), Average Relative Variance (ARV), and coefficient of determination (R²), compared with those of several other state-of-the-art models trained similarly, and found to be superior.
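The two ingredients of TLBO-FLN are simple enough to sketch. Below is a hedged illustration of a trigonometric functional-link expansion and the TLBO teacher-phase update; the expansion order, output nonlinearity, and toy fitness function are assumptions, not the authors' exact formulation.

```python
# Hedged sketch: FLANN trigonometric expansion plus the parameter-free
# TLBO teacher-phase update (learners move toward the best solution).
import numpy as np

def expand(x, order=2):
    """Trigonometric functional-link expansion of an input vector x."""
    feats = [x]
    for n in range(1, order + 1):
        feats += [np.sin(n * np.pi * x), np.cos(n * np.pi * x)]
    return np.concatenate(feats)

def flann_predict(weights, x):
    """Single-output FLANN: linear combination of expanded features."""
    return np.tanh(expand(x) @ weights)

def teacher_phase(population, fitness, rng):
    """TLBO teacher phase: X_new = X + r * (teacher - TF * mean)."""
    teacher = population[np.argmin(fitness)]     # best learner (minimization)
    mean = population.mean(axis=0)
    tf_ = rng.integers(1, 3)                     # teaching factor, 1 or 2
    r = rng.random(population.shape)
    return population + r * (teacher - tf_ * mean)

rng = np.random.default_rng(0)
pop = rng.normal(size=(30, 10))                  # 30 candidate weight vectors
fit = np.array([np.sum(w ** 2) for w in pop])    # toy fitness to minimize
pop = teacher_phase(pop, fit, rng)
```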


Sensors ◽  
2019 ◽  
Vol 19 (22) ◽  
pp. 4850 ◽  
Author(s):  
Carlos S. Pereira ◽  
Raul Morais ◽  
Manuel J. C. S. Reis

Frequently, the vineyards in the Douro Region present multiple grape varieties per parcel and even per row. An automatic algorithm for grape variety identification, as an integrated software component, was proposed that can be applied, for example, to a robotic harvesting system. However, several issues and constraints complicated its development, namely: images captured in a natural environment, a low volume of images, high similarity among images of different grape varieties, leaf senescence, and significant changes in grapevine leaf and bunch images across harvest seasons, mainly due to adverse climatic conditions, diseases, and the presence of pesticides. In this paper, the performance of transfer learning and fine-tuning techniques based on the AlexNet architecture was evaluated when applied to the identification of grape varieties. Two natural vineyard image datasets were captured in different geographical locations and harvest seasons. To generate different datasets for training and classification, several image processing methods, including a proposed four-corners-in-one image warping algorithm, were used. The experimental results, obtained from an AlexNet-based transfer learning scheme trained on the image dataset pre-processed through the four-corners-in-one method, achieved a test accuracy of 77.30%. Applying this classifier model, an accuracy of 89.75% was reached on the popular Flavia leaf dataset. The results obtained by the proposed approach are promising and encouraging in helping Douro wine growers with the automatic task of identifying grape varieties.
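The AlexNet-based transfer learning scheme can be sketched with torchvision. In this hedged example the convolutional features are frozen and only the classifier head is retrained; the number of grape-variety classes and the optimizer settings are assumptions, as the abstract does not specify them.

```python
# Hedged sketch: AlexNet transfer learning with frozen convolutional
# features and a replaced output layer (torchvision >= 0.13 weights API).
import torch
import torch.nn as nn
from torchvision import models

num_varieties = 6  # assumed number of grape-variety classes

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False           # keep pretrained conv features
model.classifier[6] = nn.Linear(4096, num_varieties)  # new output layer

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9)                # assumed hyperparameters
criterion = nn.CrossEntropyLoss()
```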


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Young Jae Kim ◽  
Jang Pyo Bae ◽  
Jun-Won Chung ◽  
Dong Kyun Park ◽  
Kwang Gi Kim ◽  
...  

Abstract: Colorectal cancer occurs in the gastrointestinal tract and is the third most common of 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the potential for developing colorectal cancer, and detected polyps need to be resected to reduce the risk of developing cancer. This research improved the performance of polyp classification through fine-tuning of a Network-in-Network (NIN) after applying a model pre-trained on the ImageNet database. Random shuffling was performed 20 times on 1000 colonoscopy images; each shuffle divides the data into 800 training images and 200 test images, and accuracy is evaluated on the 200 test images in each of the 20 experiments. Three compared methods were constructed from AlexNet by transferring weights trained on three different state-of-the-art databases; a plain AlexNet-based method without transfer learning was also compared. The accuracy of the proposed method was statistically significantly higher than that of the four other state-of-the-art methods and showed an 18.9% improvement over the plain AlexNet-based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. An automatic algorithm with high recall and accuracy can assist endoscopists in identifying adenomatous polyps, enabling the timely resection of polyps at an early stage.
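The evaluation protocol maps directly onto scikit-learn's ShuffleSplit. The sketch below assumes a `train_and_score` callable that wraps the NIN fine-tuning pipeline (a hypothetical stand-in, not the authors' code).

```python
# Hedged sketch of the evaluation protocol: 20 random shuffles of 1000
# images into 800 training / 200 test samples, reporting mean accuracy.
import numpy as np
from sklearn.model_selection import ShuffleSplit

def evaluate(images, labels, train_and_score, n_runs=20, seed=0):
    """Run the 20-shuffle protocol; train_and_score returns test accuracy."""
    splitter = ShuffleSplit(n_splits=n_runs, train_size=800, test_size=200,
                            random_state=seed)
    scores = [train_and_score(images[tr], labels[tr], images[te], labels[te])
              for tr, te in splitter.split(images)]
    return float(np.mean(scores)), float(np.std(scores))
```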


2021 ◽  
Vol 29 (1) ◽  
pp. 19-36
Author(s):  
Çağín Polat ◽  
Onur Karaman ◽  
Ceren Karaman ◽  
Güney Korkmaz ◽  
Mehmet Can Balcı ◽  
...  

BACKGROUND: Chest X-ray imaging has proven to be a powerful method for detecting and diagnosing COVID-19 cases due to its easy accessibility, lower cost, and rapid imaging time. OBJECTIVE: This study aims to improve the efficacy of screening COVID-19-infected patients using chest X-ray images with the help of a developed deep convolutional neural network (CNN) model entitled nCoV-NET. METHODS: To train and evaluate the performance of the developed model, three datasets were collected from the "ChestX-ray14", "COVID-19 image data collection", and "Chest X-ray collection from Indiana University" resources, respectively. Overall, 299 COVID-19 pneumonia cases and 1,522 non-COVID-19 cases were involved in this study. To overcome the probable bias due to the unbalanced cases in the two classes of the datasets, ResNet, DenseNet, and VGG architectures were re-trained in the fine-tuning stage to distinguish the COVID-19 classes using a transfer learning method. Lastly, the optimized final nCoV-NET model was applied to the testing dataset to verify its performance. RESULTS: Although the performance parameters of all re-trained architectures were close to each other, the final nCoV-NET model, optimized using the DenseNet-161 architecture in the transfer learning stage, exhibited the highest performance for classification of COVID-19 cases, with an accuracy of 97.1%. The activation mapping method was used to create activation maps that highlight the crucial areas of the radiograph, improving causality and intelligibility. CONCLUSION: This study demonstrated that the proposed CNN model, nCoV-NET, can be utilized to reliably detect COVID-19 cases using chest X-ray images, accelerating triage, saving critical time for disease control, and assisting radiologists in validating their initial diagnoses.
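One common way to counter the class imbalance noted above is a class-weighted loss during fine-tuning; the abstract does not state the authors' exact remedy, so the weighting below is an illustrative assumption around a torchvision DenseNet-161.

```python
# Hedged sketch: DenseNet-161 re-training for the two-class task, with
# inverse-frequency class weights (an assumed remedy for the 299 COVID
# vs. 1,522 non-COVID imbalance; not necessarily the authors' method).
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet161(weights=models.DenseNet161_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 2)  # 2 classes

# Class order assumed: index 0 = non-COVID (1522), index 1 = COVID (299).
counts = torch.tensor([1522.0, 299.0])
weights = counts.sum() / (2 * counts)     # rarer class gets a larger weight
criterion = nn.CrossEntropyLoss(weight=weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```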


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Simon Tam ◽  
Mounir Boukadoum ◽  
Alexandre Campeau-Lecours ◽  
Benoit Gosselin

Abstract: Myoelectric hand prostheses offer a way for upper-limb amputees to recover gesture and prehensile abilities to ease rehabilitation and daily life activities. However, studies with prosthesis users have found that a lack of intuitiveness and ease-of-use in the human-machine control interface is among the main factors driving the low user acceptance of these devices. This paper proposes a highly intuitive, responsive, and reliable real-time myoelectric hand prosthesis control strategy, with an emphasis on the demonstration and reporting of real-time evaluation metrics. The presented solution leverages surface high-density electromyography (HD-EMG) and a convolutional neural network (CNN) to adapt itself to each unique user and his/her specific voluntary muscle contraction patterns. Furthermore, a transfer learning approach is presented to drastically reduce the training time and allow for easy installation and calibration. The CNN-based gesture recognition system was evaluated in real time with a group of 12 able-bodied users. A real-time test with 6 classes/grip modes resulted in mean and median positive predictive values (PPV) of 93.43% and 100%, respectively. Each gesture state is instantly accessible from any other state, with no mode switching required, for increased responsiveness and natural, seamless control. The system outputs a correct prediction with less than 116 ms of latency. A PPV of 100% was attained in many trials and is realistically achievable consistently with user practice and/or by employing a thresholded majority-vote inference. Using transfer learning, these results are achievable after a sensor installation, data recording, and network training/fine-tuning routine taking less than 10 min to complete, an 89.4% reduction in the setup time of the traditional, non-transfer-learning approach.
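The thresholded majority-vote inference mentioned above can be sketched in a few lines; the window length and vote threshold here are assumptions, not the values used in the study.

```python
# Hedged sketch: emit a gesture decision only when one class wins a clear
# majority of the last k frame-level CNN predictions; otherwise withhold.
from collections import Counter, deque

class MajorityVote:
    def __init__(self, window=9, threshold=0.6):  # assumed parameters
        self.buffer = deque(maxlen=window)
        self.threshold = threshold

    def update(self, frame_prediction):
        """Add one frame-level class prediction; return a gesture or None."""
        self.buffer.append(frame_prediction)
        label, count = Counter(self.buffer).most_common(1)[0]
        if count / self.buffer.maxlen >= self.threshold:
            return label      # confident, stable decision
        return None           # withhold output until the vote is decisive

voter = MajorityVote()
for pred in [2, 2, 3, 2, 2, 2, 2, 2, 2]:
    decision = voter.update(pred)
print(decision)  # 2, once the window fills with a clear majority
```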

