Social Video Advertisement Replacement and its Evaluation in Convolutional Neural Networks

2021 ◽  
Vol 20 (1) ◽  
pp. 117-136
Author(s):  
Cheng Yang ◽  
Xiang Yu ◽  
Arun Kumar ◽  
G.G. Md. Nawaz Ali ◽  
Peter Han Joo Chong ◽  
...  

This paper introduces a method that uses deep convolutional neural networks (CNNs) to automatically replace advertisement (AD) photos in social (or self-media) videos, and provides a suitable evaluation method for comparing different CNNs. An AD photo can replace a picture inside a video. However, if a human being occludes the replaced picture in the original video, the newly pasted AD photo will cover the occluding part of the person. A deep learning algorithm is therefore used to segment the human from the video. The segmented human pixels are then pasted back onto the occluded area, so that the AD photo replacement looks natural and seamless in the video. This process requires the predicted occlusion edge to be close to the ground truth occlusion edge, so that the AD photo is occluded naturally. The research therefore introduces a curve fitting method to measure the error of the predicted occlusion edge. Using this method, three CNN approaches are applied and compared for AD replacement: the mask region-based convolutional neural network (Mask RCNN), a recurrent network for video object segmentation (ROVS), and DeeplabV3. The experimental results compare the segmentation accuracy of the different models, with DeeplabV3 showing the best performance.
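The curve-fitting error measure is only described at a high level above; the following is a minimal sketch of one way such a measure could be implemented, assuming both edges are available as 2D pixel coordinates. The function name `edge_error` and the polynomial fit are illustrative assumptions, not the authors' exact method.

```python
# Sketch: measure how far a predicted occlusion edge deviates from the
# ground-truth edge by fitting a curve to the ground truth and averaging
# point-to-curve distances. All names here are illustrative.
import numpy as np

def edge_error(pred_edge, gt_edge, degree=3):
    """pred_edge, gt_edge: arrays of shape (N, 2) holding (x, y) edge pixels."""
    gt_x, gt_y = gt_edge[:, 0], gt_edge[:, 1]
    # Fit a polynomial y = f(x) to the ground-truth occlusion edge.
    curve = np.poly1d(np.polyfit(gt_x, gt_y, degree))
    # Mean absolute vertical deviation of predicted edge pixels from the curve.
    pred_x, pred_y = pred_edge[:, 0], pred_edge[:, 1]
    return np.mean(np.abs(pred_y - curve(pred_x)))

# Example: a perfect prediction yields an error near 0.
gt = np.stack([np.arange(100.0), 0.01 * np.arange(100.0) ** 2], axis=1)
print(edge_error(gt, gt))  # ~0.0
```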

BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Yang-Ming Lin ◽  
Ching-Tai Chen ◽  
Jia-Ming Chang

Abstract

Background: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search.

Results: We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split of the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep.

Conclusions: We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identification. The results suggest that incorporating more data into the deep learning model may improve performance further.
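For reference, the two reported metrics compare a predicted MS2 intensity vector against the observed one. A minimal sketch, with placeholder vectors standing in for real binned spectra:

```python
# Sketch: the two evaluation metrics reported above, computed on a pair of
# binned MS2 intensity vectors. The vectors here are illustrative placeholders.
import numpy as np
from scipy.stats import pearsonr

def spectrum_similarity(pred, obs):
    """pred, obs: 1-D arrays of peak intensities on a common m/z binning."""
    cos = np.dot(pred, obs) / (np.linalg.norm(pred) * np.linalg.norm(obs))
    r, _ = pearsonr(pred, obs)
    return cos, r

pred = np.array([0.1, 0.9, 0.0, 0.4])
obs = np.array([0.2, 0.8, 0.1, 0.5])
print(spectrum_similarity(pred, obs))
```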


Author(s):  
Fawziya M. Rammo ◽  
Mohammed N. Al-Hamdani

Many language identification (LID) systems rely on language models that use machine learning (ML) approaches, and such systems typically need rather long recording periods to achieve satisfactory accuracy. This study aims to extract enough information from short recording intervals to successfully classify the spoken languages under test. The classification process is based on frames of 2-18 seconds, whereas most previous LID systems were based on much longer time frames (from 3 seconds to 2 minutes). This research defined and implemented many low-level features using MFCC (Mel-frequency cepstral coefficients). The speech files, in five languages (English, French, German, Italian, Spanish), come from voxforge.org, an open-source corpus of user-submitted audio clips in various languages. A CNN (convolutional neural network) algorithm is applied for classification, with excellent results: binary language classification achieved an accuracy of 100%, and five-language classification achieved an accuracy of 99.8%.
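As an illustration of the feature extraction step, below is one common way to compute MFCC features with librosa. The coefficient count, sample rate, and clip duration are assumptions for illustration, not the paper's exact settings.

```python
# Sketch: extracting MFCC features from a short speech clip with librosa.
# Parameter values below are assumptions, not the paper's exact settings.
import librosa
import numpy as np

def mfcc_features(path, n_mfcc=13, duration=10.0):
    """Load up to `duration` seconds of audio and return MFCCs plus deltas."""
    y, sr = librosa.load(path, sr=16000, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    # Stack static and delta coefficients as a (2 * n_mfcc, frames) matrix.
    return np.vstack([mfcc, delta])

features = mfcc_features("clip.wav")  # e.g., a VoxForge audio clip
print(features.shape)
```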


Water ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3412
Author(s):  
Joakim Bruslund Haurum ◽  
Chris H. Bahnsen ◽  
Malte Pedersen ◽  
Thomas B. Moeslund

Sewer pipe inspections are currently conducted by professionals who remotely control a robot from above ground. This expensive and slow approach is prone to human error. Therefore, there is both an economic and scientific interest in automating the inspection process by creating systems able to recognize sewer defects. However, the extent of research put into automatic water level estimation in sewers has been limited, despite it being a prerequisite for further analysis of the pipe, as only sections above the water level can be visually inspected. In this work, we utilize a dataset of still images obtained from over 5000 inspections carried out for three different Danish water utility companies. This dataset is used for training and testing decision tree methods and convolutional neural networks (CNNs) for automatic water level estimation. We pose the estimation task as both a classification problem and a regression problem, and compare the results of the two approaches. Furthermore, we compare the effect of using different inspection standards for labeling the ground truth water level. By treating the problem as a classification task and using the 2015 Danish sewer inspection standard, where water levels are clustered based on visual appearance, we achieve an average F1 score of 79.29% using a fine-tuned ResNet-50 CNN. This shows the potential of using CNNs for water level estimation. We believe including temporal and contextual information will improve the results further.
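A minimal sketch of the fine-tuning setup described above, using torchvision's ImageNet-pretrained ResNet-50 with a new head for the water-level classes. The class count and training hyperparameters are assumptions for illustration.

```python
# Sketch: fine-tuning an ImageNet-pretrained ResNet-50 for water level
# classification. The number of classes is an assumption for illustration.
import torch
import torch.nn as nn
from torchvision import models

NUM_LEVELS = 5  # assumed number of water-level classes in the standard

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_LEVELS)  # replace the head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of sewer images.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_LEVELS, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Treating the task as regression instead would amount to a single continuous output with, e.g., a mean squared error loss, which is the alternative formulation the paper compares against.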


Diagnostics ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1246
Author(s):  
Ning Hung ◽  
Andy Kuan-Yu Shih ◽  
Chihung Lin ◽  
Ming-Tse Kuo ◽  
Yih-Shiou Hwang ◽  
...  

In this study, we aimed to develop a deep learning model for identifying bacterial keratitis (BK) and fungal keratitis (FK) by using slit-lamp images. We retrospectively collected slit-lamp images of patients with culture-proven microbial keratitis between 1 January 2010 and 31 December 2019 from two medical centers in Taiwan. We constructed a deep learning algorithm consisting of a segmentation model for cropping cornea images and a classification model that applies different convolutional neural networks (CNNs) to differentiate between FK and BK. The CNNs included DenseNet121, DenseNet161, DenseNet169, DenseNet201, EfficientNetB3, InceptionV3, ResNet101, and ResNet50. Model performance was evaluated and presented as the area under the curve (AUC) of the receiver operating characteristic curves. A gradient-weighted class activation mapping technique was used to plot heat maps for the model. Using 1330 images from 580 patients, the deep learning algorithm achieved a highest average accuracy of 80.0%. Across the different CNNs, the diagnostic accuracy for BK ranged from 79.6% to 95.9%, and that for FK ranged from 26.3% to 65.8%. The DenseNet161 CNN showed the best model performance, with an AUC of 0.85 for both BK and FK. The heat maps revealed that the model was able to identify the corneal infiltrates. The model showed a better diagnostic accuracy than the previously reported diagnostic performance of both general ophthalmologists and corneal specialists.
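The reported AUC values can be computed from per-image predicted probabilities with scikit-learn. A minimal sketch, with illustrative placeholder labels and scores:

```python
# Sketch: computing the ROC AUC for the BK-vs-FK classifier from predicted
# probabilities. The labels and scores below are illustrative placeholders.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = fungal keratitis, 0 = bacterial
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # model P(FK) per image
print(roc_auc_score(y_true, y_score))  # area under the ROC curve
```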


2019 ◽  
Vol 11 (23) ◽  
pp. 2858 ◽  
Author(s):  
Tianyu Ci ◽  
Zhen Liu ◽  
Ying Wang

We propose a new convolutional neural network method combined with ordinal regression for assessing the degree of building damage caused by earthquakes from aerial imagery. The ordinal regression model and a deep learning algorithm are incorporated to make full use of the information and improve the accuracy of the assessment. A new loss function is introduced to combine convolutional neural networks and ordinal regression. Assessing the level of damage to buildings can be treated as predicting the ordered labels of the buildings to be assessed. In existing research, the problem has usually been simplified to a pure classification problem, which ignores the ordinal relationship between different levels of damage and thus wastes information. Historical post-earthquake data are used to build network models for assessing the level of damage, and the deep learning-based damage assessment models are described in detail, including model construction, implementation methods, and the selection of hyperparameters; the models are verified by experiments. When categorizing building damage into four types, we apply the proposed method to aerial images acquired from the 2014 Ludian earthquake and achieve an overall accuracy of 77.39%; when categorizing damage into two types, the overall accuracy of the model is 93.95%, exceeding the values reported for similar methods.
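The paper's exact loss is not reproduced here; one standard way to combine a CNN with ordinal regression is to decompose an ordered K-level label into K-1 cumulative binary targets. A hedged sketch of that formulation, not necessarily the authors' loss:

```python
# Sketch: an ordinal regression loss built from K-1 cumulative binary
# targets ("is the damage level > k?"). This is a common formulation, not
# necessarily the paper's exact loss function.
import torch
import torch.nn as nn

K = 4  # four damage levels, as in the four-type experiment above

def ordinal_targets(labels, num_levels=K):
    """Map level l in {0..K-1} to a (K-1)-dim vector of cumulative targets."""
    thresholds = torch.arange(num_levels - 1).unsqueeze(0)  # shape (1, K-1)
    return (labels.unsqueeze(1) > thresholds).float()       # shape (B, K-1)

bce = nn.BCEWithLogitsLoss()

# The network head outputs K-1 logits per building patch.
logits = torch.randn(8, K - 1, requires_grad=True)
labels = torch.randint(0, K, (8,))
loss = bce(logits, ordinal_targets(labels))
loss.backward()

# At inference, the predicted level is the number of thresholds exceeded.
pred_level = (torch.sigmoid(logits) > 0.5).sum(dim=1)
```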


2021 ◽  
pp. 147592172110537
Author(s):  
Dong H Kang ◽  
Young-Jin Cha

Recently, crack segmentation studies have been investigated using deep convolutional neural networks. However, significant deficiencies remain in the preparation of ground truth data, consideration of complex scenes, development of an object-specific network for crack segmentation, and use of an evaluation method, among other issues. In this paper, a novel semantic transformer representation network (STRNet) is developed for crack segmentation at the pixel level in complex scenes in a real-time manner. STRNet is composed of a squeeze and excitation attention-based encoder, a multi-head attention-based decoder, coarse upsampling, a focal-Tversky loss function, and a learnable swish activation function, designed to keep the network concise while preserving its fast processing speed. A method for evaluating the level of complexity of image scenes is also proposed. The proposed network is trained with 1203 images with further extensive synthesis-based augmentation, and it is evaluated on 545 testing images (1280 × 720, 1024 × 512); it achieves 91.7%, 92.7%, 92.2%, and 92.6% in terms of precision, recall, F1 score, and mIoU (mean intersection over union), respectively. Its performance is compared with those of recently developed advanced networks (Attention U-net, CrackSegNet, Deeplab V3+, FPHBN, and Unet++), with STRNet showing the best performance on the evaluation metrics; it also achieves the fastest processing speed, at 49.2 frames per second.
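The focal-Tversky loss mentioned above has a standard closed form. A hedged PyTorch sketch follows; the alpha, beta, and gamma values are typical defaults from the literature, not necessarily STRNet's settings.

```python
# Sketch: a focal-Tversky loss for binary crack segmentation. The
# hyperparameters are common defaults, not necessarily STRNet's settings.
import torch

def focal_tversky_loss(probs, target, alpha=0.7, beta=0.3, gamma=0.75,
                       eps=1e-6):
    """probs, target: tensors of shape (B, H, W) with values in [0, 1]."""
    probs = probs.flatten(1)
    target = target.flatten(1)
    tp = (probs * target).sum(dim=1)          # true positives
    fn = ((1 - probs) * target).sum(dim=1)    # false negatives
    fp = (probs * (1 - target)).sum(dim=1)    # false positives
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    # The focal exponent gamma emphasizes hard, poorly segmented examples.
    return ((1 - tversky) ** gamma).mean()

probs = torch.rand(2, 720, 1280)                    # predicted probabilities
target = (torch.rand(2, 720, 1280) > 0.95).float()  # sparse crack pixels
print(focal_tversky_loss(probs, target))
```

Weighting false negatives more heavily than false positives (alpha > beta) is a common choice for thin structures such as cracks, where missed pixels hurt recall disproportionately.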


Author(s):  
Minh-Trieu Tran ◽  
Quang-Nhat Vo ◽  
Guee-Sang Lee

Abstract Binarization is an important step in most document analysis systems. For music score images with a complex background, the presence of background clutter with a variety of shapes and colors creates many challenges for binarization. This paper presents a model for binarizing complex-background music score images by fusing deep convolutional neural networks. Our model is trained directly on image regions, using pixel values as inputs and the binary ground truth as labels. By utilizing the generalization capability of the residual network backbone and the feature learning ability of dense layers, the proposed network structures can differentiate foreground pixels from background clutter, minimize the risk of overfitting, and thus deal with the complex background noise appearing in music score images. Compared to traditional algorithms, the binary images generated by our method have a cleaner background and better-preserved strokes. Experiments with captured and synthetic music score images show promising results compared to existing methods.
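A minimal sketch of the per-pixel training setup described above: image regions in, binary ground-truth masks out. The tiny network below is a stand-in for the paper's residual/dense fusion model, not its actual architecture.

```python
# Sketch: training a per-pixel binarizer on image regions with binary
# ground truth. The tiny network below is a stand-in for the paper's
# residual/dense fusion architecture.
import torch
import torch.nn as nn

net = nn.Sequential(                     # toy fully convolutional model
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),                 # one logit per pixel
)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

patches = torch.rand(8, 3, 64, 64)                  # score image regions
masks = (torch.rand(8, 1, 64, 64) > 0.5).float()    # binary ground truth
loss = criterion(net(patches), masks)
loss.backward()
optimizer.step()
```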


Drones ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. 7 ◽  
Author(s):  
Robert Chew ◽  
Jay Rineer ◽  
Robert Beach ◽  
Maggie O’Neil ◽  
Noel Ujeneza ◽  
...  

Accurate projections of seasonal agricultural output are essential for improving food security. However, the collection of agricultural information through seasonal agricultural surveys is often not timely enough to inform public and private stakeholders about crop status during the growing season. Acquiring timely and accurate crop estimates can be particularly challenging in countries with predominantly smallholder farms because of the large number of small plots, intense intercropping, and high diversity of crop types. In this study, we used RGB images collected from unmanned aerial vehicles (UAVs) flown in Rwanda to develop a deep learning algorithm for identifying crop types, specifically bananas, maize, and legumes, which are key strategic food crops in Rwandan agriculture. The model leverages advances in deep convolutional neural networks and transfer learning, employing the VGG16 architecture and the publicly accessible ImageNet dataset for pretraining. The developed model achieves an overall test set F1 score of 0.86, with individual class scores ranging from 0.49 (legumes) to 0.96 (bananas). Our findings suggest that although certain staple crops such as bananas and maize can be classified at this scale with high accuracy, crops involved in intercropping (legumes) can be difficult to identify consistently. We discuss the potential use cases for the developed model and recommend directions for future research in this area.
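A hedged sketch of the transfer learning setup: a VGG16 backbone pretrained on ImageNet with a new classifier head for the three crop classes. The freezing policy and head size are assumptions for illustration.

```python
# Sketch: VGG16 transfer learning for the three crop classes. The freezing
# policy and head size are assumptions for illustration.
import torch.nn as nn
from torchvision import models

CLASSES = ["bananas", "maize", "legumes"]

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False             # keep pretrained conv features
model.classifier[6] = nn.Linear(4096, len(CLASSES))  # new output layer
```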

