iCaps-Dfake: An Integrated Capsule-Based Model for Deepfake Image and Video Detection

Fake media is spreading rapidly across the internet, driven by major advances in deepfake creation tools and by the strong interest researchers and corporations show in exploring their limits. Anyone can now create manipulated media to defame or humiliate others, or even to scam them out of their money, at the click of a button. This research proposes a new deepfake detection approach, iCaps-Dfake, that competes with state-of-the-art deepfake video detection techniques and addresses their low generalization. It combines two feature extraction methods, texture-based Local Binary Patterns (LBP) and a modified High-Resolution Network (HRNet) based on Convolutional Neural Networks (CNN), with capsule neural networks (CapsNets) that implement a concurrent routing technique. Experiments were conducted on large benchmark datasets to evaluate the proposed model, and the results were analyzed using several performance metrics. The model was primarily trained and tested on the DeepFakeDetectionChallenge-Preview (DFDC-P) dataset and then tested on Celeb-DF to examine its generalization capability. The experiments achieved an Area Under the Curve (AUC) score improvement of 20.25% over state-of-the-art models.
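
As an illustration of the texture branch mentioned in the abstract above, the following minimal sketch computes basic 3x3 Local Binary Pattern codes and their histogram for a grayscale image. The abstract does not say which LBP variant, neighbourhood, or bit ordering iCaps-Dfake uses, so the 8-neighbour operator, the clockwise ordering, and the helper names `lbp_code` and `lbp_histogram` are assumptions made for this example only.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code for the interior pixel at (r, c).

    img is a grayscale image given as a list of rows (assumed layout).
    """
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left corner
    # (the ordering is an assumption; any fixed ordering works).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Set the bit when the neighbour is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A histogram of this kind is a common texture descriptor; how the paper feeds LBP output into the HRNet and capsule stages is not described in the abstract.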


Hybrid pooling with wavelets for convolutional neural networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219223 ◽

2022 ◽

pp. 1-10

Author(s):

Daniel Trevino-Sanchez ◽

Vicente Alarcon-Aquino

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Computational Cost ◽

Relevant Information ◽

Accuracy Improvement ◽

Proposed Model ◽

Benchmark Datasets ◽

Augmentation Techniques ◽

High Computational Cost

Detecting and classifying objects correctly is a constant challenge: recognizing them at different scales and in different scenarios, sometimes cropped or badly lit, is not an easy task. Convolutional neural networks (CNNs) have become a widely applied technique since they are completely trainable and well suited to feature extraction. However, the growing number of CNN applications constantly pushes for accuracy improvements. Initially, those improvements involved large datasets, augmentation techniques, and complex algorithms, all of which may have a high computational cost. Nevertheless, feature extraction is known to be the heart of the problem. As a result, other approaches combine different technologies to extract better features and improve accuracy without the need for more powerful hardware. In this paper, we propose a hybrid pooling method that incorporates multiresolution analysis within the CNN layers to reduce the feature map size without losing details. To prevent relevant information from being lost during downsampling, an existing pooling method is combined with the wavelet transform, keeping those details "alive" and enriching other stages of the CNN. Better-quality features in turn improve CNN accuracy. To validate this study, ten pooling methods, including the proposed model, are tested on four benchmark datasets. The results are compared with four of the evaluated methods, which are also considered state of the art.
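
To make the idea of combining a pooling method with a wavelet transform concrete, here is a rough sketch that blends 2x2 max pooling with the approximation (LL) band of a one-level Haar transform. The abstract does not state which pooling method, which wavelet, which sub-bands, or which combination rule the authors use, so the Haar wavelet, the weighted sum, the `alpha` parameter, and all function names are assumptions for illustration.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D list; odd trailing rows/columns are dropped."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def haar_ll_2x2(x):
    """Approximation (LL) band of a one-level orthonormal 2D Haar transform.

    For each 2x2 block [[a, b], [c, d]] the LL coefficient is (a + b + c + d) / 2.
    """
    return [
        [(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 2.0
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def hybrid_pool(x, alpha=0.5):
    """Hypothetical blend: alpha * max pooling + (1 - alpha) * Haar LL band."""
    pooled = max_pool_2x2(x)
    approx = haar_ll_2x2(x)
    return [
        [alpha * p + (1.0 - alpha) * a for p, a in zip(p_row, a_row)]
        for p_row, a_row in zip(pooled, approx)
    ]
```

Both branches halve each spatial dimension, so their outputs can be combined element by element. The detail sub-bands, which the abstract suggests are kept "alive", are omitted here for brevity.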


Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6284 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7797-7804

Author(s):

Goran Glavaš ◽

Swapna Somasundaran

Keyword(s):

State Of The Art ◽

Language Transfer ◽

Text Segmentation ◽

Word Embeddings ◽

Neural Architecture ◽

Text Coherence ◽

Sentence Level ◽

Proposed Model ◽

Benchmark Datasets ◽

Cross Lingual

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and segmentation, we introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model – a neural architecture consisting of two hierarchically connected Transformer networks – is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones. The proposed model, dubbed Coherence-Aware Text Segmentation (CATS), yields state-of-the-art segmentation performance on a collection of benchmark datasets. Furthermore, by coupling CATS with cross-lingual word embeddings, we demonstrate its effectiveness in zero-shot language transfer: it can successfully segment texts in languages unseen in training.
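
The multi-task coupling described above can be sketched as a weighted sum of a sentence-level segmentation loss and a coherence loss that ranks a correct sentence sequence above a corrupted one. The abstract does not give the exact loss functions, so the binary cross-entropy, the margin-based ranking loss, the `weight` and `margin` parameters, and the function names below are assumptions, not the authors' formulation.

```python
import math


def segmentation_loss(probs, labels):
    """Mean binary cross-entropy over sentences.

    probs[i] is the predicted probability that sentence i starts a segment,
    labels[i] is 1 if it does and 0 otherwise.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(probs)


def coherence_loss(score_correct, score_corrupt, margin=1.0):
    """Margin ranking loss: the correct sequence should outscore the corrupt one."""
    return max(0.0, margin - (score_correct - score_corrupt))


def joint_loss(probs, labels, score_correct, score_corrupt, weight=0.5, margin=1.0):
    """Hypothetical multi-task objective: segmentation loss + weight * coherence loss."""
    return (segmentation_loss(probs, labels)
            + weight * coherence_loss(score_correct, score_corrupt, margin))
```

When the correct sequence already beats the corrupt one by at least the margin, the coherence term vanishes and only the segmentation objective drives the update.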


A New Click-Through Rates Prediction Model Based on Deep&Cross Network

Algorithms ◽

10.3390/a13120342 ◽

2020 ◽

Vol 13 (12) ◽

pp. 342

Author(s):

Guojing Huang ◽

Qingliang Chen ◽

Congjian Deng

Keyword(s):

Neural Networks ◽

Prediction Model ◽

Deep Neural Networks ◽

Prediction Models ◽

State Of The Art ◽

Online Advertising ◽

Optimization Technique ◽

Proposed Model ◽

Great Progress ◽

Online Advertisement

With the development of e-commerce, online advertising has thrived and gradually developed into a new mode of business, for which Click-Through Rate (CTR) prediction is the essential driving technology. Given a user, commodities, and scenarios, a CTR model predicts the probability that the user clicks on an online advertisement. Recently, great progress has been made with the introduction of Deep Neural Networks (DNN) into CTR prediction. To further advance DNN-based CTR prediction models, this paper introduces a new model, FO-FTRL-DCN, based on the well-known Deep&Cross Network (DCN) augmented with the recent optimization technique Follow The Regularized Leader (FTRL) for DNNs. Extensive comparative experiments on the iPinYou datasets show that the proposed model outperforms other state-of-the-art baselines, with better generalization across different datasets in the benchmark.
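
For readers unfamiliar with FTRL, the sketch below implements the widely used per-coordinate FTRL-Proximal update. It is not the FO-FTRL-DCN variant from the paper, whose details the abstract does not give; the class name `FTRLProximal` and the default hyperparameters are assumptions for this example.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal optimizer state for a weight vector."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # adjusted accumulated gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def weights(self):
        """Closed-form weights implied by the current state."""
        w = []
        for z, n in zip(self.z, self.n):
            if abs(z) <= self.l1:
                # L1 regularization keeps this coordinate exactly zero.
                w.append(0.0)
            else:
                sign = 1.0 if z > 0 else -1.0
                denom = (self.beta + math.sqrt(n)) / self.alpha + self.l2
                w.append(-(z - sign * self.l1) / denom)
        return w

    def update(self, grad):
        """Fold one gradient vector into the state."""
        w = self.weights()
        for i, g in enumerate(grad):
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
```

The weights are never stored directly: they are recomputed from `z` and `n`, which is what lets the L1 term produce exact zeros and sparse models.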


Multimodal Feature Learning for Video Captioning

Mathematical Problems in Engineering ◽

10.1155/2018/3125879 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Sujin Lee ◽

Incheol Kim

Keyword(s):

Neural Networks ◽

Natural Language ◽

Feature Learning ◽

Visual Features ◽

Semantic Features ◽

Video Captioning ◽

Video Feature ◽

Proposed Model ◽

Generation Network ◽

Benchmark Datasets

Video captioning is the task of generating a natural language sentence that describes the content of an input video clip. This study proposes a deep neural network model for effective video captioning. In addition to visual features, the proposed model learns semantic features that describe the video content effectively. In our model, visual features of the input video are extracted using convolutional neural networks such as C3D and ResNet, while semantic features are obtained using recurrent neural networks such as LSTM. Our model also includes an attention-based caption generation network that generates natural language captions from the multimodal video feature sequences. Various experiments conducted on two large benchmark datasets, Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT), demonstrate the performance of the proposed model.
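
The attention step in a caption generator of this kind can be illustrated with a minimal soft-attention sketch over a sequence of feature vectors. The abstract does not specify the scoring function or the network layout, so the dot-product scoring and the names `softmax` and `attend` are assumptions chosen for simplicity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Soft attention with dot-product scores (an assumed scoring function).

    query: decoder state as a list of floats.
    features: one feature vector per video segment, each as long as query.
    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context
```

At each decoding step the context vector would be recomputed from the current decoder state, so different words can draw on different parts of the video.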


Building Detection using Two-Layered Novel Convolutional Neural Networks

Journal of Soft Computing Paradigm - September 2019 ◽

10.36548/jscp.2021.1.004 ◽

2021 ◽

Vol 3 (1) ◽

pp. 29-37

Author(s):

Karuppusamy P

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Video Processing ◽

Feature Vector ◽

State Of The Art ◽

Local Binary Patterns ◽

Building Detection ◽

Histogram Of Oriented Gradients ◽

Sensing Applications ◽

Remote Sensing Applications

In recent years, there has been a surge in the use of convolutional neural networks (CNNs) because of their state-of-the-art performance in areas such as text, audio, and video processing. Remote sensing, however, is a field that has not yet fully adopted CNNs. To address this issue, we introduce a novel CNN that can be used to increase the performance of detectors built on Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG). Moreover, in this paper, we also increase the accuracy of the CNN using two improvements. The first improvement involves feature vector transformation with Euler methodology and combining normalized and raw features. Based on the results observed, we also performed a comparative study using similar methods, which indicates that the proposed CNN is an improvement over the others.


A CNN Model for Human Parsing Based on Capacity Optimization

Applied Sciences ◽

10.3390/app9071330 ◽

2019 ◽

Vol 9 (7) ◽

pp. 1330 ◽

Cited By ~ 1

Author(s):

Yalong Jiang ◽

Zheru Chi

Keyword(s):

Neural Networks ◽

Computational Efficiency ◽

Semantic Information ◽

State Of The Art ◽

Depth Estimation ◽

Baseline Model ◽

Computational Burden ◽

Proposed Model ◽

Saliency Prediction ◽

Benchmark Solutions

Although state-of-the-art performance has been achieved in pixel-specific tasks, such as saliency prediction and depth estimation, convolutional neural networks (CNNs) still perform unsatisfactorily in human parsing, where semantic information of detailed regions needs to be perceived under variations in viewpoint, pose, and occlusion. In this paper, we propose to improve the robustness of human parsing modules by introducing a depth-estimation module. A novel scheme is proposed for integrating the depth-estimation module with a human-parsing module. The robustness of the overall model is improved with the automatically obtained depth labels. Computational efficiency, another major concern, is also discussed. Our proposed human parsing module with 24 layers achieves performance similar to that of the baseline CNN model with over 100 layers. The overall model has fewer parameters than the baseline model. Furthermore, we propose to reduce the computational burden by replacing a conventional CNN layer with a stack of simplified sub-layers, which further reduces the overall number of trainable parameters. Experimental results show that integrating the two modules improves human parsing without additional human labeling. The proposed model outperforms the benchmark solutions, and its capacity is better matched to the complexity of the task.


A Novel Architecture to Classify Histopathology Images Using Convolutional Neural Networks

Applied Sciences ◽

10.3390/app10082929 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2929 ◽

Cited By ~ 2

Author(s):

Ibrahem Kandel ◽

Mauro Castelli

Keyword(s):

Neural Network ◽

Neural Networks ◽

State Of The Art ◽

Treatment Plan ◽

Tissue Structure ◽

Activation Functions ◽

Proposed Model ◽

Histopathology Images ◽

Fully Connected

Histopathology is the study of tissue structure under the microscope to determine whether cells are normal or abnormal. It is a very important exam used to determine a patient’s treatment plan. Classifying histopathology images is difficult even for an experienced pathologist, and a second opinion is often needed. Convolutional neural networks (CNNs), a particular type of deep learning architecture, have obtained outstanding results in computer vision tasks such as image classification. In this paper, we propose a novel CNN architecture to classify histopathology images. The proposed model consists of 15 convolution layers and two fully connected layers. Different activation functions were compared to identify the most efficient one, taking into account two different optimizers. To train and evaluate the proposed model, the publicly available PatchCamelyon dataset was used. The dataset consists of 220,000 annotated images for training and 57,000 unannotated images for testing. The proposed model achieved higher performance than the state-of-the-art architectures, with an AUC of 95.46%.


Learning Structured Text Representations

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00005 ◽

2018 ◽

Vol 6 ◽

pp. 63-75 ◽

Cited By ~ 12

Author(s):

Yang Liu ◽

Mirella Lapata

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Neural Model ◽

Document Modeling ◽

Proposed Model ◽

Parsing Algorithm ◽

Document Representations

In this paper, we focus on learning structure-aware document representations from data without recourse to a discourse parser or additional annotations. Drawing inspiration from recent efforts to empower neural networks with a structural bias (Cheng et al., 2016; Kim et al., 2017), we propose a model that can encode a document while automatically inducing rich structural dependencies. Specifically, we embed a differentiable non-projective parsing algorithm into a neural model and use attention mechanisms to incorporate the structural biases. Experimental evaluations across different tasks and datasets show that the proposed model achieves state-of-the-art results on document modeling tasks while inducing intermediate structures which are both interpretable and meaningful.


Biomedical document triage using a hierarchical attention-based capsule network

BMC Bioinformatics ◽

10.1186/s12859-020-03673-5 ◽

2020 ◽

Vol 21 (S13) ◽

Author(s):

Jian Wang ◽

Mengying Li ◽

Qishuai Diao ◽

Hongfei Lin ◽

Zhihao Yang ◽

...

Keyword(s):

Neural Networks ◽

Information Extraction ◽

Precision Medicine ◽

State Of The Art ◽

Attention Mechanism ◽

Feature Representation ◽

Experimental Results ◽

Biomedical Domain ◽

Proposed Model ◽

Document Triage

Background: Biomedical document triage is the foundation of biomedical information extraction, which is important to precision medicine. Recently, some neural network-based methods have been proposed to classify biomedical documents automatically. In the biomedical domain, documents are often very long and contain very complicated sentences, and current methods still find it difficult to capture important features across sentences. Results: In this paper, we propose a hierarchical attention-based capsule model for biomedical document triage. The proposed model employs a hierarchical attention mechanism and capsule networks to capture valuable features across sentences and construct a final latent feature representation for a document. We evaluated our model on three public corpora. Conclusions: Experimental results showed that both the hierarchical attention mechanism and capsule networks are helpful in the biomedical document triage task. Our method proved highly competitive with or superior to other state-of-the-art methods.


Histopathological Classification of Breast Cancer Images Using a Multi-Scale Input and Multi-Feature Network

Cancers ◽

10.3390/cancers12082031 ◽

2020 ◽

Vol 12 (8) ◽

pp. 2031 ◽

Cited By ~ 2

Author(s):

Taimoor Shakeel Sheikh ◽

Yonghee Lee ◽

Migyung Cho

Keyword(s):

State Of The Art ◽

Texture Features ◽

Feature Maps ◽

Histopathological Classification ◽

Multi Scale ◽

Machine Learning Methods ◽

Proposed Model ◽

Benchmark Datasets ◽

Histopathological Images

Diagnosis of pathologies using histopathological images can be time-consuming when many images with different magnification levels need to be analyzed. State-of-the-art computer vision and machine learning methods can help automate the diagnostic pathology workflow and thus reduce the analysis time. Automated systems can also be more efficient and accurate, and can increase the objectivity of diagnosis by reducing operator variability. We propose a multi-scale input and multi-feature network (MSI-MFNet) model, which can learn the overall structures and texture features of different scale tissues by fusing multi-resolution hierarchical feature maps from the network’s dense connectivity structure. The MSI-MFNet predicts the probability of a disease on the patch and image levels. We evaluated the performance of our proposed model on two public benchmark datasets. Furthermore, through ablation studies of the model, we found that multi-scale input and multi-feature maps play an important role in improving the performance of the model. Our proposed model outperformed the existing state-of-the-art models by demonstrating better accuracy, sensitivity, and specificity.
