A comparison of deep learning models for end-to-end face-based video retrieval in unconstrained videos

Author(s):  
Gioele Ciaparrone ◽  
Leonardo Chiariglione ◽  
Roberto Tagliaferri

Abstract Face-based video retrieval (FBVR) is the task of retrieving videos that contain the same face shown in the query image. In this article, we present the first end-to-end FBVR pipeline able to operate on large datasets of unconstrained, multi-shot, multi-person videos. We adapt an existing audiovisual recognition dataset to the task of FBVR and use it to evaluate our proposed pipeline. We compare a number of deep learning models for shot detection, face detection, and face feature extraction as part of our pipeline on a validation dataset of more than 4000 videos. We obtain 97.25% mean average precision on an independent test set composed of more than 1000 videos. The pipeline extracts features from videos at ∼7 times real-time speed, and it can perform a query over thousands of videos in less than 0.5 s.
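The query stage of such a pipeline reduces to a nearest-neighbour search over per-video face embeddings. A minimal sketch of that step (the function name and the cosine-similarity choice are our illustration, not necessarily the paper's exact method):

```python
import numpy as np

def query_videos(query_embedding, video_embeddings, top_k=3):
    """Rank indexed videos by cosine similarity between the query face
    embedding and each video's aggregated face embedding."""
    q = query_embedding / np.linalg.norm(query_embedding)
    V = video_embeddings / np.linalg.norm(video_embeddings, axis=1, keepdims=True)
    scores = V @ q                        # cosine similarity per video
    ranked = np.argsort(-scores)[:top_k]  # best-matching videos first
    return ranked, scores[ranked]
```

Because the per-video features are precomputed offline, a query is a single matrix-vector product, which is what makes sub-second search over thousands of videos feasible.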

2021 ◽  
pp. 1-30
Author(s):  
Qingtian Zou ◽  
Anoop Singhal ◽  
Xiaoyan Sun ◽  
Peng Liu

Network attacks have become a major security concern for organizations worldwide. A category of network attacks that exploit the logic (security) flaws of a few widely-deployed authentication protocols has been commonly observed in recent years. Such logic-flaw-exploiting network attacks often do not have distinguishing signatures, and can thus easily evade the typical signature-based network intrusion detection systems. Recently, researchers have applied neural networks to detect network attacks with network logs. However, public network data sets have major drawbacks such as limited data sample variations and unbalanced data with respect to malicious and benign samples. In this paper, we present a new end-to-end approach based on protocol fuzzing to automatically generate high-quality network data, on which deep learning models can be trained for network attack detection. Our findings show that protocol fuzzing can generate data samples that cover real-world data, and deep learning models trained with fuzzed data can successfully detect the logic-flaw-exploiting network attacks.
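A toy sketch of the data-generation idea: start from a valid protocol message and mutate individual fields to produce labeled variant samples. The field names and the mutation rule here are illustrative, not the paper's actual fuzzer:

```python
import random

def fuzz_messages(template, mutable_fields, n_samples, seed=0):
    """Generate variant protocol messages by randomly mutating one field
    of a template message per sample; each variant is one training sample."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        msg = dict(template)
        field = rng.choice(mutable_fields)
        msg[field] = rng.randrange(2**16)  # overwrite with a random value
        samples.append(msg)
    return samples
```

Seeding the generator makes each fuzzed dataset reproducible, which matters when comparing models trained on the same synthetic data.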


2021 ◽  
pp. 1-38
Author(s):  
Wenya Wang ◽  
Sinno Jialin Pan

Abstract Nowadays, deep learning models have been widely adopted and have achieved promising results in various application domains. Despite their impressive performance, most deep learning models function as black boxes, lacking explicit reasoning capabilities and explanations, which are usually essential for complex problems. Take joint inference in information extraction as an example. This task requires identifying multiple inter-correlated types of structured knowledge from text, including entities, events, and the relationships between them. Various deep neural networks have been proposed to jointly perform entity extraction and relation prediction, but they only propagate information implicitly via representation learning and fail to encode the strong correlations between entity types and relations that enforce their co-existence. On the other hand, some approaches adopt rules to explicitly constrain certain relational facts; however, the separation of rules from representation learning usually leaves these approaches prone to error propagation. Moreover, pre-defined rules are inflexible and can have negative effects when data is noisy. To address these limitations, we propose a variational deep logic network that incorporates both representation learning and relational reasoning via the variational EM algorithm. The model consists of a deep neural network, which learns high-level features with implicit interactions via the self-attention mechanism, and a relational logic network, which explicitly exploits target interactions. The two components are trained interactively to bring out the best of both worlds. We conduct extensive experiments ranging from fine-grained sentiment term extraction and end-to-end relation prediction to end-to-end event extraction to demonstrate the effectiveness of our proposed method.
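The interactive training of the two components can be pictured as a generic alternating loop in the spirit of variational EM: an E-step refines beliefs about latent structure given the current parameters, and an M-step refines parameters given those beliefs. This skeleton illustrates only the alternation, not the paper's actual objective:

```python
def alternate_em(e_step, m_step, params, n_iters=50):
    """Alternate an E-step (update latent beliefs given parameters) and
    an M-step (update parameters given latent beliefs), as in variational EM."""
    latent = None
    for _ in range(n_iters):
        latent = e_step(params)
        params = m_step(params, latent)
    return params, latent
```

With well-behaved steps the loop converges to a fixed point where neither update changes anything, which is the sense in which the two components "bring out the best of both worlds".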


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Qingyu Zhao ◽  
Ehsan Adeli ◽  
Kilian M. Pohl

Abstract The presence of confounding effects (or biases) is one of the most critical challenges in using deep learning to advance discovery in medical imaging studies. Confounders affect the relationship between input data (e.g., brain MRIs) and output variables (e.g., diagnosis); improper modeling of those relationships often results in spurious and biased associations. Traditional machine learning and statistical models minimize the impact of confounders by, for example, matching data sets, stratifying data, or residualizing imaging measurements. Alternative strategies are needed for state-of-the-art deep learning models that use end-to-end training to automatically extract informative features from large sets of images. In this article, we introduce an end-to-end approach for deriving features invariant to confounding factors while accounting for intrinsic correlations between the confounder(s) and the prediction outcome. The method does so by exploiting concepts from traditional statistical methods and recent fair machine learning schemes. We evaluate the method on predicting the diagnosis of HIV solely from magnetic resonance images (MRIs), identifying morphological sex differences in adolescence from data of the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA), and determining bone age from X-ray images of children. The results show that our method makes accurate predictions while reducing biases associated with confounders. The code is available at https://github.com/qingyuzhao/br-net.
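One common way to encourage confounder-invariant features is to add a training penalty on the statistical dependence between each learned feature and the confounder. A minimal correlation-based sketch (the actual loss in the paper and the linked repository may differ):

```python
import numpy as np

def correlation_penalty(features, confounder):
    """Sum of squared Pearson correlations between each learned feature
    column and a confounding variable; adding this to the training loss
    discourages features that merely encode the confounder."""
    f = features - features.mean(axis=0)          # center each feature
    c = confounder - confounder.mean()            # center the confounder
    num = f.T @ c
    denom = np.sqrt((f**2).sum(axis=0) * (c**2).sum()) + 1e-12
    r = num / denom                               # per-feature Pearson r
    return float((r**2).sum())
```

A feature identical to the confounder contributes 1 to the penalty; a feature uncorrelated with it contributes 0, so minimizing the penalty pushes the encoder away from confound-driven representations.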


2020 ◽  
Author(s):  
Xi Yang ◽  
Hansi Zhang ◽  
Xing He ◽  
Jiang Bian ◽  
Yonghui Wu

BACKGROUND Patients’ family history (FH) is a critical risk factor associated with numerous diseases. However, FH information is not well captured in structured databases and is often documented in clinical narratives instead. Natural language processing (NLP) is the key technology for extracting patients’ FH from clinical narratives. In 2019, the National NLP Clinical Challenge (n2c2) organized shared tasks to solicit NLP methods for FH information extraction. OBJECTIVE This study presents our end-to-end FH extraction system developed during the 2019 n2c2 open shared task, as well as the new transformer-based models that we developed after the challenge. We seek to develop a machine learning–based solution for FH information extraction without hand-crafted, task-specific rules. METHODS We developed deep learning–based systems for FH concept extraction and relation identification. We explored deep learning models including long short-term memory–conditional random fields and bidirectional encoder representations from transformers (BERT), and developed ensemble models using a majority voting strategy. To further optimize performance, we systematically compared 3 different strategies for using BERT output representations in relation identification. RESULTS Our system was among the top-ranked systems (3 out of 21) in the challenge. Our best system achieved micro-averaged F1 scores of 0.7944 and 0.6544 for concept extraction and relation identification, respectively. After the challenge, we further explored new transformer-based models and improved the performance on both subtasks to 0.8249 and 0.6775, respectively. For relation identification, our system achieved a performance comparable to the best system (0.6810) reported in the challenge. CONCLUSIONS This study demonstrates the feasibility of using deep learning methods to extract FH information from clinical narratives.
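The majority-voting ensemble strategy mentioned in METHODS can be sketched generically as follows (the tie-breaking rule, favoring the earlier model in the list, is our assumption):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label sequences by per-position majority vote;
    ties are broken by the order the models appear in the list."""
    combined = []
    for labels in zip(*predictions):       # one tuple per position
        counts = Counter(labels)
        top = max(counts.values())
        # first label (by model order) reaching the top count wins ties
        combined.append(next(l for l in labels if counts[l] == top))
    return combined
```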


2020 ◽  
Vol 5 (2) ◽  
pp. 96-116
Author(s):  
Subhashini Narayan ◽  

In this modern world of ever-increasing one-click purchases, movie bookings, music, healthcare, and fashion, the need for recommendations has grown accordingly. Google, Netflix, Spotify, Amazon, and other tech giants use recommendations to customize and tailor their search engines to suit users’ interests. Many existing systems are based on older algorithms that, although reasonably accurate, require large training and testing datasets. With the emergence of deep learning, the accuracy of algorithms has further improved and error rates have been reduced through the use of multiple layers, while the need for large datasets has declined. This research article proposes a recommendation system based on deep learning models, such as the multilayer perceptron, that provides more efficient and accurate recommendations.
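A minimal forward pass of such a multilayer perceptron recommender, scoring one user-item pair, might look as follows (the dimensions, weight shapes, and the concatenation scheme are illustrative choices, not the article's exact architecture):

```python
import numpy as np

def mlp_score(user_vec, item_vec, W1, b1, W2, b2):
    """Score a user-item pair with a one-hidden-layer MLP:
    concatenate the two embeddings, apply a ReLU hidden layer,
    then squash the output logit to (0, 1) with a sigmoid."""
    x = np.concatenate([user_vec, item_vec])
    h = np.maximum(0.0, W1 @ x + b1)       # ReLU hidden activations
    logit = float(W2 @ h + b2)
    return 1.0 / (1.0 + np.exp(-logit))    # recommendation score in (0, 1)
```

In training, the weights would be fit so that observed user-item interactions score near 1 and unobserved ones near 0; ranking items by this score yields the recommendation list.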


Photoniques ◽  
2020 ◽  
pp. 30-33
Author(s):  
Adrian Shajkofci ◽  
Michael Liebling

In microscopy, the time burden and cost of acquiring and annotating the large datasets that many deep learning models take as a prerequisite often appear to make these methods impractical. Can this requirement for annotated data be relaxed? Is it possible to borrow the knowledge gathered from datasets in other application fields and leverage it for microscopy? Here, we aim to provide an overview of methods that have recently emerged to successfully train learning-based methods in bio-microscopy.


2020 ◽  
pp. paper75-1-paper75-11
Author(s):  
Viacheslav Danilov ◽  
Olga Gerget ◽  
Kirill Klyshnikov ◽  
Evgeny Ovcharenko ◽  
Alejandro Frangi

The article explores the application of machine learning approaches to detect both single-vessel and multivessel coronary artery disease from X-ray angiography. Since the interpretation of coronary angiography images requires considerable training on the part of interventional cardiologists, our study analyses, trains, and assesses the potential of existing object detectors for classifying and detecting coronary artery stenosis in angiographic imaging series. 100 patients who underwent coronary angiography at the Research Institute for Complex Issues of Cardiovascular Diseases were retrospectively enrolled in the study. To automate the medical data analysis, we examined and compared three models (SSD MobileNet V1, Faster-RCNN ResNet-50 V1, Faster-RCNN NASNet) with different architectures, network complexities, and numbers of weights. To compare the developed deep learning models, we used the mean Average Precision (mAP) metric, training time, and inference time. Testing results show that training/inference time is directly proportional to model complexity. Thus, Faster-RCNN NASNet demonstrates the slowest inference time, averaging 880 ms per image. In terms of accuracy, Faster-RCNN ResNet-50 V1 demonstrates the highest prediction accuracy, reaching an mAP of 0.92 on the validation dataset. SSD MobileNet V1 demonstrated the best inference time, with an inference rate of 23 frames per second.
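The mAP metric used for this comparison counts a detection as correct when its predicted box overlaps a ground-truth box sufficiently; the underlying overlap measure is intersection over union (IoU), sketched here for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2);
    this overlap criterion underlies the mAP detection metric."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)       # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```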


Author(s):  
Vahid Noroozi ◽  
Lei Zheng ◽  
Sara Bahaadini ◽  
Sihong Xie ◽  
Philip S. Yu

Verification determines whether two samples belong to the same class or not. It has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples, presenting two major challenges for existing deep learning models. We propose a deep semi-supervised model named SEmi-supervised VErification Network (SEVEN) to address these challenges. The model consists of two complementary components. The generative component addresses the lack of supervision within each category by learning general salient structures from a large amount of data across categories. The discriminative component exploits the learned general features to mitigate the lack of supervision within categories and also directs the generative component to find more informative structures of the whole data manifold. The two components are tied together in SEVEN to allow end-to-end training. Extensive experiments on four verification tasks demonstrate that SEVEN significantly outperforms other state-of-the-art deep semi-supervised techniques when labeled data are in short supply. Furthermore, SEVEN is competitive with fully supervised baselines trained with a larger amount of labeled data, which indicates the importance of its generative component.
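At inference time, verification with learned embeddings typically reduces to thresholding a distance between the two samples' feature vectors. A minimal sketch (the Euclidean metric and the threshold value are our illustrative choices, not SEVEN's exact decision rule):

```python
import numpy as np

def same_class(emb_a, emb_b, threshold=0.5):
    """Declare two samples the same class when their embeddings are
    closer than a distance threshold tuned on validation pairs."""
    dist = np.linalg.norm(np.asarray(emb_a) - np.asarray(emb_b))
    return bool(dist < threshold)
```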


Author(s):  
Muhammad Siraj

In densely populated cities, gatherings of large crowds in public places endanger people’s safety and impede transportation, which is a key challenge for researchers. Although much research has been carried out on crowd analytics, many existing methods are problem-specific: a method learned from one scene cannot readily be applied to other videos. This is a weakness of these studies, since additional training samples must be collected from diverse videos. This paper investigates crowd analytics across diverse scenes with both traditional and deep learning models, and considers the pros and cons of each approach. Once general deep models are trained on large datasets, they can be applied to a variety of crowd videos and images, enabling tasks including, but not limited to, crowd density estimation, crowd counting, and crowd event recognition. Deep learning models and approaches require large datasets for training and testing. Many datasets have been collected taking into account the various problems involved in building crowd datasets, including manual annotation and increasing the diversity of videos and images. In this paper, we also propose several deep neural network models and training approaches for learning feature representations for crowd analytics.
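For instance, crowd-counting models are commonly trained to regress a density map whose integral equals the number of people, with the ground-truth map built by placing a normalized Gaussian at each annotated head position. A sketch of that ground-truth construction (the fixed sigma and per-blob normalization are simplifications of what real datasets use):

```python
import numpy as np

def density_map(points, shape, sigma=2.0):
    """Build a ground-truth density map from annotated head positions:
    one normalized Gaussian per head, so the map integrates to the
    person count, which is the target counting networks regress."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape)
    for (py, px) in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()   # each blob integrates to exactly 1
    return dmap
```

At test time, the predicted count for a frame is simply the sum of the network's predicted density map.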

