Color and Gradient Features for Text Segmentation from Video Frames

Marcatori conclusivi del discorso diretto in italiano antico

Romanistisches Jahrbuch ◽

10.1515/roja-2019-0004 ◽

2019 ◽

Vol 70 (1) ◽

pp. 105-122

Author(s):

Davide Mastrantonio

Keyword(s):

Word Reading ◽

Old French ◽

Spoken Word ◽

Text Segmentation ◽

Direct Speech ◽

Unequal Distribution ◽

Specific Subset ◽

Ancient Texts

Abstract: In this paper we deal with a specific subset of direct speech markers, to which little or no attention has been given so far: the expressions which codify the ending of direct speech (“marcatori conclusivi del discorso diretto”). We analyse these markers in Old Italian texts, comparing them with their Latin and, in some cases, Old French equivalents. In the introduction (§1), we take into account various general issues related to ancient texts, namely the practice of spoken-word reading and the lack of systematic punctuation to aid text segmentation. After that (§2), we classify the different strategies ancient writers had at their disposal to signal that direct speech is over, and hence that what follows has to be interpreted as the narrator's voice; the markers are organized in a range from most explicit to most implicit (disse > quando ebbe detto > a queste parole > allora > [Ø]). Thereafter (§3), we focus on two specific markers, the participial marker (detto questo) and the “connector + finite tense” marker (quando ebbe detto questo), in a corpus of nine texts. Though these two markers are roughly synonymous, their occurrence is not uniform among the analysed texts. The explanation of their unequal distribution is that they belong to different discourse traditions (Diskurstraditionen): “quando + finite tense” is a typical expression attested in Romance narrations (the so-called “quand-Satz”), whereas detto questo appears to be dependent on Latin tradition.

A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing

Data ◽

10.3390/data6080087 ◽

2021 ◽

Vol 6 (8) ◽

pp. 87

Author(s):

Sara Ferreira ◽

Mário Antunes ◽

Manuel E. Correia

Keyword(s):

Machine Learning ◽

Digital Forensics ◽

State Of The Art ◽

Forensic Analysis ◽

Third Party ◽

Support Vector ◽

Multimedia Content ◽

Digital Forensic ◽

Video Frames ◽

Forensic Tools

Deepfake and manipulated digital photos and videos are being increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, in which tampered multimedia content has been the primary vehicle of dissemination. Digital forensic analysis tools are widely used in criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need to employ efficient digital forensics techniques grounded on state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, such methods have not yet been widely incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial to benchmark the ML models and to evaluate their appropriateness to be applied in real-world digital forensics applications. An example is the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos, which are part of state-of-the-art existing datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository, and the total number of photos and video frames is 40,588 and 12,400, respectively.
The dataset was validated and benchmarked with deep learning Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) methods; however, many other existing methods can be applied. Overall, the results show a better F1-score for CNN than for SVM, for both photo and video processing. CNN achieved an F1-score of 0.9968 and 0.8415 for photos and videos, respectively. Regarding SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955 for photos and videos, respectively. A set of methods written in Python is available to researchers, namely to preprocess and extract the features from the original photo and video files and to build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, which gives ML researchers more flexibility to use the dataset with existing ML frameworks and tools.
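The PKL-to-CSV conversion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name pkl_to_csv and the assumed file layout (a pickled list of (label, feature_vector) pairs) are hypothetical, and the published files may be organized differently.

```python
import csv
import pickle


def pkl_to_csv(pkl_path, csv_path):
    """Write one CSV row per dataset entry: label first, then features.

    Assumed layout (hypothetical): the pickle holds a list of
    (label, feature_vector) pairs. Only unpickle files you trust,
    because pickle.load can execute arbitrary code.
    """
    with open(pkl_path, "rb") as fh:
        entries = pickle.load(fh)

    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for label, features in entries:
            writer.writerow([label, *features])

    # Return the number of rows written as a simple sanity check.
    return len(entries)
```

A flat CSV of this shape can be read by most ML tools without any Python-specific loader, which is the flexibility the abstract refers to.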

A Deep Learning Approach for Text Segmentation in Document Analysis

2020 International Conference on Advanced Computing and Applications (ACOMP) ◽

10.1109/acomp50827.2020.00027 ◽

2020 ◽

Author(s):

Van-Linh Pham ◽

Xuan-Phung Pham ◽

Hoai-Nam Tran ◽

Sy-Tuyen Ho ◽

Vinh-Loi Ly ◽

...

Keyword(s):

Deep Learning ◽

Document Analysis ◽

Text Segmentation ◽

Learning Approach

GPU-Enabled Serverless Workflows for Efficient Multimedia Processing

Applied Sciences ◽

10.3390/app11041438 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1438

Author(s):

Sebastián Risco ◽

Germán Moltó

Keyword(s):

Video Processing ◽

Cost Effective ◽

Multimedia Processing ◽

Computing Services ◽

Video Frames ◽

Open Source Framework ◽

Cloud Infrastructures ◽

Aws Lambda ◽

Runtime Environments

Serverless computing has introduced scalable event-driven processing in Cloud infrastructures. However, it is not trivial for multimedia processing to benefit from the elastic capabilities featured by serverless applications. To this end, this paper introduces the evolution of a framework to support the execution of customized runtime environments in AWS Lambda in order to accommodate workloads that exceed its strict computational limits, namely those that need longer execution times or GPU-based resources. This has been achieved through the integration of AWS Batch, a managed service to deploy virtual elastic clusters for the execution of containerized jobs. In addition, a Functions Definition Language (FDL) is introduced for the description of data-driven workflows of functions. These workflows can simultaneously leverage both AWS Lambda for the highly scalable execution of short jobs and AWS Batch for the execution of compute-intensive jobs that can profit from GPU-based computing. To assess the developed open-source framework, we executed a case study for efficient serverless video processing. The workflow automatically generates subtitles based on the audio and applies GPU-based object recognition to the video frames, thus simultaneously harnessing different computing services. This allows for the creation of cost-effective, highly parallel, scale-to-zero serverless workflows in AWS.

A Systematic Deep Learning Based Overhead Tracking and Counting System Using RGB-D Remote Cameras

Applied Sciences ◽

10.3390/app11125503 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5503

Author(s):

Munkhjargal Gochoo ◽

Syeda Amna Rizwan ◽

Yazeed Yasin Ghadi ◽

Ahmad Jalal ◽

Kibum Kim

Keyword(s):

Deep Learning ◽

Human Head ◽

Point Clouds ◽

Head Tracking ◽

Complex Environments ◽

People Tracking ◽

Practical Applications ◽

Remote Cameras ◽

Video Frames ◽

Benchmark Datasets

Automatic head tracking and counting using depth imagery has various practical applications in security, logistics, queue management, space utilization and visitor counting. However, no currently available system can clearly distinguish between a human head and other objects in order to track and count people accurately. For this reason, we propose a novel system that can track people by monitoring their heads and shoulders in complex environments and also count the number of people entering and exiting the scene. Our system is split into six phases. First, preprocessing is done by converting videos of a scene into frames and removing the background from the video frames. Second, heads are detected using the Hough Circular Gradient Transform, and shoulders are detected by HOG-based symmetry methods. Third, three robust features, namely fused joint HOG-LBP, energy-based point clouds and fused intra-inter trajectories, are extracted. Fourth, Apriori-Association is implemented to select the best features. Fifth, deep learning is used for accurate people tracking. Finally, heads are counted using cross-line judgment. The system was tested on three benchmark datasets, the PCDS dataset, the MICC people counting dataset and the GOTPD dataset, and achieved counting accuracies of 98.40%, 98%, and 99%, respectively.
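The final counting phase can be illustrated with a minimal sketch of a cross-line rule. The abstract does not specify the exact rule, so everything here is an assumption: the function name count_crossings, a horizontal counting line at line_y, and the convention that moving downward across the line is an entry.

```python
def count_crossings(tracks, line_y):
    """Count entries and exits from per-person head trajectories.

    tracks: dict mapping a track id to a list of (x, y) head centres,
            one per frame, in image coordinates (y grows downward).
    line_y: y coordinate of the hypothetical horizontal counting line.
    Crossing from above the line to on or below it counts as an entry;
    the opposite move counts as an exit (an arbitrary convention).
    """
    entered = exited = 0
    for points in tracks.values():
        # Compare each position with the next one along the same track.
        for (_, y_prev), (_, y_curr) in zip(points, points[1:]):
            if y_prev < line_y <= y_curr:
                entered += 1
            elif y_curr < line_y <= y_prev:
                exited += 1
    return entered, exited
```

Counting per track rather than per detection means a person who lingers near the line is counted only when the trajectory actually crosses it, which is why tracking precedes counting in the pipeline.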

Performance Evaluation of Background Subtraction Techniques for Video Frames

2021 International Conference on Artificial Intelligence (ICAI) ◽

10.1109/icai52203.2021.9445253 ◽

2021 ◽

Author(s):

Salman Qasim ◽

Kaleem Nawaz Khan ◽

Miao Yu ◽

Muhammad Salman Khan

Keyword(s):

Performance Evaluation ◽

Background Subtraction ◽

Video Frames

Two Stage Continuous Gesture Recognition Based on Deep Learning

Electronics ◽

10.3390/electronics10050534 ◽

2021 ◽

Vol 10 (5) ◽

pp. 534

Author(s):

Huogen Wang

Keyword(s):

Gesture Recognition ◽

Large Scale ◽

Short Term Memory ◽

Short Term ◽

Hand Motion ◽

Spatiotemporal Features ◽

Spatiotemporal Information ◽

Video Frames ◽

Depth Sequences

The paper proposes an effective continuous gesture recognition method, which includes two modules: segmentation and recognition. In the segmentation module, the video frames are divided into gesture frames and transitional frames by using the information of hand motion and appearance, and continuous gesture sequences are segmented into isolated sequences. In the recognition module, our method exploits the spatiotemporal information embedded in RGB and depth sequences. For the RGB modality, our method adopts Convolutional Long Short-Term Memory Networks to learn long-term spatiotemporal features from short-term spatiotemporal features obtained from a 3D convolutional neural network. For the depth modality, our method converts a sequence into Dynamic Images and Motion Dynamic Images through weighted rank pooling and feeds them into separate Convolutional Neural Networks. Our method has been evaluated on both the ChaLearn LAP Large-scale Continuous Gesture Dataset and the Montalbano Gesture Dataset and achieved state-of-the-art performance.
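The last step of the segmentation module, turning per-frame labels into isolated sequences, can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: the per-frame gesture/transitional labels are taken as given, and the function name split_into_gestures and the min_length filter are hypothetical.

```python
def split_into_gestures(is_gesture_frame, min_length=1):
    """Group consecutive gesture frames into (start, end) index ranges.

    is_gesture_frame: list of booleans, one per video frame; True marks
        a gesture frame, False a transitional frame. How these labels
        are derived from hand motion and appearance is not covered here.
    min_length: hypothetical filter that drops runs shorter than this.
    Returns a list of (start, end) pairs with end exclusive.
    """
    segments = []
    start = None
    for index, flag in enumerate(is_gesture_frame):
        if flag and start is None:
            start = index  # a new gesture run begins
        elif not flag and start is not None:
            if index - start >= min_length:
                segments.append((start, index))
            start = None  # a transitional frame closes the run
    # Close a run that lasts until the final frame.
    if start is not None and len(is_gesture_frame) - start >= min_length:
        segments.append((start, len(is_gesture_frame)))
    return segments
```

Each returned range can then be passed to the recognition module as one isolated gesture sequence.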

Video Frame Interpolation via Deformable Separable Convolution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6634 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10607-10614 ◽

Cited By ~ 2

Author(s):

Xianhang Cheng ◽

Zhenzhong Chen

Keyword(s):

State Of The Art ◽

Video Frame ◽

Kernel Size ◽

Frame Interpolation ◽

Interpolation Methods ◽

Video Frames ◽

Convolution Process ◽

Strong Performance ◽

Existing Frames ◽

Better Than

Learning to synthesize non-existing frames from the original consecutive video frames is a challenging task. Recent kernel-based interpolation methods predict pixels with a single convolution process, removing the dependency on optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods yield poor results even though they take thousands of neighboring pixels into account. To solve this problem, we propose to use deformable separable convolution (DSepConv) to adaptively estimate kernels, offsets and masks to allow the network to obtain information with far fewer but more relevant pixels. In addition, we show that the kernel-based methods and conventional flow-based methods are specific instances of the proposed DSepConv. Experimental results demonstrate that our method significantly outperforms the other kernel-based interpolation methods and performs on par with or even better than the state-of-the-art algorithms both qualitatively and quantitatively.
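How kernels, offsets and masks combine for a single output pixel can be sketched as below. This is a simplified illustration, not the paper's implementation: it samples one input frame only, uses integer offsets instead of fractional offsets with bilinear sampling, and takes the kernels, offsets and masks as given rather than predicting them with a network. The function name dsep_pixel is hypothetical.

```python
def dsep_pixel(image, y, x, k_vert, k_horiz, offsets, masks):
    """Synthesize one output pixel from one input frame.

    image:   2D list of grey values.
    k_vert, k_horiz: 1D kernels of length n; their outer product is the
             2D kernel (the "separable" part).
    offsets: n x n grid of (dy, dx) integer shifts, one per sampling
             point (the "deformable" part, simplified to integers).
    masks:   n x n grid of weights in [0, 1], one per sampling point.
    """
    n = len(k_vert)
    half = n // 2
    height, width = len(image), len(image[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            dy, dx = offsets[i][j]
            # Clamp to the image border so every sample is defined.
            yy = min(max(y + i - half + dy, 0), height - 1)
            xx = min(max(x + j - half + dx, 0), width - 1)
            total += k_vert[i] * k_horiz[j] * masks[i][j] * image[yy][xx]
    return total
```

With all offsets set to zero and all masks set to one, the function reduces to an ordinary separable convolution, which mirrors the abstract's remark that kernel-based methods are a specific instance of the more general formulation.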

Detecting Toe-Off Events Utilizing a Vision-Based Method

Entropy ◽

10.3390/e21040329 ◽

2019 ◽

Vol 21 (4) ◽

pp. 329 ◽

Cited By ~ 4

Author(s):

Yunqi Tang ◽

Zhuorong Li ◽

Huawei Tian ◽

Jianwei Ding ◽

Bingxian Lin

Keyword(s):

Wearable Sensors ◽

Gait Pattern ◽

Video Data ◽

Detection Methods ◽

Detection Accuracy ◽

Public Database ◽

Video Frames ◽

Different Types ◽

Events Detection ◽

Good Detection

Detecting gait events accurately from video data is a challenging problem. Most detection methods for gait events are currently based on wearable sensors, which require a high degree of user cooperation and are constrained by power consumption. This study presents a novel algorithm for achieving accurate detection of toe-off events using a single 2D vision camera without the cooperation of participants. First, a novel feature, namely the consecutive silhouettes difference map (CSD-map), is proposed to represent gait pattern. A CSD-map encodes several consecutive pedestrian silhouettes extracted from video frames into a single map. Different numbers of consecutive pedestrian silhouettes result in different types of CSD-maps, which can provide significant features for toe-off event detection. A convolutional neural network is then employed to reduce feature dimensions and classify toe-off events. Experiments on a public database demonstrate that the proposed method achieves good detection accuracy.

Applying the Bell’s Test to Chinese Texts

Entropy ◽

10.3390/e22030275 ◽

2020 ◽

Vol 22 (3) ◽

pp. 275

Author(s):

Igor A. Bessmertny ◽

Xiaoxi Huang ◽

Aleksei V. Platonov ◽

Chuqiao Yu ◽

Julia A. Koroleva

Keyword(s):

Quantum Entanglement ◽

Chinese Text ◽

Search Engines ◽

Text Processing ◽

Word Segmentation ◽

Significant Problem ◽

Text Segmentation ◽

Text Documents ◽

Segmentation Algorithms ◽

Chinese Texts

Search engines are able to find documents containing patterns from a query. This approach can be used for alphabetic languages such as English. However, Chinese is highly dependent on context. A significant problem in Chinese text processing is the absence of blanks between words, so it is necessary to segment the text into words before any other action. Algorithms for Chinese text segmentation should consider context; that is, the word segmentation process depends on other ideograms. As the existing segmentation algorithms are imperfect, we have considered an approach to build the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to rank Chinese text documents by their relevancy to the query. In particular, this approach uses Bell’s test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
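The HAL step can be illustrated with a minimal sketch of the standard sliding-window formulation, in which a neighbour at distance d inside a window of size n contributes weight n - d + 1. The function name hal_matrix, the window size, and the choice of tokens (single characters or n-grams for Chinese text) are assumptions; the abstract does not give the paper's settings, and the Bell's test itself is not shown.

```python
from collections import defaultdict


def hal_matrix(tokens, window=3):
    """Build HAL-style co-occurrence weights.

    For each token, every token up to `window` positions before it adds
    weight (window - distance + 1), so closer neighbours count more.
    Returns a dict mapping (word, preceding_word) to the summed weight.
    """
    weights = defaultdict(int)
    for pos, word in enumerate(tokens):
        for distance in range(1, window + 1):
            prev = pos - distance
            if prev < 0:
                break  # ran off the start of the text
            weights[(word, tokens[prev])] += window - distance + 1
    return dict(weights)
```

The row of weights for a word serves as its context vector, which is the kind of representation a later ranking step could compare between two query words.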
