2021 ◽  
Vol 4 (1) ◽  
pp. 15-28
Vladislav Li ◽  
Georgios Amponis ◽  
Jean-Christophe Nebel ◽  
Vasileios Argyriou ◽  

Developments in the field of neural networks and deep learning, together with increases in computing systems’ capacity, have allowed for a significant performance boost in scene semantic information extraction algorithms and their respective mechanisms. The work presented in this paper investigates the performance of various object classification and recognition frameworks and proposes a novel framework, which incorporates Super-Resolution as a preprocessing method, along with YOLO/Retina as the deep neural network component. The resulting scene analysis framework was fine-tuned and benchmarked on the COCO dataset, with encouraging results. The presented framework can potentially be utilized not only in still-image recognition scenarios but also in video processing.
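The two-stage pipeline described above (super-resolve first, then detect) can be sketched as follows. This is an illustrative outline only, not the authors' implementation: `upscale` uses nearest-neighbour interpolation as a stand-in for a learned Super-Resolution model, and `detect` is a hypothetical placeholder for the YOLO/Retina stage.

```python
import numpy as np

def upscale(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour upscaling; a stand-in for a learned
    Super-Resolution preprocessing module."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def detect(img: np.ndarray) -> list:
    """Hypothetical placeholder for the deep detector stage; a real
    system would run YOLO/Retina on the super-resolved frame and
    return (class, score, bbox) tuples."""
    return []

frame = np.zeros((240, 320, 3), dtype=np.uint8)  # low-resolution input
sr_frame = upscale(frame, factor=2)              # (480, 640, 3)
detections = detect(sr_frame)                    # run detection on SR output
```

For video, the same two calls would simply be applied per frame.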

2020 ◽  
Vol 11 (1) ◽  
pp. 10
Muchun Su ◽  
Diana Wahyu Hayati ◽  
Shaowu Tseng ◽  
Jiehhaur Chen ◽  
Hsihsien Wei

Health care for independently living elders is more important than ever. Automatic recognition of their Activities of Daily Living (ADL) is the first step to solving the health care issues faced by seniors in an efficient way. The paper describes a Deep Neural Network (DNN)-based recognition system aimed at facilitating smart care, which combines ADL recognition, image/video processing, movement calculation, and a DNN. An algorithm is developed for processing skeletal data, filtering noise, and pattern recognition to identify the 10 most common ADL, including standing, bending, squatting, sitting, eating, hand holding, hand raising, sitting plus drinking, standing plus drinking, and falling. The evaluation results show that this DNN-based system is a suitable method for ADL recognition, with an accuracy rate of over 95%. The findings support the feasibility of this system, which is efficient enough for both practical and academic applications.
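The movement-calculation step on skeletal data can be illustrated with a minimal sketch. This is an assumption-laden example, not the paper's algorithm: it only computes per-joint displacement magnitudes between consecutive frames, the kind of feature such a pipeline could feed into the DNN classifier.

```python
import numpy as np

def joint_displacement(skeleton_seq: np.ndarray) -> np.ndarray:
    """Per-joint movement magnitude between consecutive frames.
    skeleton_seq has shape (frames, joints, 3) in x/y/z coordinates."""
    diffs = np.diff(skeleton_seq, axis=0)   # frame-to-frame deltas
    return np.linalg.norm(diffs, axis=-1)   # (frames - 1, joints)

# Two frames, three joints; only joint 0 moves one unit along x.
seq = np.zeros((2, 3, 3))
seq[1, 0, 0] = 1.0
move = joint_displacement(seq)
# move[0] == [1.0, 0.0, 0.0]
```

A stationary posture such as sitting would yield near-zero displacements, while a fall would produce a sharp spike across many joints.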

2018 ◽  
Vol 146 ◽  
pp. 305-319 ◽  
Charis Lanaras ◽  
José Bioucas-Dias ◽  
Silvano Galliani ◽  
Emmanuel Baltsavias ◽  
Konrad Schindler

Yang Fang ◽  
Xiang Zhao ◽  
Zhen Tan

Network Embedding (NE) is an important method for learning representations of a network in a low-dimensional space. Conventional NE models focus on capturing the structure and semantic information of vertices while neglecting such information for edges. In this work, we propose a novel NE model named BimoNet to capture both the structure and semantic information of edges. BimoNet is composed of two parts, i.e., the bi-mode embedding part and the deep neural network part. In the bi-mode embedding part, the first mode, named add-mode, is used to express the entity-shared features of edges, and the second mode, named subtract-mode, is employed to represent the entity-specific features of edges. These features reflect the semantic information. In the deep neural network part, we first regard the edges in a network as nodes, and the vertices as links, which does not change the overall structure of the network. Then we take the nodes' adjacency matrix as the input of the deep neural network, as it can obtain similar representations for nodes with similar structure. Afterwards, by jointly optimizing the objective function of these two parts, BimoNet preserves both the semantic and structure information of edges. In experiments, we evaluate BimoNet on three real-world datasets and the task of relation extraction, and BimoNet is demonstrated to outperform state-of-the-art baseline models consistently and significantly.
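The bi-mode embedding idea, add-mode for entity-shared features and subtract-mode for entity-specific features, can be sketched in a few lines. This is a simplified reading of the abstract, not BimoNet itself; the embedding dimension and the concatenation into a single edge feature are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
head = rng.normal(size=dim)   # embedding of the edge's head entity
tail = rng.normal(size=dim)   # embedding of the edge's tail entity

add_mode = head + tail        # entity-shared features of the edge
sub_mode = head - tail        # entity-specific features of the edge

# One possible semantic edge representation: both modes side by side.
edge_repr = np.concatenate([add_mode, sub_mode])
```

Note that the two modes together losslessly encode both entities, since `add_mode + sub_mode` recovers `2 * head` and `add_mode - sub_mode` recovers `2 * tail`.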

2022 ◽  
Vol 11 (1) ◽  
Fei Wang ◽  
Chenglong Wang ◽  
Mingliang Chen ◽  
Wenlin Gong ◽  
Yu Zhang ◽  

Ghost imaging (GI) facilitates image acquisition under low-light conditions by single-pixel measurements and thus has great potential for applications in various fields ranging from biomedical imaging to remote sensing. However, GI usually requires a large number of single-pixel samplings in order to reconstruct a high-resolution image, imposing a practical limit on its applications. Here we propose a far-field super-resolution GI technique that incorporates the physical model of GI image formation into a deep neural network. The resulting hybrid neural network does not need to be pre-trained on any dataset, and allows the reconstruction of a far-field image with a resolution beyond the diffraction limit. Furthermore, the physical model imposes a constraint on the network output, making it effectively interpretable. We experimentally demonstrate the proposed GI technique by imaging a flying drone, and show that it outperforms other widespread GI techniques in terms of both spatial resolution and sampling ratio. We believe that this study provides a new framework for GI, and paves the way for its practical applications.
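For context on the measurement model the abstract refers to, here is a sketch of conventional correlation-based GI reconstruction on a toy 1-D object, which the physics-informed network is designed to improve upon. This is the textbook baseline, not the authors' hybrid network; the pattern count and object are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_samples = 16, 2000

patterns = rng.random((n_samples, n_pix))   # random illumination patterns
obj = np.zeros(n_pix)
obj[5] = 1.0                                # toy 1-D object: one bright pixel
bucket = patterns @ obj                     # single-pixel (bucket) signals

# Conventional GI: correlate fluctuations of the bucket signal with
# fluctuations of the illumination patterns, pixel by pixel.
g = ((bucket - bucket.mean())[:, None]
     * (patterns - patterns.mean(axis=0))).mean(axis=0)
```

The correlation image `g` peaks at the bright pixel; the abstract's point is that reaching high resolution this way needs many samplings, which the model-constrained network reduces.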

VLSI Design ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-14 ◽  
Khaled Jerbi ◽  
Mickaël Raulet ◽  
Olivier Déforges ◽  
Mohamed Abid

In this paper, we introduce the Reconfigurable Video Coding (RVC) standard, based on the idea that video processing algorithms can be defined as a library of components that can be updated and standardized separately. The MPEG RVC framework aims at providing a unified high-level specification of current MPEG coding technologies using a dataflow language called CAL Actor Language (CAL). CAL is associated with a set of tools for designing dataflow applications and generating hardware and software implementations. Before this work, the existing CAL hardware compilers did not support high-level features of CAL. After presenting the main notions of the RVC standard, this paper introduces an automatic transformation process that analyses the non-compliant features and makes the required changes in the intermediate representation of the compiler while preserving the same behavior. Finally, the implementation results of the transformation on video and still-image decoders are summarized. We show that the obtained results can largely satisfy real-time constraints for an embedded design on FPGA, as we obtain a throughput of 73 FPS for the MPEG-4 decoder and 34 FPS for the coding and decoding process of the LAR coder on CIF-size video. This work resolves the main limitation of hardware generation from CAL designs.
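The dataflow model underlying CAL, actors that repeatedly consume tokens from input FIFOs and produce tokens on output FIFOs, can be emulated in a few lines. This is a conceptual sketch of dataflow firing semantics only, not CAL syntax or the paper's compiler; the doubling action is a made-up stand-in for a decoder stage.

```python
from collections import deque

def run_actor(action, inputs: deque, outputs: deque) -> None:
    """Fire a dataflow actor while input tokens are available,
    mimicking the consume-compute-produce cycle of a CAL actor."""
    while inputs:
        outputs.append(action(inputs.popleft()))

# A trivial "decoder stage": doubles each token it consumes.
fifo_in = deque([1, 2, 3])
fifo_out = deque()
run_actor(lambda t: 2 * t, fifo_in, fifo_out)
# fifo_out == deque([2, 4, 6])
```

In an RVC design, many such actors are wired FIFO-to-FIFO, and a hardware compiler maps each actor and FIFO onto FPGA logic.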
