GPU Accelerated real-time Melanoma Detection

Melanoma is recognized as one of the most dangerous types of skin cancer. A novel method to detect melanoma in real time with the help of a Graphics Processing Unit (GPU) is proposed. Existing systems can process medical images and perform a diagnosis based on image processing techniques and artificial intelligence. They can also perform video processing with the help of large hardware resources at the backend, which incurs significantly higher cost and space requirements and makes them complex in both software and hardware. GPUs have high processing capabilities compared to the Central Processing Unit (CPU) of a system. Various approaches were used to implement real-time detection of melanoma. This work discusses the results and analysis of these approaches and the best approach found in our study, together with a performance analysis of the approaches in CPU and GPU environments. The proposed system performs real-time analysis of live medical video data and produces a diagnosis. When implemented, the system yielded an accuracy of 90.133%, which is comparable to existing systems.

Real Time Defects of Beer Bottle Detection System Based on DSP

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.392-394.414 ◽

2008 ◽

Vol 392-394 ◽

pp. 414-418 ◽

Cited By ~ 1

Author(s):

B. Ren ◽

Tan Cheng Xie ◽

X. Nan

Keyword(s):

Real Time ◽

Video Processing ◽

Detection System ◽

Processing System ◽

Ccd Camera ◽

Processing Technique ◽

Image Processing Technique ◽

Experimental Result ◽

Detection Techniques ◽

Video Images

The paper analyses beer bottle detection techniques on the beer bottle production line and applies digital image processing to online defect detection. It puts forward the hardware design, the software development flow, and the bottle detection algorithm. A TMSDM642 is used to build the real-time video processing hardware, which is mainly composed of three parts: memory, input, and output. When beer bottles enter the work area, a CCD camera captures video images of the bottle mouth and bottle bottom. First, preprocessing eliminates video image noise. Second, an image segmentation algorithm detects defects in the video images. Finally, the defects are extracted. The experimental results indicate that the system can effectively detect flawed or unqualified beer bottles.
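The three stages named in this abstract (noise removal, segmentation, defect extraction) can be sketched as follows. This is a minimal illustration and not the authors' DSP implementation: the 3x3 median filter, the fixed global threshold, the 4-connected flood fill, and all function names are assumptions chosen for clarity.

```python
from statistics import median


def median_filter3(img):
    """Stage 1 (assumed): 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def threshold(img, t):
    """Stage 2 (assumed): mark pixels darker than t as defect candidates (1)."""
    return [[1 if v < t else 0 for v in row] for row in img]


def extract_regions(mask):
    """Stage 3 (assumed): group candidate pixels into 4-connected regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, region = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                regions.append(region)
    return regions


def detect_defects(img, t=100):
    """Run the three stages in order and return the list of defect regions."""
    return extract_regions(threshold(median_filter3(img), t))
```

On a bright frame containing one isolated dark pixel and one larger dark blob, the median filter suppresses the isolated pixel, so only the blob survives as a defect region.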

P–029 Identification of spermatozoa by unsupervised learning from video data

Human Reproduction ◽

10.1093/humrep/deab130.028 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

V Thambawita ◽

T B Haugen ◽

M H Stensen ◽

O Witczak ◽

H L Hammer ◽

...

Keyword(s):

Real Time ◽

Video Data ◽

Training Data ◽

Semen Sample ◽

Generative Adversarial Networks ◽

Processing Unit ◽

Time Analysis ◽

Real Time Analysis ◽

Computer Aided ◽

Unsupervised Methods

Study question: Can artificial intelligence (AI) algorithms identify spermatozoa in a semen sample without using training data annotated by professionals?

Summary answer: Unsupervised AI methods can discriminate spermatozoa from other cells and debris. These unsupervised methods may have potential for several applications in reproductive medicine.

What is known already: Identification of individual sperm is essential to assess a given sperm sample’s motility behaviour. Existing computer-aided systems need training data based on annotations by professionals, which is resource-demanding. On the other hand, data analysed by unsupervised machine learning algorithms can improve supervised algorithms that are more stable for clinical applications. Therefore, unsupervised sperm identification can improve computer-aided sperm analysis systems predicting different aspects of sperm samples. Other possible applications are assessing kinematics and counting of spermatozoa.

Study design, size, duration: Three sperm-like paint images were manipulated using a graphic design tool and used to train our AI system. Two paintings have an ash-coloured background and randomly distributed white circles, and one painting has a predefined pattern of circles. Selected semen sample videos from a public dataset with videos obtained from 85 participants were used to test our AI system.

Participants/materials, setting, methods: Generative adversarial networks (GANs) have become common AI methods to process data in an unsupervised way. Based on single image frames extracted from videos, a GAN (SinGAN) can be trained to determine and track the locations of sperm by translating the real images into localization paintings. The resulting model showed the potential of identifying the presence of sperm without any prior knowledge about the data.

Main results and the role of chance: Visual comparisons of localization paintings to real sperm images show that inverse training of SinGANs can track sperm. Converting colour frames into grayscale frames and using grayscale synthetic sperm-like frames showed the best visual quality of generated localization paintings of sperm frames. Feeding real sperm video frames to the SinGAN at different scaling factors, which define the resolution of the input image, showed different quality levels of generated sperm localization paintings. A sperm frame given to the algorithm with a scaling factor of one leads to random sperm tracking, while the scales two to four result in more accurate localization maps than scaling levels five to eight. In contrast, scales from six to eight result in an output close to the input frame. The proposed method is robust in terms of the number of spermatozoa, meaning that the detection works well for samples with a low or high sperm count. For visual comparisons, visit our GitHub page: https://vlbthambawita.github.io/singan-sperm/. The sperm tracking speed of our SinGAN using an NVIDIA 1080 graphics processing unit is around 17 frames per second, which can be improved by using parallel video processing capabilities. This shows the capability of using this method for real-time analysis.

Limitations, reasons for caution: Unsupervised methods are hard to train, and the results need human verification. The proposed method will need quality control and must be standardized. The unsupervised sperm-tracking SinGAN may misidentify blurry bright spots as sperm heads, which may restrict the use of SinGAN sperm tracking for sperm counting.

Wider implications of the findings: Assessment of semen samples according to the WHO guidelines is subjective and resource-demanding. This unsupervised model might be used to develop new systems for less time-consuming and more accurate evaluation of semen samples. It may also be used for real-time analysis of prepared spermatozoa for use in assisted reproduction technology.

Trial registration number: N/A
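The reported throughput of about 17 frames per second can be turned into a per-frame time budget, and into a rough estimate of how many parallel workers a higher frame rate would need. The sketch below assumes ideal linear scaling across workers, which the abstract does not claim; the target rates of 30 and 60 FPS and the function names are illustrative assumptions.

```python
import math


def frame_time_ms(fps):
    """Average processing time per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def workers_needed(single_worker_fps, target_fps):
    """Workers required to reach target_fps, assuming ideal linear scaling."""
    return math.ceil(target_fps / single_worker_fps)


if __name__ == "__main__":
    print(f"{frame_time_ms(17):.1f} ms per frame at 17 FPS")
    print(workers_needed(17, 30), "workers for 30 FPS (idealized)")
```

Real parallel video pipelines lose some throughput to decoding, data transfer, and synchronization, so the worker count is a lower bound.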

Deep-Framework: A Distributed, Scalable, and Edge-Oriented Framework for Real-Time Analysis of Video Streams

Sensors ◽

10.3390/s21124045 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4045

Author(s):

Alessandro Sassu ◽

Jose Francisco Saenz-Cogollo ◽

Maurizio Agelli

Keyword(s):

Deep Learning ◽

Real Time ◽

Video Data ◽

Video Analytics ◽

Web Based ◽

Real Time Analysis ◽

Open Source Framework ◽

Cluster Configuration ◽

Time Requirements ◽

High Level

Edge computing is the best approach for meeting the exponential demand and the real-time requirements of many video analytics applications. Since most of the recent advances regarding the extraction of information from images and video rely on computation-heavy deep learning algorithms, there is a growing need for solutions that allow the deployment and use of new models on scalable and flexible edge architectures. In this work, we present Deep-Framework, a novel open source framework for developing edge-oriented real-time video analytics applications based on deep learning. Deep-Framework has a scalable multi-stream architecture based on Docker and hides from the user the complexity of cluster configuration, service orchestration, and GPU resource allocation. It provides Python interfaces for integrating deep learning models developed with the most popular frameworks and also provides high-level APIs based on standard HTTP and WebRTC interfaces for consuming the extracted video data on clients running in browsers or on any other web-based platform.

Significance of Parallel Computing on the Performance of Digital Image Correlation Algorithms in MATLAB

Designs ◽

10.3390/designs5010015 ◽

2021 ◽

Vol 5 (1) ◽

pp. 15

Author(s):

Andreas Thoma ◽

Abhijith Moni ◽

Sridhar Ravi

Keyword(s):

Digital Image Correlation ◽

Real Time ◽

Digital Image ◽

Correct Choice ◽

Image Correlation ◽

Particle Swarm Algorithm ◽

Processing Unit ◽

Time Analysis ◽

The Real ◽

Real Time Analysis

Digital Image Correlation (DIC) is a powerful tool used to evaluate displacements and deformations in a non-intrusive manner. By comparing two images, one from the undeformed reference state of the sample and the other from the deformed target state, the relative displacement between the two states is determined. DIC is well known and often used for post-processing analysis of in-plane displacements and deformation of the specimen. Increasing the analysis speed to enable real-time DIC analysis will be beneficial and expand the scope of this method. Here we tested several combinations of the most common DIC methods with different parallelization approaches in MATLAB and evaluated their performance to determine whether real-time analysis is possible with these methods. The effects of computing with different hardware settings were also analyzed and discussed. We found that implementation problems can reduce the efficiency of a theoretically superior algorithm, such that it becomes practically slower than a sub-optimal algorithm. The Newton–Raphson algorithm in combination with a modified particle swarm algorithm in parallel image computation was found to be most effective. This is contrary to theory, which suggests that the inverse-compositional Gauss–Newton algorithm is superior. As expected, the brute force search algorithm is the least efficient method. We also found that the correct choice of parallelization tasks is critical in attaining improvements in computing speed. A poorly chosen parallelization approach with high parallel overhead leads to inferior performance. Finally, irrespective of the computing mode, the correct choice of combinations of integer-pixel and sub-pixel search algorithms is critical for efficient analysis. Real-time analysis using DIC will be difficult on computers with standard computing capabilities, even if parallelization is implemented, so the suggested solution is to use graphics processing unit (GPU) acceleration.
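To make the integer-pixel stage concrete, the sketch below implements a brute force search: it compares a reference subset against every candidate position in the target image and keeps the shift with the highest score. The paper works in MATLAB; this is a Python illustration only. The zero-normalized cross-correlation (ZNCC) criterion, the function names, and the synthetic speckle generator are assumptions, not details taken from the abstract.

```python
import math
import random


def make_speckle(n, seed=0):
    """Hypothetical helper: n x n pseudo-random grey-level pattern for testing."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(n)] for _ in range(n)]


def subset(img, top, left, size):
    """Flatten the size x size window whose top-left corner is (top, left)."""
    return [img[top + r][left + c] for r in range(size) for c in range(size)]


def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def brute_force_search(ref, tgt, top, left, size, max_shift):
    """Return (du, dv, score): the integer shift in x and y with the best ZNCC."""
    ref_sub = subset(ref, top, left, size)
    h, w = len(tgt), len(tgt[0])
    best = (-2.0, 0, 0)  # ZNCC lies in [-1, 1], so -2 is below any real score
    for dv in range(-max_shift, max_shift + 1):
        for du in range(-max_shift, max_shift + 1):
            t, l = top + dv, left + du
            if t < 0 or l < 0 or t + size > h or l + size > w:
                continue  # candidate window falls outside the target image
            score = zncc(ref_sub, subset(tgt, t, l, size))
            if score > best[0]:
                best = (score, du, dv)
    return best[1], best[2], best[0]
```

The cost grows with the square of `max_shift` times the subset area, which is consistent with the abstract's finding that brute force is the least efficient option; each candidate position is independent, so the double loop is also the natural unit to parallelize.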

Note: Quasi-real-time analysis of dynamic near field scattering data using a graphics processing unit

Review of Scientific Instruments ◽

10.1063/1.4755747 ◽

2012 ◽

Vol 83 (10) ◽

pp. 106101 ◽

Cited By ~ 18

Author(s):

G. Cerchiari ◽

F. Croccolo ◽

F. Cardinaux ◽

F. Scheffold

Keyword(s):

Real Time ◽

Graphics Processing Unit ◽

Near Field ◽

Scattering Data ◽

Processing Unit ◽

Time Analysis ◽

Real Time Analysis ◽

Graphics Processing

An efficient solution for fast generation of multi-GNSS real-time products

10.5194/egusphere-egu21-8306 ◽

2021 ◽

Author(s):

Hongjie Zheng ◽

Hanyu Chang ◽

Yongqiang Yuan ◽

Qingyun Wang ◽

Yuhao Li ◽

...

Keyword(s):

Data Processing ◽

Real Time ◽

Processing Time ◽

Efficient Solution ◽

Gpu Computing ◽

Sampling Rate ◽

Precise Orbit Determination ◽

Processing Unit ◽

Processing Efficiency ◽

Central Processing

Global navigation satellite systems (GNSS) have been playing an indispensable role in providing positioning, navigation and timing (PNT) services to global users. Over the past few years, GNSS have developed rapidly, with abundant networks, modern constellations, and multi-frequency observations. To take full advantage of multi-constellation and multi-frequency GNSS, several new mathematical models have been developed, such as multi-frequency ambiguity resolution (AR) and uncombined data processing with raw observations. In addition, new GNSS products including the uncalibrated phase delay (UPD), the observable signal bias (OSB), and the integer recovery clock (IRC) have been generated and provided by analysis centers to support advanced GNSS applications. However, the increasing number of GNSS observations poses a great challenge to the fast generation of multi-constellation and multi-frequency products. In this study, we propose an efficient solution for the fast updating of multi-GNSS real-time products by making full use of advanced computing techniques. Firstly, instead of the traditional vector operations, the “level-3 operations” (matrix by matrix) of the Basic Linear Algebra Subprograms (BLAS) are used as much as possible in the least-squares (LSQ) processing, which improves efficiency through central processing unit (CPU) optimization and faster memory data transmission. Furthermore, most steps of multi-GNSS data processing are transformed from serial mode to parallel mode to take advantage of the multi-core CPU architecture and graphics processing unit (GPU) computing resources. Moreover, we choose the OpenBLAS library for matrix computation as it performs well in a parallel environment. The proposed method is then validated on a 3.30 GHz AMD CPU with 6 cores. The results demonstrate that the proposed method can substantially improve the processing efficiency for multi-GNSS product generation. For the precise orbit determination (POD) solution with 150 ground stations and 128 satellites (GPS/BDS/Galileo/GLONASS/QZSS) in ionosphere-free (IF) mode, the processing time can be shortened from 50 to 10 minutes, which can guarantee the hourly updating of multi-GNSS ultra-rapid orbit products. The processing time of uncombined POD can also be reduced by about 80%. Meanwhile, the multi-GNSS real-time clock products can be easily generated at 5-second or even higher sampling rates. In addition, the processing efficiency of UPD and OSB products can also be increased by 4-6 times.
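The switch from vector operations to matrix-by-matrix operations can be illustrated with the least-squares normal matrix N = AᵀA. The sketch below builds N in two ways: one observation at a time (a rank-1 update per row, the kind of work a level-2 BLAS routine does) and as a single matrix product (the kind of work a level-3 routine does). It only shows that both formulations give the same result. Pure Python will not reproduce the speedup, which in practice comes from calling an optimized BLAS such as OpenBLAS. The function names and the choice of the normal matrix as the example are assumptions; the abstract does not give the authors' formulation.

```python
def normal_rank1(A):
    """Accumulate N = A^T A row by row: one rank-1 update per observation."""
    n = len(A[0])
    N = [[0.0] * n for _ in range(n)]
    for row in A:
        for i in range(n):
            for j in range(n):
                N[i][j] += row[i] * row[j]
    return N


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Plain matrix-matrix product; stands in for a level-3 BLAS call."""
    Yt = transpose(Y)
    return [[float(sum(a * b for a, b in zip(row, col))) for col in Yt] for row in X]


def normal_blocked(A):
    """Form N = A^T A as a single matrix-by-matrix operation."""
    return matmul(transpose(A), A)
```

Grouping many observations into one matrix product lets the library reuse data held in cache, which is the memory-traffic benefit the abstract attributes to level-3 operations.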

New Optimal Solutions for Real-Time Reconfigurable Periodic Asynchronous Operating System Tasks with Minimizations of Response Time

International Journal of System Dynamics Applications ◽

10.4018/ijsda.2012100105 ◽

2012 ◽

Vol 1 (4) ◽

pp. 88-131 ◽

Cited By ~ 2

Author(s):

Hamza Gharsellaoui ◽

Mohamed Khalgui ◽

Samir Ben Ahmed

Keyword(s):

Response Time ◽

Real Time ◽

Scheduling Algorithm ◽

Processing Unit ◽

Software Faults ◽

Worst Case ◽

Agent Based ◽

Central Processing ◽

Technical Solutions ◽

Task Systems

Scheduling tasks is an essential requirement in most real-time and embedded systems, but leads to unwanted central processing unit (CPU) overheads. The authors present a real-time schedulability algorithm for preemptable, asynchronous and periodic reconfigurable task systems with arbitrary relative deadlines, scheduled on a uniprocessor by an optimal scheduling algorithm based on the earliest deadline first (EDF) principles and on dynamic reconfiguration. A reconfiguration scenario is assumed to be a dynamic automatic operation allowing the addition, removal or update of the operating system’s (OS) functional asynchronous tasks. When such a scenario is applied to save the system at the occurrence of hardware-software faults, or to improve its performance, some real-time properties can be violated. The authors propose an intelligent agent-based architecture where a software agent is used to satisfy the user requirements and to respect time constraints. The agent dynamically provides precious technical solutions for users when these constraints are not verified, by removing tasks according to a predefined heuristic, or by modifying the worst case execution times (WCETs), periods, and deadlines of tasks in order to meet deadlines and to minimize their response time. They implement the agent to support these services, apply it to a Blackberry Bold 9700 and to a Volvo system, and present and discuss the experimental results.
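The EDF setting described here can be explored with a small discrete-time simulation. The sketch below is not the authors' schedulability algorithm or their agent: it steps through integer time slots, always runs the pending job with the earliest absolute deadline, and reports whether any deadline is missed within a caller-supplied horizon. The `Task` fields, the function names, and the unit-slot time model are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    offset: int    # release time of the first job (asynchronous start)
    wcet: int      # worst case execution time per job, in slots
    period: int    # time between successive releases
    deadline: int  # relative deadline of each job


def utilization(tasks):
    """Total processor utilization: sum of wcet / period over all tasks."""
    return sum(t.wcet / t.period for t in tasks)


def edf_meets_deadlines(tasks, horizon):
    """Simulate preemptive EDF on one processor for `horizon` unit slots."""
    jobs = []  # each pending job is [absolute_deadline, remaining_slots]
    for now in range(horizon):
        for t in tasks:
            if now >= t.offset and (now - t.offset) % t.period == 0:
                jobs.append([now + t.deadline, t.wcet])
        if any(d <= now for d, _ in jobs):
            return False  # a pending job has already passed its deadline
        if jobs:
            job = min(jobs, key=lambda j: j[0])  # earliest deadline first
            job[1] -= 1
            if job[1] == 0:
                jobs.remove(job)
    # Jobs still pending whose deadline lies within the horizon have missed it.
    return not any(d <= horizon for d, _ in jobs)
```

For example, `[Task(0, 2, 4, 2), Task(0, 1, 4, 2)]` has utilization 0.75 yet misses a deadline at time 2, which shows that the utilization bound alone does not decide feasibility once relative deadlines are shorter than periods. The horizon must be chosen long enough to cover the schedule's repeating pattern.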

SIAT: A Distributed Video Analytics Framework for Intelligent Video Surveillance

Symmetry ◽

10.3390/sym11070911 ◽

2019 ◽

Vol 11 (7) ◽

pp. 911 ◽

Cited By ~ 5

Author(s):

Md Azher Uddin ◽

Aftab Alam ◽

Nguyen Anh Tu ◽

Md Siyamul Islam ◽

Young-Koo Lee

Keyword(s):

Distributed Computing ◽

Real Time ◽

Video Surveillance ◽

Video Processing ◽

Large Scale ◽

Distributed Processing ◽

Content Management ◽

Video Data ◽

Video Analytics ◽

Intelligent Video Surveillance

In recent years, the number of intelligent CCTV cameras installed in public places for surveillance has increased enormously and, as a result, a large amount of video data is produced every moment. Due to this situation, there is an increasing demand for the distributed processing of large-scale video data. In an intelligent video analytics platform, a submitted unstructured video passes through several multidisciplinary algorithms with the aim of extracting insights and making them searchable and understandable for both humans and machines. Video analytics has applications ranging from surveillance to video content management. In this context, various industrial and scholarly solutions exist. However, most of the existing solutions rely on a traditional client/server framework to perform face and object recognition while lacking support for more complex application scenarios. Furthermore, these frameworks are rarely handled in a scalable manner using distributed computing. Besides, existing works do not provide any support for low-level distributed video processing APIs (Application Programming Interfaces). They also fail to address a complete service-oriented ecosystem to meet the growing demands of consumers, researchers and developers. In order to overcome these issues, in this paper, we propose a distributed video analytics framework for intelligent video surveillance known as SIAT. The proposed framework is able to process both real-time video streams and batch video analytics. Each real-time stream also corresponds to batch processing data. Hence, this work correlates with the symmetry concept. Furthermore, we introduce a distributed video processing library on top of Spark. SIAT exploits state-of-the-art distributed computing technologies with the aim of ensuring scalability, effectiveness and fault tolerance. Lastly, we implement and evaluate our proposed framework with the goal of validating our claims.

Real-Time 3D Reconstruction of Thin Surface Based on Laser Line Scanner

Sensors ◽

10.3390/s20020534 ◽

2020 ◽

Vol 20 (2) ◽

pp. 534 ◽

Cited By ~ 1

Author(s):

Yuan He ◽

Shunyi Zheng ◽

Fengbo Zhu ◽

Xia Huang

Keyword(s):

3D Reconstruction ◽

Real Time ◽

Laser Line ◽

Frame Rate ◽

Processing Unit ◽

Memory Usage ◽

Central Processing ◽

Time Performance ◽

Topological Errors ◽

Line Scanner

The truncated signed distance field (TSDF) has been applied as a fast, accurate, and flexible geometric fusion method in 3D reconstruction of industrial products based on a hand-held laser line scanner. However, this method has some problems with the surface reconstruction of thin products. The surface mesh will collapse to the interior of the model, resulting in topological errors such as overlaps, intersections, or gaps. Meanwhile, the existing TSDF method ensures real-time performance through significant graphics processing unit (GPU) memory usage, which limits the scale of the reconstruction scene. In this work, we propose three improvements to the existing TSDF methods: (i) a thin surface attribution judgment method in real-time processing that solves the problem of interference between the opposite sides of the thin surface; we distinguish measurements originating from different parts of a thin surface by the angle between the surface normal and the observation line of sight; (ii) a post-processing method to automatically detect and repair the topological errors in areas where misjudgment of thin-surface attribution may occur; (iii) a framework that integrates central processing unit (CPU) and GPU resources to implement our 3D reconstruction approach, which ensures real-time performance and reduces GPU memory usage. The results show that this method can provide more accurate 3D reconstruction of a thin surface, comparable to state-of-the-art laser line scanners with 0.02 mm accuracy. In terms of performance, the algorithm can guarantee a frame rate of more than 60 frames per second (FPS) with a GPU memory footprint under 500 MB. Overall, the proposed method achieves real-time, high-precision 3D reconstruction of a thin surface.
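For readers unfamiliar with TSDF fusion, the sketch below shows the commonly used per-voxel update (a weighted running average of truncated signed distances) together with a simple side test based on the surface normal and the direction to the sensor. It is a generic illustration, not the authors' method: the exact angle criterion they use is not given in the abstract, and the positive-dot-product test, the parameter names, and the default weights here are assumptions.

```python
def tsdf_update(d_old, w_old, sdf, trunc, w_new=1.0, w_max=64.0):
    """Fuse one signed-distance measurement into a voxel's (value, weight) pair.

    sdf is the signed distance from the voxel to the measured surface along
    the viewing ray (positive in front of the surface), trunc the truncation
    distance. Voxels far behind the surface are left unchanged.
    """
    if sdf < -trunc:
        return d_old, w_old
    d_meas = max(-1.0, min(1.0, sdf / trunc))  # truncate to [-1, 1]
    w = w_old + w_new
    d = (d_old * w_old + d_meas * w_new) / w   # weighted running average
    return d, min(w, w_max)


def faces_sensor(normal, to_sensor):
    """Assumed side test: the surface point faces the sensor if the angle
    between its normal and the direction to the sensor is below 90 degrees."""
    return sum(n * v for n, v in zip(normal, to_sensor)) > 0.0
```

In a thin-surface setting, a test of this kind could be used to stop a measurement of the front face from being averaged into voxels that belong to the back face, which is the interference problem the abstract describes.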

Lightweight Architecture for Real-Time Hand Pose Estimation with Deep Supervision

Symmetry ◽

10.3390/sym11040585 ◽

2019 ◽

Vol 11 (4) ◽

pp. 585

Author(s):

Yufei Wu ◽

Xiaofei Ruan ◽

Yu Zhang ◽

Huang Zhou ◽

Shengyu Du ◽

...

Keyword(s):

Real Time ◽

Pose Estimation ◽

Graphics Processing Unit ◽

Parallel Execution ◽

Processing Unit ◽

Network Efficiency ◽

Hand Pose Estimation ◽

Central Processing ◽

Deployment Optimization ◽

Hand Pose

The high demand for computational resources severely hinders the deployment of deep learning applications on resource-limited devices. In this work, we investigate the under-studied but practically important network efficiency problem and present a new, lightweight architecture for hand pose estimation. Our architecture is essentially a deeply-supervised pruned network in which less important layers and branches are removed to achieve a higher real-time inference target on resource-constrained devices without much compromise in accuracy. We further apply deployment optimization to exploit the parallel execution capability of central processing units (CPUs). We conduct experiments on the NYU and ICVL datasets and develop a demo using the RealSense camera. Experimental results show that our lightweight network achieves an average running time of 32 ms (31.3 FPS, compared with 22.7 FPS for the original) before deployment optimization. Meanwhile, the model has only about half the parameter size of the original one, with 11.9 mm mean joint error. After further optimization with OpenVINO, the optimized model runs at 56 FPS on CPUs, in contrast to 44 FPS on a graphics processing unit (GPU) (TensorFlow), and achieves the real-time goal.
