Research of a MapReduce Model to Process the Traffic Big Data

2014 ◽  
Vol 548-549 ◽  
pp. 1853-1856 ◽  
Author(s):  
Wen Chuan Yang ◽  
He Chen ◽  
Qing Yi Qu

Normally, the job of the Traffic Data Processing Center (TDPC) is to monitor and retain data. There is a tendency to put more capability into the TDPC, such as ad-hoc queries for speeding-car identification and feedback of abnormal traffic information. We therefore need to consider what can be kept in working storage and how to analyze it. An ordinary database cannot handle such massive datasets and complex ad-hoc queries. MapReduce is a popular and widely used fine-grained parallel runtime developed for high-performance processing of large-scale datasets. In this paper, we propose MRTP, a MapReduce Traffic Processing system based on the Hive/Hadoop frameworks. The distributed file system HDFS is used in MRTP for fast data sharing and querying. MRTP supports fast location of speeding cars and also optimizes routes for catching fugitives. Our results show that the model achieves higher efficiency.
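As a rough illustration of the kind of ad-hoc speeding query MRTP runs over Hive/Hadoop, the sketch below groups checkpoint records by plate and flags those exceeding a speed limit. It is a minimal in-memory Python analogue, not the paper's Hive implementation; the record fields and the threshold are assumptions.

```python
from collections import defaultdict

# Assumed checkpoint record layout: (plate, timestamp, checkpoint_id, measured_speed_kmh)
records = [
    ("A12345", 1700000000, "CP01", 95.0),
    ("A12345", 1700000300, "CP02", 142.0),
    ("B67890", 1700000100, "CP01", 78.0),
]

SPEED_LIMIT_KMH = 120.0  # assumed limit for this sketch

def map_phase(record):
    """Emit (plate, sighting) pairs for records above the limit."""
    plate, ts, cp, speed = record
    if speed > SPEED_LIMIT_KMH:
        yield plate, (ts, cp, speed)

def reduce_phase(pairs):
    """Group flagged sightings by plate."""
    grouped = defaultdict(list)
    for plate, sighting in pairs:
        grouped[plate].append(sighting)
    return dict(grouped)

flagged = reduce_phase(p for r in records for p in map_phase(r))
print(flagged)  # {'A12345': [(1700000300, 'CP02', 142.0)]}
```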

2013 ◽  
Vol 346 ◽  
pp. 117-122
Author(s):  
Wen Chuan Yang ◽  
Guang Jie Lin ◽  
Jiang Yong Wang

Accompanying the widespread use of intelligent traffic systems in China, all traffic input data streams to the Traffic Surveillance Center (TSC). Some metropolitan TSCs, such as Beijing's, produce up to 18 million records and 1 TB of image data every hour. Normally, the job of the TSC is to monitor and retain data. There is a tendency to put more capability into the TSC, such as ad-hoc queries for clone-car identification and feedback of abnormal traffic information. We therefore need to consider what can be kept in working storage and how to analyze it. An ordinary database cannot handle such massive datasets and complex ad-hoc queries. MapReduce is a popular and widely used fine-grained parallel runtime developed for high-performance processing of large-scale datasets. In this paper, we propose CarMR, a MapReduce Clone Car Identification system based on the Hive/Hadoop frameworks. The distributed file system HDFS is used in CarMR for fast data sharing and querying. CarMR supports fast location of clone cars and also optimizes routes for catching fugitives. Our results show that the model achieves higher efficiency.
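A core check in clone-car identification is whether the same plate appears at two checkpoints too far apart for the elapsed time. The sketch below is a simplified, in-memory Python version of that logic; the distance table and the "physically impossible speed" threshold are assumptions, and the real CarMR job would run as Hive/Hadoop map and reduce stages over HDFS.

```python
from collections import defaultdict

# Sightings: (plate, timestamp_s, checkpoint_id)
sightings = [
    ("C11111", 1700000000, "CP01"),
    ("C11111", 1700000600, "CP09"),   # 10 minutes later, far away
    ("D22222", 1700000000, "CP01"),
]

# Assumed road distances between checkpoints, in km.
distance_km = {("CP01", "CP09"): 150.0}

MAX_PLAUSIBLE_SPEED_KMH = 200.0  # above this, suspect a cloned plate

def suspected_clones(sightings):
    by_plate = defaultdict(list)
    for plate, ts, cp in sightings:          # "map": key sightings by plate
        by_plate[plate].append((ts, cp))
    suspects = []
    for plate, seen in by_plate.items():     # "reduce": check consecutive sightings
        seen.sort()
        for (t1, c1), (t2, c2) in zip(seen, seen[1:]):
            d = distance_km.get((c1, c2)) or distance_km.get((c2, c1))
            if d is None or t2 == t1:
                continue
            implied_speed = d / ((t2 - t1) / 3600.0)
            if implied_speed > MAX_PLAUSIBLE_SPEED_KMH:
                suspects.append((plate, c1, c2, round(implied_speed, 1)))
    return suspects

print(suspected_clones(sightings))  # [('C11111', 'CP01', 'CP09', 900.0)]
```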


2014 ◽  
Vol 687-691 ◽  
pp. 3733-3737
Author(s):  
Dan Wu ◽  
Ming Quan Zhou ◽  
Rong Fang Bie

Massive image processing places high demands on processor and memory, requiring high-performance processors and large-capacity memory; single-core processing and traditional memory cannot satisfy these needs. This paper introduces cloud computing into a massive image processing system. Cloud computing expands the virtual space of the system, saves computing resources, and improves the efficiency of image processing. The system uses a multi-core DSP parallel processor, and a visualization window for parameter setting and result output is developed in VC. Through simulation we obtain the image processing speed curve and the system's image adaptation curve. This provides a technical reference for the design of large-scale image processing systems.
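To illustrate the general idea of splitting a large image across parallel workers (the paper's multi-core DSP and cloud setup is simplified here to CPU processes), the hedged sketch below filters image tiles with Python's multiprocessing; the tile split and the smoothing filter are arbitrary assumptions.

```python
import numpy as np
from multiprocessing import Pool

def smooth_tile(tile):
    """Simple 3x3 box blur on one tile (stands in for real image processing).
    Band boundaries are handled approximately; no halo exchange for brevity."""
    padded = np.pad(tile, 1, mode="edge")
    out = np.zeros_like(tile, dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + tile.shape[0],
                          1 + dx : 1 + dx + tile.shape[1]]
    return out / 9.0

if __name__ == "__main__":
    image = np.random.rand(2048, 2048)          # synthetic large image
    tiles = np.array_split(image, 8, axis=0)    # coarse split into row bands
    with Pool(processes=4) as pool:             # parallel workers
        processed = pool.map(smooth_tile, tiles)
    result = np.vstack(processed)
    print(result.shape)                         # (2048, 2048)
```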


2006 ◽  
Vol 07 (01) ◽  
pp. 63-73 ◽  
Author(s):  
WEIZHEN GU ◽  
D. FRANK HSU ◽  
XINGDE JIA

Live traffic flow information can help improve the efficiency of a communication network. There are many ways available to monitor the traffic flow of a network. In this paper, we propose a very efficient monitoring strategy. This strategy not only reduces the number of nodes to be monitored but also determines the complete traffic information of the entire network using the information from the monitored nodes. The strategy is optimal for monitoring a network because it reduces the number of monitored nodes to a minimum. Fast algorithms are also presented in this paper to compute the traffic information for the entire network based on the information collected from the monitored nodes. The monitoring scheme discussed in this paper can be applied to the internet, telecommunication networks, wireless ad hoc networks, large scale multiprocessor computing systems, and other information systems where the transmission of information needs to be monitored.
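The paper's exact minimal monitoring strategy is not reproduced here; the hedged Python sketch below conveys the underlying idea with a simpler stand-in: if the monitored nodes form a vertex cover of the network graph, every link is incident to at least one monitored node, so per-link traffic can be read off directly. The greedy 2-approximation shown is illustrative only and does not achieve the paper's optimal node count.

```python
def greedy_vertex_cover(edges):
    """Greedy 2-approximate vertex cover: repeatedly take both endpoints
    of an uncovered edge. Monitoring these nodes observes every link."""
    cover, remaining = set(), list(edges)
    while remaining:
        u, v = remaining.pop()
        if u in cover or v in cover:
            continue
        cover.update((u, v))
        remaining = [(a, b) for a, b in remaining
                     if a not in cover and b not in cover]
    return cover

# Toy network: links between routers.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
monitored = greedy_vertex_cover(links)
print(monitored)  # e.g. {'B', 'C', 'D', 'E'}; every link touches a monitored node
```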


1993 ◽  
Vol 2 (4) ◽  
pp. 397-419 ◽  
Author(s):  
William H. Mansfield ◽  
Robert M. Fleischman

Author(s):  
Yassine Sabri ◽  
Aouad Siham

Multi-area and multi-faceted remote sensing (RS) datasets are widely used due to the increasing demand for accurate and up-to-date information on resources and the environment for regional and global monitoring. In general, processing RS data involves a complex, multi-step sequence comprising several independent processing steps that depend on the type of RS application. Processing RS data for regional disaster and environmental monitoring is recognized as computationally and data intensive. By combining cloud computing and HPC technology, we propose a method that addresses these problems with a large-scale RS data processing system suitable for various applications and offering real-time, on-demand service. The ubiquity, elasticity, and high-level transparency of the cloud computing model make it possible to run massive RS data management and processing for dynamic environmental monitoring in any cloud, via a web interface. Hilbert-based data indexing methods are used to optimally query and access RS images, RS data products, and intermediate data. The core of the cloud service provides a parallel file system for large RS data and an RS data access interface that improves data locality and optimizes I/O performance. Our experimental analysis demonstrates the effectiveness of our platform.
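As a hedged sketch of the Hilbert-based indexing idea (the platform's actual index layout is not described in this abstract), the function below maps a tile's (x, y) grid coordinates to a 1-D Hilbert index using the standard bit-manipulation algorithm; sorting tiles by this index keeps spatially adjacent tiles close together in their storage keys.

```python
def hilbert_index(order, x, y):
    """Map (x, y) in a 2**order x 2**order grid to its 1-D Hilbert distance."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the curve is traversed consistently.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Sort RS tiles by Hilbert index so neighbouring tiles get nearby storage keys.
tiles = [(0, 0), (3, 1), (1, 0), (2, 2), (0, 3)]
tiles.sort(key=lambda xy: hilbert_index(order=2, x=xy[0], y=xy[1]))
print(tiles)
```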


Diagnostics ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 1562
Author(s):  
Yongjie Li ◽  
Xiangyu Yan ◽  
Bo Zhang ◽  
Zekun Wang ◽  
Hexuan Su ◽  
...  

Drug use disorders caused by illicit drug use are significant contributors to the global burden of disease, and it is vital to conduct early detection of people with drug use disorders (PDUD). However, primary care clinics and emergency departments lack simple and effective tools for screening PDUD. This study proposes a novel method to detect PDUD using facial images. Various experiments are designed to obtain a convolutional neural network (CNN) model by transfer learning on a large-scale dataset (9870 images from PDUD and 19,567 images from GP (the general population)). Our results show that the model achieved 84.68% accuracy, 87.93% sensitivity, and 83.01% specificity on this dataset. To verify its effectiveness, the model was evaluated on external datasets based on real scenarios, and it still achieved high performance (accuracy > 83.69%, specificity > 90.10%, sensitivity > 80.00%). Our results also show differences between PDUD and GP in different facial areas. Compared with GP, the facial features of PDUD were mainly concentrated in the left cheek, right cheek, and nose areas (p < 0.001), which also reveals a potential relationship between mechanisms of drug action and changes in facial tissues. This is the first study to apply a CNN model to screen PDUD in clinical practice and the first attempt to quantitatively analyze the facial features of PDUD. The model could be quickly integrated into existing clinical workflows and medical care to provide screening capabilities.
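The abstract states only that the CNN was obtained by transfer learning; the exact backbone and training setup are not given, so the sketch below shows one common hedged variant in PyTorch: an ImageNet-pretrained ResNet-18 with its final layer replaced by a two-class head (PDUD vs. GP) and only the new head trained.

```python
import torch
import torch.nn as nn
from torchvision import models

# Assumed backbone: ImageNet-pretrained ResNet-18 (the paper does not name one).
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-class head: PDUD vs. GP

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# One illustrative training step on a dummy batch of 224x224 face crops.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```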


2019 ◽  
Vol 1 (2-3) ◽  
pp. 161-173 ◽  
Author(s):  
Vilhelm Verendel ◽  
Sonia Yeh

Online real-time traffic data services could effectively deliver traffic information to people all over the world and provide large benefits to society and to research about cities. Yet, city-wide road network traffic data are often hard to come by on a large scale over a longer period of time. We collect, describe, and analyze traffic data for 45 cities from HERE, a major online real-time traffic information provider. We sampled the online platform for city traffic data every 5 min during 1 year, in total more than 5 million samples covering more than 300 thousand road segments. Our aim is to describe some of the practical issues we experienced in working with this type of data source, as well as to explore the data patterns and see how this data source provides information to study traffic in cities. We focus on data availability to characterize how traffic information is available for different cities; it measures the share of road segments with real-time traffic information at a given time for a given city. We describe the patterns of real-time data availability, and evaluate methods to fill in missing speed data for road segments when real-time information was not available. We conduct a validation case study based on Swedish traffic sensor data and point out challenges for future validation. Our findings include (i) a case study validating the HERE data against ground truth available for roads and lanes in a Swedish city, showing that real-time traffic data tends to follow dips in travel speed but misses instantaneously higher speeds measured by some sensors, typically at times when there are fewer vehicles on the road; (ii) using time series clustering, we identify four clusters of cities with different types of measurement patterns; and (iii) a k-nearest neighbor-based method consistently outperforms other methods to fill in missing real-time traffic speeds. We illustrate how to work with this kind of traffic data source that is increasingly available to researchers, travellers, and city planners. Future work is needed to broaden the scope of validation, and to apply these methods to use online data for improving our knowledge of traffic in cities.
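The abstract reports that a k-nearest-neighbour method worked best for filling in missing real-time speeds; the features it uses are not given here, so the sketch below is a hedged variant that imputes a segment's missing speed from the k segments with the most similar historical speed profiles.

```python
import numpy as np

def knn_impute_speed(profiles, current, missing_idx, k=2):
    """Fill a missing current speed for one segment using the k segments
    whose historical speed profiles are closest in Euclidean distance."""
    observed = [i for i, v in enumerate(current)
                if i != missing_idx and not np.isnan(v)]
    dists = [(np.linalg.norm(profiles[missing_idx] - profiles[i]), i) for i in observed]
    nearest = [i for _, i in sorted(dists)[:k]]
    return float(np.mean([current[i] for i in nearest]))

# Historical mean speeds per segment for 4 time-of-day bins (toy numbers).
profiles = np.array([
    [55, 40, 45, 60],   # segment 0
    [54, 42, 44, 59],   # segment 1 (similar to 0)
    [30, 25, 27, 33],   # segment 2
    [52, 41, 46, 58],   # segment 3 (similar to 0)
])
current = np.array([np.nan, 43.0, 26.0, 47.0])   # segment 0 has no real-time value
print(knn_impute_speed(profiles, current, missing_idx=0))  # 45.0, from segments 1 and 3
```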


Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ben Duggan ◽  
John Metzcar ◽  
Paul Macklin

Modern agent-based models (ABM) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large scale parameter sweeps (grid searches), as well as storing simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. While high-performance computing (HPC) has become increasingly available, models can often be tested faster with the use of multiple computers and HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combining this with a flexible, scriptable tool set, teams can evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.
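DAPT's own API is not reproduced here; the sketch below only illustrates the general pattern the abstract describes, using a SQLite table as a stand-in for the shared online parameter database: each worker atomically claims a pending parameter set, runs the model, and records the result, so any number of teammates or machines can join the sweep.

```python
import json
import sqlite3

def setup(db_path, parameter_sets):
    """Create the shared table and load the parameter sets to be swept."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS params "
                "(id INTEGER PRIMARY KEY, spec TEXT, status TEXT, result TEXT)")
    con.executemany("INSERT INTO params (spec, status) VALUES (?, 'pending')",
                    [(json.dumps(p),) for p in parameter_sets])
    con.commit()
    con.close()

def worker(db_path, run_model):
    """Claim pending parameter sets one at a time and store each result."""
    con = sqlite3.connect(db_path)
    while True:
        row = con.execute("SELECT id, spec FROM params "
                          "WHERE status='pending' LIMIT 1").fetchone()
        if row is None:
            break
        pid, spec = row
        # Atomic claim: only one worker's UPDATE will match the 'pending' status.
        claimed = con.execute("UPDATE params SET status='running' "
                              "WHERE id=? AND status='pending'", (pid,)).rowcount
        con.commit()
        if not claimed:
            continue
        result = run_model(json.loads(spec))
        con.execute("UPDATE params SET status='done', result=? WHERE id=?",
                    (json.dumps(result), pid))
        con.commit()
    con.close()

setup("sweep.db", [{"rate": r} for r in (0.1, 0.2, 0.3)])
worker("sweep.db", run_model=lambda p: {"score": p["rate"] * 2})
```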


Author(s):  
Zhi Shang

Simulations of environmental flood issues usually face the scalability problem of large-scale parallel computing. A plain parallel technique based on pure MPI struggles to achieve good scalability because of the large number of domain partitions, so hybrid programming with MPI and OpenMP is introduced to address the scalability issue. This kind of parallel technique plays to the strengths of both MPI and OpenMP: during the parallel computation, OpenMP is employed for its efficient fine-grained parallelism, while MPI performs the coarse-grained domain partitioning and handles data communication. In our tests, hybrid MPI/OpenMP programming was used to renovate the finite element solvers in the BIEF library of Telemac, and the hybrid approach was found to help Telemac deal with the scalability issue.
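The solvers in Telemac's BIEF library use MPI and OpenMP directly in compiled code; as a hedged Python-level illustration of the same coarse/fine split, the sketch below uses mpi4py for coarse-grained domain partitioning and halo exchange, with vectorized NumPy work standing in for the fine-grained OpenMP loop parallelism.

```python
# Run with: mpirun -n 4 python hybrid_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Coarse grain: each MPI rank owns one strip of a 1-D domain.
local = np.linspace(rank, rank + 1, 1000)

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(10):
    # Exchange halo values with neighbouring subdomains (MPI, coarse grain).
    halo_left = comm.sendrecv(local[0], dest=left, source=left)
    halo_right = comm.sendrecv(local[-1], dest=right, source=right)
    padded = np.concatenate((
        [local[0] if halo_left is None else halo_left],
        local,
        [local[-1] if halo_right is None else halo_right],
    ))
    # Fine grain: vectorized smoothing stands in for an OpenMP-parallel loop.
    local = 0.5 * local + 0.25 * (padded[:-2] + padded[2:])

print(rank, float(local.mean()))
```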


2020 ◽  
Vol 245 ◽  
pp. 05042
Author(s):  
Miha Muškinja ◽  
Paolo Calafiura ◽  
Charles Leggett ◽  
Illya Shapoval ◽  
Vakho Tsulaia

The ATLAS experiment has successfully integrated High-Performance Computing (HPC) resources in its production system. Unlike the current generation of HPC systems, and the LHC computing grid, the next generation of supercomputers is expected to be extremely heterogeneous in nature: different systems will have radically different architectures, and most of them will provide partitions optimized for different kinds of workloads. In this work we explore the applicability of concepts and tools realized in Ray (the high-performance distributed execution framework targeting large-scale machine learning applications) to ATLAS event throughput optimization on heterogeneous distributed resources, ranging from traditional grid clusters to Exascale computers. We present a prototype of Raythena, a Ray-based implementation of the ATLAS Event Service (AES), a fine-grained event processing workflow aimed at improving the efficiency of ATLAS workflows on opportunistic resources, specifically HPCs. The AES is implemented as an event processing task farm that distributes packets of events to several worker processes running on multiple nodes. Each worker in the task farm runs an event-processing application (Athena) as a daemon. The whole system is orchestrated by Ray, which assigns work in a distributed, possibly heterogeneous, environment. For all its flexibility, the AES implementation is currently composed of multiple separate layers that communicate through ad-hoc command-line and file-based interfaces. The goal of Raythena is to integrate these layers through a feature-rich, efficient application framework. Besides increasing usability and robustness, a vertically integrated scheduler will enable us to explore advanced concepts such as dynamic shaping of workflows to exploit currently available resources, particularly on heterogeneous systems.
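Raythena's actual implementation is not shown here; the minimal Ray sketch below only illustrates the task-farm pattern the abstract describes, with a driver handing packets of events to remote worker tasks and gathering the results.

```python
import ray

ray.init()  # on an HPC allocation this would connect to a multi-node Ray cluster

@ray.remote
def process_event_range(start, count):
    """Stand-in for an Athena worker processing one packet of events."""
    return {"first_event": start, "processed": count}

# Driver: distribute packets of events across whatever workers Ray schedules.
packet_size = 100
futures = [process_event_range.remote(i * packet_size, packet_size) for i in range(32)]
results = ray.get(futures)
print(sum(r["processed"] for r in results))  # 3200 events handled by the task farm
ray.shutdown()
```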

