Calculation of data processing time for an acyclic wavefront array processor

Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental problem for data-parallel frameworks such as Hadoop and Spark. This problem is exacerbated in multicore server-based clusters, where multiple tasks running on the same server compete for the server’s network bandwidth. Existing approaches address this problem by scheduling computational tasks near the input data while considering the server’s free time, data placements, and data transfer costs. However, such approaches usually set identical values for data transfer costs, even though a multicore server’s data transfer cost increases with the number of data-remote tasks. As a result, they minimize data-processing time ineffectively. As a solution, we propose DynDL (Dynamic Data Locality), a novel data-locality-aware task-scheduling model that handles dynamic data transfer costs for multicore servers. DynDL offers greater flexibility than existing approaches by using a set of non-decreasing functions to evaluate dynamic data transfer costs. We also propose online and offline algorithms (based on DynDL) that minimize data-processing time and adaptively adjust data locality. Although DynDL is NP-complete (nondeterministic polynomial-complete), we prove that the offline algorithm runs in quadratic time and generates optimal results for DynDL’s specific uses. Using a series of simulations and real-world executions, we show that our algorithms achieve 30% better data-processing time than algorithms that do not consider dynamic data transfer costs. Moreover, they can adaptively adjust data localities based on the server’s free time, data placement, and network bandwidth, and can schedule tens of thousands of tasks in under a second to a few seconds.
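To make the idea of a non-decreasing transfer cost concrete, the following is a minimal sketch of a greedy scheduler. It is not the DynDL online or offline algorithm from the paper; the function name `greedy_schedule`, its parameters, and the cost function are all assumptions chosen for illustration. The only idea taken from the abstract is that the cost of the k-th data-remote task on a server may grow with k.

```python
from typing import Callable, Dict, List, Set, Tuple


def greedy_schedule(
    tasks: List[str],
    replicas: Dict[str, Set[str]],
    free_time: Dict[str, float],
    proc_time: float,
    transfer_cost: Callable[[int], float],
) -> Tuple[Dict[str, str], float]:
    """Assign each task to the server on which it would finish earliest.

    transfer_cost(k) is the (hypothetical) cost of the k-th data-remote task
    placed on one server; it is expected to be non-decreasing in k.
    """
    ready = dict(free_time)                 # time at which each server is idle
    remote_count = {s: 0 for s in ready}    # data-remote tasks per server
    assignment: Dict[str, str] = {}

    for task in tasks:
        holders = replicas.get(task, set())
        best_server, best_finish = None, float("inf")
        for server in sorted(ready):        # sorted for deterministic ties
            local = server in holders
            cost = 0.0 if local else transfer_cost(remote_count[server] + 1)
            finish = ready[server] + cost + proc_time
            if finish < best_finish:
                best_server, best_finish = server, finish
        assignment[task] = best_server
        ready[best_server] = best_finish
        if best_server not in holders:
            remote_count[best_server] += 1

    # Completion time of the busiest server.
    return assignment, max(ready.values())
```

Passing a constant function such as `lambda k: 4.0` reproduces the identical-cost assumption the abstract criticizes, while a growing function such as `lambda k: 4.0 * k` makes the scheduler place fewer data-remote tasks on the same server.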


An efficient solution for fast generation of multi-GNSS real-time products

10.5194/egusphere-egu21-8306 ◽

2021 ◽

Author(s):

Hongjie Zheng ◽

Hanyu Chang ◽

Yongqiang Yuan ◽

Qingyun Wang ◽

Yuhao Li ◽

...

Keyword(s):

Data Processing ◽

Real Time ◽

Processing Time ◽

Efficient Solution ◽

Gpu Computing ◽

Sampling Rate ◽

Precise Orbit Determination ◽

Processing Unit ◽

Processing Efficiency ◽

Central Processing

Global navigation satellite systems (GNSS) play an indispensable role in providing positioning, navigation and timing (PNT) services to global users. Over the past few years, GNSS have developed rapidly, with abundant networks, modern constellations, and multi-frequency observations. To take full advantage of multi-constellation and multi-frequency GNSS, several new mathematical models have been developed, such as multi-frequency ambiguity resolution (AR) and uncombined data processing with raw observations. In addition, new GNSS products including the uncalibrated phase delay (UPD), the observable signal bias (OSB), and the integer recovery clock (IRC) have been generated and provided by analysis centers to support advanced GNSS applications. However, the increasing number of GNSS observations poses a great challenge to the fast generation of multi-constellation and multi-frequency products. In this study, we propose an efficient solution for the fast updating of multi-GNSS real-time products that makes full use of advanced computing techniques. First, instead of the traditional vector operations, the “level-3 operations” (matrix by matrix) of the Basic Linear Algebra Subprograms (BLAS) are used as much as possible in the least-squares (LSQ) processing, which improves efficiency through central processing unit (CPU) optimization and faster memory data transmission. Furthermore, most steps of multi-GNSS data processing are transformed from serial mode to parallel mode to take advantage of the multi-core CPU architecture and graphics processing unit (GPU) computing resources. Moreover, we choose the OpenBLAS library for matrix computation because it performs well in a parallel environment. The proposed method is then validated on a 3.30 GHz AMD CPU with 6 cores. The results demonstrate that the proposed method can substantially improve the processing efficiency of multi-GNSS product generation. For the precise orbit determination (POD) solution with 150 ground stations and 128 satellites (GPS/BDS/Galileo/GLONASS/QZSS) in ionosphere-free (IF) mode, the processing time can be shortened from 50 to 10 minutes, which can guarantee the hourly updating of multi-GNSS ultra-rapid orbit products. The processing time of uncombined POD can also be reduced by about 80%. Meanwhile, the multi-GNSS real-time clock products can easily be generated at a 5-second sampling rate or even higher. In addition, the processing efficiency of UPD and OSB products can be increased by a factor of 4 to 6.
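The contrast between vector operations and level-3 BLAS operations in least-squares processing can be illustrated with the normal equations. The notation below is an assumption for illustration and is not taken from the abstract: A is the m-by-n design matrix with rows a_i^T, P = diag(p_1, ..., p_m) is a diagonal weight matrix, l is the observation vector, and A_k, P_k are blocks of consecutive rows. Both sums yield the same matrix N; only the granularity of the update differs.

```latex
% Assumed notation: A (m x n) design matrix, P diagonal weights, l observations.
\begin{align}
N &= A^{\mathsf T} P A, \qquad b = A^{\mathsf T} P\, l, \qquad \hat{x} = N^{-1} b \\
N &= \sum_{i=1}^{m} p_i\, a_i a_i^{\mathsf T}
   && \text{(row by row: rank-1 updates, level-2 BLAS)} \\
N &= \sum_{k=1}^{K} A_k^{\mathsf T} P_k A_k
   && \text{(block by block: matrix products, level-3 BLAS)}
\end{align}
```

The blocked form performs more arithmetic per byte loaded from memory, which is the usual reason matrix-by-matrix routines run faster than repeated vector updates on cached CPUs.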


Real-Time Maritime Traffic Anomaly Detection Based on Sensors and History Data Embedding

Sensors ◽

10.3390/s19173782 ◽

2019 ◽

Vol 19 (17) ◽

pp. 3782 ◽

Cited By ~ 3

Author(s):

Julius Venskus ◽

Povilas Treigys ◽

Jolita Bernatavičienė ◽

Gintautas Tamulevičius ◽

Viktor Medvedev

Keyword(s):

Neural Network ◽

Data Processing ◽

Processing Time ◽

Heavy Traffic ◽

Sensor Data ◽

Identification System ◽

Automated Identification ◽

Maritime Traffic ◽

Abnormal Movements ◽

The Neural Network

The automated identification system of vessel movements receives a huge amount of multivariate, heterogeneous sensor data, which should be analyzed to make proper and timely decisions on vessel movements. The large number of vessels makes it difficult and time-consuming to detect abnormalities, so rapid-response algorithms should be developed for a decision support system to identify abnormal movements of vessels in areas of heavy traffic. This paper extends the previous study on a self-organizing map application for processing sensor stream data received by the maritime automated identification system. The more data about the vessel’s movement is registered and submitted to the algorithm, the higher the accuracy of the algorithm should be. However, this cannot be guaranteed without an effective retraining strategy with respect to precision and data processing time. In addition, retraining ensures the integration of the latest vessel movement data, which reflects the actual conditions and context. To maintain the quality of the algorithm’s results, data batching strategies for neural network retraining to detect anomalies in streaming maritime traffic data were investigated. The effectiveness of the strategies in terms of modeling precision and data processing time was estimated on real sensor data. The obtained results show that the neural network retraining time can be shortened by half while the sensitivity and precision change only slightly.


Retraction: Parallel Algorithm for Reduction of Data Processing Time in Big Data (Journal of Physics: Conference Series 1432 012095)

Journal of Physics Conference Series ◽

10.1088/1742-6596/1432/1/012110 ◽

2020 ◽

Vol 1432 ◽

pp. 012110

Author(s):

Jesús Silva ◽

Hugo Hernández Palma ◽

William Niebles Núñez ◽

David Ovallos-Gazabon ◽

Noel Varela

Keyword(s):

Big Data ◽

Data Processing ◽

Parallel Algorithm ◽

Processing Time ◽

Conference Series


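Penerapan Metode Rational Unified Process pada Aplikasi Monitoring Periodic Service Alat Berat [Application of the Rational Unified Process Method to a Periodic Service Monitoring Application for Heavy Machinery]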

Indonesian Journal of Applied Informatics ◽

10.20961/ijai.v1i2.11002 ◽

2017 ◽

Vol 1 (2) ◽

pp. 1

Author(s):

Putri Kusuma Wardani

Keyword(s):

Data Processing ◽

Customer Service ◽

Human Error ◽

Processing Time ◽

System Development ◽

General Information ◽

Heavy Machinery ◽

Unified Process ◽

Monitoring Application

Product Support is one of the company's divisions that delivers technical support to customers, especially after-sales service. Commonly, the data obtained by the product support division at PT Hexindo Adiperkasa Tbk is still general information about the units owned by the company. This data must be analyzed and then reprocessed using an application such as Excel spreadsheets to determine the periodic service of heavy machinery in each period, so data processing takes relatively long and is prone to human error. In addition, the data processing tends to be ineffective. This research aims to design an "Application Monitoring of Periodic Services Heavy Machinery" that helps companies better monitor the periodic service of heavy machinery, using the PHP programming language and MySQL as the database. The result of this research is a monitoring application, built with the RUP (Rational Unified Process) system development method, that determines the periodic service schedule of heavy machinery so that information processing is more accurate, effective, and efficient and can improve the quality of customer service.


Big Data Deployment for an Efficient Resource Prerequisite Job

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8163 ◽

2019 ◽

Vol 16 (8) ◽

pp. 3211-3215 ◽

Cited By ~ 1

Author(s):

S. Prince Mary ◽

D. Usha Nandini ◽

B. Ankayarkanni ◽

R. Sathyabama Krishna

Keyword(s):

Big Data ◽

Data Processing ◽

Processing Time ◽

Map Reduce ◽

Efficient Resource ◽

Server System ◽

Distributed Server

Integrating cloud and big data is a difficult and challenging task, and so is finding the number of resources a job needs to complete. Virtualization is therefore implemented; it involves three phases: map reduce, shuffle, and reduce. Much research has already been done; earlier work applied heterogeneous MapReduce applications and used the least-work-left policy for distributed server systems. In this paper, we discuss how virtualization is used for Hadoop jobs for effective data processing and for finding the processing time of a job, and a balance partition algorithm is used. The main objective is to implement virtualization on our local machines.
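The phases named in the abstract can be illustrated with a single-process sketch of the MapReduce programming model, read here as map, shuffle, and reduce (an interpretation of the abstract's wording). This is plain Python rather than Hadoop code, and the word-count job and the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit one (word, 1) pair per word in the input."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In a real cluster, the map and reduce calls run on different machines and the shuffle moves data across the network, which is why the number of resources assigned to each phase affects the job's processing time.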


Proposing a load balancing algorithm with an integrative approach to reduce response time and service process time in data centers

Brazilian Journal of Operations & Production Management ◽

10.14488/bjopm.2019.v16.n4.a8 ◽

2019 ◽

Vol 16 (4) ◽

pp. 627-637

Author(s):

Sanaz Hosseinzadeh Sabeti ◽

Maryam Mollabgher

Keyword(s):

Load Balancing ◽

Data Processing ◽

Response Time ◽

Virtual Machine ◽

Data Center ◽

Processing Time ◽

Data Centers ◽

Virtual Machines ◽

Overall Response ◽

Load Balancing Algorithm

Goal: Load balancing policies often map workloads onto virtual machines and seek to achieve their goals by creating an almost equal level of workload on every virtual machine. In this research, a hybrid load balancing algorithm is proposed with the aim of reducing response time and processing time. Design / Methodology / Approach: The proposed algorithm performs load balancing using a table that includes the status indicators of virtual machines and the task list allocated to each virtual machine. Response time and processing time in data centers are evaluated for four algorithms: ESCE, Throttled, Round Robin, and the proposed algorithm. Results: The overall response time and data processing time in the data center are shorter with the proposed algorithm than with the other algorithms. The overall response time results for all algorithms show that the proposed algorithm improves response time by 12.28% compared to the Round Robin algorithm, 9.1% compared to the Throttled algorithm, and 4.86% compared to the ESCE algorithm. Limitations of the investigation: Due to time and technical limitations, load balancing has not been pursued with further goals, such as lowering costs and increasing productivity. Practical implications: Implementing a hybrid load factor policy can improve response time and processing time. Load balancing distributes the traffic load properly between virtual machines and prevents bottlenecks, which is effective in increasing customer responsiveness. Finally, improving response time increases the satisfaction of cloud users and the productivity of computing resources. Originality/Value: This research can be effective in optimizing the existing algorithms and takes a step towards further research in this regard.
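A table of virtual machine status indicators and per-machine task lists, as described in the abstract, could be kept along the following lines. This is a minimal sketch and not the authors' hybrid algorithm: the names `VmEntry`, `assign`, and `complete`, the fixed `capacity` threshold, and the fewest-tasks selection rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VmEntry:
    """One row of the (hypothetical) allocation table."""
    available: bool = True
    tasks: List[str] = field(default_factory=list)


def assign(table: Dict[str, VmEntry], task: str, capacity: int) -> str:
    """Place a task on the available VM that currently holds the fewest tasks."""
    candidates = [(len(e.tasks), name) for name, e in table.items() if e.available]
    if not candidates:
        raise RuntimeError("no virtual machine is available")
    _, name = min(candidates)               # ties broken by VM name
    entry = table[name]
    entry.tasks.append(task)
    # Mark the VM busy once it reaches the assumed capacity.
    entry.available = len(entry.tasks) < capacity
    return name


def complete(table: Dict[str, VmEntry], vm: str, task: str) -> None:
    """Remove a finished task and mark the VM as available again."""
    table[vm].tasks.remove(task)
    table[vm].available = True
```

Checking the status flag first mirrors a throttled policy, while choosing the shortest task list spreads the load evenly; combining the two is one plausible reading of "hybrid" here.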


Vertical Data Processing for Mining Big Data: A Predicate Tree Approach

10.29007/db8n ◽

2019 ◽

Author(s):

Mohammad Hossain ◽

Maninder Singh ◽

Sameer Abufardeh

Keyword(s):

Data Mining ◽

Big Data ◽

Data Processing ◽

Processing Time ◽

Traditional Approach ◽

Critical Factor ◽

Boolean Operations ◽

Data Mining Algorithms ◽

Vertical Data ◽

Big Data Application

Time is a critical factor in processing a very large volume of data, a.k.a. ‘Big Data’. Many existing data mining algorithms (supervised and unsupervised) become futile because of the ubiquitous use of horizontal processing, i.e. row-by-row processing of stored data. Processing time for big data is further exacerbated by its high dimensionality (number of features) and high cardinality (number of records). To address this processing-time issue, we propose a vertical approach with predicate trees (pTree). Our approach structures data into columns of bit slices, which range in number from a few to hundreds and are processed vertically, i.e. column by column. We tested and compared our vertical approach with the traditional (horizontal) approach using three basic Boolean operations, namely addition, subtraction and multiplication, with 10 data sizes. The data size ranged from half a billion bits to 5 billion bits. The results are analyzed with respect to processing time and speed gain for both approaches. The results show that our vertical approach outperformed the traditional approach for all Boolean operations (add, subtract and multiply) across all data sizes, with a speed gain between 24% and 96%. We conclude from our results that our approach, being in a data-mining-ready format, is best suited to operations involving complex computations in big data applications, where it achieves significant speed gain.
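Column-by-column processing of bit slices can be sketched as follows. The example stores each bit position of a numeric column as one Python integer whose r-th bit belongs to record r, then adds two columns with a ripple-carry adder that touches every record at once per bit position. It uses plain, uncompressed bit slices rather than the pTree structure of the paper, and the helper names `to_slices`, `from_slices`, and `add_vertical` are assumptions for illustration.

```python
from typing import List


def to_slices(values: List[int], width: int) -> List[int]:
    """Build bit slices: bit r of slices[b] is bit b of values[r] (b = 0 is the LSB)."""
    slices = [0] * width
    for r, value in enumerate(values):
        for b in range(width):
            if (value >> b) & 1:
                slices[b] |= 1 << r
    return slices


def from_slices(slices: List[int], count: int) -> List[int]:
    """Rebuild the row-oriented values from the bit slices."""
    return [
        sum(((s >> r) & 1) << b for b, s in enumerate(slices))
        for r in range(count)
    ]


def add_vertical(a: List[int], b: List[int]) -> List[int]:
    """Add two columns slice by slice using only AND, OR and XOR."""
    carry = 0
    out = []
    for sa, sb in zip(a, b):
        out.append(sa ^ sb ^ carry)                 # sum bit for all records
        carry = (sa & sb) | (carry & (sa ^ sb))     # carry bit for all records
    out.append(carry)                               # final carry-out slice
    return out
```

The loop in `add_vertical` runs once per bit position, not once per record, so its iteration count depends on the bit width of the column rather than on the number of rows.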


On an Enhancement of XML Applied for Mobile E-Commerce

Journal of Electronic Commerce in Organizations ◽

10.4018/jeco.2012070102 ◽

2012 ◽

Vol 10 (3) ◽

pp. 13-26

Author(s):

Xiaomin Zhu ◽

Zhongxiang He ◽

Shengbo Shi

Keyword(s):

Data Processing ◽

Mobile Computing ◽

Web Service ◽

Size Effects ◽

Processing Time ◽

The Internet ◽

Markup Language ◽

Mobile Web ◽

Xml Documents ◽

Extensible Markup

Extensible Markup Language (XML) is a textual markup language that is becoming more and more important in Internet web services. However, XML has some distinct disadvantages, such as its inherent redundancy, which consumes a great deal of the limited network bandwidth, especially in mobile computing. Given the characteristics of mobile commerce, the handsets’ memory capability and the data processing time are two problems for applying XML. This paper studies an enhancement of XML for application in mobile e-commerce, called SXML (Simple XML), which enhances the XML used in mobile web services. It helps XML producers minimize the size effects of XML, e.g., the size overhead and slow implementation speed. Comprehensive simulations show that SXML can reduce the size of XML documents and the time of implementation, and consequently utilize the bandwidth effectively.


Simplified Mapreduce Mechanism for Large Scale Data Processing

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.8.15211 ◽

2018 ◽

Vol 7 (3.8) ◽

pp. 16

Author(s):

Md Tahsir Ahmed Munna ◽

Shaikh Muhammad Allayear ◽

Mirza Mohtashim Alam ◽

Sheikh Shah Mohammad Motiur Rahman ◽

Md Samadur Rahman ◽

...

Keyword(s):

Data Processing ◽

Large Scale ◽

Processing Time ◽

Programming Model ◽

Data Sets ◽

Hadoop Mapreduce ◽

Large Scale Data ◽

Large Scale Data Processing ◽

Scale Data ◽

Large Scale Data Sets

MapReduce has become a popular programming model for processing large-scale data sets with a parallel, distributed paradigm on a cluster. Hadoop MapReduce is needed especially for large-scale data such as big data processing. In this paper, we modify the Hadoop MapReduce algorithm and implement it to reduce processing time.
