massive data
Recently Published Documents

TOTAL DOCUMENTS: 966 (five years: 302)
H-INDEX: 27 (five years: 7)

2022 ◽  
Vol 16 (4) ◽  
pp. 1-19
Author(s):  
Fei Gao ◽  
Jiada Li ◽  
Yisu Ge ◽  
Jianwen Shao ◽  
Shufang Lu ◽  
...  

With the popularization of visual object tracking (VOT), ever-larger volumes of trajectory data are being collected and are attracting attention in fields such as mobile robotics and intelligent video surveillance. Cleaning the anomalous trajectories hidden in this massive data has become a research hotspot: anomalous trajectories must be detected and removed before the trajectory data can be used effectively. This article proposes a Trajectory Evaluator by Sub-tracks (TES) for detecting VOT-based anomalous trajectories. A Feature of Anomalousness is defined as the feature vector fed to a classifier that filters out tracklet anomalies and identity-switch anomalies; it comprises a Feature of Anomalous Pose and a Feature of Anomalous Sub-tracks (FAS). In comparative experiments, TES achieves better results across different scenes than state-of-the-art methods, and FAS outperforms point flow, least-squares fitting, and Chebyshev polynomial fitting. The results verify that TES is more accurate and effective and supports sub-track-level trajectory analysis.
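
As a rough illustration of the sub-track idea (the paper's exact feature definitions and classifier are not reproduced here), the sketch below splits a trajectory into overlapping sub-tracks, computes simple kinematic features per sub-track, and filters whole trajectories with a binary classifier. The function names, features, and classifier choice are illustrative assumptions, not the authors' TES implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def split_subtracks(track, window=10):
    """Split an (N, 2) array of (x, y) points into overlapping sub-tracks."""
    return [track[i:i + window] for i in range(0, len(track) - window + 1, window // 2)]

def subtrack_features(sub):
    """Simple kinematic features for one sub-track: step statistics and turning angles."""
    steps = np.diff(sub, axis=0)                    # displacement vectors between frames
    speed = np.linalg.norm(steps, axis=1)           # per-step speed
    angles = np.arctan2(steps[:, 1], steps[:, 0])
    turn = np.abs(np.diff(angles))                  # heading change between steps
    return np.array([speed.mean(), speed.std(), speed.max(), turn.mean(), turn.max()])

def clean_trajectories(feature_matrix, subtrack_labels, new_tracks):
    """Fit on labeled sub-track features (0 = normal, 1 = anomalous), then keep
    only trajectories whose sub-tracks are all classified as normal."""
    clf = RandomForestClassifier(n_estimators=100).fit(feature_matrix, subtrack_labels)
    kept = []
    for track in new_tracks:
        feats = np.array([subtrack_features(s) for s in split_subtracks(track)])
        if feats.size and clf.predict(feats).max() == 0:  # no anomalous sub-track found
            kept.append(track)
    return kept
```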


2022 ◽  
Vol 14 (2) ◽  
pp. 390
Author(s):  
Dinh Ho Tong Minh ◽  
Yen-Nhi Ngo

Modern Synthetic Aperture Radar (SAR) missions provide unprecedentedly massive interferometric SAR (InSAR) time series, and processing this Big InSAR Data is challenging for long-term monitoring. Because most deformation phenomena develop slowly, the processing scheme can operate on reduced-volume data sets. This paper introduces a novel ComSAR algorithm based on a compression technique that reduces computational effort while robustly maintaining performance. The algorithm divides the massive data into many mini-stacks and then compresses them. The compressed estimator is close to the theoretical Cramer–Rao lower bound under a realistic C-band Sentinel-1 decorrelation scenario. Both persistent and distributed scatterers (PSDS) are exploited in the ComSAR algorithm. Its performance is validated via simulation and via application to Sentinel-1 data to map land subsidence over the Vauvert salt-mine area, France, where ComSAR consistently outperforms the state-of-the-art PSDS technique. We release our PSDS and ComSAR algorithms as an open-source TomoSAR package and, to make it more practical, build on other open-source projects so that users can apply the PSDS and ComSAR methods in an end-to-end processing chain. To our knowledge, TomoSAR is the first public-domain tool that jointly handles PS and DS targets.
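
The mini-stack compression can be pictured as follows: within each small block of acquisitions, the pixel stack is projected onto the dominant eigenvector of its sample covariance, leaving one "virtual" image per block. This is only a minimal sketch of the general compression idea under assumed array shapes; the actual ComSAR estimator in the TomoSAR package differs in detail.

```python
import numpy as np

def compress_ministacks(slc, size=10):
    """Compress a (T, P) complex SLC time series (T acquisitions, P pixels)
    into one virtual image per mini-stack of `size` acquisitions."""
    T, P = slc.shape
    compressed = []
    for start in range(0, T, size):
        block = slc[start:start + size]           # (t, P) mini-stack
        cov = block @ block.conj().T / P          # (t, t) sample covariance
        w, v = np.linalg.eigh(cov)                # Hermitian eigendecomposition, ascending
        leading = v[:, -1]                        # dominant eigenvector
        compressed.append(leading.conj() @ block) # one compressed image of shape (P,)
    return np.array(compressed)
```

Downstream phase estimation then runs on roughly T/size compressed images instead of all T acquisitions, which is where the computational saving comes from.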


2022 ◽  
Author(s):  
Rabeeha Fazal ◽  
Munam Ali Shah ◽  
Hasan Ali Khattak ◽  
Hafiz Tayyab Rauf ◽  
Fadi Al-Turjman

2022 ◽  
pp. 41-67
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Machine learning (ML), neural networks (NNs), evolutionary algorithms (EAs), fuzzy systems (FSs), and computer science more broadly have been prominent and significant for many years. They have been applied to many different areas and have contributed much to the development of large corporations and organizations, which in turn generate vast amounts of information and massive data sets (MDSs). These big data sets (BDSs) pose challenges for many commercial applications and research efforts, and many ML, NN, EA, FS, and computer-science algorithms have therefore been developed to handle them successfully. To support this process, this chapter surveys the NN algorithms applicable to large-scale data sets (LSDSs). Finally, it presents a novel NN model for BDSs in both a sequential environment (SE) and a distributed network environment (DNE).
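
As a concrete example of what a sequential environment implies for a neural model (the chapter's own models are not reproduced here), the sketch below streams mini-batches from chunked files so that only one batch is ever resident in memory, making the total data set size unbounded. The file layout and the single-layer logistic model are illustrative assumptions.

```python
import numpy as np

def stream_batches(path_pattern, n_chunks, batch=256):
    """Yield mini-batches from data stored in chunked .npz files (hypothetical layout)."""
    for i in range(n_chunks):
        chunk = np.load(path_pattern.format(i))   # one chunk fits in memory at a time
        X, y = chunk["X"], chunk["y"]
        for j in range(0, len(X), batch):
            yield X[j:j + batch], y[j:j + batch]

def train_sequential(path_pattern, n_chunks, dim, lr=0.01):
    """One pass of mini-batch SGD for a logistic model over the streamed data."""
    w = np.zeros(dim)
    for X, y in stream_batches(path_pattern, n_chunks):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)          # gradient step on the current batch
    return w
```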


2022 ◽  
pp. 979-992
Author(s):  
Pavani Konagala

A large volume of data is stored electronically, so large that its total volume is difficult to measure. It comes from many sources: stock exchanges may generate terabytes of data every day, Facebook may require about a petabyte of storage, and internet archives may hold up to two petabytes of data. Managing such data with relational database management systems is very difficult, and with massive data, reading from and writing to disk takes more time, so the storage and analysis of this data has become a major problem. Big data provides the solution, specifying methods to store and analyze very large data sets. This chapter presents a brief study of big data techniques for analyzing such data, including a broad treatment of Hadoop's characteristics, the Hadoop architecture, the advantages of big data, and the big data ecosystem. Further, the chapter includes a comprehensive study of Apache Hive for querying U.S. government health-related and death data.
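
As an illustration of the kind of Hive analysis the chapter describes, a query over death records might be submitted from Python via PyHive. The table name, columns, and connection parameters below are hypothetical (the chapter's actual schema and cluster setup are not shown here), and the sketch assumes PyHive is installed and a HiveServer2 endpoint is reachable.

```python
from pyhive import hive  # assumes a running HiveServer2 instance

# Hypothetical table of U.S. death records; the real schema may differ.
QUERY = """
    SELECT cause_of_death, COUNT(*) AS n
    FROM deaths_us
    WHERE year = 2015
    GROUP BY cause_of_death
    ORDER BY n DESC
    LIMIT 10
"""

conn = hive.Connection(host="localhost", port=10000, database="health")
cursor = conn.cursor()
cursor.execute(QUERY)   # Hive compiles the query into jobs that scan data in HDFS
for cause, n in cursor.fetchall():
    print(cause, n)
```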


2021 ◽  
Vol 2021 ◽  
pp. 1-25
Author(s):  
Shuoben Bi ◽  
Ruizhuang Xu ◽  
Aili Liu ◽  
Luye Wang ◽  
Lei Wan

Because density-based clustering is sensitive to its input data, which limits its computing space and timeliness, a new method for mining taxi passenger hotspots is proposed based on a grid information-entropy clustering algorithm. This paper selects representative geographical areas of Nanjing and Beijing as study areas and uses information entropy and aggregation degree to analyze the distribution of passenger pickup points. The algorithm computes over a grid instead of the original trajectory data to mine taxi passenger hotspots. Comparative analysis of pickup-point data from Nanjing and Beijing shows that the experimental results are consistent with the cities' actual passenger hotspots, verifying the algorithm's effectiveness. The method overcomes the computing-space and timeliness limitations of density-based clustering, reduces the volume of data to be processed, and offers greater flexibility for processing and analyzing massive data. The results can provide an important scientific basis for urban traffic guidance and urban management.
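
The grid idea can be sketched in a few lines: bin pickup coordinates into regular cells, rank cells by count, and use the entropy of the count distribution to summarize how concentrated the pickups are. This is a simplified stand-in for the paper's entropy-and-aggregation-degree measures; the cell size and ranking rule are assumptions.

```python
import numpy as np

def grid_hotspots(points, cell=0.005, top_k=20):
    """Mine hotspot cells from an (N, 2) array of (lon, lat) pickup points.

    Returns the top-k cell corners, their pickup counts, and the entropy of
    the grid's count distribution (lower entropy = more concentrated pickups).
    """
    ij = np.floor(points / cell).astype(int)               # grid index per point
    cells, counts = np.unique(ij, axis=0, return_counts=True)
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log2(p))                      # concentration summary
    order = np.argsort(counts)[::-1][:top_k]               # densest cells first
    return cells[order] * cell, counts[order], entropy
```

Because only per-cell counts are retained, the working set scales with the number of occupied cells rather than with the raw trajectory volume.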


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Mitra Sadat Lavasani ◽  
Nahid Raeisi Ardali ◽  
Rahmat Sotudeh-Gharebagh ◽  
Reza Zarghami ◽  
János Abonyi ◽  
...  

Abstract Big data is an expression for massive data sets, consisting of both structured and unstructured data, that are particularly difficult to store, analyze, and visualize. Big data analytics has the potential to help companies and organizations improve operations and to disclose hidden patterns and correlations, enabling faster and more intelligent decisions. This article provides useful information on this emerging and promising field for companies, industries, and researchers seeking a richer and deeper insight into its advancements. It first presents an overview of big data content, key characteristics, and related topics, then gives a systematic review of available big data techniques and analytics, and categorizes the available big data analytics tools and platforms. The article also discusses recent applications of big data in the chemical industries to deepen understanding and encourage its adoption in their engineering processes. Finally, by emphasizing the adoption of big data analytics in various areas of process engineering, it aims to provide a practical vision of big data.


2021 ◽  
Vol 258 (1) ◽  
pp. 1 ◽  
Author(s):  
Federica B. Bianco ◽  
Željko Ivezić ◽  
R. Lynne Jones ◽  
Melissa L. Graham ◽  
Phil Marshall ◽  
...  

Abstract Vera C. Rubin Observatory is a ground-based astronomical facility under construction, a joint project of the National Science Foundation and the U.S. Department of Energy, designed to conduct a multipurpose 10 yr optical survey of the Southern Hemisphere sky: the Legacy Survey of Space and Time. Significant flexibility in survey strategy remains within the constraints imposed by the core science goals of probing dark energy and dark matter, cataloging the solar system, exploring the transient optical sky, and mapping the Milky Way. The survey's massive data throughput will be transformational for many other astrophysics domains, and Rubin's data access policy sets the stage for a huge community of potential users. To ensure that the survey's science potential is maximized while serving as broad a community as possible, Rubin Observatory has involved the scientific community at large in setting and refining the details of the observing strategy. This paper details the motivation, history, and decision-making process of this strategy optimization, giving context to the science-driven proposals and recommendations for the survey strategy included in this Focus Issue.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Fangpeng Ming ◽  
Liang Tan ◽  
Xiaofan Cheng

Big data has been developing for nearly a decade, and the volume of information on the network is exploding. Faced with such complex and massive data, it is difficult for people to find the information they need quickly, and recommendation algorithms have become one of the important methods for addressing this information overload. In particular, the rise of the e-commerce industry has driven the development of recommendation algorithms. Traditional single recommendation algorithms often suffer from problems such as cold start, data sparsity, and long-tail items, whereas hybrid recommendation algorithms can avoid some of the drawbacks of any single algorithm. To address these problems, this paper proposes IA-CN, a hybrid recommendation algorithm based on deep learning that compensates for the shortcomings of a single collaborative model. The algorithm first uses an integration strategy to fuse user-based and item-based collaborative filtering and to generalize and classify the output results; improved deep-learning techniques then capture deeper, more abstract nonlinear interactions between users and items. Finally, experiments comparing the algorithm with a benchmark on the Amazon item-rating dataset show that IA-CN achieves better rating-prediction performance on the test set.
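
IA-CN itself is not reproduced here; the sketch below shows only the first stage of such a hybrid, blending user-based and item-based collaborative-filtering scores for a ratings matrix, with `alpha` as an assumed fusion weight. The deep-learning stage that IA-CN layers on top is omitted.

```python
import numpy as np

def cosine_sim(M):
    """Row-wise cosine similarity with zero-safe normalization."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    N = M / np.maximum(norms, 1e-12)
    return N @ N.T

def hybrid_predict(R, alpha=0.5):
    """Blend user-based and item-based CF scores for a ratings matrix R (users x items).

    Zeros in R mark unrated entries; `alpha` weights the two collaborative signals.
    """
    Su = cosine_sim(R)             # user-user similarity
    Si = cosine_sim(R.T)           # item-item similarity
    user_based = Su @ R / np.maximum(np.abs(Su).sum(1, keepdims=True), 1e-12)
    item_based = R @ Si / np.maximum(np.abs(Si).sum(0, keepdims=True), 1e-12)
    return alpha * user_based + (1 - alpha) * item_based
```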

