Preliminary assessment of the pragmatic value of information in the classification problem based on deep neural networks

2021
Vol 16 (93)
pp. 9-20
Author(s):
Valery P. Meshalkin
Maxim I. Dli
Andrey Yu. Puchkov
Ekaterina I. Lobaneva
...

A method is proposed for preliminary assessment of the pragmatic value of information in the problem of classifying the state of an object, based on deep recurrent long short-term memory (LSTM) networks. The purpose of the study is to develop a method for predicting the state of a controlled object while minimizing the number of prognostic parameters used, through a preliminary assessment of the pragmatic value of information. This task is especially urgent when processing big data, which is characterized not only by significant volumes of incoming information but also by its arrival rate and variety of formats. Big data is now generated in almost all areas of activity due to the widespread introduction of the Internet of Things. The method is implemented as a two-level scheme for processing input information. At the first level, a Random Forest machine learning algorithm is used; it has significantly fewer adjustable parameters than the recurrent neural network used at the second level for the final, more accurate classification of the state of the controlled object or process. Random Forest was chosen for its ability to assess the importance of variables in regression and classification problems, which is exploited at the first level to determine the pragmatic value of the input information: a parameter reflecting this value is selected, the input variables are ranked by importance, and the top-ranked variables are used to form the training datasets for the recurrent network. The proposed data processing method with a preliminary assessment of the pragmatic value of information was implemented as a MATLAB program and demonstrated its efficiency in an experiment on model data.
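As an illustration of the two-level scheme, the sketch below approximates it in Python rather than the authors' MATLAB implementation: scikit-learn's Random Forest supplies the first-level importance ranking, and a small PyTorch LSTM serves as the second-level classifier. The synthetic data, the number of retained variables, and all hyperparameters are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the two-level scheme (illustrative, not the authors' MATLAB code).
# Level 1: Random Forest ranks input variables by importance.
# Level 2: an LSTM classifier is trained only on the top-ranked variables.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype(np.float32)   # 20 candidate prognostic parameters
y = (X[:, 3] + X[:, 7] > 0).astype(np.int64)         # synthetic object-state labels

# Level 1: estimate the pragmatic value of each input via RF feature importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:5]  # keep the 5 most important inputs

# Level 2: LSTM over the selected inputs, treated as a length-5 sequence of scalars.
class LSTMClassifier(nn.Module):
    def __init__(self, hidden=16, classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)
    def forward(self, x):                            # x: (batch, seq_len, 1)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])

model = LSTMClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
xb = torch.from_numpy(X[:, top]).unsqueeze(-1)       # (1000, 5, 1)
yb = torch.from_numpy(y)
for _ in range(50):                                  # short full-batch training loop
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(xb), yb)
    loss.backward()
    opt.step()
```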

Entropy
2021
Vol 23 (7)
pp. 859
Author(s):  
Abdulaziz O. AlQabbany
Aqil M. Azmi

We are living in the age of big data, a majority of which arrives as stream data. Real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data's underlying distribution, is a significant issue, especially when learning from data streams; it requires learners to adapt to dynamic changes. Random forest is an ensemble approach widely used in classical non-streaming settings of machine learning applications, while the Adaptive Random Forest (ARF) is a stream learning algorithm that has shown promising results in terms of accuracy and its ability to deal with various types of drift. The continuity of incoming instances allows their binomial resampling distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms' efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each exhibiting a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value of ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that the proposed enhancement yields considerable improvement in most situations.
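The core mechanism being tuned here is the Poisson(λ) resampling used by online bagging (and hence ARF): each ensemble member trains on each incoming instance k times, with k drawn from Poisson(λ). The abstract does not define ρ precisely, so the sketch below shows only this resampling step in Python, with placeholder base learners and synthetic stream data.

```python
# Sketch of Poisson(lam) online-bagging resampling, the step tuned in the paper.
# Each ensemble member trains on each incoming instance k times, k ~ Poisson(lam);
# lam trades off how much data each member sees against execution time.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
ensemble = [SGDClassifier() for _ in range(10)]      # placeholder base learners
classes = np.array([0, 1])

def learn_one(x, y, lam=1.0):
    """Feed one stream instance to every ensemble member with Poisson(lam) weight."""
    for member in ensemble:
        k = rng.poisson(lam)                         # times this member sees the instance
        for _ in range(k):
            member.partial_fit(x.reshape(1, -1), [y], classes=classes)

def predict_one(x):
    votes = [m.predict(x.reshape(1, -1))[0] for m in ensemble if hasattr(m, "coef_")]
    return max(set(votes), key=votes.count) if votes else 0

# Simulated stream: larger lam means more resampling (and more computation).
for _ in range(500):
    x = rng.normal(size=4)
    y = int(x[0] + x[1] > 0)
    learn_one(x, y, lam=6.0)                         # lam is the empirically tuned parameter
```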


Author(s):  
V. Fartukov
N. Hanov

A data analysis tree for the formation, preprocessing, storage, and protection of data, based on Big Data and Blockchain technologies, has been developed. The developed algorithm allows for classification of data on the state of the field, split testing of the data, forecasting, and machine learning for the implementation of differential irrigation with sprinklers.
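The abstract does not detail the Blockchain component; one plausible reading is that field-state records are stored in a hash chain so that tampering with stored data is detectable. The record layout and field names in the following Python sketch are hypothetical.

```python
# Illustrative hash-chain (blockchain-style) storage of field-state records.
# Each block stores a record plus the hash of the previous block, so any
# modification of stored data breaks the chain and is detectable.
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = [{"index": 0, "record": "genesis", "prev": "0" * 64}]

def append_record(record: dict) -> None:
    chain.append({"index": len(chain), "record": record, "prev": block_hash(chain[-1])})

def verify_chain() -> bool:
    return all(chain[i]["prev"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

# Hypothetical field-state records feeding a differential-irrigation decision.
append_record({"field": "A3", "soil_moisture": 0.21, "ts": "2021-06-01T06:00"})
append_record({"field": "A3", "soil_moisture": 0.18, "ts": "2021-06-01T12:00"})
assert verify_chain()
```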


2019
Vol 11 (5)
pp. 601
Author(s):  
Sajid Pareeth
Poolad Karimi
Mojtaba Shafiei
Charlotte De Fraiture

The increase in irrigated area in the semi-arid regions of Asia and Africa, driven by demand for more food production, is putting pressure on already strained water resources. To manage this situation, monitoring the spatial and temporal dynamics of irrigated land use at the basin level is needed to ensure proper allocation of water. Publicly available satellite data at high spatial resolution and advances in remote sensing techniques offer a viable opportunity. In this study, we developed a new approach using time series of Landsat 8 (L8) data and the Random Forest (RF) machine learning algorithm, introducing a hierarchical post-processing scheme to extract key Land Use Land Cover (LULC) types. We implemented this approach for the Mashhad basin in Iran to develop a LULC map at 15 m spatial resolution with nine classes for the crop year 2015/2016. In addition, five irrigated land use types were extracted for three crop years (2013/2014, 2014/2015, and 2015/2016) using the RF models. The total irrigated area was estimated at 1796.16 km², 1581.7 km², and 1578.26 km² for the cropping years 2013/2014, 2014/2015, and 2015/2016, respectively. The overall accuracy of the final LULC map was 87.2%, with a kappa coefficient of 0.85. The methodology was implemented using open data and open source libraries. The ability of the RF models to extract key LULC types at the basin level shows the usability of such approaches for operational near-real-time monitoring.
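The paper's band composites and hierarchical post-processing scheme are not reproduced here; the Python sketch below shows only the core step under stated assumptions: training an RF on per-pixel Landsat time-series features and reporting overall accuracy and the kappa coefficient. The feature stack and labels are random placeholders.

```python
# Sketch of the core classification step: a Random Forest over per-pixel
# Landsat 8 time-series features, evaluated with overall accuracy and kappa.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder stack: n_pixels x (bands * acquisition dates), e.g. 6 bands x 12 dates.
X = rng.normal(size=(5000, 72)).astype(np.float32)
y = rng.integers(0, 9, size=5000)        # nine LULC classes, as in the paper

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)

print(f"overall accuracy:  {accuracy_score(y_te, pred):.3f}")
print(f"kappa coefficient: {cohen_kappa_score(y_te, pred):.3f}")
```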


2019
Vol 4 (2)
pp. 910-917
Author(s):  
Chao-Chun Chen
Min-Hsiung Hung
Benny Suryajaya
Yu-Chuan Lin
Haw-Ching Yang
...  

Information
2021
Vol 12 (12)
pp. 517
Author(s):  
Rakib Hossen
Md Whaiduzzaman
Mohammed Nasir Uddin
Md. Jahidul Islam
Nuruzzaman Faruqui
...  

The Internet of Things (IoT) has seen a surge in mobile devices with market and technical expansion. IoT networks provide end-to-end connectivity while keeping latency minimal. To reduce delays, efficient data delivery schemes are required for dispersed fog-IoT network orchestrations. We use a Spark-based big data processing scheme (BDPS), built on Spark's resilient distributed datasets (RDDs), as a delay-efficient technique in the fogs for a decentralized heterogeneous network architecture, reinforcing suitable data allocation across IoT devices. We propose the BDPS, based on Spark RDDs in a fog-IoT overlay architecture, to address performance issues across the network orchestration. We evaluate data processing delays from fog-IoT integrated parts using a depth-first-search-based shortest-path node-finding configuration, which outperforms existing shortest-path algorithms, including the Bellman–Ford (BF) algorithm, the Floyd–Warshall (FW) algorithm, the Dijkstra algorithm (DA), and the Apache Hadoop (AH) algorithm, in algorithmic efficiency. The BDPS exhibits lower latency in packet deliveries as well as lower network-overhead uplink activity than BF, DA, FW, and AH, through a map-reduced resilient data distribution mechanism. Overall, the BDPS supports efficient data delivery across the fog-IoT orchestration, executing nodes faster and producing effective results compared to DA, BF, FW, and AH.
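The abstract does not specify the DFS configuration; one hedged interpretation is a depth-first enumeration of simple paths that prunes any partial path already costlier than the best complete path found. The node names and link delays in the Python sketch below are illustrative, not from the paper.

```python
# Hedged interpretation of a DFS-based shortest-path search: enumerate simple
# paths depth-first, pruning branches whose cost already exceeds the best found.
def dfs_shortest_path(graph, src, dst):
    best = {"cost": float("inf"), "path": None}

    def dfs(node, cost, path):
        if cost >= best["cost"]:          # prune: cannot improve on the best path
            return
        if node == dst:
            best["cost"], best["path"] = cost, path
            return
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:           # keep paths simple (no cycles)
                dfs(nxt, cost + w, path + [nxt])

    dfs(src, 0, [src])
    return best["cost"], best["path"]

# Illustrative fog-IoT node graph with link delays as edge weights.
graph = {
    "iot":  {"fog1": 2, "fog2": 5},
    "fog1": {"fog2": 1, "cloud": 7},
    "fog2": {"cloud": 3},
}
print(dfs_shortest_path(graph, "iot", "cloud"))  # -> (6, ['iot', 'fog1', 'fog2', 'cloud'])
```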


2014
Vol 12 (6)
pp. 311-316
Author(s):  
Yoon-Su Jeong
Kun-Hee Han
