Feature Selection for Machine Learning in Big Data

We are in the information age, collecting huge volumes of data from diverse sources in structured, unstructured and semi-structured form, ranging from petabytes to exabytes. Data is an asset, as valuable knowledge and information are hidden in such massive volumes. Data analytics is required to gain deeper insights and identify fine-grained patterns so as to make accurate predictions that improve decision making. Extracting knowledge from data is the task of data analytics, and machine learning forms its core. The increase in the dimensionality of data, both in the number of tuples and in the number of features, poses several challenges to machine learning algorithms. Data is preprocessed before machine learning, and feature selection is one such preprocessing step: it reduces the dimensionality of the data by removing irrelevant features, thereby improving the efficiency and accuracy of a machine learning algorithm. In this paper we study various feature selection mechanisms and analyze whether they can be adopted for sentiment analysis of big data.
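As an illustration of feature selection as a preprocessing step, the following is a minimal sketch of one filter-style approach: rank features by the absolute Pearson correlation with the label and keep the top k. The function names, the toy data and the choice of correlation as the score are assumptions made for this example; the abstract does not say which mechanisms the paper evaluates.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(rows, labels, k):
    """Return the indices of the k features most correlated (in absolute value) with the labels."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(j for _, j in scores[:k])


# Hypothetical toy data: feature 0 tracks the label, feature 1 is constant
# (irrelevant), feature 2 is only weakly related to the label.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.4],
    [4.0, 5.0, 0.2],
]
labels = [0, 0, 1, 1]

selected = select_top_k(rows, labels, k=2)
reduced = [[row[j] for j in selected] for row in rows]
```

Here the constant feature scores zero and is dropped, so the learning algorithm would see two columns instead of three. A filter like this scores each feature on its own; it does not detect features that are only useful in combination.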

The use of Big Data in Machine Learning Algorithm

10.5121/csit.2021.111911 ◽

2021 ◽

Author(s):

Yew Kee Wong

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Data Analytics ◽

Model Building ◽

Learning Algorithm ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Human Intervention ◽

Tools And Techniques

In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Because such data is growing rapidly, solutions need to be studied and provided to handle these datasets and extract value and knowledge from them. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. Such minimal human intervention can be provided using big data analytics, which is the application of advanced analytics techniques to big data. This paper aims to analyse some of the machine learning algorithms and methods that can be applied to big data analysis, as well as the opportunities provided by the application of big data analytics in various decision-making domains.

Recognition Technology of Athlete’s Limb Movement Combined Based on the Integrated Learning Algorithm

Journal of Sensors ◽

10.1155/2021/3057557 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Fei Tan ◽

Xiaoqing Xie

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithm ◽

Human Motion ◽

Machine Learning Algorithms ◽

Support Vector ◽

Recording Device ◽

Table Tennis ◽

Movement Recognition ◽

Random Forest Tree

Human motion recognition based on inertial sensors is a new research direction in the field of pattern recognition. Inertial sensors are placed on the surface of the human body, and the recorded data goes through preprocessing, feature extraction, and feature selection. Finally, the extracted features of human action are classified and recognized. There are many kinds of swing movements in table tennis, and accurately identifying these movement modes is of great significance for swing movement analysis. With the development of artificial intelligence technology, human movement recognition has made many breakthroughs in recent years, from machine learning to deep learning and from wearable sensors to visual sensors. However, there is not much work on movement recognition for table tennis, and the methods still mainly belong to traditional machine learning. Therefore, this paper uses an acceleration sensor as a motion recording device for a table tennis disc and explores the three-axis acceleration data of four common swing motions. Traditional machine learning algorithms (decision tree, random forest, and support vector machine) are used to classify the swing motion, and a classification algorithm based on the idea of ensemble learning is designed. Experimental results show that the ensemble learning algorithm developed in this paper is better than the traditional machine learning algorithms, with an average recognition accuracy of 91%.
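The idea of combining several classifiers can be sketched with hard (majority) voting, shown below. This is only one common way to build an ensemble and is an assumption here: the abstract does not describe how the paper's ensemble is constructed. The swing labels and the per-classifier predictions are hypothetical values invented for the example, not results from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label list by hard voting.

    predictions: one list of labels per classifier, all of the same length.
    On a tie, the label that appears first among the votes for that sample wins.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        # most_common keeps first-seen order among labels with equal counts.
        combined.append(counts.most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the actual label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for four swing samples and three base classifiers.
truth = ["forehand", "backhand", "serve", "push"]
tree_pred = ["forehand", "push", "serve", "push"]
forest_pred = ["forehand", "backhand", "push", "push"]
svm_pred = ["backhand", "backhand", "serve", "serve"]

ensemble_pred = majority_vote([tree_pred, forest_pred, svm_pred])
ensemble_accuracy = accuracy(ensemble_pred, truth)
```

In this toy case each base classifier makes at least one mistake, but no two of them are wrong on the same sample in the same way, so the vote recovers the correct label every time. Voting only helps when the base classifiers make different errors.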

Machine Learning Algorithms for Big Data Analytics

Computational Methods and Data Engineering - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-6876-3_27 ◽

2020 ◽

pp. 359-367

Author(s):

Kumar Rahul ◽

Rohitash Kumar Banyal ◽

Puneet Goswami ◽

Vijay Kumar

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Learning Algorithms ◽

Big Data Analytics ◽

Machine Learning Algorithms

An evaluation of big data analytics in feature selection for long-lead extreme floods forecasting

2016 IEEE 13th International Conference on Networking, Sensing, and Control (ICNSC) ◽

10.1109/icnsc.2016.7479007 ◽

2016 ◽

Cited By ~ 2

Author(s):

Yong Zhuang ◽

Kui Yu ◽

Dawei Wang ◽

Wei Ding

Keyword(s):

Feature Selection ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Extreme Floods ◽

Selection For

Predicting Travel Behaviour of International and Domestic Tourists using Big Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4324.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 1572-1580

Keyword(s):

Machine Learning ◽

Big Data ◽

San Francisco ◽

Data Analytics ◽

Demand Forecasting ◽

Big Data Analytics ◽

Travel Behaviour ◽

Tourism Industry ◽

Machine Learning Algorithms ◽

Data Set

Tourism is one of the most important sectors contributing to the economic growth of India. Big data analytics has recently been applied in the tourism sector for activities such as tourism demand forecasting, prediction of tourists' interests, and identification of tourist attraction elements and behavioural patterns. The major objective of this study is to demonstrate how big data analytics could be applied in predicting the travel behaviour of international and domestic tourists. Machine learning algorithms and techniques are also significant in processing big data. Thus, the combination of machine learning and big data is the state-of-the-art method and has been acclaimed internationally. While big data analytics and its application to the tourism industry have attracted the interest of a few researchers in recent times, there has not been much research in this area, particularly with respect to India. This study intends to describe how big data analytics could be used in forecasting the travel behaviour of Indian tourists. To add value to the research, this study intends to categorize on what grounds tourists choose domestic tourism and on what grounds they choose international tourism. Online datasets of place reviews from the cities of Chicago, Beijing, New York, Dubai, San Francisco, London, New Delhi and Shanghai have been gathered, and an association rule mining based algorithm has been applied to the data set in order to attain the objectives of the study.
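Association rule mining rests on two standard measures, support and confidence, which the sketch below computes directly. It does not reproduce the study's algorithm, which the abstract does not specify. The transactions, item names and helper functions are hypothetical and exist only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical review records, each reduced to a set of tags.
transactions = [
    {"museum", "family", "domestic"},
    {"museum", "family", "domestic"},
    {"beach", "family", "domestic"},
    {"museum", "solo", "international"},
]

family_domestic_support = support(transactions, {"family", "domestic"})
family_to_domestic = confidence(transactions, {"family"}, {"domestic"})
museum_to_domestic = confidence(transactions, {"museum"}, {"domestic"})
```

In this toy data the rule "family implies domestic" holds in every transaction that contains "family", while "museum implies domestic" holds in only two of the three transactions that contain "museum". A mining algorithm would keep rules whose support and confidence exceed chosen thresholds.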

The dynamics of traffic congestion: a specific look into malaysian scenario and the plausible solutions to eradicate it using machine learning

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i2.pp1086-1094 ◽

2019 ◽

Vol 15 (2) ◽

pp. 1086

Author(s):

M. Ali ◽

T. K. Sheng ◽

K. M. Yusof ◽

M. R. Suhaili ◽

N. E. Ghazali ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Big Data ◽

Traffic Congestion ◽

Data Analytics ◽

Big Data Analytics ◽

Active Role ◽

Machine Learning Algorithms ◽

Asia Pacific ◽

The Government

Transportation has been considered the backbone of the economy for many years. Unfortunately, for the past few years, uncontrolled urbanization and inadequate planning have left countries facing the problem of congestion. Congestion is hindering economic growth and also causing environmental issues. This has caused serious concern among the major economies of the world, especially in the Asia-Pacific region. Many countries are playing an active role in eradicating this problem, and some have been quite successful so far. Malaysia, being a major ASEAN economy, is also tackling this huge problem. The authorities are committed to solving the issue. In this regard, leveraging big data analytics to solve the issue has become crucial. The authorities can form a complete, robust framework based on big data analytics and a decision-making process to solve the issue effectively. The work observes traffic data samples and analyzes the accuracy of machine learning algorithms, which helps in decision making. Yet there is a lot to be done if the government is to solve the problem effectively. Arguably, a comprehensive big data transport framework leveraging machine learning is one way to solve the issue.

Improvement of Support Vector Machine Algorithm in Big Data Background

Mathematical Problems in Engineering ◽

10.1155/2021/5594899 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Time Complexity ◽

Dual Problem ◽

Learning Algorithm ◽

Rapid Development ◽

Machine Learning Algorithms ◽

Support Vector ◽

Original Space

With the rapid development of the Internet and of big data analysis technology, data mining has played a positive role in promoting industry and academia. Classification is an important problem in data mining. This paper explores the background and theory of support vector machines (SVM) in data mining classification algorithms and analyzes and summarizes the research status of various improved SVM methods. According to the scale and characteristics of the data, different solution spaces are selected, and the solution of the dual problem is transformed into the classification surface of the original space to improve the algorithm speed. Research Process: Incorporating fuzzy membership into multicore learning, it is found that the time complexity of the original problem is determined by the dimension and the time complexity of the dual problem is determined by the quantity. Since dimension and quantity together constitute the scale of the data, different solution spaces can be chosen based on the scale and characteristics of the data. The algorithm speed can be improved by transforming the solution of the dual problem into the classification surface of the original space. Conclusion: By improving the calculation rate of traditional machine learning algorithms, it is concluded that the accuracy of the fit between the predicted data and the actual values is as high as 98%, which allows traditional machine learning algorithms to meet the requirements of the big data era. The approach can be widely used in the context of big data.
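The selection rule described in the abstract can be written as a very small heuristic: solve the original (primal) problem when the number of features is the smaller quantity, and the dual problem when the number of samples is smaller. The function name, the tie-breaking choice and the example sizes below are assumptions for illustration. The paper's actual criterion may be more refined.

```python
def choose_formulation(n_samples, n_features):
    """Pick the SVM problem whose cost-driving quantity is smaller.

    Following the abstract: the original (primal) problem scales with the
    number of features, the dual problem with the number of samples.
    Ties go to the primal problem (an arbitrary choice for this sketch).
    """
    if n_samples <= 0 or n_features <= 0:
        raise ValueError("n_samples and n_features must be positive")
    return "primal" if n_features <= n_samples else "dual"


# Hypothetical data shapes.
many_rows_few_columns = choose_formulation(n_samples=1_000_000, n_features=50)
few_rows_many_columns = choose_formulation(n_samples=200, n_features=10_000)
```

A tall dataset (many samples, few features) is routed to the primal problem, and a wide one (few samples, many features) to the dual problem.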

Crime Data Forecasting Using Machine Learning and Big Data Analytics

Webology ◽

10.14704/web/v18si04/web18284 ◽

2021 ◽

Vol 18 (Special Issue 04) ◽

pp. 591-606

Author(s):

R. Brindha ◽

Dr. M. Thillaikarasi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Geographical Information ◽

Recursive Feature Elimination ◽

Support Vector ◽

Crime Data

Big data analytics (BDA) is a systematic method that aims to recognize and examine different designs, patterns and trends in big datasets. In this paper, BDA is used for visualization and trend prediction, with exploratory data analysis applied to crime data. Successive facts and patterns have been obtained for cities in California, Washington and Florida using statistical analysis and visualization. Predictive results are reported for the Keras Prophet model, LSTM and neural network models, followed by the Prophet model, which are the existing methods used to forecast crime data with BDA techniques. However, crime increases day by day, which makes it an ever greater challenge to overcome. Some studies ignored the rate of influential aspects, and many were developed with only one or two features. This paper introduces a big data approach to analyze the influential aspects of crime incidents and examines it on New York City. The proposed structure combines dynamic machine learning algorithms with a geographical information system (GIS) to consider the contiguous reasons for crime data. Recursive feature elimination (RFE) is used to select the optimum characteristic data. Gradient boost decision tree (GBDT), logistic regression (LR), support vector machine (SVM) and artificial neural network (ANN) are used to develop the optimum data model. Significant impact features were then reviewed by applying GBDT and GIS. The experimental results illustrate that the combination of GBDT and GIS can identify the crime ranking with higher performance and accuracy than existing methods.
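Recursive feature elimination repeatedly fits a model, drops the feature the model considers least important, and refits on what remains. The sketch below uses a least-squares linear model fitted by gradient descent as the ranking model. That choice, the function names and the toy data are assumptions for this example, since the abstract does not say which estimator drives RFE in the paper.

```python
def fit_linear_weights(rows, targets, epochs=500, learning_rate=0.1):
    """Fit least-squares weights (no intercept) by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(epochs):
        grads = [0.0] * d
        for row, target in zip(rows, targets):
            error = sum(w * x for w, x in zip(weights, row)) - target
            for j in range(d):
                grads[j] += error * row[j] / n
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


def recursive_feature_elimination(rows, targets, n_keep):
    """Drop the feature with the smallest absolute weight until n_keep remain."""
    remaining = list(range(len(rows[0])))
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in rows]
        weights = fit_linear_weights(subset, targets)
        weakest = min(range(len(remaining)), key=lambda i: abs(weights[i]))
        del remaining[weakest]
    return remaining


# Hypothetical toy data: the target is 2 * feature0 - feature2, so
# features 1 and 3 carry no information about it.
rows = [
    [1.0, 1.0, -1.0, 0.0],
    [-1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0, -1.0],
]
targets = [2 * row[0] - row[2] for row in rows]

kept = recursive_feature_elimination(rows, targets, n_keep=2)
```

Ranking by weight magnitude is only meaningful when the features are on comparable scales, which is why the toy values all lie in the range -1 to 1. With real data the features would normally be standardized first.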

Developing Classification Model for Chickpea Types using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a8057.1110120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 5-11

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Research Work ◽

Machine Learning Algorithms ◽

Classification Model ◽

Research Center ◽

Domain Experts ◽

Selection Mechanisms ◽

Type Classification

Ethiopia is the leading producer of chickpea in Africa and among the top ten producers of chickpea in the world. Debre Zeit Agriculture Research Center is a research center in Ethiopia mandated with the improvement of chickpea and other crops. Genome-enabled prediction technologies are trying to transform the classification of chickpea types and upgrade the existing identification paradigm. The identification of chickpea types in Ethiopia still relies on manual methods. Domain experts try to recognize every chickpea type, so the manner and efficiency of identification depend mainly on the skills and experience of those experts, which frequently causes errors and inaccuracy. Most research on the classification and identification of crops has been done outside Ethiopia; for local and emerging varieties, there is a need to design a classification model that assists the selection mechanisms of chickpea, and the accuracy of existing algorithms should be verified and optimized. The main aim of this study is to design a chickpea type classification model using machine learning algorithms. This research work uses a total of 8303 records with 8 features, of which 80% were used for training and 20% for testing. Data preprocessing was done to prepare the dataset for experiments. ANN, SVM and DT were used to build the model. To evaluate the performance of the model, a confusion matrix with accuracy, recall and precision was used. The experimental results show that the best-performing algorithm was the decision tree, which achieved 97.5% accuracy. Agriculture research centers and companies can benefit from the results of this research work. The chickpea type classification model will be applied in the Debre Zeit agriculture research center in Ethiopia as a base to support the experts during the chickpea type identification process. In addition, it enables experts to save time, effort and cost with the support of the identification model. Moreover, this research can also be used as a cornerstone in the area and will be referred to by future researchers in the domain.
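The evaluation setup described above (an 80/20 split, then accuracy, precision and recall from a confusion matrix) can be sketched as follows. The splitting rule, the seed, the class names and the predictions are assumptions for illustration. The abstract gives only the record count and the split ratio, and the numbers produced here are not the paper's results.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split off a test portion."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(actual, predicted, positive):
    """Confusion-matrix counts and derived metrics for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Stand-in record ids matching the dataset size given in the abstract.
record_ids = list(range(8303))
train_ids, test_ids = train_test_split(record_ids, test_fraction=0.2)

# Hypothetical labels and predictions for eight test records.
actual = ["type_a", "type_a", "type_a", "type_b", "type_b", "type_b", "type_b", "type_a"]
predicted = ["type_a", "type_a", "type_b", "type_b", "type_b", "type_a", "type_b", "type_a"]
metrics = binary_metrics(actual, predicted, positive="type_a")
```

With this rounding rule, 8303 records split into 6642 for training and 1661 for testing. For more than two classes, precision and recall would be computed per class in the same way and then averaged.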

Big Data on Machine Learning – A Review

Engineering and Scientific International Journal ◽

10.30726/esij/v8.i3.2021.83018 ◽

2021 ◽

Vol 8 (3) ◽

Author(s):

Balasree K ◽

Dharmarajan K

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Storage ◽

Data Analytics ◽

Rapid Development ◽

Learning Algorithms ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Data Sets ◽

Big Data Technology

With the rapid development of Big Data technology in recent years, this paper discusses the role that Machine Learning (ML) methods and algorithms play in Big Data processing and Big Data analytics. The two fields are developing in ways that complement each other. Big Data: because such datasets are very high in velocity and variety and are growing rapidly, solutions need to be studied and provided to handle them, gain knowledge from them and extract value. Big data analytics involves appropriate data storage and a computational framework, enhanced by scalable Machine Learning algorithms, so that analytics can reveal massive amounts of hidden data and secret correlations. This kind of analytic information helps organizations and companies gain deeper knowledge, develop, and gain an advantage over the competition. Using these analytics, accurate predictions can be made from the data. This paper presents a detailed review of state-of-the-art developments and an overview of the advantages and challenges of Machine Learning algorithms for big data analytics.
