Development Direction of Machine Learning in the Era of Big Data

The core objective of Big Data technology is to extract valuable information from massive volumes of data with widely varying structures. To achieve this goal, Big Data technology must be combined with machine learning. The unique characteristics of Big Data have also brought unprecedented challenges to machine learning. To cope with these challenges, machine learning should focus on developing semi-supervised learning methods, integrated learning with device integration, and transfer learning methods.

Tweets Analysis with Big Data Technology and Machine Learning to Evaluate Smart and Sustainable Urban Mobility Actions in Barcelona

Complex, Intelligent and Software Intensive Systems - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-50454-0_53 ◽

2020 ◽

pp. 510-519

Author(s):

Beniamino Di Martino ◽

Luigi Colucci Cante ◽

Mariangela Graziano ◽

Regina Enrich Sard

Keyword(s):

Machine Learning ◽

Big Data ◽

Urban Mobility ◽

Sustainable Urban Mobility ◽

Big Data Technology

Semantics-Based Document Categorization Employing Semi-Supervised Learning

Advances in Linguistics and Communication Studies - Modern Computational Models of Semantic Discovery in Natural Language ◽

10.4018/978-1-4666-8690-8.ch005 ◽

2015 ◽

pp. 112-140 ◽

Cited By ~ 1

Author(s):

Jan Žižka ◽

František Dařena

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Real World ◽

Supervised Machine Learning ◽

The Internet ◽

Learning Method ◽

Label Information ◽

Document Categorization

The automated categorization of unstructured textual documents according to their semantic content plays an important role, particularly given the ever-growing volume of such data originating from the Internet. Given a sufficient number of labeled examples, a suitable supervised machine learning-based classifier can be trained. When no labels are available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a semi-supervised learning method in which a small set of manually labeled examples improves the categorization process in comparison with clustering, with results comparable to the supervised learning output. For illustration, a real-world dataset from the Internet is used as the input to supervised, unsupervised, and semi-supervised learning. Results are shown for different numbers of starting labeled samples used as “seeds” to automatically label the remaining volume of unlabeled items.
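The seed-based labeling idea in this abstract can be illustrated with a minimal self-training sketch. The chapter's actual algorithm is not given here, so everything below is an assumption made for illustration: the bag-of-words representation, the cosine similarity to per-class centroids, the rule of accepting the most confident half of the pending documents each round, and the hypothetical names `bag_of_words`, `cosine`, and `self_train`.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Represent a document as lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def self_train(seeds, unlabeled):
    """Label `unlabeled` texts starting from a few (text, label) seeds.

    Illustrative assumption: each round, the most confident half of the
    pending documents is labeled and added to the training pool.
    """
    labeled = [(bag_of_words(text), label) for text, label in seeds]
    pending = {text: bag_of_words(text) for text in unlabeled}
    assigned = {}
    while pending:
        # One centroid per class: summed term counts of its labeled documents.
        centroids = {}
        for vec, label in labeled:
            centroids.setdefault(label, Counter()).update(vec)
        scored = []
        for text, vec in pending.items():
            sims = sorted(
                ((cosine(vec, c), label) for label, c in centroids.items()),
                reverse=True,
            )
            best_sim, best_label = sims[0]
            runner_up = sims[1][0] if len(sims) > 1 else 0.0
            # Confidence is the margin between the two closest classes.
            scored.append((best_sim - runner_up, text, best_label))
        scored.sort(reverse=True)
        for _, text, label in scored[: max(1, len(scored) // 2)]:
            assigned[text] = label
            labeled.append((pending.pop(text), label))
    return assigned
```

Calling `self_train` with one or two seed documents per category and a list of unlabeled texts returns a dictionary mapping each text to its propagated label. Varying the number of seeds mirrors, in a very simplified way, the experiment the abstract describes.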

Graph-Based Semi-Supervised Learning With Big Data

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch012 ◽

2020 ◽

pp. 214-244

Author(s):

Prithish Banerjee ◽

Mark Vere Culp ◽

Kenneth Joseph Ryan ◽

George Michailidis

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Learning ◽

Prior Knowledge ◽

Linear Algebra ◽

Real Data ◽

Data Set ◽

Regression Problems ◽

Classification And Regression ◽

Empirical Demonstration

This chapter presents some popular graph-based semi-supervised approaches. These techniques apply to classification and regression problems and can be extended to big data problems using recently developed anchor graph enhancements. The background necessary for understanding this chapter includes linear algebra and optimization. No prior knowledge of machine learning methods is necessary. An empirical demonstration of these techniques is also provided on real benchmark data sets.

How Should Data Science Education Be?

International Journal of Energy Optimization and Engineering ◽

10.4018/ijeoe.2020040103 ◽

2020 ◽

Vol 9 (2) ◽

pp. 25-36

Author(s):

Necmi Gürsakal ◽

Ecem Ozkan ◽

Fırat Melih Yılmaz ◽

Deniz Oktay

Keyword(s):

Machine Learning ◽

Big Data ◽

Science Education ◽

Data Science ◽

Doctoral Programs ◽

Time Data ◽

High Demand ◽

The Core ◽

The World ◽

The Subject

Interest in data science has been increasing in recent years. Data science, which includes mathematics, statistics, big data, machine learning, and deep learning, can be considered the intersection of statistics, mathematics, and computer science. Although the debate about the core area of data science continues, the subject is a huge hit. Universities are seeing high demand for data science and are trying to meet it by opening postgraduate and doctoral programs. Because the field is new, there are significant differences between the data science programs that universities offer. Moreover, because the subject is close to statistics, data science programs are most often opened in statistics departments, which causes further differences between the programs. In this article, we summarize developments in data science education in the world, and in Turkey specifically, and discuss how data science education should be structured at the graduate level.

Using Big Data Technology to Analyze the Development Direction of Internal Audit

Journal of Physics Conference Series ◽

10.1088/1742-6596/1648/4/042040 ◽

2020 ◽

Vol 1648 ◽

pp. 042040

Author(s):

Zuhui Wang

Keyword(s):

Big Data ◽

Internal Audit ◽

Development Direction ◽

Big Data Technology

Typhoon Quantitative Rainfall Prediction from Big Data Analytics by Using the Apache Hadoop Spark Parallel Computing Framework

Atmosphere ◽

10.3390/atmos11080870 ◽

2020 ◽

Vol 11 (8) ◽

pp. 870 ◽

Cited By ~ 1

Author(s):

Chih-Chiang Wei ◽

Tzu-Hao Chou

Keyword(s):

Machine Learning ◽

Big Data ◽

Prediction Models ◽

Processing Unit ◽

Central Processing ◽

Rainfall Prediction ◽

Typhoon Rainfall ◽

Computing Framework ◽

Spark Framework ◽

Big Data Technology

Situated on the main tracks of typhoons in the Northwestern Pacific Ocean, Taiwan frequently encounters disasters from heavy rainfall during typhoons. Accurate and timely typhoon rainfall prediction is therefore a pressing topic. The purpose of this study was to develop a Hadoop Spark distributed framework based on big-data technology to accelerate the computation of typhoon rainfall prediction models. This study used deep neural networks (DNNs) and multiple linear regressions (MLRs) in machine learning to establish rainfall prediction models and evaluate rainfall prediction accuracy. The Hadoop Spark distributed cluster-computing framework was the big-data technology used. The Hadoop Spark framework consisted of the Hadoop Distributed File System, the MapReduce framework, and Spark, which was used as a new-generation technology to improve the efficiency of distributed computing. The research area was Northern Taiwan, where four surface observation stations served as the experimental sites. This study collected 271 typhoon events (from 1961 to 2017). The following results were obtained: (1) in machine-learning computation, prediction errors increased with prediction duration in the DNN and MLR models; and (2) the Hadoop Spark framework was faster than the standalone systems (a single I7 central processing unit (CPU) and a single E3 CPU). When complex computation is required in a model (e.g., DNN model parameter calibration), the big-data-based Hadoop Spark framework can be used to establish highly efficient computation environments. In summary, this study successfully used the big-data Hadoop Spark framework with machine learning to develop rainfall prediction models with effectively improved computing efficiency. Therefore, the proposed system can solve problems regarding real-time typhoon rainfall prediction with high timeliness and accuracy.

A note on label propagation for semi-supervised learning

Acta Universitatis Sapientiae Informatica ◽

10.1515/ausi-2015-0010 ◽

2015 ◽

Vol 7 (1) ◽

pp. 18-30

Author(s):

Zalán Bodó ◽

Lehel Csató

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Unlabeled Data ◽

Label Propagation ◽

Learning Method ◽

The Past ◽

Data Graph

Semi-supervised learning has become an important and thoroughly studied subdomain of machine learning in the past few years, because gathering large amounts of unlabeled data is almost costless, and the costly human labeling process can be minimized by semi-supervision. Label propagation is a transductive semi-supervised learning method that operates on the data graph, which is undirected most of the time. It was introduced in [8], and many variants have been proposed since. However, the base algorithm has two variants: the first variant presented in [8] and its slightly modified version used afterwards, e.g. in [7]. This paper presents and compares the two algorithms, both theoretically and experimentally, and also tries to recommend which variant to use.
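As a rough illustration of label propagation on a data graph, the sketch below repeatedly averages label scores over neighbours and resets ("clamps") the seed nodes after every step. It is a generic textbook-style scheme and is not claimed to match either of the two variants the paper compares. The function name `propagate_labels`, the dense weight matrix, and the fixed iteration count are assumptions made for the example.

```python
def propagate_labels(weights, seeds, n_classes, iterations=100):
    """Propagate class labels from seed nodes over a weighted graph.

    weights: n x n list of lists with non-negative edge weights.
    seeds:   dict mapping node index -> known class index.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # Row-normalise the weights so each row holds transition probabilities.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    # Score matrix: one row per node, one column per class.
    scores = [[0.0] * n_classes for _ in range(n)]
    for node, cls in seeds.items():
        scores[node][cls] = 1.0

    for _ in range(iterations):
        # Each node takes the weighted average of its neighbours' scores.
        updated = [
            [
                sum(transition[i][j] * scores[j][c] for j in range(n))
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
        # Clamp: seed nodes always keep their known label.
        for node, cls in seeds.items():
            updated[node] = [1.0 if c == cls else 0.0 for c in range(n_classes)]
        scores = updated

    return [max(range(n_classes), key=lambda c: scores[i][c]) for i in range(n)]
```

On a simple chain of nodes with one seed at each end, the nodes nearer each seed receive that seed's label. Whether the seeds are clamped or allowed to change is one of the design choices that distinguishes label propagation variants in general.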

Noise Removal Process from Label Classification using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c3920.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 172-175

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Learning ◽

Noise Removal ◽

Error Rates ◽

Training Data ◽

Learning Performance ◽

Training Dataset ◽

Noise Filtering ◽

Label Noise

Text classification and clustering are essential in big data environments. Many classification algorithms have been proposed for supervised learning applications. In the era of big data, a large volume of training data is available for many machine learning tasks. However, some of that data may be mislabeled or unlabeled. Incorrect labels produce label noise, which in turn degrades the learning performance of a classifier. A general approach to addressing label noise is to apply noise filtering techniques that identify and remove noise before learning. A range of noise filtering approaches has been developed to improve classifier performance. This paper proposes a noise filtering approach for text data during the training phase. Many supervised learning algorithms generate high error rates because of noise in the training dataset; our work eliminates such noise and provides an accurate classification system.
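The abstract does not say which filter the paper uses, so the sketch below substitutes a well-known generic technique: a nearest-neighbour voting filter that drops a training example when most of its closest neighbours carry a different label. The Jaccard similarity on word sets, the value `k=3`, and the names `jaccard` and `filter_label_noise` are all assumptions made for illustration.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_label_noise(samples, k=3):
    """Split (text, label) samples into (kept, removed).

    A sample is removed when the majority label among its k most similar
    neighbours disagrees with its own label.
    """
    token_sets = [set(text.lower().split()) for text, _ in samples]
    kept, removed = [], []
    for i, (text, label) in enumerate(samples):
        # Rank all other samples by similarity to sample i.
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: jaccard(token_sets[i], token_sets[j]),
            reverse=True,
        )[:k]
        votes = Counter(samples[j][1] for j in neighbours)
        majority_label = votes.most_common(1)[0][0]
        if majority_label == label:
            kept.append((text, label))
        else:
            removed.append((text, label))
    return kept, removed
```

The `kept` list would then be passed to whatever classifier is being trained. A filter of this kind can also discard correctly labeled but atypical examples, so the choice of `k` and of the similarity measure matters in practice.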

Bi-LSTM Sentiment Classifier for Climate Change Issues in South Korea

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1056.0782s619 ◽

2019 ◽

Vol 8 (2S6) ◽

pp. 295-299

Keyword(s):

Climate Change ◽

Machine Learning ◽

Big Data ◽

South Korea ◽

Sentiment Analysis ◽

Training Data ◽

Learning Models ◽

Wide Range ◽

Machine Learning Models ◽

Big Data Technology

Sentiment analysis of SNS data can reveal what a wide variety of people think. An analysis using SNS can therefore predict social problems and more accurately identify their complex causes. In addition, big data technology can identify SNS information that is generated in real time, allowing a wide range of people’s opinions to be understood without delay. It can supplement traditional opinion surveys. The incumbent government mainly uses SNS to promote its policies. However, measures are needed to actively reflect SNS opinion in the process of carrying out policy. This paper therefore developed a sentiment classifier that can identify public feelings on SNS about climate change. To that end, based on a dictionary formulated on the theme of climate change, we collected climate change SNS data for learning and tagged it with seven sentiments. Using this training data, sentiment classifier models were developed with machine learning models. The analysis showed that the Bi-LSTM model performed better than the shallow models. It achieved the highest accuracy (85.10%) across the seven sentiment classes, outperforming traditional machine learning models (Naive Bayes and SVM) by approximately 34.53%p and 7.14%p, respectively. These findings substantiate the applicability of the proposed Bi-LSTM-based sentiment classifier to the analysis of sentiments relevant to diverse climate change issues.

A Supervised Learning Algorithm to Forecast Weather Conditions for Playing Cricket

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4528.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1560-1565

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Cloud Computing ◽

Big Data ◽

Internet Of Things ◽

Supervised Learning ◽

Learning Algorithm ◽

Weather Conditions ◽

Redundant Data ◽

Classification Technique

Nowadays, machine learning is considered a key technique in technology fields such as the Internet of Things (IoT), cloud computing, big data, and artificial intelligence. As technology advances, large amounts of incorrect and redundant data are collected from these fields. To make meaningful use of these data in the real world, we have to apply mining or classification techniques. In this paper, we propose two novel approaches to data classification using a supervised learning algorithm.
