Iktishaf+: A Big Data Tool with Automatic Labeling for Road Traffic Social Sensing and Event Detection Using Distributed Machine Learning

Digital societies can be characterized by their increasing desire to express themselves and interact with others. This is realized through digital platforms such as social media, which have become convenient and inexpensive sensors compared with physical sensors in many sectors of smart societies. One such major sector is road transportation, which is the backbone of modern economies and globally costs 1.25 million deaths and 50 million human injuries annually. The state of the art in big data-enabled social media analytics for transportation-related studies is limited. This paper brings a range of technologies together to detect road traffic-related events using big data and distributed machine learning. The most specific contribution of this research is an automatic labelling method for machine learning-based traffic-related event detection from Twitter data in the Arabic language. The proposed method has been implemented in a software tool called Iktishaf+ (an Arabic word meaning discovery) that detects traffic events automatically from tweets in the Arabic language using distributed machine learning over Apache Spark. The tool is built using nine components and a range of technologies including Apache Spark, Parquet, and MongoDB. Iktishaf+ uses a light stemmer for the Arabic language developed by us. We also use a location extractor, developed by us, that allows us to extract and visualize spatio-temporal information about the detected events. The specific data used in this work comprises 33.5 million tweets collected from Saudi Arabia using the Twitter API. Using support vector machines, naïve Bayes, and logistic regression-based classifiers, we are able to detect and validate several real events in Saudi Arabia without prior knowledge, including a fire in Jeddah, rains in Makkah, and an accident in Riyadh. The findings show the effectiveness of Twitter media in detecting important events with no prior knowledge about them.
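The abstract does not describe how the automatic labelling method works, so the following is only a minimal sketch of one common approach, keyword-based weak labelling, and should not be read as the authors' method. The keyword lists, the label names, and the `auto_label` function are hypothetical, and tokens are matched exactly rather than through the light stemmer the paper mentions.

```python
import re

# Hypothetical keyword dictionary (an assumption for illustration only):
# accident/collision, fire, rain/rains.
EVENT_KEYWORDS = {
    "accident": ["حادث", "تصادم"],
    "fire": ["حريق"],
    "weather": ["مطر", "أمطار"],
}


def auto_label(tweet: str) -> str:
    """Return the first event label whose keywords appear in the tweet."""
    # \w+ is Unicode-aware in Python 3, so it also splits Arabic text into words.
    tokens = set(re.findall(r"\w+", tweet))
    for label, keywords in EVENT_KEYWORDS.items():
        if tokens & set(keywords):
            return label
    return "other"
```

Labels produced this way could then serve as training targets for classifiers such as the support vector machine, naïve Bayes, and logistic regression models named in the abstract.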

Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark

2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) ◽

10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00332 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ebtesam Alomari ◽

Rashid Mehmood ◽

Iyad Katib

Keyword(s):

Machine Learning ◽

Event Detection ◽

Road Traffic ◽

Apache Spark ◽

Twitter Data

On Scalability of Distributed Machine Learning with Big Data on Apache Spark

Big Data – BigData 2018 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-94301-5_16 ◽

2018 ◽

pp. 209-219

Author(s):

Ameen Abdel Hai ◽

Babak Forouraghi

Keyword(s):

Machine Learning ◽

Big Data ◽

Apache Spark ◽

Distributed Machine Learning

Iktishaf: a Big Data Road-Traffic Event Detection Tool Using Twitter and Spark Machine Learning

Mobile Networks and Applications ◽

10.1007/s11036-020-01635-y ◽

2020 ◽

Cited By ~ 7

Author(s):

Ebtesam Alomari ◽

Iyad Katib ◽

Rashid Mehmood

Keyword(s):

Machine Learning ◽

Big Data ◽

Event Detection ◽

Road Traffic ◽

Detection Tool

Big data Predictive Analytics for Apache Spark using Machine Learning

2020 Global Conference on Wireless and Optical Technologies (GCWOT) ◽

10.1109/gcwot49901.2020.9391620 ◽

2020 ◽

Author(s):

Muhammad Junaid ◽

Shiraz Ali Wagan ◽

Nawab Muhammad Faseeh Qureshi ◽

Choon Sung Nam ◽

Dong Ryeol Shin

Keyword(s):

Machine Learning ◽

Big Data ◽

Predictive Analytics ◽

Apache Spark

Machine Learning for Business Analytics

Advances in Data Mining and Database Management - Challenges and Applications of Data Analytics in Social Perspectives ◽

10.4018/978-1-7998-2566-1.ch013 ◽

2021 ◽

pp. 232-256

Author(s):

Kağan Okatan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Media ◽

Big Data ◽

Machine Learning Algorithms ◽

Decision Makers ◽

Business Analytics ◽

Business Intelligence Systems ◽

Long Time ◽

Rules Of The Game

All these types of analytics have long answered business questions using the principal methods of investigating data warehouses. Data mining and business intelligence systems in particular help decision makers reach the information they want. Many existing systems are trying to keep up with a phenomenon that has changed the rules of the game in recent years: the undeniable attraction of 'big data'. In particular, evaluating the big data generated by social media is among the most current issues in business analytics, and it demonstrates the importance of integrating machine learning into business analytics. This section introduces the prominent machine learning algorithms that are increasingly used for business analytics and emphasizes their application areas.

Graph-Based Semi-Supervised Learning With Big Data

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch012 ◽

2020 ◽

pp. 214-244

Author(s):

Prithish Banerjee ◽

Mark Vere Culp ◽

Kenneth Joseph Ryan ◽

George Michailidis

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Learning ◽

Prior Knowledge ◽

Linear Algebra ◽

Real Data ◽

Data Set ◽

Regression Problems ◽

Classification And Regression ◽

Empirical Demonstration

This chapter presents some popular graph-based semi-supervised approaches. These techniques apply to classification and regression problems and can be extended to big data problems using recently developed anchor graph enhancements. The background necessary for understanding this chapter includes linear algebra and optimization. No prior knowledge of machine learning methods is necessary. An empirical demonstration of these techniques is also provided on real data set benchmarks.
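As an illustration of the kind of technique this abstract refers to, the sketch below implements one well-known graph-based semi-supervised formulation, iterative label propagation with clamped labels. It is an assumption that this resembles the chapter's methods; the function name `propagate_labels`, the toy chain graph, and the iteration count are made up for the example, and the anchor graph enhancements are not shown.

```python
def propagate_labels(weights, labels, iterations=200):
    """Spread known labels over a weighted graph.

    weights: symmetric adjacency matrix as a list of lists.
    labels:  dict mapping node index -> known score (held fixed).
    Each unlabeled node is repeatedly set to the weighted mean of its neighbours.
    """
    n = len(weights)
    # Unlabeled nodes start from a neutral score of 0.5.
    scores = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        updated = list(scores)
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped
            degree = sum(weights[i])
            if degree > 0:
                updated[i] = sum(w * s for w, s in zip(weights[i], scores)) / degree
        scores = updated
    return scores


# Toy chain graph 0 - 1 - 2 - 3 with only the two end nodes labeled.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
chain_scores = propagate_labels(chain, {0: 1.0, 3: 0.0})
```

On this chain the two unlabeled nodes settle at approximately 2/3 and 1/3, interpolating between the labeled ends. Thresholding the scores gives a classification, while using them directly gives a regression estimate.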

Social Media Data Processing Infrastructure by Using Apache Spark Big Data Platform

Proceedings of the 2019 4th International Conference on Cloud Computing and Internet of Things - CCIOT 2019 ◽

10.1145/3361821.3361825 ◽

2019 ◽

Author(s):

Michal Podhoranyi ◽

Lukas Vojacek

Keyword(s):

Social Media ◽

Big Data ◽

Data Processing ◽

Apache Spark ◽

Social Media Data ◽

Data Platform ◽

Media Data

Human Behavior Analysis Using Intelligent Big Data Analytics

Frontiers in Psychology ◽

10.3389/fpsyg.2021.686610 ◽

2021 ◽

Vol 12 ◽

Author(s):

Muhammad Usman Tariq ◽

Muhammad Babar ◽

Marc Poulin ◽

Akmal Saeed Khattak ◽

Mohammad Dahman Alshehri ◽

...

Keyword(s):

Social Media ◽

Big Data ◽

Human Behavior ◽

Data Analytics ◽

Data Science ◽

Big Data Analytics ◽

Apache Spark ◽

Social Media Data ◽

The Social ◽

Media Data

Intelligent big data analysis is an evolving pattern in the age of big data science and artificial intelligence (AI). Analysis of organized data has been very successful, but analyzing human behavior using social media data remains challenging. Social media data comes from vast, unstructured sources that can include likes, comments, tweets, shares, and views. Analyzing it has become a challenging task for companies such as Dailymotion, which have billions of daily users and vast numbers of comments, likes, and views. Social media data is created in significant amounts and at a tremendous pace, so a very high volume must be stored, sorted, processed, and carefully studied to support decisions. This article proposes an architecture that uses a big data analytics mechanism to process huge social media datasets efficiently and logically. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate Apache Spark parallel processing and distributed framework technologies together with other storage and processing mechanisms. Social media data generated from Dailymotion is used in this article to demonstrate the benefits of this architecture. The project used the Dailymotion application programming interface (API), which provides functions for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. The Hive storage mechanism is used with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.

Sentiment Analysis on Social Media Big Data With Multiple Tweet Words

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9684.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3429-3434 ◽

Cited By ~ 2

Keyword(s):

Machine Learning ◽

Social Media ◽

Big Data ◽

Sentiment Analysis ◽

Language Processing ◽

Sentiment Classification ◽

Support Vector ◽

Decision Tree Classifier ◽

Machine Learning Classification ◽

Tree Classifier

The main objective of this paper is to analyze reviews of e-commerce products in social media big data. It provides online shoppers with useful results about product quality and gives businesses decision-making insight into the products that customers most like and buy. The analysis covers all features or opinion words available in tweets, such as capitalized words, sequences of repeated letters, emoji, slang words, exclamatory words, intensifiers, modifiers, conjunction words, and negation words. Existing work has considered only two or three features when performing sentiment analysis with natural language processing (NLP). In the proposed work, familiar machine learning classification models, namely multinomial naïve Bayes, support vector machine, decision tree classifier, and random forest classifier, are used for sentiment classification. The sentiment classification serves as a decision support system for both customers and businesses.
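A few of the surface features listed in this abstract can be sketched as follows. This is an illustrative assumption, not the paper's feature extractor: the `extract_features` function, the feature names, and the short negation list are hypothetical, and only four of the listed feature types are covered.

```python
import re

# Hypothetical negation list; a real system would use a fuller lexicon.
NEGATIONS = {"not", "no", "never"}


def extract_features(tweet: str) -> dict:
    """Count a few surface cues that often carry sentiment in tweets."""
    words = re.findall(r"[A-Za-z']+", tweet)
    return {
        # Fully capitalized words of two or more letters, e.g. "GREAT".
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        # Words with a letter repeated three or more times, e.g. "soooo".
        "elongated_words": sum(1 for w in words if re.search(r"(.)\1{2,}", w)),
        # Exclamation marks anywhere in the tweet.
        "exclamations": tweet.count("!"),
        # Negation words, matched case-insensitively.
        "negations": sum(1 for w in words if w.lower() in NEGATIONS),
    }
```

Counts like these could be appended to a bag-of-words vector before training classifiers such as those named in the abstract.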

Assessment of Knowledge and Practice about Self Expressed Breast Milk among Saudi Mothers in Jazan Region, KSA, 2016

Journal of Advances in Medicine and Medical Research ◽

10.9734/jammr/2019/v29i1030132 ◽

2019 ◽

pp. 1-11

Author(s):

Karimah Mohammad Qutah ◽

Safar A. Alsaleem ◽

Abdullah Ahmed Najmi ◽

Muteb Bawwah Zabbani

Keyword(s):

Social Media ◽

Saudi Arabia ◽

Breast Milk ◽

Breast Feeding ◽

Post Partum ◽

Arabic Language ◽

Knowledge Level ◽

Expressed Breast Milk ◽

Level Of Knowledge ◽

Knowledge And Practice

Aim: To assess mothers' knowledge and attitudes regarding self-expressed breast milk in Jazan, Saudi Arabia. Methodology: An observational, cross-sectional study was conducted in the Obstetric Department (well-baby and immunization clinics) of King Fahd Central Hospital (KFCH), Jazan, Saudi Arabia, and in six randomly selected PHCCs in Jazan from December 2016 to March 2017. Pregnant women who had delivered babies before, post-partum women in obstetric departments and the obstetric outpatient clinic, and mothers in well-baby and immunization clinics in the mentioned areas were included in the study. Stratified multistage sampling was used. A sample of N = 499 Saudi mothers was calculated according to a survey system with a 95% confidence level. The questionnaire was self-administered (in the Arabic language). All data were processed with the Statistical Package for the Social Sciences (SPSS) version 19. The Shapiro-Wilk test was used. The Kruskal-Wallis test was used to assess the association of knowledge level and practice with demographic variables that contain more than two categories. The Mann-Whitney test and Spearman correlation were also used. Results: A total of 499 mothers participated, aged 30 ± 7 years, with a mean number of children of 2.98 ± 2. Of these, 73.5% had heard about self-expressed breast milk, and 236 of them practiced it. Both the level of knowledge and the accuracy of practice were inadequate. Around one third of mothers had heard about it from social media. More than one third of the women practiced it because of work-related issues. Higher educational level was associated with higher knowledge (p<0.001). Age and number of children had no statistically significant effect on knowledge level (p = 0.417, 0.285). Working mothers had a higher knowledge level than housewives and students (p<0.001); nurses, especially those who had received breast feeding teaching, had a higher knowledge level than physicians, followed by teachers (p<0.001). Mothers who obtained their knowledge from breast feeding courses had the highest knowledge level, followed by those informed by medical staff other than physicians, then social media and internet websites, then physicians, then mothers, and lastly friends (p<0.001). Mothers with more accurate practice were more knowledgeable than mothers with less accurate practice (p<0.001). Conclusion: Mothers' knowledge and practice regarding self-expressed breast milk need to be improved in order to give babies a better chance of exclusive breast feeding. Breast feeding courses for mothers give better results in terms of the accuracy of mothers' knowledge and practice of expressed breast milk.
