MACHINE LEARNING AND LINK ANALYSIS FOR WEB CONTENT MINING

Violence Detection over Online Social Networks: An Arabic Sentiment Analysis Approach

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v15i14.23029 ◽

2021 ◽

Vol 15 (14) ◽

pp. 90

Author(s):

Monther Khalafat ◽

Ja'far S. Alqatawna ◽

Rizik M. H. Al-Sayyed ◽

Mohammad Eshtay ◽

Thaeer Kobbaey

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Web Content ◽

Web Content Mining ◽

Social Media Networks ◽

Violence Detection ◽

The Social ◽

Machine Learning Approach ◽

Content Mining

Today, the influence of social media on different aspects of our lives is increasing, and many scholars from various disciplines regard social media networks as an ongoing revolution. In social media networks, many bonds and connections can be established, whether direct or indirect. Social networks are used not only by people but also by companies. People usually create their own profiles and join communities to discuss common issues that interest them. Companies, for their part, can create a virtual presence on social media networks to understand their customers and gather richer information about them. For all their benefits, social media networks should not always be seen as a safe place for communicating, sharing information and ideas, and establishing virtual communities. This information and these ideas can carry hate speech that must be detected to avoid inciting violence. Web content mining can be used to handle this issue, and it is attracting growing attention because of its importance for many businesses and institutions. Sentiment Analysis (SA) is an important sub-area of web content mining. The purpose of SA is to determine the overall sentiment of a writer towards a specific entity and to classify these opinions automatically. There are two main approaches to building sentiment analysis systems: the machine learning approach and the lexicon-based approach. This research presents the design and implementation of a violence detection system for social media using the machine learning approach. Our system works on the Jordanian Arabic dialect instead of Modern Standard Arabic (MSA). The data was collected from two popular social media websites (Facebook and Twitter) and annotated by native speakers. Moreover, different preprocessing techniques were used to show their effect on model accuracy. An Arabic lexicon was used to generate feature vectors and separate them into feature sets. We used three well-known machine learning algorithms: Support Vector Machine (SVM), Naive Bayes (NB), and k-Nearest Neighbors (KNN). The Information Science Research Institute's (ISRI) stemmer and a stop-word file were used in preprocessing to extract the features. Several features were extracted; the SVM classifier shows that unigrams and lexicon-derived features yield the highest accuracy for detecting violence.
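
As a rough illustration of the machine learning approach described in this abstract, the sketch below trains a unigram Naive Bayes classifier, one of the three algorithms the abstract names. It is not the authors' system: the abstract reports SVM as the most accurate classifier, and Naive Bayes is used here only because it fits in a few lines of standard-library Python. The class name `UnigramNaiveBayes`, the whitespace tokenizer, and the English toy sentences are all assumptions made for illustration; the paper's pipeline uses Jordanian Arabic data with ISRI stemming, stop-word removal, and lexicon features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace unigrams only; stemming and stop-word removal are omitted here.
    return text.lower().split()


class UnigramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over unigram counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy English placeholder data, not the annotated corpus from the paper.
train_texts = [
    "i will hurt you",
    "they threatened to attack him",
    "what a lovely day",
    "thank you for the help",
]
train_labels = ["violent", "violent", "non-violent", "non-violent"]
model = UnigramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("they will attack you"))
```

Swapping the tokenizer for a stemmer and adding lexicon-based counts to the feature set would bring the sketch closer to the feature extraction the abstract describes.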

Monitoring of Technology Adoption Using Web Content Mining of Location Information and Geographic Information Systems: A Case Study of Digital Breast Tomosynthesis

JCO Clinical Cancer Informatics ◽

10.1200/cci.17.00150 ◽

2018 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Tracy Onega ◽

Dharmanshu Kamra ◽

Jennifer Alford-Teaster ◽

Saeed Hassanpour

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Language Processing ◽

Digital Breast Tomosynthesis ◽

Location Information ◽

Web Content ◽

Breast Tomosynthesis ◽

Web Content Mining ◽

Content Mining ◽

Program Interface

Purpose: To our knowledge, integration of Web content mining of publicly available addresses with a geographic information system (GIS) has not been applied to the timely monitoring of medical technology adoption. Here, we explore the diffusion of a new breast imaging technology, digital breast tomosynthesis (DBT). Methods: We used natural language processing and machine learning to extract DBT facility location information using a set of potential sites for the New England region of the United States via a Google search application program interface. We assessed the accuracy of the algorithm using a validated set of publicly available addresses of locations that provide DBT from the DBT technology vendor, Hologic. We quantified precision, recall, and F1 score, aiming for an F1 score of ≥ 95% as the desirable performance. By reverse geocoding on the basis of the results of the Google Maps application program interface, we derived a spatial data set for use in an ArcGIS environment. Within the GIS, a host of spatiotemporal analyses and geovisualization techniques are possible. Results: We developed a semiautomated system that integrated DBT location information into a GIS that was feasible and of reasonable quality. Initial accuracy of the algorithm was poor using only a search term list for information retrieval (precision, 35%; recall, 44%; F1 score, 39%), but performance dramatically improved by leveraging natural language processing and simple machine learning techniques to isolate single, valid instances of DBT location information (precision, 92%; recall, 96%; F1 score, 94%). Reverse geocoding yielded reliable geographic coordinates for easy implementation into a GIS for mapping and planned monitoring. Conclusion: Our novel approach can be applicable to technologies beyond DBT, which may inform equitable access over time and space.
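
The F1 scores quoted in this abstract follow from the reported precision and recall, since F1 is their harmonic mean. The short sketch below recomputes both figures; the function name `f1_score` is an illustrative choice, and only the percentages stated in the abstract are used as inputs.

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
baseline = f1_score(0.35, 0.44)  # search term list only
improved = f1_score(0.92, 0.96)  # with NLP and simple machine learning

print(f"{baseline:.0%} {improved:.0%}")  # prints "39% 94%"
```

Both values round to the figures in the abstract, and the improved score falls just short of the stated 95% target.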

Machine Learning Techniques in Web Content Mining: A Comparative Analysis

Journal of Information & Knowledge Management ◽

10.1142/s0219649214500051 ◽

2014 ◽

Vol 13 (01) ◽

pp. 1450005 ◽

Cited By ~ 4

Author(s):

Basavaraj S. Anami ◽

Ramesh S. Wadawadagi ◽

Veerappa B. Pagi

Keyword(s):

Machine Learning ◽

Data Mining ◽

Comparative Analysis ◽

Machine Learning Techniques ◽

Web Content ◽

Web Data ◽

Web Content Mining ◽

Content Mining ◽

Automated Discovery ◽

And Performance

With the incessantly growing amount of information published on Web pages, the World Wide Web (WWW) has become a prolific area of data mining research. The heterogeneous and semi-structured nature of Web data has made the process of automated discovery a challenging issue. Web Content Mining (WCM) essentially uses data mining techniques to effectively discover knowledge from Web page contents. The intent of this study is to provide a comparative analysis of Machine Learning (ML) techniques available in the literature for WCM. For the analysis, the article uses criteria such as representation techniques, learning methods, datasets used, and the performance of each method. The survey observes that some of the traditional ML algorithms have been efficiently used to work on Web data. Finally, the paper concludes by citing some promising issues for further research in this domain.

Web content mining for alias identification: A first step towards suspect tracking

Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics ◽

10.1109/isi.2011.5984000 ◽

2011 ◽

Cited By ~ 2

Author(s):

Tarique Anwar ◽

Muhammad Abulaish ◽

Khaled Alghathbar

Keyword(s):

Web Content ◽

Web Content Mining ◽

Content Mining

Web content mining for comparing corporate and third-party online reporting: a case study on solid waste management

Business Strategy and the Environment ◽

10.1002/bse.549 ◽

2009 ◽

Vol 18 (3) ◽

pp. 137-148 ◽

Cited By ~ 18

Author(s):

Irene Pollach ◽

Arno Scharl ◽

Albert Weichselbraun

Keyword(s):

Waste Management ◽

Solid Waste ◽

Solid Waste Management ◽

Third Party ◽

Web Content ◽

Web Content Mining ◽

Content Mining

Study on Method of Web Content Mining for Non-XML Documents

Communications in Computer and Information Science - Information Computing and Applications ◽

10.1007/978-3-642-16339-5_31 ◽

2010 ◽

pp. 236-243

Author(s):

Jianguo Chen ◽

Hao Chen ◽

Jie Guo

Keyword(s):

Web Content ◽

Web Content Mining ◽

Xml Documents ◽

Content Mining

Noise Elimination from Web Page Based on Regular Expressions for Web Content Mining

Smart Innovation, Systems and Technologies - Advanced Computing, Networking and Informatics- Volume 1 ◽

10.1007/978-3-319-07353-8_63 ◽

2014 ◽

pp. 545-554 ◽

Cited By ~ 1

Author(s):

Amit Dutta ◽

Sudipta Paria ◽

Tanmoy Golui ◽

Dipak Kumar Kole

Keyword(s):

Regular Expressions ◽

Web Content ◽

Web Page ◽

Noise Elimination ◽

Web Content Mining ◽

Content Mining

Similarity Based Web Data Extraction and Integration System for Web Content Mining

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Advances in Communication, Network, and Computing ◽

10.1007/978-3-642-35615-5_41 ◽

2012 ◽

pp. 269-274

Author(s):

Srikantaiah K.C. ◽

Suraj M. ◽

Venugopal K.R. ◽

Iyengar S.S. ◽

L. M. Patnaik

Keyword(s):

Data Extraction ◽

Web Content ◽

Web Data ◽

Integration System ◽

Web Content Mining ◽

Web Data Extraction ◽

Content Mining

Multilingual Web Content Mining

Intelligent Agents for Data Mining and Information Retrieval ◽

10.4018/978-1-59140-194-0.ch006 ◽

2004 ◽

pp. 88-100

Author(s):

Rowena Chau ◽

Chung-Hsing Yeh

Keyword(s):

Information Filtering ◽

User Profile ◽

Linguistic Knowledge ◽

Web Content ◽

Self Organizing Maps ◽

Web Documents ◽

Web Content Mining ◽

Concept Space ◽

Content Mining ◽

Multilingual Text

This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The multilingual linguistic knowledge required for multilingual web content mining is made available by encoding all multilingual concept-term relationships using a multilingual concept space. With this linguistic knowledge base, a concept-based multilingual text classifier is developed. It reveals the conceptual content of multilingual web documents and forms concept categories of multilingual web documents on a concept-based browsing interface. To personalize multilingual web content mining, a concept-based user profile is generated from a user’s bookmark file to highlight the user’s topics of information interest on the browsing interface. As such, both explorative browsing and user-oriented, concept-focused information filtering in multilingual web are facilitated.

Style and fit customization: a web content mining approach to evaluate online mass customization experiences

Journal of Fashion Marketing and Management ◽

10.1108/jfmm-12-2019-0288 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Chunmin Lang ◽

Sibei Xia ◽

Chuanlan Liu

Keyword(s):

Feature Extraction ◽

Data Analysis ◽

Mass Customization ◽

Web Content ◽

Content Type ◽

Web Content Mining ◽

Major Benefit ◽

Content Mining ◽

Emotional Value ◽

Benefits And Costs

Purpose: This study intends to examine consumers' fashion customization experiences through a web content mining (WCM) approach. By applying the theory of customer value, this study explores the benefits and costs of two levels of mass customization (MC) to identify the values derived from style (i.e. shoe customization) and fit customization experiences (i.e. apparel customization) and further to compare the dominating dimensions of value derived across style and fit customization. Design/methodology/approach: A WCM approach was applied. Also, two case studies were conducted with one focusing on style customization and the other focusing on fit customization. The brand Vans was selected to examine style customization in study 1. The brand Sumissura was selected to examine fit customization in study 2. Consumers' comments on customization experiences from these two brands were collected through social networks, respectively. After data cleaning, 394 reviews for Vans and 510 reviews for Sumissura were included in the final data analysis. Co-occurrence plots, feature extraction and grouping were used for the data analysis. Findings: The emotional value was found to be the major benefit for style customization, while the functional value was indicated as the major benefit for fit customization, followed by ease of use and emotional value. In addition, three major themes of costs, including unsatisfied service, disappointing product performance and financial risk, were revealed by excavating and evaluating consumers' feedback of their actual clothing customization experiences with Sumissura. Originality/value: This study initiates the effort to use web mining, specifically, the WCM approach to thoroughly investigate the benefits and costs of MC through real consumers' feedback of two different types of fashion products. The analysis of this study also reflects the levels of customization: style and fit. It provides an in-depth text analysis of online MC consumers' feedback through the use of feature extraction analysis and word co-occurrence networks.
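
The word co-occurrence analysis mentioned in this abstract can be pictured as counting how often two words appear in the same review. The sketch below shows one simple way to build such counts. The function name `cooccurrence_counts`, the whitespace tokenization, and the sample reviews are assumptions made for illustration; they are not the study's data or its actual procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reviews):
    # Count each unordered word pair once per review in which both words occur.
    counts = Counter()
    for review in reviews:
        words = sorted(set(review.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical placeholder reviews, not the Vans or Sumissura data.
reviews = [
    "great fit and great fabric",
    "poor fit but fast delivery",
    "great fabric fast delivery",
]
pairs = cooccurrence_counts(reviews)
print(pairs[("fabric", "great")])
```

Pairs with high counts become the heavier edges of a co-occurrence network; a real analysis would first remove stop words such as "and" or "but".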
