Author(s):  
Monther Khalafat ◽  
Ja'far S. Alqatawna ◽  
Rizik M. H. Al-Sayyed ◽  
Mohammad Eshtay ◽  
Thaeer Kobbaey

Today, the influence of social media on different aspects of our lives is increasing, and many scholars from various disciplines regard social media networks as an ongoing revolution. In social media networks, many bonds and connections can be established, whether as direct or indirect ties. Social networks are used not only by people but also by companies. People usually create their own profiles and join communities to discuss common issues that interest them. Companies, in turn, can create a virtual presence on social media networks to understand their customers and gather richer information about them. Despite these benefits, social media networks should not always be seen as a safe place for communicating, sharing information and ideas, and establishing virtual communities. The information and ideas shared could carry hate speech, which must be detected to avoid inciting violence. Web content mining can be used to handle this issue, and it is attracting growing attention because of its importance for many businesses and institutions. Sentiment Analysis (SA) is an important sub-area of web content mining. The purpose of SA is to determine the overall sentiment of a writer towards a specific entity and to classify these opinions automatically. There are two main approaches to building sentiment analysis systems: the machine learning approach and the lexicon-based approach. This research presents the design and implementation of a violence detection system for social media using the machine learning approach. Our system works on the Jordanian Arabic dialect instead of Modern Standard Arabic (MSA). The data was collected from two popular social media websites (Facebook and Twitter) and annotated by native speakers.
Moreover, different preprocessing techniques were applied to show their effect on the accuracy of our model. An Arabic lexicon was used to generate feature vectors and separate them into feature sets. We used three well-known machine learning algorithms: Support Vector Machine (SVM), Naive Bayes (NB) and k-Nearest Neighbors (KNN). The Information Science Research Institute (ISRI) stemmer and a stop-word file, applied during preprocessing, were used to extract the features. Several features were extracted; with the SVM classifier, unigrams and lexicon-derived features yielded the highest accuracy in detecting violence.
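The feature extraction described above (stop-word removal, unigram features, and lexicon-derived features) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the lexicon, and the sample posts are invented English placeholders rather than the study's Jordanian Arabic resources, and ISRI stemming and the classifier itself are left out.

```python
from collections import Counter

# Hypothetical resources for illustration only (not the study's Arabic data).
STOP_WORDS = {"the", "a", "will", "we"}
VIOLENCE_LEXICON = {"attack", "burn", "destroy"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_vocabulary(documents):
    """Map every unigram seen in the corpus to a column index."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def featurize(text, vocab):
    """Return unigram counts followed by a single lexicon-hit count."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    unigram_part = [counts.get(tok, 0) for tok in vocab]
    lexicon_hits = sum(1 for tok in tokens if tok in VIOLENCE_LEXICON)
    return unigram_part + [lexicon_hits]


docs = ["we will attack the market", "a lovely day at the market"]
vocab = build_vocabulary(docs)
vectors = [featurize(d, vocab) for d in docs]
print(list(vocab), vectors)
```

Vectors of this shape could then be passed to any classifier, such as an SVM, NB, or KNN model; that step is omitted here.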


2018 ◽  
pp. 1-10 ◽  
Author(s):  
Tracy Onega ◽  
Dharmanshu Kamra ◽  
Jennifer Alford-Teaster ◽  
Saeed Hassanpour

Purpose: To our knowledge, integration of Web content mining of publicly available addresses with a geographic information system (GIS) has not been applied to the timely monitoring of medical technology adoption. Here, we explore the diffusion of a new breast imaging technology, digital breast tomosynthesis (DBT). Methods: We used natural language processing and machine learning to extract DBT facility location information using a set of potential sites for the New England region of the United States via a Google search application program interface. We assessed the accuracy of the algorithm using a validated set of publicly available addresses of locations that provide DBT from the DBT technology vendor, Hologic. We quantified precision, recall, and F1 score, aiming for an F1 score of ≥ 95% as the desirable performance. By reverse geocoding on the basis of the results of the Google Maps application program interface, we derived a spatial data set for use in an ArcGIS environment. Within the GIS, a host of spatiotemporal analyses and geovisualization techniques are possible. Results: We developed a semiautomated system that integrated DBT location information into a GIS that was feasible and of reasonable quality. Initial accuracy of the algorithm was poor using only a search term list for information retrieval (precision, 35%; recall, 44%; F1 score, 39%), but performance dramatically improved by leveraging natural language processing and simple machine learning techniques to isolate single, valid instances of DBT location information (precision, 92%; recall, 96%; F1 score, 94%). Reverse geocoding yielded reliable geographic coordinates for easy implementation into a GIS for mapping and planned monitoring. Conclusion: Our novel approach can be applicable to technologies beyond DBT, which may inform equitable access over time and space.
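The F1 scores reported above follow from the stated precision and recall, since F1 is their harmonic mean. A short check, using only the figures given in the abstract (the function name is ours):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Search-term list only: precision 35%, recall 44%.
baseline = f1_score(0.35, 0.44)
# With NLP and simple machine learning: precision 92%, recall 96%.
improved = f1_score(0.92, 0.96)

print(f"{baseline:.0%} {improved:.0%}")  # prints "39% 94%"
```

The improved score rounds to 94%, just below the 95% target stated in the Methods.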


2014 ◽  
Vol 13 (01) ◽  
pp. 1450005 ◽  
Author(s):  
Basavaraj S. Anami ◽  
Ramesh S. Wadawadagi ◽  
Veerappa B. Pagi

With the incessantly growing amount of information published on Web pages, the World Wide Web (WWW) has become a prolific source for data mining research. The heterogeneous and semi-structured nature of Web data has made automated knowledge discovery a challenging issue. Web Content Mining (WCM) essentially uses data mining techniques to discover knowledge from Web page contents. The intent of this study is to provide a comparative analysis of the Machine Learning (ML) techniques available in the literature for WCM. For the analysis, the article uses representation techniques, learning methods, datasets used and the performance of each method as criteria. The survey observes that some of the traditional ML algorithms have been used efficiently on Web data. Finally, the paper concludes by citing some promising issues for further research in this domain.


Author(s):  
Rowena Chau ◽  
Chung-Hsing Yeh

This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The multilingual linguistic knowledge required for multilingual web content mining is made available by encoding all multilingual concept-term relationships in a multilingual concept space. With this linguistic knowledge base, a concept-based multilingual text classifier is developed. It reveals the conceptual content of multilingual web documents and forms concept categories of multilingual web documents on a concept-based browsing interface. To personalize multilingual web content mining, a concept-based user profile is generated from a user’s bookmark file to highlight the user’s topics of interest on the browsing interface. As such, both explorative browsing and user-oriented, concept-focused information filtering on the multilingual web are facilitated.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Chunmin Lang ◽  
Sibei Xia ◽  
Chuanlan Liu

Purpose: This study intends to examine consumers' fashion customization experiences through a web content mining (WCM) approach. By applying the theory of customer value, this study explores the benefits and costs of two levels of mass customization (MC) to identify the values derived from style (i.e. shoe customization) and fit customization experiences (i.e. apparel customization) and further to compare the dominating dimensions of value derived across style and fit customization. Design/methodology/approach: A WCM approach was applied. Also, two case studies were conducted with one focusing on style customization and the other focusing on fit customization. The brand Vans was selected to examine style customization in study 1. The brand Sumissura was selected to examine fit customization in study 2. Consumers' comments on customization experiences from these two brands were collected through social networks, respectively. After data cleaning, 394 reviews for Vans and 510 reviews for Sumissura were included in the final data analysis. Co-occurrence plots, feature extraction and grouping were used for the data analysis. Findings: The emotional value was found to be the major benefit for style customization, while the functional value was indicated as the major benefit for fit customization, followed by ease of use and emotional value. In addition, three major themes of costs, including unsatisfied service, disappointing product performance and financial risk, were revealed by excavating and evaluating consumers' feedback of their actual clothing customization experiences with Sumissura. Originality/value: This study initiates the effort to use web mining, specifically, the WCM approach to thoroughly investigate the benefits and costs of MC through real consumers' feedback of two different types of fashion products. The analysis of this study also reflects the levels of customization: style and fit. It provides an in-depth text analysis of online MC consumers' feedback through the use of feature extraction analysis and word co-occurrence networks.
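As a rough illustration of how a word co-occurrence network can be derived from review text, the sketch below counts how often two words appear in the same review. It is a simplified assumption about the general technique, not the study's actual pipeline, and the sample reviews are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reviews):
    """Count, for each word pair, the number of reviews containing both words."""
    counts = Counter()
    for review in reviews:
        # Sort the unique words so each pair has one canonical ordering.
        words = sorted(set(review.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Invented sample reviews for illustration only.
reviews = ["great fit great fabric", "poor fit", "great fabric"]
counts = cooccurrence_counts(reviews)

for pair, n in counts.most_common(3):
    print(pair, n)
```

Each counted pair can be read as a weighted edge between two word nodes, which is the form a co-occurrence plot visualizes.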

