Text Document Clustering using K-Means and Dbscan by using Machine Learning

As the modern world grows, so does the volume of text data created by media such as social networking sites, the web, and other information sources. Clustering is an important part of data mining. It is the procedure of dividing a large body of text so that similar texts fall into the same group. Clustering is used in many application areas, such as medicine, biology, and signal processing. Traditional clustering algorithms include hierarchical clustering, density-based clustering, and self-organizing map clustering. Using k-means and DBSCAN, we are able to cluster documents. DBSCAN is one of a number of standard clustering methods. Each data set is evaluated automatically with DBSCAN and k-means, which shows the clustering power of the data. A document may cover multiple topics. Document clustering involves the use of descriptors and descriptor extraction. Descriptors are the expressions used to describe the content inside a cluster.
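The abstract does not give implementation details, so the following is only a minimal sketch of how k-means and DBSCAN might be applied to text. It assumes whitespace tokenisation, TF-IDF weighting, and Euclidean distance on length-normalised vectors. The toy documents, the function names, and the parameter values (`init_indices`, `eps`, `min_pts`) are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (a list of floats) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    # Document frequency: number of documents in which each term occurs.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[t] * (math.log(len(docs) / df[t]) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, init_indices, n_iter=10):
    """Lloyd's algorithm; centroids start at the documents named by init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dbscan(vectors, eps, min_pts):
    """Basic DBSCAN; returns a cluster id per document, with -1 meaning noise."""
    n = len(vectors)
    labels = [None] * n  # None marks a document that has not been visited yet.

    def neighbours(i):
        # The neighbourhood includes the point itself.
        return [j for j in range(n) if distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # Provisionally noise; may become a border point later.
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # Noise reached from a core point: border point.
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, so keep expanding.
    return labels


# Hypothetical toy corpus with two obvious topics.
docs = [
    "cat dog pet animal",
    "dog cat animal vet",
    "stock market trading price",
    "market price stock investor",
]
vectors = tfidf_vectors(docs)
print("k-means:", kmeans(vectors, init_indices=(0, 2)))
print("DBSCAN: ", dbscan(vectors, eps=1.0, min_pts=2))
```

k-means needs the number of clusters (here implied by the two starting documents), whereas DBSCAN infers it from `eps` and `min_pts` and can mark documents as noise.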

Author(s):  
Noman Ashraf ◽  
Abid Rafiq ◽  
Sabur Butt ◽  
Hafiz Muhammad Faisal Shehzad ◽  
Grigori Sidorov ◽  
...  

On YouTube, billions of videos are watched online and millions of short messages are posted each day. YouTube, along with other social networking sites, is used by individuals and extremist groups for spreading hatred among users. In this paper, we consider religion as the most targeted domain for spreading hate speech among people of different religions. We present a methodology for the detection of religion-based hate videos on YouTube. Messages posted on YouTube videos generally express the opinions of users related to that video. We provide a novel dataset for religious hate speech detection on YouTube comments. The proposed methodology applies data mining techniques to comments extracted from religious videos in order to filter religion-oriented messages and detect those videos which are used for spreading hate. The supervised learning algorithms Support Vector Machine (SVM), Logistic Regression (LR), and k-Nearest Neighbor (k-NN) are used for baseline results.


Author(s):  
Mr. Bhavar Shivam S.

Today we do many things online, from shopping to sharing data on social networking sites. Social networking sites (SNS) are good for releasing stress and depression by sharing one's thoughts. Thus, emotion detection has become a hot trend today. But analyzing emotions on an SNS like Twitter is a problem: it generates lakhs of tweets each day, and it is impossible for a human being to read every tweet and decide the emotion behind it. So, to help understand the emotions behind the texts on an SNS, we thought of designing a project that will keep track of tweets and predict whether each has a positive or a negative sentiment behind it. This project can be achieved by integrating an SNS with NLP and machine learning. For the SNS we will use Twitter, as it generates a lot of data that is freely accessible through an API. First, we will enter a keyword and fetch tweets from Twitter. Then stop words will be removed from these tweets using the NLTK stop words database. Then the tweets will be passed for POS tagging, and only words of the right grammatical form will be kept. Then we create a training dataset with two types, positive and negative, and train the SVM algorithm on it. Then each tweet will be passed to the SVM as the testing dataset, and the SVM will return a classification of each tweet as a whole into one of two classes, positive or negative. Thus, our application will be helpful in recognizing the emotion behind a tweet.
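The pipeline described above (stop-word removal, a labelled training set, a two-class SVM) can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation. It uses a tiny hard-coded stop-word set in place of the NLTK database, skips the tweet-fetching and POS-tagging steps, and trains a linear SVM by plain sub-gradient descent on the hinge loss. The tweets, function names, and hyperparameters are hypothetical.

```python
# Hypothetical, tiny stop-word list standing in for NLTK's stop-word corpus.
STOP_WORDS = {"a", "i", "is", "it", "the", "this", "was"}


def preprocess(tweet):
    """Lower-case, split on whitespace and drop stop words."""
    return [tok for tok in tweet.lower().split() if tok not in STOP_WORDS]


def featurise(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    return [float(tokens.count(word)) for word in vocab]


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            margin = target * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks the weights a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Inside the margin or misclassified: step towards the example.
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
    return w, b


def predict(tweet, vocab, w, b):
    """Classify one raw tweet as 'positive' or 'negative'."""
    x = featurise(preprocess(tweet), vocab)
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "positive" if score >= 0 else "negative"


# Hypothetical hand-labelled training tweets (+1 = positive, -1 = negative).
train = [
    ("i love this movie", 1),
    ("this was a great happy day", 1),
    ("i hate this movie", -1),
    ("this was a sad terrible day", -1),
]
tokens = [preprocess(text) for text, _ in train]
vocab = sorted({tok for toks in tokens for tok in toks})
X = [featurise(toks, vocab) for toks in tokens]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)

print(predict("i love this great happy day", vocab, w, b))
print(predict("i hate this sad terrible movie", vocab, w, b))
```

In a fuller version, the stop-word set would come from NLTK and a POS-tag filter would sit between `preprocess` and `featurise`. A library SVM would normally replace the hand-written trainer.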


2020 ◽  
Vol 17 (4) ◽  
pp. 1328
Author(s):  
Syed Tanzeel Rabani ◽  
Qamar Rayees Khan ◽  
Akib Mohi UD Din Khanday

Suicidal ideation is one of the most severe mental health issues faced by people all over the world. There are various risk factors involved that can lead to suicide. The most common and critical risk factors among them are depression, anxiety, social isolation and hopelessness. Early detection of these risk factors can help in preventing or reducing the number of suicides. Online social networking platforms like Twitter, Reddit and Facebook are becoming a new way for people to express themselves freely without worrying about social stigma. This paper presents a methodology and experimentation using social media as a tool to analyse suicidal ideation in a better way, thus helping in preventing the chances of being the victim of this unfortunate mental disorder. The data is collected from Twitter, one of the popular Social Networking Sites (SNS). The tweets are then pre-processed and annotated manually. Finally, various machine learning and ensemble methods are used to automatically distinguish suicidal and non-suicidal tweets. This experimental study will help researchers to know and understand how SNS are used by people to express their distress-related feelings and emotions. The study further confirmed that it is possible to analyse and differentiate these tweets using human coding and then replicate the accuracy by machine classification. However, the power of prediction for detecting genuine suicidality is not confirmed yet, and this study does not directly communicate or intervene with people exhibiting suicidal behaviour.


Author(s):  
Miss. Pooja Dilip Dhotre

Social media websites are among the internet's most far-reaching digital sites, with billions of users. Users' frequent interactions with social networking sites, like Twitter, have a widespread and sometimes unfortunate effect on day-to-day life. Social networking sites make it easy for large amounts of unwanted and unrelated information to spread around the world. Twitter is a popular microblogging service where users connect with others with similar interests. Because of its current popularity, Twitter is vulnerable to public shaming. Recently, Twitter has emerged as a rich source of human-generated information, with the added benefit of connecting you with customers and enabling two-way communication. It is generally accepted that when someone posts a comment on an incident, it is likely to humiliate the victim. Interestingly, the follower counts of users who shame others increase faster than those of users who do not. Using machine learning algorithms, users will be able to identify disrespectful words, as well as the overall negativity of those words, displayed as a percentage.


2015 ◽  
Vol 30 (2) ◽  
pp. 157-170 ◽  
Author(s):  
Rizwana Irfan ◽  
Christine K. King ◽  
Daniel Grages ◽  
Sam Ewen ◽  
Samee U. Khan ◽  
...  

In this survey, we review different text mining techniques to discover various textual patterns from social networking sites. Social network applications create opportunities to establish interaction among people, leading to mutual learning and sharing of valuable knowledge through channels such as chat, comments, and discussion boards. Data in social networking websites is inherently unstructured and fuzzy in nature. In everyday conversations, people do not care about spelling and accurate grammatical construction of a sentence, which may lead to different types of ambiguities, such as lexical, syntactic, and semantic. Therefore, analyzing and extracting information patterns from such data sets is more complex. Several surveys have been conducted to analyze different methods for information extraction. Most of the surveys emphasized the application of different text mining techniques to unstructured data sets that reside in the form of text documents, but do not specifically target the data sets in social networking websites. This survey attempts to provide a thorough understanding of different text mining techniques as well as the application of these techniques in social networking websites. This survey investigates the recent advancement in the field of text analysis and covers two basic approaches of text mining, classification and clustering, that are widely used for the exploration of the unstructured text available on the Web.


Efficient utilization of social networking sites (SNS) has reduced communication delays but, at the same time, increased rumour messages. Mischievous people have started sharing rumours via social networking sites for personal benefit. This falsified information (i.e., rumour) creates misconceptions among people in society, causing socio-economic losses by disrupting the routine business of the private and government sectors. Rumours require rigorous surveillance before they become viral through social media platforms. Detecting rumour words at an early stage in messaging applications requires robust Rumour Detection Models (RDM) and succinct tools. RDM detect rumours on social media platforms (Twitter, LinkedIn, Instagram, WhatsApp, Sina Weibo and others) effectively only to a limited extent, with the help of bag-of-words and machine learning approaches. RDM fail to detect emerging rumours that contain words of a specific language during a chat session. This survey compares the various RDM strategies and tools proposed earlier for identifying rumour words on social media platforms. It is found that many earlier RDM make use of deep learning, machine learning, artificial intelligence, fuzzy logic, graph theory and data mining techniques. Finally, an improved RDM is proposed in Figure 2; its efficiency is improved by embedding pre-defined rumour rules, the WordNet ontology and an NLP/machine learning approach, giving a precision rate of 83.33% when compared with other state-of-the-art systems.


2019 ◽  
Vol 8 (3) ◽  
pp. 3257-3263

Around 2.5 quintillion bytes of data have been created online, most of it generated in the last two years. Many devices and sources are used to generate this huge amount of data, such as sensors that gather climate information, social networking sites, banking records, and e-commerce records. This data is known as Big Data. It is mainly characterized by the three V's: volume, velocity, and variety. Variety concerns the different formats of data originating from various data sources. Hence, the variety issue of big data is significant in explaining some genuine challenges. The semantic Web is utilized as an integrator to join information from different sorts of data sources, such as web services, social databases, and spreadsheets, in various formats. The semantic Web is an extended form of the present web that gives simpler methods to search, reuse, combine and share data. It is therefore seen as an integrator across different content, information applications, and systems. This paper is an effort to uncover the nature of big data and gives a brief survey of the use of various semantic web-based methods and tools to add value to today's big data. In addition, it discusses a case study on performing various machine learning functionalities on news articles and proposes a web-based framework for classification and integration of news article big data using ontologies.


Author(s):  
Nisha P. Shetty ◽  
Balachandra Muniyal ◽  
Arshia Anand ◽  
Sushant Kumar

Sybil accounts are swelling on popular social networking sites such as Twitter and Facebook, owing to cheap subscription and easy access to large masses. A malicious person creates multiple fake identities to extend the reach and size of his network. People blindly trust their online connections and fall into traps set up by these fake perpetrators. Sybil nodes exploit the ready-made connectivity of online social networks (OSNs) to spread fake news and spam, influence polls, recommendations and advertisements, masquerade to get critical information, launch phishing attacks, etc. Such accounts are surging on a wide scale, so it has become vital to detect such nodes effectively. In this research, a new classifier (a combination of Sybil Guard, Twitter engagement rate and a profile statistics analyser) is developed to combat such Sybil nodes. The proposed classifier overcomes the limitations of structure-based, machine-learning-based and behaviour-based classifiers and is proven to be more accurate and robust than the base Sybil Guard algorithm.


2021 ◽  
Vol 12 ◽  
Author(s):  
Patricio E. Ramírez-Correa ◽  
F. Javier Rondán-Cataluña ◽  
Jorge Arenas-Gaitán ◽  
Elizabeth E. Grandón ◽  
Jorge L. Alfaro-Pérez ◽  
...  
