Text Analyzer - An Approach for Hate Speech & Offensive Language Detection

Author(s):  
Dr. Sweeta Bansal

As the online social crowd grows day by day, so does the hatred expressed within it. This hatred gives rise to hateful speech and comments that are passed from one person to another online. Recently, hate speech has increased so much that we need a way to stop it, or at least contain it to a minimum. Keeping this problem in mind, we introduce a way to detect the class of comments posted online and to stop their spread if they belong to the hateful category. We use Natural Language Processing methods and the Logistic Regression algorithm to achieve our goal.
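As a rough illustration of the approach described above, the sketch below pairs TF-IDF features with a scikit-learn LogisticRegression classifier to label comments. The toy comments, labels, and parameter choices are assumptions for demonstration, not the authors' actual data or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy labelled comments; a real system would use a large annotated corpus.
comments = [
    "you people are disgusting and should disappear",
    "had a lovely time at the park today",
    "everyone from that group is subhuman",
    "congratulations on the new job!",
]
labels = ["hateful", "neither", "hateful", "neither"]

pipeline = Pipeline([
    # Bag-of-words weighted by TF-IDF, with bigrams to catch short phrases.
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(comments, labels)

# Screen a new comment before it is published.
print(pipeline.predict(["that group is subhuman"]))
```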

Author(s):  
Fatemah Husain ◽  
Ozlem Uzuner

The use of offensive language in user-generated content is a serious problem that needs to be addressed with the latest technology. The field of Natural Language Processing (NLP) can support the automatic detection of offensive language. In this survey, we review previous NLP studies that cover Arabic offensive language detection. This survey investigates the state-of-the-art in offensive language detection for the Arabic language, providing a structured overview of previous approaches, including core techniques, tools, resources, methods, and main features used. This work also discusses the limitations and gaps of the previous studies. Findings from this survey emphasize the importance of investing further effort in detecting Arabic offensive language, including the development of benchmark resources and the invention of novel preprocessing and feature extraction techniques.
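By way of illustration only, the snippet below shows the kind of Arabic-specific normalisation step that many of the surveyed systems apply before feature extraction (stripping diacritics and tatweel, unifying letter variants). The exact rules differ from study to study, so this is a generic baseline rather than a technique taken from the survey itself.

```python
import re

DIACRITICS = re.compile(r"[\u064B-\u0652]")   # tashkeel (short-vowel marks)
TATWEEL = re.compile(r"\u0640")               # elongation character

def normalize_arabic(text: str) -> str:
    """Apply a common, generic Arabic normalisation baseline."""
    text = DIACRITICS.sub("", text)
    text = TATWEEL.sub("", text)
    text = re.sub(r"[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> bare alef
    text = re.sub(r"\u0649", "\u064A", text)                # alef maqsura -> ya
    text = re.sub(r"\u0629", "\u0647", text)                # ta marbuta -> ha
    return text.strip()

print(normalize_arabic("العَرَبِيَّة"))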


2017 ◽  
Vol 56 (05) ◽  
pp. 377-389 ◽  
Author(s):  
Xingyu Zhang ◽  
Joyce Kim ◽  
Rachel E. Patzer ◽  
Stephen R. Pitts ◽  
Aaron Patzer ◽  
...  

Summary
Objective: To describe and compare logistic regression and neural network modeling strategies for predicting hospital admission or transfer following initial presentation to Emergency Department (ED) triage, with and without the addition of natural language processing elements.
Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from the 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone triage. We used this information to construct logistic regression (LR) and multilayer neural network (MLNN) models, which included natural language processing (NLP) and principal component analysis applied to the patient's reason for visit. Ten-fold cross-validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model.
Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason-for-visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated an AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured and free-text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
Conclusions: The predictive accuracy of hospital admission or transfer for patients who presented to ED triage was good overall, and it improved with the inclusion of free-text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
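The sketch below, using hypothetical data and illustrative hyperparameters, mirrors the general modelling strategy described in the abstract: bag-of-words features from the free-text reason for visit are reduced with a PCA-style decomposition and fed to both a logistic regression and a small multilayer neural network, each evaluated with ten-fold cross-validated AUC. It is not the NHAMCS analysis itself.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical reason-for-visit texts and admission outcomes.
reasons = ["chest pain and shortness of breath", "ankle sprain",
           "severe headache", "cough and fever"] * 25
admitted = np.random.RandomState(0).randint(0, 2, size=len(reasons))

def make_model(estimator):
    return Pipeline([
        ("bow", CountVectorizer()),
        # Stands in for the 48 principal components used in the study.
        ("svd", TruncatedSVD(n_components=3, random_state=0)),
        ("clf", estimator),
    ])

for name, est in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("multilayer NN", MLPClassifier(hidden_layer_sizes=(16,),
                                                  max_iter=500, random_state=0))]:
    auc = cross_val_score(make_model(est), reasons, admitted, cv=10, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```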


Author(s):  
Uma Maheswari Sadasivam ◽  
Nitin Ganesan

Fake news is on everyone's lips these days, whether the subject is an election, the COVID-19 pandemic, or social unrest. Many social websites have started to fact-check the news or articles posted on them, because fake news creates confusion and chaos and misleads the community and society. In this cyber era, citizen journalism is increasingly common: citizens themselves collect, report, disseminate, and analyse news and information. This means anyone can publish news on social websites, which can leave readers with unreliable information. To keep every nation a safe place to live, to hold fair and square elections, to stop the spread of hatred based on race, religion, caste, or creed, to provide reliable information about COVID-19, and to guard against social unrest, we need to keep a tab on fake news. This chapter presents a way to detect fake news using deep learning techniques and natural language processing.
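As a loose sketch of the kind of approach the chapter points to, the snippet below trains a tiny embedding + LSTM classifier on toy headlines with Keras. The architecture, data, and labels are assumptions chosen for illustration, not the model presented in the chapter.

```python
import tensorflow as tf

# Toy headlines and labels (0 = genuine, 1 = fake), purely illustrative.
texts = ["government confirms new vaccination schedule",
         "celebrity spotted riding a dragon over the capital"]
labels = [0, 1]

# Map raw strings to integer token sequences inside the model.
vectorize = tf.keras.layers.TextVectorization(max_tokens=10000,
                                              output_sequence_length=32)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of "fake"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant(texts), tf.constant(labels), epochs=2, verbose=0)

print(model.predict(tf.constant(["miracle cure claims to reverse ageing overnight"])))
```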


2020 ◽  
Vol 12 (20) ◽  
pp. 8441
Author(s):  
Robert G. Boutilier ◽  
Kyle Bahr

Dealing with the social and political impacts of large complex projects requires monitoring and responding to concerns from an ever-evolving network of stakeholders. This paper describes the use of text analysis algorithms to identify stakeholders’ concerns across the project life cycle. The social license (SL) concept has been used to monitor the level of social acceptance of a project. That acceptance can be assessed from the texts produced by stakeholders on sources ranging from social media to personal interviews. The same texts also contain information on the substance of stakeholders’ concerns. Until recently, extracting that information necessitated manual coding by humans, which is a method that takes too long to be useful in time-sensitive projects. Using natural language processing algorithms, we designed a program that assesses the SL level and identifies stakeholders’ concerns in a few hours. To validate the program, we compared it to human coding of interview texts from a Bolivian mining project from 2009 to 2018. The program’s estimation of the annual average SL was significantly correlated with rating scale measures. The topics of concern identified by the program matched the most mentioned categories defined by human coders and identified the same temporal trends.
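For illustration, the short sketch below shows one way the topic-identification side of such a program could work: stakeholder texts are turned into a document-term matrix and a small LDA model surfaces the top terms per topic of concern. The texts, topic count, and library choices are assumptions for demonstration; they are not the authors' program, which also scores the social licence level itself.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical stakeholder statements about a mining project.
stakeholder_texts = [
    "the mine has polluted the river and farmers are worried about water",
    "the company funded the new school and hires local workers",
    "dust from the trucks is damaging crops near the road",
    "royalty payments to the municipality have been delayed again",
]

vec = CountVectorizer(stop_words="english")
doc_term = vec.fit_transform(stakeholder_texts)

# Two topics only because the toy corpus is tiny.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top_terms)}")
```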


Author(s):  
Mitta Roja

Abstract: Cyberbullying is a major problem encountered on the internet that affects teenagers as well as adults. It has led to tragedies such as suicide and depression, and regulation of content on social media platforms has become a growing need. The following study uses data from two different forms of cyberbullying, hate speech tweets from Twitter and comments based on personal attacks from Wikipedia forums, to build a model for detecting cyberbullying in text data using Natural Language Processing and machine learning. Three methods for feature extraction and four classifiers are studied to outline the best approach, as shown in the sketch below. For the Twitter data the model provides accuracies above 90%, and for the Wikipedia data it gives accuracies above 80%. Keywords: Cyberbullying, Hate speech, Personal attacks, Machine learning, Feature extraction, Twitter, Wikipedia
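A rough sketch of the kind of comparison the study describes: different feature-extraction methods are paired with different classifiers and each combination is scored by cross-validated accuracy. The particular extractors, classifiers, and toy data below are illustrative stand-ins rather than the study's exact choices.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy labelled texts (1 = bullying, 0 = benign), repeated to allow 5-fold CV.
texts = ["you are pathetic and everyone hates you", "great game last night",
         "nobody wants you here, just leave", "thanks for the help with my homework"] * 10
labels = [1, 0, 1, 0] * 10

extractors = {"bag-of-words": CountVectorizer(), "tf-idf": TfidfVectorizer()}
classifiers = {"logistic regression": LogisticRegression(max_iter=1000),
               "linear SVM": LinearSVC(),
               "naive Bayes": MultinomialNB()}

# Score every feature-extractor / classifier combination.
for ename, extractor in extractors.items():
    for cname, clf in classifiers.items():
        score = cross_val_score(make_pipeline(extractor, clf), texts, labels,
                                cv=5, scoring="accuracy").mean()
        print(f"{ename} + {cname}: accuracy {score:.2f}")
```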


2014 ◽  
Vol 21 (1) ◽  
pp. 1-2
Author(s):  
Mitkov Ruslan

The Journal of Natural Language Engineering (JNLE) has enjoyed another very successful year. Two years after being accepted into the Thomson Reuters Citation Index and being indexed in many of their products (including both the Science and the Social Sciences editions of the Journal Citation Reports (JCR)), the journal has further established itself as a leading forum for high-quality articles covering all aspects of Natural Language Processing research, including, but not limited to, the engineering of natural language methods and applications. I am delighted to report an increased number of submissions, which reached a total of 92 between January and September 2014.


Author(s):  
Dr. Kamlesh Sharma ◽  
◽  
Nidhi Garg ◽  
Arun Pandey ◽  
Daksh Yadav ◽  
...  

Plagiarism is the act of using another person's words, ideas, or information without giving credit to that person and presenting them as one's own. With the development of technology in recent years, acts of plagiarism have increased significantly. Fortunately, plagiarism detection techniques are available, and they are improving day by day at detecting attempts to plagiarize content in education. Software such as Turnitin, iThenticate, or SafeAssign is available on the market and does a good job in this context. But the problem is not fully solved yet: this software still does not detect the rephrasing of another writer's statements in other words. This paper focuses primarily on detecting plagiarism in a suspicious document based on the meaning and linguistic variation of its content. The techniques used for this purpose are based on natural language processing. In this paper, we present how semantic analysis and syntax-driven parsing can be used to detect plagiarism.
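As a hedged illustration of meaning-based comparison (one ingredient of the approach described, alongside syntax-driven parsing), the sketch below embeds sentences from a source document and a suspicious document and flags pairs whose cosine similarity is high even though the wording differs. The model name, threshold, and example sentences are assumptions, not the paper's implementation.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative choice of a small general-purpose sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

source_sentences = ["The experiment was repeated three times to confirm the result."]
suspect_sentences = ["To verify the outcome, the trial was carried out on three separate occasions."]

src_emb = model.encode(source_sentences, convert_to_tensor=True)
sus_emb = model.encode(suspect_sentences, convert_to_tensor=True)

# Compare every suspicious sentence with every source sentence.
similarity = util.cos_sim(sus_emb, src_emb)
for i, sent in enumerate(suspect_sentences):
    score = float(similarity[i].max())
    if score > 0.7:  # threshold chosen purely for illustration
        print(f"possible paraphrase (cos={score:.2f}): {sent}")
```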

