language detection
Recently Published Documents


TOTAL DOCUMENTS

180
(FIVE YEARS 127)

H-INDEX

9
(FIVE YEARS 4)

2022 ◽  
Vol 16 (4) ◽  
pp. 1-30
Author(s):  
Muhammad Abulaish ◽  
Mohd Fazil ◽  
Mohammed J. Zaki

Domain-specific keyword extraction is a vital task in the field of text mining. There are various research tasks, such as spam e-mail classification, abusive language detection, sentiment analysis, and emotion mining, where a set of domain-specific keywords (aka lexicon) is highly effective. Existing works for keyword extraction list all keywords rather than domain-specific keywords from a document corpus. Moreover, most of the existing approaches perform well on formal document corpora but fail on noisy and informal user-generated content in online social media. In this article, we present a hybrid approach that jointly models the local and global contextual semantics of words, utilizing the strength of distributional word representation and a contrasting-domain corpus for domain-specific keyword extraction. Starting with a seed set of a few domain-specific keywords, we model the text corpus as a weighted word-graph. In this graph, the initial weight of a node (word) represents its semantic association with the target domain, calculated as a linear combination of three semantic association metrics, and the weight of an edge connecting a pair of nodes represents the co-occurrence count of the respective words. Thereafter, a modified PageRank method is applied to the word-graph to identify the most relevant words for expanding the initial set of domain-specific keywords. We evaluate our method over both formal and informal text corpora (comprising six datasets), and show that it performs significantly better in comparison to state-of-the-art methods. Furthermore, we generalize our approach to handle the language-agnostic case, and show that it outperforms existing language-agnostic approaches.
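
The word-graph idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how PageRank is modified or which three association metrics are combined, so the code below assumes a standard PageRank whose teleport step is biased by node weight, builds edges from sliding-window co-occurrence counts, and takes the node weights as given. The function names, the window size, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_graph(docs, window=3):
    """Return {word: {neighbor: count}} from tokenized documents.

    Two words co-occur when they appear within `window` tokens of each other.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                a, b = tokens[i], tokens[j]
                if a != b:
                    graph[a][b] += 1
                    graph[b][a] += 1
    return graph


def weighted_pagerank(graph, node_weight, damping=0.85, iterations=50):
    """PageRank with a teleport distribution proportional to node weight.

    Assumption: this stands in for the unspecified "modified PageRank".
    """
    nodes = list(graph)
    total = sum(node_weight.get(n, 0.0) for n in nodes)
    if total > 0:
        teleport = {n: node_weight.get(n, 0.0) / total for n in nodes}
    else:
        teleport = {n: 1.0 / len(nodes) for n in nodes}

    out_sum = {n: sum(graph[n].values()) for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # The graph is symmetric, so the neighbors of n are its in-links.
            incoming = sum(score[m] * graph[m][n] / out_sum[m] for m in graph[n])
            new_score[n] = (1 - damping) * teleport[n] + damping * incoming
        score = new_score
    return score


# Hypothetical toy corpus; the seed keywords get all of the initial weight.
docs = [
    ["spam", "offer", "free", "money"],
    ["free", "money", "win", "prize"],
    ["meeting", "agenda", "notes", "today"],
]
seed_weight = {"free": 1.0, "money": 1.0}
scores = weighted_pagerank(build_cooccurrence_graph(docs), seed_weight)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Words that co-occur with the seeds accumulate score, while words in an unrelated part of the graph stay at zero, so the top of `ranked` serves as a candidate list for expanding the seed set.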


2021 ◽  
Author(s):  
Fachrul Kurniawan ◽  
Badruddin ◽  
Aji Prasetya Wibawa

Sentiment analysis is a technique for extracting a person's attitude toward an issue or event by identifying a text's polarity. Texts are grouped according to whether the sentiment is positive or negative. In the pre-processing stage, duplicate removal reduces the data from 10997 items to 4339, and language detection identifies 31 languages. Although the data come from the world's largest Muslim country, the issue is not confined to that country, as the languages identified by the text mining tools show.
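
The two pre-processing steps mentioned here, duplicate removal followed by language detection, can be sketched as follows. This is an illustration only: the abstract does not say which tools were used, so the detector below is a toy that counts overlaps with small, hypothetical stopword lists for three languages, and the sample texts are invented.

```python
# Hypothetical, deliberately tiny stopword profiles; a real detector
# would use a trained model covering many more languages.
STOPWORD_PROFILES = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "id": {"yang", "dan", "di", "ini", "itu", "dengan"},
    "es": {"el", "la", "de", "que", "y", "en"},
}


def drop_duplicates(texts):
    """Keep the first occurrence of each text, ignoring case and extra spaces."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def guess_language(text):
    """Return the profile with the most stopword hits, or "unknown" if none match."""
    tokens = set(text.lower().split())
    hits = {lang: len(tokens & words) for lang, words in STOPWORD_PROFILES.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "unknown"


texts = [
    "The price of rice is rising",
    "the price of rice  is rising",   # duplicate after normalization
    "Harga beras ini naik dan mahal",
    "El precio de la comida",
]
unique_texts = drop_duplicates(texts)
languages = {text: guess_language(text) for text in unique_texts}
```

Running the deduplication first keeps the detector from being applied to repeated items, which matches the order of steps described in the abstract.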


Author(s):  
Sarthak Sharma

Sign language is one of the oldest and most natural forms of language for communication, but most people do not know sign language and interpreters are very difficult to come by. We have therefore developed a real-time method using neural networks for fingerspelling-based American Sign Language. In our method, the hand is first passed through a filter, and the filtered result is then passed through a classifier that predicts the class of the hand gesture.


Author(s):  
Vildan Mercan ◽  
Akhtar Jamil ◽  
Alaa Ali Hameed ◽  
Irfan Ahmed Magsi ◽  
Sibghatullah Bazai ◽  
...  

2021 ◽  
Vol 7 ◽  
pp. e742
Author(s):  
Noman Ashraf ◽  
Arkaitz Zubiaga ◽  
Alexander Gelbukh

Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment.


2021 ◽  
Author(s):  
Matan Halevy ◽  
Camille Harris ◽  
Amy Bruckman ◽  
Diyi Yang ◽  
Ayanna Howard

Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2367
Author(s):  
Noyon Dey ◽  
Md. Sazzadur Rahman ◽  
Motahara Sabah Mredula ◽  
A. S. M. Sanwar Hosen ◽  
In-Ho Ra

In modern times, ensuring social security has become the prime concern for security administrators. The widespread and recurrent use of social media sites is creating a huge risk to the lives of the general public, as these sites are frequently used to organize various types of immoral events. To protect society from these dangers, an early detection system that can effectively detect events by analyzing social media data is essential. However, automating the process of event detection has been difficult, as existing processes must account for diverse writing styles, languages, dialects, post lengths, and so on. To overcome these difficulties, we developed an effective model for detecting events, which, for our purposes, were classified as either protesting, celebrating, religious, or neutral, using Bengali and Banglish Facebook posts. First, the text of the collected posts was processed for language detection, and the detected posts were then pre-processed using stopword removal and tokenization. Features were then extracted from these pre-processed texts using three sub-processes: filtering, phrase matching of specific events, and sentiment analysis. The collected features were ultimately used to train our Bernoulli Naive Bayes classification model, which detected events with 90.41% accuracy for Bengali-language posts and 70% for Banglish-form posts. To evaluate the effectiveness of our proposed model more precisely, we compared it with two other classifiers: Support Vector Machine and Decision Tree.
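
As a rough illustration of the final stages of this pipeline (tokenization, stopword removal, and a Bernoulli Naive Bayes classifier), here is a minimal from-scratch sketch. It is not the authors' implementation: the stopword list is hypothetical, English placeholder posts stand in for the Bengali and Banglish data, only two of the four event classes appear, and the paper's filtering, phrase-matching, and sentiment features are replaced by plain word presence.

```python
import math

# Hypothetical tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "at", "in", "to", "we", "for"}


def preprocess(text):
    """Lowercase, tokenize on whitespace, drop stopwords; return the set of words."""
    return {tok for tok in text.lower().split() if tok not in STOPWORDS}


class BernoulliNB:
    """Bernoulli Naive Bayes over binary word-presence features."""

    def fit(self, docs, labels):
        self.vocab = set().union(*docs)
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.prob = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Laplace smoothing: (docs containing w + 1) / (class docs + 2).
            self.prob[c] = {
                w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for w in self.vocab
            }
        return self

    def predict(self, doc):
        def log_likelihood(c):
            total = self.log_prior[c]
            for w in self.vocab:
                p = self.prob[c][w]
                # Absent words contribute too; that is what makes it Bernoulli.
                total += math.log(p) if w in doc else math.log(1 - p)
            return total

        return max(self.classes, key=log_likelihood)


# Hypothetical placeholder posts and labels.
train = [
    ("we march to the rally for justice", "protesting"),
    ("strike and rally at the square", "protesting"),
    ("happy festival lights in the city", "celebrating"),
    ("festival party is today", "celebrating"),
]
model = BernoulliNB().fit([preprocess(t) for t, _ in train], [y for _, y in train])
prediction = model.predict(preprocess("join the rally today"))
```

Unlike the multinomial variant, the Bernoulli model scores the absence of each vocabulary word as well as its presence, which tends to suit short posts where word counts rarely exceed one.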

