Dataset Creation from Multilingual Data of Social Media: Challenges and Consequences

The real-time availability of the Internet has engaged millions of users around the world. Regional languages are increasingly preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People use regional (local) languages on social media to share ideas, opinions, and news of events happening globally, e.g., sports, inflation, protests, explosions, and sexual assault. Extracting and classifying events from multilingual data has become a bottleneck because of a lack of resources. In this research paper, we present an event classification task for Urdu-language text from social media and news channels using machine learning classifiers. The dataset contains more than 0.1 million (102,962) labeled instances of twelve (12) different types of events. The title, its length, and the last four words of a sentence are used as features to classify the events. Term Frequency-Inverse Document Frequency (tf-idf) gave the best results as a feature vector for evaluating the performance of six popular machine learning classifiers. Random Forest (RF) and K-Nearest Neighbor (KNN) outperformed the other classifiers, achieving 98.00% and 99.00% accuracy, respectively. The novelty lies in the fact that, to the best of our knowledge, these features have not previously been applied to event extraction from text written in the Urdu language.
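As a rough illustration of the features named in this abstract, the sketch below builds a bag of features from a title (its tokens, its length, and its last four words), weights them with tf-idf, and classifies with a nearest-neighbour vote. It is not the authors' implementation. The function names, the way the three features are merged into one bag, the use of word count as "length", the smoothed idf formula, and the toy English titles standing in for Urdu text are all assumptions made for the example.

```python
import math
from collections import Counter


def feature_tokens(title):
    """Turn a title into a bag of features: tokens, last four words, length.

    Merging the three features into one bag is an assumption of this sketch.
    """
    tokens = title.split()
    last_four = ["last4=" + tok for tok in tokens[-4:]]
    length = ["len=" + str(len(tokens))]  # length measured in words (assumed)
    return tokens + last_four + length


def fit_idf(docs):
    """Smoothed inverse document frequency over lists of feature tokens."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_vector(tokens, idf):
    """L2-normalized sparse tf-idf vector; features unseen in training are dropped."""
    tf = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def knn_predict(query_vec, train_vecs, labels, k=1):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (English placeholders, not taken from the paper's dataset).
TRAIN = [
    ("team wins the cricket final match", "sports"),
    ("striker scores twice in football match", "sports"),
    ("prices of fuel and food rise again", "inflation"),
    ("food prices rise as inflation climbs", "inflation"),
]

train_docs = [feature_tokens(title) for title, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(doc, idf) for doc in train_docs]

query = tfidf_vector(feature_tokens("cricket team wins another match"), idf)
prediction = knn_predict(query, train_vecs, train_labels, k=1)
print(prediction)
```

A Random Forest, the other classifier named above, would consume the same tf-idf vectors; only the final prediction step would change.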


An Efficient Cross-Lingual BERT Model for Text Classification and Named Entity Extraction in Multilingual Dataset

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽ 10.32628/cseit217353 ◽ 2021 ◽ pp. 280-286

Author(s): Asoke Nath ◽ Debapriya Kandar ◽ Rahul Gupta

Keyword(s): Social Media ◽ Text Classification ◽ Model Performance ◽ The Internet ◽ Entity Extraction ◽ Named Entity ◽ Named Entity Extraction ◽ Multilingual Data ◽ The Cross ◽ Cross Lingual

With the rise of the internet, people are inundated with information from sources such as websites, blogs and articles, social media posts and comments, and e-news portals. Most of this data is unstructured. In this paper, the authors explore the efficiency of the cross-lingual BERT model, i.e., M-BERT, for text classification and named entity extraction on multilingual data. They evaluate model performance on datasets in three languages: French, German, and Portuguese.


Where Social Media Meets AAC

ASHA Leader ◽ 10.1044/leader.gs.20072015.np ◽ 2015 ◽ Vol 20 (7)

Author(s): Vicki Clarke

Keyword(s): Social Media

Download Full-text

10.1017/cbo9781316182796 ◽ 2015

Author(s): Alan Davidson

Keyword(s): Social Media ◽ Electronic Commerce ◽ Commerce Law


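Auswirkungen der Corona-Pandemie auf die Maßnahmen zur Suchtprävention der Bundeszentrale für gesundheitliche Aufklärung (BZgA) [Effects of the Corona Pandemic on the Addiction Prevention Measures of the Federal Centre for Health Education (BZgA)]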

SUCHT - Zeitschrift für Wissenschaft und Praxis / Journal of Addiction Research and Practice ◽ 10.1024/0939-5911/a000677 ◽ 2020 ◽ Vol 66 (5) ◽ pp. 259-264

Author(s): Michaela Goecke

Keyword(s): Public Health ◽ Social Media ◽ Young Adults ◽ Bundeszentrale Für Gesundheitliche Aufklärung

Abstract. Background: The Federal Centre for Health Education (Bundeszentrale für gesundheitliche Aufklärung, BZgA) is a specialist authority responsible, among other things, for implementing national addiction prevention programmes. Its annual work programmes are coordinated with the Federal Ministry of Health and, given their public health relevance, currently focus on prevention for the legal substances tobacco and alcohol. The primary target groups are adolescents and young adults, because risky consumption patterns can develop and become entrenched in these groups. The BZgA's prevention programmes comprise school-based offerings, web portals, social media, and print media such as information brochures. Current situation: The Corona pandemic has affected the BZgA's addiction prevention work. Notable effects are the thematic interlinking with Corona-related topics and a changed need for advice, by telephone and online. The contact restrictions imposed during the "lockdown" and the new conditions for in-person interaction have also changed addiction prevention. Interactive prevention offerings in schools were suspended, as were support for participatory campaigns in sports clubs and the running of peer programmes. In their place, digital options came into new focus, both for delivering addiction prevention offerings and for cooperation and networking with the federal states (Länder). Looking ahead, the Corona crisis may also be an opportunity for more digitalization in addiction prevention.
