Social-Child-Case Document Clustering based on Topic Modeling using Latent Dirichlet Allocation

Nur Annisa Tresnasari; Teguh Bharata Adji; Adhistya Erna Permanasari

doi:10.22146/ijccs.54507

Social-Child-Case Document Clustering based on Topic Modeling using Latent Dirichlet Allocation

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.54507 ◽

2020 ◽

Vol 14 (2) ◽

pp. 179

Author(s):

Nur Annisa Tresnasari ◽

Teguh Bharata Adji ◽

Adhistya Erna Permanasari

Keyword(s):

Social Workers ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

The Past ◽

Coherence Score ◽

The Social ◽

Case Type ◽

The Right ◽

All Treatment ◽

Dirichlet Allocation

Children are the future of the nation. All the treatment and learning they receive affects their future. Nowadays, there are various kinds of social problems related to children. To ensure the right solution to a child's problem, social workers usually refer to social-child-case (SCC) documents to find similar past cases and adapt their solutions. However, reading a large number of documents to find similar cases is tedious and time-consuming. Hence, this work aims to categorize those documents into several groups according to case type. We use topic modeling with the Latent Dirichlet Allocation (LDA) approach to extract topics from the documents and classify them based on their similarities. The coherence score and a perplexity graph are used to determine the best model. The result is a model with 5 topics that match the targeted case types. It supports the reuse of knowledge about SCC handling by making it easier to find documents with similar cases.
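The model-selection step described above can be illustrated with a minimal sketch. The abstract does not say which coherence measure was used, so the UMass formulation below is an assumption, and the toy documents, candidate topics, and function names (`umass_coherence`, `pick_num_topics`) are hypothetical. In practice the top words per topic would come from fitted LDA models rather than being written by hand.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (assumed measure, not confirmed by the paper).

    Sums log((D(w_i, w_j) + 1) / D(w_i)) over word pairs with w_i ranked
    above w_j, where D counts the documents containing the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        higher, lower = top_words[i], top_words[j]
        denom = doc_freq(higher)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(higher, lower) + 1) / denom)
    return score


def pick_num_topics(candidates, documents):
    """Return the topic count whose topics have the highest mean coherence."""
    def mean_coherence(topics):
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))


# Hypothetical tokenized case documents.
docs = [
    ["child", "abuse", "family"],
    ["child", "neglect", "family"],
    ["school", "dropout"],
    ["school", "bullying", "child"],
]
# Hypothetical top words per topic for two candidate models.
candidates = {
    2: [["child", "family"], ["school", "bullying"]],
    3: [["child", "dropout"], ["family", "school"], ["abuse", "bullying"]],
}
print(pick_num_topics(candidates, docs))  # prints 2
```

Higher (less negative) coherence indicates topics whose top words tend to co-occur in the same documents. A perplexity curve, as mentioned in the abstract, would normally be inspected alongside this score rather than replaced by it.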

Innovation in an Emerging Market: A Bibliometric and Latent Dirichlet Allocation Based Topic Modeling Study

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317278 ◽

2020 ◽

Author(s):

Mohd Faiz Hilmi ◽

Yanti Mustapha ◽

Mohammad Tasyriq Che Omar

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Emerging Market ◽

Modeling Study ◽

Dirichlet Allocation

The Perils of Vanguardism

Socio-Economic Review ◽

10.1093/ser/mwx046 ◽

2017 ◽

Vol 17 (4) ◽

pp. 947-968 ◽

Cited By ~ 1

Author(s):

Joshua C Gordon

Keyword(s):

Causal Explanation ◽

Unemployment Benefit ◽

Right Wing ◽

Social Democrats ◽

Asylum Policy ◽

The Past ◽

The Social ◽

The Rich ◽

The Right ◽

Unexpected Outcome

Over the past 25 years, Sweden has gone from having one of the most generous unemployment benefit systems among the rich democracies to one of the least. This article advances a multi-causal explanation for this unexpected outcome. It shows how the benefit system became a target of successive right-wing governments due to its role in fostering social democratic hegemony. Employer groups, radicalized by the turbulent 1970s more profoundly than elsewhere, sought to undermine the system, and their abandonment of corporatism in the early 1990s limited unions’ capacity to restrain right-wing governments in retrenchment initiatives. Two further developments help to explain the surprising political resilience of the cuts: the emergence of a private (supplementary) insurance regime and a realignment of working-class voters from the Social Democrats to parties of the right, especially the nativist Sweden Democrats, in the context of a liberal refugee/asylum policy.

Designing the social estate of odnodvortsy of the western provinces in 1831

Samara Journal of Science ◽

10.17816/snv201762209 ◽

2017 ◽

Vol 6 (2) ◽

pp. 135-140

Author(s):

Constantin Vadimovich Troianowski

Keyword(s):

19Th Century ◽

Legal Status ◽

Social Hierarchy ◽

Social Category ◽

The Past ◽

Imperial Russia ◽

The Social ◽

The Subject ◽

The Right ◽

The 19Th Century

This article investigates the design of a new social estate in imperial Russia: the odnodvortsy of the western provinces. This social category was designed specifically for those petty szlachta who did not possess documents proving their noble ancestry and status. The author analyses the deliberations on the subject that took place in the Committee for the Western Provinces, focusing on the argument between senior imperial officials and the Grodno governor Mikhail Muraviev over registering petty szlachta in the fiscal rolls. Muraviev argued against setting up a special fiscal-administrative category for petty szlachta, suggesting that its members should join the already existing unprivileged categories of peasants and burghers. Because this proposal ran against established fiscal practices, the Committee opted to create a distinct social estate for petty szlachta. The existing social estate paradigm in Russia predetermined the position of the new soslovie in the imperial social hierarchy: western odnodvortsy were to be included in the broad legal status category of free inhabitants. Despite the similarity of the name, the new estate was not modeled on the odnodvortsy of the Russian provinces, because the latter retained from the past certain privileges (e.g., the right to possess serfs) that did not correspond to the 19th-century attributes of unprivileged social estates.

Spam Diffusion in Social Networking Media using Latent Dirichlet Allocation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i7898.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 881-885

Keyword(s):

Online Social Networks ◽

Topic Modeling ◽

Information Diffusion ◽

Latent Dirichlet Allocation ◽

Good Accuracy ◽

Ground Truth ◽

Online Social Media ◽

Diffusion Dynamics ◽

Dirichlet Allocation

Just as web spam has been a major threat to almost every aspect of the current World Wide Web, social spam, especially in information diffusion, has become a serious threat to the utility of online social media. To combat this challenge, the significance and impact of such entities and content should be analyzed critically. To address this issue, this work uses Twitter as a case study, models the content of information through topic modeling, and couples it with user-oriented features to handle the problem with good accuracy. Latent Dirichlet Allocation (LDA), a widely used topic modeling technique, is applied to capture the latent topics in the tweet documents. The major contribution of this work is twofold: constructing a dataset that serves as the ground truth for analyzing the diffusion dynamics of spam and non-spam information, and analyzing the effects of topics on diffusibility. Exhaustive experiments clearly reveal the variation in the topics shared by spam and non-spam tweets. The rise in popularity of online social networks (OSNs) attracts not only legitimate users but also spammers. Legitimate users use OSN services for good purposes, such as maintaining relations with friends and colleagues, sharing information of interest, and increasing the reach of their business through advertising.

Around ethical dilemmas in the work of a family assistant

Problemy Opiekuńczo-Wychowawcze ◽

10.5604/01.3001.0014.7825 ◽

2021 ◽

Vol 597 (2) ◽

pp. 8-17

Author(s):

Małgorzata Ciczkowska-Giedziun

Keyword(s):

Social Workers ◽

Ethical Dilemmas ◽

Social Assistance ◽

Self Determination ◽

Assistance Programs ◽

Helping Relationship ◽

The Social ◽

The Family ◽

The Right ◽

Voluntary Cooperation

The purpose of the article is to describe selected ethical dilemmas in the work of a family assistant, based on Frederic Reamer's typology of ethical dilemmas. According to this typology, ethical dilemmas in cooperation with families arise in direct work with families, in the implementation of social assistance programs, and in relationships between representatives of the profession. The information presented in the text is based on publications, studies and reports on family assistantship. The first group of ethical dilemmas emerges when building a supportive, helping relationship between assistants and families. It concerns areas such as voluntary cooperation, limits of cooperation, the right to self-determination, and limits of responsibility. The second group is related to the planning and implementation of various social policy solutions, as well as the support and assistance programs offered to families. The last group results from differing understandings of family assistantship within the structures of the social assistance system; these dilemmas also emerge in building relationships with social workers. The text also presents ways of coping with these dilemmas.

Urban Crisis Detection Technique: A Spatial and Data Driven Approach Based on Latent Dirichlet Allocation (LDA) Topic Modeling

Construction Research Congress 2018 ◽

10.1061/9780784481271.025 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yan Wang ◽

John E. Taylor

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Detection Technique ◽

Data Driven ◽

Urban Crisis ◽

Data Driven Approach ◽

Dirichlet Allocation

Analysis of the Trends in Biochemical Research Using Latent Dirichlet Allocation (LDA)

Processes ◽

10.3390/pr7060379 ◽

2019 ◽

Vol 7 (6) ◽

pp. 379 ◽

Cited By ~ 5

Author(s):

Kang ◽

Kim ◽

Kang

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Research Topics ◽

Helpful Tool ◽

Journal Editors ◽

The Past ◽

Mining Technique ◽

Modeling Analysis ◽

Funding Agencies ◽

Biochemical Research

Biochemistry has been broadly defined as the “chemistry of molecules included or related to living systems”, but it is becoming increasingly hard to distinguish from related fields. The targets of its studies evolve rapidly: some newly emerge, disappear, combine, or resurface with a fresh viewpoint. Methodologies for biochemistry have become extremely diverse, thanks particularly to those adopted from molecular biology, synthetic chemistry, and biophysics. Therefore, this paper adopts topic modeling, a text mining technique, to identify the research topics in the field of biochemistry over the past twenty years and quantitatively analyze the changes in their trends. The results of the topic modeling analysis obtained through this study will provide a helpful tool for researchers, journal editors, publishers, and funding agencies to understand the connections among the diverse sub-fields of biochemical research and even to see how research topics branch out and integrate with other fields.
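One way to quantify how topic trends change over time is sketched below. It is an illustration only, not the method of the paper: it assumes each document already has a publication year and a topic distribution from a fitted topic model, averages the distributions per year, and fits a least-squares slope per topic. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def yearly_topic_share(records):
    """Average topic proportions per year.

    records: iterable of (year, [p_topic0, p_topic1, ...]) pairs.
    Returns {year: [mean share of each topic]} with years in ascending order.
    """
    sums = {}
    counts = defaultdict(int)
    for year, dist in records:
        if year not in sums:
            sums[year] = [0.0] * len(dist)
        for topic, p in enumerate(dist):
            sums[year][topic] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


def trend_slope(shares, topic):
    """Ordinary least-squares slope of one topic's yearly share against the year."""
    years = list(shares)
    values = [shares[y][topic] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    var_x = sum((x - mean_x) ** 2 for x in years)
    if var_x == 0:
        return 0.0  # a single year carries no trend information
    cov = sum((x - mean_x) * (v - mean_y) for x, v in zip(years, values))
    return cov / var_x


# Hypothetical per-document topic distributions for a two-topic model.
records = [
    (2000, [0.8, 0.2]),
    (2000, [0.6, 0.4]),
    (2010, [0.5, 0.5]),
    (2020, [0.2, 0.8]),
    (2020, [0.4, 0.6]),
]
shares = yearly_topic_share(records)
for topic in range(2):
    slope = trend_slope(shares, topic)
    label = "rising" if slope > 0 else "declining"
    print(f"topic {topic}: slope {slope:+.3f} per year ({label})")
```

A positive slope marks a topic whose share of the literature grows over the period, a negative slope one that shrinks. With real data, a significance test on the slope would be a sensible addition.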

Topic modeling for expert finding using latent Dirichlet allocation

Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery ◽

10.1002/widm.1102 ◽

2013 ◽

Vol 3 (5) ◽

pp. 346-353 ◽

Cited By ~ 11

Author(s):

Saeedeh Momtazi ◽

Felix Naumann

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Expert Finding ◽

Dirichlet Allocation

Promoting Young People’s Participation: Exploring Social Work’s Contribution to the Literature

Social Work ◽

10.1093/sw/sww018 ◽

2016 ◽

Vol 61 (3) ◽

pp. 217-226 ◽

Cited By ~ 6

Author(s):

Suzanne Pritzker ◽

Katie Richards-Schuster

Keyword(s):

Social Work ◽

Civic Engagement ◽

Social Workers ◽

Code Of Ethics ◽

Exploratory Research ◽

Youth Civic Engagement ◽

The Past ◽

Current State ◽

Meaningful Involvement ◽

The Social

In the National Association of Social Workers’ Code of Ethics, social workers are called on to promote meaningful involvement in decision making among vulnerable populations. The ethical imperatives and social justice implications associated with unequal participation suggest that the field of social work is uniquely situated to lead research and practice in the area of youth civic engagement. This article examines the current state of the social work literature regarding how young people participate civically. Authors identified 113 articles on this topic published over the past decade in journals with a large presence in social work or by social work authors. They present the findings of their exploratory research, with a focus on describing where this research is being published, the range of research foci, and the terms used to describe this work. Increased attention to promoting youth civic engagement is needed in the profession’s core journals. Based on the analysis of this literature, they recommend moving toward a cohesive body of social work scholarship that includes increased collaboration among scholars, more unified terms and language, increased range of research foci and methodologies, and more rigorous and comparative testing of strategies by which youths participate civically.

Authorship verification

10.12681/eadd/45382 ◽

2019 ◽

Author(s):

Νεκταρία Πόθα

Keyword(s):

Cyber Security ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

State Of The Art ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Authorship Verification ◽

Latent Topics ◽

Authorship Analysis ◽

Dirichlet Allocation

The field of authorship analysis aims to extract information about the authors of digital texts. It is directly linked to many applications, since it can be used to analyze texts of any genre: literary works, newspaper articles, social media posts, etc. The application areas of this technology fall into the humanities (e.g., who is the author of a literary work published anonymously, who is the author of works published under a pseudonym, verifying the authorship of literary works by known authors), forensics (e.g., finding stylistic similarities between proclamations of terrorist groups, investigating the authenticity of a suicide note, uncovering multiple social media user accounts that belong to the same person), and cyber-security (e.g., finding stylistic similarities between users of multiple aliases). A fundamental research area of authorship analysis is author verification: given a set of texts (in electronic form) by the same author (the candidate author), we must decide whether another text (of unknown or disputed authorship) was written by that author or not. Author verification has attracted particular interest in recent years, mainly because of the PAN@CLEF evaluation campaigns. Specifically, from 2013 to 2015 the PAN competitions focused on author verification, providing well-organized datasets (PAN corpora) and gathering a large number of methods for this purpose. However, the margin of error is quite large, since the performance of the methods depends on multiple factors such as the length of the texts, the thematic relatedness between the texts, and the stylistic relatedness between the texts.
The most demanding case arises when the texts of the known author belong to one genre (e.g., blogs or email messages) while the text under investigation belongs to another (e.g., a tweet or a newspaper article). Moreover, when the known author's texts and the text under investigation do not agree in topic (e.g., the known texts concern foreign policy and the unknown one cultural matters), the performance of current author verification methods is particularly low. The aim of this doctoral thesis is to develop efficient and robust author verification methods capable of handling even such complex cases. To this end, we present improved author verification methods and systematically examine their effectiveness on several benchmark datasets (PAN datasets and Enron data). First, we propose two improved algorithms: one follows the instance-based paradigm, in which all available writing samples of the candidate author are treated individually as separate representations, and the other follows the profile-based paradigm, in which all writing samples of the candidate author are concatenated into a single text with a single representation. Both achieve higher performance than the state of the art in verification on datasets covering a variety of languages (English, Greek, Spanish, Dutch) and genres (articles, reviews, novels, etc.). It is important to stress that the proposed methods benefit significantly from the availability of multiple text samples of the candidate author and remain particularly robust and competitive when text length is limited. In addition, we investigate the usefulness of applying topic modeling to author verification.
Specifically, we conduct a systematic study to examine whether topic modeling techniques improve the performance of the most basic categories of verification methods, and which topic modeling technique is most suitable for each verification paradigm. For this purpose, we combine well-known topic modeling methods, Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA), with several author verification methods that cover the main categories in this area: intrinsic methods, which treat verification as a one-class problem, and extrinsic methods, which transform verification into a two-class problem, in combination with the profile-based and instance-based approaches. Using multiple evaluation datasets, we demonstrate that LDA combines better with extrinsic methods, whereas LSI performs better with the most effective intrinsic method. Moreover, topic modeling techniques appear to be more effective when applied to methods that follow the profile-based paradigm, and their effectiveness increases when the latent topic information is extracted from an enriched corpus (augmented with additional texts collected from external sources, e.g., the web, that show significant thematic relatedness to the original dataset under examination). Comparison of our results with the state of the art in verification demonstrates the potential of the proposed methods. Furthermore, the proposed extrinsic methods are particularly competitive when external texts of unknown genre are used. Some related studies offer evidence that heterogeneous ensembles of verification methods can provide very reliable solutions, better than any individual verification model alone.
However, only very simple ensemble models, combining relatively few base methods, have been examined so far. In an attempt to fill this gap, we consider a large set of base verification models (47 models in total) covering the main paradigms and categories of methods in this area and study how they can be combined to build an effective ensemble. In this way, we propose a simple stacking ensemble as well as an approach based on dynamic selection of models for each author verification case separately. Experimental results on multiple datasets confirm the suitability of the proposed methods and demonstrate their effectiveness. The best of the reported models improve performance over the current state of the art by more than 10%.
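The difference between the profile-based and instance-based paradigms described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's algorithms: the character trigram features, cosine similarity, averaging of per-sample scores, and the 0.5 decision threshold are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (an assumed stylometric feature)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def profile_based_score(known_texts, unknown, n=3):
    """Concatenate all known samples into one profile, then compare once."""
    profile = char_ngrams(" ".join(known_texts), n)
    return cosine(profile, char_ngrams(unknown, n))


def instance_based_score(known_texts, unknown, n=3):
    """Compare the unknown text with each known sample separately, then average."""
    if not known_texts:
        raise ValueError("at least one known text is required")
    target = char_ngrams(unknown, n)
    sims = [cosine(char_ngrams(t, n), target) for t in known_texts]
    return sum(sims) / len(sims)


def same_author(score, threshold=0.5):
    """Accept the candidate author when the score reaches an assumed threshold."""
    return score >= threshold


# Hypothetical writing samples.
known = ["the cat sat on the mat", "the cat ate the rat"]
disputed = "the cat sat on the rat"
unrelated = "zzzz qqqq xxxx"

for name, scorer in [("profile", profile_based_score), ("instance", instance_based_score)]:
    print(name, round(scorer(known, disputed), 3), round(scorer(known, unrelated), 3))
```

Both functions are intrinsic in the abstract's sense, since they use only the candidate author's texts. An extrinsic variant would additionally compare the disputed text with samples from other authors and decide by which side it resembles more.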
