Non-Topic Elimination Using Topic Modeling for Automatic Summarization of Tweet Data in a COVID-19 Context

2021 ◽  
Vol 8 (1) ◽  
pp. 199
Author(s):  
Putri Damayanti ◽  
Diana Purwitasari ◽  
Nanik Suciati

Twitter accounts such as Suara Surabaya can help spread information about COVID-19, even though they also cover other topics such as accidents and traffic jams. Because so many tweets are available, text summarization can be applied to Twitter data, making it easier to obtain the latest important information related to COVID-19. However, the variety of topics in tweet text leads to poor summaries. It is therefore necessary to eliminate tweets that are unrelated to the context before summarization. The contribution of this research is the use of topic modeling as one stage in a series of data elimination processes. Topic modeling as a data elimination technique can be applied in various cases, but this research focuses on COVID-19. The aim is to make it easier for the public to obtain current information concisely. The stages are pre-processing, data elimination using topic modeling, and automatic summarization. For comparison, the study uses combinations of several word embedding, topic modeling, and automatic summarization methods. The summary produced by each combination is evaluated with ROUGE to find the best combination. The results show that the combination of Word2Vec, LSI, and TextRank achieves the best ROUGE score, 0.67, while the combination of TF-IDF, LDA, and Okapi BM25 has the lowest ROUGE score, 0.35.
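The abstract reports ROUGE scores without naming the variant. As an illustration only, the sketch below computes ROUGE-1 (clipped unigram overlap) precision, recall, and F1 for one candidate summary against one reference. It assumes lowercase whitespace tokenization with no stemming or stop-word removal, and the function name `rouge_1` is hypothetical; it is not the evaluation code used in the study.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall and F1 using clipped unigram counts.

    Assumption: simple lowercase whitespace tokenization (no stemming,
    no stop-word removal) and a single reference summary.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared unigram,
    # so a word repeated in the candidate is only credited as often as
    # it appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical one-sentence summaries, not data from the study.
    reference = "covid-19 cases rise in surabaya today"
    candidate = "covid-19 cases in surabaya rise again"
    print(rouge_1(candidate, reference))
```

Published ROUGE figures usually come from a standard package with its own tokenization and stemming options, so scores from this sketch will not match reported values exactly.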

2020 ◽  
Author(s):  
Caitlin Doogan ◽  
Wray Buntine ◽  
Henry Linger ◽  
Samantha Brunt

BACKGROUND Nonpharmaceutical interventions (NPIs) (such as wearing masks and social distancing) have been implemented by governments around the world to slow the spread of COVID-19. To promote public adherence to these regimes, governments need to understand the public perceptions and attitudes toward NPI regimes and the factors that influence them. Twitter data offer a means to capture these insights. OBJECTIVE The objective of this study is to identify tweets about COVID-19 NPIs in six countries and compare the trends in public perceptions and attitudes toward NPIs across these countries. The aim is to identify factors that influenced public perceptions and attitudes about NPI regimes during the early phases of the COVID-19 pandemic. METHODS We analyzed 777,869 English language tweets about COVID-19 NPIs in six countries (Australia, Canada, New Zealand, Ireland, the United Kingdom, and the United States). The relationship between tweet frequencies and case numbers was assessed using a Pearson correlation analysis. Topic modeling was used to isolate tweets about NPIs. A comparative analysis of NPIs between countries was conducted. RESULTS The proportion of NPI-related topics, relative to all topics, varied between countries. The New Zealand data set displayed the greatest attention to NPIs, and the US data set showed the lowest. The relationship between tweet frequencies and case numbers was statistically significant only for Australia (r=0.837, P<.001) and New Zealand (r=0.747, P<.001). Topic modeling produced 131 topics related to one of 22 NPIs, grouped into seven NPI categories: Personal Protection (n=15), Social Distancing (n=9), Testing and Tracing (n=10), Gathering Restrictions (n=18), Lockdown (n=42), Travel Restrictions (n=14), and Workplace Closures (n=23). While less restrictive NPIs gained widespread support, more restrictive NPIs were perceived differently across countries. 
Four characteristics of these regimes were seen to influence public adherence to NPIs: timeliness of implementation, NPI campaign strategies, inconsistent information, and enforcement strategies. CONCLUSIONS Twitter offers a means to obtain timely feedback about the public response to COVID-19 NPI regimes. Insights gained from this analysis can support government decision making, implementation, and communication strategies about NPI regimes, as well as encourage further discussion about the management of NPI programs for global health events, such as the COVID-19 pandemic.
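The abstract relates tweet frequencies to case numbers with a Pearson correlation. A minimal sketch of the coefficient itself follows, assuming two aligned series of counts (for example, one value per day); `pearson_r` is a hypothetical helper, and the abstract does not state how the study aggregated its series.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are all left unscaled; the common
    # 1/n (or 1/(n-1)) factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily tweet counts and new case counts.
    tweets = [120, 150, 170, 210, 260]
    cases = [10, 14, 15, 22, 30]
    print(round(pearson_r(tweets, cases), 3))
```

In practice, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a two-sided p-value, which is what a statement such as P<.001 relies on.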


2020 ◽  
Author(s):  
Canruo Zou ◽  
Xueting Wang ◽  
Zidian Xie ◽  
Dongmei Li

Background: The coronavirus disease 2019 (COVID-19) has spread globally since December 2019. Twitter is a popular social media platform with active discussions about the COVID-19 pandemic. The public reactions on Twitter about the COVID-19 pandemic in different countries have not been studied. This study aims to compare the public reactions towards the COVID-19 pandemic between the United Kingdom and the United States from March 6, 2020 to April 2, 2020. Data: The numbers of confirmed COVID-19 cases in the United Kingdom and the United States were obtained from the 1Point3Acres website. Twitter data were collected using COVID-19 related keywords from March 6, 2020 to April 2, 2020. Methods: Temporal analyses were performed on COVID-19 related Twitter posts (tweets) during the study period to show daily and hourly trends. The sentiment scores of the tweets on COVID-19 were analyzed and associated with the policy announcements and the number of confirmed COVID-19 cases. Topic modeling was conducted to identify related topics discussed with COVID-19 in the United Kingdom and the United States. Results: The number of daily new confirmed COVID-19 cases in the United Kingdom was significantly lower than that in the United States during our study period. There were 3,556,442 COVID-19 tweets in the United Kingdom and 16,280,065 tweets in the United States during the study period. The number of COVID-19 tweets per 10,000 Twitter users in the United Kingdom was lower than that in the United States. The sentiment scores of COVID-19 tweets in the United Kingdom were less negative than those in the United States. The topics discussed in COVID-19 tweets in the United Kingdom were mostly about gratitude to the government and health workers, while the topics in the United States were mostly about the global COVID-19 pandemic situation. 
Conclusion: Our study showed correlations between the public reactions towards the COVID-19 pandemic on Twitter and the confirmed COVID-19 cases as well as the policies related to the COVID-19 pandemic in the United Kingdom and the United States.


Author(s):  
Júlia Koltai ◽  
Zoltán Kmetty ◽  
Károly Bozsonyi

The phenomenon of suicide has been a focal point among social scientists since Durkheim. The internet and social media sites give people new ways to express positive feelings, but they are also platforms for expressing suicidal ideation or depressed thoughts. Most of these posts are not about actual suicide, and some of them are a cry for help. Moreover, suicide- and depression-related content varies among platforms, and it is not evident how a researcher can find such material in the mass data of social media. Our paper uses a corpus of more than four million Instagram posts related to mental health problems. After defining the initial corpus, we present two different strategies for finding the relevant sociological content in the noisy environment of social media. The first approach starts with topic modeling (Latent Dirichlet Allocation), whose output serves as the basis of a supervised classification method based on advanced machine-learning techniques. The other strategy is built on an artificial neural network-based word embedding language model. Based on our results, combining topic modeling and neural network word embedding methods seems a promising way to find research-related content in a large digital corpus. Our research can provide added value in the detection of possible self-harm events. With complex techniques such as topic modeling and word embedding methods, it is possible to identify the most problematic posts and the most vulnerable users.


Peace Studies ◽  
2020 ◽  
Vol 28 (1) ◽  
pp. 287-332
Author(s):  
Kayoung Kim ◽  
Kyeongpil Kang ◽  
Minji Son ◽  
Cheongah Lee ◽  
Shinbeom Hong ◽  
...  

Author(s):  
Hari Wahyudi

This study aimed to investigate the influence of accounting information systems and information technology on service performance in the public sector. The samples in this study were M. Djamil Hospital (RS. M. Djamil), PLN, and PDAM in Padang, selected by purposive sampling. Of the 122 questionnaires distributed, only 85 could be processed. The tests used were a validity test, a reliability test, a multicollinearity test, the coefficient of determination, and the t-test. The results of this study are: (1) the first hypothesis test shows that accounting information systems have a significant influence on service performance in the public sector; (2) information technology does not significantly influence service performance in the public sector.


2009 ◽  
Vol 20 (3) ◽  
pp. 73-77 ◽  
Author(s):  
Mark J Kearns ◽  
Sabrina S Plitt ◽  
Bonita E Lee ◽  
Joan L Robinson

BACKGROUND: There are limited recent data on rubella immunity in women of childbearing age in Canada. In the present paper, the proportion of rubella seroreactivity and redundant testing (testing of women previously seropositive when tested by the same physician) in the Alberta prenatal rubella screening program were studied. METHODS: In the present retrospective observational study, data on all specimens submitted for prenatal screening in Alberta between August 2002 and December 2005 were extracted from the Provincial Laboratory for Public Health database. The proportion of rubella screening and immunoglobulin G (IgG) seroreactivity were determined. Demographic variables were compared between rubella seroreactors and nonseroreactors. The proportion of redundant testing was determined. RESULTS: Of 159,046 prenatal specimens, 88.3% (n=140,473) were screened for rubella immunity. In total, 8.8% of specimens tested negative for rubella IgG. Younger women (23.2% of women younger than 20 years of age versus 4.7% of women between 35 and 39 years of age; P<0.001) and women from northern Alberta (11.9% versus 8.1% [overall]; P<0.001) were significantly more likely to have seronegative specimens. Of the 20,044 women who had multiple rubella immunity screenings, 88.1% (n=17,651) had multiple positive test results. In total, 20.7% of the 42,274 specimens submitted from women with multiple screenings were deemed redundant. DISCUSSION: Younger women were most likely to be seronegative for rubella. The public health significance of women entering their childbearing years with low or undetectable rubella IgG levels remains to be determined. A large number of women with documented rubella immunity were unnecessarily retested.


2021 ◽  
Vol 172 ◽  
pp. 114652
Author(s):  
Nabil Alami ◽  
Mohammed Meknassi ◽  
Noureddine En-nahnahi ◽  
Yassine El Adlouni ◽  
Ouafae Ammor

2018 ◽  
Vol 15 (4) ◽  
pp. 29-44 ◽  
Author(s):  
Yi Zhao ◽  
Chong Wang ◽  
Jian Wang ◽  
Keqing He

With the rapid growth of web services on the internet, web service discovery has become a hot topic in services computing. Faced with heterogeneous and unstructured service descriptions, researchers have proposed many service clustering approaches to promote web service discovery, and many other approaches have leveraged auxiliary features to enhance the classical LDA model and achieve better clustering performance. However, these extended LDA approaches still have limitations in handling data sparsity and noise words. This article proposes a novel web service clustering approach that incorporates word embedding into LDA, using relevant words obtained from word embeddings to improve clustering performance. Specifically, word embeddings were trained with Word2vec to find semantically relevant words for service keywords, and these words were then incorporated into the LDA training process. Experiments on a real-world dataset published on ProgrammableWeb show that the authors' proposed approach achieves better clustering performance than several classical approaches.
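The abstract describes obtaining semantically relevant words for service keywords from Word2vec embeddings before LDA training. A minimal sketch of that retrieval step follows, using cosine similarity over a tiny hand-made embedding table. The vectors, `relevant_words`, `augment`, and the `min_sim` cutoff are all hypothetical, and the abstract does not say how the words enter LDA; appending them to a service's token list is only one assumed strategy.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical 3-d embeddings standing in for trained Word2vec vectors.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "map": (1.0, 0.1, 0.0),
    "geolocation": (0.8, 0.3, 0.0),
    "location": (0.95, 0.05, 0.1),
    "payment": (0.0, 1.0, 0.1),
    "checkout": (0.1, 0.9, 0.0),
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevant_words(keyword: str, embeddings: Dict[str, Vector],
                   k: int = 2, min_sim: float = 0.5) -> List[str]:
    """Up to k vocabulary words most similar to keyword, above min_sim."""
    if keyword not in embeddings:
        return []
    target = embeddings[keyword]
    scored = [(cosine(target, vec), word)
              for word, vec in embeddings.items() if word != keyword]
    scored.sort(reverse=True)  # highest similarity first
    return [word for sim, word in scored[:k] if sim >= min_sim]


def augment(tokens: List[str], embeddings: Dict[str, Vector],
            k: int = 2) -> List[str]:
    """Append relevant words for each token (an assumed LDA input strategy)."""
    extra: List[str] = []
    for tok in tokens:
        extra.extend(relevant_words(tok, embeddings, k))
    return tokens + extra


if __name__ == "__main__":
    print(relevant_words("map", TOY_EMBEDDINGS, k=3))
    print(augment(["map", "payment"], TOY_EMBEDDINGS, k=1))
```

The augmented token lists could then be passed to any LDA implementation; the similarity cutoff keeps weakly related words (here, "checkout" for "map") out of a service's description.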


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Milad Mirbabaie ◽  
Stefan Stieglitz ◽  
Felix Brünker

Purpose: The purpose of this study is to investigate communication on Twitter during two unforeseen crises (the Manchester bombings and the Munich shooting) and one natural disaster (Hurricane Harvey). The study contributes to understanding the dynamics of convergence behaviour archetypes during crises. Design/methodology/approach: The authors collected Twitter data and analysed approximately 7.5 million relevant cases. The communication was examined using social network analysis techniques and manual content analysis to identify convergence behaviour archetypes (CBAs). The dynamics and development of CBAs over time in crisis communication were also investigated. Findings: The results revealed the dynamics of influential CBAs emerging in specific stages of a crisis situation. The authors derived a conceptual visualisation of convergence behaviour in social media crisis communication and introduced the terms hidden and visible network-layer to further understanding of the complexity of crisis communication. Research limitations/implications: The results emphasise the importance of well-prepared emergency management agencies and support the following recommendations: (1) continuous and (2) transparent communication during the crisis event, as well as (3) informing the public about central information distributors from the start of the crisis, are vital. Originality/value: The study uncovered the dynamics of crisis-affected behaviour on social media during three cases. It provides a novel perspective that broadens our understanding of complex crisis communication on social media and contributes to existing knowledge of the complexity of crisis communication as well as convergence behaviour.
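The abstract mentions social network analysis without naming specific measures. One common starting point for spotting influential accounts is in-degree centrality in a retweet network; the sketch below assumes edges of the form (retweeter, original author) and counts distinct retweeters per author. The edge format, the `top_retweeted` helper, and the choice of measure are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_retweeted(edges: Iterable[Tuple[str, str]],
                  k: int = 3) -> List[Tuple[str, int]]:
    """Rank accounts by in-degree in a retweet network.

    Each edge is (retweeter, original_author). Duplicate edges are
    collapsed and self-retweets dropped, so an author's in-degree is
    the number of distinct accounts that retweeted them.
    """
    unique_edges = {(src, dst) for src, dst in edges if src != dst}
    in_degree = Counter(dst for _, dst in unique_edges)
    return in_degree.most_common(k)


if __name__ == "__main__":
    # Hypothetical retweet edges during a crisis event.
    edges = [
        ("a", "police"), ("b", "police"), ("c", "police"),
        ("a", "news"), ("b", "news"), ("a", "news"),  # repeat of a->news
        ("news", "d"),
        ("e", "e"),  # self-retweet, ignored
    ]
    print(top_retweeted(edges, 2))
```

In-degree only captures direct amplification; identifying archetypes as the study did would also require content analysis and tracking how these rankings change across crisis stages.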


2021 ◽  
Author(s):  
Faizah Faizah ◽  
Bor-Shen Lin

BACKGROUND The World Health Organization (WHO) declared the COVID-19 outbreak a Public Health Emergency of International Concern on January 30, 2020. However, the pandemic is not yet over; in the first quarter of 2021, some countries faced a third wave. During this difficult time, the development of COVID-19 vaccines accelerated rapidly. Understanding public perception of COVID-19 vaccines from social media data can widen the perspective on the state of the global pandemic. OBJECTIVE This study explores and analyzes the latent topics in COVID-19 vaccine tweets posted by individuals from various countries using two-stage topic modeling. METHODS A two-stage topic modeling analysis was proposed to investigate people's reactions in five countries. The first stage applies Latent Dirichlet Allocation (LDA) to produce latent topics with their corresponding term distributions, which help investigators understand the main issues or opinions. The second stage performs agglomerative clustering on the latent topics based on Hellinger distance, hierarchically merging close topics into topic clusters so that they can be visualized in tree or graph views. RESULTS In general, the topics discussed regarding COVID-19 vaccines are similar across the five countries. Topic themes such as "first vaccine" and "vaccine effect" dominate the public discussion. Notably, some countries have distinctive topic themes, such as "politician opinion" and "stay home" in Canada, "emergency" in India, and "blood clots" in the United Kingdom. The analysis also identifies the most popular COVID-19 vaccine, which is attracting more public interest. CONCLUSIONS Two-stage topic modeling that combines LDA with hierarchical clustering is effective for visualizing latent topics and understanding public perception of COVID-19 vaccines.
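The second stage described in the abstract merges LDA topics by Hellinger distance using agglomerative clustering. The sketch below computes the Hellinger distance between two topic-term distributions, H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) / sqrt(2), which lies between 0 (identical) and 1 (disjoint support), and merges clusters bottom-up. Average linkage and the stopping threshold are assumptions; the abstract does not name a linkage criterion, and the study visualized the hierarchy as trees or graphs rather than cutting it at one threshold. The function names and toy topics are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions (0..1)."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def agglomerate(topics: List[Sequence[float]],
                threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering of topic distributions.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Hellinger distance, stopping when that distance reaches threshold.
    Returns clusters as sorted lists of topic indices.
    """
    n = len(topics)
    dist: Dict[Tuple[int, int], float] = {
        (i, j): hellinger(topics[i], topics[j])
        for i, j in combinations(range(n), 2)
    }

    def linkage(a: List[int], b: List[int]) -> float:
        pairs = [dist[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(pairs) / len(pairs)

    clusters: List[List[int]] = [[i] for i in range(n)]
    while len(clusters) > 1:
        best_pair, best_d = (0, 1), math.inf
        for x, y in combinations(range(len(clusters)), 2):
            d = linkage(clusters[x], clusters[y])
            if d < best_d:
                best_pair, best_d = (x, y), d
        if best_d >= threshold:
            break
        x, y = best_pair
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical topic-term distributions over a 4-word vocabulary.
    topics = [
        [0.7, 0.2, 0.05, 0.05],
        [0.6, 0.3, 0.05, 0.05],
        [0.05, 0.05, 0.5, 0.4],
        [0.05, 0.05, 0.4, 0.5],
    ]
    print(agglomerate(topics, threshold=0.3))
```

For dendrogram output, `scipy.cluster.hierarchy.linkage` with `method="average"` on a condensed distance matrix, followed by `scipy.cluster.hierarchy.dendrogram`, is a standard alternative to this hand-rolled loop.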

