Sequential Topic Modelling: A Case Study on Indonesian LGBT Conversation on Twitter

Author(s):  
Arsy Arslina ◽  
Muhaza Liebenlito

Abstract: Indonesia has the largest Muslim population in the world, and Lesbian, Gay, Bisexual, and Transgender (LGBT) issues there have long been a sensitive and much-studied topic. Social media such as Twitter is one of the main channels where people discuss the topic. In this paper, we collect 18,552 tweets dated from 2015 to 2018 to analyze how the LGBT conversation among Indonesians changes over time. We explore the main topics of the conversation using Latent Dirichlet Allocation (LDA), one of the most popular soft clustering methods. The technique identifies latent (hidden) topic information in a large collection of documents using a bag-of-words approach that treats every document as a vector of word counts; each document is represented as a probability distribution over several topics, while each topic is represented as a probability distribution over words. The results show seven main categories in the conversation about LGBT: politics, religion, government, ethics, nationality, culture, and technology. The topic probability distributions, computed and analyzed for each semester, are generally homogeneous. The exception is the government election period, when politics has a significantly higher probability. In other words, there is a tendency for LGBT issues to be used in Indonesian politics.
Keywords: LGBT; politics; topic modeling; Twitter.
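
The bag-of-words representation and the per-semester comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each tweet already has a topic probability vector from a fitted topic model, and the function names (`bag_of_words`, `mean_topic_distribution`), the vocabulary, the period labels, and all numbers are hypothetical toy values.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Represent one document as a vector of word counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def mean_topic_distribution(doc_topics, periods):
    """Average the per-document topic probabilities within each period."""
    sizes = Counter(periods)
    sums = {}
    for theta, period in zip(doc_topics, periods):
        acc = sums.setdefault(period, [0.0] * len(theta))
        for k, p in enumerate(theta):
            acc[k] += p
    return {period: [s / sizes[period] for s in acc] for period, acc in sums.items()}


# Toy inputs (made up for illustration, not taken from the paper).
vocabulary = ["law", "vote", "faith"]
vector = bag_of_words(["vote", "law", "vote"], vocabulary)

# One row per document; each row is a distribution over two topics.
doc_topics = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]
periods = ["2015-S1", "2015-S1", "2015-S2"]
by_semester = mean_topic_distribution(doc_topics, periods)

print(vector)
print(by_semester)
```

Comparing the averaged vectors across periods is one simple way to see whether a topic, such as politics, gains probability mass in a given semester.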

2020 ◽  
Vol 16 (8) ◽  
pp. 1079-1087
Author(s):  
Jorgelina Z. Heredia ◽  
Carlos A. Moldes ◽  
Raúl A. Gil ◽  
José M. Camiña

Background: The elemental composition of maize grains depends on the soil, land, and environmental characteristics where the crop grows. These effects matter when evaluating the availability of nutrients with complex dynamics, such as the concentration of macro- and micronutrients in soils, which can vary with topography. Little information is available on how the topographic characteristics (upland and lowland) of the land where a crop is grown influence the mineral composition of crop products, in the present case maize seeds. Moreover, the study of topographic effects on crops using multivariate analysis tools has not been reported. Objective: This paper assesses the effect of topographic conditions on plants by analyzing the mineral profiles of maize seeds obtained under two land conditions: uplands and lowlands. Materials and Methods: The mineral profile was studied by microwave plasma atomic emission spectrometry. Samples were collected from lowlands and uplands of cultivable land in the north-east of La Pampa province, Argentina. Results: Maize seeds collected from the two topographical areas were differentiated by principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA). The PCA model based on the mineral profile differentiated seeds from uplands and lowlands through the influence of the Cr and Mg variables. A significant accumulation of Cr and Mg was observed in seeds from lowlands. Cluster analysis confirmed this grouping, and linear discriminant analysis correctly classified both crops, showing the effect of topography on the elemental profile. Conclusions: Multi-elemental analysis combined with chemometric tools proved useful for assessing the effect of topographic characteristics on crops.
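
To illustrate the kind of two-class separation that linear discriminant analysis performs here, the sketch below fits Fisher's linear discriminant on two features. It is not the authors' procedure: the abstract does not say which implementation was used, the sketch assumes equal class priors and a shared covariance, and the Cr and Mg values are synthetic numbers in arbitrary units invented for the example.

```python
def mean(rows):
    """Column-wise mean of a list of 2-feature samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mu):
    """2x2 scatter matrix of one class around its own mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_fisher_lda(class_a, class_b):
    """Return the projection vector w = Sw^-1 (mu_b - mu_a) and a midpoint threshold."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Equal priors assumed: the cut sits at the projected midpoint of the class means.
    threshold = sum(w[k] * (mu_a[k] + mu_b[k]) / 2 for k in range(2))
    return w, threshold


def predict(sample, w, threshold):
    """Project a sample onto w; scores above the threshold go to class B (lowland)."""
    score = w[0] * sample[0] + w[1] * sample[1]
    return "lowland" if score > threshold else "upland"


# Synthetic (Cr, Mg) values in arbitrary units; NOT measurements from the study.
upland = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.3), (0.8, 2.1)]
lowland = [(3.0, 4.0), (3.4, 4.3), (2.8, 3.7), (3.2, 4.4)]

w, threshold = fit_fisher_lda(upland, lowland)
print([predict(s, w, threshold) for s in upland + lowland])
```

With real data, the classification rate should be estimated on held-out samples or by cross-validation rather than on the training set, as is done here only for brevity.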


2020 ◽  
Vol 15 ◽  
Author(s):  
Mohanad Mohammed ◽  
Henry Mwambi ◽  
Bernard Omolo

Background: Colorectal cancer (CRC) is the third most common cancer among women and men in the USA, and recent studies have shown an increasing incidence in less developed regions, including Sub-Saharan Africa (SSA). We developed a hybrid (DNA mutation and RNA expression) signature and assessed its predictive properties for the mutation status and survival of CRC patients. Methods: Publicly available microarray and RNASeq data from 54 matched formalin-fixed paraffin-embedded (FFPE) samples, from the Affymetrix GeneChip and RNASeq platforms, were used to obtain differentially expressed genes between mutant and wild-type samples. For classification, we applied support-vector machines, artificial neural networks, random forests, k-nearest neighbor, naïve Bayes, negative binomial linear discriminant analysis (NBLDA), and Poisson linear discriminant analysis. A Cox proportional hazards model was used for survival analysis. Results: Compared to the genelist from each individual platform, the hybrid genelist had the highest accuracy, sensitivity, specificity, and AUC for mutation status across all the classifiers, and it was prognostic for survival in patients with CRC. The NBLDA method was the best performer on the RNASeq data, while the SVM method was the most suitable classifier for CRC across the two data types. Nine genes were found to be predictive of survival. Conclusion: This signature could be useful in clinical practice, especially for colorectal cancer diagnosis and therapy. Future studies should determine the effectiveness of integration in cancer survival analysis and its application to unbalanced data, where the classes are of different sizes, as well as to data with multiple classes.
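
The four evaluation measures named in the Results (accuracy, sensitivity, specificity, and AUC) can be computed as in the sketch below. It is illustrative only: the labels and scores are made-up toy values, the encoding 1 = mutant and 0 = wild-type and the 0.5 decision threshold are assumptions, and the helper names are hypothetical. AUC is computed here as the fraction of (positive, negative) pairs that the classifier ranks correctly, counting ties as half.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (invented; 1 = mutant, 0 = wild-type).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
area = auc(y_true, scores)

print(accuracy, sensitivity, specificity, area)
```

Unlike the other three measures, AUC does not depend on the chosen threshold, which is one reason it is often reported alongside them.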

