Similarity Approximation of Twitter Profiles

Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Pablo Chamoso ◽  
Zakie AlizadehSani ◽  
Juan M. Corchado

Social media platforms have been an undeniable part of everyday life for the past decade. Analyzing the information being shared is a crucial step to understanding human behavior. Social media analysis aims to guarantee a better experience for the user and to increase user satisfaction. However, it is first necessary to know how, and from which aspects, to compare users. In this paper, an intelligent system is proposed to measure the similarity of Twitter profiles. First, the timeline of each profile is extracted using the official Twitter API. Then, all information is given to the proposed system. Next, in parallel, three aspects of a profile are derived. Behavioral ratios are time-series information showing the consistency and habits of the user; dynamic time warping is utilized to compare the behavioral ratios of two profiles. Next, the audience network is extracted for each user, and the Jaccard similarity is used to estimate the similarity of the two sets. Finally, for content similarity measurement, the tweets are preprocessed according to the feature extraction method; TF-IDF and DistilBERT are employed for feature extraction, and the resulting representations are compared using cosine similarity. Results show that TF-IDF performs slightly better, so the more straightforward solution is selected for the model, which reports the similarity level of different profiles. In the case study, a Random Forest classification model trained on almost 20,000 users achieved 97.24% accuracy. This comparison enables us to find duplicate profiles with nearly the same behavior and content.
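
As a rough illustration of the content-similarity step described above (not the authors' implementation), the sketch below builds TF-IDF vectors from two profiles' tweets and compares them with cosine similarity; the function name and toy tweets are hypothetical.

```python
# Minimal sketch of TF-IDF + cosine similarity for two Twitter profiles (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_similarity(tweets_a, tweets_b):
    """tweets_a / tweets_b: lists of (preprocessed) tweet strings for two profiles."""
    docs = [" ".join(tweets_a), " ".join(tweets_b)]
    tfidf = TfidfVectorizer(stop_words="english")
    vectors = tfidf.fit_transform(docs)              # 2 x vocabulary sparse matrix
    return cosine_similarity(vectors[0], vectors[1])[0, 0]

# Toy usage with made-up tweets
print(content_similarity(["machine learning on twitter data"],
                         ["deep learning for twitter analysis"]))
```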

Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Pablo Chamoso ◽  
Zakie Alizadehsani ◽  
Juan M. Corchado

Social media platforms have been an undeniable part of everyday life for the past decade. Analyzing the information being shared is a crucial step to understanding human behavior. Social media analysis aims to guarantee a better experience for the user and to increase user satisfaction. But first, it is necessary to know how, and from which aspects, to compare users with each other. In this paper, an intelligent system is proposed to measure the similarity of Twitter profiles. First, the timeline of each profile is extracted using the official Twitter API. Then, all information is given to the proposed system. Next, in parallel, three aspects of a profile are derived. Behavioral ratios are time-series information showing the consistency and habits of the user; dynamic time warping is utilized to compare the behavioral ratios of two profiles. Next, graph network analysis is used to monitor the interactions between the user and their audience; the Jaccard similarity is used to estimate the similarity of the graphs. Finally, for content similarity measurement, natural language processing techniques are employed for preprocessing and TF-IDF for feature extraction; the resulting vectors are then compared using cosine similarity. Results present the similarity level of different profiles. In the case study, people with the same interests show higher similarity. This way of comparing profiles is helpful in many other areas; it also makes it possible to find duplicate profiles, that is, profiles with almost the same behavior and content.
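
The behavioral and audience comparisons can be sketched in the same spirit. The snippet below is an assumed, minimal implementation of dynamic time warping over two behavioral-ratio series and of the Jaccard similarity between two audience ID sets; it is not the paper's code, and the toy inputs are placeholders.

```python
# Minimal DTW and Jaccard similarity sketches for profile comparison (illustrative only).
import numpy as np

def dtw_distance(series_a, series_b):
    """Classic O(n*m) dynamic time warping distance between two 1-D series."""
    n, m = len(series_a), len(series_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(series_a[i - 1] - series_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def jaccard_similarity(audience_a, audience_b):
    """Jaccard index between two sets of audience (follower/interaction) IDs."""
    a, b = set(audience_a), set(audience_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Toy example: daily retweet ratios and audience ID sets for two profiles
print(dtw_distance([0.2, 0.4, 0.1], [0.25, 0.35, 0.15]))
print(jaccard_similarity({1, 2, 3, 4}, {2, 3, 5}))
```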


Today the world is gripped with fear of a highly infectious disease caused by a newly discovered coronavirus and termed COVID-19. Coronaviruses are a large group of viruses that severely affect humans. The world bears testimony to the contagious nature of the disease and the rapidity with which it spreads. Around 50 lakh people were infected and 30 lakh people died in this pandemic all around the world, leaving people fearful of the epidemic around them. The death rate among males is higher than among females. News of the pandemic has caught the attention of the world and gained momentum on almost all media platforms. Both true and fake news about COVID-19 have been created and spread on social media, becoming popular and a major concern to the general public who access it. Spreading such trending news on social media has become a new way of gaining familiarity and a fan base. At the same time, it is undeniable that spreading such fake news creates considerable confusion and fear among the public. To stop such rumors, the detection of fake news has become of utmost importance. Emerging machine learning classification algorithms are an appropriate way to build models that effectively detect fake news on social media. In the context of the COVID-19 pandemic, we collected training data and trained machine learning models using various algorithms to automatically detect fake news about the coronavirus. The algorithms used in this investigation are the Naïve Bayes classifier and the Random Forest classification algorithm. A separate model is created for each classifier after data preparation and feature extraction. The results obtained are compared and examined to determine the more accurate model. Our experiments on a benchmark dataset with the Random Forest classification model showed promising results, with an overall accuracy of 94.06%. This experimental evaluation can help the general public allay their fears and understand the impact of fast-spreading, misleading fake news.
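
As a hedged sketch of the pipeline described (the benchmark dataset and exact preprocessing are not reproduced here), the snippet below feeds TF-IDF features from toy headlines into both a Naïve Bayes and a Random Forest classifier and compares their accuracy.

```python
# Illustrative fake-news classification sketch: TF-IDF features, two classifiers, accuracy comparison.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = [
    "who publishes official covid-19 case counts",      # toy stand-ins for real headlines
    "health ministry announces vaccination schedule",
    "drinking hot water cures the corona virus",        # toy stand-ins for fake headlines
    "5g towers are spreading the corona virus",
]
labels = [0, 0, 1, 1]                                    # 0 = real, 1 = fake (hypothetical labels)

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.5,
                                          stratify=labels, random_state=0)

for model in (MultinomialNB(), RandomForestClassifier(n_estimators=100, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```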


2021 ◽  
Vol 14 ◽  
pp. 1-11
Author(s):  
Suraya Alias

In an age where conversation largely involves online chatting and texting, automated conversational agents are needed to support repetitive tasks such as answering FAQs, customer service, and product recommendations. One of the key challenges is to identify and discover the user's intention in a social conversation, which is the focus of our work in the academic domain. Our unsupervised text feature extraction method for intent pattern discovery is developed by applying text feature constraints to the FP-Growth technique. The academic corpus was developed from a chat message dataset in which conversations between students and academicians regarding undergraduate and postgraduate queries were extracted as text features for our model. We compared our new Constrained Frequent Intent Pattern (cFIP) model with the N-gram model in terms of feature-vector size reduction, descriptive intent discovery, and analysis of cFIP rules. Our findings show that significant and descriptive intent patterns were discovered, with a rule confidence value of 0.9 for 3-sequence cFIPs. We report an average feature-vector size reduction of 76% compared to the Bigram model using both undergraduate and postgraduate conversation datasets. Usability testing showed an overall mean user satisfaction score of 4.30 out of 5 for the academic chatbot, supporting our cFIP intent discovery approach.
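
The constrained FP-Growth (cFIP) model itself is not reproduced here; the sketch below only illustrates the underlying idea with plain FP-Growth and confidence-based rules via mlxtend on a toy set of tokenized chat messages. All tokens and thresholds are illustrative.

```python
# Frequent-pattern mining sketch over toy academic chat tokens (not the cFIP model itself).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

chats = [["apply", "postgraduate", "deadline"],
         ["postgraduate", "deadline", "fee"],
         ["apply", "undergraduate", "fee"],
         ["apply", "postgraduate", "deadline"]]

te = TransactionEncoder()
encoded = te.fit(chats).transform(chats)                     # one-hot token matrix
df = pd.DataFrame(encoded, columns=te.columns_)

frequent = fpgrowth(df, min_support=0.5, use_colnames=True)  # frequent token sets
rules = association_rules(frequent, metric="confidence", min_threshold=0.9)
print(rules[["antecedents", "consequents", "confidence"]])
```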


2020 ◽  
Author(s):  
Carolyn Lou ◽  
Pascal Sati ◽  
Martina Absinta ◽  
Kelly Clark ◽  
Jordan D. Dworkin ◽  
...  

Background and Purpose: The presence of a paramagnetic rim around a white matter lesion has recently been shown to be a hallmark of a particular pathological type of multiple sclerosis (MS) lesion. Increased prevalence of these paramagnetic rim lesions (PRLs) is associated with a more severe disease course in MS. The identification of these lesions is time-consuming to perform manually. We present a method to automatically detect PRLs on 3T T2*-phase images.
Methods: T1-weighted, T2-FLAIR, and T2*-phase MRI of the brain were collected at 3T for 19 subjects with MS. The images were then processed with lesion segmentation, lesion center detection, lesion labelling, and lesion-level radiomic feature extraction. A total of 877 lesions were identified, 118 (13%) of which contained a paramagnetic rim. We divided our data into a training set (15 patients, 673 lesions) and a testing set (4 patients, 204 lesions). We fit a random forest classification model on the training set and assessed our ability to classify lesions as PRL on the test set.
Results: The number of PRLs per subject identified via our automated lesion labelling method was highly correlated with the gold standard count of PRLs per subject, r = 0.91 (95% CI [0.79, 0.97]). The classification algorithm using radiomic features can classify a lesion as PRL or not with an area under the curve of 0.80 (95% CI [0.67, 0.86]).
Conclusion: This study develops a fully automated technique for the detection of paramagnetic rim lesions using standard T1 and FLAIR sequences and a T2*-phase sequence obtained on 3T MR images.
Highlights: A fully automated method for both the identification and classification of paramagnetic rim lesions is proposed. Radiomic features in conjunction with machine learning algorithms can accurately classify paramagnetic rim lesions. Challenges for classification are largely driven by heterogeneity between lesions, including equivocal rim signatures and lesion location.
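
A minimal sketch of the lesion-level classification step, assuming synthetic stand-ins for the radiomic feature matrix and PRL labels (the study's actual features and patient-level split are not reproduced):

```python
# Random forest over per-lesion radiomic features with ROC AUC on a held-out set (toy data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(673, 20))      # 673 training lesions x 20 hypothetical radiomic features
y_train = rng.integers(0, 2, size=673)    # 1 = paramagnetic rim lesion (toy labels)
X_test = rng.normal(size=(204, 20))       # 204 test lesions
y_test = rng.integers(0, 2, size=204)

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```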


Author(s):  
M. C. Girish Baabu ◽  
Padma M. C.

Hyperspectral imaging (HSI) is composed of several hundred narrow bands (NB) with high spectral correlation and is widely used in crop classification; this induces time and space complexity, resulting in high computational overhead and the Hughes phenomenon when processing these images. Dimensionality reduction techniques such as band selection and feature extraction play an important part in enhancing the performance of hyperspectral image classification. However, existing methods are not efficient in noisy and mixed-pixel environments with dynamic illumination and climatic conditions. The proposed Semantic Feature Representation based HSI (SFR-HSI) crop classification method first employs an image fusion (IF) method to find spectrally meaningful features from the raw HSI. Second, it extracts inherent features that keep a spatially meaningful representation of different crops by eliminating shading elements. Then, the meaningful feature set is used for training with a support vector machine (SVM). Experimental outcomes show that the proposed HSI crop classification model achieves much better accuracy and Kappa coefficient performance.
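
A minimal per-pixel SVM baseline under assumed data shapes (this is not the SFR-HSI pipeline; the cube, labels, and hyperparameters are placeholders):

```python
# Per-pixel crop classification with an SVM after flattening a hyperspectral cube (toy data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
cube = rng.normal(size=(50, 50, 200))        # hypothetical 50x50 scene with 200 narrow bands
labels = rng.integers(0, 4, size=(50, 50))   # 4 toy crop classes

X = cube.reshape(-1, cube.shape[-1])         # pixels x bands
y = labels.ravel()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf", C=10, gamma="scale").fit(X_tr, y_tr)
pred = svm.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "kappa:", cohen_kappa_score(y_te, pred))
```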


Epigenomes ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 18
Author(s):  
Kelsey Dawes ◽  
Luke Sampson ◽  
Rachel Reimer ◽  
Shelly Miller ◽  
Robert Philibert ◽  
...  

Alcohol and tobacco use are highly comorbid and exacerbate the associated morbidity and mortality of either substance alone. However, the relationship of alcohol consumption to the various forms of nicotine-containing products is not well understood. To improve this understanding, we examined the relationship of alcohol consumption to nicotine product use using self-report, cotinine, and two epigenetic biomarkers specific for smoking (cg05575921) and drinking (Alcohol T Scores (ATS)) in n = 424 subjects. Cigarette users had significantly higher ATS values than the other groups (p < 2.2 × 10⁻¹⁶). Using the objective biomarkers, the intensity of nicotine and alcohol consumption was correlated in both the cigarette and smokeless users (R = −0.66, p = 3.1 × 10⁻¹⁴; R² = 0.61, p = 1.97 × 10⁻⁴). Building upon this idea, we used the objective nicotine biomarkers and age to build and test a Balanced Random Forest classification model for heavy alcohol consumption (ATS > 2.35). The model performed well, with an AUC of 0.962, 89.3% sensitivity, and 85% specificity. We conclude that those who use non-combustible nicotine products drink significantly less than smokers, and that cigarette and smokeless users drink more with heavier nicotine use. These findings further highlight the lack of informativeness of self-reported alcohol consumption and suggest that, given the public and private health burden of alcoholism, further research should consider whether non-combustible nicotine products could be used as a mode of treatment for dual users.
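
A sketch of the classification step with toy data, assuming the predictors are cg05575921 methylation, cotinine, and age, and using imbalanced-learn's BalancedRandomForestClassifier (feature scales and labels below are placeholders, not the study's data):

```python
# Balanced Random Forest predicting heavy drinking (ATS > 2.35) from toy biomarker features.
import numpy as np
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 424
X = np.column_stack([rng.uniform(0.4, 0.9, n),   # cg05575921 beta value (hypothetical scale)
                     rng.uniform(0, 500, n),     # cotinine (hypothetical scale)
                     rng.integers(18, 70, n)])   # age in years
y = rng.integers(0, 2, n)                        # 1 = heavy drinker (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
brf = BalancedRandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, brf.predict_proba(X_te)[:, 1]))
```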


2020 ◽  
Vol 77 (9) ◽  
pp. 1564-1573
Author(s):  
J. Benjamin Stout ◽  
Mary Conner ◽  
Phaedra Budy ◽  
Peter Mackinnon ◽  
Mark McKinstry

The ability of passive integrated transponder (PIT) tag data to improve demographic parameter estimates has led to the rapid advancement of PIT tag systems. However, ghost tags create uncertainty about detected tag status (i.e., live fish or ghost tag) when using mobile interrogation systems. We developed a method to differentiate between live fish and ghost tags using a random forest classification model with a novel data input structure based on known-fate PIT tag detections in the San Juan River (New Mexico, Colorado, and Utah, USA). We used our model to classify detected tags with an overall error rate of 6.8% (1.6% error rate for ghost tags and 21.8% for live fish). The important variables for classification were related to distance moved and response to monsoonal flood flows; however, habitat variables did not appear to influence model accuracy. Our results and approach allow the use of mobile detection data with confidence and allow for greater accuracy in movement, distribution, and habitat use studies, potentially helping identify influential management actions that would improve our ability to conserve and recover endangered fish.
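
An illustrative sketch of the classification step with hypothetical features (the authors' novel data input structure is not reproduced), reporting class-specific error rates as in the abstract:

```python
# Random forest separating live fish from ghost tags, with per-class error rates (toy data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([rng.exponential(200, n),   # distance moved between detections (toy feature)
                     rng.uniform(0, 1, n)])     # movement response to flood flows (toy feature)
y = rng.integers(0, 2, n)                       # 0 = ghost tag, 1 = live fish (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, rf.predict(X_te))
ghost_err = cm[0, 1] / cm[0].sum()              # ghost tags misclassified as live fish
live_err = cm[1, 0] / cm[1].sum()               # live fish misclassified as ghost tags
print(f"ghost-tag error {ghost_err:.1%}, live-fish error {live_err:.1%}")
```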


2020 ◽  
Vol 591 ◽  
pp. 125324 ◽  
Author(s):  
Jieyu Li ◽  
Ping-an Zhong ◽  
Minzhi Yang ◽  
Feilin Zhu ◽  
Juan Chen ◽  
...  
