unsupervised approach Latest Research Papers

STEMUR: An Automated Word Conflation Algorithm for the Urdu Language

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3476226 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-20

Author(s):

Tayyaba Fatima ◽

Raees Ul Islam ◽

Muhammad Waqas Anwar ◽

M. Hasan Jamal ◽

M. Tayyab Chaudhry ◽

...

Keyword(s):

Morphological Analysis ◽

Linguistic Knowledge ◽

Common Word ◽

Training Process ◽

Loan Words ◽

Single Term ◽

Study Results ◽

Unsupervised Approach

Stemming is a common word conflation method that identifies the stems embedded in words and reduces each word to its stem (root), conflating all morphologically related terms into a single term without performing a complete morphological analysis. This article presents STEMUR, an enhanced stemming algorithm for automatic word conflation for the Urdu language. In addition to handling words with prefixes and suffixes, STEMUR also handles words with infixes. Rather than using a totally unsupervised approach, we used linguistic knowledge to develop a collection of patterns for Urdu infixes, which enhances the accuracy of the stems and affixes acquired during the training process. STEMUR also handles English loan words and words with more than one affix. STEMUR is compared with four existing Urdu stemmers, including Assas-Band and the template-based stemmer, which are also implemented in this study. Results are reported on two corpora containing 89,437 and 30,907 words, respectively. They show clear improvements in the strength and accuracy of STEMUR. The use of the maximum possible number of infix rules boosted our stemmer's accuracy up to 93.1% and helped us achieve a precision of 98.9%.
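To make the idea of affix-based word conflation concrete, here is a minimal sketch of a rule-based affix stripper. It is not STEMUR: the affix lists are hypothetical English examples, the minimum stem length is an assumed guard against over-stemming, and infix patterns and loan-word handling are left out entirely.

```python
# Illustrative affix lists. These are assumptions for the sketch,
# not the rules or patterns used by STEMUR.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ness", "ly", "ed", "s")
MIN_STEM_LEN = 3  # hypothetical guard against over-stemming


def strip_affixes(word: str) -> str:
    """Repeatedly strip known prefixes and suffixes, longest match first."""
    stem = word.lower()
    changed = True
    while changed:  # loop so that words with more than one affix are handled
        changed = False
        for prefix in sorted(PREFIXES, key=len, reverse=True):
            if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
                stem = stem[len(prefix):]
                changed = True
                break
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
                stem = stem[: -len(suffix)]
                changed = True
                break
    return stem


def conflate(words):
    """Group words that reduce to the same stem."""
    groups = {}
    for word in words:
        groups.setdefault(strip_affixes(word), []).append(word)
    return groups


if __name__ == "__main__":
    print(conflate(["kindly", "unkindness", "plays", "replayed"]))
```

A purely rule-driven stripper like this over-stems easily (for example, it would cut "reading" down to "ad"), which is one reason stemmers combine such rules with linguistic knowledge.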

An unsupervised method for social network spammer detection based on user information interests

Journal Of Big Data ◽

10.1186/s40537-021-00552-5 ◽

2022 ◽

Vol 9 (1) ◽

Author(s):

Darshika Koggalahewa ◽

Yue Xu ◽

Ernest Foo

Keyword(s):

Social Networks ◽

Social Network ◽

Online Social Networks ◽

Supervised Classification ◽

Peer Acceptance ◽

Spam Detection ◽

Highly Active ◽

Detection Approach ◽

Data Fabrication ◽

Unsupervised Approach

Online Social Networks (OSNs) are a popular platform for communication and collaboration. Spammers are highly active in OSNs. Uncovering spammers has become one of the most challenging problems in OSNs. Classification-based supervised approaches are the most commonly used method for detecting spammers. Classification-based systems suffer from limitations of “data labelling”, “spam drift”, “imbalanced datasets” and “data fabrication”. These limitations affect the accuracy of a classifier’s detection. An unsupervised approach does not require labelled datasets. We aim to address the limitations of data labelling and spam drift through an unsupervised approach. We present a pure unsupervised approach for spammer detection based on the peer acceptance of a user in a social network to distinguish spammers from genuine users. The peer acceptance of a user to another user is calculated based on common shared interests over multiple shared topics between the two users. The main contribution of this paper is the introduction of a pure unsupervised spammer detection approach based on users’ peer acceptance. Our approach does not require labelled training datasets. While it does not better the accuracy of supervised classification-based approaches, our approach has become a successful alternative for traditional classifiers for spam detection by achieving an accuracy of 96.9%.
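The abstract does not give the peer acceptance formula, so the following is only a sketch of the general idea under stated assumptions: each user is represented as a mapping from topic to a set of interest terms, acceptance between two users is taken to be the mean Jaccard overlap over the topics they share, and a user whose mean acceptance by all other users falls below a threshold is flagged. The function names, the Jaccard measure, the threshold, and the sample data are all hypothetical.

```python
def peer_acceptance(interests_a, interests_b):
    """Mean Jaccard overlap of interest terms over the topics both users share.

    Assumed measure for illustration; not the formula from the paper.
    """
    shared_topics = interests_a.keys() & interests_b.keys()
    if not shared_topics:
        return 0.0
    total = 0.0
    for topic in shared_topics:
        a, b = interests_a[topic], interests_b[topic]
        union = a | b
        total += len(a & b) / len(union) if union else 0.0
    return total / len(shared_topics)


def flag_low_acceptance(users, threshold=0.1):
    """Return names of users whose mean acceptance by all others is below threshold."""
    flagged = []
    for name, interests in users.items():
        others = [o for n, o in users.items() if n != name]
        if not others:
            continue
        mean = sum(peer_acceptance(o, interests) for o in others) / len(others)
        if mean < threshold:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical sample data: topic -> set of interest terms per user.
users = {
    "alice": {"sports": {"football", "tennis"}, "music": {"jazz"}},
    "bob": {"sports": {"football", "cricket"}, "music": {"jazz", "rock"}},
    "spam": {"sports": {"casino"}, "deals": {"pills"}},
}

if __name__ == "__main__":
    print(flag_low_acceptance(users))
```

No labels are used anywhere in the sketch, which is the property the abstract emphasises: the decision rests only on how much a user's interests overlap with those of peers.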

Using Semantic Role Knowledge for Relevance Ranking of Key Phrases in Documents: An Unsupervised Approach

10.1145/3493700.3493702 ◽

2022 ◽

Author(s):

Neelamadhav Gantayat ◽

Prateeti Mohapatra

Keyword(s):

Semantic Role ◽

Relevance Ranking ◽

Unsupervised Approach ◽

Key Phrases

An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records

Journal of Personalized Medicine ◽

10.3390/jpm12010025 ◽

2022 ◽

Vol 12 (1) ◽

pp. 25

Author(s):

Varvara Koshman ◽

Anastasia Funkner ◽

Sergey Kovalchuk

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Medical Data ◽

Validation Dataset ◽

Free Text ◽

Automatic Annotation ◽

Text Data ◽

Data Annotation ◽

Unsupervised Approach ◽

Labeling Method

Electronic medical records (EMRs) include much valuable data about patients, which is, however, unstructured. Moreover, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, today, it is hardly feasible for researchers to utilize text data of EMRs in training machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from initial sentences using morphological and syntactical analyses. In retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. The usage of Wikidata categories increased the fraction of labeled sentences 5.5 times compared to labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of sentences of the corpus; extended with timestamp and event labels, it covered 97% of sentences. The obtained method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabulary.

A Novel Unsupervised Spectral Clustering for Pure-Tone Audiograms towards Hearing Aid Filter Bank Design and Initial Configurations

Applied Sciences ◽

10.3390/app12010298 ◽

2021 ◽

Vol 12 (1) ◽

pp. 298

Author(s):

Abeer Elkhouly ◽

Allan Melvin Andrew ◽

Hasliza A. Rahim ◽

Nidhal Abdulaziz ◽

Mohamedfareq Abdulmalek ◽

...

Keyword(s):

Hearing Aids ◽

Spectral Clustering ◽

Pure Tone ◽

Filter Bank ◽

Current Practice ◽

Evaluation Criteria ◽

Silhouette Coefficient ◽

Impaired People ◽

Unsupervised Approach ◽

Standard Set

The current practice of adjusting hearing aids (HA) is tiring and time-consuming for both patients and audiologists. Of hearing-impaired people, 40–50% are not satisfied with their HAs. In addition, good designs of HAs are often avoided since the process of fitting them is exhausting. To improve the fitting process, a machine learning (ML) unsupervised approach is proposed to cluster the pure-tone audiograms (PTA). This work applies the spectral clustering (SP) approach to group audiograms according to their similarity in shape. Different SP approaches are tested for best results, and these approaches are evaluated by Silhouette, Calinski-Harabasz, and Davies-Bouldin criteria values. The Kutools for Excel add-in is used to generate a population of audiograms, which is annotated using the results from SP, and different criteria values are used to evaluate the population clusters. Finally, these clusters are mapped to a standard set of audiograms used in HA characterization. The results indicated that grouping the data into 8 or 10 groups yields clusters with high evaluation criteria values. The evaluation of the population audiogram clusters shows good performance, as it resulted in a Silhouette coefficient >0.5. This work introduces a new concept to classify audiograms using an ML algorithm according to the audiograms’ similarity in shape.
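Since the abstract judges cluster quality by the Silhouette coefficient, a small sketch of how that value is computed may help. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b); the coefficient is the mean over all points. The code below uses Euclidean distance and made-up threshold vectors standing in for audiograms; the data and the two-cluster labelling are assumptions, and the clustering step itself is not shown.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # Distance from a point to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical hearing thresholds (dB) at four frequencies per audiogram.
audiograms = [
    (20, 20, 25, 25),  # flat shape
    (25, 20, 20, 25),  # flat shape
    (20, 40, 60, 80),  # sloping shape
    (25, 45, 60, 75),  # sloping shape
]
labels = [0, 0, 1, 1]

if __name__ == "__main__":
    print(round(silhouette_score(audiograms, labels), 3))
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping ones, which is why a coefficient above 0.5 is read as good performance.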

Protein Organization with Manifold Exploration and Spectral Clustering

10.1101/2021.12.08.471858 ◽

2021 ◽

Author(s):

Geoffroy Dubourg-Felonneau ◽

Shahab Shams ◽

Eyal Akiva ◽

Lawrence Lee

Keyword(s):

Spectral Clustering ◽

Protein Sequences ◽

Language Models ◽

Functional Categories ◽

Protein Families ◽

Enzyme Protein ◽

Amount Of Information ◽

Vast Amount ◽

Unsupervised Approach

We present a method to provide a biologically meaningful representation of the space of protein sequences. While billions of protein sequences are available, organizing this vast amount of information into functional categories is daunting, time-consuming and incomplete. We present our unsupervised approach that combines Transformer protein language models, UMAP graphs, and spectral clustering to create meaningful clusters in the protein spaces. To demonstrate the meaningfulness of the clusters, we show that they preserve most of the signal present in a dataset of manually curated enzyme protein families.

Detect Extreme Sentiments on Social Networks using BERT

10.21203/rs.3.rs-1120307/v1 ◽

2021 ◽

Author(s):

Muhammad Luqman Jamil ◽

Sebastião Pais ◽

João Cordeiro ◽

Gaël Dias

Keyword(s):

Social Networks ◽

Social Media ◽

Public Opinion ◽

Social Network ◽

Social Networking ◽

Online Social Networking ◽

Social Media Data ◽

Preceding Work ◽

Unsupervised Approach ◽

Media Data

Online social networking platforms allow people to freely express their ideas, opinions, and emotions negatively or positively. Previous studies have examined users’ sentiments on these platforms to study their behaviour in different contexts and for different purposes. The mechanism of collecting public opinion information has attracted researchers to automatically classify the polarity of public opinions based on the use of concise language in messages, such as tweets, by analyzing social media data. In this paper, we extend the preceding work [1] by proposing an unsupervised approach to automatically detect extreme opinions/posts in social networks. We have evaluated our performance on five different social network and media datasets. In this work, we use the semi-supervised approach BERT to check the accuracy of our classified dataset. The latter task shows that, in these datasets, posts that were previously classified as negative or positive are, in fact, extremely negative or positive in many cases.

An Unsupervised Approach of Colonic Polyp Segmentation using Adaptive Markov Random Fields

Pattern Recognition Letters ◽

10.1016/j.patrec.2021.12.014 ◽

2021 ◽

Author(s):

Pradipta Sasmal ◽

M.K. Bhuyan ◽

Soumayan Dutta ◽

Yuji Iwahori

Keyword(s):

Random Fields ◽

Markov Random Fields ◽

Colonic Polyp ◽

Markov Random ◽

Unsupervised Approach

A novel unsupervised approach based on the hidden features of Deep Denoising Autoencoders for COVID-19 disease detection

Expert Systems with Applications ◽

10.1016/j.eswa.2021.116366 ◽

2021 ◽

pp. 116366

Author(s):

Michele Scarpiniti ◽

Sima Sarv Ahrabi ◽

Enzo Baccarelli ◽

Lorenzo Piazzo ◽

Alireza Momenzadeh

Keyword(s):

Disease Detection ◽

Unsupervised Approach ◽

Deep Denoising Autoencoders

Assessing safety critical driving patterns of heavy passenger vehicle drivers using instrumented vehicle data – An unsupervised approach

Accident Analysis & Prevention ◽

10.1016/j.aap.2021.106464 ◽

2021 ◽

Vol 163 ◽

pp. 106464

Author(s):

Jahnavi Yarlagadda ◽

Pranjal Jain ◽

Digvijay S. Pawar

Keyword(s):

Passenger Vehicle ◽

Safety Critical ◽

Driving Patterns ◽

Vehicle Data ◽

Unsupervised Approach ◽

Instrumented Vehicle
