clustering methods
Recently Published Documents





Kamilia Hosny ◽  
Abeer El-korany

Adaptive learning is one of the most widely used data-driven approaches to teaching, and it has received increasing attention over the last decade. It aims to meet students’ characteristics by tailoring course materials and assessment methods. To determine these characteristics, we need to detect students’ learning styles according to the visual, auditory or kinaesthetic (VAK) model. In this research, an integrated model that utilizes both semantic and machine learning clustering methods is developed to cluster students, detect their learning styles and recommend suitable assessment method(s) accordingly. To measure the effectiveness of the proposed model, a set of experiments was conducted on a real dataset (the Open University Learning Analytics Dataset). The experiments showed that the proposed model is able to cluster students according to their different learning activities with an accuracy that exceeds 95% and to predict their relative assessment method(s) with an average accuracy of 93%.

Mathematics ◽  
2022 ◽  
Vol 10 (2) ◽  
pp. 274
Álvaro Gómez-Rubio ◽  
Ricardo Soto ◽  
Broderick Crawford ◽  
Adrián Jaramillo ◽  
David Mancilla ◽  

In the world of optimization, especially concerning metaheuristics, solving complex problems characterized by big data and constrained instances can be difficult. This is mainly due to the difficulty of implementing efficient solutions that can solve, in adequate time, the complex optimization problems that exist in different industries. Big data has demonstrated its efficiency in solving different concerns in information management. In this paper, an approach based on multiprocessing is proposed wherein clustering and parallelism are used together to improve the search process of metaheuristics when solving large instances of complex optimization problems, incorporating collaborative elements that enhance the quality of the solution. The proposal uses machine learning algorithms to improve the segmentation of the search space. In particular, two different clustering methods from the machine learning family are implemented on bio-inspired algorithms to smartly initialize their solution population and then organize the resolution from the beginning of the search. The results show that this approach is competitive with other techniques in solving a large set of cases of a well-known NP-hard problem without incorporating too much additional complexity into the metaheuristic algorithms.

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Jinbo Chao ◽  
Chunhui Zhao ◽  
Fuzhi Zhang

Information security is one of the key issues in e-commerce Internet of Things (IoT) platform research. Collusive spamming groups on e-commerce platforms can write a large number of fake reviews for the evaluated products over a period of time, which seriously affects the purchase decisions of consumers and destroys the fair competition environment among merchants. To address this problem, we propose a network embedding based approach to detect collusive spamming groups. First, we use the idea of a meta-graph to construct a heterogeneous information network based on the user review dataset. Second, we exploit a modified DeepWalk algorithm to learn low-dimensional vector representations of user nodes in the heterogeneous information network and employ clustering methods to obtain candidate spamming groups. Finally, we leverage an indicator weighting strategy to calculate the spamming score of each candidate group, and the top-k groups with the highest spamming scores are considered to be the collusive spamming groups. The experimental results on two real-world review datasets show that the overall detection performance of the proposed approach is much better than that of baseline methods.
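
As a rough illustration of the final ranking step described in the abstract above, the sketch below scores candidate groups with a weighted sum of indicators and keeps the top-k. The indicator names, weights and values are hypothetical placeholders, not the indicators used in the paper, and a normalised weighted sum is only one plausible reading of an "indicator weighting strategy".

```python
from typing import Dict, List, Tuple


def spam_score(indicators: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of a group's indicator values (assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(weights[name] * indicators[name] for name in weights) / total_weight


def top_k_groups(
    groups: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k candidate groups with the highest spamming scores."""
    scored = [(group_id, spam_score(ind, weights)) for group_id, ind in groups.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical indicators and weights, chosen only for illustration.
example_weights = {"review_burstiness": 0.5, "rating_deviation": 0.3, "reviewer_overlap": 0.2}
example_groups = {
    "g1": {"review_burstiness": 0.9, "rating_deviation": 0.8, "reviewer_overlap": 0.7},
    "g2": {"review_burstiness": 0.1, "rating_deviation": 0.2, "reviewer_overlap": 0.3},
    "g3": {"review_burstiness": 0.6, "rating_deviation": 0.9, "reviewer_overlap": 0.5},
}

for group_id, score in top_k_groups(example_groups, example_weights, k=2):
    print(group_id, round(score, 3))
```

In the pipeline the abstract describes, the candidate groups would come from clustering the learned node embeddings; here they are hard-coded so the ranking step can be read in isolation.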

2022 ◽  
Vol 12 ◽  
Xin Duan ◽  
Wei Wang ◽  
Minghui Tang ◽  
Feng Gao ◽  
Xudong Lin

Identifying the phenotypes and interactions of various cells is the primary objective in cellular heterogeneity dissection. A key step of this methodology is to perform unsupervised clustering, which, however, often suffers from high levels of noise as well as redundant information. To overcome these limitations, we propose self-diffusion on local scaling affinity (LSSD) to enhance the metric learning of cell similarities for dissecting cellular heterogeneity. Local scaling provides self-tuning of the cell-to-cell distances that are used to construct the cell affinity. Our approach implements the self-diffusion process by propagating the affinity matrices to further improve the cell similarities for the downstream clustering analysis. To demonstrate its effectiveness and usefulness, we applied LSSD to two simulated and four real scRNA-seq datasets. Compared with other single-cell clustering methods, our approach demonstrates much better clustering performance, and the cell types identified in colorectal tumors show strong biological interpretability.
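
The affinity construction mentioned in this abstract can be sketched in a few lines. The example below assumes the common self-tuning form of local scaling, where each point's scale is the distance to its k-th nearest neighbour, followed by a simple diffusion update W <- W P + I on the row-normalised affinity. Both choices are assumptions made for illustration; the abstract does not give the exact LSSD formulas, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def local_scaling_affinity(points: Sequence[Sequence[float]], k: int = 2) -> Matrix:
    """Affinity A[i][j] = exp(-d(i, j)^2 / (sigma_i * sigma_j)), with A[i][i] = 0."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sigma_i: distance from point i to its k-th nearest neighbour
    # (index 0 of the sorted row is the point itself, at distance 0).
    sigma = [max(sorted(row)[k], 1e-12) for row in dist]
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return affinity


def self_diffusion(affinity: Matrix, steps: int = 3) -> Matrix:
    """Propagate the affinity with W <- W P + I (assumed update rule)."""
    n = len(affinity)
    # P is the row-normalised affinity, i.e. a random-walk transition matrix.
    P = [[v / (sum(row) or 1.0) for v in row] for row in affinity]
    W = [row[:] for row in affinity]
    for _ in range(steps):
        W = [
            [
                sum(W[i][m] * P[m][j] for m in range(n)) + (1.0 if i == j else 0.0)
                for j in range(n)
            ]
            for i in range(n)
        ]
    return W


# Toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_affinity = local_scaling_affinity(toy_points, k=2)
toy_diffused = self_diffusion(toy_affinity, steps=3)
print("within-group:", toy_diffused[0][1], "across groups:", toy_diffused[0][3])
```

The diffused matrix would then be passed to a downstream clustering method in place of the raw affinity. The number of diffusion steps is a tuning parameter that is set arbitrarily here.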

Meng Yuan ◽  
Justin Zobel ◽  
Pauline Lin

Clustering of the contents of a document corpus is used to create sub-corpora with the intention that they consist of documents that are related to each other. However, while clustering is used in a variety of ways in document applications such as information retrieval, and a range of methods have been applied to the task, there has been relatively little exploration of how well it works in practice. Indeed, given the high dimensionality of the data, it is possible that clustering may not always produce meaningful outcomes. In this paper we use a well-known clustering method to explore a variety of techniques, existing and novel, to measure clustering effectiveness. Results with our new, extrinsic techniques based on relevance judgements or retrieved documents demonstrate that retrieval-based information can be used to assess the quality of clustering, and also show that clustering can succeed to some extent at gathering together similar material. Further, they show that intrinsic clustering techniques that have been shown to be informative in other domains do not work for information retrieval. Whether clustering is sufficiently effective to have a significant impact on practical retrieval is unclear, but as the results show, our measurement techniques can effectively distinguish between clustering methods.

2022 ◽  
Hannah Paris Cowley ◽  
Michael S. Robinette ◽  
Jordan K. Matelsky ◽  
Daniel Xenes ◽  
Aparajita Kashyap ◽  

As clinicians are faced with a deluge of new information, data science can play a key role in highlighting important features for developing new clinical hypotheses. Indeed, insights derived from machine learning can serve as a clinical support tool by connecting care providers with results from big data analysis to identify latent patterns that may not be easily detected by even skilled human observers. In this work, we show an example of collaboration between clinicians and data scientists during the COVID-19 pandemic, identifying subgroups of COVID-19 patients with unanticipated outcomes or who are at high risk of severe disease or death. We apply a random forest classifier model to predict adverse patient outcomes early in the disease course, and we connect our classification results to unsupervised clustering of patient features that may underpin patient risk. The paradigm for using data science for hypothesis generation and clinical decision support, as well as our triage classification approach and unsupervised clustering methods to determine patient cohorts, are applicable to driving rapid hypothesis generation and iteration in a variety of clinical challenges, including future public health crises.

Imke Rhoden ◽  
Daniel Weller ◽  
Ann-Katrin Voit

We apply a functional data approach for mixture model-based multivariate innovation clustering to identify different regional innovation portfolios in Europe, considering patterns of specialization among innovation types. We combine patent registration data and other innovation and economic data across 225 regions, 13 years, and eight patent classes. The approach allows us to form several regional clusters according to their specific innovation types and captures spatio-temporal dynamics too subtle for most other clustering methods. Consistent with the literature on innovation systems, our analysis supports the value of regionalized clusters that can benefit from flexible policy support to strengthen regions as well as innovation in a systematic context, adding technology specificity as a new criterion to consider. The regional innovation cluster solutions for IPC classes for ‘fixed constructions’ and ‘mechanical engineering’ are highly comparable but relatively less comparable for ‘chemistry and metallurgy’. The clusters for innovations in ‘physics’ and ‘chemistry and metallurgy’ are similar; innovations in ‘electricity’ and ‘physics’ show similar temporal dynamics. For all other innovation types, the regional clustering is different. By taking regional profiles, strengths, and developments into account, options for improved efficiency of location-based regional innovation policy to promote tailored and efficient innovation-promoting programs can be derived.

2022 ◽  
Jiyuan Fang ◽  
Cliburn Chan ◽  
Kouros Owzar ◽  
Liuyang Wang ◽  
Diyuan Qin ◽  

Single-cell RNA-sequencing (scRNA-seq) technology allows us to explore cellular heterogeneity in the transcriptome. Because most scRNA-seq data analyses begin with cell clustering, the accuracy of that clustering considerably impacts the validity of downstream analyses. Although many clustering methods have been developed, few tools are available to evaluate the clustering "goodness-of-fit" to the scRNA-seq data. In this paper, we propose a new Clustering Deviation Index (CDI) that measures the deviation of any clustering label set from the observed single-cell data. We conduct in silico and experimental scRNA-seq studies to show that CDI can select the optimal clustering label set. In particular, CDI also informs the optimal tuning parameters for any given clustering method and the correct number of cluster components.
