Data Clustering using Genomic Analysis in Graph model

Abstract: If the data are represented as a graph, in which the nodes are objects and the edges represent associations between objects, then a cluster can be defined as a connected component, i.e., a group of objects that are related to each other but have no association with objects outside the group. Clustering is a fundamental problem in gene analysis and has a large impact on the field of genetics. In current systems, the various genomic analyses are scattered across many distributed systems. In the proposed work, we attempt to build a common database for genomic and proteomic analysis using graph clustering.
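
The abstract defines a cluster as a connected component of the association graph. A minimal Python sketch of that definition, using breadth-first search, follows. The function name `connected_component_clusters` and the gene identifiers are hypothetical, and the sketch does not reproduce the authors' database or clustering pipeline.

```python
from collections import defaultdict, deque


def connected_component_clusters(nodes, edges):
    """Group nodes into clusters, where each cluster is one connected component.

    nodes: iterable of hashable node ids (e.g. hypothetical gene identifiers)
    edges: iterable of (u, v) pairs meaning "u is associated with v"
    """
    # Build an undirected adjacency list from the association pairs.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    clusters = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical gene identifiers
associations = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
print(connected_component_clusters(genes, associations))
# prints [['g1', 'g2', 'g3'], ['g4', 'g5'], ['g6']]
```

Under this definition an object with no associations (here `g6`) forms a cluster of its own, and no cluster shares an edge with any other.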

Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

Communications in Computer and Information Science - Advanced Intelligent Computing Theories and Applications. With Aspects of Contemporary Intelligent Computing Techniques · 10.1007/978-3-540-85930-7_59 · 2008 · pp. 463-470 · Cited by ~1

Author(s): Pei-Chann Chang, Chin-Yuan Fan, Yen-Wen Wang

Keyword(s): Decision Tree; Data Base; Data Clustering; Fuzzy Decision; Fuzzy Decision Tree; Classification Problems

Identification of Conductivity Distribution Using Eddy Current Tomography System and Artificial Neural Networks

Materials Science Forum · 10.4028/www.scientific.net/msf.670.336 · 2010 · Vol 670 · pp. 336-344

Author(s): Tomasz Chady, Ireneusz Spychalski, Takashi Todaka

Keyword(s): Neural Networks; Artificial Neural Networks; Data Base; Eddy Current; Current System; Identification Problem; Conductivity Distribution; Tomography System; Dimensional Distribution; Artificial Neural

In certain applications (security, biomedical, food and wood testing, etc.) it is necessary to detect and identify the position of small metal particles with high precision. This paper presents an eddy current system designed to evaluate conductivity distribution. The system was modeled using the finite element method, and it was also built and used to carry out measurements. Using these results, a database of the signals obtained for various configurations of the test objects was created. The database was used to solve the identification problem: artificial neural networks served as inverse models to reconstruct the two-dimensional distribution of conductivity. Selected results obtained for simulated signals are presented.
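
The workflow in this abstract (simulate signals, store them in a database, then train a neural network as the inverse model) can be illustrated with a small NumPy sketch. Everything in it is a stand-in chosen for illustration: the forward model is a hypothetical fixed mixing matrix followed by `tanh` (not the finite element model of the paper), the 2x2 conductivity grid and six sensor readings are invented sizes, and the inverse model is a one-hidden-layer network trained by plain full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CELLS = 4      # hypothetical: conductivity on a 2x2 grid, flattened
N_SENSORS = 6    # hypothetical: probe readings per configuration

# Hypothetical forward model standing in for the finite element simulation.
A = rng.normal(size=(N_SENSORS, N_CELLS))


def forward(sigma):
    """Map conductivity configurations (one per row) to simulated signals."""
    return np.tanh(sigma @ A.T)


# Step 1: build the database of (signal, configuration) pairs.
sigma_db = rng.uniform(0.0, 1.0, size=(200, N_CELLS))
signal_db = forward(sigma_db)

# Step 2: inverse model, a one-hidden-layer network from signals to conductivity.
HIDDEN = 16
W1 = rng.normal(scale=0.5, size=(N_SENSORS, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.5, size=(HIDDEN, N_CELLS))
b2 = np.zeros(N_CELLS)


def predict(signals):
    """Return hidden activations and reconstructed conductivities."""
    hidden = np.tanh(signals @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(signals, sigma):
    return float(np.mean((predict(signals)[1] - sigma) ** 2))


initial_error = mse(signal_db, sigma_db)

# Step 3: full-batch gradient descent on the mean squared reconstruction error.
LEARNING_RATE = 0.1
n_values = sigma_db.size
for _ in range(3000):
    hidden, recon = predict(signal_db)
    d_recon = 2.0 * (recon - sigma_db) / n_values      # dLoss/d(recon)
    d_hidden = (d_recon @ W2.T) * (1.0 - hidden ** 2)   # backprop through tanh
    W2 -= LEARNING_RATE * (hidden.T @ d_recon)
    b2 -= LEARNING_RATE * d_recon.sum(axis=0)
    W1 -= LEARNING_RATE * (signal_db.T @ d_hidden)
    b1 -= LEARNING_RATE * d_hidden.sum(axis=0)

final_error = mse(signal_db, sigma_db)
print(f"database MSE before training {initial_error:.4f}, after {final_error:.4f}")

# Identification: reconstruct an unseen configuration from its signals alone.
unseen = rng.uniform(0.0, 1.0, size=(1, N_CELLS))
_, estimate = predict(forward(unseen))
print("true:", unseen.round(2), "estimated:", estimate.round(2))
```

Training on simulated pairs and then applying the network only to measured signals is what makes the network an inverse model: the forward simulation is needed once, to populate the database, and not during identification.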

Parallel Semi-Supervised Big Data Clustering Based on Mapreduce Technology

International Journal of Recent Technology and Engineering - 2 · 10.35940/ijrte.c5206.118419 · 2019 · Vol 8 (4) · pp. 1657-1664

Keyword(s): Big Data; Data Clustering; Clustering Algorithm; Graph Model; Heterogeneous Data; Initial Population; Consensus Clustering; Hidden Knowledge; Intermediate Results; Data Objects

In information technology, big data is a rapidly emerging technology, and it brings tremendous challenges for extracting valuable hidden knowledge. Data mining techniques can be applied to big data to extract knowledge for decision making. Big data is highly heterogeneous because it consists of various inter-related kinds of objects, such as audio, text, and images, and these kinds of objects carry different information. In this paper, clustering techniques are therefore used to separate objects into several clusters, which also reduces the computational complexity of classifiers. The Possibilistic c-Means (PCM) algorithm has been used to group objects in big data. PCM effectively reflects the typicality of each object with respect to different clusters and can avoid corruption by noise during clustering. However, PCM is not efficient enough for big data, and it cannot capture the complex correlations across the multiple modalities of heterogeneous data objects.

To address this, a Parallel Semi-supervised Multi-Ant Colonies Clustering (PSMACC) algorithm is introduced for big data clustering. PSMACC first splits the data into a number of partitions, and each partition is processed by a mapper. Each mapper generates a diverse collection of three clustering components using the semi-supervised ant colony clustering algorithm with different moving speeds. A hypergraph model then combines the three clustering components, and Must-Link (ML) and Cannot-Link (CL) constraints are included to form a consensus clustering. Finally, the intermediate results of each mapper are combined in the reducer.

However, the iteration overhead of PSMACC is high, which degrades its performance. A Parallel Semi-supervised Multi-Imperialist Competitive Algorithm (PSMICA) is therefore proposed to cluster big data. In PSMICA, each mapper runs the Imperialist Competitive Algorithm (ICA), whose initial population consists of countries. The best countries in the population are chosen as imperialists, and the remaining countries form the colonies of these imperialists. The colonies move towards their imperialists based on the distance between them. The intermediate results of each mapper are combined in the reducer to obtain the final clustering result.
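
The PSMICA steps described in this abstract (partition the data, run ICA in each mapper, combine in a reducer) can be sketched in NumPy as follows. This is a simplified illustration under our own assumptions, not the authors' implementation: a country is taken to be a candidate set of k centroids scored by the sum of squared errors (SSE); only assimilation (colonies moving toward their imperialist) and the colony/imperialist exchange are modelled, while revolution, inter-empire competition and the semi-supervised ML/CL constraints are omitted; and the reducer rule (keep the candidate with the lowest SSE on the full data) is a hypothetical choice. The function names are likewise hypothetical.

```python
import numpy as np


def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def ica_mapper(X, k, n_countries=30, n_imperialists=3, n_iter=50, beta=2.0, seed=0):
    """Simplified ICA on one partition; a country is a set of k centroids."""
    rng = np.random.default_rng(seed)
    # Initial population: each country picks k data points as its centroids.
    countries = np.stack([X[rng.choice(len(X), size=k, replace=False)]
                          for _ in range(n_countries)]).astype(float)
    costs = np.array([sse(X, c) for c in countries])
    order = np.argsort(costs)                  # best countries first
    countries, costs = countries[order], costs[order]
    # The best countries are imperialists; the rest are colonies, dealt round-robin.
    empire_of = np.arange(n_countries) % n_imperialists
    for _ in range(n_iter):
        for i in range(n_imperialists, n_countries):
            imp = empire_of[i]
            # Assimilation: move the colony toward its imperialist by a random step.
            step = beta * rng.random(countries[i].shape)
            countries[i] += step * (countries[imp] - countries[i])
            costs[i] = sse(X, countries[i])
            # A colony that beats its imperialist takes its place.
            if costs[i] < costs[imp]:
                countries[[i, imp]] = countries[[imp, i]]
                costs[[i, imp]] = costs[[imp, i]]
    return list(countries[:n_imperialists])


def reducer(X, candidate_sets):
    """Hypothetical combine rule: keep the centroid set with the lowest global SSE."""
    return min(candidate_sets, key=lambda c: sse(X, c))


def psmica_sketch(X, k, n_partitions=2, seed=0):
    """Map step over random partitions, then a single reduce step."""
    shuffled = np.random.default_rng(seed).permutation(X)
    partitions = np.array_split(shuffled, n_partitions)
    candidates = [c for p_id, part in enumerate(partitions)
                  for c in ica_mapper(part, k, seed=seed + p_id)]
    return reducer(X, candidates)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal((0.0, 0.0), 0.3, size=(20, 2)),
               rng.normal((5.0, 5.0), 0.3, size=(20, 2))])
centroids = psmica_sketch(X, k=2)
print(centroids.round(2), sse(X, centroids))
```

Because a colony only replaces an imperialist when it lowers that imperialist's cost, each imperialist's SSE on its partition never increases across iterations. The random step factor `beta * rand` (with `beta = 2` as an assumed default) lets colonies overshoot their imperialist and explore nearby solutions.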

Shot boundary detection in videos using Graph Cut Sets

International Journal of Image Processing and Vision Science · 10.47893/ijipvs.2013.1075 · 2013 · pp. 96-104

Author(s): Shanmukhappa Angadi, Vilas Naik

Keyword(s): Data Clustering; Video Segmentation; Graph Model; Weighted Graph; Boundary Detection; Shot Boundary Detection; Graph Partition; Data Set; Pixel Intensity; Shot Boundary

Shot Boundary Detection (SBD) is an early step in most video applications involving understanding, indexing, characterization, or categorization of video. SBD is a form of temporal video segmentation and has been an active research topic in content-based video analysis, resulting in a variety of algorithms. The major methods used for shot boundary detection include pixel-intensity-based, histogram-based, edge-based, and motion-vector-based techniques. Recently, researchers have applied graph-theoretic methods to shot boundary detection. The proposed algorithm is one such graph-based model and uses a graph partition mechanism to detect shot boundaries. The graph partition model is a graph-theoretic segmentation approach that performs data clustering using a graph model. Pairwise similarities between all data objects are used to construct a weighted graph, represented as an adjacency matrix (weighted similarity matrix) that contains all the information needed for clustering. Representing the data set as an edge-weighted graph converts the data clustering problem into a graph partitioning problem. The algorithm was tested on sports and movie videos, and the results indicate promising performance.
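
The abstract reduces shot boundary detection to partitioning an edge-weighted similarity graph. The sketch below illustrates that idea under assumptions that are ours, not the paper's: each frame is summarized by a colour histogram (here a synthetic 4-bin vector), similarities use a Gaussian kernel with a hypothetical bandwidth `sigma`, and a single boundary within a window of frames is located by minimizing the normalized cut between the frames before and after each candidate split. The abstract does not specify the paper's exact cut criterion.

```python
import numpy as np


def similarity_matrix(features, sigma=0.3):
    """Weighted adjacency matrix: Gaussian similarity between frame features."""
    diff = features[:, None, :] - features[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=2) / (2.0 * sigma ** 2))


def ncut(W, t):
    """Normalized cut between frames [0, t) and [t, n)."""
    cut = W[:t, t:].sum()
    return cut / W[:t, :].sum() + cut / W[t:, :].sum()


def detect_boundary(features, sigma=0.3):
    """Return the split index with the smallest normalized cut, plus all scores."""
    W = similarity_matrix(features, sigma)
    scores = {t: ncut(W, t) for t in range(1, len(features))}
    return min(scores, key=scores.get), scores


# Two synthetic "shots" of 10 frames each, with slightly noisy histograms.
rng = np.random.default_rng(0)
shot_a = np.array([0.7, 0.2, 0.1, 0.0]) + rng.normal(scale=0.01, size=(10, 4))
shot_b = np.array([0.0, 0.1, 0.2, 0.7]) + rng.normal(scale=0.01, size=(10, 4))
frames = np.vstack([shot_a, shot_b])

boundary, scores = detect_boundary(frames)
print(boundary)  # prints 10
```

Splitting at the true boundary cuts only the weak cross-shot edges, while any other split also cuts strong within-shot edges, which is why the normalized cut is smallest there. In a full video, such a window would be slid along the frame sequence and splits reported only when their cut value falls below a threshold; that thresholding step is likewise an assumption.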

Genomic analysis of C-type lectins

Biochemical Society Symposium · 10.1042/bss0690059 · 2002 · Vol 69 · pp. 59-72 · Cited by ~85

Author(s): Kurt Drickamer, Andrew J. Fadden

Keyword(s): Biological Effects; Cell Signalling; Genomic Analysis; Binding Activity; Immunoglobulin Superfamily; Carbohydrate Binding; Carbohydrate Recognition; Calcium Dependent; Binding Domains; Carbohydrate Recognition Domains

Many biological effects of complex carbohydrates are mediated by lectins that contain discrete carbohydrate-recognition domains. At least seven structurally distinct families of carbohydrate-recognition domains are found in lectins that are involved in intracellular trafficking, cell adhesion, cell–cell signalling, glycoprotein turnover and innate immunity. Genome-wide analysis of potential carbohydrate-binding domains is now possible. Two classes of intracellular lectins involved in glycoprotein trafficking are present in yeast, model invertebrates and vertebrates, and two other classes are present in vertebrates only. At the cell surface, calcium-dependent (C-type) lectins and galectins are found in model invertebrates and vertebrates, but not in yeast; immunoglobulin superfamily (I-type) lectins are only found in vertebrates. The evolutionary appearance of different classes of sugar-binding protein modules parallels a development towards more complex oligosaccharides that provide increased opportunities for specific recognition phenomena. An overall picture of the lectins present in humans can now be proposed. Based on our knowledge of the structures of several of the C-type carbohydrate-recognition domains, it is possible to suggest ligand-binding activity that may be associated with novel C-type lectin-like domains identified in a systematic screen of the human genome. Further analysis of the sequences of proteins containing these domains can be used as a basis for proposing potential biological functions.
