Rapid and Accurate Analysis of an X-Ray Fluorescence Microscopy Data Set through Gaussian Mixture-Based Soft Clustering Methods

2013, Vol 19 (5), pp. 1281-1289
Author(s): Jesse Ward, Rebecca Marvin, Thomas O'Halloran, Chris Jacobsen, Stefan Vogt

Abstract
X-ray fluorescence (XRF) microscopy is an important tool for studying trace metals in biology, enabling simultaneous detection of multiple elements of interest and allowing quantification of metals in organelles without the need for subcellular fractionation. Currently, analysis of XRF images is often done using manually defined regions of interest (ROIs). However, since advances in synchrotron instrumentation have enabled the collection of very large data sets encompassing hundreds of cells, manual approaches are becoming increasingly impractical. We describe here the use of soft clustering to identify cell ROIs based on elemental contents, using data collected over a sample of the malaria parasite Plasmodium falciparum as a test case. Soft clustering was able to successfully classify regions in infected erythrocytes as “parasite,” “food vacuole,” “host,” or “background.” In contrast, hard clustering using the k-means algorithm was found to have difficulty in distinguishing cells from background. While initial tests showed convergence on two or three distinct solutions in 60% of the cells studied, subsequent modifications to the clustering routine improved results to yield 100% consistency in image segmentation. Data extracted using soft cluster ROIs were found to be as accurate as data extracted using manually defined ROIs, and analysis time was considerably improved.
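The soft-clustering idea above can be sketched with a Gaussian mixture over per-pixel elemental intensities: each pixel gets a membership probability for every class rather than a single hard label. A minimal sketch, assuming scikit-learn and synthetic data in place of a real XRF scan; the two-class setup and channel count are illustrative, not the authors' pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Fake scan: 3 elemental channels (e.g. Fe, Zn, S) for 1024 pixels,
# mixing dim "background" pixels with brighter "cell" pixels.
background = rng.normal(0.1, 0.02, size=(600, 3))
cells = rng.normal(1.0, 0.2, size=(424, 3))
pixels = np.vstack([background, cells])            # (n_pixels, n_channels)

gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(pixels)
memberships = gmm.predict_proba(pixels)            # soft assignment per pixel
labels = memberships.argmax(axis=1)                # hard labels, if needed
```

The membership matrix is what distinguishes this from k-means: a pixel on a cell boundary can carry, say, 0.6/0.4 weight between classes instead of being forced into one ROI.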

2006, Vol 39 (2), pp. 262-266
Author(s): R. J. Davies

Synchrotron sources offer high-brilliance X-ray beams that are ideal for spatially and time-resolved studies. Large amounts of wide- and small-angle X-ray scattering data can now be generated rapidly, for example, during routine scanning experiments. Consequently, the analysis of the large data sets produced has become a complex and pressing issue. Even relatively simple analyses become difficult when a single data set can contain many thousands of individual diffraction patterns. This article reports on a new software application for the automated analysis of scattering intensity profiles. It is capable of batch-processing thousands of individual data files without user intervention. Diffraction data can be fitted using a combination of background functions and non-linear peak functions. To complement the batch-wise operation mode, the software includes several specialist algorithms to ensure that the results obtained are reliable. These include peak-tracking, artefact removal, function elimination and spread-estimate fitting. Furthermore, as well as non-linear fitting, the software can calculate integrated intensities and selected orientation parameters.
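The core fitting step described above (a background function combined with a non-linear peak function) can be sketched in a few lines. A minimal sketch using SciPy on a synthetic profile; the linear-plus-Gaussian model and all parameter names are assumptions for illustration, not the article's software.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(q, slope, offset, amp, center, width):
    """Linear background plus a single Gaussian peak."""
    background = slope * q + offset
    peak = amp * np.exp(-0.5 * ((q - center) / width) ** 2)
    return background + peak

# Synthetic intensity profile with a peak at q = 1.0.
q = np.linspace(0.0, 2.0, 200)
true_params = (0.1, 0.5, 3.0, 1.0, 0.05)
intensity = model(q, *true_params) + np.random.default_rng(1).normal(0, 0.02, q.size)

popt, _ = curve_fit(model, q, intensity, p0=(0.0, 0.0, 1.0, 0.9, 0.1))
# Integrated intensity of the fitted Gaussian peak (area = amp * |width| * sqrt(2*pi)).
integrated = popt[2] * abs(popt[4]) * np.sqrt(2 * np.pi)
```

Batch operation is then a loop over data files applying the same fit, with the previous file's result used as the next initial guess (the "peak-tracking" idea).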


2021, Vol 40 (1), pp. 477-490
Author(s): Yanping Xu, Tingcong Ye, Xin Wang, Yuping Lai, Jian Qiu, ...

In the security domain, data labels are often unknown or too expensive to obtain, so clustering methods are used to detect the threat behavior contained in big data. The most widely used probabilistic clustering model is the Gaussian Mixture Model (GMM), which is flexible and powerful and can incorporate prior knowledge to model the uncertainty of the data. In this paper, we therefore use a GMM to build the threat-behavior detection model. Commonly, Expectation Maximization (EM) and Variational Inference (VI) are used to estimate the optimal parameters of a GMM; however, both EM and VI are quite sensitive to the initial values of the parameters. We therefore propose to use Singular Value Decomposition (SVD) to initialize the parameters. First, SVD factorizes the data set matrix into its singular value matrix and singular vector matrices. We then calculate the number of GMM components from the first two singular values and the dimension of the data. Next, the other GMM parameters, such as the mixing coefficients, the means and the covariances, are calculated based on the chosen number of components. These initial values are then input into EM and VI to estimate the optimal parameters of the GMM. Experimental results indicate that the proposed method performs well for initializing GMM clustering when EM and VI are used to estimate the parameters.
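The SVD-seeding idea can be sketched as follows: factorize the centered data matrix, use the leading singular vector to split the data, and seed the GMM means from that split. This is a minimal sketch with NumPy/scikit-learn; the paper's exact rule for deriving the number of components from the first two singular values is not reproduced here, so the sketch fixes K = 2 and only illustrates the mean initialization.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Toy data: two 4-D clusters centered at 0 and 5.
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(5, 1, (100, 4))])

# SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

# Scores along the first singular direction; a crude 2-way split for seeding.
proj = U[:, 0] * s[0]
split = proj > np.median(proj)
means_init = np.vstack([X[split].mean(axis=0), X[~split].mean(axis=0)])

# EM starts from the SVD-derived means instead of random values.
gmm = GaussianMixture(n_components=2, means_init=means_init,
                      random_state=0).fit(X)
```

Because EM only finds a local optimum, a data-driven start like this typically converges faster and more consistently than random initialization.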


2015, Vol 12 (2), pp. 204
Author(s): Lynda C. Radke, Jin Li, Grant Douglas, Rachel Przeslawski, Scott Nichol, ...

Environmental context
Australia's tropical marine estate is a biodiversity hotspot that is threatened by human activities. Analysis and interpretation of large physical and geochemical data sets provide important information on processes occurring at the seafloor in this poorly known area. These processes help us to understand how the seafloor functions to support biodiversity in the region.

Abstract
Baseline information on habitats is required to manage Australia's northern tropical marine estate. This study aims to develop an improved understanding of seafloor environments of the Timor Sea. Clustering methods were applied to a large data set comprising physical and geochemical variables that describe organic matter (OM) reactivity, quantity and source, and geochemical processes. Arthropoda (infauna) were used to assess the different groupings. Clusters based on physical and geochemical data discriminated arthropods better than geomorphic features did. Major variations among clusters included grain size and a cross-shelf transition from authigenic Mn–As enrichments (inner shelf) to authigenic P enrichment (outer shelf). Groups comprising raised features had the highest reactive OM concentrations (e.g. low chlorin indices and C:N ratios, and high reaction rate coefficients) and benthic algal δ13C signatures. Surface-area-normalised OM concentrations higher than continental shelf norms were observed in association with: (i) low δ15N, inferring Trichodesmium input; and (ii) pockmarks, which impart bottom-up controls on seabed chemistry and cause inconsistencies between bulk and pigment OM pools. Low Shannon–Wiener diversity occurred in association with low redox and porewater pH and published evidence for high energy. Highest β-diversity was observed at euphotic depths. The geochemical data and clustering methods used here provide insight into ecosystem processes that likely influence biodiversity patterns in the region.


Author(s): Bing Li, Liyun Xu, Qi Guo, Jianhui Chen, Yanan Zhang, ...

Mycobacterium tuberculosis (MTB) and non-tuberculous mycobacteria (NTM) infections often exhibit similar clinical symptoms. Timely and effective treatment relies on rapid and accurate identification of species and resistance genotypes. In this study, a new platform (GenSeizer), which combines bioinformatics analysis of a large data set with multiplex PCR-based targeted gene sequencing, was developed to identify 10 major Mycobacterium species that cause pulmonary, as well as extrapulmonary, human diseases. Simultaneous detection of certain erm(41) and rrl resistance genotypes in M. abscessus was also feasible. The platform was specific and sensitive, exhibiting no cross-reactivity among reference strains and a detection limit of 5 DNA copies or 50 CFU of Mycobacterium/ml. In a blinded comparison, GenSeizer and multigene sequencing showed 100% agreement in identifying 88 clinical Mycobacterium isolates. The resistance genotypes, confirmed by whole-genome sequencing of 30 M. abscessus strains, were also correctly identified by GenSeizer 100% of the time. These results indicate that GenSeizer is an efficient, reliable platform for diagnosing major pathogenic Mycobacterium species.


2016
Author(s): Jeremy G. Todd, Jamey S. Kain, Benjamin L. de Bivort

Abstract
To fully understand the mechanisms giving rise to behavior, we need to be able to measure it precisely. When coupled with large behavioral data sets, unsupervised clustering methods offer the potential of unbiased mapping of behavioral spaces. However, unsupervised techniques to map behavioral spaces are in their infancy, and there have been few systematic considerations of all the methodological options. We compared the performance of seven distinct mapping methods in clustering a data set consisting of the x- and y-positions of the six legs of individual flies. Legs were automatically tracked via small pieces of fluorescent dye while the fly was tethered and walking on an air-suspended ball. We find that there is considerable variation in the performance of these mapping methods, and that better performance is attained when clustering is done in higher-dimensional spaces (which are otherwise less preferable because they are hard to visualize). High dimensionality means that some algorithms, including the non-parametric watershed cluster assignment algorithm, cannot be used. We developed an alternative watershed algorithm that can be used in high-dimensional spaces when the probability density estimate can be computed directly. With these tools in hand, we examined the behavioral space of fly leg postural dynamics and locomotion. We find a striking division of behavior into modes involving the fore legs and modes involving the hind legs, with few direct transitions between them. By computing behavioral clusters using the data from all flies simultaneously, we show that this division appears to be common to all flies. We also identify individual-to-individual differences in behavior and behavioral transitions. Lastly, we suggest a computational pipeline that can achieve satisfactory levels of performance without the taxing computational demands of a systematic combinatorial approach.

Abbreviations
GMM: Gaussian mixture model; PCA: principal components analysis; SW: sparse watershed; t-SNE: t-distributed stochastic neighbor embedding
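One common pipeline of the kind compared above is dimensionality reduction followed by mixture-model clustering, with behavioral transitions read off from the frame-to-frame cluster labels. A minimal sketch with scikit-learn on synthetic data standing in for the leg-tracking features; the feature dimension, component counts and two-mode structure are illustrative assumptions, not the authors' chosen pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Fake time series: 600 frames of 12 postural features, two distinct modes.
frames = np.vstack([rng.normal(0, 1, (300, 12)),
                    rng.normal(4, 1, (300, 12))])

# PCA -> GMM: cluster in a reduced space.
scores = PCA(n_components=5).fit_transform(frames)
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)

# Count transitions between consecutive frames' behavioral modes.
transitions = np.zeros((2, 2), dtype=int)
for a, b in zip(labels[:-1], labels[1:]):
    transitions[a, b] += 1
```

A transition matrix dominated by its diagonal, as here, is exactly the "few direct transitions between modes" signature the abstract describes.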


2019
Author(s): Martin Papenberg, Gunnar W. Klau

Numerous applications in psychological research require that a pool of elements be partitioned into multiple parts. While many applications seek groups that are well separated, i.e., dissimilar from each other, others require the different groups to be as similar as possible. Examples include the assignment of students to parallel courses, assembling stimulus sets in experimental psychology, splitting achievement tests into parts of equal difficulty, and dividing a data set for cross-validation. We present anticlust, an easy-to-use and free software package for solving these problems quickly and in an automated manner. The package anticlust is an open-source extension to the R programming language and implements the methodology of anticlustering. Anticlustering divides elements into similar parts, ensuring similarity between groups by enforcing heterogeneity within groups. Thus, anticlustering is the direct reversal of cluster analysis, which aims to maximize homogeneity within groups and dissimilarity between groups. Our package anticlust implements two anticlustering criteria, reversing the clustering methods k-means and cluster editing, respectively. In a simulation study, we show that anticlustering returns excellent results and outperforms alternative approaches like random assignment and matching. In three example applications, we illustrate how to apply anticlust to real data sets. We demonstrate how to assign experimental stimuli to equivalent sets based on norming data, how to divide a large data set for cross-validation, and how to split a test into parts of equal item difficulty and discrimination.
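The k-means-reversal idea can be illustrated with a tiny greedy exchange heuristic: instead of pulling group means apart, accept only swaps that bring the group means closer to the grand mean. This is a hypothetical Python sketch of the principle, not the anticlust package's algorithm (which is in R and uses stronger optimization).

```python
import numpy as np

def anticluster(x, n_groups=2, n_iter=200, seed=0):
    """Greedy exchange heuristic: swap pairs of elements between groups,
    keeping a swap only if it reduces the spread of group means around
    the grand mean (i.e. makes the groups MORE similar)."""
    rng = np.random.default_rng(seed)
    labels = rng.permutation(np.arange(len(x)) % n_groups)

    def spread():
        grand = x.mean()
        return sum((x[labels == g].mean() - grand) ** 2
                   for g in range(n_groups))

    for _ in range(n_iter):
        i, j = rng.integers(len(x), size=2)
        if labels[i] == labels[j]:
            continue
        before = spread()
        labels[i], labels[j] = labels[j], labels[i]
        if spread() > before:                  # revert non-improving swaps
            labels[i], labels[j] = labels[j], labels[i]
    return labels

x = np.arange(20, dtype=float)
labels = anticluster(x)
# The two group means should end up close to the grand mean of 9.5.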


2020, Vol 7 (4), pp. 182
Author(s): Christine Böhmer, Estella Böhmer

Acquired dental problems are among the most frequently encountered diseases in pet rabbits. However, early symptoms are often overlooked because the affected animals at first appear completely asymptomatic. Deviations from the anatomical reference lines of Böhmer and Crossley, applied to standard skull X-ray images, have been shown to be indicative of dental health problems in pet rabbits. Despite their proven usefulness, there are exceptions in which the anatomical reference lines appear unsuitable for application. We addressed this issue by quantifying the cranial morphology of a large data set of pet rabbit patients (N = 80). The results of the morphometric analyses revealed considerable diversity in skull shape among typical pet rabbits, but variance in only a few parameters influences the applicability of the anatomical reference lines. The most important parameter is the palatal angle: specimens in which the anatomical reference lines could not be applied have a rather large angle between the skull base and the palatal bone. We recommend measuring the palatal angle before applying the anatomical reference lines for objective interpretation of dental disease. The anatomical reference lines cannot be reliably applied to pet rabbits with a palatal angle larger than 18.8°.


2014, Vol 47 (3), pp. 1118-1131
Author(s): Anton Barty, Richard A. Kirian, Filipe R. N. C. Maia, Max Hantke, Chun Hong Yoon, ...

The emerging technique of serial X-ray diffraction, in which diffraction data are collected from samples flowing across a pulsed X-ray source at repetition rates of 100 Hz or higher, has necessitated the development of new software in order to handle the large data volumes produced. Sorting of data according to different criteria and rapid filtering of events to retain only diffraction patterns of interest results in significant reductions in data volume, thereby simplifying subsequent data analysis and management tasks. Meanwhile, the generation of reduced data in the form of virtual powder patterns, radial stacks, histograms and other metadata creates data set summaries for analysis and overall experiment evaluation. Rapid data reduction early in the analysis pipeline is proving to be an essential first step in serial imaging experiments, prompting the authors to make the tool described in this article available to the general community. Originally developed for experiments at X-ray free-electron lasers, the software is based on a modular facility-independent library to promote portability between different experiments and is available under version 3 or later of the GNU General Public License.
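The hit-finding-plus-reduction step described above can be sketched simply: discard frames with too few strong pixels, and accumulate the kept frames into a summed "virtual powder" pattern. A minimal NumPy sketch on synthetic frames; the thresholds, frame sizes and function names are illustrative assumptions, not the actual software's API.

```python
import numpy as np

rng = np.random.default_rng(4)
# 50 synthetic detector frames of Poisson background noise;
# every 10th frame gets a strong signal added, simulating a "hit".
frames = rng.poisson(1.0, size=(50, 64, 64)).astype(float)
frames[::10] += rng.poisson(20.0, size=(5, 64, 64))

def is_hit(frame, pixel_thresh=10.0, min_pixels=50):
    """Keep a frame only if enough pixels exceed the intensity threshold."""
    return (frame > pixel_thresh).sum() >= min_pixels

hits = [f for f in frames if is_hit(f)]
virtual_powder = np.sum(hits, axis=0)   # summed pattern over all retained hits
```

Rejecting blank frames this early is what makes the data volumes tractable: at 100 Hz and above, most frames contain no sample and never need to reach downstream analysis.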


2019, Vol 8 (2S11), pp. 3687-3693

Clustering is a mining process in which a data set is categorized into various sub-classes. Clustering is essential in classification, grouping, exploratory pattern analysis, image segmentation and decision making. Big data refers to very large data sets that are examined computationally to reveal patterns and associations, particularly those relating to human behavior and interactions. Big data is essential for many organisations, but in some cases it is complex to store and time-consuming to process. One way of overcoming these issues is to develop clustering methods, although these can suffer from high computational complexity on large data sets. Data mining is a technique for extracting useful information from data, but many data mining models cannot be applied to big data because of its inherent complexity. The main scope of this paper is to introduce an overview of data clustering approaches for big data and to review related work. The survey concentrates on clustering algorithms that operate on the elements of big data, and gives a short overview of clustering algorithms grouped into partitioning, hierarchical, grid-based and model-based families. Clustering is a major data mining technique used for analyzing big data; we also discuss the problems of applying existing clustering patterns to big data and the new issues that big data raises.
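Two of the algorithm families the survey groups together, partitioning and hierarchical, can be contrasted in a few lines. A minimal illustration with scikit-learn on toy data; the data and parameter choices are assumptions for demonstration only.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(5)
# Two well-separated 2-D blobs of 50 points each.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

# Partitioning: iteratively refines k centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: merges points bottom-up into a tree, then cuts it at k groups.
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
```

The scalability gap the survey worries about shows up here: k-means touches each point once per iteration, while naive agglomerative clustering needs a pairwise-distance structure, which is what makes the hierarchical family hard to apply directly to big data.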

