Data reduction for serial crystallography using a robust peak finder

2021
Vol 54 (5)
Author(s):
Marjan Hadian-Jazi
Alireza Sadri
Anton Barty
Oleksandr Yefanov
Marina Galchenkova
...  

A peak-finding algorithm for serial crystallography (SX) data analysis based on the principle of 'robust statistics' has been developed. Statistically robust methods are generally less sensitive to departures from model assumptions and are particularly effective when analysing mixtures of probability distributions. For example, these methods enable the separation of data into a group of inliers (the background noise) and a group of outliers (the Bragg peaks). Our robust statistics algorithm has two key advantages, which are demonstrated through testing on multiple SX data sets. First, it is relatively insensitive to the exact values of its input parameters and hence requires minimal optimization. This is critical for the algorithm to run unsupervised, allowing for automated selection or 'vetoing' of SX diffraction data. Second, the processing of individual diffraction patterns is easily parallelized, so data from multiple detector modules can be analysed simultaneously, making the approach ideally suited to real-time data processing. These characteristics mean that the robust peak finder (RPF) algorithm will be particularly beneficial for the new class of MHz X-ray free-electron laser sources, which generate large amounts of data in a short period of time.
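To make the inlier/outlier idea concrete, here is a minimal sketch, not the authors' RPF implementation: it assumes one detector module as a 2D NumPy array and uses the median and the median absolute deviation (MAD) as robust background estimates, with an illustrative n_sigma cut-off.

```python
import numpy as np

def find_peak_pixels(module, n_sigma=6.0):
    """Flag pixels that are outliers (candidate Bragg peaks) relative to a
    robust background estimate computed from the inliers."""
    med = np.median(module)
    # MAD, scaled by 1.4826 so it is consistent with a Gaussian sigma.
    mad = 1.4826 * np.median(np.abs(module - med))
    # Outliers sit far above the robust background estimate.
    return module > med + n_sigma * mad
```

Because each module is processed independently, the per-module calls parallelize trivially (for example with multiprocessing.Pool), which is what makes this style of analysis attractive for real-time pipelines.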

2017
Vol 50 (6)
pp. 1705-1715
Author(s):
Marjan Hadian-Jazi
Marc Messerschmidt
Connie Darmanin
Klaus Giewekemeyer
Adrian P. Mancuso
...  

The recent development of serial crystallography at synchrotron and X-ray free-electron laser (XFEL) sources is producing crystallographic datasets of ever-increasing volume. The size of these datasets is such that fast and efficient analysis presents a range of challenges that must be overcome to enable real-time data analysis, which is essential for the effective management of XFEL experiments. Among the blocks that constitute the analysis pipeline, one major bottleneck is 'peak finding', whose goal is to identify the Bragg peaks within (often) noisy diffraction patterns. Faster and more reliable peak-finding algorithms will allow efficient processing and storage of the incoming data, as well as optimal use of the diffraction data for structure determination. This paper addresses the problem of peak finding and, by extension, 'hit finding' in crystallographic XFEL datasets by exploiting recent developments in robust statistical analysis. The approach described here involves two basic steps: (1) the identification of pixels which contain potential peaks and (2) modeling of the local background in the vicinity of these potential peaks. The presented framework can be generalized to include both complex background models and alternative models for the Bragg peaks.
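A hedged sketch of that two-step structure follows; the global threshold, the window size and the planar background model are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def local_background(patch):
    """Fit a plane a + b*x + c*y to a patch by least squares; a robust
    variant would iteratively down-weight outlying (peak) pixels."""
    ny, nx = patch.shape
    y, x = np.mgrid[0:ny, 0:nx]
    A = np.column_stack([np.ones(patch.size), x.ravel(), y.ravel()])
    coeffs, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
    return (A @ coeffs).reshape(patch.shape)

def find_peaks(image, threshold, win=8):
    """Step 1: threshold to collect candidate pixels. Step 2: keep a
    candidate only if it stands out above the modelled local background."""
    peaks = []
    for y, x in zip(*np.where(image > threshold)):
        y0, x0 = max(0, y - win), max(0, x - win)
        patch = image[y0:y + win + 1, x0:x + win + 1]
        bg = local_background(patch)
        if image[y, x] - bg[y - y0, x - x0] > threshold:
            peaks.append((y, x))
    return peaks
```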


2019
Vol 34 (S1)
pp. S59-S70
Author(s):
Ekaterina Fomina
Evgeniy Kozlov
Svetlana Ivashevskaja

This paper presents an example of comparing geochemical and mineralogical data by means of the statistical analysis of X-ray diffraction patterns and the chemical compositions of bulk samples. The proposed methodology was tested on samples of metasomatic rocks from two geologically distinct objects. Its application allows all of the main, secondary and some accessory minerals to be identified mathematically, the contents of these minerals to be estimated qualitatively, and their effect on the distribution of all petrogenic and investigated trace elements to be assessed quickly and at the earliest stages of research. We found that the interpretation of the results is significantly influenced by the number of samples studied and by the quality of the diffractograms.
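As an illustration of the kind of statistics involved (the paper's exact method may differ), one plausible approach is to correlate the intensity in each 2-theta bin with an element's concentration across samples, so the diffraction peaks of minerals hosting that element emerge as highly correlated bins.

```python
import numpy as np

def element_pattern_correlation(patterns, concentrations):
    """patterns: (n_samples, n_bins) XRD intensities on a common 2-theta grid;
    concentrations: (n_samples,) bulk concentrations of one element.
    Returns the Pearson correlation of each 2-theta bin with the element."""
    p = patterns - patterns.mean(axis=0)
    c = concentrations - concentrations.mean()
    denom = p.std(axis=0) * c.std()
    denom[denom == 0] = np.inf  # bins with no variance correlate with nothing
    return (p * c[:, None]).mean(axis=0) / denom
```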


2014
Vol 21 (6)
pp. 1231-1239
Author(s):
Alexei S. Soares
Jeffrey D. Mullen
Ruchi M. Parekh
Grace S. McCarthy
Christian G. Roessler
...  

X-ray diffraction data were obtained at the National Synchrotron Light Source from insulin and lysozyme crystals that were densely deposited on three types of surfaces suitable for serial micro-crystallography: MiTeGen MicroMeshes™, Greiner Bio-One Ltd in situ micro-plates, and a moving Kapton crystal conveyor belt that delivers crystals directly into the X-ray beam. 6° wedges of data were taken from ∼100 crystals mounted on each material, and these individual data sets were merged to form nine complete data sets (six from insulin crystals and three from lysozyme crystals). Insulin crystals had a parallelepiped habit with an extended flat face that preferentially aligned with the mounting surfaces, impacting the data-collection strategy and the design of the serial crystallography apparatus. Lysozyme crystals had a cuboidal habit and showed no preferential orientation. Preferential orientation occluded regions of reciprocal space when the X-ray beam was incident normal to the data-collection medium surface, requiring a second pass of data collection with the apparatus inclined away from the orthogonal. In addition, crystals measuring less than 20 µm were observed to clump together into clusters. Clustering required that the X-ray beam size be adjusted to match the crystal size to prevent overlapping diffraction patterns. No additional problems were encountered with the serial crystallography strategy of combining small randomly oriented wedges of data from a large number of specimens. High-quality data able to support a realistic molecular replacement solution were readily obtained from both crystal types using all three serial crystallography strategies.
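The merging step reduces, in essence, to averaging scaled intensities of symmetry-equivalent reflections across crystals. The sketch below is illustrative only, not the beamline's actual pipeline; the per-crystal scale factor and the mapping of (h, k, l) to its asymmetric-unit representative are assumed to come from upstream tools.

```python
from collections import defaultdict

def merge_wedges(wedges, asu, scale):
    """wedges: iterable of per-crystal lists of ((h, k, l), intensity);
    asu: callable mapping an index to its asymmetric-unit representative;
    scale: callable giving a per-crystal scale factor.
    Returns merged mean intensities keyed by asymmetric-unit index."""
    sums = defaultdict(lambda: [0.0, 0])
    for crystal_id, reflections in enumerate(wedges):
        k = scale(crystal_id)
        for hkl, intensity in reflections:
            acc = sums[asu(hkl)]
            acc[0] += k * intensity  # accumulate scaled intensity
            acc[1] += 1              # count contributing observations
    return {hkl: total / n for hkl, (total, n) in sums.items()}
```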


Author(s):  
Tannistha Pal

Images captured in severe atmospheric conditions, especially fog, suffer critically degraded quality and reduced visibility, which in turn affects several computer vision applications such as visual surveillance, intelligent vehicles and remote sensing. Acquiring a clear view is therefore the prime requirement of any imaging system. In the last few years, many approaches have been proposed to solve this problem. In this article, a comparative analysis of different existing image-defogging algorithms is made, and a technique for image defogging based on the dark channel prior strategy is proposed. Experimental results show that the proposed method significantly improves the visual quality of images captured in foggy weather. The computational time of the existing techniques is also much higher, which the proposed method overcomes. Qualitative assessment is performed on both benchmark and real-time data sets to determine the efficacy of the technique. Finally, the whole work is concluded with its relative advantages and shortcomings.
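For reference, a condensed sketch of the dark channel prior dehazing pipeline the article builds on (He et al.) is given below; the patch size, omega and t0 are conventional defaults, and the guided-filter refinement step is omitted for brevity, so this is not the article's exact method.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze(img, patch=15, omega=0.95, t0=0.1):
    """img: float RGB image in [0, 1]. Returns the dehazed estimate."""
    # Dark channel: per-pixel channel minimum, then a local minimum filter.
    dark = minimum_filter(img.min(axis=2), size=patch)
    # Atmospheric light: mean colour of the brightest ~0.1% dark-channel pixels.
    top = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-top:], dark.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimated from the dark channel of the normalized image.
    t = 1.0 - omega * minimum_filter((img / A).min(axis=2), size=patch)
    # Recover the scene radiance, clamping t to avoid amplifying noise.
    return np.clip((img - A) / np.maximum(t, t0)[..., None] + A, 0.0, 1.0)
```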


Sensors
2021
Vol 21 (15)
pp. 5204
Author(s):  
Anastasija Nikiforova

Nowadays, governments launch open government data (OGD) portals that provide data that can be accessed and used by everyone for their own needs. Although the potential economic value of open (government) data is estimated in the millions and billions, not all open data are reused. Moreover, the open (government) data initiative, as well as users' expectations of open (government) data, are changing continuously, and today, in line with IoT and smart city trends, real-time and sensor-generated data are of higher interest to users. These "smarter" open (government) data are also considered one of the crucial drivers of a sustainable economy, and might have an impact on information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem in Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries in order to understand how well their content corresponds to Society 5.0 expectations. The paper reports on the extent to which countries provide such data, focusing on some open (government) data success-facilitating factors for both the portal in general and data sets of interest in particular. The presence of "smarter" data, their level of accessibility, availability, currency and timeliness, as well as support for users, are analyzed. Lists of the most competitive countries by data category are provided. This makes it possible to understand which OGD portals react to users' needs and to Industry 4.0 and Society 5.0 requests by opening and updating data for further potential reuse, which is essential in the digital, data-driven world.


Genetics
2003
Vol 163 (3)
pp. 1177-1191
Author(s):
Gregory A Wilson
Bruce Rannala

A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of earlier methods for detecting recent immigrants using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.
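A toy sketch of the core Bayes computation appears below. It assumes Hardy-Weinberg proportions for simplicity, which the paper's method notably relaxes, and it is not the published program: it only computes the posterior probability that a single individual's multilocus genotype originated in each candidate source population.

```python
import numpy as np

def source_posteriors(genotype, freqs, prior=None):
    """genotype: list of (allele1, allele2) tuples, one per locus.
    freqs: per locus, a dict mapping population -> {allele: frequency}.
    Returns the posterior probability of each candidate source population."""
    pops = list(freqs[0].keys())
    log_post = np.log(np.full(len(pops), 1.0 / len(pops)) if prior is None
                      else np.asarray(prior, dtype=float))
    for locus, (a1, a2) in enumerate(genotype):
        for i, pop in enumerate(pops):
            # Unseen alleles get a tiny floor frequency instead of zero.
            p1 = freqs[locus][pop].get(a1, 1e-6)
            p2 = freqs[locus][pop].get(a2, 1e-6)
            g = p1 * p2 if a1 == a2 else 2.0 * p1 * p2  # HWE genotype frequency
            log_post[i] += np.log(g)
    post = np.exp(log_post - log_post.max())  # stabilize before normalizing
    return dict(zip(pops, post / post.sum()))
```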


Stats
2021
Vol 4 (1)
pp. 184-204
Author(s):
Carlos Barrera-Causil
Juan Carlos Correa
Andrew Zamecnik
Francisco Torres-Avilés
Fernando Marmolejo-Ramos

Expert knowledge elicitation (EKE) aims at obtaining individual representations of experts' beliefs and rendering them in the form of probability distributions or functions. In many cases the elicited distributions differ, and the challenge in Bayesian inference is then to find ways to reconcile discrepant elicited prior distributions. This paper proposes the parallel analysis of clusters of prior distributions through a hierarchical method for clustering distributions that can be readily extended to functional data. The proposed method consists of (i) transforming the infinite-dimensional problem into a finite-dimensional one, (ii) using the Hellinger distance to compute the distances between curves and thus (iii) obtaining a hierarchical clustering structure. In a simulation study, the proposed method was compared with k-means and agglomerative nesting algorithms, and the results showed that it outperformed both. Finally, the proposed method is illustrated through an EKE experiment and other functional data sets.
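A compact sketch of those three steps follows, assuming each elicited prior has already been evaluated on a common grid (the finite-dimensional reduction); the "average" linkage is an illustrative choice and may differ from the paper's.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def hellinger(f, g, dx):
    """Hellinger distance between two densities sampled on a common grid."""
    bc = np.sum(np.sqrt(f * g)) * dx  # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def cluster_priors(densities, dx, n_clusters):
    """densities: (n_experts, n_grid) array of elicited prior densities
    evaluated on a shared grid with spacing dx."""
    n = len(densities)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = hellinger(densities[i], densities[j], dx)
    tree = linkage(squareform(d), method="average")  # agglomerative hierarchy
    return fcluster(tree, n_clusters, criterion="maxclust")
```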


2019
Vol 28 (06)
pp. 1950106
Author(s):
Qian Dong
Bing Li

Hardware-based dictionary compression is widely adopted to meet the high-speed requirements of real-time data processing. A hash function helps manage a large dictionary and improve the compression ratio, but it is prone to collisions, so some phrases in the match-search results are not true matches. This paper presents a novel match-search approach called dual chaining hash refining, which improves the efficiency of match search. Experimental results show a clear advantage in compression speed over previously published approaches that use a single hash function.
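The paper describes a hardware design; purely as an illustrative software rendering of the idea, the sketch below uses a primary hash to select a chain and a stored secondary hash to filter out collision-induced false matches before the full phrase comparison. The hash widths and functions are arbitrary assumptions.

```python
import zlib
from collections import defaultdict

def h1(phrase: bytes) -> int:
    return hash(phrase) & 0xFFF        # primary hash: selects a chain

def h2(phrase: bytes) -> int:
    return zlib.crc32(phrase) & 0xFF   # secondary hash: refines the chain

class DualHashDictionary:
    def __init__(self):
        # Each chain entry stores the secondary hash alongside the phrase,
        # mirroring a hardware tag that is compared before the full lookup.
        self.chains = defaultdict(list)

    def insert(self, phrase: bytes) -> None:
        self.chains[h1(phrase)].append((h2(phrase), phrase))

    def find_match(self, phrase: bytes):
        """Entries whose secondary hash disagrees are rejected without the
        expensive byte-by-byte comparison, cutting false-match overhead."""
        tag = h2(phrase)
        for stored_tag, stored in self.chains.get(h1(phrase), ()):
            if stored_tag == tag and stored == phrase:
                return stored
        return None
```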

