ON COUNTING THE FREQUENCY DISTRIBUTION OF STRING MOTIFS IN MOLECULAR SEQUENCES

2012 ◽  
Vol 05 (06) ◽  
pp. 1250055
Author(s):  
MATTIA C. F. PROSPERI ◽  
LUCIANO PROSPERI ◽  
REBECCA R. GRAY ◽  
MARCO SALEMI

This work investigates frequency distributions of strings within a text. The mathematical derivation accounts for variable alphabet size, character probabilities, and string/text lengths, under both the Bernoullian and the Markovian model for string generation. The analysis is limited to the set of non-clumpable strings, i.e. strings that cannot overlap with themselves. Two formulae (exact and approximated) are derived for the frequency distribution of a string of length m found inside a text of length n (with m < n). The approximated formula has constant complexity (in contrast to the exponential complexity of the exact one), which makes it applicable to very long texts. The proposed formulae were applied to analyze string frequencies in a portion of the human genome, and to recalculate frequencies of known repeated motifs within genes associated with genetic diseases. A comparison with state-of-the-art methods was provided. The formulae presented here can be of use in the statistical evaluation of specific motif frequencies within very long texts (e.g. genes or genomes) and help in characterizing motifs in pathologic conditions.
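As a rough illustration of the quantities involved, the sketch below counts occurrences of a motif in a text, checks whether a motif is non-clumpable (no proper prefix equals a suffix), and computes the expected count under a Bernoulli model. It does not reproduce the exact or approximated distribution formulae of the paper, which are not given in the abstract; the expected value (n - m + 1) times the product of the character probabilities is a standard result, and all function names are hypothetical.

```python
from math import prod


def count_occurrences(text, motif):
    """Count occurrences of motif in text by checking every start position."""
    m = len(motif)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == motif)


def is_non_clumpable(motif):
    """Return True if no proper prefix of motif equals a suffix (no self-overlap)."""
    return not any(motif[:k] == motif[-k:] for k in range(1, len(motif)))


def expected_count_bernoulli(motif, n, char_probs):
    """Expected occurrences of motif in a random text of length n.

    Assumes characters are drawn independently with the probabilities
    in char_probs (Bernoulli model). This is only the mean, not the
    full frequency distribution.
    """
    m = len(motif)
    if m > n:
        return 0.0
    return (n - m + 1) * prod(char_probs[c] for c in motif)
```

For example, with a uniform DNA alphabet one could call `expected_count_bernoulli("ACG", 10, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})` and compare the result with `count_occurrences` on an observed sequence.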

2007 ◽  
Vol 7 (4) ◽  
pp. 347-359 ◽  
Author(s):  
Gaurav Ameta ◽  
Joseph K. Davidson ◽  
Jami J. Shah

A new mathematical model for representing the geometric variations of lines is extended to include probabilistic representations of one-dimensional (1D) clearance, which arise from positional variations of the axis of a hole, the size of the hole, and a pin-hole assembly. The model is compatible with the ASME/ANSI/ISO Standards for geometric tolerances. Central to the new model is a Tolerance-Map (T-Map) (Patent No. 69638242), a hypothetical volume of points that models the 3D variations in location and orientation for a segment of a line (the axis), which can arise from tolerances on size, position, orientation, and form. Here, it is extended to model the increases in yield that occur when maximum material condition (MMC) is specified and when tolerances are assigned statistically rather than on a worst-case basis; the statistical method includes the specification of both size and position tolerances on a feature. The frequency distribution of 1D clearance is decomposed into manufacturing bias, i.e., toward certain regions of a Tolerance-Map, and into a geometric bias that can be computed from the geometry of multidimensional T-Maps. Although the probabilistic representation in this paper is built from geometric bias, and it is presumed that manufacturing bias is uniform, the method is robust enough to include manufacturing bias in the future. Geometric bias alone shows a greater likelihood of small clearances than large clearances between an assembled pin and hole. A comparison is made between the effects of choosing the optional material condition MMC and not choosing it with the tolerances that determine the allowable variations in position.


1978 ◽  
Vol 35 (2) ◽  
pp. 184-189 ◽  
Author(s):  
S. J. Westrheim ◽  
W. E. Ricker

Consider two representative samples of fish taken in different years from the same fish population, this being a population in which year-class strength varies. For the "parental" sample the length and age of the fish are determined and are used to construct an "age–length key," the fractions of the fish in each (short) length interval that are of each age. For the "filial" sample only the length is measured, and the parental age–length key is used to compute the corresponding age distribution. Trials show that the age–length key will reproduce the age-frequency distribution of the filial sample without systematic bias only if there is no overlap in length between successive ages. Where there is much overlap, the age–length key will compute from the filial length-frequency distribution approximately the parental age distribution. Additional bias arises if the rate of growth of a year-class is affected by its abundance, or if the survival rate in the population changes. The length of the fish present in any given part of a population's range can vary with environmental factors such as depth of the water; nevertheless, a sample taken in any part of that range can be used to compute age from the length distribution of a sample taken at the same time in any other part of the range, without systematic bias. But this of course is not likely to be true of samples taken from different populations of the species. Key words: age–length key, bias, Pacific ocean perch, Sebastes alutus
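The age–length key procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed-width length binning, and the choice to skip filial lengths that fall in a bin absent from the parental sample are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_age_length_key(parental, bin_width=1.0):
    """Build {length_bin: {age: fraction}} from (length, age) pairs."""
    counts = defaultdict(Counter)
    for length, age in parental:
        counts[int(length // bin_width)][age] += 1
    return {
        b: {age: c / sum(ages.values()) for age, c in ages.items()}
        for b, ages in counts.items()
    }


def apply_key(key, filial_lengths, bin_width=1.0):
    """Estimate the number of filial fish at each age from lengths alone.

    Each fish contributes fractional counts according to the parental key.
    Lengths in a bin with no parental data are skipped (an assumption).
    """
    estimate = Counter()
    for length in filial_lengths:
        b = int(length // bin_width)
        for age, fraction in key.get(b, {}).items():
            estimate[age] += fraction
    return dict(estimate)
```

Because the fractions in each length bin come from the parental sample, overlapping length ranges between ages pull the estimate toward the parental age distribution, which is the bias the abstract describes.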


Parasitology ◽  
1990 ◽  
Vol 101 (3) ◽  
pp. 429-434 ◽  
Author(s):  
P. K. Das ◽  
A. Manoharan ◽  
A. Srividya ◽  
B. T. Grenfell ◽  
D. A. P. Bundy ◽  
...  

SUMMARY. This paper examines the effects of host age and sex on the frequency distribution of Wuchereria bancrofti infections in the human host. Microfilarial counts from a large data base on the epidemiology of bancroftian filariasis in Pondicherry, South India are analysed. Frequency distributions of microfilarial counts divided by age are successfully described by zero-truncated negative binomial distributions, fitted by maximum likelihood. Parameter estimates from the fits indicate a significant trend of decreasing overdispersion with age in the distributions above age 10; this pattern provides indirect evidence for the operation of density-dependent constraints on microfilarial intensity. The analysis also provides estimates of the proportion of mf-positive individuals who are identified as negative due to sampling errors (around 5% of the total negatives). This allows the construction of corrected mf age–prevalence curves, which indicate that the observed prevalence may underestimate the true figures by between 25% and 100%. The age distribution of mf-negative individuals in the population is discussed in terms of current hypotheses about the interaction between disease and infection.
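For readers unfamiliar with the zero-truncated negative binomial, the sketch below evaluates its probability mass function and a log-likelihood that a maximum-likelihood fit would maximize. It uses the common mean/dispersion parameterization (mean of the untruncated distribution and aggregation parameter k); whether the paper uses this exact parameterization is an assumption, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(x, mean, k):
    """Negative binomial pmf with untruncated mean `mean` and dispersion k."""
    log_p = (
        lgamma(k + x) - lgamma(k) - lgamma(x + 1)
        + k * log(k / (k + mean))
        + x * log(mean / (k + mean))
    )
    return exp(log_p)


def zt_nb_pmf(x, mean, k):
    """Zero-truncated pmf: condition the negative binomial on x >= 1."""
    if x < 1:
        return 0.0
    return nb_pmf(x, mean, k) / (1.0 - nb_pmf(0, mean, k))


def zt_nb_log_likelihood(counts, mean, k):
    """Log-likelihood of positive counts under the zero-truncated model."""
    return sum(log(zt_nb_pmf(x, mean, k)) for x in counts)
```

Smaller values of k correspond to stronger overdispersion, so a trend of k increasing with age would match the decreasing overdispersion reported in the summary.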


Author(s):  
Rajneesh K. Gaur

The space-group frequency distributions for two types of proteins and their complexes are explored. Based on the incremental availability of data in the Protein Data Bank, an analytical assessment shows a preferential distribution of three space groups, i.e. P2₁2₁2₁ > P12₁1 > C121, in soluble and membrane proteins as well as in their complexes. In membrane proteins, the order of the three space groups is P2₁2₁2₁ > C121 > P12₁1. The distribution of these space groups also shows the same pattern whether a protein crystallizes with a monomer or an oligomer in the asymmetric unit. The results also indicate that the sizes of the two entities in the structures of soluble proteins crystallized as complexes do not influence the frequency distribution of space groups. In general, it can be concluded that the space-group frequency distribution is homogenous across different types of proteins and their complexes.


2014 ◽  
Vol 18 (11) ◽  
pp. 4381-4389 ◽  
Author(s):  
J. L. Salinas ◽  
A. Castellarin ◽  
A. Viglione ◽  
S. Kohnová ◽  
T. R. Kjeldsen

Abstract. This study addresses the question of the existence of a parent flood frequency distribution on a European scale. A new database of L-moment ratios of flood annual maximum series (AMS) from 4105 catchments was compiled by joining 13 national data sets. Simple exploration of the database presents the generalized extreme value (GEV) distribution as a potential pan-European flood frequency distribution, being the three-parameter statistical model with the closest resemblance to the estimated average of the sample L-moment ratios. Additional Monte Carlo simulations show that the variability in terms of sample skewness and kurtosis present in the data is larger than in a hypothetical scenario where all the samples were drawn from a GEV model. Overall, the generalized extreme value distribution fails to represent the kurtosis dispersion, especially for the longer sample lengths and medium to high skewness values, and therefore may be rejected in a statistical hypothesis testing framework as a single pan-European parent distribution for annual flood maxima. The results presented in this paper suggest that one single statistical model may not be able to fit the entire variety of flood processes present at a European scale, and present an opportunity to further investigate the catchment and climatic factors controlling European flood regimes and their effects on the underlying flood frequency distributions.


1976 ◽  
Vol 31 ◽  
pp. 227-231
Author(s):  
D. A. Morrison ◽  
E. Zinner

Abstract. Crater size frequency distributions vary to a degree which probably cannot be explained by variations in lunar surface orientation of the crater detectors or changes in micrometeoroid flux. Questions of sample representativity suggest that high ratios of small to large craters of micrometeoroids (e.g., a million 1.0 micron craters for each 500 micron crater) should be the most reliable. We obtain a flux for particles producing 0.1 micron diameter craters of approximately 300 per cm² per steradian per year. We observe no anisotropy in the submicron particle flux between the plane of the ecliptic and the normal in the direction of lunar north. No change in flux over a 10⁶ year period is indicated by our data.


1983 ◽  
Vol 76 (11) ◽  
pp. 928-932 ◽  
Author(s):  
M J Gleeson ◽  
A J Fourcin

A study was undertaken to analyse the effect of short-term intubation on the voice. Children were examined laryngographically both pre- and postoperatively. Changes in larynx frequency distribution following intubation were documented using the technique of electrolaryngography; the resolution of these changes was similarly recorded. The results, in comparison with the frequency distributions associated with other disease states, give insight into the nature of the damage and its effect on vocal fold vibratory patterns. The technique therefore enables objective evidence of minor degrees of laryngeal trauma to be demonstrated and differentiated.


1997 ◽  
Vol 36 (8-9) ◽  
pp. 51-56
Author(s):  
F. Calomino ◽  
P. Veltri ◽  
P. Piro ◽  
J. Niemczynowicz

In Urban Hydrology, a basic question is whether or not the common methods involving the use of design storms lead to the same results as those obtained by methods that make use of real storms. In general, one can say that different design storms give good results when used with the appropriate model, or, conversely, that good results can be achieved through careful model calibration. On the basis of 51 rainfall-runoff recordings obtained from the experimental catchment of Luzzi (Cosenza, Italy), the frequency distribution of the observed peak discharges was initially computed. Then the runoff events were simulated using Wallrus, a well-known simulation model, taking as input the observed precipitations. The frequency distribution of the simulated peak discharges was compared to that of the observed ones, with the aim of calibrating the model on a statistical basis. After that, the rainfall events were analysed, obtaining the frequency distributions of the observed intensities over several durations and developing IDF curves of given frequencies and, then, the Chicago design storms. The plotting positions of the peak discharges simulated in this way show good agreement with the distribution of both the observed peak discharges and the peak discharges simulated through the real storms.


1985 ◽  
Vol 248 (2) ◽  
pp. H217-H224 ◽  
Author(s):  
L. C. Maxwell ◽  
A. P. Shepherd ◽  
C. A. McMahan

Significant quantities of 9-micron microspheres (20-30%) are not trapped in the intestine following intracardiac or intra-arterial injection, but reach venous blood. Some investigators propose that the passage of 9-micron spheres measures blood flow through noncapillary connections. Because frequency distributions of intestinal capillary diameters and 9-micron spheres overlap, microspheres could simply pass through capillaries. Therefore, we developed simple probabilistic models to predict both the size distribution and the percentage of injected spheres [9 +/- 1 (SD) micron] that should appear in venous blood. Chief assumptions in models are that microsphere delivery and sphere diameter are independent and that microspheres pass through capillaries of equal or larger size. The passage predicted by the models was consistent with values in canine intestinal circulation, demonstrating that passage through capillaries [7.38 +/- 1.4 (SD) micron] adequately accounts for spheres in venous blood. Because the diameters of nominal 9-micron spheres were distributed too narrowly to show a marked sieving effect on passage through the intestinal circulation, we also injected microspheres varying from 5 to 20 micron in diameter. This mixture demonstrated a marked sieving effect. The predicted frequency distribution for microsphere diameters in venous blood agreed with the observed distribution. Our models demonstrate that the passage of 9-micron spheres through capillaries, rather than through “shunts,” adequately accounts for the appearance of spheres in venous blood and suggests that the frequency distribution of venous microspheres can provide an in vivo method for estimating the frequency distribution of intestinal capillary diameters.
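The core idea of the probabilistic model, that a sphere passes when it meets a capillary of equal or larger diameter and that the two diameters are independent, can be illustrated with a short sketch. The abstract does not state the distributional form the authors used, so treating both diameters as normally distributed is an assumption made purely for this example, and the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def passage_fraction(sphere_mean, sphere_sd, cap_mean, cap_sd):
    """P(sphere diameter <= capillary diameter).

    Assumes independent, normally distributed diameters (an assumption
    of this sketch, not a statement of the authors' model), so the
    difference capillary - sphere is normal with the summed variance.
    """
    diff_sd = sqrt(sphere_sd ** 2 + cap_sd ** 2)
    return normal_cdf((cap_mean - sphere_mean) / diff_sd)
```

Calling `passage_fraction(9.0, 1.0, 7.38, 1.4)` uses the sphere and capillary diameters quoted in the abstract; the value obtained depends on the normality assumption and should not be read as the authors' prediction.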


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Kyubum Lee ◽  
Mindy Clyne ◽  
Wei Yu ◽  
Zhiyong Lu ◽  
Muin J. Khoury

Abstract Understanding the drivers of research on human genes is a critical component to success of translation efforts of genomics into medicine and public health. Using publicly available curated online databases we sought to identify specific genes that are featured in translational genetic research in comparison to all genomics research publications. Articles in the CDC’s Public Health Genomics and Precision Health Knowledge Base were stratified into studies that have moved beyond basic research to population and clinical epidemiologic studies (T1: clinical and population human genome epidemiology research), and studies that evaluate, implement, and assess impact of genes in clinical and public health areas (T2+: beyond bench to bedside). We examined gene counts and numbers of publications within these phases of translation in comparison to all genes from Medline. We are able to highlight those genes that are moving from basic research to clinical and public health translational research, namely in cancer and a few genetic diseases with high penetrance and clinical actionability. Identifying human genes of translational value is an important step towards determining an evidence-based trajectory of the human genome in clinical and public health practice over time.

