A Novel Approach to Mine for Genetic Markers via Comparing Class Frequency Distributions of Maximal Repeats Extracted from Tagged Whole Genomic Sequences

Author(s):  
Jing-Doo Wang
2017 ◽  
Author(s):  
Sarvesh Nikumbh ◽  
Peter Ebert ◽  
Nico Pfeifer

Abstract Most string kernels for comparison of genomic sequences are generally tied to using (absolute) positional information of the features in the individual sequences. This poses limitations when comparing variable-length sequences using such string kernels. For example, profiling chromatin interactions by 3C-based experiments results in variable-length genomic sequences (restriction fragments). Here, exact position-wise occurrence of signals in sequences may not be as important as in the analysis of promoter sequences, which typically have a transcription start site as reference. Existing position-aware string kernels have been shown to be useful for the latter scenario. In this work, we propose a novel approach for sequence comparison that enables larger positional freedom than most of the existing approaches, can identify a possibly dispersed set of features in comparing variable-length sequences, and can handle both the aforementioned scenarios. Our approach, CoMIK, identifies not just the features useful towards classification but also their locations in the variable-length sequences, as evidenced by the results of three binary classification experiments, aided by recently introduced visualization techniques. Furthermore, we show that we are able to efficiently retrieve and interpret the weight vector for the complex setting of multiple multi-instance kernels.


2018 ◽  
Vol 201 ◽  
pp. 05002 ◽  
Author(s):  
Jing-Doo Wang

Quality control is essential in manufacturing, especially as manufacturing moves toward intelligent manufacturing that combines the “Internet of Things” (IoT) and “Artificial Intelligence” (AI) to automate and speed up production lines. To monitor product quality automatically, it is necessary to collect and monitor the data generated by sensors, to record the parameters set by machine operators, or to save the types (brands) of materials used in production. In this study, it is assumed that the traceability sequences of unqualified products differ from those of qualified ones, and that the values (or points) at which the sequences differ determine whether a product is qualified or unqualified. This approach extracts maximal repeats from the tagged sequences of product traceability and, at the same time, computes the class frequency distribution of these repeats, where the classes, e.g. “qualified” or “unqualified”, are derived from the tags. Instead of inspecting all of the product traceability sequences aimlessly, quality control engineers can select those maximal repeats whose frequency distributions are unique to specific classes and then check only the processes corresponding to these repeats. From a practical point of view, however, extracting these maximal repeats and computing their class frequency distributions from a huge amount of tagged sequential data should be regarded as a big-data problem. To make this work practical, this study uses previous work that is based on the Hadoop MapReduce programming model and for which a U.S. patent application has been filed (US Patent App. 15/208,994). It is therefore expected to be able to handle a huge amount of product traceability sequences.
Because this approach can narrow down the range in which faulty points (processes) within the product line must be identified, it is expected to improve quality control by comparing tagged sequences of product traceability in the future.
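The idea of extracting maximal repeats from tagged sequences and tallying their class frequencies can be illustrated with a small brute-force sketch. This is not the Hadoop MapReduce method referred to in the abstract; it is a quadratic toy version for short inputs. The function names, the tag strings, and the choice to count every occurrence (rather than the number of sequences containing a repeat) are assumptions made for illustration. A repeat is treated as maximal when it occurs at least twice and no one-symbol extension to the left or right occurs equally often.

```python
from collections import Counter, defaultdict


def maximal_repeats_by_class(tagged_sequences, min_count=2):
    """Return {repeat: {class_tag: occurrences}} for maximal repeats.

    tagged_sequences: iterable of (class_tag, sequence) pairs (hypothetical
    input format). Brute force: enumerates every contiguous subsequence.
    """
    total = Counter()                 # occurrences over all sequences
    per_class = defaultdict(Counter)  # occurrences split by class tag
    for tag, seq in tagged_sequences:
        seq = tuple(seq)
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                sub = seq[i:j]
                total[sub] += 1
                per_class[sub][tag] += 1

    # For each subsequence, record the highest count reached by any
    # one-symbol extension on the right and on the left.
    best_right = Counter()
    best_left = Counter()
    for sub, count in total.items():
        if len(sub) > 1:
            best_right[sub[:-1]] = max(best_right[sub[:-1]], count)
            best_left[sub[1:]] = max(best_left[sub[1:]], count)

    # Keep repeats whose every extension occurs strictly less often.
    return {
        sub: dict(per_class[sub])
        for sub, count in total.items()
        if count >= min_count
        and best_right[sub] < count
        and best_left[sub] < count
    }


def class_specific(repeats):
    """Keep only repeats that occur in exactly one class."""
    return {sub: dist for sub, dist in repeats.items() if len(dist) == 1}


# Toy traceability sequences; each letter stands for one process step.
data = [
    ("qualified", "ABCD"),
    ("qualified", "ABCE"),
    ("unqualified", "XBCD"),
    ("unqualified", "XBCF"),
]
specific = class_specific(maximal_repeats_by_class(data))
```

In this toy input the repeat `X, B, C` occurs only in the unqualified sequences, so an engineer would inspect the process step `X` first, whereas `B, C` occurs in both classes and is not informative.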


1991 ◽  
Vol 21 (12) ◽  
pp. 1703-1710 ◽  
Author(s):  
Richard P. Duncan ◽  
Glenn H. Stewart

The temporal and spatial patterns of tree establishment and stand disturbance history are often based on the interpretation of age-class frequency distributions. In particular, the presence of even-aged groups of trees is often used as compelling evidence of past disturbance. However, even-aged groups of trees may be indistinguishable in an age distribution if several different-aged patches occur, especially if their ages overlap. For two different types of forest we used spatial autocorrelation analysis to statistically test for the presence of even-aged patches in tree age data. Ordination and cluster analysis were subsequently applied to a matrix of association measures that reflected both spatial proximity and age similarity to identify even-aged groups of trees. Although the method worked well for our forests, which contained light-demanding tree species, it is likely to be less applicable to forests dominated by shade-tolerant species, because trees may be of many different ages if they were present as suppressed individuals prior to disturbance. However, in these instances the method could be usefully applied in other types of analysis, such as the distribution of growth release dates, tree-fall or fire-scar dates, and growth rates.


2021 ◽  
Vol 16 (1) ◽  
pp. 23-48
Author(s):  
Filip Nenadić ◽  
Petar Milin ◽  
Benjamin V. Tucker

Abstract A multitude of studies show the relevance of both inflectional paradigms (word form frequency distributions, i.e., inflectional entropy) and inflectional classes (whole class frequency distributions) for visual lexical processing. Their interplay has also been proven significant, measured as the difference between paradigm and class frequency distributions (relative entropy). Relative entropy effects have now been recorded in nouns, verbs, adjectives, and prepositional phrases. However, all of these studies used visual stimuli – either written words or picture-naming tasks. The goal of our study is to test whether the effects of relative entropy can also be captured in the auditory modality. Forty young native speakers of Romanian (60% female) living in Serbia as part of the Romanian ethnic minority participated in an auditory lexical decision task. Stimuli were 168 Romanian verbs from two inflectional classes. Verbs were presented in four forms: present and imperfect 1st person singular, present 3rd person plural, and imperfect 2nd person plural. The results show that relative entropy influences both response accuracy and response latency. We discuss alternative operationalizations of relative entropy and how they can help us test hypotheses about the structure of the mental lexicon.
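Relative entropy between a paradigm distribution and a class distribution is the Kullback-Leibler divergence. The following minimal sketch assumes both distributions are given as raw frequency counts over the same set of inflected forms and that every form with a nonzero paradigm frequency also has a nonzero class frequency. The function name and the input format are hypothetical and do not reproduce the authors' operationalization.

```python
import math


def relative_entropy(paradigm_freqs, class_freqs):
    """Kullback-Leibler divergence D(P || Q) in bits.

    paradigm_freqs: {form: frequency of that form for one word}  -> P
    class_freqs:    {form: summed frequency over the whole class} -> Q
    Both dictionaries are assumed to share the same keys.
    """
    p_total = sum(paradigm_freqs.values())
    q_total = sum(class_freqs.values())
    divergence = 0.0
    for form, freq in paradigm_freqs.items():
        if freq == 0:
            continue  # a zero-probability term contributes nothing
        p = freq / p_total
        q = class_freqs[form] / q_total
        divergence += p * math.log2(p / q)
    return divergence
```

A word whose paradigm distribution matches its class distribution exactly yields a value of zero, and the value grows as the two distributions diverge.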


2004 ◽  
Vol 26 (2) ◽  
pp. 237 ◽  
Author(s):  
A. C. Grice ◽  
S. D. Campbell ◽  
J. R. McKenzie ◽  
L. V. Whiteman ◽  
M. Pattison ◽  
...  

Age-class frequency distributions are valuable means of describing plant populations because they can be used to infer population history. Variables other than age are also often used to describe plant populations, either because they more accurately reflect an attribute of interest, or because it is difficult to determine age. However, interpretation of frequency distributions based on variables other than age can be problematic. We discuss these problems and illustrate them using data from six populations of the invasive rangeland shrub Parkinsonia aculeata L. We used three different measures of plant size: height, canopy diameter and stem cross-sectional area. Structures based on these measures were compared with structures based on three different estimates of above-ground biomass derived from them. For each variable, structures differed greatly between populations, and for each population, they were strongly dependent on the variable used to describe it. Population structures based on three-dimensional variables (above-ground biomass) tend to be more strongly positively skewed than those based on two-dimensional (area) measures of plant size. These in turn are more strongly positively skewed than those based on one-dimensional (height, diameter) measures. The statistical basis of this general phenomenon is discussed. The results highlight the difficulties of deriving histories and projecting futures of populations from size-class frequency distributions without accompanying knowledge of the temporal patterns of change in size variables as plants grow.


2020 ◽  
Vol 220 ◽  
pp. 106415
Author(s):  
Rotem Vainberger ◽  
Zvi Roth ◽  
Alisa Komsky-Elbaz ◽  
Dorit Kalo ◽  
Moran Gershoni

1988 ◽  
Vol 45 (5) ◽  
pp. 767-773 ◽  
Author(s):  
J. D. Pringle ◽  
R. E. Semple

Irish moss (Chondrus crispus Stackhouse) is cropped annually by dragrakes in certain southern Gulf of St. Lawrence Marine Plant Harvesting Districts (MPHD). Mean annual yield declined between 1972 and 1979, from peak years 1966–71, through a standing crop decrease. The present study compares frond size-class structure between three dragraked districts (light, moderate, and intense harvesting pressure) and two non-dragraked districts. Fronds, sampled by hand, were classified by pattern of dichotomy (branch) number; both mean frond weight and length were determined for each district. Frond size-class frequency distributions were bimodal from the moderate and intensely dragraked districts. Densities peaked at Classes 0 (no branching) and 5 or 6 (five or six branches), mean frond dry weights ranged from 0.09 g (intensely dragraked) to 0.18 g, mean frond length was between ~7 and ~10 cm, and mean frond dichotomy number was between ~3.6 and ~5.2. Frond size-class frequency distributions were unimodal from the non-dragraked and lightly dragraked districts, peaks were at Classes 6–8, mean frond dry weight per district was 0.40 g, mean frond length was between ~8 and ~12 cm, and mean frond dichotomy number was 7.5–8.2. The hypothesis that frond size-class frequency, mean frond dry weights, and mean frond dichotomy number were similar between intensely dragraked and non-dragraked beds was rejected. It was concluded that annual yield in the former beds would be substantially higher, were the mean frond size-class of the harvest increased by even one dichotomy. Frond length per class was not significantly different between intensely dragraked and non-dragraked beds. However, fronds higher than Class 6 from the former beds were heavier than those from the latter beds. These results are discussed in relation to dragrake selection pressure.

