Constructing Large-Scale Genetic Maps Using an Evolutionary Strategy Algorithm

Genetics ◽  
2003 ◽  
Vol 165 (4) ◽  
pp. 2269-2282
Author(s):  
D Mester ◽  
Y Ronin ◽  
D Minkov ◽  
E Nevo ◽  
A Korol

Abstract This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
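
The TSP-style formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' evolution-strategy algorithm: it swaps in plain 2-opt segment reversal as the search step, and it assumes a hypothetical pairwise matrix `rf` in which absolute differences between made-up marker positions stand in for recombination frequencies. The names `map_length` and `two_opt_order` are illustrative only.

```python
from itertools import combinations

def map_length(order, rf):
    """Sum of pairwise values between adjacent markers in `order` (total map length)."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))

def two_opt_order(rf, start=None):
    """Reverse segments of the order until no single reversal shortens the map."""
    n = len(rf)
    order = list(start) if start is not None else list(range(n))
    best = map_length(order, rf)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(n), 2):
            # Candidate order with the segment i..j reversed
            candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
            length = map_length(candidate, rf)
            if length < best - 1e-12:
                order, best = candidate, length
                improved = True
    return order, best

# Toy data (assumption): five markers at made-up positions on one linkage group.
positions = [0.0, 0.1, 0.25, 0.3, 0.45]
rf = [[abs(p - q) for q in positions] for p in positions]

order, length = two_opt_order(rf, start=[2, 0, 4, 1, 3])
print(order, round(length, 3))
```

An order and its mirror image have the same length, which is why the search space is n!/2 rather than n!. A local search like this can stall in a local optimum on noisy data, which is the usual motivation for population-based or evolution-strategy variants.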

2020 ◽  
Vol 21 (S9) ◽  
Author(s):  
Qingyang Zhang ◽  
Thy Dao

Abstract Background: Compositional data refer to the data that lie on a simplex, which are common in many scientific domains such as genomics, geology and economics. As the components in a composition must sum to one, traditional tests based on unconstrained data become inappropriate, and new statistical methods are needed to analyze this special type of data. Results: In this paper, we consider a general problem of testing for the compositional difference between K populations. Motivated by microbiome and metagenomics studies, where the data are often over-dispersed and high-dimensional, we formulate a well-posed hypothesis from a Bayesian point of view and suggest a nonparametric test based on inter-point distance to evaluate statistical significance. Unlike most existing tests for compositional data, our method does not rely on any data transformation, sparsity assumption or regularity conditions on the covariance matrix, but directly analyzes the compositions. Simulated data and two real data sets on the human microbiome are used to illustrate the promise of our method. Conclusions: Our simulation studies and real data applications demonstrate that the proposed test is more sensitive to the compositional difference than the mean-based method, especially when the data are over-dispersed or zero-inflated. The proposed test is easy to implement and computationally efficient, facilitating its application to large-scale datasets.
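
The abstract does not spell out the test statistic, so the following is only a generic sketch of a nonparametric test built on inter-point distances, restricted to two groups. It assumes an energy-distance style statistic with Euclidean distance and a permutation null. The function names and the toy compositions are hypothetical and are not taken from the paper.

```python
import math
import random

def euclid(x, y):
    """Euclidean distance between two compositions of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_dist(xs, ys):
    """Average distance over all pairs (x, y) with x in xs and y in ys."""
    return sum(euclid(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_stat(xs, ys):
    """Large when between-group distances exceed within-group distances."""
    return 2 * mean_dist(xs, ys) - mean_dist(xs, xs) - mean_dist(ys, ys)

def permutation_pvalue(xs, ys, n_perm=499, seed=0):
    """Permutation p-value: share of label shuffles with a statistic at least as large."""
    rng = random.Random(seed)
    observed = energy_stat(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_stat(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy compositions (assumption): each row sums to one, and the two groups differ clearly.
group_a = [[0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.72, 0.18, 0.10],
           [0.68, 0.22, 0.10], [0.75, 0.15, 0.10], [0.66, 0.20, 0.14]]
group_b = [[0.20, 0.30, 0.50], [0.25, 0.25, 0.50], [0.18, 0.32, 0.50],
           [0.22, 0.28, 0.50], [0.15, 0.35, 0.50], [0.20, 0.26, 0.54]]

p_value = permutation_pvalue(group_a, group_b)
print(p_value)
```

Working directly on distances between compositions avoids log-ratio transforms, which are undefined when a component is zero.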


2003 ◽  
Vol 01 (01) ◽  
pp. 41-69 ◽  
Author(s):  
JING LI ◽  
TAO JIANG

We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large scale haplotype inference projects.
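
The zero-recombinant step reduces to solving a linear system over Z2 and enumerating its solutions. The sketch below shows ordinary Gauss-Jordan elimination over Z2 on a small made-up system. It is not PedPhase code, and the mapping from Mendelian constraints to equations is not modelled. `solve_gf2` is a hypothetical helper name.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over Z2.

    Returns (particular_solution, null_space_basis), or None if the system is
    inconsistent. Every solution is the particular solution XOR-ed with any
    subset of the basis vectors.
    """
    n = len(rows[0])
    # Augmented matrix with all entries reduced mod 2
    aug = [[v % 2 for v in row] + [b % 2] for row, b in zip(rows, rhs)]
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if pivot is None:
            continue  # no pivot here: column c is a free variable
        aug[r], aug[pivot] = aug[pivot], aug[r]
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                # Row addition over Z2 is element-wise XOR
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    # A zero row with right-hand side 1 means no solution exists
    if any(row[n] for row in aug[r:]):
        return None
    particular = [0] * n
    for i, c in enumerate(pivots):
        particular[c] = aug[i][n]
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        vec = [0] * n
        vec[f] = 1
        for i, c in enumerate(pivots):
            vec[c] = aug[i][f]
        basis.append(vec)
    return particular, basis

# Made-up system: x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (all mod 2)
result = solve_gf2([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 0, 1])
print(result)
```

For this system the particular solution is `[1, 0, 0]` and the single basis vector is `[1, 1, 1]`, so the feasible assignments are `[1, 0, 0]` and `[0, 1, 1]`. With k free variables there are 2^k solutions, so the basis is a compact description of all feasible configurations.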


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 195
Author(s):  
Davide Andrea Guastella ◽  
Guilhem Marcillaud ◽  
Cesare Valenti

Smart cities leverage large amounts of data acquired in the urban environment in the context of decision support tools. These tools enable monitoring the environment to improve the quality of services offered to citizens. The increasing diffusion of personal Internet of things devices capable of sensing the physical environment allows for low-cost solutions to acquire a large amount of information within the urban environment. On the one hand, the use of mobile and intermittent sensors implies new scenarios of large-scale data analysis; on the other hand, it involves different challenges such as intermittent sensors and integrity of acquired data. To this effect, edge computing emerges as a methodology to distribute computation among different IoT devices to analyze data locally. We present here a new methodology for imputing environmental information during the acquisition step, due to missing or otherwise out of order sensors, by distributing the computation among a variety of fixed and mobile devices. Numerous experiments have been carried out on real data to confirm the validity of the proposed method.


2013 ◽  
Vol 712-715 ◽  
pp. 2569-2575
Author(s):  
Wen Wu Xie ◽  
Tao Ning

The problem of placing a number of specific shapes on a raw material so as to maximize material utilization is commonly encountered in the production of steel bars and plates, paper, glass, etc. In this paper, we present a genetic algorithm for steel grating nesting design. For application to large-scale discrete optimization problems, we also implement this algorithm with CUDA-based parallel computation. Experimental results show that the CUDA-based genetic algorithm obtains satisfactory solutions to the steel grating nesting problem with high performance.


Author(s):  
WOOSEOK RYU ◽  
JOONHO KWON ◽  
BONGHEE HONG

High-performance radio-frequency identification (RFID) is a challenging issue for large-scale enterprises. As a key component of an RFID system, RFID middleware is an important factor to measure the performance of the system. To evaluate the feasibility of an RFID middleware, the performance of the RFID middleware should be carefully evaluated in various RFID-enabled business environments. However, the construction of an RFID testbed requires a lot of time, money, and human resources because it involves numerous tagged items and a large number of deployed readers. We must provide a meaningful input tag stream representing various business activities, rather than random data. This paper presents a novel simulation model for the virtual construction of RFID testbeds. To ensure the semantic validity of the input tag stream, the proposed RFID simulation network (RSN) extends Petri nets by including sets of functions that represent unique characteristics of RFID environments such as the uncertainty of communications and tag movement patterns. By configuring appropriate functions, the RSN automatically generates an input tag stream that matches the distribution of real data. We demonstrate that the RSN model correctly reflects data from real-world environments by comparing input tag streams from real RFID equipment and from the RSN model.


2016 ◽  
Vol 28 (4) ◽  
pp. 576-635 ◽  
Author(s):  
JUNCHENG WEI ◽  
MATTHIAS WINTER

We consider the Gierer–Meinhardt system with precursor inhomogeneity and two small diffusivities in an interval

$$\left\{ \begin{array}{ll} A_t=\epsilon^2 A''- \mu(x) A+\frac{A^2}{H}, &x\in(-1, 1),\,t>0,\\[3mm] \tau H_t=D H'' -H+ A^2, & x\in (-1, 1),\,t>0,\\[3mm] A' (-1)= A' (1)= H' (-1) = H' (1) =0, \end{array} \right.$$

where $0<\epsilon \ll\sqrt{D}\ll 1$, $\tau\geq 0$, and $\tau$ is independent of $\epsilon$. A spike cluster is the combination of several spikes which all approach the same point in the singular limit. We rigorously prove the existence of a steady-state spike cluster consisting of N spikes near a non-degenerate local minimum point t0 of the smooth positive inhomogeneity μ(x), i.e. we assume that μ′(t0) = 0, μ″(t0) > 0 and we have μ(t0) > 0. Here, N is an arbitrary positive integer. Further, we show that this solution is linearly stable. We explicitly compute all eigenvalues, both large (of order O(1)) and small (of order o(1)). The main features of studying the Gierer–Meinhardt system in this setting are as follows: (i) it is biologically relevant since it models a hierarchical process (pattern formation of small-scale structures induced by a pre-existing large-scale inhomogeneity); (ii) it contains three different spatial scales two of which are small: the O(1) scale of the precursor inhomogeneity μ(x), the $O(\sqrt{D})$ scale of the inhibitor diffusivity and the O(ε) scale of the activator diffusivity; (iii) the expressions can be made explicit and often have a particularly simple form.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4234 ◽  
Author(s):  
T. Jeffrey Cole ◽  
Michael S. Brewer

Background: The recent proliferation of large amounts of biodiversity transcriptomic data has resulted in an ever-expanding need for scalable and user-friendly tools capable of answering large-scale molecular evolution questions. FUSTr identifies gene families involved in the process of adaptation. It finds genes under strong positive selection in transcriptomic datasets and automatically detects isoform designation patterns in transcriptome assemblies to maximize phylogenetic independence in downstream analysis. Results: When applied to previously studied spider transcriptomic data as well as simulated data, FUSTr successfully grouped coding sequences into proper gene families and correctly identified those under strong positive selection in relatively little time. Conclusions: FUSTr provides a useful tool for novice bioinformaticians to characterize the molecular evolution of organisms throughout the tree of life using large transcriptomic biodiversity datasets, and it can utilize multi-processor high-performance computational facilities.


2021 ◽  
Author(s):  
Kieran Elmes ◽  
Astra Heywood ◽  
Zhiyi Huang ◽  
Alex Gavryushkin

Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify combinatorial gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance, as the genetic background of a strain plays an important role in determining resistance. Recently developed tools scale to human exome-wide screens for pairwise interactions, but none to date have included the possibility of three-way interactions. Expanding upon recent state-of-the-art methods, we make a number of improvements to the performance on large-scale data, making consideration of three-way interactions possible. We demonstrate our proposed method, Pint, on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens. Pint outperforms known methods on simulated data, and identifies a number of biologically plausible gene effects in both the antibiotic and siRNA models. For example, we have identified a combination of known tumor suppressor genes that is predicted (using Pint) to cause a significant increase in cell proliferation.


Author(s):  
C.K. Wu ◽  
P. Chang ◽  
N. Godinho

Recently, the use of refractory metal silicides as low resistivity, high temperature and high oxidation resistance gate materials in large scale integrated circuits (LSI) has become an important approach in advanced MOS process development (1). This research is a systematic study on the structure and properties of molybdenum silicide thin film and its applicability to high performance LSI fabrication.


Author(s):  
Olga V. Khavanova ◽  

The second half of the eighteenth century in the lands under the sceptre of the House of Austria was a period of development of a language policy addressing the ethno-linguistic diversity of the monarchy’s subjects. On the one hand, the sphere of use of the German language was becoming wider, embracing more and more segments of administration, education, and culture. On the other hand, the authorities were perfectly aware of the fact that communication in the languages and vernaculars of the nationalities living in the Austrian Monarchy was one of the principal instruments of spreading decrees and announcements from the central and local authorities to the less-educated strata of the population. Consequently, a large-scale reform of primary education was launched, aimed at making the whole population literate, regardless of social status, nationality (mother tongue), or confession. In parallel with the centrally coordinated state policy of education and language use, subjects (both language experts and amateur polyglots) joined the process of writing grammar books, which were intended to ease communication between the different nationalities of the Habsburg lands. This article considers some examples of such editions with primary attention given to the correlation between private initiative and governmental policies, mechanisms of verifying the textbooks to be published, their content, and their potential readers. This paper demonstrates that for grammar-book authors, it was very important to be integrated into the patronage networks at the court and in administrative bodies and stresses that the Vienna court controlled the process of selection and financing of grammar books to be published depending on their quality and ability to satisfy the aims and goals of state policy.

