Constructing Large-Scale Genetic Maps Using an Evolutionary Strategy Algorithm

Abstract This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
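The TSP-style formulation described in this abstract can be illustrated with a toy sketch. It is not the authors' algorithm: it is a minimal (1+1) evolution strategy with segment-inversion mutation, and the names `map_length`, `order_markers`, `positions`, and `dist` are hypothetical. The pairwise distance matrix is built from made-up marker positions and stands in for distances derived from recombination frequencies.

```python
import random

def map_length(order, dist):
    """Total map length: sum of distances between adjacent markers in `order`."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def order_markers(dist, n_iter=20000, seed=0):
    """Toy (1+1) evolution strategy (an assumption, not the paper's method):
    mutate by inverting a random segment and keep the child only if the
    map does not get longer."""
    rng = random.Random(seed)
    n = len(dist)
    best = list(range(n))
    rng.shuffle(best)                      # random starting order
    best_len = map_length(best, dist)
    for _ in range(n_iter):
        i, j = sorted(rng.sample(range(n), 2))
        child = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        child_len = map_length(child, dist)
        if child_len <= best_len:          # never accept a longer map
            best, best_len = child, child_len
    return best, best_len

# Hypothetical marker positions; real input would be estimated pairwise distances.
positions = [0.00, 0.05, 0.12, 0.20, 0.26, 0.35, 0.41, 0.50]
dist = [[abs(p - q) for q in positions] for p in positions]
order, length = order_markers(dist)
print(order, round(length, 3))
```

Because an order and its reverse have the same length, either direction is an equally good answer, which is why the search space has n!/2 distinct orders.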


A distance based multisample test for high-dimensional compositional data with applications to the human microbiome

BMC Bioinformatics, 2020, Vol 21 (S9). DOI: 10.1186/s12859-020-3530-x

Author(s): Qingyang Zhang, Thy Dao

Keyword(s): Large Scale, Compositional Data, Statistical Significance, Human Microbiome, Simulated Data, Real Data, Nonparametric Test, High Dimensional, Regularity Conditions, Compositional Difference

Abstract. Background: Compositional data refer to the data that lie on a simplex, which are common in many scientific domains such as genomics, geology and economics. As the components in a composition must sum to one, traditional tests based on unconstrained data become inappropriate, and new statistical methods are needed to analyze this special type of data. Results: In this paper, we consider a general problem of testing for the compositional difference between K populations. Motivated by microbiome and metagenomics studies, where the data are often over-dispersed and high-dimensional, we formulate a well-posed hypothesis from a Bayesian point of view and suggest a nonparametric test based on inter-point distance to evaluate statistical significance. Unlike most existing tests for compositional data, our method does not rely on any data transformation, sparsity assumption or regularity conditions on the covariance matrix, but directly analyzes the compositions. Simulated data and two real data sets on the human microbiome are used to illustrate the promise of our method. Conclusions: Our simulation studies and real data applications demonstrate that the proposed test is more sensitive to the compositional difference than the mean-based method, especially when the data are over-dispersed or zero-inflated. The proposed test is easy to implement and computationally efficient, facilitating its application to large-scale datasets.
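To make the idea of a distance-based nonparametric test concrete, here is a minimal sketch of a generic label-permutation test on inter-point distances. It is an illustration only: the statistic (mean between-group distance minus mean within-group distance), the Euclidean metric, and the names `between_minus_within` and `permutation_pvalue` are assumptions, not the statistic proposed in the paper. The toy compositions are made up.

```python
import itertools
import math
import random

def euclid(x, y):
    """Euclidean distance between two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def between_minus_within(samples, labels):
    """Assumed statistic: mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, j in itertools.combinations(range(len(samples)), 2):
        d = euclid(samples[i], samples[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)

def permutation_pvalue(samples, labels, n_perm=999, seed=0):
    """Permutation p-value: how often shuffled labels give a statistic
    at least as large as the observed one."""
    rng = random.Random(seed)
    observed = between_minus_within(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_minus_within(samples, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up three-part compositions (each sums to one), two groups of six.
group_a = [(0.70, 0.20, 0.10), (0.68, 0.22, 0.10), (0.72, 0.18, 0.10),
           (0.65, 0.25, 0.10), (0.71, 0.19, 0.10), (0.69, 0.20, 0.11)]
group_b = [(0.20, 0.30, 0.50), (0.22, 0.28, 0.50), (0.18, 0.32, 0.50),
           (0.25, 0.30, 0.45), (0.21, 0.29, 0.50), (0.19, 0.31, 0.50)]
samples = group_a + group_b
labels = ["A"] * 6 + ["B"] * 6
p_value = permutation_pvalue(samples, labels)
print(p_value)
```

The sketch works on the raw compositions without any log-ratio transformation, in the spirit of the abstract, and extends to K groups simply by using more than two label values.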


Efficient Inference of Haplotypes from Genotypes on a Pedigree

Journal of Bioinformatics and Computational Biology, 2003, Vol 01 (01), pp. 41-69. DOI: 10.1142/s0219720003000204. Cited by ~54

Author(s): Jing Li, Tao Jiang

Keyword(s): Large Scale, Gaussian Elimination, Linear Equations, Simulated Data, Exact Algorithm, Real Data, Haplotype Reconstruction, Pedigree Data, Simple Method, Complexity Result

We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large scale haplotype inference projects.
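The zero-recombinant case reduces to solving linear equations over Z2, which plain Gaussian elimination with XOR handles. The sketch below shows only that generic step; the function name `solve_gf2` and the example constraints are hypothetical, and the paper's construction of constraints from Mendelian rules is not reproduced here.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over Z2 by Gauss-Jordan elimination.

    Returns (particular_solution, free_columns), or None if the system
    is inconsistent. Free variables are set to 0 in the particular
    solution; every assignment of the free variables yields another
    feasible solution, so there are 2**len(free_columns) solutions.
    """
    n_vars = len(rows[0])
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]   # augmented matrix
    pivot_cols = []
    r = 0
    for c in range(n_vars):
        pivot = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if pivot is None:
            continue                                   # column c is free
        aug[r], aug[pivot] = aug[pivot], aug[r]
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]  # row XOR
        pivot_cols.append(c)
        r += 1
    # A row reading 0 = 1 means the constraints cannot all hold.
    if any(not any(row[:-1]) and row[-1] for row in aug):
        return None
    solution = [0] * n_vars
    for row_idx, c in enumerate(pivot_cols):
        solution[c] = aug[row_idx][-1]
    free = [c for c in range(n_vars) if c not in pivot_cols]
    return solution, free

# Hypothetical constraints on three binary phase variables:
# x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (all mod 2).
result = solve_gf2([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 0, 1])
print(result)
```

Enumerating all feasible configurations then amounts to trying each assignment of the free variables and recomputing the pivot variables from the reduced rows.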


Edge-Based Missing Data Imputation in Large-Scale Environments

Information, 2021, Vol 12 (5), pp. 195. DOI: 10.3390/info12050195

Author(s): Davide Andrea Guastella, Guilhem Marcillaud, Cesare Valenti

Keyword(s): Urban Environment, Large Scale, Smart Cities, Low Cost, Real Data, Missing Data Imputation, Large Scale Data, IoT Devices, Edge Based, The One

Smart cities leverage large amounts of data acquired in the urban environment to drive decision support tools. These tools enable monitoring the environment to improve the quality of services offered to citizens. The increasing diffusion of personal Internet of Things devices capable of sensing the physical environment allows for low-cost solutions to acquire a large amount of information within the urban environment. On the one hand, the use of mobile and intermittent sensors implies new scenarios of large-scale data analysis; on the other hand, it raises challenges such as sensor intermittency and the integrity of acquired data. To address these, edge computing emerges as a methodology to distribute computation among different IoT devices and analyze data locally. We present here a new methodology for imputing environmental information that is missing at the acquisition step because sensors are absent or out of order, by distributing the computation among a variety of fixed and mobile devices. Numerous experiments have been carried out on real data to confirm the validity of the proposed method.


Genetic Algorithm for Panel Cutting Stock on CUDA Platform

Advanced Materials Research, 2013, Vol 712-715, pp. 2569-2575. DOI: 10.4028/www.scientific.net/amr.712-715.2569

Author(s): Wen Wu Xie, Tao Ning

Keyword(s): Genetic Algorithm, Discrete Optimization, High Performance, Large Scale, Optimization Problems, Raw Material, Cutting Stock, Steel Bars, Material Utilization, Discrete Optimization Problems

The problem of placing a number of specific shapes on a raw material so as to maximize material utilization is commonly encountered in the production of steel bars and plates, paper, glass, etc. In this paper, we present a genetic algorithm for steel grating nesting design. For application to large-scale discrete optimization problems, we also implemented this algorithm in CUDA to exploit parallel computation. Experimental results show that the CUDA-based genetic algorithm obtains satisfactory solutions to the steel grating nesting problem with high performance.


A Simulation Network Model to Evaluate RFID Middlewares

International Journal of Software Engineering and Knowledge Engineering, 2011, Vol 21 (06), pp. 779-801. DOI: 10.1142/s0218194011005517. Cited by ~8

Author(s): Wooseok Ryu, Joonho Kwon, Bonghee Hong

Keyword(s): Human Resources, Radio Frequency Identification, High Performance, Large Scale, Real Data, Virtual Construction, Business Environments, Frequency Identification, RFID Middleware, Semantic Validity

High-performance radio-frequency identification (RFID) is a challenging issue for large-scale enterprises. As a key component of an RFID system, RFID middleware is an important factor to measure the performance of the system. To evaluate the feasibility of an RFID middleware, the performance of the RFID middleware should be carefully evaluated in various RFID-enabled business environments. However, the construction of an RFID testbed requires a lot of time, money, and human resources because it involves numerous tagged items and a large number of deployed readers. We must provide a meaningful input tag stream representing various business activities, rather than random data. This paper presents a novel simulation model for the virtual construction of RFID testbeds. To ensure the semantic validity of the input tag stream, the proposed RFID simulation network (RSN) extends Petri nets by including sets of functions that represent unique characteristics of RFID environments such as the uncertainty of communications and tag movement patterns. By configuring appropriate functions, the RSN automatically generates an input tag stream that matches the distribution of real data. We demonstrate that the RSN model correctly reflects data from real-world environments by comparing input tag streams from real RFID equipment and from the RSN model.


Stable spike clusters for the one-dimensional Gierer–Meinhardt system

European Journal of Applied Mathematics, 2016, Vol 28 (4), pp. 576-635. DOI: 10.1017/s0956792516000450. Cited by ~10

Author(s): Juncheng Wei, Matthias Winter

Keyword(s): Large Scale, Spatial Scales, Small Scale, Biologically Relevant, Formula Group, Group A, Scale Inhomogeneity, The One, Image Position, Small Scale Structures

We consider the Gierer–Meinhardt system with precursor inhomogeneity and two small diffusivities in an interval

$$
\begin{cases}
A_t = \epsilon^2 A'' - \mu(x) A + \dfrac{A^2}{H}, & x \in (-1, 1),\ t > 0,\\[3mm]
\tau H_t = D H'' - H + A^2, & x \in (-1, 1),\ t > 0,\\[3mm]
A'(-1) = A'(1) = H'(-1) = H'(1) = 0,
\end{cases}
$$

where $0 < \epsilon \ll \sqrt{D} \ll 1$, $\tau \geq 0$, and $\tau$ is independent of $\epsilon$.

A spike cluster is the combination of several spikes which all approach the same point in the singular limit. We rigorously prove the existence of a steady-state spike cluster consisting of $N$ spikes near a non-degenerate local minimum point $t_0$ of the smooth positive inhomogeneity $\mu(x)$, i.e. we assume that $\mu'(t_0) = 0$, $\mu''(t_0) > 0$ and we have $\mu(t_0) > 0$. Here, $N$ is an arbitrary positive integer. Further, we show that this solution is linearly stable. We explicitly compute all eigenvalues, both large (of order $O(1)$) and small (of order $o(1)$). The main features of studying the Gierer–Meinhardt system in this setting are as follows: (i) it is biologically relevant since it models a hierarchical process (pattern formation of small-scale structures induced by a pre-existing large-scale inhomogeneity); (ii) it contains three different spatial scales two of which are small: the $O(1)$ scale of the precursor inhomogeneity $\mu(x)$, the $O(\sqrt{D})$ scale of the inhibitor diffusivity and the $O(\epsilon)$ scale of the activator diffusivity; (iii) the expressions can be made explicit and often have a particularly simple form.


FUSTr: a tool to find gene families under selection in transcriptomes

PeerJ, 2018, Vol 6, pp. e4234. DOI: 10.7717/peerj.4234. Cited by ~6

Author(s): T. Jeffrey Cole, Michael S. Brewer

Keyword(s): Molecular Evolution, Positive Selection, High Performance, Large Scale, Simulated Data, Gene Families, Strong Positive Selection, Transcriptomic Data, Downstream Analysis, User Friendly

Background: The recent proliferation of large amounts of biodiversity transcriptomic data has resulted in an ever-expanding need for scalable and user-friendly tools capable of answering large-scale molecular evolution questions. FUSTr identifies gene families involved in the process of adaptation. The tool finds genes under strong positive selection in transcriptomic datasets and automatically detects isoform designation patterns in transcriptome assemblies to maximize phylogenetic independence in downstream analysis. Results: When applied to previously studied spider transcriptomic data as well as simulated data, FUSTr successfully grouped coding sequences into proper gene families and correctly identified those under strong positive selection in relatively little time. Conclusions: FUSTr provides a useful tool for novice bioinformaticians to characterize the molecular evolution of organisms throughout the tree of life using large transcriptomic biodiversity datasets, and it can utilize multi-processor high-performance computational facilities.


A Fast Lasso-Based Method for Inferring Higher-Order Interactions

2021. DOI: 10.1101/2021.12.13.471844

Author(s): Kieran Elmes, Astra Heywood, Zhiyi Huang, Alex Gavryushkin

Keyword(s): Large Scale, Association Studies, Simulated Data, Real Data, Resistance Testing, Data Sets, Gene Effects, Molecular Alterations, Large Scale Data, Pairwise Interactions

Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify combinatorial gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance, as the genetic background of a strain plays an important role in determining resistance. Recently developed tools scale to human exome-wide screens for pairwise interactions, but none to date have included the possibility of three-way interactions. Expanding upon recent state-of-the-art methods, we make a number of improvements to the performance on large-scale data, making consideration of three-way interactions possible. We demonstrate our proposed method, Pint, on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens. Pint outperforms known methods on simulated data, and identifies a number of biologically plausible gene effects in both the antibiotic and siRNA models. For example, we have identified a combination of known tumor suppressor genes that is predicted (using Pint) to cause a significant increase in cell proliferation.


The Structure and Properties of MoSi2 Thin Film in MOS Process

Proceedings, annual meeting, Electron Microscopy Society of America, 1980, Vol 38, pp. 326-327. DOI: 10.1017/s1431927600001379

Author(s): C.K. Wu, P. Chang, N. Godinho

Keyword(s): Thin Film, Integrated Circuits, High Performance, Large Scale, Process Development, Structure And Properties, Metal Silicides, High Oxidation, Important Approach, High Oxidation Resistance

Recently, the use of refractory metal silicides as low resistivity, high temperature and high oxidation resistance gate materials in large scale integrated circuits (LSI) has become an important approach in advanced MOS process development (1). This research is a systematic study on the structure and properties of molybdenum silicide thin film and its applicability to high performance LSI fabrication.


First grammar books in the Habsburg Monarchy: individual initiative and regulatory interference by the state (1760s–1770s)

A day in the calendar. Celebrations and memorial days as an instrument of national consolidation in Central, Eastern and South-Eastern Europe from the nineteenth to the twenty-first century - Central-European Studies, 2020, Vol 2019 (2 (11)), pp. 137-157. DOI: 10.31168/2619-0877.2019.2.6

Author(s): Olga V. Khavanova

Keyword(s): Eighteenth Century, State Policy, Large Scale, Language Use, Linguistic Diversity, Mother Tongue, German Language, Habsburg Monarchy, Private Initiative, The One

The second half of the eighteenth century in the lands under the sceptre of the House of Austria was a period of development of a language policy addressing the ethno-linguistic diversity of the monarchy’s subjects. On the one hand, the sphere of use of the German language was becoming wider, embracing more and more segments of administration, education, and culture. On the other hand, the authorities were perfectly aware of the fact that communication in the languages and vernaculars of the nationalities living in the Austrian Monarchy was one of the principal instruments of spreading decrees and announcements from the central and local authorities to the less-educated strata of the population. Consequently, a large-scale reform of primary education was launched, aimed at making the whole population literate, regardless of social status, nationality (mother tongue), or confession. In parallel with the centrally coordinated state policy of education and language use, subjects (both language experts and amateur polyglots) joined the process of writing grammar books, which were intended to ease communication between the different nationalities of the Habsburg lands. This article considers some examples of such editions, with primary attention given to the correlation between private initiative and governmental policies, mechanisms of verifying the textbooks to be published, their content, and their potential readers. This paper demonstrates that for grammar-book authors, it was very important to be integrated into the patronage networks at the court and in administrative bodies, and stresses that the Vienna court controlled the process of selection and financing of grammar books to be published depending on their quality and ability to satisfy the aims and goals of state policy.
