input alignment
Recently Published Documents


2019, Vol 36 (11), pp. 2604-2619
Author(s): Elodie Laine, Yasaman Karami, Alessandra Carbone

The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering, and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling intersite dependencies within biological sequences. However, state-of-the-art methods remain time consuming. Here, we present Global Epistatic Model for predicting Mutational Effects (GEMME) (www.lcqb.upmc.fr/GEMME), an original and fast method that predicts mutational outcomes by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. GEMME uses only a few biologically meaningful and interpretable parameters. Assessed against 50 high- and low-throughput mutational experiments, it performs overall similarly to or better than existing methods. It accurately predicts the mutational landscapes of a wide range of protein families, including viral ones and, more generally, highly conserved families. Given an input alignment, it generates the full mutational landscape of a protein in a matter of minutes. It is freely available as a package and a webserver at www.lcqb.upmc.fr/GEMME/.


2018
Author(s): John M. Gaspar

The computational analyses of genome-enrichment assays, such as ChIP-seq and ATAC-seq, are typically concluded with a peak-calling program that identifies genomic regions that are significantly enriched. The most popular peak-caller, MACS2, assumes that the input alignment files are for single-end sequence reads by default, yet those with paired-end Illumina sequence data frequently use this default setting. This leads to erroneous coverage values and suboptimal peak identification. However, using the correct paired-end mode can introduce another set of artifacts. After thoroughly reviewing the MACS2 source code, we have modified it to limit these and other problems. Our updated version is freely available (https://github.com/jsh58/MACS).
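The single-end versus paired-end choice described above can be checked before peak calling. The following is a minimal sketch, not part of the modified MACS described in the abstract: it inspects the FLAG field of SAM-format text records (bit 0x1 marks a paired read) and picks the MACS2 `-f` value accordingly. The helper names and the file name `sample.bam` are hypothetical; `BAM` and `BAMPE` are existing MACS2 format values, and the sketch assumes the SAM text was dumped from the same BAM file that will be passed to MACS2.

```python
# SAM FLAG bit 0x1: the read belongs to a template with multiple segments (paired).
PAIRED_FLAG = 0x1


def has_paired_reads(sam_lines):
    """Return True if any alignment record in SAM text has the paired bit set."""
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip blank lines and header lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        if int(fields[1]) & PAIRED_FLAG:
            return True
    return False


def macs2_format_for(paired):
    """Choose the MACS2 input format for a BAM file (hypothetical helper)."""
    return "BAMPE" if paired else "BAM"


def build_callpeak_command(bam_path, paired, name="sample"):
    """Assemble a macs2 callpeak argument list; it is built here, not executed."""
    return ["macs2", "callpeak", "-t", bam_path,
            "-f", macs2_format_for(paired), "-n", name]


if __name__ == "__main__":
    records = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\tACGT\tFFFF",
    ]
    print(" ".join(build_callpeak_command("sample.bam", has_paired_reads(records))))
```

Selecting the format explicitly avoids relying on the single-end default, although, as the abstract notes, paired-end mode can introduce artifacts of its own.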


2016, Vol 38 (4), pp. 479-495
Author(s): Ilona Cserháti, Tibor Keresztély, Tibor Takács

Effective decision making uses various databases, including both micro- and macro-level datasets. In many cases it is a major challenge to ensure the consistency of the two levels. Different types of problems can occur, and several methods can be used to solve them. The paper concentrates on the input alignment of household income for microsimulation, which refers to improving the elements of a micro data survey (EU-SILC) by using macro data from administrative sources. We use a combined micro-macro model called ECONS-TAX for this improvement. We also produced model projections until 2015, which is important because the official EU-SILC micro database will only be available in Hungary in the summer of 2017. The paper presents our estimations about the dynamics of income elements and the changes in income inequalities. Results show that the aligned data provide a different level of income inequality, but alignment does not affect the direction of change from year to year. However, when we analyzed policy change, the use of aligned data caused larger differences both in income levels and in their dynamics.
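To make the idea of aligning micro data to macro totals concrete, here is a minimal sketch of one simple technique: proportional scaling of each income element so that the weighted survey total matches an administrative total. This is an illustrative assumption, not the ECONS-TAX method, which the abstract does not describe. The function names, the element names, and all numbers are hypothetical.

```python
def alignment_factors(households, weights, macro_totals):
    """Compute one scaling factor per income element.

    households:   list of dicts mapping income element -> amount per household
    weights:      survey weights, one per household
    macro_totals: dict mapping income element -> macro-level (administrative) total
    """
    factors = {}
    for element, target in macro_totals.items():
        survey_total = sum(w * h.get(element, 0.0)
                           for h, w in zip(households, weights))
        if survey_total <= 0:
            raise ValueError(f"no survey income recorded for element {element!r}")
        factors[element] = target / survey_total
    return factors


def align_households(households, factors):
    """Scale every income element by its factor; unknown elements stay unchanged."""
    return [{e: v * factors.get(e, 1.0) for e, v in h.items()} for h in households]


if __name__ == "__main__":
    households = [{"wages": 100.0, "pensions": 0.0},
                  {"wages": 0.0, "pensions": 50.0}]
    weights = [10.0, 20.0]
    factors = alignment_factors(households, weights,
                                {"wages": 1200.0, "pensions": 1500.0})
    print(factors)
    print(align_households(households, factors))
```

Because each income element gets its own factor, households with different income compositions are scaled differently, which is one way aligned data can show a different level of inequality than the raw survey.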


2016
Author(s): Jens Kleinjung, Ton C.C. Coolen

Summary: The Mutual Information of pairs of data vectors, for example sequence alignment positions or gene expression profiles, is a quantitative measure of the interdependence between the data. However, data vectors based on a finite number of samples retain non-zero Mutual Information values even for completely random data, which is referred to as background or residual Mutual Information. Estimates of the residual Mutual Information have so far been obtained through heuristic or numerical approximations. Here we introduce a simple analytical formula for the computation of the residual Mutual Information that yields precise values and does not require the joint probabilities between the vector elements as input.

Availability and Implementation: A C program arMI is available at http://mathbio.crick.ac.uk/wiki/Software#arMI. Using an input alignment in FASTA format or alternatively an internally created random alignment of specified length and depth, the program computes three types of Mutual Information: (i) Shannon’s Mutual Information between all pairs of alignment columns; (ii) the numerical residual Mutual Information by using the same formula on the randomised (shuffled) data; (iii) the analytical residual Mutual Information introduced here. The package depends on the GNU Scientific Library, which is used for vector and matrix operations, factorial expressions and random number generation (Galassi et al., 2009). Reference alignments and result data are included in the program package in the folder ‘tests’. The R environment was used for statistics and plotting (R Core Team, 2014).

Contact: [email protected]

Supplementary Material: A detailed derivation of the analytical formula is given in the Supplementary Material.
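The first two quantities the program reports can be illustrated with a short sketch: Shannon's Mutual Information between two alignment columns, and a numerical residual estimate obtained by shuffling one column. This is not the arMI implementation and does not reproduce the analytical formula, which is not given in the abstract. The function names, the base-2 logarithm, the number of shuffles, and the example columns are all assumptions made for illustration.

```python
import random
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Shannon Mutual Information (in bits) between two equal-length columns."""
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi


def shuffled_residual_mi(col_a, col_b, n_shuffles=200, seed=0):
    """Average MI after shuffling col_b: a numerical estimate of residual MI."""
    rng = random.Random(seed)
    shuffled = list(col_b)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys any pairing between the two columns
        total += mutual_information(col_a, shuffled)
    return total / n_shuffles


if __name__ == "__main__":
    col_a, col_b = "AAAACCCC", "GGGGTTTT"  # two perfectly coupled columns
    print("MI:", mutual_information(col_a, col_b))
    print("residual MI (shuffled):", shuffled_residual_mi(col_a, col_b))
```

With only eight sequences, the shuffled estimate stays above zero even though shuffling removes the dependence, which is the finite-sample effect the abstract calls residual Mutual Information.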


2014, Vol 12 (02), pp. 1441004
Author(s): Mikhail Krivozubov, Florian Goebels, Sergei Spirin

Reconstruction of the phylogeny of a protein family from a sequence alignment can produce results of varying quality. Our goal is to predict the quality of phylogeny reconstruction based on features that can be extracted from the input alignment. We used the Fitch–Margoliash (FM) method of phylogeny reconstruction and a random forest as the predictor. For training and testing the predictor, alignments of orthologous series (OS) were used, for which the result of phylogeny reconstruction can be evaluated by comparison with the trees of the corresponding organisms. Our results show that the quality of phylogeny reconstruction can be predicted with more than 80% precision. We also tried to predict which phylogeny reconstruction method, FM or UPGMA, is better for a particular alignment. With the set of features used, among alignments for which the obtained predictor predicts a better performance of UPGMA, 56% actually give a better result with UPGMA. Given that UPGMA performs better for only 34% of the alignments in our testing set, this result shows that it is possible in principle to predict the better phylogeny reconstruction method based on features of a sequence alignment.
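For readers unfamiliar with UPGMA, one of the two reconstruction methods compared above, the following minimal sketch shows the standard algorithm: repeatedly merge the two closest clusters and update distances as size-weighted averages. It assumes a precomputed symmetric distance matrix; how the authors derive distances from an alignment is not stated in the abstract. The function name, the nested-tuple tree representation, and the example distances are hypothetical.

```python
def upgma(names, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns a nested tuple (left, right, height) with leaf names at the tips;
    height is the distance from the new node down to its leaves.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster:
        # size-weighted average of the two old distances.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop all distances involving the two merged clusters.
        dist = {pair: v for pair, v in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


if __name__ == "__main__":
    names = ["A", "B", "C"]
    matrix = [[0.0, 2.0, 6.0],
              [2.0, 0.0, 6.0],
              [6.0, 6.0, 0.0]]
    print(upgma(names, matrix))
```

UPGMA assumes a constant rate of evolution across lineages, which is one reason it may outperform or underperform a method such as FM depending on the alignment.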

