SADI: Sequence Analysis Tools for Stata

The SADI package provides tools for sequence analysis, which focuses on the similarity and dissimilarity between categorical time series such as life-course trajectories. SADI's main components are tools to calculate intersequence distances using several different algorithms, including the optimal matching algorithm, but it also includes utilities to graph, summarize, and manage sequence data. It provides similar functionality to the R package TraMineR and the Stata package SQ but is substantially faster than the latter.
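To illustrate what an intersequence distance is, here is a minimal Python sketch of an optimal matching distance computed by dynamic programming. It is not SADI's implementation or its Stata syntax; the function name `om_distance`, the cost values, and the example state codes are assumptions chosen for illustration.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance between two state sequences.

    Assumed costs: a constant insertion/deletion cost and a constant
    substitution cost (real analyses often use a substitution matrix).
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of turning a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete every element of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert every element of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,     # deletion
                d[i][j - 1] + indel,     # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[n][m]


# Hypothetical yearly states: S = single, C = cohabiting, M = married
print(om_distance("SSCCMM", "SSSCMM"))
```

Applying such a function to every pair of sequences yields the distance matrix that clustering or other downstream steps would work from.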

Sequence Analysis in Demographic Research

Canadian Studies in Population ◽

10.25336/p6g30c ◽

2001 ◽

Vol 28 (2) ◽

pp. 439 ◽

Cited By ~ 46

Author(s):

Francesco C. Billari

Keyword(s):

Sequence Analysis ◽

Data Collection ◽

Life Course ◽

Synthetic Data ◽

Optimal Matching ◽

New Approach ◽

Demographic Research ◽

Holistic Perspective ◽

Salient Features ◽

Life Course Analysis

This paper examines the salient features of sequence analysis in demographic research. The new approach allows a holistic perspective on life course analysis, and is based on a representation of lives as sequences of states. Some of the methods for analysing such data are sketched, from complex description to optimal matching to monothetic divisive algorithms. After a short illustration of a demographically relevant example, the needs in terms of data collection and the opportunities of applying the same approach to synthetic data are discussed.

Complex Process of Post-Modernity of Life Course and Dual Process of Individualization and Familization: Focused on the Sequence Analysis of Transition to Adulthood of 1955-1974 Birth Cohort

Korean Journal of Sociology ◽

10.21562/kjs.2014.04.48.2.67 ◽

2014 ◽

Vol 48 (2) ◽

pp. 67 ◽

Cited By ~ 1

Author(s):

Soon-Mi Lee

Keyword(s):

Sequence Analysis ◽

Life Course ◽

Birth Cohort ◽

Transition To Adulthood ◽

Dual Process ◽

Complex Process

A New Analysis Tool for Continuous Glucose Monitor Data

Journal of Diabetes Science and Technology ◽

10.1177/19322968211028909 ◽

2021 ◽

pp. 193229682110289

Author(s):

Evan Olawsky ◽

Yuan Zhang ◽

Lynn E Eberly ◽

Erika S Helgeson ◽

Lisa S Chow

Keyword(s):

Glucose Monitoring ◽

Glycemic Variability ◽

R Package ◽

Analysis Tool ◽

Continuous Glucose Monitor ◽

Analysis Tools ◽

R Shiny ◽

Primary Driver ◽

Rich Information ◽

Web App

Background: With the development of continuous glucose monitoring systems (CGMS), detailed glycemic data are now available for analysis. Yet analysis of this data-rich information can be formidable. The power of CGMS-derived data lies in its characterization of glycemic variability. In contrast, many standard glycemic measures like hemoglobin A1c (HbA1c) and self-monitored blood glucose inadequately describe glycemic variability and run the risk of bias toward overreporting hyperglycemia. Methods that adjust for this bias are often overlooked in clinical research due to difficulty of computation and lack of accessible analysis tools. Methods: In response, we have developed a new R package rGV, which calculates a suite of 16 glycemic variability metrics when provided a single individual’s CGM data. rGV is versatile and robust; it is capable of handling data of many formats from many sensor types. We also created a companion R Shiny web app that provides these glycemic variability analysis tools without prior knowledge of R coding. We analyzed the statistical reliability of all the glycemic variability metrics included in rGV and illustrate the clinical utility of rGV by analyzing CGM data from three studies. Results: In subjects without diabetes, greater glycemic variability was associated with higher HbA1c values. In patients with type 2 diabetes mellitus (T2DM), we found that high glucose is the primary driver of glycemic variability. In patients with type 1 diabetes (T1DM), we found that naltrexone use may potentially reduce glycemic variability. Conclusions: We present a new R package and accompanying web app to facilitate quick and easy computation of a suite of glycemic variability metrics.
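As a rough illustration of what a glycemic variability metric looks like, the sketch below computes two common summary measures, the standard deviation and the coefficient of variation, from a list of glucose readings. It is written in Python rather than R, it does not reproduce rGV's functions or its 16 metrics, and the function name and sample readings are hypothetical.

```python
from statistics import mean, stdev


def glycemic_variability(glucose_mg_dl):
    """Return mean, sample SD, and coefficient of variation (%) of readings."""
    avg = mean(glucose_mg_dl)
    sd = stdev(glucose_mg_dl)  # sample standard deviation (n - 1 denominator)
    return {"mean": avg, "sd": sd, "cv_percent": 100.0 * sd / avg}


# Hypothetical CGM readings in mg/dL, not data from the studies above
readings = [100, 120, 140, 120, 100, 140]
summary = glycemic_variability(readings)
print(summary)
```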

Large, Sparse Optimal Matching with R package rebalance

Observational Studies ◽

10.1353/obs.2016.0006 ◽

2016 ◽

Vol 2 (1) ◽

pp. 4-23

Author(s):

Samuel D. Pimentel

Keyword(s):

R Package ◽

Optimal Matching

perfectphyloR: An R package for reconstructing perfect phylogenies

BMC Bioinformatics ◽

10.1186/s12859-019-3313-4 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Charith B. Karunarathna ◽

Jinko Graham

Keyword(s):

Binary Tree ◽

Sequence Data ◽

R Package ◽

Binary Sequences ◽

Ancestral Haplotype ◽

Perfect Phylogeny ◽

Nested Partitions ◽

Genetic Sequence ◽

Insight Into ◽

Rooted Binary Tree

Background: A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results: We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion: The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
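For readers unfamiliar with perfect phylogenies, the following Python sketch shows the classical four-gamete test, a standard check of whether a set of binary sequences is compatible with a perfect phylogeny. It is not the reconstruction algorithm used by perfectphyloR, and the function name and example haplotypes are assumptions for illustration.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on equal-length binary strings.

    Returns False if any pair of sites shows all four combinations
    00, 01, 10, 11, which rules out a perfect phylogeny.
    """
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


# Hypothetical samples: the first set is compatible, the second is not
print(admits_perfect_phylogeny(["000", "010", "011", "100"]))
print(admits_perfect_phylogeny(["00", "01", "10", "11"]))
```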

RSAT 2011: regulatory sequence analysis tools

Nucleic Acids Research ◽

10.1093/nar/gkr377 ◽

2011 ◽

Vol 39 (suppl) ◽

pp. W86-W91 ◽

Cited By ~ 165

Author(s):

M. Thomas-Chollier ◽

M. Defrance ◽

A. Medina-Rivera ◽

O. Sand ◽

C. Herrmann ◽

...

Keyword(s):

Sequence Analysis ◽

Regulatory Sequence ◽

Analysis Tools

Genetic Diversity and Pathogenic Variability Among Isolates of Colletotrichum Species from Strawberry

Phytopathology ◽

10.1094/phyto.2003.93.2.219 ◽

2003 ◽

Vol 93 (2) ◽

pp. 219-228 ◽

Cited By ~ 51

Author(s):

Béatrice Denoyes-Rothan ◽

Guy Guérin ◽

Christophe Délye ◽

Barbara Smith ◽

Dror Minz ◽

...

Keyword(s):

Sequence Analysis ◽

Sequence Data ◽

Random Amplified Polymorphic DNA ◽

Molecular Data ◽

ITS2 Sequence ◽

Host Specialization ◽

Pathogenicity Tests ◽

Colletotrichum Spp ◽

RAPD Polymorphism ◽

Pathogenic Variability

Ninety-five isolates of Colletotrichum including 81 isolates of C. acutatum (62 from strawberry) and 14 isolates of C. gloeosporioides (13 from strawberry) were characterized by various molecular methods and pathogenicity tests. Results based on random amplified polymorphic DNA (RAPD) polymorphism and internal transcribed spacer (ITS) 2 sequence data provided clear genetic evidence of two subgroups in C. acutatum. The first subgroup, characterized as CA-clonal, included only isolates from strawberry and exhibited identical RAPD patterns and nearly identical ITS2 sequence analysis. A larger genetic group, CA-variable, included isolates from various hosts and exhibited variable RAPD patterns and divergent ITS2 sequence analysis. Within the C. acutatum population isolated from strawberry, the CA-clonal group is prevalent in Europe (54 isolates of 62). A subset of European C. acutatum isolates isolated from strawberry and representing the CA-clonal and CA-variable groups was assigned to two pathogenicity groups. No correlation could be drawn between genetic and pathogenicity groups. On the basis of molecular data, it is proposed that the CA-clonal subgroup contains closely related, highly virulent C. acutatum isolates that may have developed host specialization to strawberry. C. gloeosporioides isolates from Europe, which were rarely observed, were either slightly or nonpathogenic on strawberry. The absence of correlation between genetic polymorphism and geographical origin in Colletotrichum spp. suggests a worldwide dissemination of isolates, probably through international plant exchanges.

Analyzing categorical time series in the presence of missing observations

Statistics in Medicine ◽

10.1002/sim.9089 ◽

2021 ◽

Author(s):

Christian H. Weiß

Keyword(s):

Time Series ◽

Missing Observations ◽

Categorical Time Series

Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions

Briefings in Bioinformatics ◽

10.1093/bib/bby017 ◽

2018 ◽

Vol 20 (4) ◽

pp. 1542-1559 ◽

Cited By ~ 44

Author(s):

Damla Senol Cali ◽

Jeremie S Kim ◽

Saugata Ghose ◽

Can Alkan ◽

Onur Mutlu

Keyword(s):

Sequence Analysis ◽

Genome Assembly ◽

Sequence Data ◽

Error Rates ◽

Nanopore Sequencing ◽

Memory Usage ◽

Sequencing Technology ◽

Assembly Pipeline ◽

And Performance ◽

Polishing Tool

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.

Sequence analysis of heparan sulphate and heparin oligosaccharides

Biochemical Journal ◽

10.1042/bj3390767 ◽

1999 ◽

Vol 339 (3) ◽

pp. 767-773 ◽

Cited By ~ 35

Author(s):

Romain R. VIVÈS ◽

David A. PYE ◽

Markku SALMIVIRTA ◽

John J. HOPWOOD ◽

Ulf LINDAHL ◽

...

Keyword(s):

Sequence Analysis ◽

Protein Interactions ◽

Sequence Data ◽

Specific Binding ◽

Heparan Sulphate ◽

Biologically Active ◽

Simple Method ◽

GAG Protein ◽

Specific Binding Sites ◽

Strong Anion Exchange

The biological activity of heparan sulphate (HS) and heparin largely depends on internal oligosaccharide sequences that provide specific binding sites for an extensive range of proteins. Identification of such structures is crucial for the complete understanding of glycosaminoglycan (GAG)-protein interactions. We describe here a simple method of sequence analysis relying on the specific tagging of the sugar reducing end by ³H radiolabelling, the combination of chemical scission and specific enzymic digestion to generate intermediate fragments, and the analysis of the generated products by strong-anion-exchange HPLC. We present full sequence data on microgram quantities of four unknown oligosaccharides (three HS-derived hexasaccharides and one heparin-derived octasaccharide) which illustrate the utility and relative simplicity of the technique. The results clearly show that it is also possible to read sequences of inhomogeneous preparations. Application of this technique to biologically active oligosaccharides should accelerate progress in the understanding of HS and heparin structure-function relationships and provide new insights into the primary structure of these polysaccharides.