SADI: Sequence Analysis Tools for Stata

Author(s):  
Brendan Halpin

The SADI package provides tools for sequence analysis, which focuses on the similarity and dissimilarity between categorical time series such as life-course trajectories. SADI's main components are tools to calculate intersequence distances using several different algorithms, including the optimal matching algorithm, but it also includes utilities to graph, summarize, and manage sequence data. It provides similar functionality to the R package TraMineR and the Stata package SQ but is substantially faster than the latter.
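
To make the idea of an intersequence distance concrete, the following is a minimal Python sketch of an optimal matching distance, computed as a weighted edit distance by dynamic programming. It is not SADI's implementation or syntax: the function name `om_distance`, the single flat substitution cost, and the default cost values are illustrative assumptions (real analyses often use a state-by-state substitution cost matrix).

```python
def om_distance(a, b, sub_cost=2.0, indel_cost=1.0):
    """Optimal matching distance between two state sequences.

    Assumptions (hypothetical, for illustration only): one flat
    substitution cost for any pair of differing states, and one
    flat insertion/deletion (indel) cost.
    """
    n, m = len(a), len(b)
    # prev[j] holds the cost of turning the first i-1 states of a
    # into the first j states of b; row 0 is pure insertions.
    prev = [j * indel_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0 is pure deletions.
        curr = [i * indel_cost] + [0.0] * m
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            curr[j] = min(
                prev[j - 1] + sub,         # match or substitute
                prev[j] + indel_cost,      # delete a state from a
                curr[j - 1] + indel_cost,  # insert a state from b
            )
        prev = curr
    return prev[m]


# Two short trajectories coded E (employed) and U (unemployed).
print(om_distance("EEUU", "EEEU"))
```

With these costs a substitution is never cheaper than a deletion plus an insertion, so the choice of costs, not just the algorithm, shapes which sequences count as similar.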

2001
Vol 28 (2)
pp. 439
Author(s):  
Francesco C. Billari

This paper examines the salient features of sequence analysis in demographic research. The new approach allows a holistic perspective on life course analysis, and is based on a representation of lives as sequences of states. Some of the methods for analysing such data are sketched, from complex description to optimal matching to monothetic divisive algorithms. After a short illustration of a demographically relevant example, the needs in terms of data collection and the opportunities of applying the same approach to synthetic data are discussed.


2021
pp. 193229682110289
Author(s):
Evan Olawsky
Yuan Zhang
Lynn E Eberly
Erika S Helgeson
Lisa S Chow

Background: With the development of continuous glucose monitoring systems (CGMS), detailed glycemic data are now available for analysis. Yet analysis of this data-rich information can be formidable. The power of CGMS-derived data lies in its characterization of glycemic variability. In contrast, many standard glycemic measures like hemoglobin A1c (HbA1c) and self-monitored blood glucose inadequately describe glycemic variability and run the risk of bias toward overreporting hyperglycemia. Methods that adjust for this bias are often overlooked in clinical research due to difficulty of computation and lack of accessible analysis tools. Methods: In response, we have developed a new R package rGV, which calculates a suite of 16 glycemic variability metrics when provided a single individual’s CGM data. rGV is versatile and robust; it is capable of handling data of many formats from many sensor types. We also created a companion R Shiny web app that provides these glycemic variability analysis tools without prior knowledge of R coding. We analyzed the statistical reliability of all the glycemic variability metrics included in rGV and illustrate the clinical utility of rGV by analyzing CGM data from three studies. Results: In subjects without diabetes, greater glycemic variability was associated with higher HbA1c values. In patients with type 2 diabetes mellitus (T2DM), we found that high glucose is the primary driver of glycemic variability. In patients with type 1 diabetes (T1DM), we found that naltrexone use may potentially reduce glycemic variability. Conclusions: We present a new R package and accompanying web app to facilitate quick and easy computation of a suite of glycemic variability metrics.


2016
Vol 2 (1)
pp. 4-23
Author(s):  
Samuel D. Pimentel

2019
Vol 20 (1)
Author(s):
Charith B. Karunarathna
Jinko Graham

Background: A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results: We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion: The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
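
As a rough illustration of when a set of binary sequences is compatible with a perfect phylogeny, the sketch below applies the classical four-gamete test in Python. This is a standard compatibility check and is not taken from the package described above; the function name `four_gamete_compatible` and the encoding of sequences as strings of `0` and `1` are assumptions made for the example.

```python
from itertools import combinations


def four_gamete_compatible(haplotypes):
    """Return True if no pair of sites shows all four gametes.

    `haplotypes` is a list of equal-length strings over {'0', '1'}
    (an assumed encoding). If any two sites exhibit all of 00, 01,
    10 and 11 across the sample, the sites cannot both be explained
    by a single tree with one mutation per site.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Distinct allele pairs observed at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


sample = ["000", "010", "110"]
print(four_gamete_compatible(sample))
```

A sample that fails the test over a stretch of sites is one reason to work with local phylogenies for sub-regions rather than a single tree for the whole region.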


2011
Vol 39 (suppl)
pp. W86-W91
Author(s):
M. Thomas-Chollier
M. Defrance
A. Medina-Rivera
O. Sand
C. Herrmann
...  

2003
Vol 93 (2)
pp. 219-228
Author(s):
Béatrice Denoyes-Rothan
Guy Guérin
Christophe Délye
Barbara Smith
Dror Minz
...  

Ninety-five isolates of Colletotrichum, including 81 isolates of C. acutatum (62 from strawberry) and 14 isolates of C. gloeosporioides (13 from strawberry), were characterized by various molecular methods and pathogenicity tests. Results based on random amplified polymorphic DNA (RAPD) polymorphism and internal transcribed spacer (ITS) 2 sequence data provided clear genetic evidence of two subgroups in C. acutatum. The first subgroup, characterized as CA-clonal, included only isolates from strawberry and exhibited identical RAPD patterns and nearly identical ITS2 sequences. A larger genetic group, CA-variable, included isolates from various hosts and exhibited variable RAPD patterns and divergent ITS2 sequences. Within the C. acutatum population isolated from strawberry, the CA-clonal group is prevalent in Europe (54 isolates of 62). A subset of European C. acutatum isolates from strawberry, representing the CA-clonal and CA-variable groups, was assigned to two pathogenicity groups. No correlation could be drawn between genetic and pathogenicity groups. On the basis of molecular data, it is proposed that the CA-clonal subgroup contains closely related, highly virulent C. acutatum isolates that may have developed host specialization to strawberry. C. gloeosporioides isolates from Europe, which were rarely observed, were either slightly pathogenic or nonpathogenic on strawberry. The absence of correlation between genetic polymorphism and geographical origin in Colletotrichum spp. suggests a worldwide dissemination of isolates, probably through international plant exchanges.


2018
Vol 20 (4)
pp. 1542-1559
Author(s):
Damla Senol Cali
Jeremie S Kim
Saugata Ghose
Can Alkan
Onur Mutlu

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.


1999
Vol 339 (3)
pp. 767-773
Author(s):
Romain R. VIVÈS
David A. PYE
Markku SALMIVIRTA
John J. HOPWOOD
Ulf LINDAHL
...  

The biological activity of heparan sulphate (HS) and heparin largely depends on internal oligosaccharide sequences that provide specific binding sites for an extensive range of proteins. Identification of such structures is crucial for the complete understanding of glycosaminoglycan (GAG)-protein interactions. We describe here a simple method of sequence analysis relying on the specific tagging of the sugar reducing end by ³H radiolabelling, the combination of chemical scission and specific enzymic digestion to generate intermediate fragments, and the analysis of the generated products by strong-anion-exchange HPLC. We present full sequence data on microgram quantities of four unknown oligosaccharides (three HS-derived hexasaccharides and one heparin-derived octasaccharide) which illustrate the utility and relative simplicity of the technique. The results clearly show that it is also possible to read sequences of inhomogeneous preparations. Application of this technique to biologically active oligosaccharides should accelerate progress in the understanding of HS and heparin structure-function relationships and provide new insights into the primary structure of these polysaccharides.

