sequence patterns
Recently Published Documents



Author(s):  
Jacopo Vanoli ◽  
Consuelo Rubina Nava ◽  
Chiara Airoldi ◽  
Andrealuna Ucciero ◽  
Virginio Salvi ◽  
...  

While state sequence analysis (SSA) has been long used in social sciences, its use in pharmacoepidemiology is still in its infancy. Indeed, this technique is relatively easy to use, and its intrinsic visual nature may help investigators to untangle the latent information within prescription data, facilitating the individuation of specific patterns and possible inappropriate use of medications. In this paper, we provide an educational primer of the most important learning concepts and methods of SSA, including measurement of dissimilarities between sequences, the application of clustering methods to identify sequence patterns, the use of complexity measures for sequence patterns, the graphical visualization of sequences, and the use of SSA in predictive models. As a worked example, we present an application of SSA to opioid prescription patterns in patients with non-cancer pain, using real-world data from Italy. We show how SSA allows the identification of patterns in prescriptions in these data that might not be evident using standard statistical approaches and how these patterns are associated with future discontinuation of opioid therapy.
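The dissimilarity measurement mentioned above can be illustrated with a small edit-distance sketch in the spirit of optimal matching. It is not the authors' implementation: the state alphabet (N, W, S), the example sequences, and the insertion/deletion and substitution costs are all assumptions chosen for illustration.

```python
from itertools import combinations


def om_distance(a, b, indel=1.0, sub=2.0):
    """Edit distance between two state sequences (optimal-matching style).

    The costs are illustrative assumptions: each insertion or deletion
    costs `indel`, each substitution of one state for another costs `sub`.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            cost = 0.0 if x == y else sub
            curr.append(min(prev[j] + indel,       # delete x
                            curr[j - 1] + indel,   # insert y
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Return {(id_i, id_j): distance} for every unordered pair of sequences."""
    return {(i, j): om_distance(seqs[i], seqs[j])
            for i, j in combinations(sorted(seqs), 2)}


# Hypothetical monthly prescription states:
# N = no opioid, W = weak opioid, S = strong opioid.
sequences = {"p1": "WWWSSS", "p2": "WWWWSS", "p3": "NNNNNN"}
distances = pairwise_distances(sequences)
```

A pairwise matrix like `distances` is the usual input to a clustering step (for example hierarchical clustering), which then groups patients with similar prescription trajectories.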


Author(s):  
Jozef Kapusta ◽  
Ľubomír Benko ◽  
Dasa Munkova ◽  
Michal Munk

Abstract Post-editing has become an important part not only of translation research but also of the global translation industry. While computer-aided translation tools, such as translation memories, are considered to be part of a translator's work, lately, machine translation (MT) systems have also been accepted by human translators. However, many human translators are still adopting the changes brought by translation technologies to the translation industry. This paper introduces a novel approach for seeking suitable pairs of n-grams when recommending n-grams (corresponding n-grams between MT and post-edited MT) based on the type of text (manual or administrative) and the MT system used for machine translation. A tool that recommends and speeds up the correction of MT was developed to help post-editors with their work. It is based on the analysis of words with the same lemmas and the analysis of n-gram recommendations. These recommendations are extracted from sequence patterns of the mismatched words (MisMatch) between MT output and post-edited MT output. The paper aims to show the usage of morphological analysis for recommending post-edit operations. It describes the usage of mismatched words in the n-gram recommendations for the post-edited MT output. The contribution consists of the methodology for seeking suitable pairs of words and n-grams and, additionally, of showing the importance of taking metadata into account (the type of the text and/or style and the MT system) when recommending post-edit operations.
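As a rough illustration of extracting mismatched word spans between MT output and its post-edited version, the sketch below aligns the two token sequences with Python's standard `difflib.SequenceMatcher`. This is a stand-in technique, not the tool described in the abstract, which also relies on lemmas and morphological analysis. The example sentences and the function name `mismatched_ngrams` are hypothetical.

```python
from difflib import SequenceMatcher


def mismatched_ngrams(mt_tokens, pe_tokens):
    """Return (mt_ngram, post_edited_ngram) pairs for every non-matching span.

    Spans come from difflib's alignment of the two token lists; an empty
    tuple on one side means a pure insertion or deletion.
    """
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((tuple(mt_tokens[i1:i2]), tuple(pe_tokens[j1:j2])))
    return pairs


# Hypothetical MT output and its post-edited version.
mt = "the machine are translate text".split()
pe = "the machine translates text".split()
pairs = mismatched_ngrams(mt, pe)
```

Counting how often each pair recurs across a corpus, split by text type and MT system, would give a simple frequency table from which correction candidates could be ranked.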


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 105-105
Author(s):  
Hui Yang ◽  
Guillermo Garcia-Manero ◽  
Guillermo Montalban-Bravo ◽  
Kelly S. Chien ◽  
Awdesh Kalia ◽  
...  

Abstract Introduction Introduction of next-generation sequencing has defined the somatic mutational landscape in MDS. Comprehensive high-throughput structural variant profiling (SVP) is as important as mutation profiling in characterizing MDS clonal architecture since these large genomic aberrations have already been shown to be critical for diagnosis and risk-stratification of MDS. A subset (MECOM, KMT2A rearrangements) are therapeutic targets in clinical trials. At this time, technical advances in SVP for copy number alterations (CNAs) and fusions have not been congruent with mutation profiling due to the inability of short-read (150bp) NGS to detect SVs. Currently available long-read (10-20Kbp) and whole genome sequencing cannot detect all SVs due to the presence of repeat sequences. Hence, conventional karyotyping (CK) remains the gold standard. Optical genome mapping (OGM) is a novel single-platform technique that measures ultra-long-range sequence patterns (>300Kbp), thereby unaffected by repeat sequences, enabling unbiased evaluation of all types of SVs at a high resolution. Here, we performed comprehensive SVP and mutation profiling in a large well-characterized cohort of MDS. Methods We selected samples with available fresh/frozen BM cells from consecutive treatment-naïve MDS pts who also underwent standard-of-care tests (CK, FISH, targeted 81-gene NGS for mutations). For OGM, ultra-high-molecular-weight DNA was extracted, followed by labeling, linearization and imaging of DNA (Saphyr, Bionano) [median coverage: >300X]. The results were analyzed using de novo (>500 bp), rare variant (>5000 bp) and copy number (>500,000 bp) pipelines. The data was compared against 200 healthy controls to exclude common germline SVs. Clinical significance of the SVs was determined based on the location/overlap with the coding region of myeloid malignancy associated genes. The detection sensitivity was 10%. Results There were 76 treatment-naïve MDS patients. 
Baseline characteristics, comprehensive cytogenetic scoring system (CCSS) and R-IPSS risk categories and somatic mutations are in Fig 1. OGM identified all clonal abnormalities detected by CK [CNAs, inversions, inter/intra-chromosomal translocations, dicentric, complex derivative chromosomes]. Precise mapping of SVs by OGM at gene-level allowed determining the status of clinically informative biomarkers such as TET2, MECOM, TP53 and KMT2A, without the need for confirmatory assays. Detailed gene-level characterization of different SVs included KMT2A-ELL [t(11;19)] in MDS with WT1 mut, t(9;11) with SYTL2 fusion (and not KMT2A), der(1;7) leading to del(7q) in MDS with GATA2 mut/IDH2 mut and t(1;3)(p36;q21) rearrangements with potential PRDM16 disruption in SF3B1 mut/RUNX1 mut MDS, among others. Using OGM, we mapped the sequence patterns in both samples with IM with high level of confidence. Additionally, OGM identified 23 cryptic, clinically significant SVs in 14 (18%) of 76 pts. These included deletions of TET2, KMT2A, and del(5q), KMT2A amplification in MDS with FLT3-ITD/DNMT3A mut/RAS mut, NUP98-PRRX2, MECOM rearrangement in TET mut mutated NK-MDS. In addition, there were SVs of uncertain significance: duplications of chr1 (PDE41P), deletions of chr21 (involving RUNX1), chr2 (DNMT3A, ASXL2), chr12 (ETV6) and chr22 (EP300) and der(16)t(12;16)(q21.1;q12.1). These cryptic SVs were noted across all R-IPSS risk categories (highest yield in very-low and low R-IPSS) and across all cytogenetic risk-groups (very-good to very-poor). In the complex karyotype setting, OGM could resolve the markers and additional genetic material, and in most cases, showed a much higher degree of complexity within the genome than was apparent by CK. Four pts showed SV patterns typical of chromothripsis/chromoplexy. The median number of mutations per pt was 1 (0-6). When compared to mutation subsets, cryptic SVs were only identified in pts with ≤3 mutations. 
The majority represented either MDS with TP53 mut (6, 29%) or SF3B1 mut/TET mut (deletions of TET2, KMT2A, NOTCH1 and EP300 genes). Conclusions Unbiased, high-throughput whole genome SVP revealed cryptic, clinically significant SVs in ~18% of MDS pts. OGM is a single-platform cytogenomic tool that can facilitate SVP at a gene-level resolution. This study provides strong support for further validation in expanded cohorts to guide clinical implementation and integration of SVP for routine work-up. Figure 1. Disclosures Wei: Daiichi Sanko: Research Funding. Kantarjian: Ipsen Pharmaceuticals: Honoraria; Amgen: Honoraria, Research Funding; Astellas Health: Honoraria; Astra Zeneca: Honoraria; AbbVie: Honoraria, Research Funding; KAHR Medical Ltd: Honoraria; NOVA Research: Honoraria; Ascentage: Research Funding; Aptitude Health: Honoraria; Novartis: Honoraria, Research Funding; Pfizer: Honoraria, Research Funding; Jazz: Research Funding; Immunogen: Research Funding; Daiichi-Sankyo: Research Funding; BMS: Research Funding; Precision Biosciences: Honoraria; Taiho Pharmaceutical Canada: Honoraria.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Philipp Benner ◽  
Martin Vingron

Abstract Recent efforts to measure epigenetic marks across a wide variety of different cell types and tissues provide insights into the cell type-specific regulatory landscape. We use these data to study whether there exists a correlate of epigenetic signals in the DNA sequence of enhancers and explore with computational methods to what degree such sequence patterns can be used to predict cell type-specific regulatory activity. By constructing classifiers that predict in which tissues enhancers are active, we are able to identify sequence features that might be recognized by the cell in order to regulate gene expression. While classification performances vary greatly between tissues, we show examples where our classifiers correctly predict tissue-specific regulation from sequence alone. We also show that many of the informative patterns indeed harbor transcription factor footprints.


Genes ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1571
Author(s):  
Aaron Sievers ◽  
Liane Sauer ◽  
Michael Hausmann ◽  
Georg Hildenbrand

Several strongly conserved DNA sequence patterns in and between introns and intergenic regions (IIRs) consisting of short tandem repeats (STRs) with repeat lengths <3 bp have already been described in the kingdom of Animalia. In this work, we expanded the search and analysis of conserved DNA sequence patterns to a wider range of eukaryotic genomes. Our aims were to confirm the conservation of these patterns, to support the hypothesis of functional constraints on them, and/or to identify unknown patterns. We pairwise compared genomic DNA sequences of genes, exons, CDS, introns and intergenic regions of 34 Embryophyta (land plants), 30 Protista and 29 Fungi using established k-mer-based (alignment-free) comparison methods. Additionally, the results were compared with values derived for Animalia in former studies. We confirmed strong correlations between the sequence structures of IIRs spanning the entire domain of Eukaryotes. We found that the high correlations within introns, intergenic regions and between the two are a result of conserved abundances of STRs with repeat units ≤2 bp (e.g., (AT)n). For some sequence patterns and their inverse complementary sequences, we found a violation of equal distribution on complementary DNA strands in a subset of genomes. Looking at mismatches within the identified STR patterns, we found specific preferences for certain nucleotides that are stable over all four phylogenetic kingdoms. We conclude that all of these conserved patterns between IIRs indicate a shared function of these sequence structures related to STRs.
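A k-mer-based, alignment-free comparison can be sketched as follows: count every overlapping k-mer in two sequences and correlate the resulting count vectors. The abstract does not specify the exact statistic, so the use of Pearson correlation, the choice of k, and the toy sequences here are assumptions for illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(km for km in kmers if set(km) <= set(ALPHABET))


def kmer_vector(seq, k):
    """Counts for all 4**k possible k-mers, in a fixed lexicographic order."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


# Toy sequences: two AT-rich repeats and one GC-rich sequence.
intron = "ATATATATGATATAT"
intergenic = "TATATATATCATATA"
exon = "GCGGCCGCTGCGGCA"
r_similar = pearson(kmer_vector(intron, 2), kmer_vector(intergenic, 2))
r_different = pearson(kmer_vector(intron, 2), kmer_vector(exon, 2))
```

With these toy inputs the two AT-rich sequences should correlate more strongly with each other than with the GC-rich one, which mirrors, in miniature, how shared STR content can drive high correlations between regions.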


2021 ◽  
Vol 9 (10) ◽  
pp. 2067
Author(s):  
Weijian Wang ◽  
Muchun Wan ◽  
Fang Yang ◽  
Na Li ◽  
Lihua Xiao ◽  
...  

Cryptosporidium bovis is a common enteric pathogen in bovine animals. The research on transmission characteristics of the pathogen is hampered by the lack of subtyping tools. In this study, we retrieve the nucleotide sequence of the 60 kDa glycoprotein (GP60) from the whole genome sequences of C. bovis we obtained previously and analyze its sequence characteristics. Despite a typical structure of the GP60 protein, the GP60 of C. bovis had only 19.3–45.3% sequence identity to those of other Cryptosporidium species. On the basis of the gene sequence, a subtype typing tool was developed for C. bovis and used in the analysis of 486 C. bovis samples from dairy cattle, yaks, beef cattle, and water buffalos from China. Sixty-eight sequence types were identified from 260 subtyped samples, forming six subtype families, namely XXVIa to XXVIf. The mosaic sequence patterns among subtype families and the 121 potential recombination events identified among the sequences both suggest the occurrence of genetic recombination at the locus. No obvious host adaptation and geographic differences in the distribution of subtype families were observed. Most farms with more extensive sampling had more than one subtype family, and the dominant subtype families on a farm appeared to differ between pre- and post-weaned calves, indicating the likely occurrence of multiple episodes of C. bovis infections. There was an association between XXVId infection and occurrence of moderate diarrhea in dairy cattle. The subtyping tool developed and the data generated in the study might improve our knowledge of the genetic diversity and transmission of C. bovis.


Author(s):  
Gabriel Luíz Costa ◽  
Maria Eduarda Pereira Mascarenhas ◽  
Thamires Oliveira Gasquez Martin ◽  
Laura Guimarães Fortini ◽  
Jaime Louzada ◽  
...  

Early diagnosis and treatment are fundamental to the control and elimination of malaria. In many endemic areas, routine diagnosis is primarily performed microscopically, although rapid diagnostic tests (RDTs) provide a useful point-of-care tool. Most of the commercially available RDTs detect histidine-rich protein 2 (HRP2) of Plasmodium falciparum in the blood of infected individuals. Nonetheless, parasite isolates lacking the pfhrp2 gene are relatively frequent in some endemic regions, thereby hampering the diagnosis of malaria using HRP2-based RDTs. To track the efficacy of RDTs in areas of the Brazilian Amazon, we assessed pfhrp2 deletions in 132 P. falciparum samples collected from four malaria-endemic states in Brazil. Our findings show low to moderate levels of pfhrp2 deletion in different regions of the Brazilian Amazon. Overall, during the period covered by this study (2002-2020), we found that 10% of the P. falciparum isolates were characterized by a pfhrp2 deletion. Notably, however, the presence of pfhrp2-negative isolates has not been translated into a reduction in RDT efficacy, which in part may be explained by the presence of polyclonal infections. A further important finding was the discrepancy in the proportion of pfhrp2 deletions detected using two assessed protocols (conventional PCR versus nested PCR), which reinforces the need to perform a carefully planned laboratory workflow to assess gene deletion. This is the first study to perform a comprehensive analysis of PfHRP2 sequence diversity in Brazilian isolates of P. falciparum. We identified 10 PfHRP2 sequence patterns, which were found to be exclusive of each of the assessed regions. Despite the small number of PfHRP2 sequences available from South America, we found that the PfHRP2 sequences identified in Brazil and neighboring French Guiana show similar sequence patterns. 
Our findings highlight the importance of continuously monitoring the occurrence and spread of parasites with pfhrp2 deletions, while also taking into account the limitations of PCR-based testing methods associated with accuracy and the complexity of infections.


2021 ◽  
Vol 3 ◽  
Author(s):  
Donna Moxley Scarborough ◽  
Pablo E. Colón ◽  
Shannon E. Linderman ◽  
Eric M. Berkson

Performance of a sequential proximal-to-distal transfer of segmental angular velocity (or Kinematic Sequence) is reported to reduce stress on musculoskeletal structures and thus the probability of injury while also maximizing ball velocity. However, there is limited investigation regarding the Kinematic Sequence of the five body segments (Pelvis, Trunk, Arm, Forearm, and Hand) among baseball pitchers. Some biomechanical and epidemiology studies have reported an association of the curveball with increased risk for elbow injury among youth pitchers. Kinematic Sequences with altered distal upper extremity (forearm and hand) sequences have been associated with greater elbow valgus and shoulder external rotation torques compared to other Kinematic Sequences. Identifying Kinematic Sequence patterns during curveball pitches may lead to improved understanding of injury susceptibility. This study investigated the Kinematic Sequence patterns (and their variability) during curveball pitching and compared them to the sequences identified during fastball pitches. Using 3D motion analyses, 14 baseball pitchers (four high school, eight college, and two professional) performed 5–6 curveball pitches and 12 pitchers also threw fastball pitches in a simulated bullpen session. Eleven different curveball Kinematic Sequences and eight fastball Kinematic Sequences were identified. There was no significant variability in the number of Kinematic Sequences performed between the two pitch types (Z = −0.431, p = 0.67). The median number of KSs performed by each group was 2.5. The most frequently used Kinematic Sequences for both pitch types were due to alteration in the sequence of the distal segments. The total percentage of Kinematic Sequences with altered distal segment sequencing was 49% for curveball pitches and 43% for fastball pitches. 
Identifying the frequency of Kinematic Sequences with altered timing of hand and forearm peak velocities across pitch types may lead to a better understanding of the stresses that individual pitchers incur.
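To make the idea of a Kinematic Sequence concrete, the sketch below orders the five segments by the time at which each reaches peak angular velocity and flags pitches where the hand peaks before the forearm. The peak times are invented for illustration, and this simple ordering rule is an assumption, not the study's processing pipeline.

```python
# Proximal-to-distal reference order of the five segments named in the abstract.
SEGMENTS = ("Pelvis", "Trunk", "Arm", "Forearm", "Hand")


def kinematic_sequence(peak_times):
    """Order segments by the time of peak angular velocity.

    `peak_times` maps each segment name to a time in seconds. sorted() is
    stable, so tied segments keep their proximal-to-distal order.
    """
    return tuple(sorted(SEGMENTS, key=lambda s: peak_times[s]))


def is_proximal_to_distal(sequence):
    """True when the segments peak in the reference order."""
    return sequence == SEGMENTS


def distal_altered(sequence):
    """True when the hand peaks before the forearm."""
    return sequence.index("Hand") < sequence.index("Forearm")


# Hypothetical peak times (seconds) for a single pitch.
pitch = {"Pelvis": 0.10, "Trunk": 0.14, "Arm": 0.20,
         "Forearm": 0.25, "Hand": 0.23}
sequence = kinematic_sequence(pitch)
```

Tallying the distinct `sequence` tuples per pitcher and pitch type would give counts comparable in kind to the numbers of Kinematic Sequences reported above.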


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11983
Author(s):  
Philip J. Shaw ◽  
Jittima Piriyapongsa ◽  
Pavita Kaewprommal ◽  
Chayaphat Wongsombat ◽  
Chadapohn Chaosrikul ◽  
...  

Background The genome of the human malaria parasite Plasmodium falciparum is poorly annotated, in particular, the 5′ capped ends of its mRNA transcripts. New approaches are needed to fully catalog P. falciparum transcripts for understanding gene function and regulation in this organism. Methods We developed a transcriptomic method based on next-generation sequencing of complementary DNA (cDNA) enriched for full-length fragments using eIF4E, a 5′ cap-binding protein, and an unenriched control. DNA sequencing adapter was added after enrichment of full-length cDNA using two different ligation protocols. From the mapped sequence reads, enrichment scores were calculated for all transcribed nucleotides and used to calculate P-values of 5′ capped nucleotide enrichment. Sensitivity and accuracy were increased by combining P-values from replicate experiments. Data were obtained for P. falciparum ring, trophozoite and schizont stages of intra-erythrocytic development. Results 5′ capped nucleotide signals were mapped to 17,961 non-overlapping P. falciparum genomic intervals. Analysis of the dominant 5′ capped nucleotide in these genomic intervals revealed the presence of two groups with distinctive epigenetic features and sequence patterns. A total of 4,512 transcripts were annotated as 5′ capped based on the correspondence of 5′ end with 5′ capped nucleotide annotated from full-length cDNA data. Discussion The presence of two groups of 5′ capped nucleotides suggests that alternative mechanisms may exist for producing 5′ capped transcript ends in P. falciparum. The 5′ capped transcripts that are antisense, outside of, or partially overlapping coding regions may be important regulators of gene function in P. falciparum.

