Proteomics Standards Initiative Extended FASTA Format

2019, Vol 18 (6), pp. 2686-2692
Author(s): Pierre-Alain Binz, Jim Shofstahl, Juan Antonio Vizcaíno, Harald Barsnes, Robert J. Chalkley, ...

2015
Author(s): Frédéric Mahé, Torbjørn Rognes, Christopher Quince, Colomban de Vargas, Micah S Dunthorn

Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2 that has two important novel features: 1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and 2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
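The d = 1 clustering phase can be illustrated with a small sketch. It assumes one common way of getting near-linear scaling: generate every sequence that differs from an amplicon by a single substitution, insertion or deletion, and look each one up in a hash set instead of comparing all pairs. The function names (`microvariants`, `cluster_d1`) are hypothetical, and the abundance-based breaking phase and the fastidious option are not modelled here.

```python
from collections import deque

ALPHABET = "ACGT"


def microvariants(seq):
    """Yield every sequence one edit away from seq (duplicates possible)."""
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]  # deletion at position i
        for base in ALPHABET:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]  # substitution at i
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            yield seq[:i] + base + seq[i:]  # insertion before position i


def cluster_d1(abundances):
    """Single-linkage clustering at d = 1.

    abundances maps each dereplicated sequence to its read count.
    Returns a list of clusters, each a list of sequences.
    """
    unseen = set(abundances)
    clusters = []
    # Seeds are visited by decreasing abundance (ties broken alphabetically).
    for seed in sorted(abundances, key=lambda s: (-abundances[s], s)):
        if seed not in unseen:
            continue
        unseen.discard(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            # Hash lookups replace all-vs-all pairwise comparisons.
            for variant in microvariants(current):
                if variant in unseen:
                    unseen.discard(variant)
                    cluster.append(variant)
                    queue.append(variant)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    reads = {"ACGT": 10, "ACGA": 3, "ACG": 1, "TTTT": 5, "TTTTA": 1, "GGGG": 1}
    for members in cluster_d1(reads):
        print(members)
```

Each amplicon generates a number of candidate variants proportional to its length, so the total work grows with the number of amplicons rather than with the number of amplicon pairs.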


Author(s): Xiangfu Zhong, Albert Pla, Simon Rayner

Abstract Motivation: The existence of complex subpopulations of miRNA isoforms, or isomiRs, is well established. While many tools exist for investigating isomiR populations, they differ in how they characterize an isomiR, making it difficult to compare results across tools. Thus, there is a need for a more comprehensive and systematic standard for defining isomiRs. Such a standard would allow investigation of isomiR population structure in progressively more refined sub-populations, permitting the identification of more subtle changes between conditions and leading to an improved understanding of the processes that generate these differences. Results: We developed Jasmine, a software tool that incorporates a hierarchical framework for characterizing isomiR populations. Jasmine is a Java application that can process raw read data in fastq/fasta format, or mapped reads in SAM format, to produce a detailed characterization of isomiR populations. Thus, Jasmine can reveal structure not apparent in a standard miRNA-Seq analysis pipeline. Availability and implementation: Jasmine is implemented in Java and R and is freely available on Bitbucket at https://bitbucket.org/bipous/jasmine/src/master/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019, Vol 47 (W1), pp. W289-W294
Author(s): Fatemeh Sharifi, Yuzhen Ye

Abstract MyDGR is a web server providing integrated prediction and visualization of Diversity-Generating Retroelements (DGR) systems in query nucleotide sequences. It is built upon an enhanced version of DGRscan, a tool we previously developed for identification of DGR systems. DGR systems are remarkable genetic elements that use error-prone reverse transcriptases to generate vast sequence variants in specific target genes, which have been shown to benefit their hosts (bacteria, archaea or phages). As the first web server for annotation of DGR systems, myDGR is freely available on the web at http://omics.informatics.indiana.edu/myDGR with all major browsers supported. MyDGR accepts query nucleotide sequences in FASTA format, and outputs all the important features of a predicted DGR system, including a reverse transcriptase, a template repeat and one (or more) variable repeats and their alignment featuring A-to-N (N can be C, T or G) substitutions, and VR-containing target gene(s). In addition to providing the results as text files for download, myDGR generates a visual summary of the results for users to explore the predicted DGR systems. Users can also directly access pre-calculated, putative DGR systems identified in currently available reference bacterial genomes and a few other collections of sequences (including human microbiomes).


2019
Author(s): Pierre-Alain Binz, Jim Shofstahl, Juan Antonio Vizcaíno, Harald Barsnes, Robert J. Chalkley, ...

Abstract Mass spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs), in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. To facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI Extended FASTA Format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backwards compatible, and as such, existing FASTA parsers will require few or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
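The backwards-compatibility claim can be made concrete with a minimal sketch of a FASTA reader that tolerates PEFF input. It rests on assumptions not stated in the abstract: that file-level metadata sits on lines starting with `#`, and that per-entry annotations follow the identifier on the `>` line as backslash-prefixed `Key=Value` pairs. The function names and the sample record are hypothetical; the PEFF specification remains the authority on the actual layout.

```python
def parse_header(header):
    """Split a '>' line (without the '>') into identifier and annotations.

    Assumes annotations look like '\\Key=Value' and that values contain
    no backslash; this is an illustrative simplification.
    """
    identifier, *fields = header.split("\\")
    annotations = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:
            annotations[key.strip()] = value.strip()
    return identifier.strip(), annotations


def read_peff_as_fasta(lines):
    """Return a list of (identifier, annotations, sequence) tuples.

    Lines starting with '#' (assumed file-level metadata) are skipped,
    which is the only change relative to a plain FASTA reader.
    """
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((*parse_header(header), "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((*parse_header(header), "".join(chunks)))
    return records


if __name__ == "__main__":
    # Made-up record for illustration; not taken from a real database.
    sample = [
        "# PEFF 1.0",
        r">db:ACC1 \PName=Example protein \Length=8",
        "MKTA",
        "YIAK",
    ]
    print(read_peff_as_fasta(sample))
```

A parser that ignores the annotations entirely would still recover the identifier and sequence, which is the sense in which such a file can be read as FASTA without any of the extra capabilities.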


2020 ◽  
Author(s):  
Keyword(s):  

Author(s):  
Praveen Reddy P.

Background: L-Asparaginase is a medically important drug. The enzyme, an anticancer agent produced by microorganisms, is used to treat patients with lymphoma and leukemia. It is economical and easy to administer compared with other commercially available drugs. Many microbes have been reported to produce L-Asparaginase. Methods: In the present work, the protein sequence of L-Asparaginase was obtained from the Universal Protein Resource (UniProt) server. The sequence was used to generate a 3-D model of L-Asparaginase on the SWISS-MODEL server. The constructed model was verified using a Ramachandran plot on the PROCHECK server. Results: The FASTA-format sequence of the L-Asparaginase enzyme of Bacillus subtilis strain 168 was retrieved from the UniProt server. The sequence was submitted to SWISS-MODEL, and a three-dimensional structural model was built from a relevant template. The model structure was validated on the PROCHECK server using a Ramachandran plot, which supported the reliability of the structure model built with SWISS-MODEL. Conclusions: In the present study, computational tools were used to build and validate a structural model of a potent anticancer drug, L-Asparaginase. The modeled enzyme can be further improved using advanced bioinformatics tools, and the improved enzyme can be produced by modifying L-Asparaginase-producing microbial strains through site-directed mutagenesis of the corresponding gene.


2021
Author(s): Neha Mittal, Juhi Bhardwaj, Shruti Verma, Rajesh Kumar Singh, Renu Yadav, ...

Abstract Background: The present investigation was conducted to assess nutritional diversity and identify novel genetic resources to be utilized in chickpea breeding for macro and micro nutrients. Methods: The plants were grown in a randomized block design. Nutritional and phytochemical properties of nine chickpea genotypes were estimated. The EST sequences were downloaded from the NCBI database in FASTA format, clustered into contigs using CAP3, and mined for novel SSRs using TROLL; primer pairs were designed using Primer3 software. Jaccard's similarity coefficients were used to compare the nutritional and molecular indexes, followed by dendrogram construction using the UPGMA approach. Results: The genotypes PUSA-1103, K-850, PUSA-1108 and PUSA-1053 and the EST-SSR markers ICCeM012, ICCeM0049, ICCeM0070, ICCeM0078, SVP55, SVP95, SVP96, SVP146, SVP213 and SVP217 were found to be potential donor / marker resources for the macro-micro nutrients. The genotypes differed (p<0.05) in nutritional properties. Amongst the newly designed primers, 6 were found to be polymorphic, with a median PIC of 0.46. The number of alleles per primer ranged from 1 to 8. Cluster analyses based on nutritional and molecular diversity partially matched each other. Conclusion: The identified novel genetic resources may be used to widen the germplasm base, prepare a maintainable catalogue and identify systematic blueprints for future chickpea breeding strategies targeting macro-micro nutrients.
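For readers unfamiliar with the similarity measure named above, the following is a minimal sketch of Jaccard's coefficient for two genotypes scored as presence/absence (1/0) of marker alleles: the number of alleles present in both, divided by the number present in at least one. The function name and the example vectors are hypothetical, and returning 0.0 when neither genotype carries any allele is an assumed convention rather than something taken from the study.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard coefficient between two equal-length 0/1 marker profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of markers")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    present = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Assumed convention: two empty profiles have similarity 0.0.
    return shared / present if present else 0.0


if __name__ == "__main__":
    # Made-up allele scores for two genotypes across five markers.
    genotype_1 = [1, 1, 0, 0, 1]
    genotype_2 = [1, 0, 0, 1, 1]
    print(jaccard_similarity(genotype_1, genotype_2))
```

A matrix of such pairwise coefficients is the usual input to UPGMA clustering when building a dendrogram.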

