Bovine Genome Analysis to Unravel the Location and Feature of Target Sites of RNA-Guided Hyperactivated Recombinase Gin with Spacer Length Six

Author(s):  
Shalu Kumari Pathak ◽  
Arvind Sonwane ◽  
Subodh Kumar

Background: Programmable nucleases are very promising tools for genome editing (GE), but they suffer from limitations, including a potential risk of genotoxicity, which has led to the exploration of a safer GE approach based on the RNA-guided recombinase (RGR) platform. The RGR platform operates on a typical recognition or target site comprising the minimal pseudo-core recombinase site and a 5 to 6-base pair spacer flanking it; this whole central region is flanked by two guide RNA-specified DNA sequences, or Cas9 binding sites, followed by protospacer adjacent motifs (PAMs). Methods: The current study analyses the entire cattle genome to prepare a detailed map of target sites for the RNA-guided hyperactivated recombinase Gin with spacer length six. For this, chromosome-wise whole genomic sequence data were retrieved from Ensembl, and a search pattern for recombinase Gin with spacer length six was designed. Using this search pattern, RGR target sites were located with the dreg program of the EMBOSS package. Results: A total of 677 RGR target sites were identified in the bovine genome for recombinase Gin with spacer length six. We also investigated whether these RGR target sites lie within any gene, and found that they lie in both genic and intergenic regions. In addition, descriptions of the genes associated with these target sites were compiled.
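The pattern search described in the Methods can be illustrated with a small Python sketch that builds a regular expression for an RGR-style site and scans a sequence for it, in the spirit of a dreg regular-expression search. Everything specific here is an assumption made for illustration: `HYPOTHETICAL_CORE` is a placeholder and not the actual Gin pseudo-core, and the 20-nt guide length, the `CCN`/`NGG` PAM orientation and the function names are not taken from the abstract. Only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Placeholder core site for illustration only; NOT the real Gin pseudo-core.
HYPOTHETICAL_CORE = "GGWACC"


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as 'GGWACC' into a regex fragment."""
    return "".join(IUPAC[base] for base in pattern.upper())


def build_site_regex(core, spacer_len=6, guide_len=20):
    """Assumed layout: CCN + guide + spacer + core + spacer + guide + NGG."""
    any_base = "[ACGT]"
    guide = any_base + "{" + str(guide_len) + "}"
    spacer = any_base + "{" + str(spacer_len) + "}"
    body = ("CC" + any_base + guide + spacer + iupac_to_regex(core)
            + spacer + guide + any_base + "GG")
    # The lookahead lets overlapping candidate sites all be reported.
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, site_regex):
    """Return (start, end) pairs, 0-based and end-exclusive, for each match."""
    return [(m.start(1), m.end(1)) for m in site_regex.finditer(sequence.upper())]
```

For a whole-genome scan, the same function would be applied to each chromosome sequence in turn; spacer length five could be explored by passing `spacer_len=5`.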

Author(s):  
Shalu Kumari Pathak ◽  
Arvind Sonwane ◽  
Subodh Kumar

The present study was conducted to determine the effect of two cooling systems, Fan Fogger (FF) and Fan Pad (FP), on the micro-environment of the poultry house and on the thermal comfort, welfare, egg production and egg quality parameters of laying hens. The experiment was conducted on 210 White Leghorn laying pullets (32 weeks old) during the hot-dry summer months (May - July) under a deep litter housing system. The FP and FF cooling systems significantly lowered the mean shed temperature and increased the relative humidity. The resulting better THI increased egg production by 4.66 % and 3.32 % under the FP and FF systems, respectively, over the control group. However, specific gravity, H.U, egg shell thickness, yolk index and yolk color were not significantly influenced by the cooling treatments. Significantly lower levels of the antioxidant enzymes and markers LPO, Catalase, G6PD, GPx and SOD were registered in the cooling groups. Both cooling devices contributed to bird welfare by shifting behavioral expression from agonistic to non-agonistic activities.

RNA-guided recombinases (RGR) are potentially valuable tools for basic research and genetic modification. The platform has been demonstrated to perform genome editing efficiently. It operates on a typical recognition site comprising the degenerate recombinase site and a 5 to 6-base pair spacer flanking it; this whole central region is flanked by two guide RNA-specified DNA sequences, or Cas9 binding sites, which are followed by protospacer adjacent motifs. In the present investigation, a detailed map of target sites throughout the bovine genome was prepared for RNA-guided recombinase platforms based on the hyperactivated recombinase Beta. For this, chromosome-wise whole genomic sequence data were retrieved from Ensembl, followed by the design of a search pattern for recombinase Beta with spacer length five. Using this search pattern, RGR target sites were located with the dreg program of the EMBOSS package. In total, 436 RGR target sites were identified in the bovine genome for recombinase Beta with spacer length five. These RGR target sites could potentially be used for specific genomic integration, deletion or inversion.


2005 ◽  
Vol 44 (05) ◽  
pp. 687-692 ◽  
Author(s):  
B. A. Malin

Summary Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, is obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates that such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. Methods: The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize the information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). Results: The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply that the anonymization schema is feasible for the protection of sequence privacy. Conclusions: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomic anonymization schemas tailored to specific data-sharing scenarios.
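The idea of a generalization lattice over residues can be sketched in a few lines of Python. The three-level lattice used here (base, then purine `R` or pyrimidine `Y`, then any base `N`) and the per-level costs are assumptions chosen to match the purine example in the abstract; the actual DNALA lattice and distance may differ, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def generalize_residue(a, b):
    """Return (most specific shared concept, lattice level) for two bases.

    Level 0: identical base; level 1: same class (R or Y); level 2: any (N).
    This small lattice is an illustrative assumption, not the DNALA lattice.
    """
    if a == b:
        return a, 0
    if {a, b} <= PURINES:
        return "R", 1
    if {a, b} <= PYRIMIDINES:
        return "Y", 1
    return "N", 2


def generalize_pair(seq1, seq2):
    """Generalize two aligned, equal-length sequences into one shared sequence.

    Returns the generalized sequence and the summed lattice levels, which
    serves here as a simple distance between the two sequences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    symbols, distance = [], 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        symbol, level = generalize_residue(a, b)
        symbols.append(symbol)
        distance += level
    return "".join(symbols), distance
```

Releasing the shared generalized sequence in place of both originals makes the two entries indistinguishable, which corresponds to k-anonymity with k = 2; pairing each sequence with its lowest-distance partner keeps the information loss small.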


2019 ◽  
Author(s):  
Sandeep Chakraborty

Concerns about the unintended effects of gene editing are not new. Kim et al. (2014) warned about "unwanted integration of DNA segments derived from plasmids encoding Cas9 and guide RNA at both on-target and off-target sites". Thus, there was nothing unexpected when, in 2019, an FDA-authored pre-print reported template plasmid integration in the 2016 TALEN-edited hornless cows (Accid:PRJNA316122). Soon, yet another paper confirmed this on-target integration (Accid:PRJNA494431), and also reported that 4 out of 6 calves sired by one of the cows had inherited this plasmid at the targeted site on chromosome 1. However, this paper failed to report another definite off-target integration (in chromosome 14) of the beta-lactamase from the plasmid. Sequencing fragments the genome and then samples it in any single pass, in the hope that with enough passes the full genome will have been covered. For a genome of roughly 3 Gb (such as the bovine genome), 10-15 passes are not enough. In fact, a read that is completely plasmid (and there are many) could come from an integration anywhere. Thus, there might remain other integrations of the plasmid that are unobserved.


2019 ◽  
Vol 88 (1) ◽  
pp. 191-220 ◽  
Author(s):  
Daesik Kim ◽  
Kevin Luk ◽  
Scot A. Wolfe ◽  
Jin-Soo Kim

Programmable nucleases and deaminases, which include zinc-finger nucleases, transcription activator-like effector nucleases, CRISPR RNA-guided nucleases, and RNA-guided base editors, are now widely employed for the targeted modification of genomes in cells and organisms. These gene-editing tools hold tremendous promise for therapeutic applications. Importantly, these nucleases and deaminases may display off-target activity through the recognition of near-cognate DNA sequences to their target sites, resulting in collateral damage to the genome in the form of local mutagenesis or genomic rearrangements. For therapeutic genome-editing applications with these classes of programmable enzymes, it is essential to measure and limit genome-wide off-target activity. Herein, we discuss the key determinants of off-target activity for these systems. We describe various cell-based and cell-free methods for identifying genome-wide off-target sites and diverse strategies that have been developed for reducing the off-target activity of programmable gene-editing enzymes.


Proteomes ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 19
Author(s):  
Yoji Igarashi ◽  
Daisuke Mori ◽  
Susumu Mitsuyama ◽  
Kazutoshi Yoshitake ◽  
Hiroaki Ono ◽  
...  

Metagenomic data have mainly been addressed by showing the composition of organisms based on a small part of a well-examined genomic sequence, such as ribosomal RNA genes and mitochondrial DNAs. In contrast, whole metagenomic data obtained by the shotgun sequencing method have not often been fully analyzed through a homology search because the genomic data in databases for living organisms on earth are insufficient. In order to complement the results obtained through homology-search-based methods with shotgun metagenome data, we focused on the composition of protein domains deduced from the sequences of genomes and metagenomes, and we utilized them in characterizing genomes and metagenomes, respectively. First, we compared the relationships based on similarities in the protein domain composition with the relationships based on sequence similarities. We searched for protein domains of 325 bacterial species using the Pfam database. Next, the correlation coefficients of protein domain compositions between every pair of bacteria were examined. Every pairwise genetic distance was also calculated from 16S rRNA or DNA gyrase subunit B. We compared the results of these methods and found a moderate correlation between them. Essentially the same results were obtained when we used partial random 100 bp DNA sequences of the bacterial genomes, which simulated raw sequence data obtained from short-read next-generation sequencers. Then, we applied the method to actual environmental data obtained by shotgun sequencing. The method showed that a transition of the microbial phase occurred because of the seasonal change in water temperature. These results showed the usability of the method in characterizing metagenomic data based on protein domain compositions.
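The pairwise comparison of protein domain compositions can be sketched as a correlation between two domain-count profiles. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the domain labels and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def domain_correlation(counts_a, counts_b):
    """Correlate two {domain: count} profiles over the union of their domains.

    A domain missing from one profile is counted as zero in that profile.
    """
    domains = sorted(set(counts_a) | set(counts_b))
    x = [counts_a.get(d, 0) for d in domains]
    y = [counts_b.get(d, 0) for d in domains]
    return pearson(x, y)
```

Applying `domain_correlation` to every pair of genomes gives a similarity matrix that can then be compared with pairwise genetic distances from a marker gene.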


2019 ◽  
Author(s):  
Lizhen Shi ◽  
Bo Chen

Abstract: Drawing from the analogy between natural language and "genomic sequence language", we explored the applicability of word embeddings from natural language processing (NLP) to represent DNA reads in metagenomics studies. Here, a k-mer is the equivalent of a word in NLP, and it has been widely used in analyzing sequence data. However, directly replacing word embeddings with k-mer embeddings is problematic for two reasons. First, the number of k-mers is many times the number of words in NLP, making the model too big to be useful. Second, sequencing errors create many rare k-mers (noise), making the model hard to train. In this work, we leverage Locality Sensitive Hashing (LSH) to overcome these challenges. We then adopted the skip-gram with negative sampling model to learn k-mer embeddings. Experiments on labeled metagenomic datasets demonstrated that LSH can not only shorten training time and reduce the memory required to store the model, but also achieve higher accuracy than alternative methods. Finally, we demonstrate that the trained low-dimensional k-mer embeddings can potentially be used for accurate metagenomic read clustering and taxonomy prediction, and that this method is robust on reads with high sequencing error rates (12-22%).
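To make the role of LSH concrete, the following Python sketch maps k-mers to a small number of buckets so that an embedding table needs one row per bucket and not one row per k-mer. The abstract does not state which LSH family the authors used; random-hyperplane hashing over a one-hot encoding is an assumption chosen for illustration, and all names and parameters are hypothetical.

```python
import random

BASES = "ACGT"


def kmers(read, k):
    """All overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def one_hot(kmer):
    """Encode a k-mer as a flat 4*k vector; unknown bases become all zeros."""
    vector = []
    for base in kmer.upper():
        vector.extend(1.0 if base == b else 0.0 for b in BASES)
    return vector


def make_hyperplanes(k, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes for k-mers of length k."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(4 * k)] for _ in range(n_bits)]


def lsh_bucket(kmer, hyperplanes):
    """Random-hyperplane hash: one sign bit per hyperplane, packed into an int.

    K-mers that differ in few positions have similar one-hot vectors and so
    tend to fall on the same side of most hyperplanes.
    """
    vector = one_hot(kmer)
    bucket = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bucket = (bucket << 1) | (1 if dot >= 0 else 0)
    return bucket
```

With `n_bits` sign bits there are at most `2 ** n_bits` buckets, so the vocabulary seen by a skip-gram model is bounded regardless of `k`, and rare error-induced k-mers share rows with their close neighbours.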


2019 ◽  
Author(s):  
Sandeep Chakraborty

‘Prime-editing’ proposes to replace traditional programmable nucleases (CRISPR-Cas9) using a catalytically impaired Cas9 (dCas9) connected to an engineered reverse transcriptase, and a guide RNA encoding both the target site and the desired change. With just a ‘nick’ on one strand, it is hypothesized, the negative, uncontrollable effects arising from double-strand DNA breaks (DSBs) - translocations, complex proteins, integrations and p53 activation - will be eliminated. However, the sequencing data provided (Accid:PRJNA565979) reveal plasmid integration, indicating that DSBs occur. Also, looking at only 16 off-targets is inadequate to assert that Prime-editing is more precise. Integration of plasmid occurs in all three versions (PE1/2/3). Interestingly, dCas9, which is known to be toxic in E. coli and yeast, is shown to have residual endonuclease activity. This also affects studies that use dCas9, like base-editors and de/methylation systems. Previous work using hRad51–Cas9 nickases also shows significant integration in on-targets, as well as off-target integration [1]. Thus, we show that cellular response to nicking involves DSBs, and subsequent plasmid/Cas9 integration. This is an unacceptable outcome for any in vivo application in human therapy.


2020 ◽  
Vol 15 ◽  
Author(s):  
Affan Alim ◽  
Abdul Rafay ◽  
Imran Naseem

Background: Proteins contribute significantly to every task of cellular life. Their functions encompass the building and repairing of tissues in human bodies and other organisms; hence they are the building blocks of bones, muscles, cartilage, skin, and blood. Similarly, antifreeze proteins (AFPs) are of prime significance for organisms that live in very cold areas. With the help of these proteins, cold-water organisms can survive at sub-zero temperatures and resist the water crystallization process, which may rupture internal cells and tissues. AFPs have attracted attention and interest in the food industry and in cryopreservation. Objective: With the increasing availability of genomic sequence data for proteins, an automated and sophisticated tool for AFP recognition and identification is badly needed. The sequences and structures of AFPs are highly distinct; therefore, most of the proposed methods fail to show promising results on different structures. A consolidated method is proposed to produce competitive performance on highly distinct AFP structures. Methods: In this study, we propose to use machine learning-based algorithms, namely Principal Component Analysis (PCA) followed by Gradient Boosting (GB), for antifreeze protein identification. To analyze the performance and validate the proposed model, various combinations of two-segment amino acid and dipeptide compositions are used. PCA is used for dimension reduction while retaining the high-variance directions of the data, and is followed by an ensemble method, gradient boosting, for modelling and classification. Results: The proposed method obtained superior performance on the PDB, Pfam and UniProt datasets compared with the RAFP-Pred method. In experiment-3, by utilizing only 150 PCA components, a high accuracy of 89.63 was achieved, which is superior to the 87.41 obtained with the 300 significant features reported for the RAFP-Pred method. Experiment-2 was conducted using two different datasets, with non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In experiment-2, our proposed method attained a high sensitivity of 79.16, which is 12.50 better than the state-of-the-art RAFP-Pred method. Conclusion: AFPs have a common function with distinct structures. Therefore, a single model developed for different sequences often fails on AFPs. Our proposed model has shown robust results across diverse training and testing datasets and outperformed the previous AFP prediction method, RAFP-Pred. Our model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability and high performance, our model can be easily extended to the analysis of proteomic and genomic datasets.
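The feature-extraction step that precedes PCA and gradient boosting can be sketched in plain Python. The functions below compute amino acid composition (20 values) and dipeptide composition (400 values), optionally per segment. How the authors split a sequence into two segments is not described in the abstract, so the midpoint split used here is an assumption, as are all function names.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs among adjacent pairs."""
    seq = seq.upper()
    total = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def features(seq):
    """420-dimensional feature vector for one sequence or segment."""
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    return aa_composition(seq) + dipeptide_composition(seq)


def two_segment_features(seq):
    """Concatenate features of the two halves (midpoint split is assumed)."""
    mid = len(seq) // 2
    return features(seq[:mid]) + features(seq[mid:])
```

The resulting vectors (420 values per sequence, or 840 with two segments) are the kind of input that a PCA step would reduce before a gradient boosting classifier is trained.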


Author(s):  
Kuldeepsingh A. Kalariya ◽  
Ram Prasnna Meena ◽  
Lipi Poojara ◽  
Deepa Shahi ◽  
Sandip Patel

Background: Squalene synthase (SQS) is a rate-limiting enzyme necessary to produce pentacyclic triterpenes in plants. It is an important enzyme producing the squalene molecules required to run the steroidal and triterpenoid biosynthesis pathways, which work in competitive inhibition mode. Reports are available on the SQS gene in several plants, but detailed information on the SQS gene in Gymnema sylvestre R. Br. is not available. G. sylvestre is a priceless rare vine of the central eco-region known for its medicinally important triterpenoids. Our work aims to characterize the GS-SQS gene in this high-value medicinal plant. Results: A coding DNA sequence (CDS) of 1245 bp representing the GS-SQS gene, predicted from transcriptome data in G. sylvestre, was used for further characterization. The SWISS protein structure modeled for the GS-SQS amino acid sequence had a MolProbity score of 1.44 and a clash score of 3.86. The quality estimates and statistical scores of the Ramachandran plot analysis indicated that the homology model was reliable. For full-length amplification of the gene, primers designed from the flanking regions of the CDS encoding GS-SQS were used with genomic DNA as template, which resulted in a single-band product of approximately 6.2 kb. Sequencing of this product through NGS generated 2.32 Gb of data and 3347 scaffolds with an N50 value of 457 bp. These scaffolds were compared to identify similarity with other SQS genes as well as with the GS-SQSs of the transcriptome. Scaffold_3347, representing the GS-SQS gene, harbored two introns of 101 and 164 bp. Both intronic regions were validated with primers designed from the regions adjoining the introns on the scaffold representing the GS-SQS gene. Amplification took place when the template was genomic DNA and failed when the template was cDNA, confirming the presence of two introns in the GS-SQS gene of Gymnema sylvestre R. Br. Conclusion: This study shows that the GS-SQS gene is very closely related to those of Coffea arabica and Gardenia jasminoides and that this gene harbors two introns of 101 and 164 bp.

