Data Mining for Motifs in DNA Sequences

Author(s):  
D. A. Bell ◽  
J. W. Guan
Keyword(s):  
2013 ◽  
Vol 4 (1) ◽  
pp. 174-178
Author(s):  
Vijay Arputharaj J ◽  
Dr.R. Manicka Chezian

The proposed method combines several security techniques, namely a digital authentication tag together with data mining in the DNA database. In human genetics, an important goal of data mining is to understand the mapping between individual variation in human DNA sequences and variability across the algorithms used for database security, mutation susceptibility, and parental identification. This paper deals primarily with advancing a genetic algorithm that has proper security features for DNA databases, and it enhances the special features of DNA database security. The security methods considered include encryption algorithms that offer higher security, are less complex, and are easy to apply to protect DNA databases. The Reverse Encryption algorithm is used to protect data and the Advance Cryptography algorithm to make data resistant to attack; the Advanced Encryption Standard (AES) is the most preferable for security in DNA databases.


Biotechnology ◽  
2019 ◽  
pp. 305-321
Author(s):  
Fatima Kabli

The mass of data available on the Internet is rapidly increasing; its complexity stems from the multiplicity of information sources, formats, modalities, and versions. Faced with the complexity of biological data, such as DNA sequences, protein sequences, and protein structures, the biologist cannot simply use traditional techniques to analyze this type of data. Knowledge extraction with data mining methods for the analysis and processing of complex biological data is considered a real scientific challenge: it involves a systematic search for potential relationships without prior knowledge of their nature. In this chapter, the authors discuss the Knowledge Discovery in Databases (KDD) process for biological data. They specifically survey the state of the art in the best-known and most effective data mining methods for analyzing biological data, along with the bioinformatics problems related to data mining.


2011 ◽  
Vol 8 (2) ◽  
pp. 428-440 ◽  
Author(s):  
Kwong-Sak Leung ◽  
Kin Hong Lee ◽  
Jin-Feng Wang ◽  
Eddie Y T Ng ◽  
Henry L Y Chan ◽  
...  

2003 ◽  
Vol 01 (01) ◽  
pp. 139-167 ◽  
Author(s):  
HUIQING LIU ◽  
LIMSOON WONG

We describe a methodology, as well as some related data mining tools, for analyzing sequence data. The methodology comprises three steps: (a) generating candidate features from the sequences, (b) selecting relevant features from the candidates, and (c) integrating the selected features to build a system to recognize specific properties in sequence data. We also give relevant techniques for each of these three steps. For generating candidate features, we present various types of features based on the idea of k-grams. For selecting relevant features, we discuss signal-to-noise, t-statistics, and entropy measures, as well as a correlation-based feature selection method. For integrating selected features, we use machine learning methods, including C4.5, SVM, and Naive Bayes. We illustrate this methodology on the problem of recognizing translation initiation sites. We discuss how to generate and select features that are useful for understanding the distinction between ATG sites that are translation initiation sites and those that are not. We also discuss how to use such features to build reliable systems for recognizing translation initiation sites in DNA sequences.
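The first two steps of this methodology can be illustrated with a minimal Python sketch, assuming fixed-length sequence windows around candidate ATG sites and boolean labels. The function names (`kgram_counts`, `feature_vectors`, `signal_to_noise`, `rank_features`) and the toy windows are hypothetical and not taken from the paper. The signal-to-noise score uses one common definition, |mean_pos - mean_neg| / (sd_pos + sd_neg), which may differ in detail from the authors' formulation.

```python
from collections import Counter
from statistics import mean, pstdev


def kgram_counts(seq, k):
    """Count every overlapping k-gram (length-k substring) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_vectors(windows, k):
    """Step (a): turn windows into k-gram count vectors over a shared vocabulary."""
    counts = [kgram_counts(w, k) for w in windows]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]


def signal_to_noise(pos, neg):
    """Assumed definition: |mean_pos - mean_neg| / (sd_pos + sd_neg)."""
    diff = abs(mean(pos) - mean(neg))
    denom = pstdev(pos) + pstdev(neg)
    if denom == 0:
        # No spread in either class: perfectly separating or uninformative.
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def rank_features(windows, labels, k):
    """Step (b): score each k-gram feature and sort from most to least relevant."""
    vocab, rows = feature_vectors(windows, k)
    scores = {}
    for j, gram in enumerate(vocab):
        pos = [row[j] for row, y in zip(rows, labels) if y]
        neg = [row[j] for row, y in zip(rows, labels) if not y]
        scores[gram] = signal_to_noise(pos, neg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data (made up): True marks a window around a real initiation site.
    windows = ["CCATGG", "CCATGG", "TTATGA", "TTATGA"]
    labels = [True, True, False, False]
    for gram, score in rank_features(windows, labels, k=2):
        print(gram, score)
```

The top-ranked k-grams would then be passed to a classifier for step (c); the paper names C4.5, SVM, and Naive Bayes for that role, which this sketch does not implement.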


Author(s):  
David P. Bazett-Jones ◽  
Mark L. Brown

A multisubunit RNA polymerase enzyme is ultimately responsible for transcription initiation and elongation of RNA, but recognition of the proper start site by the enzyme is regulated by general, temporal and gene-specific trans-factors interacting at promoter and enhancer DNA sequences. To understand the molecular mechanisms which precisely regulate the transcription initiation event, it is crucial to elucidate the structure of the transcription factor/DNA complexes involved. Electron spectroscopic imaging (ESI) provides the opportunity to visualize individual DNA molecules. Enhancement of DNA contrast with ESI is accomplished by imaging with electrons that have interacted with inner shell electrons of phosphorus in the DNA backbone. Phosphorus detection at this intermediately high level of resolution (≈1 nm) permits selective imaging of the DNA, to determine whether the protein factors compact, bend or wrap the DNA. Simultaneously, mass analysis and phosphorus content can be measured quantitatively, using adjacent DNA or tobacco mosaic virus (TMV) as mass and phosphorus standards. These two parameters provide stoichiometric information relating the ratios of protein:DNA content.


Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.

