DNA Classification using Machine Learning for Detecting Genetic Disorders

Deoxyribonucleic acid (DNA) is a double-helical molecule composed of two chains that carry genetic instructions. Genetic diseases are caused by changes in pre-existing genes, and a genetic abnormality can also result from alterations in chromosomes. DNA classification helps to identify genetic disorders in organisms, and DNA pattern recognition is a major problem in bioinformatics. DNA is classified into several categories on the basis of structure, location, number of base pairs, and so on. Traditionally, the DNA molecule is extracted from a blood sample and then analysed manually to find abnormalities. To increase accuracy, machine-learning-based DNA classification is used to study the extracted DNA image with various techniques; this takes less time and is more efficient. The image is preprocessed using a median filter and Canny edge detection. With the help of a neural network, DNA sequences can be recognized correctly and effectively, without uncertainty: once trained with patterns, the network successfully classifies an image given as input. Thus, we can analyse whether a person has a genetic disorder.
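The median-filter preprocessing step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give a window size, image format, or border handling, so the 3 x 3 window, the list-of-lists image, and the clipped borders below are assumptions made only for illustration; the function name `median_filter` is hypothetical and is not taken from the paper.

```python
from statistics import median_low


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D list of grey levels.

    Assumption: border pixels use only the neighbours that fall inside
    the image (the window is clipped rather than padded).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            # median_low always returns a value present in the window,
            # so integer grey levels stay integers.
            out[r][c] = median_low(window)
    return out


# Toy image: a flat background with one "salt" noise pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
denoised = median_filter(noisy)
```

Because the median ignores isolated extreme values, the single bright pixel is replaced by the background level while flat regions are left unchanged, which is why this filter is commonly applied before an edge detector.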

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12564
Author(s):  
Taifu Wang ◽  
Jinghua Sun ◽  
Xiuqing Zhang ◽  
Wen-Jing Wang ◽  
Qing Zhou

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods: Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals around the putative CNV fragments, such as read depth (RD), split reads (SR) and read pairs (RP), were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify CNVs at over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
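As a rough illustration of how a post-processing filter over candidate CNV calls is scored, the sketch below computes precision and recall before and after filtering. It is not the CNV-P model: the feature fields, the thresholds in `keep`, and the toy candidates are all hypothetical, and a hand-written rule stands in for the trained classifier described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class CandidateCNV:
    rd_ratio: float    # read depth inside the call relative to its flanks (hypothetical feature)
    split_reads: int   # split reads supporting the breakpoints (hypothetical feature)
    read_pairs: int    # discordant read pairs supporting the call (hypothetical feature)
    is_true: bool      # validation label, known only for benchmark data


def keep(c):
    """Stand-in for a trained classifier: keep calls with both a depth
    shift and breakpoint support. Thresholds are illustrative only."""
    depth_signal = abs(c.rd_ratio - 1.0) >= 0.3
    breakpoint_signal = (c.split_reads + c.read_pairs) >= 3
    return depth_signal and breakpoint_signal


def precision_recall(candidates, predict):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for c in candidates if predict(c) and c.is_true)
    fp = sum(1 for c in candidates if predict(c) and not c.is_true)
    fn = sum(1 for c in candidates if not predict(c) and c.is_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy benchmark set (made-up values, not data from the paper).
candidates = [
    CandidateCNV(0.5, 4, 3, True),
    CandidateCNV(1.5, 2, 2, True),
    CandidateCNV(0.6, 0, 1, True),
    CandidateCNV(1.05, 5, 4, False),
    CandidateCNV(0.95, 0, 0, False),
    CandidateCNV(1.4, 3, 1, False),
]

unfiltered = precision_recall(candidates, lambda c: True)  # accept every call
filtered = precision_recall(candidates, keep)              # after post-filtering
```

On this toy set the filter raises precision at the cost of some recall, which is the trade-off a false-positive filter has to manage; the figures reported in the abstract come from the authors' trained models on real datasets, not from a rule like this one.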


1986 ◽  
Vol 6 (11) ◽  
pp. 3826-3830 ◽  
Author(s):  
G P Bates ◽  
B J Wainwright ◽  
R Williamson ◽  
S D Brown

A bank of cloned DNA sequences from the distal half of the short arm of human chromosome 2 was generated by using microdissection and microcloning techniques. DNA was purified from 106 chromosomal fragments, manually dissected from peripheral lymphocytes in metaphase, and cloned into the EcoRI site of lambda gt10. A total of 257 putative recombinants were recovered, of which 41% were found to contain human inserts. The mean insert size was 380 base pairs (median size, 83 base pairs), and fewer than 10% of the clones contained highly repetitive sequences. All single-copy sequences examined were shown to map to the short arm of chromosome 2 by using hybrid panels. This technique provides a rapid method of isolating probes specific to a human subchromosomal region to generate linked markers to genetic diseases for which the chromosomal location is known.


2000 ◽  
Vol 23 (2) ◽  
pp. 269-271 ◽  
Author(s):  
Janice Carneiro Coelho ◽  
Roberto Giugliani

Skin biopsies are frequently indicated for investigation and/or confirmation of genetic disorders. Although relatively simple and noninvasive, these procedures require care in order to increase the probability of success and to avoid patient discomfort, unnecessary repeated analyses, and the associated laboratory fees. The present report highlights the importance of skin biopsies in genetic disorder diagnosis and presents general rules for collecting, storing, transporting and processing samples. We recommend it to professionals intending to use this important and sometimes fundamental diagnostic tool.


Author(s):  
Riko Nishimura ◽  
Kenji Hata ◽  
Yoshifumi Takahata ◽  
Tomohiko Murakami ◽  
Eriko Nakamura ◽  
...  

Osteoarthritis and rheumatoid arthritis are common cartilage and joint diseases that globally affect more than 200 million and 20 million people, respectively. Several transcription factors have been implicated in the onset and progression of osteoarthritis, including Runx2, C/EBPβ, HIF2α, Sox4, and Sox11. IL-1β also leads to osteoarthritis through NF-κB, IκBζ, and the Zn2+-ZIP8-MTF1 axis. IL-1, IL-6, and TNFα play a major pathological role in rheumatoid arthritis through the NF-κB and JAK/STAT pathways. Indeed, inhibitory reagents for IL-1, IL-6, and TNFα provide clinical benefits for rheumatoid arthritis patients. Several growth factors, such as BMP, FGF, PTHrP, and Indian hedgehog, play roles in regulating chondrocyte proliferation and differentiation. Disruption or excess of these signaling pathways causes genetic disorders in cartilage and skeletal tissues. Fibrodysplasia ossificans progressiva (FOP), an autosomal genetic disorder characterized by ectopic ossification, is induced by mutant ACVR1. mTOR inhibitors were found to prevent ectopic ossification caused by ACVR1 mutations. Achondroplasia (ACH) and related diseases are autosomal genetic diseases that manifest as severe dwarfism. C-type natriuretic peptide (CNP) is currently the most promising therapy for ACH. In these ways, investigation of cartilage and chondrocyte diseases at the molecular and cellular levels sheds light on the development of effective therapies. Thus, identification of the signaling pathways and transcription factors implicated in these diseases is important.


Genetics ◽  
2004 ◽  
Vol 166 (2) ◽  
pp. 661-668
Author(s):  
Mandy Kim ◽  
Erika Wolff ◽  
Tiffany Huang ◽  
Lilit Garibyan ◽  
Ashlee M Earl ◽  
...  

We have applied a genetic system for analyzing mutations in Escherichia coli to Deinococcus radiodurans, an extremophile with an astonishingly high resistance to UV- and ionizing-radiation-induced mutagenesis. Taking advantage of the conservation of the β-subunit of RNA polymerase among most prokaryotes, we derived again in D. radiodurans the rpoB/Rif^r system that we developed in E. coli to monitor base substitutions, defining 33 base change substitutions at 22 different base pairs. We sequenced >250 mutations leading to Rif^r in D. radiodurans derived spontaneously in wild-type and uvrD (mismatch-repair-deficient) backgrounds and after treatment with N-methyl-N′-nitro-N-nitrosoguanidine (NTG) and 5-azacytidine (5AZ). The specificities of NTG and 5AZ in D. radiodurans are the same as those found for E. coli and other organisms. There are prominent base substitution hotspots in rpoB in both D. radiodurans and E. coli. In several cases these are at different points in each organism, even though the DNA sequences surrounding the hotspots and their corresponding sites are very similar in both D. radiodurans and E. coli. In one case the hotspots occur at the same site in both organisms.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Margot Gunning ◽  
Paul Pavlidis

Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.


Genetics ◽  
1974 ◽  
Vol 77 (1) ◽  
pp. 95-104
Author(s):  
J E Sulston ◽  
S Brenner

Chemical analysis and a study of renaturation kinetics show that the nematode, Caenorhabditis elegans, has a haploid DNA content of 8 × 10^7 base pairs (20 times the genome of E. coli). Eighty-three percent of the DNA sequences are unique. The mean base composition is 36% GC; a small component, containing the rRNA cistrons, has a base composition of 51% GC. The haploid genome contains about 300 genes for 4S RNA, 110 for 5S RNA, and 55 for (18 + 28)S RNA.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Junyi Li ◽  
Huinian Li ◽  
Xiao Ye ◽  
Li Zhang ◽  
Qingzhe Xu ◽  
...  

Background: The prediction of long non-coding RNAs (lncRNAs) has attracted great attention from researchers, as more and more evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of lncRNA sequence resources. Results: We developed an lncRNA prediction method that integrates information-entropy-based features and machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features such as open reading frame, we apply support vector machine, XGBoost and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. Conclusions: We developed an accurate and efficient method with novel information-entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.
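To give a feel for what an entropy-based sequence feature looks like, the sketch below computes the Shannon entropy of a sequence's k-mer distribution. This is a deliberately simpler stand-in: the generalized topological entropy used in the paper is a different and more involved quantity whose definition the abstract does not give, and the function name `kmer_entropy` and the default k = 3 are assumptions for illustration.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq, k=3):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Low values mean a few k-mers dominate (repetitive sequence);
    high values mean k-mers are used more evenly.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # H = sum over k-mers of p * log2(1 / p), with p = count / total.
    return sum((n / total) * log2(total / n) for n in counts.values())


# A homopolymer has a single distinct 3-mer, so its entropy is zero;
# a sequence using all four bases equally has 2 bits of 1-mer entropy.
repetitive = kmer_entropy("AAAAAAAA", k=3)
balanced = kmer_entropy("ACGT", k=1)
```

A feature like this yields one number per sequence and per choice of k, so a small set of such values could be concatenated with other descriptors (for example open reading frame length) before being passed to a classifier.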


2021 ◽  
Vol 4 (2) ◽  
pp. 133-141
Author(s):  
Suma Elcy Varghese ◽  
Rana Hassan Mohammad El Otol ◽  
Fatma Sultan Al Olama ◽  
Salah Ahmad Mohamed Elbadawi

Background: Early detection of diseases in newborns may help in early intervention and treatment, which may either cure the disease or improve the outcome of the patient. Dubai’s Health Authority has a newborn screening program which includes screening for metabolic and genetic conditions, for hearing and vision, and for congenital heart disease. Objectives: The objectives of this study are to assess the outcome of the newborn genetic screening program, to examine the association between the outcome of the program and demographic variables, and to determine the percentage of infants with positive screening test results who were confirmed (by confirmatory tests) to have the genetic disease. Methods: During the study period from January 2018 to December 2018, a total of 7,027 newborns were tested in Dubai Health Authority facilities by the newborn genetic screening program (known as the “Step One Screening”). Blood samples were collected by heel prick on collection paper. All samples were transported to PerkinElmer Genomics in the USA, where the tests were done. The genetic disorders identified were correlated with variables such as gender and nationality. The data were entered in an Excel sheet and analyzed using SPSS software. All infants aged 0–3 months who underwent newborn genetic screening at Dubai Health Authority facilities between January and December 2018 were included. Results: The incidence of screened disorders was 1:7,027 for congenital adrenal hyperplasia, 1:1,757 for congenital hypothyroidism, 1:1,757 for inborn errors of metabolism, 1:2,342 for biotinidase deficiency, 1:1,171 for hemoglobinopathies, 1:12 for hemoglobinopathy traits, and 1:10 for different genetic mutations of G6PD deficiency. Conclusions: There is a high incidence of different genetic diseases detected by newborn screening. These results justify unifying the program in the UAE and preventive programs like premarital screening and genetic counseling.
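The "1:N" incidence figures in this abstract can be related back to the 7,027 screened newborns with simple arithmetic. The sketch below back-calculates approximate case counts by dividing the cohort size by each reported denominator; these counts are an inference from the rounded ratios, not numbers stated in the abstract, and the helper name `implied_cases` is hypothetical.

```python
TOTAL_SCREENED = 7027  # newborns tested, as stated in the abstract

# Denominators N of the reported "1:N" incidences (from the abstract).
reported_one_in_n = {
    "congenital adrenal hyperplasia": 7027,
    "congenital hypothyroidism": 1757,
    "inborn errors of metabolism": 1757,
    "biotinidase deficiency": 2342,
    "hemoglobinopathies": 1171,
}


def implied_cases(total, one_in_n):
    """Approximate case count implied by an incidence of 1:one_in_n.

    Assumption: the published ratio was obtained as total / cases and
    then rounded, so rounding total / N recovers the count.
    """
    return round(total / one_in_n)


implied = {
    name: implied_cases(TOTAL_SCREENED, n)
    for name, n in reported_one_in_n.items()
}
```

The two highest-frequency categories (1:12 for hemoglobinopathy traits and 1:10 for G6PD deficiency mutations) are left out because ratios rounded that coarsely do not pin down a single case count.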

