DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

2017 ◽  
Author(s):  
Balachandran Manavalan ◽  
Tae Hwan Shin ◽  
Gwang Lee

DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
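The two metrics quoted above, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name `accuracy_and_mcc` and the example counts are hypothetical and are not taken from the DHSpred study.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # The MCC denominator is zero when any row or column of the
    # confusion matrix is empty; 0.0 is the conventional fallback.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, for illustration only.
acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

MCC uses all four cells, so it stays informative when the positive and negative classes are unbalanced, which is one reason it is often reported alongside accuracy.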

Blood ◽  
1996 ◽  
Vol 87 (7) ◽  
pp. 2750-2761 ◽  
Author(s):  
A Sinclair ◽  
B Daly ◽  
E Dzierzak

The Ly-6E.1/A.2 gene product recognized by the Sca-1 antibody has been found on murine hematopoietic stem cells and some hematopoietic precursors, T lymphocytes, and nonhematopoietic cell lineages, suggesting a complex array of gene regulatory elements. The ability to use the Ly6E.1/A.2 transcriptional regulatory elements to direct expression of heterologous genes will allow for the manipulation of these cells during development and in hematopoietic cell transplantations. To identify the elements necessary for high-level expression, we have made deletion constructs of Ly-6E.1 gene flanking regions containing DNase I hypersensitive sites, tested them for expression in hematopoietic cells, and have performed kinetic analyses to correlate the appearance of hypersensitive sites with gene transcription and protein expression. We show that a 3′ region containing two DNase I hypersensitive sites at +8.7 and +8.9 kb is required for high-level, gamma-interferon (gamma-IFN)-induced expression of the Ly-6E.1 gene and that a consensus sequence for a gamma-IFN-responsive element localizes to the +8.7 site. We also provide a description of allele- and cell-specific DNase I hypersensitive site patterns of the Ly-6E.1 and Ly-6A.2 genes. Taken together, these data indicate that while both 5′ and 3′ hypersensitive sites are rapidly induced with gamma-IFN, the 3′ most distal hypersensitive sites are involved in directing high levels of expression of Sca-1 in hematopoietic cells.


2014 ◽  
Vol 2014 ◽  
pp. 1-4 ◽  
Author(s):  
Pengmian Feng ◽  
Ning Jiang ◽  
Nan Liu

DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowledge about the locations of DHS is helpful for deciphering the function of noncoding genomic regions. With the rapid accumulation of genome sequences in the postgenomic age, it is highly desirable to develop cost-effective computational methods to identify DHS. In the present work, a support vector machine based model was proposed to identify DHS by using the pseudo dinucleotide composition. In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification.
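To make the jackknife (leave-one-out) test concrete, the sketch below encodes each sequence as a plain 16-dimensional dinucleotide composition vector and scores a classifier by holding out one sequence at a time. It is a simplification under stated assumptions: it omits the physicochemical correlation terms that distinguish the pseudo dinucleotide composition, and it swaps in a 1-nearest-neighbour classifier in place of the SVM so that the example needs only the standard library. All function names and the toy sequences are hypothetical.

```python
import math
from itertools import product

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the normalized frequency of each dinucleotide in seq."""
    seq = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def nearest_neighbor_predict(train_x, train_y, query):
    """Stand-in classifier (NOT an SVM): label of the closest training vector."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data for illustration only; these are not real DHS sequences.
toy_seqs = ["ACACACAC", "CACACACA", "GTGTGTGT", "TGTGTGTG"]
toy_labels = [1, 1, 0, 0]
toy_features = [dinucleotide_composition(s) for s in toy_seqs]
print(jackknife_accuracy(toy_features, toy_labels))
```

Because every sample is used for testing exactly once and the split involves no randomness, the jackknife gives a single reproducible accuracy for a given data set.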


2002 ◽  
Vol 269 (2) ◽  
pp. 553-559 ◽  
Author(s):  
Marios Phylactides ◽  
Rebecca Rowntree ◽  
Hugh Nuthall ◽  
David Ussery ◽  
Ann Wheeler ◽  
...  

2017 ◽  
Vol 8 (1) ◽  
Author(s):  
Matteo D'Antonio ◽  
Donate Weghorn ◽  
Agnieszka D'Antonio-Chronowska ◽  
Florence Coulet ◽  
Katrina M. Olson ◽  
...  

2017 ◽  
Vol 10 (3) ◽  
pp. 683-690 ◽  
Author(s):  
Kamalpreet Kaur ◽  
O.P. Guptata

Maturity checking has become mandatory for the food industry as well as for farmers, to ensure that fruits and vegetables are ripe and not diseased. However, manual inspection is prone to human error, and unripe fruits and vegetables may decrease production [3]. Thus, this study proposes a tomato classification system that determines the maturity stages of tomatoes through machine learning, which involves training different algorithms: Decision Tree, Logistic Regression, Gradient Boosting, Random Forest, Support Vector Machine, K-NN and XGBoost. The system consists of image collection, feature extraction and training the classifiers on 80% of the total data. The remaining 20% of the data is used for testing. The results show that the performance of a classifier depends on the size and kind of features extracted from the data set. The results are reported in the form of a learning curve, a confusion matrix and an accuracy score. Of the seven classifiers, Random Forest performs best, with 92.49% accuracy, owing to its ability to handle large data sets. Support Vector Machine shows the lowest accuracy, owing to its difficulty training on large data sets.
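The 80%/20% split and two of the reported evaluation outputs (confusion matrix and accuracy score) can be sketched in plain Python as follows. This is an illustrative sketch, not the study's pipeline: the function names, the fixed random seed and the example labels are assumptions.

```python
import random
from collections import Counter


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out test_fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical sample indices and labels, for illustration only.
train, test = split_train_test(range(100))
print(len(train), len(test))

y_true = ["ripe", "ripe", "unripe"]
y_pred = ["ripe", "unripe", "unripe"]
print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

Holding the test portion out before any training keeps the accuracy estimate from being inflated by samples the classifier has already seen.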


Blood ◽  
2004 ◽  
Vol 104 (11) ◽  
pp. 1219-1219
Author(s):  
Ping Xiang ◽  
Hemei Han ◽  
Xiangdong Fang ◽  
George Stamatoyannopoulos ◽  
Qiliang Li

Formation of DNase I hypersensitive sites is an indication of local disruption of chromatin conformation. It has been documented that HS sites are frequently associated with functional DNA sequences, such as promoters, enhancers, and insulators. While Southern blot hybridization is the standard method to detect HS sites, this procedure is time-consuming and labor intensive. To improve the efficiency of HS detection through Southern blot hybridization, we designed a contig strategy of Southern blot hybridization and tested it in the 200 kb region 5′ to the LCR in the β-globin locus. Based on the human genome sequence, we made physical maps of seven 6-bp-cutter restriction enzymes in the 200 kb region. From the map we selected contiguous fragments of 10 to 15 kb and designed hybridization probes for the 5′ and 3′ ends of each fragment (some probes can be used in two neighboring fragments). The screening was performed on erythroid (K562) and non-erythroid (Jurkat) cell lines. We found about 40 HS sites within the region. The major sites were either erythroid specific (for instance, HSs at −66 kb, −142 kb, and −236 kb; the cap site of the ε-globin gene is +1), or non-erythroid specific (for instance, HSs at −111 kb, −164 kb, and −205 kb). These HS sites will be investigated for enhancer, promoter, and insulator function using transient and stable transfection studies. Due to the limited number of enzymes required and the fact that each blot could be used several times, this strategy can greatly expedite the screening process for the presence of DNase I hypersensitive sites. The estimated efficiency of this screening approach is about 0.5 to 1 Mb per person per year.
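The in silico mapping step described above (locating 6-bp recognition sites in a known sequence and picking fragments in a target size range) can be sketched as below. The abstract does not name the seven enzymes used, so the EcoRI recognition site `GAATTC` appears only as an assumed example. The sketch also treats the start of the recognition site as the cut position, ignoring the enzyme's exact cleavage offset, and uses a short toy sequence with small size limits in place of the 10 to 15 kb range.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # allow overlapping matches
    return positions


def fragments(seq_len, positions):
    """Return (start, end) intervals between consecutive cut positions."""
    bounds = [0] + sorted(positions) + [seq_len]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def select_fragments(frags, min_len, max_len):
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [(a, b) for a, b in frags if min_len <= b - a <= max_len]


# Toy sequence with two assumed EcoRI sites; not real genomic sequence.
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
sites = find_sites(toy, "GAATTC")
frags = fragments(len(toy), sites)
print(sites, frags, select_fragments(frags, 20, 30))
```

On real data the same functions would be run once per enzyme over the region of interest, with `min_len` and `max_len` set to 10000 and 15000.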

