Mathematical Algorithm for Identification of Eukaryotic Promoter Sequences

Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 917
Author(s):  
Eugene V. Korotkov ◽  
Yulia M. Suvorova ◽  
Anna V. Nezhdanova ◽  
Sofia E. Gaidukova ◽  
Irina V. Yakovleva ◽  
...  

Computational identification of promoter sequences in eukaryotic genomes is an important bioinformatics task. However, the problem remains unsolved, since the best algorithms have a false positive probability of 10⁻³–10⁻⁴ per nucleotide. As a result, a full genome analysis may yield more false positives than there are annotated gene promoters. To reduce the number of false positives and increase the reliability of prediction, the probability of a false positive should be lowered to 10⁻⁶–10⁻⁸ per nucleotide. We developed a method for multiple alignment of promoter sequences, followed by mathematical methods for calculating statistically significant classes of promoter sequences. Five promoter classes were created from the rice genome and used to search for potential promoter sequences in that genome with a false positive rate below 10⁻⁸ per nucleotide. The five classes contain 1740, 222, 199, 167 and 130 promoters, respectively. A total of 145,277 potential promoter sequences (PPSs) were identified. Of these, 18,563 are promoters of known genes, 87,233 PPSs intersect with transposable elements, and 37,390 PPSs were found in previously unannotated sequences. The number of false positives for a randomly shuffled rice genome is less than 10⁻⁸ per nucleotide. The method developed for detecting PPSs was compared with some previously used approaches. The developed mathematical method can be used to search for genes, transposable elements, and transcription start sites in eukaryotic genomes.
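
The argument for a 10⁻⁶–10⁻⁸ target is simple arithmetic: expected false positives equal the per-nucleotide rate times the number of positions scanned. The sketch below shows that calculation and a generic shuffle-based way to estimate an empirical rate. The genome length (about 3.7 × 10⁸ nt), the mononucleotide shuffle, and the toy motif detector are all assumptions for illustration; this is not the authors' classification method.

```python
import random


def expected_false_positives(fp_rate_per_nt: float, genome_length_nt: float) -> float:
    """Expected spurious hits when a genome is scanned at a fixed per-nucleotide false positive rate."""
    return fp_rate_per_nt * genome_length_nt


def count_motif_hits(seq: str, motif: str) -> int:
    """Hypothetical toy detector: counts overlapping exact matches of `motif`."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def empirical_fp_rate(seq: str, motif: str, seed: int = 0) -> float:
    """Shuffle the sequence (keeping base composition, destroying real sites),
    rerun the detector, and report hits per nucleotide."""
    rng = random.Random(seed)
    shuffled = list(seq)
    rng.shuffle(shuffled)
    return count_motif_hits("".join(shuffled), motif) / len(seq)


# Assumption: ~3.7e8 nt, a rough rice genome size used only for illustration.
GENOME_LENGTH_NT = 3.7e8
for rate in (1e-3, 1e-4, 1e-6, 1e-8):
    n = expected_false_positives(rate, GENOME_LENGTH_NT)
    print(f"rate {rate:.0e}/nt -> about {n:,.1f} false positives")
```

At 10⁻⁴ per nucleotide this gives roughly 37,000 spurious hits, whereas 10⁻⁸ gives fewer than four. A mononucleotide shuffle keeps base composition but destroys dinucleotide structure, so a composition-preserving shuffle of higher order may give a more conservative estimate.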

Geomatics ◽  
2021 ◽  
Vol 1 (1) ◽  
pp. 34-49
Author(s):  
Mael Moreni ◽  
Jerome Theau ◽  
Samuel Foucher

The combination of unmanned aerial vehicles (UAVs) with deep learning models has the capacity to replace manned aircraft for wildlife surveys. However, the scarcity of animals in the wild often leads to large, highly unbalanced datasets for which even a good detection method can return a large number of false detections. Our objectives in this paper were to design a training method that would reduce training time, decrease the number of false positives and alleviate the fine-tuning effort of an image classifier in the context of animal surveys. We acquired two highly unbalanced datasets of deer images with a UAV and trained a ResNet-18 classifier using hard-negative mining and a series of recent techniques. Our method achieved sub-decimal false positive rates on two test sets (1 false positive per 19,162 and 213,312 negatives, respectively), while training on small but relevant fractions of the data. The resulting training times were therefore significantly shorter than they would have been using the whole datasets. This high level of efficiency was achieved with little tuning effort and using simple techniques. We believe this parsimonious approach to dealing with large, highly unbalanced datasets could be particularly useful to projects with either limited resources or extremely large datasets.
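
Hard-negative mining, named in the abstract, trains on a growing subset of negatives chosen because the current model scores them as most positive-looking. The framework-agnostic sketch below illustrates the generic loop; `train` is a hypothetical callable returning a scoring function, and nothing here reproduces the authors' ResNet-18 pipeline or their other techniques.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def mine_hard_negatives(negatives: Sequence[T], score_fn: Callable[[T], float], k: int) -> List[T]:
    """Return the k negatives the current model scores highest, i.e. the likeliest false positives."""
    return sorted(negatives, key=score_fn, reverse=True)[:k]


def hard_negative_rounds(positives, negatives, train, k: int, rounds: int, seed_negatives) -> Tuple[object, list]:
    """Generic loop: train, mine k new hard negatives, add them, repeat; then retrain on the final subset.
    `train(positives, negatives)` is hypothetical and must return a scoring function."""
    subset = list(seed_negatives)
    for _ in range(rounds):
        model = train(positives, subset)
        chosen = set(subset)
        candidates = [n for n in negatives if n not in chosen]
        subset.extend(mine_hard_negatives(candidates, model, k))
    model = train(positives, subset)
    return model, subset


def false_positive_rate(false_positives: int, n_negatives: int) -> float:
    return false_positives / n_negatives


# The rates reported in the abstract, as fractions of negatives.
print(f"{false_positive_rate(1, 19162):.2e}", f"{false_positive_rate(1, 213312):.2e}")
```

Because only a small, informative fraction of the negatives is ever trained on, training time scales with the mined subset rather than with the full unbalanced dataset. The two reported rates correspond to roughly 5.2 × 10⁻⁵ and 4.7 × 10⁻⁶.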


2019 ◽  
Vol 152 (Supplement_1) ◽  
pp. S35-S36
Author(s):  
Hadrian Mendoza ◽  
Christopher Tormey ◽  
Alexa Siddon

In the evaluation of bone marrow (BM) and peripheral blood (PB) for hematologic malignancy, positive immunoglobulin heavy chain (IG) or T-cell receptor (TCR) gene rearrangement results may be detected despite unrevealing results from morphologic, flow cytometric, immunohistochemical (IHC), and/or cytogenetic studies. The significance of positive rearrangement studies in the context of otherwise normal ancillary findings is unknown, and as such, we hypothesized that gene rearrangement studies may be predictive of an emerging B- or T-cell clone in the absence of other abnormal laboratory tests. Data from all patients who underwent IG or TCR gene rearrangement testing at the authors’ affiliated VA hospital between January 1, 2013, and July 6, 2018, were extracted from the electronic medical record. Date of testing; specimen source; and morphologic, flow cytometric, IHC, and cytogenetic characterization of the tissue source were recorded from pathology reports. Gene rearrangement results were categorized as true positive, false positive, false negative, or true negative. Lastly, patient records were reviewed for subsequent diagnosis of hematologic malignancy in patients with positive gene rearrangement results with negative ancillary testing. A total of 136 patients, who had 203 gene rearrangement studies (50 PB and 153 BM), were analyzed. In TCR studies, there were 2 false positives and 1 false negative in 47 PB assays, as well as 7 false positives and 1 false negative in 54 BM assays. Regarding IG studies, 3 false positives and 12 false negatives in 99 BM studies were identified. Sensitivity and specificity, respectively, were calculated for PB TCR studies (94% and 93%), BM IG studies (71% and 95%), and BM TCR studies (92% and 83%). Analysis of PB IG gene rearrangement studies was not performed due to the small number of tests (3; all true negative).
None of the 12 patients with false-positive IG/TCR gene rearrangement studies later developed a lymphoproliferative disorder, although 2 patients were later diagnosed with acute myeloid leukemia. Of the 14 false negatives, 10 (71%) were related to a diagnosis of plasma cell neoplasms. Results from the present study suggest that positive IG/TCR gene rearrangement studies are not predictive of lymphoproliferative disorders in the context of otherwise negative BM or PB findings. As such, when faced with equivocal pathology reports, clinicians can be practically advised that isolated positive IG/TCR gene rearrangement results may not indicate the need for closer surveillance.
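
Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below encodes these definitions. The abstract gives only the false positive and false negative counts, so the TP/TN split used in the example (16 and 28 for the 47 PB TCR assays) is an assumption, chosen because it is consistent with 2 false positives, 1 false negative and the reported 94% and 93%.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        # Fraction of truly positive cases that the test flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of truly negative cases that the test cleared.
        return self.tn / (self.tn + self.fp)

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


# Assumption: TP/TN split inferred for illustration, not reported in the abstract.
pb_tcr = Confusion(tp=16, fp=2, fn=1, tn=28)
print(pb_tcr.total, round(pb_tcr.sensitivity, 2), round(pb_tcr.specificity, 2))  # 47 0.94 0.93
```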


1987 ◽  
Vol 7 (1) ◽  
pp. 398-402
Author(s):  
T Rutherford ◽  
A W Nienhuis

The contribution of the human globin gene promoters to tissue-specific transcription was studied by using globin promoters to transcribe the neo (G418 resistance) gene. After transfection into different cell types, neo gene expression was assayed by scoring colony formation in the presence of G418. In K562 human erythroleukemia cells, which express fetal and embryonic globin genes but not the adult beta-globin gene, the neo gene was expressed strongly from a fetal gamma- or embryonic zeta-globin gene promoter but only weakly from the beta promoter. In murine erythroleukemia cells, which express the endogenous mouse beta genes, the neo gene was strongly expressed from both beta and gamma promoters. In two nonerythroid cell lines, human HeLa cells and mouse 3T3 fibroblasts, the globin gene promoters did not support neo gene expression. Globin-neo genes were integrated into the erythroleukemia cell genomes mostly as a single copy per cell and were transcribed from the appropriate globin gene cap site. We conclude that globin gene promoter sequences extending from -373 to +48 base pairs (bp) (relative to the cap site) for the beta gene, -385 to +34 bp for the gamma gene, and -555 to +38 bp for the zeta gene are sufficient for tissue-specific and perhaps developmentally specific transcription.
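
The promoter fragments are given as coordinates relative to the cap site. Assuming the usual convention that the cap site is +1 and the preceding base is -1 (there is no position 0), the fragment length is simply downstream minus upstream, as in this small sketch.

```python
def fragment_length(upstream: int, downstream: int) -> int:
    """Length in bp of a fragment from `upstream` (< 0) to `downstream` (> 0), inclusive.
    Assumes cap-site numbering with +1 at the cap site and no position 0."""
    if upstream >= 0 or downstream <= 0:
        raise ValueError("expected upstream < 0 < downstream")
    # Inclusive count would be downstream - upstream + 1; subtract 1 for the absent position 0.
    return downstream - upstream


fragments = {"beta": (-373, 48), "gamma": (-385, 34), "zeta": (-555, 38)}
for gene, (up, down) in fragments.items():
    print(gene, fragment_length(up, down))
```

Under that convention the beta, gamma and zeta fragments are 421, 419 and 593 bp long.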


2018 ◽  
Vol 156 (5) ◽  
pp. 234 ◽  
Author(s):  
Karen A. Collins ◽  
Kevin I. Collins ◽  
Joshua Pepper ◽  
Jonathan Labadie-Bartz ◽  
Keivan G. Stassun ◽  
...  

2014 ◽  
Vol 644-650 ◽  
pp. 3338-3341 ◽  
Author(s):  
Guang Feng Guo

Throughout the 30-year development of intrusion detection systems, problems such as a high false-positive rate have plagued users. The ontology and context verification based intrusion detection model (OCVIDM) was therefore proposed to link the description of attack signatures effectively with their context. The OCVIDM builds a knowledge base for an intrusion detection ontology, which serves as the core of an efficient platform for filtering false alerts. The platform automatically validates alarms and judges whether attacks are real, with the goal of filtering out non-relevant positive alerts and reducing false positives.
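
Context verification, in its simplest form, checks whether the context an attack signature requires actually exists on the targeted host. The sketch below is a plain rule-based illustration of that idea, not the OCVIDM ontology; the `Alert` and `Host` fields (`required_os`, `required_service`, `services`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Alert:
    signature: str
    target_host: str
    required_os: str       # hypothetical: platform the exploit targets
    required_service: str  # hypothetical: service the exploit targets


@dataclass
class Host:
    os: str
    services: Set[str] = field(default_factory=set)


def verify_alerts(alerts: List[Alert], inventory: Dict[str, Host]) -> List[Alert]:
    """Keep alerts whose required context matches the target host; drop the rest as likely false positives."""
    kept = []
    for alert in alerts:
        host = inventory.get(alert.target_host)
        if host is None:
            kept.append(alert)  # unknown context: keep rather than silently discard
        elif host.os == alert.required_os and alert.required_service in host.services:
            kept.append(alert)
    return kept


inventory = {"10.0.0.5": Host("linux", {"apache"}), "10.0.0.6": Host("windows", {"iis"})}
alerts = [
    Alert("IIS-exploit", "10.0.0.5", "windows", "iis"),  # irrelevant: host runs Linux/Apache
    Alert("IIS-exploit", "10.0.0.6", "windows", "iis"),  # relevant
    Alert("SSH-bruteforce", "10.0.0.9", "linux", "ssh"),  # host not in inventory
]
print([a.target_host for a in verify_alerts(alerts, inventory)])
```

Keeping alerts for unknown hosts is a deliberate choice in this sketch: it trades some residual false positives for not discarding real attacks on unmanaged machines.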


2021 ◽  
Vol 162 (6) ◽  
pp. 258
Author(s):  
Mu-Tian Wang ◽  
Hui-Gen Liu ◽  
Jiapeng Zhu ◽  
Ji-Lin Zhou

The Kepler mission’s single-band photometry suffers from astrophysical false positives, most commonly background eclipsing binaries (BEBs) and companion transiting planets (CTPs). Multicolor photometry can reveal the color-dependent depth feature of false positives and thus exclude them. In this work, we aim to estimate the fraction of false positives that cannot be classified by Kepler alone but can be identified from their color-dependent depth feature if a reference band (z, Ks, or Transiting Exoplanet Survey Satellite (TESS)) is adopted in follow-up observations. We construct physics-based blend models to simulate multiband signals of false positives. Nearly 65%–95% of the BEBs and more than 80% of the CTPs that host a Jupiter-sized planet will show detectable depth variations if the reference band can achieve Kepler-like precision. The Ks band is most effective in eliminating BEBs exhibiting features of any depth, while the z and TESS bands are better for identifying giant candidates, and their identification rates are more sensitive to photometric precision. Given the radius distribution of planets transiting the secondary star in binary systems, we derive a formalism to calculate the overall identification rate for CTPs. By comparing the likelihood distributions of the double-band depth ratio for BEB and planet models, we calculate the false-positive probability (FPP) for typical Kepler candidates. Additionally, we show that the FPP calculation helps distinguish the planet candidate’s host star in an unresolved binary system. The framework of the analysis in this paper can be easily adapted to predict the multicolor photometric yield for other transit surveys, especially TESS.
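
The color-dependent depth feature arises from dilution: a blended eclipse appears with depth scaled by the blend's share of the total flux, and that share changes between bands when the blended star has a different color. The sketch below shows that dilution, the resulting double-band depth ratio, and a generic Bayesian FPP from two likelihoods and priors. All numbers are hypothetical, and the intrinsic eclipse depth is assumed to be the same in both bands, which the authors' physics-based blend models need not assume.

```python
def diluted_depth(intrinsic_depth: float, blend_flux: float, target_flux: float) -> float:
    """Observed fractional depth of an eclipse on a blended source diluted by the target's light."""
    return intrinsic_depth * blend_flux / (blend_flux + target_flux)


def depth_ratio(depth_ref: float, depth_kepler: float) -> float:
    """Double-band depth ratio; close to 1 for an achromatic planetary transit on the target."""
    return depth_ref / depth_kepler


def false_positive_probability(like_fp: float, like_planet: float,
                               prior_fp: float, prior_planet: float) -> float:
    """Posterior probability of the false-positive scenario given the observed depth ratio."""
    weighted_fp = prior_fp * like_fp
    return weighted_fp / (weighted_fp + prior_planet * like_planet)


# Hypothetical BEB: 50% intrinsic eclipse; the redder background star supplies
# 1% of the flux in the Kepler band but 3% in Ks.
d_kep = diluted_depth(0.5, blend_flux=1.0, target_flux=99.0)
d_ks = diluted_depth(0.5, blend_flux=3.0, target_flux=97.0)
print(d_kep, d_ks, depth_ratio(d_ks, d_kep))
print(false_positive_probability(like_fp=0.2, like_planet=0.8, prior_fp=0.5, prior_planet=0.5))
```

In this toy case the eclipse is about three times deeper in Ks than in the Kepler band, whereas a planet transiting the target itself would give a ratio near 1 (ignoring limb darkening), which is what makes the reference band diagnostic.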


Author(s):  
A. Kamionskaya ◽  
E. Korotkov

We present here a new method for detecting new copies of SINE elements. The method is based on the correlation of pairs of symbols. The correlation is used both to construct a position-specific matrix and to search for new repeat copies with that matrix. This allows us to enlarge the alphabet and to increase the sensitivity of the method. The method was applied to the rice genome. As a result, new copies of SINE repeats that were not included in the standard annotation were found. The number of false positives was evaluated.
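
Working with pairs of symbols turns the 4-letter nucleotide alphabet into a 16-letter dinucleotide alphabet. The sketch below builds a log-odds position-specific matrix over adjacent dinucleotides from aligned copies and slides it along a sequence. It is a generic dinucleotide matrix, not the authors' correlation measure; the uniform background and the pseudocount of 1 are assumptions.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

BASES = "ACGT"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # 16-letter dinucleotide alphabet


def build_pair_matrix(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds of each dinucleotide at each position of equal-length aligned repeat copies."""
    length = len(aligned[0])
    background = 1.0 / len(PAIRS)  # assumption: uniform dinucleotide background
    matrix = []
    for i in range(length - 1):
        counts = {p: pseudocount for p in PAIRS}
        for seq in aligned:
            pair = seq[i:i + 2]
            if pair in counts:
                counts[pair] += 1
        total = sum(counts.values())
        matrix.append({p: math.log2((c / total) / background) for p, c in counts.items()})
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window; pairs containing non-ACGT symbols get -inf."""
    width = len(matrix) + 1
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(window[i:i + 2], float("-inf")) for i, col in enumerate(matrix))
        hits.append((start, score))
    return hits


matrix = build_pair_matrix(["ACGTAC", "ACGTAC", "ACGTTC"])
best_start, best_score = max(scan("TTACGTACTT", matrix), key=lambda h: h[1])
print(best_start, round(best_score, 2))
```

A threshold on the window score would then decide which windows are reported as candidate copies; choosing it on shuffled sequence is one way to bound the false positive count.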


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Pierre Ambrosini ◽  
Eva Hollemans ◽  
Charlotte F. Kweldam ◽  
Geert J. L. H. van Leenders ◽  
Sjoerd Stallinga ◽  
...  

Cribriform growth patterns in prostate carcinoma are associated with poor prognosis. We aimed to introduce a deep learning method to detect such patterns automatically. To do so, a convolutional neural network was trained to detect cribriform growth patterns on 128 prostate needle biopsies. Ensemble learning that takes other tumor growth patterns into account during training was used to cope with heterogeneous and limited tumor tissue occurrences. ROC and FROC analyses were applied to assess network performance in detecting biopsies harboring a cribriform growth pattern. The ROC analysis yielded a mean area under the curve of up to 0.81. FROC analysis demonstrated a sensitivity of 0.9 for regions larger than 0.0150 mm² with, on average, 7.5 false positives. To benchmark method performance against intra-observer annotation variability, false positive and false negative detections were re-evaluated by the pathologists. Pathologists considered 9% of the false positive regions as cribriform and 11% as possibly cribriform; 44% of the false negative regions were not annotated as cribriform. As a final experiment, the network was also applied to a dataset of 60 biopsy regions annotated by 23 pathologists. With the cut-off reaching the highest sensitivity, all images annotated as cribriform by at least 7 of the 23 pathologists were detected as cribriform by the network, and 9 of the 60 images were detected as cribriform although no pathologist labelled them as such. In conclusion, the proposed deep learning method has high sensitivity for detecting cribriform growth patterns at the expense of a limited number of false positives. It can detect cribriform regions that are labelled as such by at least a minority of pathologists. Therefore, it could assist clinical decision making by suggesting suspicious regions.
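
An FROC curve plots region-level sensitivity against the average number of false positives per image as the detection threshold varies. The sketch below computes one operating point; the detection format (a score plus the id of the matched ground-truth region, or `None`) is an assumption, and matching detections to regions and the minimum region-size filter are left out.

```python
from typing import List, Optional, Tuple

Detection = Tuple[float, Optional[int]]  # (score, matched ground-truth region id or None)


def froc_point(images: List[List[Detection]], n_regions_per_image: List[int],
               threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, mean false positives per image) at one score threshold."""
    detected = set()
    false_positives = 0
    for img_idx, detections in enumerate(images):
        for score, region in detections:
            if score < threshold:
                continue
            if region is None:
                false_positives += 1
            else:
                detected.add((img_idx, region))  # count each true region at most once
    total_regions = sum(n_regions_per_image)
    sensitivity = len(detected) / total_regions if total_regions else 0.0
    return sensitivity, false_positives / len(images)


images = [[(0.9, 0), (0.4, None), (0.8, None)], [(0.7, 0), (0.3, 1)]]
for t in (0.2, 0.5, 0.85):
    print(t, froc_point(images, [1, 2], t))
```

Sweeping the threshold from high to low traces the full curve; the abstract's operating point corresponds to the threshold where sensitivity reaches 0.9.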


2020 ◽  
Vol 30 (12) ◽  
pp. 1851-1855
Author(s):  
Sruti Rao ◽  
M. B. Goens ◽  
Orrin B. Myers ◽  
Emilie A. Sebesta

Aim: To determine the false-positive rate of pulse oximetry screening at moderate altitude, presumed to be elevated compared with sea-level values, and to assess the change in false-positive rate over time.
Methods: We retrospectively analysed 3548 infants in the newborn nursery in Albuquerque, New Mexico (elevation 5400 ft), from July 2012 to October 2013. Universal pulse oximetry screening guidelines were applied after 24 hours of life but before discharge. Newborn babies between 36 and 36 6/7 weeks of gestation weighing >2 kg, and babies >37 weeks weighing >1.7 kg, were included in the study. Log-binomial regression was used to assess change in the probability of false positives over time.
Results: Of the 3548 patients analysed, there was one true positive, with a posteriorly malaligned ventricular septal defect and an interrupted aortic arch. Among the 93 false positives, mean pre- and post-ductal saturations were lower, at 92% and 90%, respectively. The false-positive rate was 3.5% before April 2013 and decreased to 1.5% after April 2013. There was a significant decrease in false-positive rate (p = 0.003, slope coefficient = −0.082, standard error of coefficient = 0.023), with the risk of a false positive decreasing by a relative factor of 0.92 (95% CI 0.88–0.97) per month.
Conclusion: This is the first study in Albuquerque, New Mexico, to report a high false-positive rate at moderate altitude: 1.5% at the end of the study, compared with 0.035% at sea level. Implementation of the nationally recommended universal pulse oximetry screening was associated with a high false-positive rate in the initial period, thought to result from a combination of the learning curve and altitude. After the initial decline, the rate remained steadily above sea-level values, indicating the dominant effect of moderate altitude.
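
In a log-binomial model, log(p) = a + β · month, so exp(β) is the relative risk of a false positive per additional month, and a Wald interval is exp(β ± 1.96 · SE). The sketch below applies this to the reported coefficient and standard error; the 1.96 multiplier assumes a normal-approximation 95% interval.

```python
import math


def per_month_relative_risk(beta: float, se: float, z: float = 1.96):
    """Relative risk per unit time and its Wald confidence interval from a log-binomial slope."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


rr, lo, hi = per_month_relative_risk(-0.082, 0.023)
print(f"RR per month = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # RR per month = 0.92 (95% CI 0.88-0.96)
```

This reproduces the reported point estimate of 0.92; the upper bound comes out near 0.96 rather than 0.97, a difference plausibly due to rounding of the published coefficient and standard error.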

