Annotating the Insect Regulatory Genome

Insects ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 591
Author(s):  
Hasiba Asma ◽  
Marc S. Halfon

An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be combined to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.

2021 ◽  
Author(s):  
Qingxi Meng ◽  
Shubham Chandak ◽  
Yifan Zhu ◽  
Tsachy Weissman

Motivation: The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer, and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, operate in real time, and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads stored in FASTQ files. The earlier tool ENANO focuses mostly on quality score compression and does not achieve significant gains over general-purpose compressors for the read sequences. RENANO achieves significantly better compression for read sequences but is limited to aligned data with a reference available. Results: We present NanoSpring, a reference-free compressor for nanopore sequencing reads that relies on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole-genome data. For recently basecalled high-quality nanopore datasets, NanoSpring achieves close to a 3x improvement in compression over state-of-the-art reference-free compressors. The computational requirements of NanoSpring are practical, although it uses more time and memory during compression than previous tools to achieve the compression gains. Availability: NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
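The approximate-assembly idea lends itself to a toy illustration: when reads largely overlap, each read can be stored as a position plus a handful of differences against a consensus built from the reads themselves, with no external reference. The Python sketch below is purely illustrative and is not NanoSpring's actual algorithm; all names are hypothetical.

# Toy sketch (not NanoSpring's actual algorithm): store each read as an
# offset into a consensus plus its mismatches, so no external reference
# genome is needed.
def encode_against_contig(contig: str, read: str, offset: int):
    """Encode a read as (offset, length, mismatches vs. the consensus)."""
    window = contig[offset:offset + len(read)]
    mismatches = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
    return (offset, len(read), mismatches)

def decode_from_contig(contig: str, encoding):
    """Reconstruct the original read exactly from its compact encoding."""
    offset, length, mismatches = encoding
    chars = list(contig[offset:offset + length])
    for i, b in mismatches:
        chars[i] = b
    return "".join(chars)

contig = "ACGTACGTTTGCAACGGTAC"   # consensus built from the reads themselves
read = "ACGTTAGCAACG"             # roughly matches contig[4:16] with one error
enc = encode_against_contig(contig, read, 4)
assert decode_from_contig(contig, enc) == read
print(enc)                        # (4, 12, [(5, 'A')])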


2020 ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

Abstract Annotating all cis-regulatory modules (CRMs) and transcription factor (TF) binding sites (TFBSs) in genomes remains challenging. We tackled the task by integrating putative TFBS motifs found in 6,092 available datasets covering 77.47% of the human genome. This approach enabled us to partition the covered genome regions into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, like known enhancers, the 1,404,973 predicted CRMCs are under strong evolutionary constraints, suggesting that they might be cis-regulatory. In contrast, the non-CRMCs are largely selectively neutral, suggesting that they might not be cis-regulatory. Our method substantially outperforms three state-of-the-art methods (GeneHancer, EnhancerAtlas, and ENCODE phase 3) at recalling VISTA enhancers and ClinVar variants, as well as by measurements of evolutionary constraints. We estimated that the human genome might encode about 1.46 million CRMs and 67 million TFBSs, comprising about 55% and 22% of the genome, respectively, of which we predicted about 80% in each case. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
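The abstract's estimates imply element sizes that are easy to sanity-check. A back-of-the-envelope computation (assuming a human genome size of roughly 3.1 Gb, a figure not stated in the abstract) gives an average CRM length near 1.2 kb and an average TFBS length near 10 bp, both in line with typical enhancer and motif sizes:

# Back-of-the-envelope check of the abstract's figures (the ~3.1 Gb human
# genome size is an assumption, not taken from the abstract).
GENOME_BP = 3.1e9
crm_bp = 0.55 * GENOME_BP       # CRMs cover ~55% of the genome
tfbs_bp = 0.22 * GENOME_BP      # TFBSs cover ~22% of the genome
print(f"mean CRM length : {crm_bp / 1.46e6:,.0f} bp")   # ~1,168 bp per CRM
print(f"mean TFBS length: {tfbs_bp / 67e6:,.1f} bp")    # ~10.2 bp per TFBS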


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

Abstract cis-regulatory modules (CRMs) formed by clusters of transcription factor (TF) binding sites (TFBSs) are as important as coding sequences in specifying human phenotypes, so it is essential to catalog all CRMs and their constituent TFBSs in the genome. In contrast to most existing methods, which predict CRMs in specific cell types using epigenetic marks, we predict a largely cell-type-agnostic but more comprehensive map of CRMs and constituent TFBSs in the genome by integrating all available TF ChIP-seq datasets. Our method is able to partition the 77.47% of genome regions covered by the 6,092 available datasets into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, the predicted CRMCs are under strong evolutionary constraints, while the non-CRMCs are largely selectively neutral, strongly suggesting that the CRMCs are likely cis-regulatory while the non-CRMCs are not. Our predicted CRMs are under stronger evolutionary constraints than three state-of-the-art predictions (GeneHancer, EnhancerAtlas, and ENCODE phase 3) and substantially outperform them at recalling VISTA enhancers and non-coding ClinVar variants. We estimated that the human genome might encode about 1.47M CRMs and 68M TFBSs, comprising about 55% and 22% of the genome, respectively, of which we predicted about 80% in each case. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
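As a rough intuition for the integration step, binding peaks pooled from many ChIP-seq datasets can be merged and the covered bases split by how many datasets support them. The toy below only illustrates that idea; the authors' actual pipeline (motif finding within peaks, etc.) is far more involved, and the support threshold here is hypothetical.

# Illustrative toy only, not the authors' pipeline: merge overlapping peaks
# pooled across datasets, then split covered regions by dataset support.
def merge_peaks(peaks):
    """Merge overlapping (start, end, support) intervals, summing support."""
    merged = []
    for start, end, support in sorted(peaks):
        if merged and start <= merged[-1][1]:
            last = merged[-1]
            merged[-1] = (last[0], max(last[1], end), last[2] + support)
        else:
            merged.append((start, end, support))
    return merged

# Peaks from several datasets on one chromosome: (start, end, support).
peaks = [(100, 250, 1), (200, 400, 3), (900, 950, 1)]
merged = merge_peaks(peaks)
crmcs = [p for p in merged if p[2] >= 2]      # well-supported -> candidate
non_crmcs = [p for p in merged if p[2] < 2]   # weakly supported -> non-CRMC
print(crmcs)      # [(100, 400, 4)]
print(non_crmcs)  # [(900, 950, 1)]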


2021 ◽  
Author(s):  
Sravya Sravya ◽  
Andriy Miranskyy ◽  
Ayse Bener

Software bug localization demands a significant amount of time and effort from software developers. Many state-of-the-art bug localization models have been proposed to help developers localize bugs more easily, yet none of these models meets the adoption thresholds of software practitioners. Recently, some deep learning-based models have been proposed and shown to perform better than the prior state of the art. With this motivation, we experiment with Convolutional Neural Networks (CNNs) to examine their effectiveness in localizing bugs. We also train a SimpleLogistic model as a baseline for our experiments. We train both models on five open-source Java projects and compare their performance across the projects. Our experiments show that the CNN models perform better than the SimpleLogistic models in most cases, but still do not meet the adoption criteria set by practitioners.
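The abstract does not spell out the network, but a common CNN setup for bug localization scores a (bug report, source file) pair by convolving over token embeddings. The PyTorch sketch below is an assumed architecture of that general shape, not the paper's exact model; all sizes and names are hypothetical.

# Assumed architecture, not the paper's exact model: a 1-D CNN that scores
# a tokenized source file against a bug report for relevance.
import torch
import torch.nn as nn

class BugLocalizerCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, n_filters=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolve over the token sequence with several window sizes.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, kernel_size=k) for k in (3, 4, 5)
        )
        # Two inputs (report, file) -> concatenated pooled features -> score.
        self.out = nn.Linear(2 * 3 * n_filters, 1)

    def _features(self, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)     # (batch, embed, seq)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)            # (batch, 3 * n_filters)

    def forward(self, report_tokens, file_tokens):
        feats = torch.cat([self._features(report_tokens),
                           self._features(file_tokens)], dim=1)
        return self.out(feats).squeeze(1)          # relevance logit per pair

model = BugLocalizerCNN()
report = torch.randint(0, 10000, (8, 120))         # 8 tokenized bug reports
files = torch.randint(0, 10000, (8, 400))          # 8 candidate source files
logits = model(report, files)                      # higher = more suspicious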


2021 ◽  
Vol 13 (7) ◽  
pp. 1311
Author(s):  
Danqing Xu ◽  
Yiquan Wu

In the past few decades, target detection in remote sensing images acquired from aircraft or satellites has become one of the most active research topics. However, existing algorithms are still limited in their ability to detect small remote sensing targets. Benefiting from the great development of computing power, deep learning has also made great breakthroughs, yet owing to the large number of small targets and the complexity of the background, remote sensing target detection remains a challenge. In this work, we add a series of feature enhancement modules to a network based on YOLO-V3 (You Only Look Once) to improve its feature extraction; we therefore term the proposed network FE-YOLO. In addition, to enable fast detection, the original Darknet-53 backbone was simplified. Experimental results on remote sensing datasets show that the proposed FE-YOLO performs better than other state-of-the-art target detection models.
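The abstract leaves the feature enhancement modules unspecified; one common form such a module takes is channel attention (squeeze-and-excitation) over a backbone feature map. The PyTorch sketch below shows that generic idea under stated assumptions and is not FE-YOLO's actual module.

# Generic channel-attention enhancer (an assumption for illustration;
# not FE-YOLO's actual module): reweight feature channels by global context.
import torch
import torch.nn as nn

class ChannelAttentionEnhancer(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global context
        self.fc = nn.Sequential(                   # excitation: channel weights
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (batch, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # reweighted feature map

feat = torch.randn(2, 256, 52, 52)                 # a YOLO-scale feature map
enhanced = ChannelAttentionEnhancer(256)(feat)     # same shape, channels reweighted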


Author(s):  
Carl E. Henderson

Over the past few years it has become apparent in our multi-user facility that the computer system and software supplied in 1985 with our CAMECA CAMEBAX-MICRO electron microprobe analyzer has the greatest potential for improvement and updating of any component of the instrument. While the standard CAMECA software running on a DEC PDP-11/23+ computer under the RSX-11M operating system can perform almost any task required of the instrument, the commands are not always intuitive and can be difficult to remember for the casual user (of which our laboratory has many). Given the widespread and growing use of other microcomputers (such as PCs and Macintoshes) by users of the microprobe, the PDP has become the “oddball” and has also fallen behind the state of the art in terms of processing speed and disk storage capabilities. Upgrade paths within products available from DEC are considered to be too expensive for the benefits received. After using a Macintosh for other tasks in the laboratory, such as instrument use and billing records, word processing, and graphics display, its unique and “friendly” user interface suggested an easier-to-use system for computer control of the electron microprobe automation. Specifically, a Macintosh IIx was chosen for its capacity for third-party add-on cards used in instrument control.


2017 ◽  
Vol 8 (2) ◽  
Author(s):  
Andreas Budiman ◽  
Dennis Gunawan ◽  
Seng Hansun

Plagiarism is a behavior that violates copyrights. A survey shows that 55% of college presidents say plagiarism in students’ papers has increased over the past 10 years. Therefore, an application for detecting plagiarism is needed, especially for teachers. This plagiarism checker application was built using Visual C# 2010. The plagiarism checker uses the Hamming distance algorithm for matching lines of the source code. This algorithm works by comparing strings of equal length from the code, so a brute force algorithm is needed to select which strings will be matched with Hamming distance. Another important part of detecting plagiarism is the preprocessing, which helps the algorithm detect plagiarized source code. This paper shows that the application works well in detecting plagiarism, that the Hamming distance and brute force algorithms work better than the Levenshtein distance algorithm for detecting structural plagiarism, and that the preprocessing helps the application increase its detection rate and accuracy. Index Terms—Brute Force, Hamming Distance, Plagiarism, Preprocessing.
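The described approach, brute-force pairing of equal-length lines scored by Hamming distance after normalization, is easy to sketch. The Python below is a minimal sketch under assumed details (the normalization steps and the distance threshold are hypothetical, not taken from the paper):

# Minimal sketch of the described approach (normalization and threshold are
# assumed): brute-force compare every pair of equal-length lines and use
# Hamming distance as the similarity signal.
def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def preprocess(code: str) -> list[str]:
    # Toy normalization: strip whitespace and case so trivial edits can't hide a copy.
    return [line.strip().lower() for line in code.splitlines() if line.strip()]

def matched_lines(src_a: str, src_b: str, max_dist: int = 2) -> int:
    lines_a, lines_b = preprocess(src_a), preprocess(src_b)
    hits = 0
    for la in lines_a:                    # brute force: compare all pairs
        for lb in lines_b:
            if len(la) == len(lb) and hamming(la, lb) <= max_dist:
                hits += 1
                break
    return hits

a = "int x = 1;\nint y = 2;\nreturn x + y;"
b = "int a = 1;\nint b = 2;\nreturn a + b;"
print(matched_lines(a, b))  # 3: every line matches within 2 substitutions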


Author(s):  
Kris McDaniel

This chapter develops a version of ontological pluralism that respects two common intuitions about time: that the present moment is metaphysically distinguished, but not in such a way that the past is unreal. The version of ontological pluralism developed—presentist existential pluralism (PEP)—embraces two modes of being: the mode of being that present objects enjoy and the mode of being that past objects enjoy. The author argues that this view fares at least as well as, and probably better than, other views on which the present is metaphysically distinguished. The chapter also introduces another form of ontological superiority, called “levels of being.”

