Annotating the Insect Regulatory Genome

Insects ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 591
Author(s):  
Hasiba Asma ◽  
Marc S. Halfon

An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be combined to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.

2021 ◽  
Author(s):  
Qingxi Meng ◽  
Shubham Chandak ◽  
Yifan Zhu ◽  
Tsachy Weissman

Motivation: The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer, and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, operate in real time, and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads stored in FASTQ files. The earlier tool ENANO focuses mostly on quality score compression and does not achieve significant gains over general-purpose compressors for the read sequences. RENANO achieves significantly better compression for read sequences but is limited to aligned data with a reference available. Results: We present NanoSpring, a reference-free compressor for nanopore sequencing reads that relies on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole-genome data. For recently basecalled high-quality nanopore datasets, NanoSpring achieves close to a 3x improvement in compression over state-of-the-art reference-free compressors. The computational requirements of NanoSpring are practical, although it uses more time and memory during compression than previous tools to achieve the compression gains. Availability: NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
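The approximate-assembly idea lends itself to a toy illustration: when reads largely overlap, each read can be stored as a position plus a handful of differences against a consensus built from the reads themselves, with no external reference. The Python sketch below is purely illustrative and is not NanoSpring's actual algorithm; all names are hypothetical.

# Toy sketch (not NanoSpring's actual algorithm): store each read as an
# offset into a consensus plus its mismatches, so no external reference
# genome is needed.
def encode_against_contig(contig: str, read: str, offset: int):
    """Encode a read as (offset, length, mismatches vs. the consensus)."""
    window = contig[offset:offset + len(read)]
    mismatches = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
    return (offset, len(read), mismatches)

def decode_from_contig(contig: str, encoding):
    """Reconstruct the original read exactly from its compact encoding."""
    offset, length, mismatches = encoding
    chars = list(contig[offset:offset + length])
    for i, b in mismatches:
        chars[i] = b
    return "".join(chars)

contig = "ACGTACGTTTGCAACGGTAC"   # consensus built from the reads themselves
read = "ACGTTAGCAACG"             # roughly matches contig[4:16] with one error
enc = encode_against_contig(contig, read, 4)
assert decode_from_contig(contig, enc) == read
print(enc)                        # (4, 12, [(5, 'A')])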


2020 ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

Abstract Annotating all cis-regulatory modules (CRMs) and transcription factor (TF) binding sites (TFBSs) in genomes remains challenging. We tackled the task by integrating putative TFBS motifs found in 6,092 available datasets covering 77.47% of the human genome. This approach enabled us to partition the covered genome regions into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, like known enhancers, the 1,404,973 predicted CRMCs are under strong evolutionary constraints, suggesting that they might be cis-regulatory. In contrast, the non-CRMCs are largely selectively neutral, suggesting that they might not be cis-regulatory. Our method substantially outperforms three state-of-the-art methods (GeneHancer, EnhancerAtlas, and ENCODE phase 3) at recalling VISTA enhancers and ClinVar variants, as well as by measurements of evolutionary constraints. We estimated that the human genome might encode about 1.46 million CRMs and 67 million TFBSs, comprising about 55% and 22% of the genome, respectively, of which we predicted about 80% in each case. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
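The abstract's estimates imply element sizes that are easy to sanity-check. A back-of-the-envelope computation (assuming a human genome size of roughly 3.1 Gb, a figure not stated in the abstract) gives an average CRM length near 1.2 kb and an average TFBS length near 10 bp, both in line with typical enhancer and motif sizes:

# Back-of-the-envelope check of the abstract's figures (the ~3.1 Gb human
# genome size is an assumption, not taken from the abstract).
GENOME_BP = 3.1e9
crm_bp = 0.55 * GENOME_BP       # CRMs cover ~55% of the genome
tfbs_bp = 0.22 * GENOME_BP      # TFBSs cover ~22% of the genome
print(f"mean CRM length : {crm_bp / 1.46e6:,.0f} bp")   # ~1,168 bp per CRM
print(f"mean TFBS length: {tfbs_bp / 67e6:,.1f} bp")    # ~10.2 bp per TFBS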


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

Abstract cis-regulatory modules (CRMs) formed by clusters of transcription factor (TF) binding sites (TFBSs) are as important as coding sequences in specifying human phenotypes, so it is essential to catalog all CRMs and their constituent TFBSs in the genome. In contrast to most existing methods, which predict CRMs in specific cell types using epigenetic marks, we predict a largely cell-type-agnostic but more comprehensive map of CRMs and constituent TFBSs in the genome by integrating all available TF ChIP-seq datasets. Our method is able to partition the 77.47% of genome regions covered by the 6,092 available datasets into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, the predicted CRMCs are under strong evolutionary constraints, while the non-CRMCs are largely selectively neutral, strongly suggesting that the CRMCs are likely cis-regulatory while the non-CRMCs are not. Our predicted CRMs are under stronger evolutionary constraints than three state-of-the-art predictions (GeneHancer, EnhancerAtlas, and ENCODE phase 3) and substantially outperform them at recalling VISTA enhancers and non-coding ClinVar variants. We estimated that the human genome might encode about 1.47M CRMs and 68M TFBSs, comprising about 55% and 22% of the genome, respectively, of which we predicted about 80% in each case. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
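As a rough intuition for the integration step, binding peaks pooled from many ChIP-seq datasets can be merged and the covered bases split by how many datasets support them. The toy below only illustrates that idea; the authors' actual pipeline (motif finding within peaks, etc.) is far more involved, and the support threshold here is hypothetical.

# Illustrative toy only, not the authors' pipeline: merge overlapping peaks
# pooled across datasets, then split covered regions by dataset support.
def merge_peaks(peaks):
    """Merge overlapping (start, end, support) intervals, summing support."""
    merged = []
    for start, end, support in sorted(peaks):
        if merged and start <= merged[-1][1]:
            last = merged[-1]
            merged[-1] = (last[0], max(last[1], end), last[2] + support)
        else:
            merged.append((start, end, support))
    return merged

# Peaks from several datasets on one chromosome: (start, end, support).
peaks = [(100, 250, 1), (200, 400, 3), (900, 950, 1)]
merged = merge_peaks(peaks)
crmcs = [p for p in merged if p[2] >= 2]      # well-supported -> candidate
non_crmcs = [p for p in merged if p[2] < 2]   # weakly supported -> non-CRMC
print(crmcs)      # [(100, 400, 4)]
print(non_crmcs)  # [(900, 950, 1)]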


2021 ◽  
Author(s):  
Sravya Sravya ◽  
Andriy Miranskyy ◽  
Ayse Bener

Software bug localization demands a significant amount of time and effort from software developers. Many state-of-the-art bug localization models have been proposed to help developers localize bugs more easily, yet none of these models meets the adoption thresholds of software practitioners. Recently, some deep learning-based models have been proposed and shown to perform better than the prior state of the art. With this motivation, we experiment with Convolutional Neural Networks (CNNs) to examine their effectiveness in localizing bugs. We also train a SimpleLogistic model as a baseline for our experiments. We train both models on five open-source Java projects and compare their performance across the projects. Our experiments show that the CNN models perform better than the SimpleLogistic models in most cases, but still do not meet the adoption criteria set by practitioners.
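The abstract does not spell out the network, but a common CNN setup for bug localization scores a (bug report, source file) pair by convolving over token embeddings. The PyTorch sketch below is an assumed architecture of that general shape, not the paper's exact model; all sizes and names are hypothetical.

# Assumed architecture, not the paper's exact model: a 1-D CNN that scores
# a tokenized source file against a bug report for relevance.
import torch
import torch.nn as nn

class BugLocalizerCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, n_filters=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolve over the token sequence with several window sizes.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, kernel_size=k) for k in (3, 4, 5)
        )
        # Two inputs (report, file) -> concatenated pooled features -> score.
        self.out = nn.Linear(2 * 3 * n_filters, 1)

    def _features(self, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)     # (batch, embed, seq)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)            # (batch, 3 * n_filters)

    def forward(self, report_tokens, file_tokens):
        feats = torch.cat([self._features(report_tokens),
                           self._features(file_tokens)], dim=1)
        return self.out(feats).squeeze(1)          # relevance logit per pair

model = BugLocalizerCNN()
report = torch.randint(0, 10000, (8, 120))         # 8 tokenized bug reports
files = torch.randint(0, 10000, (8, 400))          # 8 candidate source files
logits = model(report, files)                      # higher = more suspicious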


2021 ◽  
Vol 13 (7) ◽  
pp. 1311
Author(s):  
Danqing Xu ◽  
Yiquan Wu

In the past few decades, target detection in remote sensing images acquired from aircraft or satellites has become one of the most active research topics. However, existing algorithms are still limited in their ability to detect small remote sensing targets. Benefiting from the great development of computing power, deep learning has also made great breakthroughs, yet owing to the large number of small targets and the complexity of the background, remote sensing target detection remains a challenge. In this work, we add a series of feature enhancement modules to a network based on YOLO-V3 (You Only Look Once) to improve its feature extraction; we therefore term the proposed network FE-YOLO. In addition, to enable fast detection, the original Darknet-53 backbone was simplified. Experimental results on remote sensing datasets show that the proposed FE-YOLO performs better than other state-of-the-art target detection models.
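The abstract leaves the feature enhancement modules unspecified; one common form such a module takes is channel attention (squeeze-and-excitation) over a backbone feature map. The PyTorch sketch below shows that generic idea under stated assumptions and is not FE-YOLO's actual module.

# Generic channel-attention enhancer (an assumption for illustration;
# not FE-YOLO's actual module): reweight feature channels by global context.
import torch
import torch.nn as nn

class ChannelAttentionEnhancer(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global context
        self.fc = nn.Sequential(                   # excitation: channel weights
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (batch, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # reweighted feature map

feat = torch.randn(2, 256, 52, 52)                 # a YOLO-scale feature map
enhanced = ChannelAttentionEnhancer(256)(feat)     # same shape, channels reweighted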


Author(s):  
Carl E. Henderson

Over the past few years it has become apparent in our multi-user facility that the computer system and software supplied in 1985 with our CAMECA CAMEBAX-MICRO electron microprobe analyzer has the greatest potential for improvement and updating of any component of the instrument. While the standard CAMECA software running on a DEC PDP-11/23+ computer under the RSX-11M operating system can perform almost any task required of the instrument, the commands are not always intuitive and can be difficult to remember for the casual user (of which our laboratory has many). Given the widespread and growing use of other microcomputers (such as PCs and Macintoshes) by users of the microprobe, the PDP has become the “oddball” and has also fallen behind the state of the art in terms of processing speed and disk storage capabilities. Upgrade paths within products available from DEC are considered to be too expensive for the benefits received. After using a Macintosh for other tasks in the laboratory, such as instrument use and billing records, word processing, and graphics display, its unique and “friendly” user interface suggested an easier-to-use system for computer control of the electron microprobe automation. Specifically, a Macintosh IIx was chosen for its capacity for third-party add-on cards used in instrument control.


2017 ◽  
Vol 8 (2) ◽  
Author(s):  
Andreas Budiman ◽  
Dennis Gunawan ◽  
Seng Hansun

Plagiarism is a behavior that violates copyrights. A survey shows that 55% of college presidents say plagiarism in students’ papers has increased over the past 10 years. Therefore, an application for detecting plagiarism is needed, especially for teachers. This plagiarism checker application was built using Visual C# 2010. The plagiarism checker uses the Hamming distance algorithm for matching lines of the source code. This algorithm works by comparing strings of equal length from the code, so a brute force algorithm is needed to select which strings will be matched with Hamming distance. Another important part of detecting plagiarism is the preprocessing, which helps the algorithm detect plagiarized source code. This paper shows that the application works well in detecting plagiarism, that the Hamming distance and brute force algorithms work better than the Levenshtein distance algorithm for detecting structural plagiarism, and that the preprocessing helps the application increase its detection rate and accuracy. Index Terms—Brute Force, Hamming Distance, Plagiarism, Preprocessing.
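The described approach, brute-force pairing of equal-length lines scored by Hamming distance after normalization, is easy to sketch. The Python below is a minimal sketch under assumed details (the normalization steps and the distance threshold are hypothetical, not taken from the paper):

# Minimal sketch of the described approach (normalization and threshold are
# assumed): brute-force compare every pair of equal-length lines and use
# Hamming distance as the similarity signal.
def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def preprocess(code: str) -> list[str]:
    # Toy normalization: strip whitespace and case so trivial edits can't hide a copy.
    return [line.strip().lower() for line in code.splitlines() if line.strip()]

def matched_lines(src_a: str, src_b: str, max_dist: int = 2) -> int:
    lines_a, lines_b = preprocess(src_a), preprocess(src_b)
    hits = 0
    for la in lines_a:                    # brute force: compare all pairs
        for lb in lines_b:
            if len(la) == len(lb) and hamming(la, lb) <= max_dist:
                hits += 1
                break
    return hits

a = "int x = 1;\nint y = 2;\nreturn x + y;"
b = "int a = 1;\nint b = 2;\nreturn a + b;"
print(matched_lines(a, b))  # 3: every line matches within 2 substitutions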


Author(s):  
Kris McDaniel

This chapter develops a version of ontological pluralism that respects two common intuitions about time: that the present moment is metaphysically distinguished, but not in such a way that the past is unreal. The version of ontological pluralism developed—presentist existential pluralism (PEP)—embraces two modes of being: the mode of being that present objects enjoy and the mode of being that past objects enjoy. The author argues that this view fares at least as well as, and probably better than, other views on which the present is metaphysically distinguished. The chapter also introduces another form of ontological superiority, called “levels of being.”

