Pygenprop: a Python library for programmatic exploration and comparison of organism genome properties

Lee H Bergstrand; Josh D Neufeld; Andrew C Doxey

doi:10.1093/bioinformatics/btz522


Bioinformatics, Vol 35 (23), pp. 5063-5065, 2019


Keyword(s): Data Science; Source Code; Open Reading Frames; Third Party; Partial Support; Biochemical Pathways; Bioinformatics Software; Using Data; Genome Analyses; Reading Frames

Abstract

Summary: A critical step in comparative genomics is the identification of differences in the presence/absence of encoded biochemical pathways among organisms. Our library, Pygenprop, facilitates these comparisons using data from the Genome Properties database. Pygenprop is written in Python and, unlike existing libraries, is compatible with a variety of tools in the Python data science ecosystem, such as Jupyter Notebooks for interactive analyses and scikit-learn for machine learning. Pygenprop assigns YES, NO, or PARTIAL support for each property based on InterProScan annotations of open reading frames from an organism’s genome. The library contains classes for representing the Genome Properties database as a whole and methods for detecting differences in property assignments between organisms. As the Genome Properties database grows, we anticipate widespread adoption of Pygenprop for routine genome analyses and integration within third-party bioinformatics software.

Availability and implementation: Pygenprop is written in Python and is compatible with Python 3.6 or higher. Source code is available under the Apache License, Version 2.0 at https://github.com/Micromeda/pygenprop. The package can be installed from both PyPI (https://pypi.org/project/pygenprop) and Anaconda (https://anaconda.org/lbergstrand/pygenprop). Documentation is available on Read the Docs (http://pygenprop.rtfd.io/).
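To make the YES/NO/PARTIAL idea and the between-organism comparison concrete, here is a minimal, self-contained sketch. It is not Pygenprop's API: the Step and Property classes, the assign and differing_properties functions, and all IDs are hypothetical. It assumes a simplified rule in which a step counts as present if any of its evidence IDs appears among a genome's annotations, a property is YES when all required steps are present, NO when none are, and PARTIAL otherwise. The real Genome Properties rules may also apply a per-property threshold, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Step:
    """A pathway step; it counts as present if any of its evidence IDs was annotated."""
    name: str
    evidence_ids: Set[str]
    required: bool = True


@dataclass
class Property:
    """Simplified stand-in for a Genome Properties entry (not Pygenprop's own class)."""
    property_id: str
    steps: List[Step] = field(default_factory=list)


def assign(prop: Property, annotations: Set[str]) -> str:
    """Assign YES, NO or PARTIAL from one genome's set of annotation IDs."""
    required = [s for s in prop.steps if s.required]
    found = sum(1 for s in required if s.evidence_ids & annotations)
    if required and found == len(required):
        return "YES"      # every required step has supporting evidence
    if found == 0:
        return "NO"       # no required step is supported (threshold of 0 assumed)
    return "PARTIAL"      # some, but not all, required steps are supported


def differing_properties(
    props: Iterable[Property], genome_a: Set[str], genome_b: Set[str]
) -> Dict[str, Tuple[str, str]]:
    """Map property ID -> (assignment in A, assignment in B) where the two disagree."""
    diffs = {}
    for prop in props:
        a, b = assign(prop, genome_a), assign(prop, genome_b)
        if a != b:
            diffs[prop.property_id] = (a, b)
    return diffs


# Usage with placeholder IDs (hypothetical; not real InterPro accessions).
toy = Property("TOY_PATHWAY", [
    Step("step1", {"ev1"}),
    Step("step2", {"ev2", "ev2_alt"}),
    Step("step3", {"ev3"}, required=False),
])
genome_a = {"ev1", "ev2_alt"}   # both required steps supported
genome_b = {"ev1"}              # only one of two required steps supported
print(differing_properties([toy], genome_a, genome_b))
# {'TOY_PATHWAY': ('YES', 'PARTIAL')}
```

Representing each genome as a set of annotation IDs keeps the per-step check to a single set intersection, and the comparison only reports properties whose assignments differ, which is the kind of output a presence/absence comparison needs.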


ATUCG: An Agent-Based Environment for Automatic Annotation of Genomes

International Journal of Cooperative Information Systems, Vol 12 (02), pp. 241-273, 2003. doi:10.1142/s0218843003000735

Cited By ~ 5

Author(s): Ana L. C. Bazzan; Rogério Duarte; Abner N. Pitinga; Luciana F. Schroeder; Farlon de A. Souto; ...

Keyword(s): Machine Learning; Mycoplasma Hyopneumoniae; Machine Learning Algorithms; Open Reading Frames; Automatic Annotation; Agent Based; The Core; Layer I; Using Data; Reading Frames

This work reports on the ATUCG environment (Agent-based environmenT for aUtomatiC annotation of Genomes). It consists of three layers, each with several agents in charge of performing repetitive and time-consuming tasks. Layer I automates the tasks involved in finding ORFs (open reading frames). Layer II (the core of our approach) is associated with three main tasks: extraction and formatting of data, automatic annotation of data regarding profiles or families of proteins, and generation and validation of rules to automatically annotate the Keywords field in the SWISS-PROT database. Layer III allows the user to check the correctness of the automatic annotation. This environment is being designed with the sequencing of Mycoplasma hyopneumoniae in mind, so examples are presented using data from organisms of the Mycoplasmataceae family. We have concentrated development on Layer II because it is the most general layer and because it focuses on machine learning algorithms, a feature that is uncommon in annotation systems. Results for this layer show that, with learning (individual or collaborative), agents are able to generate annotation rules that achieve better results than those reported in the literature.
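Layer I automates ORF finding. As a generic illustration of that task (not ATUCG's own agents or code), the sketch below scans the three forward reading frames for ATG...stop stretches. The function name find_orfs and the default min_codons=30 cutoff are assumptions. Because the environment targets Mycoplasma hyopneumoniae, the stop-codon set is a parameter: in the Mycoplasma/Spiroplasma genetic code (NCBI translation table 4), TGA encodes tryptophan rather than a stop.

```python
from typing import FrozenSet, List, Tuple

# Standard genetic code (NCBI translation table 1) stop codons.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
# NCBI translation table 4 (Mycoplasma/Spiroplasma): TGA encodes tryptophan, not stop.
MYCOPLASMA_STOPS = frozenset({"TAA", "TAG"})


def find_orfs(
    seq: str, min_codons: int = 30, stops: FrozenSet[str] = STANDARD_STOPS
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ATG...stop ORFs.

    Each ORF starts at the first ATG after the previous stop in its frame,
    so nested in-frame ATGs are not reported separately. min_codons counts
    codons including the stop codon. The reverse strand is not scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("ATGAAATGA", min_codons=3))                          # [(0, 9)]
print(find_orfs("ATGAAATGA", min_codons=3, stops=MYCOPLASMA_STOPS))  # []
```

The two calls show why the genetic code matters for Mycoplasma: the same sequence yields an ORF under the standard code but none under table 4, where TGA is read through. A full ORF finder would also scan the reverse complement.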


Automation of Data Consumption by Pluggable Module Software

Journal of University of Shanghai for Science and Technology, Vol 23 (06), pp. 1672-1681, 2021. doi:10.51201/jusst/21/06479

Author(s): Vinay Balamurali; Prof. Venkatesh S

Keyword(s): Software Design; Data Science; Source Code; Application Programming Interface; Related Information; Server Systems; Data Consumption; Application Programming; Using Data; Programming Interface

Servers are required to monitor the health of the various I/O cards connected to them in order to alert the personnel responsible for servicing these cards. The Data Collection Unit (DCU) is responsible for detecting the I/O cards, sending their inventory, and monitoring their health. Currently, the keys required to detect these I/O cards are manually coded into the source code, a highly laborious and time-consuming task. To eliminate this manual work, a Software Pluggable Module was devised that reads the I/O card-related information from the I/O component list. This software design applies data science and object-oriented programming (OOP) concepts to automate certain tasks on server systems. The proposed methodology is implemented on a Linux system. The design is modular and extensible to accommodate future requirements. Such an automation framework can be used to track information maintained in Excel spreadsheets and access it through an application programming interface (API).
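The core idea, replacing hard-coded detection keys with keys read from a component list, can be sketched as follows. This is a hypothetical illustration, not the authors' module: the abstract does not give the list's format, so the CSV export (instead of reading Excel directly, which would need a third-party library), the column names name, vendor_id and device_id, the PCI-style IDs, and the functions load_component_list and detect_cards are all assumptions.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List, TextIO, Tuple


@dataclass(frozen=True)
class CardKey:
    """Detection key for one I/O card model (hypothetical fields)."""
    name: str
    vendor_id: str
    device_id: str


def load_component_list(stream: TextIO) -> Dict[Tuple[str, str], CardKey]:
    """Read detection keys from a CSV export of the component list.

    Assumes columns named name, vendor_id and device_id; IDs are lower-cased
    so that lookups are case-insensitive.
    """
    registry = {}
    for row in csv.DictReader(stream):
        key = CardKey(
            name=row["name"].strip(),
            vendor_id=row["vendor_id"].strip().lower(),
            device_id=row["device_id"].strip().lower(),
        )
        registry[(key.vendor_id, key.device_id)] = key
    return registry


def detect_cards(
    registry: Dict[Tuple[str, str], CardKey], observed: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return names of observed devices that match a known card; unknown IDs are skipped."""
    found = []
    for vendor_id, device_id in observed:
        card = registry.get((vendor_id.lower(), device_id.lower()))
        if card is not None:
            found.append(card.name)
    return found


# Usage: adding a card means adding a row to the list, not editing source code.
component_list = io.StringIO(
    "name,vendor_id,device_id\n"
    "Example NIC,0x1234,0x0001\n"
    "Example HBA,0x1234,0x0002\n"
)
registry = load_component_list(component_list)
print(detect_cards(registry, [("0x1234", "0x0002"), ("0xFFFF", "0x0000")]))
# ['Example HBA']
```

Keeping the keys in data rather than code is what makes the module pluggable: supporting a new card becomes a change to the component list, and the detection logic stays untouched.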


A tale of two clades: monkeypox viruses

Journal of General Virology, Vol 86 (10), pp. 2661-2672, 2005. doi:10.1099/vir.0.81215-0

Cited By ~ 162

Author(s): Anna M. Likos; Scott A. Sammons; Victoria A. Olson; A. Michael Frace; Yu Li; ...

Keyword(s): Democratic Republic; Clinical Laboratory; West African; Open Reading Frames; Congo Basin; The USA; Republic Of The Congo; Using Data; And Control; Reading Frames

Human monkeypox was first recognized outside Africa in 2003 during an outbreak in the USA that was traced to imported monkeypox virus (MPXV)-infected West African rodents. Unlike the smallpox-like disease described in the Democratic Republic of the Congo (DRC; a Congo Basin country), disease in the USA appeared milder. Here, analyses compared clinical, laboratory and epidemiological features of confirmed human monkeypox case-patients, using data from outbreaks in the USA and the Congo Basin, and the results suggested that human disease pathogenicity was associated with the viral strain. Genomic sequencing of USA, Western and Central African MPXV isolates confirmed the existence of two MPXV clades. A comparison of open reading frames between MPXV clades permitted prediction of viral proteins that could cause the observed differences in human pathogenicity between these two clades. Understanding the molecular pathogenesis and clinical and epidemiological properties of MPXV can improve monkeypox prevention and control.


Issues in security and privacy of big data

International Journal of Advanced Research in Computer Science and Software Engineering, Vol 7 (12), pp. 1, 2018. doi:10.23956/ijarcsse.v7i12.482

Author(s): Shaveta Bhatia

Keyword(s): Cloud Computing; Big Data; Approximate Method; Biomedical Research; Cyber Security; Data Science; Third Party; Security And Privacy; Security Threats; The Third

The era of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained wide popularity, but it also raises many challenges for the security and privacy of data. Various threats and attacks, such as data leakage, unauthorized access by third parties, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and approximate methods for addressing them in the fields of biomedical research, cyber security, and cloud computing.
