pyKVFinder: an efficient and integrable Python package for biomolecular cavity detection and characterization in data science

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Victor da Silva Guerra ◽  
Helder Veras Ribeiro-Filho ◽  
Gabriel Ernesto Jara ◽  
Leandro Oliveira Bortot ◽  
José Geraldo de Carvalho Pereira ◽  
...  

Abstract
Background: Biomolecular interactions that modulate biological processes occur mainly in cavities on the surface of biomolecular structures. In the data science era, structural biology has benefited from the increasing availability of biostructural data due to advances in structural determination and computational methods. In this scenario, data-intensive cavity analysis demands efficient scripting routines built on easily manipulated data structures. To fulfill this need, we developed pyKVFinder, a Python package to detect and characterize cavities in biomolecular structures for data science and automated pipelines.
Results: pyKVFinder efficiently detects cavities in biomolecular structures and computes their volume, area, depth and hydropathy, storing these cavity properties in NumPy arrays. Benefiting from the interoperability and data structures of the Python ecosystem, pyKVFinder can be integrated with third-party scientific packages and libraries for mathematical calculations, machine learning and 3D visualization in automated workflows. As a proof of pyKVFinder’s capabilities, we successfully identified and compared the ADRP substrate-binding site of SARS-CoV-2 and a set of homologous proteins with pyKVFinder, showing its integrability with data science packages such as matplotlib, NGL Viewer, SciPy and Jupyter notebooks.
Conclusions: We introduce efficient, highly versatile and easily integrable software for detecting and characterizing biomolecular cavities in data science applications and automated protocols. pyKVFinder facilitates biostructural data analysis with scripting routines in the Python ecosystem and can serve as a building block for data science and drug design applications.
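As a minimal sketch of the workflow this abstract describes — assuming pyKVFinder's documented high-level run_workflow entry point and result attributes, whose exact names may differ between versions — cavity detection and characterization reduce to a few lines:

```python
# Minimal sketch: detect cavities in a PDB file with pyKVFinder's
# high-level workflow and inspect the per-cavity properties it stores
# in plain Python/NumPy data structures. 'protein.pdb' is a placeholder.
import pyKVFinder

results = pyKVFinder.run_workflow('protein.pdb')

# Cavity properties are exposed as dictionaries keyed by cavity tag
# (e.g. 'KAA', 'KAB'), so they drop straight into SciPy/pandas code.
for tag in results.volume:
    print(tag, results.volume[tag], results.area[tag])

# The cavity grid itself is a NumPy array, ready for third-party
# analysis or 3D visualization tools.
print(type(results.cavities), results.cavities.shape)
```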

2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

Abstract
As biological sequences, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences in standard workflows. Here, we present glycowork, an open-source Python package designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
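A hedged sketch of the motif-annotation step this abstract mentions; the annotate_dataset function and its location follow the glycowork documentation but should be treated as assumptions that may vary across versions, and the glycan strings are illustrative IUPAC-condensed examples:

```python
# Sketch only: annotate glycan motifs across a small dataset with
# glycowork. Function name/signature are assumptions from the docs.
from glycowork.motif.annotate import annotate_dataset

glycans = ['Neu5Ac(a2-3)Gal(b1-4)Glc',
           'Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man(b1-4)GlcNAc']

# Expected to return a DataFrame of motif counts per glycan, which can
# feed directly into heatmaps or statistical enrichment analyses.
motif_table = annotate_dataset(glycans)
print(motif_table.head())
```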


Author(s):  
Shaveta Bhatia

The era of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained wide popularity, but it also raises many challenges for the security and privacy of the data. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and the appropriate methods to counter them in the fields of biomedical research, cyber security, and cloud computing.


Author(s):  
Melvin A. Eisenberg

Chapter 13 concerns the building blocks of formulas to measure expectation damages: replacement cost, market price, resale price, diminished value, and lost profits. Replacement-cost damages are based on the difference between the contract price and the actual or imputed cost of a replacement transaction. Resale-price damages are based on the difference between the contract price payable by a breaching buyer and the price the seller received on resale to a third party. Diminished-value damages are based on the difference between the value of the performance that a breaching seller rendered and the value of the performance that she promised to render. Lost-profit damages are based on the difference between the price a breaching buyer agreed to pay and the seller’s variable costs.
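Rendered schematically in LaTeX (a sketch of the chapter's verbal definitions, not the book's own notation; here \(P_c\) denotes the contract price):

```latex
% Schematic rendering of the four damage measures; P_c is the
% contract price. Signs follow the verbal definitions above.
\begin{align*}
D_{\text{replacement cost}} &= C_{\text{replacement}} - P_c\\
D_{\text{resale price}}     &= P_c - P_{\text{resale}}\\
D_{\text{diminished value}} &= V_{\text{promised}} - V_{\text{rendered}}\\
D_{\text{lost profit}}      &= P_c - C_{\text{variable}}
\end{align*}
```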


2021 ◽  
Vol 23 (06) ◽  
pp. 868-873
Author(s):  
Sonali Karki ◽  
Dr. Kiran V

The business industry is evolving. Enterprises have begun a digital transformation path, adopting innovative technologies that enable them to move quickly and change how they cooperate, lowering costs and improving productivity. However, as a result of these technologies, the conventional perimeter has evaporated, and identity has become the new line of defense. New security concerns necessitate modern security measures. Passwords are no longer appropriate for authenticating privileged access to mission-critical assets: they are notoriously insecure, cause password fatigue, and give the user a false sense of security. Enterprises must therefore adopt password-less solutions, which is where SSH key-based authentication comes in. Python's wide range of applications is the result of a combination of traits that give the language an advantage over others. Among these advantages are its substantial support libraries, its nature as an open-source, community-developed language, and the Python Package Index (PyPI), which enables easy communication between Python and other systems through a variety of modules developed by third-party developers. There are multiple SSH libraries in Python, and this paper examines the pros and cons of each, as well as the time each takes to perform its operations.
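For illustration, a minimal sketch of key-based (password-less) SSH authentication with paramiko, one of the Python SSH libraries such a comparison would cover, plus a simple wall-clock timing in the spirit of the paper's benchmarks; host, user, and key path are placeholders:

```python
# Sketch: password-less SSH login via a private key with paramiko,
# timing the connection with a plain wall-clock measurement.
import os
import time
import paramiko

host, user = 'server.example.com', 'deploy'  # placeholders

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

start = time.perf_counter()
# Authenticate with a private key file instead of a password.
client.connect(host, username=user,
               key_filename=os.path.expanduser('~/.ssh/id_ed25519'))
elapsed = time.perf_counter() - start

stdin, stdout, stderr = client.exec_command('uname -a')
print(stdout.read().decode())
print(f'connect took {elapsed:.3f}s')
client.close()
```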


2019 ◽  
Author(s):  
Mia Partlow ◽  
Karen Ciccone ◽  
Margaret Peak

Presentation given at the TRLN Annual Meeting, Durham, North Carolina, July 1, 2019. The Hunt Library Dataspace was launched in August 2018 to provide students with access to the tools and support they need to develop critical data skills and perform data-intensive tasks. It is outfitted with specialized computing hardware and software and staffed by graduate student Data Science Consultants who provide drop-in support for programming, data analysis, statistical analysis, visualization, and other data-related topics. Prior to launching the Dataspace, the Libraries' Director of Planning and Research worked with the Data & Visualization Services department to develop a plan for assessing the new Dataspace services. The process began with identifying relevant goals based on the strategic priorities of NC State University and the NC State University Libraries. Next, we identified measures that would assess our success in relation to those goals. This talk describes the assessment planning process, the measures and methods employed, the outcomes, and how this information will be used to improve our services and inform new service development.


2019 ◽  
Vol 17 (2) ◽  
pp. 138-152
Author(s):  
I. S. Postanogov ◽  
I. A. Turova

In this paper we discuss how to support the process of creating tools that transform natural language (NL) queries into SPARQL queries (hereinafter referred to as transformation tools). In the introduction, we describe the relevance of the task of understanding natural language queries in information systems, as well as the advantages of using ontologies as a means of representing knowledge for solving this problem. This ontology-based data access approach can also be used in systems that provide a natural language interface to databases. Based on an analysis of the problems related to integrating and testing existing transformation tools, as well as to supporting the creation and testing of one's own transformation modules, we propose the concept of a software platform that simplifies these tasks. The platform architecture satisfies the requirements for ease of connecting third-party transformation tools, reusing individual modules, and integrating the resulting transformation tools into other systems, including testing systems. The building blocks of the created transformation systems are individual transformation modules packaged in Docker containers. Program access to each module is carried out using gRPC. Modules loaded into the platform can be built into the transformation pipeline automatically, or manually using the embedded third-party SciVi data-flow diagram editor. The compatibility of individual modules is controlled by automatic analysis of their application programming interfaces. The resulting pipeline is combined, according to the specified data flow, into a single multi-container application that can be integrated into other systems and tested on extendable test suites. The expected and actual results of the query transformation can be viewed in graphical form in the previously developed visualization tool.
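As an illustrative sketch of calling one containerized transformation module over gRPC: the grpc calls themselves are real library API, but the transform_pb2/transform_pb2_grpc modules, the TransformerStub, and the Transform RPC are hypothetical stand-ins for stubs generated from a module's .proto contract:

```python
# Sketch: one client call to a transformation module running in its
# own Docker container and exposing a gRPC port. Generated-stub names
# below are hypothetical placeholders.
import grpc
import transform_pb2            # hypothetical generated message module
import transform_pb2_grpc      # hypothetical generated stub module

# The container name and port come from the pipeline's compose file.
with grpc.insecure_channel('nl2sparql-module:50051') as channel:
    stub = transform_pb2_grpc.TransformerStub(channel)
    reply = stub.Transform(
        transform_pb2.Query(text='Which rivers flow through Perm?'))
    print(reply.sparql)  # the module's SPARQL rendering of the NL query
```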


2020 ◽  
Vol 318 ◽  
pp. 01041
Author(s):  
Athena Baronos ◽  
Odysseas Manoliadis ◽  
Aristeidis Pavlidis

In today’s world, the design of multiple mailboxes reflects the evolution of logistics in mail delivery, where the postman is no longer required to visit every user. In this research, 3D visualization is used for the design of multiple mailboxes for domestic use. It concerns the 3D design of ergonomic mailbox units for residential blocks and apartment complexes, so that they can be easily manufactured. Among the advantages of this design are the rapid production of ready-made products and of prototypes, which enables testing at the design stage and reduces the time and cost of production. A design produced with 3D CAD can be manufactured with modern machine-tooling methods. In this paper, after an extensive literature review, postal multiple mailboxes are used as a case study in the use of 3D CAD for 3D printing. A methodology is proposed that enables the examination of prototypes at the design stage according to specifications and allows the manufacturing department of a company to prepare the right tools and begin installing production lines. In conclusion, this method gives the advantage of designing the product and supporting the production of scaffolds that can be functionally and ergonomically tested before production is finalized.


2021 ◽  
Vol 11 (4) ◽  
pp. 80-99
Author(s):  
Syed Imran Jami ◽  
Siraj Munir

Recent trends in data-intensive experiments require extensive computing and storage resources that are now handled using cloud resources. Industry experts and researchers use cloud-based services and resources to obtain analytics of their data and to avoid organizational issues, including the power overhead on local machines and the cost of maintaining and running infrastructure. This article provides a detailed review of selected metrics for cloud computing according to the requirements of data science and big data: (1) load balancing, (2) resource scheduling, (3) resource allocation, (4) resource sharing, and (5) job scheduling. The major contribution of this review is the collective treatment of these metrics, which is the first attempt towards evaluating the latest systems in the context of data science. The detailed analysis shows that cloud computing needs further research on its association with data-intensive experiments, with emphasis on resource scheduling.


2020 ◽  
Vol 14 (4) ◽  
pp. 534-546
Author(s):  
Tianyu Li ◽  
Matthew Butrovich ◽  
Amadou Ngom ◽  
Wan Shen Lim ◽  
Wes McKinney ◽  
...  

The proliferation of modern data processing tools has given rise to open-source columnar data formats. These formats help organizations avoid repeated conversion of data to a new format for each application. However, these formats are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) systems. As a result, DBMSs often fail to take advantage of full network bandwidth when transferring data. We aim to reduce or even eliminate this overhead by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format. We introduce relaxations to common analytical data formats to efficiently update records and rely on a lightweight transformation process to convert blocks to a read-optimized layout when they are cold. We also describe how to access data from third-party analytical tools with minimal serialization overhead. We implemented our storage engine based on the Apache Arrow format and integrated it into the NoisePage DBMS to evaluate our work. Our experiments show that our approach achieves comparable performance with dedicated OLTP DBMSs while enabling orders-of-magnitude faster data exports to external data science and machine learning tools than existing methods.
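For illustration (this is not the paper's storage engine, only the consumer-side handoff that an open columnar format enables), a sketch using pyarrow shows how Arrow-format blocks reach external data science tools with minimal serialization:

```python
# Sketch: once a DBMS emits blocks in the Arrow columnar format,
# tools like pandas can consume them with little or no copying.
import pyarrow as pa

# A columnar batch as an engine might emit it: one contiguous
# array per column rather than row-oriented tuples.
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3], type=pa.int64()),
     pa.array([10.5, 20.0, 7.25], type=pa.float64())],
    names=['id', 'balance'])

table = pa.Table.from_batches([batch])
df = table.to_pandas()  # hands columns to pandas (requires pandas installed)
print(df)
```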


Author(s):  
Vasundra Touré ◽  
Steven Vercruysse ◽  
Marcio Luis Acencio ◽  
Ruth C Lovering ◽  
Sandra Orchard ◽  
...  

Abstract
Motivation: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and by automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources.
Results: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information and a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources.
Availability and implementation: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST
Supplementary information: Supplementary data are available at Bioinformatics online.
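As a purely illustrative sketch, a causal statement with MI2CAST-style core fields (source entity, causal effect, target entity) plus contextual detail might be rendered as a nested data structure; the field names and identifiers below are hypothetical examples, not the checklist's normative vocabulary:

```python
# Hypothetical rendering of one causal statement; real MI2CAST
# annotations draw terms from controlled vocabularies and ontologies.
causal_statement = {
    'source':  {'id': 'uniprot:P01106', 'name': 'MYC'},
    'effect':  'up-regulates',          # the regulatory (causal) effect
    'target':  {'id': 'hgnc:1583', 'name': 'CCND1'},
    'context': {                        # optional contextual details
        'mechanism': 'transcriptional regulation',
        'taxon': 'NCBITaxon:9606',      # human
        'evidence': 'pubmed:0000000',   # placeholder reference
    },
}
print(causal_statement['source']['name'],
      causal_statement['effect'],
      causal_statement['target']['name'])
```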

