pyKVFinder: an efficient and integrable Python package for biomolecular cavity detection and characterization in data science

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Victor da Silva Guerra ◽  
Helder Veras Ribeiro-Filho ◽  
Gabriel Ernesto Jara ◽  
Leandro Oliveira Bortot ◽  
José Geraldo de Carvalho Pereira ◽  
...  

Abstract
Background: Biomolecular interactions that modulate biological processes occur mainly in cavities on the surface of biomolecular structures. In the data science era, structural biology has benefited from the increasing availability of biostructural data due to advances in structural determination and computational methods. In this scenario, data-intensive cavity analysis demands efficient scripting routines built on easily manipulated data structures. To fulfill this need, we developed pyKVFinder, a Python package to detect and characterize cavities in biomolecular structures for data science and automated pipelines.
Results: pyKVFinder efficiently detects cavities in biomolecular structures and computes their volume, area, depth and hydropathy, storing these cavity properties in NumPy arrays. Benefiting from the interoperability and data structures of the Python ecosystem, pyKVFinder can be integrated with third-party scientific packages and libraries for mathematical calculations, machine learning and 3D visualization in automated workflows. As a proof of pyKVFinder’s capabilities, we successfully identified and compared the ADRP substrate-binding site of SARS-CoV-2 and a set of homologous proteins with pyKVFinder, showing its integrability with data science packages such as matplotlib, NGL Viewer, SciPy and Jupyter notebooks.
Conclusions: We introduce efficient, highly versatile and easily integrable software for detecting and characterizing biomolecular cavities in data science applications and automated protocols. pyKVFinder facilitates biostructural data analysis with scripting routines in the Python ecosystem and can serve as a building block for data science and drug design applications.
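As a minimal sketch of the workflow this abstract describes — assuming pyKVFinder's documented high-level run_workflow entry point and result attributes, whose exact names may differ between versions — cavity detection and characterization reduce to a few lines:

```python
# Minimal sketch: detect cavities in a PDB file with pyKVFinder's
# high-level workflow and inspect the per-cavity properties it stores
# in plain Python/NumPy data structures. 'protein.pdb' is a placeholder.
import pyKVFinder

results = pyKVFinder.run_workflow('protein.pdb')

# Cavity properties are exposed as dictionaries keyed by cavity tag
# (e.g. 'KAA', 'KAB'), so they drop straight into SciPy/pandas code.
for tag in results.volume:
    print(tag, results.volume[tag], results.area[tag])

# The cavity grid itself is a NumPy array, ready for third-party
# analysis or 3D visualization tools.
print(type(results.cavities), results.cavities.shape)
```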

2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

Abstract
As biological sequences, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences in standard workflows. Here, we present glycowork, an open-source Python package designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
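A hedged sketch of the motif-annotation step this abstract mentions; the annotate_dataset function and its location follow the glycowork documentation but should be treated as assumptions that may vary across versions, and the glycan strings are illustrative IUPAC-condensed examples:

```python
# Sketch only: annotate glycan motifs across a small dataset with
# glycowork. Function name/signature are assumptions from the docs.
from glycowork.motif.annotate import annotate_dataset

glycans = ['Neu5Ac(a2-3)Gal(b1-4)Glc',
           'Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man(b1-4)GlcNAc']

# Expected to return a DataFrame of motif counts per glycan, which can
# feed directly into heatmaps or statistical enrichment analyses.
motif_table = annotate_dataset(glycans)
print(motif_table.head())
```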


Author(s):  
Shaveta Bhatia

The era of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained wide popularity, but it also raises many challenges for the security and privacy of the data. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and the appropriate methods to counter them in the fields of biomedical research, cyber security, and cloud computing.


Author(s):  
Melvin A. Eisenberg

Chapter 13 concerns the building blocks of formulas to measure expectation damages: replacement cost, market price, resale price, diminished value, and lost profits. Replacement-cost damages are based on the difference between the contract price and the actual or imputed cost of a replacement transaction. Resale-price damages are based on the difference between the contract price payable by a breaching buyer and the price the seller received on resale to a third party. Diminished-value damages are based on the difference between the value of the performance that a breaching seller rendered and the value of the performance that she promised to render. Lost-profit damages are based on the difference between the price a breaching buyer agreed to pay and the seller’s variable costs.
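Rendered schematically in LaTeX (a sketch of the chapter's verbal definitions, not the book's own notation; here \(P_c\) denotes the contract price):

```latex
% Schematic rendering of the four damage measures; P_c is the
% contract price. Signs follow the verbal definitions above.
\begin{align*}
D_{\text{replacement cost}} &= C_{\text{replacement}} - P_c\\
D_{\text{resale price}}     &= P_c - P_{\text{resale}}\\
D_{\text{diminished value}} &= V_{\text{promised}} - V_{\text{rendered}}\\
D_{\text{lost profit}}      &= P_c - C_{\text{variable}}
\end{align*}
```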


2021 ◽  
Vol 23 (06) ◽  
pp. 868-873
Author(s):  
Sonali Karki ◽  
Dr. Kiran V

The business industry is evolving. Enterprises have begun a digital transformation path, adopting innovative technologies that enable them to move quickly and change how they cooperate, lowering costs and improving productivity. However, as a result of these technologies, the conventional perimeter has evaporated, and identity has become the new line of defense. New security concerns necessitate modern security measures. Passwords are no longer appropriate for authenticating privileged access to mission-critical assets: they are notoriously insecure, cause password fatigue, and give the user a false sense of security. Enterprises must therefore adopt password-less solutions, which is where SSH key-based authentication comes in. Python's wide range of applications is the result of a combination of traits that give the language an advantage over others. Among these advantages are its substantial support libraries, its nature as an open-source, community-developed language, and the Python Package Index (PyPI), which enables easy communication between Python and other systems through a variety of modules developed by third-party developers. There are multiple SSH libraries in Python, and this paper examines the pros and cons of each, as well as the time each takes to perform its operations.
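For illustration, a minimal sketch of key-based (password-less) SSH authentication with paramiko, one of the Python SSH libraries such a comparison would cover, plus a simple wall-clock timing in the spirit of the paper's benchmarks; host, user, and key path are placeholders:

```python
# Sketch: password-less SSH login via a private key with paramiko,
# timing the connection with a plain wall-clock measurement.
import os
import time
import paramiko

host, user = 'server.example.com', 'deploy'  # placeholders

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

start = time.perf_counter()
# Authenticate with a private key file instead of a password.
client.connect(host, username=user,
               key_filename=os.path.expanduser('~/.ssh/id_ed25519'))
elapsed = time.perf_counter() - start

stdin, stdout, stderr = client.exec_command('uname -a')
print(stdout.read().decode())
print(f'connect took {elapsed:.3f}s')
client.close()
```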


2019 ◽  
Author(s):  
Mia Partlow ◽  
Karen Ciccone ◽  
Margaret Peak

Presentation given at the TRLN Annual Meeting, Durham, North Carolina, July 1, 2019. The Hunt Library Dataspace was launched in August 2018 to provide students with access to the tools and support they need to develop critical data skills and perform data-intensive tasks. It is outfitted with specialized computing hardware and software and staffed by graduate student Data Science Consultants who provide drop-in support for programming, data analysis, statistical analysis, visualization, and other data-related topics. Prior to launching the Dataspace, the Libraries' Director of Planning and Research worked with the Data & Visualization Services department to develop a plan for assessing the new Dataspace services. The process began with identifying relevant goals based on the strategic priorities of NC State University and the NC State University Libraries. Next, we identified measures that would assess our success in relation to those goals. This talk describes the assessment planning process, the measures and methods employed, the outcomes, and how this information will be used to improve our services and inform new service development.


2019 ◽  
Vol 17 (2) ◽  
pp. 138-152
Author(s):  
I. S. Postanogov ◽  
I. A. Turova

In this paper we discuss how to support the process of creating tools that transform natural language (NL) queries into SPARQL queries (hereinafter referred to as transformation tools). In the introduction, we describe the relevance of the task of understanding natural language queries in information systems, as well as the advantages of using ontologies as a means of representing knowledge for solving this problem. This ontology-based data access approach can also be used in systems that provide a natural language interface to databases. Based on an analysis of the problems related to integrating and testing existing transformation tools, as well as to supporting the creation and testing of one's own transformation modules, we propose the concept of a software platform that simplifies these tasks. The platform architecture satisfies the requirements for ease of connecting third-party transformation tools, reusing individual modules, and integrating the resulting transformation tools into other systems, including testing systems. The building blocks of the created transformation systems are individual transformation modules packaged in Docker containers. Program access to each module is carried out using gRPC. Modules loaded into the platform can be built into the transformation pipeline automatically, or manually using the embedded third-party SciVi data-flow diagram editor. The compatibility of individual modules is controlled by automatic analysis of their application programming interfaces. The resulting pipeline is combined, according to the specified data flow, into a single multi-container application that can be integrated into other systems and tested on extendable test suites. The expected and actual results of the query transformation can be viewed in graphical form in the previously developed visualization tool.
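As an illustrative sketch of calling one containerized transformation module over gRPC: the grpc calls themselves are real library API, but the transform_pb2/transform_pb2_grpc modules, the TransformerStub, and the Transform RPC are hypothetical stand-ins for stubs generated from a module's .proto contract:

```python
# Sketch: one client call to a transformation module running in its
# own Docker container and exposing a gRPC port. Generated-stub names
# below are hypothetical placeholders.
import grpc
import transform_pb2            # hypothetical generated message module
import transform_pb2_grpc      # hypothetical generated stub module

# The container name and port come from the pipeline's compose file.
with grpc.insecure_channel('nl2sparql-module:50051') as channel:
    stub = transform_pb2_grpc.TransformerStub(channel)
    reply = stub.Transform(
        transform_pb2.Query(text='Which rivers flow through Perm?'))
    print(reply.sparql)  # the module's SPARQL rendering of the NL query
```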


2020 ◽  
Vol 318 ◽  
pp. 01041
Author(s):  
Athena Baronos ◽  
Odysseas Manoliadis ◽  
Aristeidis Pavlidis

In today’s world, the design of multiple mailboxes reflects the evolution of logistics in mail delivery, where the postman is no longer required to visit every user. In this research, 3D visualization is used for the design of multiple mailboxes for domestic use. It concerns the 3D design of ergonomic mailbox units for residential blocks and apartment complexes, so that they can be easily manufactured. Among the advantages of this design are the rapid production of ready-made products and of prototypes, which enables testing at the design stage and reduces the time and cost of production. A design produced with 3D CAD can be manufactured with modern machine-tooling methods. In this paper, after an extensive literature review, postal multiple mailboxes are used as a case study in the use of 3D CAD for 3D printing. A methodology is proposed that enables the examination of prototypes at the design stage according to specifications and allows the manufacturing department of a company to prepare the right tools and begin installing production lines. In conclusion, this method gives the advantage of designing the product and supporting the production of scaffolds that can be functionally and ergonomically tested before production is finalized.


2021 ◽  
Vol 11 (4) ◽  
pp. 80-99
Author(s):  
Syed Imran Jami ◽  
Siraj Munir

Recent trends in data-intensive experiments require extensive computing and storage resources that are now handled using cloud resources. Industry experts and researchers use cloud-based services and resources to obtain analytics of their data and to avoid organizational issues, including the power overhead on local machines and the cost of maintaining and running infrastructure. This article provides a detailed review of selected metrics for cloud computing according to the requirements of data science and big data: (1) load balancing, (2) resource scheduling, (3) resource allocation, (4) resource sharing, and (5) job scheduling. The major contribution of this review is the collective treatment of these metrics, which is the first attempt towards evaluating the latest systems in the context of data science. The detailed analysis shows that cloud computing needs further research on its association with data-intensive experiments, with emphasis on resource scheduling.


2020 ◽  
Vol 14 (4) ◽  
pp. 534-546
Author(s):  
Tianyu Li ◽  
Matthew Butrovich ◽  
Amadou Ngom ◽  
Wan Shen Lim ◽  
Wes McKinney ◽  
...  

The proliferation of modern data processing tools has given rise to open-source columnar data formats. These formats help organizations avoid repeated conversion of data to a new format for each application. However, these formats are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) systems. As a result, DBMSs often fail to take advantage of full network bandwidth when transferring data. We aim to reduce or even eliminate this overhead by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format. We introduce relaxations to common analytical data formats to efficiently update records and rely on a lightweight transformation process to convert blocks to a read-optimized layout when they are cold. We also describe how to access data from third-party analytical tools with minimal serialization overhead. We implemented our storage engine based on the Apache Arrow format and integrated it into the NoisePage DBMS to evaluate our work. Our experiments show that our approach achieves comparable performance with dedicated OLTP DBMSs while enabling orders-of-magnitude faster data exports to external data science and machine learning tools than existing methods.
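For illustration (this is not the paper's storage engine, only the consumer-side handoff that an open columnar format enables), a sketch using pyarrow shows how Arrow-format blocks reach external data science tools with minimal serialization:

```python
# Sketch: once a DBMS emits blocks in the Arrow columnar format,
# tools like pandas can consume them with little or no copying.
import pyarrow as pa

# A columnar batch as an engine might emit it: one contiguous
# array per column rather than row-oriented tuples.
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3], type=pa.int64()),
     pa.array([10.5, 20.0, 7.25], type=pa.float64())],
    names=['id', 'balance'])

table = pa.Table.from_batches([batch])
df = table.to_pandas()  # hands columns to pandas (requires pandas installed)
print(df)
```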


Author(s):  
Vasundra Touré ◽  
Steven Vercruysse ◽  
Marcio Luis Acencio ◽  
Ruth C Lovering ◽  
Sandra Orchard ◽  
...  

Abstract
Motivation: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and by automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources.
Results: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information and a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources.
Availability and implementation: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST
Supplementary information: Supplementary data are available at Bioinformatics online.
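As a purely illustrative sketch, a causal statement with MI2CAST-style core fields (source entity, causal effect, target entity) plus contextual detail might be rendered as a nested data structure; the field names and identifiers below are hypothetical examples, not the checklist's normative vocabulary:

```python
# Hypothetical rendering of one causal statement; real MI2CAST
# annotations draw terms from controlled vocabularies and ontologies.
causal_statement = {
    'source':  {'id': 'uniprot:P01106', 'name': 'MYC'},
    'effect':  'up-regulates',          # the regulatory (causal) effect
    'target':  {'id': 'hgnc:1583', 'name': 'CCND1'},
    'context': {                        # optional contextual details
        'mechanism': 'transcriptional regulation',
        'taxon': 'NCBITaxon:9606',      # human
        'evidence': 'pubmed:0000000',   # placeholder reference
    },
}
print(causal_statement['source']['name'],
      causal_statement['effect'],
      causal_statement['target']['name'])
```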

