ScaffoldGraph: an open-source library for the generation and analysis of molecular scaffold networks and scaffold trees

Summary: ScaffoldGraph (SG) is an open-source Python library and command-line tool for the generation and analysis of molecular scaffold networks and trees, with the capability of processing large sets of input molecules. With the increase in high-throughput screening data, scaffold graphs have proven useful for the navigation and analysis of chemical space, being used for visualization, clustering, scaffold-diversity analysis and active-series identification. Built on RDKit and NetworkX, SG integrates scaffold graph analysis into the growing scientific/cheminformatics Python stack, increasing the flexibility and extendibility of the tool compared to existing software. Availability and implementation: SG is freely available and released under the MIT licence at https://github.com/UCLCheminformatics/ScaffoldGraph.

jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

Journal of Cheminformatics ◽

10.1186/1758-2946-3-3 ◽

2011 ◽

Vol 3 (1) ◽

Cited By ~ 51

Author(s):

Georg Hinselmann ◽

Lars Rosenbaum ◽

Andreas Jahn ◽

Nikolas Fechner ◽

Andreas Zell

Keyword(s):

Open Source ◽

Command Line ◽

Chemical Fingerprints ◽

Command Line Tool ◽

Java Library

Finding the molecular scaffold of nuclear receptor inhibitors through high-throughput screening based on proteochemometric modelling

Journal of Cheminformatics ◽

10.1186/s13321-018-0275-x ◽

2018 ◽

Vol 10 (1) ◽

Cited By ~ 2

Author(s):

Tianyi Qiu ◽

Dingfeng Wu ◽

Jingxuan Qiu ◽

Zhiwei Cao

Keyword(s):

Nuclear Receptor ◽

High Throughput ◽

High Throughput Screening ◽

Molecular Scaffold

Evaluation of e-liquid toxicity using an open-source high-throughput screening assay

PLoS Biology ◽

10.1371/journal.pbio.2003904 ◽

2018 ◽

Vol 16 (3) ◽

pp. e2003904 ◽

Cited By ~ 54

Author(s):

M. Flori Sassano ◽

Eric S. Davis ◽

James E. Keating ◽

Bryan T. Zorn ◽

Tavleen K. Kochar ◽

...

Keyword(s):

Open Source ◽

High Throughput ◽

High Throughput Screening ◽

Screening Assay ◽

High Throughput Screening Assay

Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files

Cancer Informatics ◽

10.4137/cin.s26470 ◽

2015 ◽

Vol 14 ◽

pp. CIN.S26470 ◽

Cited By ~ 2

Author(s):

Richard P. Finney ◽

Qing-Rong Chen ◽

Cu V. Nguyen ◽

Chih Hao Hsu ◽

Chunhua Yan ◽

...

Keyword(s):

Graphical User Interface ◽

Reference Genome ◽

Source Code ◽

Software Tool ◽

Command Line ◽

Sequencing Data ◽

Genome Data ◽

Command Line Tool ◽

Portable Software ◽

Microsoft Windows

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command-line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview . The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview .

FAN-C: A Feature-rich Framework for the Analysis and Visualisation of C data

10.1101/2020.02.03.932517 ◽

2020 ◽

Cited By ~ 6

Author(s):

Kai Kruse ◽

Clemens B. Hug ◽

Juan M. Vaquerizas

Keyword(s):

High Throughput ◽

Matrix Analysis ◽

Set Covering ◽

Command Line ◽

Chromosome Conformation ◽

C Storage ◽

Data Formats ◽

Analysis Tools ◽

Command Line Tool ◽

Broad Feature

Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C and its derivatives, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we present FAN-C, an easy-to-use command-line tool and powerful Python API with a broad feature set covering matrix generation, analysis, and visualisation for C-like data (https://github.com/vaquerizaslab/fanc). Due to its comprehensiveness and compatibility with the most prevalent Hi-C storage formats, FAN-C can be used in combination with a large number of existing analysis tools, thus greatly simplifying Hi-C matrix analysis.

ChemSpaX: Exploration of Chemical Space by Automated Functionalization of Molecular Scaffold

10.26434/chemrxiv.14617320 ◽

2021 ◽

Author(s):

Adarsh Kalikadien ◽

Evgeny A. Pidko ◽

Vivek Sinha

Keyword(s):

Force Field ◽

High Throughput ◽

High Throughput Screening ◽

Chemical Space ◽

Space Exploration ◽

Molecular Structures ◽

Data Driven ◽

Pincer Complexes ◽

Cobalt Porphyrin ◽

Input Structure

Local chemical space exploration of an experimentally synthesized material can be done by making slight structural variations of the synthesized material. This generation of many molecular structures with reasonable quality, that resemble an existing (chemical) purposeful material, is needed for high-throughput screening purposes in material design. Large databases of geometry and chemical properties of transition metal complexes are not readily available, although these complexes are widely used in homogeneous catalysis. A Python-based workflow, ChemSpaX, that is aimed at automating local chemical space exploration for any type of molecule, is introduced. The overall computational workflow of ChemSpaX is explained in more detail. ChemSpaX uses 3D information, to place functional groups on an input structure. For example, the input structure can be a catalyst for which one wants to use high-throughput screening to investigate if the catalytic activity can be improved. The newly placed substituents are optimized using a computationally cheap force-field optimization method. After placement of new substituents, higher level optimizations using xTB or DFT instead of force-field optimization are also possible in the current workflow. In representative applications of ChemSpaX, it is shown that the structures generated by ChemSpaX have a reasonable quality for usage in high-throughput screening applications. Representative applications of ChemSpaX are shown by investigating various adducts on functionalized Mn-based pincer complexes, hydrogenation of Ru-based pincer complexes, functionalization of cobalt porphyrin complexes and functionalization of a bipyridyl functionalized cobalt-porphyrin trapped in a M2L4 type cage complex. Descriptors such as the Gibbs free energy of reaction and HOMO-LUMO gap, that can be used in data-driven design and discovery of catalysts, were selected and studied in more detail for the selected use cases. The relatively fast GFN2-xTB method was used to calculate these descriptors and a comparison was done against DFT calculated descriptors. ChemSpaX is open-source and aims to bolster the efforts of the scientific community towards data-driven material discovery.

Computational High-Throughput Screening of Polymeric Photocatalysts: Exploring the Effect of Composition, Sequence Isomerism and Conformational Degrees of Freedom

10.26434/chemrxiv.7314929.v3 ◽

2018 ◽

Author(s):

Isabelle Heath-Apostolopoulos ◽

Liam Wilbraham ◽

Martijn Zwijnenburg

Keyword(s):

High Throughput ◽

High Throughput Screening ◽

Degrees Of Freedom ◽

Chemical Space ◽

Low Cost ◽

Computational Screening ◽

Computational Workflow ◽

Conformational Degrees Of Freedom

We discuss a low-cost computational workflow for the high-throughput screening of polymeric photocatalysts and demonstrate its utility by applying it to a number of challenging problems that would be difficult to tackle otherwise. Specifically, we show how having access to a low-cost method allows one to screen a vast chemical space, as well as to probe the effects of conformational degrees of freedom and sequence isomerism. Finally, we discuss both the opportunities and the biggest challenges of computational screening in the search for polymer photocatalysts.

Doclass: open-source software to support document labeling and classification

10.5753/kdmile.2020.11965 ◽

2020 ◽

Author(s):

Marcelo Inuzuka ◽

Hugo Do Nascimento ◽

Fernando Almeida ◽

Bruno Barros ◽

Walid Jradi

Keyword(s):

Active Learning ◽

Open Source ◽

Open Source Software ◽

Design Science ◽

Text Processing ◽

Science Research ◽

Development Stage ◽

Large Sets ◽

Rest Api ◽

Future Work

This article introduces Doclass, free and open-source Web software that aims to assist in labeling and classifying large sets of documents. The research followed a design science research methodology, guided by the real demands of a legal text processing company. The architecture, several design decisions and the current development stage of the software are presented. Preliminary user experiments for evaluating interactive document labeling are described. As a result, the first version of a system with an architecture composed of a mobile frontend that communicates with a backend through a REST API was published, with satisfactory performance evaluation by the applicant. Other results involve the use of active learning techniques to reduce human effort when classifying documents, as well as the Uncertainty strategy to choose the document to be labeled. The effectiveness of the stop criterion for the active learning technique based on confidence level was tested and proved unsatisfactory, and remains as future work.
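
The uncertainty strategy and the confidence-based stop criterion mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a binary classifier that outputs one positive-class probability per unlabeled document; the function names, the threshold, and the example scores are hypothetical and are not taken from the Doclass code base.

```python
from typing import Dict


def pick_most_uncertain(probabilities: Dict[str, float]) -> str:
    """Return the id of the document whose predicted positive-class
    probability is closest to 0.5 (the least confident prediction)."""
    return min(probabilities, key=lambda doc_id: abs(probabilities[doc_id] - 0.5))


def should_stop(probabilities: Dict[str, float],
                confidence_threshold: float = 0.9) -> bool:
    """Assumed stop rule: stop asking for labels once every remaining
    document is predicted with at least the given confidence."""
    return all(max(p, 1.0 - p) >= confidence_threshold
               for p in probabilities.values())


# Hypothetical classifier outputs for three unlabeled documents.
scores = {"doc_a": 0.97, "doc_b": 0.55, "doc_c": 0.12}

next_doc = pick_most_uncertain(scores)  # document to show the annotator next
keep_labeling = not should_stop(scores)
print(next_doc, keep_labeling)
```

In this sketch the annotator is always shown the document the model is least sure about. The abstract reports that a confidence-based stop criterion did not perform satisfactorily, so the stop rule here is only an illustration of the idea, not a recommendation.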

Emerging Trends of Big Data in Cloud Computing

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-6673-2.ch003 ◽

2021 ◽

pp. 38-55

Author(s):

Poonam Nandal ◽

Deepa Bura ◽

Meeta Singh

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Software Framework ◽

Effective Solution ◽

Apache Hadoop ◽

Large Sets ◽

Exponential Increase ◽

Emerging Trends

In today's world, where data is accumulating at an ever-increasing rate, processing this big data has become a necessity. This requires tools for processing and analyzing the data in order to obtain meaningful results. Many tools on the market can be used for processing big data, but the main focus of this chapter is Apache Hadoop, an open-source software framework that can be efficiently deployed for processing, storing, and analyzing large sets of data and for producing meaningful insights from them. If the exponential increase of data is a processing challenge, then Hadoop can be considered one of the effective solutions for processing, managing, analyzing, and storing this big data. Hadoop versions and components are also illustrated in a later section of the chapter. This chapter mainly focuses on the techniques, methodology, and components adopted by the Apache Hadoop software framework for big data processing.

Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data

Bioinformatics ◽

10.1093/bioinformatics/btaa070 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3263-3265 ◽

Cited By ~ 14

Author(s):

Lucas Czech ◽

Pierre Barbera ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Trees ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Computationally Efficient ◽

Data Types ◽

Low Level ◽

Phylogenetic Placement ◽

Command Line Tool ◽

High Level

Summary: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information: Supplementary data are available at Bioinformatics online.
