EnTrance: Exploration of Entropy Scaling Ball Cover Search in Protein Sequences

Computational biological sequence alignment has received increasing attention as technology develops. Predicting whether a novel DNA sequence is potentially dangerous requires determining its taxonomic identity and functional characteristics through sequence identification. This task is facilitated by the rapidly increasing amounts of biological data in DNA and protein databases, made possible by the corresponding decrease in computational and storage costs. Unfortunately, the growth of biological databases has made this information difficult to exploit. EnTrance presents an approach that expedites the analysis of such large databases by employing entropy scaling, so that search cost scales with the amount of entropy in the database instead of with its absolute size. Since DNA and protein sequences are biologically meaningful, the space of biological sequences exhibits the structure that entropy scaling exploits. As biological sequence databases grow, taking advantage of this structure can be extremely beneficial for reducing query times. EnTrance, the entropy scaling search algorithm introduced here, accelerates the kind of biological sequence search exemplified by tools such as BLAST. It does this with a two-step search: EnTrance first quickly reduces the number of potential matches and then searches the remaining sequences more exhaustively. Tests of EnTrance show that this approach can lead to improved query times. However, constructing the required entropy scaling indices beforehand can be challenging. To improve performance, EnTrance investigates several ideas for accelerating the build time of the index that supports entropy scaling searches. In particular, EnTrance makes full use of the concurrency features of the Go language, greatly reducing the index build time. Our results identify key tradeoffs and demonstrate that there is potential in using these techniques for sequence similarity searches. Finally, EnTrance returns more matches and higher percentage identity matches when compared with existing tools.
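The two-step search described in this abstract can be illustrated with a minimal sketch. It is not EnTrance's implementation: the Hamming distance on equal-length strings, the greedy clustering, and the names `build_cover` and `search` are all assumptions made for illustration. The coarse step keeps only clusters whose center lies within `r + rc` of the query (the triangle inequality guarantees no hit is lost), and the fine step checks the members of the surviving clusters.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_cover(seqs: List[str], rc: int) -> Dict[str, List[str]]:
    """Greedy ball cover (illustrative): each sequence joins the first
    center within distance rc, otherwise it becomes a new center."""
    cover: Dict[str, List[str]] = {}
    for s in seqs:
        for center in cover:
            if hamming(s, center) <= rc:
                cover[center].append(s)
                break
        else:
            cover[s] = [s]
    return cover


def search(cover: Dict[str, List[str]], query: str, r: int, rc: int) -> List[str]:
    """Return all indexed sequences within distance r of the query."""
    hits: List[str] = []
    for center, members in cover.items():
        # Coarse step: a cluster whose center is farther than r + rc
        # from the query cannot contain any sequence within r.
        if hamming(query, center) <= r + rc:
            # Fine step: exhaustive check inside the surviving cluster.
            hits.extend(m for m in members if hamming(query, m) <= r)
    return hits
```

With few, tight clusters the coarse step discards most of the database, which is the intuition behind scaling with the structure of the data rather than its size.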


A Modified Niching Crow Search Approach to Well Placement Optimization

Energies ◽

10.3390/en14040857 ◽

2021 ◽

Vol 14 (4) ◽

pp. 857

Author(s):

Jahedul Islam ◽

Md Shokor A. Rahaman ◽

Pandian M. Vasant ◽

Berihun Mamo Negash ◽

Ahshanul Hoqe ◽

...

Keyword(s):

Optimization Problem ◽

Search Algorithm ◽

Gravitational Search Algorithm ◽

Multimodal Optimization ◽

Test Functions ◽

Well Placement ◽

Suggested Approach ◽

Placement Optimization ◽

Well Placement Optimization ◽

Search Approach

Well placement optimization is considered a non-convex and highly multimodal optimization problem. This article proposes a modified crow search algorithm (CSA) to tackle it, with modifications based on local search and niching techniques. The suggested approach is first verified on benchmark test functions, where it demonstrated a higher convergence rate and a better solution. Its performance is then evaluated on the well placement optimization problem and compared with particle swarm optimization (PSO), the Gravitational Search Algorithm (GSA), and the original CSA. The outcomes of the study revealed that the niching crow search algorithm is the most efficient and effective of these techniques.


Multi-Resolution SPH Simulation of a Laser Powder Bed Fusion Additive Manufacturing Process

Applied Sciences ◽

10.3390/app11072962 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2962

Author(s):

Mohamadreza Afrasiabi ◽

Christof Lüthi ◽

Markus Bambach ◽

Konrad Wegener

Keyword(s):

Search Algorithm ◽

Computational Effort ◽

Powder Bed Fusion ◽

Laser Powder Bed Fusion ◽

Powder Bed ◽

Neighbor Search ◽

Particle Hydrodynamics ◽

Sph Simulation ◽

Particle Refinement ◽

Search Approach

This paper presents an efficient mesoscale simulation of a Laser Powder Bed Fusion (LPBF) process using the Smoothed Particle Hydrodynamics (SPH) method. The efficiency lies in reducing the computational effort via spatial adaptivity, for which a dynamic particle refinement pattern with an optimized neighbor-search algorithm is used. The melt pool dynamics are modeled by resolving the thermal, mechanical, and material fields in a single laser track application. After validating the solver on two benchmark tests where analytical and experimental data are available, we simulate a single-track LPBF process by adopting SPH at multiple resolutions. The LPBF simulation results show that the proposed adaptive refinement with and without an optimized neighbor-search approach saves almost 50% and 35% of the SPH calculation time, respectively. This achievement enables several opportunities for parametric studies and running high-resolution models with less computational effort.


Machine learning can differentiate venom toxins from other proteins having non-toxic physiological functions

PeerJ Computer Science ◽

10.7717/peerj-cs.90 ◽

2016 ◽

Vol 2 ◽

pp. e90 ◽

Cited By ~ 24

Author(s):

Ranko Gacesa ◽

David J. Barlow ◽

Paul F. Long

Keyword(s):

Machine Learning ◽

Sequence Data ◽

Biological Data ◽

Biological Databases ◽

Web Based ◽

Physiological Functions ◽

Link Type ◽

Venom Toxins ◽

Venomous Animals ◽

Toxin Protein

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic as there are no universally accepted methods by which to attribute toxin function using sequence data alone. Bioinformatics tools that do exist are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy and compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation, allowing placement of a toxin into the most appropriate toxin protein family, or relates it to a non-toxic protein having the closest homology, giving enhanced curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).


THE TYPES OF BIOLOGICAL DATABASES (BIOBANKS)

Vestnik Tomskogo gosudarstvennogo universiteta Pravo ◽

10.17223/22253513/41/12 ◽

2021 ◽

pp. 136-148

Author(s):

Elena S. Boltanova ◽

Maria P. Imekova

Keyword(s):

Biological Samples ◽

Local Level ◽

Legal Regulation ◽

Biological Data ◽

Biological Information ◽

Biological Databases ◽

Legal Regime ◽

Eu Countries ◽

Legal Method ◽

The Eu

Biological databases of various kinds have been created around the world. Initially, databases for the investigation of crimes were the most widespread. Later, once their potential and benefits, including for medicine, were assessed, databases for other areas appeared. Russia was no exception in this regard. In Russia, however, unlike in foreign states, the activities of biological databases established for purposes other than solving crimes are practically unregulated. This article analyzes the legal regulation of biobanks in the Russian Federation and abroad. Special attention is paid to the classification of biobanks. The purpose of the study is to determine the feasibility of legislative regulation of their activities, as well as the patterns in such regulation. To achieve this goal, the authors studied extensive regulatory material, which included EU directives and national regulations of the EU member states. The methodological basis of the study was general scientific and specific scientific research methods; in particular, the comparative legal method and the formal legal method were widely used. The comparative legal analysis establishes that the EU countries have a high level of legislative activity in determining the legal regime of biological databases. All countries recognize the specifics of such a legal regime, which can largely be explained by the special legal nature of biological samples and biological data.
In this regard, the following issues related to the activities of biological databases are addressed at the level of law throughout the EU countries: the procedure for their creation; the procedure for receiving, processing, storing and transmitting biological samples and the data obtained from them; the rights and obligations of database creators and of persons who have provided their biological samples and biological data about themselves; a set of measures aimed at protecting the rights and interests of donors and third parties, etc. In the authors' view, Russia should implement a similar approach to regulating the activities of biological databases established for purposes other than the investigation of crimes. At the same time, special attention should be paid to research biological databases. In the Russian Federation, they are created, as a rule, at the local level. Their main drawback is that they are separate sources of limited biological information, functioning independently of each other, whereas comprehensive information (concentrated in one place) can bring invaluable benefits and advantages for Russian science and medicine as a whole. However, this requires the establishment of an appropriate legal framework.


A Reinforced Tabu Search Approach for 2D Strip Packing

Modeling, Analysis, and Applications in Metaheuristic Computing ◽

10.4018/978-1-4666-0270-0.ch011 ◽

2012 ◽

pp. 171-188

Author(s):

Giglia Gómez-Villouta ◽

Jean-Philippe Hamiez ◽

Jin-Kao Hao

Keyword(s):

Tabu Search ◽

Search Algorithm ◽

Fitness Function ◽

Packing Problem ◽

Two Dimensional ◽

Strip Packing ◽

Benchmark Instances ◽

Finite Set ◽

Infinite Height ◽

Search Approach

This paper discusses a particular “packing” problem, namely the two-dimensional strip packing problem, where a finite set of objects has to be placed in a strip of fixed width and infinite height. The variant studied considers regular items, rectangular to be precise, that must be packed without overlap and without rotation. The objective is to minimize the height of the resulting packing. In this regard, the authors present a local search algorithm based on the well-known tabu search metaheuristic. Two important components of the presented tabu search strategy are reinforced in an attempt to include problem knowledge. The fitness function incorporates a measure related to the empty spaces, while the diversification relies on a set of historically “frozen” objects. The resulting reinforced tabu search approach is evaluated on a set of well-known hard benchmark instances and compared with state-of-the-art algorithms.


A New Approach for DNA Sequence Similarity Analysis based on Triplets of Nucleic Acid Bases

International Journal of Nanotechnology and Molecular Computation ◽

10.4018/978-1-60960-064-8.ch006 ◽

2010 ◽

Vol 2 (4) ◽

pp. 1-11

Author(s):

Dan Wei ◽

Qingshan Jiang ◽

Sheng Li

Keyword(s):

Nucleic Acid ◽

Dna Sequence ◽

Dna Sequences ◽

Sequence Similarity ◽

Similarity Analysis ◽

Biological Sequence ◽

Nucleic Acid Bases ◽

New Approach ◽

Characteristic Distribution ◽

Sequence Similarity Analysis

Similarity analysis of DNA sequences is a fundamental research area in Bioinformatics. The characteristic distribution of L-tuple, which is the tuple of length L, reflects the valuable information contained in a biological sequence and thus may be used in DNA sequence similarity analysis. However, similarity analysis based on characteristic distribution of L-tuple is not effective for the comparison of highly conservative sequences. In this paper, a new similarity measurement approach based on Triplets of Nucleic Acid Bases (TNAB) is introduced for DNA sequence similarity analysis. The new approach characterizes both the content feature and position feature of a DNA sequence using the frequency and position of occurrence of TNAB in the sequence. The experimental results show that the approach based on TNAB is effective for analysing DNA sequence similarity.
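As a rough illustration of characterizing a sequence by the frequency and position of its nucleotide triplets, the sketch below builds a feature vector and compares two sequences. The abstract does not give the paper's definitions, so the specific choices here (overlapping triplets, relative frequency, normalized mean position, Euclidean distance) and the names `triplet_features` and `triplet_distance` are assumptions, not the TNAB measure itself.

```python
from itertools import product
from math import sqrt
from typing import List

# All 64 possible triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_features(seq: str) -> List[float]:
    """For each triplet: relative frequency and normalized mean position
    of its (overlapping) occurrences. Assumed feature definitions."""
    n = len(seq) - 2  # number of overlapping triplet windows
    if n <= 0:
        raise ValueError("sequence must contain at least 3 bases")
    positions = {t: [] for t in TRIPLETS}
    for i in range(n):
        window = seq[i:i + 3]
        if window in positions:  # skip windows with non-ACGT symbols
            positions[window].append(i + 1)  # 1-based start position
    features: List[float] = []
    for t in TRIPLETS:
        occ = positions[t]
        freq = len(occ) / n
        mean_pos = (sum(occ) / len(occ)) / n if occ else 0.0
        features.extend([freq, mean_pos])
    return features


def triplet_distance(a: str, b: str) -> float:
    """Euclidean distance between the two feature vectors."""
    fa, fb = triplet_features(a), triplet_features(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Including a position component means two sequences with the same triplet counts but a different arrangement still receive a non-zero distance, which a frequency-only profile cannot capture.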


Influence of Genomic and Other Biological Data Sets in the Understanding of Protein Structures, Functions and Interactions

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2011010102 ◽

2011 ◽

Vol 2 (1) ◽

pp. 24-44

Author(s):

N. Srinivasan ◽

G. Agarwal ◽

R. M. Bhaskara ◽

R. Gadkari ◽

O. Krishnadev ◽

...

Keyword(s):

Protein Structures ◽

Biological Properties ◽

Biological Data ◽

Biological Information ◽

Biological Databases ◽

Homology Detection ◽

Putative Gene ◽

Remote Homology ◽

Strategic Integration ◽

Remote Homology Detection

In the post-genomic era, biological databases are growing at a tremendous rate. Despite rapid accumulation of biological information, functions and other biological properties of many putative gene products of various organisms remain either unknown or obscure. This paper examines how strategic integration of large biological databases and combinations of various kinds of biological information help address some of the fundamental questions on protein structure, function and interactions. New developments in function recognition by remote homology detection and strategic use of sequence databases aid recognition of functions of newly discovered proteins. Knowledge of 3-D structures and combined use of sequences and 3-D structures of homologous protein domains expand the ability of remote homology detection enormously. The authors also demonstrate how combined consideration of functions of individual domains of multi-domain proteins helps in recognizing gross biological attributes. This paper also discusses a few cases in which combining disparate biological datasets or disparate biological information yields new insights about protein-protein interactions across a host and a pathogen. Finally, the authors discuss how combining low-resolution structural data on gigantic multi-component assemblies, obtained using cryoEM studies, with atomic-level 3-D structures of the components is effective in inferring finer features of the assembly.


BLASTing away preconceptions in crystallization trials

Acta Crystallographica Section F Structural Biology Communications ◽

10.1107/s2053230x19000141 ◽

2019 ◽

Vol 75 (3) ◽

pp. 184-192 ◽

Cited By ~ 9

Author(s):

Gabriel Jan Abrahams ◽

Janet Newman

Keyword(s):

Protein Data Bank ◽

Sequence Similarity ◽

Three Dimensional ◽

Protein Sequences ◽

Protein Molecule ◽

Data Bank ◽

Dimensional Structure ◽

Critical Step ◽

Quantitative Verification ◽

Better Than

Crystallization is in many cases a critical step for solving the three-dimensional structure of a protein molecule. Determining which set of chemicals to use in the initial screen is typically agnostic of the protein under investigation; however, crystallization efficiency could potentially be improved if this were not the case. Previous work has assumed that sequence similarity may provide useful information about appropriate crystallization cocktails; however, the authors are not aware of any quantitative verification of this assumption. This research investigates whether, given current information, one can detect any correlation between sequence similarity and crystallization cocktails. BLAST was used to quantitate the similarity between protein sequences in the Protein Data Bank, and this was compared with three estimations of the chemical similarities of the respective crystallization cocktails. No correlation was detected between proteins of similar (but not identical) sequence and their crystallization cocktails, suggesting that methods of determining screens based on this assumption are unlikely to result in screens that are better than those currently in use.


On Local Search for Bi-objective Knapsack Problems

Evolutionary Computation ◽

10.1162/evco_a_00074 ◽

2013 ◽

Vol 21 (1) ◽

pp. 179-196 ◽

Cited By ~ 9

Author(s):

Arnaud Liefooghe ◽

Luís Paquete ◽

José Rui Figueira

Keyword(s):

Experimental Study ◽

Local Search ◽

Search Algorithm ◽

Exact Algorithms ◽

Knapsack Problems ◽

Optimal Solutions ◽

Local Search Algorithm ◽

Property A ◽

Total Profit ◽

Search Approach

In this article, a local search approach is proposed for three variants of the bi-objective binary knapsack problem, with the aim of maximizing the total profit and minimizing the total weight. First, an experimental study on a given structural property of connectedness of the efficient set is conducted. Based on this property, a local search algorithm is proposed and its performance is compared to exact algorithms in terms of runtime and quality metrics. The experimental results indicate that this simple local search algorithm is able to find a representative set of optimal solutions in most of the cases, and in much less time than exact algorithms.
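The idea of reaching efficient solutions by moving between neighboring ones can be sketched as a small Pareto local search. This is an illustrative sketch only: the single-item flip neighborhood, the start from the empty knapsack, and the names `evaluate`, `dominates` and `pareto_local_search` are assumptions, and the article's actual variants and neighborhood are not described in the abstract.

```python
from typing import Dict, List, Tuple

Solution = Tuple[int, ...]
Objectives = Tuple[int, int]  # (total profit, total weight)


def evaluate(x: Solution, profits: List[int], weights: List[int]) -> Objectives:
    """Total profit (to maximize) and total weight (to minimize)."""
    profit = sum(p for p, bit in zip(profits, x) if bit)
    weight = sum(w for w, bit in zip(weights, x) if bit)
    return (profit, weight)


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b: profit no lower, weight no higher, and a differs from b."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_local_search(profits: List[int], weights: List[int]) -> Dict[Solution, Objectives]:
    """Explore one-item flips, keeping an archive of non-dominated solutions."""
    n = len(profits)
    start: Solution = (0,) * n  # empty knapsack: weight 0, never dominated
    archive = {start: evaluate(start, profits, weights)}
    unexplored = [start]
    while unexplored:
        x = unexplored.pop()
        if x not in archive:
            continue  # x was dominated and removed before being explored
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if y in archive:
                continue
            fy = evaluate(y, profits, weights)
            # Reject y if an archived point dominates it or has equal objectives.
            if any(dominates(f, fy) or f == fy for f in archive.values()):
                continue
            # Drop archived solutions that y dominates, then keep y.
            for z in [z for z, f in archive.items() if dominates(fy, f)]:
                del archive[z]
            archive[y] = fy
            unexplored.append(y)
    return archive
```

A search of this kind can only find efficient solutions that are reachable through the chosen neighborhood, which is why a connectedness property of the efficient set matters for how representative the result is.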


RNASeq_similarity_matrix: visually identify sample mix-ups in RNASeq data using a ‘genomic’ sequence similarity matrix

Bioinformatics ◽

10.1093/bioinformatics/btz821 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1940-1941

Author(s):

Nicolaas C Kist ◽

Robert A Power ◽

Andrew Skelton ◽

Seth D Seegobin ◽

Moira Verbelen ◽

...

Keyword(s):

Genomic Sequence ◽

Sequence Similarity ◽

Current Method ◽

Biological Data ◽

Similarity Matrix ◽

Rna Seq ◽

Rnaseq Data ◽

Phenotype Data ◽

Multiple Samples ◽

Automated Tool

Summary: Mistakes in linking a patient’s biological samples with their phenotype data can confound RNA-Seq studies. The current method for avoiding such sample mix-ups is to test for inconsistencies between biological data and known phenotype data such as sex. However, in DNA studies a common QC step is to check for unexpected relatedness between samples. Here, we extend this method to RNA-Seq, which allows the detection of duplicated samples without relying on identifying inconsistencies with phenotype data.
Results: We present RNASeq_similarity_matrix: an automated tool to generate a sequence similarity matrix from RNA-Seq data, which can be used to visually identify sample mix-ups. This is particularly useful when a study contains multiple samples from the same individual, but can also detect contamination in studies with only one sample per individual.
Availability and implementation: RNASeq_similarity_matrix has been made available as a documented GPL licensed Docker image on www.github.com/nicokist/RNASeq_similarity_matrix.
