EnTrance: Exploration of Entropy Scaling Ball Cover Search in Protein Sequences

2021 ◽  
Author(s):  
Yoonjin Kim ◽  
Zhen Guo ◽  
Jeffrey A. Robertson ◽  
Benjamin Reidys ◽  
Ziyan Zhang ◽  
...  

Computational alignment of biological sequences has received increasing attention as technology develops. Predicting whether a novel DNA sequence is potentially dangerous requires determining its taxonomic identity and functional characteristics through sequence identification. This task is facilitated by the rapidly increasing amount of biological data in DNA and protein databases, made possible by the corresponding decrease in computational and storage costs. Unfortunately, the growth of these databases has also made the information harder to exploit. EnTrance presents an approach that expedites the analysis of such large databases by employing entropy scaling, so that search cost scales with the amount of entropy in the database rather than with its absolute size. Because DNA and protein sequences are biologically meaningful, the space of biological sequences exhibits the structure that entropy scaling exploits. As biological sequence databases grow, taking advantage of this structure can be extremely beneficial for reducing query times. EnTrance, the entropy scaling search algorithm introduced here, accelerates the kind of biological sequence search exemplified by tools such as BLAST. It does so with a two-step search: EnTrance first quickly reduces the number of potential matches and then searches the remaining sequences more exhaustively. Tests of EnTrance show that this approach can lead to improved query times. However, constructing the required entropy scaling indices beforehand can be challenging. To improve performance, EnTrance investigates several ideas for accelerating the build time of the index that supports entropy scaling searches. In particular, EnTrance makes full use of the concurrency features of the Go language, greatly reducing the index build time. Our results identify key tradeoffs and demonstrate that there is potential in using these techniques for sequence similarity searches. Finally, EnTrance returns more matches and higher percentage identity matches when compared with existing tools.
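
The two-step idea in this abstract can be illustrated with a minimal Python sketch. It is not EnTrance's implementation (which the abstract says is written in Go). The Hamming distance, the greedy cover construction, and the names `build_cover` and `search` are all assumptions made for illustration. The coarse step compares the query only with cluster centers, and the triangle inequality guarantees that a cluster can contain a hit only if its center lies within `radius + cluster_radius` of the query.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_cover(seqs: List[str], cluster_radius: int) -> Dict[str, List[str]]:
    """Greedy ball cover (assumed construction): each sequence joins the first
    center within cluster_radius, or becomes a new center itself."""
    cover: Dict[str, List[str]] = {}
    for s in seqs:
        for center in cover:
            if hamming(s, center) <= cluster_radius:
                cover[center].append(s)
                break
        else:
            cover[s] = [s]  # no center is close enough, so s starts a new ball
    return cover


def search(cover: Dict[str, List[str]], query: str,
           radius: int, cluster_radius: int) -> List[str]:
    """Return every indexed sequence within `radius` of `query`."""
    hits: List[str] = []
    for center, members in cover.items():
        # Coarse step: by the triangle inequality, a ball can hold a hit
        # only if its center is within radius + cluster_radius of the query.
        if hamming(query, center) <= radius + cluster_radius:
            # Fine step: exhaustive check inside the surviving ball.
            hits.extend(m for m in members if hamming(query, m) <= radius)
    return hits
```

Because the coarse filter never discards a ball that could contain a match, this sketch returns the same hits as a brute-force scan. The savings come from skipping whole balls, which is most effective when the database is highly clustered.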

Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 857
Author(s):  
Jahedul Islam ◽  
Md Shokor A. Rahaman ◽  
Pandian M. Vasant ◽  
Berihun Mamo Negash ◽  
Ahshanul Hoqe ◽  
...  

Well placement optimization is considered a non-convex and highly multimodal optimization problem. This article proposes a modified crow search algorithm (CSA), based on local search and niching techniques, to tackle it. The suggested approach is first verified on benchmark functions, where it demonstrated a higher convergence rate and a better solution. Its performance is then evaluated on the well placement optimization problem and compared with particle swarm optimization (PSO), the gravitational search algorithm (GSA), and the standard crow search algorithm (CSA). The outcomes of the study revealed that the niching crow search algorithm is the most efficient and effective of the compared techniques.
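
For orientation, here is a minimal Python sketch of a baseline crow search loop, not the modified niching variant this abstract proposes, whose details the abstract does not give. The function name `crow_search`, the parameter defaults, and the rule of discarding out-of-bounds moves are assumptions for illustration. Each crow either follows the remembered best position of a randomly chosen crow or, with probability `awareness_prob`, jumps to a random position.

```python
import random
from typing import Callable, List, Tuple


def crow_search(objective: Callable[[List[float]], float], dim: int,
                bounds: Tuple[float, float], n_crows: int = 20,
                iterations: int = 200, awareness_prob: float = 0.1,
                flight_length: float = 2.0,
                seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` over a box; returns (best position, best cost)."""
    rng = random.Random(seed)
    low, high = bounds

    def random_position() -> List[float]:
        return [rng.uniform(low, high) for _ in range(dim)]

    positions = [random_position() for _ in range(n_crows)]
    memory = [p[:] for p in positions]            # best position each crow remembers
    memory_cost = [objective(p) for p in positions]

    for _ in range(iterations):
        for i in range(n_crows):
            j = rng.randrange(n_crows)            # crow i chooses a crow j to follow
            if rng.random() >= awareness_prob:
                # Crow j is unaware: move toward (or past) its remembered position.
                r = rng.random()
                candidate = [x + r * flight_length * (m - x)
                             for x, m in zip(positions[i], memory[j])]
            else:
                # Crow j is aware and misleads crow i to a random position.
                candidate = random_position()
            # Assumed feasibility rule: ignore moves that leave the box.
            if all(low <= x <= high for x in candidate):
                positions[i] = candidate
                cost = objective(candidate)
                if cost < memory_cost[i]:         # memory only ever improves
                    memory[i] = candidate[:]
                    memory_cost[i] = cost

    best = min(range(n_crows), key=memory_cost.__getitem__)
    return memory[best], memory_cost[best]
```

A call such as `crow_search(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` minimizes a simple sphere function. A niching or local-search modification would change how `j` is chosen or how candidates are refined.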


2021 ◽  
Vol 11 (7) ◽  
pp. 2962
Author(s):  
Mohamadreza Afrasiabi ◽  
Christof Lüthi ◽  
Markus Bambach ◽  
Konrad Wegener

This paper presents an efficient mesoscale simulation of a Laser Powder Bed Fusion (LPBF) process using the Smoothed Particle Hydrodynamics (SPH) method. The efficiency lies in reducing the computational effort via spatial adaptivity, for which a dynamic particle refinement pattern with an optimized neighbor-search algorithm is used. The melt pool dynamics is modeled by resolving the thermal, mechanical, and material fields in a single laser track application. After validating the solver by two benchmark tests where analytical and experimental data are available, we simulate a single-track LPBF process by adopting SPH in multi resolutions. The LPBF simulation results show that the proposed adaptive refinement with and without an optimized neighbor-search approach saves almost 50% and 35% of the SPH calculation time, respectively. This achievement enables several opportunities for parametric studies and running high-resolution models with less computational effort.


2016 ◽  
Vol 2 ◽  
pp. e90 ◽  
Author(s):  
Ranko Gacesa ◽  
David J. Barlow ◽  
Paul F. Long

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic as there are no universally accepted methods by which to attribute toxin function using sequence data alone. Bioinformatics tools that do exist are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy and compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation allowing placement of a toxin into the most appropriate toxin protein family, or relates it to a non-toxic protein having the closest homology, giving enhanced curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).


Author(s):  
Elena S. Boltanova ◽  
Maria P. Imekova ◽  

Biological databases of various kinds are now created around the world. Initially, databases for the investigation of crimes were the most widespread. Later, once their potential and benefits, including for medicine, had been assessed, databases for other areas appeared. Russia was no exception in this regard. In Russia, however, unlike in foreign states, the activities of biological databases established for purposes other than the investigation of crimes are practically unregulated. This article analyzes the legal regulation of biobanks in the Russian Federation and abroad. Special attention is paid to the classification of biobanks. The purpose of the study is to determine the feasibility of legislative regulation of their activities, as well as the patterns in such regulation. To achieve this goal, the authors studied extensive regulatory material, which included EU directives and national regulations of the EU member states. The methodological basis of the study was general scientific and special scientific research methods; among the latter, the comparative legal method and the formal legal method were widely used. The comparative legal analysis establishes that the EU countries have a high level of legislative activity in determining the legal regime of biological databases. All countries recognize the specifics of such a legal regime, which can largely be explained by the special legal nature of biological samples and biological data.
In this regard, the following issues related to the activities of biological databases are addressed at the level of law throughout the EU countries: the procedure for their creation; the procedure for receiving, processing, storing and transmitting biological samples and the data obtained on their basis; the rights and obligations of database creators and persons who have provided their biological samples and biological data about themselves; a set of measures aimed at protecting the rights and interests of donors and third parties, etc. It seems that Russia should adopt a similar approach to regulating the activities of biological databases established for purposes other than the investigation of crimes. At the same time, special attention should be paid to biological databases used for research. In the Russian Federation, they are created, as a rule, at the local level. Their main drawback is that they are separate sources of limited biological information that function independently of each other, whereas comprehensive information concentrated in one place could bring invaluable benefits for Russian science and medicine as a whole. However, this requires the establishment of an appropriate legal framework.


Author(s):  
Giglia Gómez-Villouta ◽  
Jean-Philippe Hamiez ◽  
Jin-Kao Hao

This paper discusses a particular “packing” problem, namely the two dimensional strip packing problem, where a finite set of objects have to be located in a strip of fixed width and infinite height. The variant studied considers regular items, rectangular to be precise, that must be packed without overlap, not allowing rotations. The objective is to minimize the height of the resulting packing. In this regard, the authors present a local search algorithm based on the well-known tabu search metaheuristic. Two important components of the presented tabu search strategy are reinforced in attempting to include problem knowledge. The fitness function incorporates a measure related to the empty spaces, while the diversification relies on a set of historically “frozen” objects. The resulting reinforced tabu search approach is evaluated on a set of well-known hard benchmark instances and compared with state-of-the-art algorithms.


Author(s):  
Dan Wei ◽  
Qingshan Jiang ◽  
Sheng Li

Similarity analysis of DNA sequences is a fundamental research area in bioinformatics. The characteristic distribution of L-tuples (tuples of length L) reflects valuable information contained in a biological sequence and thus may be used in DNA sequence similarity analysis. However, similarity analysis based on the characteristic distribution of L-tuples is not effective for the comparison of highly conserved sequences. In this paper, a new similarity measurement approach based on Triplets of Nucleic Acid Bases (TNAB) is introduced for DNA sequence similarity analysis. The new approach characterizes both the content feature and the position feature of a DNA sequence using the frequency and position of occurrence of TNAB in the sequence. The experimental results show that the approach based on TNAB is effective for analysing DNA sequence similarity.
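
A minimal Python sketch of a triplet-based feature vector follows. The abstract does not give the exact TNAB definitions, so the relative frequency, the normalized mean position, the Euclidean comparison, and the names `triplet_features` and `feature_distance` are assumptions for illustration only. Each of the 64 triplets contributes one content feature (how often it occurs) and one position feature (where, on average, it occurs).

```python
from itertools import product
from math import sqrt
from typing import Dict, List

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_features(seq: str) -> List[float]:
    """Return [freq, mean_pos] for each triplet, 128 values in total."""
    n_windows = len(seq) - 2
    if n_windows < 1:
        raise ValueError("sequence must contain at least three bases")
    positions: Dict[str, List[int]] = {t: [] for t in TRIPLETS}
    for i in range(n_windows):                 # overlapping windows of length 3
        window = seq[i:i + 3]
        if window in positions:                # skip windows with ambiguous bases
            positions[window].append(i)
    features: List[float] = []
    for t in TRIPLETS:
        occ = positions[t]
        freq = len(occ) / n_windows            # content feature
        # Position feature: mean 1-based start position scaled to (0, 1];
        # 0.0 marks a triplet that never occurs.
        mean_pos = (sum(occ) / len(occ) + 1) / n_windows if occ else 0.0
        features.extend([freq, mean_pos])
    return features


def feature_distance(a: str, b: str) -> float:
    """Euclidean distance between the feature vectors of two sequences."""
    fa, fb = triplet_features(a), triplet_features(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Under these assumptions, two sequences with the same triplet composition but different triplet order still receive a non-zero distance, which is the kind of distinction a frequency-only measure cannot make.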


Author(s):  
N. Srinivasan ◽  
G. Agarwal ◽  
R. M. Bhaskara ◽  
R. Gadkari ◽  
O. Krishnadev ◽  
...  

In the post-genomic era, biological databases are growing at a tremendous rate. Despite rapid accumulation of biological information, functions and other biological properties of many putative gene products of various organisms remain either unknown or obscure. This paper examines how strategic integration of large biological databases and combinations of various biological information help address some of the fundamental questions on protein structure, function and interactions. New developments in function recognition by remote homology detection and strategic use of sequence databases aid recognition of functions of newly discovered proteins. Knowledge of 3-D structures and combined use of sequences and 3-D structures of homologous protein domains expand the ability of remote homology detection enormously. The authors also demonstrate how combined consideration of functions of individual domains of multi-domain proteins helps in recognizing gross biological attributes. This paper also discusses a few cases of combining disparate biological datasets or disparate biological information to obtain new insights about protein-protein interactions across a host and a pathogen. Finally, the authors discuss how combinations of low resolution structural data, obtained using cryoEM studies of gigantic multi-component assemblies, and atomic level 3-D structures of the components are effective in inferring finer features in the assembly.


Author(s):  
Gabriel Jan Abrahams ◽  
Janet Newman

Crystallization is in many cases a critical step for solving the three-dimensional structure of a protein molecule. Determining which set of chemicals to use in the initial screen is typically agnostic of the protein under investigation; however, crystallization efficiency could potentially be improved if this were not the case. Previous work has assumed that sequence similarity may provide useful information about appropriate crystallization cocktails; however, the authors are not aware of any quantitative verification of this assumption. This research investigates whether, given current information, one can detect any correlation between sequence similarity and crystallization cocktails. BLAST was used to quantitate the similarity between protein sequences in the Protein Data Bank, and this was compared with three estimations of the chemical similarities of the respective crystallization cocktails. No correlation was detected between proteins of similar (but not identical) sequence and their crystallization cocktails, suggesting that methods of determining screens based on this assumption are unlikely to result in screens that are better than those currently in use.


2013 ◽  
Vol 21 (1) ◽  
pp. 179-196 ◽  
Author(s):  
Arnaud Liefooghe ◽  
Luís Paquete ◽  
José Rui Figueira

In this article, a local search approach is proposed for three variants of the bi-objective binary knapsack problem, with the aim of maximizing the total profit and minimizing the total weight. First, an experimental study on a given structural property of connectedness of the efficient set is conducted. Based on this property, a local search algorithm is proposed and its performance is compared to exact algorithms in terms of runtime and quality metrics. The experimental results indicate that this simple local search algorithm is able to find a representative set of optimal solutions in most of the cases, and in much less time than exact algorithms.
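
The connectedness idea can be sketched in Python as a small Pareto local search. This is an illustrative baseline, not the article's algorithm. The one-item flip neighborhood, the start from the empty knapsack, and the names `evaluate`, `dominates`, and `pareto_local_search` are assumptions. The search keeps an archive of mutually non-dominated solutions and explores the neighbors of each archived solution.

```python
from typing import Dict, List, Tuple

Solution = Tuple[int, ...]     # one 0/1 entry per item
Value = Tuple[int, int]        # (total profit, total weight)


def evaluate(sol: Solution, profits: List[int], weights: List[int]) -> Value:
    profit = sum(p for p, x in zip(profits, sol) if x)
    weight = sum(w for w, x in zip(weights, sol) if x)
    return profit, weight


def dominates(a: Value, b: Value) -> bool:
    """a dominates b: no lower profit, no higher weight, and not identical."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_local_search(profits: List[int],
                        weights: List[int]) -> Dict[Solution, Value]:
    """Explore one-item flips starting from the empty knapsack (assumed start)."""
    n = len(profits)
    start: Solution = (0,) * n
    archive: Dict[Solution, Value] = {start: evaluate(start, profits, weights)}
    unexplored: List[Solution] = [start]
    while unexplored:
        current = unexplored.pop()
        if current not in archive:
            continue                       # dominated and removed in the meantime
        for i in range(n):
            neighbor = current[:i] + (1 - current[i],) + current[i + 1:]
            if neighbor in archive:
                continue
            value = evaluate(neighbor, profits, weights)
            # Reject neighbors that an archived value dominates or duplicates.
            if any(dominates(v, value) or v == value for v in archive.values()):
                continue
            # Drop archived solutions that the new neighbor dominates.
            for s in [s for s, v in archive.items() if dominates(value, v)]:
                del archive[s]
            archive[neighbor] = value
            unexplored.append(neighbor)
    return archive
```

A search of this kind can reach every efficient solution only when the efficient set is connected under the chosen neighborhood. Otherwise it returns an approximation, which is consistent with the abstract's claim that a representative set is found in most cases.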


2019 ◽  
Vol 36 (6) ◽  
pp. 1940-1941
Author(s):  
Nicolaas C Kist ◽  
Robert A Power ◽  
Andrew Skelton ◽  
Seth D Seegobin ◽  
Moira Verbelen ◽  
...  

Summary: Mistakes in linking a patient’s biological samples with their phenotype data can confound RNA-Seq studies. The current method for avoiding such sample mix-ups is to test for inconsistencies between biological data and known phenotype data such as sex. However, in DNA studies a common QC step is to check for unexpected relatedness between samples. Here, we extend this method to RNA-Seq, which allows the detection of duplicated samples without relying on identifying inconsistencies with phenotype data.
Results: We present RNASeq_similarity_matrix: an automated tool to generate a sequence similarity matrix from RNA-Seq data, which can be used to visually identify sample mix-ups. This is particularly useful when a study contains multiple samples from the same individual, but can also detect contamination in studies with only one sample per individual.
Availability and implementation: RNASeq_similarity_matrix has been made available as a documented GPL licensed Docker image on www.github.com/nicokist/RNASeq_similarity_matrix.

