Optimizing viral genome subsampling by genetic diversity and temporal distribution (TARDiS) for Phylogenetics

2021 ◽  
Author(s):  
Simone Marini ◽  
Carla Mavian ◽  
Alberto Riva ◽  
Marco Salemi ◽  
Brittany Rife Magalis

Abstract: TARDiS for Phylogenetics is a novel tool for optimal genetic subsampling. It optimizes both genetic diversity and temporal distribution through a genetic algorithm. TARDiS, along with example data sets and a user manual, is available at https://github.com/smarini/tardis-phylogenetics
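As a rough illustration of the idea, and not TARDiS's actual implementation, the sketch below picks k dated sequences with a toy evolutionary search (elitist selection and swap mutation only, no crossover). The fitness function, its weighting `w_div`, and all function names are assumptions made for this example; it expects equal-length sequences, numeric dates, and k of at least 2.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def fitness(subset, seqs, dates, w_div=0.5):
    """Hypothetical score in [0, 1]: weighted diversity plus temporal evenness."""
    pairs = list(combinations(subset, 2))
    # Genetic diversity: mean pairwise Hamming distance, scaled by sequence length.
    div = sum(hamming(seqs[i], seqs[j]) for i, j in pairs) / (len(pairs) * len(seqs[0]))
    # Temporal evenness: smallest gap between consecutive picked dates,
    # scaled so that perfectly even spacing over the full range gives 1.
    picked = sorted(dates[i] for i in subset)
    span = max(dates) - min(dates)
    min_gap = min(b - a for a, b in zip(picked, picked[1:]))
    spread = min_gap * (len(picked) - 1) / span if span else 0.0
    return w_div * div + (1 - w_div) * spread


def mutate(subset, n, rng):
    """Swap one selected index for an index outside the remaining selection."""
    child = set(subset)
    child.remove(rng.choice(sorted(child)))
    child.add(rng.choice([i for i in range(n) if i not in child]))
    return tuple(sorted(child))


def subsample(seqs, dates, k, generations=200, pop_size=30, seed=0):
    """Return a tuple of k indices found by the toy evolutionary search."""
    rng = random.Random(seed)
    n = len(seqs)
    pop = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, seqs, dates), reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        pop = elite + [mutate(rng.choice(elite), n, rng)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda s: fitness(s, seqs, dates))
```

Because the better half of the population is carried over unchanged, the best score never decreases between generations.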

2021 ◽  
Author(s):  
Sebastiaan Valkiers ◽  
Max Van Houcke ◽  
Kris Laukens ◽  
Pieter Meysman

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To address this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed that clusTCR achieves accuracy similar to that of other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).
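To make the phrase "sequence hashing" concrete, here is a minimal sketch of one common hashing trick for finding near-identical sequences without comparing all pairs. It is not taken from clusTCR; the function name and the masking scheme are assumptions for illustration. Each sequence is hashed once per position with that position masked, so two distinct sequences that share a key differ at exactly that one position.

```python
from collections import defaultdict


def hamming1_pairs(seqs):
    """Return pairs of distinct sequences that differ at exactly one position."""
    buckets = defaultdict(set)
    for s in set(seqs):  # duplicates are collapsed first
        for i in range(len(s)):
            # Key: position index plus the sequence with that position masked.
            buckets[(i, s[:i] + "*" + s[i + 1:])].add(s)
    pairs = set()
    for group in buckets.values():
        members = sorted(group)
        for idx, a in enumerate(members):
            for b in members[idx + 1:]:
                pairs.add((a, b))
    return pairs
```

The number of hash keys grows with the total sequence length rather than with the square of the number of sequences, which is why this kind of lookup scales to large repertoires.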


2021 ◽  
Author(s):  
Benbo Gao ◽  
Jing Zhu ◽  
Soumya Negi ◽  
Xinmin Zhang ◽  
Stefka Gyoneva ◽  
...  

Abstract. Summary: We developed Quickomics, a feature-rich R Shiny-powered tool to enable biologists to fully explore complex omics data and perform advanced analysis in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks after primary analysis of omics data is completed. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from data. The modular design makes the tool extensible with ease. Availability: Researchers can experience the functionalities with their own data or demo RNA-Seq and proteomics data sets by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial, https://bit.ly/3rXIyhL. The source code under GPLv3 license is provided at https://github.com/interactivereport/[email protected], [email protected]. Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.


mSystems ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Lu Liu ◽  
Zengbin Wang ◽  
Yufeng Zhang ◽  
Li Qian ◽  
...  

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yi Lu ◽  
Menghan Liu ◽  
Jie Zhou ◽  
Zhigang Li

An Intrusion Detection System (IDS) is an important part of ensuring network security. When the system faces network attacks, it can identify the source of threats in a timely and accurate manner and adjust strategies to prevent hackers from intruding. An efficient IDS can identify external threats well, but traditional IDSs have poor performance and low recognition accuracy. To improve the detection rate and accuracy of IDS, this paper proposes a novel ACGA-BPNN method based on an adaptive clonal genetic algorithm (ACGA) and a backpropagation neural network (BPNN). ACGA-BPNN is simulated on the KDD-CUP’99 and UNSW-NB15 data sets. The simulation results indicate that the detection rate and accuracy of ACGA-BPNN are much higher than those of the methods based on a genetic algorithm (GA-BPNN) and simulated annealing (SA-BPNN). In the classification results of KDD-CUP’99, the classification accuracy of ACGA-BPNN is 11% higher than that of GA-BPNN and 24.2% higher than that of SA-BPNN, and the F-score reaches 99.0%. In addition, ACGA-BPNN has good global search ability, and its convergence speed is higher than that of GA-BPNN and SA-BPNN. Furthermore, ACGA-BPNN significantly improves the overall detection performance of IDS.


2017 ◽  
pp. 1437-1467
Author(s):  
Joydev Hazra ◽  
Aditi Roy Chowdhury ◽  
Paramartha Dutta

Registration of medical images such as CT-MR and MR-MR is a challenging area for researchers. This chapter introduces a new cluster-based registration technique that uses a supervised, optimized neural network. Features are extracted from the different clusters of an image obtained from clustering algorithms. To overcome the slow convergence rate of neural networks, an optimized neural network is proposed in this chapter. The weights are optimized to increase the convergence rate as well as to avoid getting stuck in local minima. Different clustering algorithms are explored to minimize the clustering error of an image and to extract features from the most suitable one. A supervised learning method is applied to train the neural network. During this training process, an optimization algorithm named Genetic Algorithm (GA) is used to update the weights of the neural network. To demonstrate the effectiveness of the proposed method, an investigation is carried out on MR T1 and T2 data sets. The proposed method shows convincing results in comparison with other existing techniques.


1998 ◽  
Vol 06 (01n02) ◽  
pp. 135-150 ◽  
Author(s):  
D. G. Simons ◽  
M. Snellen

For a selected number of shallow water test cases of the 1997 Geoacoustic Inversion Workshop we have applied Matched-Field Inversion to determine the geoacoustic and geometric (source location, water depth) parameters. A genetic algorithm has been applied for performing the optimization, whereas the replica fields have been calculated using a standard normal-mode model. The energy function to be optimized is based on the incoherent multi-frequency Bartlett processor. We have used the data sets provided at a few frequencies in the band 25–500 Hz for a vertical line array positioned at 5 km from the source. A comparison between the inverted and true parameter values is made.
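For orientation, a minimal sketch of an energy function built on an incoherent multi-frequency Bartlett processor is given below. It uses the standard textbook form (normalized correlation between measured and replica fields per frequency, averaged over frequencies and subtracted from 1); the exact normalization used by the authors is not stated in the abstract, and the function name and argument layout are assumptions.

```python
def bartlett_energy(data, replicas):
    """Energy in [0, 1]; 0 means the replica matches the data at every frequency.

    data, replicas: one list per frequency, each holding the complex
    pressures at the hydrophones of the array.
    """
    total = 0.0
    for d, r in zip(data, replicas):
        # Correlation between measured and modelled field at this frequency.
        inner = sum(x.conjugate() * y for x, y in zip(d, r))
        norm = sum(abs(x) ** 2 for x in d) * sum(abs(y) ** 2 for y in r)
        total += abs(inner) ** 2 / norm
    # Incoherent combination: average the per-frequency powers.
    return 1.0 - total / len(data)
```

Because each frequency contributes only the magnitude of its correlation, an unknown source phase at each frequency does not affect the result. A genetic algorithm would then search the parameter space for the replica fields that minimize this value.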


Author(s):  
Christopher Hammond ◽  
Cameron J. Turner

Non-Uniform Rational B-Splines (NURBS) curves have long been used to model 1D and 2D data because they are efficient to calculate, easy to manipulate, and capable of handling discontinuities and drastic changes in the general topology of the data. However, the user must assist in defining the control points, weights, knots and an order for the curve in order to fit the curve to the data. This paper uses a modified Genetic Algorithm (GA) to choose and manipulate these various parameters to produce a NURBS curve that minimizes the error between the data and the curve and also minimizes the time it takes the algorithm to compute the solution. The algorithm is tested on several 1D trial data sets and the results are explained. Then, several general difficulties for this application of the GA are explained and possible methods for overcoming them are discussed.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Karim Sorkheh ◽  
Mehrana Koohi Dehkordi ◽  
Sezai Ercisli ◽  
Attila Hegedus ◽  
Júlia Halász

Editor's Note: this Article has been retracted; the Retraction Note is available at https://www.nature.com/articles/s41598-020-72522-x


2019 ◽  
Vol 66 (4) ◽  
pp. 403-411 ◽  
Author(s):  
Yuanjie Zhi ◽  
Dongmei Fu ◽  
Tao Yang ◽  
Dawei Zhang ◽  
Xiaogang Li ◽  
...  

Purpose: This study aims to achieve long-term prediction on a specific monotonic data series of atmospheric corrosion rate vs. time. Design/methodology/approach: This paper presents a new method, applied to corrosion data of carbon steel provided by the China Gateway to Corrosion and Protection, that combines the non-linear gray Bernoulli model (NGBM(1,1)) with a genetic algorithm to attain the purpose of this study. Findings: Results of the experiments showed that the present study’s method is more accurate than other algorithms. In particular, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) of the proposed method on the data sets are 9.15 per cent and 1.23 µm/a, respectively. Furthermore, this study illustrates that the model parameter can be used to evaluate the similarity of curve tendency between two carbon steel data sets. Originality/value: Corrosion data are a typical small-sample data set, and they also belong to a gray system because corrosion has a clear outcome and an uncertain occurrence mechanism. In this work, a new gray forecast model was proposed to achieve the goal of long-term prediction of carbon steel corrosion in China.
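The two error measures quoted above follow their usual definitions, sketched here for reference. The function names are chosen for this example, and the corrosion data themselves are not reproduced.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in per cent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

MAPE is dimensionless, whereas RMSE carries the unit of the measured quantity, which is why the abstract reports one in per cent and the other in µm/a.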


2013 ◽  
Vol 3 (4) ◽  
pp. 31-46 ◽  
Author(s):  
Hanaa Ismail Elshazly ◽  
Ahmad Taher Azar ◽  
Aboul Ella Hassanien ◽  
Abeer Mohamed Elkorany

Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medicine has evolved from physicians' needs. Screening, medical imaging, pattern classification, and prognosis are some examples of health care support systems. Medical data typically have their own characteristics, such as large size and many features, and continuous, real-valued attributes that refer to patients' investigations. Therefore, discretization and feature selection are considered key to improving the knowledge extracted from patients' investigation records. In this paper, a hybrid system that integrates Rough Set (RS) and Genetic Algorithm (GA) is presented for the efficient classification of medical data sets of different sizes and dimensionalities. The Genetic Algorithm is applied to reduce the dimension of the medical data sets, and RS decision rules are used for efficient classification. Furthermore, the proposed system applies Entropy Gain Information (EI) for the discretization process. Four biomedical data sets are tested with the proposed system (EI-GA-RS), and it obtained the highest score on three of them. Other hybrid techniques shared the highest accuracy with the proposed technique, but the proposed system remains among the best-performing systems for three different data sets. EI as a discretization technique is also a common component of the best results on these data sets, while RS as an evaluator achieved the best results on three different data sets.
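As a hedged illustration of entropy-based discretization, the sketch below chooses a single cut point for one continuous attribute by maximizing the standard information gain. It assumes that the paper's "Entropy Gain Information" resembles this textbook criterion, which the abstract does not confirm; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (gain, cut) for the binary split with the highest information gain.

    Returns (0.0, None) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([lab for _, lab in pairs])
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[0]:
            # Place the cut midway between the two neighbouring values.
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best
```

Applying the same search recursively to each resulting interval would yield a multi-interval discretization, after which a feature selector and a rule-based classifier could operate on the discrete values.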

