Optimizing viral genome subsampling by genetic diversity and temporal distribution (TARDiS) for Phylogenetics

Mapping Intimacies ◽

10.1101/2021.01.15.426832 ◽

2021 ◽

Author(s):

Simone Marini ◽

Carla Mavian ◽

Alberto Riva ◽

Marco Salemi ◽

Brittany Rife Magalis

Keyword(s):

Genetic Algorithm ◽

Genetic Diversity ◽

Viral Genome ◽

Temporal Distribution ◽

Data Sets ◽

Link Type

Abstract: TARDiS for Phylogenetics is a novel tool for optimal genetic subsampling. It optimizes both genetic diversity and temporal distribution through a genetic algorithm. TARDiS, along with example data sets and a user manual, is available at https://github.com/smarini/tardis-phylogenetics
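To make the idea of subsampling with a genetic algorithm concrete, here is a minimal sketch. It is not the TARDiS implementation: the fitness function (mean pairwise Hamming distance plus closeness to evenly spaced sampling dates), the equal weights, and the names `fitness` and `subsample` are all assumptions chosen for illustration. It expects aligned sequences of equal length, numeric dates that are not all identical, and a subsample size of at least 2.

```python
import random

def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def fitness(subset, seqs, dates, w_div=0.5, w_time=0.5):
    """Score a candidate subsample (a collection of indices); higher is better."""
    idx = sorted(subset)
    # Genetic diversity: mean pairwise Hamming distance, scaled to [0, 1].
    pairs = [(i, j) for n, i in enumerate(idx) for j in idx[n + 1:]]
    div = sum(hamming(seqs[i], seqs[j]) for i, j in pairs) / (len(pairs) * len(seqs[0]))
    # Temporal distribution: penalize deviation from evenly spaced dates.
    chosen = sorted(dates[i] for i in idx)
    lo, hi = min(dates), max(dates)
    ideal = [lo + (hi - lo) * k / (len(idx) - 1) for k in range(len(idx))]
    spread = 1 - sum(abs(c - t) for c, t in zip(chosen, ideal)) / (len(idx) * (hi - lo))
    return w_div * div + w_time * spread

def subsample(seqs, dates, k, pop_size=30, generations=50, mut_rate=0.2, seed=0):
    """Evolve index subsets of size k; returns the best subset found, sorted."""
    rng = random.Random(seed)
    n = len(seqs)
    pop = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    score = lambda s: fitness(s, seqs, dates)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]              # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            pool = sorted(set(a) | set(b))        # crossover: draw k indices from both parents
            child = rng.sample(pool, k)
            if rng.random() < mut_rate:           # mutation: swap in one unused index
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(child))
        pop = elite + children
    return sorted(max(pop, key=score))
```

Calling `subsample(seqs, dates, k=3)` on a list of aligned sequences and their sampling dates returns three indices. Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next.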

clusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences

10.1101/2021.02.22.432291 ◽

2021 ◽

Author(s):

Sebastiaan Valkiers ◽

Max Van Houcke ◽

Kris Laukens ◽

Pieter Meysman

Keyword(s):

T Cell ◽

Large Data ◽

Cell Receptor ◽

Amino Acid Sequences ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Link Type ◽

Large Sets ◽

Similar Accuracy

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To address this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed that clusTCR achieves accuracy similar to that of other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).
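The abstract does not spell out how clusTCR hashes sequences, so the following is only a generic illustration of why hashing helps similarity search scale. It groups CDR3 sequences that are linked by single substitutions (Hamming distance 1) without comparing all pairs. The function name `cluster_hamming1` and the masking scheme are assumptions, and the sketch assumes the character `*` never occurs in the input.

```python
from collections import defaultdict

def cluster_hamming1(cdr3s):
    """Group sequences connected by chains of single-substitution links."""
    seqs = sorted(set(cdr3s))
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Hash every sequence under each "one position masked" key. Sequences
    # sharing a key have equal length and differ at most at the masked position.
    buckets = defaultdict(list)
    for s in seqs:
        for i in range(len(s)):
            buckets[s[:i] + "*" + s[i + 1:]].append(s)

    # Merge all members of each bucket into one cluster.
    for members in buckets.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(list)
    for s in seqs:
        clusters[find(s)].append(s)
    return sorted(sorted(c) for c in clusters.values())
```

The work grows with the number of sequences times their length, rather than with the square of the number of sequences as an all-against-all comparison would.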

Quickomics: exploring omics data in an intuitive, interactive and informative manner

10.1101/2021.01.19.427296 ◽

2021 ◽

Author(s):

Benbo Gao ◽

Jing Zhu ◽

Soumya Negi ◽

Xinmin Zhang ◽

Stefka Gyoneva ◽

...

Keyword(s):

Modular Design ◽

Functional Module ◽

Supplementary Information ◽

Data Sets ◽

Omics Data ◽

Proteomics Data ◽

Primary Analysis ◽

Link Type ◽

R Shiny ◽

Advanced Analysis

Abstract. Summary: We developed Quickomics, a feature-rich R Shiny-powered tool to enable biologists to fully explore complex omics data and perform advanced analysis in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks after primary analysis of omics data is completed. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from data. The modular design makes the tool extensible with ease. Availability: Researchers can experience the functionalities with their own data or demo RNA-Seq and proteomics data sets by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial, https://bit.ly/3rXIyhL. The source code under GPLv3 license is provided at https://github.com/interactivereport/[email protected], [email protected]. Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.

Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level

mSystems ◽

10.1128/msystems.00943-20 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Gongchao Jing ◽

Lu Liu ◽

Zengbin Wang ◽

Yufeng Zhang ◽

Li Qian ◽

...

Keyword(s):

Big Data ◽

User Interface ◽

Search Engine ◽

Functional Similarity ◽

Metagenomic Data ◽

Data Sets ◽

Data Space ◽

Link Type ◽

Database Platform ◽

Microbiome Data

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.

Intrusion Detection Method Based on Adaptive Clonal Genetic Algorithm and Backpropagation Neural Network

Security and Communication Networks ◽

10.1155/2021/9938586 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yi Lu ◽

Menghan Liu ◽

Jie Zhou ◽

Zhigang Li

Keyword(s):

Neural Network ◽

Genetic Algorithm ◽

Intrusion Detection ◽

Detection Rate ◽

Detection System ◽

Poor Performance ◽

Data Sets ◽

Backpropagation Neural Network ◽

External Threats ◽

Kdd Cup 99

An intrusion detection system (IDS) is an important part of ensuring network security. When the system faces network attacks, it can identify the source of threats in a timely and accurate manner and adjust strategies to prevent hackers from intruding. An efficient IDS can identify external threats well, but traditional IDSs have poor performance and low recognition accuracy. To improve the detection rate and accuracy of IDS, this paper proposes a novel ACGA-BPNN method based on the adaptive clonal genetic algorithm (ACGA) and the backpropagation neural network (BPNN). ACGA-BPNN is simulated on the KDD-CUP’99 and UNSW-NB15 data sets. The simulation results indicate that the detection rate and accuracy of ACGA-BPNN are much higher than those of GA-BPNN and SA-BPNN, the methods based on the genetic algorithm (GA) and simulated annealing (SA). In the classification results of KDD-CUP’99, the classification accuracy of ACGA-BPNN is 11% higher than that of GA-BPNN and 24.2% higher than that of SA-BPNN, and the F-score reaches 99.0%. In addition, ACGA-BPNN has good global search ability and converges faster than GA-BPNN and SA-BPNN. Furthermore, ACGA-BPNN significantly improves the overall detection performance of IDS.
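For readers unfamiliar with the metrics quoted above, the sketch below computes accuracy, detection rate, and F-score from binary confusion-matrix counts. It assumes that "detection rate" means recall on the attack class and that the F-score is the balanced F1. The abstract does not confirm either definition, and the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, detection_rate, f_score) for a binary attack detector.

    tp: attacks flagged as attacks      fp: normal traffic flagged as attacks
    tn: normal traffic passed as normal fn: attacks missed
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    detection_rate = tp / (tp + fn)   # recall on the attack class (assumed definition)
    precision = tp / (tp + fp)
    f_score = 2 * precision * detection_rate / (precision + detection_rate)
    return accuracy, detection_rate, f_score
```

With hypothetical counts of 90 true positives, 10 false positives, 880 true negatives, and 20 false negatives, accuracy is 970/1000 = 0.97 while the F-score is 180/210, about 0.857. This shows how the two can diverge when attacks are the minority class.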

Cluster Based Medical Image Registration Using Optimized Neural Network

Medical Imaging ◽

10.4018/978-1-5225-0571-6.ch061 ◽

2017 ◽

pp. 1437-1467

Author(s):

Joydev Hazra ◽

Aditi Roy Chowdhury ◽

Paramartha Dutta

Keyword(s):

Neural Network ◽

Genetic Algorithm ◽

Image Registration ◽

Convergence Rate ◽

Optimization Algorithm ◽

Clustering Algorithms ◽

Data Sets ◽

Learning Method ◽

Local Minima ◽

The Neural Network

Registration of medical images, such as CT-MR and MR-MR, is a challenging area for researchers. This chapter introduces a new cluster-based registration technique with the help of a supervised optimized neural network. Features are extracted from different clusters of an image obtained from clustering algorithms. To overcome the slow convergence rate of neural networks, an optimized neural network is proposed in this chapter. The weights are optimized to increase the convergence rate as well as to avoid getting stuck in local minima. Different clustering algorithms are explored to minimize the clustering error of an image and to extract features from the most suitable one. A supervised learning method is applied to train the neural network. During this training process, an optimization algorithm, the Genetic Algorithm (GA), is used to update the weights of the neural network. To demonstrate the effectiveness of the proposed method, an investigation is carried out on MR T1 and T2 data sets. The proposed method shows convincing results in comparison with other existing techniques.

Multi-Frequency Matched-Field Inversion of Benchmark Data Using a Genetic Algorithm

Journal of Computational Acoustics ◽

10.1142/s0218396x98000119 ◽

1998 ◽

Vol 06 (01n02) ◽

pp. 135-150 ◽

Cited By ~ 10

Author(s):

D. G. Simons ◽

M. Snellen

Keyword(s):

Genetic Algorithm ◽

Test Cases ◽

Data Sets ◽

True Parameter ◽

Benchmark Data ◽

Line Array ◽

Standard Normal ◽

Water Test ◽

Parameter Values ◽

Field Inversion

For a selected number of shallow water test cases of the 1997 Geoacoustic Inversion Workshop we have applied Matched-Field Inversion to determine the geoacoustic and geometric (source location, water depth) parameters. A genetic algorithm has been applied for performing the optimization, whereas the replica fields have been calculated using a standard normal-mode model. The energy function to be optimized is based on the incoherent multi-frequency Bartlett processor. We have used the data sets provided at a few frequencies in the band 25–500 Hz for a vertical line array positioned at 5 km from the source. A comparison between the inverted and true parameter values is made.
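As a rough illustration of the energy function mentioned above, the sketch below evaluates a normalized Bartlett match between a measured pressure vector and a replica vector at each frequency, averages the matches incoherently (as powers, without cross-frequency phase), and returns one minus that average. The normalization and the names `bartlett` and `energy` are assumptions, since the abstract does not give the exact formula. Vectors are plain lists of complex numbers, one entry per hydrophone, and must be non-zero.

```python
def bartlett(replica, data):
    """Normalized Bartlett power |w^H d|^2 / (|w|^2 |d|^2), a value in [0, 1]."""
    inner = sum(w.conjugate() * d for w, d in zip(replica, data))
    norm = sum(abs(w) ** 2 for w in replica) * sum(abs(d) ** 2 for d in data)
    return abs(inner) ** 2 / norm

def energy(replicas, datas):
    """Energy to minimize: 1 minus the Bartlett power averaged over frequencies.

    replicas[k] and datas[k] are the modeled and measured array vectors at
    the k-th frequency.
    """
    mean_power = sum(bartlett(w, d) for w, d in zip(replicas, datas)) / len(datas)
    return 1 - mean_power
```

A genetic algorithm would call `energy` once per candidate parameter set, after a propagation model has produced the replica vectors for that candidate. The energy approaches 0 when the replicas match the data up to a complex scale factor at every frequency.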

Data Modeling Using NURBs Curves and Modified Genetic Algorithms

Volume 3: Design and Manufacturing, Parts A and B ◽

10.1115/imece2010-37459 ◽

2010 ◽

Cited By ~ 1

Author(s):

Christopher Hammond ◽

Cameron J. Turner

Keyword(s):

Genetic Algorithm ◽

Genetic Algorithms ◽

Trial Data ◽

Data Sets ◽

Control Points ◽

Nurbs Curve ◽

B Splines ◽

Modified Genetic Algorithm ◽

Nurbs Curves ◽

2D Data

Non-Uniform Rational B-Splines (NURBS) curves have long been used to model 1D and 2D data because they are efficient to calculate, easy to manipulate, and capable of handling discontinuities and drastic changes in the general topology of the data. However, the user must assist in defining the control points, weights, knots and an order for the curve in order to fit the curve to the data. This paper uses a modified Genetic Algorithm (GA) to choose and manipulate these various parameters to produce a NURBS curve that minimizes the error between the data and the curve and also minimizes the time it takes the algorithm to compute the solution. The algorithm is tested on several 1D trial data sets and the results are explained. Then, several general difficulties for this application of the GA are explained and possible methods for overcoming them are discussed.

Retraction Note: Comparison of traditional and new generation DNA markers declares high genetic diversity and differentiated population structure of wild almond species

Scientific Reports ◽

10.1038/s41598-020-72522-5 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Karim Sorkheh ◽

Mehrana Koohi Dehkordi ◽

Sezai Ercisli ◽

Attila Hegedus ◽

Júlia Halász

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

Dna Markers ◽

High Genetic Diversity ◽

Link Type ◽

New Generation ◽

Wild Almond

Editor's Note: this Article has been retracted; the Retraction Note is available at https://www.nature.com/articles/s41598-020-72522-x

Long-term prediction on atmospheric corrosion data series of carbon steel in China based on NGBM(1,1) model and genetic algorithm

Anti-Corrosion Methods and Materials ◽

10.1108/acmm-11-2017-1858 ◽

2019 ◽

Vol 66 (4) ◽

pp. 403-411 ◽

Cited By ~ 1

Author(s):

Yuanjie Zhi ◽

Dongmei Fu ◽

Tao Yang ◽

Dawei Zhang ◽

Xiaogang Li ◽

...

Keyword(s):

Genetic Algorithm ◽

Carbon Steel ◽

Atmospheric Corrosion ◽

Data Series ◽

Data Sets ◽

Content Type ◽

Term Prediction ◽

Long Term Prediction ◽

Corrosion Data

Purpose: This study aims to achieve long-term prediction on a specific monotonic data series of atmospheric corrosion rate vs time. Design/methodology/approach: This paper presents a new method, applied to the corrosion data of carbon steel provided by the China Gateway to Corrosion and Protection, that combines the non-linear gray Bernoulli model (NGBM(1,1)) with a genetic algorithm to attain the purpose of this study. Findings: Results of the experiments showed that the present study’s method is more accurate than other algorithms. In particular, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) of the proposed method in data sets are 9.15 per cent and 1.23 µm/a, respectively. Furthermore, this study illustrates that the model parameter can be used to evaluate the similarity of curve tendency between two carbon steel data sets. Originality/value: Corrosion data are part of a typical small-sample data set, and these also belong to a gray system because corrosion has a clear outcome and an uncertain occurrence mechanism. In this work, a new gray forecast model was proposed to achieve the goal of long-term prediction of carbon steel in China.
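The two error measures reported in the findings follow standard definitions, sketched below. The function names are illustrative, and the example values in the note are hypothetical rather than taken from the paper. MAPE is undefined when an observed value is zero, so the sketch assumes strictly non-zero observations.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in per cent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data (e.g. µm/a)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

For hypothetical observations `[10, 20, 40]` and predictions `[11, 18, 40]`, the MAPE is 100 × (0.1 + 0.1 + 0) / 3, about 6.67 per cent, and the RMSE is the square root of 5/3, about 1.29.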

Hybrid System based on Rough Sets and Genetic Algorithms for Medical Data Classifications

International Journal of Fuzzy System Applications ◽

10.4018/ijfsa.2013100103 ◽

2013 ◽

Vol 3 (4) ◽

pp. 31-46 ◽

Cited By ~ 11

Author(s):

Hanaa Ismail Elshazly ◽

Ahmad Taher Azar ◽

Aboul Ella Hassanien ◽

Abeer Mohamed Elkorany

Keyword(s):

Genetic Algorithm ◽

Hybrid System ◽

Selection Process ◽

Decision Rules ◽

Medical Data ◽

Machine Learning Techniques ◽

Data Sets ◽

Biomedical Data ◽

Hybrid Techniques ◽

Learning Techniques

Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medicine has evolved from physicians' needs. Screening, medical imaging, pattern classification, and prognosis are some examples of health care support systems. Medical data typically has its own characteristics, such as huge size and many features, and continuous, real-valued attributes that refer to patients' investigations. Therefore, discretization and feature selection are considered key issues in improving the knowledge extracted from patients' investigation records. In this paper, a hybrid system that integrates Rough Set (RS) and Genetic Algorithm (GA) is presented for the efficient classification of medical data sets of different sizes and dimensionalities. The Genetic Algorithm is applied with the aim of reducing the dimension of medical data sets, and RS decision rules are used for efficient classification. Furthermore, the proposed system applies the Entropy Gain Information (EI) for the discretization process. Four biomedical data sets are tested by the proposed system (EI-GA-RS), and the highest score was obtained on three different data sets. Other hybrid techniques shared the highest accuracy with the proposed technique, but the proposed system preserves its place as one of the highest-scoring systems for three different sets. EI as a discretization technique is also a common component of the best results on the mentioned data sets, while RS as an evaluator achieved the best results in three different data sets.
