Alignment-Free Antimicrobial Peptide Predictors: Improving Performance by a Thorough Analysis of the Largest Available Data Set

Author(s):  
Sergio A. Pinacho-Castellanos ◽  
César R. García-Jacas ◽  
Michael K. Gilson ◽  
Carlos A. Brizuela

2010 ◽  
Vol 08 (02) ◽  
pp. 181-198 ◽  
Author(s):  
RAJIB SENGUPTA ◽  
DHUNDY R. BASTOLA ◽  
HESHAM H. ALI

Restriction Fragment Length Polymorphism (RFLP) is a powerful molecular tool that is used extensively in the molecular fingerprinting and epidemiological study of microorganisms. In a wet-lab setting, the DNA is cut with one or more restriction enzymes and subjected to gel electrophoresis to obtain signature fragment patterns, which are used to classify and identify organisms. This wet-lab approach may not be practical when the experimental data set includes a large number of genetic sequences and a wide pool of restriction enzymes to choose from. In this study, we introduce a novel concept, the Enzyme Cut Order: a biological property-based characteristic of DNA sequences that can be defined and analyzed computationally without any alignment algorithm. In this alignment-free approach, a similarity matrix is constructed from the pairwise Longest Common Subsequences (LCS) of the Enzyme Cut Orders. Genetic algorithms are used to help select an ideal set of restriction enzymes for the analysis. The results obtained with this approach, using internal transcribed spacer regions of fungal rDNA as the target sequences, show that phylogenetically related organisms form a single cluster and that successful grouping of phylogenetically close or distant organisms depends on the choice of restriction enzymes used in the analysis. Additionally, comparison of the trees obtained with this alignment-free method and with the legacy method revealed highly similar tree topologies. This novel alignment-free method, which uses the Enzyme Cut Order and restriction enzyme profile, is a reliable alternative to local or global alignment-based classification and identification of organisms.
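The Enzyme Cut Order and LCS-based similarity described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the three-enzyme panel is an arbitrary example (the recognition sites are the real EcoRI, BamHI and HindIII sites), the start of each recognition site stands in for the exact cut position, and normalizing the LCS length by the longer cut order is an assumed choice. The genetic-algorithm enzyme selection is not shown.

```python
from typing import Dict, List

# Example enzyme panel (assumption); values are the recognition sequences.
ENZYMES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def enzyme_cut_order(seq: str, enzymes: Dict[str, str]) -> List[str]:
    """Enzyme names ordered by where their recognition sites occur along seq.

    Simplification (assumption): the site's start index stands in for the cut position.
    """
    seq = seq.upper()
    hits = []
    for name, site in enzymes.items():
        start = seq.find(site)
        while start != -1:
            hits.append((start, name))
            start = seq.find(site, start + 1)  # continue scanning after this hit
    return [name for _, name in sorted(hits)]


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via dynamic programming (two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs: List[str], enzymes: Dict[str, str]) -> List[List[float]]:
    """Pairwise similarity = LCS length / length of the longer cut order (assumed normalization)."""
    orders = [enzyme_cut_order(s, enzymes) for s in seqs]
    n = len(orders)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = max(len(orders[i]), len(orders[j]))
            mat[i][j] = lcs_length(orders[i], orders[j]) / denom if denom else 1.0
    return mat
```

The resulting symmetric matrix has ones on its diagonal and could be converted to distances (for example, 1 minus similarity) before clustering or tree building.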


2020 ◽  
Author(s):  
Marika Kaden ◽  
Katrin Sophie Bohnsack ◽  
Mirko Weber ◽  
Mateusz Kudła ◽  
Kaja Gutowska ◽  
...  

Abstract
We present an approach to investigate SARS-CoV-2 virus sequences based on alignment-free methods for RNA sequence comparison. In particular, we verify a given clustering result for the GISAID data set, which was obtained by analyzing the molecular differences in coronavirus populations using phylogenetic trees. For this purpose, we use alignment-free dissimilarity measures for sequences and combine them with learning vector quantization classifiers for virus type discriminant analysis and classification. These vector quantizers belong to the class of interpretable machine learning methods, which, on the one hand, provide additional knowledge about the classification decisions, such as discriminant feature correlations, and, on the other hand, can be equipped with a reject option. This option gives the model the property of self-controlled evidence when applied to new data, i.e., the model refuses to make a classification decision if the model evidence for the presented data is insufficient. After training such a classifier on the GISAID data set, we apply the resulting classifier model to another, unlabeled SARS-CoV-2 virus data set. On the one hand, this allows us to assign new sequences to already known virus types; on the other hand, the rejected sequences allow speculation about new virus types with respect to nucleotide base mutations in the viral sequences.
Author summary
The currently emerging global disease COVID-19, caused by the novel SARS-CoV-2 virus, requires all scientific effort to investigate the development of the viral epidemic, the properties of the virus, and its types. Investigations of the virus sequence are of special interest. These are frequently based on mathematical/statistical analysis. However, machine learning methods represent a promising alternative if one focuses on interpretable models, i.e., those that do not act as black boxes. Accordingly, we apply variants of Learning Vector Quantizers to analyze the SARS-CoV-2 sequences. We encoded the sequences and compared them in their numerical representations to avoid the computationally costly comparison based on sequence alignments. Our resulting model is interpretable, robust, and efficient, and it has a self-controlling mechanism regarding its applicability to data. This framework was applied to two data sets concerning SARS-CoV-2. We were able to verify previously published virus type findings for one of the data sets by training our model to accurately identify the virus type of sequences. For sequences without virus type information (the second data set), our trained model can predict the virus type. In doing so, we observe a new, scattered spread of the sequences in the data space, which is probably caused by mutations in the viral sequences.
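To make the combination of alignment-free encoding, learning vector quantization and a reject option concrete, here is a minimal sketch. It assumes relative k-mer frequencies as the sequence encoding, plain LVQ1 with Euclidean distance and one prototype per class, and a reject rule based on the relative distance margin between the winning prototype and the nearest prototype of another class. The paper uses its own dissimilarity measures and LVQ variants, which may differ; the function names and the threshold `theta` are illustrative assumptions.

```python
import itertools

import numpy as np


def kmer_vector(seq, k=2):
    """Relative k-mer frequencies over ACGT (k-mers with other symbols are skipped)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    v = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            v[j] += 1
    total = v.sum()
    return v / total if total else v


def train_lvq1(X, y, epochs=30, lr=0.05, seed=0):
    """LVQ1 with one prototype per class, initialised at the class mean."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    W = np.array([X[y == c].mean(axis=0) for c in classes])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w = np.argmin(np.linalg.norm(W - X[i], axis=1))  # winning prototype
            sign = 1.0 if classes[w] == y[i] else -1.0  # attract if correct, repel if not
            W[w] += sign * lr * (X[i] - W[w])
    return W, classes


def predict_with_reject(x, W, classes, theta=0.1):
    """Return a class label, or None when the relative margin falls below theta."""
    d = np.linalg.norm(W - x, axis=1)
    order = np.argsort(d)
    winner = order[0]
    # closest prototype carrying a different label than the winner
    rival = next(j for j in order[1:] if classes[j] != classes[winner])
    denom = d[rival] + d[winner]
    margin = (d[rival] - d[winner]) / denom if denom else 0.0
    return classes[winner] if margin >= theta else None
```

A sequence whose encoding lies roughly halfway between prototypes of different types yields a small margin and is rejected, which mirrors the idea of flagging sequences that may belong to a type the model has not seen.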


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Daniel Lichtblau

Abstract
Background: Alignment-free methods of genomic comparison offer the possibility of scaling to large data sets of nucleotide sequences consisting of several thousand or more base pairs. Such methods can be used to deduce “nearby” species in a reference data set or to construct phylogenetic trees.
Results: We describe one such method that gives quite strong results. We use the Frequency Chaos Game Representation (FCGR) to create images from such sequences. We then reduce dimension, first using a Fourier trig transform, followed by a Singular Value Decomposition (SVD). This gives vectors of modest length. These in turn are used for fast sequence lookup, construction of phylogenetic trees, and classification of virus genomic data. We illustrate the accuracy and scalability of this approach on several benchmark test sets.
Conclusions: The tandem of FCGR and dimension reduction using Fourier-type transforms and SVD provides a powerful approach for alignment-free genomic comparison. Results compare favorably with, and often surpass, the best results reported in prior literature. Good scalability is also observed.
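The FCGR, transform and SVD pipeline summarized above can be sketched with NumPy. Several choices here are assumptions rather than the author's settings: the corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) is one common convention, an orthonormal 2D DCT-II stands in for the "Fourier trig transform", only the low-frequency `keep` x `keep` block is retained, and the features are mean-centered before the SVD.

```python
import numpy as np

# One common CGR corner convention (assumption): base -> (x bit, y bit)
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=4):
    """Normalized 2^k x 2^k k-mer frequency image (Frequency Chaos Game Representation)."""
    n = 2 ** k
    img = np.zeros((n, n))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(c not in CORNER for c in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        row = col = 0
        for depth, base in enumerate(kmer):
            x, y = CORNER[base]
            row += y << depth  # each base contributes one bit per axis
            col += x << depth
        img[row, col] += 1
    total = img.sum()
    return img / total if total else img


def dct_matrix(n):
    """Orthonormal DCT-II matrix, so D @ D.T is the identity."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * j + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m


def fcgr_features(seqs, k=4, keep=6, rank=3):
    """FCGR -> 2D DCT (low frequencies only) -> SVD projection onto `rank` components."""
    d = dct_matrix(2 ** k)
    feats = [(d @ fcgr(s, k) @ d.T)[:keep, :keep].ravel() for s in seqs]
    f = np.array(feats)
    f_centered = f - f.mean(axis=0)  # centering is an assumed preprocessing step
    _, _, vt = np.linalg.svd(f_centered, full_matrices=False)
    r = min(rank, vt.shape[0])
    return f_centered @ vt[:r].T
```

The rows of the returned matrix are short vectors, so nearest-neighbor lookup or distance-based tree building can work on them directly instead of on the full images.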


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Yen-Wei Chen ◽  
So Sasatani ◽  
Xian-Hua Han

Face hallucination is a learning-based super-resolution technique focused on the resolution enhancement of facial images. Although face hallucination is a powerful and useful technique, some detailed high-frequency components cannot be recovered. It also requires accurate alignment between training samples. In this paper, we propose a high-frequency compensation framework based on residual images for face hallucination methods in order to improve reconstruction performance. The basic idea of the proposed framework is to reconstruct or estimate a residual image that can be used to compensate for the high-frequency components of the reconstructed high-resolution image. We propose three approaches based on this framework. We also propose a patch-based, alignment-free face hallucination method. In patch-based face hallucination, we first segment facial images into overlapping patches and construct training patch pairs. For an input low-resolution (LR) image, the overlapping patches are likewise used to obtain the corresponding high-resolution (HR) patches by face hallucination. The whole HR image can then be reconstructed by combining all of the HR patches. Experimental results show that the high-resolution images obtained using our proposed approaches improve on the quality of those obtained by the conventional face hallucination method, even when the training data set is unaligned.
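The patch-based step above (split the LR image into overlapping patches, map each to an HR patch, recombine) can be sketched as follows. The learned LR-to-HR mapping is replaced by a hypothetical stand-in, `upsample_patch`, which simply repeats pixels; in practice it would be the trained face hallucination model. Averaging overlapping HR pixels is an assumed way of "combining all of the HR patches", and the residual-compensation stage is not shown.

```python
import numpy as np


def extract_patches(img, size, step):
    """Overlapping (top, left, patch) triples that together cover the whole image."""
    h, w = img.shape
    tops = list(range(0, h - size + 1, step))
    lefts = list(range(0, w - size + 1, step))
    # make sure the last row/column of patches reaches the image border
    if tops[-1] != h - size:
        tops.append(h - size)
    if lefts[-1] != w - size:
        lefts.append(w - size)
    return [(t, l, img[t:t + size, l:l + size]) for t in tops for l in lefts]


def merge_patches(patches, shape, size):
    """Recombine patches, averaging pixels where patches overlap."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for t, l, p in patches:
        acc[t:t + size, l:l + size] += p
        count[t:t + size, l:l + size] += 1
    return acc / count


def upsample_patch(patch, scale):
    """Hypothetical stand-in for the learned LR -> HR patch mapping (pixel replication)."""
    return np.kron(patch, np.ones((scale, scale)))


def patchwise_hr(lr, size=4, step=2, scale=2, hr_patch_fn=upsample_patch):
    """Map each overlapping LR patch to HR and stitch the HR patches together."""
    lr_patches = extract_patches(lr, size, step)
    hr_patches = [(t * scale, l * scale, hr_patch_fn(p, scale)) for t, l, p in lr_patches]
    hr_shape = (lr.shape[0] * scale, lr.shape[1] * scale)
    return merge_patches(hr_patches, hr_shape, size * scale)
```

Because each patch is processed independently, no global alignment of the whole face is needed at this stage; only the per-patch mapping has to be learned.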


1994 ◽  
Vol 144 ◽  
pp. 139-141 ◽  
Author(s):  
J. Rybák ◽  
V. Rušin ◽  
M. Rybanský

Abstract
Fe XIV 530.3 nm coronal emission line observations have been used to estimate the rotation of the green solar corona. A homogeneous data set, created from measurements of the worldwide coronagraphic network, has been examined with the help of correlation analysis to reveal the averaged synodic rotation period as a function of latitude and time over the epoch from 1947 to 1991. The values of the synodic rotation period obtained for this epoch for the whole range of latitudes and for a latitude band of ±30° are 27.52±0.12 days and 26.95±0.21 days, respectively. A differential rotation of the green solar corona, with local maxima of the rotation period around ±60° and a minimum at the equator, was confirmed. No clear cyclic variation of the rotation has been found for the examined epoch, but some monotonic trends over certain time intervals are presented. A detailed investigation of the original data and their correlation functions has shown that the existence of sufficiently reliable tracers is not evident for the whole set of examined data. This should be taken into account in future, more precise estimations of the green corona rotation period.
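As a rough illustration of how correlation analysis yields a rotation period, the sketch below finds the lag that maximizes the autocorrelation of a daily intensity series at one latitude. The search window of 20 to 35 days, the biased autocorrelation estimator and the integer-day resolution are assumptions; the published analysis over latitude and time is more elaborate, and sub-day precision such as the quoted values would require interpolating around the peak.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Biased autocorrelation of a mean-removed series for lags 0..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - lag], x[lag:]) / denom for lag in range(max_lag + 1)])


def synodic_period(x, min_lag=20, max_lag=35):
    """Lag (in samples, here days) of the strongest autocorrelation peak in the window."""
    ac = autocorrelation(x, max_lag)
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return lag, ac[lag]
```

For a synthetic daily series with a 27-day modulation, `synodic_period` returns a lag of 27 with a correlation close to 1; weak or short-lived tracers lower that peak, which relates to the reliability concern raised at the end of the abstract.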


Author(s):  
Jules S. Jaffe ◽  
Robert M. Glaeser

Although difference Fourier techniques are standard in X-ray crystallography, only very recently have electron crystallographers been able to take advantage of this method. We have combined a high-resolution data set for frozen, glucose-embedded Purple Membrane (PM) with a data set collected from PM prepared in the frozen-hydrated state in order to visualize any structural differences due to the different methods of preparation. The higher contrast between protein and ice, compared with that between protein and glucose, may prove to be an advantage of the frozen-hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose and ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure provides an ideal method for testing this hypothesis.
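For reference, the textbook difference Fourier synthesis can be written as below, where F_1 and F_2 are the structure factors measured for the two preparations (for example, glucose-embedded and frozen-hydrated PM), φ(h) are phases taken from a single reference data set, V is the unit-cell volume (an area for a two-dimensional projection), and the sum runs over the measured reflections h. The abstract does not state which phases or weights were used, so this is the generic form rather than the authors' exact protocol.

$$\Delta\rho(\mathbf{r}) = \frac{1}{V}\sum_{\mathbf{h}} \bigl(|F_{1}(\mathbf{h})| - |F_{2}(\mathbf{h})|\bigr)\, e^{i\varphi(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{r}}$$

Structural features present in one preparation but not in the other appear as positive or negative peaks in Δρ.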


Author(s):  
D. E. Becker

An efficient, robust, and widely applicable technique is presented for the computational synthesis of high-resolution, wide-area images of a specimen from a series of overlapping partial views. This technique can also be used to combine the results of various forms of image analysis, such as segmentation, automated cell counting, deblurring, and neuron tracing, to generate representations that are equivalent to processing the large wide-area image rather than the individual partial views. This can be a first step towards quantitation of the higher-level tissue architecture. The computational approach overcomes mechanical limitations of microscope stages, such as hysteresis and backlash. It also automates a procedure that is currently done manually. One application is the high-resolution visualization and/or quantitation of large batches of specimens that are much wider than the field of view of the microscope.
The automated montage synthesis begins by computing a concise set of landmark points for each partial view. The type of landmarks used can vary greatly depending on the images of interest. In many cases, image analysis performed on each data set can provide useful landmarks. Even when no such “natural” landmarks are available, image processing can often provide useful landmarks.
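Once landmarks have been found and matched between two overlapping views, the offset between the views can be estimated from the landmarks themselves instead of from the stage coordinates, which is how backlash and hysteresis are sidestepped. The sketch below assumes that landmark matching has already been done, that views differ by a pure translation (no rotation or scaling), and that overlapping pixels are averaged on the montage canvas; all function names are illustrative.

```python
import numpy as np


def estimate_translation(pts_a, pts_b):
    """Least-squares translation t with pts_b + t ~= pts_a (rows are matched landmarks)."""
    a = np.asarray(pts_a, float)
    b = np.asarray(pts_b, float)
    return (a - b).mean(axis=0)


def paste_views(views, offsets):
    """Place views on a common canvas at (row, col) offsets; overlaps are averaged."""
    offsets = np.round(np.asarray(offsets, float)).astype(int)
    offsets -= offsets.min(axis=0)  # shift so every offset is non-negative
    h = max(r + v.shape[0] for v, (r, c) in zip(views, offsets))
    w = max(c + v.shape[1] for v, (r, c) in zip(views, offsets))
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for v, (r, c) in zip(views, offsets):
        acc[r:r + v.shape[0], c:c + v.shape[1]] += v
        cnt[r:r + v.shape[0], c:c + v.shape[1]] += 1
    # pixels covered by no view stay at zero
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
```

With more than two views, the pairwise offsets would be chained (or solved jointly) to obtain every view's position in the montage frame; the same offsets can then be applied to per-view analysis results such as segmentations or cell counts.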


Author(s):  
Jaap Brink ◽  
Wah Chiu

Crotoxin complex is the principal neurotoxin of the South American rattlesnake, Crotalus durissus terrificus, and has a molecular weight of 24 kDa. The protein is a heterodimer, with subunit A assigned a chaperone function. Subunit B carries the lethal activity, which is exerted on both sides of the neuromuscular junction and is thought to involve binding to the acetylcholine receptor. Insight into the crotoxin complex's mode of action can be gained from a 3 Å resolution structure obtained by electron crystallography. This abstract reports our progress in merging the electron diffraction amplitudes into a three-dimensional (3D) intensity data set that is close to completion. Since the thickness of crotoxin complex crystals varies from crystal to crystal, we chose to collect tilt series of electron diffraction patterns after determining their thickness. Furthermore, by making use of the symmetry present in these tilt data, only intensities collected from similar crystals will be merged.
Suitable crystals of glucose-embedded crotoxin complex were searched for in the defocused diffraction mode, with the goniometer tilted to 55° or higher, in a JEOL4000 electron cryo-microscope operated at 400 kV, with the crystals kept at -120°C in a Gatan 626 cryo-holder. The crystal thickness was measured using the local contrast of the crystal relative to the supporting film from search-mode images acquired with a 1024 x 1024 slow-scan CCD camera (model 679, Gatan Inc.).


Author(s):  
J. K. Samarabandu ◽  
R. Acharya ◽  
D. R. Pareddy ◽  
P. C. Cheng

In the study of cell organization in a maize meristem, direct viewing of confocal optical sections in 3D (by means of 3D projection of the volumetric data set, Figure 1) becomes very difficult and confusing because of the large number of nuclei involved. A numerical description of the cellular organization (e.g., the position, size, and orientation of each structure) and computer graphic presentation are possible ways to effectively study the structure of such a complex system. An attempt at data reduction by means of manually contouring cell nuclei in 3D has been reported (Summers et al., 1990). Apart from being labor-intensive, this 3D digitization technique suffers from the inaccuracies of manual 3D tracing related to the depth perception of the operator. However, it does demonstrate that reducing a stack of confocal images to a 3D graphic representation helps to visualize and analyze complex tissues (Figure 2). This procedure also significantly reduces the computational burden in interactive operation.
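The numerical description mentioned above (position, size and orientation of each structure) can be computed automatically from a segmented confocal stack. This sketch assumes each nucleus is already available as its own 3D boolean mask (the segmentation step is not shown), uses the voxel-count volume, and takes the orientation as the principal axis of the voxel coordinates, i.e. the eigenvector of their covariance matrix with the largest eigenvalue. The `voxel_size` argument lets anisotropic spacing, often coarser along z in confocal stacks, be taken into account.

```python
import numpy as np


def describe_nucleus(mask, voxel_size=(1.0, 1.0, 1.0)):
    """Centroid, volume and major-axis direction of one segmented nucleus.

    mask: 3D boolean array indexed (z, y, x); needs at least two foreground voxels.
    """
    idx = np.argwhere(mask)  # (N, 3) voxel indices of the nucleus
    coords = idx * np.asarray(voxel_size, float)  # convert to physical units
    centroid = coords.mean(axis=0)
    volume = len(idx) * float(np.prod(voxel_size))
    cov = np.cov(coords - centroid, rowvar=False)
    _, evecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    major_axis = evecs[:, -1]  # direction of largest spread
    return centroid, volume, major_axis
```

Applying this to every nucleus turns a dense image stack into a short table of numbers per cell, which is the kind of data reduction the paragraph argues for.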

