COMPARING VIRUS CLASSIFICATION USING GENOMIC MATERIALS ACCORDING TO DIFFERENT TAXONOMIC LEVELS

2013 ◽  
Vol 11 (06) ◽  
pp. 1343003 ◽  
Author(s):  
JING-DOO WANG

In this paper, three kinds of genomic material, namely DNA sequences, protein sequences, and regions (domains), are used to compare methods of virus classification. Virus classes (categories) are divided by taxonomic level into three datasets covering 6 orders, 42 families, and 33 genera. To increase the robustness and comparability of the experimental results, only classes that contain at least 10 instances are selected, and each instance must contain at least one region name. Experimental results show that the approach using region names achieved the best accuracies, reaching 99.9%, 97.3%, and 99.0% for 6 orders, 42 families, and 33 genera, respectively. This paper not only involves exhaustive experiments that compare virus classifications using different genomic materials, but also proposes a novel approach to biological classification based on molecular biology instead of traditional morphology.
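The selection rule described in the abstract (at least one region name per instance, at least 10 instances per class) can be illustrated with a minimal sketch. The record layout (`label`, `regions`) and the function name `select_instances` are hypothetical, and the order of the two filters is an assumption; the abstract does not specify how the author implemented this step.

```python
from collections import Counter


def select_instances(instances, min_per_class=10):
    """Filter a dataset following the two criteria stated in the abstract.

    `instances` is a list of dicts with hypothetical keys:
      "label"   -> taxonomic class (order, family, or genus)
      "regions" -> list of region (domain) names for that virus
    Assumption: instances without region names are dropped first, and class
    sizes are counted on what remains.
    """
    # Keep only instances that carry at least one region name.
    with_regions = [inst for inst in instances if inst["regions"]]
    # Count how many usable instances each class has.
    counts = Counter(inst["label"] for inst in with_regions)
    # Keep only instances whose class is large enough.
    return [inst for inst in with_regions if counts[inst["label"]] >= min_per_class]
```

Applying the function once per taxonomic level (order, family, genus) would give three datasets in the spirit of those described above.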

A precise and efficient model for computing text similarity plays a key role in duplicate document detection. In this paper, we focus on presenting and evaluating document similarity based on a new method that encodes text into unique strings, called Deoxyribonucleic Acid (DNA) sequences. Additionally, the proposed method, which includes an algorithm for marking and coloring paragraphs in the test document that are similar to other documents available in the data warehouse, and the development of a copy-detection system are investigated. Experimental results show that this novel approach is highly accurate for a real dataset taken from PAN. The results corroborate the advantages of the novel approach, with an average accuracy of 99% for text similarity detection at a selection threshold of ε=10-12. The results of this study are applied to implement a practical system for evaluating document similarity at the University of Danang, Vietnam.


Author(s):  
Yanping Zhang ◽  
Pengcheng Chen ◽  
Ya Gao ◽  
Jianwei Ni ◽  
Xiaosheng Wang

Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational methods of low complexity are necessary to infer protein structure, function, and evolution. Method: In this work, we propose a novel method, FermatS, which is based on the global position information and local position representation from the curve and normalized moments of inertia, respectively, to extract feature information from protein sequences. Furthermore, we use the features generated by the FermatS method to analyze the similarity/dissimilarity of nine ND5 proteins and to establish a prediction model of DNA-binding proteins based on logistic regression with 5-fold cross-validation. Results: In the similarity/dissimilarity analysis of nine ND5 proteins, the results are consistent with evolutionary theory. Moreover, this method can effectively predict DNA-binding proteins in realistic situations. Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing and predicting protein sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
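For readers unfamiliar with the evaluation protocol, 5-fold cross-validation can be sketched as follows. This is a generic illustration using only the standard library, not code from the FermatS repository; the function name `k_fold_indices`, the shuffling, and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold; the remaining k - 1 folds
    form the training set for that round.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

In each round a classifier (logistic regression in the abstract above) would be fitted on `train_idx` and scored on `test_idx`, and the five scores averaged.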


2021 ◽  
Vol 40 (1) ◽  
pp. 551-563
Author(s):  
Liqiong Lu ◽  
Dong Wu ◽  
Ziwei Tang ◽  
Yaohua Yi ◽  
Faliang Huang

This paper focuses on script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot solve this problem perfectly for two reasons. First, scene images have arbitrary aspect ratios, which causes difficulty for traditional CNNs that take a fixed-size image as input. Second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining Score CNN, Attention CNN and patches. Attention CNN is used to determine whether a patch is discriminative and to calculate the contribution weight of the discriminative patch to the script identification of the whole image. Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via these two classifiers are fused to obtain the script type of the image. Using patches of the same size as inputs to the CNNs avoids the problems caused by arbitrary aspect ratios of scene images. The trained classifiers can mine discriminative patches to accurately identify some confusing scripts. The experimental results show the good performance of our approach on four public datasets.
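The final fusion step can be illustrated with a small sketch. The abstract does not give the exact fusion rule, so an attention-weighted average of per-patch scores is assumed here; the function name `fuse_patch_scores` and the input layout are hypothetical.

```python
def fuse_patch_scores(patch_scores, attention_weights):
    """Fuse per-patch script scores into one image-level prediction.

    patch_scores      -> list of score vectors, one per discriminative patch
                         (stand-in for the Score CNN output)
    attention_weights -> one non-negative weight per patch
                         (stand-in for the Attention CNN output)
    Assumption: fusion is a weighted average; the paper may use another rule.
    Returns (index of the winning script, fused score vector).
    """
    total = sum(attention_weights)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    n_scripts = len(patch_scores[0])
    fused = [
        sum(w * scores[c] for scores, w in zip(patch_scores, attention_weights)) / total
        for c in range(n_scripts)
    ]
    best = max(range(n_scripts), key=fused.__getitem__)
    return best, fused
```

With scores `[[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]`, equal weights favor the second script, whereas a high weight on the first patch flips the decision to the first script, which is the effect a contribution weight is meant to have.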


2016 ◽  
Vol 09 (03) ◽  
pp. 1650043 ◽  
Author(s):  
Haolin Wu ◽  
Jie Yang ◽  
Haibiao Chen ◽  
Feng Pan

Preferentially etching either carbon or silica from silicon oxycarbide (SiOC) created a porous network as an inverse image of the removed phase. The porous structure was analyzed by gas adsorption, and the experimental results verified the nanodomain structure of SiOC. This work demonstrated a novel approach for analyzing materials containing nanocomposite structures.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Ji-Yong An ◽  
Fan-Rong Meng ◽  
Zhu-Hong You ◽  
Yu-Hong Fang ◽  
Yu-Jun Zhao ◽  
...  

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict protein-protein interactions (PPIs) from protein sequences. The main improvements come from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets and achieve very high accuracies of 92.65% and 97.62%, respectively, which are significantly better than those of previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is clearly better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool for future proteomics research.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Chun-Hui Wu ◽  
Chia-Wei Chen ◽  
Long-Sheng Kuo ◽  
Ping-Hei Chen

A novel approach was proposed to measure the hydraulic capacitance of a microfluidic membrane pump. Membrane deflection equations were modified from various studies to propose six theoretical equations to estimate the hydraulic capacitance of a microfluidic membrane pump. Thus, measuring the center deflection of the membrane allows the corresponding pressure and hydraulic capacitance of the pump to be determined. This study also investigated how membrane thickness affected the Young’s modulus of a polydimethylsiloxane (PDMS) membrane. Based on the experimental results, a linear correlation was proposed to estimate the hydraulic capacitance. The measured hydraulic capacitance data and the proposed equations in the linear and nonlinear regions qualitatively exhibited good agreement.


2014 ◽  
Vol 8 (1) ◽  
pp. 166-170 ◽  
Author(s):  
Jia Wang ◽  
Shuai Liu ◽  
Weina Fu

The formation and precise positioning of nucleosomes in chromatin play a very important role in the study of life processes. Many researchers have found that the location where a DNA sequence fragment wraps around a histone octamer in the genome is not random but regular, and that this positioning is closely related to the concrete sequence of the core DNA. In this paper, we therefore analyzed the relation between the affinity and the sequence structure of core DNA and extracted a set of key positions, at which the nucleotides probably play the main role in binding. First, we simplified and formatted the experimental affinity data. Then, to find the key positions in the wrapping, we used a neural network to analyze the positive and negative effects of each position in core DNA sequences on nucleosome formation, obtaining a weight for every position that describes this effect. Finally, based on the positions with high weights, we analyzed why the chosen positions are key positions and used them to construct a model for nucleosome positioning prediction. Experimental results show the effectiveness of our method.


2021 ◽  
pp. 1-56
Author(s):  
Brandon Prickett

Since Halle (1962), explicit algebraic variables (often called alpha notation) have been commonplace in phonological theory. However, Hayes and Wilson (2008) proposed a variable-free model of phonotactic learning, sparking a debate about whether such algebraic representations are necessary to capture human phonological acquisition. While past experimental work has found evidence that suggested a need for variables in models of phonology (Berent et al. 2012, Moreton 2012, Gallagher 2013), this paper presents a novel mechanism, Probabilistic Feature Attention (PFA), that allows a variable-free model of phonotactics to predict a number of these phenomena. Additionally, experimental results involving phonological generalization that cannot be explained by variables are captured by this novel approach. These results cast doubt on whether variables are necessary to capture human-like phonotactic learning and provide a useful alternative to such representations.


Viruses ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 1019
Author(s):  
Majid Forghani ◽  
Michael Khachay

Evaluating the degree of antigenic similarity between strains of the influenza virus is highly important for vaccine production. The conventional way to measure this degree relies on immunological hemagglutination inhibition (HI) assays: the antigenic distance between two strains is calculated from the HI assay results. Usually, such distances are visualized using some kind of antigenic cartography method. The known drawback of the HI assay is that it is rather time-consuming and expensive. In this paper, we propose a novel approach for antigenic distance approximation based on deep learning in the feature spaces induced by hemagglutinin protein sequences and Convolutional Neural Networks (CNNs). To apply a CNN to compare the protein sequences, we utilize an encoding based on the physical and chemical characteristics of amino acids. By varying (hyper)parameters of the CNN architecture design, we find the most robust network. Further, we provide insight into the relationship between approximated antigenic distance and antigenicity by evaluating the network on the HI assay database for the H1N1 subtype. The results indicate that the best-trained network gives a high-precision approximation of the ground-truth antigenic distances and can be used as a good exploratory tool in practical tasks.
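The idea of encoding a protein sequence by the physicochemical characteristics of its amino acids can be sketched as below. The abstract does not list which characteristics are used, so a single property, the Kyte-Doolittle hydropathy index, is assumed purely for illustration; the function name `encode_sequence` and the zero padding are also assumptions.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
# Assumption: chosen only as an example property; the paper's encoding
# may use different or additional characteristics.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(sequence, length):
    """Map a protein sequence to a fixed-length list of property values.

    Unknown residues and padding positions are encoded as 0.0, and sequences
    longer than `length` are truncated, so that every encoded sequence has
    the same shape and can be fed to a network with a fixed input size.
    """
    values = [HYDROPATHY.get(residue, 0.0) for residue in sequence.upper()[:length]]
    return values + [0.0] * (length - len(values))
```

Stacking several such property channels would give a two-dimensional numeric input, which is the kind of representation a CNN can compare across strains.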


Author(s):  
Judy C.R. Tseng ◽  
Wen-Ling Tsai ◽  
Gwo-Jen Hwang ◽  
Po-Han Wu

In developing traditional learning materials, quality is the key issue to be considered. However, for highly technical e-training courses, not only the quality of the learning materials but also the efficiency of developing the courses needs to be taken into consideration. It is challenging for experienced engineers to develop up-to-date e-training courses for inexperienced engineers before newer technologies are proposed. To cope with these problems, a concept relationship-oriented approach is proposed in this paper. A system for developing e-training courses has been implemented based on this approach. Experimental results showed that the novel approach can significantly shorten the time needed for developing e-training courses, so that engineers can learn up-to-date technologies in time.

