Global atlas tree of natural proteins based on sorted composition vectors

2019 ◽  
Author(s):  
Pu Tian

Abstract: Sequence comparison is the cornerstone of bioinformatics and is traditionally realized by alignment. Unfortunately, exponential computational complexity renders rigorous multiple sequence alignment (MSA) intractable. Approximate algorithms and heuristics provide acceptable performance for a relatively small number of sequences but incur prohibitive computational cost and unbounded accumulation of error for massive sequence sets. Alignment-free algorithms have achieved linear computational cost for pairwise sequence comparison, but the challenge of multiple sequence comparison (MSC) remains. Meanwhile, various parameters and procedures must be empirically adjusted for different MSC tasks, and their complex interactions and impact are not well understood. Therefore, an efficient, nonparametric global sequence comparison method is essential for the explosive growth of sequencing data. It is shown here that the sorted composition vector (SCV), which is based on a physical perspective on sequence composition constraints, is a feasible nonparametric encoding scheme for global protein sequence comparison and classification with linear computational complexity, and that it provides a global atlas tree for natural protein sequences. This finding makes massive sequence comparison and classification, which is infeasible even on supercomputers, routine on a workstation. SCV sets an example of one-way encoding that might revolutionize recognition and classification tasks in general.
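The abstract does not spell out how the SCV itself is constructed, so the sketch below does not attempt to reproduce it. It only illustrates the generic alignment-free idea the passage contrasts with alignment: each sequence is encoded once as a k-mer composition vector, in time linear in its length, and encodings are then compared by cosine distance. The choice of k = 2, the cosine distance, and all function names are illustrative assumptions. Note that the all-pairs matrix it builds is still quadratic in the number of sequences.

```python
from collections import Counter
from math import sqrt


def composition_vector(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a single pass (linear in sequence length for fixed k)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse k-mer count vectors."""
    # Missing keys in a Counter read as 0, so only shared k-mers add to the dot product.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty encoding (sequence shorter than k) is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(seqs: list, k: int = 2) -> list:
    """Encode every sequence once, then compare the encodings pairwise without alignment."""
    vecs = [composition_vector(s, k) for s in seqs]
    return [[cosine_distance(u, v) for v in vecs] for u in vecs]
```

Because each sequence is encoded exactly once, adding a new sequence costs one linear-time encoding plus one cheap vector comparison per existing sequence, with no re-alignment of the whole set.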

1993 ◽  
Vol 55 (2) ◽  
pp. 465-486 ◽  
Author(s):  
A. K. C. Wong ◽  
S. C. Chan ◽  
D. K. Y. Chiu

2005 ◽  
Vol 15 (3) ◽  
pp. 254-260 ◽  
Author(s):  
William R Pearson ◽  
Michael L Sierk

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Dejun Jiang ◽  
Zhenxing Wu ◽  
Chang-Yu Hsieh ◽  
Guangyong Chen ◽  
Ben Liao ◽  
...  

Abstract: Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs can yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of prediction models developed with eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that, on average, the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, yield outstanding performance on a fraction of the larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model, even for a large dataset. Model interpretation with the SHAP method can effectively explore established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) against HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.


2016 ◽  
Vol 06 (02) ◽  
pp. 33-40 ◽  
Author(s):  
Jayanta Pal ◽  
Soumen Ghosh ◽  
Bansibadan Maji ◽  
Dilip Kumar Bhattacharya

2020 ◽  
Vol 34 (02) ◽  
pp. 1830-1837 ◽  
Author(s):  
Robert Bredereck ◽  
Jiehua Chen ◽  
Dušan Knop ◽  
Junjie Luo ◽  
Rolf Niedermeier

Adaptivity to changing environments and constraints is key to success in modern society. We address this by proposing “incrementalized versions” of Stable Marriage and Stable Roommates. That is, we try to answer the following question: for both problems, what is the computational cost of adapting an existing stable matching after some of the agents' preferences have changed? While doing so, we also model the constraint that the new stable matching should not be too different from the old one. After formalizing these incremental versions, we provide a fairly comprehensive picture of the computational complexity landscape of Incremental Stable Marriage and Incremental Stable Roommates. To this end, we exploit “degree of change” parameters both in the input (difference between old and new preference profiles) and in the output (difference between old and new stable matchings). We obtain both hardness and tractability results, in particular a fixed-parameter tractability result with respect to the parameter “distance between old and new stable matching”.
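The abstract does not describe the incremental algorithms themselves, so the following is only a naive baseline under stated assumptions: a one-to-one Stable Marriage instance with equal-sized sides and complete, strict preference lists. After the preferences change, it recomputes a stable matching from scratch with the classic men-proposing Gale-Shapley algorithm and measures how far the result is from the old matching. The distance used here, the size of the symmetric difference of the two sets of matched pairs, is an assumption and may not match the paper's exact definition; the function names are hypothetical. Gale-Shapley makes no attempt to stay close to the old matching, which is precisely the constraint the incremental versions add.

```python
from typing import Dict, List

Prefs = Dict[str, List[str]]


def gale_shapley(men_prefs: Prefs, women_prefs: Prefs) -> Dict[str, str]:
    """Men-proposing deferred acceptance; returns a stable man -> woman matching.

    Assumes equal numbers of men and women and complete, strict preference lists.
    """
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman each man proposes to
    partner_of: Dict[str, str] = {}  # woman -> man she is currently holding
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = partner_of.get(w)
        if current is None:
            partner_of[w] = m
        elif rank[w][m] < rank[w][current]:
            partner_of[w] = m  # w trades up; her previous partner becomes free
            free.append(current)
        else:
            free.append(m)  # w rejects m; he will propose to his next choice
    return {m: w for w, m in partner_of.items()}


def is_stable(matching: Dict[str, str], men_prefs: Prefs, women_prefs: Prefs) -> bool:
    """True if no man and woman both prefer each other to their assigned partners."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # every remaining woman ranks below m's current partner
            # m prefers w to his partner; the pair blocks if w also prefers m.
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


def matching_distance(old: Dict[str, str], new: Dict[str, str]) -> int:
    """Size of the symmetric difference of the two sets of matched pairs (assumed metric)."""
    return len(set(old.items()) ^ set(new.items()))
```

A typical use is to check `is_stable(old, new_men_prefs, women_prefs)` first: if the old matching is still stable under the changed preferences, the distance can be zero; otherwise the recomputed matching gives an upper bound that an incremental algorithm would try to beat.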


Author(s):  
Awder Mohammed Ahmed ◽  
◽  
Adnan Mohsin Abdulazeez ◽  

Multi-label classification addresses problems in which more than one class label is assigned to each instance. Many real-world multi-label classification tasks are high-dimensional due to digital technologies, which degrades the performance of traditional multi-label classifiers. Feature selection is a common and successful approach to tackling this problem: it retains relevant features and eliminates redundant ones to reduce dimensionality. Several feature selection methods have been successfully applied in multi-label learning. Most of them are wrapper methods that employ a multi-label classifier in their processes. They run a classifier at each step, which incurs a high computational cost, and thus they suffer from scalability issues. To deal with this issue, filter methods have been introduced that evaluate feature subsets using information-theoretic mechanisms instead of running classifiers. Most existing research and review papers deal with feature selection for single-label data, whereas multi-label classification has recently found a wide range of real-world applications, such as image classification, emotion analysis, text mining, and bioinformatics. Moreover, researchers have recently focused on applying swarm intelligence methods to select prominent features of multi-label data. To the best of our knowledge, there is no review paper that covers swarm intelligence-based methods for multi-label feature selection. Thus, in this paper, we provide a comprehensive review of the swarm intelligence and evolutionary computing methods of feature selection proposed for multi-label classification tasks. To this end, we have investigated most of the well-known and state-of-the-art methods and categorized them from different perspectives. We then present the main characteristics of the existing multi-label feature selection techniques and compare them analytically.
We also introduce benchmarks, evaluation measures, and standard datasets to facilitate research in this field. Moreover, we perform experiments to compare existing works, and, at the end of this survey, we outline challenges, issues, and open problems in this field for future research.
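To make the wrapper/filter distinction concrete, here is a minimal filter-method sketch that never runs a classifier: each discrete feature is scored by the sum of its empirical mutual information with every label, and features are ranked by that score. This sum-of-MI criterion is one simple choice assumed here for illustration; it ignores redundancy between features and is not any specific method from the review. The swarm-intelligence methods the review covers would instead search over feature subsets, possibly using a criterion like this as their fitness. All names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Empirical mutual information I(X;Y) in bits between two discrete variables."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


def rank_features(X: List[List[int]], Y: List[List[int]]) -> List[int]:
    """Rank feature indices by summed MI with each label, highest first.

    X is n_samples x n_features and Y is n_samples x n_labels; both are discrete.
    Ties keep their original feature order because Python's sort is stable.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(sum(
            mutual_information(column, [row[l] for row in Y]) for l in range(n_labels)
        ))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Each feature costs one pass over the data per label, so the filter scales linearly in the number of samples, features, and labels, which is the scalability advantage over wrappers that the abstract describes.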

