Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations

Author(s):  
Ying Li ◽  
Qi Zhang ◽  
Zhaoqian Liu ◽  
Cankun Wang ◽  
Siyu Han ◽  
...  

Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, the functions of only a few ncRNAs have been well studied. Given the significance of ncRNA classification for understanding ncRNA function, a growing number of computational methods have been introduced to make the classification automatic and more accurate. In this paper, we propose a novel deep fusion learning framework, the GcForest fusion method (GCFM), which is based on a convolutional neural network and a deep forest algorithm, the multi-grained cascade forest (GcForest), to classify alignments of ncRNA sequences for accurate clustering of ncRNAs. GCFM integrates a multi-view structure feature representation comprising sequence-structure alignment encoding, structure image representation and shape alignment encoding of structural subunits, enabling us to capture the potential specificity between ncRNAs. For the classification of pairwise alignments of two ncRNA sequences, the F-value of GCFM is 6% higher than that of an existing alignment-based method. Furthermore, the clustering of ncRNA families is carried out based on the classification matrix generated by GCFM. Results suggest better performance (a 20% improvement in accuracy) than existing ncRNA clustering methods (RNAclust, Ensembleclust and CNNclust). Additionally, we apply GCFM to construct a phylogenetic tree of ncRNAs and to predict the probability of interactions between RNAs. Most ncRNAs are located correctly in the phylogenetic tree, and the prediction accuracy of RNA interaction is 90.63%. A web server (http://bmbl.sdstate.edu/gcfm/) has been developed to maximize availability, and the source code and related data are available at the same URL.

2021 ◽  
Vol 11 (9) ◽  
pp. 3974
Author(s):  
Laila Bashmal ◽  
Yakoub Bazi ◽  
Mohamad Mahmoud Al Rahhal ◽  
Haikel Alhichri ◽  
Naif Al Ajlan

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view of each training image using data augmentation. Both the image and its augmented version were then reshaped into a sequence of flattened patches and fed to the transformer encoder. The encoder extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On top of the encoder, we mounted two classifiers: a token classifier and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we took the average of the two classifiers' outputs to obtain the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two centimeters demonstrated the effectiveness of the proposed model.
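
The two mechanical steps named in this abstract, reshaping an image into flattened patches and averaging the two classifier outputs at test time, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-channel list-of-rows image format, and the 0.5 decision threshold are assumptions made only for illustration.

```python
def image_to_patches(image, patch):
    """Split an H x W image (a list of rows) into flattened patch x patch tiles.

    Tiles are emitted in row-major order, which is the usual way a
    transformer input sequence is built from an image.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(flat)
    return patches


def average_heads(token_probs, distill_probs, threshold=0.5):
    """Average per-label probabilities of two heads, then threshold them.

    The threshold is an assumption; the abstract only states that the
    two classifiers are averaged.
    """
    avg = [(a + b) / 2 for a, b in zip(token_probs, distill_probs)]
    labels = [int(p >= threshold) for p in avg]
    return avg, labels


if __name__ == "__main__":
    toy = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 toy image
    print(image_to_patches(toy, 2))
    print(average_heads([0.75, 0.25, 0.5], [0.25, 0.25, 1.0]))
```

In a multi-label setting each label is thresholded independently, so an image can receive several labels at once.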


2012 ◽  
Vol 38 ◽  
pp. 1362-1366 ◽  
Author(s):  
Gayatri Mahapatro ◽  
Debahuti Mishra ◽  
Kailash Shaw ◽  
Sashikala Mishra ◽  
Tanushree Jena

Author(s):  
Jing Tao ◽  
Suiran Yu

LCA predicts the life cycle impacts of product solutions and can help determine which solution is better for the environment. However, LCA is very data dependent and requires in-depth knowledge to explicitly relate the environmental impacts of a product to its design attributes. Current LCA methods are generally still not adapted to designers, who often lack the expertise and time to make LCA useful in their daily work. This study aims to develop an LCA module integrated with a CAD system for machined products. The module employs a feature-based approach to identify, extract and convert life cycle related data in existing product models for LCA modeling and analysis. A coding system for machining feature representation and a rule-based reasoning package that generates manufacturing plans from feature codes are developed to enable convenient eco-assessment alongside CAD modeling of machined products. A step shaft LCA case study is presented to demonstrate the proposed approach.


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 29 ◽  
Author(s):  
Sebastian Matthias Müller ◽  
Andreas Hein

To enable independent living for people in need of care and to accommodate the increasing demand for ambulatory care due to demographic changes, a multitude of systems and applications have been developed that monitor activities and health-related data using ambient sensors commonly found in smart homes. When such a system is used in a multi-person household, some form of identification or separation of residents is required. Most of these systems require permanent participation in the form of body-worn sensors or a complicated supervised learning procedure that may take hours or days to set up. To resolve this, we study several unsupervised learning approaches for separating the activity data of multiple residents recorded with ambient binary sensors such as light barriers and contact switches. We show how various clustering methods applied to data from a tracking system can, under optimal conditions, separate the activity of two residents with low error rates (<2%, Rand Index of 0.959). We also show that imprecisions in the underlying tracking algorithm have a significant impact on the clustering performance and that most of these errors can be corrected by adding a single “identifying sensor area” to the environment. As a consequence, activity monitoring applications need to rely less on body-worn sensors, which may be forgotten, or on biometric sensors, which may be perceived as a violation of privacy.
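
The Rand Index quoted above is the fraction of object pairs on which two partitions agree: pairs placed together in both, or apart in both. The following minimal sketch shows the standard definition; the label lists are made-up toy data, not the study's recordings.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the Rand Index of two labelings of the same objects.

    A pair of objects counts as an agreement when both labelings put the
    pair in the same cluster, or both put it in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    if len(labels_true) < 2:
        raise ValueError("at least two objects are required")
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Toy example: resident identity per sensor event (hypothetical data).
    truth = [0, 0, 1, 1]
    found = [0, 0, 1, 0]
    print(rand_index(truth, found))
```

Because only pair co-membership is compared, the index is unaffected by how the clusters are numbered, which suits unsupervised separation where cluster identifiers are arbitrary.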


Energies ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 1723
Author(s):  
Hyun Cheol Jeong ◽  
Jaesung Jung ◽  
Byung O Kang

This study proposes a methodology to develop adaptive operational strategies for customer-installed Energy Storage Systems (ESS) based on the classification of customer load profiles. In addition, this study proposes a methodology to characterize and classify customer load profiles based on newly proposed Time-of-Use (TOU) indices. The TOU indices effectively distribute daily customer load profiles over multi-dimensional domains, indicating customer energy consumption patterns under the TOU tariff. The K-means and Self-Organizing Map (SOM) clustering methods were applied for classification. Furthermore, this study demonstrates peak shaving and arbitrage operations of ESS under current supporting policies in South Korea. Actual load profiles accumulated from customers under the TOU rate were used to validate the proposed methodologies. The simulation results show that the TOU index-based clustering effectively classifies load patterns into ‘M-shaped’ and ‘square wave-shaped’ load patterns. In addition, the feasibility analysis results suggest different ESS operational strategies for different load patterns: the ‘M-shaped’ pattern is fixed at a 2-cycle operation per day because of battery life, while the ‘square wave-shaped’ pattern maximizes its operational cycle (a 3-cycle operation during the winter) for the highest profits.


1990 ◽  
Vol 55 (3) ◽  
pp. 644-652 ◽  
Author(s):  
Oldřich Pytela

The paper presents a classification of 51 solvents based on clustering in the three-dimensional space formed by the empirical scales of the PAC, PBC, and PPC parameters, which were designed for interpreting solvent effects with a model that includes cross-terms. The classification uses the nearest neighbour, furthest neighbour, average bond (average linkage), and centroid clustering methods. As a result, the solvents have been divided into 8 classes denoted as: I - nonpolar-inert solvents (aliphatic hydrocarbons), IIp - nonpolar-polarizable (aromatic hydrocarbons, tetrachloromethane, carbon disulphide), IIb - nonpolar-basic (ethers, triethylamine), IIIp - little polar-polarizable (aliphatic halogen derivatives, substituted benzenes with heteroatom-containing substituents), IIIb - little polar-basic (cyclic ethers, ketones, esters, pyridine), IVa - polar-aprotic (acetic anhydride, dialkylamides, acetonitrile, nitromethane, dimethyl sulfoxide, sulfolane), IVp - polar-protic (alcohols, acetic acid), and V - exceptional solvents (water, formamide, glycol, hexamethylphosphoric triamide). The information content of the individual parameters used for the classification has been determined. The classification is based primarily on solvent polarity/acidity (PAC), less on polarity/basicity (PBC), and least on polarity/polarizability (PPC). A causal relation between the chemical structure of a solvent and its effect on the process taking place in it has been established.
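
The four clustering methods named above differ only in how the distance between two clusters is measured. The sketch below is a naive agglomerative procedure that makes that difference explicit. It is not the paper's procedure: the coordinates are invented stand-ins for (PAC, PBC, PPC) triples, and the method keywords `single`, `complete`, `average`, and `centroid` are this sketch's own names for nearest neighbour, furthest neighbour, average bond, and centroid.

```python
import math


def cluster_distance(a, b, method):
    """Distance between two clusters of points under a linkage rule."""
    if method == "centroid":
        centre_a = [sum(axis) / len(a) for axis in zip(*a)]
        centre_b = [sum(axis) / len(b) for axis in zip(*b)]
        return math.dist(centre_a, centre_b)
    distances = [math.dist(p, q) for p in a for q in b]
    if method == "single":      # nearest neighbour
        return min(distances)
    if method == "complete":    # furthest neighbour
        return max(distances)
    if method == "average":     # average bond (average linkage)
        return sum(distances) / len(distances)
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, n_clusters, method="single"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(
                    [points[k] for k in clusters[i]],
                    [points[k] for k in clusters[j]],
                    method,
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical (PAC, PBC, PPC) coordinates for four solvents.
    solvents = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
    for method in ("single", "complete", "average", "centroid"):
        print(method, agglomerate(solvents, 2, method))
```

On well-separated data all four rules give the same partition; they diverge on elongated or overlapping groups, which is one reason to compare several rules before fixing a classification.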


2005 ◽  
Vol 49 (7) ◽  
pp. 2778-2784 ◽  
Author(s):  
Gianpiero Garau ◽  
Anne Marie Di Guilmi ◽  
Barry G. Hall

The metallo-β-lactamases fall into two groups: Ambler class B subgroups B1 and B2 and Ambler class B subgroup B3. The two groups are so distantly related that there is no detectable sequence homology between members of the two different groups, but homology is clearly detectable at the protein structure level. The multiple structure alignment program MAPS has been used to align the structures of eight metallo-β-lactamases and five structurally homologous proteins from the metallo-β-lactamase superfamily, and that alignment has been used to construct a phylogenetic tree of the metallo-β-lactamases. The presence of genes from Eubacteria, Archaebacteria, and Eukaryota on that tree is consistent with a very ancient origin of the metallo-β-lactamase family.


10.12737/7483 ◽  
2014 ◽  
Vol 8 (7) ◽  
pp. 0-0
Author(s):  
Олег Сдвижков ◽  
Oleg Sdvizhkov

Cluster analysis [3] is a relatively new branch of mathematics that studies methods for partitioning a set of objects, described by a finite set of attributes, into homogeneous groups (clusters). Cluster analysis is widely used in psychology, sociology, economics (market segmentation), and many other areas in which objects must be classified according to their characteristics. Clustering methods are implemented in the STATISTICA [1] and SPSS [2] packages, which return the partition into clusters, clustering and dispersion statistics, and the dendrogram for hierarchical clustering algorithms. MS Excel macros for the main clustering methods, with application examples, are given in the monograph [5]. One of the central problems of cluster analysis is to define a criterion for the number of clusters, denoted K, into which a given set of objects is separated. There are several dozen approaches [4] to determining K. In particular, according to [6], K is the minimum number satisfying a criterion expressed in terms of the minimum value of the total dispersion for a partition into K clusters and the number of objects N. The number of clusters can also be determined automatically by the successive application of anomalous clusters [4]. In 2010, a method for obtaining K by applying a density function was proposed and experimentally validated [4]. The article offers two simple approaches to determining K in which each cluster has at least two objects. In the first, K is determined from the shortest Hamiltonian cycle; in the second, from the minimum spanning tree. Examples of clustering with detailed step-by-step solutions and graphic illustrations are given. The use of an Excel VBA macro that returns the minimum spanning tree is shown for clustering problems. The article includes the macro code, with comments on its main block.
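
The abstract does not state the article's rule for reading K off the minimum spanning tree, so the sketch below only illustrates the general idea under stated assumptions. It builds the tree with Prim's algorithm, then cuts unusually long edges, rejecting any cut that would leave a cluster with fewer than two objects. The cut rule (an edge longer than `factor` times the mean edge length) and all function names are hypothetical, and Python is used here in place of the article's VBA macro.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (length, i, j) edges.
    """
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        edge = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append(edge)
        in_tree.add(edge[2])
    return edges


def components(n, edges):
    """Connected components of n nodes under the given edges (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def mst_clusters(points, factor=2.0, min_size=2):
    """Cut long MST edges, keeping every cluster at min_size objects or more.

    The 'factor times mean edge length' rule is an assumption of this sketch.
    """
    n = len(points)
    if n < 2:
        return [[i] for i in range(n)]
    edges = minimum_spanning_tree(points)
    mean_length = sum(length for length, _, _ in edges) / len(edges)
    kept = list(edges)
    for edge in sorted(edges, reverse=True):  # longest edges first
        if edge[0] <= factor * mean_length:
            break
        trial = [e for e in kept if e != edge]
        if all(len(c) >= min_size for c in components(n, trial)):
            kept = trial
    return components(n, kept)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (30, 0)]
    clusters = mst_clusters(pts)
    print(len(clusters), clusters)
```

In the toy data the isolated point at (30, 0) is attached by the longest edge, but cutting that edge would create a one-object cluster, so the cut is rejected and the point stays with its nearest group. This mirrors the abstract's requirement that each cluster contain at least two objects.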

