Orthogonal linear separation analysis: an approach to decompose the complex effects of a perturbagen

2018
Author(s):
Tadahaya Mizuno
Setsuo Kinoshita
Shotaro Maedera
Takuya Ito
Hiroyuki Kusuhara

Abstract Drugs have multiple effects, not single ones. Decomposing drug effects into basic components helps us understand the pharmacological properties of a drug and contributes to drug discovery. We have extended factor analysis and developed a novel profile data analysis method, orthogonal linear separation analysis (OLSA). OLSA contracted 11,911 genes to 118 factors from transcriptome data of MCF7 cells treated with 318 compounds in the Connectivity Map. Ontology analysis of the main genes constituting the factors detected significant enrichment in 65 of 118 factors, and similar results were obtained in two other data sets. One factor discriminated two Hsp90 inhibitors, geldanamycin and radicicol, which clustering analysis could not separate. Doxorubicin was estimated to inhibit Na+/K+-ATPase, one of the suggested mechanisms of doxorubicin-induced cardiotoxicity. Based on a factor reflecting PI3K/AKT/mTORC1 inhibition activity, five compounds were predicted to be novel autophagy inducers, and further analyses, including western blotting, revealed that four of the five actually induced autophagy. These findings indicate the potential of OLSA to decompose the effects of a drug and identify its basic components.
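As a rough illustration of the kind of decomposition described here, the sketch below contracts a hypothetical gene-expression matrix into a few orthogonal factors using a plain truncated SVD. OLSA itself extends factor analysis; the matrix sizes, data, and variable names below are illustrative assumptions, not the paper's method or data.

```python
import numpy as np

# Hypothetical profile matrix: rows = treated samples, columns = genes.
rng = np.random.default_rng(0)
n_samples, n_genes, n_factors = 50, 500, 10
profiles = rng.normal(size=(n_samples, n_genes))

# Center each gene, then factor the matrix: profiles ~ scores @ loadings.
centered = profiles - profiles.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :n_factors] * S[:n_factors]   # sample-by-factor matrix
loadings = Vt[:n_factors]                   # factor-by-gene matrix

# Each factor is a weighted gene signature; the genes with the largest
# absolute loadings are the ones one would submit to ontology enrichment.
top_genes = np.argsort(-np.abs(loadings[0]))[:20]
print(scores.shape, loadings.shape, top_genes.shape)
```

Because the SVD yields orthogonal components, each factor captures a distinct axis of variation across the compound treatments, which is the property that lets a single factor separate compounds that whole-profile clustering lumps together.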


2020
Vol 6
Author(s):
Jaime de Miguel Rodríguez
Maria Eugenia Villafañe
Luka Piškorec
Fernando Sancho Caparrini

Abstract This work presents a methodology for generating novel 3D objects that resemble wireframes of building types. These result from reconstructing interpolated locations within the learnt distribution of a variational autoencoder (VAE), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150,000 input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that generating interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
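The two VAE operations this abstract relies on can be sketched in plain NumPy: the reparameterization trick used during training, and linear interpolation between two latent codes, whose decoded results would be the hybrid geometries. The latent dimension, encoder outputs, and step count below are stand-in assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
latent_dim = 16

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# Stand-in encoder outputs for one wireframe of each building type.
z_a = reparameterize(np.zeros(latent_dim), np.zeros(latent_dim))
z_b = reparameterize(np.ones(latent_dim), np.zeros(latent_dim))

# Interpolated latent codes between the two building types; decoding
# each code would yield one hybrid geometry along the path.
steps = 5
z_path = [(1 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, steps)]
print(len(z_path), z_path[0].shape)
```

The endpoints of the path reproduce the two source codes exactly, so only the intermediate codes ask the decoder to generate geometry it has never seen, which is where the difficulty reported in the abstract arises.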



2019
Vol 9 (1)
Author(s):
Tadahaya Mizuno
Setsuo Kinoshita
Takuya Ito
Shotaro Maedera
Hiroyuki Kusuhara




2014
Vol 11 (4)
pp. 597-608
Author(s):
Dragan Antic
Miroslav Milovanovic
Stanisa Peric
Sasa Nikolic
Marko Milojkovic

The aim of this paper is to present a method for neural network input parameter selection and preprocessing. The purpose of this network is to forecast foreign exchange rates using artificial intelligence. Two data sets are formed for two different economic systems. Each system is represented by six categories with 70 economic parameters, which are used in the analysis. These parameters were reduced within each category using the principal component analysis method. Component interdependencies are established and relations between them are formed. The newly formed relations were used to create the input vectors of a neural network. A multilayer feed-forward neural network is formed and trained using batch training. Finally, simulation results are presented, and it is concluded that the input data preparation method is an effective way to preprocess neural network data.
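The preprocessing pipeline described above can be sketched as follows: each category's parameters are reduced with PCA, and the leading components from all categories are concatenated into the network's input vector. The data, category sizes, and component count below are hypothetical; the paper's actual series and choices are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_categories, params_per_cat, n_components = 200, 6, 12, 2

def pca_reduce(X, k):
    """Project X onto its first k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# One matrix of observed economic parameters per category (stand-in data).
categories = [rng.normal(size=(n_obs, params_per_cat))
              for _ in range(n_categories)]
reduced = [pca_reduce(X, n_components) for X in categories]

# Network input: the reduced categories placed side by side.
inputs = np.hstack(reduced)
print(inputs.shape)
```

Reducing within each category, rather than across all 70 parameters at once, preserves the category structure in the input vector while still discarding redundant variation inside each category.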



Author(s):  
Junjie Wu
Jian Chen
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing objects into groups (clusters), such that objects within a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), are well-established. A recent research focus in clustering analysis is understanding the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified some data characteristics that may strongly affect clustering analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions:
1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms?
2. How does the distribution of the “true” cluster sizes affect the performance of clustering algorithms?
3. How should an appropriate clustering algorithm be chosen in practice?
The answers to these questions can guide us toward a better understanding and use of clustering methods. This is noteworthy because 1) in theory, it is seldom recognized that there are strong relationships between clustering algorithms and cluster-size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the recent boom of algorithms in the data mining area. This chapter is an initial attempt to fill this void.
To this end, we carefully select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. In this chapter, we first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. We then demonstrate that UPGMA, one of the robust AHC methods, acts in the opposite way to K-means; that is, UPGMA tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes produced by K-means and UPGMA, measured by the Coefficient of Variation (CV), fall in specific intervals, roughly [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we compare K-means and UPGMA directly and propose some rules for the better choice of clustering schemes from the data-distribution point of view.
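The Coefficient of Variation used above is simply the standard deviation of the cluster sizes divided by their mean, computable with the standard library alone. The two size lists below are made up to illustrate the contrast the chapter draws, not taken from its experiments.

```python
from statistics import mean, pstdev

def cv(sizes):
    """Coefficient of Variation of a list of cluster sizes."""
    return pstdev(sizes) / mean(sizes)

# Moderately uniform sizes, as K-means tends to produce.
kmeans_like = [60, 160, 40, 140, 100]
# One dominant cluster plus small ones, as UPGMA tends to produce.
upgma_like = [450, 30, 10, 5, 5]

print(round(cv(kmeans_like), 2), round(cv(upgma_like), 2))
```

With these illustrative lists the first CV lands below 1.0 and the second above it, matching the direction of the intervals the chapter reports for K-means and UPGMA.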





2013
Vol 13 (8)
pp. 1251-1255
Author(s):
Yang Liu
Qin-Liang Li
Li-Yuan Dong
Bang-Chun Wen


2017
Vol 8 (2)
pp. 30-43
Author(s):  
Mrutyunjaya Panda

Big Data, due to its complicated and diverse nature, poses many challenges for extracting meaningful observations. This calls for smart and efficient algorithms that can cope with the computational complexity and memory constraints arising from iterative processing. The issue may be addressed with parallel computing techniques, in which a single machine or multiple machines perform the work simultaneously by dividing the problem into subproblems and assigning private memory to each subproblem. Clustering analysis has recently been found useful for handling such huge data. Although many investigations into Big Data analysis are under way, in this work Canopy and K-means++ clustering are used to process large-scale data in a shorter amount of time and without memory constraints. To assess the suitability of the approach, several data sets are considered, ranging from small to very large and spanning diverse fields of application. The experimental results indicate that the proposed approach is fast and accurate.
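Canopy clustering, the cheap pre-clustering step named above, can be sketched in pure Python: a loose threshold T1 decides canopy membership, while a tight threshold T2 removes points near a canopy center from further consideration (the scheme assumes T1 > T2). The thresholds and points below are illustrative assumptions, not the paper's configuration.

```python
import math

def canopy(points, t1, t2):
    """Group 2-D points into canopies using loose/tight thresholds."""
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        survivors = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)       # inside the loose threshold
            if d >= t2:
                survivors.append(p)     # outside the tight threshold:
                                        # stays a candidate center
        remaining = survivors
        canopies.append(members)
    return canopies

# Two well-separated groups of points fall into two canopies.
pts = [(0, 0), (0.5, 0.2), (0.2, 0.4), (10, 10), (10.3, 9.8)]
print(len(canopy(pts, t1=3.0, t2=1.0)))
```

Because only a cheap distance test is applied, canopies can be formed in a single pass; a more expensive method such as K-means++ then runs within each canopy, which is what makes the combination attractive for large data.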


