Molecular Cavity Topological Representation for Pattern Analysis: A NLP Analogy-Based Word2Vec Method

2019 ◽  
Vol 20 (23) ◽  
pp. 6019 ◽  
Author(s):  
Dongliang Guo ◽  
Qiaoqiao Wang ◽  
Meng Liang ◽  
Wei Liu ◽  
Junlan Nie

Cavity analysis in molecular dynamics is important for understanding molecular function. However, analyzing the dynamic pattern of molecular cavities remains a difficult task. In this paper, we propose a novel method to topologically represent molecular cavities by vectorization. First, a characterization of cavities is established through a Word2Vec model, based on an analogy between the cavities and natural language processing (NLP) terms. Then, we use techniques such as dimension reduction and clustering to conduct an exploratory analysis of the vectorized molecular cavities. On a real data set, we demonstrate that our approach preserves the topological characteristics of the cavities and can find change patterns among a large number of cavities.

1975 ◽  
Vol 26 (3) ◽  
pp. 467 ◽  
Author(s):  
PJ Robinson ◽  
RG Megarrity

Seed protein patterns of 182 Stylosanthes accessions, representing 16 species and two hybrids, were obtained by polyacrylamide gel electrophoresis of crude extracts. All species could be recognized by examination of photographs and densitometer traces of the gels. Within the species capitata, guyanensis, hamata and viscosa considerable variation occurred, whilst the variation in humilis, scabra and fruticosa was not as great. Data from the densitometer traces were analysed by various methods of pattern analysis and the resulting classifications compared. A variance-standardized Euclidean distance coefficient was found to be the similarity measure of choice, whilst selection of fusion strategy was not as critical. Species relationships obtained by using the chemical data were not in agreement with the accepted taxonomic division of the genus into the sections Styposanthes and Stylosanthes. A classification based on the complete data set was compared with a working classification based on morphological and agronomic data, which is used in the agronomic assessment of the genus. Only within S. scabra did the two classifications conform. Morphological–agronomic (M–A) types within the species hamata and subsericea could be distinguished by the examination of the fine structure of the densitometer traces, whilst groups based on protein data in the species humilis, guyanensis, fruticosa and viscosa did not correspond with M–A groups. The application of seed protein patterns as a rapid and inexpensive means of identifying introductions of the genus at the species level, as well as characterizing types within certain species, is proposed.
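
The abstract names a variance-standardized Euclidean distance but does not give its formula. The sketch below assumes the usual definition, in which each squared difference is divided by the variance of that attribute across all samples. The function names and the band-intensity values are hypothetical and are not taken from the paper.

```python
import math
from statistics import pvariance


def column_variances(rows):
    """Population variance of each attribute (column) across all samples."""
    return [pvariance(column) for column in zip(*rows)]


def standardized_euclidean(x, y, variances):
    """Euclidean distance with each squared difference divided by the attribute variance."""
    # Assumes every variance is non-zero; a constant attribute would have to be dropped first.
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


# Hypothetical band intensities: three accessions (rows), two bands (columns).
profiles = [[0.0, 0.0], [2.0, 10.0], [4.0, 20.0]]
variances = column_variances(profiles)
distance = standardized_euclidean(profiles[0], profiles[2], variances)
```

Dividing by the variance keeps a band with a large numeric range from dominating the distance, which may be one reason such a coefficient suits densitometer data.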


2021 ◽  
Vol 40 (7) ◽  
pp. 534-542
Author(s):  
Ricard Durall ◽  
Valentin Tschannen ◽  
Norman Ettrich ◽  
Janis Keuper

Interpreting seismic data requires the characterization of a number of key elements such as the position of faults and main reflections, presence of structural bodies, and clustering of areas exhibiting a similar amplitude versus angle response. Manual interpretation of geophysical data is often a difficult and time-consuming task, complicated by lack of resolution and presence of noise. In recent years, approaches based on convolutional neural networks have shown remarkable results in automating certain interpretative tasks. However, these state-of-the-art systems usually need to be trained in a supervised manner, and they suffer from a generalization problem. Hence, it is highly challenging to train a model that can yield accurate results on new real data obtained with different acquisition, processing, and geology than the data used for training. In this work, we introduce a novel method that combines generative neural networks with a segmentation task in order to decrease the gap between annotated training data and uninterpreted target data. We validate our approach on two applications: the detection of diffraction events and the picking of faults. We show that when transitioning from synthetic training data to real validation data, our workflow yields superior results compared to its counterpart without the generative network.


1997 ◽  
Vol 490 ◽  
Author(s):  
I. Vurgaftman ◽  
J. R. Meyer ◽  
C. A. Hoffman ◽  
D. Redfern ◽  
J. Antoszewski ◽  
...  

We discuss an improved quantitative mobility spectrum analysis (i-QMSA) of magnetic-field-dependent Hall and resistivity data, which can determine multiple electron and hole densities and mobilities. A fully automated computer implementation of i-QMSA is applied to a variety of synthetic and real data sets. The results show that the new algorithm increases the information available from a given data set and is suitable for use as a standard tool in the characterization of semiconductor materials and devices.


Author(s):  
CHUN-GUANG LI ◽  
JUN GUO ◽  
BO XIAO

In this paper, a novel method to estimate the intrinsic dimensionality of a high-dimensional data set is proposed. Based on neighborhood information, our method calculates, for each data point, the non-negative locally linear reconstruction coefficients from its neighbors, and the number of dominant positive reconstruction coefficients is regarded as a faithful guide to the intrinsic dimensionality of the data set. The proposed method requires no parametric assumption on the data distribution and is easy to implement in the general framework of manifold learning. Experimental results on several synthesized and real data sets have shown the benefits of the proposed method.


Author(s):  
RENYAN JIANG ◽  
MING J. ZUO ◽  
D. N. P. MURTHY

In this paper, we study two sectional models, each involving two Weibull distributions. Characterization of the plot on Weibull plotting paper (WPP) for each model is carried out. We also study the shapes of the probability density and the failure rate functions. These are useful in determining if a given failure data set can be modeled by such a model. We discuss the estimation of model parameters based on the WPP plot and illustrate through two examples involving real data.
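
As background for the WPP characterization mentioned above, the following sketch shows the standard Weibull plotting paper transform, x = ln t and y = ln(-ln(1 - F(t))), under which a single two-parameter Weibull distribution plots as a straight line with slope equal to the shape parameter. The parameter values and function names are illustrative assumptions; the sectional models themselves are not reproduced here.

```python
import math


def weibull_cdf(t, shape, scale):
    """CDF of a two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def wpp_point(t, cdf_value):
    """Map (t, F(t)) to WPP coordinates x = ln t, y = ln(-ln(1 - F(t)))."""
    return math.log(t), math.log(-math.log(1.0 - cdf_value))


shape, scale = 2.0, 3.0  # illustrative values, not from the paper
x1, y1 = wpp_point(1.0, weibull_cdf(1.0, shape, scale))
x2, y2 = wpp_point(5.0, weibull_cdf(5.0, shape, scale))

# For a single Weibull, the line is y = shape * (x - ln(scale)).
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1
```

A data set whose WPP plot departs from a single straight line is a candidate for a model built from more than one Weibull distribution, which is presumably the kind of diagnostic the abstract refers to.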


2014 ◽  
Vol 513-517 ◽  
pp. 1280-1284
Author(s):  
Ming He ◽  
Zhen Zhen Wang ◽  
Yong Ping Du

Document similarity computation is an exciting research topic in information retrieval (IR) and it is a key issue for automatic document categorization, clustering analysis, fuzzy query and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute similarity between documents. By mapping a document from its term space representation into a topic space, a distribution over topics is derived for computing document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
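
The abstract does not state which measure is applied to the topic distributions. As one common choice, the sketch below compares two documents through the Jensen-Shannon divergence of their topic distributions; this is an assumption, not the authors' stated method. The topic vectors would come from a fitted LDA model, which is not shown, and all names here are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(p, q):
    """Similarity in [0, 1]: 1 for identical topic mixtures, 0 for disjoint ones."""
    return 1.0 - js_divergence(p, q)


# Hypothetical topic distributions for two documents over three topics.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.1, 0.2, 0.7]
similarity = topic_similarity(doc_a, doc_b)
```

Unlike the raw Kullback-Leibler divergence, the Jensen-Shannon divergence is symmetric and finite even when a topic has zero probability in one document.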


Author(s):  
Ogunde Adebisi Ade ◽  
Chukwu Angela Unna ◽  
Agwuegbo Samuel Obi-Nnamd

This work provides a new statistical distribution, named the cubic rank transmuted inverse Weibull distribution, which was developed using the cubic transmutation map. Various statistical properties of the new distribution were studied, including the hazard function, moments, moment generating function, skewness, kurtosis, Rényi entropy and order statistics. Maximum likelihood estimation was used to estimate the parameters of the distribution. Applications to real data show the tractability of the distribution over other distributions and its sub-model.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap in knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
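
For readers unfamiliar with the BLEU scores mentioned above, the sketch below shows the clipped (modified) n-gram precision that BLEU is built on, for a single reference translation. Full BLEU additionally combines these precisions over several n-gram orders and applies a brevity penalty, which is omitted here. The example sentences are the standard textbook illustration, not data from the paper, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with counts clipped."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
precision = modified_precision(candidate, reference, 1)
```

Clipping is what stops a degenerate output that repeats a common word from scoring well: here only two of the seven candidate tokens are credited.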


Author(s):  
Tian Lu ◽  
Qinxue Chen ◽  
Zeyu Liu

Although cyclo[18]carbon has long been investigated theoretically and experimentally, only very recently was it prepared and directly observed by means of STM/AFM in the condensed phase (Kaiser et al., Science, 365, 1299 (2019)). The unique ring structure and dual 18-center π delocalization feature bring a variety of unusual characteristics and properties to cyclo[18]carbon, which are well worth exploring. In this work, we present a comprehensive and detailed investigation of almost all aspects of cyclo[18]carbon, including (1) geometric characteristics, (2) bonding nature, (3) electron delocalization and aromaticity, (4) intermolecular interaction, (5) reactivity, (6) electronic excitation and UV/Vis spectrum, (7) molecular vibration and IR/Raman spectrum, (8) molecular dynamics, (9) response to external field, (10) electron ionization, affinity and accompanying processes, and (11) various molecular properties. We believe that our full characterization of cyclo[18]carbon will greatly deepen researchers' understanding of this system, and thereby help them to utilize it in practice and design its various valuable derivatives.

