Sequence analysis paradigm shift reveals unsuspected semantic properties of species proteomes

2021 ◽  
Author(s):  
Antonio Starcevic ◽  
Ena Melvan ◽  
Toni Cvrljak ◽  
Janko Diminic ◽  
Jurica Zucko ◽  
...  

Alignment-based methods dominate molecular biology. However, because they primarily allow one-to-one comparisons, they enforce a gene-centered viewpoint and lack the broad aperture needed to analyze complex biological systems. We hypothesized the existence of contextual information, related to a gene's inclusion in the molecular network of the cell, that is distributed across more than one sequence. The rationale was the need to conserve established interactions, which is arguably more important to the evolutionary success of a species than the conservation of individual function. To test whether this information exists, we applied a distributional semantics method, Latent Semantic Analysis (LSA), to thousands of species proteomes. Using natural language processing, we identified Latent Taxonomic Signatures (LTSs), a novel proteome-distributed feature supporting the argument that protein-coding genes do not evolve as taxonomy-independent variables. LTSs reflect the constraint imposed on the evolution of individual genes/proteins by their genome/proteome context. In summary, the discovery of LTSs indicates that genes had to trade away some of their "selfishness" by becoming parts of genome conglomerates.
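The core LSA step described in the abstract can be sketched in miniature: treat each proteome as a "document" of short peptide "words", build a term-document matrix, and take a truncated SVD so that species land in a low-rank latent space where related species cluster. The species names and sequences below are invented toy data, not the paper's dataset.

```python
import numpy as np

# Toy "proteomes" as strings of amino-acid letters; sp_A and sp_B are
# close relatives, sp_C is unrelated. All names/sequences are invented.
def kmers(seq, k=3):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

proteomes = {"sp_A": "MKTAYIAKQR", "sp_B": "MKTAYIAKQL", "sp_C": "GGHWLPNNSV"}

vocab = sorted({w for s in proteomes.values() for w in kmers(s)})
# Term-document count matrix: rows are k-mer "words", columns are species.
M = np.array([[kmers(s).count(w) for s in proteomes.values()] for w in vocab],
             dtype=float)

# LSA: truncated SVD embeds each species in a low-rank latent semantic space.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
rank = 2
species_vecs = (np.diag(S[:rank]) @ Vt[:rank]).T  # one row per species

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_AB = cosine(species_vecs[0], species_vecs[1])  # relatives: near 1
sim_AC = cosine(species_vecs[0], species_vecs[2])  # unrelated: near 0
```

At the scale of the paper (thousands of proteomes), the same pipeline would use sparse matrices and a randomized truncated SVD rather than a dense `numpy` decomposition.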

Author(s):  
Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be both useful and novel. Methods: In this paper, a novel conversational-interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, this chat-bot has a simple dialog skill: when it can understand the user's query intent, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: The NLP capabilities for information retrieval and text summarization using the machine learning techniques of Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank are reviewed and compared in this paper before being implemented for the chat-bot design. This chat-bot improves the user experience tremendously by answering specific queries concisely, which takes less time than reading an entire document. Students, parents, and faculty can more efficiently get answers about admission criteria, fees, course offerings, notice boards, attendance, grades, placements, faculty profiles, research papers, patents, and so on. Conclusion: The purpose of this paper was to follow the advancement of NLP technologies and implement them in a novel application.
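Of the summarization techniques the abstract names, TextRank is the simplest to sketch without external models: sentences are nodes, word-overlap similarities are edge weights, and a damped power iteration (as in PageRank) scores sentence centrality. This is a minimal stdlib-only illustration of the idea, not the paper's implementation.

```python
import re
from collections import Counter

def textrank_summary(text, n=1, d=0.85, iters=50):
    """Extractive summary: score sentences by TextRank-style centrality."""
    sents = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    bags = [Counter(re.findall(r'\w+', s.lower())) for s in sents]

    def overlap(a, b):  # shared-word similarity between two sentence bags
        shared = sum((a & b).values())
        return shared / (len(a) + len(b)) if shared else 0.0

    k = len(sents)
    W = [[overlap(bags[i], bags[j]) if i != j else 0.0 for j in range(k)]
         for i in range(k)]
    scores = [1.0] * k
    for _ in range(iters):  # damped power iteration, as in PageRank
        scores = [(1 - d) + d * sum(W[j][i] * scores[j] / (sum(W[j]) or 1.0)
                                    for j in range(k)) for i in range(k)]
    top = sorted(range(k), key=lambda i: scores[i], reverse=True)[:n]
    return [sents[i] for i in sorted(top)]

text = ("Cats sleep a lot. Cats and dogs play together. "
        "Dogs bark loudly. Cats and dogs sleep at home.")
summary = textrank_summary(text, n=1)
```

A production chat-bot would instead run this over retrieved web pages and combine it with the embedding-based methods (Word2Vec, GloVe) the paper compares.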


2021 ◽  
pp. 1-17
Author(s):  
J. Shobana ◽  
M. Murali

Text sentiment analysis is the process of predicting whether a segment of text is opinionated or objective and analyzing the polarity of its sentiment. Understanding the needs and behavior of the target customer plays a vital role in the success of a business, so sentiment analysis helps the marketer improve product quality and helps the shopper buy the right product. Due to its automatic learning capability, deep learning is the current research interest in natural language processing. The skip-gram architecture is used in the proposed model for better extraction of the semantic relationships and contextual information of words. The main contribution of this work, however, is an Adaptive Particle Swarm Optimization (APSO)-based LSTM for sentiment analysis. The LSTM is used in the proposed model to understand complex patterns in textual data. To improve the LSTM's performance, its weight parameters are tuned by the adaptive PSO algorithm. The opposition-based learning (OBL) method combined with the PSO algorithm yields the Adaptive Particle Swarm Optimization (APSO) classifier, which assists the LSTM in selecting optimal weights in fewer iterations. APSO-LSTM's ability to adjust attributes such as optimal weights and learning rates, combined with good hyperparameter choices, leads to improved accuracy and reduced losses. Extensive experiments conducted on four datasets proved that the proposed APSO-LSTM model achieves higher accuracy than classical methods such as traditional LSTM, ANN, and SVM. According to the simulation results, the proposed model outperforms other existing models.
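The PSO component can be illustrated in isolation: particles are candidate weight vectors, and each iteration pulls them toward their personal best and the swarm's best position. This minimal sketch is plain PSO minimizing a toy quadratic loss standing in for an LSTM's training loss; the paper's APSO additionally applies opposition-based learning, which is omitted here.

```python
import random

random.seed(0)

def pso(loss, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer over a weight vector of size dim."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # each particle's best position
    pbest_val = [loss(p) for p in pos]
    g_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_i][:], pbest_val[g_i]  # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            v = loss(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

# Toy objective standing in for an LSTM's training loss over its weights:
best, best_loss = pso(lambda p: sum((x - 1.0) ** 2 for x in p), dim=3)
```

In the paper's setting, `loss` would be the LSTM's validation loss evaluated at a candidate weight/hyperparameter setting, which is far more expensive per call.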


Author(s):  
Ángela Almela ◽  
Gema Alcaraz-Mármol ◽  
Arancha García-Pinar ◽  
Clara Pallejá

In this paper, the methods for developing a database of Spanish writing that can be used for forensic linguistic research are presented, including our data collection procedures. Specifically, the main instrument used for data collection has been translated into Spanish and adapted from Chaski (2001). It consists of ten tasks, by means of which the subjects are asked to write formal and informal texts about different topics. To date, 93 undergraduates from Spanish universities and a group of prisoners convicted of gender-based abuse have participated in the study. A twofold analysis has been performed, since the data collected have been approached from both a semantic and a morphosyntactic perspective. Regarding the semantic analysis, psycholinguistic categories have been used, many of them taken from the LIWC dictionary (Pennebaker et al., 2001). In order to obtain a more comprehensive depiction of the linguistic data, some other ad-hoc categories have been created, based on the corpus itself, using a double-check method for their validation so as to ensure inter-rater reliability. Furthermore, as regards the morphosyntactic analysis, the natural language processing tool ALIAS TATTLER is being developed for Spanish. Results show that it is possible to differentiate non-abusers from abusers with strong accuracy based on linguistic features.
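LIWC-style semantic analysis reduces, at its core, to counting what fraction of a text's tokens fall in each psycholinguistic category. The sketch below uses an invented two-category mini-dictionary purely for illustration; the real LIWC word lists are proprietary and far larger.

```python
import re

# Hypothetical mini-dictionary in the spirit of LIWC categories
# (invented for this example; not the actual LIWC lists).
CATEGORIES = {
    "negemo": {"hate", "angry", "hurt", "sad"},
    "social": {"friend", "family", "talk", "we"},
}

def liwc_profile(text):
    """Percentage of tokens that fall in each psycholinguistic category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    return {cat: 100.0 * sum(t in words for t in tokens) / total
            for cat, words in CATEGORIES.items()}

profile = liwc_profile("I hate that my friend is sad.")
```

The resulting per-category percentages are the kind of features that, together with the morphosyntactic measures, could feed a classifier distinguishing the two writer groups.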


2001 ◽  
Vol 13 (6) ◽  
pp. 829-843 ◽  
Author(s):  
A. L. Roskies ◽  
J. A. Fiez ◽  
D. A. Balota ◽  
M. E. Raichle ◽  
S. E. Petersen

To distinguish areas involved in the processing of word meaning (semantics) from other regions involved in lexical processing more generally, subjects were scanned with positron emission tomography (PET) while performing lexical tasks, three of which required varying degrees of semantic analysis and one that required phonological analysis. Three closely apposed regions in the left inferior frontal cortex and one in the right cerebellum were significantly active above baseline in the semantic tasks, but not in the nonsemantic task. The activity in two of the frontal regions was modulated by the difficulty of the semantic judgment. Other regions, including some in the left temporal cortex and the cerebellum, were active across all four language tasks. Thus, in addition to a number of regions known to be active during language processing, regions in the left inferior frontal cortex were specifically recruited during semantic processing in a task-dependent manner. A region in the right cerebellum may be functionally related to those in the left inferior frontal cortex. Discussion focuses on the implications of these results for current views regarding neural substrates of semantic processing.


2021 ◽  
Vol 47 (05) ◽  
Author(s):  
NGUYỄN CHÍ HIẾU

Knowledge graphs have in recent years been applied in many fields, such as search engines, semantic analysis, and question answering. However, there are many obstacles to building knowledge graphs, such as methodologies, data, and tools. This paper introduces a novel methodology to build a knowledge graph from heterogeneous documents. We use natural language processing and deep learning methodologies to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.
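The basic pipeline behind such work is to extract subject-relation-object triples from text and accumulate them as a graph. This is a deliberately naive pattern-based sketch (the paper uses NLP and deep learning for extraction); the patterns and class are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern-based extractor for two relation types.
PATTERNS = [
    (re.compile(r"(\w[\w ]*?) is a (\w[\w ]*)"), "is_a"),
    (re.compile(r"(\w[\w ]*?) uses (\w[\w ]*)"), "uses"),
]

def extract_triples(sentence):
    triples = []
    for pattern, relation in PATTERNS:
        for subj, obj in pattern.findall(sentence):
            triples.append((subj.strip(), relation, obj.strip()))
    return triples

class KnowledgeGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # subject -> {(relation, object)}

    def add(self, triple):
        subj, rel, obj = triple
        self.edges[subj].add((rel, obj))

    def neighbors(self, node):
        return sorted(self.edges[node])

kg = KnowledgeGraph()
for sentence in ["Python is a language", "NLTK uses Python"]:
    for triple in extract_triples(sentence):
        kg.add(triple)
```

A real system would replace the regex patterns with a trained relation-extraction model and add entity linking to merge mentions of the same node across heterogeneous documents.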


2021 ◽  
pp. 002224292110478
Author(s):  
Xin (Shane) Wang ◽  
Jiaxiu He ◽  
David J. Curry ◽  
Jun Hyun (Joseph) Ryoo

Sales, product design, and engineering teams benefit immensely from better understanding customer perspectives. How do customers combine a product’s technical specifications (i.e., engineered attributes) to form abstract product benefits (i.e., meta-attributes)? To address this question, the authors use machine learning and natural language processing to develop a methodological framework that extracts a hierarchy of product attributes based on contextual information of how attributes are expressed in consumer reviews. The attribute hierarchy reveals linkages between engineered attributes and meta-attributes within a product category, enabling flexible sentiment analysis that can identify how meta-attributes are received by consumers, and which engineered attributes are main drivers. The framework can guide managers to monitor only portions of review content that are relevant to specific attributes. Moreover, managers can compare products within and between brands, where different names and attribute combinations are often associated with similar benefits. The authors apply the framework to the tablet computer category to generate dashboards and perceptual maps, and provide validations of the attribute hierarchy using both primary and secondary data. Resultant insights allow the exploration of substantive questions, such as how successive generations of iPads were improved by Apple, and why HP and Toshiba discontinued their tablet product lines.
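The linkage idea at the heart of this framework can be illustrated in a toy form (this is not the authors' actual method, which learns the hierarchy from contextual embeddings): count how often engineered-attribute terms co-occur with meta-attribute terms in the same review, and treat the strongest counts as candidate edges. The lexicons and reviews below are invented.

```python
from collections import defaultdict

# Hypothetical seed lexicons: concrete engineered attributes vs.
# abstract meta-attributes (benefits). Invented for this example.
ENGINEERED = {"battery", "screen", "processor", "weight"}
META = {"portability", "performance"}

reviews = [
    "great portability thanks to the low weight",
    "the battery makes portability a non-issue",
    "fast processor gives excellent performance",
    "screen and processor drive the performance",
]

# Count engineered/meta co-occurrences within each review.
links = defaultdict(int)
for review in reviews:
    words = set(review.split())
    for eng in ENGINEERED & words:
        for meta in META & words:
            links[(eng, meta)] += 1
```

Here `processor` links to `performance` twice while `weight` links to `portability` once, mirroring how the framework surfaces which engineered attributes drive each benefit before sentiment is layered on top.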


Author(s):  
Subhadra Dutta ◽  
Eric M. O’Rourke

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, word relatedness methods, and so on. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points the readers to available resources, and ends with discussing implications and future directions of these approaches.
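Among the methods the chapter surveys, word relatedness is the easiest to show from scratch: pointwise mutual information (PMI) scores whether two words co-occur in survey responses more often than chance. The responses below are invented examples of open-ended survey text.

```python
import math
from collections import Counter
from itertools import combinations

# Invented open-ended survey responses for illustration.
responses = [
    "manager gives clear feedback",
    "my manager supports feedback and growth",
    "pay and benefits are fair",
    "salary pay raise benefits",
]

word_counts = Counter()
pair_counts = Counter()
for response in responses:
    words = set(response.split())
    word_counts.update(words)
    pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

n = len(responses)

def pmi(a, b):
    """Pointwise mutual information: log of observed vs. chance co-occurrence."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")
    return math.log(joint / ((word_counts[a] / n) * (word_counts[b] / n)))
```

Words from the same survey theme ("manager"/"feedback") score above zero, while cross-theme pairs score low, which is the signal topic models like LDA exploit at scale.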


2020 ◽  
pp. 51-68
Author(s):  
Michael Devitt

Linguistics takes speakers’ intuitions about the syntactic and semantic properties of their language as good evidence for a theory of that language. Why are these intuitions good evidence? The received Chomskyan answer is that they are the product of an underlying linguistic competence. In Devitt’s Ignorance of Language, this Voice of Competence answer (VoC) was criticized and an alternative view, according to which intuitions are empirical theory-laden central-processor responses to phenomena, was defended. After summarizing this position, the chapter responds to Steven Gross and Georges Rey, who defend VoC. It argues that they have not provided the sort of empirically based details that make VoC worth pursuing. In doing so, it emphasizes two distinctions: (1) between the intuitive behavior of language processing and the intuitive judgments that are the subject of VoC; and (2) between the possible roles of structural descriptions in language processing and in providing intuitions.


2020 ◽  
Vol 16 ◽  
pp. 117693432090373 ◽  
Author(s):  
Katherine E Noah ◽  
Jiasheng Hao ◽  
Luyan Li ◽  
Xiaoyan Sun ◽  
Brian Foley ◽  
...  

Deep phylogeny involving arthropod lineages is difficult to recover because the erosion of phylogenetic signals over time leads to unreliable multiple sequence alignment (MSA) and subsequent phylogenetic reconstruction. One way to alleviate the problem is to assemble a large number of gene sequences to compensate for the weakness of each individual gene. Such an approach has led to many robustly supported but contradictory phylogenies. A close examination shows that the supermatrix approach often suffers from two shortcomings. The first is that the MSA is rarely checked for reliability and, as will be illustrated, can be poor. The second is that, to alleviate the problem of homoplasy at the third codon position of protein-coding genes due to convergent evolution of nucleotide frequencies, phylogeneticists may remove or degenerate the third codon position but may do so improperly and introduce new biases. We performed an extensive reanalysis of one such “big data” set to highlight these two problems, and demonstrated the power and benefits of correcting or alleviating them. Our results support a new group with Xiphosura and Arachnopulmonata (Tetrapulmonata + Scorpiones) as sister taxa. This favors a new hypothesis in which the ancestor of Xiphosura and the extinct Eurypterida (sea scorpions, many later forms of which lived in brackish or fresh water) returned to the sea after the initial chelicerate invasion of land. Our phylogeny is supported even with the original data when processed with a new “principled” codon degeneration. We also show that removing the 1673 codon sites with both AGN and UCN codons (encoding serine) in our alignment can partially reconcile discrepancies between the nucleotide-based and amino-acid-based trees, partly because two sequences, one with AGN and the other with UCN, would be identical at the amino acid level but quite different at the nucleotide level.
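The principle behind a "principled" degeneration can be sketched as follows (a minimal illustration, not the authors' exact procedure): replace the third codon position with an ambiguity symbol only when every third-position base yields the same amino acid, so the substitution can never alter the protein.

```python
# Standard genetic code (DNA codons, '*' = stop).
CODON_TABLE = {
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
    'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'TAT': 'Y', 'TAC': 'Y', 'TAA': '*', 'TAG': '*',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'TGT': 'C', 'TGC': 'C', 'TGA': '*', 'TGG': 'W',
    'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
}

def degenerate_third(codon):
    """Replace the third position with 'N' only when doing so cannot change
    the encoded amino acid (i.e., a fourfold-degenerate codon family)."""
    aa = CODON_TABLE[codon]
    family = {CODON_TABLE[codon[:2] + base] for base in "ACGT"}
    return codon[:2] + "N" if family == {aa} else codon

def degenerate_sequence(seq):
    return "".join(degenerate_third(seq[i:i + 3]) for i in range(0, len(seq), 3))
```

For example, glycine codons (GGN) are degenerated, while ATG stays intact because changing its third base would turn methionine into isoleucine; degenerating such codons indiscriminately is the kind of bias the abstract warns against.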

