Cross-domain Visual Exploration of Academic Corpora via the Latent Meaning of User-authored Keywords

Author(s):  
Alejandro Benito ◽  
Roberto Theron

Nowadays, scholars dedicate a substantial amount of their work to querying and browsing increasingly large collections of research papers on the Internet. In parallel, the recent surge of novel interdisciplinary approaches in science requires scholars to acquire competencies in new fields for which they may lack the necessary vocabulary to formulate adequate queries. This problem, together with the issue of information overload, poses new challenges in the fields of natural language processing (NLP) and visualization design that call for a rapid response from the scientific community. In this respect, we report on a novel visualization scheme that enables the exploration of research paper collections via the analysis of semantic proximity relationships found in author-assigned keywords. Our proposal replaces traditional string queries with a bag-of-words (BoW) extracted from a user-generated auxiliary corpus that captures the intent of the research. Continuing the line established by previous work, we combine recent advances in NLP with visual network analysis techniques to offer scholars a perspective of the target corpus that better fits their research needs. To highlight the advantages of our proposal, we conduct two experiments employing a collection of visualization research papers and an auxiliary cross-domain BoW. We showcase how our visualization can be used to maximize the effectiveness of a browsing session by enhancing the language acquisition task, allowing an effective extraction of knowledge that is in line with users' prior expectations.
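A minimal sketch of the general idea (not the authors' implementation): score each author-assigned keyword against a user-supplied bag-of-words using pre-trained word embeddings, and connect keywords whose embeddings are semantically close into a network suitable for visual exploration. The model name, sample keywords, auxiliary BoW, and similarity threshold below are all illustrative assumptions.

```python
# Sketch: keyword relevance to a user BoW plus a keyword proximity network.
import itertools
import gensim.downloader as api
import networkx as nx
import numpy as np

model = api.load("glove-wiki-gigaword-100")   # any pre-trained word-vector model

keywords = ["visualization", "network", "clustering", "semantics"]   # from papers
user_bow = ["graph", "layout", "meaning"]                             # auxiliary corpus

def similarity(a, b):
    """Cosine similarity between two in-vocabulary terms, else 0."""
    if a in model and b in model:
        return float(model.similarity(a, b))
    return 0.0

# Relevance of each keyword to the user's bag-of-words.
relevance = {k: np.mean([similarity(k, w) for w in user_bow]) for k in keywords}

# Keyword proximity network: edges between keyword pairs above a threshold.
G = nx.Graph()
for a, b in itertools.combinations(keywords, 2):
    s = similarity(a, b)
    if s > 0.4:                       # illustrative threshold
        G.add_edge(a, b, weight=s)

print(sorted(relevance.items(), key=lambda kv: -kv[1]))
print(G.edges(data=True))
```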

Author(s):  
Mattson Ogg ◽  
L. Robert Slevc

Music and language are uniquely human forms of communication. What neural structures facilitate these abilities? This chapter conducts a review of music and language processing that follows these acoustic signals as they ascend the auditory pathway from the brainstem to auditory cortex and on to more specialized cortical regions. Acoustic, neural, and cognitive mechanisms are identified where processing demands from both domains might overlap, with an eye to examples of experience-dependent cortical plasticity, which are taken as strong evidence for common neural substrates. Following an introduction describing how understanding musical processing informs linguistic or auditory processing more generally, findings regarding the major components (and parallels) of music and language research are reviewed: pitch perception, syntax and harmonic structural processing, semantics, timbre and speaker identification, attending in auditory scenes, and rhythm. Overall, the strongest evidence that currently exists for neural overlap (and cross-domain, experience-dependent plasticity) is in the brainstem, followed by auditory cortex, with evidence and the potential for overlap becoming less apparent as the mechanisms involved in music and speech perception become more specialized and distinct at higher levels of processing.


Author(s):  
Robert Procter ◽  
Miguel Arana-Catania ◽  
Felix-Anselm van Lier ◽  
Nataliya Tkachenko ◽  
Yulan He ◽  
...  

The development of democratic systems is a crucial task, as confirmed by its selection as one of the Sustainable Development Goals by the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in democratic decision-making processes. The main objective is to explore whether the application of Natural Language Processing (NLP) and machine learning can improve citizens' experience of digital citizen participation platforms. Taking as a case study the 'Decide Madrid' Consul platform, which enables citizens to post proposals for policies they would like to see adopted by the city council, we used NLP and machine learning to provide new ways to (a) suggest to citizens proposals they might wish to support; (b) group citizens by interests so that they can more easily interact with each other; (c) summarise comments posted in response to proposals; and (d) assist citizens in aggregating and developing proposals. Evaluation of the results confirms that NLP and machine learning have a role to play in addressing some of the barriers that users of platforms such as Consul currently experience.
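As a minimal sketch of how objective (a) could be approached (not the project's actual pipeline), one can rank unseen proposals by their TF-IDF similarity to proposals a citizen has already supported. The proposal texts and the supported index below are illustrative assumptions.

```python
# Sketch: content-based proposal suggestion via TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

proposals = [
    "Extend the bicycle lane network across the city centre",
    "Plant more trees in public parks and school yards",
    "Introduce car-free Sundays in the historic district",
    "Install water fountains in all municipal parks",
]
supported = [0]   # indices of proposals this citizen already supported

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(proposals)

# Average similarity of every proposal to the ones the citizen supported.
scores = cosine_similarity(X, X[supported]).mean(axis=1)
ranking = sorted(
    (i for i in range(len(proposals)) if i not in supported),
    key=lambda i: scores[i],
    reverse=True,
)
for i in ranking:
    print(f"{scores[i]:.2f}  {proposals[i]}")
```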


Author(s):  
Arkadipta De ◽  
Dibyanayan Bandyopadhyay ◽  
Baban Gain ◽  
Asif Ekbal

Fake news classification is one of the most interesting problems and has attracted huge attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most current work on fake news detection is in English, which has limited its widespread usability, especially outside the English-literate population. Although multilingual web content has grown, fake news classification in low-resource languages remains a challenge due to the non-availability of annotated corpora and tools. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A large variety of experiments, including language-specific and domain-specific settings, are conducted. The proposed model achieves high accuracy in domain-specific and domain-agnostic experiments and outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, showing encouraging results. Cross-domain transfer experiments are also performed to assess the model's language-independent feature transfer. We also offer a multilingual, multidomain fake news detection dataset covering five languages and seven domains that could be useful for research and development in resource-scarce scenarios.
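A minimal sketch of the underlying approach, under assumed data (not the paper's exact architecture): fine-tuning multilingual BERT for binary fake-news classification with the Hugging Face Transformers library. The toy texts, labels, and hyperparameters are illustrative.

```python
# Sketch: multilingual BERT fine-tuning for fake/real classification.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Scientists confirm water is wet.", "Celebrity X secretly controls the weather."]
labels = torch.tensor([0, 1])          # 0 = real, 1 = fake (toy labels)

batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                     # a few illustrative training steps
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())
```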


2022 ◽  
pp. 155-170
Author(s):  
Lap-Kei Lee ◽  
Kwok Tai Chui ◽  
Jingjing Wang ◽  
Yin-Chun Fung ◽  
Zhanhui Tan

Our dependence on the Internet in daily life is ever-growing, which provides opportunities to discover valuable and subjective information using advanced techniques such as natural language processing and artificial intelligence. In this chapter, the research focus is a convolutional neural network for three-class (positive, neutral, and negative) cross-domain sentiment analysis. The model is enhanced in two ways. First, a similarity-labelling method manages the relationship between the source and target domains to generate more labelled data. Second, term frequency-inverse document frequency (TF-IDF) and latent semantic indexing (LSI) are employed to compute the similarity between the source and target domains. Performance evaluation is conducted using three datasets: beauty reviews, toy reviews, and phone reviews. The proposed method enhances accuracy by 4.3-7.6% and reduces training time by 50%. The limitations of the work are discussed and serve as rationales for future research directions.
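A minimal sketch of the similarity-labelling idea (not the chapter's exact method): represent source-domain and target-domain reviews in a TF-IDF space reduced with LSI (truncated SVD), and let target reviews that are highly similar to a source review inherit its label. The review texts, component count, and threshold below are illustrative assumptions.

```python
# Sketch: TF-IDF + LSI similarity between source and target domain reviews.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_reviews = ["Great toy, my kids love it", "Broke after one day, very poor quality"]
source_labels = ["positive", "negative"]
target_reviews = ["The phone battery died within a day", "Fantastic phone, my family loves it"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(source_reviews + target_reviews)

lsi = TruncatedSVD(n_components=2, random_state=0)     # LSI = SVD over TF-IDF
Z = lsi.fit_transform(X)
Z_src, Z_tgt = Z[: len(source_reviews)], Z[len(source_reviews):]

sims = cosine_similarity(Z_tgt, Z_src)
for i, row in enumerate(sims):
    j = row.argmax()
    if row[j] > 0.5:                                   # illustrative threshold
        print(f"target[{i}] labelled '{source_labels[j]}' (sim={row[j]:.2f})")
```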


Author(s):  
Kamal Al-Sabahi ◽  
Zhang Zuping

In the era of information overload, text summarization has become a focus of attention in a number of diverse fields, such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is the key to any successful summarizer, and learning this representation has become a very active research area in natural language processing (NLP). Traditional approaches mostly fail to deliver a good representation, whereas word embeddings have shown excellent performance in learning representations. In this paper, a modified BM25 weighting combined with word embeddings is used to build sentence vectors from word vectors. The entire document is represented as a set of sentence vectors, and the similarity between every pair of sentence vectors is computed. TextRank, a graph-based model, is then used to rank the sentences, and the summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better than state-of-the-art methods.
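A minimal sketch of the pipeline's shape (not the paper's modified-BM25 weighting, for which a simple IDF weight is substituted here): build sentence vectors from weighted word embeddings, compute a sentence similarity graph, and run TextRank (PageRank) to pick the top sentences. The toy document, embedding size, and summary length are illustrative assumptions.

```python
# Sketch: weighted-embedding sentence vectors + TextRank extractive summary.
import math
import networkx as nx
import numpy as np
from gensim.models import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are common pets",
    "the stock market fell sharply today",
]
tokens = [s.split() for s in sentences]

w2v = Word2Vec(tokens, vector_size=50, min_count=1, epochs=50, seed=0)

# Simple IDF weights as a stand-in for the paper's modified BM25 term weighting.
n = len(tokens)
vocab = {w for t in tokens for w in t}
idf = {w: math.log((n + 1) / (1 + sum(w in t for t in tokens))) + 1 for w in vocab}

def sent_vec(words):
    """IDF-weighted average of word vectors."""
    return np.mean([idf[w] * w2v.wv[w] for w in words], axis=0)

V = np.vstack([sent_vec(t) for t in tokens])
sim = np.clip(cosine_similarity(V), 0.0, None)   # keep non-negative edge weights
np.fill_diagonal(sim, 0.0)

# TextRank: PageRank over the sentence-similarity graph.
G = nx.from_numpy_array(sim)
scores = nx.pagerank(G, weight="weight")
top = sorted(scores, key=scores.get, reverse=True)[:2]   # compression: 2 sentences
print([sentences[i] for i in sorted(top)])
```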


Author(s):  
Alexandr N. Komandzhaev ◽  
◽  
Saglar E. Badmaeva

Introduction. The article examines the understudied issues of how and to what extent epidemic diseases spread across Kalmyk uluses ('districts') in the late 19th and early 20th centuries, with special attention paid to the control and monitoring methods employed. The problem was covered in a number of published reports delivered at the First Congress of Astrakhan physicians who had worked in Kalmyk-inhabited lands during the period under study, and these contain their shared experiences and valuable findings; historians have addressed the topic in only a few papers. Goals. The work aims at a detailed survey of epidemic diseases in the Kalmyk Steppe of Astrakhan Governorate in the late 19th and early 20th centuries. Materials and Methods. The study employs a set of general scientific and specific historical research methods. Observance of the historicism principle made it possible to avoid modern misinterpretations of the century-old events examined, while system-analysis techniques and interdisciplinary approaches made it possible to analyze specific events of Kalmyk life as parts of an overall picture. The article mainly explores and newly introduces materials of the Medical Department, a healthcare agency within the Kalmyk People's Administration, currently stored at the National Archive of Kalmykia. Results. Despite the remoteness of Kalmyk nomadic settlements (Kalm. khoton) from administrative centers and first-aid stations, healthcare practitioners were still able to respond promptly to epidemic outbreaks. Besides treatment proper, the medical, administrative, and police personnel were largely responsible for quarantine and disinfection activities, as well as medical examination and supervision of people living around the periphery of the affected area. Conclusions. The analysis of the materials reveals that Kalmyk districts were widely affected by epidemic diseases such as typhus, smallpox, measles, and diphtheria, while cholera and plague occurred less often. The frequent occurrence of these diseases in medical records across the Kalmyk Steppe was determined by their endemicity, which itself resulted from a number of causes.


Author(s):  
Miss. Priyanka Vasant Ambilwade

In today's information age, the use of websites, mobile apps, and other forms of information sharing has increased, which has also given rise to malicious URLs. These malicious URLs are forwarded to users and divert their attention from the content they are actually searching for to unnecessary and harmful content, wasting considerable time and money. Malicious URLs have led to authentication theft, financial theft, and the bullying of users who fall into traps set by hackers after accessing these URLs. To resolve this menace, there is a need to detect such URLs and prevent users from accessing them. While studying the techniques put forward by various authors in different research papers, we found a few techniques quite interesting and useful. The first detects malicious URLs using a CNN and a GRU. The second proposes a text mining technique using Natural Language Processing (NLP) that can be used for classification. The third is a combination of a CNN and NLP. From this study, we concluded that NLP and a CNN should be combined to implement a successful malicious URL detection system. In this paper, we therefore propose a fusion of R-CNN, NLP, and the cloud. The first task is to collect malicious and healthy URLs from multiple Internet sources and combine them into one dataset. We use Google Cloud to create our own blacklisted URL database rather than depending on multiple Internet sources. In our system, we first create a blacklist database on the cloud and then apply classification to it using NLP and the machine learning algorithm SVM. The second step is to train an R-CNN on the same URL dataset and obtain the identified malicious URLs as output. In the final phase, we compare the results from the SVM and the R-CNN and analyse which one is more efficient, along with the strengths and weaknesses of each technique.
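A minimal sketch of the SVM branch described above (not the proposed full R-CNN/NLP/cloud system): character n-gram TF-IDF features over URL strings fed to a linear SVM. The tiny URL list and labels are illustrative assumptions.

```python
# Sketch: character n-gram TF-IDF + linear SVM for malicious URL classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

urls = [
    "http://example.com/login",
    "http://paypa1-secure-update.xyz/confirm-account",
    "https://news.example.org/article/2021/science",
    "http://free-gift-cards.win/claim?id=123",
]
labels = [0, 1, 0, 1]   # 0 = benign, 1 = malicious (toy labels)

# Character n-grams capture suspicious substrings without full URL parsing.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LinearSVC(),
)
clf.fit(urls, labels)

print(clf.predict(["http://secure-paypa1.win/verify", "https://example.com/about"]))
```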

