From Specialized Web Corpora of Tourism to a Learner’s Dictionary

2020 ◽  
Vol 46 (2) ◽  
pp. 1059-1083
Author(s):  
Irena Srdanović

This paper presents two approaches to creating specialized web corpora of Croatian tourism in Japanese for use in building a specialized learner’s dictionary. Both approaches use WebBootCat technology (Baroni et al. 2006, Kilgarriff et al. 2014) to create specialized web corpora automatically. The first approach builds the corpora from seed words selected as most relevant to the topic. The second approach specifies a number of web pages that cover tourism-oriented information on particular regions, cities, and sites in Croatia available in Japanese, which are then used for corpus creation inside the Sketch Engine platform. Both approaches yield specialized web corpora that are small in size but quite useful for lexical profiling in the specific field of tourism. In the process of dictionary creation, the second approach has proven especially useful for the selection of lexical items, while both approaches have proven highly useful for the exploration and selection of authentic examples from the corpora. The research exposes some shortcomings in Japanese language processing, such as errors in the lemmatization of some culturally specific terms, and indicates the need to refine existing Japanese language processing tools. The Japanese-Croatian bilingual learner’s dictionary (Srdanović 2018) is currently in the pilot phase and is being used and built by learners and teachers through the open-source dictionary platform Lexonomy (Mechura 2017). Work on the bilingual dictionary is useful both as a means of training students in language analysis and description using modern technologies (e.g. corpora, corpus query systems, a dictionary editing platform) and for educating new personnel capable of working in tourism using the Japanese language, for whom there is a strong need.
In the future, the same approach could be used to create specialized corpora and dictionaries for Japanese paired with other languages.

Author(s):  
Riitta Salmelin ◽  
Jan Kujala ◽  
Mia Liljeström

When seeking to uncover the brain correlates of language processing, timing and location are of the essence. Magnetoencephalography (MEG) offers them both, with the highest sensitivity to cortical activity. MEG has shown its worth in revealing cortical dynamics of reading, speech perception, and speech production in adults and children, in unimpaired language processing as well as developmental and acquired language disorders. The MEG signals, once recorded, provide an extensive selection of measures for examination of neural processing. Like all other neuroimaging tools, MEG has its own strengths and limitations of which the user should be aware in order to make the best possible use of this powerful method and to generate meaningful and reliable scientific data. This chapter reviews MEG methodology and how MEG has been used to study the cortical dynamics of language.


Author(s):  
Zahra Mousavi ◽  
Heshaam Faili

Nowadays, wordnets are extensively used as a major resource in natural language processing and information retrieval tasks, so their accuracy has a direct influence on the performance of the applications involved. This paper presents a fully automated method for extending a previously developed Persian wordnet to cover more comprehensive and accurate verbal entries. First, using a bilingual dictionary, some Persian verbs are linked to Princeton WordNet (PWN) synsets. A feature set related to the semantic behavior of compound verbs, which constitute the majority of Persian verbs, is proposed. This feature set is employed in a supervised classification system to select the proper links for inclusion in the wordnet. We also benefit from a pre-existing Persian wordnet, FarsNet, and a similarity-based method to produce a training set. The result is the largest automatically developed Persian wordnet, with more than 27,000 words, 28,000 PWN synsets, and 67,000 word-sense pairs, substantially outperforming the previous Persian wordnet with about 16,000 words, 22,000 PWN synsets, and 38,000 word-sense pairs.
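The link-selection step can be illustrated with a toy sketch: candidate links are proposed wherever a verb's dictionary translation appears among a synset's lemmas, and a simple score threshold stands in for the paper's supervised classifier (the actual system uses a much richer feature set built around compound-verb semantics). All names and data below are hypothetical:

```python
def candidate_links(bilingual_dict, synsets):
    """Propose (verb, synset_id, score) triples whenever one of a verb's
    translations appears among a synset's lemmas."""
    links = []
    for verb, translations in bilingual_dict.items():
        for syn_id, lemmas in synsets.items():
            overlap = set(translations) & set(lemmas)
            if overlap:
                # Score: fraction of the verb's translations the synset covers.
                links.append((verb, syn_id, len(overlap) / len(translations)))
    return links

def select_links(links, threshold=0.5):
    """Keep only links whose score clears the threshold; this stands in
    for the supervised link classifier described in the abstract."""
    return [(v, s) for v, s, score in links if score >= threshold]

# Toy bilingual dictionary and toy PWN-style synsets (hypothetical data).
bilingual_dict = {"raftan": ["go", "walk"], "didan": ["see", "view", "watch"]}
synsets = {"go.v.01": ["go", "travel"], "see.v.01": ["see", "perceive"]}
links = candidate_links(bilingual_dict, synsets)
print(select_links(links))  # only raftan->go.v.01 clears the 0.5 threshold
```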


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1372
Author(s):  
Sanjanasri JP ◽  
Vijay Krishna Menon ◽  
Soman KP ◽  
Rajendran S ◽  
Agnieszka Wolk

Linguists have long focused on qualitative comparison of the semantics of different languages. Evaluating semantic interpretation between disparate language pairs such as English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has created a felicitous opportunity to quantify linguistic semantics. Multilingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of another. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe, and FastText. A novel evaluation paradigm was devised to assess the effectiveness of the generated embeddings, using the original embeddings as ground truth. The model's transferability to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We empirically show that, with a bilingual dictionary of a thousand words and a correspondingly small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in several NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that these are not the only possible applications.
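The core idea of deducing a transfer function between embedding spaces can be sketched with the classic linear baseline: fit a map from source to target vectors over a seed bilingual dictionary by least squares, then project unseen source words into the target space. The paper's actual models are deeper, data-efficient networks; the toy 3-dimensional vectors and the rotation used as a stand-in target space below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # toy embedding dimensionality

# Pretend the target space is a fixed rotation of the source space,
# so the "true" transfer function is known and recoverable.
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

X = rng.normal(size=(50, d))  # source vectors for the seed dictionary
Y = X @ R.T                   # corresponding target vectors

# Least-squares fit of the transfer map W: minimize ||X @ W - Y||^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Project a previously unseen source word into the target space.
x_new = rng.normal(size=d)
projected = x_new @ W
print(np.allclose(projected, R @ x_new))  # the learned map recovers the rotation
```

With real embeddings the fit is only approximate, which is why refinements such as orthogonality constraints or deeper networks, as in the paper, pay off.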


Author(s):  
Tian-Shun Yao

Based on a word-based theory of natural language processing, a word-based Chinese language understanding system has been developed. Drawing on psycholinguistic analysis and the features of the Chinese language, the theory is presented together with a description of the computer programs built on it. At the heart of the system are a Total Information Dictionary and the World Knowledge Source it uses. The purpose of this research is to develop a system that can understand not only individual Chinese sentences but also whole texts.


Author(s):  
Erma Susanti ◽  
Khabib Mustofa

Abstract — Information extraction is a field of natural language processing that converts unstructured text into structured information. Much of the information on the Internet is transmitted in unstructured form via websites, creating the need for technology that can analyse text and distil relevant knowledge into structured information. A typical example of unstructured information is the main content of a web page. Various approaches to information extraction have been developed by many researchers, using either manual or automatic methods, but their accuracy and extraction speed still need improvement. This research proposes an information extraction approach that combines bootstrapping with Ontology-Based Information Extraction (OBIE). The bootstrapping approach, which starts from a small seed of labelled data, is used to minimise human intervention in the extraction process, while an ontology guides the extraction of classes, properties, and instances to provide semantic content for the semantic web. Combining the two approaches is expected to increase both the speed of the extraction process and the accuracy of its results. The system is applied to a case study using the “LonelyPlanet” dataset. Keywords — Information extraction, ontology, bootstrapping, Ontology-Based Information Extraction, OBIE, performance
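The bootstrapping half of the proposal can be sketched in miniature: induce simple textual patterns around a few labelled seed instances, then reuse those patterns to harvest new instances with no further manual labelling. The "word before / word after" template and the example sentences below are purely illustrative and far simpler than ontology-guided extraction:

```python
import re

def induce_patterns(sentences, seeds):
    """Learn (word_before, word_after) context patterns around seed mentions."""
    patterns = set()
    for sentence in sentences:
        for seed in seeds:
            m = re.search(r"(\w+) " + re.escape(seed) + r" (\w+)", sentence)
            if m:
                patterns.add((m.group(1), m.group(2)))
    return patterns

def extract(sentences, patterns):
    """Apply the induced patterns to harvest new instances from text."""
    found = set()
    for sentence in sentences:
        for before, after in patterns:
            m = re.search(re.escape(before) + r" (\w+) " + re.escape(after), sentence)
            if m:
                found.add(m.group(1))
    return found

# One labelled seed ("Dubrovnik") yields a pattern that also finds "Split".
sents = ["visit Dubrovnik this summer", "visit Split this winter"]
print(extract(sents, induce_patterns(sents, {"Dubrovnik"})))
```

Real bootstrapped systems iterate this loop, feeding harvested instances back in as new seeds, and an OBIE layer would additionally type each instance against ontology classes and properties.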


2020 ◽  
Author(s):  
Joshua Conrad Jackson ◽  
Joseph Watts ◽  
Johann-Mattis List ◽  
Ryan Drabble ◽  
Kristen Lindquist

Humans have been using language for thousands of years, but psychologists seldom consider what natural language can tell us about the mind. Here we propose that language offers a unique window into human cognition. After briefly summarizing the legacy of language analyses in psychological science, we show how methodological advances have made these analyses more feasible and insightful than ever before. In particular, we describe how two forms of language analysis—comparative linguistics and natural language processing—are already contributing to how we understand emotion, creativity, and religion, and overcoming methodological obstacles related to statistical power and culturally diverse samples. We summarize resources for learning both of these methods, and highlight the best way to combine language analysis techniques with behavioral paradigms. Applying language analysis to large-scale and cross-cultural datasets promises to provide major breakthroughs in psychological science.


2020 ◽  
Vol 7 (1) ◽  
pp. 009-012
Author(s):  
Rashmi Chaudhary ◽  
Yasmin Janjhua ◽  
Avineet ◽  
Krishan Kumar ◽  
...  

Women make essential contributions to agriculture and rural economic activities in all developing countries. Even though women contribute 60 to 80% of the labour in agriculture and animal husbandry, their involvement in the selection of suitable crops and the adoption of innovative, good management practices is very low. The study reported that the sampled women participated in all the selected agriculture and livestock activities except marketing and financial management. It also found that in very few households do women take part in decision making related to agriculture and livestock activities. Important reasons for their subdued role in decision making in agricultural production may include lack of awareness of new opportunities and modern technologies, inadequate facilities for training and capacity building, and poor access to extension workers for consultation whenever needed.


Author(s):  
Shatakshi Singh ◽  
Kanika Gautam ◽  
Prachi Singhal ◽  
Sunil Kumar Jangir ◽  
Manish Kumar

The development of artificial intelligence in this decade has been quite astounding, and machine learning (ML) is one of its core subareas. The ML field is growing incessantly, with ever-rising demand and importance, and it has transformed the way data are extracted, analysed, and interpreted. Computers are trained in a self-learning mode so that, when fed new data, they can learn, grow, and improve without explicit programming. This enables useful predictions that can guide better decisions in real-life situations without human interference. Selecting an ML tool is always a challenging task, since an appropriate tool can save time and make it faster and easier to deliver a solution. This chapter classifies various machine learning tools along the following aspects: tools for non-programmers, for model deployment, for computer vision, for natural language processing and audio, for reinforcement learning, and for data mining.


2019 ◽  
Vol 46 (5) ◽  
pp. 696-709
Author(s):  
Hajar Sotudeh

Research topics vary in their citation potential. In a metric-wise scientific milieu, it is probable that authors tend to select citation-attractive topics, especially when choosing open access (OA) outlets that are more likely to attract citations. Applying a matched-pairs study design, this research examines the role of research topics in the citation advantage of OA papers. Using a comparative citation analysis method, it investigates a sample of papers published in 47 Elsevier article processing charge (APC)-funded journals across different access models: non-open access (NOA), APC, Green, and mixed Green-APC. The contents of the papers are analysed with natural language processing techniques at the title and abstract level and serve as the basis for matching NOA papers to their peers in the OA models. Publication years and journals are controlled for to avoid their impact on citation numbers. According to the results, the OA citation advantage observed in the whole sample still holds even for highly similar OA and NOA papers. This implies that the OA citation surplus is not an artefact of differences between OA and NOA papers in their topics and, therefore, in their citation potential. It leads to the conclusion that OA authors' self-selectivity, if it exists at all, is not responsible for the OA citation advantage, at least as far as the selection of topics with probably higher citation potential is concerned.
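The matching step can be illustrated with a minimal sketch: pair each NOA paper with its most textually similar OA paper, so that topic differences cancel out before citations are compared. Bag-of-words cosine similarity below stands in for the paper's actual NLP pipeline, and the example titles are invented:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two texts as bags of lowercase words."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_pairs(noa_papers, oa_papers):
    """Greedy nearest-neighbour matching on title/abstract text."""
    return {n: max(oa_papers, key=lambda o: cosine(n, o)) for n in noa_papers}

# Hypothetical titles: the NOA paper should pair with the topically
# similar OA paper, not the unrelated one.
noa = ["deep learning for protein folding prediction"]
oa = ["protein folding prediction with deep networks",
      "survey of medieval trade routes"]
print(match_pairs(noa, oa))
```

With matched pairs in hand, any remaining citation gap between the two groups cannot be attributed to topic choice, which is the logic of the study's design.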

