POS-Tagging Malay Corpus: A Novel Approach Based on Maximum Entropy

2018 ◽  
Vol 7 (3.20) ◽  
pp. 6
Author(s):  
Juhaida Abu Bakar ◽  
Khairuddin Khairuddin ◽  
Mohammad Faidzul Nasrudin ◽  
Mohd Zamri Murah

The Malay language is written in both Jawi and Roman scripts. In the past, Jawi writing was widely used by the Malay community and by foreigners, as can be seen in old documents. Such documents face the risk of physical deterioration, so there is a significant need to automate the processing of Jawi materials in order to preserve this valuable information. Based on previous literature, POS tagging is the first phase of automated text analysis, and the development of language technologies can barely begin without it. We review existing POS-tagging approaches and propose a Malay Jawi POS tagger based on an extended maximum entropy (ME) approach, developed on the NUWT Corpus. Results show that the proposed model yields higher accuracy than the state-of-the-art model.
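A maximum-entropy tagger of this kind is equivalent to multinomial logistic regression over contextual features. The sketch below illustrates the idea only: the toy romanised-Malay sentences, tagset, and feature template are assumptions for demonstration, not the NUWT Corpus or the paper's actual feature set.

```python
# Minimal sketch of a maximum-entropy (multinomial logistic regression) POS tagger.
# Training data and features are illustrative toy values.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(sent, i):
    w = sent[i]
    return {
        "word": w,                                # word identity
        "suffix2": w[-2:],                        # last two characters
        "prev": sent[i - 1] if i > 0 else "<s>",  # previous word
    }

train = [
    (["saya", "makan", "nasi"], ["PRON", "VERB", "NOUN"]),
    (["dia", "minum", "air"],   ["PRON", "VERB", "NOUN"]),
]
X, y = [], []
for sent, tags in train:
    for i, tag in enumerate(tags):
        X.append(features(sent, i))
        y.append(tag)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000)  # the maximum-entropy classifier
clf.fit(vec.fit_transform(X), y)

test_sent = ["saya", "minum", "nasi"]
pred = clf.predict(vec.transform([features(test_sent, i) for i in range(3)]))
print(list(pred))
```

In a real system the feature template would be richer (prefixes, following words, capitalization for Roman script) and the model trained on the full annotated corpus.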

2021 ◽  
Vol 15 (5) ◽  
pp. 1-32
Author(s):  
Quang-huy Duong ◽  
Heri Ramampiaro ◽  
Kjetil Nørvåg ◽  
Thu-lan Dam

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection and perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. Some methods can, on the other hand, estimate multiple subtensors, but they can guarantee the density with respect to the input tensor only for the first estimated subtensor. We address these drawbacks by providing both a theoretical and a practical solution for estimating multiple dense subtensors in tensor data with a higher lower bound on the density. In particular, we prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach showing that there are multiple dense subtensors whose guaranteed density is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, demonstrating its efficiency and feasibility.
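The greedy 2-approximation baseline the abstract refers to is, in the graph case, the classic peeling algorithm: repeatedly remove the minimum-degree vertex and keep the densest intermediate subgraph seen. A minimal sketch on a toy undirected graph (the graph itself is an illustrative assumption):

```python
# Greedy peeling 2-approximation for the densest-subgraph problem:
# the density of a vertex set S is |edges(S)| / |S|.
from collections import defaultdict

def densest_subgraph(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cur_nodes = set(adj)
    cur_m = len(edges)
    best_density, best_nodes = 0.0, set(cur_nodes)
    while cur_nodes:
        density = cur_m / len(cur_nodes)
        if density > best_density:
            best_density, best_nodes = density, set(cur_nodes)
        # peel the minimum-degree vertex
        v = min(cur_nodes, key=lambda x: len(adj[x]))
        cur_m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        cur_nodes.remove(v)
    return best_density, best_nodes

# 4-clique {0,1,2,3} plus a pendant vertex 4: the clique (density 6/4 = 1.5)
# is denser than the whole graph (7/5 = 1.4), and peeling recovers it.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
density, nodes = densest_subgraph(edges)
print(density, nodes)
```

The returned density is guaranteed to be at least half the optimum; the paper's contribution is a tighter lower bound that also holds across multiple extracted subtensors, not just the first.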


Author(s):  
Fabricio Almeida-Silva ◽  
Kanhu C Moharana ◽  
Thiago M Venancio

In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.
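The core of a gene coexpression network is simple: genes whose expression profiles correlate strongly across samples are connected by an edge. A minimal sketch with an invented toy expression matrix (the gene names and values are illustrative, not real soybean data):

```python
# Build a tiny coexpression network by thresholding pairwise Pearson correlation.
import numpy as np

expr = np.array([
    [1.0, 2.0, 3.0, 4.0],   # geneA
    [2.1, 3.9, 6.2, 8.0],   # geneB: tracks geneA across samples
    [5.0, 1.0, 4.0, 2.0],   # geneC: unrelated profile
])
genes = ["geneA", "geneB", "geneC"]

corr = np.corrcoef(expr)    # gene-by-gene Pearson correlation matrix
threshold = 0.9
edges = [
    (genes[i], genes[j])
    for i in range(len(genes))
    for j in range(i + 1, len(genes))
    if abs(corr[i, j]) >= threshold
]
print(edges)
```

Real pipelines (e.g. WGCNA-style analyses) add soft thresholding, module detection, and multiple-testing control on top of this basic correlation step.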


1967 ◽  
Vol 71 (677) ◽  
pp. 342-343
Author(s):  
F. H. East

The Aviation Group of the Ministry of Technology (formerly the Ministry of Aviation) is responsible for spending a large part of the country's defence budget, both in research and development on the one hand and production or procurement on the other. In addition, it has responsibilities in many non-defence fields, mainly, but not exclusively, in aerospace. Few developments have been carried out entirely within the Ministry's own Establishments; almost all have required continuous co-operation between the Ministry and Industry. In the past the methods of management and collaboration and the relative responsibilities of the Ministry and Industry have varied with time, with the type of equipment to be developed, with the size of the development project and so on. But over the past ten years there has been a growing awareness of the need to put some system into the complex business of translating a requirement into a specification and a specification into a product within reasonable bounds of time and cost.


2018 ◽  
Vol 17 (2) ◽  
pp. 169
Author(s):  
Aceng Ruhendi Saifullah

In the last decade, the study of the relations among language, media, and communication technology has become an interdisciplinary field that attracts experts from various disciplines. More specifically, in relation to the study of discourse on the Internet, the use of language on the Internet is seen both as a sign of the birth of a "new genre" and as the state of the art in discourse studies, known as computer-mediated discourse analysis (CMDA). In the context of this development, this study is intended to formulate a CMDA-based model for analysing the relation between language and the Internet. The question centers on the extent to which the CMDA paradigm can be formulated as a model for the development of language and Internet relation analysis. This study reveals that the varieties of language found on the Internet do not fully show the characteristics of written language, but tend to show the characteristics of "spoken language that is written down". In addition, the analysis showed that the media context and the context of the communication situation have a significant effect in determining the meaning of an utterance on the Internet. Thus, the CMDA paradigm seems relevant for the study of discourse on the Internet, especially for identifying language varieties and the meaning of utterances.
Keywords: media context; context of communication situation; Internet; computer mediated discourse analysis (CMDA)


2021 ◽  
Vol 27 (1) ◽  
pp. 7-32
Author(s):  
Bruce A. Seaman

The intellectual development of cultural economics has exhibited some notable similarities to the challenges faced by researchers pioneering in other areas of economics. While this is not really surprising, previous reviews of this literature have not focused on such patterns. Specifically, the methodology and normative implications of the field of industrial organization and antitrust policy suggest a series of stages, identified here as foundation, maturation, reevaluation, and backlash, that offer a way of viewing the development of and controversies surrounding cultural economics. Also, the emerging field of sports economics, which already shares some substantive similarities with the questions addressed in cultural economics, presents a pattern of development in which core questions and principles are identified in a fragmented literature, which then slowly coalesces and becomes consolidated into a more unified literature that essentially reconfirms and extends those earlier core principles. This fragmentation-and-consolidation pattern is also exhibited by the development of cultural economics. While others could surely suggest different parallels in the search for such developmental patterns, this way of organizing one's thinking about the past and future of this field provides a hoped-for alternative perspective on the state of the art of cultural economics.


Resources ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 15
Author(s):  
Juan Uribe-Toril ◽  
José Luis Ruiz-Real ◽  
Jaime de Pablo Valenciano

Sustainability, local development, and ecology are keywords that cover a wide range of research fields in both the experimental and the social sciences. The transversal nature of this knowledge area creates synergies but also divergences, making a continuous review of the existing literature necessary in order to facilitate research. An increasing number of articles have analyzed trends in the literature and the state of the art in many subjects. In this Special Issue of Resources, leading researchers analyze the past and future of social science research on resources from an economic, social, and environmental perspective.


Computers ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 37 ◽  
Author(s):  
Luca Cappelletti ◽  
Tommaso Fontana ◽  
Guido Walter Di Donato ◽  
Lorenzo Di Tucci ◽  
Elena Casiraghi ◽  
...  

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have proposed novel, interesting solutions that have been applied in a variety of fields. Over the same period, the successful results achieved by deep learning techniques have opened the way to their application to difficult problems where human skill cannot provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques were not designed to tackle "complex data", that is, high-dimensional data belonging to datasets with huge cardinality that describe complex problems. Specifically, they often require critical parameters to be set manually, or exploit complex architectures and/or training phases that make their computational load impracticable. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.
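The KNN-imputation baseline mentioned above fills each missing entry with an average over the k most similar rows (compared on their shared observed features). A minimal sketch using scikit-learn's implementation, on an illustrative toy matrix:

```python
# KNN imputation: each NaN is replaced by the mean of that column over the
# k nearest rows, where distances are computed on the non-missing features.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],   # row with a gap
    [1.1, 2.1, 3.0],      # close neighbour
    [0.9, 1.9, 3.2],      # close neighbour
    [8.0, 9.0, 10.0],     # distant row, should not influence the fill
])
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled[0, 2])  # filled from the two nearest rows: (3.0 + 3.2) / 2 = 3.1
```

This makes the baseline's weakness on "complex data" concrete: nearest-neighbour search degrades in high dimensions and scales poorly with dataset cardinality, which is the gap the deep learning imputers target.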


Author(s):  
Jukka Tyrkkö

This chapter outlines the state of the art in corpus-based language teaching and digital pedagogy, focusing on the differences between using corpora with present-day and historical data. The basic concepts of corpus-based research such as representativeness, frequency, and statistical significance can be introduced to students who are new to corpus methods, and the application of these concepts to the history of English can deepen students’ understanding of how historical varieties of the language are researched. This chapter will also address some of the key challenges particular to teaching the history of English using corpora, such as dealing with the seemingly counterintuitive findings, non-standard features, and small datasets. Finally, following an overview of available historical corpora and corpus tools, several practical examples of corpus-driven activities will be discussed in detail, with suggestions and ideas on how a teacher might prepare and run corpus-based lessons.
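The frequency concept introduced to students can be demonstrated in a few lines: raw counts plus normalized rates (per 1,000 tokens) so that corpora of different sizes are comparable. The sample sentence below is invented for illustration, not drawn from any historical corpus:

```python
# Count raw and normalized token frequencies in a tiny sample text.
from collections import Counter

text = "thou art not he that thou wast for thou art changed"
tokens = text.split()
counts = Counter(tokens)
per_thousand = {w: c / len(tokens) * 1000 for w, c in counts.items()}
print(counts["thou"], round(per_thousand["thou"], 1))
```

Classroom exercises would then compare such normalized rates across subcorpora (e.g. periods or genres) and test the differences for statistical significance.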


Author(s):  
Zhipeng Chen ◽  
Yiming Cui ◽  
Wentao Ma ◽  
Shijin Wang ◽  
Guoping Hu

Machine Reading Comprehension (MRC) with multiple-choice questions requires the machine to read a given passage and select the correct answer among several candidates. In this paper, we propose a novel approach called the Convolutional Spatial Attention (CSA) model, which can better handle MRC with multiple-choice questions. The proposed model fully extracts the mutual information among the passage, the question, and the candidates to form enriched representations. Furthermore, to merge various attention results, we propose a convolutional operation that dynamically summarizes the attention values within regions of different sizes. Experimental results show that the proposed model gives substantial improvements over various state-of-the-art systems on both the RACE and SemEval-2018 Task 11 datasets.
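The two ingredients the abstract combines, an attention matrix between text representations and a convolution that summarizes attention over local regions, can be sketched schematically in numpy. All shapes and vectors below are toy assumptions; the real CSA model learns its embeddings and kernels end to end:

```python
# Schematic sketch: attention between passage and candidate vectors,
# then a sliding-window convolution summarizing attention regionally.
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(6, 4))   # 6 passage words, dim-4 embeddings (toy)
C = rng.normal(size=(3, 4))   # 3 candidate-answer words (toy)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

A = softmax(P @ C.T, axis=1)  # (6, 3): passage-to-candidate attention

# Summarize attention over windows of 3 passage positions, mimicking the
# idea of pooling attention values within regions of a given size.
kernel = np.ones(3) / 3.0
regional = np.stack(
    [np.convolve(A[:, j], kernel, mode="valid") for j in range(A.shape[1])],
    axis=1,
)                              # (4, 3): one value per window per candidate
score = regional.max(axis=0)   # one score per candidate
print(score.argmax())          # index of the best-scoring candidate
```

The actual model applies multiple kernel sizes in parallel (hence "dynamically" summarizing regions of different sizes) and feeds the pooled features to a classifier over the candidates.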


Author(s):  
Gaetano Rossiello ◽  
Alfio Gliozzo ◽  
Michael Glass

We propose a novel approach to learning representations of relations expressed by their textual mentions. Our assumption is that if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network that learns entity-entity embeddings encoding relational information across the different linguistic paraphrases expressing the same relation. The model can be used to generate pre-trained embeddings that provide a valuable signal when integrated into an existing neural-based model, outperforming the state-of-the-art methods on a relation extraction task.
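The analogy assumption can be made concrete with embedding offsets: if two entity pairs share a relation, the vector differences between their members should point in similar directions. The 2-D vectors below are toy assumptions for illustration; the paper learns such embeddings with a siamese network rather than hand-crafting them:

```python
# Toy illustration: analogous entity pairs have similar embedding offsets.
import numpy as np

emb = {
    "Paris": np.array([2.0, 1.0]), "France": np.array([1.0, 3.0]),
    "Rome":  np.array([2.1, 0.9]), "Italy":  np.array([1.2, 2.8]),
    "Plato": np.array([5.0, 5.0]), "Athens": np.array([7.0, 1.0]),
}

def offset(a, b):
    return emb[b] - emb[a]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

same = cosine(offset("Paris", "France"), offset("Rome", "Italy"))    # capital-of pairs
diff = cosine(offset("Paris", "France"), offset("Plato", "Athens"))  # unrelated pair
print(same > diff)
```

A siamese training objective pushes offsets of analogous pairs together and unrelated ones apart, which is what makes the resulting entity-entity embeddings useful as pre-trained features for relation extraction.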

