language technology
Recently Published Documents


TOTAL DOCUMENTS

490
(FIVE YEARS 107)

H-INDEX

14
(FIVE YEARS 2)

2021 ◽  
Vol 21 (4) ◽  
pp. 1-25
Author(s):  
Sara Vogel

Critical computing approaches to K-12 computer science education aim to promote justice in computing and the wider world. Despite being intertwined with inequitable power dynamics in computing, issues of linguistic (in)justice have received less attention in critical computing. In this article, I draw on theoretical ideas from sociolinguistics and critical computing to analyze qualitative data collected in computing and technology-integrated language and humanities classes serving emergent bi/multilingual middle school students. Conversations about language, technology, and power were close at hand in focal classrooms, and surfaced in moments when students acted as users and critics of, and tinkerers with, digital tools. Students exercised agency in relation to both technology and language: using their budding understandings of language to question digital tools, and their engagements with tools to challenge traditional language ideologies. I build on past scholarship and the findings of this analysis to argue for the development of critical translingual computing education, an approach that would engage especially language-minoritized students in critical computing to build on and affirm their language practices and promote linguistic justice in computer science education, fields, and tools.


Author(s):  
Rajesh Kumar Mundotiya ◽  
Manish Kumar Singh ◽  
Rahul Kapur ◽  
Swasti Mishra ◽  
Anil Kumar Singh

Corpus preparation for low-resource languages and for development of human language technology to analyze or computationally process them is a laborious task, primarily due to the unavailability of expert linguists who are native speakers of these languages and also due to the time and resources required. Bhojpuri, Magahi, and Maithili, languages of the Purvanchal region of India (in the north-eastern parts), are low-resource languages belonging to the Indo-Aryan (or Indic) family. They are closely related to Hindi, which is a relatively high-resource language, which is why we compare them with Hindi. We collected corpora for these three languages from various sources and cleaned them to the extent possible, without changing the data in them. The text belongs to different domains and genres. We calculated some basic statistical measures for these corpora at character, word, syllable, and morpheme levels. These corpora were also annotated with parts-of-speech (POS) and chunk tags. The basic statistical measures were both absolute and relative and were expected to indicate linguistic properties, such as morphological, lexical, phonological, and syntactic complexities (or richness). The results were compared with a standard Hindi corpus. For most of the measures, we tried to match the corpus size across the languages to avoid the effect of corpus size, but in some cases it turned out that using the full corpus was better, even if sizes were very different. Although the results are not very clear, we tried to draw some conclusions about the languages and the corpora. For POS tagging and chunking, the BIS tagset was used to manually annotate the data. The POS-tagged data sizes are 16,067, 14,669, and 12,310 sentences, respectively, for Bhojpuri, Magahi, and Maithili. The sizes for chunking are 9,695 and 1,954 sentences for Bhojpuri and Maithili, respectively. 
The inter-annotator agreement for these annotations, using Cohen’s Kappa, was 0.92, 0.64, and 0.74, respectively, for the three languages. These (annotated) corpora have been used for developing preliminary automated tools, including a POS tagger, a chunker, and a language identifier. We have also developed a bilingual dictionary (Purvanchal languages to Hindi) and a synset (that can be integrated later in the Indo-WordNet) as additional resources. The main contribution of the work is the creation of basic resources for facilitating further language processing research for these languages, providing some quantitative measures about them and their similarities among themselves and with Hindi. For similarities, we use a somewhat novel measure of language similarity based on an n-gram-based language identification algorithm. An additional contribution is providing baselines for three basic NLP applications (POS tagging, chunking, and language identification) for these closely related languages.
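The abstract above reports inter-annotator agreement as Cohen's Kappa. As a reading aid, here is a minimal sketch of how that statistic is computed for two annotators who label the same tokens. It is not the authors' code; the function name `cohens_kappa` and the toy tag sequences are assumptions made for illustration only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical POS tags for six tokens (illustrative data, not from the corpora).
annotator_1 = ["NN", "VM", "NN", "JJ", "NN", "VM"]
annotator_2 = ["NN", "VM", "JJ", "JJ", "NN", "NN"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

On this toy data the annotators agree on 4 of 6 tokens, chance agreement is 13/36, and kappa is therefore 11/23, roughly 0.478. Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance.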


2021 ◽  
Vol 2066 (1) ◽  
pp. 012048
Author(s):  
Lin Chen

With the rapid development of speech recognition technology, the voice chat robot has become a breakthrough in artificial intelligence. Customer service is a typical application field for voice chat robots, which can provide customers with efficient and convenient service around the clock. The traditional customer service center is mainly based on telephone service and faces problems such as a large number of customers, high maintenance costs, slow knowledge updates, limited service hours, training costs, and so on. At the same time, customers' usage habits have changed fundamentally: the vast majority of services and transactions can now be carried out over the Internet, for example on Taobao and Jingdong. The cost of voice services can be greatly reduced, and their quality improved, through the interaction between robots and channelized voice service centers. Through research and development in natural language technology, an intelligent and centralized mobile communication service application platform is constructed on the WeChat platform. Using natural language processing, machine learning, big data computing, and other technological innovations, we focus on online robot recognition to understand customer problems and give timely feedback on customer needs. The results show that, in the statistics of problems handled by the customer service robot, the highest proportion of customer questions about payment is 37%, and the lowest is 29%.


2021 ◽  
Vol 2083 (4) ◽  
pp. 042024
Author(s):  
Yikun Zhao

C language programming is increasingly favoured by technical personnel working on embedded systems. Applying C language technology in computer software programming can effectively avoid unnecessary language logic problems, keep programming work running smoothly, and improve the quality and efficiency of programming. For the development of embedded systems in C, this paper explains the programming ideas behind the system software, gives a functional module division based on hierarchical design, and clarifies how project organization, program framework design, module reuse design, and related tasks are realized in the software development process. The aim is to resolve the tension between the flexibility of the C language and the engineering demands of application development. Although the approach is introduced for the ARM platform, the basic experience and algorithms are also suitable for software design on other embedded platforms.


2021 ◽  
Author(s):  
Gareth Morlais

When you're making plans to get people using your language as much and as often as possible, there's a list of things related to Wikipedia which can really help. I'll share our experience with the Welsh language. Supporting the Welsh-language Wikipedia community forms Work Package 15 of 27 in the Welsh Government's Welsh Language Technology Action Plan https://gov.wales/sites/default/files/publications/2018-12/welsh-language-technology-and-digital-media-action-plan.pdf. We like supporting Welsh language Wikipedia editing workshops, video workshops and other channels that encourage people to create and publish Welsh-language video, audio, graphic and text content because we're on a mission to try to help double daily use of Welsh by 2050. I'll share developments we're funding in speech, translation and conversational AI. The partners we're giving grants to publish what they develop under open licence. So we can share what we've funded with many companies. We think Microsoft might have used some to make their new synthetic voices in Welsh. We're excited by the potential Wikidata offers. We'll look at its potential in populating Welsh maps this year. We've already used Wikipedia search data as a way of prioritising the training of a Welsh virtual assistant. Welsh may not be spending as much as Icelandic and Estonian do on language technologies, but we'd like to share what we're learning as a smaller language about the important areas to focus on and how Wikipedia can help.


2021 ◽  
Author(s):  
Maria Heuschkel

The European Commission is funding the project “European Language Equality”, and Wikimedia Deutschland is part of the partner consortium alongside 52 other partners. The project aims to address the challenge that not all 24 official EU languages, nor the regional and minority languages in Europe, have the same digital support. To reach a state in which all languages have the technological support they need to continue to exist and prosper in the digital age, the project partners are preparing a convincing agenda and roadmap for getting there by 2030. The Wikimedia movement, consisting of volunteers and organizations whose daily business is dealing with languages and language technology, is a major stakeholder for the language technology community. To understand what it will take to reach full digital equality, we want the project consortium to know the pains, challenges, wishes, and needs of the volunteers and communities who keep Europe's multilingual environment alive every day. This lightning talk will give a short introduction to this European project and present a survey that is used to collect needs, hopes, and challenges from the language technology community. We hope that, through this survey, the communities' perspective on digital language equality will influence future programs, projects, funding, and structures at a European level.


Author(s):  
Tanmai Khanna ◽  
Jonathan N. Washington ◽  
Francis M. Tyers ◽  
Sevilay Bayatlı ◽  
Daniel G. Swanson ◽  
...  

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.


Author(s):  
António Branco ◽  
Amália Mendes ◽  
Paulo Quaresma

This paper presents the PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, which is part of the European research infrastructure CLARIN ERIC as its Portuguese national node, and belongs to the Portuguese National Roadmap of Research Infrastructures of Strategic Relevance. PORTULAN CLARIN includes a helpdesk; a repository, where resources such as corpora, lexicons, and processing tools are deposited for long-term archiving and can be searched and retrieved; and a workbench, where Language Technology tools and applications are made readily available online and can be used through different types of interfaces. Its goal is to contribute to the technological development of natural languages and to their preparation for the digital age, with a special focus on the Portuguese language in all its varieties and modalities.


2021 ◽  
Vol 2037 (1) ◽  
pp. 012118
Author(s):  
Zhen Gong ◽  
Danhong Chen ◽  
Yu Li ◽  
Qiuning Song ◽  
Meilin Li
