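Language Technology: Latest Research Papers

“Los Programadores Debieron Pensarse Como Dos Veces” (“The Programmers Should Have Thought Twice”): Exploring the Intersections of Language, Power, and Technology with Bi/Multilingual Students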

Critical computing approaches to K-12 computer science education aim to promote justice in computing and the wider world. Although issues of linguistic (in)justice are intertwined with inequitable power dynamics in computing, they have received less attention in critical computing. In this article, I draw on theoretical ideas from sociolinguistics and critical computing to analyze qualitative data collected in computing and technology-integrated language and humanities classes serving emergent bi/multilingual middle school students. Conversations about language, technology, and power were close at hand in the focal classrooms and surfaced in moments when students acted as users and critics of, and tinkerers with, digital tools. Students exercised agency in relation to both technology and language, using their budding understandings of language to question digital tools and their engagements with tools to challenge traditional language ideologies. I build on past scholarship and the findings of this analysis to argue for the development of critical translingual computing education: an approach that would engage language-minoritized students in particular in critical computing, building on and affirming their language practices and promoting linguistic justice in computer science education, fields, and tools.

Linguistic Resources for Bhojpuri, Magahi, and Maithili: Statistics about Them, Their Similarity Estimates, and Baselines for Three Applications

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3458250 ◽

2021 ◽

Vol 20 (6) ◽

pp. 1-37

Author(s):

Rajesh Kumar Mundotiya ◽

Manish Kumar Singh ◽

Rahul Kapur ◽

Swasti Mishra ◽

Anil Kumar Singh

Keyword(s):

Language Processing ◽

Language Identification ◽

Identification Algorithm ◽

Additional Contribution ◽

Low Resource ◽

POS Tagging ◽

High Resource ◽

Language Technology ◽

Statistical Measures ◽

Corpus Size

Corpus preparation for low-resource languages, and the development of human language technology to analyze or computationally process them, is a laborious task, primarily because expert linguists who are native speakers of these languages are scarce and because of the time and resources required. Bhojpuri, Magahi, and Maithili, languages of the Purvanchal region in the north-eastern parts of India, are low-resource languages belonging to the Indo-Aryan (or Indic) family. They are closely related to Hindi, a relatively high-resource language, which is why we compare them with Hindi. We collected corpora for these three languages from various sources and cleaned them to the extent possible without changing the data in them. The text belongs to different domains and genres. We calculated some basic statistical measures for these corpora at the character, word, syllable, and morpheme levels. These corpora were also annotated with part-of-speech (POS) and chunk tags. The basic statistical measures were both absolute and relative and were expected to indicate linguistic properties such as morphological, lexical, phonological, and syntactic complexity (or richness). The results were compared with a standard Hindi corpus. For most of the measures, we tried to match the corpus size across the languages to avoid the effect of corpus size, but in some cases using the full corpus turned out to be better, even when sizes were very different. Although the results are not very clear, we tried to draw some conclusions about the languages and the corpora. For POS tagging and chunking, the BIS tagset was used to annotate the data manually. The POS-tagged data sizes are 16,067, 14,669, and 12,310 sentences for Bhojpuri, Magahi, and Maithili, respectively. The chunked data sizes are 9,695 and 1,954 sentences for Bhojpuri and Maithili, respectively.
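
The abstract does not name its statistical measures, so the following is only an illustration of why matching corpus size matters. It assumes whitespace tokenization and uses type-token ratio (TTR) and mean word length as two example measures; the function names are hypothetical. TTR tends to fall as a corpus grows, because frequent words keep recurring while new types appear more slowly, so comparing raw TTR across corpora of very different sizes partly measures size rather than lexical richness. Truncating every corpus to the same token count is one simple control.

```python
from collections import Counter


def basic_stats(tokens: list[str]) -> dict[str, float]:
    """Example word- and character-level measures for one tokenized corpus."""
    n = len(tokens)
    if n == 0:
        return {"tokens": 0, "types": 0, "ttr": 0.0, "mean_word_len": 0.0}
    types = Counter(tokens)
    return {
        "tokens": n,
        "types": len(types),
        # Type-token ratio: a rough lexical-richness measure that shrinks as n grows.
        "ttr": len(types) / n,
        # Length in Unicode code points; Devanagari vowel signs count separately,
        # so this is not a syllable (akshara) count.
        "mean_word_len": sum(len(t) for t in tokens) / n,
    }


def size_matched_stats(corpora: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Truncate every corpus to the smallest token count, then compute the measures."""
    k = min(len(tokens) for tokens in corpora.values())
    return {name: basic_stats(tokens[:k]) for name, tokens in corpora.items()}


if __name__ == "__main__":
    corpora = {"A": "a b c a".split(), "B": "a b c a d e a b f g".split()}
    print(size_matched_stats(corpora))
```

Taking the first k tokens is the simplest choice; sampling sentences at random would avoid bias from document order.
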
The inter-annotator agreement for these annotations, measured with Cohen’s kappa, was 0.92, 0.64, and 0.74 for the three languages, respectively. These annotated corpora have been used to develop preliminary automated tools, including a POS tagger, a chunker, and a language identifier. We have also developed a bilingual dictionary (Purvanchal languages to Hindi) and a synset resource that can later be integrated into Indo-WordNet. The main contribution of the work is the creation of basic resources to facilitate further language processing research for these languages, together with some quantitative measures about them and their similarities to one another and to Hindi. For similarity, we use a somewhat novel measure of language similarity based on an n-gram-based language identification algorithm. An additional contribution is a set of baselines for three basic NLP applications (POS tagging, chunking, and language identification) for these closely related languages.
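
For reference, Cohen's kappa corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the proportion of items both annotators tagged identically and p_e sums, over all tags, the product of the two annotators' relative frequencies for that tag. A minimal sketch for two aligned tag sequences follows; the toy tags are written in the style of the BIS tagset and are not data from the paper.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    # Observed agreement p_o: share of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e: sum over labels of the product of the two marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[t] * count_b[t] for t in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label everywhere; kappa is undefined,
        # and we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    ann1 = ["N_NN", "V_VM", "N_NN", "PSP"]
    ann2 = ["N_NN", "V_VM", "JJ", "PSP"]
    print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
```

On the toy input, observed agreement is 0.75 and chance agreement is 0.25, giving κ ≈ 0.667. Token-level kappa like this is one common way to report POS agreement; the abstract does not say at which unit the authors computed it.
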
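
The abstract does not specify the n-gram algorithm behind its similarity measure. One common family of n-gram language identifiers builds a character n-gram frequency profile per language and assigns a text to the language with the closest profile; the sketch below assumes that design, with cosine similarity as the closeness score and profile-to-profile cosine as a rough proxy for language similarity. It is an illustration under those assumptions, not the authors' measure, and the language labels are placeholders.

```python
import math
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Character n-gram counts; spaces pad the text so word edges form n-grams."""
    padded = " " + " ".join(text.split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def identify(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Assign the text to the language whose training profile it most resembles."""
    probe = ngram_profile(text, n)
    return max(profiles, key=lambda lang: cosine(probe, profiles[lang]))


def similarity_matrix(profiles: dict[str, Counter]) -> dict[tuple[str, str], float]:
    """Pairwise profile cosine, used here as a rough proxy for language similarity."""
    langs = sorted(profiles)
    return {(a, b): cosine(profiles[a], profiles[b]) for a in langs for b in langs}


if __name__ == "__main__":
    # Toy training text; real use would build each profile from a full corpus.
    profiles = {
        "lang_a": ngram_profile("the cat sat on the mat"),
        "lang_b": ngram_profile("zzq yyq zzq"),
    }
    print(identify("the cat", profiles))  # lang_a
    print(similarity_matrix(profiles)[("lang_a", "lang_b")])  # 0.0
```

The choice of n = 3 is an assumption and would need tuning on held-out data for closely related languages such as these.
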

Research on Application of Computer Recognition Technology in C Language Programming Modeling System

Journal of Physics Conference Series ◽

10.1088/1742-6596/2083/4/042024 ◽

2021 ◽

Vol 2083 (4) ◽

pp. 042024

Author(s):

Yikun Zhao

Keyword(s):

Embedded System ◽

Functional Module ◽

Computer Software ◽

Application Development ◽

Project Organization ◽

C Language ◽

Framework Design ◽

Language Technology ◽

Research On Application ◽

Development Engineering

C language programming is increasingly favoured by most technical personnel working on embedded systems. Applying C language technology in computer software programming can effectively avoid unnecessary language logic problems, ensure that programming work proceeds smoothly, and effectively improve the quality and efficiency of programming. For the development of embedded systems in C, the paper explains the programming ideas behind the system software, gives a functional module division based on hierarchical design, and clarifies how to realize project organization, program framework design, module reuse design, and related tasks in the software development process, with the aim of resolving the tension between the flexibility of C and the engineering demands of application development. Although the approach is presented for the ARM platform, the basic experience and algorithms are also suitable for software design on other embedded platforms.

Power Intelligent Customer Service Robot Based on Artificial Intelligence

Journal of Physics Conference Series ◽

10.1088/1742-6596/2066/1/012048 ◽

2021 ◽

Vol 2066 (1) ◽

pp. 012048

Author(s):

Lin Chen

Keyword(s):

Artificial Intelligence ◽

Natural Language ◽

Customer Service ◽

Rapid Development ◽

Service Robot ◽

Maintenance Cost ◽

Typical Application ◽

Language Technology ◽

Voice Chat ◽

Number Of Customers

With the rapid development of speech recognition technology, voice chat robots have become a breakthrough in artificial intelligence. Customer service is a typical application field for voice chat robots, which can provide customers with efficient and convenient service around the clock. The traditional customer service center is mainly based on telephone service and faces problems such as a large number of customers, high maintenance costs, slow knowledge updates, limited service hours, and training costs. At the same time, customers' usage habits have changed fundamentally: the vast majority of services and transactions can now be carried out over the Internet, for example on Taobao and Jingdong. Interaction between robots and channelized voice service centers can greatly improve the quality and reduce the cost of voice services. Building on research and development in natural language technology, an intelligent, centralized mobile communication service application platform is constructed on the WeChat platform. Drawing on natural language processing, machine learning, big data computing, and other technological innovations, we focus on using online robot recognition to understand customer problems and respond to customer needs in a timely manner. The results show that, in the statistics of problems handled by the customer service robot, the proportion of consumers' problems concerning payment is 37% at its highest and 29% at its lowest.
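
The abstract does not describe how the robot recognizes customer problems. As a hedged illustration of the two steps it mentions, recognizing a problem and compiling statistics over problem categories, here is a minimal keyword-overlap intent matcher. The intents, keywords, and the `handoff_to_agent` fallback are hypothetical, and a production system would more likely use a trained NLP classifier.

```python
import re
from collections import Counter

# Hypothetical intents and keywords for a power-utility service robot (illustration only).
INTENT_KEYWORDS = {
    "payment": {"pay", "payment", "bill", "balance", "recharge"},
    "outage": {"outage", "blackout", "power", "cut"},
    "account": {"account", "password", "register", "login"},
}
FALLBACK = "handoff_to_agent"


def recognize_intent(query: str) -> str:
    """Pick the intent sharing the most keywords with the query; hand off if none match."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {intent: len(words & keywords) for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties go to the first intent listed
    return best if scores[best] > 0 else FALLBACK


def intent_shares(queries: list[str]) -> dict[str, float]:
    """Share of queries per recognized intent, as used for problem statistics."""
    counts = Counter(recognize_intent(q) for q in queries)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


if __name__ == "__main__":
    log = ["How do I pay my bill?", "Power cut on my street", "Reset my password", "Hello"]
    print(intent_shares(log))
```

Routing unmatched queries to a human agent, rather than guessing, is a common design choice for service bots; the threshold here (at least one shared keyword) is an assumption.
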

Picking the crucial language technologies and how Wikipedia can help: the Welsh experience

Septentrio Conference Series ◽

10.7557/5.6204 ◽

2021 ◽

Author(s):

Gareth Morlais

Keyword(s):

Digital Media ◽

Action Plan ◽

Work Package ◽

Speech Translation ◽

Language Technology ◽

Welsh Language ◽

Search Data ◽

Text Content ◽

Synthetic Voices ◽

Language Technologies

When you're making plans to get people using your language as much and as often as possible, there's a list of things related to Wikipedia that can really help. I'll share our experience with the Welsh language. Supporting the Welsh-language Wikipedia community forms Work Package 15 of 27 in the Welsh Government's Welsh Language Technology and Digital Media Action Plan (https://gov.wales/sites/default/files/publications/2018-12/welsh-language-technology-and-digital-media-action-plan.pdf). We like supporting Welsh-language Wikipedia editing workshops, video workshops, and other channels that encourage people to create and publish Welsh-language video, audio, graphic, and text content, because we're on a mission to help double the daily use of Welsh by 2050. I'll share developments we're funding in speech, translation, and conversational AI. The partners we give grants to publish what they develop under an open licence, so we can share what we've funded with many companies. We think Microsoft might have used some of it to make their new synthetic voices in Welsh. We're excited by the potential Wikidata offers, and we'll look at its potential for populating Welsh maps this year. We've already used Wikipedia search data as a way of prioritising the training of a Welsh virtual assistant. Wales may not be spending as much on language technologies as Iceland and Estonia do, but we'd like to share what we're learning, as a smaller-language community, about the important areas to focus on and how Wikipedia can help.

How the ELE projects aims to bring the voices of the Wikimedia communities on Digital Language Equality into future European Commission Programs

Septentrio Conference Series ◽

10.7557/5.5941 ◽

2021 ◽

Author(s):

Maria Heuschkel

Keyword(s):

European Commission ◽

Minority Languages ◽

European Level ◽

European Language ◽

Short Introduction ◽

Technological Support ◽

Language Technology ◽

Major Stakeholder ◽

Language Environment ◽

Digital Language

The European Commission is funding the project “European Language Equality” (ELE), and Wikimedia Deutschland is part of the project consortium alongside 52 other partners. The project aims to address the challenge that not all 24 official EU languages, nor the regional and minority languages of Europe, have the same level of digital support. To reach a state in which all languages have the technological support they need to continue to exist and prosper in the digital age, the project partners are preparing a convincing agenda and roadmap for getting there by 2030. The Wikimedia movement, made up of volunteers and organizations whose daily business is dealing with languages and language technology, is a major stakeholder for the language technology community. To know what it will take to reach full digital language equality, we want the project consortium to understand the pains, challenges, wishes, and needs of the volunteers and communities who keep Europe's multilingual environment alive every day. This lightning talk gives a short introduction to this European project and presents a survey used to collect needs, hopes, and challenges from the language technology community. We hope that, through this, the communities' perspective on digital language equality will influence future programs, projects, funding, and structures at a European level.

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Machine Translation ◽

10.1007/s10590-021-09260-6 ◽

2021 ◽

Author(s):

Tanmai Khanna ◽

Jonathan N. Washington ◽

Francis M. Tyers ◽

Sevilay Bayatlı ◽

Daniel G. Swanson ◽

...

Keyword(s):

Open Source ◽

Machine Translation ◽

Lexical Selection ◽

Rule Based ◽

Low Resource ◽

Language Technology ◽

Language Data ◽

Recursive Structures ◽

Platform Translation ◽

Free Open Source

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.
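
Apertium stages exchange text in a stream format in which each lexical unit is written as `^surface/reading1/reading2$`, with morphological tags in angle brackets. In Apertium itself the stages are separate programs (for example the lttoolbox analyser `lt-proc`) chained with Unix pipes. The toy Python sketch below only mimics that modular design with four stages; the dictionaries and stage functions are hypothetical stand-ins, the disambiguator simply keeps the first reading, and there is no structural transfer.

```python
import re

# Hypothetical toy dictionaries (English -> Spanish); real pairs use finite-state transducers.
ANALYSES = {
    "houses": ["house<n><pl>"],
    "house": ["house<n><sg>"],
    "green": ["green<adj>"],
}
BILINGUAL = {"house": "casa", "green": "verde"}
GENERATION = {"casa<n><pl>": "casas", "casa<n><sg>": "casa", "verde<adj>": "verde"}


def units(stream: str) -> list[list[str]]:
    """Split a stream into lexical units, each as [surface_or_lemma, reading, ...]."""
    return [body.split("/") for body in re.findall(r"\^(.*?)\$", stream)]


def analyse(text: str) -> str:
    """Morphological analysis: unknown words get a single '*word' reading."""
    out = []
    for word in text.split():
        readings = ANALYSES.get(word.lower(), ["*" + word])
        out.append("^" + "/".join([word, *readings]) + "$")
    return " ".join(out)


def disambiguate(stream: str) -> str:
    """Placeholder tagger: keep only the first reading of each unit."""
    return " ".join(f"^{u[0]}/{u[1]}$" for u in units(stream))


def transfer(stream: str) -> str:
    """Lexical transfer only: swap the lemma, keep the tags, drop the source surface form."""
    out = []
    for _surface, reading in units(stream):
        lemma, tags = re.match(r"([^<]*)(.*)", reading).groups()
        out.append(f"^{BILINGUAL.get(lemma, lemma)}{tags}$")
    return " ".join(out)


def generate(stream: str) -> str:
    """Morphological generation; unknown words pass through still marked with '*'."""
    return " ".join(GENERATION.get(u[0], u[0]) for u in units(stream))


def run_pipeline(text: str, stages) -> str:
    """Feed the output of each stage into the next, like a Unix pipe."""
    for stage in stages:
        text = stage(text)
    return text


PIPELINE = [analyse, disambiguate, transfer, generate]

if __name__ == "__main__":
    print(run_pipeline("green houses", PIPELINE))  # verde casas
    print(run_pipeline("red house", PIPELINE))     # *red casa
```

The output `verde casas` shows why structural transfer matters: a real English-Spanish pair would apply transfer rules to reorder and agree the phrase as `casas verdes`. Because each stage only reads and writes the stream, a new module, such as the optional recursive-transfer or anaphora modules described above, can be inserted into the list without changing the others.
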

Infraestrutura de Investigação para a Ciência e Tecnologia da Linguagem - PORTULAN CLARIN (Research Infrastructure for the Science and Technology of Language - PORTULAN CLARIN)

Revista da Associação Portuguesa de Linguística ◽

10.26334/2183-9077/rapln8ano2021a5 ◽

2021 ◽

pp. 54-70

Author(s):

António Branco ◽

Amália Mendes ◽

Paulo Quaresma

Keyword(s):

Technological Development ◽

Digital Age ◽

Research Infrastructure ◽

Special Focus ◽

European Research ◽

Natural Languages ◽

Language Technology ◽

Research Infrastructures ◽

Different Types

This paper presents the PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, which is the Portuguese national node of the European research infrastructure CLARIN ERIC and belongs to the Portuguese National Roadmap of Research Infrastructures of Strategic Relevance. PORTULAN CLARIN includes a helpdesk; a repository, where resources such as corpora, lexicons, and processing tools are deposited for long-term archiving and can be searched and retrieved; and a workbench, where language technology tools and applications are made readily available online and can be used through different types of interfaces. Its goal is to contribute to the technological development of natural languages and to their preparation for the digital age, with a special focus on the Portuguese language in all its varieties and modalities.

Why ASR + NLP isn't enough for commercial language technology

The Journal of the Acoustical Society of America ◽

10.1121/10.0008537 ◽

2021 ◽

Vol 150 (4) ◽

pp. A347-A347

Author(s):

Rachael Tatman

Keyword(s):

Language Technology

Upgrade and Optimization of Natural language Technology of unmanned Distribution car by using Mathematical Model

Journal of Physics Conference Series ◽

10.1088/1742-6596/2037/1/012118 ◽

2021 ◽

Vol 2037 (1) ◽

pp. 012118

Author(s):

Zhen Gong ◽

Danhong Chen ◽

Yu Li ◽

Qiuning Song ◽

Meilin Li

Keyword(s):

Mathematical Model ◽

Natural Language ◽

Language Technology

language technology
Recently Published Documents

“Los Programadores Debieron Pensarse Como Dos Veces”: Exploring the Intersections of Language, Power, and Technology with Bi/Multilingual Students

Linguistic Resources for Bhojpuri, Magahi, and Maithili: Statistics about Them, Their Similarity Estimates, and Baselines for Three Applications

Research on Application of Computer Recognition Technology in C Language Programming Modeling System

Power Intelligent Customer Service Robot Based on Artificial Intelligence

Picking the crucial language technologies and how Wikipedia can help: the Welsh experience

How the ELE projects aims to bring the voices of the Wikimedia communities on Digital Language Equality into future European Commission Programs

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Infraestrutura de Investigação para a Ciência e Tecnologia da Linguagem - PORTULAN CLARIN

Why ASR + NLP isn't enough for commercial language technology

Upgrade and Optimization of Natural language Technology of unmanned Distribution car by using Mathematical Model
