Challenges of releasing audio material for spoken data: The case of the London-Lund Corpus 2

This article aims to describe key challenges of preparing and releasing audio material for spoken data and to propose solutions to these challenges. We draw on our experience of compiling the new London-Lund Corpus 2 (LLC-2), where transcripts are released together with the audio files. However, making the audio material publicly available required careful consideration of how to, most effectively, 1) align the transcripts with the audio and 2) anonymise personal information in the recordings. First, audio-to-text alignment was solved through the insertion of timestamps in front of speaker turns in the transcription stage, which, as we show in the article, may later be used as a valuable complement to more robust automatic segmentation. Second, anonymisation was done by means of a Praat script, which replaced all personal information with a sound that made the lexical information incomprehensible but retained the prosodic characteristics. The public release of the LLC-2 audio material is a valuable feature of the corpus that allows users to extend the corpus data relative to their own research interests and, thus, broaden the scope of corpus linguistics. To illustrate this, we present three studies that have successfully used the LLC-2 audio material.
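As a rough illustration of how timestamps placed in front of speaker turns can support audio-to-text alignment, here is a minimal sketch. The line format `[HH:MM:SS] SPEAKER: text`, the function name `parse_turns`, and the dictionary keys are all assumptions made for this example; the abstract does not specify the actual LLC-2 transcription format.

```python
import re

# Hypothetical turn format: "[HH:MM:SS] SPEAKER: text" (an assumption, not the LLC-2 format).
TURN = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(\w+):\s*(.*)$")


def parse_turns(lines):
    """Turn timestamped transcript lines into coarse audio segments.

    Each turn starts at its own timestamp and ends where the next turn
    starts; the last turn has no known end (None).
    """
    turns = []
    for line in lines:
        match = TURN.match(line.strip())
        if not match:
            continue  # skip lines that do not open a speaker turn
        hours, minutes, seconds, speaker, text = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        turns.append({"start": start, "end": None, "speaker": speaker, "text": text})
    # The end of each turn is approximated by the start of the following turn.
    for current, following in zip(turns, turns[1:]):
        current["end"] = following["start"]
    return turns
```

Turn-level segments of this kind are coarse; as the abstract notes, they would complement rather than replace automatic segmentation.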

A Legal Study on Data Risk Management in the Public Sector - Focused on Personal Information Risk Associated with Opening up Public Data -

ADMINISTRATIVE LAW JOURNAL ◽

10.35979/alj.2020.02.60.165 ◽

2020 ◽

Vol 60 ◽

pp. 165-190

Author(s):

Eunjeong Kwon

Keyword(s):

Risk Management ◽

Public Sector ◽

Personal Information ◽

Legal Study ◽

Information Risk ◽

The Public ◽

Public Data ◽

Opening Up

The Public Domain: Surveillance in Everyday Life

Surveillance & Society ◽

10.24908/ss.v9i4.4342 ◽

2012 ◽

Vol 9 (4) ◽

pp. 378-393 ◽

Cited By ~ 133

Author(s):

Alice Marwick

Keyword(s):

Social Media ◽

Personal Information ◽

Social Contexts ◽

Social Boundaries ◽

Power Relationships ◽

The Public ◽

Ethnographic Studies ◽

Power Differentials ◽

And Behavior ◽

Traditional Surveillance

People create profiles on social network sites and Twitter accounts against the background of an audience. This paper argues that closely examining content created by others and looking at one’s own content through other people’s eyes, a common part of social media use, should be framed as social surveillance. While social surveillance is distinguished from traditional surveillance along three axes (power, hierarchy, and reciprocity), its effects, including behavior modification, are common to traditional surveillance. Drawing on ethnographic studies of United States populations, I look at social surveillance, how it is practiced, and its impact on people who engage in it. I use Foucault’s concept of capillaries of power to demonstrate that social surveillance assumes the power differentials evident in everyday interactions rather than the hierarchical power relationships assumed in much of the surveillance literature. Social media involves a collapse of social contexts and social roles, complicating boundary work but facilitating social surveillance. Individuals strategically reveal, disclose and conceal personal information to create connections with others and tend social boundaries. These processes are normal parts of day-to-day life in communities that are highly connected through social media.

Making corpus data visible: visualising text with research intermediaries

Corpora ◽

10.3366/cor.2017.0128 ◽

2017 ◽

Vol 12 (3) ◽

pp. 459-482 ◽

Cited By ~ 3

Author(s):

William Allen

Keyword(s):

Corpus Linguistics ◽

Communication Strategies ◽

Knowledge Exchange ◽

Technical Aspects ◽

Digital Methods ◽

Corpus Data ◽

Practical Factors ◽

Project Objectives

Researchers using corpora can visualise their data and analyses using a growing number of tools. Visualisations are especially valuable in environments where researchers communicate and work with public-facing partners under the auspices of ‘knowledge exchange’ or ‘impact’, and corpus data are more available thanks to digital methods. However, although the field of corpus linguistics continues to generate its own range of techniques, it largely remains orientated towards finding ways for academics to communicate results directly with other academics rather than with or through groups outside universities. There is also a lack of discussion about how communication, motivations and values feature in the process of making corpus data visible. My argument is that these sociocultural and practical factors influence visualisation outputs alongside technical aspects. I draw upon two corpus-based projects about press portrayal of migrants, conducted by an intermediary organisation that links university researchers with users outside academia. Analysing these projects' visualisation outputs in their organisational and communication contexts produces key lessons for researchers wanting to visualise text: consider the aims and values of partners; develop communication strategies that acknowledge different areas of expertise; and link visualisation choices with wider project objectives.

Phraseographie

HERMES - Journal of Language and Communication in Business ◽

10.7146/hjlcb.v19i36.25841 ◽

2017 ◽

Vol 19 (36) ◽

pp. 91

Author(s):

Erla Hallsteinsdóttir

Keyword(s):

Language Learning ◽

Language Learners ◽

Corpus Linguistics ◽

Point Of View ◽

Methodological Basis ◽

Multiword Expressions ◽

Interesting Part ◽

Corpus Data

Multiword expressions – i.e. phraseological units – like idioms and collocations are among the most interesting parts of every language. In this article, I investigate phraseological units from a lexicographical point of view. I discuss the theoretical and methodological basis of phraseography as a discipline that includes aspects of lexicography, phraseology, corpus linguistics and theories of language learning. I demonstrate the importance of corpora as a source for the lexicographer and the use of corpus data. I also discuss the requirements for the lexicographical treatment of phraseological units in the compilation of a phraseological database for language learners, in relation to their assumed needs, which have already been described in detail.

Development of Personal Information Privacy Concerns Evaluation

Encyclopedia of Information Science and Technology, Fourth Edition ◽

10.4018/978-1-5225-2255-3.ch421 ◽

2018 ◽

pp. 4862-4871

Author(s):

Anna Rohunen ◽

Jouni Markkula

Keyword(s):

Service Providers ◽

Personal Information ◽

Personal Data ◽

Future Research ◽

Privacy Concerns ◽

The Public ◽

Data Intensive ◽

Information Privacy Concerns ◽

Evaluation Instruments ◽

Information And Communication

Personal data is increasingly collected with the support of rapidly advancing information and communication technology, which raises privacy concerns among data subjects. In order to address these concerns and offer the full benefits of personal data intensive services to the public, service providers need to understand how to evaluate privacy concerns in evolving service contexts. By analyzing previously used instruments for evaluating privacy concerns, we can learn how to adapt them to new contexts. In this article, the historical development of the most widely used privacy concerns evaluation instruments is presented and analyzed with regard to the dimensions of privacy concerns. The core dimensions of privacy concerns, and the types of context-dependent dimensions to be incorporated into evaluation instruments, are identified. Following this, recommendations on how to utilize the existing evaluation instruments are given, as well as suggestions for future research dealing with validation and standardization of the instruments.

Language and Disciplinary Concepts in Corpus Linguistics: Investigating Corpus Data

LSP International Journal ◽

10.11113/lspi.v8.17972 ◽

2021 ◽

Vol 8 (2) ◽

pp. 79-91

Author(s):

Zuraidah Mohd Don ◽

Gerry Knowles

Keyword(s):

Corpus Linguistics ◽

Digital Humanities ◽

The Past ◽

Language Classroom ◽

Corpus Data ◽

Working Device ◽

Technical Terms

This paper is intended for researchers involved in or contemplating research in corpus linguistics, and is concerned in particular with the language of corpus linguistics. It introduces and explains technical terms in the context in which they are normally used. Technical terms lead on to the concepts to which they refer, and the concepts are related to the procedures, including tagging and parsing, by which they are implemented. English and Malay are used as the languages of illustration, and for the benefit of readers who do not know Malay, Malay examples are translated into English. The paper has a historical dimension, and the language of corpus linguistics is traced to traditional usage in the language classroom, and in particular to the study of Latin in Europe. The inheritance from the past is evident in the design of MaLex, which is a working device that does empirical Malay corpus linguistics, and is presented here as a contribution to the digital humanities.

What do (some of) our association measures measure (most)? Association?

Journal of Second Language Studies ◽

10.1075/jsls.21028.gri ◽

2021 ◽

Author(s):

Stefan Th. Gries

Keyword(s):

Corpus Linguistics ◽

Odds Ratio ◽

Association Measure ◽

Measures Of Association ◽

Association Measures ◽

Log Odds ◽

Dispersion Measures ◽

Corpus Data ◽

Behavior Supports ◽

True Association

This paper discusses the degree to which some of the most widely-used measures of association in corpus linguistics are not particularly valid in the sense of actually measuring association rather than some amalgam of a lot of frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and outlines implications of the findings. I then outline how to design an association measure that only measures association and show that its behavior supports the use of the log odds ratio as a true association-only measure but separately from frequency; in addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.
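To make the log odds ratio mentioned above concrete, the following is a minimal sketch of the standard textbook definition for a 2×2 co-occurrence table. The function name, the cell labels `a` to `d`, and the optional `correction` parameter (a common continuity adjustment for zero cells) are assumptions of this example and are not taken from the paper.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.0):
    """Log odds ratio for a 2x2 co-occurrence table.

    a: word and construction together    b: word without the construction
    c: construction without the word     d: neither
    `correction` is added to every cell; 0.5 is a common choice when a
    cell may be zero. These labels are illustrative assumptions.
    """
    a, b, c, d = (cell + correction for cell in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Multiplying every cell by 10 changes the frequencies but not the log odds ratio.
small = log_odds_ratio(30, 70, 70, 830)
large = log_odds_ratio(300, 700, 700, 8300)
```

With no correction applied, `small` and `large` are equal, which illustrates why the log odds ratio reflects association rather than overall frequency: scaling all four cells by the same factor leaves the ratio `(a*d)/(b*c)` unchanged.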

Next Generation E-Government

Advances in Electronic Government, Digital Divide, and Regional Development - Emerging Mobile and Web 2.0 Technologies for Connected E-Government ◽

10.4018/978-1-4666-6082-3.ch006 ◽

2014 ◽

pp. 124-146

Author(s):

Maria Moloney ◽

Gary Coyle

Keyword(s):

Private Information ◽

Personal Information ◽

Future Internet ◽

The Other ◽

Digital Society ◽

The Public ◽

Traditional Sense ◽

Open Discussion ◽

The One ◽

Evolving Model

The evolving model of the Future Internet has, at its heart, the users of the Internet. Web 2.0 and Government 2.0 initiatives help citizens communicate even better with their governments. Such initiatives have the potential to empower citizens by giving them a stronger voice both in the traditional sense and in the digital society. Pressure is mounting on governments to listen to the voice of the public expressed through these technologies and incorporate its needs into public policy. On the other hand, governments still have a duty to protect their citizens' personal information against unlawful and malicious intent. This responsibility is essential to any government in an age where there is an increasing burden on citizens to interact with governments via electronic means. This chapter examines this dual agenda of modern governments: to engage with their citizens, encouraging transparency and open discussion, on the one hand, and to provide digitally offered public services that require the protection of citizens' private information, on the other. In this chapter, it is argued that a citizen-centric approach to online privacy protection that works in tandem with the open government agenda will provide a unified mode of interaction between citizens, businesses, and governments in digital society.

Behind the Scenes with the Snowden Files

National Security, Leaks and Freedom of the Press ◽

10.1093/oso/9780197519387.003.0007 ◽

2021 ◽

pp. 105-122

Author(s):

Ellen Nakashima

Keyword(s):

National Security ◽

Civil Liberties ◽

Careful Consideration ◽

Right To Know ◽

Washington Post ◽

The Public ◽

The Media ◽

The Government ◽

Duty To Inform ◽

Push And Pull

This essay examines how the Washington Post dealt with the tension between its duty to inform the public and its desire to protect national security when it received documents leaked by Edward Snowden. The essay describes the push-and-pull between the media and the government. Journalists try to advance the public’s right to know, particularly about potential government encroachment on civil liberties, and the government tries to defend the security of the country while respecting civil liberties. Reporters with a bias for public disclosure voluntarily withhold certain documents and details based on a careful consideration of harm, and intelligence officials with a bias toward secrecy do not fight every disclosure. The Post’s coverage of the Snowden leaks provides an opportunity to gain insights into how to navigate the inevitable conflicts between journalists’ desire to inform the public and the government’s desire to protect its secrets from foreign powers.

Privacy in the 21st Century

Standards and Standardization ◽

10.4018/978-1-4666-8111-8.ch075 ◽

2015 ◽

pp. 1638-1652

Author(s):

Panagiotis Kitsos ◽

Aikaterini Yannoukakou

Keyword(s):

Public Interest ◽

Focal Point ◽

Personal Information ◽

Open Data ◽

Personal Data ◽

Common Denominator ◽

Patriot Act ◽

The Public ◽

Personal Data Protection ◽

The Common

The events of 9/11, along with the bombings in Madrid and London, forced governments to resort to new structures of privacy safeguarding and electronic surveillance under the common denominator of fighting terrorism and transnational crime. Legislation such as the USA PATRIOT Act and the EU Data Retention Directive fundamentally altered the methods of collecting, processing and sharing personal data, while expanding, to an excessive degree, the powers of police and law enforcement authorities to obtain and process personal information. In the aftermath of the resulting opacity and public outcry, recent years have seen a shift towards more open governance through the implementation of open data and cloud computing practices, in order to enhance government transparency and accountability, restore trust between the State and citizens, and increase citizens' participation in decision-making procedures. However, privacy and personal data protection remain major issues in all cases and must therefore be safeguarded without sacrificing national security and the public interest on the one hand, and without crossing the thin line between protection and infringement on the other. Where this delicate balance lies is the focal point of this paper, which tries to demonstrate that it is better to be cautious with open practices than hostage to clandestine practices.