Corpora as game changers: The growing impact of corpus tools for dictionary makers and users

English Today ◽  
2015 ◽  
Vol 32 (2) ◽  
pp. 24-30 ◽  
Author(s):  
Reinhard Heuberger

English lexicography is undergoing a transformation so profound that both dictionary makers and users need new strategies to cope with the challenges of today's technologies and to take full advantage of their potential. Rundell has rightly stated that dictionaries have finally found their ideal platform in the electronic medium (2012: 15), which provides quicker and more sophisticated access to large data collections that are no longer subject to space restrictions. But the innovations go far beyond storage space and ease of access - customization, hybridization and user-input are amongst the most promising trends in electronic lexicography. Customization means that dictionaries can be adaptable, i.e. manually customized by the user, or even adaptive, i.e. automatically adapted to users’ needs on the basis of their behaviour (Granger, 2012: 4). Paquot lists genre, domain as well as L1 as examples of fruitful areas for customization (2012: 185). In the electronic medium, the barriers between different language resources such as dictionaries, encyclopaedias, databases, writing aids and translation tools are disappearing, a development referred to as hybridization (Granger, 2012: 4). And the concept of user-input is exemplified by the well-known platforms Wiktionary and Urban Dictionary, both of which are online reference works based on contributions by users.

2020 ◽  
Vol 33 (4) ◽  
pp. 404-416
Author(s):  
Reinhard Heuberger

Online dictionaries provide unique possibilities to both dictionary makers and users, in particular in the following areas (cf. Granger 2012: 4ff): accessibility of data, multimedia functions, customization, hybridization, user-input and storage space. This article investigates the extent to which these opportunities have been exhausted in current online learner’s dictionaries. It demonstrates that the vast technological opportunities of the internet are only beginning to be fully exploited. While storage space, for example, is already being used effectively to provide additional example sentences and collocations, the dictionaries under investigation offer partly unsatisfactory functionality in terms of data accessibility and several other areas.


2019 ◽  
Vol 7 (2) ◽  
pp. 66-77
Author(s):  
Fathu Rahman ◽  
M Amir P ◽  
Tammasse

This research investigated the trends in reading literary fiction by students of Hasanuddin University and their main reasons for reading works of fiction. Reading tendencies were grouped into two types: reading fiction in print and reading fiction in electronic (cyber) media. The purposes of this study were: 1) to quantify the literary fiction reading media preferred by students; 2) to identify specific reasons for their choice of media; 3) to identify perceived personal benefits obtained from reading literary fiction; and 4) to evaluate readers’ personal choices in terms of content. The majority of students preferred to read using electronic media (62%), although a substantial minority preferred the classical printed book format (38%). The reasons given for preferring cyber literature (defined as works of fiction presented in an electronic medium) to printed literature were mainly practical, such as ease of access using electronic devices (tablets, computers, smartphones, etc.) as well as capacity and versatility, since one multi-functional device can hold many books or other reading media. This research indicates that young people view reading fiction not only as entertainment, but also as a valuable and rewarding activity. The trend towards electronic media provides a growing and increasingly used opportunity for casual readers and enthusiasts to access and enjoy a wide cross-section of literary fiction.


Author(s):  
Saranya N. ◽  
Saravana Selvam

After an era in which collecting data was the main difficulty, the problem today is how to process the vast amounts of information that have been collected. Scientists and researchers regard Big Data as probably the most essential topic in computing science today. The term Big Data describes huge volumes of data that may exist in any structure, which makes it difficult for standard processing approaches to mine useful information from such large data sets. Classification in Big Data is a procedure for summarizing data sets on the basis of various examples. There are distinct classification frameworks that help to classify data collections. The methods discussed in the chapter include Multi-Layer Perceptron, Linear Regression, C4.5, CART, J48, SVM, ID3, Random Forest, and KNN. The aim of this chapter is to provide a comprehensive evaluation of the classification methods in common use.


2011 ◽  
pp. 877-891
Author(s):  
Katrin Weller ◽  
Isabella Peters ◽  
Wolfgang G. Stock

This chapter discusses folksonomies as a novel way of indexing documents and locating information based on user-generated keywords. Folksonomies are considered from the point of view of knowledge organization and representation in the context of user collaboration within Web 2.0 environments. Folksonomies provide multiple benefits which make them a useful indexing method in various contexts; however, they also have a number of shortcomings that may hamper precise or exhaustive document retrieval. The position maintained is that folksonomies are a valuable addition to the traditional spectrum of knowledge organization methods since they facilitate user input, stimulate active language use and timeliness, create opportunities for processing large data sets, and allow new ways of social navigation within document collections. Applications of folksonomies as well as recommendations for effective information indexing and retrieval are discussed.


2019 ◽  
Vol 56 (4) ◽  
pp. 604-614 ◽  
Author(s):  
Yuri M Zhukov ◽  
Christian Davenport ◽  
Nadiya Kostyuk

Researchers today have access to an unprecedented amount of geo-referenced, disaggregated data on political conflict. Because these new data sources use disparate event typologies and units of analysis, findings are rarely comparable across studies. As a result, we are unable to answer basic questions like ‘what does conflict A tell us about conflict B?’ This article introduces xSub – a ‘database of databases’ for disaggregated research on political conflict (www.x-sub.org). xSub reduces barriers to comparative subnational research by empowering researchers to quickly construct custom, analysis-ready datasets. xSub currently features subnational data on conflict in 156 countries, from 21 sources, including large data collections and data from individual scholars. To facilitate comparisons across countries and sources, xSub organizes these data into consistent event categories, actors, spatial units (country, province, district, grid cell, electoral constituency), and time units (year, month, week, and day). This article introduces xSub and illustrates its potential by investigating the impact of repression on dissent across thousands of subnational datasets.


Geophysics ◽  
1995 ◽  
Vol 60 (5) ◽  
pp. 1354-1364 ◽  
Author(s):  
Glenn W. Bear ◽  
Haydar J. Al‐Shukri ◽  
Albert J. Rudman

We have developed an improved Levenberg‐Marquardt technique to rapidly invert Bouguer gravity data for a 3-D density distribution as a source of the observed field. This technique is designed to replace tedious forward modeling with an automatic solver that determines density models constrained by geologic information supplied by the user. Where such information is not available, objective models are generated. The technique estimates the density distribution within the source volume using a least‐squares inverse solution that is obtained iteratively by singular value decomposition using orthogonal decomposition of matrices with sequential Householder transformations. The source volume is subdivided into a series of right rectangular prisms of specified size but of unknown density. This discretization allows the construction of a system of linear equations relating the observed gravity field to the unknown density distribution. Convergence of the solution to the system is tightly controlled by a damping parameter which may be varied at each iteration. The associated algorithm generates statistical measures of solution quality not available with most forward methods. Along with the ability to handle large data sets within reasonable time constraints, the advantages of this approach are: (1) the ease with which pre‐existing geological information can be included to constrain the solution, (2) its minimization of subjective user input, (3) the avoidance of difficulties encountered during wavenumber domain transformations, and (4) the objective nature of the solution. Application to a gravity data set from Hamilton County, Indiana, has yielded a geologically reasonable result that agrees with published models derived from interpretation of gravity, magnetic, seismic, and drilling data.
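
The damped least-squares step described in this abstract can be sketched in its textbook form. This is not necessarily the authors' exact formulation, and every symbol here is an assumption introduced for illustration: $A$ is the matrix of the linear system relating prism densities to gravity, $\mathbf{g}$ the observed gravity values, $\boldsymbol{\rho}^{(k)}$ the density estimate at iteration $k$, and $\lambda_k \ge 0$ the damping parameter at that iteration.

```latex
% Linear system from the prism discretization (symbols are illustrative):
%   A \boldsymbol{\rho} = \mathbf{g}
% Singular value decomposition of the system matrix:
\[
  A = U \Sigma V^{\mathsf{T}}, \qquad
  \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n)
\]
% Damped update: minimizes
%   \lVert \mathbf{r}^{(k)} - A\,\Delta\boldsymbol{\rho} \rVert^2
%   + \lambda_k \lVert \Delta\boldsymbol{\rho} \rVert^2
\[
  \mathbf{r}^{(k)} = \mathbf{g} - A \boldsymbol{\rho}^{(k)}, \qquad
  \boldsymbol{\rho}^{(k+1)} = \boldsymbol{\rho}^{(k)}
  + V \operatorname{diag}\!\left( \frac{\sigma_i}{\sigma_i^{2} + \lambda_k} \right)
    U^{\mathsf{T}} \mathbf{r}^{(k)}
\]
```

With $\lambda_k = 0$ the update reduces to the ordinary least-squares (pseudo-inverse) step; a larger $\lambda_k$ suppresses the contribution of small singular values, which is how a damping parameter of this kind controls convergence.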


2013 ◽  
Vol 380-384 ◽  
pp. 2367-2370
Author(s):  
Sai Feng Zeng ◽  
Li Gu Zhu ◽  
Lei Zhang

In traditional file systems, the lack of connection between a file's name and its data content wastes a great deal of available storage space, especially in large archive storage systems. This paper designs and implements a scalable multimedia archive storage system called IMCAS. IMCAS uses CAS to store the archive data and a real-time monitoring program to extract metadata automatically. Through dynamically loaded extraction modules, the system can support a variety of multimedia formats.
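
The storage-saving idea can be illustrated with a minimal sketch, assuming that CAS here stands for content-addressable storage (the abstract does not expand the acronym). The class `ContentStore` and the file names are hypothetical and are not taken from IMCAS; the sketch only shows how addressing data by a hash of its content lets two differently named copies share one stored blob.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not IMCAS itself)."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> raw bytes
        self._names = {}  # file name -> digest of its content

    def put(self, name, data):
        # The storage key is derived from the content, not from the file name.
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name):
        return self._blobs[self._names[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


store = ContentStore()
a = store.put("clip_v1.mp4", b"same video bytes")
b = store.put("copy_of_clip.mp4", b"same video bytes")
c = store.put("other.mp4", b"different bytes")
```

Here `a` and `b` are the same digest, so the duplicate file adds a name entry but no extra stored data. A real archive system would persist blobs on disk and track metadata separately.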


1998 ◽  
Vol 54 (6) ◽  
pp. 1178-1182 ◽  
Author(s):  
Manfred Hendlich

Recent advances in experimental techniques have led to an enormous explosion of available data about protein–ligand complexes. To exploit the information that is hidden in these large data collections, tools for managing and accessing huge data collections are needed. This paper discusses databases for protein–ligand data which are accessible via the World Wide Web. A strong focus is placed on the ReLiBase database system, which is a new three-dimensional database for storing and analysing structures of protein–ligand complexes currently deposited in the Brookhaven Protein Data Bank (PDB). ReLiBase contains efficient query tools for identifying and analysing ligands and protein–ligand complexes. Its application for structure-based drug design is illustrated.


2012 ◽  
Vol 9 (7) ◽  
pp. 8039-8073
Author(s):  
T. Tanhua ◽  
R. F. Keeling

Increasing concentrations of dissolved inorganic carbon (DIC) in the interior ocean are expected as a direct consequence of increasing concentrations of CO₂ in the atmosphere. This extra DIC is often referred to as anthropogenic carbon (Cant), and its inventory, or increase rate, in the interior ocean has previously been estimated by a multitude of observational approaches. Each of these methods is associated with hard-to-test assumptions since Cant cannot be directly observed. Results from a simpler concept with few assumptions applied to the Atlantic Ocean are reported here using two large data collections of carbon-relevant bottle data. The change in column inventory on decadal time scales, i.e. the storage rate, of DIC, respiration-compensated DIC and oxygen is calculated for the Atlantic Ocean. The average storage rates for DIC and oxygen are calculated to be 0.72 ± 1.22 (95% confidence interval of the mean trend: 0.65 to 0.78) mol m⁻² yr⁻¹ and −0.54 ± 1.64 (95% confidence interval of the mean trend: −0.64 to −0.45) mol m⁻² yr⁻¹, respectively, for the Atlantic Ocean, where the uncertainties reflect station-to-station variability and where the mean trends are non-zero at the 95% confidence level. The standard deviation mainly reflects uncertainty due to regional variations, whereas the confidence interval reflects the mean trend. The storage rates are similar to changes found by other studies, although with large uncertainty. For the subpolar North Atlantic the storage rates show significant temporal variation of all variables. This seems to be due to variations in the prevalence of subsurface water masses with different DIC concentrations, leading to sometimes different signs of storage rates for DIC and Cant. This study suggests that accurate assessment of the uptake of CO₂ by the oceans will require accounting not only for processes that influence Cant but also for additional processes that modify CO₂ storage.

