Academic Vocabulary Use in Doctoral Theses: A Corpus-Based Lexical Analysis of Academic Word List (AWL) in Major Scientific Disciplinary Groups

Since the development of academic word list (AWL) by Coxhead (2000), multiple studies have attempted to investigate its effectiveness and relevance of the included academic vocabulary in the texts or corpora of various academic fields, disciplines, subjects and also in multiple academic genres and registers. Similarly, this study also aims at investigating the text coverage of Coxhead’s (2000) AWL in Pakistani doctoral theses of two major scientific disciplinary groups (Biological & health sciences as well as Physical sciences); furthermore the study also analyses the frequency of the AWL word families to extract the most frequent word families in the theses texts. In order to achieve this goal, a pre-built corpus of Pakistani doctoral theses (PAKDTh) (Aziz, 2016) comprises of 200 doctoral theses from two major scientific disciplinary groups was used as textual data. Using concordance software AntConc version 3.4.4 (Anthony, 2016), computer-driven data analysis revealed that in total 8.76% (496839 words) of the text in Pakistani doctoral thesis corpus is covered by the AWL words. Further distributing the analysis per sub-lists, shows that the first three sub-lists of AWL accounted for almost 57% of the whole text coverage. An attempt was made to further analyze the AWL text coverage by considering the frequency of occurrences in terms of word families. The findings showed that among 570- word families of Coxhead’s (2000) AWL, 550-word families with the sum of 96.49% are found to occur more than 10 times in PAKDTh corpus, which are taken as word families used in the corpus. This study concludes that Coxhead’s (2000) AWL is proved effective for the writing of theses. On the basis of the findings, further possible academic implications are discussed in detail.

Download Full-text

Academic Words in the English Research Article Abstracts: the Coverage and Frequency

Vision Journal for Language and Foreign Language Learning ◽

10.21580/vjv8i23935 ◽

2019 ◽

Vol 8 (2) ◽

pp. 133

Author(s):

Fransiskus Jemadi ◽

Fatmawati . ◽

Priska Filomena Iku

Keyword(s):

Word List ◽

Research Articles ◽

Academic Vocabulary ◽

Corpus Study ◽

Academic Texts ◽

Word Families ◽

Research Article ◽

Lexical Items

The present study aimed at exploring the abstracts of research articles written by non-native English researchers to uncover the specific characteristics of academic vocabulary employed in the English research articles abstracts.It focuses on frequency and coverage distribution of the words from the Academic Word List (Coxhead, 2000) in the abstracts of research articles. The source of data for this corpus study was gathered from 97 abstracts written by the EFL researchers and published by the Journal Pendidikan dan Kebudayaan Missio at STKIP St. Paulus Ruteng from 2015 until 2018. The results of this study revealed that the coverage of K1, the first most frequent 1000 English words, is the most dominant lexical items applied by the researchers. It covered 71.33% of the texts. The representation of lexical items that belong to K2, the second most frequent 1000 English words, covered 5.44% of all the words used by the writers in their abstracts. Moreover, the presence of Academic Word List, which refers to a list of 570 word families that are commonly found in academic texts and Off-list, which refers to the words that do not belong to K1 or K2 because it is related to certain field, has slight difference over all of the texts where the former covers 11.95% and the later covers 11.26%. As far as the findings of the present study are concerned, the room for some improvements on academic words applied in the abstracts need to pay attention.

Download Full-text

Quantitative Methods of Data Analysis for the Physical Sciences and Engineering

10.1017/9781139342568 ◽

2018 ◽

Cited By ~ 3

Author(s):

Douglas G. Martinson

Keyword(s):

Data Analysis ◽

Quantitative Methods ◽

Physical Sciences

Download Full-text

Bayesian Logical Data Analysis for the Physical Sciences

10.1017/cbo9780511791277 ◽

2005 ◽

Cited By ~ 453

Author(s):

Phil Gregory

Keyword(s):

Data Analysis ◽

Physical Sciences ◽

Logical Data

Download Full-text

COVID-19 Pandemic Is About More than Health: A State of Governance Catastrophe in Bangladesh

South Asian Survey ◽

10.1177/0971523121993344 ◽

2021 ◽

pp. 097152312199334

Author(s):

Khandakar Farid Uddin

Keyword(s):

Data Analysis ◽

Qualitative Methods ◽

Economic Recovery ◽

Journal Articles ◽

Governance Model ◽

Textual Data ◽

Governance Strategies ◽

Spread Of Infection

Governance can help minimise the effects of catastrophes. Countries had some time to prepare for the current coronavirus disease 2019 (COVID-19) pandemic, but some did not use it to improve their arrangements. This research investigates several countries’ governance strategies, develops a governance model and critically analyses Bangladesh’s failure as a case of governance catastrophe. This study applies qualitative methods of textual data analysis to explore data sourced from current newspapers, blogs, websites, journal articles and books to determine the most appropriate evidence and generate connections and interpretations. The COVID-19 pandemic has had devastating consequences for all countries; however, the different national responses have provided the opportunity to measure governments’ capability in addressing the crisis. Governments need to study the current COVID-19 response and enhance their governance capacities to minimise the spread of infection and to prepare for the challenge of socio-economic recovery.

Download Full-text

LEXICAL BUNDLES IN JOURNAL ARTICLES ACROSS ACADEMIC DISCIPLINES

Indonesian Journal of Applied Linguistics ◽

10.17509/ijal.v7i1.6866 ◽

2017 ◽

Vol 7 (1) ◽

pp. 131

Author(s):

Deny Arnos Kwary ◽

Dewantoro Ratri ◽

Almira F. Artha

Keyword(s):

Social Sciences ◽

High Frequency ◽

Life Sciences ◽

Health Sciences ◽

Academic Disciplines ◽

Journal Articles ◽

Physical Sciences ◽

Lexical Bundles ◽

Referential Expressions

This study focuses on the use of lexical bundles (LBs), their structural forms, and their functional classifications in journal articles of four academic disciplines: Health sciences, Life sciences, Physical sciences, and Social sciences. The corpus comprises 2,937,431 words derived from 400 journal articles which were equally distributed in the four disciplines. The results show that Physical sciences feature the most number of lexical bundles, while Health sciences comprise the least. When we pair-up the disciplines, we found that Physical sciences and Social sciences shared the most number of LBs. We also found that there were no LBs shared between Health sciences and Physical sciences, and neither between Health sciences and Social sciences. For the distribution of the structural forms, we found that the prepositional-based and the verb-based bundles were the most frequent forms (each of them accounts for 37.1% of the LBs, making a total of 74.2%). Within the verb-based bundles, the passive form can be found in 12 out of 23 LB types. Finally, for the functional classifications, the number of referential expressions (40 LBs) is a lot higher than those of discourse organizers (12 LBs) and stance expressions (10 LBs). The high frequency of LBs in the referential expressions can be related to the needs to refer to theories, concepts, data and findings of the study.

Download Full-text

Word Roots in English: Learning English Words through Form and Meaning Similarity

10.26686/wgtn.16998841.v1 ◽

2021 ◽

Author(s):

◽

Zheng Wei

Keyword(s):

High Frequency ◽

English Language ◽

Vocabulary Learning ◽

Data Gathering ◽

Low Frequency ◽

Word List ◽

Word Families ◽

Frequency Criterion ◽

Etymological Dictionary ◽

Two Stages

The research first proposes a vocabulary learning technique: the word part technique, and then tests its effectiveness in aiding vocabulary learning and retention. The first part of the thesis centers around the idea that the knowledge of the first 2000 words language learners already possess may give them easier access to words of other frequency levels because the root parts of the low frequency new words share form and meaning similarities with the high frequency known words. The research addresses the issue at two stages: to quantify the information concerning the number of words able to be accessed through the analysis of the word roots, and to analyze the pedagogical usefulness of the accessible words. A Comprehensive Etymological Dictionary of the English Language (Klein, 1966) was used as the source to show the possible formal and meaning connections among words. All the words in the first 2000 word list were first looked up individually and all the cognates provided under each of these words were collected and placed under each of the high frequency words if they meet the requirement that their roots share more than one letter and/or more than one phoneme with the roots of the first 2000 known words. After the data was roughly gathered, three criteria were applied to filter the data, namely, the frequency criterion, the meaning criterion and form criterion. In applying the frequency criterion, words with frequency levels lower than the tenth thousand were removed from the data. In applying the meaning criterion, hints were given to show the semantic relations between the higher frequency words and the first 2000 thousand words. The hints were then rated on the scale for measuring meaning transparency. Words that were rated at level 5 on the scale were considered inaccessible; words that were rated at levels 1, 2a, 2b, 2c, and 3a were considered easy to access. In applying the form criterion, calculations were done for each semantically accessible word to show their phonological similarity and orthographic similarity in relation to the known word. The words whose phonological or orthographical similarity scores were larger than 0.5 were considered to be phonologically or orthographically easy to access. Finally the "find" function of Microsoft Word was used to check the data by picking up any words that might have been missed in the first round of data gathering. The above procedures resulted in 2156 word families that are able to be accessed through the meaning and form relations with the first 2000 words in their root parts. Among the 2156 word families, 739 can be accessed easily and are therefore more pedagogically useful and 259 can be accessed, but with difficulty. 21 pedagogically useful form constants were selected because they can give access to more unknown lower frequency words than other form constants. In the second part of the thesis, an experiment was conducted to test the effectiveness of the word part technique in comparison with the keyword technique and self-strategy learning. The results show that with the experienced Chinese EFL learners, the keyword technique is slightly inferior to the word part technique and the self-strategy learning.

Download Full-text

Using Computational Techniques to Fill the Gap between Qualitative Data Analysis and Text Analytics

KWALON ◽

10.5117/2010.015.003.002 ◽

2010 ◽

Vol 15 (3) ◽

Author(s):

Curtis Atkisson ◽

Colin Monaghan ◽

Edward Brent

Keyword(s):

Data Analysis ◽

Qualitative Data ◽

Computational Techniques ◽

Text Analytics ◽

Qualitative Data Analysis ◽

Text Data ◽

Textual Data ◽

Data Flows ◽

Mass Digitization ◽

Analysis System

The recent mass digitization of text data has led to a need to efficiently and effectively deal with the mountain of textual data that is generated. Digitized text is increasingly in the form of digitized data flows (Brent, 2008). Digitized data flows are non-static streams of generated content – including twitter, electronic news, etc. An oft-cited statistic is that currently 85% of all business data is in the form of text (cited in Hotho, Nürnberger & Paass, 2005). This mountain of data leads us to the question whether the labor-intensive traditional qualitative data analysis techniques are best suited for this large amount of data. Other techniques for dealing with large amounts of data may also be found wanting because those techniques remove the researcher from an immersion in the data. Both dealing with large amounts of data and allowing immersion in data are clearly desired features of any text analysis system.

Download Full-text

Contextualized Text OLAP Based on Information Retrieval

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2015040101 ◽

2015 ◽

Vol 11 (2) ◽

pp. 1-21 ◽

Cited By ~ 3

Author(s):

Lamia Oukid ◽

Nadjia Benblidia ◽

Fadila Bentayeb ◽

Ounas Asfari ◽

Omar Boussaid

Keyword(s):

Information Retrieval ◽

Data Analysis ◽

Decision Maker ◽

Contextual Information ◽

Contextual Factors ◽

Expansion Method ◽

Aggregation Operator ◽

Current Data ◽

Textual Data ◽

Olap Analysis

Current data warehousing and On-Line Analytical Processing (OLAP) systems are not yet particularly appropriate for textual data analysis. It is therefore crucial to develop a new data model and an OLAP system to provide the necessary analyses for textual data. To achieve this objective, this paper proposes a new approach based on information retrieval (IR) techniques. Moreover, several contextual factors may significantly affect the information relevant to a decision-maker. Thus, the paper proposes to consider contextual factors in an OLAP system to provide relevant results. It provides a generalized approach for Text OLAP analysis which consists of two parts: The first one is a context-based text cube model, denoted CXT-Cube. It is characterized by several contextual dimensions. Hence, during the OLAP analysis process, CXT-Cube exploits the contextual information in order to better consider the semantics of textual data. Besides, the work associates to CXT-Cube a new text analysis measure based on an OLAP-adapted vector space model and a relevance propagation technique. The second part is an OLAP aggregation operator called ORank (OLAP-Rank) which allows to aggregate textual data in an OLAP environment while considering relevant contextual factors. To consider the user context, this paper proposes a query expansion method based on a decision-maker profile. Based on IR metrics, it evaluates the proposed aggregation operator in different cases using several data analysis queries. The evaluation shows that the precision of the system is significantly better than that of a Text OLAP system based on classical IR. This is due to the consideration of the contextual factors.

Download Full-text

The impact of virtual reality technology on tourists’ experience: a textual data analysis

Soft Computing ◽

10.1007/s00500-020-04883-y ◽

2020 ◽

Vol 24 (18) ◽

pp. 13879-13892 ◽

Cited By ~ 1

Author(s):

M. Rosario González-Rodríguez ◽

M. Carmen Díaz-Fernández ◽

Miguel Ángel Pino-Mejías

Keyword(s):

Virtual Reality ◽

Data Analysis ◽

Virtual Reality Technology ◽

Textual Data ◽

The Impact

Download Full-text

Word lists and the role of academic vocabulary use in high stakes speaking assessments

International Journal of Learner Corpus Research ◽

10.1075/ijlcr.20008.smi ◽

2020 ◽

Vol 6 (2) ◽

pp. 193-219

Author(s):

George Fredrik Smith ◽

Kristopher Kyle ◽

Scott A. Crossley

Keyword(s):

List Item ◽

Word List ◽

Spoken Word ◽

Effect Sizes ◽

Academic Vocabulary ◽

Weak Effect ◽

High Stakes ◽

Word Use ◽

Speaking Assessment

Abstract The current study explored the extent to which academic vocabulary lists could meet the lexical demands of academic speaking assessments. Indices of word use from lists of academic and general vocabulary were used to predict speaking scores on three TOEFL tasks. The results found weak associations between list-item use and response scores that varied by task. Independent response scores were associated with the use of specialized vocabulary from the first level of the Academic Spoken Word List. Integrated campus situation response scores were most strongly associated with the use of unique words from the Academic Word List. Integrated academic course response scores were associated with the use of more sophisticated general vocabulary. Although the findings provide some support for the use of academic vocabulary lists in speaking assessment preparation, the weak effect sizes point to the need to develop lists of academic vocabulary specific to academic speaking and assessment.

Download Full-text