From a Smoking Gun to Spent Fuel: Principled Subsampling Methods for Building Big Language Data Corpora from Monitor Corpora

Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 48 ◽  
Author(s):  
Jacqueline Tidwell

With the influence of Big Data culture on qualitative data collection, acquisition, and processing, it is becoming increasingly important that social scientists understand the complexity underlying data collection and the resulting models and analyses. Systematic approaches for creating computationally tractable models need to be employed in order to create representative, specialized reference corpora subsampled from Big Language Data sources. Even more importantly, any such method must be tested and vetted for its reproducibility and consistency in generating a representative model of the particular population in question. This article considers and tests one such method for downsampling digitally accessible Big Language Data, to determine both how to operationalize this form of corpus model creation and whether the method is reproducible. Using the U.S. Nuclear Regulatory Commission’s public documentation database as a test source, the sampling procedure was evaluated to assess variation in the rate at which documents were deemed fit for inclusion in or exclusion from the corpus across four iterations. After multiple sampling iterations, the approach pioneered by the Tobacco Documents Corpus creators was deemed reproducible and valid using a two-proportion z-test at a 99% confidence level at each stage of the evaluation process, leading to a final mean rejection ratio of 23.5875 and a variance of 0.891 for the documents sampled and evaluated for inclusion in the final text-based model. The findings of this study indicate that such a principled sampling method is viable, underscoring the need for an approach to creating language-based models that accounts for the extralinguistic factors and linguistic characteristics of documents.
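The reproducibility check described in this abstract rests on a two-proportion z-test between sampling iterations. The following is a minimal sketch of such a test, not the author's implementation: the function names and the rejection counts are hypothetical and chosen only for illustration, and the pooled-variance form of the test is an assumption.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 == p2, using the pooled proportion.

    x1, x2 are counts of rejected documents; n1, n2 are documents evaluated.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def iterations_consistent(p_value, alpha=0.01):
    """True when the test fails to reject equal rejection rates at level alpha."""
    return p_value > alpha


# Hypothetical counts (NOT from the study): 24 of 100 documents rejected in one
# iteration, 23 of 100 in another.
z, p = two_proportion_z_test(24, 100, 23, 100)
print(f"z = {z:.4f}, p = {p:.4f}, consistent = {iterations_consistent(p)}")
```

With `alpha=0.01` (the 99% level named in the abstract), two iterations count as consistent when the test fails to reject the hypothesis that their rejection rates are equal.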


2020 ◽  
Author(s):  
Andria Pragholapati

Work motivation is an influential condition for arousing, directing, and maintaining behavior related to the work environment, including nurses' work motivation. The purpose of this study was to describe nurses' work motivation in the inpatient rooms of Majalaya Regional Hospital. The research uses an analytic survey method. The sample was drawn by total sampling and comprises 55 nurses in 6 inpatient rooms. Data were collected using a work motivation questionnaire, and the analysis is univariate. The results show that 28 nurses (50.9%) have high work motivation. The study concludes that only about half of the nurses in the inpatient rooms of Majalaya Regional Hospital have high work motivation. Based on these results, motivational support is needed to increase nurses' work motivation.


2020 ◽  
Vol 2 (2) ◽  
pp. 167-180
Author(s):  
Luli Achmad Gozali ◽  
Yusniar Lubis ◽  
Syaifuddin Syaifuddin

This study aims to determine and analyze the effect of the implementation of motivation and culture on employee productivity at the Huta Padang estate of PT. Perkebunan Nusantara III (Persero), Asahan Regency, North Sumatera. The research uses a quantitative approach with a survey design. The sample of 95 people was determined by the stratified random sampling method. Data were collected through questionnaires and analyzed using multiple linear regression. The results showed that, both partially and simultaneously, the implementation of motivation and culture had a positive and significant effect on employee productivity at the Huta Padang estate. The coefficient of determination of 0.882 indicates that the implementation of motivation and culture accounts for 88.2% of the variation in employee productivity at the estate. Culture has the more dominant influence on employee productivity, with a direct influence of 73.2%.


2016 ◽  
Vol 6 (2) ◽  
pp. 117
Author(s):  
Marzie Ghanbari ◽  
Reza Hoveida ◽  
Seyed Ali Siadat

The objective of the present study is to investigate the relationship between managers’ professionalism and their (technical, human, and perceptual) skills at Iran Poly Akril Company. The research is applied in terms of its objectives and descriptive-correlational in terms of its method. The population includes all 240 experts working in the company in 2012, from whom 144 participants were selected using stratified random sampling proportionate to the population size. The data collection instruments were two researcher-made questionnaires: Managers’ Skills, containing 22 items with a reliability coefficient of 0.96, and Professionalism, containing 28 items with a reliability coefficient of 0.95. Their validity was investigated and confirmed by professors and experts in management. Data analysis was conducted at the two levels of descriptive statistics (frequency, mean, SD, and presentation of tables and charts) and inferential statistics (one-sample t-test, correlation coefficient, regression coefficient, ANOVA, and F-test).
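Stratified random sampling proportionate to population size, as named in this abstract, can be sketched as follows. The abstract gives only the totals (144 drawn from 240), so the strata names and sizes below are hypothetical, and the largest-remainder rounding is an assumed design choice rather than anything the authors report.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size.

    Fractional shares are rounded with the largest-remainder method so the
    allocations always sum to sample_size.
    """
    total = sum(strata_sizes.values())
    raw = {name: sample_size * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    leftover = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def draw_stratified_sample(strata_members, sample_size, seed=0):
    """Draw a simple random sample within each stratum, sized proportionally."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata_members.items()}
    alloc = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata_members[name], alloc[name]) for name in strata_members}


# Hypothetical strata (NOT from the study) that sum to the reported population of 240.
strata = {"production": 120, "engineering": 80, "administration": 40}
print(proportional_allocation(strata, 144))
```

Each stratum keeps the same sampling fraction (here 144/240 = 60%), which is what makes the sample proportionate to the population.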


2012 ◽  
Vol 12 (1) ◽  
pp. 26-31
Author(s):  
Maya Haviland ◽  
With James Pillsbury

Jalaris Aboriginal Corporation in Western Australia was established in 1994 to look after the needs of an Aboriginal community. The organisation's most recent project is the ‘Kids Future Club’, an after-school activities program. Jalaris has a history of evaluating their work using a participatory action research approach, but decided to approach evaluation of the Kids Future Club in a slightly different way. This article discusses the reasons for the changed approach, the efforts made to develop culturally appropriate tools for data collection, and the challenges encountered in undertaking evaluation of outcomes for individual children in the context of Jalaris and their Aboriginal community. The tensions between ethical approaches to working within the Aboriginal kinship network and undertaking evaluation that required detailed observation and data collection of individuals proved to be irreconcilable for Jalaris. Lessons learnt from this evaluation process may inform future efforts to evaluate Aboriginal community initiatives.


Behaviour ◽  
1974 ◽  
Vol 49 (3-4) ◽  
pp. 227-266 ◽  
Author(s):  
Jeanne Altmann

Abstract: Seven major types of sampling for observational studies of social behavior have been found in the literature. These methods differ considerably in their suitability for providing unbiased data of various kinds. Below is a summary of the major recommended uses of each technique: In this paper, I have tried to point out the major strengths and weaknesses of each sampling method. Some methods are intrinsically biased with respect to many variables, others to fewer. In choosing a sampling method the main question is whether the procedure results in a biased sample of the variables under study. A method can produce a biased sample directly, as a result of intrinsic bias with respect to a study variable, or secondarily due to some degree of dependence (correlation) between the study variable and a directly-biased variable. In order to choose a sampling technique, the observer needs to consider carefully the characteristics of behavior and social interactions that are relevant to the study population and the research questions at hand. In most studies one will not have adequate empirical knowledge of the dependencies between relevant variables. Under the circumstances, the observer should avoid intrinsic biases to whatever extent possible, in particular those that directly affect the variables under study. Finally, it will often be possible to use more than one sampling method in a study. Such samples can be taken successively or, under favorable conditions, even concurrently. For example, we have found it possible to take Instantaneous Samples of the identities and distances of nearest neighbors of a focal individual at five or ten minute intervals during Focal-Animal (behavior) Samples on that individual. Often during Focal-Animal Sampling one can also record All Occurrences of Some Behaviors, for the whole social group, for categories of conspicuous behavior, such as predation, intergroup contact, drinking, and so on.
The extent to which concurrent multiple sampling is feasible will depend very much on the behavior categories and rate of occurrence, the observational conditions, etc. Where feasible, such multiple sampling can greatly aid in the efficient use of research time.


2015 ◽  
Vol 1 (2) ◽  
pp. 92
Author(s):  
Novalia Nastiti ◽  
Imron Mawardi

Amil zakat (zakat administrators) in zakat institutions hold a right as one of the eight asnaf. Zakat institutions usually use this right to cover operational costs. However, not all institutions that manage zakat take the amil’s right; one of them is Yayasan Nurul Hayat. This institution does not take the amil’s right and covers its operational costs independently. To support these costs, Yayasan Nurul Hayat established a business unit and uses its profit. This study aimed to discover the capability of the business unit to support the operational costs of Yayasan Nurul Hayat. This study used a qualitative approach with a descriptive case study method. Informants were selected using a purposive sampling method. Data were collected through semi-structured interviews and documentation and analyzed using a descriptive method. The results show that the salaries of Yayasan Nurul Hayat’s employees are paid from the business unit’s profit. The profit is also used to pay employee bonuses and to grow the business unit. From these results, it can be concluded that the business unit has great capability to support the operational costs of Yayasan Nurul Hayat.


2021 ◽  
Vol 9 (2) ◽  
pp. 193
Author(s):  
Rahmad Hidayat

This research explains several forms of errors in the material module of the Pendidikan Profesi Guru Dalam Jabatan Tahun 2020. Research on the analysis of language errors in the PPG module has never been carried out. Data were collected using the Listening method with the Note Technique and recorded in tabulations. The data were analyzed using the Intralingual Matching method with the HBS and HBB techniques, which are realized by comparing the language data with the applicable rules. Deviant linguistic data were then classified by the type of violation of linguistic rules and theories. The presentation of the results is based on the taxonomy of linguistic categories in language error analysis. The results showed that module I of the PPG Dalam Jabatan Tahun 2020 contains spelling errors in the form of punctuation errors, capitalization errors, italicization errors, and word writing errors; morphological errors in the form of word formation errors and word non-conformity; and syntactic errors in the form of misused conjunctions and ineffective sentences.


2021 ◽  
Vol 19 (1) ◽  
pp. 20
Author(s):  
Devi Dwi Agusthera ◽  
Theresia Militina ◽  
Saida Zainurrosalmia ZA

This study aims to analyze Brand Identification, Self-Concept Connection, Brand Love, and Brand Loyalty among users of iPhone brand smartphones in Samarinda. The background of this research is the relationship of Brand Identification and Self-Concept Connection that the iPhone brand builds with its users, and how Brand Identification and Self-Concept Connection shape the Brand Love and Brand Loyalty of iPhone smartphone users in Samarinda. Brand Identification and Self-Concept Connection do not necessarily create optimal love among iPhone smartphone users in Samarinda, so they need to be reevaluated along with how much influence they have on brand loyalty. This research is a quantitative descriptive study and uses SEM-AMOS v.22 as the analysis tool. The sample size was set with the formula sample = Σ indicators × (5 to 10), which led to a sample of 102 respondents. The sample was drawn using the proportional accidental sampling method. The results showed that all five proposed hypotheses were accepted: Brand Identification had a significant effect on Brand Love, Self-Concept Connection had a significant effect on Brand Love, Brand Identification had a significant effect on Brand Loyalty, Self-Concept Connection had a significant effect on Brand Loyalty, and Brand Love had a significant effect on Brand Loyalty. These findings support the theory and previous research that the researchers used as references.
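The sample-size rule quoted in this abstract (number of indicators multiplied by 5 to 10) can be worked through as a short sketch. The abstract does not state how many indicators the model has, so the code only derives which indicator counts are compatible with the reported 102 respondents; the function names are hypothetical.

```python
import math


def sample_size_range(n_indicators, low=5, high=10):
    """Minimum and maximum sample size under the 'indicators x (5 to 10)' rule."""
    return n_indicators * low, n_indicators * high


def compatible_indicator_counts(sample_size, low=5, high=10):
    """Indicator counts n for which low*n <= sample_size <= high*n."""
    smallest = math.ceil(sample_size / high)
    largest = math.floor(sample_size / low)
    return range(smallest, largest + 1)


counts = compatible_indicator_counts(102)
print(f"102 respondents fits models with {counts[0]} to {counts[-1]} indicators")
```

Under this rule, the reported sample of 102 is consistent with any model that has between 11 and 20 indicators; the abstract alone does not narrow it further.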


2020 ◽  
Vol 2 (4) ◽  
pp. 3737-3754
Author(s):  
Witta Widiya ◽  
Efrizal Syofyan

The purpose of this study was to analyze the effect of competency, independence, and auditor ethics on audit quality in the Inspectorate Office. This is quantitative research. The population was the auditors in the inspectorate office of West Sumatera province, and a sample of 35 was taken using the total sampling method. The research uses primary data collected through questionnaires. The data were analyzed using multiple linear regression in SPSS version 25, with audit quality as the dependent variable and competency, independence, and auditor ethics as the independent variables. The results support the third hypothesis: auditor ethics has an effect on the audit quality of examiners at the inspectorate of West Sumatera province. The results also show that competency and independence have no effect on the audit quality of examiners at the inspectorate of West Sumatera province.

