Qualitative Coding in the Computational Era: A Hybrid Approach to Improve Reliability and Reduce Effort for Coding Ethnographic Interviews


2021 ◽  
Vol 7 ◽  
pp. 237802312110623
Author(s):  
Zhuofan Li ◽  
Daniel Dohan ◽  
Corey M. Abramson

Sociologists have argued that there is value in incorporating computational tools into qualitative research, including using machine learning to code qualitative data. Yet standard computational approaches do not neatly align with traditional qualitative practices. The authors introduce a hybrid human-machine learning approach (HHMLA) that combines a contemporary iterative approach to qualitative coding with advanced word embedding models that allow contextual interpretation beyond what can be reliably accomplished with conventional computational approaches. The results, drawn from an analysis of 87 human-coded ethnographic interview transcripts, demonstrate that HHMLA can code data sets at a fraction of the effort of human-only strategies, saving hundreds of hours of labor in even modestly sized qualitative studies, while improving coding reliability. The authors conclude that HHMLA may provide a promising model for coding data sets where human-only coding would be logistically prohibitive but conventional computational approaches would be inadequate given qualitative foci.
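
The abstract does not include an implementation, but the hybrid loop it describes can be sketched: humans code a seed set of excerpts, a classifier trained on text representations proposes codes for the rest, and low-confidence excerpts are routed back to human coders. The sketch below is illustrative only; it uses scikit-learn with a TF-IDF stand-in for the paper's word embedding models, and the codes, excerpts, and confidence threshold are invented.

```python
# A minimal sketch of a hybrid human-machine coding loop, not the authors'
# implementation. TF-IDF stands in for the word embedding representation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human-coded seed excerpts (hypothetical examples).
seed_texts = ["I worry about the cost of treatment",
              "My family helps me cope",
              "The doctor explained the trial",
              "My daughter drives me to visits"]
seed_codes = ["finances", "social_support", "clinical_communication", "social_support"]

uncoded_texts = ["Insurance would not cover the scan",
                 "My neighbor checks on me daily"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(seed_texts, seed_codes)

probs = clf.predict_proba(uncoded_texts)
labels = clf.predict(uncoded_texts)
for text, label, p in zip(uncoded_texts, labels, probs.max(axis=1)):
    # Accept confident machine codes; route low-confidence excerpts to humans,
    # whose corrections feed the next training iteration.
    decision = label if p >= 0.5 else "SEND TO HUMAN CODER"
    print(f"{p:.2f}  {decision}  <- {text!r}")
```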


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
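
The core move described here is to treat the absolute prediction error itself as a target for a second model. A minimal sketch of that idea, under assumed details (scikit-learn, random forests for both models, out-of-fold residuals as training labels), might look like this:

```python
# Sketch: reframe reliability estimation as predicting |error| with a
# second, general-purpose regressor. Model choices here are assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

outcome_model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Out-of-fold predictions give honest absolute errors on the training data.
oof_pred = cross_val_predict(RandomForestRegressor(random_state=0),
                             X_train, y_train, cv=5)
abs_err = np.abs(y_train - oof_pred)

# Any regressor can learn reliability patterns directly from the data.
reliability_model = RandomForestRegressor(random_state=0).fit(X_train, abs_err)

y_hat = outcome_model.predict(X_test)
err_hat = reliability_model.predict(X_test)  # estimated |error| per prediction
print("first 3 predictions:   ", y_hat[:3].round(1))
print("estimated abs errors:  ", err_hat[:3].round(1))
```

Because the reliability model is just another numeric predictor, it needs no distributional assumptions and can be paired with any outcome model.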


2001 ◽  
Vol 27 (4) ◽  
pp. 521-544 ◽  
Author(s):  
Wee Meng Soon ◽  
Hwee Tou Ng ◽  
Daniel Chung Yong Lim

In this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not restrict the entity types of the noun phrases; that is, coreference is assigned whether they are of “organization,” “person,” or other types. We evaluate our approach on common data sets (namely, the MUC-6 and MUC-7 coreference corpora) and obtain encouraging results, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches. Our system is the first learning-based system that offers performance comparable to that of state-of-the-art nonlearning systems on these data sets.
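
The learning setup behind such systems casts coreference as pairwise classification: each candidate (antecedent, anaphor) pair becomes a feature vector with a coreferent/not-coreferent label. The toy sketch below illustrates the shape of that setup only; the features and data are invented, and scikit-learn's decision tree stands in for the paper's learner over its much richer feature set.

```python
# Toy sketch of pairwise coreference classification; features are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

pairs = [
    {"dist_sentences": 0, "string_match": True,  "j_is_pronoun": False, "number_agree": True},
    {"dist_sentences": 2, "string_match": False, "j_is_pronoun": True,  "number_agree": False},
    {"dist_sentences": 1, "string_match": False, "j_is_pronoun": True,  "number_agree": True},
    {"dist_sentences": 5, "string_match": False, "j_is_pronoun": False, "number_agree": False},
]
labels = [1, 0, 1, 0]  # 1 = the two noun phrases corefer

vec = DictVectorizer()
X = vec.fit_transform(pairs)
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

# Classify a new candidate pair of noun phrases.
new_pair = {"dist_sentences": 0, "string_match": True,
            "j_is_pronoun": False, "number_agree": True}
print(clf.predict(vec.transform([new_pair])))  # e.g. [1]
```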


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3616-3620

The growing enthusiasm for the field of opinion mining, and for its applications in various areas of knowledge and social science, has motivated numerous researchers to investigate the field. The opportunity to capture the opinion of the general public about social events, political movements, company strategies, marketing campaigns, and product preferences has raised increasing interest both in the scientific community (because of the exciting open challenges) and in the business world (because of the remarkable benefits for marketing and financial market prediction). Today, sentiment analysis has applications in several different settings, and a good number of companies, both large and small scale, track opinions and sentiments as part of their mission. This work introduces a hybrid approach that combines a lexicon-based approach and a machine learning approach for extracting aspects and sentiments.
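
The abstract does not specify how the two approaches are combined; one common pattern is to feed a lexicon-derived polarity score into a machine learning classifier as an extra feature. The sketch below shows that pattern under stated assumptions: the mini-lexicon, the example reviews, and the feature combination are all invented for illustration.

```python
# Hedged sketch of one hybrid lexicon + ML scheme: the lexicon score is
# appended to bag-of-words features. Lexicon and data are invented.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}

def lexicon_score(text):
    # Sum of polarity values for lexicon words found in the text.
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

texts = ["great battery life", "poor camera quality",
         "love the screen", "bad support"]
y = [1, 0, 1, 0]  # 1 = positive aspect sentiment

vec = CountVectorizer()
X_words = vec.fit_transform(texts)
X_lex = csr_matrix(np.array([[lexicon_score(t)] for t in texts]))
X = hstack([X_words, X_lex])  # lexicon score becomes one extra feature

clf = LogisticRegression().fit(X, y)
test = ["great screen but poor battery"]
X_test = hstack([vec.transform(test),
                 csr_matrix(np.array([[lexicon_score(test[0])]]))])
print(clf.predict(X_test))
```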


Author(s):  
Zhao Zhang ◽  
Yun Yuan ◽  
Xianfeng (Terry) Yang

Accurate and timely estimation of freeway traffic speeds over short segments plays an important role in traffic monitoring systems. In the literature, the ability of machine learning techniques to capture the stochastic characteristics of traffic has been proven. Also, the deployment of intelligent transportation systems (ITSs) has provided enriched traffic data, which enables the adoption of a variety of machine learning methods to estimate freeway traffic speeds. However, limitations in data quality and coverage remain a major challenge in current traffic monitoring systems. To overcome this problem, this study aims to develop a hybrid machine learning approach, creating a new training variable based on a second-order traffic flow model, to improve the accuracy of traffic speed estimation. Grounded in a novel integrated framework, the estimation is performed using three machine learning techniques: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). All three models are trained on the integrated dataset, which includes the traffic flow model estimates and the iPeMS and PeMS data from the Utah Department of Transportation (DOT). Using the PeMS data as the ground truth for model evaluation, comparisons between the hybrid approach and pure machine learning models show that the hybrid approach can effectively capture the time-varying pattern of the traffic and help improve estimation accuracy.
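
The hybrid idea reduces to a simple data-preparation step: compute the physics-based speed estimate and append it to the feature matrix before training. The sketch below illustrates that step only; the detector data are synthetic, and the one-line "model speed" is a crude stand-in for the second-order traffic flow model, not the paper's formulation.

```python
# Simplified sketch of the hybrid idea: append a physics-based speed
# estimate as an extra training variable before fitting an ML model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 300
flow = rng.uniform(200, 2000, n)       # veh/h (synthetic detector data)
occupancy = rng.uniform(0.02, 0.5, n)  # fraction of time detector is occupied

# Crude stand-in for the second-order traffic flow model's speed estimate.
model_speed = flow / (occupancy * 120.0 + 1e-6)

true_speed = model_speed * rng.normal(1.0, 0.1, n)  # synthetic ground truth

# Hybrid feature set: raw measurements plus the model-based estimate.
X = np.column_stack([flow, occupancy, model_speed])
rf = RandomForestRegressor(random_state=0).fit(X, true_speed)
print("R^2 on training data:", round(rf.score(X, true_speed), 3))
```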


2021 ◽  
Author(s):  
Diti Roy ◽  
Md. Ashiq Mahmood ◽  
Tamal Joyti Roy

Heart disease is the dominant disease, claiming a large number of lives every year. A 2016 WHO report stated that at least 17 million people die of heart disease every year. This number is gradually increasing, and WHO estimated that the death toll will reach 75 million by 2030. Despite modern technology and health care systems, predicting heart disease remains challenging. Because machine learning algorithms are a vital means of making predictions from available data sets, we used a machine learning approach to predict heart disease. We collected data from the UCI repository. In our study, we used the Random Forest, ZeroR, Voted Perceptron, and K* classifiers. We obtained the best result with the Random Forest classifier, with an accuracy of 97.69%.
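
The abstract does not detail the pipeline; a minimal sketch of the Random Forest step might look like the following. The file name, column names, and cross-validation setup are assumptions: the UCI heart disease data would be loaded from a local copy, and the authors' preprocessing and evaluation protocol may differ.

```python
# Minimal sketch of the modeling step, not the authors' exact pipeline.
# 'heart.csv' and the 'target' column are assumed names for a local copy
# of the UCI heart disease data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("heart.csv")        # assumed local copy of UCI data
X = df.drop(columns=["target"])      # 'target': 1 = disease, 0 = none
y = df["target"]

rf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(rf, X, y, cv=10)  # 10-fold CV accuracy
print(f"mean accuracy: {scores.mean():.4f}")
```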


2021 ◽  
Vol 9 (2) ◽  
pp. 313-317
Author(s):  
Vanitha Kakollu, et al.

Today we have large amounts of textual data to be processed, and the procedure involved in classifying text is called natural language processing. The basic goal is to identify whether the text is positive or negative; this process is also called opinion mining. In this paper, we consider three different data sets and perform sentiment analysis to find the test accuracy. There are three cases: (1) if the text contains more positive data than negative data, the overall result leans positive; (2) if the text contains more negative data than positive data, the overall result leans negative; (3) if the numbers of positive and negative data are nearly equal, the output is neutral. Sentiment analysis involves several steps, such as term extraction, feature selection, and sentiment classification. In this paper, the key focus is on sentiment analysis comparing the machine learning approach and the lexicon-based approach and their respective accuracy-loss graphs.
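
The three-case decision rule is simple enough to state directly in code. The sketch below is a toy illustration of that rule only, using an invented mini-lexicon rather than the paper's data sets or classifiers.

```python
# Toy illustration of the three-case decision rule; the lexicon is invented.
POSITIVE = {"good", "great", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "sad", "terrible"}

def overall_sentiment(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"   # case 1: more positive than negative
    if neg > pos:
        return "negative"   # case 2: more negative than positive
    return "neutral"        # case 3: roughly equal counts

print(overall_sentiment("great plot but terrible ending, still happy overall"))
```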


Author(s):  
Ganesh K. Shinde

Abstract: The most important part of information gathering is to focus on how people think. Many opinion resources, such as online review sites and personal blogs, are available. In this paper we focus on Twitter, which allows users to express opinions on a variety of entities. We performed sentiment analysis on tweets using text mining methods such as the lexicon and machine learning approaches. We performed sentiment analysis in two steps: first, by searching for polarity words from the pool of words predefined in a lexicon dictionary, and second, by training the machine learning algorithm using the polarities obtained in the first step. Keywords: Sentiment analysis, Social Media, Twitter, Lexicon Dictionary, Machine Learning Classifiers, SVM.
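
A hedged sketch of that two-step scheme follows: a lexicon assigns initial polarities, and those polarities become the training labels for an SVM (listed in the keywords). The lexicon, tweets, and labeling rule are invented stand-ins.

```python
# Sketch of the two-step scheme: label tweets with a lexicon, then train
# an SVM on those polarities. Lexicon and tweets are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

LEXICON = {"love": 1, "great": 1, "awesome": 1,
           "hate": -1, "awful": -1, "worst": -1}

def lexicon_label(tweet):
    score = sum(LEXICON.get(w, 0) for w in tweet.lower().split())
    return 1 if score >= 0 else 0  # step 1: polarity from the lexicon

tweets = ["love this phone", "worst service ever",
          "awesome update", "awful battery hate it"]
labels = [lexicon_label(t) for t in tweets]

vec = TfidfVectorizer()
svm = LinearSVC().fit(vec.fit_transform(tweets), labels)  # step 2: train SVM
print(svm.predict(vec.transform(["love this update"])))
```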


Author(s):  
Ahmad Fallatah ◽  
Simon Jones ◽  
David Mitchell

The identification of informal settlements in urban areas is an important step in developing and implementing pro-poor urban policies. Understanding when, where and who lives inside informal settlements is critical to efforts to improve their resilience. This study aims to analyse the capability of machine-learning (ML) methods to map informal areas in Jeddah, Saudi Arabia, using very-high-resolution (VHR) imagery and terrain data. Fourteen indicators of settlement characteristics were derived and mapped using an object-based ML approach and VHR imagery. These indicators were categorised according to three different spatial levels: environ, settlement and object. The most useful indicators for prediction were found to be density and texture measures (with random forest (RF) relative importance measures of over 25% and 23%, respectively). The success of this approach was evaluated using a small, fully independent validation dataset. Informal areas were mapped with an overall accuracy of 91%. Object-based ML as a hybrid approach performed better (by 8%) than object-based image analysis alone due to its ability to encompass all available geospatial levels.
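
The per-indicator relative importance figures quoted above are the kind of output a random forest provides directly. The sketch below illustrates how such importances might be read off; the indicator names and data are invented stand-ins for the study's fourteen settlement indicators.

```python
# Illustrative sketch of per-indicator importance from a random forest;
# indicator names and data are invented stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
indicators = ["building_density", "texture", "road_width", "vegetation_fraction"]
X = rng.random((n, len(indicators)))
# Synthetic rule: dense, high-texture objects are 'informal' (label 1).
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(indicators, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:22s} {imp:.2f}")  # density/texture should dominate
```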


2020 ◽  
Author(s):  
Mareen Lösing ◽  
Jörg Ebbing ◽  
Wolfgang Szwillus

Improving the understanding of geothermal heat flux in Antarctica is crucial for ice-sheet modelling and glacial isostatic adjustment. It affects the ice rheology and can lead to basal melting, thereby promoting ice flow. Direct measurements are sparse, and models inferred from, e.g., magnetic or seismological data differ immensely. By Bayesian inversion, we evaluated the uncertainties of some of these models and studied the interdependencies of the thermal parameters. In contrast to previous studies, our method allows the parameters to vary laterally, which leads to a heterogeneous West and a slightly more homogeneous East Antarctica with overall lower surface heat flux. The Curie isotherm depth and radiogenic heat production have the strongest impact on our results, but both parameters have a high uncertainty.

To overcome such shortcomings, we adopt a machine learning approach, more specifically a Gradient Boosted Regression Tree model, in order to find an optimal predictor for locations with sparse measurements. However, this approach largely relies on global data sets, which are notoriously unreliable in Antarctica. Therefore, the validity and quality of the data sets are reviewed and discussed. Using regional and more detailed data sets of Antarctica's Gondwana neighbours might improve the predictions due to their similar tectonic history. The performance of the machine learning algorithm can then be examined by comparing the predictions to the existing measurements. From our study, we expect to gain new insights into the geothermal structure of Antarctica, which will help with future studies on the coupling of the Solid Earth and the Cryosphere.
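
As a rough sketch of the regression setup, a Gradient Boosted Regression Tree can be trained on geophysical predictors at measured sites and then queried at sparse locations. Everything below is an invented placeholder: the predictors, the synthetic relationship, and the hyperparameters are not those of the study.

```python
# Minimal sketch of a Gradient Boosted Regression Tree heat-flux predictor;
# predictors and data are invented placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 400
# Hypothetical predictors: Curie depth (km), crustal thickness (km), elevation (m).
X = np.column_stack([rng.uniform(15, 60, n),
                     rng.uniform(20, 55, n),
                     rng.uniform(-500, 3000, n)])
heat_flux = 800.0 / X[:, 0] + rng.normal(0, 5, n)  # synthetic mW/m^2 target

gbrt = GradientBoostingRegressor(random_state=0).fit(X, heat_flux)
new_site = [[35.0, 38.0, 1200.0]]  # query at a sparsely measured location
print(f"predicted heat flux: {gbrt.predict(new_site)[0]:.1f} mW/m^2")
```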

