Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts

2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Ryan J. Gallagher ◽  
Morgan R. Frank ◽  
Lewis Mitchell ◽  
Aaron J. Schwartz ◽  
Andrew J. Reagan ◽  
...  

Abstract: A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback–Leibler and Jensen–Shannon divergences. Through a diverse set of case studies ranging from presidential speeches to tweets posted in urban green spaces, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.
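The idea in the abstract above, that a difference between two weighted averages can be split into per-word contributions, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy texts, and the word scores are all hypothetical, and the decomposition shown is the simplest one (score times change in relative frequency), without the additional components the full method may use.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def weighted_average(tokens, scores):
    """Average score over the tokens that have a score (others are ignored)."""
    scored = [scores[word] for word in tokens if word in scores]
    if not scored:
        raise ValueError("no scored words in text")
    return sum(scored) / len(scored)


def word_contributions(tokens_1, tokens_2, scores):
    """Per-word contribution to (average of text 2) - (average of text 1).

    Each contribution is score(word) * (p2(word) - p1(word)), where p1 and p2
    are relative frequencies among scored words only.
    """
    p1 = relative_freqs([w for w in tokens_1 if w in scores])
    p2 = relative_freqs([w for w in tokens_2 if w in scores])
    vocab = set(p1) | set(p2)
    return {w: scores[w] * (p2.get(w, 0.0) - p1.get(w, 0.0)) for w in vocab}


# Hypothetical toy data: two short texts and a made-up score dictionary.
text_1 = "happy happy sad day".split()
text_2 = "sad sad happy day".split()
scores = {"happy": 8.0, "sad": 2.0}

contributions = word_contributions(text_1, text_2, scores)
difference = weighted_average(text_2, scores) - weighted_average(text_1, scores)

# Words sorted by absolute contribution, largest first, as a word shift
# graph would rank them.
for word, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word:>6}: {value:+.3f}")
print(f" total: {difference:+.3f}")
```

The per-word values sum to the overall difference between the two averages, which is what lets a single summary number be read word by word.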

2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Xiaohong Wang ◽  
Shuang Dong

Abstract: With the rapid development of online shopping, an open question is how to exploit the value of online reviews so that they can fully inform potential users’ purchasing decisions. Based on text mining and quantitative analysis, this paper studies the sentiment analysis of online reviews on a B2C shopping website. The main attributes of a commodity or service are extracted according to their word-frequency rank in the online reviews. Text analysis is used to determine the relationship between the attributes of a commodity or service and the associated emotional words. The fine-grained sentiment polarity and intensity of the attributes are identified to analyze users’ concerns and preferences. The research shows that users pay more attention to the configuration and after-sales service of mobile phones. They have a positive sentiment orientation toward most attributes, especially the unlocking function, hand feel, and logistics service; a neutral orientation toward the battery and memory; and a negative orientation toward the mobile phone’s membrane. The results can provide a reference for consumers making purchasing decisions, for enterprises improving product quality, and for shopping platforms optimizing their service.


2020 ◽  
Author(s):  
Olivier Bolle ◽  
Michel Corsini ◽  
Hervé Diot ◽  
Oscar Laurent ◽  
Raphaël Melis

<p>A significant portion of the Maures-Tanneron Massif (SE branch of the European Variscan Belt) is occupied by late orogenic, anatectic crustal granitoids that were emplaced at ca. 325-300 Ma (Upper Carboniferous)<sup>1,2</sup>. The Camarat granite<sup>3</sup> is one of the smallest representatives of these granitoids (~2.5 km<sup>2</sup>). It is a composite intrusion exposed in migmatitic gneisses of the Maures Massif, along the southern shore of the Saint-Tropez Peninsula. From west to east, it consists of an E-W strip of Ms-Bt-Crd leucogranite where coarse- and fine-grained facies are found in similar amounts, and two bodies of Bt-Ms leucogranite, dominantly coarse-grained.</p><p>Zircon and monazite from two samples of the Camarat granite have been analyzed by LA-ICP-MS for U-Pb dating. Sixteen monazite analyses from the fine-grained facies of the E-W granite strip give a Concordia age of 303.5 ± 1.8 Ma (2 S.E., MSWD = 0.9). Sixteen zircons from the coarse-grained facies of the easternmost intrusion provide a Concordia age of 304.6 ± 2.1 Ma (2 S.E., MSWD = 1.2). The two dates are identical within uncertainty and are considered to constrain crystallization of the Camarat granite at ~304 Ma (Kasimovian–Gzhelian limit).</p><p>Twenty-one measurements of the anisotropy of magnetic susceptibility (AMS) and direct textural quantifications through image analysis (IA) of 10 samples give agreeing results that reveal the fabric orientation in the Camarat granite. The foliation has a variable orientation, with a weighted average of N65°E/26°NNW for the AMS data and N77°E/17°NNW for the IA data (D = 10°). The lineation pattern is more homogeneous, displaying a consistent northerly shallow plunge (mean of N12°E/22°NNE vs. N22°E/20°NNE; D = 10°). The Camarat granite lineations are parallel to lineations in the gneissic country rocks. These were produced during the last Variscan tectonic event evidenced in the area, a partitioned transpression phase, localized along ca. 
N-S sinistral strike-slip shear zones<sup>4</sup>. It is proposed that the ascent of the Camarat granite was favoured by such strike-slip structures and that pull-aparts represent the sites of emplacement, as best exemplified by the E-W granite strip.</p><p>In the Corso-Sardinian Block, another portion of the SE Variscides formerly juxtaposed to the Maures-Tanneron Massif<sup>5</sup>, a model of progressive transition from orogen-parallel flow (late orogenic, Upper Carboniferous transpression) to orogen-perpendicular extension (post orogenic, Permian rifting) has been recently proposed<sup>6</sup>. Such a model may be extended to other areas of the SE Variscan Belt, in particular to the Maures-Tanneron Massif which is cut and bordered by ca. E-W Permian grabens<sup>7</sup>, implying that a ca. N-S direction of stretching, as recorded by the 304 Ma Camarat granite, was still prevailing in Permian times.</p><ol><li>Duchesne et al., Lithos 162-163, 195-220 (2013).</li><li>Schneider et al., Geol. Soc. Spec. Pub. 405, 313-331 (2014).</li><li>Amenzou & Pupin, C. R. Acad. Sc. Paris (Série II) 303, 697-700 (1986).</li><li>Corsini & Rolland, C. R. Geoscience 341, 214-223 (2009).</li><li>Edel et al., Geol. Soc. Spec. Pub. 405, 333-361 (2014).</li><li>Casini et al., Tectonophysics 646, 65-78 (2015).</li><li>Toutin-Morin, Ann. Soc. géol. Nord 106, 183-187 (1987).</li></ol>


Author(s):  
Franklin Tchakounté ◽  
Athanase Esdras Yera Pagore ◽  
Marcellin Atemkeng ◽  
Jean Claude Kamgang

Product vendors use comments to measure consumer satisfaction. With the advent of Natural Language Processing (NLP), comments on Google Play can be processed to extract knowledge about applications, such as their reputation. Existing proposals in that direction are either informal or concerned only with functionality. In contrast, this work aims to determine the reputation of Android applications in terms of confidentiality, integrity, availability and authentication (CIAA). It proposes a model for assessing app reputation that relies on sentiment analysis and text analysis of comments. Assuming that comments are reliable, we collect Google Play applications whose comments include security keywords. An in-depth analysis of keywords based on Naive Bayes classification provides the polarity of each comment. Based on comment polarity, reputation is evaluated for the whole application. Experiments on real applications with dozens to billions of comments reveal that developers make insufficient efforts to guarantee CIAA services. A fine-grained analysis shows that applications without a good overall security reputation can still be well reputed in specific CIAA services. Results also show that applications with negative security polarities generally display positive functional polarities. This result suggests that security checking should include careful comment analysis to improve the security of applications.
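The pipeline described in the abstract above (Naive Bayes polarity per comment, then an app-level reputation) can be sketched in a few lines. This is an illustrative sketch only, not the authors' model: the class name, the toy training comments, and the labels are hypothetical, and the aggregation rule (share of comments classified as positive) is an assumption, since the abstract does not state how comment polarities are combined.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesPolarity:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.class_counts = Counter()            # comments seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()

    def fit(self, comments, labels):
        for text, label in zip(comments, labels):
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_log_prob = None, -math.inf
        for label, n_docs in self.class_counts.items():
            log_prob = math.log(n_docs / total_docs)  # class prior
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][word]
                    log_prob += math.log((count + self.alpha) / denom)
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label


def reputation(model, comments):
    """Assumed aggregation: share of comments classified as positive."""
    labels = [model.predict(comment) for comment in comments]
    return labels.count("pos") / len(labels)


# Hypothetical toy training data (security-related comments with polarity).
train_comments = [
    "app leaks my private data",
    "crashes and loses data",
    "secure login works well",
    "safe and reliable app",
]
train_labels = ["neg", "neg", "pos", "pos"]

model = NaiveBayesPolarity().fit(train_comments, train_labels)
app_comments = ["leaks private data", "secure and reliable", "safe login"]
print(reputation(model, app_comments))
```

A real system would need far more training data and a tokenizer that handles punctuation; the sketch only shows how per-comment polarity can roll up into one score per application.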


Author(s):  
V. L. Andreichev ◽  
A. A. Soboleva ◽  
V. B. Khubanov ◽  
I. D. Sobolev ◽  
...  

The article presents the first U-Pb age data for detrital zircons from clastic sediments of the Rumyanichnaya Formation, part of the Barma Group, which constitutes the lowest exposed part of the Precambrian sequence of the Northern Timan. Age data (LA-ICP-MS) for 94 zircon grains from fine-grained aleuritic sandstone cover the range 981–2582 Ma. The weighted average age of the two youngest zircons is 983±40 Ma, which provides grounds to assume that sediment deposition took place in the Late Riphean (Neoproterozoic). The accumulation of the clastic sediments that compose all three formations of the Barma Group (~5 km thick) was controlled mostly by terrigenous material from eroded rock complexes coeval with the crystalline complexes of Fennoscandia and the Central Russian Belt.


Ethnicities ◽  
2021 ◽  
pp. 146879682110615
Author(s):  
Suresh Canagarajah

This article develops a complex orientation to linguistic domination and resistance to demonstrate how academic communication can be diversified to facilitate anti-racist scholarship. While it draws from social sciences which provide complex theories of social structuration, it demonstrates how linguists can offer fine-grained analytical tools to track these processes across diverse scales of space, time, and institutions. The objective of this article is to introduce an orientation to language which goes beyond traditional reductive and overdetermined perspectives to accommodate its generative and resistant potential. It introduces translingual practice as accommodating the theoretical developments discussed, and demonstrates how methods of indexical analyses can help scholars study texts and communication across various spatiotemporal scales in achieving structuration. This approach is applied to the writing practice of African American scholar, Geneva Smitherman, to demonstrate how her anti-racist scholarship renegotiates established structures of academic communication and generates change. While this article will help applied linguists to develop an appreciation of writers and writing in constructing diversified academic communication, it can provide linguistic tools to social scientists for tracing the workings of structuration and change at diverse spatiotemporal and social scales of consideration.


2016 ◽  
Vol 28 (12) ◽  
pp. 2842-2863 ◽  
Author(s):  
Andrea Guizzardi ◽  
Alice Monti ◽  
Ercolino Ranieri

Purpose: The present study aims to suggest a new approach to hotel quality rating, specifically designed for the business travel segment, where the evaluation of surveyed consumers (business travelers) does not necessarily reflect the priority of customers (corporate travel departments [CTDs]).
Design/methodology/approach: Preliminarily, the authors defined key areas (domains), exploring what was done by quality certifiers recognized worldwide. Then, each domain quality was considered as a latent variable measured by a set of observable attributes (sub-domains) surveyed by a professional assessor. A continuous, fine-grained, composite indicator (CI) for quality was finally obtained by a weighted average of the domain (latent) quality measures. Weights were endogenously determined by data envelopment analysis.
Findings: The suggested CI shows both the existence of large quality disparities within the same star rating and a relevant bias in the internet reviews. A “soundproofed” room, a front desk open 24 h with sufficient staff and an adequate urban context are necessary features of any business hotel.
Research limitations/implications: Data came from a professional assessor’s database; therefore, the authors could only consider a three-domain measurement model. The database is mainly composed of three- and four-star hotels in Italy; nonetheless, these accommodations are the most widespread in Italian corporate hotel programs, preserving the practical utility of the results.
Originality/value: This study provides a transparent (replicable) evaluation protocol that is of potential use in the most popular models for quality measurement; any assessor can use it to underline its impartiality to CTDs and assessed hotels.


2014 ◽  
Vol 7 (2) ◽  
pp. 265-290 ◽  
Author(s):  
RUMEN ILIEV ◽  
MORTEZA DEHGHANI ◽  
EYAL SAGI

Abstract: Recent years have seen rapid developments in automated text analysis methods focused on measuring psychological and demographic properties. While this development has mainly been driven by computer scientists and computational linguists, such methods can be of great value for social scientists in general, and for psychologists in particular. In this paper, we review some of the most popular approaches to automated text analysis from the perspective of social scientists, and give examples of their applications in different theoretical domains. After describing some of the pros and cons of these methods, we speculate about future methodological developments, and how they might change social sciences. We conclude that, despite the fact that current methods have many disadvantages and pitfalls compared to more traditional methods of data collection, the constant increase of computational power and the wide availability of textual data will inevitably make automated text analysis a common tool for psychologists.


2020 ◽  
Vol 34 (1) ◽  
pp. 19-42
Author(s):  
David Moats

It is often claimed that the rise of so-called ‘big data’ and computationally advanced methods may exacerbate tensions between disciplines like data science and anthropology. This paper is an attempt to reflect on these possible tensions and their resolution, empirically. It contributes to a growing body of literature which observes interdisciplinary collaborations around new methods and digital infrastructures in practice but argues that many existing arrangements for interdisciplinary collaboration enforce a separation between disciplines in which identities are not really put at risk. In order to disrupt these standard roles and routines we put on a series of workshops in which mainly self-identified qualitative or non-technical researchers were encouraged to use digital tools (scrapers, automated text analysis and data visualisations). The paper focuses on three empirical examples from the workshops in which tensions, both between disciplines and methods, flared up and how they were ultimately managed or settled. In order to characterise both these tensions and negotiating strategies I draw on Woolgar and Stengers’ use of humour and irony to describe how disciplines relate to each other’s truth claims. I conclude that while there is great potential in more open-ended collaborative settings, qualitative social scientists may need to confront some of their own disciplinary baggage in order for better dialogue and more radical mixings between disciplines to occur.

