Predicting and Analyzing Language Specificity in Social Media Posts

Author(s):  
Yifan Gao ◽  
Yang Zhong ◽  
Daniel Preoţiuc-Pietro ◽  
Junyi Jessy Li

In computational linguistics, specificity quantifies the level of detail engaged in a text. It is an important characteristic of speaker intention and language style, and is useful in NLP applications such as summarization and argumentation mining. Yet to date, expert-annotated data for sentence-level specificity are scarce and confined to the news genre. In addition, systems that predict sentence specificity are classifiers trained to produce binary labels (general or specific). We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1-5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems. We also present the first large-scale study revealing the social, temporal and mental health factors underlying language specificity on social media.
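As a minimal illustration of the evaluation reported above (assumed, not the authors' code), the two metrics for a fine-grained specificity regressor can be computed as follows; the gold and predicted ratings are hypothetical:

import numpy as np
from scipy.stats import pearsonr

# Hypothetical gold specificity ratings (1-5 scale) and model predictions.
gold = np.array([4.5, 1.8, 4.2, 1.5, 3.0, 2.2])
pred = np.array([4.1, 2.0, 4.4, 1.9, 3.3, 2.0])

mae = np.mean(np.abs(pred - gold))   # mean absolute error on the rating scale
r, _ = pearsonr(pred, gold)          # Pearson correlation with the gold ratings
print(f"MAE = {mae:.4f}, Pearson r = {r:.2f}")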

2019 ◽  
Vol 375 (1791) ◽  
pp. 20180522 ◽  
Author(s):  
Mante S. Nieuwland ◽  
Dale J. Barr ◽  
Federica Bartolozzi ◽  
Simon Busch-Moreno ◽  
Emily Darley ◽  
...  

Composing sentence meaning is easier for predictable words than for unpredictable words. Are predictable words genuinely predicted, or simply more plausible and therefore easier to integrate with sentence context? We addressed this persistent and fundamental question using data from a recent, large-scale (n = 334) replication study, by investigating the effects of word predictability and sentence plausibility on the N400, the brain's electrophysiological index of semantic processing. A spatio-temporally fine-grained mixed-effects multiple regression analysis revealed overlapping effects of predictability and plausibility on the N400, albeit with distinct spatio-temporal profiles. Our results challenge the view that the predictability-dependent N400 reflects the effects of either prediction or integration, and suggest that semantic facilitation of predictable words arises from a cascade of processes that activate and integrate word meaning with context into a sentence-level meaning. This article is part of the theme issue ‘Towards mechanistic models of meaning composition’.
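A minimal sketch of the kind of analysis named above, assuming simulated single-trial data and statsmodels; it fits a mixed-effects regression of N400 amplitude on word predictability and sentence plausibility with by-subject random intercepts, and is not the authors' analysis code:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_items = 20, 30
subj = np.repeat([f"s{i}" for i in range(n_subj)], n_items)
predictability = rng.uniform(0, 1, n_subj * n_items)   # cloze probability
plausibility = rng.uniform(1, 7, n_subj * n_items)      # plausibility rating
# Simulated N400: more negative for unpredictable / implausible words, plus noise.
n400 = -5 + 3 * predictability + 0.3 * plausibility + rng.normal(0, 1, n_subj * n_items)

df = pd.DataFrame({"subject": subj, "predictability": predictability,
                   "plausibility": plausibility, "n400": n400})

# Fixed effects for both predictors; random intercept per subject.
model = smf.mixedlm("n400 ~ predictability + plausibility", df, groups=df["subject"])
print(model.fit().summary())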


2019 ◽  
Vol 37 (4) ◽  
pp. 703-721
Author(s):  
Qingqing Zhou ◽  
Ming Jing

Purpose – Expressional anomie (e.g. obscene words) can hinder communication and even obstruct improvements in national literacy, and the borderless, rapid transmission of the internet has exacerbated its influence. Hence, the purpose of this paper is to detect online anomic expression automatically and to analyze the dynamic evolution of expressional anomie, so as to reveal its multidimensional status.
Design/methodology/approach – This paper conducted expressional anomie analysis via fine-grained microblog mining. Specifically, anomic microblogs and their anomie types were identified via a supervised classification method. Then, the evolution of expressional anomie was analyzed, and the impacts of users’ characteristics on the evolution process were mined. Finally, expressional anomie characteristics and evolution trends were obtained.
Findings – Empirical results on microblogs indicate that more effective and diversified measures need to be used to address the current large-scale anomie in expression. Moreover, measures should be tailored to individuals and local conditions.
Originality/value – To the best of the authors’ knowledge, this is the first study to mine the evolution of expressional anomie automatically in social media. It may discover more continuous and universal rules of expressional anomie, so as to optimize the online expression environment.
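A minimal sketch of a supervised classifier for anomic microblogs, under the assumption of a TF-IDF plus linear-SVM pipeline and hypothetical labelled posts; the paper does not specify its exact features or model:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled posts: "none", "obscene", or another anomie type.
posts  = ["what a lovely morning", "<obscene insult>", "you are all idiots",
          "great news about the library", "<obscene rant about traffic>"]
labels = ["none", "obscene", "abusive", "none", "obscene"]

# Word and bigram TF-IDF features feeding a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(posts, labels)

print(clf.predict(["<another obscene phrase>", "have a nice day"]))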


2018 ◽  
Author(s):  
Mante S. Nieuwland ◽  
Dale J. Barr ◽  
Federica Bartolozzi ◽  
Simon Busch-Moreno ◽  
Emily Darley ◽  
...  

Abstract Composing sentence meaning is easier for predictable words than for unpredictable words. Are predictable words genuinely predicted, or simply more plausible and therefore easier to integrate with sentence context? We addressed this persistent and fundamental question using data from a recent, large-scale (N = 334) replication study, by investigating the effects of word predictability and sentence plausibility on the N400, the brain’s electrophysiological index of semantic processing. A spatiotemporally fine-grained mixed-effects multiple regression analysis revealed overlapping effects of predictability and plausibility on the N400, albeit with distinct spatiotemporal profiles. Our results challenge the view that the predictability-dependent N400 reflects the effects of either prediction or integration, and suggest that semantic facilitation of predictable words arises from a cascade of processes that activate and integrate word meaning with context into a sentence-level meaning.


2018 ◽  
Vol 50 (1) ◽  
pp. 262-281 ◽  
Author(s):  
Rijwana I. Esha ◽  
Monzur A. Imteaz

Abstract The current study aims to assess the potential of statistical multiple linear regression (MLR) techniques to develop long-term streamflow forecast models for New South Wales (NSW). While most past studies concentrated on revealing the relationship between streamflow and a single concurrent or lagged climate index, this study explores the combined impact of large-scale climate drivers. Considering their influences on the streamflow of NSW, several major climate drivers – IPO (Interdecadal Pacific Oscillation)/PDO (Pacific Decadal Oscillation), IOD (Indian Ocean Dipole) and ENSO (El Niño-Southern Oscillation) – are selected. Single correlation analysis is used as the basis for selecting different combinations of input variables for developing MLR models, to examine the extent of the combined impacts of the selected climate drivers on forecasting spring streamflow several months ahead. The developed models with all possible combinations show good results for all 12 selected stations in terms of Pearson correlation (r), root mean square error (RMSE), mean absolute error (MAE) and Willmott index of agreement (d). For each region, the best model with the lowest errors provides a statistically significant maximum correlation ranging from 0.51 to 0.65.
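A minimal sketch, assuming synthetic data, of an MLR streamflow model driven by lagged climate indices and evaluated with the four metrics used in the study (Pearson r, RMSE, MAE and Willmott's d); it is illustrative only, not the study's code:

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_years = 40
ipo, iod, nino34 = (rng.normal(size=n_years) for _ in range(3))   # lagged climate indices
streamflow = 50 - 8 * ipo - 5 * iod - 10 * nino34 + rng.normal(0, 5, n_years)

X = np.column_stack([ipo, iod, nino34])
model = LinearRegression().fit(X, streamflow)
pred = model.predict(X)

r, _ = pearsonr(pred, streamflow)
rmse = np.sqrt(np.mean((pred - streamflow) ** 2))
mae = np.mean(np.abs(pred - streamflow))
# Willmott's index of agreement: 1 - squared error / potential error.
obs_mean = streamflow.mean()
d = 1 - np.sum((pred - streamflow) ** 2) / np.sum(
    (np.abs(pred - obs_mean) + np.abs(streamflow - obs_mean)) ** 2)
print(f"r = {r:.2f}, RMSE = {rmse:.1f}, MAE = {mae:.1f}, d = {d:.2f}")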


2014 ◽  
Vol 22 (2) ◽  
pp. 59-74 ◽  
Author(s):  
Alex D. Breslow ◽  
Ananta Tiwari ◽  
Martin Schulz ◽  
Laura Carrington ◽  
Lingjia Tang ◽  
...  

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10–20%. However, system operators disallow co-location due to fair-pricing concerns, i.e., the lack of a pricing mechanism that accounts for performance interference from co-running jobs. In the current pricing model, application execution time determines the price, which results in unfair prices paid by the minority of users whose jobs suffer from co-location. This paper presents POPPA, a runtime system that enables fair pricing by delivering precise online interference detection, facilitating the adoption of co-location on supercomputers. POPPA leverages a novel shutter mechanism – a cyclic, fine-grained interference sampling mechanism that accurately deduces the interference between co-runners – to provide unbiased pricing of jobs that share nodes. POPPA is able to quantify inter-application interference within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.
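A heavily simplified, hypothetical sketch of the shutter idea described above: alternate short windows in which co-runners are paused with normal co-run windows, estimate the slowdown from the two progress rates, and discount the price accordingly. The function names and the pricing rule are illustrative assumptions, not POPPA's implementation:

def estimate_interference(samples):
    """samples: list of (progress_alone, progress_corun) pairs, one per
    shutter cycle, measured over windows of equal length."""
    slowdowns = [alone / corun for alone, corun in samples if corun > 0]
    return sum(slowdowns) / len(slowdowns)      # mean slowdown factor

def fair_price(base_price_per_hour, wallclock_hours, slowdown):
    # Charge for the time the job would have taken without interference.
    return base_price_per_hour * wallclock_hours / slowdown

# Hypothetical measurements from four shutter cycles.
samples = [(100, 88), (120, 100), (95, 90), (110, 92)]
slowdown = estimate_interference(samples)
print(f"slowdown = {slowdown:.2f}, price = {fair_price(0.05, 12, slowdown):.2f}")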


Author(s):  
Nicola Messina ◽  
Giuseppe Amato ◽  
Andrea Esuli ◽  
Fabrizio Falchi ◽  
Claudio Gennaro ◽  
...  

Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level. Specifically, we present a novel approach called Transformer Encoder Reasoning and Alignment Network (TERAN). TERAN enforces a fine-grained match between the underlying components of images and sentences (i.e., image regions and words, respectively) to preserve the informative richness of both modalities. TERAN obtains state-of-the-art results on the image retrieval task on both the MS-COCO and Flickr30k datasets. Moreover, on MS-COCO, it also outperforms current approaches on the sentence retrieval task. Focusing on scalable cross-modal information retrieval, TERAN is designed to keep the visual and textual data pipelines well separated: cross-attention links would preclude separately extracting the visual and textual features needed for the online search and offline indexing steps of large-scale retrieval systems. In this respect, TERAN merges the information from the two domains only during the final alignment phase, immediately before the loss computation. We argue that the fine-grained alignments produced by TERAN pave the way toward effective and efficient methods for large-scale cross-modal information retrieval. We compare the effectiveness of our approach against relevant state-of-the-art methods. On the MS-COCO 1K test set, we obtain improvements of 5.7% and 3.5% on the image and sentence retrieval tasks, respectively, on the Recall@1 metric. The code used for the experiments is publicly available on GitHub at https://github.com/mesnico/TERAN.
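A minimal sketch, with random embeddings standing in for separately extracted image and sentence features, of why separated pipelines enable offline indexing: retrieval reduces to a similarity-matrix lookup, and Recall@K can be computed directly from it. This is illustrative and not the TERAN code:

import numpy as np

rng = np.random.default_rng(0)
n, dim = 100, 256
image_emb = rng.normal(size=(n, dim))                  # offline-indexed image features
text_emb = image_emb + rng.normal(0, 0.5, (n, dim))    # paired captions (hypothetical)

# Cosine similarity between every caption (query) and every image.
img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
sim = txt @ img.T                                      # sim[i, j]: caption i vs image j

def recall_at_k(sim, k):
    ranks = (-sim).argsort(axis=1)                     # best images first for each caption
    return np.mean([i in ranks[i, :k] for i in range(len(sim))])

print(f"Image retrieval R@1 = {recall_at_k(sim, 1):.2f}, R@5 = {recall_at_k(sim, 5):.2f}")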


Author(s):  
Y. Sun

In the development of sustainable transportation and green cities, policymakers encourage people to commute by cycling and walking instead of by motor vehicle. On the one hand, cycling and walking reduce air pollution emissions; on the other hand, they offer health benefits by increasing people’s physical activity. Earlier studies investigating spatial patterns of active travel (cycling and walking) are limited by a lack of spatially fine-grained data. In recent years, with the development of information and communications technology, GPS-enabled devices have become popular and portable. With smart phones or smart watches, people are able to record their cycling or walking GPS traces while they are moving, and a large number of cyclists and pedestrians upload these traces to sport social media to share them with other people. Sport social media thus become a potential source of spatially fine-grained cycling and walking data. Very recently, Strava Metro began offering aggregated cycling and walking data with high spatial granularity: it aggregates a large number of cycling and walking GPS traces of Strava users onto streets and intersections across a city. Accordingly, as a kind of crowdsourced geographic information, the aggregated data are useful for investigating spatial patterns of cycling and walking activities, and thus have high potential for understanding cycling or walking behavior at a large spatial scale. This study is a first step in demonstrating the usefulness of Strava Metro data for exploring cycling or walking patterns at a large scale.
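A minimal sketch, assuming GPS points have already been map-matched to street segments, of the kind of aggregation Strava Metro provides: counting cycling and walking traversals per segment. The column names and records are hypothetical:

import pandas as pd

# Hypothetical map-matched records: one row per (trip, segment) traversal.
records = pd.DataFrame({
    "segment_id": ["A12", "A12", "B07", "B07", "B07", "C33"],
    "mode":       ["cycle", "walk", "cycle", "cycle", "walk", "cycle"],
})

# Aggregate to one row per street segment, split by travel mode.
counts = (records.groupby(["segment_id", "mode"])
                 .size()
                 .unstack(fill_value=0)
                 .rename_axis(columns=None))
print(counts)   # traversals per street segment, cycling vs walking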


2010 ◽  
Vol 16 (4) ◽  
pp. 391-415 ◽  
Author(s):  
JIANGUO LI ◽  
CHRIS BREW

Abstract Lapata and Brew (Computational Linguistics, vol. 30, 2004, pp. 295–313) (hereafter LB04) obtain from untagged texts a statistical prior model that is able to generate class preferences for ambiguous Levin (English Verb Classes and Alternations: A Preliminary Investigation, 1993, University of Chicago Press) verbs (hereafter Levin). They also show that their informative priors, incorporated into a Naive Bayes classifier deduced from hand-tagged data (HTD), can aid in verb class disambiguation. We re-analyse LB04's prior model and show that a single factor (the joint probability of class and frame) determines the predominant class for a particular verb in a particular frame. This means that the prior model cannot be sensitive to fine-grained lexical distinctions between different individual verbs falling in the same class. We replicate LB04's supervised disambiguation experiments on large-scale data, using deep parsers rather than the shallow parser of LB04. In addition, we introduce a method for training our classifier without using HTD. This relies on knowledge of Levin class memberships to move information from unambiguous to ambiguous instances of each class. We regard this system as unsupervised because it does not rely on human annotation of individual verb instances. Although our unsupervised verb class disambiguator does not match the performance of the ones that make use of HTD, it consistently outperforms the random baseline model. Our experiments also demonstrate that the informative priors derived from untagged texts help improve the performance of the classifier trained on untagged data.
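A small illustration of the re-analysis claim above: if class preference reduces to the joint probability P(class, frame), the predominant class depends only on the frame, not on the individual verb. The counts below are hypothetical:

from collections import Counter

# Hypothetical corpus counts of (Levin class, syntactic frame) pairs.
counts = Counter({
    ("change-of-state", "NP V NP"): 120,
    ("change-of-state", "NP V"):     40,
    ("motion",          "NP V NP"):  30,
    ("motion",          "NP V PP"):  90,
})
total = sum(counts.values())
joint = {cf: n / total for cf, n in counts.items()}       # P(class, frame)

def predominant_class(frame):
    # Depends only on the frame, not on the individual verb.
    candidates = {c: p for (c, f), p in joint.items() if f == frame}
    return max(candidates, key=candidates.get)

print(predominant_class("NP V NP"))   # -> 'change-of-state' for any verb in this frame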


2017 ◽  
Vol 5 (1) ◽  
pp. 70-82
Author(s):  
Soumi Paul ◽  
Paola Peretti ◽  
Saroj Kumar Datta

Building customer relationships and customer equity is a prime concern in today’s business decisions. The emergence of the internet, especially social media such as Facebook and Twitter, has changed traditional marketing thought to a great extent. The importance of customer orientation is reflected in the axiom, “The customer is the king”. A good number of organizations are engaging customers in their new product development activities via social media platforms. Co-creation, a new perspective in which customers are active co-creators of the products they buy and use, is currently challenging the traditional paradigm. The concept of co-creation, in which the customer’s knowledge, creativity and judgment are drawn on to generate value, is considered not only an emerging trend for introducing new products or services but also a way of fitting them to customer needs and increasing value for money. Knowledge and innovation are inseparable, and knowledge management competencies and capacities are essential to any organization that aspires to be distinguished and innovative. The present work attempts to identify the change in the value-creation process and to examine one area of business where co-creation can return significant dividends: extending a brand or brand category through brand extension or line extension. Through an in-depth literature review, this article identifies the changes in every perspective of this paradigm shift and presents a conceptual model of company-customer-brand-based co-creation activity via social media. The main objective is to offer an agenda for future research on this emerging trend and to chart the way from theory to practice. The paper acts as a proposal, allowing organizations to pursue this change on a large scale and obtain early feedback on the ideas presented.


2019 ◽  
Author(s):  
Kamal Batra ◽  
Stefan Zahn ◽  
Thomas Heine

We thoroughly benchmark time-dependent density-functional theory for the predictive calculation of UV/Vis spectra of porphyrin derivatives. With the aim to provide an approach that is computationally feasible for large-scale applications such as biological systems or molecular framework materials, albeit performing with high accuracy for the Q-bands, we compare the results given by various computational protocols, including basis sets, density functionals (including gradient-corrected local functionals, hybrids, double hybrids and range-separated functionals), and various variants of time-dependent density-functional theory, including the simplified Tamm-Dancoff approximation. An excellent choice for these calculations is the range-separated functional CAM-B3LYP in combination with the simplified Tamm-Dancoff approximation and a basis set of double-ζ quality, def2-SVP (mean absolute error [MAE] of ~0.05 eV). This is not surpassed by more expensive approaches, not even by double hybrid functionals, and solely systematic excitation energy scaling slightly improves the results (MAE ~0.04 eV).
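A minimal sketch, with hypothetical excitation energies, of the benchmark bookkeeping described above: the mean absolute error against reference values and a simple linear energy-scaling correction of the kind mentioned in the abstract:

import numpy as np

# Hypothetical Q-band excitation energies in eV: calculated vs reference.
calc = np.array([2.10, 2.25, 1.98, 2.40, 2.05])
ref  = np.array([2.05, 2.18, 1.95, 2.33, 2.02])

mae = np.mean(np.abs(calc - ref))
print(f"MAE = {mae:.3f} eV")

# Fit a linear scaling E_scaled = a * E_calc + b on the benchmark set,
# then re-evaluate the error with the scaled energies.
a, b = np.polyfit(calc, ref, 1)
scaled = a * calc + b
print(f"scaled MAE = {np.mean(np.abs(scaled - ref)):.3f} eV")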

