Predicting and Analyzing Language Specificity in Social Media Posts

Author(s):  
Yifan Gao ◽  
Yang Zhong ◽  
Daniel Preoţiuc-Pietro ◽  
Junyi Jessy Li

In computational linguistics, specificity quantifies the level of detail engaged in a text. It is an important characteristic of speaker intention and language style, and is useful in NLP applications such as summarization and argumentation mining. Yet to date, expert-annotated data for sentence-level specificity are scarce and confined to the news genre. In addition, systems that predict sentence specificity are classifiers trained to produce binary labels (general or specific). We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1-5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems. We also present the first large-scale study revealing the social, temporal and mental health factors underlying language specificity on social media.
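As a minimal illustration of the evaluation reported above (assumed, not the authors' code), the two metrics for a fine-grained specificity regressor can be computed as follows; the gold and predicted ratings are hypothetical:

import numpy as np
from scipy.stats import pearsonr

# Hypothetical gold specificity ratings (1-5 scale) and model predictions.
gold = np.array([4.5, 1.8, 4.2, 1.5, 3.0, 2.2])
pred = np.array([4.1, 2.0, 4.4, 1.9, 3.3, 2.0])

mae = np.mean(np.abs(pred - gold))   # mean absolute error on the rating scale
r, _ = pearsonr(pred, gold)          # Pearson correlation with the gold ratings
print(f"MAE = {mae:.4f}, Pearson r = {r:.2f}")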

2019 ◽  
Vol 375 (1791) ◽  
pp. 20180522 ◽  
Author(s):  
Mante S. Nieuwland ◽  
Dale J. Barr ◽  
Federica Bartolozzi ◽  
Simon Busch-Moreno ◽  
Emily Darley ◽  
...  

Composing sentence meaning is easier for predictable words than for unpredictable words. Are predictable words genuinely predicted, or simply more plausible and therefore easier to integrate with sentence context? We addressed this persistent and fundamental question using data from a recent, large-scale (n = 334) replication study, by investigating the effects of word predictability and sentence plausibility on the N400, the brain's electrophysiological index of semantic processing. A spatio-temporally fine-grained mixed-effects multiple regression analysis revealed overlapping effects of predictability and plausibility on the N400, albeit with distinct spatio-temporal profiles. Our results challenge the view that the predictability-dependent N400 reflects the effects of either prediction or integration, and suggest that semantic facilitation of predictable words arises from a cascade of processes that activate and integrate word meaning with context into a sentence-level meaning. This article is part of the theme issue ‘Towards mechanistic models of meaning composition’.
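A minimal sketch of the kind of analysis named above, assuming simulated single-trial data and statsmodels; it fits a mixed-effects regression of N400 amplitude on word predictability and sentence plausibility with by-subject random intercepts, and is not the authors' analysis code:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_items = 20, 30
subj = np.repeat([f"s{i}" for i in range(n_subj)], n_items)
predictability = rng.uniform(0, 1, n_subj * n_items)   # cloze probability
plausibility = rng.uniform(1, 7, n_subj * n_items)      # plausibility rating
# Simulated N400: more negative for unpredictable / implausible words, plus noise.
n400 = -5 + 3 * predictability + 0.3 * plausibility + rng.normal(0, 1, n_subj * n_items)

df = pd.DataFrame({"subject": subj, "predictability": predictability,
                   "plausibility": plausibility, "n400": n400})

# Fixed effects for both predictors; random intercept per subject.
model = smf.mixedlm("n400 ~ predictability + plausibility", df, groups=df["subject"])
print(model.fit().summary())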


2019 ◽  
Vol 37 (4) ◽  
pp. 703-721
Author(s):  
Qingqing Zhou ◽  
Ming Jing

Purpose – Expressional anomie (e.g. obscene words) can hinder communication and even obstruct improvements in national literacy, and the borderless, rapid transmission of the internet has exacerbated its influence. Hence, the purpose of this paper is to detect online anomic expression automatically and to analyze the dynamic evolution of expressional anomie, so as to reveal its multidimensional status.
Design/methodology/approach – This paper conducted expressional anomie analysis via fine-grained microblog mining. Specifically, anomic microblogs and their anomie types were identified via a supervised classification method. Then, the evolution of expressional anomie was analyzed, and the impacts of users’ characteristics on the evolution process were mined. Finally, expressional anomie characteristics and evolution trends were obtained.
Findings – Empirical results on microblogs indicate that more effective and diversified measures need to be used to address the current large-scale anomie in expression. Moreover, measures should be tailored to individuals and local conditions.
Originality/value – To the best of the authors’ knowledge, this is the first study to mine the evolution of expressional anomie automatically in social media. It may discover more continuous and universal rules of expressional anomie, so as to optimize the online expression environment.
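A minimal sketch of a supervised classifier for anomic microblogs, under the assumption of a TF-IDF plus linear-SVM pipeline and hypothetical labelled posts; the paper does not specify its exact features or model:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled posts: "none", "obscene", or another anomie type.
posts  = ["what a lovely morning", "<obscene insult>", "you are all idiots",
          "great news about the library", "<obscene rant about traffic>"]
labels = ["none", "obscene", "abusive", "none", "obscene"]

# Word and bigram TF-IDF features feeding a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(posts, labels)

print(clf.predict(["<another obscene phrase>", "have a nice day"]))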


2018 ◽  
Author(s):  
Mante S. Nieuwland ◽  
Dale J. Barr ◽  
Federica Bartolozzi ◽  
Simon Busch-Moreno ◽  
Emily Darley ◽  
...  

Abstract Composing sentence meaning is easier for predictable words than for unpredictable words. Are predictable words genuinely predicted, or simply more plausible and therefore easier to integrate with sentence context? We addressed this persistent and fundamental question using data from a recent, large-scale (N = 334) replication study, by investigating the effects of word predictability and sentence plausibility on the N400, the brain’s electrophysiological index of semantic processing. A spatiotemporally fine-grained mixed-effects multiple regression analysis revealed overlapping effects of predictability and plausibility on the N400, albeit with distinct spatiotemporal profiles. Our results challenge the view that the predictability-dependent N400 reflects the effects of either prediction or integration, and suggest that semantic facilitation of predictable words arises from a cascade of processes that activate and integrate word meaning with context into a sentence-level meaning.


2018 ◽  
Vol 50 (1) ◽  
pp. 262-281 ◽  
Author(s):  
Rijwana I. Esha ◽  
Monzur A. Imteaz

Abstract The current study aims to assess the potential of statistical multiple linear regression (MLR) techniques to develop long-term streamflow forecast models for New South Wales (NSW). While most past studies concentrated on revealing the relationship between streamflow and a single concurrent or lagged climate index, this study explores the combined impact of large-scale climate drivers. Considering their influences on the streamflow of NSW, several major climate drivers – IPO (Interdecadal Pacific Oscillation)/PDO (Pacific Decadal Oscillation), IOD (Indian Ocean Dipole) and ENSO (El Niño-Southern Oscillation) – are selected. Single correlation analysis is used as the basis for selecting different combinations of input variables for developing MLR models, to examine the extent of the combined impacts of the selected climate drivers on forecasting spring streamflow several months ahead. The developed models with all possible combinations show good results for all 12 selected stations in terms of Pearson correlation (r), root mean square error (RMSE), mean absolute error (MAE) and Willmott index of agreement (d). For each region, the best model with the lowest errors provides a statistically significant maximum correlation ranging from 0.51 to 0.65.
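A minimal sketch, assuming synthetic data, of an MLR streamflow model driven by lagged climate indices and evaluated with the four metrics used in the study (Pearson r, RMSE, MAE and Willmott's d); it is illustrative only, not the study's code:

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_years = 40
ipo, iod, nino34 = (rng.normal(size=n_years) for _ in range(3))   # lagged climate indices
streamflow = 50 - 8 * ipo - 5 * iod - 10 * nino34 + rng.normal(0, 5, n_years)

X = np.column_stack([ipo, iod, nino34])
model = LinearRegression().fit(X, streamflow)
pred = model.predict(X)

r, _ = pearsonr(pred, streamflow)
rmse = np.sqrt(np.mean((pred - streamflow) ** 2))
mae = np.mean(np.abs(pred - streamflow))
# Willmott's index of agreement: 1 - squared error / potential error.
obs_mean = streamflow.mean()
d = 1 - np.sum((pred - streamflow) ** 2) / np.sum(
    (np.abs(pred - obs_mean) + np.abs(streamflow - obs_mean)) ** 2)
print(f"r = {r:.2f}, RMSE = {rmse:.1f}, MAE = {mae:.1f}, d = {d:.2f}")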


2014 ◽  
Vol 22 (2) ◽  
pp. 59-74 ◽  
Author(s):  
Alex D. Breslow ◽  
Ananta Tiwari ◽  
Martin Schulz ◽  
Laura Carrington ◽  
Lingjia Tang ◽  
...  

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10–20%. However, system operators disallow co-location due to fair-pricing concerns, i.e., the lack of a pricing mechanism that accounts for performance interference from co-running jobs. In the current pricing model, application execution time determines the price, which results in unfair prices paid by the minority of users whose jobs suffer from co-location. This paper presents POPPA, a runtime system that enables fair pricing by delivering precise online interference detection, facilitating the adoption of co-location on supercomputers. POPPA leverages a novel shutter mechanism – a cyclic, fine-grained interference sampling mechanism that accurately deduces the interference between co-runners – to provide unbiased pricing of jobs that share nodes. POPPA is able to quantify inter-application interference within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.
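A heavily simplified, hypothetical sketch of the shutter idea described above: alternate short windows in which co-runners are paused with normal co-run windows, estimate the slowdown from the two progress rates, and discount the price accordingly. The function names and the pricing rule are illustrative assumptions, not POPPA's implementation:

def estimate_interference(samples):
    """samples: list of (progress_alone, progress_corun) pairs, one per
    shutter cycle, measured over windows of equal length."""
    slowdowns = [alone / corun for alone, corun in samples if corun > 0]
    return sum(slowdowns) / len(slowdowns)      # mean slowdown factor

def fair_price(base_price_per_hour, wallclock_hours, slowdown):
    # Charge for the time the job would have taken without interference.
    return base_price_per_hour * wallclock_hours / slowdown

# Hypothetical measurements from four shutter cycles.
samples = [(100, 88), (120, 100), (95, 90), (110, 92)]
slowdown = estimate_interference(samples)
print(f"slowdown = {slowdown:.2f}, price = {fair_price(0.05, 12, slowdown):.2f}")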


Author(s):  
Nicola Messina ◽  
Giuseppe Amato ◽  
Andrea Esuli ◽  
Fabrizio Falchi ◽  
Claudio Gennaro ◽  
...  

Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level. Specifically, we present a novel approach called Transformer Encoder Reasoning and Alignment Network (TERAN). TERAN enforces a fine-grained match between the underlying components of images and sentences (i.e., image regions and words, respectively) to preserve the informative richness of both modalities. TERAN obtains state-of-the-art results on the image retrieval task on both the MS-COCO and Flickr30k datasets. Moreover, on MS-COCO, it also outperforms current approaches on the sentence retrieval task. Focusing on scalable cross-modal information retrieval, TERAN is designed to keep the visual and textual data pipelines well separated: cross-attention links would preclude separately extracting the visual and textual features needed for the online search and offline indexing steps of large-scale retrieval systems. In this respect, TERAN merges the information from the two domains only during the final alignment phase, immediately before the loss computation. We argue that the fine-grained alignments produced by TERAN pave the way toward effective and efficient methods for large-scale cross-modal information retrieval. We compare the effectiveness of our approach against relevant state-of-the-art methods. On the MS-COCO 1K test set, we obtain improvements of 5.7% and 3.5% on the image and sentence retrieval tasks, respectively, on the Recall@1 metric. The code used for the experiments is publicly available on GitHub at https://github.com/mesnico/TERAN.
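A minimal sketch, with random embeddings standing in for separately extracted image and sentence features, of why separated pipelines enable offline indexing: retrieval reduces to a similarity-matrix lookup, and Recall@K can be computed directly from it. This is illustrative and not the TERAN code:

import numpy as np

rng = np.random.default_rng(0)
n, dim = 100, 256
image_emb = rng.normal(size=(n, dim))                  # offline-indexed image features
text_emb = image_emb + rng.normal(0, 0.5, (n, dim))    # paired captions (hypothetical)

# Cosine similarity between every caption (query) and every image.
img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
sim = txt @ img.T                                      # sim[i, j]: caption i vs image j

def recall_at_k(sim, k):
    ranks = (-sim).argsort(axis=1)                     # best images first for each caption
    return np.mean([i in ranks[i, :k] for i in range(len(sim))])

print(f"Image retrieval R@1 = {recall_at_k(sim, 1):.2f}, R@5 = {recall_at_k(sim, 5):.2f}")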


Author(s):  
Y. Sun

In the development of sustainable transportation and green cities, policymakers encourage people to commute by cycling and walking instead of by motor vehicle. On the one hand, cycling and walking reduce air pollution emissions; on the other hand, they offer health benefits by increasing people’s physical activity. Earlier studies investigating spatial patterns of active travel (cycling and walking) are limited by a lack of spatially fine-grained data. In recent years, with the development of information and communications technology, GPS-enabled devices have become popular and portable. With smart phones or smart watches, people are able to record their cycling or walking GPS traces while they are moving, and a large number of cyclists and pedestrians upload these traces to sport social media to share them with other people. Sport social media thus become a potential source of spatially fine-grained cycling and walking data. Very recently, Strava Metro began offering aggregated cycling and walking data with high spatial granularity: it aggregates a large number of cycling and walking GPS traces of Strava users onto streets and intersections across a city. Accordingly, as a kind of crowdsourced geographic information, the aggregated data are useful for investigating spatial patterns of cycling and walking activities, and thus have high potential for understanding cycling or walking behavior at a large spatial scale. This study is a first step in demonstrating the usefulness of Strava Metro data for exploring cycling or walking patterns at a large scale.
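A minimal sketch, assuming GPS points have already been map-matched to street segments, of the kind of aggregation Strava Metro provides: counting cycling and walking traversals per segment. The column names and records are hypothetical:

import pandas as pd

# Hypothetical map-matched records: one row per (trip, segment) traversal.
records = pd.DataFrame({
    "segment_id": ["A12", "A12", "B07", "B07", "B07", "C33"],
    "mode":       ["cycle", "walk", "cycle", "cycle", "walk", "cycle"],
})

# Aggregate to one row per street segment, split by travel mode.
counts = (records.groupby(["segment_id", "mode"])
                 .size()
                 .unstack(fill_value=0)
                 .rename_axis(columns=None))
print(counts)   # traversals per street segment, cycling vs walking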


2010 ◽  
Vol 16 (4) ◽  
pp. 391-415 ◽  
Author(s):  
JIANGUO LI ◽  
CHRIS BREW

Abstract Lapata and Brew (Computational Linguistics, vol. 30, 2004, pp. 295–313) (hereafter LB04) obtain from untagged texts a statistical prior model that is able to generate class preferences for ambiguous Levin (English Verb Classes and Alternations: A Preliminary Investigation, 1993, University of Chicago Press) verbs (hereafter Levin). They also show that their informative priors, incorporated into a Naive Bayes classifier deduced from hand-tagged data (HTD), can aid in verb class disambiguation. We re-analyse LB04's prior model and show that a single factor (the joint probability of class and frame) determines the predominant class for a particular verb in a particular frame. This means that the prior model cannot be sensitive to fine-grained lexical distinctions between different individual verbs falling in the same class. We replicate LB04's supervised disambiguation experiments on large-scale data, using deep parsers rather than the shallow parser of LB04. In addition, we introduce a method for training our classifier without using HTD. This relies on knowledge of Levin class memberships to move information from unambiguous to ambiguous instances of each class. We regard this system as unsupervised because it does not rely on human annotation of individual verb instances. Although our unsupervised verb class disambiguator does not match the performance of the ones that make use of HTD, it consistently outperforms the random baseline model. Our experiments also demonstrate that the informative priors derived from untagged texts help improve the performance of the classifier trained on untagged data.
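A small illustration of the re-analysis claim above: if class preference reduces to the joint probability P(class, frame), the predominant class depends only on the frame, not on the individual verb. The counts below are hypothetical:

from collections import Counter

# Hypothetical corpus counts of (Levin class, syntactic frame) pairs.
counts = Counter({
    ("change-of-state", "NP V NP"): 120,
    ("change-of-state", "NP V"):     40,
    ("motion",          "NP V NP"):  30,
    ("motion",          "NP V PP"):  90,
})
total = sum(counts.values())
joint = {cf: n / total for cf, n in counts.items()}       # P(class, frame)

def predominant_class(frame):
    # Depends only on the frame, not on the individual verb.
    candidates = {c: p for (c, f), p in joint.items() if f == frame}
    return max(candidates, key=candidates.get)

print(predominant_class("NP V NP"))   # -> 'change-of-state' for any verb in this frame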


2017 ◽  
Vol 5 (1) ◽  
pp. 70-82
Author(s):  
Soumi Paul ◽  
Paola Peretti ◽  
Saroj Kumar Datta

Building customer relationships and customer equity is a prime concern in today’s business decisions. The emergence of the internet, especially social media such as Facebook and Twitter, has changed traditional marketing thought to a great extent. The importance of customer orientation is reflected in the axiom, “The customer is the king”. A good number of organizations are engaging customers in their new product development activities via social media platforms. Co-creation, a new perspective in which customers are active co-creators of the products they buy and use, is currently challenging the traditional paradigm. The concept of co-creation, in which the customer’s knowledge, creativity and judgment are drawn on to generate value, is considered not only an emerging trend for introducing new products or services but also a way of fitting them to customer needs and increasing value for money. Knowledge and innovation are inseparable, and knowledge management competencies and capacities are essential to any organization that aspires to be distinguished and innovative. The present work attempts to identify the change in the value-creation process and to examine one area of business where co-creation can return significant dividends: extending a brand or brand category through brand extension or line extension. Through an in-depth literature review, this article identifies the changes in every perspective of this paradigm shift and presents a conceptual model of company-customer-brand-based co-creation activity via social media. The main objective is to offer an agenda for future research on this emerging trend and to chart the way from theory to practice. The paper acts as a proposal, allowing organizations to pursue this change on a large scale and obtain early feedback on the ideas presented.


2019 ◽  
Author(s):  
Kamal Batra ◽  
Stefan Zahn ◽  
Thomas Heine

We thoroughly benchmark time-dependent density-functional theory for the predictive calculation of UV/Vis spectra of porphyrin derivatives. With the aim to provide an approach that is computationally feasible for large-scale applications such as biological systems or molecular framework materials, albeit performing with high accuracy for the Q-bands, we compare the results given by various computational protocols, including basis sets, density functionals (including gradient-corrected local functionals, hybrids, double hybrids and range-separated functionals), and various variants of time-dependent density-functional theory, including the simplified Tamm-Dancoff approximation. An excellent choice for these calculations is the range-separated functional CAM-B3LYP in combination with the simplified Tamm-Dancoff approximation and a basis set of double-ζ quality, def2-SVP (mean absolute error [MAE] of ~0.05 eV). This is not surpassed by more expensive approaches, not even by double hybrid functionals, and solely systematic excitation energy scaling slightly improves the results (MAE ~0.04 eV).
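A minimal sketch, with hypothetical excitation energies, of the benchmark bookkeeping described above: the mean absolute error against reference values and a simple linear energy-scaling correction of the kind mentioned in the abstract:

import numpy as np

# Hypothetical Q-band excitation energies in eV: calculated vs reference.
calc = np.array([2.10, 2.25, 1.98, 2.40, 2.05])
ref  = np.array([2.05, 2.18, 1.95, 2.33, 2.02])

mae = np.mean(np.abs(calc - ref))
print(f"MAE = {mae:.3f} eV")

# Fit a linear scaling E_scaled = a * E_calc + b on the benchmark set,
# then re-evaluate the error with the scaled energies.
a, b = np.polyfit(calc, ref, 1)
scaled = a * calc + b
print(f"scaled MAE = {np.mean(np.abs(scaled - ref)):.3f} eV")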

