Linguistic effects on news headline success: Evidence from thousands of online field experiments (Registered Report Protocol)

What makes written text appealing? In this registered report protocol, we propose to study the linguistic characteristics of news headline success using a large-scale dataset of field experiments (A/B tests) conducted on the popular website Upworthy comparing multiple headline variants for the same news articles. This unique setup allows us to control for factors that can have crucial confounding effects on headline success. Based on prior literature and a pilot partition of the data, we formulate hypotheses about the linguistic features that are associated with statistically superior headlines. We will test our hypotheses on a much larger partition of the data that will become available after the publication of this registered report protocol. Our results will contribute to resolving competing hypotheses about the linguistic features that affect the success of text and will provide avenues for research into the psychological mechanisms that are activated by those features.
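As a rough illustration of what "statistically superior" can mean when two headline variants are compared in an A/B test, the sketch below computes a pooled two-proportion z statistic on click-through rates. This is an assumption made for illustration only, not the analysis specified in the protocol; the function name and the click and view counts are hypothetical.

```python
import math


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Pooled two-proportion z statistic for click-through rates.

    Illustrative only: the protocol's actual statistical model is not
    described in the abstract.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled click-through rate under the null hypothesis of equal rates.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        return 0.0
    return (rate_a - rate_b) / se


# Hypothetical counts: headline A gets 60 clicks and headline B gets 30,
# each from 1000 impressions.
z = two_proportion_z(60, 1000, 30, 1000)
print(f"z = {z:.2f}")
```

An absolute z value above roughly 1.96 corresponds to a two-sided difference at the conventional 5% level, ignoring the multiple-comparison issues that arise when many variants are tested.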

Closed and Open Vocabulary Approaches to Text Analysis: A Review, Quantitative Comparison, and Recommendations

10.31234/osf.io/t52c6 ◽

2020 ◽

Author(s):

Johannes Christopher Eichstaedt ◽

Margaret L. Kern ◽

David Bryce Yaden ◽

H. Andrew Schwartz ◽

Salvatore Giorgi ◽

...

Keyword(s):

Large Scale ◽

Latent Dirichlet Allocation ◽

Ambiguous Word ◽

Linguistic Features ◽

Written Text ◽

Complementary Approach ◽

Size Number ◽

Language Analysis ◽

Quantitative Synthesis ◽

The Impact

Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC 2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods.
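The closed-vocabulary idea can be sketched as dictionary-based counting: each category is a fixed word list, and a text is scored by the share of its tokens that fall in each list. The lexicon below is a hypothetical toy, not LIWC, the General Inquirer, or DICTION, and the tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration; real closed-vocabulary
# dictionaries are far larger and curated by experts.
TOY_LEXICON = {
    "positive_emotion": {"happy", "love", "great"},
    "negative_emotion": {"sad", "hate", "awful"},
}


def category_frequencies(text, lexicon=TOY_LEXICON):
    """Return the fraction of tokens in `text` matching each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in lexicon}
    counts = Counter(tokens)
    return {
        category: sum(counts[word] for word in words) / len(tokens)
        for category, words in lexicon.items()
    }


print(category_frequencies("I love this great day"))
```

A counter like this also scores "great" as positive emotion in "a great deal of trouble", which is the kind of ambiguous-word error the review attributes to closed-vocabulary analyses.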

Survey of Clustering Methods for Large Scale Dataset

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.13381344 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1338-1344

Author(s):

Anupama Jawale ◽

Ganesh Magar

Keyword(s):

Large Scale ◽

Clustering Methods ◽

Large Scale Dataset

Joint regression and learning from pairwise rankings for personalized image aesthetic assessment

Computational Visual Media ◽

10.1007/s41095-021-0207-y ◽

2021 ◽

Author(s):

Jin Zhou ◽

Qing Zhang ◽

Jian-Hao Fan ◽

Wei Sun ◽

Wei-Shi Zheng

Keyword(s):

Large Scale ◽

Assessment Model ◽

Generic Model ◽

Small Subset ◽

Deep Convolutional Neural Networks ◽

Personal Taste ◽

Hinge Loss ◽

Novel Approach ◽

Large Scale Dataset ◽

Image Pairs

Recent image aesthetic assessment methods have achieved remarkable progress due to the emergence of deep convolutional neural networks (CNNs). However, these methods focus primarily on predicting the generally perceived preference for an image, which limits their practicality, since each user may have completely different preferences for the same image. To address this problem, this paper presents a novel approach for predicting personalized image aesthetics that fit an individual user’s personal taste. We achieve this in a coarse-to-fine manner, by joint regression and learning from pairwise rankings. Specifically, we first collect a small subset of personal images from a user and invite him/her to rank the preference of some randomly sampled image pairs. We then search for the K-nearest neighbors of the personal images within a large-scale dataset labeled with average human aesthetic scores, and use these images as well as the associated scores to train a generic aesthetic assessment model by CNN-based regression. Next, we fine-tune the generic model to accommodate the personal preference by training over the rankings with a pairwise hinge loss. Experiments demonstrate that our method can effectively learn personalized image aesthetic preferences, clearly outperforming state-of-the-art methods. Moreover, we show that the learned personalized image aesthetic benefits a wide variety of applications.
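For readers unfamiliar with the pairwise hinge loss mentioned in this abstract, a minimal sketch follows. It uses the common textbook form max(0, margin - (s_preferred - s_other)); the margin value, function names, and example scores are assumptions, and the paper's exact formulation may differ.

```python
def pairwise_hinge_loss(score_preferred, score_other, margin=1.0):
    """Hinge loss for one ranked pair.

    The loss is zero once the preferred image outscores the other one
    by at least `margin`; otherwise it grows linearly with the shortfall.
    """
    return max(0.0, margin - (score_preferred - score_other))


def mean_pairwise_hinge_loss(scores, ranked_pairs, margin=1.0):
    """Average the loss over (preferred_index, other_index) pairs."""
    if not ranked_pairs:
        return 0.0
    total = sum(
        pairwise_hinge_loss(scores[preferred], scores[other], margin)
        for preferred, other in ranked_pairs
    )
    return total / len(ranked_pairs)


# Hypothetical model scores for three images, with user rankings
# stating that image 0 beats image 1 and image 2 beats image 1.
scores = [2.0, 0.5, 0.75]
pairs = [(0, 1), (2, 1)]
print(mean_pairwise_hinge_loss(scores, pairs))
```

In a fine-tuning setting, the scores would come from the network and the loss would be minimized by gradient descent; the plain-Python version here only shows how ranking violations are penalized.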

VIPPrint: Validating Synthetic Image Detection and Source Linking Methods on a Large Scale Dataset of Printed Documents

Journal of Imaging ◽

10.3390/jimaging7030050 ◽

2021 ◽

Vol 7 (3) ◽

pp. 50

Author(s):

Anselmo Ferreira ◽

Ehsan Nowroozi ◽

Mauro Barni

Keyword(s):

Large Scale ◽

State Of The Art ◽

Child Pornography ◽

Forensic Analysis ◽

Synthetic Image ◽

Image Detection ◽

Face Images ◽

Large Scale Dataset ◽

Scanned Images ◽

Analysis Of The Images

The possibility of carrying out a meaningful forensic analysis on printed and scanned images plays a major role in many applications. First of all, printed documents are often associated with criminal activities, such as terrorist plans, child pornography, and even fake packages. Additionally, printing and scanning can be used to hide the traces of image manipulation or the synthetic nature of images, since the artifacts commonly found in manipulated and synthetic images are gone after the images are printed and scanned. A problem hindering research in this area is the lack of large scale reference datasets to be used for algorithm development and benchmarking. Motivated by this issue, we present a new dataset composed of a large number of synthetic and natural printed face images. To highlight the difficulties associated with the analysis of the images of the dataset, we carried out an extensive set of experiments comparing several printer attribution methods. We also verified that state-of-the-art methods to distinguish natural and synthetic face images fail when applied to print and scanned images. We envision that the availability of the new dataset and the preliminary experiments we carried out will motivate and facilitate further research in this area.

Large-scale transcriptomics to dissect 2 years of the life of a fungal phytopathogen interacting with its host plant

BMC Biology ◽

10.1186/s12915-021-00989-3 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Elise J. Gay ◽

Jessica L. Soyer ◽

Nicolas Lapalu ◽

Juliette Linglin ◽

Isabelle Fudal ◽

...

Keyword(s):

Host Plant ◽

Large Scale ◽

Field Experiments ◽

Plant Residues ◽

Plant Infection ◽

Effector Genes ◽

Controlled Conditions ◽

Niche Adaptation ◽

Depth Analysis ◽

Fungal Genes

Background: The fungus Leptosphaeria maculans has an exceptionally long and complex relationship with its host plant, Brassica napus, during which it switches between different lifestyles, including asymptomatic, biotrophic, necrotrophic, and saprotrophic stages. The fungus is also exemplary of “two-speed” genome organisms in the genome of which gene-rich and repeat-rich regions alternate. Except for a few stages of plant infection under controlled conditions, nothing is known about the genes mobilized by the fungus throughout its life cycle, which may last several years in the field. Results: We performed RNA-seq on samples corresponding to all stages of the interaction of L. maculans with its host plant, either alive or dead (stem residues after harvest) in controlled conditions or in field experiments under natural inoculum pressure, over periods of time ranging from a few days to months or years. A total of 102 biological samples corresponding to 37 sets of conditions were analyzed. We show here that about 9% of the genes of this fungus are highly expressed during its interactions with its host plant. These genes are distributed into eight well-defined expression clusters, corresponding to specific infection lifestyles or to tissue-specific genes. All expression clusters are enriched in effector genes, and one cluster is specific to the saprophytic lifestyle on plant residues. One cluster, including genes known to be involved in the first phase of asymptomatic fungal growth in leaves, is re-used at each asymptomatic growth stage, regardless of the type of organ infected. The expression of the genes of this cluster is repeatedly turned on and off during infection. Whatever their expression profile, the genes of these clusters are enriched in heterochromatin regions associated with H3K9me3 or H3K27me3 repressive marks. These findings provide support for the hypothesis that part of the fungal genes involved in niche adaptation is located in heterochromatic regions of the genome, conferring an extreme plasticity of expression. Conclusion: This work opens up new avenues for plant disease control, by identifying stage-specific effectors that could be used as targets for the identification of novel durable disease resistance genes, or for the in-depth analysis of chromatin remodeling during plant infection, which could be manipulated to interfere with the global expression of effector genes at crucial stages of plant infection.

ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition

International Journal of Computer Vision ◽

10.1007/s11263-021-01477-5 ◽

2021 ◽

Author(s):

Anil S. Baslamisli ◽

Partha Das ◽

Hoang-An Le ◽

Sezer Karaoglu ◽

Theo Gevers

Keyword(s):

Neural Network ◽

Large Scale ◽

State Of The Art ◽

Image Decomposition ◽

Natural Environments ◽

Decomposition Algorithms ◽

Ambient Light ◽

Fine Grained ◽

Large Scale Dataset ◽

Direct Illumination

In general, intrinsic image decomposition algorithms interpret shading as one unified component including all photometric effects. As shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail in distinguishing strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct (illumination) and indirect shading (ambient light and shadows) subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects to analyze the disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground-truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD datasets.

Building Damage Detection Using U-Net with Attention Mechanism from Pre- and Post-Disaster Remote Sensing Datasets

Remote Sensing ◽

10.3390/rs13050905 ◽

2021 ◽

Vol 13 (5) ◽

pp. 905

Author(s):

Chuyi Wu ◽

Feng Zhang ◽

Junshi Xia ◽

Yichen Xu ◽

Guoqing Li ◽

...

Keyword(s):

Damage Assessment ◽

Large Scale ◽

Binary Classification ◽

Open Data ◽

Building Damage ◽

Attention Mechanism ◽

Large Scale Dataset ◽

Data Program ◽

The Impact ◽

Post Disaster

Building damage status is vital for planning rescue and reconstruction after a disaster, but it is hard to detect and to grade by level. Most existing studies focus on binary classification, and the attention of the model is distracted. In this study, we propose a Siamese neural network that can localize and classify damaged buildings at the same time. The main parts of this network are a variety of attention U-Nets using different backbones. The attention mechanism enables the network to pay more attention to the effective features and channels, so as to reduce the impact of useless features. We train them using the xBD dataset, which is a large-scale dataset for the advancement of building damage assessment, and compare their balanced F (F1) scores. The scores show that SEresNeXt with an attention mechanism performs best, with an F1 score of 0.787. To improve the accuracy, we fused the results and obtained the best overall F1 score of 0.792. To verify the transferability and robustness of the model, we selected data from the Maxar Open Data Program for two recent disasters to investigate the performance. By visual comparison, the results show that our model is robust and transferable.
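The balanced F (F1) score reported above is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The sketch below computes it from raw counts; the function name and the example counts are hypothetical, and the abstract does not say how per-class scores are aggregated for xBD.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 as the harmonic mean of precision and recall.

    Uses the equivalent closed form 2*TP / (2*TP + FP + FN) and
    returns 0.0 when there are no positives of any kind.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical counts for one damage class.
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```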

Development and Testing of an Unmanned Aerial Vehicle for Large Scale Particle Image Velocimetry

Volume 3: Industrial Applications; Modeling for Oil and Gas, Control and Validation, Estimation, and Control of Automotive Systems; Multi-Agent and Networked Systems; Control System Design; Physical Human-Robot Interaction; Rehabilitation Robotics; Sensing and Actuation for Control; Biomedical Systems; Time Delay Systems and Stability; Unmanned Ground and Surface Robotics; Vehicle Motion Controls; Vibration Analysis and Isolation; Vibration and Control for Energy Harvesting; Wind Energy ◽

10.1115/dscc2014-5838 ◽

2014 ◽

Cited By ~ 4

Author(s):

Christopher Pagano ◽

Flavia Tauro ◽

Salvatore Grimaldi ◽

Maurizio Porfiri

Keyword(s):

Particle Image Velocimetry ◽

Large Scale ◽

Field Experiments ◽

Particle Image ◽

Low Cost ◽

Surface Flow ◽

Natural Environments ◽

Natural Stream ◽

Image Velocimetry ◽

Scale Particle

Large scale particle image velocimetry (LSPIV) is a nonintrusive environmental monitoring methodology that allows for continuous characterization of surface flows in natural catchments. Despite its promise, the implementation of LSPIV in natural environments is limited to areas accessible to human operators. In this work, we propose a novel experimental configuration that allows for unsupervised LSPIV over large water bodies. Specifically, we design, develop, and characterize a lightweight, low cost, and stable quadricopter hosting a digital acquisition system. An active gimbal maintains the camera lens orthogonal to the water surface, thus preventing severe image distortions. Field experiments are performed to characterize the vehicle and assess the feasibility of the approach. We demonstrate that the quadricopter can hover above an area of 1 × 1 m² for 4–5 minutes with a payload of 500 g. Further, LSPIV measurements on a natural stream confirm that the methodology can be reliably used for surface flow studies.
