A guide on extracting and tidying tweets with R

2021 ◽  
Vol 2 (4) ◽  
pp. e410
Author(s):  
Julia Bahia Adams ◽  
Carlos Augusto Jardim Chiarelli

Social media platforms represent a deep resource for academic research and a wide range of untapped possibilities for linguists (D'ARCY; YOUNG, 2012). This rapidly developing field presents various ethical issues and unique challenges regarding methods to retrieve and analyze data. This tutorial provides a straightforward guide to harvesting and tidying Twitter data, focusing mainly on the tweets' text, using the R programming language (R CORE TEAM, 2020) via Twitter's APIs. The R code was developed in Adams (2020), based on the rtweet package (KEARNEY, 2018), and resulted in a working script for corpora compilation. In this tutorial, we discuss limitations, problems, and solutions in our framework for conducting ethical research on this social networking site. Our ethical concerns go beyond what we "agree to" in terms-of-use and privacy policies; that is, we argue that their content does not cover all the concerns researchers need to attend to. Additionally, we aim to show that using Twitter as a data source does not require advanced computational skills.
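A minimal sketch of the kind of harvest-and-tidy workflow the tutorial describes, built on the rtweet package the authors cite; the query string, the selected columns (which follow rtweet 0.x conventions, the version cited), and the output file name are illustrative assumptions, not the authors' published script:

```r
# Sketch of a harvest-and-tidy workflow with rtweet (KEARNEY, 2018).
# Assumes a configured Twitter API token; the query and column names
# are illustrative and follow rtweet 0.x conventions.
library(rtweet)
library(dplyr)

# Harvest up to 1,000 recent original tweets matching a query
tweets <- search_tweets("linguistics", n = 1000, include_rts = FALSE)

# Keep only the fields needed for a text corpus and drop duplicate texts
corpus <- tweets %>%
  select(status_id, created_at, screen_name, text) %>%
  distinct(text, .keep_all = TRUE)

# Save the tidied corpus for later annotation and analysis
write.csv(corpus, "tweet_corpus.csv", row.names = FALSE)
```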

Author(s):  
Ramin Nabizadeh ◽  
Mostafa Hadei

Introduction: The wide range of studies on air pollution requires accurate and reliable datasets. However, for many reasons, measured concentrations may be incomplete or biased, and researchers need an easy-to-use, reproducible exposure assessment method. In this article, we therefore describe and present a series of codes written in the R programming language for handling, validating, and averaging PM10, PM2.5, and O3 datasets.
Findings: These codes can be used in any type of air pollution study that seeks PM and ozone concentrations indicative of real concentrations. We combined criteria from several guidelines proposed by the US EPA and the APHEKOM project to obtain an acceptable methodology. Separate .csv files for PM10, PM2.5, and O3 should be prepared as input. After a file is imported into R, negative and zero concentration values are first removed from the dataset. Next, only monitors with at least 75% of hourly concentrations available are retained. Then, 24-h averages are calculated for PM, and daily maxima of 8-h moving averages are calculated for ozone. As output, the codes create two different sets of data. One contains the hourly concentrations of the pollutant of interest (PM10, PM2.5, or O3) at the valid stations and their city-level average. The other contains the final city-level 24-h averages for PM10 and PM2.5, or the final city-level daily maximum 8-h averages for O3.
Conclusion: These validated codes use a reliable and valid methodology and eliminate the possibility of erroneous data handling and averaging. The codes are free to use without limitation, requiring only citation of this article.
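A minimal R sketch of the validation and averaging steps described above; the input file, the column names (station, datetime, conc), and the one-year completeness window are illustrative assumptions, not the authors' published code:

```r
# Sketch of the validation/averaging pipeline described above.
# Column names and the input file are illustrative assumptions.
library(dplyr)

pm <- read.csv("pm25_hourly.csv", stringsAsFactors = FALSE)
pm$datetime <- as.POSIXct(pm$datetime)

valid <- pm %>%
  filter(conc > 0) %>%                  # remove zero and negative values
  group_by(station) %>%
  filter(n() >= 0.75 * 365 * 24) %>%    # keep monitors with >= 75% of hourly
  ungroup()                             # values (assuming one year of data)

city <- valid %>%                       # 24-h station means, then city mean
  mutate(day = as.Date(datetime)) %>%
  group_by(station, day) %>%
  summarise(daily_mean = mean(conc), .groups = "drop") %>%
  group_by(day) %>%
  summarise(city_mean = mean(daily_mean), .groups = "drop")

# For O3, the analogous step would compute 8-h moving averages
# (e.g., zoo::rollmean with k = 8) and take each day's maximum.
```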


Author(s):  
Joe J. Murphy ◽  
Michael A. Duprey ◽  
Robert F. Chew ◽  
Paul P. Biemer ◽  
Kathleen Mullan Harris ◽  
...  

Surveys often require monitoring during data collection to ensure progress in meeting goals or to evaluate the interim results of an embedded experiment. Under complex designs, the amount of data available to monitor may be overwhelming, and the production of reports and charts can be costly and time-consuming. This is especially true in the case of longitudinal surveys, where data may originate from multiple waves. Other such complex scenarios include adaptive and responsive designs, which were developed to act on the results of such monitoring to implement prespecified options or alternatives in protocols. This paper discusses the development of an interactive web-based data visualization tool, the Adaptive Total Design (ATD) Dashboard, which we designed to provide a wide array of survey staff with the information needed to monitor data collection daily. The dashboard was built using the R programming language and Shiny framework and provides users with a wide range of functionality to quickly assess trends. We present the structure of the data used to populate the dashboard, its design, and the process for hosting it on the web. Furthermore, we provide guidance on graphic design, data taxonomy, and software decisions that can help guide others in the process of developing their own data collection monitoring systems. To illustrate the benefits of the dashboard, we present examples from the National Longitudinal Study of Adolescent to Adult Health (Add Health). We also discuss features of the dashboard to be developed for future waves of Add Health.
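A minimal Shiny sketch in the spirit of the dashboard described above; the simulated production data and the plotted metric are illustrative assumptions, not the ATD Dashboard's actual design:

```r
# Sketch of a Shiny data-collection monitoring app. The data frame
# below is simulated for illustration, not Add Health data.
library(shiny)
library(ggplot2)

# Hypothetical daily production data: one row per day per wave
daily <- data.frame(
  date      = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 60), 2),
  wave      = rep(c("Wave 1", "Wave 2"), each = 60),
  completes = c(cumsum(rpois(60, 40)), cumsum(rpois(60, 35)))
)

ui <- fluidPage(
  titlePanel("Survey Monitoring (sketch)"),
  selectInput("wave", "Wave", unique(daily$wave)),
  plotOutput("trend")
)

server <- function(input, output) {
  output$trend <- renderPlot({
    # Cumulative completes for the selected wave
    ggplot(subset(daily, wave == input$wave), aes(date, completes)) +
      geom_line() +
      labs(y = "Cumulative completes")
  })
}

shinyApp(ui, server)
```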


2020 ◽  
Author(s):  
Arnab Chanda

Soft tissue surrogate-based test dummies are used across industries to simulate real-life accidents. To date, a wide range of surrogates is available in the market, including gels, elastomers, and animal tissues, but these are outdated and have mechanical properties very different from those of actual human tissues. In academic research, by contrast, biofidelic soft tissue surrogates have evolved over the last two decades, but have lacked technology transfer. This book aims to bridge the gap between industry and academia with the state of the art in soft tissue surrogate research. Surrogates are presented for skin, muscles, brain tissue, arteries, and the female pelvis. Fabrication techniques, mechanical testing, and the test results required for reproducing these surrogates are discussed. Characterization methodologies and the limitations of each type of surrogate are also presented, for use in both experimental and computational research. Major industries that can use these biofidelic surrogates include car manufacturers, prosthetics and orthotics designers, ballistic testing facilities, and military and sports equipment manufacturers. Hospitals and medical centres can also take advantage of these synthetic surrogates over actual tissues for surgical training with minimal biosafety approvals and ethical issues.


2020 ◽  
Vol 9 (9) ◽  
pp. 526
Author(s):  
Innocensia Owuor ◽  
Hartwig H. Hochmair

Social media apps provide analysts with a wide range of data to study behavioral aspects of our everyday lives and to answer societal questions. Although social media data analysis is booming, only a handful of prominent social media apps, such as Twitter, Foursquare/Swarm, Facebook, or LinkedIn, are typically used for this purpose. However, there is a large selection of less known social media apps that go unnoticed in the scientific community. This paper reviews 110 social media apps and assesses their potential usability in geospatial research by providing metrics on selected characteristics. About half of the apps (57 out of 110) offer an Application Programming Interface (API) for data access, where rate limits, fee models, and the type of spatial data available for download vary strongly between apps. To determine the current role and relevance of social media platforms that offer an API in academic research, a search for scientific papers on Google Scholar, the Association for Computing Machinery (ACM) Digital Library, and the Science Core Collection of the Web of Science (WoS) was conducted. This search revealed that Google Scholar returns the highest number of documents (Mean = 183,512) compared to ACM (Mean = 1895) and WoS (Mean = 1495), and that data and usage patterns from prominent social media apps are more frequently analyzed in research studies than those of less known apps. The WoS citation database was also used to generate lists of themes covered in academic publications that analyze the 57 social media platforms offering an API. Results show that among these 57 platforms, for 26 apps at least some papers revolve around a geospatial discipline, such as Geography, Remote Sensing, Transportation, or Urban Planning. This analysis therefore connects apps with commonly used research themes and, together with tabulated API characteristics, can help researchers identify potentially suitable social media apps for their research. Word clouds generated from titles and abstracts of papers associated with the 57 platforms, grouped into seven thematic categories, show further refinement of the topics addressed in the analysis of social media platforms. Considering various evaluation criteria, such as the provision of geospatial data or the number (or absence) of research papers currently published in connection with a platform, the study concludes that among the numerous social media apps available today, 17 less known apps deserve closer examination, since they might be used to investigate previously underexplored research topics. It is hoped that this study can serve as a reference for the analysis of the social media landscape in the future.


2020 ◽  
Vol 108 (2) ◽  
pp. 334
Author(s):  
Benjamin H. Saracco

Ivo D. Dinov’s Data Science and Predictive Analytics: Biomedical and Health Applications Using R is a comprehensive twenty-three-chapter text and online course for burgeoning or seasoned biomedical and/or health sciences professionals who analyze data sets using the R programming language.


2018 ◽  
Author(s):  
Li-Min Huang ◽  
Iman Tahamtan

This study investigates which variables, including expenditures, services, and collections, can predict total annual public library visits (VISITS) in the United States. The data source was the 2015 Public Library Service data and reports, and a multiple regression model was fitted in the R programming language. Results indicated that the best predictors of VISITS were a library's total operating expenditures, annual usage of public Internet-enabled computers, audio physical units, total children's programs, and video physical units. The best subset model for predicting VISITS included the total number of public libraries, total operating expenditures, print materials, audio physical units, total children's programs, total young adult programs, and annual usage of public Internet-enabled computers.
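A minimal sketch of the modeling approach described above; the data file and variable names (e.g., VISITS, TOTOPEXP) are illustrative assumptions loosely modeled on public library survey field names, not the authors' actual script:

```r
# Sketch of a multiple regression and best-subset selection.
# File and variable names are illustrative assumptions.
libs <- read.csv("pls_2015.csv")

# Multiple regression: predict annual visits from expenditure,
# collection, and program variables
fit <- lm(VISITS ~ TOTOPEXP + PITUSR + AUDIO_PH + KIDPRO + VIDEO_PH,
          data = libs)
summary(fit)             # coefficients, R-squared, significance tests

# Best-subset selection over a wider candidate pool (leaps package)
library(leaps)
subsets <- regsubsets(VISITS ~ TOTOPEXP + PRMATEXP + AUDIO_PH + KIDPRO +
                        YAPRO + PITUSR + VIDEO_PH, data = libs)
summary(subsets)$adjr2   # adjusted R-squared by subset size
```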


Author(s):  
Alan Kelly

What is scientific research? It is the process by which we learn about the world. For this research to have an impact, and positively contribute to society, it needs to be communicated to those who need to understand its outcomes and significance for them. Any piece of research is not complete until it has been recorded and passed on to those who need to know about it. So, good communication skills are a key attribute for researchers, and scientists today need to be able to communicate through a wide range of media, from formal scientific papers to presentations and social media, and to a range of audiences, from expert peers to stakeholders to the general public. In this book, the goals and nature of scientific communication are explored, from the history of scientific publication; through the stages of how papers are written, evaluated, and published; to what happens after publication, using examples from landmark historical papers. In addition, ethical issues relating to publication, and the damage caused by cases of fabrication and falsification, are explored. Other forms of scientific communication such as conference presentations are also considered, with a particular focus on presenting and writing for nonspecialist audiences, the media, and other stakeholders. Overall, this book provides a broad overview of the whole range of scientific communication and should be of interest to researchers, as well as to those more broadly interested in how what scientists do every day translates into outcomes that contribute to society.


Author(s):  
David B. Resnik

This chapter provides an overview of the ethics of environmental health, and it introduces five chapters in the related section of The Oxford Handbook of Public Health Ethics. A wide range of ethical issues arises in managing the relationship between human health and the environment, including regulation of toxic substances, air and water pollution, waste management, agriculture, the built environment, occupational health, energy production and use, environmental justice, population control, and climate change. The values at stake in environmental health ethics include those usually mentioned in ethical debates in biomedicine and public health, such as autonomy, social utility, and justice, as well as values that address environmental concerns, such as animal welfare, stewardship of biological resources, and sustainability. Environmental health ethics, therefore, stands at the crossroads of several disciplines, including public health ethics, environmental ethics, biomedical ethics, and business ethics.


Author(s):  
Takeuchi Ayano

Public participation has become increasingly necessary to connect a wide range of knowledge and various values to agenda setting, decision-making, and policymaking. In this context, deliberative democratic concepts, especially "mini-publics," are gaining attention. Generally, mini-publics are conducted with randomly selected lay citizens who are given sufficient information to deliberate on issues and form final recommendations. Evaluations are conducted by practitioner researchers and independent researchers, but the results are not standardized. In this study, a systematic review of existing research regarding the practices and outcomes of mini-publics was conducted. To analyze 29 papers, the evaluation methodologies were divided into four categories in a matrix of evaluator versus evaluated data. The evaluated cases mainly focused on the following two points: (1) how to maintain deliberation quality, and (2) the feasibility of mini-publics. To create a new path to the political decision-making process through mini-publics, it must be demonstrated that mini-publics can contribute to the decision-making process and that good-quality deliberations are of concern to policymakers and experts. Mini-publics are feasible if they can contribute to the political decision-making process and practitioners can evaluate and understand the advantages of mini-publics for each case. For future research, it is important to combine practical case studies and academic research, because few studies have been evaluated by independent researchers.


2021 ◽  
pp. 007542422098206
Author(s):  
Claudia Claridge ◽  
Ewa Jonsson ◽  
Merja Kytö

Even though intensifiers have received a good deal of attention over the past few decades, downtoners, comprising diminishers and minimizers, have remained by and large a neglected category (but cf. Brinton, this issue). Among downtoners, the adverb little or a little stands out as the most frequent item. It is multifunctional and serves as a diminishing and minimizing intensifier and also in non-degree uses as a quantifier, frequentative, and durative. Therefore, the present paper is devoted to the structural and functional profile of (a) little in Late Modern English speech-related data. The data source is the socio-pragmatically annotated Old Bailey Corpus (OBC, version 2.0), which allows, among other things, the investigation of the usage of the item among different speaker groups. Our research charts the semantic and formal uses of adverbial little. Downtoner uses outnumber non-degree uses in the data, and diminishing uses are more common than minimizing uses. The formal realization is predominantly a little, with very rare determinerless or modified instances, such as very little. Little modifies a wide range of "targets," but most frequently adjectives and prepositional phrases, focusing on human states and circumstantial detail. With regard to variation and change, adverbial little declines in use over the 200 years and is used more commonly by speakers from the lower social ranks and by the lay, non-professional participants in the courtroom.

