WHY MAKING MATTERS ONLINE: THE PINTEREST-DIY DATA SET

Author(s):  
Michelle Sylvia Weintraub ◽  
David R W Sears

ABSTRACT The Do-It-Yourself (DIY) community is currently one of the largest creative content communities on Pinterest (Hall et al., 2018), a social networking service (SNS) that encourages users to both share information about creative processes and attempt projects in real life (IRL). Pinterest users share ongoing projects by creating Project “Pins”, which consist of images, videos, and text descriptions of creative content. And yet, while several studies have investigated user behavior in relation to everyday ideation and creativity on the site (Linder et al., 2014, Hu et al., 2018, Mull and Lee, 2014), little is known about the characteristics that lead users to prefer some DIY projects over others. Thus, this paper introduces the Pinterest-DIY data set, which consists of text data mined from 500 DIY project Pins on Pinterest. Using a custom sampling approach, we created a taxonomy of DIY characteristics related to each Pin’s project type, function, materials, and complexity. To measure user preferences on the site, we also conducted a sentiment analysis on user comments for each DIY project Pin. This paper introduces the data set and presents two use cases for the internet research community using both exploratory and confirmatory statistical methods. In our view, the Pinterest-DIY data set will provide further opportunities to examine whether, and to what degree, participation in online DIY communities promotes everyday creativity and increases engagement with physical matter.

2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Yancui Shi ◽  
Jianhua Cao ◽  
Congcong Xiong ◽  
Xiankun Zhang

User preference will be impacted by other users. To accurately predict mobile user preference, the influence between users is introduced into the prediction model of user preference. First, the mobile social network is constructed according to the interaction behavior of the mobile user, and the influence of the user is calculated according to the topology of the constructed mobile social network and mobile user behavior. Second, the influence between users is calculated according to the user’s influence, the interaction behavior between users, and the similarity of user preferences. When calculating the influence based on the interaction behavior, the context information is considered; the context information and the order of user preferences are considered when calculating the influence based on the similarity of user preferences. The improved collaborative filtering method is then employed to predict mobile user preferences based on the obtained influence between users. Finally, the experiment is executed on the real data set and the integrated data set, and the results show that the proposed method can obtain more accurate mobile user preferences than those of existing methods.


2020 ◽  
Vol 3 ◽  
pp. 251581631989886 ◽  
Author(s):  
Hao Deng ◽  
Qiushi Wang ◽  
Dana P Turner ◽  
Katherine E Sexton ◽  
Sara M Burns ◽  
...  

Background: Migraine is a highly prevalent disorder that is typically episodic in nature. Social network data reflecting personal commentary on everyday life patterns, including those interrupted by migraine, represent a unique window into the real-life experience of those willing to share them. The experience of a migraine attack might be captured by Twitter text data, and this information might be used to complement our current knowledge of activity in the general population and even lead to enhanced prediction. Objective: To characterize tweets reporting migraine activity and to explore their social-behavior features as a foundation for further investigations. Methods: A longitudinal cohort study utilizing 1 month of Twitter data from November to December 2014 was conducted. Tweets containing the word “migraine” were extracted, preprocessed, and managed using natural language processing (NLP) techniques. User behavior profiles including tweeting frequencies, high-frequency words, and sentimental presentations were reported and analyzed. Results: During the observation period, 98,622 tweets were captured from 77,335 different users. The overall sentiment of tweets was slightly negative for expressive tweets but neutral for informative tweets. Among posted negative expressive tweets, we found a strong tendency for high-frequency expressions to be those with extreme sentiment, and profanity was common. Conclusions: Twitter users with migraine showed distinct sentimental patterns while suffering from disease onsets, exemplified by posting tweets with extreme negative sentiments.


Author(s):  
I. G. Zakharova ◽  
Yu. V. Boganyuk ◽  
M. S. Vorobyova ◽  
E. A. Pavlova

The goal of this article is to demonstrate the possibilities of an approach to diagnosing the level of IT graduates’ professional competence, based on the analysis of the student’s digital footprint and the content of the corresponding educational program. We describe methods for extracting indicators of a student’s professional level from digital footprint text data: course descriptions and graduation qualification works. We show methods of comparing these indicators with the formalized requirements of employers, reflected in the texts of vacancies in the field of information technology. The proposed approach was applied at the Institute of Mathematics and Computer Science of the University of Tyumen. We performed diagnostics using a data set that included texts of course descriptions for IT areas of undergraduate studies, 542 graduation qualification works in these areas, 879 descriptions of job requirements, and information on graduate employment. The presented approach allows us to evaluate the relevance of the educational program as a whole and the level of professional competence of each student based on objective data. The results were used to update the content of some major courses and to include new elective courses in the curriculum.


2019 ◽  
Vol 13 (1) ◽  
pp. 20-27 ◽  
Author(s):  
Srishty Jindal ◽  
Kamlesh Sharma

Background: With the tremendous increase in the use of social networking sites for sharing emotions, views, preferences, etc., a huge volume of data and text is available on the internet. This creates the need to understand the text and analyse the data to determine the exact intent behind it for the greater good. This process of understanding the text and data involves many analytical methods, several phases, and multiple techniques. Efficient use of these techniques is important for an effective and relevant understanding of the text/data. This analysis can in turn be very helpful in e-commerce for targeting audiences, in social media monitoring for anticipating foul elements in society and taking proactive action to avoid unethical and illegal activities, and in business analytics, market positioning, etc. Method: The goal is to understand the basic steps involved in analysing text data, which can be helpful in determining the sentiments behind them. This review provides a detailed description of the steps involved in sentiment analysis, together with the recent research done. Patents related to sentiment analysis and classification are reviewed to shed some light on the work done in the field. Results: Sentiment analysis determines the polarity behind the text data/review. This analysis helps in increasing business revenue, in e-health, or in determining the behaviour of a person. Conclusion: This study helps in understanding the basic steps involved in natural language understanding. At each step there are multiple techniques that can be applied to the data. Different classifiers provide variable accuracy depending upon the data set and classification technique used.


2021 ◽  
Author(s):  
Amarildo Likmeta ◽  
Alberto Maria Metelli ◽  
Giorgia Ramponi ◽  
Andrea Tirinzoni ◽  
Matteo Giuliani ◽  
...  

Abstract: In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in Lake Como. For each of these scenarios, we provide formalization, experiments, and a discussion to interpret the obtained results.


Biosensors ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 257
Author(s):  
Sebastian Fudickar ◽  
Eike Jannik Nustede ◽  
Eike Dreyer ◽  
Julia Bornhorst

Caenorhabditis elegans (C. elegans) is an important model organism for studying molecular genetics, developmental biology, neuroscience, and cell biology. Advantages of the model organism include its rapid development and aging, easy cultivation, and genetic tractability. C. elegans has been proven to be a well-suited model to study toxicity with identified toxic compounds closely matching those observed in mammals. For phenotypic screening, especially the worm number and the locomotion are of central importance. Traditional methods such as human counting or analyzing high-resolution microscope images are time-consuming and rather low throughput. The article explores the feasibility of low-cost, low-resolution do-it-yourself microscopes for image acquisition and automated evaluation by deep learning methods to reduce cost and allow high-throughput screening strategies. An image acquisition system is proposed within these constraints and used to create a large data-set of whole Petri dishes containing C. elegans. By utilizing the object detection framework Mask R-CNN, the nematodes are located, classified, and their contours predicted. The system has a precision of 0.96 and a recall of 0.956, resulting in an F1-Score of 0.958. Considering only correctly located C. elegans with an [email protected] IoU, the system achieved an average precision of 0.902 and a corresponding F1 Score of 0.906.
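The reported detection F1-Score follows from the stated precision and recall through the usual harmonic-mean definition. The sketch below only re-derives that number; the function name `f1_score` is an illustrative choice and not taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
detection_f1 = f1_score(0.96, 0.956)
print(round(detection_f1, 3))  # 0.958
```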


2021 ◽  
pp. 58-60
Author(s):  
Naziru Fadisanku Haruna ◽  
Ran Vijay Kumar Singh ◽  
Samsudeen Dahiru

In This paper a modied ratio-type estimator for nite population mean under stratied random sampling using single auxiliary variable has been proposed. The expression for mean square error and bias of the proposed estimator are derived up to the rst order of approximation. The expression for minimum mean square error of proposed estimator is also obtained. The mean square error the proposed estimator is compared with other existing estimators theoretically and condition are obtained under which proposed estimator performed better. A real life population data set has been considered to compare the efciency of the proposed estimator numerically.


2021 ◽  
Author(s):  
Annette Dietmaier ◽  
Thomas Baumann

The European Water Framework Directive (WFD) commits EU member states to achieve a good qualitative and quantitative status of all their water bodies. WFD provides a list of actions to be taken to achieve the goal of good status. However, this list disregards the specific conditions under which deep (> 400 m b.g.l.) groundwater aquifers form and exist. In particular, deep groundwater fluid composition is influenced by interaction with the rock matrix and other geofluids, and may assume a bad status without anthropogenic influences. Thus, a new concept with directions of monitoring and modelling this specific kind of aquifers is needed. Their status evaluation must be based on the effects induced by their exploitation. Here, we analyze long-term real-life production data series to detect changes in the hydrochemical deep groundwater characteristics which might be triggered by balneological and geothermal exploitation. We aim to use these insights to design a set of criteria with which the status of deep groundwater aquifers can be quantitatively and qualitatively determined. Our analysis is based on a unique long-term hydrochemical data set, taken from 8 balneological and geothermal sites in the molasse basin of Lower Bavaria, Germany, and Upper Austria. It is focused on a predefined set of annual hydrochemical concentration values. The data range dates back to 1937. Our methods include developing threshold corridors, within which a good status can be assumed, and developing cluster analyses, correlation, and piper diagram analyses. We observed strong fluctuations in the hydrochemical characteristics of the molasse basin deep groundwater during the last decades. Special interest is put on fluctuations that seem to have a clear start and end date, and to be correlated with other exploitation activities in the region.
For example, during the period between 1990 and 2020, bicarbonate and sodium values displayed a clear increase, followed by a distinct dip to below-average values and a subsequent return to average values at site F. During the same time, these values showed striking irregularities at site B. Furthermore, we observed fluctuations in several locations, which come close to disqualifying quality thresholds, commonly used in German balneology. Our preliminary results prove the importance of using long-term (multiple decades) time series analysis to better inform quality and quantity assessments for deep groundwater bodies: most fluctuations would stay undetected within a < 5 year time series window, but become a distinct irregularity when viewed in the context of multiple decades. In the next steps, a quality assessment matrix and threshold corridors will be developed, which take into account methods to identify these fluctuations. This will ultimately aid in assessing the sustainability of deep groundwater exploitation and reservoir management for balneological and geothermal uses.


2020 ◽  
Vol 30 (11n12) ◽  
pp. 1759-1777
Author(s):  
Jialing Liang ◽  
Peiquan Jin ◽  
Lin Mu ◽  
Jie Zhao

With the development of Web 2.0, social media such as Twitter and Sina Weibo have become an essential platform for disseminating hot events. Simultaneously, due to the free policy of microblogging services, users can post user-generated content freely on microblogging platforms. Accordingly, more and more hot events on microblogging platforms have been labeled as spammers. Spammers will not only hurt the healthy development of social media but also introduce many economic and social problems. Therefore, the government and enterprises must distinguish whether a hot event on microblogging platforms is a spammer or is a naturally-developing event. In this paper, we focus on the hot event list on Sina Weibo and collect the relevant microblogs of each hot event to study the detecting methods of spammers. Notably, we develop an integral feature set consisting of user profile, user behavior, and user relationships to reflect various factors affecting the detection of spammers. Then, we employ typical machine learning methods to conduct extensive experiments on detecting spammers. We use a real data set crawled from the most prominent Chinese microblogging platform, Sina Weibo, and evaluate the performance of 10 machine learning models with five sampling methods. The results in terms of various metrics show that the Random Forest model and the over-sampling method achieve the best accuracy in detecting spammers and non-spammers.


2020 ◽  
Vol 13 (10) ◽  
pp. 1669-1681
Author(s):  
Zijing Tan ◽  
Ai Ran ◽  
Shuai Ma ◽  
Sheng Qin

Pointwise order dependencies (PODs) are dependencies that specify ordering semantics on attributes of tuples. POD discovery refers to the process of identifying the set Σ of valid and minimal PODs on a given data set D. In practice D is typically large and keeps changing, and it is prohibitively expensive to compute Σ from scratch every time. In this paper, we make a first effort to study the incremental POD discovery problem, aiming at computing changes ΔΣ to Σ such that Σ ⊕ ΔΣ is the set of valid and minimal PODs on D with a set Δ D of tuple insertion updates. (1) We first propose a novel indexing technique for inputs Σ and D. We give algorithms to build and choose indexes for Σ and D , and to update indexes in response to Δ D. We show that POD violations w.r.t. Σ incurred by Δ D can be efficiently identified by leveraging the proposed indexes, with a cost dependent on log (| D |). (2) We then present an effective algorithm for computing ΔΣ, based on Σ and identified violations caused by Δ D. The PODs in Σ that become invalid on D + Δ D are efficiently detected with the proposed indexes, and further new valid PODs on D + Δ D are identified by refining those invalid PODs in Σ on D + Δ D. (3) Finally, using both real-life and synthetic datasets, we experimentally show that our approach outperforms the batch approach that computes from scratch, up to orders of magnitude.
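The idea behind the incremental setting can be illustrated with a naive check. The sketch assumes a POD is written as a left-hand and a right-hand list of (attribute, operator) pairs, and that it holds when every ordered pair of distinct tuples matching the left-hand side also matches the right-hand side. If the POD already held on D, any new violation must involve at least one tuple from Δ D, so only those pairs are examined. All names (`violations`, `incremental_violations`, the `salary` and `tax` attributes) are hypothetical, and the pairwise scan stands in for the paper's indexes, which are not reproduced here.

```python
import operator
from itertools import product

# Order operators an attribute can be marked with (assumed notation).
OPS = {"<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge, "=": operator.eq}


def pair_satisfies(s, t, marked_attrs):
    """True if s[attr] op t[attr] holds for every (attr, op) pair."""
    return all(OPS[op](s[attr], t[attr]) for attr, op in marked_attrs)


def violations(lhs, rhs, left_tuples, right_tuples):
    """Ordered pairs (s, t) of distinct tuples that match lhs but break rhs."""
    return [(s, t) for s, t in product(left_tuples, right_tuples)
            if s is not t
            and pair_satisfies(s, t, lhs)
            and not pair_satisfies(s, t, rhs)]


def incremental_violations(lhs, rhs, data, delta):
    """Assuming the POD held on data, check only pairs touching delta."""
    found = violations(lhs, rhs, delta, data + delta)  # new tuple on the left
    found += violations(lhs, rhs, data, delta)         # new tuple on the right
    return found


# Hypothetical POD: a higher-or-equal salary implies a higher-or-equal tax.
lhs, rhs = [("salary", "<=")], [("tax", "<=")]
data = [{"salary": 10, "tax": 1}, {"salary": 20, "tax": 2}]
delta = [{"salary": 15, "tax": 3}]
print(len(incremental_violations(lhs, rhs, data, delta)))  # 1
```

This naive version compares each inserted tuple against all of D, so its cost grows linearly in |D| per inserted tuple; the abstract's log(|D|) dependence comes from the proposed indexing technique.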

