Make data sing: The automation of storytelling

2018 ◽  
Vol 5 (1) ◽  
pp. 205395171875668 ◽  
Author(s):  
Kristin Veel

With slogans such as ‘Tell the stories hidden in your data’ ( www.narrativescience.com ) and ‘From data to clear, insightful content – Wordsmith automatically generates narratives on a massive scale that sound like a person crafted each one’ ( www.automatedinsights.com ), a series of companies currently market themselves on the ability to turn data into stories through Natural Language Generation (NLG) techniques. Here, the process of data interpretation and knowledge production is automated, while narrativity is at the same time hailed as a fundamental human ability of meaning-making. Reading both the marketing rhetoric and the functionality of the automated narrative services through narrative theory allows for a contextualization of the rhetoric flourishing in Big Data discourse. Building upon case material obtained from companies such as Arria NLG, Automated Insights, Narrativa, Narrative Science, and Yseop, this article argues that what might be seen as a ‘re-turn’ of narrative as a form of knowledge production that can make sense of large data sets inscribes itself in – but also rearticulates – an ongoing debate about what narrative entails. Methodological considerations are thus raised, on the one hand, about the insights to be gained for critical data studies by turning to literary theory and, on the other hand, about how automated technologies may inform our understanding of narrative as a faculty of human meaning-making.
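
A minimal, generic sketch of template-based natural language generation can make the 'data to stories' step concrete. The record, template, and field names below are hypothetical illustrations and not the technology of Narrative Science, Automated Insights, or any other company named above.

```python
# Hypothetical data record and narrative template; purely illustrative.
record = {"region": "North", "quarter": "Q3", "revenue": 1.42, "change": 0.12}

template = (
    "In {quarter}, the {region} region earned {revenue:.2f}M in revenue, "
    "{direction} {delta:.0%} on the previous quarter."
)

sentence = template.format(
    quarter=record["quarter"],
    region=record["region"],
    revenue=record["revenue"],
    direction="up" if record["change"] >= 0 else "down",
    delta=abs(record["change"]),
)
print(sentence)
```

Commercial NLG systems go far beyond such fixed templates, but the example shows the basic move the article interrogates: a structured data point rendered as a narrative sentence.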

Author(s):  
A. Sheik Abdullah ◽  
R. Suganya ◽  
S. Selvakumar ◽  
S. Rajaram

Classification is considered to be one of the data analysis techniques that can be used across many applications. A classification model predicts categorical (discrete) class labels, whereas clustering mainly deals with grouping variables based upon similar characteristics. Classification models are evaluated by comparing their predicted values with the known target values in a set of test data. Data classification has many applications in business modeling, marketing analysis, credit risk analysis, biomedical engineering, and drug response modeling. Extending data analysis and classification to big data provides insight into the processing and management of large data sets. This chapter deals with the various techniques and methodologies that address the classification problem in the data analysis process and their methodological impact on big data.
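
The evaluation step described above (comparing predicted labels with known target values on held-out test data) can be sketched with scikit-learn; the library, dataset, and classifier chosen below are assumptions for illustration, not prescribed by the chapter.

```python
# Minimal classification workflow: train a model, then evaluate it by comparing
# predicted class labels with the known target labels of a held-out test set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # stand-in for, e.g., credit-risk data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # categorical (discrete) class labels
print(f"test accuracy: {accuracy_score(y_test, y_pred):.3f}")
```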


Author(s):  
Irina Zakharova

Datafication is widely acknowledged as a process “transforming all things under the sun into a data format” (van Dijck, 2017, p. 11). As data become both objects and instruments of social science, many scholars call for attention to the ways datafication reconfigures scholarly knowledge production, its methodological opportunities, and its challenges (Lomborg et al., 2020). This contribution offers a reflection on the interdependence between the methodological approaches taken to study datafication and the concepts of datafication that these approaches provide within the domains of critical data studies and media studies. Expanding on the concept of methods' performativity (Barad, 2007), I apply the notion of methods assemblages: “a continuing process of crafting and enacting necessary boundaries [and relations]” between researchers and all relevant matters (Law, 2004, p. 144). The key question in the presented study is what kinds of methods assemblages are being applied in current datafication research and what concepts of datafication they produce. Thirty-two expert interviews were conducted with scholars who published empirical work on datafication between 2015 and 2020. Three methods assemblages were developed. Central to distinguishing between methods assemblages are the ways of associating the involved actors and things. In my analysis, the questions of (1) what we are talking about when talking about datafication and (2) what kinds of knowledge researchers were interested in producing can be understood as such ways of associating. The methods assemblages contribute to critical data studies by producing accounts of datafication processes that are in concert with the methods assemblages applied to study them.


Molecules ◽  
2021 ◽  
Vol 26 (17) ◽  
pp. 5291
Author(s):  
José Naveja ◽  
Martin Vogt

Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis–Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
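
As a small illustration of the rule-based scaffold extraction mentioned above, the sketch below groups compounds by their Bemis–Murcko scaffold using RDKit; the SMILES strings are made-up toy analogues, and scaffold grouping is only the earliest of the strategies the review covers (matched molecular pairs and series go further).

```python
# Group toy compounds by their Bemis-Murcko scaffold (RDKit).
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

smiles = [
    "c1ccccc1CC(=O)Nc1ccccc1",      # toy analogue 1
    "c1ccccc1CC(=O)Nc1ccc(Cl)cc1",  # toy analogue 2 (para-chloro)
    "c1ccccc1CC(=O)Nc1ccc(F)cc1",   # toy analogue 3 (para-fluoro)
    "CCCCCC(=O)O",                  # acyclic compound, no ring scaffold
]

series = defaultdict(list)
for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    scaffold = Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(mol))
    series[scaffold].append(smi)

for scaffold, members in series.items():
    print(scaffold or "<no ring system>", "->", len(members), "compound(s)")
```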


1983 ◽  
Vol 16 (1) ◽  
pp. 154-156 ◽  
Author(s):  
M. Sakata ◽  
A. W. Stevenson ◽  
J. Harada

A computer program for calculating the one-phonon thermal diffuse scattering (TDS) contribution to observed integrated intensities of Bragg reflections from single crystals has been written. The program is based on a general formula [Harada & Sakata (1974). Acta Cryst. A30, 77–82; Sakata & Harada (1976). Acta Cryst. A35, 426–433] which is applicable to any crystal system if elastic constants are available. The volume integral with respect to the wavevector, over the region swept out around the reciprocal-lattice point by the counter in the course of a measurement, has been simplified by use of the spherical volume approximation (SVA). Use of the SVA greatly reduces computing time for the case of large data sets. Comparison of the results with those obtained without using the SVA is given and the limitations of the SVA are pointed out.
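
The gain from the spherical volume approximation can be illustrated with a toy calculation. Assuming an isotropic one-phonon TDS kernel proportional to 1/q² (a deliberate simplification of the general anisotropic formula used in the program), the integral over the scanned region reduces, under the SVA, to an analytic expression for a sphere of equal volume; the sketch below compares this with a direct numerical integration over a hypothetical box-shaped scan region.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical box-shaped scan region around the reciprocal-lattice point
# (half-widths in reciprocal-lattice units).
half_widths = np.array([0.02, 0.03, 0.05])
box_volume = np.prod(2.0 * half_widths)

# Direct integration of 1/q^2 over the box: in spherical coordinates the r^2
# Jacobian cancels the kernel, so the integral equals 4*pi times the mean
# distance from the centre to the box boundary over all directions.
n_dir = 200_000
u = rng.normal(size=(n_dir, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
r_boundary = np.min(half_widths / np.abs(u), axis=1)
direct = 4.0 * np.pi * r_boundary.mean()

# Spherical volume approximation: replace the box by a sphere of equal volume,
# for which the same integral is simply 4*pi*R_eq (no numerical work needed).
r_eq = (3.0 * box_volume / (4.0 * np.pi)) ** (1.0 / 3.0)
sva = 4.0 * np.pi * r_eq

print(f"direct integration: {direct:.5f}")
print(f"SVA (analytic):     {sva:.5f}")
```

The closer the scan region is to a sphere, the better the agreement, and the analytic SVA result needs no numerical integration at all, which is what cuts computing time for large data sets.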


2021 ◽  
Vol 8 (1) ◽  
pp. 205395172110207
Author(s):  
Simon Aagaard Enni ◽  
Maja Bak Herrie

Machine learning (ML) systems have shown great potential for performing or supporting inferential reasoning through analyzing large data sets, thereby potentially facilitating more informed decision-making. However, a hindrance to such use of ML systems is that the predictive models created through ML are often complex, opaque, and poorly understood, even if the programs “learning” the models are simple, transparent, and well understood. ML models become difficult to trust, since lay-people, specialists, and even researchers have difficulties gauging the reasonableness, correctness, and reliability of the inferences performed. In this article, we argue that bridging this gap in the understanding of ML models and their reasonableness requires a focus on developing an improved methodology for their creation. This process has been likened to “alchemy” and criticized for involving a large degree of “black art,” owing to its reliance on poorly understood “best practices”. We soften this critique and argue that the seeming arbitrariness often is the result of a lack of explicit hypothesizing stemming from an empiricist and myopic focus on optimizing for predictive performance rather than from an occult or mystical process. We present some of the problems resulting from the excessive focus on optimizing generalization performance at the cost of hypothesizing about the selection of data and biases. We suggest embedding ML in a general logic of scientific discovery similar to the one presented by Charles Sanders Peirce, and present a recontextualized version of Peirce’s scientific hypothesis adjusted to ML.


2019 ◽  
pp. 004912411988246 ◽  
Author(s):  
Vincent Arel-Bundock

Qualitative comparative analysis (QCA) is an influential methodological approach motivated by set theory and boolean logic. QCA proponents have developed algorithms to analyze quantitative data, in a bid to uncover necessary and sufficient conditions where causal relationships are complex, conditional, or asymmetric. This article uses computer simulations to show that researchers in the QCA tradition face a vexing double bind. On the one hand, QCA algorithms often require large data sets in order to recover an accurate causal model, even if that model is relatively simple. On the other hand, as data sets increase in size, it becomes harder to guarantee data integrity, and QCA algorithms can be highly sensitive to measurement error, data entry mistakes, or misclassification.
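
The sensitivity to measurement error can be illustrated with a toy crisp-set example; the data-generating rule, sample size, and error rate below are arbitrary choices for illustration and do not reproduce the article's simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical crisp-set model: the outcome Y occurs when (A AND NOT B) OR C.
n = 200
A, B, C = (rng.integers(0, 2, n) for _ in range(3))
Y = ((A & (1 - B)) | C).astype(int)

def sufficiency_consistency(condition, outcome):
    """Share of cases satisfying the condition that also show the outcome."""
    return outcome[condition == 1].mean()

true_condition = ((A & (1 - B)) | C).astype(int)
print("consistency, clean data:", sufficiency_consistency(true_condition, Y))

# Flip 5% of the outcome values to mimic measurement or data-entry error.
flip = rng.random(n) < 0.05
Y_noisy = np.where(flip, 1 - Y, Y)
print("consistency, noisy data:", sufficiency_consistency(true_condition, Y_noisy))
```

With clean data the true condition is perfectly consistent; a handful of miscoded cases already degrades its consistency score, illustrating how data-quality problems propagate into the QCA solution as data sets grow.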


2015 ◽  
Vol 60 (8) ◽  
pp. 1-11
Author(s):  
Mirosław Szreder

The phenomenon of "big data", understood as the collection and processing of large data sets in order to extract new knowledge from them, develops independently of the will of individuals and societies. The driving force behind this development is, on the one hand, rapid technological progress in the field of IT and, on the other, the desire of many organizations to gain access to the knowledge accumulated in the ever-growing electronic databases of users of the Internet, Facebook, or Twitter. This article deals with the challenge that this phenomenon poses for people and for statistics, whose methodology may prove less adequate under these conditions. The author argues that, as far as the protection of individuals and society is concerned, technological progress raises previously unknown threats once privacy and anonymity are stripped away. Just as statisticians' analytical work can hardly keep up with the possibilities offered by "big data", so the protection of human rights is merely a belated response to the dynamic world of electronic data.


2003 ◽  
Vol 2 (4) ◽  
pp. 218-231 ◽  
Author(s):  
Eduardo Tejada ◽  
Rosane Minghim ◽  
Luis Gustavo Nonato

Projection (or dimensionality reduction) techniques have been used as a means of handling the growing dimensionality of data sets as well as providing a way to visualize information coded into point relationships. Their role in data interpretation is essential, and the simultaneous use of different projections and their visualizations improves data understanding and increases the level of confidence in the results. For that purpose, projections should be fast, to allow multiple views of the same data set. In this work we present a novel fast technique for projecting multi-dimensional data sets into bidimensional (2D) spaces that preserves neighborhood relationships. Additionally, a new technique for improving 2D projections from multi-dimensional data is presented, which helps reduce the inherent loss of information yielded by dimensionality reduction. The results are stimulating and are presented in the form of comparative visualizations against known and new 2D projection techniques. Based on the projection improvement approach presented here, a new metric for quality of projection is also given that matches the visual perception of quality well. We discuss the implications of using improved projections in the visual exploration of large data sets and the role of interaction in the visualization of projected subspaces.
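
A simple way to make 'preserving neighborhood relationships' measurable is to project with an off-the-shelf method and count how many nearest neighbours survive the projection; the sketch below uses scikit-learn's MDS on random data purely as a stand-in and is not the technique proposed in the article.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))        # hypothetical 10-dimensional data set

X2 = MDS(n_components=2, random_state=0).fit_transform(X)   # 2D projection

def neighborhood_preservation(high, low, k=10):
    """Average fraction of each point's k nearest neighbours in the original
    space that remain among its k nearest neighbours after projection."""
    idx_high = NearestNeighbors(n_neighbors=k + 1).fit(high).kneighbors(
        high, return_distance=False)[:, 1:]
    idx_low = NearestNeighbors(n_neighbors=k + 1).fit(low).kneighbors(
        low, return_distance=False)[:, 1:]
    overlap = [len(set(a) & set(b)) / k for a, b in zip(idx_high, idx_low)]
    return float(np.mean(overlap))

print("neighbourhood preservation:", neighborhood_preservation(X, X2))
```

Scores of this kind are one way to compare projection techniques and to quantify the 'inherent loss of information' the article seeks to reduce.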


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258390
Author(s):  
Natalia Sokolova ◽  
Klaus Schoeffmann ◽  
Mario Taschwer ◽  
Stephanie Sarny ◽  
Doris Putzgruber-Adamitsch ◽  
...  

In the light of the increased use of premium intraocular lenses (IOLs), such as EDOF IOLs, multifocal IOLs, or toric IOLs, even minor intraoperative complications such as decentration or IOL tilt will hamper the visual performance of these IOLs. Thus, the post-operative analysis of cataract surgeries to detect even minor intraoperative deviations that might explain a lack of post-operative success becomes more and more important. Up to now, surgical videos have been evaluated by looking at only a very limited number of intraoperative data sets or, as in studies evaluating the pupil changes that occur during surgery, at a small number of intraoperative pictures only. A continuous measurement of pupil changes over the whole surgery, which would yield clinically more relevant data, has not yet been described. The automatic retrieval of such events may therefore be a great support for post-operative analysis, especially if large data files could be evaluated automatically. In this work, we automatically detect pupil reactions in cataract surgery videos. We employ a Mask R-CNN architecture as a segmentation algorithm to segment the pupil and iris with pixel-based accuracy and then track their sizes across the entire video. We can detect pupil reactions with a harmonic mean (H) of Recall, Precision, and Ground Truth Coverage Rate (GTCR) of 60.9% and an average prediction length (PL) of 18.93 seconds. However, we consider the best configuration for practical use to be the one with an H value of 59.4% and a much shorter PL of 10.2 seconds. We further investigate the generalization ability of this method on a slightly different dataset without retraining the model. In this evaluation, we achieve an H value of 49.3% with a PL of 18.15 seconds.
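
Two small computational details from the description above can be sketched directly: the H score is the harmonic mean of Recall, Precision, and GTCR, and tracking pupil size across frames requires some per-frame size signal from the segmentation masks. All numbers below are placeholders, and normalising pupil area by iris area is one plausible choice rather than necessarily the article's.

```python
import numpy as np

def harmonic_mean(values):
    """Harmonic mean of a list of strictly positive rates."""
    values = np.asarray(values, dtype=float)
    return len(values) / np.sum(1.0 / values)

# H as the harmonic mean of Recall, Precision, and Ground Truth Coverage Rate.
recall, precision, gtcr = 0.70, 0.55, 0.60      # hypothetical scores
print(f"H = {harmonic_mean([recall, precision, gtcr]):.3f}")

# Hypothetical per-frame areas (in pixels) from the segmentation masks;
# the pupil/iris ratio gives a size signal that can be tracked over time.
pupil_area = np.array([1200, 1180, 930, 700, 690])
iris_area = np.array([5200, 5190, 5120, 5050, 5040])
print("relative pupil size per frame:", np.round(pupil_area / iris_area, 3))
```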

