Data Journalism in the Age of Big Data: An Exploration into the Uptake of Data Journalism in Leading South African Newspapers

Author(s):  
Dumisani Moyo ◽  
Allen Munoriyarwa


Author(s):  
Victor Olago ◽  
Lina Bartels ◽  
Tafadzwa Dhokotera ◽  
Julia Bohlius ◽  
...  

Introduction
The South African HIV Cancer Match (SAM) study is a probabilistic record linkage study that creates an HIV cohort from laboratory records of the National Health Laboratory Service (NHLS). This cohort was linked to the pathology-based South African National Cancer Registry to establish cancer incidence among the HIV-positive population in South Africa. As the number of HIV records grows, more efficient ways of deduplicating these big data are needed. In this work, we used clustering to perform big-data deduplication.

Objectives and Approach
Our objective was to use the DBSCAN clustering algorithm together with a bigram word analyser to perform big-data deduplication in resource-limited settings. We used HIV-related laboratory records for all of South Africa, collated in the NHLS Corporate Data Warehouse for the period 2004-2014. The pipeline involved data pre-processing, deterministic deduplication, n-gram generation, feature generation using a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, clustering with DBSCAN, and assignment of cluster labels to records that potentially belonged to the same person. We used records with national identification numbers to assess the quality of deduplication by calculating precision, recall and F-measure.

Results
We had 51,563,127 HIV-related laboratory records. Deterministic deduplication reduced these to 20,387,819 deduplicated patient records. DBSCAN clustering further reduced this to 14,849,524 patient record clusters. In this final dataset, 3,355,544 (22.60%) patients had a negative HIV test, 11,316,937 (76.21%) had evidence of HIV infection, and for 177,043 (1.19%) the HIV status could not be determined. The precision, recall and F-measure based on 1,865,445 records with national identification numbers were 0.96, 0.94 and 0.95, respectively.

Conclusion / Implications
Our study demonstrated that DBSCAN clustering is an effective way of deduplicating big datasets in resource-limited settings. It enabled refinement of an HIV observational database by accurately linking test records that potentially belonged to the same person. The methodology creates opportunities for easy data profiling to inform public health decision making.
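The abstract names each stage of the pipeline but gives no implementation. Below is a minimal sketch of the TF-IDF-plus-DBSCAN step, assuming scikit-learn; the `name_dob` field, the sample records, and the `eps`/`min_samples` values are illustrative assumptions, not the study's data or settings.

```python
# Minimal sketch of a TF-IDF + DBSCAN deduplication step
# (illustrative only; field name, eps and min_samples are assumptions).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

# Hypothetical demographic identifier strings, one per laboratory record.
records = pd.DataFrame({"name_dob": [
    "JOHN DOE 1980-01-01",
    "JON DOE 1980-01-01",    # likely the same patient, misspelled
    "JANE SMITH 1975-06-30",
]})

# Word bigrams, as in the abstract: a misspelled record still shares
# most of its bigrams with the correctly spelled record.
vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(2, 2))
features = vectorizer.fit_transform(records["name_dob"])

# min_samples=1 makes every record a core point, so a record with no
# close neighbours forms its own singleton cluster instead of noise.
labels = DBSCAN(eps=0.5, min_samples=1, metric="cosine").fit_predict(features)
records["cluster_id"] = labels  # same label -> candidate duplicates
print(records)
```

Cosine distance on the TF-IDF bigram vectors is what lets near-duplicate spellings of the same identifiers fall within `eps` of each other, so they receive one cluster label and can be merged into a single patient record.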


Author(s):  
Surajit Bag

The study draws its sample from South African engineering companies that are strategic suppliers to the mining and minerals industry, and explores the uncertainties persisting in their supply chain network. It further investigates the role of big data and predictive analysis (BDPA) in managing these supply uncertainties. Finally, the paper uses partial least squares regression to study the relationships among buyer-supplier relationships, BDPA and supply chain performance. The analysis supported the second and third hypotheses, establishing, first, that there is a positive relationship between BDPA and supply chain performance and, second, that there is a positive relationship between BDPA and the buyer-supplier relationship. The study makes a unique contribution to the current literature by shedding light on practical problems persisting in the South African context.
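As a rough illustration of the final analytical step, the sketch below fits a partial least squares regression with scikit-learn's PLSRegression. It is a stand-in only: the paper works with survey constructs (and PLS path modelling of the kind usually run in dedicated SEM tools), whereas this uses synthetic data with hypothetical variable names.

```python
# Illustrative PLS regression on synthetic survey-style data; the
# variable names, sample size and effect sizes are all assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 120  # hypothetical number of survey responses

# Predictor constructs: BDPA adoption and buyer-supplier relationship scores.
X = rng.normal(size=(n, 2))
# Outcome construct: supply chain performance, positively related to both.
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.5, size=n)

pls = PLSRegression(n_components=2)
pls.fit(X, y)

print("coefficients:", pls.coef_.ravel())  # positive weights mirror the supported hypotheses
print("R^2:", pls.score(X, y))
```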


2019 ◽  
Author(s):  
Susan Brokensha ◽  
Eduan Kotzé ◽  
Burgert A Senekal ◽  
...  

1970 ◽  
pp. 193-208
Author(s):  
María Teresa Sandoval-Martín ◽  
Leonardo La-Rosa

The use of social science methods and computational tools to analyze databases in journalism has gone by several names since Philip Meyer called it precision journalism (PJ). In the last decade this specialty has developed considerably under the term data journalism (DJ), in a distinctive technological and sociocultural environment: big data. This research aims to differentiate DJ from PJ and computer-assisted reporting (CAR) from the perspective of science and technology studies, framing the news as a boundary object between the programmers, designers, journalists and other actors that are now part of the news production process. For this purpose, 14 in-depth interviews were conducted between 2015 and 2017 with data journalists from Spain (8), the USA (1) and Finland (1); academic experts on PJ, DJ and transparency from Spain (1) and Finland (2); and one expert on transparency laws and access to public information in Spain, Europe and Latin America. As a result, it can be affirmed that big data is a differentiating element of DJ because it constitutes a sociocultural context in which the open data philosophy, free software, and collaborative teamwork are part of DJ's identity.


Author(s):  
Роман Валерьевич Ерженин ◽  
Зинаида Андреевна Бахвалова ◽  
Евгений Дмитриевич Волков ◽  
Александр Андреевич Абзаев

The article discusses the problem of adapting an educational program for training IT specialists to solve complex engineering and, at the same time, creative problems associated with the development of data journalism. Open government data on the health care system serves as the source of big data. The article sets out the main provisions of a project-based learning approach built on the standards of project and systems-engineering practice, as well as on the principles of hackathons, where the main priority is the intellectual component of the team's work and its ideas. Some results of applying project-based learning to the interaction of students (future IT developers) with journalists and data researchers are presented. The proposed approach and the project-based learning results obtained with it can be used to create methodological guidelines for developing joint interdisciplinary educational programs that combine the training of future IT specialists and data journalists.


Author(s):  
Namhla Matiwane ◽  
Tiko Iyamu

Within the South African government there is an increasing amount of data, yet the government is struggling to employ big data analytics (BDA) to analyse it. This could be attributed to a lack of know-how from both technical and nontechnical perspectives. Failure to implement BDA and ensure its appropriate use hinders government enterprises and agencies in their drive to deliver quality services. A government enterprise was selected and used as a case in this study, primarily because the concept of BDA is new to many South African government departments. Data were collected through in-depth interviews. From the analysis, four factors that can influence the implementation of BDA in government enterprises were revealed: knowledge, process, differentiation, and skillset. Based on these factors, a set of criteria in the form of a model was developed.


Author(s):  
Gordon J. Murray

In this chapter, context for understanding the phenomenon of "big data" and disruptive innovation is introduced relative to current changes affecting the future of the journalism industry. Perspective is provided on the market forces and emerging technologies that now shape the demand for data journalism. Current best practices and strategies to analyze, scrape, personalize, visualize and map data are presented, and trends and resources for accessing data and effectively analyzing information are outlined for journalists to use when researching and reporting online. Three contemporary case studies explore the day-to-day operations and decision-making processes of media organizations struggling to remain profitable, adapt to changing consumer demands, and serve a new demographic that is increasingly global, wireless, mobile and socially networked.
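As a purely illustrative example of the scraping practice the chapter surveys (the chapter itself contains no code), a journalist might parse an HTML table directly into a dataframe; the table and figures below are hypothetical placeholders.

```python
# Illustrative scraping sketch: pandas parses an HTML table into a
# dataframe for analysis. Requires lxml (or bs4 + html5lib) installed.
import io
import pandas as pd

# In practice this string would be fetched from the URL of a public
# statistics page; the figures here are placeholders, not real data.
html = """
<table>
  <tr><th>Province</th><th>Population</th></tr>
  <tr><td>Gauteng</td><td>15000000</td></tr>
  <tr><td>Western Cape</td><td>7000000</td></tr>
</table>
"""

tables = pd.read_html(io.StringIO(html))  # returns a list of dataframes
print(tables[0])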

