Data Journalism in the Age of Big Data: An Exploration into the Uptake of Data Journalism in Leading South African Newspapers

Author(s):  
Dumisani Moyo ◽  
Allen Munoriyarwa


Author(s):  
Victor Olago ◽  
Lina Bartels ◽  
Tafadzwa Dhokotera ◽  
Julia Bohlius ◽  
...  

Introduction
The South African HIV Cancer Match (SAM) study is a probabilistic record linkage study that creates an HIV cohort from laboratory records of the National Health Laboratory Service (NHLS). This cohort was linked to the pathology-based South African National Cancer Registry to establish cancer incidence among the HIV-positive population in South Africa. As the number of HIV records grows, more efficient ways of deduplicating these big data are needed. In this work, we used clustering to perform big-data deduplication.

Objectives and Approach
Our objective was to use the DBSCAN clustering algorithm together with a bigram word analyser to perform big-data deduplication in resource-limited settings. We used HIV-related laboratory records for all of South Africa, collated in the NHLS Corporate Data Warehouse for the period 2004-2014. The pipeline involved data pre-processing, deterministic deduplication, n-gram generation, feature generation using a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, clustering with DBSCAN, and assignment of cluster labels to records that potentially belonged to the same person. We used records with national identification numbers to assess the quality of deduplication by calculating precision, recall and F-measure.

Results
We had 51,563,127 HIV-related laboratory records. Deterministic deduplication reduced these to 20,387,819 deduplicated patient records. DBSCAN clustering further reduced this to 14,849,524 patient record clusters. In this final dataset, 3,355,544 (22.60%) patients had a negative HIV test, 11,316,937 (76.21%) had evidence of HIV infection, and for 177,043 (1.19%) the HIV status could not be determined. The precision, recall and F-measure based on 1,865,445 records with national identification numbers were 0.96, 0.94 and 0.95, respectively.

Conclusion / Implications
Our study demonstrated that DBSCAN clustering is an effective way of deduplicating big datasets in resource-limited settings. It enabled refinement of an HIV observational database by accurately linking test records that potentially belonged to the same person. The methodology creates opportunities for easy data profiling to inform public health decision making.
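The abstract names each stage of the pipeline but gives no implementation. Below is a minimal sketch of the TF-IDF-plus-DBSCAN step, assuming scikit-learn; the `name_dob` field, the sample records, and the `eps`/`min_samples` values are illustrative assumptions, not the study's data or settings.

```python
# Minimal sketch of a TF-IDF + DBSCAN deduplication step
# (illustrative only; field name, eps and min_samples are assumptions).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

# Hypothetical demographic identifier strings, one per laboratory record.
records = pd.DataFrame({"name_dob": [
    "JOHN DOE 1980-01-01",
    "JON DOE 1980-01-01",    # likely the same patient, misspelled
    "JANE SMITH 1975-06-30",
]})

# Word bigrams, as in the abstract: a misspelled record still shares
# most of its bigrams with the correctly spelled record.
vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(2, 2))
features = vectorizer.fit_transform(records["name_dob"])

# min_samples=1 makes every record a core point, so a record with no
# close neighbours forms its own singleton cluster instead of noise.
labels = DBSCAN(eps=0.5, min_samples=1, metric="cosine").fit_predict(features)
records["cluster_id"] = labels  # same label -> candidate duplicates
print(records)
```

Cosine distance on the TF-IDF bigram vectors is what lets near-duplicate spellings of the same identifiers fall within `eps` of each other, so they receive one cluster label and can be merged into a single patient record.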


Author(s):  
Surajit Bag

The study draws its sample from South African engineering companies that are strategic suppliers to the mining and minerals industry, and explores the uncertainties persisting in their supply chain network. It further investigates the role of big data and predictive analysis (BDPA) in managing these supply uncertainties. Finally, the paper uses partial least squares regression to study the relationships among buyer-supplier relationships, BDPA and supply chain performance. The analysis supported the second and third hypotheses, establishing, first, that there is a positive relationship between BDPA and supply chain performance and, second, that there is a positive relationship between BDPA and the buyer-supplier relationship. The study makes a unique contribution to the current literature by shedding light on practical problems persisting in the South African context.
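As a rough illustration of the final analytical step, the sketch below fits a partial least squares regression with scikit-learn's PLSRegression. It is a stand-in only: the paper works with survey constructs (and PLS path modelling of the kind usually run in dedicated SEM tools), whereas this uses synthetic data with hypothetical variable names.

```python
# Illustrative PLS regression on synthetic survey-style data; the
# variable names, sample size and effect sizes are all assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 120  # hypothetical number of survey responses

# Predictor constructs: BDPA adoption and buyer-supplier relationship scores.
X = rng.normal(size=(n, 2))
# Outcome construct: supply chain performance, positively related to both.
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.5, size=n)

pls = PLSRegression(n_components=2)
pls.fit(X, y)

print("coefficients:", pls.coef_.ravel())  # positive weights mirror the supported hypotheses
print("R^2:", pls.score(X, y))
```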


2019 ◽  
Author(s):  
Susan Brokensha ◽  
Eduan Kotzé ◽  
Burgert A Senekal ◽  
...  

1970 ◽  
pp. 193-208
Author(s):  
María Teresa Sandoval-Martín ◽  
Leonardo La-Rosa

The use of social science methods and computational tools to analyze databases in journalism has gone by several names since Philip Meyer called it precision journalism (PJ). In the last decade this specialty has developed considerably under the term data journalism (DJ), in a distinctive technological and sociocultural environment: big data. This research aims to differentiate DJ from PJ and computer-assisted reporting (CAR) from the perspective of science and technology studies, framing the news as a boundary object between the programmers, designers, journalists and other actors that are now part of the news production process. For this purpose, 14 in-depth interviews were conducted between 2015 and 2017 with data journalists from Spain (8), the USA (1) and Finland (1); academic experts on PJ, DJ and transparency from Spain (1) and Finland (2); and one expert on transparency laws and access to public information in Spain, Europe and Latin America. As a result, it can be affirmed that big data is a differentiating element of DJ because it constitutes a sociocultural context in which the open data philosophy, free software, and collaborative teamwork are part of DJ's identity.


Author(s):  
Роман Валерьевич Ерженин ◽  
Зинаида Андреевна Бахвалова ◽  
Евгений Дмитриевич Волков ◽  
Александр Андреевич Абзаев

The article discusses the problem of adapting an educational program for training IT specialists to solve complex engineering and, at the same time, creative problems associated with the development of data journalism. Open government data on the health care system serves as the source of big data. The article sets out the main provisions of a project-based learning approach built on the standards of project and systems-engineering practice, as well as on the principles of hackathons, where the main priority is the intellectual component of the team's work and its ideas. Some results of applying project-based learning to the interaction of students (future IT developers) with journalists and data researchers are presented. The proposed approach and the project-based learning results obtained with it can be used to create methodological guidelines for developing joint interdisciplinary educational programs that combine the training of future IT specialists and data journalists.


Author(s):  
Namhla Matiwane ◽  
Tiko Iyamu

Within the South African government there is an increasing amount of data, yet the government is struggling to employ big data analytics (BDA) to analyse it. This could be attributed to a lack of know-how from both technical and nontechnical perspectives. Failure to implement BDA and ensure its appropriate use hinders government enterprises and agencies in their drive to deliver quality services. A government enterprise was selected and used as a case in this study, primarily because the concept of BDA is new to many South African government departments. Data were collected through in-depth interviews. From the analysis, four factors that can influence the implementation of BDA in government enterprises were revealed: knowledge, process, differentiation, and skillset. Based on these factors, a set of criteria in the form of a model was developed.


Author(s):  
Gordon J. Murray

In this chapter, context for understanding the phenomenon of "big data" and disruptive innovation is introduced relative to current changes affecting the future of the journalism industry. Perspective is provided on the market forces and emerging technologies that now shape the demand for data journalism. Current best practices and strategies to analyze, scrape, personalize, visualize and map data are presented, and trends and resources for accessing data and effectively analyzing information are outlined for journalists to use when researching and reporting online. Three contemporary case studies explore the day-to-day operations and decision-making processes of media organizations struggling to remain profitable, adapt to changing consumer demands, and serve a new demographic that is increasingly global, wireless, mobile and socially networked.
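As a purely illustrative example of the scraping practice the chapter surveys (the chapter itself contains no code), a journalist might parse an HTML table directly into a dataframe; the table and figures below are hypothetical placeholders.

```python
# Illustrative scraping sketch: pandas parses an HTML table into a
# dataframe for analysis. Requires lxml (or bs4 + html5lib) installed.
import io
import pandas as pd

# In practice this string would be fetched from the URL of a public
# statistics page; the figures here are placeholders, not real data.
html = """
<table>
  <tr><th>Province</th><th>Population</th></tr>
  <tr><td>Gauteng</td><td>15000000</td></tr>
  <tr><td>Western Cape</td><td>7000000</td></tr>
</table>
"""

tables = pd.read_html(io.StringIO(html))  # returns a list of dataframes
print(tables[0])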

