Getting the foundations right

Statistical Journal of the IAOS ◽

10.3233/sji-210859 ◽

2021 ◽

pp. 1-11

Author(s):

Helen MacGillivray

Keyword(s):

Big Data ◽

Data Science ◽

Authentic Learning ◽

Educational Resources ◽

Technological Capabilities ◽

Data Types ◽

Official Statistics ◽

Statistical Literacy ◽

2030 Agenda ◽

Access To Data

There has been increasing interest in recent years in training in official statistics with reference to the 2030 Agenda, big data, diversification of data types and sources, and data science. Backgrounds for work in official statistics are becoming more varied than ever. The official statistics community has also become progressively more aware of the importance of statistical literacy in education and of trust in official statistics. Hence the foundation and introductory levels are of as much interest to official statistics as more specialised training. At the same time, greater access to data and vast technological capabilities have brought much emphasis on, and discussion of, the statistical and data sciences and education therein, including development of educational resources in contexts such as civic data and statistics. Data science provides opportunities to renew the decades-long push for authentic learning that reflects the practice of ‘greater statistics’ and ‘greater data science’, and to examine progress to date in implementing and sustaining the extensive work and advocacy of many. This article discusses what is needed at the foundation and introductory levels to realize this advocacy, with commentary relevant to official statistics.

Big Data Techniques for Supporting Official Statistics

Web Services ◽

10.4018/978-1-5225-7501-6.ch040 ◽

2019 ◽

pp. 728-744 ◽

Cited By ~ 1

Author(s):

Antonino Virgillito ◽

Federico Polidoro

Keyword(s):

Big Data ◽

Data Collection ◽

Data Science ◽

Official Statistics ◽

The Core ◽

Web Scraping ◽

Collection Process ◽

Data Collection Process ◽

Data Source ◽

Use Of Internet

Following the advent of Big Data, statistical offices have been widely exploring the use of the Internet as a data source for modernizing their data collection processes. In particular, prices are collected online in several statistical institutes through a technique known as web scraping. The objective of the chapter is to discuss the challenges of web scraping for setting up a continuous data collection process, exploring and classifying the most widespread techniques and presenting how they are used in practical cases. The main technical notions behind web scraping are presented and explained so that readers with no background in IT also have sufficient elements to fully comprehend scraping techniques, promoting the building of mixed skills that is at the core of the spirit of modern data science. Challenges for official statistics deriving from the use of web scraping are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.

Linking Educational Resources on Data Science

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019404 ◽

2019 ◽

Vol 33 ◽

pp. 9404-9409

Author(s):

José Luis Ambite ◽

Jonathan Gordon ◽

Lily Fierro ◽

Gully Burns ◽

Joel Mathew

Keyword(s):

Big Data ◽

Linked Data ◽

Data Science ◽

Resource Discovery ◽

Educational Resources ◽

Machine Learning Techniques ◽

Biomedical Data ◽

Centers Of Excellence ◽

Training Resources ◽

The Web

The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.

Training institutes and training in official statistics in Africa: An overview

Statistical Journal of the IAOS ◽

10.3233/sji-210838 ◽

2021 ◽

Vol 37 (3) ◽

pp. 835-851

Author(s):

Hugues Kouassi Kouadio

Keyword(s):

Big Data ◽

Literature Review ◽

Current Situation ◽

Official Statistics ◽

Statistical Literacy ◽

The Face ◽

Technological Developments ◽

Language Areas ◽

University Training ◽

And Training

Training in official statistics offered by statistical training institutes has been constantly evolving as it adapts to the statistical environment and technological developments. Based on a literature review and the mobilisation of curricula and programmes offered by statistical training centres in Africa, this paper presents the current situation of training in official statistics as well as the challenges to be faced. Despite harmonisation efforts, there are still differences between language areas and training types. Engineer and vocational statistical training are better suited to the needs of National Statistical Institutes than university training. It is essential that the training of statisticians is strategically thought out so that they can be reactive and dynamic in the face of the changes and upheavals they will be confronted with in the context of the data revolution and big data. Their training should reinforce the statistical literacy dimension with a view to reducing the gap between producers and users.

Big Data Techniques for Supporting Official Statistics

Data Visualization and Statistical Literacy for Open and Big Data - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2512-7.ch010 ◽

2017 ◽

pp. 253-273

Author(s):

Antonino Virgillito ◽

Federico Polidoro

Keyword(s):

Big Data ◽

Data Collection ◽

Data Science ◽

Official Statistics ◽

The Core ◽

Web Scraping ◽

Collection Process ◽

Data Collection Process ◽

Data Source ◽

Use Of Internet

Linking statistical literacy and data stewardship in Public Universities of Niger: Lessons learned from the collaboration with the national statistics institute

Statistical Journal of the IAOS ◽

10.3233/sji-200708 ◽

2020 ◽

Vol 36 ◽

pp. 63-72

Author(s):

Ibrahim Sidi Zakari

Keyword(s):

Big Data ◽

Data Science ◽

Public Universities ◽

Open Data ◽

Lessons Learned ◽

Mathematical Methods ◽

Statistical Literacy ◽

National Statistics ◽

Emerging Issues ◽

Data Stewardship

This paper highlights the lessons learned from recent initiatives between public universities of Niger and the national statistics institute. Our investigation of the existing national statistical system revealed the need to increase the number of qualified human resources with advanced skills in open data, big data, data visualisation, machine learning, mathematical modeling and data-driven innovations. Moreover, the existing statistical literacy and data crowdsourcing activities need to be validated and upscaled, and we found a lack of experience in managing big data and in developing mathematical methods and fast computational algorithms to analyze them. Finally, the aforementioned collaboration can be improved by working closely with the private sector, civil society and the data science community to generate new approaches to emerging issues including climate change and sustainable development.

BIG-DATA LITERACY AS A NEW VOCATION FOR STATISTICAL LITERACY

STATISTICS EDUCATION RESEARCH JOURNAL ◽

10.52041/serj.v19i1.130 ◽

2020 ◽

Vol 19 (1) ◽

pp. 194-205

Author(s):

KAREN FRANÇOIS ◽

CARLOS MONTEIRO ◽

PATRICK ALLO

Keyword(s):

Big Data ◽

Data Science ◽

Statistics Education ◽

Data Sets ◽

Statistical Literacy ◽

Data Literacy ◽

Research Journal ◽

Critical Issues ◽

History Of ◽

Mathematical Justification

In contemporary society, massive amounts of data are generated continuously by various means; these are called Big-Data sets. Big Data has potential and limits that need to be understood by statisticians and statistics consumers, so it is a challenge to develop Big-Data Literacy to support the needs of constructive, concerned, and reflective citizens. However, the development of the concept of statistical literacy mirrors the current gap between purely technical and socio-political characterizations of Big Data. In this paper, we review the recent history of the concept of statistical literacy and highlight the need to integrate the new challenges and critical issues from data science associated with Big Data, including ethics, epistemology, mathematical justification, and math washing. First published February 2020 at Statistics Education Research Journal Archives.

Issues in security and privacy of big data

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i12.482 ◽

2018 ◽

Vol 7 (12) ◽

pp. 1

Author(s):

Shaveta Bhatia

Keyword(s):

Cloud Computing ◽

Big Data ◽

Approximate Method ◽

Biomedical Research ◽

Cyber Security ◽

Data Science ◽

Third Party ◽

Security And Privacy ◽

Security Threats ◽

The Third

The era of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained popularity, but it also raises many challenges and consequences for security and privacy. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and approximate methods for addressing them in the fields of biomedical research, cyber security, and cloud computing.

Big Data Driven Clinical Informatics & Surveillance (BDD_CIS) – A Multimodal Database Focused Clinical, Community, and Multi-Omics Surveillance Plan for COVID-19: A study Protocol (Preprint)

10.2196/preprints.24504 ◽

2020 ◽

Author(s):

Bankole Olatosi ◽

Jiajia Zhang ◽

Sharon Weissman ◽

Zhenlong Li ◽

Jianjun Hu ◽

...

Keyword(s):

Big Data ◽

South Carolina ◽

Data Science ◽

Age Groups ◽

The Elderly ◽

The United States ◽

Data Sources ◽

Patient Registries ◽

Multiple Partner ◽

Multimodal Data

BACKGROUND: The Coronavirus Disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), remains a serious global pandemic. Currently, all age groups are at risk of infection, but the elderly and persons with underlying health conditions are at higher risk of severe complications. In the United States (US), the pandemic curve is rapidly changing, with over 6,786,352 cases and 199,024 deaths reported. South Carolina (SC), as of 9/21/2020, reported 138,624 cases and 3,212 deaths across the state. OBJECTIVE: The growing availability of COVID-19 data provides a basis for deploying Big Data science to leverage multitudinal and multimodal data sources for incremental learning. Doing this requires the acquisition and collation of multiple data sources at the individual and county level. METHODS: The population for the comprehensive database comes from statewide COVID-19 testing surveillance data (March 2020 to present) for all SC COVID-19 patients (N≈140,000). This project will 1) connect multiple partner data sources for prediction and intelligence gathering, and 2) build a REDCap database that links de-identified multitudinal and multimodal data sources useful for machine learning and deep learning algorithms to enable further studies. Additional data will include hospital-based COVID-19 patient registries, Health Sciences South Carolina (HSSC) data, data from the office of Revenue and Fiscal Affairs (RFA), and Area Health Resource Files (AHRF). RESULTS: The project was funded as of June 2020 by the National Institutes of Health. CONCLUSIONS: The development of such a linked and integrated database will allow for the identification of important predictors of short- and long-term clinical outcomes for SC COVID-19 patients using data science.

International production on data-oriented science: analysis of the terms data science and e-science in Scopus and Web of Science

Pesquisa Brasileira em Ciência da Informação e Biblioteconomia ◽

10.22478/ufpb.1981-0695.2017v12n1.34121 ◽

2017 ◽

Vol 12 (1) ◽

Author(s):

Leilah Santiago Bufrem ◽

Fábio Mascarenhas Silva ◽

Natanael Vitor Sobral ◽

Anna Elizabeth Galvão Coutinho Correia

Keyword(s):

Big Data ◽

Open Access ◽

Grid Computing ◽

Digital Library ◽

Data Science ◽

Web Of Science ◽

Computer Systems ◽

Distributed Computer Systems

Introduction: The current dynamics of scientific production and communication reveal the leading role of Data-Oriented Science, broadly conceived, represented mainly by terms such as “e-Science” and “Data Science”. Objectives: To present worldwide scientific production on Data-Oriented Science based on the terms “e-Science” and “Data Science” in Scopus and Web of Science between 2006 and 2016. Methodology: The research is structured in five stages: a) searching for information in the Scopus and Web of Science databases; b) obtaining the bibliometric records; c) supplementing the keywords; d) correcting and cross-referencing the data; e) analytical representation of the data. Results: The most prominent terms in the scientific production analyzed were Distributed computer systems (2006), Grid computing (2007 to 2013) and Big data (2014 to 2016). In Library and Information Science, emphasis is given to the topics Digital library and Open access, evidencing the field's centrality in discussions about mechanisms for providing access to scientific information in digital form. Conclusions: From a diachronic perspective, there is a visible shift in focus from themes concerned with data-sharing operations to the analytical perspective of searching for patterns in large volumes of data. Keywords: Data Science. E-Science. Data-oriented science. Scientific production. Link: http://www.uel.br/revistas/uel/index.php/informacao/article/view/26543/20114

Impact of Big Data over Telecom Industry

Pakistan Journal of Engineering Technology & Science ◽

10.22555/pjets.v6i2.1958 ◽

2018 ◽

Vol 6 (2) ◽

Author(s):

Muhammad Waqar Khan ◽

Muhammad Asghar Khan ◽

Muhammad Alam ◽

Wajahat Ali

Keyword(s):

Big Data ◽

Data Science ◽

Cell Phones ◽

Smart Phones ◽

World Population ◽

Huge Amount ◽

Scale Down ◽

Telecom Industry ◽

Telecom Sector ◽

Theoretical Computing

Over the past few years, data has been growing exponentially, attracting researchers to work on a popular topic, Big Data. Big Data is observed in various fields, such as information technology, telecommunication, theoretical computing, mathematics, data mining and data warehousing. Data science is frequently mentioned together with Big Data, as it uses methods to scale down Big Data. Currently more than 3.2 billion of the world population is connected to the internet, of which 46% are connected via smart phones. Over 5.5 billion people are using cell phones. As technology is rapidly shifting from ordinary cell phones towards smart phones, the proportion of people using the internet is also growing. It is forecast that by 2020 around 7 billion people across the globe will be using the internet, of which 52% will be using their smart phones to connect. By 2050 that figure will reach 95% of the world population. Every device connected to the internet generates data. As the majority of this data is generated on smart phones through applications such as Instagram, WhatsApp, Apple, Google, Google+, Twitter, Flickr etc., this huge amount of data is becoming a big threat for the telecom sector. This paper gives a comparison of the amounts of Big Data generated by the telecom industry. Based on the collected data, we use forecasting tools to predict the amount of Big Data that will be generated in future and also identify the threats that the telecom industry will face from that huge amount of Big Data.