Web scraping technique for producing Iranian consumer price index

2021 ◽  
pp. 1-14
Author(s):  
Ayoub Faramarzi ◽  
Reza Hadizadeh ◽  
Saeed Fayyaz ◽  
Sohrab Sajadimanesh ◽  
Abbas Moradi

The advent of technologies such as the Internet and the World Wide Web has made data pervasive in every human and non-human activity. The result has been an exponential increase in data generation, a data explosion described by the term Big Data. Big Data sources can help reduce response burden, or they can be used to study economic or social phenomena before designing a statistical survey, which is inherently expensive to pilot. Incorporating Big Data sources into official statistics also means maintaining the competitive advantage and relevance of official statistics products compared with those provided by a plethora of commercial players, in particular large corporations active in the field of information technology. In this paper, the web scraping technique was used to extract the daily prices of food and drink products in order to replace the conventional prices that had been used for price indices. Moreover, these new datasets make it possible to calculate the indices on smaller time scales, such as weekly or daily, whereas the conventional approach is possible only on a monthly basis. Although web scraping has its own problems, it is more economical, accurate, and time-saving, especially in urban areas. Findings revealed that the web scraping technique can be applied as an effective alternative to conventional methods for the CPI. This technique can also be used for other price statistics.
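
To make the idea of a daily index from scraped prices concrete, here is a minimal sketch in Python. It is an illustration only: the abstract does not say which index formula the authors used, so the Jevons (geometric mean of price ratios) elementary index is an assumption, and the function name `daily_jevons_index`, the record layout, and the sample products and prices are all hypothetical.

```python
from collections import defaultdict
from math import exp, log


def daily_jevons_index(records, base_day):
    """Compute a daily Jevons index (base_day = 100) from scraped prices.

    records: iterable of (day, product_id, price) tuples (hypothetical layout).
    Only products observed on both the base day and the current day are used.
    """
    prices = defaultdict(dict)  # day -> {product_id: price}
    for day, product_id, price in records:
        if price > 0:  # skip invalid or missing prices
            prices[day][product_id] = price

    base = prices[base_day]
    index = {}
    for day, day_prices in prices.items():
        matched = [p for p in day_prices if p in base]
        if not matched:
            continue  # no comparable products on this day
        # Geometric mean of price ratios, computed as the mean of log ratios.
        mean_log_ratio = sum(log(day_prices[p] / base[p]) for p in matched) / len(matched)
        index[day] = 100.0 * exp(mean_log_ratio)
    return index


# Hypothetical scraped observations, not data from the paper.
records = [
    ("2021-01-01", "rice_1kg", 100.0),
    ("2021-01-01", "milk_1l", 50.0),
    ("2021-01-02", "rice_1kg", 110.0),
    ("2021-01-02", "milk_1l", 55.0),
]
index = daily_jevons_index(records, "2021-01-01")
```

Matching products against the base day is one simple way to handle items that appear or disappear between scrapes; a production index would also need weighting and quality adjustment, which this sketch leaves out.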

2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda to improve how data is produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens’ entitlement to public information.


Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly getting refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the dreamt smarter planet.


Web Services ◽  
2019 ◽  
pp. 728-744 ◽  
Author(s):  
Antonino Virgillito ◽  
Federico Polidoro

Following the advent of Big Data, statistical offices have been widely exploring the use of the Internet as a data source for modernizing their data collection process. In particular, prices are collected online in several statistical institutes through a technique known as web scraping. The objective of the chapter is to discuss the challenges of web scraping for setting up a continuous data collection process, exploring and classifying the more widespread techniques and presenting how they are used in practical cases. The main technical notions behind web scraping are presented and explained so that readers with no background in IT also have sufficient elements to fully comprehend scraping techniques, promoting the building of mixed skills that is at the core of the spirit of modern data science. Challenges for official statistics deriving from the use of web scraping are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.
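
As a rough illustration of the core notion behind web scraping, the sketch below extracts product names and prices from an HTML fragment using only Python's standard-library `html.parser`. It is not taken from the chapter: the markup, the `name` and `price` class names, the `PriceParser` class, and the `extract_prices` helper are all hypothetical, and a real collection process would also have to fetch pages, respect site terms, and cope with changing layouts.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from elements tagged with assumed CSS classes."""

    def __init__(self):
        super().__init__()
        self.items = []       # extracted (name, price) tuples
        self._field = None    # field whose text we expect next
        self._current = {}    # partially filled record

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("name", "price"):  # hypothetical class names
            self._field = cls

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = text
            self._field = None
            if "name" in self._current and "price" in self._current:
                # Strip thousands separators before converting to a number.
                price = float(self._current["price"].replace(",", ""))
                self.items.append((self._current["name"], price))
                self._current = {}


# Hypothetical page fragment standing in for a retailer's product listing.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Rice 1kg</span> <span class="price">1,250.50</span></li>
  <li><span class="name">Milk 1l</span> <span class="price">300</span></li>
</ul>
"""


def extract_prices(html):
    """Parse an HTML string and return the list of (name, price) pairs found."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Calling `extract_prices(SAMPLE_HTML)` on the fragment above yields one pair per list item; the parser is deliberately tied to the assumed markup, which is exactly the fragility that makes continuous scraping a maintenance challenge.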


2015 ◽  
Vol 31 (2) ◽  
pp. 249-262 ◽  
Author(s):  
Piet J.H. Daas ◽  
Marco J. Puts ◽  
Bart Buelens ◽  
Paul A.M. van den Hurk

More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. This article discusses the exploration of both opportunities and challenges for official statistics associated with the application of Big Data. Experiences gained with analyses of large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the topics characteristic of the statistical analysis and use of Big Data.


Digital technology has been changing fast in recent years, and with this change the number of data systems, sources, and formats has also increased exponentially. The process of extracting data from these multiple source systems and transforming it to suit various analytics processes is therefore rapidly gaining importance. With Big Data, the transformation process is quite challenging, because data generation is continuous. In this paper, we extract data from various heterogeneous sources on the web and try to transform it into a form widely used in data warehousing so that it caters to the analytical needs of the machine learning community.


Big Data ◽  
2016 ◽  
pp. 757-777
Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the authors show some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly being refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the smarter planet.


2015 ◽  
pp. 187-221
Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly getting refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the dreamt smarter planet.


2019 ◽  
Vol 59 (3) ◽  
pp. 599-603 ◽  
Author(s):  
Martha M Muñoz ◽  
Samantha A Price

In recent years, the fields of evolutionary biomechanics and morphology have developed into a deeply quantitative and integrative science, resulting in a much richer understanding of how structural relationships shape macroevolutionary patterns. This issue highlights new research at the conceptual and experimental cutting edge, with a special focus on applying big data approaches to classic questions in form–function evolution. As this issue illustrates, new technologies and analytical tools are facilitating the integration of biomechanics, functional morphology, and phylogenetic comparative methods to catalyze a new, more integrative discipline. Although we are at the cusp of the big data generation of organismal biology, the field is nonetheless still data-limited. This data bottleneck is primarily due to the rate-limiting steps of digitizing specimens, recording and tracking organismal movements, and extracting patterns from massive datasets. Automation and machine-learning approaches hold great promise to help data generation keep pace with ideas. As a final and important note, almost all the research presented in this issue relied on specimens, totaling the tens of thousands, provided by museum collections. Without collection, curation, and conservation of museum specimens, the future of the field is much less bright.


Data & Policy ◽  
2020 ◽  
Vol 2 ◽  
Author(s):  
Fabio Ricciato ◽  
Albrecht Wirthmann ◽  
Martina Hahn

In this discussion paper, we outline the motivations and the main principles of the Trusted Smart Statistics (TSS) concept that is under development in the European Statistical System. TSS represents the evolution of official statistics in response to the challenges posed by the new datafied society. Taking stock of the availability of new digital data sources, new technologies, and new behaviors, statistical offices are now called to rethink the way they operate in order to reassert their role in modern democratic society. The issue at stake is considerably broader and deeper than merely adapting existing processes to embrace so-called Big Data. In several respects, this evolution entails a fundamental paradigm shift with respect to the legacy model of official statistics production based on traditional data sources, for example, in the relation between data and computation, between data collection and analysis, between methodological development and statistical production, and of course in the roles of the various stakeholders and their mutual relationships. Such a complex evolution must be guided by a comprehensive system-level view based on clearly spelled-out design principles. In this paper, we aim to provide a general account of the TSS concept reflecting the current state of the discussion within the European Statistical System.

