Big Data in Market Research: Why More Data Does Not Automatically Mean Better Information

2016 ◽  
Vol 8 (2) ◽  
pp. 56-63 ◽  
Author(s):  
Volker Bosch

Abstract Big data will change market research at its core in the long term, because the consumption of products and media can increasingly be logged electronically, making it measurable on a large scale. Unfortunately, big data datasets are rarely representative, even if they are huge. Smart algorithms are needed to achieve high precision and prediction quality for digital and non-representative approaches. Also, big data can only be processed with complex and therefore error-prone software, which leads to measurement errors that need to be corrected. Another challenge is posed by missing but critical variables: the amount of data can be overwhelming, yet it often lacks important information. The missing observations can only be filled in by statistical data imputation, which requires an additional data source containing the missing variables, for example a panel. Linear imputation is a statistical procedure that is anything but trivial. It is an instrument to “transport information”: the more strongly the observed data correlate with the data to be imputed, the better it works. It makes structures visible even if the depth of the data is limited.
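As a rough illustration of the idea, not the article's actual procedure, regression-based linear imputation can be sketched as follows, with a synthetic "panel" that observes both the linking variables and the target variable (all names and data are illustrative):

```python
import numpy as np

# Donor source (e.g., a panel): both the linking variables X and the
# target variable y are observed.
rng = np.random.default_rng(0)
X_panel = rng.normal(size=(500, 3))          # variables shared with the big data source
y_panel = X_panel @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=500)

# Big data source: only the linking variables are observed; y is missing.
X_big = rng.normal(size=(100_000, 3))

# Fit the linear imputation model on the panel (least squares with intercept).
A = np.column_stack([np.ones(len(X_panel)), X_panel])
coef, *_ = np.linalg.lstsq(A, y_panel, rcond=None)

# "Transport" the information: impute y for the big data records.
y_imputed = np.column_stack([np.ones(len(X_big)), X_big]) @ coef
```

The quality of the transport depends directly on how strongly X correlates with y in the donor source, which is the point the abstract makes.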

Epidemiologia ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 315-324
Author(s):  
Juan M. Banda ◽  
Ramya Tekumalla ◽  
Guanyu Wang ◽  
Jingyuan Yu ◽  
Tuo Liu ◽  
...  

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics in such a unique worldwide event for biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated between 1 January 2020 and 27 June 2021 at the time of writing. This corpus provides a freely available additional data source for researchers worldwide to conduct a wide and diverse range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.
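As a hedged illustration of how such a corpus might be put to work, the sketch below counts daily tweet volume over an early-pandemic window. The file name and column layout are assumptions; the released data consist of tweet identifiers that must first be hydrated into full records:

```python
import pandas as pd

# Hypothetical layout: hydrated tweets with a date and text column.
# (File name and column names are assumptions, not the dataset's spec.)
chunks = pd.read_csv("covid19_tweets_hydrated.csv", usecols=["date", "text"],
                     parse_dates=["date"], chunksize=1_000_000)

# Example: daily tweet volume during a period of social distancing measures.
daily = pd.Series(dtype="float64")
for chunk in chunks:  # chunked reading keeps memory bounded at this scale
    window = chunk[(chunk["date"] >= "2020-03-01") & (chunk["date"] <= "2020-05-31")]
    daily = daily.add(window.groupby(window["date"].dt.date).size(), fill_value=0)

print(daily.sort_index().head())
```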


Author(s):  
Chunyi Wu ◽  
Gaochao Xu ◽  
Yan Ding ◽  
Jia Zhao

Large-scale task processing based on cloud computing has become crucial to big data analysis and processing in recent years. Most previous work applies conventional methods and architectures designed for general-scale tasks to the disposal of massive task sets, which is limited by issues such as computing capability and data transmission. Based on this argument, a fat-tree-structure-based approach called LTDR (Large-scale Tasks processing using Deep network model and Reinforcement learning) is proposed in this work. Aiming at exploring the optimal task allocation scheme, a virtual network mapping algorithm based on a deep convolutional neural network and Q-learning is presented herein. After feature extraction, we design and implement a policy network to make node mapping decisions. The link mapping scheme is attained by the designed distributed value-function-based reinforcement learning model. Finally, tasks are allocated to suitable physical nodes and processed efficiently. Experimental results show that LTDR can significantly improve the utilization of physical resources and long-term revenue while satisfying task requirements in big data.
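The full LTDR model (a deep convolutional policy network plus a distributed value-function learner) is beyond a short excerpt, but the core Q-learning idea it builds on can be sketched in a deliberately simplified, tabular form. All sizes, the reward function, and the hyperparameters below are illustrative assumptions, not the paper's:

```python
import numpy as np

# Toy stand-in: states are virtual nodes awaiting placement,
# actions are candidate physical nodes.
n_virtual, n_physical = 10, 5
Q = np.zeros((n_virtual, n_physical))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(42)

def reward(p, load):
    # Toy reward: prefer lightly loaded physical nodes
    # (a crude proxy for resource utilization and long-term revenue).
    return 1.0 - load[p]

for episode in range(500):
    load = np.zeros(n_physical)
    for v in range(n_virtual):
        # epsilon-greedy action selection over physical nodes
        p = int(rng.integers(n_physical)) if rng.random() < epsilon else int(Q[v].argmax())
        r = reward(p, load)
        load[p] += 0.2
        # Standard Q-learning update toward the value of the next placement step
        next_best = Q[v + 1].max() if v + 1 < n_virtual else 0.0
        Q[v, p] += alpha * (r + gamma * next_best - Q[v, p])

print("Greedy node mapping:", Q.argmax(axis=1))
```

In the paper, the tabular Q-function is replaced by learned networks over extracted features, and link mapping is handled by a separate distributed learner.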


2020 ◽  
Author(s):  
Namrata Bhattacharya Mis

Agenda 2030 Goal 11 commits to making disaster risk reduction an integral part of sustainable social and economic development. Flooding poses some of the most serious challenges facing developing nations, hitting the most vulnerable hardest. The urban poor, frequently at highest risk, are characterised by inadequate housing and a lack of services and infrastructure, with high population growth and spatial expansion in dense, lower-quality urban structures. Using big data from within these low-quality urban settlement areas can be a useful step forward in generating information for a better understanding of their vulnerabilities. Big data for resilience is a recent field of research that offers tremendous potential for increasing disaster resilience, especially in the context of social resilience. This research focuses on unleashing the unrealised opportunities of big data through the differential social and economic frames that can contribute towards better-targeted information generation in disaster management. The scoping study aims to contribute to the understanding of the potential of big data, particularly in low-income countries, to empower vulnerable populations against natural hazards such as floods. Recognising the potential of providing real-time and long-term information for emergency management in flood-affected large urban settlements, this research concentrates on flood hazard and the use of remotely sensed data (NASA TRMM, LANDSAT) as the big data source for quick disaster response (and recovery) in targeted areas. The research question for the scoping study is: can big data sources provide real-time and long-term information to improve emergency disaster management against floods in urban settlements in developing countries? Previous research has identified several ways in which big data can enable faster response to affected populations, but few attempts have been made to integrate these factors into an aggregated conceptual output. An international review of multi-discipline research, grey literature, grass-roots projects, and emerging online social discourse will appraise the concepts and scope of big data to address the four objectives of the research and answer specific questions around the existing and future potentials of big data, operationalisation and capacity building by agencies, associated risks, and prospects for maximising impact. The research proposes a concept design for undertaking a thematic review of existing secondary data sources, which will be used to provide a holistic picture of how big data can support resilience through technological change within the specific social and environmental contexts of developing countries. The implications of the study lie in system integration and in understanding the socio-economic, political, legal and ethical contexts essential for investment decision-making for strategic impact and resilience-building in developing nations.


2019 ◽  
Author(s):  
Ruud J. Dirksen ◽  
Greg E. Bodeker ◽  
Peter W. Thorne ◽  
Andrea Merlone ◽  
Tony Reale ◽  
...  

Abstract. This paper describes the GRUAN-wide approach to managing the transition from the Vaisala RS92 to the Vaisala RS41 as the operational radiosonde. The goal of the GCOS Reference Upper-Air Network (GRUAN) is to provide long-term high-quality reference observations of upper-air Essential Climate Variables (ECVs) such as temperature and water vapor. With GRUAN data being used for climate monitoring, it is vital that the change of measurement system does not introduce inhomogeneities into the data record. The majority of the 27 GRUAN sites were launching the RS92 as their operational radiosonde, and following the end of production of the RS92 in the last quarter of 2017, most of these sites have now switched to the RS41. Such a large-scale change in instrumentation is unprecedented in the history of GRUAN and poses a challenge for the network. Several measurement programmes have been initiated to characterize differences in biases, uncertainties and noise between the two radiosonde types. These include laboratory characterization of measurement errors, extensive twin sounding studies with RS92 and RS41 on the same balloon, and comparison with ancillary data. This integrated approach is commensurate with the GRUAN principles of traceability and deliberate redundancy. A two-year period of regular twin soundings is recommended, and for sites that are not able to implement this, burden sharing is employed, such that measurements at a certain site are considered representative of other sites with similar climatological characteristics. All data relevant to the RS92-RS41 transition are archived in a database that will be accessible to the scientific community for external scrutiny. Furthermore, the knowledge and experience gained about GRUAN's RS92-RS41 transition will be extensively documented to ensure traceability of the process. This documentation will benefit other networks in managing changes in their operational radiosonde systems. Preliminary analysis of the laboratory experiments indicates that the manufacturer's calibration of the RS41's temperature and humidity sensors is more accurate than for the RS92, with uncertainties of <0.2 K for the temperature sensor and <1.5 % RH for the humidity sensor.


Author(s):  
Usman Iqbal ◽  
Phung Anh Nguyen ◽  
Shabbir Syed-Abdul ◽  
Wen-Shan Jian ◽  
Yu-Chuan Jack Li

ABSTRACT Objective: Rapid change in health information technology systems has dramatically increased the accumulation of health data. We aimed to develop an online informatics tool to evaluate the cancer risk of drugs by utilizing medical big data. Data source: We used Taiwan's National Health Insurance Database, which covers the health information of the 23 million Taiwanese population, including patient characteristics and all drug information such as prescriptions. Front-end development: A web-based interface was developed using PHP and JavaScript. In addition, we incorporated the guidelines of evidence-based medicine (EBM) level 3 for observational studies (cohort, case-control, and/or self-controlled case series) to support users interacting with the system. Back-end development: An Apache, MySQL and PHP stack was used to build the server side of the system. We integrated the Elasticsearch API into the system to search and analyze data immediately. An example of transforming the Taiwan NHI database to person-level data is shown in Box 1. We then integrated an analytics package (i.e., R) to perform the statistical analysis for a given study. This online analytical tool can massively explore and visualize big data on long-term drug use and cancer through the OMOSC system, enabling mass online studies of long-term drug use and cancer risk. It can help health care professionals who lack data-mining skills to lead such studies. The constructed online system automatically generates cases and controls from large databases for long-term drug exposures and cancer risk. Results: The results are shown as odds ratios (OR), and if confounding factors are selected, an adjusted odds ratio (AOR) for risk estimation with 95% confidence intervals (CI) is also produced. We used SAS statistical software on the same dataset to validate the OMOSC system's results. The system supports massive online studies, saving time and cost. Conclusion: Because clinical trials are often impossible to conduct due to cultural, cost, ethical, political or social obstacles, this kind of research model could play an important role in the health care industry, providing excellent opportunities for solving the technological, informatics, and organizational issues in evaluating drugs across other broad domains by utilizing large-scale databases.
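The system's internals are not published here, but the crude odds ratio and Wald confidence interval it reports can be sketched as follows; the function and all counts are illustrative, not taken from the OMOSC system:

```python
import math

def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald 95% confidence interval from a 2x2 table."""
    or_ = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / unexposed_cases + 1 / unexposed_controls)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, (lo, hi)

# Illustrative counts only: long-term drug exposure vs. cancer outcome.
print(odds_ratio_ci(exposed_cases=120, exposed_controls=380,
                    unexposed_cases=240, unexposed_controls=1260))
```

Adjusted odds ratios (AOR) would come from a logistic regression with the selected confounders, which the abstract attributes to the integrated R package.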


Author(s):  
Inga Brentel ◽  
Kristi Winters

Abstract This article details the novel structure developed to handle, harmonize and document big data for reuse and long-term preservation. 'The Longitudinal IntermediaPlus (2014–2016)' big data dataset is uniquely rich: it covers an array of German online media, extendable to cross-media channels and user information. The metadata file for this dataset, and its documentation, were recently deposited as their own MySQL database called charmstana_sample_14-16.sql (https://data.gesis.org/sharing/#!Detail/10.7802/2030) (cs16) and are suitable for generating descriptive statistics. Analogous to the 'Data View' in SPSS, the charmstana_analysis (ca) database contains the dataset's numerical values. Both the cs16 and ca MySQL files are needed to conduct analysis on the full database. The research challenge was to process large-scale datasets into one longitudinal big-data source suitable for academic research, in accordance with FAIR principles. The authors review four methodological recommendations that can serve as a framework for solving big-data structuring challenges, using the harmonization software CharmStats.
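A minimal sketch of combining the two deposited files is given below. The connection details, table names, and columns are assumptions for illustration only; the deposited documentation describes the real schema:

```python
import mysql.connector  # from the mysql-connector-python package

# Hypothetical connection; host, user, and database names are placeholders.
conn = mysql.connector.connect(host="localhost", user="researcher",
                               password="...", database="charmstana")

cur = conn.cursor()
# Join the harmonized metadata (cs16) with the numerical values (ca),
# analogous to combining SPSS's "Variable View" and "Data View".
# Table and column names below are assumed, not the deposit's actual schema.
cur.execute("""
    SELECT m.variable_name, m.value_label, COUNT(*) AS n
    FROM charmstana_analysis AS a
    JOIN charmstana_sample_14_16 AS m
      ON a.variable_id = m.variable_id AND a.value = m.value
    GROUP BY m.variable_name, m.value_label
""")
for variable, label, n in cur.fetchall():
    print(variable, label, n)
conn.close()
```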


2020 ◽  
Vol 9 (2) ◽  
pp. 337-355 ◽  
Author(s):  
Ruud J. Dirksen ◽  
Greg E. Bodeker ◽  
Peter W. Thorne ◽  
Andrea Merlone ◽  
Tony Reale ◽  
...  

Abstract. This paper describes the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN) approach to managing the transition from the Vaisala RS92 to the Vaisala RS41 as the operational radiosonde. The goal of GRUAN is to provide long-term high-quality reference observations of upper-air essential climate variables (ECVs) such as temperature and water vapor. With GRUAN data being used for climate monitoring, it is vital that the change of measurement system does not introduce inhomogeneities to the data record. The majority of the 27 GRUAN sites were launching the RS92 as their operational radiosonde, and following the end of production of the RS92 in the last quarter of 2017, most of these sites have now switched to the RS41. Such a large-scale change in instrumentation is unprecedented in the history of GRUAN and poses a challenge for the network. Several measurement programs have been initiated to characterize differences in biases, uncertainties, and noise between the two radiosonde types. These include laboratory characterization of measurement errors, extensive twin sounding studies with RS92 and RS41 on the same balloon, and comparison with ancillary data. This integrated approach is commensurate with the GRUAN principles of traceability and deliberate redundancy. A 2-year period of regular twin soundings is recommended, and for sites that are not able to implement this, burden-sharing is employed such that measurements at a certain site are considered representative of other sites with similar climatological characteristics. All data relevant to the RS92–RS41 transition are archived in a database that will be accessible to the scientific community for external scrutiny. Furthermore, the knowledge and experience gained regarding GRUAN's RS92–RS41 transition will be extensively documented to ensure traceability of the process. This documentation will benefit other networks in managing changes in their operational radiosonde systems. Preliminary analysis of the laboratory experiments indicates that the manufacturer's calibration of the RS41 temperature and humidity sensors is more accurate than for the RS92, with uncertainties of <0.2 K for the temperature and <1.5 % RH (RH: relative humidity) for the humidity sensor. A first analysis of 224 RS92–RS41 twin soundings at Lindenberg Observatory shows nighttime temperature differences <0.1 K between the Vaisala-processed temperature data for the RS41 (T_RS41) and the GRUAN data product for the RS92 (T_RS92-GDP.2). However, daytime temperature differences in the stratosphere increase steadily with altitude, with T_RS92-GDP.2 up to 0.6 K higher than T_RS41 at 35 km. RH_RS41 values are up to 8 % higher, which is consistent with the analysis of satellite–radiosonde collocations.
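As a sketch of how such a paired (twin-sounding) comparison reduces to per-level statistics, the snippet below takes flight-by-flight differences on a common vertical grid; the synthetic profiles, bias shape, and grid are stand-ins for the real GRUAN data products:

```python
import numpy as np

# Twin soundings: RS92 and RS41 fly on the same balloon, so temperature
# differences can be taken per flight and per altitude level.
rng = np.random.default_rng(1)
altitude_km = np.arange(0, 36)                     # common vertical grid, 1 km steps
n_flights = 224                                    # matches the Lindenberg sample size

t_rs92 = 220 + rng.normal(scale=0.3, size=(n_flights, altitude_km.size))
t_rs41 = t_rs92 - 0.02 * altitude_km               # toy altitude-dependent bias

diff = t_rs92 - t_rs41
mean_diff = diff.mean(axis=0)
# Standard error of the mean difference at each level
sem = diff.std(axis=0, ddof=1) / np.sqrt(n_flights)

for z in (10, 20, 35):
    print(f"{z} km: {mean_diff[z]:+.2f} +/- {sem[z]:.2f} K")
```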


Author(s):  
Niels Brügger ◽  
Janne Nielsen ◽  
Ditte Laursen

This article outlines how the 'digital geography' of a nation, that is, the online presence of a single nation, can be studied. The entire Danish Web domain and its development from 2006 to 2015 is used as a case, based on the holdings of the Danish national Web archive. The following research questions guide the investigation: What has the Danish Web domain looked like in the past, and how has it developed in the period 2006-2015? Methodologically, we investigate to what extent one can delimit 'a nation' on the Web, what characterizes the archived Web as a historical source for academic studies, and the general characteristics of our specific data source. Analytically, the article introduces a design for how this type of big data analysis of an entire national Web domain can be performed. Our findings show some of the ways in which a nation's digital landscape can be mapped, i.e., by size, content types and hyperlinks. On a broader canvas, this study demonstrates that with hardware and software as well as human competencies from different disciplines, it is possible to perform large-scale historical studies of one of the biggest media sources of today, the World Wide Web.
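As an illustration of the kind of aggregation such a design involves, the sketch below tallies a web archive's CDX index by crawl year and MIME type, delimiting 'the nation' by its country-code top-level domain. The file name is illustrative, and the field positions follow the common CDX layout (urlkey, timestamp, original URL, MIME type, ...) but should be checked against the specific archive's CDX specification:

```python
from collections import Counter
from urllib.parse import urlparse

by_year = Counter()
by_mime = Counter()

with open("dk_domain.cdx", encoding="utf-8") as fh:   # file name is illustrative
    for line in fh:
        fields = line.split()
        if len(fields) < 4 or not fields[1][:4].isdigit():
            continue  # skip the CDX header line and malformed records
        timestamp, url, mime = fields[1], fields[2], fields[3]
        host = urlparse(url).hostname or ""
        if host.endswith(".dk"):                      # delimit the nation by ccTLD
            by_year[timestamp[:4]] += 1               # 14-digit timestamp, YYYY prefix
            by_mime[mime] += 1

print(by_year.most_common())
print(by_mime.most_common(10))
```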


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1872
Author(s):  
Ashley N. Paynter ◽  
Matthew D. Dunbar ◽  
Kate E. Creevy ◽  
Audrey Ruple

Dogs provide an ideal model for study, as they have the greatest phenotypic diversity and the most known naturally occurring diseases of all non-human land mammals. Thus, data related to dog health present many opportunities to discover insights into health and disease outcomes. Here, we describe several sources of veterinary medical big data that can be used in research. These sources include medical records from primary care centers or referral hospitals, medical claims data from animal insurance companies, and datasets constructed specifically for research purposes. No data source provides information without limitations, but large-scale, prospective, longitudinally collected data from dog populations are ideal for further research, as they offer many advantages over other data sources.

