Reviews and syntheses: The promise of big soil data, moving current practices towards future potential

2021 ◽  
Author(s):  
Katherine E. O. Todd-Brown ◽  
Rose Z. Abramoff ◽  
Jeffrey Beem-Miller ◽  
Hava K. Blair ◽  
Stevan Earl ◽  
...  

Abstract. In the age of big data, soil data are more available than ever, but, outside of a few large soil survey resources, they remain largely unusable for informing soil management and for understanding Earth system processes beyond the original study. Data science has promised a fully reusable research pipeline in which data from past studies are used to contextualize new findings and reanalyzed for global relevance. Yet synthesis projects encounter challenges at all steps of the data reuse pipeline, including unavailable data, labor-intensive transcription of datasets, incomplete metadata, and a lack of communication between collaborators. Here, using insights from a diversity of soil, data, and climate scientists, we summarize current practices in soil data synthesis across all stages of database creation: data discovery, input, harmonization, curation, and publication. We then suggest new soil-focused semantic tools to improve existing data pipelines, such as ontologies, vocabulary lists, and community practices. Our goal is to provide the soil data community with an overview of current practices in soil data synthesis and of where we need to go to fully leverage big data to solve soil problems in the next century.
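Of the stages listed above, harmonization is the most straightforward to illustrate in code. The sketch below, in Python with pandas, shows the general idea of a vocabulary-list mapping; every column name, unit conversion, and dataset here is a hypothetical placeholder, not a published community standard.

```python
# Minimal sketch: harmonizing two soil datasets against a shared vocabulary.
# Column names and the unit conversion are hypothetical illustrations.
import pandas as pd

# A "vocabulary list" mapping each source's local column names to shared terms
VOCAB = {
    "study_a": {"SOC_pct": "soil_organic_carbon", "depth_cm": "depth_top"},
    "study_b": {"org_carbon": "soil_organic_carbon", "top_depth": "depth_top"},
}

# Unit conversions into the shared convention (here: percent, centimeters)
UNIT_FACTORS = {"study_b": {"soil_organic_carbon": 0.1}}  # e.g., g/kg -> percent

def harmonize(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Rename columns to the shared vocabulary and convert units."""
    out = df.rename(columns=VOCAB[source])
    for col, factor in UNIT_FACTORS.get(source, {}).items():
        out[col] = out[col] * factor
    out["source"] = source  # provenance: keep a pointer back to the original study
    return out

a = pd.DataFrame({"SOC_pct": [1.2, 2.4], "depth_cm": [0, 10]})
b = pd.DataFrame({"org_carbon": [15.0, 8.0], "top_depth": [0, 20]})

combined = pd.concat([harmonize(a, "study_a"), harmonize(b, "study_b")],
                     ignore_index=True)
print(combined)
```

Recording provenance alongside each harmonized record is what keeps the synthesized database traceable back to the original studies.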



Data & Policy ◽  
2020 ◽  
Vol 2 ◽  
Author(s):  
Ina Sander

Abstract Datafied societies need informed public debate about the implications of data science technologies. At present, internet users are often unaware of the potential consequences of disclosing personal data online, and few citizens have the knowledge to participate in such debates. This paper argues that critical big data literacy efforts are one way to address this lack of knowledge. It draws on findings from a small qualitative investigation and discusses the effectiveness of online critical big data literacy tools. Through pre- and post-use testing, the short- and longer-term influence of these tools on people’s privacy attitudes and behavior was investigated. The study’s findings suggest that the tools tested had a predominantly positive initial effect, leading to improved critical big data literacy among most participants, which resulted in more privacy-sensitive attitudes and internet usage. When the tools’ longer-term influence was analyzed, results were more mixed: for some participants the literacy effects were short-lived, while for others they led to more persistent and growing literacy. The findings confirm previous research noting the complexity of privacy attitudes and also find that resignation toward privacy is multi-faceted. Overall, this study reaffirms the importance of critical big data literacy and produces new findings about the value of interactive data literacy tools, which have been under-researched to date. This research shows that such tools could provide a relevant means of working toward empowering internet users, promoting critical internet usage and, ideally, enabling more citizens to engage in public debates about changing data systems.


2019 ◽  
Vol 6 (2) ◽  
pp. 205395171986259 ◽  
Author(s):  
Johannes Starkbaum ◽  
Ulrike Felt

Before the EU General Data Protection Regulation entered into force in May 2018, we witnessed an intense struggle by actors associated with data-dependent fields of science, in particular health-related academia and biobanks, striving for legal derogations for data reuse in research. These actors engaged in a similar line of argument and formed issue alliances to pool their collective power. Using descriptive coding followed by an interpretive analysis, this article investigates the argumentative repertoire of these actors and embeds the analysis in ethical debates on data sharing and biobank-related data governance. We observe efforts to perform a paradigmatic shift of the discourse around the General Data Protection Regulation’s implementation, away from ‘protecting data’ as the key concern and toward ‘protecting health’ of individuals and societies at large. Instead of data protection, the key risks stressed by health researchers became potential obstacles to research. In line with this, the exchange of information with data subjects is not a key concern in the arguments of biobank-related actors, and it is assumed that patients want ‘their’ data to be used. We interpret these narratives as a ‘reaction’ to potential restrictions on data reuse and as part of a broader trend towards Big Data science, as the very idea of biobanking is conceptualized around the long-term use of readily prepared data. We conclude that a sustainable implementation of biobanks needs not only to comply with the General Data Protection Regulation, but must proactively re-imagine its relation to citizens and data subjects in order to account for the various ways that science gets entangled with society.


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Neal R. Haddaway ◽  
Charles T. Gray ◽  
Matthew Grainger

Abstract One of the most important steps in the process of conducting a systematic review or map is data extraction and the production of a database of coding, metadata and study data. There are many ways to structure these data, but to date, no guidelines or standards have been produced for the evidence synthesis community to support their production. Furthermore, there is little adoption of easily machine-readable, readily reusable and adaptable databases: these databases would be easier to translate into different formats by review authors, for example for tabulation, visualisation and analysis, and also by readers of the review/map. As a result, it is common for systematic review and map authors to produce bespoke, complex data structures that, although typically provided digitally, require considerable efforts to understand, verify and reuse. Here, we report on an analysis of systematic reviews and maps published by the Collaboration for Environmental Evidence, and discuss major issues that hamper machine readability and data reuse or verification. We highlight different justifications for the alternative data formats found: condensed databases; long databases; and wide databases. We describe these challenges in the context of data science principles that can support curation and publication of machine-readable, Open Data. We then go on to make recommendations to review and map authors on how to plan and structure their data, and we provide a suite of novel R-based functions to support efficient and reliable translation of databases between formats that are useful for presentation (condensed, human readable tables), filtering and visualisation (wide databases), and analysis (long databases). We hope that our recommendations for adoption of standard practices in database formatting, and the tools necessary to rapidly move between formats will provide a step-change in transparency and replicability of Open Data in evidence synthesis.
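The translation the authors describe is, in their implementation, a suite of R functions; the pandas sketch below only illustrates the same long/wide round trip under hypothetical column names, not the authors' actual code.

```python
# Minimal sketch of translating between "long" and "wide" database shapes.
import pandas as pd

# Long format: one row per study x coded variable (useful for analysis)
long_db = pd.DataFrame({
    "study_id": ["s01", "s01", "s02", "s02"],
    "variable": ["country", "outcome", "country", "outcome"],
    "value":    ["SE", "positive", "UK", "negative"],
})

# Wide format: one row per study, one column per variable
# (useful for filtering and visualisation)
wide_db = (long_db.pivot(index="study_id", columns="variable", values="value")
                  .reset_index())

# ...and back again, recovering the long shape without loss
back_to_long = wide_db.melt(id_vars="study_id",
                            var_name="variable", value_name="value")
print(wide_db)
print(back_to_long)
```

Because the translation is lossless in both directions, a single canonical database can serve presentation, filtering, and analysis without authors maintaining parallel bespoke copies.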


Author(s):  
Shaveta Bhatia

The epoch of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained wide popularity, but it also invites many challenges for security and privacy. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and approaches to addressing them in the fields of biomedical research, cyber security, and cloud computing.
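As one concrete instance of the threats listed above, data leakage is commonly mitigated by encrypting sensitive fields before they reach shared storage. The following is a minimal sketch using the third-party Python cryptography package; the record fields and key handling are hypothetical simplifications.

```python
# Minimal sketch: encrypting a sensitive field before it enters a shared store.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, held by a key-management service
cipher = Fernet(key)

record = {"patient_id": "P-1042", "diagnosis": "hypertension"}

# Encrypt the sensitive field; the ciphertext is safe to store or transmit
record["diagnosis"] = cipher.encrypt(record["diagnosis"].encode())

# Only a holder of the key can recover the plaintext
plaintext = cipher.decrypt(record["diagnosis"]).decode()
assert plaintext == "hypertension"
```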


2020 ◽  
Author(s):  
Bankole Olatosi ◽  
Jiajia Zhang ◽  
Sharon Weissman ◽  
Zhenlong Li ◽  
Jianjun Hu ◽  
...  

BACKGROUND The Coronavirus Disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), remains a serious global pandemic. Currently, all age groups are at risk for infection, but the elderly and persons with underlying health conditions are at higher risk of severe complications. In the United States (US), the pandemic curve is rapidly changing, with over 6,786,352 cases and 199,024 deaths reported. South Carolina (SC) as of 9/21/2020 reported 138,624 cases and 3,212 deaths across the state. OBJECTIVE The growing availability of COVID-19 data provides a basis for deploying Big Data science to leverage multitudinal and multimodal data sources for incremental learning. Doing this requires the acquisition and collation of multiple data sources at the individual and county level. METHODS The population for the comprehensive database comes from statewide COVID-19 testing surveillance data (March 2020 to present) for all SC COVID-19 patients (N≈140,000). This project will 1) connect multiple partner data sources for prediction and intelligence gathering, and 2) build a REDCap database that links de-identified multitudinal and multimodal data sources useful for machine learning and deep learning algorithms to enable further studies. Additional data will include hospital-based COVID-19 patient registries, Health Sciences South Carolina (HSSC) data, data from the office of Revenue and Fiscal Affairs (RFA), and Area Health Resource Files (AHRF). RESULTS The project was funded as of June 2020 by the National Institutes of Health. CONCLUSIONS The development of such a linked and integrated database will allow for the identification of important predictors of short- and long-term clinical outcomes for SC COVID-19 patients using data science.
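A minimal sketch of the kind of de-identified, multi-source linkage step described in the methods might look as follows in Python with pandas; the identifiers, salt, and column names are hypothetical placeholders, not the project's actual REDCap schema.

```python
# Minimal sketch: pseudonymize patient identifiers, then join county-level context.
import hashlib
import pandas as pd

SALT = b"project-specific-secret"  # kept out of the shared database

def pseudonymize(patient_id: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256(SALT + patient_id.encode()).hexdigest()[:16]

tests = pd.DataFrame({"patient_id": ["A1", "B2"],
                      "county": ["Richland", "Greenville"],
                      "test_result": ["positive", "negative"]})
county_context = pd.DataFrame({"county": ["Richland", "Greenville"],
                               "hospital_beds_per_1k": [3.1, 2.4]})

tests["pseudo_id"] = tests["patient_id"].map(pseudonymize)
linked = (tests.drop(columns="patient_id")        # direct identifier never leaves
               .merge(county_context, on="county", how="left"))
print(linked)
```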


Author(s):  
Leilah Santiago Bufrem ◽  
Fábio Mascarenhas Silva ◽  
Natanael Vitor Sobral ◽  
Anna Elizabeth Galvão Coutinho Correia

Introduction: The current dynamics of scientific production and communication reveal the prominence of data-oriented science, broadly conceived and represented mainly by terms such as "e-Science" and "Data Science". Objectives: To present the worldwide scientific production related to data-oriented science, based on the terms "e-Science" and "Data Science" in Scopus and Web of Science between 2006 and 2016. Methodology: The research is structured in five stages: a) searching for information in the Scopus and Web of Science databases; b) retrieving the bibliometric records; c) complementing the keywords; d) correcting and cross-referencing the data; e) analytical representation of the data. Results: The most prominent terms in the scientific production analyzed were Distributed computer systems (2006), Grid computing (2007 to 2013), and Big data (2014 to 2016). In Library and Information Science, emphasis is given to the themes Digital library and Open access, evidencing the field's centrality in discussions about mechanisms for providing access to scientific information in digital form. Conclusions: From a diachronic perspective, there is a visible shift in focus from themes concerning data-sharing operations to the analytical perspective of searching for patterns in large volumes of data. Keywords: Data Science. E-Science. Data-oriented science. Scientific production. Link: http://www.uel.br/revistas/uel/index.php/informacao/article/view/26543/20114
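Step (e), the analytical representation of keyword frequencies over time, can be sketched in a few lines of Python with pandas. The miniature dataset below only mimics the layout of a Scopus or Web of Science export (one row per record, semicolon-separated keywords); it is not the study's data.

```python
# Minimal sketch: year-by-keyword frequency table from a bibliometric export.
import pandas as pd

records = pd.DataFrame({
    "year":     [2006, 2010, 2015, 2015],
    "keywords": ["Distributed computer systems; e-Science",
                 "Grid computing; e-Science",
                 "Big data; Data Science",
                 "Big data; Open access"],
})

# One row per (year, keyword), then count occurrences per year
exploded = (records.assign(keyword=records["keywords"].str.split("; "))
                   .explode("keyword"))
freq = exploded.groupby(["year", "keyword"]).size().unstack(fill_value=0)
print(freq)
```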


Author(s):  
Muhammad Waqar Khan ◽  
Muhammad Asghar Khan ◽  
Muhammad Alam ◽  
Wajahat Ali

During the past few years, data has been growing exponentially, attracting researchers to work on a popular term, Big Data. Big Data is observed in various fields, such as information technology, telecommunication, theoretical computing, mathematics, data mining, and data warehousing. Data science is frequently referred to alongside Big Data, as it uses methods to scale down Big Data. Currently, more than 3.2 billion of the world population is connected to the internet, of which 46% connect via smartphones, and over 5.5 billion people use cell phones. As technology rapidly shifts from ordinary cell phones towards smartphones, the proportion of internet use is also growing. There is a forecast that by 2020 around 7 billion people across the globe will be using the internet, of which 52% will connect via smartphones, and that by 2050 the figure will touch 95% of the world population. Every device connected to the internet generates data. As the majority of this data is generated on smartphones through applications such as Instagram, WhatsApp, Apple, Google, Google+, Twitter, and Flickr, this huge amount of data is becoming a big threat for the telecom sector. This paper gives a comparison of the amount of Big Data generated by the telecom industry. Based on the collected data, we use forecasting tools to predict the amount of Big Data that will be generated in the future and also identify threats that the telecom industry will face from that huge amount of Big Data.
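The paper's forecasting step can be illustrated with the simplest model consistent with the narrative: exponential growth fit in log space. The sketch below uses NumPy with placeholder traffic volumes, not the authors' collected data or their actual forecasting tool.

```python
# Minimal sketch: fit log(volume) = a*year + b and extrapolate forward.
import numpy as np

years = np.array([2014, 2015, 2016, 2017])
exabytes_per_month = np.array([2.5, 3.7, 5.6, 8.5])  # hypothetical volumes

# Least-squares fit in log space (polyfit returns slope first)
a, b = np.polyfit(years, np.log(exabytes_per_month), 1)

# Naive exponential extrapolation grows without bound, which is itself
# the cautionary point about unchecked Big Data growth
for future in (2020, 2025):
    print(future, round(np.exp(a * future + b), 1))
```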


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
J Doetsch ◽  
I Lopes ◽  
R Redinha ◽  
H Barros

Abstract The usage and exchange of “big data” is at the forefront of the data science agenda, where Record Linkage plays a prominent role in biomedical research. In an era of ubiquitous data exchange and big data, Record Linkage is almost inevitable, but it raises ethical and legal problems, namely personal data and privacy protection. Record Linkage refers to the merging of data to consolidate facts about an individual or an event that are not available in any separate record. This article provides an overview of ethical challenges and research opportunities in linking routine data on health and education with cohort data from very preterm (VPT) infants in Portugal. Portuguese, European, and international law has been reviewed on data processing, protection, and privacy. A three-stage analysis was carried out: i) the interplay of the threefold levelling of law for Record Linkage at different levels; ii) the impact of data protection and privacy rights on data processing; iii) the challenges and opportunities of the data linkage process for research. A framework to discuss the process and its implications for data protection and privacy was created. The GDPR functions as the most substantial legal basis for the protection of personal data in Record Linkage, and explicit written consent is considered the appropriate basis for processing sensitive data. In Portugal, retrospective access to routine data is permitted if the data are anonymised; for health data, if it meets data processing requirements declared with an explicit consent; for education data, if the data processing rules are complied with. Routine health and education data can be linked to cohort data if the rights of the data subject and the requirements and duties of processors and controllers are respected. A strong ethical context, through the application of the GDPR in all phases of research, needs to be established to achieve Record Linkage between cohort and routinely collected records for health and education data of VPT infants in Portugal. Key messages: GDPR is the most important legal framework for the protection of personal data; however, its uniform approach granting freedom to its Member States hampers Record Linkage processes among EU countries. The question remains whether data protection and privacy are adequately balanced at the three legal levels to guarantee freedom for research and the improvement of the health of data subjects.
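On the technical side, a common way to reconcile Record Linkage with data protection is to link on keyed pseudonyms rather than raw identifiers. The sketch below, in Python, assumes both data holders share a secret HMAC key; the identifiers and fields are hypothetical, and any real deployment would still require a lawful basis under the GDPR.

```python
# Minimal sketch: two data holders pseudonymize a shared identifier with the
# same secret key, so records merge without exchanging raw identifiers.
import hashlib
import hmac
import pandas as pd

KEY = b"agreed-linkage-key"  # exchanged between controllers, never published

def pseudonym(national_id: str) -> str:
    """Keyed one-way hash (HMAC-SHA256) of the identifier."""
    return hmac.new(KEY, national_id.encode(), hashlib.sha256).hexdigest()

health = pd.DataFrame({"nid": ["111", "222"], "gest_age_weeks": [29, 31]})
education = pd.DataFrame({"nid": ["111", "333"], "school_entry_age": [6, 7]})

for df in (health, education):
    df["pid"] = df["nid"].map(pseudonym)
    df.drop(columns="nid", inplace=True)  # raw identifiers never leave the holder

cohort = health.merge(education, on="pid", how="inner")  # linked, de-identified
print(cohort)
```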

