Reproducible big data science: A case study in continuous FAIRness

2018 ◽  
Author(s):  
Ravi Madduri ◽  
Kyle Chard ◽  
Mike D’Arcy ◽  
Segun C. Jung ◽  
Alexis Rodriguez ◽  
...  

Abstract Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We compare and contrast our approach with other approaches to big data analysis and reproducibility.

Web Services ◽  
2019 ◽  
pp. 1301-1329
Author(s):  
Suren Behari ◽  
Aileen Cater-Steel ◽  
Jeffrey Soar

The chapter discusses how Financial Services organizations can take advantage of Big Data analysis for disruptive innovation through examination of a case study in the financial services industry. Popular tools for Big Data Analysis are discussed and the challenges of big data are explored as well as how these challenges can be met. The work of Hayes-Roth in Valued Information at the Right Time (VIRT) and how it applies to the case study is examined. Boyd's model of Observe, Orient, Decide, and Act (OODA) is explained in relation to disruptive innovation in financial services. Future trends in big data analysis in the financial services domain are explored.


Author(s):  
José Luis Ambite ◽  
Jonathan Gordon ◽  
Lily Fierro ◽  
Gully Burns ◽  
Joel Mathew

The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.


2019 ◽  
Vol 11 (3) ◽  
pp. 327-339 ◽  
Author(s):  
Graeme T. Laurie

Abstract Discussion of uses of biomedical data often proceeds on the assumption that the data are generated and shared solely or largely within the health sector. However, this assumption must be challenged because increasingly large amounts of health and well-being data are being gathered and deployed in cross-sectoral contexts such as social media and through the internet of (medical) things and wearable devices. Cross-sectoral sharing of data thus refers to the generation, use and linkage of biomedical data beyond the health sector. This paper considers the challenges that arise from this phenomenon. If we are to benefit fully, it is important to consider which ethical values are at stake and to reflect on ways to resolve emerging ethical issues across ecosystems where values, laws and cultures might be quite distinct. In considering such issues, this paper applies the deliberative balancing approach of the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019) to the domain of cross-sectoral big data. Please refer to that article for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end.


2020 ◽  
Vol 142 (12) ◽  
Author(s):  
Claudia Eckert ◽  
Ola Isaksson ◽  
Calandra Eckert ◽  
Mark Coeckelbergh ◽  
Malin Hane Hagström

Abstract In the era of digitalization, manufacturing companies expect their growing access to data to lead to improvements and innovations. Manufacturing engineers will have to collaborate with data scientists to analyze the ever-increasing volume of data. Adopting data science techniques into an engineering organization is a sociotechnical process fraught with challenges. This article uses a participant observation case study to investigate and discuss the sociotechnical nature of the adoption of data science technology into an engineering organization. In the case study, a young data scientist/statistician interacted with experienced production engineers in a global automotive organization to mutual satisfaction. However, the case study highlights the misaligned expectations between engineers and data scientists and the gaps in knowledge about what is necessary to benefit successfully from manufacturing process data. The results reveal that the engineers initially had a romantic and idealistic view of how data scientists could extract value, in a “magic” way, from the dispersed and complex information residing in the multisite manufacturing organization’s datasets. Conversely, the data scientist did not have enough engineering and contextual understanding to ask the right questions. The case reveals important shortcomings in the sociotechnical processes that undergo changes as digitalization is brought into mature engineering organizations and points to a lack of knowledge on multiple levels of the data analysis process and the ethical implications this could have.


2021 ◽  
Vol 2 ◽  
pp. 88-94
Author(s):  
Eleonora Stancheva-Todorova ◽  
Mirella Dimitrova

By bridging the accounting and data science domains, this paper introduces an interdisciplinary Big Data case study for accounting students that implements a specific methodology framework. It is supported by clear learning objectives and detailed implementation guidance for instructors, which complement a fascinating scenario representing a real-world situation in the data-led world of business. The participants’ assignment is to propose a strategy for improving the financial position and performance of a particular company by attracting new customers selected from among companies listed on the London Stock Exchange. The data sources of the proposed case study are publicly available and comprise historical financial and non-financial data disclosed in companies’ annual reports. By performing their assigned roles under the case study scenario, future graduates will build upon their technological competences and raise their awareness of the new roles and job tasks of the future accountant. They will also gain an understanding of the new advisory function of accounting specialists and their responsibilities as management consultants in the data-led business world. From a research perspective, this interdisciplinary work demonstrates how expertise in text mining and financial reporting might be combined to reveal new investment opportunities and enhance management decisions.


2020 ◽  
Vol 9 (4) ◽  
pp. 1411-1419
Author(s):  
Nashwan Dheyaa Zaki ◽  
Nada Yousif Hashim ◽  
Yasmin Makki Mohialden ◽  
Mostafa Abdulghafoor Mohammed ◽  
Tole Sutikno ◽  
...  

The scale of data streaming in social networks, such as Twitter, is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research in terms of analysis, prediction, knowledge extraction, and opinions. People use the Twitter platform daily to express their opinions, a fundamental factor that influences their behavior. In recent years, the flow of Iraqi dialect has increased, especially on the Twitter platform. Sentiment analysis for different dialects and opinion mining have become a hot topic in data science research. In this paper, we attempt to develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark Streaming, and also create a dataset for researchers in this field. The Twitter handle Bassam AlRawi is the case study here. The new method is better suited to current machine learning applications and fast online prediction.


PLoS ONE ◽  
2019 ◽  
Vol 14 (4) ◽  
pp. e0213013 ◽  
Author(s):  
Ravi Madduri ◽  
Kyle Chard ◽  
Mike D’Arcy ◽  
Segun C. Jung ◽  
Alexis Rodriguez ◽  
...  
Keyword(s):  
Big Data ◽  

Author(s):  
Ryan McGranaghan ◽  
Enrico Camporeale ◽  
Manolis Georgoulis ◽  
Anastasios Anastasiadis

The onset and rapid advance of the Digital Age have brought challenges and opportunities for scientific research characterized by a continuously evolving data landscape reflected in the four V’s of big data: volume, variety, veracity, and velocity. The big data landscape supersedes traditional means of storage, processing, management, and exploration, and requires adaptation and innovation across the full data lifecycle (i.e., collection, storage and processing, analytics, and representation). The Topical Issue “Space Weather research in the Digital Age and across the full data lifecycle” collects research from across the full data lifecycle (collection, management, analysis, and communication; collectively ‘Data Science’) and offers a tractable compendium that illustrates the latest computational and data science trends, tools, and advances for Space Weather research. We introduce the paradigm shift in Space Weather and the articles in the Topical Issue. We create a network view of the research that highlights the contribution to the change of paradigm and reveals the trends that will guide it hereafter.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 251
Author(s):  
John Van Horn ◽  
Sumiko Abe ◽  
José Luis Ambite ◽  
Teresa K. Attwood ◽  
Niall Beard ◽  
...  

The increasing richness and diversity of biomedical data types create major organizational and analytical impediments to rapid translational impact in the context of training and education. As biomedical datasets increase in size, variety and complexity, they challenge conventional methods for sharing, managing and analyzing those data. In May 2017, we convened a two-day meeting between the BD2K Training Coordinating Center (TCC), ELIXIR Training/TeSS, GOBLET, H3ABioNet, EMBL-ABR, bioCADDIE and the CSIRO, in Huntington Beach, California, to compare and contrast our respective activities, and how these might be leveraged for wider impact on an international scale. Discussions focused on the role of i) training for biomedical data science; ii) the need to promote core competencies; and iii) the development of career paths. These led to specific conversations about i) the values of standardizing and sharing data science training resources; ii) challenges in encouraging adoption of training material standards; iii) strategies and best practices for the personalization and customization of learning experiences; iv) processes of identifying stakeholders and determining how they should be accommodated; and v) discussions of joint partnerships to lead the world on data science training in ways that benefit all stakeholders. Generally, international cooperation was viewed as essential for accommodating the widest possible participation in the modern bioscience enterprise, providing skills in a truly “FAIR” manner, and addressing the importance of data science understanding worldwide. Several recommendations for the exchange of educational frameworks are made, along with potential sources for support, and plans for further cooperative efforts are presented.

