Challenges and Governance Solutions for Data Science Services based on Open Data and APIs

Author(s):  
Juha-Pekka Joutsenlahti ◽  
Timo Lehtonen ◽  
Mikko Raatikainen ◽  
Elina Kettunen ◽  
Tommi Mikkonen
2020 ◽  
Vol 36 ◽  
pp. 49-62
Author(s):  
Nureni Olawale Adeboye ◽  
Peter Osuolale Popoola ◽  
Oluwatobi Nurudeen Ogunnusi

Data science is a concept that unifies statistics, data analysis, machine learning and related methods in order to analyze actual phenomena with data and provide a better understanding of them. This article focuses on the acquisition of data science skills in building partnerships for efficient school curriculum delivery in Africa, especially in teaching beginner-level statistics courses in tertiary institutions. Illustrations were made using big data on 18 selected African countries sourced from the United Nations Educational, Scientific and Cultural Organization (UNESCO), with special focus on macro-economic variables that drive economic policy. Data description techniques were applied to the sourced open data with the aid of the R analytics software for data science, as an improvement on the traditional methods of data description for learning, thus opening a new chapter in education curriculum delivery in African schools. Although the collaboration is not without its challenges, its potential to create a self-driven learning culture among students of tertiary institutions has greatly enhanced the quality of teaching, advanced students' skills in machine learning, improved their understanding of the role of data from a global perspective, and strengthened their ability to critique claims based on data.
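The article performs its data description in R. As a minimal, language-neutral illustration of the kind of per-variable summary such teaching involves, the Python sketch below computes basic descriptive statistics with the standard `statistics` module. The indicator names and values are hypothetical placeholders, not the UNESCO data used in the study.

```python
from statistics import mean, median, stdev
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Basic descriptive statistics for one variable (sample standard deviation)."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": median(ordered),
        "mean": mean(ordered),
        "stdev": stdev(ordered),  # requires at least two observations
        "max": ordered[-1],
    }


# Hypothetical placeholder indicators for four countries; these are NOT
# the UNESCO figures analysed in the article.
indicators = {
    "indicator_x": [2.0, 4.0, 6.0, 8.0],
    "indicator_y": [1.0, 1.0, 3.0, 7.0],
}

# Summarize every variable in the same way, as a descriptive table would.
summaries = {name: describe(vals) for name, vals in indicators.items()}
for name, summary in summaries.items():
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

Comparing the mean with the median for each variable is a simple first check for skew that beginners can read directly off such a table.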


2020 ◽  
Vol 5 (19) ◽  
pp. 104-122
Author(s):  
Azzan Amin ◽  
Haslina Arshad ◽  
Ummul Hanan Mohamad

Data visualization is viewed as a significant element in data analysis and communication. As the data involved becomes increasingly complex, visual presentation helps users understand it. To date, two-dimensional (2D) visuals have commonly been used for data visualization, but the lack of a depth dimension leads to an inefficient and limited understanding of the data. Therefore, the effectiveness of augmented reality (AR) in data visualization was studied through the development of an AR Data Visualization application using E-commerce data. Machine learning models are also incorporated into this AR application to provide predictive analysis functions. To obtain quality E-commerce data and an optimal machine learning model, the data science process is carried out using the Python programming language. The E-commerce data selected for this study are open data obtained from the Kaggle website; the dataset contains 9,994 records and 21 attributes. The AR data visualization application will make it easier for users to understand the E-commerce data in depth through AR technology and to visualize forecasts of sales profit produced by an Auto-Regressive Integrated Moving Average (ARIMA) model.
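The abstract does not state the ARIMA order the authors used. The sketch below assumes the simplest non-trivial case, ARIMA(1,1,0): the series is differenced once (the "integrated" part), an AR(1) model with intercept is fitted to the differences by ordinary least squares, and forecasts of the differences are summed back onto the last observed level. It omits the moving-average term and maximum-likelihood estimation that a full library implementation would provide, and the profit figures are hypothetical.

```python
from typing import List, Tuple


def difference(series: List[float]) -> List[float]:
    """First-order differencing: the 'I' (d = 1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs: List[float]) -> Tuple[float, float]:
    """Least-squares fit of d_t = c + phi * d_{t-1} on the differenced series."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: no AR signal
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series: List[float], steps: int) -> List[float]:
    """Forecast with ARIMA(1,1,0): predict differences, then integrate them."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations to fit ARIMA(1,1,0)")
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # AR(1) step on the differences
        level += last_diff               # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical monthly profit figures, purely illustrative.
monthly_profit = [100.0, 100.0, 101.0, 102.5, 104.25, 106.125]
print(forecast_arima_110(monthly_profit, steps=3))
```

For production forecasting a maintained library (for example statsmodels' `ARIMA` class) is the safer choice; the hand-rolled version only shows what the three letters of the model name correspond to.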


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 12 ◽  
Author(s):  
Stéphanie Boué ◽  
Thomas Exner ◽  
Samik Ghosh ◽  
Vincenzo Belcastro ◽  
Joh Dokler ◽  
...  

The US FDA defines modified risk tobacco products (MRTPs) as products that aim to reduce harm or the risk of tobacco-related disease associated with commercially marketed tobacco products. Establishing a product's potential as an MRTP requires scientific substantiation, including toxicity studies and measures of disease risk relative to those of cigarette smoking. Best practices encourage verification of the data from such studies through sharing and open standards. Building on the experience gained from the OpenTox project, a proof-of-concept database and website (INTERVALS) has been developed to share results from both in vivo inhalation studies and in vitro studies conducted by Philip Morris International R&D to assess candidate MRTPs. As datasets are often generated by diverse methods and standards, they need to be traceable and curated, and the methods used need to be well described, so that knowledge can be gained using data science principles and tools. The data-management framework described here accounts for the latest standards of data sharing and research reproducibility. Curated data and methods descriptions have been prepared in ISA-Tab format and stored in a database accessible via a search portal on the INTERVALS website. The portal allows users to browse the data by study or mechanism (e.g., inflammation, oxidative stress) and obtain information relevant to study design, methods, and the most important results. Given the successful development of the initial infrastructure, the goal is to grow this initiative and establish a public repository for 21st-century preclinical systems toxicology MRTP assessment data and results that supports open data principles.
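ISA-Tab stores study and assay descriptions as tab-separated tables whose column headers follow conventions such as `Source Name`, `Characteristics[...]`, `Protocol REF`, `Sample Name` and `Factor Value[...]`. The sketch below, using only Python's `csv` module, reads a small study table and groups samples by one factor column, loosely analogous to browsing a study by an experimental condition. The table content, the `exposure` factor name and the helper `samples_by_factor` are hypothetical; this is not how the INTERVALS portal is implemented.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical excerpt of an ISA-Tab study file (s_*.txt): one row per sample.
# Header names follow ISA-Tab conventions; the values are invented placeholders.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\tFactor Value[exposure]\n"
    "src1\tMus musculus\tinhalation\tsample1\tsham\n"
    "src2\tMus musculus\tinhalation\tsample2\treference cigarette\n"
    "src3\tMus musculus\tinhalation\tsample3\tcandidate MRTP\n"
    "src4\tMus musculus\tinhalation\tsample4\tcandidate MRTP\n"
)


def samples_by_factor(table_text: str, factor_column: str) -> Dict[str, List[str]]:
    """Group sample names by the value of one Factor Value[...] column."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        groups[row[factor_column]].append(row["Sample Name"])
    return dict(groups)


groups = samples_by_factor(STUDY_TABLE, "Factor Value[exposure]")
print(groups)
```

Because the format is plain tab-separated text, the same tables can be read with any general-purpose tool, which is part of what makes curated ISA-Tab data reusable.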


2020 ◽  
Vol 6 ◽  
Author(s):  
Christoph Steinbeck ◽  
Oliver Koepler ◽  
Felix Bach ◽  
Sonja Herres-Pawlis ◽  
Nicole Jung ◽  
...  

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem, which serves the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is to develop and maintain a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative, easy-to-use services and novel scientific approaches based on the re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions, including data for their experimental and theoretical characterisation. This overarching goal is achieved by working towards a number of key objectives:
Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.
Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata, as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry in order to support the FAIR principles for research data. Finally, develop standards where they are lacking.
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research, and promote subsequent RDM at all levels of academia, beginning with undergraduate curricula.
Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness for, and foster the adoption of, FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers.
Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.
Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.
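Key Objective 2 calls for minimum information (MI) standards for machine-readable metadata. A minimal sketch of what checking a record against such a standard might look like is shown below. The checklist fields, the helper `missing_fields` and the example record are all hypothetical illustrations, not an NFDI4Chem specification; only the InChI string for water is a real identifier format.

```python
from typing import Dict, List

# Hypothetical minimum-information checklist for a chemistry dataset record.
# The field names are illustrative only; they are not an NFDI4Chem standard.
MI_CHECKLIST: List[str] = ["identifier", "title", "creator", "license", "instrument", "inchi"]


def missing_fields(record: Dict[str, str], checklist: List[str]) -> List[str]:
    """Return checklist fields that are absent or blank in a metadata record."""
    return [field for field in checklist if not record.get(field, "").strip()]


# Hypothetical machine-readable record (e.g. parsed from JSON) for a spectrum.
record = {
    "identifier": "hypothetical-dataset-001",
    "title": "NMR spectrum of water (placeholder)",
    "creator": "A. Researcher",
    "license": "CC-BY-4.0",
    "inchi": "InChI=1S/H2O/h1H2",  # standard InChI for water
}

gaps = missing_fields(record, MI_CHECKLIST)
print("passes checklist" if not gaps else f"missing: {gaps}")
```

Keeping the checklist as data rather than code means a community-agreed MI standard could be swapped in without changing the validation logic.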


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1190 ◽  
Author(s):  
Dario Pevec ◽  
Jurica Babic ◽  
Vedran Podobnik

Current trends show that the popularity of electric vehicles (EVs) has increased significantly over the last few years, causing changes not only in the transportation industry but in business and society generally. This paper covers one possible angle on the (r)evolution instigated by EVs: it provides a review, from a data science perspective, of the interdisciplinary area at the intersection of green transportation, energy informatics, and economics. Namely, the review summarizes data-driven research on EVs by identifying two main research streams: (i) socio-economic and (ii) socio-technical. The socio-economic stream includes research on (i) acceptance of green transportation in countries and among different populations, (ii) current trends in the EV market, and (iii) forecasting future sales in green transportation. The socio-technical stream includes research on (i) electric vehicle battery price and capacity and (ii) charging station management. This kind of study is especially important now, when the question is no longer whether the transition from internal-combustion engine vehicles to clean-fuel vehicles will happen, but how fast it will happen and what its implications will be for society, governmental policies, and industry. Based on the presented literature review, the paper also outlines the most significant open questions and challenges that are yet to be solved: (i) the scarcity of trustworthy (open) data and (ii) the design of a generalized methodology for charging station deployment.


The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of 'meaning' extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These challenges involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of 'Big data', open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data.


2019 ◽  
Author(s):  
Andrea Blasco ◽  
Michael G. Endres ◽  
Rinat A. Sergeev ◽  
Anup Jonchhe ◽  
Max Macaluso ◽  
...  

Summary: Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research where the use of competitions has yielded significant performance gains over established algorithms. These include algorithms for antibody clustering, imputing gene expression data, and querying the Connectivity Map (CMap). Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. The solutions produced through these competitions are then examined with respect to their utility and the prospects for implementation in the field. We present the decision process and competition design considerations that led to these successful outcomes as a model for researchers who want to use competitions and non-domain crowds as collaborators to further their research.
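The summary notes that performance gains were evaluated quantitatively on held-out data. A minimal sketch of how a competition leaderboard might score an imputation-style task is shown below, assuming root-mean-square error (RMSE) on masked values as the metric. The metric choice, submission names and numbers are hypothetical and are not the scoring used in the competitions described.

```python
import math
from typing import Dict, List, Tuple


def rmse(predicted: List[float], truth: List[float]) -> float:
    """Root-mean-square error between predictions and held-out ground truth."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predictions and ground truth must be non-empty and equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


def leaderboard(submissions: Dict[str, List[float]], truth: List[float]) -> List[Tuple[str, float]]:
    """Rank submissions by RMSE on the masked values (lower is better)."""
    scores = [(name, rmse(preds, truth)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical held-out expression values that were hidden from competitors.
held_out = [2.0, 4.0, 6.0, 8.0]
submissions = {
    "baseline_mean": [5.0, 5.0, 5.0, 5.0],  # stand-in for an established method
    "entrant_a": [2.5, 3.5, 6.5, 7.5],
}

board = leaderboard(submissions, held_out)
for rank, (name, score) in enumerate(board, start=1):
    print(rank, name, round(score, 3))
```

Including an established method as one of the scored entries, as the baseline does here, makes any reported gain directly comparable on the same held-out data.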

