Methodological framework for data processing based on the Data Science paradigm

Author(s):  
F. Pacheco ◽  
C. Rangel ◽  
J. Aguilar ◽  
M. Cerrada ◽  
J. Altamiranda
2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
J Doetsch ◽  
I Lopes ◽  
R Redinha ◽  
H Barros

Abstract The usage and exchange of “big data” is at the forefront of the data science agenda, where Record Linkage plays a prominent role in biomedical research. In an era of ubiquitous data exchange and big data, Record Linkage is almost inevitable, but it raises ethical and legal problems, namely personal data and privacy protection. Record Linkage refers to the merging of data to consolidate facts about an individual or an event that are not available in any single record. This article provides an overview of ethical challenges and research opportunities in linking routine data on health and education with cohort data from very preterm (VPT) infants in Portugal. Portuguese, European and international law on data processing, protection and privacy was reviewed. A three-stage analysis was carried out: i) the interplay of the threefold levelling of law for Record Linkage at different levels; ii) the impact of data protection and privacy rights on data processing; iii) the challenges and opportunities of the data linkage process for research. A framework to discuss the process and its implications for data protection and privacy was created. The GDPR is the most substantial legal basis for the protection of personal data in Record Linkage, and explicit written consent is considered the appropriate basis for processing sensitive data. In Portugal, retrospective access to routine data is permitted if the data are anonymised; for health data, if the processing meets requirements declared with explicit consent; for education data, if the data processing rules are complied with. Routine health and education data can be linked to cohort data if the rights of the data subject and the requirements and duties of processors and controllers are respected.
A strong ethical context, through the application of the GDPR in all phases of research, needs to be established to achieve Record Linkage between cohort and routinely collected health and education records of VPT infants in Portugal. Key messages: The GDPR is the most important legal framework for the protection of personal data; however, its uniform approach, which grants freedom to its Member States, hampers Record Linkage processes among EU countries. The question remains whether the gap between data protection and privacy is adequately balanced at the three legal levels to guarantee freedom for research and the improvement of the health of data subjects.
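At a technical level, the Record Linkage described above reduces to joining records that share an identifier. A minimal pandas sketch follows; the table and column names are hypothetical, and pseudonymized keys stand in for direct identifiers, in keeping with the anonymisation requirements discussed:

```python
import pandas as pd

# Hypothetical pseudonymized tables; fields are illustrative only.
cohort = pd.DataFrame({
    "pseudo_id": ["a1", "a2", "a3"],
    "gestational_age_weeks": [29, 31, 27],
})
routine_health = pd.DataFrame({
    "pseudo_id": ["a1", "a3", "a4"],
    "readmissions": [2, 0, 1],
})

# Left join keeps every cohort infant; the indicator column flags
# cohort records that found no match in the routine data.
linked = cohort.merge(routine_health, on="pseudo_id",
                      how="left", indicator=True)
print(linked)
```

A left join is the natural choice here: the cohort is the population of interest, and unlinked infants must remain visible so that linkage coverage can itself be audited.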


Author(s):  
Sabrina Lechler ◽  
Angelo Canzaniello ◽  
Bernhard Roßmann ◽  
Heiko A. von der Gracht ◽  
Evi Hartmann

Purpose Particularly in volatile, uncertain, complex and ambiguous (VUCA) business conditions, staff in supply chain management (SCM) look to real-time (RT) data processing to reduce uncertainties. However, such expectations rest on the premise that data processing can be perfectly mastered, which does not reflect reality. The purpose of this paper is to investigate whether RT data processing reduces SCM uncertainties under real-world conditions. Design/methodology/approach To facilitate communication on the research question, a Delphi expert survey was conducted to identify the challenges of RT data processing in SCM operations and to assess its influence on the reduction of SCM uncertainty. In total, 14 prospective statements concerning RT data processing in SCM operations were developed and evaluated by 68 SCM and data-science experts. Findings RT data processing was found to have an ambivalent influence on the reduction of SCM complexity and the associated uncertainty. Analysis of the data collected from the study participants revealed a new type of uncertainty related to SCM data itself. Originality/value This paper discusses the challenges of gathering relevant, timely and accurate data sets in VUCA environments and creates awareness of the relationship between data-related uncertainty and SCM uncertainty. It thus provides valuable insights for practitioners and a basis for further research on this subject.


Author(s):  
Andrew McCullum

In 2015, Central Asia made some vital improvements in the environment for cross-border e-commerce: Kazakhstan's accession to the World Trade Organization (WTO) will help trade transparency, while the Kyrgyz Republic's membership in the Eurasian Customs Union expands its consumer base. Why e-commerce? Two reasons. First, e-commerce reduces the cost of distance. Central Asia is the highest-trade-cost region in the world: vast distances from major markets make finding buyers challenging, shipping goods slow, and export costs high. Second, e-commerce can attract populations that are traditionally under-represented in export markets, such as women, small businesses and rural entrepreneurs.


Author(s):  
Wajid Ali ◽  
Muhammad Usman Shafique ◽  
Muhammad Arslan Majeed ◽  
Muhammad Faizan ◽  
Ahmad Raza

Data Science has emerged as an important discipline, and education in it is essential for success in almost every aspect of life. We are now in the age of Big Data: it affects all aspects of our lives, and society is acknowledging this. Data processing and other techniques are combined to convert abundant data into valuable information for society, organizations, and individuals. Specific strategies and approaches are needed to better educate future data scientists to overcome the challenges of Big Data. In this paper, we discuss the general concept of data science, Big Data, and the areas of Big Data computing.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zichun Tian

This paper uses Python and its external data processing packages to conduct an in-depth machine-learning analysis of Airbnb review data. Increasingly, travellers now use Airbnb instead of staying in traditional hotels. However, in such a growing and competitive Airbnb market, many hosts may find it difficult to make their listings stand out among the many available. With the development of data science, large amounts of data can now be analysed to obtain compelling evidence that helps Airbnb hosts find patterns common to popular properties. By learning and emulating these patterns, many hosts may be able to increase the popularity of their own properties. Using Python to analyse data covering all aspects of Airbnb listings, the author proposes to test for and identify correlations between certain variables and listing popularity. To ensure that the results are representative and general, the author used a database containing multidimensional details and information about Airbnb listings to date. The author uses the Pandas, NLTK, and matplotlib packages to process and visualize the data. Finally, the author makes recommendations to Airbnb hosts based on the evidence generated from the data. In the future, the author will build on this work to further optimize the design.
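The kind of correlation analysis described can be illustrated with a minimal pandas sketch; the toy table and column names below are hypothetical and do not reflect the actual Airbnb dataset schema used in the paper:

```python
import pandas as pd

# Hypothetical toy listing table; columns are illustrative only.
listings = pd.DataFrame({
    "price":       [50, 80, 120, 200, 65, 90],
    "num_reviews": [120, 95, 60, 15, 110, 70],
    "avg_rating":  [4.8, 4.6, 4.4, 4.1, 4.7, 4.5],
})

# Pairwise Pearson correlations against the review count, which often
# serves as a crude popularity proxy in Airbnb analyses.
corr = listings.corr()["num_reviews"].drop("num_reviews")
print(corr)
```

On real data, such a correlation table is only a starting point: it flags candidate variables (price, rating, amenities) whose relationship with popularity is then examined more carefully.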


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1438
Author(s):  
Carlos Alberto de Bragança Pereira ◽  
Adriano Polpo ◽  
Agatha Sacramento Rodrigues

With the increase in data processing and storage capacity, a large amount of data is available [...]


2022 ◽  
Vol 15 (1) ◽  
pp. 1-20
Author(s):  
Ravinder Kumar ◽  
Lokesh Kumar Shrivastav

Designing a system for the analytics of high-frequency data (Big Data) is a challenging and crucial task in data science. Big Data analytics involves the development of efficient machine learning algorithms and Big Data processing techniques or frameworks. Today, data processing systems are in high demand for processing high-frequency data efficiently. This paper proposes the processing and analytics of stochastic high-frequency stock market data using a suitably modified Gradient Boosting Machine (GBM). The experimental results obtained are compared with deep learning and Auto-Regressive Integrated Moving Average (ARIMA) methods. The results obtained using the modified GBM achieve the highest accuracy (R2 = 0.98) and the minimum error (RMSE = 0.85) compared with the other two approaches.
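The paper's modification of the GBM is not detailed here, but the core of any GBM, squared-error gradient boosting of regression stumps on residuals, can be sketched from scratch. The data below is synthetic, and n_rounds and lr are illustrative hyperparameters, not the paper's settings:

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold split on one feature minimizing squared error."""
    order = np.argsort(x)
    xs, rs = x[order], residual[order]
    csum, total, n = np.cumsum(rs), rs.sum(), len(rs)
    best_err, best = np.inf, None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue
        left_mean = csum[i - 1] / i
        right_mean = (total - csum[i - 1]) / (n - i)
        # SSE = const - between-group term, so minimize the negated term.
        err = -(i * left_mean**2 + (n - i) * right_mean**2)
        if err < best_err:
            best_err = err
            best = ((xs[i] + xs[i - 1]) / 2, left_mean, right_mean)
    return best

def gbm_fit(x, y, n_rounds=100, lr=0.1):
    """Boost stumps on residuals; lr shrinks each stage's contribution."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        thr, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= thr, lv, rv)
        stumps.append((thr, lv, rv))
    return f0, stumps

def gbm_predict(x, f0, stumps, lr=0.1):
    pred = np.full(len(x), f0, dtype=float)
    for thr, lv, rv in stumps:
        pred += lr * np.where(x <= thr, lv, rv)
    return pred

# Synthetic noisy signal standing in for a high-frequency series.
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 300)
y = np.sin(x) + rng.normal(0, 0.1, x.size)

f0, stumps = gbm_fit(x, y)
pred = gbm_predict(x, f0, stumps)
rmse = np.sqrt(np.mean((y - pred) ** 2))
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Production GBMs (and presumably the paper's modified variant) differ in using deeper trees, many features, and regularization, but the residual-fitting loop above is the mechanism they all share.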


2020 ◽  
Vol 64 (6) ◽  
pp. 368-372
Author(s):  
Aleksandr A. Zavyalov ◽  
Dmitry A. Andreev

Introduction. In Moscow, state-of-the-art information technologies for cancer care data processing are widely used in routine practice. Data Science approaches are increasingly applied in the field of radiation oncology, and novel arrays of radiotherapy performance indices can be introduced into real-time cancer care quality and safety monitoring. Purpose of the study. To briefly review the critical structural elements of automated Big Data processing and its prospects in light of the organization of internal quality and safety control in radiation oncology departments. Material and methods. The PubMed (Medline) and E-Library databases were searched for articles published mainly in the last 2-3 years; in total, about 20 reports were selected. Results. This paper highlights the applicability of next-generation Data Science approaches to quality and safety assurance in radiation oncology units. The structural pillars of automated Big Data processing are considered. Big Data processing technologies can facilitate improvements in quality management at any radiotherapy stage. At the same time, high requirements for the quality and integrity of the indices in the databases are crucial. Detailed dose data may also be linked to outcome and survival indices integrated into larger registries. Discussion. Radiotherapy quality control could be automated to some extent through the further introduction of information technologies that compare real-time quality measures with digital targets in terms of minimum norms and standards. The implementation of automated systems generating early electronic notifications and rapid alerts in case of serious quality violations could drastically improve internal medical processes in local clinics. Conclusion. The role of Big Data tools in internal quality and safety control will increase dramatically over time.


Author(s):  
Janga Vijay Kumar ◽  
Syed Abdul Moeed ◽  
C. Madan Kumar ◽  
G. Ashmitha
