data science Latest Research Papers

Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia

Fuel ◽

10.1016/j.fuel.2021.123098 ◽

2022 ◽

Vol 314 ◽

pp. 123098

Author(s):

Muhammad Mohsin ◽

Sobia Naseem ◽

Muddassar Sarfraz ◽

Tamoor Azam

Keyword(s):

Foreign Direct Investment ◽

Energy Consumption ◽

Central Asia ◽

Direct Investment ◽

Co2 Emission ◽

Data Science ◽

Science Evidence

Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

ACM Transactions on Computer-Human Interaction ◽

10.1145/3489465 ◽

2022 ◽

Vol 29 (2) ◽

pp. 1-33

Author(s):

April Yi Wang ◽

Dakuo Wang ◽

Jaimie Drozdal ◽

Michael Muller ◽

Soya Park ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

Source Code ◽

Generation System ◽

Document Code ◽

Human Data ◽

Within Subjects ◽

The Creation ◽

Api Documentation

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.
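The "user prompt approach" above nudges users to write documentation. The abstract does not say when Themisto shows such a prompt, so the sketch below uses a hypothetical heuristic of our own, not Themisto's logic: flag every non-empty code cell that has no non-empty markdown cell directly before it. The `Cell` type and the `undocumented_code_cells` function are illustrative names. The two fields mirror the `cell_type` and `source` keys of a Jupyter notebook cell.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Mirrors the two fields of a Jupyter notebook cell that matter here.
    cell_type: str  # "code" or "markdown"
    source: str


def undocumented_code_cells(cells):
    """Return indices of non-empty code cells that are not directly preceded
    by a non-empty markdown cell (hypothetical candidates for a nudge)."""
    flagged = []
    for i, cell in enumerate(cells):
        # Only non-empty code cells are candidates.
        if cell.cell_type != "code" or not cell.source.strip():
            continue
        prev = cells[i - 1] if i > 0 else None
        documented = (
            prev is not None
            and prev.cell_type == "markdown"
            and bool(prev.source.strip())
        )
        if not documented:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    notebook = [
        Cell("markdown", "# Load data"),
        Cell("code", "import pandas as pd"),
        Cell("code", "df = pd.read_csv('train.csv')"),
        Cell("markdown", "## Fit model"),
        Cell("code", "model.fit(X, y)"),
    ]
    print(undocumented_code_cells(notebook))  # [2]
```

A real system would likely weigh more context than the immediately preceding cell. This rule only shows where a nudge could attach.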

Data science in the business environment: Insight management for an Executive MBA

The International Journal of Management Education ◽

10.1016/j.ijme.2021.100588 ◽

2022 ◽

Vol 20 (1) ◽

pp. 100588

Author(s):

Jing Lu

Keyword(s):

Data Science ◽

Business Environment ◽

Executive Mba

Adventures in Financial Data Science

10.1142/12678 ◽

2022 ◽

Author(s):

Graham L Giller

Keyword(s):

Data Science ◽

Financial Data

GeCoAgent: A Conversational Agent for Empowering Genomic Data Extraction and Analysis

ACM Transactions on Computing for Healthcare ◽

10.1145/3464383 ◽

2022 ◽

Vol 3 (1) ◽

pp. 1-29

Author(s):

Pietro Crovari ◽

Sara Pidò ◽

Pietro Pinoli ◽

Anna Bernasconi ◽

Arif Canakoglu ◽

...

Keyword(s):

Data Analysis ◽

Data Science ◽

Low Cost ◽

Data Extraction ◽

End Users ◽

Genomic Information ◽

Computational Tools ◽

Human Genomics ◽

Novel Approach ◽

General Data

With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information to build new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools. This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogue-based interface, driven by a chatbot, to support end-users' interaction with computational tools, accompanied by multi-modal support. As the dialogue progresses, the user is guided through extracting the relevant data from repositories and then performing data analysis, which often requires the use of statistical methods or machine learning. Results are returned using simple representations (spreadsheets and graphics), and at the end of a session the dialogue is summarized in textual format. The innovation presented in this article concerns not only the delivery of a new tool but also our novel approach to conversational technologies, potentially extensible to other healthcare domains or to general data science.

Differentially Private Medical Texts Generation Using Generative Neural Networks

ACM Transactions on Computing for Healthcare ◽

10.1145/3469035 ◽

2022 ◽

Vol 3 (1) ◽

pp. 1-27

Author(s):

Md Momin Al Aziz ◽

Tanbir Ahmed ◽

Tasnia Faequa ◽

Xiaoqian Jiang ◽

Yiyu Yao ◽

...

Keyword(s):

Data Science ◽

Medical Information ◽

Healthcare Providers ◽

Well Being ◽

Original Text ◽

Sensitive Information ◽

Classification Problems ◽

Medical Texts ◽

Health Records ◽

Content Type

Technological advancements in data science have offered us affordable storage and efficient algorithms to query a large volume of data. Our health records are a significant part of this data, which is pivotal for healthcare providers and can be used to support our well-being. The clinical note in electronic health records is one such category: it collects a patient's complete medical information at different stages of patient care in the form of free text. These unstructured textual notes therefore contain events from a patient's admission to discharge, which can prove significant for future medical decisions. However, since these texts also contain sensitive information about the patient and the attending medical professionals, such notes cannot be shared publicly. This privacy issue has thwarted timely discoveries from this plethora of untapped information. Therefore, in this work, we generate synthetic medical texts from a private or sanitized (de-identified) clinical text corpus and rigorously analyze their utility across different metrics and levels. Experimental results support the applicability of our generated data, as it achieves more than 80% accuracy in different pragmatic classification problems and matches (or outperforms) the original text data.

Impact on Stock Market across Covid-19 Outbreak

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2022.39874 ◽

2022 ◽

Vol 10 (1) ◽

pp. 572-577

Author(s):

Charmi Gotecha

Keyword(s):

Seasonal Changes ◽

Global Economy ◽

Data Science ◽

Stock Exchange ◽

Paper Analysis ◽

Fiscal Year ◽

Stock Exchanges ◽

The World ◽

The Impact ◽

Global Stock

This paper analyses the impact of the pandemic on global stock exchanges. Stock listing values are determined by a variety of factors, including seasonal changes, catastrophic calamities, pandemics, fiscal year changes and many more. This paper provides an analysis of the variation in listing prices during the worldwide outbreak of the novel coronavirus. The key reason for focusing on this outbreak was to provide insight into the underlying regulation of stock exchanges. Daily closing prices of the stock indices from January 2017 to January 2022 have been used for the analysis. The main aim of the research is to analyse whether a downturn in the global economy affects financial stock exchanges. Keywords: Stock Exchange, Matplotlib, Streamlit, Data Science, Web Scraping.
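The analysis above works from daily closing prices. One simple way to quantify "variation of listing price" is to convert closes into daily simple returns and compare their mean and standard deviation before and after a cutoff date. This is a minimal sketch under that assumption; the paper does not state its exact method. The function names are hypothetical, and the prices in the example are made-up toy values, not index data. A cutoff such as 11 March 2020, when the WHO characterized COVID-19 as a pandemic, is one possible choice.

```python
from datetime import date
from statistics import mean, stdev


def daily_returns(closes):
    """closes: list of (date, close) sorted by date.
    Returns (date, simple return) pairs, where return = close_t / close_{t-1} - 1."""
    return [(d1, c1 / c0 - 1.0) for (d0, c0), (d1, c1) in zip(closes, closes[1:])]


def compare_periods(closes, cutoff):
    """Mean and standard deviation of daily returns before and on/after `cutoff`.
    Each period needs at least two returns for stdev to be defined."""
    rets = daily_returns(closes)
    before = [r for d, r in rets if d < cutoff]
    after = [r for d, r in rets if d >= cutoff]
    return {
        "before": (mean(before), stdev(before)),
        "after": (mean(after), stdev(after)),
    }


if __name__ == "__main__":
    # Toy closing prices for illustration only.
    toy = [
        (date(2020, 3, 2), 100.0), (date(2020, 3, 3), 102.0),
        (date(2020, 3, 4), 101.0), (date(2020, 3, 5), 103.0),
        (date(2020, 3, 11), 95.0), (date(2020, 3, 12), 90.0),
        (date(2020, 3, 13), 92.0),
    ]
    print(compare_periods(toy, date(2020, 3, 11)))
```

Returns rather than raw prices are compared because they are scale-free, which lets indices with very different price levels be placed side by side.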

Information Resilience: the nexus of responsible and agile approaches to information use

The VLDB Journal ◽

10.1007/s00778-021-00720-2 ◽

2022 ◽

Author(s):

Shazia Sadiq ◽

Amir Aryani ◽

Gianluca Demartini ◽

Wen Hua ◽

Marta Indulska ◽

...

Keyword(s):

Case Studies ◽

Data Privacy ◽

Data Science ◽

Information Use ◽

Regulatory Compliance ◽

Future Research ◽

Public And Private ◽

Social Good ◽

Public And Private Sector ◽

Effective Use

The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.

qEEG Analysis in the Diagnosis of Alzheimer's Disease; a Comparison of Functional Connectivity and Spectral Analysis

10.1101/2022.01.10.475756 ◽

2022 ◽

Author(s):

Maria Semeli Frangopoulou ◽

Maryam Alimardani

Keyword(s):

Spectral Analysis ◽

Functional Connectivity ◽

Data Science ◽

Phase Synchronization ◽

Connectivity Analysis ◽

Alzheimer's Disease ◽

Brain Disorder ◽

Alternative Analysis ◽

Functional Connectivity Analysis ◽

Analysis Methods

Alzheimer's disease (AD) is a brain disorder mainly characterized by a progressive degeneration of neurons in the brain, causing a decline in cognitive abilities and difficulties with day-to-day activities. This study compares an FFT-based spectral analysis against a functional connectivity analysis based on phase synchronization, for finding known differences between AD patients and Healthy Control (HC) subjects. Both quantitative analysis methods were applied to a dataset comprising bipolar EEG montage values from 20 diagnosed AD patients and 20 age-matched HC subjects. Additionally, an attempt was made to localize the identified AD-induced effects on brain activity in AD patients. The obtained results showed the advantage of the functional connectivity analysis method over a simple spectral analysis. Specifically, while spectral analysis could not find any significant differences between the AD and HC groups, the functional connectivity analysis showed statistically higher synchronization levels in the AD group in the lower frequency bands (delta and theta), suggesting that the AD patients' brains are in a phase-locked state. Further comparison of functional connectivity between homotopic regions confirmed that the traits of AD were localized in the centro-parietal and centro-temporal areas in the theta frequency band (4-8 Hz). The contribution of this study is that it applies a neural metric for Alzheimer's detection from a data science perspective rather than a neuroscience one. The study shows that combining bipolar derivations with phase synchronization yields results similar to those of comparable studies employing alternative analysis methods.
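The two methods compared above can be contrasted in a small sketch. Spectral analysis sums DFT power within a frequency band for one channel. Phase synchronization compares the instantaneous phases of two channels. The abstract does not name its synchronization metric. The phase locking value (PLV) used here is one common choice and is an assumption. The function names are hypothetical, and the code uses a naive O(n²) DFT in pure Python for clarity rather than an FFT library. In practice the signals would first be band-pass filtered to delta or theta. That step is omitted here, and the test signals are already pure sinusoids.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (an FFT computes the same values faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_power(x, fs, lo, hi):
    """Spectral analysis: sum of squared DFT magnitudes for bins with lo <= f < hi Hz."""
    n = len(x)
    X = dft(x)
    return sum(abs(X[k]) ** 2 for k in range(n // 2 + 1) if lo <= k * fs / n < hi)


def analytic_phase(x):
    """Instantaneous phase from the analytic signal (DFT-based Hilbert transform):
    keep DC (and Nyquist for even n), double positive frequencies, zero negative ones."""
    n = len(x)
    X = dft(x)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    z = idft([X[k] * h[k] for k in range(n)])
    return [cmath.phase(v) for v in z]


def phase_locking_value(x, y):
    """PLV = |mean(exp(i * (phi_x - phi_y)))|: 1 for a constant phase lag, near 0 for none."""
    px, py = analytic_phase(x), analytic_phase(y)
    return abs(sum(cmath.exp(1j * (a - b)) for a, b in zip(px, py)) / len(px))


if __name__ == "__main__":
    fs, n = 64, 128  # 2 s of synthetic signal sampled at 64 Hz
    t = [i / fs for i in range(n)]
    a = [math.sin(2 * math.pi * 6 * s) for s in t]          # 6 Hz (theta)
    b = [math.sin(2 * math.pi * 6 * s + 0.8) for s in t]    # same frequency, lagged
    print(band_power(a, fs, 4, 8), phase_locking_value(a, b))
```

Two channels can have identical band power yet differ in phase coupling. That difference is what a connectivity analysis can pick up and a per-channel spectral analysis cannot.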

Big Data Analytics for Long-Term Meteorological Observations at Hanford Site

Atmosphere ◽

10.3390/atmos13010136 ◽

2022 ◽

Vol 13 (1) ◽

pp. 136

Author(s):

Huifen Zhou ◽

Huiying Ren ◽

Patrick Royer ◽

Hongfei Hou ◽

Xiao-Ying Yu

Keyword(s):

Big Data ◽

Data Analytics ◽

Data Science ◽

Meteorological Data ◽

Extreme Weather ◽

Department Of Energy ◽

Extreme Weather Events ◽

Weather Events ◽

Classical Statistics

The growing number of physical objects with embedded sensors, which typically generate high-volume, frequently updated data sets, has accentuated the need to develop methodologies for extracting useful information from big data to support decision making. This study applies a suite of data analytics and core principles of data science to characterize near real-time meteorological data with a focus on extreme weather events. To highlight the applicability of this work and make it more accessible from a risk management perspective, a foundation for a software platform with an intuitive Graphical User Interface (GUI) was developed to access and analyze data from a decommissioned nuclear production complex operated by the U.S. Department of Energy (DOE, Richland, USA). Exploratory data analysis (EDA), involving classical non-parametric statistics, and machine learning (ML) techniques were used to develop statistical summaries and learn characteristic features of key weather patterns and signatures. The new approach and GUI provide key insights into using big data and ML to assist site operations related to safety management strategies for extreme weather events. Specifically, this work offers a practical guide to analyzing long-term meteorological data and highlights the integration of ML and classical statistics in applied risk and decision science.
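The study's EDA relies on classical non-parametric statistics. The sketch below shows one common pattern of that kind, as an assumption rather than the authors' procedure. It summarizes a series with its median, interquartile range and 99th percentile, then groups consecutive readings above a threshold into extreme-event episodes. The 99th-percentile threshold, the `min_length` parameter and the function names are hypothetical. The example readings are toy values.

```python
from statistics import median, quantiles


def summarize(values):
    """Non-parametric summary: median, interquartile range (IQR) and 99th percentile."""
    q = quantiles(values, n=100)  # 99 cut points: q[24] = P25, q[74] = P75, q[98] = P99
    return {"median": median(values), "iqr": q[74] - q[24], "p99": q[98]}


def extreme_events(values, threshold, min_length=1):
    """Group consecutive readings strictly above `threshold` into (start, end)
    index ranges (end exclusive), keeping runs of at least `min_length` readings."""
    events, start = [], None
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i  # a run of exceedances begins
        elif v <= threshold and start is not None:
            if i - start >= min_length:
                events.append((start, i))
            start = None  # the run has ended
    if start is not None and len(values) - start >= min_length:
        events.append((start, len(values)))  # run continues to the end of the series
    return events


if __name__ == "__main__":
    gusts = [3, 4, 5, 4, 20, 22, 5, 4, 3, 25, 4]  # toy wind-gust readings
    print(extreme_events(gusts, threshold=10))                # [(4, 6), (9, 10)]
    print(extreme_events(gusts, threshold=10, min_length=2))  # [(4, 6)]
```

With real data the threshold would typically come from `summarize(history)["p99"]` computed over a long baseline record rather than a fixed number.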
