Data Science Meets High-Tech Manufacturing – The BTW 2021 Data Science Challenge

Author(s):  
Lucas Woltmann ◽  
Peter Volk ◽  
Michael Dinzinger ◽  
Lukas Gräf ◽  
Sebastian Strasser ◽  
...  

Abstract: For its third installment, the Data Science Challenge of the 19th symposium “Database Systems for Business, Technology and Web” (BTW) of the Gesellschaft für Informatik (GI) tackled the problem of predictive energy management in large production facilities. For the first time, this year’s challenge was organized as a cooperation between Technische Universität Dresden, GlobalFoundries, and ScaDS.AI Dresden/Leipzig. The challenge’s participants were given real-world production and energy data from the semiconductor manufacturer GlobalFoundries and had to predict the energy consumption of production equipment. The use of real-world data gave the participants hands-on experience of the challenges in Big Data integration and analysis. After a leaderboard-based preselection round, the accepted participants presented their approaches to an expert jury and audience in a hybrid format. In this article, we give an overview of the main points of the Data Science Challenge, such as its organization and the problem description. Additionally, the winning team presents its solution.

2021 ◽  
Author(s):  
Prasanta Pal ◽  
Shataneek Banerjee ◽  
Amardip Ghosh ◽  
David R. Vago ◽  
Judson Brewer

Knowingly or unknowingly, digital data is an integral part of our day-to-day lives; there is probably not a single day when we do not encounter some form of it. Data originates from diverse sources in various formats, of which time series are a special kind that captures the time evolution of a system under observation. However, capturing this temporal information in the context of data analysis is a highly non-trivial challenge. The discrete Fourier transform (DFT) is one of the most widely used methods for capturing the essence of time-series data. While this nearly 200-year-old mathematical transform has survived the test of time, real-world data sources violate some of the intrinsic properties the DFT presumes. Ad hoc noise and outliers fundamentally alter the true signature of the frequency-domain behavior of the signal of interest, and as a result the frequency-domain representation gets corrupted as well. We demonstrate that applying traditional digital filters as-is may not reveal an accurate description of the pristine time-series characteristics of the system under study. In this work, we analyze the issues of the DFT with real-world data and propose a method to address them by drawing on insights from modern data-science techniques, particularly our previous work SOCKS. Our results reveal that a dramatic improvement is possible by re-imagining the DFT in the context of real-world data with appropriate curation protocols. We argue that our proposed transformation, DFT21, would revolutionize the digital world in terms of accuracy, reliability, and information retrievability from raw data.
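The corrupting effect the abstract describes is easy to reproduce. The sketch below is our own illustration with a naive pure-Python DFT, not the authors' SOCKS or DFT21 method: a single outlier sample acts like an impulse and leaks a constant-magnitude term into every frequency bin, raising the off-peak noise floor of an otherwise clean spectrum.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (pure Python, for illustration)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 64
tone = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]  # clean 4-cycle sinusoid
spiked = tone[:]
spiked[10] += 5.0  # a single ad hoc outlier sample

mag_clean = [abs(c) for c in dft(tone)]
mag_spiked = [abs(c) for c in dft(spiked)]

# The outlier contributes magnitude ~5 to *every* bin, so the off-peak
# "noise floor" jumps from essentially zero to ~5 across the spectrum.
off_peak = [k for k in range(N) if k not in (4, N - 4)]
floor_clean = max(mag_clean[k] for k in off_peak)
floor_spiked = max(mag_spiked[k] for k in off_peak)
```

The dominant peak at bin 4 survives, but the corrupted floor is exactly the kind of frequency-domain damage that filtering the raw data "as-is" cannot undo, motivating the curation protocols the authors propose.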




The importance of data science and machine learning is evident in every domain where data is generated. Multi-aspect analysis and visualizations help society come up with useful solutions and formulate policies. This paper takes live data from the current coronavirus (COVID-19) pandemic and presents multi-faceted views of the data to help authorities and governments take appropriate decisions to tackle this unprecedented problem. Python and its libraries, together with the Google Colab platform, are used to produce the results. The best available techniques and combinations of modules/libraries are used to present the information related to COVID-19.
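The paper does not reproduce its code; as a minimal sketch of one such view, assuming nothing beyond the Python standard library, here is the kind of trailing 7-day moving average commonly used to smooth daily case counts (the numbers below are illustrative, not real COVID-19 data):

```python
def moving_average(series, window=7):
    """Trailing moving average; early points use the shorter prefix available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

daily_cases = [5, 8, 13, 21, 34, 55, 89, 144, 233, 377]  # hypothetical counts
smoothed = moving_average(daily_cases)
```

In a notebook environment such as Colab, `smoothed` would typically be plotted alongside `daily_cases` (e.g. with matplotlib) to reveal the underlying trend behind day-to-day reporting noise.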


2018 ◽  
Vol 11 (5) ◽  
pp. 450-460 ◽  
Author(s):  
Brandon Swift ◽  
Lokesh Jain ◽  
Craig White ◽  
Vasu Chandrasekaran ◽  
Aman Bhandari ◽  
...  

2017 ◽  
Vol 11 (2) ◽  
pp. 13-26 ◽  
Author(s):  
Liz Lyon ◽  
Eleanor Mattern

This study reports on the findings from Part 2 of a small-scale analysis of requirements for real-world data science positions and examines three further data science roles: data analyst, data engineer and data journalist. The study examines recent job descriptions and maps their requirements to the current curriculum within the graduate MLIS and Information Science and Technology Masters Programs in the School of Information Sciences (iSchool) at the University of Pittsburgh. From this mapping exercise, model ‘course pathways’ and module ‘stepping stones’ have been identified, as well as course topic gaps and opportunities for collaboration with other Schools. Competency in four specific tools or technologies was required by all three roles (Microsoft Excel, R, Python and SQL), as well as collaborative skills (with both teams of colleagues and with clients). The ability to connect the educational curriculum with real-world positions is viewed as further validation of the translational approach being developed as a foundational principle of the current MLIS curriculum review process.


2021 ◽  
Author(s):  
Rhonda Facile ◽  
Erin Elizabeth Muhlbradt ◽  
Mengchun Gong ◽  
Qing-Na Li ◽  
Vaishali B. Popat ◽  
...  

BACKGROUND Real World Data (RWD) and Real World Evidence (RWE) have an increasingly important role in clinical research and health care decision making in many countries. In order to leverage RWD and generate reliable RWE, a framework must be in place to ensure that the data is well-defined and structured in a way that is semantically interoperable and consistent across stakeholders. The adoption of data standards is one of the cornerstones supporting high-quality evidence for clinical medicine and therapeutics development. CDISC data standards are mature, globally recognized and heavily utilized by the pharmaceutical industry for regulatory submission in the US and Japan and are recommended in Europe and China. Against this backdrop, the CDISC RWD Connect Initiative was initiated to better understand the barriers to implementing CDISC standards for RWD and to identify the tools and guidance needed to more easily implement CDISC standards for this purpose. We believe that bridging the gap between RWD and clinical trial generated data will benefit all stakeholders. OBJECTIVE The aim of this project was to understand the barriers to implementing CDISC standards for Real World Data (RWD) and to identify what tools and guidance may be needed to more easily implement CDISC standards for this purpose. METHODS We conducted a qualitative Delphi survey involving an Expert Advisory Board (EAB) with multiple key stakeholders, with three rounds of input and review. RESULTS In total, 66 experts participated in round 1, 56 participated in round 2 and 49 participated in round 3 of the Delphi Survey. Their input was collected and analyzed culminating in group statements. It was widely agreed that the standardization of RWD is highly necessary, and the primary focus should be on its ability to improve data-sharing and the quality of RWE. 
The priorities for RWD standardization include electronic health records, such as data shared using HL7 FHIR, and data stemming from observational studies. With different standardization efforts already underway in these areas, a gap analysis should be performed to identify areas where synergies and efficiencies are possible, followed by collaboration with stakeholders to create, or extend existing, mappings between CDISC and other standards, controlled terminologies and models to represent data originating across different sources. CONCLUSIONS Many ongoing data standardization efforts span the spectrum of human health data related activities, including but not limited to healthcare, public health, product or disease registries and clinical research, each with different definitions, levels of granularity and purposes. Among these efforts, CDISC has been successful in standardizing clinical trial-based data for regulation worldwide. However, the complexity of the CDISC standards, the fact that they were developed for a different purpose, a lack of awareness and of incentives to adopt a new standard, and insufficient training and implementation support are significant barriers to using CDISC standards for RWD. Collecting and disseminating use cases that show in detail how to effectively implement CDISC standards for RWD, developing tools and support systems specifically for the RWD community, and collaborating with other standards development organizations and initiatives are potential steps towards connecting RWD to research. The integrity of RWE depends on the quality of the RWD and on the data standards used in its collection, integration, processing, exchange and reporting. Using CDISC as part of the database schema will help to link clinical trial data and RWD and promote innovation in health data science.
The authors believe that CDISC standards, if adapted carefully and presented appropriately to the RWD community, can provide “FAIR” structure and semantics for common clinical concepts and domains and help to bridge the gap between RWD and clinical trial generated data. CLINICALTRIAL: Not applicable


2020 ◽  
Vol 107 (4) ◽  
pp. 719-721 ◽  
Author(s):  
Larsson Omberg ◽  
Elias Chaibub Neto ◽  
Lara M. Mangravite

Research ecosystems within university environments are continuously evolving and require more resources and domain specialists to assist with the data lifecycle. Academic researchers and professionals are typically overcommitted, making it challenging to stay up to date on recent developments in best practices for data management, curation, transformation, analysis, and visualization. Recently, research groups, university core centers, and libraries have been revitalizing these services to fill the gaps, helping researchers find new tools and approaches that make their work more impactful, sustainable, and replicable. In this paper, we report on a student consultation program built within the University Libraries that takes an innovative, student-centered approach to meeting the research data needs of a university environment while also providing students with experiential learning opportunities. This student program, DataBridge, trains students to work in multi-disciplinary teams and as student consultants assisting faculty, staff, and students with their real-world, data-intensive research challenges. Centering DataBridge in the Libraries gives students the unique opportunity to work across all disciplines, on problems and in domains that some students might not otherwise encounter during their college careers. To encourage students from multiple disciplines to participate, we developed a scaffolded curriculum that allows students of any discipline and skill level to quickly develop essential data science skill sets and begin contributing their own unique perspectives and specializations to the research consultations. These students, mentored by Informatics faculty in the Libraries, provide research support that can ultimately impact the entire research process.
Through our pilot phase, we have found that DataBridge enhances the utilization and openness of data created through research, extends the reach and impact of the work beyond the researcher’s specialized community, and creates a network of student “data champions” across the University who see the value in working with the Library. Here, we describe the evolution of the DataBridge program and outline its unique role both in training the data stewards of the future in FAIR data practices and in contributing significant value to research projects at Virginia Tech. Ultimately, this work highlights the need for innovative, strategic programs that give current researchers real-world experience of data curation, data analysis, and data publication, all while training the next generation of researchers in these best practices.


Author(s):  
Taehee Kim ◽  
Cheolwoo Ro ◽  
Kiho Suh

Anomaly detection is in wide demand in fields that require automated detection of anomalous conditions across many observation tasks. While conventional data science approaches have shown interesting results, deep learning approaches to anomaly detection open new possibilities, especially where massive amounts of data must be handled. We develop anomaly detection applications for city-train vibration data using deep learning approaches. We carried out preliminary research on anomaly detection in general and applied our real-world data to existing solutions. In this paper, we provide a survey of anomaly detection and analyse the results of our experiments using deep learning approaches.
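The "conventional data science approaches" the abstract contrasts with deep learning can be as simple as a z-score threshold over the signal. The sketch below is a classical baseline of that kind on a synthetic trace, not the authors' deep model or their actual vibration data:

```python
import statistics

def zscore_anomalies(signal, threshold=3.0):
    """Return indices whose deviation from the mean exceeds `threshold`
    standard deviations (a classical baseline, not a learned model)."""
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    if sd == 0:
        return []  # constant signal: nothing can be anomalous
    return [i for i, v in enumerate(signal) if abs(v - mu) / sd > threshold]

# Synthetic "vibration" trace: flat with one spike at index 50.
trace = [0.0] * 50 + [10.0] + [0.0] * 49
```

Such thresholding works for isolated point anomalies like the spike above, but breaks down for contextual or collective anomalies in high-volume sensor streams, which is precisely where the deep learning approaches surveyed in the paper become attractive.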


2016 ◽  
Vol 22 ◽  
pp. 219
Author(s):  
Roberto Salvatori ◽  
Olga Gambetti ◽  
Whitney Woodmansee ◽  
David Cox ◽  
Beloo Mirakhur ◽  
...  
