DFT21: Discrete Fourier Transform in the 21st century

2021 ◽  
Author(s):  
Prasanta Pal ◽  
Shataneek Banerjee ◽  
Amardip Ghosh ◽  
David R. Vago ◽  
Judson Brewer

Knowingly or unknowingly, digital data is an integral part of our day-to-day lives; there is probably not a single day when we do not encounter some form of it. Data originates from diverse sources in various formats, among which time series are a special kind of data that captures information about the time evolution of a system under observation. However, capturing temporal information in the context of data analysis is a highly non-trivial challenge. The Discrete Fourier Transform (DFT) is one of the most widely used methods for capturing the essence of time-series data. While this nearly 200-year-old mathematical transform has survived the test of time, real-world data sources often violate some of the intrinsic properties that data is presumed to have in order to be processed by the DFT. Ad hoc noise and outliers fundamentally alter the true signature of the frequency-domain behavior of the signal of interest, and as a result, the frequency-domain representation gets corrupted as well. We demonstrate that applying traditional digital filters as-is may not reveal an accurate description of the pristine time-series characteristics of the system under study. In this work, we analyze the issues of the DFT with real-world data and propose a method to address them by taking advantage of insights from modern data-science techniques, particularly our previous work SOCKS. Our results reveal that a dramatic, never-before-seen improvement is possible by re-imagining the DFT in the context of real-world data with appropriate curation protocols. We argue that our proposed transformation, DFT21, would revolutionize the digital world in terms of accuracy, reliability, and information retrievability from raw data.
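
As a concrete illustration of the corruption the abstract describes, the following Python sketch shows how a single outlier distorts the DFT magnitude spectrum of a clean sinusoid. This is not the authors' DFT21 or SOCKS implementation; the curation step shown is a simple median/MAD outlier replacement, assumed here purely for demonstration.

```python
import numpy as np

# Clean signal: 5 Hz sinusoid sampled at 100 Hz for 1 second.
fs, n = 100, 100
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 5 * t)

# Corrupt a single sample with an ad hoc outlier.
corrupted = clean.copy()
corrupted[40] += 25.0

# Naive curation: replace samples far from the median
# (a placeholder for a proper protocol such as SOCKS).
med = np.median(corrupted)
mad = np.median(np.abs(corrupted - med))
curated = corrupted.copy()
curated[np.abs(corrupted - med) > 10 * mad] = med

for name, x in [("clean", clean), ("corrupted", corrupted), ("curated", curated)]:
    spectrum = np.abs(np.fft.rfft(x)) / n
    off_peak = np.delete(spectrum, spectrum.argmax())
    print(name, "| peak bin:", spectrum.argmax(),
          "| mean off-peak magnitude:", round(float(off_peak.mean()), 4))
```

The single outlier injects broadband energy into every frequency bin, raising the off-peak floor; replacing it restores the clean single-peak spectrum at bin 5.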


2015 ◽  
Vol 26 ◽  
pp. vii99 ◽  
Author(s):  
Yu Uneno ◽  
Kei Taneishi ◽  
Masashi Kanai ◽  
Akiko Tamon ◽  
Kazuya Okamoto ◽  
...  

The importance of data science and machine learning is evident in every domain where data is generated. Multi-aspect analysis and visualization help society devise useful solutions and formulate policies. This paper takes live data from the current coronavirus pandemic and presents multi-faceted views of the data to help authorities and governments take appropriate decisions to tackle this unprecedented problem. Python and its libraries, along with the Google Colab platform, are used to obtain the results. The most suitable techniques and combinations of modules/libraries are used to present the information related to COVID-19.
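
The abstract names Python and Google Colab but no specific libraries. The sketch below is a minimal, assumed example of the kind of multi-faceted view described, using pandas and matplotlib with the public Our World in Data CSV, which may not be the feed the paper actually used.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed data source: the public Our World in Data COVID-19 CSV.
URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"

df = pd.read_csv(URL, usecols=["location", "date", "new_cases"],
                 parse_dates=["date"])

# One multi-faceted view: 7-day smoothed daily cases for selected countries.
for country in ["India", "United States", "Brazil"]:
    series = (df[df["location"] == country]
              .set_index("date")["new_cases"]
              .rolling(7).mean())
    series.plot(label=country)

plt.legend()
plt.ylabel("New cases (7-day mean)")
plt.title("COVID-19 daily cases, smoothed")
plt.show()
```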


Author(s):  
Amir Hossein Adineh ◽  
Zahra Narimani ◽  
Suresh Chandra Satapathy

Over the last decades, time-series data analysis has been of particular practical importance. Domains such as financial data analysis, biological data analysis, and speech recognition inherently deal with time-dependent signals. Monitoring the past behavior of signals is key to precisely predicting the behavior of a system in the near future. In scenarios such as financial data prediction, the predominant signal has a periodic behavior (starting from the beginning of the month, week, etc.), and a general trend and seasonal behavior can also be assumed. The Autoregressive Integrated Moving Average (ARIMA) model and its seasonal extension, SARIMA, have been widely used in forecasting time-series data and are capable of dealing with seasonal behavior and trends in the data. Although the behavior of the data may be autoregressive, and trends and seasonality can be detected and handled by SARIMA, real data is not always exactly compatible with SARIMA (or, more generally, ARIMA) assumptions. In addition, SARIMA does not pre-assume the existence of missing data, while in the real world data can always be missing for various reasons, such as holidays on which nothing is recorded. Different working hours on different weekdays can also produce irregular patterns compared to what SARIMA assumptions expect. In this paper, we investigate the effectiveness of applying SARIMA to such real-world data and demonstrate preprocessing methods that make the data more suitable for modeling with SARIMA. The data in this study is derived from the transactions of a mutual fund investment company; it contains missing values (single points and intervals) and irregularities caused by working hours that differ across weekdays, which makes the data inconsistent and leads to poor results without preprocessing. In addition, the number of data points was not adequate at the time of analysis to fit a SARIMA model. Preprocessing steps such as filling missing values, together with techniques to make the data consistent, have been proposed to deal with these problems. Results show that the prediction performance of SARIMA on this set of real-world data is significantly improved by applying the several preprocessing steps introduced to deal with the circumstances mentioned above. The proposed preprocessing steps can be used in other real-world time-series data analyses.
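
To make the described pipeline concrete, here is a minimal sketch using statsmodels, with synthetic data standing in for the mutual-fund transactions. Time-based interpolation and the (1,0,1)x(1,1,1,7) order are illustrative assumptions, not the paper's exact preprocessing or model selection.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic daily series with weekly seasonality and gaps,
# standing in for the mutual-fund transaction data.
rng = pd.date_range("2020-01-01", periods=200, freq="D")
y = pd.Series(10 + 3 * np.sin(2 * np.pi * np.arange(200) / 7)
              + np.random.default_rng(0).normal(0, 0.5, 200), index=rng)
y.iloc[[20, 21, 22, 95]] = np.nan   # single-point and interval gaps

# Preprocessing: fill missing values by time-based interpolation
# (one of several reasonable choices; the paper proposes its own steps).
y_filled = y.interpolate(method="time")

# Fit a SARIMA model with a weekly seasonal period and forecast a week.
model = SARIMAX(y_filled, order=(1, 0, 1), seasonal_order=(1, 1, 1, 7))
result = model.fit(disp=False)
print(result.forecast(steps=7))
```

Without the interpolation step, SARIMAX either rejects the series or treats the gaps as missing observations, which is exactly the kind of inconsistency the paper's preprocessing is meant to remove.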


2018 ◽  
Vol 11 (5) ◽  
pp. 450-460 ◽  
Author(s):  
Brandon Swift ◽  
Lokesh Jain ◽  
Craig White ◽  
Vasu Chandrasekaran ◽  
Aman Bhandari ◽  
...  

2017 ◽  
Vol 11 (2) ◽  
pp. 13-26 ◽  
Author(s):  
Liz Lyon ◽  
Eleanor Mattern

This study reports on the findings from Part 2 of a small-scale analysis of requirements for real-world data science positions and examines three further data science roles: data analyst, data engineer, and data journalist. The study examines recent job descriptions and maps their requirements to the current curriculum within the graduate MLIS and Information Science and Technology Masters Programs in the School of Information Sciences (iSchool) at the University of Pittsburgh. From this mapping exercise, model 'course pathways' and module 'stepping stones' have been identified, as well as course topic gaps and opportunities for collaboration with other Schools. Competency in four specific tools or technologies was required by all three roles (Microsoft Excel, R, Python, and SQL), as well as collaborative skills (with both teams of colleagues and with clients). The ability to connect the educational curriculum with real-world positions is viewed as further validation of the translational approach being developed as a foundational principle of the current MLIS curriculum review process.


2021 ◽  
Author(s):  
Rhonda Facile ◽  
Erin Elizabeth Muhlbradt ◽  
Mengchun Gong ◽  
Qing-Na Li ◽  
Vaishali B. Popat ◽  
...  

BACKGROUND: Real-world data (RWD) and real-world evidence (RWE) play an increasingly important role in clinical research and health care decision making in many countries. To leverage RWD and generate reliable RWE, a framework must be in place to ensure that the data are well defined and structured in a way that is semantically interoperable and consistent across stakeholders. The adoption of data standards is one of the cornerstones supporting high-quality evidence for clinical medicine and therapeutics development. CDISC data standards are mature, globally recognized, and heavily used by the pharmaceutical industry for regulatory submission in the US and Japan, and are recommended in Europe and China. Against this backdrop, the CDISC RWD Connect Initiative was launched to better understand the barriers to implementing CDISC standards for RWD and to identify the tools and guidance needed to implement them more easily. We believe that bridging the gap between RWD and clinical-trial-generated data will benefit all stakeholders.

OBJECTIVE: The aim of this project was to understand the barriers to implementing CDISC standards for RWD and to identify the tools and guidance that may be needed to implement CDISC standards more easily for this purpose.

METHODS: We conducted a qualitative Delphi survey involving an Expert Advisory Board (EAB) of multiple key stakeholders, with three rounds of input and review.

RESULTS: In total, 66 experts participated in round 1, 56 in round 2, and 49 in round 3 of the Delphi survey. Their input was collected and analyzed, culminating in group statements. It was widely agreed that the standardization of RWD is highly necessary, and that the primary focus should be on its ability to improve data sharing and the quality of RWE. The priorities for RWD standardization include electronic health records, such as data shared using HL7 FHIR, and data stemming from observational studies. With different standardization efforts already underway in these areas, a gap analysis should be performed to identify areas where synergies and efficiencies are possible, followed by collaboration with stakeholders to create, or extend existing, mappings between CDISC and other standards, controlled terminologies, and models to represent data originating across different sources.

CONCLUSIONS: Many ongoing data standardization efforts span the spectrum of human health data-related activities, including, but not limited to, health care, public health, product and disease registries, and clinical research, each with different definitions, levels of granularity, and purposes. Among these efforts, CDISC has been successful in standardizing clinical-trial data for regulation worldwide. However, the complexity of the CDISC standards, the fact that they were developed for different purposes, the lack of awareness of and incentives for using a new standard, and insufficient training and implementation support are significant barriers to establishing the use of CDISC standards for RWD. The collection and dissemination of use cases showing in detail how to implement CDISC standards for RWD effectively, the development of tools and support systems specifically for the RWD community, and collaboration with other standards development organizations and initiatives are potential steps toward connecting RWD to research. The integrity of RWE depends on the quality of the RWD and on the data standards used in its collection, integration, processing, exchange, and reporting. Using CDISC as part of the database schema will help to link clinical trial data and RWD and promote innovation in health data science. The authors believe that CDISC standards, if adapted carefully and presented appropriately to the RWD community, can provide "FAIR" structure and semantics for common clinical concepts and domains and help to bridge the gap between RWD and clinical-trial-generated data.

CLINICALTRIAL: Not applicable.
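
To illustrate the kind of mapping the group statements call for, the toy Python sketch below converts a minimal HL7 FHIR Observation into an SDTM-style vital-signs (VS) record. The lookup table and field selection are illustrative assumptions, not an official CDISC-FHIR mapping.

```python
# Toy mapping from an HL7 FHIR Observation to an SDTM-style VS record.
# Field choices are assumptions for illustration only.

# LOINC code -> (SDTM VSTESTCD, VSTEST) lookup, abbreviated.
LOINC_TO_VSTESTCD = {
    "8867-4": ("HR", "Heart Rate"),
    "8480-6": ("SYSBP", "Systolic Blood Pressure"),
}

def fhir_observation_to_vs(obs: dict, usubjid: str) -> dict:
    """Map a minimal FHIR Observation JSON dict to an SDTM-like VS row."""
    code = obs["code"]["coding"][0]["code"]
    testcd, test = LOINC_TO_VSTESTCD[code]
    return {
        "USUBJID": usubjid,                              # unique subject ID
        "VSTESTCD": testcd,                              # short test code
        "VSTEST": test,                                  # test name
        "VSORRES": obs["valueQuantity"]["value"],        # original result
        "VSORRESU": obs["valueQuantity"]["unit"],        # original unit
        "VSDTC": obs["effectiveDateTime"],               # ISO 8601 date/time
    }

observation = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
    "valueQuantity": {"value": 72, "unit": "beats/min"},
    "effectiveDateTime": "2021-06-01T09:30:00Z",
}
print(fhir_observation_to_vs(observation, usubjid="STUDY01-001"))
```

Even this toy example surfaces the issues the survey identified: each source standard carries its own terminology (LOINC codes vs. CDISC controlled terminology), so a real mapping layer needs curated, maintained lookup tables rather than hand-written dictionaries.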


Author(s):  
Leonid Schwenke ◽  
Martin Atzmueller

While Transformers have shown their advantages considering their learning performance, their lack of explainability and interpretability is still a major problem. This specifically relates to the processing of time series, as a specific form of complex data. In this paper, we propose an approach for visualizing abstracted information in order to enable computational sensemaking and local interpretability on the respective Transformer model. Our results demonstrate the efficacy of the proposed abstraction method and visualization, utilizing both synthetic and real-world data for evaluation.
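
As a rough sketch of the mechanics involved (not the authors' abstraction method), the following PyTorch snippet extracts self-attention weights from a single attention layer over a time series and aggregates them into a per-timestep saliency. With untrained weights the scores are arbitrary; this shows only the extraction plumbing one would build interpretability visualizations on.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, d_model = 50, 16

# Univariate time series with one anomalous step, embedded by a
# linear layer (an assumed setup, not the paper's architecture).
series = torch.sin(torch.linspace(0, 6.28, seq_len)).unsqueeze(-1)
series[30] += 3.0
embed = nn.Linear(1, d_model)
x = embed(series).unsqueeze(0)                 # shape (1, seq_len, d_model)

# One self-attention layer; weights are averaged over heads by default.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
_, weights = attn(x, x, x, need_weights=True)  # shape (1, seq_len, seq_len)

# Saliency: total attention each timestep receives from all queries.
saliency = weights[0].sum(dim=0)
print("most-attended timesteps:", sorted(saliency.topk(5).indices.tolist()))
```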


2021 ◽  
Vol 2021 ◽  
pp. 1-13 ◽  
Author(s):  
Tinofirei Museba ◽  
Fulufhelo Nelwamondo ◽  
Khmaies Ouahada ◽  
Ayokunle Akinola

For most real-world data streams, the concept about which data is obtained may shift from time to time, a phenomenon known as concept drift. In many real-world applications, such as nonstationary time-series data, concept drift occurs in a cyclic fashion and previously seen concepts reappear, giving rise to a particular kind of drift known as recurring concepts. A cyclically drifting concept exhibits a tendency to return to previously visited states. Existing machine learning algorithms handle recurring concepts by retraining the learning model whenever drift is detected, which discards information even when a concept was well learned and will recur in a later learning phase. A common remedy is to retain and reuse previously learned models, but in nonstationary environments the process of selecting an optimal ensemble classifier capable of accurately adapting to recurring concepts is time-consuming and computationally prohibitive. To learn from streaming data, fast and accurate machine learning algorithms are needed for time-dependent applications. Most existing algorithms designed to handle concept drift do not account for recurring concept drift. To handle recurring concepts accurately and efficiently, with minimal computational overhead, we propose a novel evolving ensemble method called the Recurrent Adaptive Classifier Ensemble (RACE). The algorithm preserves an archive of previously learned models that are diverse, and it always trains both new and existing classifiers. Empirical experiments on synthetic and real-world data stream benchmarks show that RACE adapts to recurring concepts significantly more accurately than several state-of-the-art ensemble classifiers based on classifier reuse.
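
The abstract outlines the core mechanism: an archive of past models, reuse of whichever archived model best matches the current concept, and continual training of new classifiers. The skeletal Python sketch below illustrates that reuse idea only; the class and method names are invented for illustration, and this is not the actual RACE algorithm.

```python
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class RecurringConceptEnsemble:
    """Illustrative archive-and-reuse scheme for recurring concept drift.

    Keeps a bounded archive of past classifiers; predictions reuse the
    archived model that scores best on a recent labeled window, so a
    previously learned (recurring) concept need not be relearned.
    """

    def __init__(self, base=None, max_models=10):
        self.base = base if base is not None else DecisionTreeClassifier(max_depth=5)
        self.max_models = max_models
        self.archive = []

    def learn_batch(self, X, y):
        # Always train a fresh classifier on the newest batch...
        self.archive.append(clone(self.base).fit(X, y))
        # ...while bounding memory by dropping the oldest models.
        self.archive = self.archive[-self.max_models:]

    def predict(self, X, X_recent, y_recent):
        # Reuse: pick the archived model best matching the current concept.
        best = max(self.archive, key=lambda m: m.score(X_recent, y_recent))
        return best.predict(X)
```

On a cyclic stream, the selection step snaps back to the old model as soon as a concept recurs, instead of paying the cost and accuracy loss of retraining from scratch, which is the intuition behind classifier-reuse ensembles like RACE.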

