Efficient Algorithms for Mining Long Patterns in Scientific Data Sets

Author(s):  
Ramesh C. Agarwal ◽  
Charu C. Aggarwal
Author(s):  
Soi Avgeridou ◽  
Ilija Djordjevic ◽  
Anton Sabashnikov ◽  
Kaveh Eghbalzadeh ◽  
Laura Suhr ◽  
...  

Extracorporeal membrane oxygenation (ECMO) plays an important role as a life-saving tool for patients with therapy-refractory cardio-respiratory failure. For rare and infrequent indications in particular, scientific data are scarce. This paper focuses primarily on our institutional experience with a 19-year-old patient with acute chest syndrome, a pathognomonic pulmonary complication of sickle cell disease. After implementation of awake ECMO therapy, the patient was successfully weaned off support and discharged home 22 days after initiation of extracorporeal circulation. Given the limited data in the current literature, further and larger data sets are necessary to determine outcomes after ECMO therapy for this rare indication.


Author(s):  
Andy Hector

Statistics is a fundamental component of the scientific toolbox, but learning the basics of this area of mathematics is one of the most challenging parts of research training. This book gives an up-to-date introduction to the classical techniques and modern extensions of linear-model analysis, one of the most useful approaches in the analysis of scientific data in the life and environmental sciences. The book emphasizes an estimation-based approach that takes account of recent criticisms of the overuse of probability values and introduces the alternative approach using information criteria. The book is based on the open-source R programming language for statistics and graphics, which is rapidly becoming the lingua franca in many areas of science. This second edition adds new chapters, including one discussing some of the complexities of linear-model analysis and another introducing reproducible research documents using the R Markdown package. Statistics is introduced through worked analyses performed in R using interesting data sets from ecology, evolutionary biology, and environmental science. The data sets and R scripts are available as supporting material.
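The book itself works in R. To keep the examples in this listing in one language, the following Python sketch (not taken from the book) illustrates the two ideas the abstract names: reporting an effect estimate with a confidence interval rather than only a p-value, and comparing models with an information criterion. The function names and data are hypothetical. The AIC uses the Gaussian least-squares form n·ln(RSS/n) + 2k, which drops an additive constant shared by all models fitted to the same data; here k is assumed to count the error variance as a parameter.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def slope_ci(x, rss, t_crit):
    """Half-width of the confidence interval for the slope: t_crit * SE(b)."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sigma2 = rss / (n - 2)  # residual variance estimate with n - 2 degrees of freedom
    return t_crit * math.sqrt(sigma2 / sxx)

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian linear model up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical illustrative data (not from the book)
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]

a, b, rss_line = fit_line(x, y)
half = slope_ci(x, rss_line, t_crit=2.776)  # two-sided 95% t value for n - 2 = 4 df
print(f"slope = {b:.3f}, 95% CI = [{b - half:.3f}, {b + half:.3f}]")

my = sum(y) / len(y)
rss_null = sum((yi - my) ** 2 for yi in y)  # intercept-only model
print("AIC null:", round(gaussian_aic(rss_null, len(y), k=2), 2))
print("AIC line:", round(gaussian_aic(rss_line, len(y), k=3), 2))
```

A lower AIC favors a model, and only differences between models fitted to the same response are meaningful; the interval conveys the size and precision of the slope, which a p-value alone does not.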


Big Data ◽  
2016 ◽  
pp. 261-287
Author(s):  
Keqin Wu ◽  
Song Zhang

While uncertainty in scientific data attracts increasing research interest in the visualization community, two critical issues remain insufficiently studied: (1) visualizing the impact of the uncertainty of a data set on its features and (2) interactively exploring 3D or large 2D data sets with uncertainties. In this chapter, a suite of feature-based techniques is developed to address these issues. First, an interactive visualization tool is developed for exploring scalar data with data-level, contour-level, and topology-level uncertainties. Second, a framework for visualizing feature-level uncertainty is proposed to study uncertain feature deviations in both scalar and vector data sets. With quantified representation and interactive capability, the proposed feature-based visualizations provide new insights into the uncertainties of both the data and their features, which would otherwise remain unknown if only data-level uncertainty were visualized.
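The abstract does not spell out the chapter's algorithms. As a hedged illustration of what contour-level uncertainty can mean, the Python sketch below computes, for each cell of a 2D grid whose samples carry independent Gaussian uncertainty, the probability that the isocontour at level c passes through the cell: one minus the probability that all four corners lie above c, minus the probability that all four lie below c. This generic level-crossing construction, the independence assumption, and all function names are assumptions for this sketch, not the chapter's method.

```python
import math

def prob_above(mu, sigma, c):
    """P(X > c) for X ~ N(mu, sigma^2); sigma == 0 is treated as a certain value."""
    if sigma == 0:
        return 1.0 if mu > c else 0.0
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2)))

def edge_crossing_prob(mu1, s1, mu2, s2, c):
    """Probability that the level-c contour crosses the edge between two independent
    Gaussian samples, i.e. exactly one endpoint lies above c."""
    p1, p2 = prob_above(mu1, s1, c), prob_above(mu2, s2, c)
    return p1 * (1 - p2) + (1 - p1) * p2

def cell_crossing_probs(mean, std, c):
    """For each cell of a 2D grid, P(the level-c contour passes through the cell),
    assuming independent Gaussian noise at the four corners:
    1 - P(all corners above c) - P(all corners below c)."""
    rows, cols = len(mean), len(mean[0])
    out = []
    for i in range(rows - 1):
        row = []
        for j in range(cols - 1):
            corners = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
            ps = [prob_above(mean[r][q], std[r][q], c) for r, q in corners]
            all_above = math.prod(ps)
            all_below = math.prod(1 - p for p in ps)
            row.append(1 - all_above - all_below)
        out.append(row)
    return out

# Hypothetical 3x3 scalar field with a uniform standard deviation of 0.2
mean = [[0.0, 0.5, 1.0],
        [0.5, 1.0, 1.5],
        [1.0, 1.5, 2.0]]
std = [[0.2] * 3 for _ in range(3)]
for row in cell_crossing_probs(mean, std, c=1.0):
    print(["%.2f" % p for p in row])
```

Mapping these probabilities to color or opacity shows where a contour may lie, rather than drawing a single contour through the mean field.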


2011 ◽  
Vol 37 (1) ◽  
pp. 57-64 ◽  
Author(s):  
Lisa M. Ballagh ◽  
Bruce H. Raup ◽  
Ruth E. Duerr ◽  
Siri Jodha S. Khalsa ◽  
Christopher Helm ◽  
...  

2000 ◽  
Vol 25 (4) ◽  
pp. 417-436 ◽  
Author(s):  
Todd C. Headrick ◽  
Shlomo S. Sawilowsky

The power methods are simple and efficient algorithms used to generate either univariate or multivariate nonnormal distributions with specified values of (marginal) mean, standard deviation, skew, and kurtosis. Like other transformation techniques, the power methods are bounded: for a given exogenous value of skew, there is an associated lower bound on kurtosis. Previous approximations of the boundary for the power methods are either incorrect or inadequate. Data sets from education and psychology can be found to lie within, near, or outside the boundary of the power methods. In view of this, we derived necessary and sufficient conditions using the Lagrange multiplier method to determine the boundary of the power methods. The conditions for locating and classifying modes for distributions on the boundary were also derived. Self-contained interactive Fortran programs using a Weighted Simplex Procedure were employed to generate tabled values of minimum kurtosis for a given value of skew and power constants for various (non)normal distributions.
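The abstract does not state the polynomial itself. A widely used univariate power method is Fleishman's third-order transform Y = a + bZ + cZ² + dZ³, with Z standard normal and a = -c; assuming that form, the Python sketch below computes the mean, variance, skew and excess kurtosis (normal = 0) of Y exactly from standard normal moments and checks them against Fleishman's moment equations. It does not implement the paper's Lagrange-multiplier boundary conditions or its Weighted Simplex Procedure, the coefficients used are arbitrary rather than the paper's tabled power constants, and the function names are hypothetical.

```python
import math
import random

def normal_moment(k):
    """E[Z^k] for Z ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    if k % 2:
        return 0.0
    return float(math.prod(range(k - 1, 0, -2)))  # empty product (k = 0) is 1

def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists, lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def exact_moments(a, b, c, d):
    """Mean, variance, skew and excess kurtosis of Y = a + bZ + cZ^2 + dZ^3,
    obtained by expanding E[Y^k] (k = 1..4) into standard normal moments."""
    poly, power, raw = [a, b, c, d], [1.0], []
    for _ in range(4):
        power = poly_mul(power, poly)
        raw.append(sum(coef * normal_moment(i) for i, coef in enumerate(power)))
    m1, m2, m3, m4 = raw
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2 - 3

def fleishman_equations(b, c, d):
    """Fleishman's moment equations for a = -c, returning (variance, skew, excess
    kurtosis). The skew and kurtosis expressions are only valid when variance == 1."""
    var = b * b + 6 * b * d + 2 * c * c + 15 * d * d
    skew = 2 * c * (b * b + 24 * b * d + 105 * d * d + 2)
    kurt = 24 * (b * d + c * c * (1 + b * b + 28 * b * d)
                 + d * d * (12 + 48 * b * d + 141 * c * c + 225 * d * d))
    return var, skew, kurt

def sample(b, c, d, n, seed=0):
    """Draw n values of Y = -c + bZ + cZ^2 + dZ^3 with Z ~ N(0, 1)."""
    rng = random.Random(seed)
    zs = (rng.gauss(0.0, 1.0) for _ in range(n))
    return [-c + b * z + c * z * z + d * z ** 3 for z in zs]

# Choose b and d, then solve the variance equation for c so that Var(Y) = 1.
b, d = 0.9, 0.01
c = math.sqrt((1 - b * b - 6 * b * d - 15 * d * d) / 2)
print("Fleishman equations:", fleishman_equations(b, c, d))
print("Exact moments:      ", exact_moments(-c, b, c, d))
ys = sample(b, c, d, n=10_000)
print("sample mean:", sum(ys) / len(ys))
```

Solving the three equations for (b, c, d) given a target skew and kurtosis is where the boundary matters: for skew/kurtosis pairs below the power-method boundary, no real solution exists.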


2014 ◽  
Vol 70 (10) ◽  
pp. 2502-2509 ◽  
Author(s):  
Loes M. J. Kroon-Batenburg ◽  
John R. Helliwell

Recently, the IUCr (International Union of Crystallography) initiated the formation of a Diffraction Data Deposition Working Group with the aim of developing standards for the representation of raw diffraction data associated with the publication of structural papers. Archiving of raw data serves several goals: to improve the record of science, to verify reproducibility, to allow detailed checks of scientific data as a safeguard against fraud, and to allow reanalysis with future improved techniques. One means of studying this issue is to submit exemplar publications with associated raw data and metadata. In a recent study of the binding of cisplatin and carboplatin to histidine in lysozyme crystals under several conditions, the possible effects of the equipment and of the X-ray diffraction data-processing software on the occupancies and B factors of the bound Pt compounds were compared. Initially, 35.3 GB of data were transferred from Manchester to Utrecht to be processed with EVAL. A detailed description and discussion of the availability of metadata was published in a paper that was linked to a local raw data archive at Utrecht University and also mirrored at the TARDIS raw diffraction data archive in Australia. Making these raw diffraction data sets available with the article allows the diffraction community to make its own evaluation. This led one of the authors of XDS (K. Diederichs) to re-integrate the data from crystals that supposedly contained only bound carboplatin, resulting in the analysis of partially occupied chlorine anomalous electron densities near the Pt-binding sites and the use of several criteria to assess the diffraction resolution limit more carefully. General arguments for archiving raw data, the possibilities of doing so and the resources required are discussed. The problems associated with a partially unknown experimental setup, which preferably should be available as metadata, are also discussed.
Current thoughts on data compression are summarized; compression could be a solution, especially for pixel-device data sets with fine slicing that may otherwise present an unmanageable amount of data.
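The abstract does not name a specific compression scheme. To illustrate why fine-sliced pixel-detector data tend to compress well, the Python sketch below builds a hypothetical synthetic frame in which most pixels record zero counts, serializes it as 32-bit integers, and compresses it losslessly with zlib. zlib is a generic stand-in chosen for this sketch, not a method recommended in the paper, and the frame size, hit fraction and count range are illustrative assumptions rather than real detector data.

```python
import random
import struct
import zlib

def synthetic_frame(width, height, hit_fraction, seed=0):
    """Hypothetical fine-sliced frame: most pixels are 0, a small fraction hold 1-20 counts."""
    rng = random.Random(seed)
    return [rng.randint(1, 20) if rng.random() < hit_fraction else 0
            for _ in range(width * height)]

def pack_counts(frame):
    """Serialize pixel counts as little-endian unsigned 32-bit integers."""
    return struct.pack(f"<{len(frame)}I", *frame)

def unpack_counts(data):
    """Inverse of pack_counts."""
    return list(struct.unpack(f"<{len(data) // 4}I", data))

frame = synthetic_frame(256, 256, hit_fraction=0.01)
raw = pack_counts(frame)
packed = zlib.compress(raw, 9)
assert unpack_counts(zlib.decompress(packed)) == frame  # lossless round trip
print(f"raw {len(raw)} bytes -> compressed {len(packed)} bytes "
      f"(ratio {len(raw) / len(packed):.1f})")
```

The finer the slicing, the fewer photons land in each frame and the longer the zero runs become, which is what a lossless entropy coder exploits; the achievable ratio on real data depends on the detector, background level and chosen scheme.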


Eos ◽  
2005 ◽  
Vol 86 (50) ◽  
pp. 522 ◽  
Author(s):  
Dawn Wright ◽  
Stephanie Watson ◽  
John Graybeal ◽  
Luis Bermudez

Acta Numerica ◽  
2001 ◽  
Vol 10 ◽  
pp. 313-355 ◽  
Author(s):  
Markus Hegland

Methods for knowledge discovery in databases (KDD) have been studied for more than a decade. New methods are required owing to the size and complexity of data collections in administration, business and science. These include procedures for data query and extraction, data cleaning, and data analysis, as well as methods of knowledge representation. The part of KDD dealing with the analysis of the data has been termed data mining. Common data mining tasks include the induction of association rules, the discovery of functional relationships (classification and regression) and the exploration of groups of similar data objects in clustering. This review provides a discussion of and pointers to efficient algorithms for the common data mining tasks in a mathematical framework. Because of the size and complexity of the data sets, efficient algorithms and often crude approximations play an important role.
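For the first of these tasks, the induction of association rules, a classic approach is level-wise frequent-itemset mining (Apriori): frequent k-itemsets are joined into (k+1)-item candidates, candidates with any infrequent k-subset are pruned, and the survivors are counted against the data. The Python sketch below is a minimal, unoptimized version of that idea; the review's own treatment may differ, and the function names and toy transactions are hypothetical. Support is an absolute transaction count, and rule confidence is computed as support(A ∪ C) / support(A).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions, found level by level with subset pruning."""
    tx = [frozenset(t) for t in transactions]
    counts = {}
    for t in tx:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Keep size-k unions whose (k-1)-subsets are all frequent (Apriori pruning).
                if len(union) == k and all(frozenset(s) in level
                                           for s in combinations(union, k - 1)):
                    candidates.add(union)
        level = {}
        for cand in candidates:
            n = sum(1 for t in tx if cand <= t)  # one full scan per candidate
            if n >= min_support:
                level[cand] = n
        frequent.update(level)
        k += 1
    return frequent

def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A ∪ C) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("conf(butter -> bread) =", rule_confidence(freq, {"butter"}, {"bread"}))
```

Counting each candidate with a full pass over the transactions keeps the sketch short; practical miners avoid these repeated scans, which matters for exactly the large data sets the review is concerned with.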


2014 ◽  
Vol 26 (10) ◽  
pp. 2410-2424 ◽  
Author(s):  
Anand Kumar ◽  
Vladimir Grupcev ◽  
Yongke Yuan ◽  
Jin Huang ◽  
Yi-Cheng Tu ◽  
...  

2020 ◽  
Vol 5 (1) ◽  
pp. 46-56
Author(s):  
Frank Edughom Ekpar

Many problems in fields of human endeavor including, but not limited to, science and engineering, medicine, law enforcement and security, economics and finance, governance, psychology, philosophy, and religion require the management of arbitrary-dimensional data. However, systems permitting direct and efficient management of arbitrary-dimensional data do not currently exist. In fact, contemporary systems, such as graphical user interfaces for data management, typically lack even the concept of arbitrary dimensionality and provide no practical means of managing arbitrary-dimensional data. Here, we establish the foundational principles for a system permitting practical, direct and efficient management of arbitrary-dimensional data. Furthermore, we demonstrate the effectiveness of our system by applying it to an experiment involving eight-dimensional (8D) medical and scientific data sets. Our system has immediate, far-reaching implications for numerous fields of human endeavor, enabling hitherto impossible solutions and applications and leading to deeper insights and improved understanding of numerous issues.

