Deriving In-Depth Knowledge from IT-Performance Data Simulations

2012 ◽  
Vol 3 (2) ◽  
pp. 13-29 ◽  
Author(s):  
Konstantin Petruch ◽  
Gerrit Tamm ◽  
Vladimir Stantchev

Knowledge of behavioral patterns of a system can contribute to optimized management and governance of that system, or of similar systems. While human experience often manifests itself as intuition, intuition can be notoriously misleading, particularly in the case of quantitative data and subtle relations between different data sets. This article augments managerial intuition with knowledge derived from a specific byproduct of automated transaction processing: performance and log data of the processing software. More specifically, the authors consider data generated by incident management and ticketing systems within IT support departments. The authors' approach uses a rigorous analysis methodology based on System Dynamics, which allows real causalities and hidden dependencies between different datasets to be identified. These can then be used to derive and assemble knowledge bases for improved management and governance in this context. The approach provides deeper insights than typical data visualization and dashboard techniques. In the experimental results section, the authors demonstrate the feasibility of the approach by applying it to real-life datasets and log files from an international telecommunication provider, and they consider the improvements in management and governance that result from it.
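One elementary step in uncovering hidden dependencies between such datasets can be illustrated with a short sketch. This is not the authors' System Dynamics methodology, which builds causal models rather than correlations; it is a minimal, hypothetical pre-analysis showing how a time-lagged dependency between two ticket-system series (all names and data invented) might surface:

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between series x and y at each lag.

    A peak at lag k suggests that changes in x tend to precede
    changes in y by k time steps (e.g. days)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for lag in range(max_lag + 1):
        a, b = (x, y) if lag == 0 else (x[:-lag], y[lag:])
        results[lag] = np.corrcoef(a, b)[0, 1]
    return results

# Hypothetical daily series: tickets opened vs. tickets escalated,
# with escalations following openings by roughly two days.
rng = np.random.default_rng(0)
opened = rng.poisson(100, 365).astype(float)
escalated = 0.3 * np.roll(opened, 2) + rng.normal(0, 3, 365)

peaks = lagged_correlation(opened, escalated, max_lag=7)
print(max(peaks, key=peaks.get))  # expected to report a lag near 2
```

A correlation screen like this can only suggest candidate dependencies; establishing which of them are real causalities is precisely what the System Dynamics modeling in the article is for.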

Author(s):  
Changwon Son ◽  
Farzan Sasangohar ◽  
S. Camille Peres ◽  
Jukrin Moon

Investigating real-life disasters and crises is challenging due to the difficulties and risks posed by these complex phenomena. Previous research in the emergency management domain has largely relied on qualitative approaches that describe an event after it occurred. To facilitate investigations that yield more generalizable findings, this paper documents ongoing efforts to design an emergency management simulation testbed called Team Emergency Operations Simulation (TEOS), in which an incident management team (IMT) is situated. First, we describe the design process based on our previous work. Next, we present an overall description of TEOS, including representative roles, tasks, and team environments. We also propose measures of team performance for the IMT and outline future research that can be realized through TEOS.


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Mustafa Yuksel ◽  
Suat Gonul ◽  
Gokce Banu Laleci Erturkmen ◽  
Ali Anil Sinaci ◽  
Paolo Invernizzi ◽  
...  

Depending mostly on voluntarily submitted spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources, collect deidentified medical data sets of selected patient populations, and trace reported incidents back to the original EHRs. We have developed an ontological framework in which EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model at the core of this set acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, the Case Series Characterization Tool, which has been deployed on top of the regional EHR data warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.
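At its simplest, the terminology-mediation step can be pictured as rewriting local EHR codes into a common model via declarative rules. The sketch below is a deliberately naive stand-in (a lookup table rather than ontology-based reasoning), with hypothetical record fields and rules; SALUS itself maintains formal representations of the models and terminology systems and applies rule-based reasoning over them:

```python
# Hypothetical mapping rules: (source code system, source code)
# -> (target code system, target code). Invented for illustration.
MAPPING_RULES = {
    ("ICD-9-CM", "410.9"): ("SNOMED-CT", "22298006"),  # myocardial infarction
    ("local-lab", "GLU"):  ("LOINC", "2345-7"),        # serum glucose
}

def to_common_model(record):
    """Translate one source observation into the common representation."""
    key = (record["system"], record["code"])
    try:
        system, code = MAPPING_RULES[key]
    except KeyError:
        raise ValueError(f"no mapping rule for {key}")
    return {"system": system, "code": code, "value": record.get("value")}

print(to_common_model({"system": "local-lab", "code": "GLU", "value": 5.4}))
```

The point of the rule-based design is that each EHR source keeps its local codes and only the mapping layer changes when a new source or terminology system is added.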


Author(s):  
Barinaadaa John Nwikpe ◽  
Isaac Didi Essi

A new two-parameter continuous distribution called the Two-Parameter Nwikpe (TPAN) distribution is derived in this paper. The new distribution is a mixture of gamma and exponential distributions. Several statistical properties of the new probability distribution are derived, and the shape of its density for different parameter values is established. The first four raw moments and the second and third moments about the mean were derived using the moment generating function. Other statistical properties derived include the distribution of order statistics, the coefficient of variation, and the coefficient of skewness. The parameters of the new distribution were estimated by the method of maximum likelihood. The flexibility of the TPAN distribution is shown by fitting it to three real-life data sets; the goodness of fit shows that the new distribution outperforms the one-parameter exponential, Shanker, and Amarendra distributions on the data sets used in this study.
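The abstract gives only the mixture structure, not the density itself, so the generic form of a two-component gamma-exponential mixture is shown below as a hedged placeholder. The mixing weight p is an assumption here (in constructions of this kind it is typically itself a function of the two parameters); the exact TPAN density is defined in the paper:

```latex
% Generic gamma-exponential mixture form (illustrative placeholder only;
% the exact TPAN density and mixing weight are given in the paper):
f(x;\theta,\alpha) = p\,\frac{\theta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\theta x}
  + (1-p)\,\theta e^{-\theta x}, \qquad x > 0,\ \theta > 0,\ 0 < p < 1.
```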


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
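As an illustration of the statistical-distribution idea behind SDType, the sketch below performs a weighted vote over per-property type distributions. The distributions, weights, and properties are invented for the example; the real algorithm learns the distributions from the knowledge base itself and derives each weight from how discriminative that property's distribution is:

```python
from collections import defaultdict

# Hypothetical type distributions: for each (property, position), the
# probability that a resource in that position has a given type.
TYPE_DIST = {
    ("dbo:location", "object"): {"dbo:Place": 0.90, "dbo:Agent": 0.05},
    ("dbo:author",   "object"): {"dbo:Person": 0.85, "dbo:Agent": 0.10},
}

# Per-property weights reflecting how discriminative each distribution is.
WEIGHTS = {("dbo:location", "object"): 0.8, ("dbo:author", "object"): 0.9}

def sdtype_scores(properties):
    """Weighted vote over the type distributions of a resource's properties."""
    scores = defaultdict(float)
    total = sum(WEIGHTS[p] for p in properties)
    for prop in properties:
        for rdf_type, prob in TYPE_DIST[prop].items():
            scores[rdf_type] += WEIGHTS[prop] * prob / total
    return dict(scores)

# A resource used as the object of dbo:location is very likely a dbo:Place.
print(sdtype_scores([("dbo:location", "object")]))
```

Type statements whose score exceeds a confidence threshold are added; SDValidate works in the opposite direction, flagging statements whose subject or object types are statistically implausible for the property.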


Author(s):  
Barinaadaa John Nwikpe

A new single-parameter probability distribution named the Tornumonkpe distribution is derived in this paper. The new model is a blend of gamma(2, θ) and gamma(3, θ) distributions. The shape of its density for different values of the parameter is shown, and mathematical expressions are given for the moment generating function, the first three raw moments, the second and third moments about the mean, the distribution of order statistics, the coefficient of variation, and the coefficient of skewness. The parameter of the new distribution was estimated using the method of maximum likelihood. The goodness of fit of the Tornumonkpe distribution was established by fitting it to three real-life data sets. Using −2 ln L, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion (AIC) as criteria for selecting the best-fitting model, it was revealed that the new distribution outperforms the one-parameter exponential, Shanker, and Amarendra distributions on the data sets used.
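The model-selection criteria named above are straightforward to compute once each candidate's maximized log-likelihood is known. A minimal sketch, with hypothetical log-likelihood values standing in for actual fits:

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: lower is better."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: lower is better."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical maximized log-likelihoods on one data set of n = 100 points:
n = 100
candidates = {  # model name -> (log-likelihood, number of parameters)
    "Tornumonkpe": (-210.3, 1),
    "exponential": (-225.1, 1),
}
for name, (ll, k) in candidates.items():
    print(name, round(-2 * ll, 1), round(aic(ll, k), 1), round(bic(ll, k, n), 1))
```

With equal parameter counts, as here, all three criteria reduce to ranking models by log-likelihood; the criteria differ only when models of different complexity are compared.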


1998 ◽  
Vol 6 (A) ◽  
pp. A13-A19 ◽  
Author(s):  
T.G. Axon ◽  
R. Brown ◽  
S.V. Hammond ◽  
S.J. Maris ◽  
F. Ting

The early use of near infrared (NIR) spectroscopy in the pharmaceutical industry was for raw material identification, later moving on to conventional “calibrations” for various ingredients in a variety of sample types. The approach throughout this development has always been “conventional”, with one measurement by NIR directly replacing some other, slower method, be it mid-IR identification or determinations by Karl Fischer titration, high performance liquid chromatography (HPLC), etc. A significant change in approach was demonstrated by Plugge and Van der Vlies [1] in 1993, where a qualitative system was used to provide “quantitative-like” answers for the potency of a drug substance. Following on from that key paper, there has been a realisation that the qualitative analysis ability of NIR has the potential to be a powerful tool for process investigation, control and validation. The final step has been to develop “model free” approaches that consider individual data sets as unique systems and present the opportunity for NIR to escape the shackles of “calibration” in one form or another. The use of qualitative, or model-free, approaches to NIR spectroscopy provides an effective tool for satisfying many of the demands of modern pharmaceutical production. “Straight through production”, “right first time”, “short cycle time” and “total quality management” philosophies can be realised. Eventually the prospect of parametric release may be materialised, with a strong contribution from NIR spectroscopy. This paper illustrates the above points with some real-life examples.


Author(s):  
Joel T. Hicks ◽  
Kravitz Michael

There have been many papers written and published on the subject of pedestrian throw distance with automobiles, many of which can be obtained from the Society of Automotive Engineers (SAE). Many of these papers make assumptions about the takeoff angle of the pedestrian. Some of those authors have performed tests using dummies or objects dropped from moving vehicles in order to draw correlations between their formulae and real life. Generally, those authors state how the formulae should be used and specify that if the pedestrian mounts the vehicle, the formula is not valid. Despite these disclaimers about how and when to use the published formulae, the writer has noticed that accident reconstructionists tend to misuse them and often arrive at speeds for an impacting vehicle higher than can be justified by more rigorous analysis. This occurs because the theories behind the formulae, their assumptions, and their application are not known or not well understood. The writer has found, in his experience, that reconstruction practitioners often use the distance from the first impact with the pedestrian to the pedestrian's rest position as the distance factor in the formulae, without considering whether the pedestrian mounted the vehicle or not.
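To make that sensitivity concrete, consider the simplest projectile-style throw relation (an illustrative textbook form, not any particular published formula): for launch speed v at takeoff angle θ on flat ground, ignoring air resistance and post-landing slide,

```latex
% Simplified flat-ground projectile relation (illustration only;
% published throw formulae add sliding, launch-height and drag terms):
d = \frac{v^{2}\sin 2\theta}{g}
  \quad\Longrightarrow\quad
  v = \sqrt{\frac{g\,d}{\sin 2\theta}}.
```

Because the inferred speed grows with the square root of d and depends strongly on the assumed takeoff angle θ, both an assumed angle and an inflated distance (for example, measuring from first impact to rest when the pedestrian actually mounted the vehicle) directly inflate the speed estimate.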


2008 ◽  
pp. 1231-1249
Author(s):  
Jaehoon Kim ◽  
Seong Park

Much of the research regarding streaming data has focused only on real-time querying and analysis of the recent portion of a data stream that fits in memory. However, as data stream mining, or the tracking of past data streams, is often required, it becomes necessary to store large volumes of streaming data in stable storage. Moreover, as stable storage has restricted capacity, the past data stream must be summarized. The summarization must be performed periodically because streaming data flows continuously, quickly, and endlessly. Therefore, in this paper, we propose an efficient periodic summarization method with flexible storage allocation. It improves the overall estimation error by flexibly adjusting the size of the summarized data for each local time section. Additionally, as the processing overhead of compression and the disk I/O cost of decompression can be important factors for quick summarization, we also consider setting the proper size of the data stream to be summarized at a time. Experimental results with artificial data sets as well as real-life data show that our flexible approach is more efficient than the existing fixed approach.
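As a rough illustration of the flexible-allocation idea, the sketch below splits a fixed storage budget across local time sections in proportion to each section's estimated summarization error, so that error-prone sections get more space. The proportional rule and all numbers are assumptions for illustration; the paper's actual allocation strategy may differ:

```python
def allocate_budget(section_errors, total_budget):
    """Split a storage budget across time sections in proportion to their
    estimated summarization error. An illustrative stand-in for the
    paper's allocation strategy, not the method itself."""
    total_error = sum(section_errors)
    raw = [total_budget * e / total_error for e in section_errors]
    return [max(1, round(r)) for r in raw]  # every section keeps something

# Hypothetical estimation errors for four past time sections:
print(allocate_budget([0.5, 2.0, 1.0, 0.5], total_budget=100))
# -> [12, 50, 25, 12]: the section with the worst error gets half the budget
```

A fixed approach would instead give each section the same 25 units regardless of how well it compresses, which is the baseline the paper improves on.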


Metabolites ◽  
2020 ◽  
Vol 10 (4) ◽  
pp. 171
Author(s):  
Sanjeevan Jahagirdar ◽  
Edoardo Saccenti

Metabolite differential connectivity analysis has been successful in investigating potential molecular mechanisms underlying different conditions in biological systems. Correlation and Mutual Information (MI) are two of the most common measures for quantifying association, building metabolite-metabolite association networks, and calculating differential connectivity. In this study, we investigated the performance of correlation and MI in identifying significantly differentially connected metabolites. These association measures were compared on (i) 23 publicly available metabolomic data sets and 7 data sets from other fields, (ii) simulated data with known correlation structures, and (iii) data generated using a dynamic metabolic model to simulate real-life observed metabolite concentration profiles. In all cases, we found more differentially connected metabolites when using correlation indices as the measure of association than when using MI. We also observed that different MI estimation algorithms performed differently when applied to data generated using the dynamic model. We conclude that there is no significant benefit in using MI as a replacement for standard Pearson's or Spearman's correlation when the goal is to quantify and detect differentially connected metabolites.
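For illustration, a minimal sketch of one common correlation-based definition of differential connectivity is shown below: per metabolite, the sum of absolute changes in its Pearson correlations between two conditions. Exact definitions vary across studies, and the paper evaluates both correlation- and MI-based variants; the data here are invented:

```python
import numpy as np

def differential_connectivity(data_a, data_b):
    """For each metabolite (column), sum the absolute differences between
    its Pearson correlations in condition A and condition B."""
    corr_a = np.corrcoef(data_a, rowvar=False)
    corr_b = np.corrcoef(data_b, rowvar=False)
    diff = np.abs(corr_a - corr_b)
    np.fill_diagonal(diff, 0.0)          # ignore self-correlations
    return diff.sum(axis=0)

# Hypothetical data: 50 samples x 5 metabolites per condition.
rng = np.random.default_rng(1)
a = rng.normal(size=(50, 5))
b = a.copy()
b[:, 1] = rng.normal(size=50)            # metabolite 1 "rewired" in B
print(differential_connectivity(a, b).round(2))  # peak at index 1
```

An MI-based variant replaces np.corrcoef with a pairwise MI estimator, which is where the choice of estimation algorithm examined in the study comes into play.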


Author(s):  
André M. Carrington ◽  
Paul W. Fieguth ◽  
Hammad Qazi ◽  
Andreas Holzinger ◽  
Helen H. Chen ◽  
...  

Abstract

Background: In classification and diagnostic testing, the receiver operating characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold trades off two types of error: false positives and false negatives. However, only part of the ROC curve and AUC are informative when they are used with imbalanced data. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be interpreted as fully as the AUC, in part because they ignore some information about actual negatives.

Methods: We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and are validated as equal in summation to whole measures where expected. They are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. An interpretation of an example is then provided.

Results: The results show the expected equalities between our new partial measures and the existing whole measures, and the example interpretation illustrates the need for the newly derived partial measures.

Conclusions: The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed, as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set, but this paper focuses on imbalanced data with low prevalence.

Future work: Future work with the proposed measures may demonstrate their value for imbalanced data with high prevalence, compare them to other measures not based on areas, and combine them with other ROC measures and techniques.
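As background for the pairwise-concordance idea the partial c statistic builds on, the sketch below computes the ordinary c statistic (equal to the AUC) and then crudely restricts it to the highest-scoring negatives. The restriction shown is an invented illustration; the authors' concordant partial AUC and partial c statistic are defined precisely in the paper and, unlike this crude cut, remain fully interpretable:

```python
def c_statistic(pos_scores, neg_scores):
    """Plain concordance (equals the AUC): fraction of positive-negative
    pairs ranked correctly, with ties counted as half."""
    pairs = concordant = 0
    for p in pos_scores:
        for n in neg_scores:
            pairs += 1
            if p > n:
                concordant += 1
            elif p == n:
                concordant += 0.5
    return concordant / pairs

# Hypothetical classifier scores:
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.5, 0.3, 0.2]

print(c_statistic(pos, neg))        # full c statistic / AUC: 0.875
# Crude "partial" restriction for illustration only: concordance against
# the two highest-scoring (hardest) negatives.
hard_neg = sorted(neg, reverse=True)[:2]
print(c_statistic(pos, hard_neg))
```

With imbalanced, low-prevalence data the interesting behaviour is concentrated in exactly such subranges of the ROC plot, which is the motivation for well-defined partial measures.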

