Bayesian networks: Theory, applications and sensitivity issues

2017 ◽  
Vol 01 (01) ◽  
pp. 1630014 ◽  
Author(s):  
Ron S. Kenett

This chapter is about an important tool in the data science workbench: Bayesian networks (BNs). Data science is about generating information from a given data set by applying statistical methods. The quality of the information derived from data analysis depends on various dimensions, including the communication of results, the ability to translate results into actionable tasks and the capability to integrate various data sources [R. S. Kenett and G. Shmueli, On information quality, J. R. Stat. Soc. A 177(1), 3 (2014)]. This chapter demonstrates, with three examples, how the application of BNs provides a high level of information quality. It expands the treatment of BNs as a statistical tool and provides a wider scope of statistical analysis that matches current trends in data science. For more examples of deriving high information quality with BNs, see [R. S. Kenett and G. Shmueli, Information Quality: The Potential of Data and Analytics to Generate Knowledge (John Wiley and Sons, 2016), www.wiley.com/go/information_quality]. The three examples used in the chapter are complementary in scope. The first example is based on expert opinion assessments of risks in the operation of health care monitoring systems in a hospital environment. The second example comes from the monitoring of an open source community and is a data-rich application that combines expert opinion, social network analysis and continuous operational variables. The third example is entirely data driven and is based on an extensive customer satisfaction survey of airline customers. Section 1 is an introduction to BNs and Sec. 2 provides the theoretical background. Examples are provided in Sec. 3. Section 4 discusses sensitivity analysis of BNs, and Sec. 5 lists a range of software applications implementing BNs. Section 6 concludes the chapter.
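As background for the BN discussion: a BN encodes a joint distribution as the product of one conditional distribution per node given its parents, and inference reduces to summing that product over unobserved nodes. The minimal Python sketch below illustrates this with a hypothetical three-node network (A → C ← B) whose probabilities are invented for illustration. It is not one of the chapter's three examples, and real applications would use dedicated BN software.

```python
from itertools import product

# Hypothetical network A -> C <- B with binary nodes.
# All probabilities are illustrative only, not taken from the chapter.
P_A = {True: 0.1, False: 0.9}
P_B = {True: 0.2, False: 0.8}
# P(C = True | A, B), indexed by (a, b)
P_C_TRUE = {(True, True): 0.95, (True, False): 0.80,
            (False, True): 0.60, (False, False): 0.05}


def joint(a: bool, b: bool, c: bool) -> float:
    """Joint probability via the BN factorisation P(A) * P(B) * P(C | A, B)."""
    pc = P_C_TRUE[(a, b)]
    return P_A[a] * P_B[b] * (pc if c else 1.0 - pc)


def posterior_a_given_c(c: bool) -> float:
    """P(A = True | C = c) by exact enumeration, summing out the hidden node B."""
    numerator = sum(joint(True, b, c) for b in (True, False))
    evidence = sum(joint(a, b, c) for a, b in product((True, False), repeat=2))
    return numerator / evidence
```

With these numbers, observing C = True raises P(A = True) from its prior of 0.1 to about 0.37 (0.083 / 0.227).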

Author(s):  
S. Palm ◽  
R. Sommer ◽  
A. Tessmann ◽  
U. Stilla

Abstract. In this paper we propose a strategy to focus ultra-high-resolution single-channel carborne SAR and airborne circular SAR (CSAR) data to image facades and vertical infrastructure. We illustrate the related theoretical background and the design of an optimal focusing geometry for carborne SAR applications using backprojection focusing techniques. Of particular interest is the determination of the minimum distance and orientation of the facade relative to the radar sensor. Potential image distortions due to a wrong choice of these parameters are illustrated. Effects on the final resolution of the data due to the rotation of the focusing geometry compared with typical airborne SAR are discussed. We validated the strategy by driving on conventional roads and illuminating facades with an experimental mobile radar mapping (MRM) sensor operating at 300 GHz. We further present an adapted version of the proposed strategy to focus vertical infrastructure in CSAR data sets. By extracting the center coordinate and the principal orientation of an object from GIS data, the focusing plane can be defined arbitrarily in 3D space. For the CSAR data set, a radar sensor designed specifically for circular flight trajectories and operating at 94 GHz was evaluated. An electrical pylon was chosen as the target. In both applications, the final images show a high level of detail. The combination of the proposed strategy and a radar sensor with very high bandwidth is capable of subcentimeter imaging of facades. The height, shape and dimensions of objects can be extracted directly from the image geometry with very high accuracy.


Author(s):  
S. Sofie Lövdal ◽  
Ruud J.R. Den Hartigh ◽  
George Azzopardi

Purpose: Staying injury-free is a major factor for success in sports. Although injuries are difficult to forecast, novel technologies and data-science applications could provide important insights. Our purpose was to use machine learning for the prediction of injuries in runners, based on detailed training logs. Methods: Prediction of injuries was evaluated on a new data set of 74 high-level middle- and long-distance runners, over a period of 7 years. Two analytic approaches were applied. First, the training load from the previous 7 days was expressed as a time series, with each day's training being described by 10 features. These features were a combination of objective data from a global positioning system watch (e.g., duration, distance), together with subjective data about the exertion and success of the training. Second, a training week was summarized by 22 aggregate features, and a time window of 3 weeks before the injury was considered. Results: A predictive system based on bagged XGBoost machine-learning models resulted in receiver operating characteristic curves with average areas under the curves of 0.724 and 0.678 for the day and week approaches, respectively. The results of the day approach especially reflect a reasonably high probability that our system makes correct injury predictions. Conclusions: Our machine-learning-based approach predicts a sizable portion of the injuries, in particular when the model is based on training-load data in the days preceding an injury. Overall, these results demonstrate the possible merits of using machine learning to predict injuries and tailor training programs for athletes.
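The day approach can be pictured as turning a runner's training log into fixed-length windows. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes one feature vector per calendar day (10 features in the study), uses days t-7 to t-1 as the input and labels the sample by whether an injury occurred on day t. The labelling rule and the function name `build_day_windows` are hypothetical.

```python
from typing import List, Sequence, Tuple

WINDOW_DAYS = 7  # the abstract's day approach looks at the previous 7 days


def build_day_windows(daily_features: Sequence[Sequence[float]],
                      injured: Sequence[bool]) -> Tuple[List[List[float]], List[int]]:
    """Turn per-day feature vectors into (flattened 7-day window, label) pairs.

    Hypothetical labelling rule: the sample for day t uses days t-7 .. t-1
    as input, and its label is 1 if an injury occurred on day t.
    """
    if len(daily_features) != len(injured):
        raise ValueError("features and labels must have the same length")
    X: List[List[float]] = []
    y: List[int] = []
    for t in range(WINDOW_DAYS, len(daily_features)):
        window = daily_features[t - WINDOW_DAYS:t]
        # Flatten 7 days x n_features into one row (70 values for 10 features).
        X.append([value for day in window for value in day])
        y.append(int(injured[t]))
    return X, y
```

The resulting rows could then be passed to any binary classifier; the study used bagged XGBoost models, which are not reproduced here.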


2020 ◽  

BACKGROUND: This paper deals with the territorial distribution of mortality from alcohol and drug addictions at the level of the districts of the Slovak Republic. AIM: The aim of the paper is to explore the relations within the administrative territorial division of the Slovak Republic, that is, between the individual districts, and hence to reveal possibly hidden relations in alcohol and drug mortality. METHODS: The analysis is performed separately for females and males. The standardised mortality rate is computed through a sequence of mathematical relations. The Euclidean distance is used to measure the similarity between each pair of districts in the data set, and a cluster analysis is then performed, with the clusters formed from the mutual distances between the districts. The data were collected from the database of the Statistical Office of the Slovak Republic for all districts of the Slovak Republic and cover the period from 1996 to 2015. RESULTS: The main finding is that the Slovak Republic shows considerably high regional disparities in mortality, expressed by the standardised mortality rate computed specifically for diagnoses attributed to alcohol and drug addictions. However, the results differ between females and males. The Bratislava III District holds by far the most extreme position and forms its own cluster for both sexes. The Topoľčany District holds a similarly extreme position for males. All the Bratislava districts remain notably dissimilar from one another. By contrast, the development of the regional disparities among the districts appears notably heterogeneous. CONCLUSIONS: There are considerable regional discrepancies throughout the districts of the Slovak Republic.
Hence, a common platform is needed for deciding how to proceed with solving this issue.
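The abstract names Euclidean distance and cluster analysis but states neither the linkage rule nor exactly how each district is represented. A minimal Python sketch, assuming each district is described by a vector of standardised mortality rates (for example one value per year) and assuming average-linkage agglomerative clustering, might look like this. The function names are hypothetical, and any district names or values used with it are placeholders.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Naive agglomerative clustering with average linkage on Euclidean distances.

    Starts with one cluster per district and repeatedly merges the two clusters
    with the smallest mean pairwise distance until `n_clusters` remain.
    """
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of districts")
    dist = {frozenset((a, b)): euclidean(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}
    clusters: List[List[str]] = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None  # (mean distance, index i, index j) of the closest pair so far
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            mean = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or mean < best[0]:
                best = (mean, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge j into i (i < j, so index i stays valid)
        del clusters[j]
    return clusters
```

With one profile vector per sex, running the procedure twice mirrors the study's separate female and male analyses. In practice a library routine such as SciPy's hierarchical clustering would replace the naive loop.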


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With data science, businesses aim to extract valuable insights from the available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop, using machine learning techniques (classification and prediction). To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form a model that can predict future trends in different types of organizations. Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost and k-nearest neighbours (KNN), are applied to find efficient and effective results. All of these can be applied through GUI-based workflows with categories such as Data, Visualize, Model and Evaluate. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertising, production houses can plan the best time to release a movie according to its predicted success rate and so gain higher returns.
Discussion: Data mining is the process of discovering patterns in large data sets and uncovering relationships that help solve business problems and predict forthcoming trends. Such predictions can help production houses plan their advertising and their costs and thereby make a movie more profitable.
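The comparison by accuracy and confusion matrix can be made concrete with a small Python sketch. It assumes the five success classes named in the abstract and takes each model's predictions as given (training SVM, Random Forest and the other models is not shown). The function names are hypothetical, and any model names and predictions used with them are placeholders.

```python
from typing import Dict, List, Sequence, Tuple

# Success classes named in the abstract.
CLASSES = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = CLASSES) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of predictions on the diagonal (predicted class equals true class)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def best_model(y_true: Sequence[str],
               predictions: Dict[str, Sequence[str]]) -> Tuple[str, float]:
    """Pick the model whose predictions give the highest accuracy."""
    scores = {name: accuracy(confusion_matrix(y_true, pred))
              for name, pred in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

Accuracy alone can hide poor performance on rare classes such as Blockbuster, which is one reason to inspect the full confusion matrix as well.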


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii464-iii464
Author(s):  
Dharmendra Ganesan ◽  
Nor Faizal Ahmad Bahuri ◽  
Revathi Rajagopal ◽  
Jasmine Loh PY ◽  
Kein Seong Mun ◽  
...  

Abstract. The University of Malaya Medical Centre, Kuala Lumpur, acquired an intraoperative MRI (iMRI) brain suite through a public-private initiative in September 2015. The MRI brain suite has a SIEMENS 1.5T system with a NORAS coil system and NORAS head clamps in a two-room solution. We retrospectively review the cranial paediatric neuro-oncology cases that underwent surgery in this facility from September 2015 to December 2019, and discuss our experience of the clear benefits and the challenges of using this technology to aid surgery. The challenges include physically setting up the paediatric case preoperatively, preparing and performing the intraoperative scan, interpreting the intraoperative images and making a decision, and using the new MRI data set for navigation to locate residual tumour safely. We also discuss the utility of the intraoperative images in deciding on subsequent adjuvant management. The use of iMRI has further technical challenges, such as ensuring that the perimeter around the patient is free of ferromagnetic material, transferring the patient to the scanner and, as a consequence, the increased duration of surgery. CONCLUSION: Many elements in the use of iMRI have a learning curve, and performance improves with exposure and experience. In some areas, a high level of vigilance and a standard operating procedure (SOP) are required to minimize mishaps. Currently, iMRI provides the best means of determining the extent of resection before concluding surgery.


2021 ◽  
pp. postgradmedj-2020-139361
Author(s):  
María Matesanz-Fernández ◽  
Teresa Seoane-Pillado ◽  
Iria Iñiguez-Vázquez ◽  
Roi Suárez-Gil ◽  
Sonia Pértega-Díaz ◽  
...  

Objective: We aim to identify patterns of disease clusters among inpatients of a general hospital and to describe the characteristics and evolution of each group. Methods: We used two data sets (hospitalisations and patients) from the CMBD (Conjunto mínimo básico de datos; Minimum Basic Hospital Data Set, MBDS) of the Lucus Augusti Hospital (Spain), conducting a retrospective cohort study of the 74 220 patients discharged from the Medic Area between 01 January 2000 and 31 December 2015. We created multimorbidity clusters using multiple correspondence analysis. Results: We identified five clusters for both gender and age. Cluster 1: alcoholic liver disease, alcoholic dependency syndrome, lung and digestive tract malignant neoplasms (age under 50 years). Cluster 2: large intestine, prostate, breast and other malignant neoplasms, lymphoma and myeloma (age over 70, mostly males). Cluster 3: malnutrition, Parkinson disease and other mobility disorders, dementia and other mental health conditions (age over 80 years, mostly women). Cluster 4: atrial fibrillation/flutter, cardiac failure, chronic kidney failure and heart valve disease (age 70–80, mostly women). Cluster 5: hypertension/hypertensive heart disease, type 2 diabetes mellitus, ischaemic cardiomyopathy, dyslipidaemia, obesity and sleep apnea (age 60–80, mostly men). We found significant differences among the clusters in gender, age, number of chronic pathologies, number of rehospitalisations and mortality during hospitalisation (p<0.001 in all cases). Conclusions: We identify, for the first time in a hospital environment, five clusters of disease combinations among inpatients. These clusters contain several high-incidence diseases related to both age and gender, each with its own evolution and clinical characteristics over time.


2020 ◽  
Vol 8 ◽  
Author(s):  
Devasis Bassu ◽  
Peter W. Jones ◽  
Linda Ness ◽  
David Shallcross

Abstract. In this paper, we present a theoretical foundation for representing a data set as a measure in a very large, hierarchically parametrized family of positive measures whose parameters can be computed explicitly (rather than estimated by optimization), and we illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures, and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [15, 3, 1]. The representation uses the very simple concept of a dyadic tree defined on the universe of a data set and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure-theoretic concepts and theories. Because the parameters are simply and explicitly computable and are easily interpretable and visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
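The abstract does not spell out the product formula. A common Haar-type form, assumed here and possibly different in detail from the paper's lemma, writes a measure on [0, 1) as its total mass times a product of factors (1 + a_I h_I) over dyadic intervals I. Here h_I is +1 on the left half of I and -1 on the right half, and a_I = (mu(I_left) - mu(I_right)) / mu(I). The Python sketch below computes these parameters explicitly from a sample, with no optimization, and rebuilds the finest-level masses from the product. All function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (level k, index j) denotes the dyadic interval [j / 2**k, (j + 1) / 2**k).
Interval = Tuple[int, int]


def dyadic_counts(samples: Sequence[float], depth: int) -> Dict[Interval, int]:
    """Count the samples in [0, 1) that fall in each dyadic interval up to `depth`."""
    counts: Dict[Interval, int] = {}
    for x in samples:
        if not 0.0 <= x < 1.0:
            raise ValueError("samples must lie in [0, 1)")
        for k in range(depth + 1):
            key = (k, int(x * 2 ** k))
            counts[key] = counts.get(key, 0) + 1
    return counts


def product_parameters(counts: Dict[Interval, int], depth: int) -> Dict[Interval, float]:
    """Explicit parameter a_I = (mu(left) - mu(right)) / mu(I) for non-empty I above `depth`."""
    params: Dict[Interval, float] = {}
    for k in range(depth):
        for j in range(2 ** k):
            total = counts.get((k, j), 0)
            if total:
                left = counts.get((k + 1, 2 * j), 0)
                right = counts.get((k + 1, 2 * j + 1), 0)
                params[(k, j)] = (left - right) / total
    return params


def reconstruct_mass(params: Dict[Interval, float], total_mass: float,
                     depth: int, j: int) -> float:
    """Mass of the interval (depth, j) from the product of (1 + a_I * h_I) factors.

    Each factor is halved because a child interval has half the Lebesgue
    length of its parent, which turns the density product into a mass.
    """
    mass = total_mass
    for k in range(depth):
        parent = j >> (depth - k)                   # ancestor of (depth, j) at level k
        child_bit = (j >> (depth - k - 1)) & 1      # 0: left half, 1: right half
        h = 1.0 if child_bit == 0 else -1.0         # Haar sign h_I on that half
        mass *= (1.0 + params.get((k, parent), 0.0) * h) / 2.0
    return mass
```

For the sample [0.1, 0.2, 0.3, 0.7] at depth 2, the root parameter is (3 - 1) / 4 = 0.5, and the product reproduces the interval counts 2, 1, 1 and 0.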


2021 ◽  
Vol 11 (22) ◽  
pp. 10596
Author(s):  
Chung-Hong Lee ◽  
Hsin-Chang Yang ◽  
Yenming J. Chen ◽  
Yung-Lin Chuang

Recently, detecting real-time world events from Twitter messages through algorithmic computation has emerged as a new paradigm among data science applications. During a high-impact event, people may want the latest information about how the event is developing, in order to better understand the situation and its possible trends when making decisions. However, in emergencies, governments or enterprises are often unable to notify people in time to give early warning and help them avoid risks. A sensible solution is to integrate real-time event monitoring and intelligence gathering functions into their decision support systems. Such a system can provide real-time event summaries, which are updated whenever important new events are detected. Therefore, in this work, we combine our Twitter-based real-time event detection algorithm with pre-trained language models for summarizing emergent events. We used an online text-stream clustering algorithm and the self-adaptive method we developed to gather Twitter data for detecting emerging events. We then used the XSum data set with a pre-trained language model, namely the T5 model, to train the summarization model. ROUGE metrics were used to compare the summarization performance of various models. We then used the trained model to summarize the incoming Twitter data for experimentation. In particular, we provide a real-world case study, the COVID-19 pandemic event, to verify the applicability of the proposed method. Finally, we conducted a survey in which human judges assessed the quality of example generated summaries. The case study and experimental results demonstrate that our summarization method provides users with a feasible way to quickly understand updates on a specific event through real-time summaries of the event story.
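ROUGE compares a generated summary with a reference summary by n-gram overlap. The Python sketch below is a simplified ROUGE-N that uses lowercasing and whitespace tokenisation only, with no stemming. It is not necessarily the scorer used in the study, since the abstract does not specify one.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Tuple[float, float, float]:
    """Simplified ROUGE-N: clipped n-gram overlap as (precision, recall, F1)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # `&` keeps the minimum count of each n-gram
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `rouge_n("the cat sat", "the cat sat on the mat")` gives precision 1.0, recall 0.5 and F1 of about 0.667. Established scorers add options such as stemming and ROUGE-L.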


2020 ◽  
Vol 17 (1) ◽  
pp. 28-37
Author(s):  
Deimantė Krisiukėnienė ◽  
Vaida Pilinkienė

Abstract. Research purpose. The research purpose is to assess and compare the competitiveness of EU creative industries' exports. Design/Methodology/Approach. The article is organised as follows: Section 1 presents a short theoretical conception of creative industries; Section 2 presents the theoretical background of trade competitiveness indices; Section 3 introduces the research data set, method and variables; Section 4 discusses the results of the revealed comparative advantage index analysis; and the final section presents the conclusions of the research. It should be noted that the research does not cover all possible factors underlying the differences in external sector performance and thus may need to be complemented with country-specific analysis as warranted. Methods of the research include theoretical review and analysis, evaluation of comparative advantage indices and clustering. Findings. The analysis revealed that EU countries may gain competitiveness from globalisation effects and the development of creative industries. The increase in the revealed comparative advantage (RCA) index during the period 2004–2017 shows rising EU international trade specialisation in creative industries. According to the dynamic RCA index results, France, Poland, Slovakia, Slovenia and Spain have a competitive advantage in creative industries sectors and could be classified as 'rising stars' based on the dynamics of their exports. Originality/Value/Practical implications. Analysis of creative industries is becoming increasingly relevant in scientific research. Rapid globalisation means that closed economies, together with their specific sectors, are no longer competitive in the market, because the productivity of countries and of particular economic sectors depends on international trade liberalisation, technology and innovation.
Nevertheless, the scientific literature contains a gap in research on international trade competitiveness in the creative industries sector.
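The abstract uses the revealed comparative advantage index without stating its formula. Assuming the classical Balassa form, RCA for country i and sector j is (X_ij / X_i) / (X_wj / X_w). This is the sector's share in the country's exports divided by its share in the exports of a reference group (here, all countries passed in, for example the EU). A value above 1 is usually read as a comparative advantage. The sketch is illustrative: the function name is hypothetical, and the study's "dynamic" RCA may be defined differently, for example as the change in RCA between years.

```python
from typing import Dict


def balassa_rca(exports: Dict[str, Dict[str, float]], country: str, sector: str) -> float:
    """Balassa RCA: sector share in the country's exports / sector share in group exports.

    `exports[c][s]` is the export value of sector s for country c; the
    reference group is every country present in `exports` (an assumption).
    """
    country_total = sum(exports[country].values())
    group_sector = sum(flows.get(sector, 0.0) for flows in exports.values())
    group_total = sum(sum(flows.values()) for flows in exports.values())
    if country_total == 0 or group_sector == 0:
        raise ValueError("RCA undefined: zero total or zero sector exports")
    country_share = exports[country].get(sector, 0.0) / country_total
    group_share = group_sector / group_total
    return country_share / group_share
```

Computing this index for each year and comparing the values is one way to track the rising specialisation in creative industries that the abstract reports.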


1999 ◽  
Vol 37 (9) ◽  
pp. 2781-2788 ◽  
Author(s):  
Tomasz A. Łęski ◽  
Marek Gniadkowski ◽  
Anna Skoczyńska ◽  
Elżbieta Stefaniuk ◽  
Krzysztof Trzciński ◽  
...  

An outbreak of mupirocin-resistant (MuR) staphylococci was investigated in two wards of a large hospital in Warsaw, Poland. Fifty-three MuR isolates of Staphylococcus aureus, S. epidermidis, S. haemolyticus, S. xylosus, and S. capitis were identified over a 17-month survey which was carried out after introduction of the drug for the treatment of skin infections. The isolates were collected from patients with infections, environmental samples, and carriers; they constituted 19.5% of all staphylococcal isolates identified in the two wards during that time. Almost all the MuR isolates were also resistant to methicillin (methicillin-resistant S. aureus and methicillin-resistant coagulase-negative staphylococci). Seven of the outbreak isolates expressed a low-level-resistance phenotype (MuL), whereas the remaining majority of isolates were found to be highly resistant to mupirocin (MuH). The mupA gene, responsible for the MuH phenotype, has been assigned to three different polymorphic loci among the strains in the collection analyzed. The predominant polymorph, polymorph I (characterized by a mupA-containing EcoRI DNA fragment of about 16 kb), was located on a specific plasmid which was widely distributed among the entire staphylococcal population. All MuR S. aureus isolates were found to represent a single epidemic strain, which was clonally disseminated in both wards. The S. epidermidis population was much more diverse; however, at least four clusters of closely related isolates were identified, which suggested that some strains of this species were also clonally spread in the hospital environment. Six isolates of S. epidermidis were demonstrated to express the MuL and MuH resistance mechanisms simultaneously, and this is the first identification of such dual MuR phenotype-bearing strains. 
The outbreak was attributed to heavy and inappropriate use of mupirocin, and as a result the dermatological formulation of the drug has been removed from the hospital formulary.

