Analysis, Discussion, and Evaluations for the Case Studies

The purpose of this chapter is to discuss and analyse the results produced in Chapter 5. To evaluate the proposed models, this chapter compares them with others existing in the literature, and it discusses the evaluation measures used to validate the experimental results of Chapter 5. For example, GA/DT demonstrated the highest average accuracy (92%) for classifying colon cancer among the algorithms tested. PSO/DT, PSO/SVM, and IG/DT each presented 89%, demonstrating very good classification accuracy, whereas PSO/NB (57%) and GA/NB (58%) presented the lowest accuracy. Table 6.1 lists all accuracies resulting from the experiments of case study one, as applied to the full data set: 45 of the algorithm combinations achieve accuracy above 80%, one reaches 92%, and nine score below 60%.

Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 204
Author(s):  
Chamay Kruger ◽  
Willem Daniel Schutte ◽  
Tanja Verster

This paper proposes a methodology that utilises model performance as a metric to assess the representativeness of external or pooled data when it is used by banks in regulatory model development and calibration; currently, no formal methodology exists for assessing representativeness. The paper reviews the existing regulatory literature on the requirements for assessing representativeness and emphasises that both qualitative and quantitative aspects need to be considered. We present a novel methodology, apply it to two case studies, and compare it with the Multivariate Prediction Accuracy Index. The first case study investigates whether a pooled data source from Global Credit Data (GCD) is representative when internal data are enriched with pooled data in the development of a regulatory loss given default (LGD) model. The second case study differs from the first by illustrating which other countries in the pooled data set could be representative when enriching internal data during the development of an LGD model. Using these case studies as examples, the proposed methodology provides users with a generalised framework to identify subsets of the external data that are representative of their country's or bank's data, making the results general and universally applicable.
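
To make the idea concrete, here is a minimal sketch of a performance-based representativeness check, assuming scikit-learn. The synthetic data, the classification stand-in for an LGD-style target, the country grouping, and the AUC tolerance are all illustrative assumptions, not the authors' specification.

```python
# Hedged sketch: flag a pooled subset as non-representative when a model
# built on internal data performs materially worse on it. The data, the
# country grouping, and the 0.05 AUC tolerance are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=8, random_state=1)
X_int, X_ext, y_int, y_ext = train_test_split(X, y, test_size=0.5, random_state=1)
countries = np.random.default_rng(1).choice(["DE", "FR", "ZA"], size=len(y_ext))

# Hold out part of the internal data so the benchmark AUC is not inflated.
X_tr, X_val, y_tr, y_val = train_test_split(X_int, y_int, test_size=0.3, random_state=1)
model = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
auc_internal = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

for c in np.unique(countries):
    mask = countries == c
    auc_c = roc_auc_score(y_ext[mask], model.predict_proba(X_ext[mask])[:, 1])
    verdict = "representative" if auc_internal - auc_c <= 0.05 else "not representative"
    print(f"{c}: AUC {auc_c:.3f} ({verdict})")
```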


2020 ◽  
Author(s):  
Kenneth Pickering ◽  
Dale Allen ◽  
Eric Bucsela ◽  
Jos van Geffen ◽  
Henk Eskes ◽  
...  

Nitric oxide (NO) is produced in lightning channels and quickly comes into equilibrium with nitrogen dioxide (NO2) in the atmosphere. The production of NOx (NO + NO2) leads to subsequent increases in the concentrations of ozone (O3) and the hydroxyl radical (OH) and decreases in the concentration of methane (CH4), thus impacting the climate system. Global production of NOx from lightning is uncertain by a factor of four. NOx production by lightning will be examined using NO2 columns from the TROPOspheric Monitoring Instrument (TROPOMI) on board the Copernicus Sentinel-5 Precursor satellite, with an overpass time of approximately 1330 LT, and flash rates from the Geostationary Lightning Mapper (GLM) on board the NOAA GOES-16 (75.2° W) and GOES-17 (137.2° W) satellites. Where there is overlap in coverage of the two GLM instruments, the greater of the two flash counts is used. Two approaches have been undertaken for this analysis: a series of case studies of storm systems over the United States, and a gridded analysis over the entire contiguous United States, Central America, northern South America, and surrounding oceans. A modified Copernicus Sentinel-5P TROPOMI NO2 data set is used here for the case-study analysis to improve data coverage over deep convective clouds. In both approaches, only TROPOMI pixels with cloud fraction > 0.95 and cloud pressure < 500 hPa are used. The stratospheric column is removed from the total slant column, and the result is divided by air mass factors appropriate for deep convective clouds containing lightning NOx (LNOx). Case studies have been selected from deep convective systems over and near the United States during the warm seasons of 2018 and 2019. For each of these systems, NOx production per flash is determined by multiplying a TROPOMI-based estimate of the mean tropospheric column of LNOx over each system by the storm area and then dividing by a GLM-based estimate of the flashes that contribute to the column. In the large temporal and spatial scale analysis, the TROPOMI data are aggregated on a 0.5 x 0.5 degree grid and converted to moles LNOx*. GLM flash counts during the one-hour period before the TROPOMI overpass are similarly binned. A tropospheric background of LNOx* is estimated from grid cells without lightning and subtracted from LNOx* in cells with lightning to yield an estimate of freshly produced lightning NOx, designated LNOx. Results of the two approaches are compared and discussed with respect to previous LNOx-per-flash estimates.
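
As a rough illustration of the per-flash calculation described above, here is a minimal worked sketch; the column density, storm area, and flash count are placeholder values for illustration, not results from the study.

```python
# A minimal worked sketch of the per-flash estimate: mean LNOx column times
# storm area, divided by the contributing flash count. All numbers below are
# placeholders, not values from this study.
def lnox_per_flash(mean_column_mol_m2: float, storm_area_m2: float,
                   contributing_flashes: float) -> float:
    """Moles of lightning NOx produced per flash for one storm system."""
    return mean_column_mol_m2 * storm_area_m2 / contributing_flashes

# e.g. a 2e-5 mol/m^2 mean LNOx column over a 4e10 m^2 storm with 5000
# contributing GLM flashes gives 160 mol of NOx per flash.
print(lnox_per_flash(2e-5, 4e10, 5000), "mol per flash")
```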


Author(s):  
Fionn Murtagh

The author reviews the theory and practice of determining what parts of a data set are ultrametric, and describes the potential relevance of ultrametric topology as a framework for unconscious thought processes. This view of ultrametric topology as a complement to metric-based, conscious, Aristotelian logical reasoning comes from the work of the Chilean psychoanalyst Ignacio Matte Blanco. Taking text data, the author develops an algorithm for finding local ultrametricity in such data and applies it in two case studies. The first concerns a large set of dream reports, which may therefore carry traces of unconscious thought processes. The second uses Twitter social media, with the aim of picking up underlying associations. The case studies are selective in regard to names of people and objects, and are focused so as to highlight the principle of the approach: finding particular patterns in textual data.


2021 ◽  
Vol 12 (1) ◽  
pp. 1-11
Author(s):  
Kishore Sugali ◽  
Chris Sprunger ◽  
Venkata N Inukollu

Artificial intelligence and machine learning have been around for a long time. In recent years, there has been a surge in popularity of applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. However, the development methodology used in AI/ML differs significantly from traditional development, and these differences give rise to distinct software testing challenges. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-means clustering strategy to the data set, followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full data set, and thus avoid training a model that is likely to fail because it has learned only a subset of the full data domain.
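
A minimal sketch of such a cluster-informed split follows, assuming scikit-learn; the synthetic data, the choice of eight k-means clusters, and stratifying the split on cluster labels are illustrative choices, not the authors' exact procedure.

```python
# Sketch: cluster the data, then stratify the train/test split on cluster
# membership so the training set covers every region of the data domain.
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Cluster the full data set so each region of the data domain is identified.
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Stratify on cluster membership so every cluster contributes proportionally
# to both the training and the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=clusters, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```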


Author(s):  
Min Zhang ◽  
Elizaveta Levina ◽  
Dragan Djurdjanovic ◽  
Jun Ni

The classification of workpiece surface patterns is an essential element in understanding how functional performance is influenced by surface geometry. Filter banks have been investigated in the literature for capturing the multiscale characteristics of engineering surfaces. Conventionally, parametric representations of the filter outputs were used for classification. In this paper, histogram estimators of the filter bank outputs from engineering surfaces, combined with the nearest neighbor method for classification, are investigated to improve classification accuracy; this is accomplished by using distribution dissimilarity measures to compare histograms. Furthermore, for large and complex surfaces, histogram estimators of local surface flatness parameters are also proposed for simplicity of computation. Two case studies have been conducted to demonstrate the proposed methods, and the influence of the number of histogram bins and of the dissimilarity measures on classification performance is studied in detail. Results from the first case study show that the proposed method is less effective in classifying small surfaces with clear surface patterns, because the filtering is influenced by the quality of the surface data collected from the measurement sensor. In comparison, results from the second case study show that the proposed method performs better in classifying large surfaces with mild surface pattern differences; there, the classification accuracy of the conventional method drops from 100% to around 50%. In general, one can achieve misclassification errors below 5% in both case studies with the histogram representations of surface parameters and an appropriate choice of the number of bins for histogram construction.
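
The following sketch illustrates the general pattern of a histogram representation plus nearest-neighbour classification under a distribution dissimilarity measure; the chi-square dissimilarity, bin count, and toy data are assumptions for illustration, not the paper's filter-bank pipeline.

```python
# Sketch: represent each surface by a histogram of a surface parameter and
# classify by nearest neighbour under a chi-square dissimilarity.
import numpy as np

def histogram_rep(values, bins=16, value_range=(-3.0, 3.0)):
    # Histogram estimator of the distribution of a surface parameter.
    h, _ = np.histogram(values, bins=bins, range=value_range)
    return h / max(h.sum(), 1)

def chi2_dissimilarity(p, q, eps=1e-12):
    # Chi-square distance between two normalized histograms.
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def nn_classify(query_hist, train_hists, train_labels):
    # Nearest neighbour under the chosen distribution dissimilarity measure.
    d = [chi2_dissimilarity(query_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]

# Toy example: two surface classes with different roughness distributions.
rng = np.random.default_rng(0)
samples = [rng.normal(0, s, 500) for s in (0.5, 0.5, 1.5, 1.5)]
labels = [0, 0, 1, 1]
train_hists = [histogram_rep(s) for s in samples]
query = histogram_rep(rng.normal(0, 1.4, 500))
print("predicted class:", nn_classify(query, train_hists, labels))
```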


2017 ◽  
Vol 63 (No. 8) ◽  
pp. 347-355 ◽  
Author(s):  
Klepac Václav ◽  
Hampel David

The objective of this paper is the prediction of financial distress (default of payment or insolvency) for 250 agriculture business companies in the EU, of which 62 defaulted in 2014, with respect to the lag of the attributes used. From the many types of classification models, logistic regression, support vector machines with the RBF ANOVA kernel, decision trees, and adaptive boosting based on decision trees were chosen to obtain the best results. The results show that as the distance to bankruptcy increases, the average accuracy of the financial distress prediction decreases, and the difference between active and distressed companies in terms of liquidity, profitability, and debt ratios grows. Decision trees and adaptive boosting offer better accuracy for distress prediction than the SVM and logit methods, which is comparable to previous studies. From the total of 15 accounting variables, classification trees were constructed using the decision trees' inner feature selection method for better visualization, which reduces the full data set to only one or two attributes: ROA and long-term debt to total assets ratio in 2011, ROA and current ratio in 2012, and ROA in 2013 for discriminating the distressed companies.
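
As a hedged sketch of this kind of model comparison, the following uses scikit-learn on simulated data shaped like the paper's sample (250 firms, about 62 defaults, 15 accounting ratios); the data and any printed accuracies are synthetic, not the study's results.

```python
# Sketch: compare a shallow decision tree with tree-based adaptive boosting
# by cross-validated accuracy on simulated "accounting ratio" data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=250, n_features=15, n_informative=4,
                           weights=[0.752], random_state=0)  # ~62 "defaults"

models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "AdaBoost (tree-based)": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold CV accuracy = {acc:.2f}")
```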


Author(s):  
Zhecheng Zhu ◽  
Bee Hoon Heng ◽  
Kiok Liang Teow

This paper focuses on interactive data visualization techniques and their applications in healthcare systems. Interactive data visualization is a collection of techniques that dynamically translate data from numeric format into graphic presentations for easy understanding and visual impact. Compared to conventional static visualization techniques, interactive techniques allow users to explore the entire data set through instant slicing and dicing and quick switching among multiple data sources. Adjustable granularity allows both detailed micro-level information and aggregated macro-level information to be displayed in a single chart, and animated transitions add visual impact by showing how a system moves from one state to another. When applied to healthcare systems, interactive visualization techniques are useful in areas such as information integration, flow or trajectory presentation, and location-related visualization. In this paper, three case studies are shared to illustrate how interactive data visualization techniques are applied to various aspects of healthcare systems. The first case study shows a pathway visualization representing the longitudinal disease progression of a patient cohort. The second shows a dashboard profiling different patient cohorts from multiple perspectives. The third shows an interactive map illustrating patient geographical distribution at adjustable granularity. All three case studies illustrate that interactive data visualization techniques support quick information access, fast knowledge sharing, and better decision making in healthcare systems.


2013 ◽  
Vol 427-429 ◽  
pp. 1631-1636
Author(s):  
Xiao Rui Wei ◽  
Yang Ping Li

Classification of geospatial data differs from classical classification in that spatial context must be taken into account. In particular, the validation criterion functions should incorporate both classification accuracy and spatial accuracy. However, a direct combination of the two accuracies is cumbersome because of their different subjects and scales. To circumvent this difficulty, we develop two new criterion functions that indirectly incorporate spatial accuracy into classification accuracy-based functions. Next, we formally introduce a set of ideal properties that an appropriate criterion function should satisfy, giving a more meaningful interpretation to the relative significance coefficient in the weighted scheme. Finally, we compare the proposed criterion functions with existing ones on a large data set from the 1980 US presidential election.
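
For reference, the direct weighted scheme that serves as the point of departure can be written as a one-line criterion; the sketch below shows only this baseline, with alpha playing the role of the relative significance coefficient, and is not the authors' indirect criterion functions.

```python
# Baseline direct weighted scheme: J = alpha * CA + (1 - alpha) * SA, where
# CA is classification accuracy, SA is spatial accuracy, and alpha is the
# relative significance coefficient. Illustrative values only.
def weighted_criterion(class_acc: float, spatial_acc: float, alpha: float = 0.5) -> float:
    assert 0.0 <= alpha <= 1.0
    return alpha * class_acc + (1.0 - alpha) * spatial_acc

# Example: accurate per-site predictions but poor spatial coherence.
print(weighted_criterion(class_acc=0.91, spatial_acc=0.62, alpha=0.7))  # 0.823
```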


2019 ◽  
Vol 3 (6) ◽  
pp. 369-381 ◽  
Author(s):  
Shujun Zhang ◽  
Mike Clark ◽  
Xuelei Liu ◽  
Donghui Chen ◽  
Paula Thomas ◽  
...  

In this paper, an innovative bionic system is presented. This system can be used to generate bio-inspired electromagnetic fields (BIEMF) by mimicking the natural Earth magnetic fields and the frequencies, strengths, and waveforms of body organs and cellular pulsations. Preliminary tests have been carried out to investigate the influence of BIEMF on people's ATP levels, in order to demonstrate interactions between BIEMF and bio-processing activities at the cellular level. The system has been applied to the health enhancement of humans, pets, and other animals. A number of case studies are presented to demonstrate the efficiency and effectiveness of the system. The case study experimental results have shown that this innovative bio-inspired system works efficiently and effectively in enhancing human and animal health, and that it can be applied to areas such as (1) human health enhancement and illness treatment, (2) pet health enhancement, and (3) reduction or elimination of 'jet lag'.


2021 ◽  
Vol 48 (2) ◽  
pp. 221-228
Author(s):  
Jaegook Seung ◽  
Jaegon Kim ◽  
Yeonmi Yang ◽  
Hyungbin Lim ◽  
Van Nhat Thang Le ◽  
...  

The aim of this study was to evaluate the use of an easily accessible machine learning application to identify mesiodens, and to compare the identification ability of the trained model with that of humans. A total of 1604 panoramic images (805 with mesiodens, 799 without) of patients aged 5-7 years were used. The model was built with Google's Teachable Machine. Data set 1 was used to train and verify the model; data set 2 was used to compare the ability of the trained model with human groups. On data set 1, the average accuracy of the model was 0.82. On data set 2, the accuracy of the model was 0.78, while the accuracies of the resident group and the student group were 0.82 and 0.69, respectively. This study developed a model for identifying mesiodens using panoramic radiographs of children in primary and early mixed dentition. The classification accuracy of the model was lower than that of the resident group, but its accuracy (0.78) was higher than that of the dental students (0.69), so it could be used to assist the diagnosis of mesiodens for non-expert students or general dentists.

