Visual exploration of classification models for various data types in risk assessment

2012 ◽  
Vol 11 (3) ◽  
pp. 237-251 ◽  
Author(s):  
Malgorzata Migut ◽  
Marcel Worring

In risk assessment applications, well-informed decisions need to be made based on large amounts of multi-dimensional data. In many domains, not only the risk of a wrong decision but also the trade-off between the costs of possible decisions is of utmost importance. In this paper we describe a framework to support the decision-making process that tightly integrates interactive visual exploration with machine learning. The proposed approach uses a series of interactive 2D visualizations of numerical and ordinal data combined with visualization of classification models. This series of visual elements is linked to the classifier’s performance, which is visualized using an interactive performance curve. This interaction allows the decision-maker to steer the classification model and instantly identify the critical, cost-changing data elements in the various linked visualizations. The critical data elements are represented as images in order to trigger associations related to the expert’s knowledge. In this way the data visualization and classification results are not only linked together, but also linked back to the classification model. Such a visual analytics framework allows users to interactively explore the costs of their decisions for different settings of the model and, accordingly, choose the most suitable classification model, resulting in more informed and reliable decisions. A case study in the forensic psychiatry domain demonstrates the usefulness of the suggested approach.
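
As a minimal, hypothetical sketch of the kind of cost exploration the framework supports interactively (not the authors' implementation; the classifier, data, and cost values below are placeholders), one can sweep a decision threshold over a classifier's scores and watch the total misclassification cost change:

```python
# Minimal sketch: how moving the decision threshold along a classifier's
# performance curve changes total cost, given user-chosen misclassification costs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

COST_FP, COST_FN = 1.0, 5.0          # hypothetical costs of the two wrong decisions
for threshold in np.linspace(0.1, 0.9, 9):
    pred = (scores >= threshold).astype(int)
    fp = np.sum((pred == 1) & (y_te == 0))
    fn = np.sum((pred == 0) & (y_te == 1))
    print(f"threshold={threshold:.1f}  cost={COST_FP * fp + COST_FN * fn:.0f}")
```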

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Abstract Background Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford the ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of a human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover through the visualization. Conclusions Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users develop rationales and hypotheses about the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
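
For readers who want a feel for the signal-level workflow, the following is an illustrative sketch only, not Sequoia's code: it reads raw current signals from a Fast5 (HDF5) file with h5py, summarizes each read, and then projects and clusters the reads. The HDF5 path used below is an assumption and varies across Fast5 versions.

```python
# Illustrative sketch: per-read signal features, 2D projection, and clustering.
import h5py
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def read_signals(path):
    features = []
    with h5py.File(path, "r") as fh:
        for read_name in fh["Raw/Reads"]:                    # assumed single-read layout
            signal = fh[f"Raw/Reads/{read_name}/Signal"][:]
            features.append([signal.mean(), signal.std(),
                             np.median(signal), len(signal)])
    return np.array(features)

features = read_signals("example.fast5")                      # hypothetical file
embedding = PCA(n_components=2).fit_transform(features)       # 2D view for plotting
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embedding)
```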


2019 ◽  
Vol 19 (1) ◽  
pp. 3-23
Author(s):  
Aurea Soriano-Vargas ◽  
Bernd Hamann ◽  
Maria Cristina F de Oliveira

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies of the applicability of visualization techniques for obtaining valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms, providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small-multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios by studying three use cases that were validated and discussed with domain experts.
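
The combination of temporal scales, projection, and clustering can be illustrated with a small sketch under stated assumptions (synthetic data and standard scikit-learn components; this is not the TV-MV Analytics code):

```python
# Sketch: resample a multivariate time series at two temporal scales, then
# project and cluster the time windows to look for temporal groupings.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=24 * 60, freq="min")
df = pd.DataFrame(rng.normal(size=(len(idx), 4)),
                  index=idx, columns=list("ABCD"))       # synthetic multivariate data

for scale in ("15min", "1h"):                            # two temporal scales
    windows = df.resample(scale).mean().dropna()
    proj = PCA(n_components=2).fit_transform(windows)    # multidimensional projection
    labels = AgglomerativeClustering(n_clusters=3).fit_predict(proj)
    print(scale, np.bincount(labels))
```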


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Jian-ye Yuan ◽  
Xin-yuan Nan ◽  
Cheng-rong Li ◽  
Le-le Sun

Given the urgency of garbage classification, this paper designs a 23-layer convolutional neural network (CNN) model, with an emphasis on real-time classification, to address the low accuracy of garbage classification and recycling and the difficulty of manual recycling. First, depthwise separable convolutions were used to reduce the number of parameters in the model. Then, an attention mechanism was used to improve the accuracy of the garbage classification model. Finally, fine-tuning was used to further improve its performance. We compared the model with classic image classification models, including AlexNet, VGG16, and ResNet18, and lightweight classification models, including MobileNetV2 and ShuffleNetV2, and found that the proposed model, GAF_dense, achieves higher accuracy with fewer parameters and FLOPs. To further check the performance of the model, we tested it on the CIFAR-10 data set, where the accuracy of GAF_dense is 0.018 and 0.03 higher than that of ResNet18 and ShuffleNetV2, respectively. On the ImageNet data set, its accuracy is 0.225 and 0.146 higher than that of ResNet18 and ShuffleNetV2, respectively. Therefore, the garbage classification model proposed in this paper is suitable for garbage classification and other classification tasks that help protect the ecological environment, with applications in areas such as environmental science, children’s education, and environmental protection.
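
The parameter-saving idea behind depthwise separable convolution can be illustrated with a short, hedged PyTorch sketch (this is not the paper's GAF_dense network; the layer sizes are arbitrary):

```python
# A depthwise separable convolution splits a standard convolution into a
# per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)  # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # mix channels

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = DepthwiseSeparableConv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), "vs", count(separable))   # the separable version uses far fewer parameters
```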


2005 ◽  
Vol 15 (03) ◽  
pp. 337-352 ◽  
Author(s):  
THOMAS NITSCHE

Data distributions are an abstract notion for describing parallel programs by means of overlapping data structures. A generic data distribution layer serves as a basis for implementing specific data distributions over arbitrary algebraic data types and arrays, as well as generic skeletons. The communication operations needed to exchange overlapping data elements are derived automatically from the specification of the overlaps. This paper describes how the communication operations used internally by the generic skeletons are derived, in particular for asynchronous and synchronous communication scheduling. As a case study, we discuss the iterative solution of PDEs and compare a hand-coded MPI version with a skeletal one based on overlapping data distributions.
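
To make the kind of communication the layer derives automatically more concrete, here is a hand-written sketch of exchanging overlapping boundary elements between neighbouring processes with mpi4py (an assumption for illustration; the paper's skeletons generate such exchanges from the overlap specification rather than coding them by hand):

```python
# Sketch of an MPI-style halo exchange of overlapping boundary elements.
# Run, for example, with: mpiexec -n 4 python halo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.full(10, float(rank))            # each process owns a block of the array
left = (rank - 1) % size                    # periodic neighbours for simplicity
right = (rank + 1) % size

# Send our right boundary element to the right neighbour and receive the
# overlapping element from the left neighbour, and vice versa.
from_left = comm.sendrecv(local[-1], dest=right, source=left)
from_right = comm.sendrecv(local[0], dest=left, source=right)
halo = np.concatenate(([from_left], local, [from_right]))   # local block plus overlaps
```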


2020 ◽  
Author(s):  
Zhanyou Xu ◽  
Andreomar Kurek ◽  
Steven B. Cannon ◽  
Williams D. Beavis

Abstract Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to evaluate the most commonly used genomic prediction method, ridge regression, and its logistic ridge regression equivalent against algorithmic modeling methods including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network, using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that the support vector machine provided the best specificity for culling IDC-susceptible lines, while random forest genomic prediction models provided the best combined set of decision metrics for retaining IDC-tolerant and culling IDC-susceptible lines.
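
A minimal sketch of this style of comparison, under stated assumptions (synthetic stand-in data rather than the authors' marker data, and only two of the evaluated methods):

```python
# Compare a support vector machine and a random forest using decision metrics
# such as specificity, sensitivity, and AUC rather than accuracy alone.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=600, n_features=50, random_state=1)  # stand-in for marker data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

for name, model in [("SVM", SVC(probability=True)),
                    ("RandomForest", RandomForestClassifier(n_estimators=200))]:
    model.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    spec, sens = tn / (tn + fp), tp / (tp + fn)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: specificity={spec:.2f} sensitivity={sens:.2f} AUC={auc:.2f}")
```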


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 102
Author(s):  
Nikolai Vladimirovich Korneev ◽  
Julia Vasilievna Korneeva ◽  
Stasis Petrasovich Yurkevichyus ◽  
Gennady Ivanovich Bakhturin

We identified a set of methods for solving risk assessment problems by forecasting incidents of complex-object security based on incident monitoring. The approach includes the following steps: building and training a classification model using the C4.5 algorithm, creating a decision tree, developing a risk assessment system, and predicting incidents. The latter system is a predictive self-configuring neural system that includes an SCNN (self-configuring neural network), an RNN (recurrent neural network), and a predictive model that determines the risk and forecasts the probability of an incident for an object. We proposed and developed: a mathematical model of the neural system; an SCNN architecture in which, for the first time, the fundamental problem of training a perceptron SCNN without a teacher was solved by adapting the activation-function thresholds of RNN neurons with a special learning algorithm; and a predictive model that includes a fuzzy output system with a membership function that maps the current incidents of the considered object to three fuzzy sets, namely “low risk”, “medium risk”, and “high risk”. For the first time, we also gave a definition of the base class of an object’s prediction and of the SCNN. We propose an approach to implementing the neural system for multiple incidents of complex-object security. Experimental studies yielded a forecasting error of 2.41%.
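
The fuzzy-output idea can be sketched briefly; the membership-function shapes and breakpoints below are assumptions for illustration, not values from the paper:

```python
# Map an incident score in [0, 1] onto the fuzzy sets "low", "medium", "high" risk.
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def risk_memberships(score):
    return {
        "low risk": triangular(score, -0.01, 0.0, 0.5),
        "medium risk": triangular(score, 0.25, 0.5, 0.75),
        "high risk": triangular(score, 0.5, 1.0, 1.01),
    }

print(risk_memberships(0.62))   # partly "medium risk", partly "high risk"
```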


Author(s):  
Devarajan Ramanujan ◽  
William Z. Bernstein

VESPER (Visual Exploration of Similarity and PERformance) is a visual analytics system for exploring similarity metrics and performance metrics derived from computer-aided design (CAD) repositories. It consists of (1) a data processing module that allows analysts to input custom similarity metrics and performance metrics, (2) a visualization module that facilitates navigation of the design spaces through coordinated, interactive visualizations, and (3) a report generation module that allows analysts to export lifecycle data of selected repository items as well as the input metrics for further external validation. In this paper, we discuss the need, design rationale, and implementation details for VESPER. We then apply VESPER to (1) sustainability-focused exploration of parts, and (2) exploration of tool wear and surface roughness in machined parts.
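
As an illustration of what a user-supplied similarity metric for the data processing module might look like (a hypothetical metric and descriptors, not part of VESPER):

```python
# A weighted similarity over simple, normalized part descriptors.
import numpy as np

def part_similarity(a, b, weights=(0.5, 0.3, 0.2)):
    """Similarity in [0, 1] over volume, surface area, and feature count."""
    a, b, w = np.asarray(a, float), np.asarray(b, float), np.asarray(weights)
    diffs = np.abs(a - b) / np.maximum(np.maximum(np.abs(a), np.abs(b)), 1e-9)
    return float(1.0 - np.sum(w * np.clip(diffs, 0.0, 1.0)))

# Hypothetical descriptors: [volume (cm^3), surface area (cm^2), feature count]
bracket_a = [120.0, 340.0, 14]
bracket_b = [118.0, 355.0, 12]
print(part_similarity(bracket_a, bracket_b))
```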


2021 ◽  
Vol 12 (7) ◽  
pp. 358-372
Author(s):  
E. V. Orlova ◽  

The article considers the problem of reducing a bank’s credit risks associated with the insolvency of individual borrowers, using financial and socio-economic factors together with additional data about borrowers’ digital footprints. A critical analysis of existing approaches, methods, and models in this area has been carried out, and a number of significant shortcomings that limit their application have been identified. There is no comprehensive approach to assessing a borrower’s creditworthiness from such information, including data from social networks and search engines. A new methodological approach is proposed for assessing a borrower’s risk profile, based on the phased processing of quantitative and qualitative data and on modeling with statistical analysis and machine learning methods. The machine learning methods solve clustering and classification problems; they automatically determine the data structure and support decisions through flexible, local training on the data. Hierarchical clustering and the k-means method are used to identify groups of borrowers with similar social, anthropometric, and financial indicators, as well as indicators characterizing their digital footprint, and to determine a risk profile for each group. The resulting homogeneous groups of borrowers, each with a distinct risk profile, are then used for detailed data analysis in the predictive classification model. The classification model is based on stochastic gradient boosting and predicts the risk profile of a potential borrower. The suggested approach to assessing individuals’ creditworthiness will reduce a bank’s credit risks and increase its stability and profitability. The implementation results are of practical importance. A comparative analysis of the existing and proposed credit risk assessment methodologies showed that the new methodology provides predictive analytics over heterogeneous information about a potential borrower with higher accuracy. The proposed techniques form the core of a decision support system for justifying individuals’ credit conditions while minimizing aggregate credit risks.
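
A minimal sketch of the segment-then-classify idea, under stated assumptions (synthetic features and scikit-learn defaults rather than the article's data and tuning):

```python
# Segment borrowers with k-means, then predict default risk with stochastic
# gradient boosting, using the cluster label as an additional feature.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=12, random_state=7)  # stand-in borrower data
clusters = KMeans(n_clusters=4, n_init=10, random_state=7).fit_predict(X)  # borrower segments
X_aug = np.column_stack([X, clusters])                                     # add segment as a feature

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.25, random_state=7)
model = GradientBoostingClassifier(subsample=0.8)   # subsample < 1 makes the boosting stochastic
print("accuracy:", model.fit(X_tr, y_tr).score(X_te, y_te))
```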


2020 ◽  
pp. 019459982094064
Author(s):  
Matthew Shew ◽  
Helena Wichova ◽  
Andres Bur ◽  
Devin C. Koestler ◽  
Madeleine St Peter ◽  
...  

Objective Diagnosis and treatment of Ménière’s disease remains a significant challenge because of our inability to understand what is occurring on a molecular level. MicroRNA (miRNA) perilymph profiling is a safe methodology and may serve as a “liquid biopsy” equivalent. We used machine learning (ML) to evaluate miRNA expression profiles of various inner ear pathologies to predict the diagnosis of Ménière’s disease. Study Design Prospective cohort study. Setting Tertiary academic hospital. Subjects and Methods Perilymph was collected during labyrinthectomy (Ménière’s disease, n = 5), stapedotomy (otosclerosis, n = 5), and cochlear implantation (sensorineural hearing loss [SNHL], n = 9). miRNA was isolated and analyzed with the Affymetrix miRNA 4.0 array. Various ML classification models were evaluated with an 80/20 train/test split and cross-validation. Permutation feature importance was performed to identify the miRNAs that were critical to the classification models. Results For miRNA profiles of conductive hearing loss versus Ménière’s disease, 4 models were able to differentiate the 2 disease classes with 100% accuracy. The top-performing models used the same miRNAs in their decision classification model but with different weighted values. All candidate models for SNHL versus Ménière’s performed significantly worse, with the best models achieving 66% accuracy. Ménière’s models showed unique features distinct from SNHL. Conclusions We can use ML to build Ménière’s-specific prediction models using miRNA profiles alone. However, ML models were less accurate in distinguishing SNHL from Ménière’s, likely because of overlapping miRNA biomarkers. The power of this technique is that it identifies biomarkers without knowledge of the pathophysiology, potentially leading to identification of novel biomarkers and diagnostic tests.
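
The evaluation protocol described above (an 80/20 split plus permutation feature importance) can be sketched with scikit-learn on synthetic stand-in data; this is an illustration, not the study's pipeline:

```python
# Train a classifier on an 80/20 split and use permutation feature importance
# to surface the features the model relies on most.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           random_state=3)               # stand-in for miRNA profiles
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=3)

clf = RandomForestClassifier(n_estimators=300, random_state=3).fit(X_tr, y_tr)
result = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=3)
top = result.importances_mean.argsort()[::-1][:5]
print("most influential features:", top)
```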

