An Interpretable Computer-Aided Diagnosis Method for Periodontitis From Panoramic Radiographs

2021, Vol 12
Author(s): Haoyang Li, Juexiao Zhou, Yi Zhou, Qiang Chen, Yangyang She, et al.

Periodontitis is a prevalent and irreversible chronic inflammatory disease in both developed and developing countries, affecting about 20–50% of the global population. A tool that automatically diagnoses periodontitis is in high demand for screening at-risk people, and early detection could prevent the onset of tooth loss, especially in local communities and health care settings with limited dental professionals. In the medical field, doctors need to understand and trust the decisions made by computational models, so developing interpretable models is crucial for disease diagnosis. Based on these considerations, we propose an interpretable method called Deetal-Perio to predict the severity degree of periodontitis in dental panoramic radiographs. In our method, alveolar bone loss (ABL), the clinical hallmark for periodontitis diagnosis, serves as the key interpretable feature. To calculate ABL, we also propose a method for teeth numbering and segmentation. First, Deetal-Perio segments and indexes each individual tooth via Mask R-CNN combined with a novel calibration method. Next, Deetal-Perio segments the contour of the alveolar bone and calculates a ratio for each tooth to represent its ABL. Finally, Deetal-Perio predicts the severity degree of periodontitis given the ratios of all the teeth. The macro F1-score and accuracy of the periodontitis prediction task reach 0.894 and 0.896, respectively, on the Suzhou data set, and 0.820 and 0.824, respectively, on the Zhongshan data set. The entire architecture not only outperforms state-of-the-art methods and shows robustness on two data sets in both the periodontitis prediction and the teeth numbering and segmentation tasks, but is also interpretable, helping doctors understand why Deetal-Perio works so well.
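
The severity-grading step can be illustrated with a small sketch. The thresholds, the mean-ratio aggregation, and the function name below are hypothetical; the paper predicts the severity degree from the full set of per-tooth ratios rather than from fixed cut-offs.

```python
# Minimal sketch of the last stage of a Deetal-Perio-style pipeline: mapping
# per-tooth alveolar-bone-loss (ABL) ratios to a periodontitis severity degree.
# The thresholds and the aggregation rule are illustrative assumptions only.
import numpy as np

def severity_from_abl_ratios(abl_ratios, thresholds=(0.15, 0.33, 0.5)):
    """abl_ratios: ABL ratio per segmented tooth (0 = no bone loss, 1 = total loss)."""
    mean_abl = float(np.mean(abl_ratios))
    t_mild, t_moderate, t_severe = thresholds   # hypothetical cut-offs
    if mean_abl < t_mild:
        return "healthy"
    if mean_abl < t_moderate:
        return "mild"
    if mean_abl < t_severe:
        return "moderate"
    return "severe"

# Example: ratios for 14 teeth segmented from one panoramic radiograph
ratios = np.random.default_rng(0).uniform(0.05, 0.45, size=14)
print(severity_from_abl_ratios(ratios))
```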

2019, Vol 622, pp. A172
Author(s): F. Murgas, G. Chen, E. Pallé, L. Nortmann, G. Nowak

Context. Rayleigh scattering in a hydrogen-dominated exoplanet atmosphere can be detected using ground- or space-based telescopes. However, stellar activity in the form of spots can mimic Rayleigh scattering in the observed transmission spectrum. Quantifying this phenomenon is key to a correct interpretation of exoplanet atmospheric properties. Aims. We use the ten-meter Gran Telescopio Canarias (GTC) to carry out a ground-based transmission spectroscopy survey of extrasolar planets in order to characterize their atmospheres. In this paper we investigate the exoplanet HAT-P-11b, a Neptune-sized planet orbiting an active K-type star. Methods. We obtained long-slit optical spectroscopy of two transits of HAT-P-11b with the Optical System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy (OSIRIS) on August 30, 2016 and September 25, 2017. We integrated the spectra of HAT-P-11 and one reference star in several spectroscopic channels across the λ ~ 400–785 nm region, creating numerous light curves of the transits. We fit analytic transit curves to the data, taking into account the systematic effects and red noise present in the time series, in an effort to measure the change of the planet-to-star radius ratio (Rp/Rs) with wavelength. Results. By fitting both transits together, we find a slope in the transmission spectrum showing an increase of the planetary radius towards blue wavelengths. Closer inspection of the transmission spectra of the individual data sets reveals that the first transit presents this slope while the transmission spectrum of the second data set is flat. Additionally, we detect hints of Na absorption on the first night, but not on the second. We conclude that the transmission-spectrum slope and Na absorption excess found in the first transit observation are caused by unocculted stellar spots. Modeling the contribution of unocculted spots to reproduce the results of the first night, we find a spot filling factor of δ = 0.62^{+0.20}_{-0.17} and a spot-to-photosphere temperature difference of ΔT = 429^{+184}_{-299} K.
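
As an illustration of per-channel radius-ratio fitting (not the authors' pipeline, which fits limb-darkened analytic transit models with explicit red-noise terms), here is a minimal sketch with a box-shaped transit, white noise, and a known ephemeris:

```python
# Hedged sketch: estimate Rp/Rs in one spectroscopic channel by least-squares
# fitting a simplified box-shaped transit to a synthetic light curve.
import numpy as np
from scipy.optimize import curve_fit

def box_transit(t, rp_rs, baseline, t0=0.0, dur=0.098):
    """Box-shaped transit: fractional depth (Rp/Rs)^2 inside a window of width dur."""
    flux = np.full_like(t, baseline)
    flux[np.abs(t - t0) < dur / 2.0] -= baseline * rp_rs ** 2
    return flux

rng = np.random.default_rng(1)
t = np.linspace(-0.1, 0.1, 400)                        # time from mid-transit [days]
true_rp_rs = 0.058                                     # roughly HAT-P-11b-sized
sim = box_transit(t, true_rp_rs, 1.0) + rng.normal(0, 3e-4, t.size)

# Fit only rp_rs and the baseline; t0 and duration are held fixed (assumed known)
popt, pcov = curve_fit(box_transit, t, sim, p0=[0.05, 1.0])
rp_rs, rp_rs_err = popt[0], np.sqrt(np.diag(pcov))[0]
print(f"Rp/Rs = {rp_rs:.4f} +/- {rp_rs_err:.4f}")      # repeated per wavelength channel
```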


2014, Vol 31 (8), pp. 1778-1789
Author(s): Hongkang Lin

Purpose – The clustering/classification method proposed in this study, designated the PFV-index method, provides the means to solve the following problems for a data set characterized by imprecision and uncertainty: first, discretizing the continuous values of all the individual attributes within the data set; second, evaluating the optimality of the discretization results; third, determining the optimal number of clusters per attribute; and fourth, improving the classification accuracy (CA) of data sets characterized by uncertainty. The paper aims to discuss these issues. Design/methodology/approach – The proposed method for solving the clustering/classification problem, designated the PFV-index method, combines a particle swarm optimization algorithm, the fuzzy C-means method, variable-precision rough sets theory, and a new cluster validity index function. Findings – The method clusters the values of the individual attributes within the data set and achieves both the optimal number of clusters and the optimal CA. Originality/value – The validity of the proposed approach is investigated by comparing the classification results obtained for UCI data sets with those obtained by supervised classification methods, namely a back-propagation neural network (BPNN) and decision-tree methods.
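
A minimal sketch of two of the ingredients, assuming a single continuous attribute: fuzzy C-means discretizes the attribute, and a cluster-validity index (here the fuzzy partition coefficient, used as a stand-in) scores each candidate number of clusters. The particle-swarm search and the variable-precision rough-set evaluation are omitted.

```python
# Fuzzy C-means discretization of one attribute plus a simple validity index.
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=100, seed=0):
    """x: (N, d) data. Returns cluster centres and the (N, c) membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)
    p = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        um = u ** m
        centres = um.T @ x / um.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        u = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
    return centres, u

def partition_coefficient(u):
    return float((u ** 2).sum() / u.shape[0])   # ranges from 1/c (fuzzy) to 1 (crisp)

# Synthetic attribute with three natural value groups
attribute = np.random.default_rng(1).normal([0.0, 3.0, 7.0], 0.4, size=(200, 3)).reshape(-1, 1)
for c in (2, 3, 4, 5):
    _, u = fuzzy_c_means(attribute, c)
    print(c, round(partition_coefficient(u), 3))   # choose c with the best index value
```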


Separations, 2018, Vol 5 (3), pp. 44
Author(s): Alyssa Allen, Mary Williams, Nicholas Thurn, Michael Sigman

Computational models for determining the strength of fire debris evidence based on likelihood ratios (LRs) were developed and validated against in-silico generated data sets derived from different distributions of ASTM E1618-14 designated ignitable liquid classes and substrate pyrolysis contributions. The models all perform well in cross-validation against the distributions used to generate them. However, a model generated from data that does not contain representatives of all the ASTM E1618-14 classes does not perform well in validation against data sets that do contain representatives of the missing classes. A quadratic discriminant model based on a balanced data set (ignitable liquid versus substrate pyrolysis), with a uniform distribution of the ASTM E1618-14 classes, performed well (receiver operating characteristic area under the curve of 0.836) when tested against laboratory-developed, casework-relevant samples of known ground truth.
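
A hedged sketch of the scoring step, using scikit-learn's quadratic discriminant and ROC AUC on synthetic two-class features standing in for fire-debris data (not the authors' models or data):

```python
# Two-class quadratic discriminant scored by ROC AUC on a log likelihood ratio.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# y = 1: ignitable liquid present, y = 0: substrate pyrolysis only (balanced set)
X = np.vstack([rng.normal(0.0, 1.0, (n, 8)), rng.normal(0.8, 1.3, (n, 8))])
y = np.r_[np.zeros(n), np.ones(n)]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
qda = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)

# Log likelihood ratio from the two class posteriors (equal priors in a balanced set)
post = qda.predict_proba(X_te)
llr = np.log(post[:, 1] + 1e-12) - np.log(post[:, 0] + 1e-12)
print("ROC AUC:", round(roc_auc_score(y_te, llr), 3))
```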


2019, Vol 16 (2), pp. 445-452
Author(s): Kishore S. Verma, A. Rajesh, Adeline J. S. Johnsana

K-anonymization is one of the most widely used approaches for protecting individual records from privacy-leakage attacks in the Privacy Preserving Data Mining (PPDM) arena. Anonymizing a data set, however, typically reduces the effectiveness of data mining results, so PPDM researchers are currently directing their efforts toward finding the optimum trade-off between privacy and utility. This work aims to identify, from a set of strong data mining classifiers, the one that produces the most valuable classification results on utility-aware k-anonymized data sets. We performed the analysis on data sets anonymized with respect to utility factors such as the null-value count and the transformation-pattern loss. The experimentation uses three widely used classifiers, HNB, PART, and J48, which are evaluated with accuracy, F-measure, and ROC-AUC, measures that are well established for assessing classification performance. Our experimental analysis reveals the best classifiers on the utility-aware anonymized data sets produced by cell-oriented anonymization (CoA), attribute-oriented anonymization (AoA), and record-oriented anonymization (RoA).
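
A minimal sketch of the evaluation protocol, assuming a numerically encoded anonymized table. HNB and PART are WEKA classifiers, so scikit-learn's Gaussian naive Bayes and a CART decision tree (a J48/C4.5 relative) stand in here:

```python
# Cross-validated accuracy, F-measure and ROC-AUC for two stand-in classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a k-anonymized data set (generalized/suppressed attributes encoded numerically)
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)

scoring = {"accuracy": "accuracy", "f_measure": "f1", "roc_auc": "roc_auc"}
for name, clf in [("NB (HNB stand-in)", GaussianNB()),
                  ("J48-like tree", DecisionTreeClassifier(random_state=0))]:
    scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
    print(name, {k: round(scores["test_" + k].mean(), 3) for k in scoring})
```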


mSphere, 2017, Vol 2 (6)
Author(s): Xiang Gao, Huaiying Lin, Qunfeng Dong

ABSTRACT Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes’ theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
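
A minimal sketch of the Dirichlet-multinomial Bayes idea. DMBC itself estimates the parameters by maximum likelihood and performs built-in feature selection; the crude moment-style parameter estimate, the fixed precision, and the prevalence prior below are illustrative assumptions, not the package's implementation.

```python
# Dirichlet-multinomial likelihoods per class plus a prevalence prior give the posterior.
import numpy as np
from scipy.special import gammaln

def dm_log_likelihood(x, alpha):
    """Log Dirichlet-multinomial pmf of count vector x given parameters alpha."""
    n, a = x.sum(), alpha.sum()
    return gammaln(a) - gammaln(n + a) + np.sum(gammaln(x + alpha) - gammaln(alpha))

def fit_alpha(counts, precision=50.0):
    props = counts / counts.sum(axis=1, keepdims=True)
    return precision * props.mean(axis=0) + 1e-6      # crude estimate, not the MLE used by DMBC

rng = np.random.default_rng(0)
healthy = rng.multinomial(200, [0.4, 0.3, 0.2, 0.1], size=30)   # taxon counts per training sample
disease = rng.multinomial(200, [0.1, 0.2, 0.3, 0.4], size=30)

alphas = {"healthy": fit_alpha(healthy), "disease": fit_alpha(disease)}
prior = {"healthy": 0.9, "disease": 0.1}              # e.g. an assumed disease prevalence

sample = rng.multinomial(200, [0.15, 0.2, 0.3, 0.35])           # new microbiome sample
log_post = {c: np.log(prior[c]) + dm_log_likelihood(sample, a) for c, a in alphas.items()}
z = max(log_post.values())
unnorm = {c: np.exp(v - z) for c, v in log_post.items()}
total = sum(unnorm.values())
print({c: round(v / total, 3) for c, v in unnorm.items()})      # posterior class probabilities
```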


2010, Vol 75 (4), pp. 483-495
Author(s): Slavica Eric, Marko Kalinic, Aleksandar Popovic, Halid Makic, Elvisa Civic, et al.

Aqueous solubility is an important factor influencing several aspects of the pharmacokinetic profile of a drug. Numerous publications present different methodologies for developing reliable computational models for the prediction of solubility from structure. The quality of such models can be significantly affected by the accuracy of the experimental solubility data employed. In this work, the importance of the accuracy of the experimental solubility data used for model training was investigated. Three data sets were used as training sets: Data Set 1, containing solubility data collected from various literature sources using a few selection criteria (n = 319); Data Set 2, created by substituting 28 values from Data Set 1 with uniformly determined experimental data from one laboratory (n = 319); and Data Set 3, created by adding to Data Set 2 a further 56 compounds for which solubility was also determined under uniform conditions in the same laboratory (n = 375). The selection of the most significant descriptors was performed by the heuristic method, using one-parameter and multi-parameter analysis. The correlations between the most significant descriptors and solubility were established using multi-linear regression analysis (MLR) for all three investigated data sets. Notable differences were observed between the equations corresponding to the different data sets, suggesting that models updated with new experimental data need to be additionally optimized. It was shown that the inclusion of uniform experimental data consistently leads to an improvement in the correlation coefficients. These findings contribute to an emerging consensus that improving the reliability of solubility prediction requires data sets containing many diverse compounds whose solubility was measured under standardized conditions.
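
A brief sketch of the MLR modelling step on placeholder descriptors. The descriptor names, values, and coefficients here are synthetic; the paper selects its descriptors heuristically before fitting.

```python
# Multi-linear regression of logS on a small set of (synthetic) molecular descriptors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 319
# Placeholder descriptors (e.g. a logP-like, a polar-surface-area-like and a MW-like column)
X = np.column_stack([rng.normal(2, 1.5, n), rng.normal(60, 25, n), rng.normal(300, 80, n)])
true_coef = np.array([-1.0, -0.01, -0.002])
logS = X @ true_coef + 0.5 + rng.normal(0, 0.4, n)     # synthetic "experimental" solubility

mlr = LinearRegression().fit(X, logS)
print("coefficients:", np.round(mlr.coef_, 4), "intercept:", round(mlr.intercept_, 3))
print("R^2 on training set:", round(r2_score(logS, mlr.predict(X)), 3))
```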


2021
Author(s): Gah-Yi Ban, N. Bora Keskin

We consider a seller who can dynamically adjust the price of a product at the individual-customer level by utilizing information about customers' characteristics encoded as a d-dimensional feature vector. We assume a personalized demand model, the parameters of which depend on s out of the d features. The seller initially does not know the relationship between the customer features and the product demand but learns it through sales observations over a selling horizon of T periods. We prove that the seller's expected regret, that is, the revenue loss against a clairvoyant who knows the underlying demand relationship, is at least of order [Formula: see text] under any admissible policy. We then design a near-optimal pricing policy for a semiclairvoyant seller (who knows which s of the d features are in the demand model) that achieves an expected regret of order [Formula: see text]. We extend this policy to the more realistic setting in which the seller does not know the true demand predictors, and show that it has an expected regret of order [Formula: see text], which is also near-optimal. Finally, we test our theory on simulated data and on a data set from an online auto loan company in the United States. On both data sets, our experimentation-based pricing policy is superior to intuitive and/or widely practiced customized pricing methods, such as myopic pricing and segment-then-optimize policies. Furthermore, our policy improves upon the loan company's historical pricing decisions by 47% in expected revenue over a six-month period. This paper was accepted by Noah Gans, stochastic models and simulation.
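
A toy simulation in the spirit of the problem setting, not the authors' policy: demand is linear in price with sparsely feature-dependent coefficients, the seller experiments briefly, fits the coefficients with a Lasso, and then prices greedily, and cumulative regret is measured against a clairvoyant. All constants are invented for illustration.

```python
# Feature-based dynamic pricing toy: explore, estimate a sparse demand model, price greedily.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, s, T = 20, 3, 2000
a_true = np.zeros(d); a_true[:s] = [4.0, 2.0, 1.5]           # sparse demand intercept coefficients
b_true = np.zeros(d); b_true[:s] = [0.8, 0.5, 0.3]           # sparse price-sensitivity coefficients

def expected_demand(x, p):
    return a_true @ x - (b_true @ x) * p                      # linear-in-price demand

history_X, history_y, regret = [], [], 0.0
a_hat = b_hat = None
for t in range(T):
    x = np.abs(rng.normal(1.0, 0.3, d))                       # customer feature vector
    p_star = (a_true @ x) / (2 * (b_true @ x))                # clairvoyant revenue-maximizing price
    if t < 50:                                                 # short experimentation phase
        p = rng.uniform(0.5, 5.0)
    else:
        if a_hat is None or t % 200 == 0:                      # refit occasionally on all history
            fit = Lasso(alpha=0.05, fit_intercept=False).fit(np.array(history_X), np.array(history_y))
            a_hat, b_hat = fit.coef_[:d], -fit.coef_[d:]
        p = float(np.clip((a_hat @ x) / (2 * max(b_hat @ x, 1e-3)), 0.5, 5.0))
    demand = expected_demand(x, p) + rng.normal(0, 0.5)        # noisy sale observation
    history_X.append(np.r_[x, p * x]); history_y.append(demand)
    regret += p_star * expected_demand(x, p_star) - p * expected_demand(x, p)

print(f"cumulative regret over T={T} periods: {regret:.1f}")
```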


Geophysics, 2006, Vol 71 (3), pp. H33-H44
Author(s): Hendrik Paasche, Jens Tronicke, Klaus Holliger, Alan G. Green, Hansruedi Maurer

Inversions of an individual geophysical data set can be highly nonunique, and it is generally difficult to determine petrophysical parameters from geophysical data. We show that both issues can be addressed by adopting a statistical multiparameter approach that requires the acquisition, processing, and separate inversion of two or more types of geophysical data. To combine information contained in the physical-property models that result from inverting the individual data sets and to estimate the spatial distribution of petrophysical parameters in regions where they are known at only a few locations, we demonstrate the potential of the fuzzy c-means (FCM) clustering technique. After testing this new approach on synthetic data, we apply it to limited crosshole georadar, crosshole seismic, gamma-log, and slug-test data acquired within a shallow alluvial aquifer. The derived multiparameter model effectively outlines the major sedimentary units observed in numerous boreholes and provides plausible estimates for the spatial distributions of gamma-ray emitters and hydraulic conductivity.
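
A hedged sketch of how fuzzy memberships can carry sparse petrophysical information through the model: each cell's two co-located geophysical values are assigned FCM memberships to a few clusters, and a borehole-calibrated value per cluster is spread as a membership-weighted average. The centres and conductivities below are invented numbers, not values from the paper.

```python
# Membership-weighted petrophysical estimation from two co-located property models.
import numpy as np

centres = np.array([[0.06, 1500.0],       # cluster centres in (radar velocity, seismic velocity)
                    [0.08, 1700.0],
                    [0.10, 1900.0]])
k_cluster = np.array([1e-4, 5e-4, 2e-3])  # hydraulic conductivity per cluster from borehole data [m/s]

def fcm_memberships(samples, centres, m=2.0):
    """Standard FCM membership formula for fixed cluster centres."""
    d = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2) + 1e-12
    p = 2.0 / (m - 1.0)
    return 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))

# Two co-located property models on a small grid, flattened to (n_cells, 2).
# In practice the properties would be normalized before clustering; skipped here for brevity.
rng = np.random.default_rng(0)
cells = np.column_stack([rng.uniform(0.05, 0.11, 500), rng.uniform(1400, 2000, 500)])
u = fcm_memberships(cells, centres)
k_estimate = u @ k_cluster                 # membership-weighted conductivity per cell
print(k_estimate[:5])
```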


Author(s): Gavin B. M. Vaughan, Soeren Schmidt, Henning F. Poulsen

We present a method in which the contributions from the individual crystallites in a polycrystalline sample are separated and treated as essentially single-crystal data sets. The process involves the simultaneous determination of the orientation matrices of the individual crystallites in the sample, the subsequent integration of the individual peaks, and the filtering and summing of the resulting integrated intensities, in order to arrive at a single-crystal-like data set which may be treated normally. To demonstrate the method, we consider as a test case a small-molecule structure, cupric acetate monohydrate. We show that it is possible to obtain a single-crystal-quality structure solution and refinement, in which accurate anisotropic thermal parameters and hydrogen atom positions are obtained.
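
A small illustrative sketch of the filtering-and-summing step, assuming integrated intensities have already been obtained per crystallite. The grouping key, the robust rejection rule, and its cut-off are assumptions for illustration, not the authors' criteria.

```python
# Group integrated intensities by reflection (hkl), reject outliers, merge the survivors.
from collections import defaultdict
import numpy as np

# (hkl, integrated intensity) pairs, pooled over all indexed crystallites
observations = [((1, 1, 0), 105.0), ((1, 1, 0), 98.0), ((1, 1, 0), 240.0),   # 240 = overlap outlier
                ((2, 0, 0), 55.0), ((2, 0, 0), 60.0), ((2, 0, 0), 57.5)]

by_hkl = defaultdict(list)
for hkl, intensity in observations:
    by_hkl[hkl].append(intensity)

merged = {}
for hkl, vals in by_hkl.items():
    vals = np.asarray(vals, dtype=float)
    med = np.median(vals)
    mad = np.median(np.abs(vals - med)) + 1e-9
    keep = np.abs(vals - med) < 5.0 * mad              # simple robust outlier filter
    merged[hkl] = float(vals[keep].mean())

print(merged)   # one merged intensity per reflection, ready for normal structure solution
```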


2020, Vol 86 (1), pp. 23-31
Author(s): Hessah Albanwan, Rongjun Qin, Xiaohu Lu, Mao Li, Desheng Liu, et al.

The current practice in land cover/land use change analysis relies heavily on the individually classified maps of the multi-temporal data set. Due to varying acquisition conditions (e.g., illumination, sensors, seasonal differences), the resulting classification maps are often too inconsistent through time to support robust statistical analysis. 3D geometric features have been shown to be stable for assessing differences across the temporal data set. Therefore, in this article we investigate the use of multi-temporal orthophotos and digital surface models derived from satellite data for spatiotemporal classification. Our approach consists of two major steps: generating per-class probability distribution maps using a random-forest classifier with limited training samples, and making spatiotemporal inferences using an iterative 3D spatiotemporal filter operating on the per-class probability maps. Our experimental results demonstrate that the proposed methods can consistently improve the individual classification results by 2%–6% and can thus serve as an important postclassification refinement approach.
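
A hedged sketch of the two-step idea using scikit-learn and SciPy: a random forest yields per-class probability maps for each epoch, and an iterative box filter smooths the stacked probabilities over space and time before the final arg-max. The uniform filter is a simplification of the paper's 3D spatiotemporal filter, and all data here are synthetic.

```python
# Per-class probability maps per epoch, smoothed iteratively over (epoch, y, x).
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_epochs, h, w, n_feat, n_class = 4, 32, 32, 6, 3

# Train on a small labelled sample (synthetic stand-in features)
X_train = rng.normal(size=(300, n_feat))
y_train = rng.integers(0, n_class, 300)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Per-epoch, per-pixel class probabilities -> array of shape (epochs, classes, h, w)
pixels = rng.normal(size=(n_epochs, h * w, n_feat))
prob = np.stack([rf.predict_proba(p).T.reshape(n_class, h, w) for p in pixels])

# Iterative smoothing: a spatial pass and a temporal pass per iteration, then renormalize
for _ in range(3):
    prob = uniform_filter(prob, size=(1, 1, 3, 3))        # smooth over y, x
    prob = uniform_filter(prob, size=(3, 1, 1, 1))        # smooth over epochs
    prob /= prob.sum(axis=1, keepdims=True)

labels = prob.argmax(axis=1)                              # (epochs, h, w) refined label maps
print(labels.shape)
```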

