Identifying the Primary Odor Perception Descriptors by Multi-Output Linear Regression Models

Semantic odor perception descriptors, such as “sweet”, are widely used in the food, beverage, and fragrance industries to profile odor perceptions for product quality assessment. The current literature focuses on developing as many odor perception descriptors as possible, but a large number of descriptors poses challenges for odor sensory assessment. In this paper, we propose the task of narrowing down the number of odor perception descriptors. To this end, we devise a novel selection mechanism based on machine learning to identify the primary odor perceptual descriptors (POPDs). The perceptual ratings of non-primary odor perception descriptors (NPOPDs) can be predicted precisely from those of the POPDs; the NPOPDs are therefore redundant and can be removed from the odor vocabulary. The experimental results indicate that dozens of odor perceptual descriptors are redundant. It is also observed that the sparsity of the data is negatively correlated with model performance, while the Pearson correlation between odor perceptions plays a positive role. Reducing the odor vocabulary size could simplify odor sensory assessment and aid the understanding of the human odor perceptual space.
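
The idea of predicting NPOPD ratings from POPD ratings can be sketched with an ordinary multi-output least-squares fit. This is a minimal illustration on synthetic data, not the authors' selection mechanism: the function names, the data sizes, and the random ratings are all assumptions made for the example.

```python
import numpy as np

def fit_multi_output(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W for all output columns at once."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W

def predict(X, W):
    """Apply the fitted weights (last row of W is the intercept)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ W

def columnwise_pearson(Y_true, Y_pred):
    """Pearson r between each true output column and its prediction."""
    return np.array([
        np.corrcoef(Y_true[:, j], Y_pred[:, j])[0, 1]
        for j in range(Y_true.shape[1])
    ])

# Synthetic stand-in data (hypothetical, not from the paper):
# rows are odorants, X holds ratings on 5 candidate POPDs,
# Y holds ratings on 3 candidate NPOPDs.
rng = np.random.default_rng(0)
n_odorants, n_primary, n_non_primary = 200, 5, 3
X = rng.uniform(0, 100, size=(n_odorants, n_primary))
true_W = rng.normal(size=(n_primary, n_non_primary))
Y = X @ true_W + rng.normal(scale=1.0, size=(n_odorants, n_non_primary))

# Fit on the first 150 odorants, evaluate on the remaining 50.
W = fit_multi_output(X[:150], Y[:150])
r = columnwise_pearson(Y[150:], predict(X[150:], W))
print("held-out Pearson r per NPOPD:", np.round(r, 3))
```

A descriptor whose held-out correlation stays high under such a model is a candidate for being treated as redundant; where to set that threshold is a design choice the sketch leaves open.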

ICD10Net: An Artificial Intelligence Algorithm with Medical Background Conducts ICD-10-CM Coding Task with Outstanding Performance (Preprint)

10.2196/preprints.13677 ◽

2019 ◽

Author(s):

Chin Lin ◽

Yu-Sheng Lou ◽

Chia-Cheng Lee ◽

Chia-Jung Hsu ◽

Ding-Chung Wu ◽

...

Keyword(s):

Artificial Intelligence ◽

General Hospital ◽

Pearson Correlation ◽

Model Performance ◽

International Classification Of Diseases ◽

Free Text ◽

Daily Work ◽

Medical Background ◽

Icd 10 ◽

F Measure

BACKGROUND An artificial intelligence-based algorithm has shown a powerful ability for coding the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) in discharge notes. However, its performance still requires improvement compared with human experts. The major disadvantage of the previous algorithm is its lack of understanding of medical terminology. OBJECTIVE We propose several methods based on the human learning process and conduct a series of experiments to validate the improvements they bring. METHODS We compared two data sources for training the word-embedding model: English Wikipedia and PubMed journal abstracts. Moreover, fixed, changeable, and double-channel embedding tables were compared to test their performance. Some additional tricks were also applied to improve accuracy. We used these methods to identify the three-chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. Subsequently, 94,483 labeled discharge notes from June 1, 2015 to June 30, 2017 from the Tri-Service General Hospital in Taipei, Taiwan, were used. To evaluate performance, 24,762 discharge notes from July 1, 2017 to December 31, 2017 from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were also tested. The F-measure is the major global measure of effectiveness. RESULTS In understanding medical terminology, the PubMed-embedding model (Pearson correlation = 0.60/0.57) performs better than the Wikipedia-embedding model (Pearson correlation = 0.35/0.31). In the accuracy of ICD-10-CM coding, the changeable model that uses both the PubMed- and Wikipedia-embedding models has the highest testing mean F-measure (0.7311 and 0.6639 in the Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, a proposed hybrid sampling method, an augmentation trick to avoid algorithms identifying negative terms, was found to further improve model performance. CONCLUSIONS The proposed model architecture and training method are together named ICD10Net, which is the first expert-level model practically applied to daily work. This model can also be applied to unstructured information extraction from free-text medical writing. We have developed a web app to demonstrate our work (https://linchin.ndmctsgh.edu.tw/app/ICD10/).

Machine learning augmented predictive and generative model for rupture life in ferritic and austenitic steels

npj Materials Degradation ◽

10.1038/s41529-021-00166-5 ◽

2021 ◽

Vol 5 (1) ◽

Author(s):

Osman Mamun ◽

Madison Wenzlick ◽

Arun Sathanur ◽

Jeffrey Hawk ◽

Ram Devanathan

Keyword(s):

Pearson Correlation ◽

Rupture Life ◽

Model Performance ◽

Austenitic Stainless Steels ◽

Generative Model ◽

Austenitic Steels ◽

Gradient Boosting ◽

Variational Autoencoder ◽

Feature Importance ◽

Boosting Algorithm

Abstract The Larson–Miller parameter (LMP) offers an efficient and fast scheme to estimate the creep rupture life of alloy materials for high-temperature applications; however, poor generalizability and dependence on the constant C often result in sub-optimal performance. In this work, we show that the direct rupture life parameterization without intermediate LMP parameterization, using a gradient boosting algorithm, can be used to train ML models for very accurate prediction of rupture life in a variety of alloys (Pearson correlation coefficient >0.9 for 9–12% Cr and >0.8 for austenitic stainless steels). In addition, the Shapley value was used to quantify feature importance, making the model interpretable by identifying the effect of various features on the model performance. Finally, a variational autoencoder-based generative model was built by conditioning on the experimental dataset to sample, from the learnt joint distribution, hypothetical synthetic candidate alloys that exist in neither the 9–12% Cr ferritic–martensitic alloy dataset nor the austenitic stainless steel dataset.
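
For readers unfamiliar with the Larson–Miller parameter, the sketch below shows the standard relation LMP = T (C + log10 t_r), with T in kelvin and rupture time t_r in hours, and why the choice of C matters when extrapolating. The temperatures, the rupture time, and both values of C are hypothetical inputs chosen for illustration (C = 20 is a commonly used default); none of them come from the paper.

```python
import math

def larson_miller(temp_kelvin, rupture_hours, c=20.0):
    """LMP = T * (C + log10(t_r)), with T in kelvin and t_r in hours."""
    return temp_kelvin * (c + math.log10(rupture_hours))

def rupture_life_hours(lmp, temp_kelvin, c=20.0):
    """Invert the LMP relation to get rupture life at a given temperature."""
    return 10.0 ** (lmp / temp_kelvin - c)

# Hypothetical test point: rupture after 10,000 h at 873.15 K (600 degrees C).
test_temp, test_life = 873.15, 1.0e4
# Hypothetical service temperature: 823.15 K (550 degrees C).
service_temp = 823.15

# Extrapolate the same test point with two different assumed constants C.
life_c20 = rupture_life_hours(larson_miller(test_temp, test_life, c=20.0),
                              service_temp, c=20.0)
life_c25 = rupture_life_hours(larson_miller(test_temp, test_life, c=25.0),
                              service_temp, c=25.0)

print(f"predicted life with C=20: {life_c20:.3g} h")
print(f"predicted life with C=25: {life_c25:.3g} h")
```

The two extrapolated lives differ noticeably even though they start from the same test point, which illustrates the dependence on C that the abstract identifies as a weakness of the LMP route.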

The Effect of Governance On Entrepreneurship: From All Income Economies Perspective

10.21203/rs.3.rs-853223/v1 ◽

2021 ◽

Author(s):

Mekonnen Bogale Abegaz ◽

Kenenisa Lemi Debela ◽

Reta Megersa Hundie

Keyword(s):

Time Series Data ◽

Pearson Correlation ◽

Control Variable ◽

Political Stability ◽

Series Data ◽

Outcome Variable ◽

Net Income ◽

Linear Regression Models ◽

Governance Indicators ◽

Development Levels

Abstract The purpose of this study is to analyze the effect of governance indicators on entrepreneurship. An explanatory research design with Pearson correlation and multiple linear regression models was applied. Five-year World Bank data (2014–2018) of 126 countries from all economic development levels were used. The worldwide governance indicators considered are voice and accountability, political stability, government effectiveness, regulatory quality, rule of law, and corruption control. Gross net income was taken as a control variable. To measure entrepreneurship, the number of formally registered limited liability businesses as a percentage of the working-age population was used. To bring the highly skewed time series data of the dependent variable (entrepreneurship) closer to normal, a logarithmic transformation was applied, and heteroscedasticity of the residuals was checked. The Pearson correlation results show strong and significant correlations (r > 0.466, p < 0.01) between the predictors and the outcome variable and among the predictor variables. Regression analysis was computed after two highly collinear variables were dropped from the model using the VIF test. The study found that the remaining four independent variables and the control variable predict 71.5% of the variance in the outcome variable. Except for voice and accountability, all predictors have their own statistically significant influence on entrepreneurship. Thus, working on each predictor up to the standard application can bring incremental changes in new business formation and entry. The researchers believe that this study is of significant interest to policymakers, program developers, entrepreneurs, analysis, and supporters since it provides useful insight on how governance indicators influence entrepreneurship.
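
The VIF screening step mentioned above can be illustrated as follows. The variance inflation factor of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The data here are synthetic and the threshold of 10 is a common rule of thumb, assumed for illustration; the abstract does not state which cutoff the authors used.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.hstack([others, np.ones((n, 1))])   # regress column j on the rest
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors (hypothetical): columns 0 and 1 are nearly collinear,
# column 2 is independent of both.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
X = np.column_stack([
    a,
    a + 0.05 * rng.normal(size=500),
    rng.normal(size=500),
])

vifs = vif(X)
keep = vifs < 10.0   # assumed rule-of-thumb cutoff, not taken from the study
print("VIF per predictor:", np.round(vifs, 1))
print("predictors passing the cutoff:", np.flatnonzero(keep))
```

In practice predictors are usually dropped one at a time and the VIFs recomputed, since removing one member of a collinear pair lowers the VIF of the other.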

The relationships between height and arm span, mid-upper arm and waist circumferences and sum of four skinfolds in Ellisras rural children aged 8–18 years

Public Health Nutrition ◽

10.1017/s136898001500258x ◽

2015 ◽

Vol 19 (7) ◽

pp. 1195-1199 ◽

Cited By ~ 7

Author(s):

Kotsedi Daniel Monyeki ◽

Michael Matome Sekhotha

Keyword(s):

Waist Circumference ◽

Pearson Correlation ◽

Age Groups ◽

Limpopo Province ◽

Rural Children ◽

Linear Regression Models ◽

Upper Arm ◽

Arm Span ◽

Arm Circumference ◽

Mid Upper Arm Circumference

Abstract Objective: Height is required for the assessment of growth and nutritional status, as well as for predictions and standardization of physiological parameters. To determine whether arm span, mid-upper arm and waist circumferences and sum of four skinfolds can be used to predict height, the relationships between these anthropometric variables were assessed among Ellisras rural children aged 8–18 years. Design: The following parameters were measured according to the International Society for the Advancement of Kinanthropometry: height, arm span, mid-upper arm circumference, waist circumference and four skinfolds (suprailiac, subscapular, triceps and biceps). Associations between the variables were assessed using Pearson correlation coefficients and linear regression models. Setting: Ellisras Longitudinal Study (ELS), Limpopo Province, South Africa. Subjects: Boys (n = 911) and girls (n = 858) aged 8–18 years. Results: Mean height was higher than arm span, with differences ranging from 4 cm to 11·5 cm between boys and girls. The correlation between height and arm span was high (ranging from 0·74 to 0·91) with P<0·001. The correlation between height and mid-upper arm circumference, waist circumference and sum of four skinfolds was low (ranging from 0·15 to 0·47) with P<0·00 among girls in the 15–18 years age group. Conclusions: Arm span was found to be a good predictor of height. The sum of four skinfolds was significantly associated with height in the older age groups for girls, while waist circumference showed a negative significant association in the same groups.

Regression-Guided Clustering: A Semisupervised Method for Circulation-to-Environment Synoptic Classification

Journal of Applied Meteorology and Climatology ◽

10.1175/jamc-d-11-0155.1 ◽

2012 ◽

Vol 51 (2) ◽

pp. 185-190 ◽

Cited By ~ 16

Author(s):

Alex J. Cannon

Keyword(s):

Regression Model ◽

Atmospheric Circulation ◽

Clustering Algorithm ◽

Model Performance ◽

Weighting Factor ◽

Synoptic Climatology ◽

Linear Regression Models ◽

Synoptic Scale ◽

Synoptic Classification ◽

Guided Clustering

Abstract Regression-guided clustering is introduced as a means of constructing circulation-to-environment synoptic climatological classifications. Rather than applying an unsupervised clustering algorithm to synoptic-scale atmospheric circulation data, one instead augments the atmospheric circulation dataset with predictions from a supervised regression model linking circulation to environment. The combined dataset is then entered into the clustering algorithm. The level of influence of the environmental dataset can be controlled by a simple weighting factor. The method is generic in that the choice of regression model and clustering algorithm is left to the user. Examples are given using standard multivariate linear regression models and the k-means clustering algorithm, both established methods in synoptic climatology. Results for southern British Columbia, Canada, indicate that model performance can be made to range between that of a fully unsupervised algorithm and a fully supervised algorithm.
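
The procedure described in the abstract (regress, augment, weight, cluster) can be sketched roughly as below. This is one plausible reading, not the paper's implementation: the way the weight is applied, scaling the standardized circulation data by (1 − weight) and the standardized predictions by weight, is an assumption, as are the function names, the plain Lloyd-style k-means, and the synthetic data.

```python
import numpy as np

def standardize(A):
    """Scale each column to zero mean and unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

def linear_predictions(X, Y):
    """Multivariate linear regression of Y on X; returns fitted values."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return X1 @ W

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means; returns a cluster label for each row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):          # leave empty clusters where they are
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

def regression_guided_clusters(X, Y, k, weight):
    """Cluster circulation data X augmented with regression predictions of Y.

    weight = 0 ignores the predictions (unsupervised end of the range);
    weight = 1 clusters on the predictions alone (supervised end).
    """
    Y_hat = linear_predictions(X, Y)
    Z = np.hstack([(1.0 - weight) * standardize(X),
                   weight * standardize(Y_hat)])
    return kmeans(Z, k)

# Synthetic stand-ins (hypothetical): 300 days, 4 circulation predictors,
# 1 environmental variable that depends on the first predictor.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
Y = 2.0 * X[:, :1] + 0.3 * rng.normal(size=(300, 1))

labels = regression_guided_clusters(X, Y, k=3, weight=0.5)
print("cluster sizes:", np.bincount(labels, minlength=3))
```

Sweeping `weight` from 0 to 1 moves the classification from one driven only by circulation patterns toward one driven only by the predicted environmental response, which mirrors the range of behavior the abstract reports.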

Potential Utility of Routine Programmatic Data in Monitoring National and State-Level HIV Epidemic in Nigeria: Data Triangulation Analysis

10.1101/734293 ◽

2019 ◽

Author(s):

Debem Henry ◽

Aminu Yakubu ◽

Mukhtar Ahmed ◽

Gwamna Jerry ◽

Dalhatu Ibrahim

Keyword(s):

Pearson Correlation ◽

National Level ◽

State Level ◽

Population Based ◽

Hiv Prevalence ◽

Reproductive Age ◽

World Health ◽

Linear Regression Models ◽

Epidemic Control ◽

Hiv Epidemic

Abstract Nigeria relies on data from periodic resource-intensive surveys such as antenatal HIV seroprevalence sentinel surveys (ANC-HSS) and population-based National AIDS and Reproductive Health Surveys (NARHS) for its HIV control efforts. Nigeria has not explored the use of readily available routine programmatic data (RPD) to easily inform and monitor epidemic control efforts at local settings in near real time. This study aimed to determine the utility of RPDs (Prevention of Mother-To-Child Transmission [PMTCT] and HIV Testing and Counseling [HTC]) as a proxy for monitoring the HIV epidemic in Nigeria. Using the World Health Organization 12-step triangulation procedures, we compared state-level seropositivity data from PMTCT and HTC programs to HIV prevalence data from NARHS and ANC-HSS reports in relevant pairs from 2010 to 2014 in Nigeria. The study population was pregnant women and the general population. We abstracted relevant data from the PEPFAR Nigeria data source and published national survey reports. We compared visual (scatterplots and maps) patterns and trends, and performed Pearson correlation and univariate linear regression models of the estimates for best matched/contiguous years for which data were available. Correlation between PMTCT2014 and ANC-HSS2014 was positive and significant (R=0.7, p<0.001). ANC-HSS2014 and HTC2014 were slightly correlated (R=0.4, p<0.05). Significant correlation was observed between ANC-HSS2010 and PMTCT2013 (R=0.8, p<0.001) and between ANC-HSS2010 and HTC2013 (R=0.6, p<0.001). All RPD sources and ANC-HSS indicated a decreasing trend in national HIV prevalence in Nigeria. PMTCT2014 data showed strong capability of predicting HIV prevalence in ANC-HSS2014 in the regression model (B=2.09, p<0.0001). Use of routine PMTCT data in monitoring HIV prevalence among women of reproductive age could be more valid and reliable in local settings than the use of HTC data. Use of RPD to monitor the national and sub-national-level HIV epidemic in between national surveys in Nigeria could maximize program resources and promote more responsive and efficient actions toward epidemic control.

Sniff Olfactometry: Temporal effects on odorant mixture perception in humans

10.26434/chemrxiv.10293395 ◽

2019 ◽

Author(s):

Kaifeng Ding ◽

Xiaoyuan Wang ◽

Dmitry Rinberg ◽

Terry Acree

Keyword(s):

Olfactory Bulb ◽

Linear Function ◽

Binary Mixtures ◽

Temporal Resolution ◽

Olfactory Epithelium ◽

Temporal Structure ◽

Odor Perception ◽

Temporal Effects ◽

Recognition Probability ◽

Human Odor

There is evidence in mice and honeybees that signals initiated by odorants at the olfactory epithelium arrive downstream in the olfactory bulb between 10 and 200 ms later and that these latencies are ligand dependent. It has recently been proposed that these latencies could be used by mice to identify or classify odorants. Here we demonstrate that humans are sensitive to the timing of individual odorant presentations. Using a two-alternative forced choice (2AFC) paradigm, subjects chose which odorant they recognized first after they experienced two 70 ms puffs separated in time by some interval in the range of -450 ms to +450 ms. All subject recognition probabilities yielded the same linear function of latency (p<0.05) even though they differed in their recognition thresholds for the components and their recognition probability to detect them in binary mixtures. These results indicate that the temporal structure of odor delivery affects human odor perception and that sniff olfactometry (SO) has the temporal resolution necessary to measure these effects.

Implications for human odor sensing revealed from the statistics of the odorant-receptor interactions

10.1101/283010 ◽

2018 ◽

Author(s):

Ji Hyun Bak ◽

Seogjoo Jang ◽

Changbong Hyeon

Keyword(s):

High Throughput Screening ◽

Odorant Receptor ◽

Olfactory Receptors ◽

Assay Method ◽

Theoretic Approach ◽

Screening Assay ◽

Odor Perception ◽

Activation Kinetics ◽

High Throughput Screening Assay ◽

Human Odor

Binding of odorants to olfactory receptors (ORs) elicits downstream chemical and neural signals, which are further processed to odor perception in the brain. Recently, Mainland et al. [Sci. data, (2015) 2:sdata20152] have measured ≳ 500 pairs of odorant-OR interaction by a high-throughput screening assay method, opening a new avenue to understanding the principles of human odor coding. Here, using a recently developed minimal model for OR activation kinetics [J. Phys. Chem. B (2017) 121, 1304–1311], we characterize the statistics of OR activation by odorants in terms of three empirical parameters: the half-maximum effective concentration EC50, the efficacy, and the basal activity. While the data size of odorants is still limited, the statistics offer meaningful information on the breadth and optimality of the tuning of human ORs to odorants, and allow us to relate the three parameters with the microscopic rate constants and binding affinities that define the OR activation kinetics. Despite the stochastic nature of the response expected at individual OR-odorant level, we assess that the confluence of signals in a neuron released from the multitude of ORs is effectively free of noise and deterministic with respect to changes in odorant concentration. Thus, setting a threshold to the fraction of activated OR copy number for neural spiking binarizes the electrophysiological signal of olfactory sensory neuron, thereby making an information theoretic approach a viable tool in studying the principles of odor perception.

Correlation between three assessment pain tools in subacromial pain syndrome

Clinical Rehabilitation ◽

10.1177/0269215520947596 ◽

2020 ◽

pp. 026921552094759

Author(s):

Javier Aceituno-Gómez ◽

Juan Avendaño-Coy ◽

Juan José Criado-Álvarez ◽

Gerardo Ávila-Martín ◽

Ana Cecilia Marín-Guerrero ◽

...

Keyword(s):

Shoulder Pain ◽

Visual Analog Scale ◽

Pearson Correlation ◽

Statistical Significance ◽

Pain Syndrome ◽

Assessment Tools ◽

Linear Regression Models ◽

Analog Scale ◽

Subacromial Pain Syndrome ◽

Constant Murley Score

Objective: To compare the correlation of Visual Analog Scale with pain subsections of Shoulder Pain and Disability Index and Constant-Murley Score in subacromial pain syndrome patients. Design: Single cross-sectional analysis. Setting: Hospital Rehabilitation Department. Methods: The assessment tools were applied at baseline. Correlations between Visual Analog Scale, Shoulder Pain and Disability Index and Constant-Murley Score pain subsections were assessed by Pearson correlation coefficient. Linear regression models were calculated between scales. Statistical significance was set at two-sided p < 0.05. Results: Forty-three patients were included. Pearson’s correlation was r = 0.61 (p < 0.001) for Visual Analog Scale versus Shoulder Pain and Disability Index-pain and r = −0.74 (p < 0.001) for Visual Analog Scale versus Constant-Murley Score-pain. The coefficient of determination was r² = 0.37 for Visual Analog Scale versus Shoulder Pain and Disability Index-pain and r² = 0.54 for Visual Analog Scale versus Constant-Murley Score-pain. Conclusions: Visual Analog Scale showed better correlation with Constant-Murley Score-pain than with Shoulder Pain and Disability Index-pain in subacromial pain syndrome patients.

Carboplatin dosing by predictive formulae and factors related to its myelosuppressive toxicity

Journal of Clinical Oncology ◽

10.1200/jco.2009.27.15_suppl.e19101 ◽

2009 ◽

Vol 27 (15_suppl) ◽

pp. e19101-e19101

Author(s):

S. J. Ayirookuzhi ◽

J. McLarty ◽

R. Mansour ◽

G. M. Mills

Keyword(s):

Pearson Correlation ◽

Area Under The Curve ◽

Stage Iii ◽

Linear Regression Models ◽

Cell Counts ◽

Significant Difference ◽

Spss Software ◽

Highly Correlated ◽

The Relationship ◽

Carboplatin Dose

e19101 Background: Carboplatin is a commonly used drug in stage III/IV non-small cell lung cancer (NSCLC). Its dose is typically calculated with the Calvert equation, which uses the glomerular filtration rate (GFR) from various predictive formulae such as MDRD, Cockcroft-Gault, Jelliffe and Wright as well as the targeted area under the curve (AUC) for carboplatin. Myelosuppression is a common toxicity of carboplatin, and this study aimed to assess the relationship between dosing and toxicity as well as whether there would be any significant difference in dosing based on calculation of GFR from the above formulae in our patient population. Methods: Data from patients with Stage III and IV NSCLC seen between 1/1/99 and 12/31/2007 were analyzed. Patients who received concurrent radiation, who died before the first cycle was completed, as well as patients with missing lab values were excluded. Only the first cycle of the carboplatin-based regimen was used for analysis, with nadir platelets, hemoglobin and wbc's used as endpoints. SPSS software was used for statistical analysis including Pearson correlation, ANCOVA, independent t-tests as well as multivariate linear regression models. Results: Of the 216 patients initially abstracted for analysis, only 132 patients were analyzable. Demographically, 71 were Caucasian and the rest were African-American, while 92 were male. The carboplatin doses calculated from all four formulae were highly correlated (p < 0.0001). The drops in the three cell counts were correlated with each other (p< 0.05), particularly the drop in platelet count with the other two cell counts. The correlation between the drop in wbc and the drop in hemoglobin approached significance (p=0.075). The nadir wbc was significantly associated with BSA (p=0.004) and wbc level at baseline (p< 0.000) and approached significance for carboplatin dose (p=0.059) by ANCOVA analysis, while significance was not reached for nadir platelets or nadir hemoglobin. Conclusions: The different predictive equations yielded similar carboplatin doses, which were significantly correlated with one another. BSA correlated significantly with myelosuppression while carboplatin dose only approached significance. No significant financial relationships to disclose.
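
To make the dosing arithmetic concrete, the sketch below combines the Calvert equation, dose (mg) = target AUC × (GFR + 25), with the Cockcroft-Gault creatinine clearance estimate as one of the GFR formulae the abstract names. The patient values are hypothetical and chosen only so the arithmetic is easy to follow; the abstract does not give patient-level data, and the other three formulae (MDRD, Jelliffe, Wright) are not implemented here.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female=False):
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault)."""
    crcl = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return crcl * 0.85 if female else crcl

def calvert_dose_mg(target_auc, gfr_ml_min):
    """Calvert equation: total carboplatin dose in mg."""
    return target_auc * (gfr_ml_min + 25)

# Hypothetical patient, for illustration of the arithmetic only.
age, weight, creatinine, target_auc = 60, 72, 1.0, 5

gfr_male = cockcroft_gault(age, weight, creatinine)
gfr_female = cockcroft_gault(age, weight, creatinine, female=True)

dose_male = calvert_dose_mg(target_auc, gfr_male)
dose_female = calvert_dose_mg(target_auc, gfr_female)

print(f"estimated clearance: {gfr_male:.1f} vs {gfr_female:.1f} mL/min")
print(f"Calvert dose:        {dose_male:.0f} vs {dose_female:.0f} mg")
```

Because the dose is linear in the GFR estimate, any difference between predictive formulae carries straight through to the dose, which is why the study compares the doses that the four formulae produce.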
