missing variables
Recently Published Documents


TOTAL DOCUMENTS

54
(FIVE YEARS 16)

H-INDEX

10
(FIVE YEARS 1)

2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Xuezhong Fu

To improve financial data classification and extract useful information from financial data, this paper improves the data mining algorithm, uses linear combinations of principal components to represent missing variables, and applies dimensionality reduction to multidimensional data. It standardizes the sample data and combines statistical methods to build an intelligent financial data processing model. In addition, starting from the actual situation, this paper proposes artificial intelligence classification and statistical methods for financial data in smart cities and designs data simulation experiments to evaluate them. The experimental results suggest that the proposed methods can play an important role in the statistical analysis of financial data.
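The abstract above does not say which standardization it applies. As a minimal sketch, assuming a plain z-score (subtract the mean, divide by the population standard deviation), one column of sample data could be standardized as follows. The function name and the sample values are hypothetical.

```python
from math import sqrt


def standardize(column):
    """Return z-scores for a list of numbers (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    std = sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        # A constant column has no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


# Made-up values for one financial indicator (assumption, not from the paper).
revenue = [120.0, 80.0, 100.0, 100.0]
z_scores = standardize(revenue)
print(z_scores)
```

After this step every column has mean 0 and unit variance, which is the usual precondition for a principal component analysis on variables measured in different units.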


2021 ◽  
Vol 13 (24) ◽  
pp. 13533
Author(s):  
Ziyun Zhang ◽  
Sen Guo

With the internationalization of the RMB and the opening of China’s capital account, the number of foreign institutions investing in China has increased significantly. Based on China’s daily data from January 2007 to September 2021, this study investigated the factors that affect the RMB carry-trade return for sustainability. By comparing the results of the carry return before and after the foreign-exchange reform on 11 August 2015, this study found that the RMB carry return has become more traceable after the exchange-rate reform. Meanwhile, the fit of the model explaining the RMB carry return was higher, and there were fewer missing variables. Therefore, this study found that after the RMB exchange-rate mechanism became more market oriented, the RMB carry return became more reasonable, and the carry trade can play a better role in foreign-exchange pricing. The results were consistent in a robustness test that used RMB non-deliverable forwards (NDF) to construct the carry-trade position. Given the different results before and after the exchange-rate reform, this study can provide references for policy makers and investors for sustainable development.


2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 143-143
Author(s):  
Marita Yaghi ◽  
Nadeem Bilani ◽  
Iktej Jabbal ◽  
Leah Elson ◽  
Maroun Bou Zerdan ◽  
...  

Background: The National Cancer Database (NCDB) is a large registry that collates real-world medical record data from millions of patients in the United States. A previously published study using the NCDB found that gaps in the medical record were associated with worse overall survival outcomes. We investigated cases of breast cancer in this registry to understand which factors were predictive of records with missing data. Methods: We screened for missing data in 54 clinical parameters documented by the NCDB pertaining to the diagnosis, workup, management and survival of patients with breast cancer diagnosed between 2004 and 2017. We performed univariate statistics to describe gaps in the dataset, followed by multivariate logistic regression modeling to identify factors associated with a lack of completeness of the medical record, defined as the presence of > 3 missing variables. Results: A total of n = 2,981,732 patients were included in this analysis. The median number of missing variables per record was 3 (5.6% of clinical parameters surveyed). 52.1% of records had ≤ 3 variables missing, while 47.9% had > 3 variables missing. Predictors of a record with missing data in > 3 variables were: age, race, insurance status and facility type. Regarding race, we found that records of Asian patients were less likely to have missing data as compared to records of White patients (OR 0.75, 95% CI: 0.74-0.76, p < 0.001). Conversely, there was no difference in completeness of the medical record between Black and White patients (OR 0.99, 95% CI: 0.99-1.01, p = 0.890). Patients with private insurance (OR 0.77, 95% CI 0.76-0.79, p < 0.001), or Medicaid (OR 0.65, 95% CI 0.64-0.67, p < 0.001) or Medicare (OR 0.66, 95% CI 0.64-0.67, p < 0.001) were also less likely to have missing data compared to uninsured patients, with patients on private insurance being the least likely to have incomplete records.
Finally, patient records from academic programs (OR 0.91, 95% CI 0.90-0.92, p < 0.001) were less likely to contain > 3 missing variables compared to records from patients treated at community cancer programs. Conclusions: Despite the high fidelity of NCDB data, social determinants of health, including insurance status and treating facility type, were associated with differences in the completeness of the medical record. Improvements in documentation and data quality are necessary to optimize use of real-world data in cancer registries. Further research is needed to determine how these differences could be independently associated with inferior outcomes.
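The completeness definition used in the abstract above (a record is incomplete when more than 3 variables are missing) can be sketched in a few lines. The field names, the toy records, and the counts passed to the odds ratio are all hypothetical, and the odds ratio shown is the unadjusted 2x2 version, not the multivariate logistic regression the authors report.

```python
def count_missing(record):
    """Count fields whose value is None (our stand-in for 'not documented')."""
    return sum(1 for value in record.values() if value is None)


def is_incomplete(record, threshold=3):
    """Apply the abstract's definition: more than `threshold` missing variables."""
    return count_missing(record) > threshold


def odds_ratio(group_yes, group_no, reference_yes, reference_no):
    """Unadjusted odds ratio from a 2x2 table of counts."""
    return (group_yes / group_no) / (reference_yes / reference_no)


# Toy records with made-up fields (assumption: None marks a missing variable).
records = [
    {"stage": "II", "grade": 2, "er": "pos", "her2": None, "size_mm": 14},
    {"stage": None, "grade": None, "er": None, "her2": None, "size_mm": 22},
    {"stage": None, "grade": None, "er": None, "her2": None, "size_mm": None},
    {"stage": "I", "grade": 1, "er": "neg", "her2": "neg", "size_mm": 9},
]
flags = [is_incomplete(r) for r in records]
print(flags)

# Made-up counts: 30 incomplete / 70 complete in one group,
# 45 incomplete / 55 complete in the reference group.
print(odds_ratio(30, 70, 45, 55))
```

An odds ratio below 1 means the group is less likely than the reference group to have an incomplete record, which is how the ORs in the abstract are read.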


2021 ◽  
Vol 118 (11) ◽  
pp. e2026405118
Author(s):  
Raudel Avila ◽  
Chenhang Li ◽  
Yeguang Xue ◽  
John A. Rogers ◽  
Yonggang Huang

Drug delivery systems featuring electrochemical actuation represent an emerging class of biomedical technology with programmable volume/flowrate capabilities for localized delivery. Recent work establishes applications in neuroscience experiments involving small animals in the context of pharmacological response. However, for programmable delivery, the available flowrate control and delivery time models fail to consider two key variables of the drug delivery system: microfluidic resistance and membrane stiffness. Here we establish an analytical model that accounts for the missing variables and provides a scalable understanding of each variable's influence on the physics of the delivery process (i.e., maximum flowrate, delivery time). This analytical model accounts for the key parameters (initial environmental pressure, initial volume, microfluidic resistance, flexible membrane, current, and temperature) to control the delivery, and it bypasses numerical simulations, allowing faster system optimization for different in vivo experiments. We show that the delivery process is controlled by three nondimensional parameters, and the volume/flowrate results from the proposed analytical model agree with the numerical results and experiments. These results have relevance to the many emerging applications of programmable delivery in clinical studies within the neuroscience and broader biomedical communities.


2021 ◽  
Vol 16 (2) ◽  
pp. 54
Author(s):  
Lucong Wang ◽  
Jingchao Dai

Language is a tool of communication and a carrier of information, and learning a language has a cost. Language ability is also a form of human capital, which has an impact on income and employment. To identify the economic effects of language ability, we studied the relationship between language ability and income using China Family Panel Studies (CFPS) data from 2010 to 2016. We found that improvement of both Mandarin proficiency and English proficiency can significantly promote total income and wage income. To address missing variables in the model and the endogeneity problem of reverse causality, we used instrumental variables, and the income effect of language ability still held. From the perspective of the influence mechanism, social capital is an important channel through which language ability affects the total income of employees.
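The abstract above does not name its instruments or estimator. As a minimal sketch, assuming a single instrument and the textbook ratio estimator cov(z, y) / cov(z, x), the following toy example shows why an instrument helps when a variable is missing from the model. All variable names and numbers are hypothetical: x stands in for language ability, y for income, u for an unobserved confounder, and z for an instrument that moves x but is unrelated to u.

```python
def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (ignores any omitted variable)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument estimator: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Constructed toy data (assumption): the true effect of x on y is 2.0,
# and the omitted variable u adds 3.0 * u to y while also raising x.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]  # uncorrelated with z by construction
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

print(ols_slope(x, y))    # biased upward by the omitted variable
print(iv_slope(z, x, y))  # recovers the true effect in this toy setup
```

The naive slope absorbs the effect of u because u is correlated with x, while the instrument-based slope uses only the variation in x that comes from z.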


2021 ◽  
Vol 308 ◽  
pp. 01026
Author(s):  
Yang Hu ◽  
Jiaxin Shen ◽  
Ruolin Chen

This paper investigates the impact of corporate social responsibility on the idiosyncratic risk of enterprises. We find that corporate social responsibility is negatively associated with the idiosyncratic risk of enterprises. This association is robust to a series of robustness checks, including the use of alternative indicators, exclusion of the effect of multicollinearity, and the addition of missing variables to address endogeneity concerns. Further analyses show that the impact of corporate social responsibility on idiosyncratic risk is more significant in state-owned enterprises, firms with poor corporate governance or low growth. Our findings support the notion that corporate social responsibility appears to improve corporate performance.


2020 ◽  
Author(s):  
Morio YAMAUCHI ◽  
Kazuhisa NAKANO ◽  
Yoshiya TANAKA ◽  
Keiichi HORIO

In this article, we implemented a regression model and conducted experiments for predicting disease activity using data from 1929 rheumatoid arthritis patients to assist in the selection of biologics for rheumatoid arthritis. For modelling, the missing variables in the data were filled in by three different methods: mean value, self-organizing map, and random value. Experimental results showed that the prediction error of the regression model was large regardless of the method used to fill in missing values, making it difficult to predict the prognosis of rheumatoid arthritis patients.
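Two of the three fill-in methods named in the abstract above are simple enough to sketch directly. This is an illustration under assumptions, not the authors' code: missing entries are represented as None, "random value" is taken to mean a random draw from the observed values of the same variable, and the self-organizing map variant is left out. The variable name and numbers are made up.

```python
import random
from statistics import mean


def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_random(values, seed=0):
    """Replace each None with a random draw from the observed values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical lab measurements with two missing entries.
crp = [1.2, None, 0.4, 3.0, None]
mean_filled = impute_mean(crp)
random_filled = impute_random(crp)
print(mean_filled)
print(random_filled)
```

Mean imputation shrinks the variance of the filled column, while random draws preserve the observed spread but add noise, so comparing both is a cheap check of how sensitive a downstream model is to the fill-in choice.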


2020 ◽  
Vol 16 (11) ◽  
pp. e1007450
Author(s):  
Pei-Yau Lung ◽  
Dongrui Zhong ◽  
Xiaodong Pang ◽  
Yan Li ◽  
Jinfeng Zhang

Reusability is part of the FAIR data principle, which aims to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we developed a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We found that when using predicted data to conduct other analyses, it is not optimal to use all the predicted data. Instead, one should use only the subset of data that can be predicted accurately. We proposed a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we showed that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
P Codina ◽  
M De Antonio ◽  
E Santiago-Vacas ◽  
M Domingo ◽  
E Zamora ◽  
...  

Background: Contemporary management of heart failure (HF) has significantly improved over the past two decades, leading to better survival. How application of the contemporary HF management guidelines affects the risk of death estimated by available web-based risk scores is not elucidated. Objective: To assess changes in mortality risk prediction after a 12-month management period in a multidisciplinary HF Clinic. Methods: Out of 1,689 consecutive patients with HF admitted at our ambulatory HF Clinic from May 2006 to November 2018, those who completed one year of follow-up were considered for the study. Patients without NTproBNP measurement or with more than 3 missing variables for risk estimation were excluded. Three contemporary web-based HF risk scores were evaluated: MAGGIC-HF, Seattle HF Model (SHFM) and the Barcelona Bio-HF Calculator containing NTproBNP (BCN Bio-HF). Risk of all-cause death at one year and at 3 years was calculated at baseline and re-evaluated after 12-month management in a multidisciplinary HF Clinic. The Wilcoxon paired-data test was used to compare changes in mortality risk estimation over time, and a test of equality of matched pairs was used to compare the estimated change among tools. 442 patients used to derive the Barcelona Bio-HF Calculator were excluded for discrimination purposes. Results: 1,157 patients were included (age 65.7±12.7 years, 70.4% men). A significant reduction in mortality risk estimation was observed with the three HF risk scores evaluated at 12 months (Table). The BCN Bio-HF model showed significantly different changes in risk estimation, which was accompanied by numerically better discrimination. AUC at 1 and 3 years, respectively, were: BCN Bio-HF (0.773 and 0.775), MAGGIC HF (0.686 and 0.748) and SHFM (0.773 and 0.739). Conclusions: The three web-based risk scores evaluated showed a significant reduction in mortality risk estimation after 12-month management in a multidisciplinary HF Clinic.
The BCN Bio-HF score showed a greater reduction in estimated risk, together with better discrimination, likely because it incorporates contemporary treatment and use of biomarkers. Funding Acknowledgement: Type of funding source: None
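Discrimination in the abstract above is reported as AUC. As a minimal sketch of what that number measures, the AUC can be computed as the share of (event, non-event) pairs in which the patient who had the event received the higher predicted risk, with ties counted as half. The function name and the four toy patients are hypothetical, and this pairwise form is meant for illustration rather than for large cohorts.

```python
def auc(scores, outcomes):
    """Pairwise AUC: fraction of (event, non-event) pairs ranked correctly."""
    events = [s for s, o in zip(scores, outcomes) if o == 1]
    non_events = [s for s, o in zip(scores, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(events) * len(non_events))


# Made-up predicted risks and outcomes (1 = died during follow-up).
predicted_risk = [0.10, 0.40, 0.35, 0.80]
died = [0, 0, 1, 1]
print(auc(predicted_risk, died))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which gives a scale for reading the 0.69 to 0.78 range quoted for the three risk scores.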


2020 ◽  
Author(s):  
David N. Fisman ◽  
Amy L. Greer ◽  
Ashleigh R. Tuite

Background: SARS-CoV-2 is currently causing a high-mortality global pandemic. However, the clinical spectrum of disease caused by this virus is broad, ranging from asymptomatic infection to cytokine storm with organ failure and death. Risk stratification of individuals with COVID-19 would be desirable for management, prioritization for trial enrollment, and risk stratification. We sought to develop a prediction rule for mortality due to COVID-19 in individuals with diagnosed infection in Ontario, Canada. Methods: Data from Ontario’s provincial iPHIS system were extracted for the period from January 23 to May 15, 2020. Both logistic regression-based prediction rules, and a rule derived using a Cox proportional hazards model, were developed in half the study population and validated in the remaining patients. Sensitivity analyses were performed with varying approaches to missing data. Results: 21,922 COVID-19 cases were reported. Individuals assigned to the derivation and validation sets were broadly similar. Age and comorbidities (notably diabetes, renal disease and immune compromise) were strong predictors of mortality. Four point-based prediction rules were derived (base case, smoking excluded as a predictor, long-term care excluded as a predictor, and Cox model based). All rules displayed excellent discrimination (AUC for all rules > 0.92) and calibration (both by graphical inspection and P > 0.50 by Hosmer-Lemeshow test) in the derivation set. All rules performed well in the validation set and were robust to random replacement of missing variables, and to the assumption that missing variables indicated absence of the comorbidity or characteristic in question. Conclusions: We were able to use a public health case-management data system to derive and internally validate four accurate, well-calibrated and robust clinical prediction rules for COVID-19 mortality in Ontario, Canada.
While these rules need external validation, they may be a useful tool for clinical management, risk stratification, and clinical trials.
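The two missing-data assumptions mentioned in the abstract above (missing means the comorbidity is absent, and random replacement of missing values) can be compared on a single case with a sketch like the following. The point weights, the 50% replacement probability, and the example case are all hypothetical; the abstract does not publish them here.

```python
import random

# Hypothetical point weights for a point-based rule (assumption, not the paper's).
POINTS = {"diabetes": 2, "renal_disease": 3, "immune_compromise": 2}


def fill_absent(case):
    """Treat every missing flag (None) as 'comorbidity absent'."""
    return {k: (False if v is None else v) for k, v in case.items()}


def fill_random(case, rng, p_present=0.5):
    """Replace every missing flag with a random True/False draw."""
    return {k: (rng.random() < p_present if v is None else v)
            for k, v in case.items()}


def score(case):
    """Sum the points of all comorbidities flagged as present."""
    return sum(POINTS[k] for k, present in case.items() if present)


# One made-up case with an undocumented renal disease status.
case = {"diabetes": True, "renal_disease": None, "immune_compromise": False}

score_if_absent = score(fill_absent(case))
rng = random.Random(42)  # fixed seed for reproducibility
random_scores = [score(fill_random(case, rng)) for _ in range(20)]
print(score_if_absent, sorted(set(random_scores)))
```

If a rule's performance barely changes between the two filled-in versions of the dataset, it is robust to the missing-data assumption in the sense the abstract describes.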

