Improving Punctuation Restoration for Speech Transcripts via External Data

Summary: The article proposes a new software platform that automates the preprocessing and labelling of datasets for downstream analytical tasks such as image classification and the processing of textual and parametric information with neural networks. The platform is modular and combines a large number of methods, and it can be extended as analytical data-processing tasks become more complex. The need for such a platform is dictated primarily by the fact that, at the current rate of data growth, a real transition to deep data analytics remains unattainable without it, because confidentiality, access to information and the use of external data-processing resources are all required.

The quantification of operational risk using internal data, relevant external data and expert opinion

The Journal of Operational Risk ◽

10.21314/jop.2007.030 ◽

2007 ◽

Vol 2 (3) ◽

pp. 3-27 ◽

Cited By ~ 44

Author(s):

Dominik Lambrigger ◽

Pavel Shevchenko ◽

Mario Wüthrich

Keyword(s):

Expert Opinion ◽

Operational Risk ◽

External Data

A review of methods for combining internal and external data

The Journal of Operational Risk ◽

10.21314/jop.2014.135 ◽

2014 ◽

Vol 9 (4) ◽

pp. 83-103 ◽

Cited By ~ 3

Author(s):

Giuseppe Galloppo ◽

Daniele Previati

Keyword(s):

External Data

Internal Data, External Data and Consortium Data - How to Mix Them for Measuring Operational Risk

SSRN Electronic Journal ◽

10.2139/ssrn.1032529 ◽

2002 ◽

Cited By ~ 8

Author(s):

Nicolas Baud ◽

Antoine Frachot ◽

Thierry Roncalli

Keyword(s):

Operational Risk ◽

External Data

The Performance of Post-Fall Detection Using the Cross-Dataset: Feature Vectors, Classifiers and Processing Conditions

Sensors ◽

10.3390/s21144638 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4638

Author(s):

Bummo Koo ◽

Jongman Kim ◽

Yejin Nam ◽

Youngho Kim

Keyword(s):

Time Series ◽

Fall Detection ◽

Discrete Data ◽

Classification Performance ◽

Training Data ◽

Processing Conditions ◽

Feature Vectors ◽

External Data ◽

Public Dataset ◽

The Cross

In this study, algorithms to detect post-falls were evaluated using the cross-dataset according to feature vectors (time-series and discrete data), classifiers (ANN and SVM), and four different processing conditions (normalization, equalization, increase in the number of training data, and additional training with external data). Three-axis acceleration and angular velocity data were obtained from 30 healthy male subjects by attaching an IMU to the middle of the left and right anterior superior iliac spines (ASIS). Internal and external tests were performed using our lab dataset and SisFall public dataset, respectively. The results showed that ANN and SVM were suitable for the time-series and discrete data, respectively. The classification performance generally decreased, and thus, specific feature vectors from the raw data were necessary when untrained motions were tested using a public dataset. Normalization made SVM and ANN more and less effective, respectively. Equalization increased the sensitivity, even though it did not improve the overall performance. The increase in the number of training data also improved the classification performance. Machine learning was vulnerable to untrained motions, and data of various movements were needed for the training.
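The abstract does not state which normalization was used, so the following is only a minimal sketch under the assumption of per-feature z-scoring. It shows one common way to treat an external test set: the scaling statistics are fitted on the internal training data only and then reused unchanged on the external data. The function names `fit_zscore` and `apply_zscore` are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def fit_zscore(train_rows):
    """Return per-feature means and standard deviations of the training rows."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Scale rows with statistics fitted elsewhere (for example on the training set)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature rows: an internal (lab) training set and one external sample.
internal_train = [[1.0, 10.0], [3.0, 10.0]]
external_test = [[5.0, 12.0]]

means, stds = fit_zscore(internal_train)
scaled_external = apply_zscore(external_test, means, stds)
```

Fitting the statistics on the training set alone keeps the external data unseen during training, which is the point of a cross-dataset evaluation.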

Nomograms predicting the overall survival and cancer-specific survival of patients with stage IIIC1 cervical cancer

BMC Cancer ◽

10.1186/s12885-021-08209-5 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yifan Feng ◽

Ye Wang ◽

Yangqin Xie ◽

Shuwei Wu ◽

Yuyang Li ◽

...

Keyword(s):

Cervical Cancer ◽

Overall Survival ◽

Random Sampling ◽

Tumour Size ◽

Training Group ◽

Data Sets ◽

Prognostic Variables ◽

Cancer Specific Survival ◽

External Data ◽

Seer Data

Abstract. Background: To explore the factors that affect the prognosis of overall survival (OS) and cancer-specific survival (CSS) of patients with stage IIIC1 cervical cancer and establish nomogram models to predict this prognosis. Methods: Data from patients in the Surveillance, Epidemiology, and End Results (SEER) programme meeting the inclusion criteria were classified into a training group, and validation data were obtained from the First Affiliated Hospital of Anhui Medical University from 2010 to 2019. The incidence, Kaplan-Meier curves, OS and CSS of patients with stage IIIC1 cervical cancer in the training group were evaluated. Nomograms were established according to the results of univariate and multivariate Cox regression models. Harrell’s C-index, calibration plots, receiver operating characteristic (ROC) curves and decision-curve analysis (DCA) were calculated to validate the prediction models. Results: The incidence of pelvic lymph node metastasis, a high-risk factor for the prognosis of cervical cancer, decreased slightly over time. Eight independent prognostic variables were identified for OS, including age, race, marriage status, histology, extension range, tumour size, radiotherapy and surgery, but only seven were identified for CSS, with marriage status excluded. Nomograms of OS and CSS were established based on the results. The C-indexes for the nomograms of OS and CSS were 0.687 and 0.692, respectively, using random sampling of SEER data sets and 0.701 and 0.735, respectively, using random sampling of external data sets. The AUCs for the nomogram of OS were 0.708 and 0.705 for the SEER data sets and 0.750 and 0.750 for the external data sets, respectively. In addition, AUCs of 0.707 and 0.709 were obtained for the nomogram of CSS when validated using SEER data sets, and 0.788 and 0.785 when validated using external data sets. Calibration plots for the nomograms were almost identical to the actual observations. The DCA also indicated the value of the two models. Conclusions: Eight independent prognostic variables were identified for OS. The same factors predicted CSS, with the exception of the marriage status. Both OS and CSS nomograms had good predictive and clinical application value after validation. Notably, tumour size had the largest contribution to the OS and CSS nomograms.
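For readers unfamiliar with the validation metric named in the abstract, the following is a minimal sketch of Harrell's C-index for right-censored survival data. It is not the authors' code: the function name `harrell_c_index` and the toy inputs are illustrative, and pairs with tied observed times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the study): four patients, the third one censored.
c_index = harrell_c_index(
    times=[5, 3, 8, 2],
    events=[1, 1, 0, 1],
    risk_scores=[0.7, 0.4, 0.1, 0.9],
)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of roughly 0.69 to 0.74 should be read.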

External Data Access And Indexing In AsterixDB

Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15 ◽

10.1145/2806416.2806428 ◽

2015 ◽

Cited By ~ 14

Author(s):

Abdullah A. Alamoudi ◽

Raman Grover ◽

Michael J. Carey ◽

Vinayak Borkar

Keyword(s):

Data Access ◽

External Data

Mitigating Study Power Loss Caused by Clinical Trial Disruptions Due to the COVID-19 Pandemic: Leveraging External Data via Propensity Score-Integrated Approaches

Statistics in Biopharmaceutical Research ◽

10.1080/19466315.2020.1860813 ◽

2020 ◽

pp. 1-9

Author(s):

Heng Li ◽

Wei-Chen Chen ◽

Nelson Lu ◽

Changhong Song ◽

Chenguang Wang ◽

...

Keyword(s):

Clinical Trial ◽

Propensity Score ◽

Power Loss ◽

External Data ◽

Study Power ◽

Integrated Approaches

Metastatic heterogeneity of the consensus molecular subtypes of colorectal cancer

npj Genomic Medicine ◽

10.1038/s41525-021-00223-7 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter W. Eide ◽

Seyed H. Moosavi ◽

Ina A. Eilertsen ◽

Tuva H. Brunsell ◽

Jonas Langerud ◽

...

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Principal Components ◽

Prognostic Value ◽

Tumor Heterogeneity ◽

Molecular Subtypes ◽

R Package ◽

Data Set ◽

Primary Tumors ◽

External Data

Abstract: Gene expression-based subtypes of colorectal cancer have clinical relevance, but the representativeness of primary tumors and the consensus molecular subtypes (CMS) for metastatic cancers is not well known. We investigated the metastatic heterogeneity of CMS. The best approach to subtype translation was delineated by comparisons of transcriptomic profiles from 317 primary tumors and 295 liver metastases, including multi-metastatic samples from 45 patients and 14 primary-metastasis sets. Associations were validated in an external data set (n = 618). Projection of metastases onto principal components of primary tumors showed that metastases were depleted of CMS1-immune/CMS3-metabolic signals, enriched for CMS4-mesenchymal/stromal signals, and heavily influenced by the microenvironment. The tailored CMS classifier (available in an updated version of the R package CMScaller) therefore implemented an approach to regress out the liver tissue background. The majority of classified metastases were either CMS2 or CMS4. Nonetheless, subtype switching and inter-metastatic CMS heterogeneity were frequent and increased with sampling intensity. Poor-prognostic value of CMS1/3 metastases was consistent in the context of intra-patient tumor heterogeneity.
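The projection step mentioned in the abstract can be illustrated generically: principal components are fitted on one sample set (here standing in for primary tumors) and a second set (standing in for metastases) is projected onto them. This is a minimal NumPy sketch, not the CMScaller implementation. The names `fit_pca` and `project` and the toy matrices are hypothetical, and rows are assumed to be samples with columns as genes.

```python
import numpy as np


def fit_pca(reference, n_components):
    """Fit principal components on the reference samples (rows = samples)."""
    center = reference.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(reference - center, full_matrices=False)
    return center, vt[:n_components]


def project(samples, center, components):
    """Project new samples onto components fitted on the reference set."""
    return (samples - center) @ components.T


# Toy expression matrices (not real data): three reference samples, one new sample.
primary = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
metastasis = np.array([[3.0, 3.0]])

center, components = fit_pca(primary, n_components=1)
scores = project(metastasis, center, components)
```

Because the new samples are centered with the reference mean, their scores are directly comparable with those of the reference set. The sign of each component is arbitrary.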
