Improving Punctuation Restoration for Speech Transcripts via External Data

Author(s):  
Xue-Yong Fu ◽  
Cheng Chen ◽  
Md Tahmid Rahman Laskar ◽  
Shashi Bhushan ◽  
Simon Corston-Oliver
Author(s):  
E. D. Avedyan ◽  
I. V. Voronkov

Summary: the article proposes a new software platform for automating the preprocessing and markup of datasets, intended for subsequent analytical tasks such as image classification and the processing of textual and parametric information with neural network technologies. The platform uses modern technologies and combines a large number of methods in a modular architecture, which can be extended as analytical data-processing tasks become more complex. The need for such a platform stems primarily from the current growth in data volumes: without such software platforms, a practical transition to deep data analytics remains unattainable, since it requires confidentiality, access to information, and the use of external data-processing resources.


2007 ◽  
Vol 2 (3) ◽  
pp. 3-27 ◽  
Author(s):  
Dominik Lambrigger ◽  
Pavel Shevchenko ◽  
Mario Wüthrich

2014 ◽  
Vol 9 (4) ◽  
pp. 83-103 ◽  
Author(s):  
Giuseppe Galloppo ◽  
Daniele Previati

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4638
Author(s):  
Bummo Koo ◽  
Jongman Kim ◽  
Yejin Nam ◽  
Youngho Kim

In this study, post-fall detection algorithms were evaluated in a cross-dataset setting with respect to feature vectors (time-series and discrete data), classifiers (ANN and SVM), and four processing conditions (normalization, equalization, an increased number of training data, and additional training with external data). Three-axis acceleration and angular velocity data were obtained from 30 healthy male subjects using an IMU attached midway between the left and right anterior superior iliac spines (ASIS). Internal and external tests were performed using our lab dataset and the public SisFall dataset, respectively. The results showed that ANN and SVM were suitable for time-series and discrete data, respectively. When untrained motions from the public dataset were tested, classification performance generally decreased, so specific feature vectors derived from the raw data were necessary. Normalization made SVM more effective and ANN less effective. Equalization increased sensitivity, although it did not improve overall performance. Increasing the amount of training data also improved classification performance. The machine learning models were vulnerable to untrained motions, and training data covering a variety of movements were needed.
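The abstract names normalization and equalization as processing conditions without defining them. The following is a minimal sketch, assuming normalization means per-feature z-scoring and equalization means randomly undersampling so that fall and non-fall classes have equal size (both are assumptions, not the authors' stated method). It also includes the sensitivity metric the abstract reports. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def zscore_normalize(rows):
    """Scale each feature column to zero mean and unit variance (assumed normalization)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant feature has zero spread; divide by 1.0 instead to avoid division by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def equalize_classes(X, y, seed=0):
    """Randomly undersample every class to the size of the smallest one (assumed equalization)."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in zip(X, y):
        by_label.setdefault(label, []).append(features)
    n_min = min(len(samples) for samples in by_label.values())
    Xb, yb = [], []
    # Iterate labels in sorted order so the output layout is deterministic.
    for label, samples in sorted(by_label.items()):
        for features in rng.sample(samples, n_min):
            Xb.append(features)
            yb.append(label)
    return Xb, yb


def sensitivity(y_true, y_pred, positive=1):
    """True-positive rate: share of actual falls that were detected (0.0 if there are none)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0
```

For example, equalizing a set with four non-fall windows (label 0) and one fall window (label 1) keeps one sample of each class. Undersampling was chosen here for simplicity. Oversampling the minority class is an equally plausible reading of "equalization".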


BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yifan Feng ◽  
Ye Wang ◽  
Yangqin Xie ◽  
Shuwei Wu ◽  
Yuyang Li ◽  
...  

Abstract
Background: This study aimed to identify the factors that affect overall survival (OS) and cancer-specific survival (CSS) in patients with stage IIIC1 cervical cancer and to establish nomogram models to predict their prognosis.
Methods: Data from patients in the Surveillance, Epidemiology, and End Results (SEER) programme who met the inclusion criteria were assigned to a training group, and validation data were obtained from the First Affiliated Hospital of Anhui Medical University from 2010 to 2019. The incidence, Kaplan-Meier curves, OS and CSS of patients with stage IIIC1 cervical cancer in the training group were evaluated. Nomograms were established according to the results of univariate and multivariate Cox regression models. Harrell’s C-index, calibration plots, receiver operating characteristic (ROC) curves and decision-curve analysis (DCA) were used to validate the prediction models.
Results: The incidence of pelvic lymph node metastasis, a high-risk factor for the prognosis of cervical cancer, decreased slightly over time. Eight independent prognostic variables were identified for OS, namely age, race, marital status, histology, extension range, tumour size, radiotherapy and surgery. Only seven were identified for CSS, because marital status was excluded. Nomograms for OS and CSS were established based on these results. The C-indexes of the OS and CSS nomograms were 0.687 and 0.692, respectively, using random sampling of the SEER data sets, and 0.701 and 0.735, respectively, using random sampling of the external data sets. The AUCs of the OS nomogram were 0.708 and 0.705 for the SEER data sets and 0.750 and 0.750 for the external data sets. In addition, the CSS nomogram achieved AUCs of 0.707 and 0.709 when validated on the SEER data sets and 0.788 and 0.785 when validated on the external data sets. Calibration plots for the nomograms closely matched the actual observations. DCA also indicated the value of the two models.
Conclusions: Eight independent prognostic variables were identified for OS. The same factors, except marital status, predicted CSS. Both the OS and CSS nomograms showed good predictive and clinical application value after validation. Notably, tumour size made the largest contribution to both the OS and CSS nomograms.
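The C-indexes reported above measure how well a nomogram ranks patients by risk. The following is a minimal sketch of Harrell's C-index for right-censored survival data. It is a generic illustration, not the authors' implementation, and `harrell_c_index` is a hypothetical name. For simplicity it skips pairs with tied follow-up times.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when the subject with the shorter follow-up
    time experienced the event (events[k] is truthy). It is concordant when
    that subject also has the higher predicted risk. Tied risk scores count
    as half-concordant. Pairs with identical follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time was censored: we cannot tell who failed first
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, take follow-up times [2, 4, 6, 8], event indicators [1, 1, 0, 1] and risk scores [0.9, 0.3, 0.4, 0.2]. Five pairs are comparable and four of them are concordant, which gives C = 0.8. A value of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect ranking.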


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Peter W. Eide ◽  
Seyed H. Moosavi ◽  
Ina A. Eilertsen ◽  
Tuva H. Brunsell ◽  
Jonas Langerud ◽  
...  

Abstract
Gene expression-based subtypes of colorectal cancer have clinical relevance, but how well primary tumors and the consensus molecular subtypes (CMS) represent metastatic cancers is not well known. We investigated the metastatic heterogeneity of CMS. The best approach to subtype translation was determined by comparing transcriptomic profiles from 317 primary tumors and 295 liver metastases, including multi-metastatic samples from 45 patients and 14 primary-metastasis sets. Associations were validated in an external data set (n = 618). Projecting the metastases onto the principal components of the primary tumors showed that metastases were depleted of CMS1-immune/CMS3-metabolic signals, enriched for CMS4-mesenchymal/stromal signals, and heavily influenced by the microenvironment. The tailored CMS classifier (available in an updated version of the R package CMScaller) therefore implemented an approach to regress out the liver tissue background. The majority of classified metastases were either CMS2 or CMS4. Nonetheless, subtype switching and inter-metastatic CMS heterogeneity were frequent and increased with sampling intensity. The poor prognostic value of CMS1/3 metastases was consistent even in the context of intra-patient tumor heterogeneity.
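The abstract states that the tailored classifier regresses out the liver tissue background but does not describe the procedure. One generic way to do this is to fit each gene's expression against a per-sample background score by ordinary least squares and keep the residuals. The sketch below follows that assumption and is not CMScaller's actual implementation. The function name, gene names and background score are hypothetical.

```python
from statistics import mean


def regress_out(expression, background):
    """Remove the linear effect of a background score from each gene.

    expression: dict mapping gene name -> list of values across samples.
    background: list of per-sample background scores (e.g. a hypothetical
        liver-tissue signature score), aligned with the expression values.
    Returns a dict mapping gene name -> residuals of the least-squares fit
    expression ~ intercept + slope * background.
    """
    xbar = mean(background)
    sxx = sum((x - xbar) ** 2 for x in background)
    if sxx == 0:
        raise ValueError("background score is constant; nothing to regress out")
    residuals = {}
    for gene, values in expression.items():
        ybar = mean(values)
        # Closed-form OLS slope and intercept for a single predictor.
        slope = sum((x - xbar) * (y - ybar) for x, y in zip(background, values)) / sxx
        intercept = ybar - slope * xbar
        residuals[gene] = [y - (intercept + slope * x) for x, y in zip(background, values)]
    return residuals
```

By construction, each gene's residuals are uncorrelated with the background score. Signals that vary independently of the liver background are therefore retained. A multivariate version would regress on several background scores at once.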

