evaluation metric
Recently Published Documents


TOTAL DOCUMENTS

304
(FIVE YEARS 145)

H-INDEX

17
(FIVE YEARS 5)

2022 ◽  
Author(s):  
Polianna Delfino-Pereira ◽  
Cláudio Moisés Valiense De Andrade ◽  
Virginia Mara Reis Gomes ◽  
Maria Clara Pontello Barbosa Lima ◽  
Maira Viana Rego Souza-Silva ◽  
...  

Abstract The majority of prognostic scores proposed for early assessment of coronavirus disease 2019 (COVID-19) patients are limited by methodological flaws. Our group recently developed a new risk score, ABC2SPH, using traditional statistical methods (least absolute shrinkage and selection operator logistic regression, LASSO). In this article, we provide a thorough comparative study between modern machine learning (ML) methods and state-of-the-art statistical methods, represented by ABC2SPH, in the task of predicting in-hospital mortality in COVID-19 patients from data available upon hospital admission. We overcome methodological and technological issues found in previous similar studies while exploring a large sample (5,032 patients). Additionally, we take advantage of a large and diverse set of methods and investigate the effectiveness of applying meta-learning, more specifically Stacking, in order to combine the methods' strengths and overcome their limitations. In our experiments, our Stacking solutions improved over the previous state of the art by more than 26% in predicting death, achieving an AUROC of 87.1% and a MacroF1 of 73.9%. We also investigated issues related to the interpretability and reliability of the predictions produced by the most effective ML methods. Finally, we discuss the adequacy of AUROC as an evaluation metric for the highly imbalanced and skewed datasets commonly found in health-related problems.
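The abstract does not give implementation details, but the core idea, a stacked ensemble whose meta-learner combines heterogeneous base classifiers, can be sketched as follows. The base learners, hyperparameters, and synthetic data below are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch of a stacking ensemble for mortality prediction.
# Base learners, hyperparameters, and the synthetic data are
# placeholders, not the authors' actual configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for admission features and in-hospital death labels.
X, y = make_classification(n_samples=5032, n_features=20,
                           weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        # LASSO-style penalized logistic regression, as in ABC2SPH.
        ("lasso", LogisticRegression(penalty="l1", solver="liblinear")),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base outputs
    cv=5,
)
stack.fit(X_tr, y_tr)
proba = stack.predict_proba(X_te)[:, 1]
print("AUROC  :", roc_auc_score(y_te, proba))
print("MacroF1:", f1_score(y_te, stack.predict(X_te), average="macro"))
```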


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuo Zhang ◽  
Jing Wang ◽  
Lulu Pei ◽  
Kai Liu ◽  
Yuan Gao ◽  
...  

Abstract Background TOAST subtype classification is important for the diagnosis and research of ischemic stroke. Constrained by the experience of neurologists and the time-consuming nature of manual adjudication, completing TOAST classification effectively is a major challenge. We propose a novel active deep learning architecture to classify TOAST. Methods To simulate the diagnostic process of neurologists, we drop uninformative features using the XGB algorithm and rank the remaining ones. Within an active learning framework, we propose a novel causal CNN that is combined with a mixed active selection criterion to adaptively optimize the uncertainty of samples. Meanwhile, a KL-focal loss, derived by enhancing the focal loss with KL regularization, is introduced to accelerate the iterative fine-tuning of the model. Results To evaluate the proposed method, we constructed a dataset of 2,310 patients. In a series of sequential experiments, we verified the effectiveness of each contribution using different evaluation metrics. Experimental results show that the proposed method achieves competitive results on each evaluation metric; the improvement in AUC is the most pronounced, reaching 77.4. Conclusions We construct a backbone causal CNN to simulate the diagnostic process of neurologists, which enhances internal interpretability. The research on clinical data also indicates the potential application value of this model in stroke medicine. In future work, we will consider more data types and more comprehensive patient cohorts to achieve fully automated subtype classification.
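The feature-screening step described above (dropping uninformative features with XGB and ranking the rest) might look roughly like this sketch; the data, model settings, and zero-importance cutoff are illustrative assumptions rather than the paper's exact procedure.

```python
# Hypothetical sketch of the XGB feature-screening step: rank features
# by importance and drop those contributing nothing. Data, settings,
# and the zero-importance cutoff are illustrative assumptions.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# Stand-in for the clinical feature table (5 classes mirroring the
# five TOAST subtypes; not the real cohort).
X, y = make_classification(n_samples=2310, n_features=30,
                           n_informative=10, n_classes=5, random_state=0)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X, y)

importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]          # most informative first
kept = ranking[importances[ranking] > 0]         # drop zero-importance features
X_reduced = X[:, kept]
print("kept", len(kept), "of", X.shape[1], "features")
```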


Author(s):  
Xinrong Zhang ◽  
Yanghao Li ◽  
Yuxing Han ◽  
Jiangtao Wen

Video editing is a demanding job: it requires skilled artists or workers equipped with plentiful physical stamina and multidisciplinary knowledge, such as cinematography and aesthetics. Thus, more and more research focuses on proposing semi-automatic and even fully automatic solutions to reduce the workload. Since conventional methods are usually designed to follow a few simple guidelines, they lack the flexibility and capability to learn complex ones. Fortunately, advances in computer vision and machine learning compensate for the shortcomings of traditional approaches and make AI editing feasible. No survey has yet consolidated this emerging research. This paper summarizes the development history of automatic video editing, and especially the applications of AI in partial and full workflows. We emphasize video editing and discuss related work from multiple aspects: modality, type of input videos, methodology, optimization, dataset, and evaluation metric. Besides, we also summarize progress in the image editing domain, i.e., style transfer, retargeting, and colorization, and explore the possibility of transferring those techniques to the video domain. Finally, we give a brief conclusion of this survey and discuss some open problems.


Author(s):  
Ahrii Kim ◽  
Jinhyun Kim

SacreBLEU, by incorporating a text normalization step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot produce a meaningful result without the help of customized pre-tokenization. In this regard, this paper examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the aforementioned metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates considerably with the token type. The reliability of the metric even deteriorates under some tokenizations, and MeCab is no exception. Providing guidance on the proper tokenizer usage for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
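A minimal sketch of the kind of comparison the paper performs, scoring the same Korean output under different SacreBLEU tokenization settings; the sentences are toy placeholders and the tokenizer list is not exhaustive.

```python
# Sketch of scoring identical Korean output under different SacreBLEU
# tokenization settings; the sentences are toy placeholders and the
# tokenizer list is not exhaustive ("ko-mecab" also exists in recent
# sacrebleu releases).
from sacrebleu.metrics import BLEU

hyps = ["오늘 날씨가 매우 좋다"]
refs = [["오늘은 날씨가 매우 좋다"]]   # one reference stream

for tok in ("13a", "char"):
    bleu = BLEU(tokenize=tok)
    print(f"{tok:>4}: {bleu.corpus_score(hyps, refs).score:.2f}")
```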


2022 ◽  
pp. 206-218
Author(s):  
Bhawna Dhupia ◽  
M. Usha Rani

Power demand forecasting is a field gaining popularity among researchers. Although machine learning models are used for prediction in various fields, they need to be upgraded to increase accuracy and stability. With the rapid development of AI technology, deep learning (DL) is being recommended by many authors in their studies. The core objective of the chapter is to employ smart-meter data for energy forecasting in the industrial sector. In this chapter, the authors implement popular power demand forecasting models from machine learning and compare the results of the best-fitted machine learning (ML) model with a deep learning model: long short-term memory based on an RNN (LSTM-RNN). The plain RNN model suffers from the vanishing gradient issue, which slows down training in the early layers of the network; LSTM-RNN is an advanced model that takes care of the vanishing gradient problem. The performance evaluation metrics used to compare the models are R2, mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE).
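A minimal Keras sketch of the kind of LSTM forecaster the chapter compares against classical ML models; the window length, layer sizes, and random stand-in data are illustrative assumptions.

```python
# Hypothetical sketch of an LSTM-RNN demand forecaster; window length,
# layer sizes, and the random stand-in data are illustrative assumptions.
import numpy as np
from tensorflow import keras

window, n_features = 24, 1                       # 24 past readings -> next value
X = np.random.rand(1000, window, n_features)     # stand-in smart-meter windows
y = np.random.rand(1000)                         # stand-in next-step demand

model = keras.Sequential([
    # LSTM gating mitigates the vanishing gradient problem of plain RNNs.
    keras.layers.LSTM(64, input_shape=(window, n_features)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```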


2021 ◽  
Author(s):  
Seyed Shayan Sajjadinia ◽  
Bruno Carpentieri ◽  
Gerhard A. Holzapfel

Numerical simulation is widely used to study physical systems, although it can be computationally too expensive. To counter this limitation, a surrogate may be used: a high-performance model that replaces the main numerical model by using, e.g., a machine learning (ML) regressor trained on a previously generated subset of possible inputs and outputs of the numerical model. In this context, inspired by the definition of the mean squared error (MSE) metric, we introduce the pointwise MSE (PMSE) metric, which can give better insight into the performance of such ML models over the test set by focusing on every point that forms the physical system. To show the merits of the metric, we create a dataset for a physics problem that is used to train an ML surrogate, which is then evaluated with the metrics. In our experiment, the PMSE contour demonstrates how the model learns the physics in different model regions; in particular, the correlation between the characteristics of the numerical model and the learning progress can be observed. We therefore conclude that this simple and efficient metric can provide complementary and potentially interpretable information regarding the performance and functionality of the surrogate.
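Based on the description above, the PMSE can be read as an MSE computed separately at every point of the discretized system; a minimal NumPy sketch with assumed array shapes follows.

```python
# Sketch of the PMSE idea: keep one MSE per node of the discretized
# system instead of a single scalar, so errors can be mapped back onto
# the geometry. Array shapes are illustrative assumptions.
import numpy as np

pred = np.random.rand(200, 1500)   # (n_test_samples, n_nodes) surrogate outputs
true = np.random.rand(200, 1500)   # (n_test_samples, n_nodes) numerical-model outputs

pmse = ((pred - true) ** 2).mean(axis=0)   # one value per node -> PMSE contour
mse = pmse.mean()                          # averaging recovers the ordinary MSE
print(pmse.shape, mse)
```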


2021 ◽  
Author(s):  
Hannes Westermann ◽  
Jaromír Šavelka ◽  
Vern R. Walker ◽  
Kevin D. Ashley ◽  
Karim Benyekhlef

Machine learning research typically starts with a fixed data set created early in the process. The focus of the experiments is on finding a model and training procedure that result in the best possible performance in terms of some selected evaluation metric. This paper explores how changes in a data set influence the measured performance of a model. Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance of a trained deep learning classifier. Our experiments suggest that analyzing how data set properties affect performance can be an important step in improving the results of trained classifiers, and it leads to a better understanding of the obtained results.
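One of the experiment types described, varying the amount of training data while holding the test set fixed, can be sketched as follows; the classifier and synthetic data stand in for the paper's legal-domain deep learning setups.

```python
# Sketch of one experiment type: vary the training-set size with a
# fixed test set and track the evaluation metric. The classifier and
# synthetic data stand in for the legal-domain deep learning setups.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for frac in (0.1, 0.25, 0.5, 1.0):
    n = int(len(X_tr) * frac)
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    score = f1_score(y_te, clf.predict(X_te), average="macro")
    print(f"{frac:>5.0%} of training data -> macro F1 = {score:.3f}")
```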


Author(s):  
Geng Li ◽  
Huiling Liu ◽  
Gaojian Huang ◽  
Xingwang Li ◽  
Bichu Raj ◽  
...  

Abstract The future sixth generation (6G) networks will face the significant challenges of massive connectivity and green communication. Recently, reconfigurable intelligent surfaces (RIS) and non-orthogonal multiple access (NOMA) have been proposed as two key technologies to solve these problems. Motivated by this fact, we consider a downlink RIS-aided NOMA system, where the source aims to communicate with two NOMA users via the RIS. Considering future networks supporting real-time services, we investigate the system performance in terms of the effective capacity (EC), which is an important evaluation metric for delay-sensitive systems. Specifically, we derive analytical expressions for the EC of the near and far users. To obtain more useful insights, we deduce analytical approximations of the EC in the low signal-to-noise-ratio regime by utilizing a Taylor expansion. Moreover, we provide results for orthogonal multiple access (OMA) for the purpose of comparison. It is found that (1) the number of RIS elements and the transmission power of the source have important effects on the performance of the considered system; and (2) compared with OMA, the NOMA system has higher EC due to the shorter transmission time.
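For context, the effective capacity of a link with instantaneous rate R is commonly defined as follows (the standard Wu-Negi formulation; the paper's exact notation may differ):

```latex
% Standard definition of effective capacity (after Wu & Negi); the
% paper's exact notation may differ. Here \theta is the QoS (delay)
% exponent, T the block length, B the bandwidth, and R the
% instantaneous service rate.
E_C(\theta) = -\frac{1}{\theta T B}\,
              \ln \mathbb{E}\!\left[ e^{-\theta T B R} \right]
```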


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7923
Author(s):  
Dae-Yeol Kim ◽  
Kwangkee Lee ◽  
Chae-Bong Sohn

In general, facial image-based remote photoplethysmography (rPPG) methods use color-based and patch-based region-of-interest (ROI) selection methods to estimate the blood volume pulse (BVP) and beats per minute (BPM). Anatomically, the thickness of the skin is not uniform across all areas of the face, so the same diffuse reflection information cannot be obtained in each area. In recent years, various studies have presented experimental results for their ROIs but did not provide a valid rationale for the proposed regions. In this paper, to assess the effect of skin thickness on the accuracy of rPPG algorithms, we conducted an experiment on 39 anatomically divided facial regions. Experiments were performed with seven algorithms (CHROM, GREEN, ICA, PBV, POS, SSR, and LGI) using the UBFC-rPPG and LGI-PPGI datasets, considering 29 selected regions and two adjusted regions out of the 39 anatomically classified regions. We propose a BVP similarity evaluation metric to find regions with high accuracy. We conducted additional experiments on the TOP-5 and BOT-5 regions and present the validity of the proposed ROIs. The TOP-5 regions showed relatively high accuracy compared to the ROIs of previous algorithms, suggesting that the anatomical characteristics of the ROI should be considered when developing a facial image-based rPPG algorithm.
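The abstract does not specify the proposed BVP similarity metric; as a stand-in, the sketch below scores an estimated BVP trace against a reference using Pearson correlation, a common similarity measure in rPPG evaluation.

```python
# The abstract does not define the proposed BVP similarity metric;
# this stand-in scores an estimated BVP trace against a reference with
# Pearson correlation, a common rPPG similarity measure.
import numpy as np

def bvp_similarity(est: np.ndarray, ref: np.ndarray) -> float:
    """Pearson correlation between two equally sampled BVP traces."""
    est = (est - est.mean()) / est.std()
    ref = (ref - ref.mean()) / ref.std()
    return float(np.mean(est * ref))

t = np.linspace(0, 10, 300)                  # 10 s at 30 fps
ref = np.sin(2 * np.pi * 1.2 * t)            # ~72 BPM reference pulse
est = ref + 0.3 * np.random.randn(t.size)    # noisy rPPG estimate
print(f"similarity = {bvp_similarity(est, ref):.3f}")
```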


Energies ◽  
2021 ◽  
Vol 14 (22) ◽  
pp. 7794
Author(s):  
Gergo Barta ◽  
Benedek Pasztor ◽  
Venkat Prava

The goal of this paper is to optimally combine day-ahead solar and demand forecasts to schedule the battery of a hybrid solar-and-battery farm connected to a distribution station. The objective is to achieve the maximum daily peak load reduction and to charge the battery with maximum solar photovoltaic energy. The innovative part of the paper lies in the treatment of the errors in the solar and demand forecasts used to optimize the battery scheduler. To test the effectiveness of the proposed methodology, it was applied in the Presumed Open Data 2021 data science challenge. With the provided historical Numerical Weather Prediction (NWP) data, solar power plant generation, and distribution-level demand data, the proposed methodology was tested across four different seasons. The evaluation metric used is the peak reduction score (defined in the paper), and our approach improved this KPI from 82.84 to 89.83. The solution developed achieved 5th place (out of 55 teams) in the challenge.
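The challenge's exact peak reduction score is defined in the paper and not reproduced here; the sketch below computes a simplified proxy, the percentage reduction of the daily demand peak achieved by a toy battery discharge schedule.

```python
# Simplified proxy for a peak reduction score (not the challenge's
# exact definition): percentage reduction of the daily demand peak
# achieved by a toy battery discharge schedule.
import numpy as np

t = np.arange(48)                                       # half-hourly slots
demand = 3 + 2 * np.exp(-((t - 34) / 4) ** 2)           # synthetic evening peak (MW)
discharge = np.where((t >= 30) & (t <= 38), 0.5, 0.0)   # toy 0.5 MW discharge
net = demand - discharge

reduction = 100 * (demand.max() - net.max()) / demand.max()
print(f"daily peak reduced by {reduction:.1f}%")
```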

