Study on Unbalanced Binary Classification with Unknown Misclassification Costs

Author(s):  
J. Gao ◽  
L. Gong ◽  
J. Y. Wang ◽  
Z. C. Mo
2021 ◽  
Author(s):  
Philipp Sterner ◽  
David Goretzko ◽  
Florian Pargent

Psychology has seen an increase in machine learning (ML) methods. In many applications, observations are classified into one of two groups (binary classification). Off-the-shelf classification algorithms assume that the costs of a misclassification (false-positive or false-negative) are equal. Because this is often not reasonable (e.g., in clinical psychology), cost-sensitive learning (CSL) methods can take different cost ratios into account. We present the mathematical foundations and introduce a taxonomy of the most commonly used CSL methods, before demonstrating their application and usefulness on psychological data, i.e., the drug consumption dataset ($N = 1885$) from the UCI Machine Learning Repository. In our example, all demonstrated CSL methods noticeably reduce mean misclassification costs compared to regular ML algorithms. We discuss the necessity for researchers to perform small benchmarks of CSL methods for their own practical application. Thus, our open materials provide R code, demonstrating how CSL methods can be applied within the mlr3 framework (https://osf.io/cvks7/).
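The core idea behind cost-sensitive classification can be illustrated with a small sketch. It is not the authors' R/mlr3 code; it assumes calibrated predicted probabilities, zero cost for correct decisions, and illustrative cost values (`cost_fp`, `cost_fn`) that do not come from the abstract. Under those assumptions, predicting "positive" minimizes expected cost whenever the predicted probability is at least `cost_fp / (cost_fp + cost_fn)`.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Predict positive when (1 - p) * cost_fp <= p * cost_fn,
    i.e. when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(probs, threshold):
    """Turn predicted probabilities into 0/1 labels at a given threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def mean_cost(y_true, y_pred, cost_fp, cost_fn):
    """Average misclassification cost over all observations."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 0 and pred == 1:
            total += cost_fp  # false positive
        elif truth == 1 and pred == 0:
            total += cost_fn  # false negative
    return total / len(y_true)


# Hypothetical example: a false negative is four times as costly as a false positive.
cost_fp, cost_fn = 1.0, 4.0
y_true = [0, 1, 1, 0]
probs = [0.1, 0.3, 0.6, 0.25]

default_pred = classify(probs, 0.5)
csl_pred = classify(probs, cost_threshold(cost_fp, cost_fn))

default_cost = mean_cost(y_true, default_pred, cost_fp, cost_fn)
csl_cost = mean_cost(y_true, csl_pred, cost_fp, cost_fn)
```

In this toy case the cost-sensitive threshold trades one expensive false negative for one cheap false positive, which lowers the mean cost; whether that holds on real data depends on the cost ratio and on how well calibrated the probabilities are.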


2004 ◽  
Author(s):  
Lyle E. Bourne ◽  
Alice F. Healy ◽  
James A. Kole ◽  
William D. Raymond

Author(s):  
P.L. Nikolaev

This article deals with a method for binary classification of images containing small text. Classification is based on the fact that the text can have two orientations: it can be positioned horizontally and read from left to right, or it can be rotated 180 degrees, in which case the image must be rotated before the text can be read. This type of text can be found on the covers of a variety of books, so when recognizing covers it is necessary to determine the orientation of the text before recognizing the text itself. The article proposes a deep neural network for determining text orientation in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data are presented, along with examples of the network operating on real data.


Author(s):  
Валентина Викторовна Дмитриева ◽  
Николай Николаевич Тупицын ◽  
Евгений Валерьевич Поляков ◽  
Софья Сергеевна Денисюк

The application of digital image processing methods and tools to the recognition of blood and bone marrow cell types, with the aim of improving the diagnosis of acute leukemia, is a pressing scientific and technical task in line with the strategy for developing artificial intelligence technologies in medicine. The paper proposes an approach to the multiclass classification of bone marrow cells in the diagnosis of acute leukemia and minimal residual disease. For the experiments, a sample of 3284 cell images was assembled, provided by the Hematopoiesis Laboratory of the N.N. Blokhin National Medical Research Center of Oncology. The proposed approach is based on a binary classification model for each of the studied classes relative to the others. Binary classification is performed with a support vector machine. The multiclass method was implemented in Python 3.6.9. The program takes as input *.csv files containing tables of morphological, color, and texture features for each cell in the sample. The sample contains nine types of bone marrow cells. The program outputs classification accuracy values on the test sample, which reflect how often the predicted cell class matches the actual (verified) cell class. In the experiment, the average multiclass classification accuracy over the cell types considered was 87% on the test set and 88% on the training set. This study is preliminary. In the future, it is planned to increase the number of cell classes and the sample sizes for the various cell types, and to refine the multiclass classification results.
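The decomposition described above, one binary model per class against all others, can be sketched as follows. This is an illustration only, not the authors' program: a simple perceptron stands in for the support vector machine, and the two-dimensional toy points and all function names are hypothetical.

```python
def train_perceptron(points, targets, max_epochs=1000):
    """Fit a linear scorer w.x + b for targets in {-1, +1}.

    Stand-in for the SVM used in the paper; stops early once
    every training point is on the correct side.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(points, targets):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified or on the boundary
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def fit_one_vs_rest(points, labels):
    """Train one binary model per class: that class (+1) vs. all others (-1)."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if y == cls else -1 for y in labels]
        models[cls] = train_perceptron(points, targets)
    return models


def predict_one_vs_rest(models, x):
    """Return the class whose binary model gives the highest score."""
    def score(model):
        w, b = model
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=lambda cls: score(models[cls]))


# Hypothetical toy data: three well-separated clusters in two dimensions.
points = [(0, 0), (1, 0), (0, 1),
          (10, 0), (9, 0), (10, 1),
          (0, 10), (0, 9), (1, 10)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = fit_one_vs_rest(points, labels)
train_predictions = [predict_one_vs_rest(models, p) for p in points]
```

With nine cell types, the same scheme would train nine binary models; accuracy is then the share of cells whose predicted class matches the verified class.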


2020 ◽  
Vol 14 ◽  
Author(s):  
Lahari Tipirneni ◽  
Rizwan Patan

Millions of deaths all over the world are caused by breast cancer every year. It has become the most common type of cancer in women. Early detection supports a better prognosis and increases the chance of survival. Automating the classification using Computer-Aided Diagnosis (CAD) systems can make the diagnosis less prone to errors. Multi-class and binary classification of breast cancer are challenging problems. Convolutional neural network architectures extract specific feature descriptors from images, which cannot represent different types of breast cancer. This leads to false positives in classification, which is undesirable in disease diagnosis. The current paper presents an ensemble convolutional neural network for multi-class and binary classification of breast cancer. The feature descriptors from each network are combined to produce the final classification. In this paper, histopathological images are taken from the publicly available BreakHis dataset and classified into 8 classes. The proposed ensemble model can perform better than the methods proposed in the literature. The results showed that the proposed model could be a viable approach for breast cancer classification.


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 52
Author(s):  
Tianyi Zhang ◽  
Abdallah El Ali ◽  
Chen Wang ◽  
Alan Hanjalic ◽  
Pablo Cesar

Recognizing user emotions while they watch short-form videos anytime and anywhere is essential for facilitating video content customization and personalization. However, most works either classify a single emotion per video stimulus, or are restricted to static, desktop environments. To address this, we propose a correlation-based emotion recognition algorithm (CorrNet) to recognize the valence and arousal (V-A) of each instance (fine-grained segment of signals) using only wearable, physiological signals (e.g., electrodermal activity, heart rate). CorrNet takes advantage of features both inside each instance (intra-modality features) and between different instances for the same video stimulus (correlation-based features). We first test our approach on an indoor-desktop affect dataset (CASE), and thereafter on an outdoor-mobile affect dataset (MERCA) which we collected using a smart wristband and wearable eyetracker. Results show that for subject-independent binary classification (high-low), CorrNet yields promising recognition accuracies: 76.37% and 74.03% for V-A on CASE, and 70.29% and 68.15% for V-A on MERCA. Our findings show that: (1) instance segment lengths between 1–4 s result in the highest recognition accuracies; (2) accuracies between laboratory-grade and wearable sensors are comparable, even under low sampling rates (≤64 Hz); and (3) large amounts of neutral V-A labels, an artifact of continuous affect annotation, result in varied recognition performance.
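The notion of an "instance" as a short, fixed-length segment of a physiological signal can be illustrated with a minimal sketch. It is not CorrNet's preprocessing: it assumes non-overlapping windows, drops any incomplete trailing window, and uses a hypothetical function name and toy sampling rate.

```python
def segment_signal(samples, sampling_rate_hz, instance_seconds):
    """Split a 1-D signal into non-overlapping instances of fixed duration.

    Assumption: an incomplete window at the end of the signal is dropped.
    """
    window = int(round(sampling_rate_hz * instance_seconds))
    if window <= 0:
        raise ValueError("instance length must cover at least one sample")
    n_full = len(samples) // window
    return [samples[i * window:(i + 1) * window] for i in range(n_full)]


# Hypothetical signal: 10 samples recorded at 4 Hz (2.5 s of data).
signal = list(range(10))
one_second_instances = segment_signal(signal, 4, 1)
two_second_instances = segment_signal(signal, 4, 2)
```

Each resulting instance would then receive its own valence and arousal label, which is what allows more than one emotion to be recognized per video.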


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2133
Author(s):  
Francisco O. Cortés-Ibañez ◽  
Sunil Belur Nagaraj ◽  
Ludo Cornelissen ◽  
Gerjan J. Navis ◽  
Bert van der Vegt ◽  
...  

Cancer incidence is rising, and accurate prediction of incident cancers could be relevant to understanding and reducing cancer incidence. The aim of this study was to develop machine learning (ML) models that could predict an incident diagnosis of cancer. Participants without any history of cancer within the Lifelines population-based cohort were followed for a median of 7 years. Data were available for 116,188 cancer-free participants and 4232 incident cancer cases. At baseline, socioeconomic, lifestyle, and clinical variables were assessed. The main outcome was an incident cancer during follow-up (excluding skin cancer), based on linkage with the national pathology registry. The performance of three ML algorithms was evaluated using supervised binary classification to identify incident cancers among participants. Elastic net regularization and the Gini index were used for variable selection. An overall area under the receiver operating characteristic curve (AUC) <0.75 was obtained. The highest AUC values were for prostate cancer: random forest AUC = 0.82 (95% CI 0.77–0.87), logistic regression AUC = 0.81 (95% CI 0.76–0.86), and support vector machines AUC = 0.83 (95% CI 0.78–0.88). Age was the most important predictor in these models. Linear and non-linear ML algorithms including socioeconomic, lifestyle, and clinical variables produced a moderate predictive performance of incident cancers in the Lifelines cohort.
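For readers unfamiliar with the AUC values reported above, the following sketch shows one standard way to compute it: as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half. The labels and scores are made-up toy values, not data from the study, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two non-cases (0) and two cases (1) with model scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values around 0.75 to 0.83 are described as moderate performance.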

