Binary classification of Lupus scientific articles applying deep ensemble model on text data

Author(s):  
Maryam Samami ◽  
Elham Mousazade soure
Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2502
Author(s):  
Natalia Vanetik ◽  
Marina Litvak

Definitions are extremely important for efficient learning of new materials. In particular, mathematical definitions are necessary for understanding mathematics-related areas. Automated extraction of definitions could be very useful for automated indexing of educational materials, building taxonomies of relevant concepts, and more. For definitions that are contained within a single sentence, this problem can be viewed as a binary classification of sentences into definitions and non-definitions. In this paper, we focus on automatic detection of one-sentence definitions in mathematical and general texts. We experiment with different classification models arranged in an ensemble and applied to a sentence representation containing syntactic and semantic information. Our ensemble model is applied to data adjusted with oversampling. Our experiments demonstrate the superiority of our approach over state-of-the-art methods in both general and mathematical domains.
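The abstract does not say which oversampling scheme or ensemble rule the authors used. As a minimal sketch only, assuming plain random oversampling and a majority vote over the base classifiers' labels (both assumptions, with hypothetical function names), the two steps might look like this:

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (assumed scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def majority_vote(predictions):
    """Return the label chosen by most base classifiers
    (ties go to the label that appears first)."""
    return Counter(predictions).most_common(1)[0][0]


# Illustrative data: one definition sentence, three non-definitions.
sentences = ["A group is a set with ...", "See Figure 2.", "We thank ...", "Proof omitted."]
is_definition = [1, 0, 0, 0]
balanced_x, balanced_y = random_oversample(sentences, is_definition)
```

After oversampling, both classes have the same number of sentences, so a classifier trained on `balanced_x` is not dominated by the non-definition class.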


Author(s):  
P.L. Nikolaev

This article deals with a method for binary classification of images containing small text. The classification is based on the fact that the text can have two orientations: it can be positioned horizontally and read from left to right, or it can be rotated 180 degrees, in which case the image must be rotated to read the text. This type of text can be found on the covers of a variety of books, so when recognizing covers it is necessary to determine the orientation of the text before recognizing the text itself. The article proposes a deep neural network for determining the text orientation in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data are presented, along with examples of the network operating on real data.


2020 ◽  
Vol 14 ◽  
Author(s):  
Lahari Tipirneni ◽  
Rizwan Patan

Abstract: Millions of deaths all over the world are caused by breast cancer every year. It has become the most common type of cancer in women. Early detection helps achieve a better prognosis and increases the chance of survival. Automating the classification using Computer-Aided Diagnosis (CAD) systems can make the diagnosis less prone to errors. Multi-class and binary classification of breast cancer is a challenging problem. A single convolutional neural network architecture extracts specific feature descriptors from images, which cannot represent different types of breast cancer. This leads to false positives in classification, which is undesirable in disease diagnosis. The current paper presents an ensemble convolutional neural network for multi-class and binary classification of breast cancer. The feature descriptors from each network are combined to produce the final classification. In this paper, histopathological images are taken from the publicly available BreakHis dataset and classified into 8 classes. The proposed ensemble model can perform better than the methods proposed in the literature. The results showed that the proposed model could be a viable approach for breast cancer classification.


2021 ◽  
Vol 13 (9) ◽  
pp. 1623
Author(s):  
João E. Batista ◽  
Ana I. R. Cabral ◽  
Maria J. P. Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
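For reference, the three spectral indices named above are all normalized differences of two bands. The sketch below uses the commonly cited band pairings; the abstract does not state which NDWI formulation was used, so the green/NIR version here is an assumption (a NIR/SWIR variant also exists), and the reflectance values are made up for illustration.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR formulation, assumed)."""
    return normalized_difference(green, nir)


def nbr(nir, swir):
    """Normalized Burn Ratio."""
    return normalized_difference(nir, swir)


# Illustrative per-pixel reflectances, not taken from the paper.
pixel = {"green": 0.08, "red": 0.10, "nir": 0.50, "swir": 0.20}
features = [
    ndvi(pixel["nir"], pixel["red"]),
    ndwi(pixel["green"], pixel["nir"]),
    nbr(pixel["nir"], pixel["swir"]),
]
```

Each index is a fixed, hand-designed combination of two bands, whereas the hyperfeatures described in the abstract are evolved combinations of the available bands.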


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract. Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features become more distinguishable. Soft voting is applied to obtain the final classification of the ensemble model. The dataset is from the standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results: Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that our model achieved a significant improvement. Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and focal loss reduced the impact of data imbalance on model performance.
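Two of the components named in the Methods can be illustrated in a few lines. This is a sketch under stated assumptions, not the authors' implementation: it uses the standard focal loss form, FL = -alpha * (1 - p_t)^gamma * log(p_t), with illustrative default values for `gamma` and `alpha`, and unweighted soft voting that averages the class probabilities of the base models.

```python
import math


def focal_loss(probs, true_index, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given predicted class probabilities.
    gamma and alpha defaults are illustrative, not taken from the paper."""
    p_t = max(probs[true_index], 1e-12)  # guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists):
    """Average per-class probabilities over models and return
    (index of the winning class, averaged probabilities)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Three hypothetical base models scoring one criterion over three categories.
model_outputs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.30, 0.10],
]
label, averaged = soft_vote(model_outputs)
```

With `gamma` set to 0 the focal loss reduces to ordinary cross-entropy; larger values shrink the loss of well-classified samples, which is why it is used against class imbalance.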


2021 ◽  
Vol 11 (9) ◽  
pp. 3836
Author(s):  
Valeri Gitis ◽  
Alexander Derendyaev ◽  
Konstantin Petrov ◽  
Eugene Yurkov ◽  
Sergey Pirogov ◽  
...  

Prostate cancer (PCa) is the second most frequent malignancy (after lung cancer). Preoperative staging of PCa is the basis for selecting adequate treatment tactics. In particular, an urgent problem is distinguishing indolent from aggressive forms of PCa in patients at the initial stages of the tumor process. To solve this problem, we propose a new machine-learning method for binary classification. The proposed method of monotonic functions uses a model in which the form of the disease is determined by the severity of the patient’s condition. It is assumed that the smaller the deviation of the indicators from the normal values typical of healthy people, the milder the patient’s condition. This assumption means that the severity (form) of the disease can be represented by monotonic functions of the deviations of the patient’s indicators beyond the normal range. The method is used to classify patients with indolent and aggressive forms of prostate cancer according to pretreatment data. The learning algorithm is nonparametric. At the same time, it allows the classification results to be explained in the form of a logical function. To do this, the user specifies to the algorithm either the threshold probability of correctly classifying patients with an indolent form of PCa or the threshold probability of misclassifying patients with an aggressive form of PCa. The examples of logical rules given in the article show that they are quite simple and can be easily interpreted in terms of preoperative indicators of the form of the disease.
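The abstract does not give the learning algorithm, so the following is only an illustration of the underlying idea: measure how far each indicator falls outside its normal range, then apply a rule that is monotone in those deviations. The indicator names, ranges, and thresholds are hypothetical placeholders, and the "any deviation exceeds its threshold" rule is an assumed example of a logical function, not the rule learned in the paper.

```python
def deviation(value, low, high):
    """Distance by which value lies outside [low, high]; 0.0 inside the range."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def is_aggressive(indicators, normal_ranges, thresholds):
    """Hypothetical monotone rule: flag the aggressive form if any indicator
    deviates from its normal range by more than its threshold."""
    return any(
        deviation(indicators[name], *normal_ranges[name]) > thresholds[name]
        for name in thresholds
    )


# Placeholder indicators and values, for illustration only.
normal_ranges = {"marker_a": (0.0, 4.0), "marker_b": (10.0, 20.0)}
thresholds = {"marker_a": 6.0, "marker_b": 5.0}
patient = {"marker_a": 11.5, "marker_b": 18.0}
flagged = is_aggressive(patient, normal_ranges, thresholds)
```

The rule is monotone in the sense the abstract describes: increasing any deviation can move a patient from the indolent to the aggressive class, but never the other way.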


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Vikas Khullar ◽  
Karuna Salgotra ◽  
Harjit Pal Singh ◽  
Davinder Pal Sharma

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rajit Nair ◽  
Santosh Vishwakarma ◽  
Mukesh Soni ◽  
Tejas Patel ◽  
Shubham Joshi

Purpose: The 2019 coronavirus disease (COVID-19), which first appeared in December 2019 in the city of Wuhan, China, rapidly spread around the world and became a pandemic. It has had a devastating impact on daily life, public health and the global economy. Positive cases must be identified as soon as possible to prevent further spread of the disease and to allow swift care of affected patients. The need for supportive diagnostic instruments has increased, as no specific automated toolkits are available. Recent results from radiology imaging techniques indicate that such images provide valuable information about COVID-19. Using advanced artificial intelligence (AI) technologies together with radiological imagery can help diagnose this condition accurately and help address the lack of specialist doctors in isolated areas. In this research, a new model for automatic detection of COVID-19 from raw chest X-ray images is presented. The proposed model, DarkCovidNet, is designed to provide accurate diagnostics for binary classification (COVID vs no detection) and multi-class classification (COVID vs no detection vs pneumonia). The implemented model achieved an average precision of 98.46% and 91.352% for binary and multi-class classification, respectively, and an average accuracy of 98.97% and 87.868%. The DarkNet model was used in this research as a classifier for the You Only Look Once (YOLO) real-time object detection system. A total of 17 convolutional layers, with different filters on each layer, have been implemented. This platform can be used by radiologists to verify their initial screening and can also be used for screening patients through the cloud. Design/methodology/approach: This study uses a CNN-based model named Darknet-19, which serves as the basis for the real-time object detection system. The architecture of this system is designed to detect objects in real time.
This study has developed the DarkCovidNet model based on the Darknet architecture, with fewer layers and filters. Before discussing the DarkCovidNet model, consider the Darknet architecture and its functionality. Typically, the DarkNet architecture consists of 19 convolution layers and 5 max-pooling layers. Assume as a convolution layer, and as a pooling layer. Findings: The work discussed in this paper is used to diagnose various radiology images and to develop a model that can accurately predict or classify the disease. The data set used in this work consists of COVID-19 and non-COVID-19 images taken from various sources. The deep learning model DarkCovidNet is applied to the data set and has shown significant performance in both binary and multi-class classification. In binary classification, the model has shown an average accuracy of 98.97% for the detection of COVID-19, whereas in multi-class classification it has achieved an average accuracy of 87.868% when classifying COVID-19, no detection and pneumonia. Research limitations/implications: One of the significant limitations of this work is that a limited number of chest X-ray images were used. The number of COVID-19 patients is increasing rapidly. In the future, the model will be trained on a larger data set, which can be collected from local hospitals, and its performance on that data will be evaluated. Originality/value: Deep learning technology has brought significant changes to the field of AI by producing good results, especially in pattern recognition. A conventional CNN structure includes a convolution layer that extracts features from the input using the filters it applies, a pooling layer that reduces the size to improve computational efficiency, and a fully connected layer, which is a neural network.
A CNN model is created by combining one or more of these layers, and its internal parameters are adjusted to accomplish a specific task, such as classification or object recognition.
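To make the roles of the convolution and pooling layers concrete, here is a minimal pure-Python sketch. It is not DarkCovidNet or Darknet code; the image and filter values are illustrative, and real models learn their filters during training.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and return the
    feature map (cross-correlation, as CNN libraries usually compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
diagonal_filter = [[1, 0], [0, -1]]  # illustrative hand-set filter
feature_map = conv2d_valid(image, diagonal_filter)
pooled = max_pool2x2(image)
```

Pooling halves each spatial dimension, which is how it reduces the amount of computation in the layers that follow.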


2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

A brain tumor is a severe cancer caused by uncontrolled and abnormal division of cells. Timely detection and treatment planning increase the life expectancy of patients. Detection and classification of brain tumors is a challenging process that depends on the clinician’s knowledge and experience. For this reason, deep learning is one of the most practical and important techniques to use. Recent progress in deep learning has helped clinicians use medical imaging for the diagnosis of brain tumors. In this paper, we present a comparison of deep convolutional neural network models for automatic binary classification of an MRI image dataset, with the goal of bringing precise tools to health professionals, based on fine-tuned recent versions of DenseNet, Xception, NASNet-A, and VGGNet. The experiments were conducted using an open MRI dataset of 3,762 images. Other performance measures used in the study are the area under precision, recall, and specificity.
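Precision, recall, and specificity for a binary classifier all follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the labels are illustrative and are not results from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that were found (sensitivity).
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Illustrative labels: 1 = tumor, 0 = no tumor.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting specificity alongside recall matters in screening settings, since a model can reach high recall simply by predicting the positive class too often.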


2020 ◽  
Vol 7 (1) ◽  
pp. 3-13
Author(s):  
Alexander Kozachok ◽  
Sergey Kopylov

Abstract— This article presents an approach to protecting printed text data by embedding a watermark during the printing process. Data protection is based on embedding a robust watermark that is invariant to conversion of the text data into an image. The choice of a robust watermark within the presented classification of digital watermarks is justified. The requirements for the developed robust watermark have been formulated. According to these requirements and existing restrictions, an approach to embedding a robust watermark into text data, based on a steganographic algorithm that shifts line spacing, has been developed. The block diagram and a description of the developed embedding algorithm are given. An experimental estimation of the embedding capacity and perceptual invisibility of the developed approach was carried out. An approach to extracting embedded information from images containing a robust watermark has been developed. The limits of retrieval, the extraction accuracy and the robustness of the embedded data to various transformations have been experimentally established.
Summary— The paper presents an approach to protecting printed text data by embedding a watermark into the text during the printing process. Data protection relies on a robust watermark that can withstand conversion of text data into image data. After analyzing existing digital watermarking systems, the choice of a robust watermarking model was found to be justified. Given practical requirements and the limitations of existing methods for embedding watermarks into text data, the paper proposes a new embedding method based on a steganographic algorithm that changes the spacing between lines of text. The paper gives a block diagram and a description of the algorithm for embedding information into text data. Experiments on the embedding capacity and the perceptual invisibility of the embedded data are also presented. The paper also describes an approach to extracting the embedded information from images containing the robust watermark. In addition, we report experimentally determined limits on the method's applicability, together with evaluations of the extraction accuracy and of the robustness of the new embedding method to various image transformations.
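The abstract does not describe the embedding algorithm in detail. As a rough sketch of the general idea of line-spacing-shift coding only, the example below assumes one bit per line gap, a known base spacing in points, and a fixed shift `delta`; all names and values are hypothetical and are not taken from the paper.

```python
def embed_bits(bits, base_spacing=12.0, delta=0.5):
    """Encode each bit as a line spacing: base + delta for 1, base - delta for 0."""
    return [base_spacing + delta if bit else base_spacing - delta for bit in bits]


def extract_bits(spacings, base_spacing=12.0):
    """Recover bits by comparing each measured spacing with the base spacing."""
    return [1 if spacing > base_spacing else 0 for spacing in spacings]


watermark = [1, 0, 1, 1, 0]
spacings = embed_bits(watermark)

# Simulate small measurement errors, e.g. from printing and scanning.
measured = [s + noise for s, noise in zip(spacings, [0.1, -0.2, 0.2, -0.1, 0.0])]
recovered = extract_bits(measured)
```

In this toy scheme the bits survive as long as the measurement error stays below `delta`, which illustrates the trade-off between robustness (larger shifts) and perceptual invisibility (smaller shifts) that the abstract evaluates experimentally.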

