Text Classification by Genre Based on Rhythm Features

2021 ◽  
Vol 28 (3) ◽  
pp. 280-291
Author(s):  
Ksenia Vladimirovna Lagutina ◽  
Nadezhda Stanislavovna Lagutina ◽  
Elena Igorevna Boychuk

The article is devoted to the analysis of the rhythm of texts of different genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The authors identified lexico-grammatical figures in the texts, such as anaphora, epiphora, diacope, and aposiopesis, that are markers of text rhythm. On this basis, statistical features were calculated that describe these rhythm features quantitatively and structurally. The resulting text model was visualized for statistical analysis using boxplots and heat maps, which showed differences in the rhythm of texts of different genres. The boxplots showed that almost all genres differ from each other in the overall density of rhythm features. The heat maps showed different rhythm patterns across genres. Further, the rhythm features were successfully used to classify texts into six genres. The classification was carried out in two ways: binary classification for each genre, to separate that genre from the remaining genres, and multi-class classification of the text corpus into all six genres at once. Two text corpora, in English and Russian, were used for the experiments. Each corpus contains 100 fiction novels, 100 scientific articles, 100 advertisements, and 100 tweets, plus 50 reviews and 50 political articles, i.e. a total of 500 texts. The high classification quality achieved with neural networks showed that rhythm features are a good marker for most genres, especially fiction. The experiments were carried out using the ProseRhythmDetector software tool for Russian and English. The text corpora contain 300 texts for each language.
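The abstract does not say how ProseRhythmDetector finds these figures. As a rough illustration of what a rhythm-density feature could look like, the sketch below counts two of the named figures with deliberately naive rules that are assumptions, not the tool's method: anaphora as consecutive sentences sharing their first word, and epiphora as consecutive sentences sharing their last word. The regex tokenizer, the sentence splitter, and the function names are all hypothetical.

```python
import re


def words(sentence: str) -> list[str]:
    """Lower-cased word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[\w']+", sentence.lower())


def split_sentences(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def rhythm_counts(text: str) -> dict:
    """Count anaphora (same first word in consecutive sentences) and
    epiphora (same last word), plus their combined density per sentence."""
    sents = [w for w in (words(s) for s in split_sentences(text)) if w]
    anaphora = sum(1 for a, b in zip(sents, sents[1:]) if a[0] == b[0])
    epiphora = sum(1 for a, b in zip(sents, sents[1:]) if a[-1] == b[-1])
    n = len(sents) or 1  # avoid division by zero on empty input
    return {"anaphora": anaphora, "epiphora": epiphora,
            "density": (anaphora + epiphora) / n}


print(rhythm_counts("We came. We saw. We conquered."))
```

A per-text vector of such densities (one entry per figure) is the kind of input that boxplots, heat maps, or a genre classifier could consume; real figure detection would need clause-level parsing rather than whole-sentence boundaries.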

2020 ◽  
Vol 14 ◽  
Author(s):  
Lahari Tipirneni ◽  
Rizwan Patan

Abstract: Breast cancer causes hundreds of thousands of deaths worldwide every year, and it has become the most common type of cancer in women. Early detection improves prognosis and increases the chance of survival. Automating classification with Computer-Aided Diagnosis (CAD) systems can make diagnosis less prone to errors. Both binary and multi-class classification of breast cancer are challenging problems. An individual convolutional neural network (CNN) architecture extracts specific feature descriptors from images, which cannot represent all the different types of breast cancer. This leads to false positives in classification, which is undesirable in disease diagnosis. This paper presents an ensemble of convolutional neural networks for both binary and multi-class classification of breast cancer, in which the feature descriptors from each network are combined to produce the final classification. Histopathological images are taken from the publicly available BreakHis dataset and classified into 8 classes. The proposed ensemble model performs better than the methods proposed in the literature, and the results show that it could be a viable approach to breast cancer classification.
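The abstract says that the feature descriptors from each network are combined, but not how. One common option, assumed here, is feature-level fusion: normalize each network's descriptor and concatenate the results before a single classifier. In this pure-Python sketch the descriptors are made-up vectors standing in for CNN outputs, a nearest-centroid rule stands in for the classification head, and two classes are shown instead of BreakHis's eight for brevity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm so no single network dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # leave zero vectors unchanged
    return [x / norm for x in vec]


def fuse(descriptors):
    """Feature-level fusion: normalize each network's descriptor, then concatenate."""
    fused = []
    for d in descriptors:
        fused.extend(l2_normalize(d))
    return fused


def nearest_centroid(fused, centroids):
    """Stand-in classifier: pick the class whose centroid is closest (squared Euclidean)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(fused, centroids[c])))


# Hypothetical descriptors from two networks, and hypothetical class centroids.
fused = fuse([[3.0, 4.0], [0.0, 2.0]])
centroids = {"benign": [0.6, 0.8, 0.0, 1.0], "malignant": [1.0, 0.0, 1.0, 0.0]}
print(nearest_centroid(fused, centroids))
```

Normalizing per network before concatenation keeps descriptors of different scales comparable; in practice the fused vector would feed a trained layer rather than fixed centroids.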


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rajit Nair ◽  
Santosh Vishwakarma ◽  
Mukesh Soni ◽  
Tejas Patel ◽  
Shubham Joshi

Purpose: The 2019 coronavirus disease (COVID-19), which first appeared in December 2019 in the city of Wuhan, China, spread rapidly around the world and became a pandemic. It has had a devastating impact on daily life, public health, and the global economy. Positive cases must be identified as soon as possible to prevent further spread of the disease and to ensure swift care of affected patients. The need for supportive diagnostic tools has increased, as no specific automated toolkits are available. Recent results from radiological imaging indicate that these images provide valuable information about COVID-19. Advanced artificial intelligence (AI) technologies combined with radiological imagery can help diagnose this condition accurately and help compensate for the lack of specialist doctors in remote areas. In this research, a new model for the automatic detection of COVID-19 from raw chest X-ray images is presented. The proposed model, DarkCovidNet, is designed to provide accurate diagnostics for binary classification (COVID vs. no findings) and multi-class classification (COVID vs. no findings vs. pneumonia). The implemented model achieved an average precision of 98.46% and 91.352% for binary and multi-class classification, respectively, and an average accuracy of 98.97% and 87.868%. The DarkNet model, which serves as the classifier in the YOLO (You Only Look Once) real-time object detection system, was used in this research. A total of 17 convolutional layers, with different filters on each layer, were implemented. This platform can be used by radiologists to verify their initial screening and can also be used to screen patients through the cloud. Design/methodology/approach: This study uses the CNN-based Darknet-19 model, which serves as the basis of a real-time object detection system; its architecture is designed to detect objects in real time.
This study developed the DarkCovidNet model based on the Darknet architecture, with fewer layers and filters. Before discussing the DarkCovidNet model, it is therefore useful to review the Darknet architecture and its functionality. The Darknet-19 architecture consists of 19 convolutional layers and 5 max-pooling layers. Assume as a convolution layer, and as a pooling layer. Findings: The work discussed in this paper analyzes various radiology images to develop a model that can accurately predict or classify the disease. The data set used in this work consists of COVID-19 and non-COVID-19 images taken from various sources. The deep learning model DarkCovidNet was applied to this data set and showed significant performance in both binary and multi-class classification. In binary classification, the model achieved an average accuracy of 98.97% for the detection of COVID-19, whereas in multi-class classification it achieved an average accuracy of 87.868% when classifying COVID-19, no findings, and pneumonia. Research limitations/implications: One significant limitation of this work is that a limited number of chest X-ray images was used, while the number of COVID-19 patients is observed to be increasing rapidly. In the future, the model will be implemented on a larger data set generated from local hospitals, and its performance on that data set will be evaluated. Originality/value: Deep learning has significantly changed the field of AI by producing good results, especially in pattern recognition. A conventional CNN structure includes a convolution layer that extracts features from the input using the filters it applies, a pooling layer that reduces the size of the representation to improve computational efficiency, and a fully connected layer, which is a neural network.
A CNN model is created by combining one or more of these layers, and its internal parameters are adjusted to accomplish a particular task, such as classification or object recognition.
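To make the convolution, pooling, and fully connected stages concrete, here is a minimal pure-Python forward pass through one of each. It is a teaching sketch with a single hand-picked kernel and made-up weights, not DarkCovidNet, whose 17 convolutional layers are not specified in the abstract. As in most CNN libraries, the "convolution" is computed as a cross-correlation with no padding.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; shrinks each spatial dimension by `size`."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def fully_connected(fmap, weights, bias):
    """Flatten the feature map and apply a dense layer producing one logit per class."""
    flat = [x for row in fmap for x in row]
    return [sum(w * x for w, x in zip(w_row, flat)) + b
            for w_row, b in zip(weights, bias)]


image = [[5 * r + c for c in range(5)] for r in range(5)]        # toy 5x5 "X-ray"
fmap = max_pool(relu(conv2d(image, [[0, 0], [0, 1]])))           # 5x5 -> 4x4 -> 2x2
logits = fully_connected(fmap, [[1, 1, 1, 1], [1, -1, 0, 0]], [0.0, 0.5])
print(fmap, logits)
```

Each max-pooling step halves the spatial size, which is why stacking five of them, as in Darknet-19, reduces the feature map by a factor of 32 in each dimension.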


Author(s):  
Sushila Sonare ◽  
Megha Kamble

Nowadays, it is very common for customers to share their thoughts about products and brands, and their experiences with them, on social media. Analysts collect and process these reviews to extract meaningful information about the products. The strength of social media is that it spans all domains, so analysts can obtain reviews of almost every kind of product from different social media platforms. Sentiment analysis is applied to these reviews to predict outcomes and extract useful information, for example, to predict whether a movie will be a blockbuster or how a new launch will be rated. Such predictions help customers decide which goods to buy or which services to use in a competitive market. This paper focuses on e-commerce website reviews, which normally consist of text with some special characters and symbols (emojis). Each word in these texts carries meaning in terms of context, emotion, and prior experience, and these characteristics provide features of the text data that can be used for prediction. The objective of this paper is to compile existing research on text analysis and emotion-based analysis. The open issues and challenges of document-based sentiment analysis are also discussed. The paper concludes by proposing a new approach to multi-class classification: ternary classification into positive, negative, and neutral classes, primarily for product-based text and emoji reviews on Twitter.
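The abstract proposes ternary classification of text-and-emoji reviews without giving a method. Purely as an illustration of the three-class output, the sketch below uses a simple lexicon-based scorer, which is a swapped-in technique and not the paper's approach. The tiny lexicon, its scores, and the zero threshold are hypothetical; words and emojis are scored the same way.

```python
# Hypothetical toy lexicon; scores are illustrative only.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "excellent": 2,
    "bad": -1, "poor": -1, "terrible": -2, "hate": -2,
    "😀": 1, "😍": 2, "😡": -2, "👎": -1,
}


def tokenize(review):
    """Split on whitespace and strip edge punctuation so standalone emojis survive.
    Emojis glued to words (e.g. 'great😍') are not separated by this naive rule."""
    return [t.strip(".,!?;:").lower() for t in review.split()]


def ternary_sentiment(review, threshold=0):
    """Sum lexicon scores and map the total to positive / negative / neutral."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(review))
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for r in ["Great phone, love it 😍", "Terrible battery 😡", "It arrived on Tuesday"]:
    print(r, "->", ternary_sentiment(r))
```

A learned model would replace the fixed lexicon, but the mapping from a score to three classes, with a neutral band around zero, is the same idea.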


Author(s):  
Jivitesh Sharma ◽  
Charul Giri ◽  
Ole-Christoffer Granmo ◽  
Morten Goodwin

Abstract Intrusion detection systems based on recent advances in machine learning have outperformed other techniques, but they struggle to detect multiple classes of attacks with high accuracy. We propose a method that works in three stages. First, the ExtraTrees classifier is used to select relevant features for each type of attack individually, one feature set for each extreme learning machine (ELM). Then, an ensemble of ELMs is used to detect each type of attack separately. Finally, the results of all ELMs are combined using a softmax layer to refine the results and further increase accuracy. The intuition behind our system is that multi-class classification is much more difficult than binary classification, so we divide the multi-class problem into multiple binary classification problems. We test our method on the UNSW and KDDcup99 datasets. The results show that our proposed method outperforms all the other methods by a large margin, achieving 98.24% and 99.76% accuracy for multi-class classification on the UNSW and KDDcup99 datasets, respectively. Additionally, we use the weighted extreme learning machine to alleviate class imbalance in the classification of attacks, which further boosts performance. Lastly, we implement the ensemble of ELMs in parallel on GPUs to perform intrusion detection in real time.
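The core idea, splitting a multi-class problem into one binary detector per attack type and merging their outputs with a softmax, can be sketched as follows. The per-attack scorers here are hypothetical hand-written functions standing in for trained ELMs, the feature names (`pkt_rate`, `failed_logins`) are invented, and the softmax only normalizes the scores; the paper's trained softmax layer that further refines the results is not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_binary_detectors(sample, detectors):
    """detectors maps each class label to a binary scorer returning a real-valued
    'belongs to this class' score; the softmax turns all scores into probabilities."""
    labels = list(detectors)
    probs = softmax([detectors[label](sample) for label in labels])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Hypothetical per-attack scorers standing in for trained ELMs.
DETECTORS = {
    "normal": lambda s: 1.0 - s["pkt_rate"] / 1000,
    "dos": lambda s: s["pkt_rate"] / 100 - 5,
    "brute_force": lambda s: s["failed_logins"] - 2,
}

label, probs = combine_binary_detectors({"pkt_rate": 900, "failed_logins": 0}, DETECTORS)
print(label, probs)
```

Because each detector only has to separate one class from the rest, each can use its own feature subset, which is what the per-attack ExtraTrees selection in the abstract enables.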


2021 ◽  
Author(s):  
ANKIT GHOSH ◽  
ALOK KOLE

Advances in Artificial Intelligence (AI) and Machine Learning (ML) can help radiologists diagnose tumors without invasive measures. Magnetic resonance imaging (MRI) is a very useful method for diagnosing tumors in the human brain. In this paper, brain MRI images have been analyzed to detect regions containing tumors and to classify these regions into three tumor categories: meningioma, glioma, and pituitary. This paper presents the implementation and comparison of various enhanced ML algorithms for the detection and classification of brain tumors. A brain tumor is a growth of abnormal cells in the human brain. Brain tumors can be cancerous or non-cancerous, and cancerous (malignant) brain tumors can be life-threatening. Hence, detecting and classifying brain tumors at an early stage is extremely important. In this paper, enhanced ML algorithms have been implemented to predict the presence or absence of a brain tumor using binary classification and, using multi-class classification, to predict whether a patient has a brain tumor and, if so, to detect its type. The dataset used for the binary classification task comprises two types of brain MRI images: with tumor and without tumor. Nine ML algorithms, namely Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbor (KNN), Naïve Bayes (NB), Decision Tree (DT) classifier, Random Forest classifier, XGBoost classifier, Stochastic Gradient Descent (SGD) classifier, and Gradient Boosting classifier, have been used to classify the MRI images. A comparative analysis of the ML algorithms has been performed based on several performance metrics: accuracy, recall, precision, F1-score, area under the ROC curve (AUC-ROC), and area under the precision-recall curve (AUC-PR). The Gradient Boosting classifier outperformed all the other algorithms, with an accuracy of 92.4%, recall of 94.4%, precision of 85%, F1-score of 89.5%, AUC-ROC of 97.2%, and AUC-PR of 91.4%.
To address the multi-class classification problem, four ML algorithms, namely SVM, KNN, Random Forest classifier, and XGBoost classifier, have been employed. In this case, the dataset consists of four types of brain MRI images: glioma tumor, meningioma tumor, pituitary tumor, and no tumor. The performances of the ML algorithms have been compared based on accuracy, recall, precision, and F1-score. The XGBoost classifier surpassed all the other algorithms on all four metrics, producing an accuracy, precision, recall, and F1-score of 90% each.
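Since the F1-score is the harmonic mean of precision and recall, the reported Gradient Boosting figures can be cross-checked: 2 × 0.85 × 0.944 / (0.85 + 0.944) ≈ 0.895, matching the stated 89.5%. The helper below computes the three metrics from confusion-matrix counts; the function names are illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp, fp, fn):
    """Binary metrics from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Consistency check against the reported Gradient Boosting precision and recall.
print(round(100 * f1_from(0.85, 0.944), 1))  # 89.5
```

For the four-class XGBoost results, the same per-class metrics would typically be averaged across classes (macro or weighted averaging); the abstract does not say which averaging was used.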


Author(s):  
Eoin Dinneen ◽  
Clare Allen ◽  
Tom Strange ◽  
Daniel Heffernan-Ho ◽  
Jelena Banjeglav ◽  
...  

The accuracy of multi-parametric MRI (mpMRI) in the pre-operative staging of prostate cancer (PCa) remains controversial. Objective: To evaluate the ability of mpMRI to accurately predict PCa extra-prostatic extension (EPE) on a side-specific basis using a risk-stratified 5-point Likert scale. This study also aimed to assess the influence of mpMRI scan quality on diagnostic accuracy. Patients and Methods: We included 124 men who underwent robot-assisted radical prostatectomy (RARP) as part of the NeuroSAFE PROOF study at our centre. Three radiologists retrospectively reviewed the mpMRI scans, blinded to radical prostatectomy pathology, and assigned a Likert score (1-5) for EPE on each side of the prostate. Each scan was also assigned a Prostate Imaging Quality (PI-QUAL) score to assess the quality of the mpMRI scan, where 1 represents the poorest and 5 the best diagnostic quality. Outcome measurements and statistical analyses: Diagnostic performance is presented for binary classification of EPE, including 95% confidence intervals and the area under the receiver operating characteristic curve (AUC). Results: A total of 231 lobes from 121 men (mean age 56.9 years) were evaluated. 39 men (32.2%), corresponding to 43 lobes (18.6%), had EPE. A Likert score ≥3 had a sensitivity (SE), specificity (SP), negative predictive value (NPV), and positive predictive value (PPV) of 90.4%, 52.3%, 96%, and 29.9%, respectively, and an AUC of 0.82 (95% CI: 0.77-0.86). The AUC was 0.63 (95% CI: 0.37-0.9), 0.77 (0.71-0.84), and 0.92 (0.88-0.96) for biparametric scans, PI-QUAL 1-3 scans, and PI-QUAL 4-5 scans, respectively. Conclusions: MRI can be used effectively by genitourinary radiologists to rule out EPE and help inform surgical planning for men undergoing RARP. EPE prediction was more reliable when the MRI scan was a) multi-parametric and b) of higher image quality according to the PI-QUAL scoring system.
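The reported predictive values follow from sensitivity, specificity, and prevalence via Bayes' rule. Assuming the lobe-level prevalence of EPE (43 of 231 lobes) applies, the sketch below recomputes them; the result, a PPV of about 30.2% and an NPV of about 96.0%, is close to the reported 29.9% and 96%, with the small gap likely due to rounding of the reported sensitivity and specificity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence               # true positives per unit population
    fn = (1 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1 - prevalence)         # true negatives
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Likert score >= 3, lobe-level prevalence 43/231 (assumption: lobe-level analysis).
ppv, npv = predictive_values(0.904, 0.523, 43 / 231)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")  # PPV 30.2%, NPV 96.0%
```

The low prevalence of EPE is why a test with 90% sensitivity still has a modest PPV but a high NPV, which supports the conclusion that MRI is most useful for ruling EPE out.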


2021 ◽  
Author(s):  
Kira Wegner-Clemens ◽  
George Law Malcolm ◽  
Sarah Shomstein

Semantic information about objects, events, and scenes influences how humans perceive, interact with, and navigate the world. Most evidence in support of semantic influence on cognition has been gathered from research conducted with an isolated modality (e.g., vision, audition). However, the influence of semantic information has not yet been extensively studied in multisensory environments, potentially because semantic relatedness is difficult to quantify. Past studies have primarily relied either on a simplified binary classification of semantic relatedness based on category or on algorithmic values derived from text corpora rather than from human perceptual experience and judgement. With the aim of accelerating research into multisensory semantics, we created a constrained audiovisual stimulus set and derived similarity ratings between items within three categories (animals, instruments, household items). In total, 140 participants provided similarity judgements between sounds and images. Participants either heard a sound (e.g., a meow) and judged which of two pictures of objects (e.g., a picture of a dog and a duck) it was more similar to, or saw a picture (e.g., a picture of a duck) and selected which of two sounds it was more similar to (e.g., a bark or a meow). The judgements were then used to calculate similarity values for any given cross-modal pair. The derived and reported similarity judgements reflect a range of semantic similarities across the three categories and their items, and highlight similarities and differences in similarity judgements between modalities. We make the derived similarity values available in a database format to the research community, to be used as a measure of semantic relatedness in cognitive psychology experiments, enabling more robust studies of semantics in audiovisual environments.
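The abstract does not state how two-alternative judgements were turned into similarity values. One simple scheme, assumed here for illustration, scores each (cue, option) pair by the fraction of trials offering that option for that cue in which it was chosen. The stimulus file names below are hypothetical, and the two presentation directions (sound cue vs. picture cue) are kept separate.

```python
from collections import defaultdict


def similarity_from_choices(trials):
    """trials: iterable of (cue, option_a, option_b, chosen).
    Returns {(cue, option): fraction of trials offering `option` for `cue`
    in which it was chosen}."""
    shown = defaultdict(int)   # times each option was offered for a cue
    picked = defaultdict(int)  # times it was chosen for that cue
    for cue, option_a, option_b, chosen in trials:
        shown[(cue, option_a)] += 1
        shown[(cue, option_b)] += 1
        picked[(cue, chosen)] += 1
    return {pair: picked[pair] / n for pair, n in shown.items()}


trials = [
    ("meow.wav", "dog.jpg", "duck.jpg", "dog.jpg"),
    ("meow.wav", "dog.jpg", "duck.jpg", "dog.jpg"),
    ("meow.wav", "dog.jpg", "duck.jpg", "duck.jpg"),
    ("duck.jpg", "bark.wav", "meow.wav", "meow.wav"),
]
print(similarity_from_choices(trials))
```

A choice proportion like this depends on which alternatives each item was paired with, so a full analysis would balance pairings across participants or fit a paired-comparison model instead.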


2021 ◽  
Author(s):  
Sebastião Rogério da Silva Neto ◽  
Thomás Tabosa Oliveira ◽  
Igor Vitor Teixeira ◽  
Samuel Benjamin Aguiar de Oliveira ◽  
Vanderson Souza Sampaio ◽  
...  

Abstract Background: Neglected tropical diseases (NTDs) primarily affect the poorest populations, who often live in remote rural areas, urban slums, or conflict zones. Arboviruses are a significant NTD category spread by mosquitoes. Dengue, Chikungunya, and Zika are three arboviruses that affect a large proportion of the population in Latin and South America. The clinical diagnosis of these arboviral diseases is difficult because several arboviruses with similar symptoms circulate concurrently, serologic tests are inaccurate owing to cross-reaction, and co-infection with other arboviruses occurs. Objective: The goal of this paper is to present evidence on the state of the art of studies investigating the automatic classification of arboviral diseases with machine learning (ML) and deep learning (DL) models to support clinical diagnosis. Method: We carried out a systematic literature review (SLR) in which Google Scholar was searched to identify key papers on the topic. From an initial 963 records (956 from a string-based search and 7 from a single application of the backward snowballing technique), only 15 relevant papers were identified. Results: Current research focuses on the binary classification of Dengue, primarily using tree-based ML algorithms; only one paper was identified that used DL. Five papers presented solutions for multi-class problems, covering Dengue (and its levels) and Chikungunya. No papers were identified that investigated models to differentiate between Dengue, Chikungunya, and Zika. Conclusions: An efficient clinical decision support system for arboviral diseases can improve the quality of the entire clinical process, increasing the accuracy of the diagnosis and of the associated treatment. It should help physicians in their decision-making process and, consequently, improve the use of resources and the patient's quality of life.


2021 ◽  
Vol 11 (12) ◽  
pp. 5533
Author(s):  
Jui-Sheng Chou ◽  
Trang Thi Phuong Pham ◽  
Chia-Chun Ho

Multi-class classification is one of the major challenges in machine learning and an ongoing research issue. Many classification algorithms are designed for binary problems and must be extended to multi-class problems for real-world applications. Multi-class classification is more complex than binary classification: in binary classification, only the decision boundary of one class needs to be learned, whereas multi-class classification involves several boundaries. The objective of this investigation is to propose a metaheuristic, optimized, multi-level classification learning system for forecasting in civil and construction engineering. The proposed system integrates the firefly algorithm (FA), metaheuristic intelligence, decomposition approaches, the one-against-one (OAO) method, and the least squares support vector machine (LSSVM). The enhanced FA automatically fine-tunes the hyperparameters of the LSSVM to construct an optimized LSSVM classification model. Ten benchmark functions are used to evaluate the performance of the enhanced optimization algorithm. Two binary-class datasets related to geotechnical engineering, concerning seismic bumps and soil liquefaction, are then used to illustrate the application of the proposed system to binary problems. Further, this investigation uses multi-class cases in civil engineering and construction management to verify the effectiveness of the model in diagnosing faults in steel plates, assessing the quality of water in a reservoir, and determining urban land cover. The results show that the system predicts faults in steel plates with an accuracy of 91.085%, the quality of water in a reservoir with an accuracy of 93.650%, and urban land cover with an accuracy of 87.274%.
To demonstrate the effectiveness of the proposed system, its predictive accuracy is compared with that of a non-optimized baseline model, single multi-class classification algorithms (sequential minimal optimization (SMO), the Multiclass Classifier, Naïve Bayes, the library for support vector machines (LibSVM), and logistic regression), and prior studies. The analytical results show that the proposed system is promising project analytics software that can help decision makers solve multi-level classification problems in engineering applications.
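The one-against-one (OAO) decomposition mentioned above trains one binary classifier for every unordered pair of classes, K(K-1)/2 in total, and predicts by majority vote. The sketch below shows only that decomposition and voting; the binary learner is a hypothetical one-dimensional nearest-centroid rule standing in for the paper's firefly-tuned LSSVM, and the class names and centroids are invented.

```python
from collections import Counter
from itertools import combinations


def train_one_against_one(classes, train_pair):
    """Build one binary classifier per unordered class pair: K*(K-1)/2 in total.
    train_pair(a, b) must return a function mapping a sample to a or b."""
    return {(a, b): train_pair(a, b) for a, b in combinations(classes, 2)}


def predict_one_against_one(sample, pairwise):
    """Majority vote over all pairwise classifiers. Ties go to the class that
    received its first vote earliest (Counter.most_common keeps insertion order)."""
    votes = Counter(clf(sample) for clf in pairwise.values())
    return votes.most_common(1)[0][0]


# Hypothetical stand-in learner: 1-D nearest centroid between two classes.
CENTROIDS = {"low": 1.0, "medium": 5.0, "high": 9.0}


def make_pair(a, b):
    return lambda x: a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


pairwise = train_one_against_one(list(CENTROIDS), make_pair)
print(len(pairwise), predict_one_against_one(6.0, pairwise))
```

OAO trains more classifiers than one-versus-rest, but each sees only two classes' data, which keeps each binary problem smaller and often better balanced.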

