Text Classification Using Machine Learning and Deep Learning Models

Abstract: In today's legal environment, lawsuits and regulatory investigations require corporations to find, acquire, and evaluate vast amounts of data. In recent years, technology-assisted review (TAR) has become an increasingly important part of the document review process in legal discovery. Attorneys now use machine learning techniques such as text classification to identify responsive information. In the legal domain, text classification is referred to as predictive coding or TAR. Predictive coding is used to increase the number of relevant documents identified while reducing human labelling effort and manual document review. In recent years, deep learning models combined with word embeddings have proven more effective in predictive coding. Deep learning models, however, have many configurable parameters, which makes choosing the right settings difficult and time-consuming for legal professionals. In this paper, we examine several predictive coding algorithms and discuss which is the most efficient among them.

Keywords: Technology-assisted review, predictive coding, machine learning, text classification, deep learning, CNN, Unscented Kalman Filter, Logistic Regression, SVM

Performance Analysis of Machine Learning and Deep Learning Models for Text Classification

2020 IEEE 17th India Council International Conference (INDICON) ◽ 10.1109/indicon49873.2020.9342208 ◽ 2020

Author(s): C M Suneera ◽ Jay Prakash

Keyword(s): Machine Learning ◽ Deep Learning ◽ Performance Analysis ◽ Text Classification ◽ Learning Models

Deep Learning-based Text Classification

ACM Computing Surveys ◽ 10.1145/3439726 ◽ 2021 ◽ Vol 54 (3) ◽ pp. 1-40

Author(s): Shervin Minaee ◽ Nal Kalchbrenner ◽ Erik Cambria ◽ Narjes Nikzad ◽ Meysam Chenaghlu ◽ ...

Keyword(s): Machine Learning ◽ Deep Learning ◽ Text Classification ◽ Question Answering ◽ Future Research ◽ Learning Models ◽ Research Directions ◽ Comprehensive Review ◽ Future Research Directions ◽ Classification Tasks

Deep learning-based models have surpassed classical machine learning-based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this article, we provide a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and we discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and we discuss future research directions.

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽ 2020

Author(s): Saeed Nosratabadi ◽ Amir Mosavi ◽ Puhong Duan ◽ Pedram Ghamisi ◽ Ferdinand Filip ◽ ...

Keyword(s): Machine Learning ◽ Deep Learning ◽ Data Science ◽ State Of The Art ◽ Science Methods ◽ Learning Models ◽ Diverse Range ◽ Hybrid Machine ◽ Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains span a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

Deep Learning for text in limited data settings

10.36227/techrxiv.12100692 ◽ 2020

Author(s): Pathikkumar Patel ◽ Bhargav Lad ◽ Jinan Fiaidhi

Keyword(s): Machine Learning ◽ Time Series ◽ Deep Learning ◽ Sentiment Analysis ◽ Transfer Learning ◽ Text Classification ◽ State Of The Art ◽ Time Series Forecasting ◽ Text Data ◽ Performance Levels

In recent years, RNN models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article, we review different machine learning and deep learning-based approaches for text data and examine the results obtained with these methods. This work also explores the use of transfer learning in NLP and how it affects model performance on sentiment analysis.

Deep Learning in Disease Diagnosis: Models and Datasets

Current Bioinformatics ◽ 10.2174/1574893615999201002124021 ◽ 2020 ◽ Vol 15

Author(s): Deeksha Saxena ◽ Mohammed Haris Siddiqui ◽ Rajnish Kumar

Keyword(s): Biological Sciences ◽ Machine Learning ◽ Deep Learning ◽ Disease Diagnosis ◽ Learning Models ◽ Data Types ◽ Related Data ◽ Abstract Level ◽ Experimental Validations ◽ Selection Of

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is transformed from a lower level to a much more abstract level. Although DL is widely used in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times the two are used individually as well. DL appears to be a better platform than machine learning because it does not require an intermediate feature extraction step and works well with larger datasets. DL is currently one of the most discussed fields among scientists and researchers for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some popular disease-related data sources for DL were highlighted. Results: We analyzed the frequently used DL methods and data types, and discussed some recent deep learning models used to solve different biological problems. Conclusion: The review presents useful insights into DL methods, data types, and the selection of DL models for disease diagnosis.

Machine Learning-Based Malicious X.509 Certificates’ Detection

Applied Sciences ◽ 10.3390/app11052164 ◽ 2021 ◽ Vol 11 (5) ◽ pp. 2164

Author(s): Jiaxin Li ◽ Zhaoxin Zhang ◽ Changyong Guo

Keyword(s): Machine Learning ◽ Deep Learning ◽ Ensemble Learning ◽ Traffic Analysis ◽ Learning Models ◽ Detection Model ◽ Analysis Tools ◽ Average Accuracy ◽ Machine Learning Models

X.509 certificates play an important role in encrypting the transmission of data on both sides under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious traffic analysis tools. Phishing sites and malware are good examples. X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish between malicious and benign certificates with Verification for Extraction (VFE). The VFE is a system we designed and implemented to obtain a rich set of certificate characteristics. The results show that ensemble learning models are the most stable and efficient, with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, which is the highest accuracy obtained. The outcome indicates that the VFE is capable of capturing the essential characteristics of malicious X.509 certificates.

Quantum algorithm for quicker clinical prognostic analysis: an application and experimental study using CT scan images of COVID-19 patients

BMC Medical Informatics and Decision Making ◽ 10.1186/s12911-021-01588-6 ◽ 2021 ◽ Vol 21 (1)

Author(s): Kinshuk Sengupta ◽ Praveen Ranjan Srivastava

Keyword(s): Machine Learning ◽ Deep Learning ◽ Image Classification ◽ Machine Learning Algorithms ◽ Classification Task ◽ Clinical Image ◽ Prototype Model ◽ Learning Models ◽ Accuracy Measure ◽ Quantum Machine Learning

Background: In medical diagnosis and clinical practice, diagnosing a disease early is crucial for accurate treatment and for lessening the stress on the healthcare system. In medical imaging research, image processing techniques are vital for analyzing and resolving diseases with a high degree of accuracy. This paper establishes a new image classification and segmentation method through simulation techniques, conducted over images of COVID-19 patients in India, introducing the use of Quantum Machine Learning (QML) in medical practice. Methods: This study establishes a prototype model for classifying COVID-19, comparing it with non-COVID pneumonia signals in computed tomography (CT) images. The simulation work evaluates the use of quantum machine learning algorithms while assessing the efficacy of deep learning models for image classification problems, and thereby establishes the performance quality required for an improved prediction rate when dealing with complex clinical image data exhibiting high biases. Results: The study considers a novel algorithmic implementation leveraging a quantum neural network (QNN). The proposed model outperformed conventional deep learning models on the specific classification task. The performance gain is attributed to the efficiency of quantum simulation and the faster convergence when solving the optimization problem for network training, particularly for large-scale biased image classification tasks. The model run-time observed on quantum-optimized hardware was 52 min, while on K80 GPU hardware it was 1 h 30 min for a similar sample size. The simulation shows that the QNN outperforms DNN, CNN, and 2D CNN by more than 2.92% in accuracy, with an average recall of around 97.7%. Conclusion: The results suggest that quantum neural networks outperform deep learning on the COVID-19 trait classification task with respect to model efficacy and training time. However, further study is needed to evaluate implementation scenarios that integrate the model within medical devices.

Reviewing the relationship between machines and radiology: the application of artificial intelligence

Acta Radiologica Open ◽ 10.1177/2058460121990296 ◽ 2021 ◽ Vol 10 (2) ◽ pp. 205846012199029

Author(s): Rani Ahmad

Keyword(s): Artificial Intelligence ◽ Machine Learning ◽ Deep Learning ◽ Health Care Professionals ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Health Science ◽ Computer Algorithms ◽ Learning Models ◽ Specificity And Sensitivity

Background: The scope and productivity of artificial intelligence applications in health science and medicine, particularly in medical imaging, are rapidly progressing, driven by relatively recent developments in big data and deep learning and by increasingly powerful computer algorithms. Accordingly, there are a number of opportunities and challenges for the radiological community. Purpose: To provide a review of the challenges and barriers experienced in diagnostic radiology on the basis of the key clinical applications of machine learning techniques. Material and Methods: Studies published in 2010–2019 that report on the efficacy of machine learning models were selected. A single contingency table was selected for each study to report the highest accuracy of radiology professionals and machine learning algorithms, and a meta-analysis of the studies was conducted based on the contingency tables. Results: The specificity for all the deep learning models ranged from 39% to 100%, whereas sensitivity ranged from 85% to 100%. The pooled sensitivity and specificity were 89% and 85% for the deep learning algorithms for detecting abnormalities, compared to 75% and 91% for radiology experts, respectively. The pooled specificity and sensitivity for comparison between radiology professionals and deep learning algorithms were 91% and 81% for deep learning models and 85% and 73% for radiology professionals (p < 0.000), respectively. The pooled sensitivity detection was 82% for health-care professionals and 83% for deep learning algorithms (p < 0.005). Conclusion: Radiomic information extracted from images by machine learning programs may not be discernible through visual examination and may therefore improve the prognostic and diagnostic value of data sets.
