Framework for Infectious Disease Analysis: A comprehensive and integrative multi-modeling approach to disease prediction and management

2017 ◽  
Vol 25 (4) ◽  
pp. 1170-1187 ◽  
Author(s):  
Madhav Erraguntla ◽  
Josef Zapletal ◽  
Mark Lawley

The impact of infectious disease on human populations is a function of many factors including environmental conditions, vector dynamics, transmission mechanics, social and cultural behaviors, and public policy. A comprehensive framework for disease management must fully connect the complete disease lifecycle, including emergence from reservoir populations, zoonotic vector transmission, and impact on human societies. The Framework for Infectious Disease Analysis is a software environment and conceptual architecture for data integration, situational awareness, visualization, prediction, and intervention assessment. Framework for Infectious Disease Analysis automatically collects biosurveillance data using natural language processing, integrates structured and unstructured data from multiple sources, applies advanced machine learning, and uses multi-modeling for analyzing disease dynamics and testing interventions in complex, heterogeneous populations. In the illustrative case studies, natural language processing from social media, news feeds, and websites was used for information extraction, biosurveillance, and situation awareness. Classification machine learning algorithms (support vector machines, random forests, and boosting) were used for disease predictions.
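
The classification stage named above (support vector machines, random forests, and boosting) can be prototyped with standard tooling. The sketch below is a minimal illustration under stated assumptions: the synthetic feature matrix stands in for engineered surveillance features, and the scikit-learn usage is not the framework's actual implementation.

```python
# Hypothetical sketch: comparing the classifier families named in the abstract
# (SVM, random forest, gradient boosting) on biosurveillance-derived features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrix standing in for engineered surveillance features
# (e.g., case counts, vector indices, climate variables) and outbreak labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```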

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Aditya Borakati

Abstract Background In the context of the ongoing pandemic, e-learning has become essential to maintain existing medical educational programmes. Evaluation of such courses has thus far been on a small scale at single institutions. Further, systematic appraisal of the large volume of qualitative feedback generated by massive online e-learning courses manually is time consuming. This study aimed to evaluate the impact of an e-learning course targeting medical students collaborating in an international cohort study, with semi-automated analysis of feedback using text mining and machine learning methods. Method This study was based on a multi-centre cohort study exploring gastrointestinal recovery following elective colorectal surgery. Collaborators were invited to complete a series of e-learning modules on key aspects of the study and complete a feedback questionnaire on the modules. Quantitative data were analysed using simple descriptive statistics. Qualitative data were analysed using text mining with most frequent words, sentiment analysis with the AFINN-111 and syuzhet lexicons, and topic modelling using Latent Dirichlet Allocation (LDA). Results One thousand six hundred and eleven collaborators from 24 countries completed the e-learning course; 1396 (86.7%) were medical students; 1067 (66.2%) entered feedback. 1031 (96.6%) rated the quality of the course a 4/5 or higher (mean 4.56; SD 0.58). The mean sentiment score using the AFINN was +1.54/5 (5: most positive; SD 1.19) and +0.287/1 (1: most positive; SD 0.390) using syuzhet. LDA generated topics consolidated into the themes: (1) ease of use, (2) conciseness and (3) interactivity. Conclusions E-learning can have high user satisfaction for training investigators of clinical studies and medical students. Natural language processing may be beneficial in analysis of large scale educational courses.
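
A rough Python analogue of the feedback analysis is sketched below, assuming the `afinn` and `gensim` packages as stand-ins for the AFINN-111 scoring and LDA topic modelling described; the study's own analysis also used the syuzhet lexicon and was not necessarily run in Python, and the example comments are invented.

```python
# Rough analogue of the feedback analysis: AFINN sentiment scoring and
# LDA topic modelling on free-text comments. Package choices are assumptions.
from afinn import Afinn
from gensim import corpora
from gensim.models import LdaModel

comments = [
    "the modules were clear and easy to use",
    "concise videos but quizzes could be more interactive",
]

afinn = Afinn()
for text in comments:
    print(text, "->", afinn.score(text))  # AFINN valence score per comment

# Minimal LDA: tokenise, build a dictionary and bag-of-words corpus, fit topics.
tokens = [c.split() for c in comments]
dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]
lda = LdaModel(corpus, num_topics=3, id2word=dictionary, random_state=0)
print(lda.print_topics())
```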


Detecting the author of each sentence in a collective document can be done by choosing a suitable set of features and implementing them with Natural Language Processing and machine learning. The basic idea is to train a machine to identify the author of a specific sentence. This is done through eight NLP steps, such as applying a stemming algorithm, removing stop-list words, and preprocessing the data, and then passing the result to a machine learning classifier, a Support Vector Machine (SVM), which classifies the dataset into a number of classes corresponding to the authors and assigns an author name to every sentence with an accuracy of 82%. This paper is intended for readers who are interested in knowing the names of the authors who have written specific passages.
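
A minimal sketch of the kind of pipeline described, assuming scikit-learn and NLTK: stop-list removal and stemming as preprocessing, TF-IDF features, and an SVM assigning an author label to each sentence. The sentences, labels, and feature choices are hypothetical, not the paper's dataset.

```python
# Hypothetical authorship-attribution pipeline: stemming + stop-word removal,
# TF-IDF features, and a linear SVM that labels each sentence with an author.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

stemmer = PorterStemmer()

def preprocess(sentence):
    # Lowercase, drop stop-list words, and stem the remaining tokens.
    return " ".join(
        stemmer.stem(tok)
        for tok in sentence.lower().split()
        if tok not in ENGLISH_STOP_WORDS
    )

sentences = ["the ship sailed at dawn", "markets rallied after the announcement"]
authors = ["author_a", "author_b"]  # hypothetical sentence-level labels

model = make_pipeline(
    TfidfVectorizer(preprocessor=preprocess),
    LinearSVC(),
)
model.fit(sentences, authors)
print(model.predict(["the tide carried the ship home"]))
```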


Author(s):  
Kaushika Pal ◽  
Biraj V. Patel

A large section of the World Wide Web is full of documents and content: big data, formatted and unformatted, structured and unstructured, and unorganized data, and we need an information infrastructure that is useful and easily accessible whenever required. This research work combines Natural Language Processing and Machine Learning for content-based classification of documents. Natural Language Processing is used to divide the problem of understanding an entire document at once into smaller chunks, yielding only the useful tokens responsible for feature extraction, the machine learning technique that creates the feature set used to train a classifier to predict a label for a new document and place it at the appropriate location. Machine Learning, a subset of Artificial Intelligence, is enriched with sophisticated algorithms such as Support Vector Machine, K-Nearest Neighbor, and Naïve Bayes, which work well with content in many Indian languages as well as foreign languages. This model successfully classifies documents with more than 70% accuracy for major Indian languages and more than 80% accuracy for English.
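
A hedged illustration of the described approach follows, assuming scikit-learn: tokenised documents become a TF-IDF feature set that trains the three classifier families named above. The documents and labels are placeholders, not the authors' corpus.

```python
# Illustrative sketch (not the authors' code): TF-IDF features feed three of
# the classifiers named in the abstract; documents and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "cricket team wins the series",        # sports
    "parliament passes the new budget",    # politics
    "cricket captain announces retirement",
    "election results declared today",
]
labels = ["sports", "politics", "sports", "politics"]

for clf in (MultinomialNB(), KNeighborsClassifier(n_neighbors=3), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(docs, labels)
    print(type(clf).__name__, model.predict(["budget debate in parliament"]))
```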


2014 ◽  
Vol 8 (3) ◽  
pp. 227-235 ◽  
Author(s):  
Cíntia Matsuda Toledo ◽  
Andre Cunha ◽  
Carolina Scarton ◽  
Sandra Aluísio

Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described - simple or complex; presentation order - which type of picture was described first; and age). In this study, the descriptions by 144 of the subjects studied in Toledo18, which included 200 healthy Brazilians of both genders, were used. RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
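
As a hedged sketch of the recommended setup, the code below trains an RBF-kernel SVM for binary classification with an automatic feature-selection step; SelectKBest is used here only as a stand-in for CfsSubsetEval, and the synthetic features are assumptions rather than Coh-Metrix-Port/AIC output.

```python
# Minimal sketch: RBF-kernel SVM with an automatic feature-selection step
# standing in for CFS; the feature matrix is a synthetic placeholder.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for linguistic features and education-level class labels.
X, y = make_classification(n_samples=144, n_features=50, n_informative=10,
                           random_state=0)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=15),   # automatic feature selection
    SVC(kernel="rbf", gamma="scale"),
)
print(cross_val_score(model, X, y, cv=5).mean())
```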


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
P Brekke ◽  
I Pilan ◽  
H Husby ◽  
T Gundersen ◽  
F.A Dahl ◽  
...  

Abstract Background Syncope is a commonly occurring presenting symptom in emergency departments. While the majority of episodes are benign, syncope is associated with worse prognosis in hypertrophic cardiomyopathy, arrhythmia syndromes, heart failure, aortic stenosis and coronary heart disease. Flagging documented syncope in these patients may be crucial to management decisions. Previous studies show that the International Classification of Diseases (ICD) codes for syncope have a sensitivity of around 0.63, leading to a large number of false negatives if patient identification is based on administrative codes. Thus, in order to provide data-driven clinical decision support, and to improve identification of patient cohorts for research, better tools are needed. A recent study manually annotated more than 30,000 patient records in order to develop a natural language processing (NLP) tool, which achieved a sensitivity of 92.2%. Since access to medical records and annotation resources is limited, we aimed to investigate whether an unsupervised machine learning and NLP approach with no manual input could achieve similar performance. Methods Our data were admission notes for adult patients admitted between 2005 and 2016 at a large university hospital in Norway. 500 records from patients with, and 500 without, an “R55 Syncope” ICD code at discharge were drawn at random. The R55 code was considered “ground truth”. Headers containing information about tentative diagnoses were removed from the notes, when present, using regular expressions. The dataset was divided into 70%/15%/15% subsets for training, validation and testing. Baseline identification was calculated by simple lexical matching using the term “synkope”. We evaluated two linear classifiers, a Support Vector Machine (SVM) and a Linear Regression (LR) model, with a term frequency–inverse document frequency vectorizer, using a bag-of-words approach. In addition, we evaluated a simple convolutional neural network (CNN) consisting of a convolutional layer concatenating filter sizes of 3–5, max pooling and a dropout of 0.5 with randomly initialised word embeddings of 300 dimensions. Results Even a baseline regular expression model achieved a sensitivity of 78% and a specificity of 91% when classifying admission notes as belonging to the syncope class or not. The SVM model and the LR model achieved a sensitivity of 91% and 89%, respectively, and a specificity of 89% and 91%. The CNN model had a sensitivity of 95% and a specificity of 84%. Conclusion With a limited non-English dataset, common NLP and machine learning approaches were able to achieve approximately 90–95% sensitivity for the identification of admission notes related to syncope. Linear classifiers outperformed a CNN model in terms of specificity, as expected in this small dataset. The study demonstrates the feasibility of training document classifiers based on diagnostic codes in order to detect important clinical events. Figure: ROC curves for SVM and LR models. Funding Acknowledgement. Type of funding source: Public grant(s) – National budget only. Main funding source(s): The Research Council of Norway
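
A Keras sketch of a CNN with the stated architecture (300-dimension randomly initialised embeddings, parallel filter sizes 3–5, max pooling, dropout 0.5) is shown below; the vocabulary size, sequence length, and toy data are assumptions, and this is not the study's code.

```python
# Hedged sketch of the described CNN classifier for admission notes.
import numpy as np
from tensorflow.keras import Model, layers

vocab_size, seq_len = 20000, 400  # assumed tokenizer settings

inputs = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, 300)(inputs)   # random 300-dim embeddings
branches = []
for k in (3, 4, 5):                               # parallel filter sizes 3-5
    conv = layers.Conv1D(100, k, activation="relu")(emb)
    branches.append(layers.GlobalMaxPooling1D()(conv))
merged = layers.Concatenate()(branches)
dropped = layers.Dropout(0.5)(merged)
outputs = layers.Dense(1, activation="sigmoid")(dropped)  # syncope vs. not

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy forward/backward pass on random token ids, just to show the model runs.
X_dummy = np.random.randint(0, vocab_size, size=(8, seq_len))
y_dummy = np.random.randint(0, 2, size=(8, 1))
model.fit(X_dummy, y_dummy, epochs=1, verbose=0)
```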


2020 ◽  
Vol 132 (4) ◽  
pp. 738-749 ◽  
Author(s):  
Michael L. Burns ◽  
Michael R. Mathis ◽  
John Vandervest ◽  
Xinyu Tan ◽  
Bo Lu ◽  
...  

Abstract Background Accurate anesthesiology procedure code data are essential to quality improvement, research, and reimbursement tasks within anesthesiology practices. Advanced data science techniques, including machine learning and natural language processing, offer opportunities to develop classification tools for Current Procedural Terminology codes across anesthesia procedures. Methods Models were created using a Train/Test dataset including 1,164,343 procedures from 16 academic and private hospitals. Five supervised machine learning models were created to classify anesthesiology Current Procedural Terminology codes, with accuracy defined as first choice classification matching the institutional-assigned code existing in the perioperative database. The two best performing models were further refined and tested on a Holdout dataset from a single institution distinct from Train/Test. A tunable confidence parameter was created to identify cases for which models were highly accurate, with the goal of at least 95% accuracy, above the reported 2018 Centers for Medicare and Medicaid Services (Baltimore, Maryland) fee-for-service accuracy. Actual submitted claim data from billing specialists were used as a reference standard. Results Support vector machine and neural network label-embedding attentive models were the best performing models, respectively, demonstrating overall accuracies of 87.9% and 84.2% (single best code), and 96.8% and 94.0% (within top three). Classification accuracy was 96.4% in 47.0% of cases using support vector machine and 94.4% in 62.2% of cases using label-embedding attentive model within the Train/Test dataset. In the Holdout dataset, respective classification accuracies were 93.1% in 58.0% of cases and 95.0% among 62.0%. The most important feature in model training was procedure text. Conclusions Through application of machine learning and natural language processing techniques, highly accurate real-time models were created for anesthesiology Current Procedural Terminology code classification. The increased processing speed and a priori targeted accuracy of this classification approach may provide performance optimization and cost reduction for quality improvement, research, and reimbursement tasks reliant on anesthesiology procedure codes.
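
The tunable confidence parameter can be illustrated with a simple thresholding scheme: auto-assign a code only when the classifier's top-class probability clears a threshold, and route the rest to human coders. The sketch below is an assumption-laden illustration with synthetic data, not the study's implementation.

```python
# Illustrative confidence-thresholded classification, not the study's code.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for procedure-text features and procedure-code labels.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Calibrate a linear SVM so its scores behave like class probabilities.
clf = CalibratedClassifierCV(LinearSVC()).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)
confidence = proba.max(axis=1)
pred = clf.classes_[proba.argmax(axis=1)]

threshold = 0.8  # the tunable confidence parameter
auto = confidence >= threshold
print(f"auto-coded {auto.mean():.0%} of cases "
      f"at {(pred[auto] == y_te[auto]).mean():.1%} accuracy")
```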


Author(s):  
Piotr Tynecki ◽  
Arkadiusz Guziński ◽  
Joanna Kazimierczak ◽  
Michał Jadczuk ◽  
Jarosław Dastych ◽  
...  

Abstract Background As antibiotic resistance is becoming a major problem in the treatment of infections, bacteriophages (also known as phages) seem to be an alternative. However, to be used in a therapy, their life cycle should be strictly lytic. With the growing popularity of Next Generation Sequencing (NGS) technology, it is possible to gain such information from the genome sequence. A number of tools are available which help to define a phage's life cycle. However, there is still no unanimous way to deal with this problem, especially in the absence of well-defined open reading frames. To overcome this limitation, a new tool is needed. Results We developed a novel tool, called PhageAI, that provides access to more than 10 000 publicly available bacteriophages and differentiates between their major types of life cycles: lytic and lysogenic. The tool includes a life cycle classifier which achieved 98.90% accuracy on a validation set and 97.18% average accuracy on a test set. We adopted nucleotide sequence embedding based on the Word2Vec Skip-gram model and a linear Support Vector Machine with 10-fold cross-validation for supervised classification. PhageAI is free of charge and is available at https://phage.ai/. PhageAI is a REST web service and is also available as a Python package. Conclusions Machine learning and Natural Language Processing allow information to be extracted from bacteriophage nucleotide sequences for life cycle prediction tasks. The PhageAI tool classifies phages as either virulent or temperate with higher accuracy than existing methods and provides interactive 3D visualization to help interpret the model's classification results.
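
The representation described above can be sketched as follows, assuming gensim and scikit-learn: genomes are split into k-mer "words", a skip-gram Word2Vec embeds them, per-phage vectors are averaged, and a linear SVM separates lytic from temperate. The synthetic genomes, k-mer length, and vector size are assumptions, not PhageAI's actual parameters.

```python
# Hedged sketch: k-mer tokenisation, skip-gram Word2Vec, and a linear SVM.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import LinearSVC

def kmers(seq, k=6):
    # Split a nucleotide sequence into overlapping k-mer "words".
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Synthetic genomes and alternating life-cycle labels (1 = lytic, 0 = temperate).
rng = np.random.default_rng(0)
genomes = ["".join(rng.choice(list("ACGT"), size=300)) for _ in range(20)]
labels = np.array([i % 2 for i in range(20)])

sentences = [kmers(g) for g in genomes]
w2v = Word2Vec(sentences, vector_size=100, sg=1, window=5, min_count=1)  # sg=1 selects skip-gram

# One embedding per phage: the mean of its k-mer vectors.
X = np.array([np.mean([w2v.wv[word] for word in s], axis=0) for s in sentences])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X[:3]))
```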


2020 ◽  
Vol 4 (1) ◽  
pp. 208
Author(s):  
Albert Yakobus Chandra ◽  
Didik Kurniawan ◽  
Rahmat Musa

A common situation in institutions such as micro enterprises is that staff provide information services and carry out transactions manually for customers. This cycle repeats from one customer to the next, so when the customer queue is crowded, the staff workload increases and so does the risk of transaction errors. Information technology and artificial intelligence are advancing rapidly in the Industry 4.0 era. One such advance is Natural Language Processing (NLP), a branch of machine learning that focuses on enabling computers to understand human language and respond to it. In this research, a chatbot system is therefore built to provide information and conduct transactions with customers. The chatbot is developed using the Dialogflow tools provided by Google and is expected to be an alternative that various businesses can implement to provide better service to customers.
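
For context, calling a Dialogflow agent from Python typically follows the pattern of Google's detect-intent quickstart, as in the hedged sketch below; the project ID, session ID, and language code are placeholders, and the agent's intents live in the Dialogflow console rather than in this code.

```python
# Hedged sketch of querying a Dialogflow agent; identifiers are placeholders.
from google.cloud import dialogflow

def detect_intent(project_id, session_id, text, language_code="id"):
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path(project_id, session_id)

    text_input = dialogflow.TextInput(text=text, language_code=language_code)
    query_input = dialogflow.QueryInput(text=text_input)

    response = session_client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    # The agent's configured reply for the matched intent.
    return response.query_result.fulfillment_text

# Example usage (requires a configured agent and credentials):
# print(detect_intent("my-project-id", "session-123", "Berapa saldo saya?"))
```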


2013 ◽  
Vol 427-429 ◽  
pp. 2572-2575
Author(s):  
Xiao Hua Li ◽  
Shu Xian Liu

This article first provides a brief introduction to Natural Language Processing and basic knowledge of Machine Learning and the Support Vector Machine, then gives a more detailed account of how SVM models are used in several major areas of NLP, and finally offers a brief summary of the application of SVM in Natural Language Processing.
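
For reference, the soft-margin SVM at the centre of these applications solves the following standard optimisation problem (a textbook formulation, not reproduced from the article):

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0,
$$

where $\mathbf{x}_i$ is the feature vector of the $i$-th training example (in NLP tasks, typically a bag-of-words or TF-IDF vector), $y_i \in \{-1, +1\}$ is its label, and $C$ trades off margin width against training errors.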

