Anders Søgaard: Semi-Supervised Learning and Domain Adaptation in Natural Language Processing

This work aims to define and evaluate different techniques for automatically building temporal news sequences. The proposed approach consists of three steps: (i) near-duplicate document detection; (ii) keyword extraction; (iii) news sequence creation. The approach is based on Natural Language Processing, Information Extraction, Named Entity Recognition, and supervised learning algorithms. The proposed methodology achieved a precision of 93.1% for creating news chain sequences.
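As an illustration of step (i) only, near-duplicate detection can be sketched with word shingles and Jaccard similarity. The abstract does not say which technique the authors used, so the method, the function names (`shingles`, `jaccard`, `near_duplicates`), and the threshold of 0.5 are all assumptions made for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return index pairs (i, j), i < j, whose shingle sets overlap enough.

    The threshold is an assumed value, not one reported in the abstract.
    """
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise comparison is quadratic in the number of documents, which is acceptable for a sketch; a larger collection would likely need an approximate scheme such as MinHash.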

Classifying Fake News Articles Using Natural Language Processing to Identify In-Article Attribution as a Supervised Learning Estimator

2019 IEEE 13th International Conference on Semantic Computing (ICSC) ◽

10.1109/icosc.2019.8665593 ◽

2019 ◽

Cited By ~ 5

Author(s):

Terry Traylor ◽

Jeremy Straub ◽

Gurmeet ◽

Nicholas Snell

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Supervised Learning ◽

Language Processing ◽

Fake News

Domain Adaptation of General Natural Language Processing Tools for a Patent Claim Visualization System

Lecture Notes in Computer Science - Multidisciplinary Information Retrieval ◽

10.1007/978-3-642-41057-4_8 ◽

2013 ◽

pp. 70-82 ◽

Cited By ~ 1

Author(s):

Linda Andersson ◽

Mihai Lupu ◽

Allan Hanbury

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Adaptation ◽

Patent Claim ◽

Visualization System

Extractive Text Summarization Using Supervised Learning and Natural Language Processing

2021 International Conference on Intelligent Technologies (CONIT) ◽

10.1109/conit51480.2021.9498322 ◽

2021 ◽

Author(s):

Sarita Mandal ◽

Priya Achary ◽

Shubhada Phalke ◽

KVK Poorvaja ◽

Madhuri Kulkarni

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Supervised Learning ◽

Language Processing ◽

Text Summarization

Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing - SemiSupLearn '09

10.3115/1621829 ◽

2009 ◽

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Supervised Learning ◽

Language Processing

Comparative analysis of methods for solving the text sentiment analysis problem.

Computer-Integrated Technologies: Education, Science, Production (КОМП’ЮТЕРНО-ІНТЕГРОВАНІ ТЕХНОЛОГІЇ: ОСВІТА, НАУКА, ВИРОБНИЦТВО) ◽

10.36910/6775-2524-0560-2020-40-21 ◽

2020 ◽

pp. 140-145

Author(s):

С. Мироненко ◽

Є. Онищенко

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Supervised Learning ◽

Language Processing ◽

3D CNN

This article examines a supervised learning approach to a Natural Language Processing (NLP) problem, namely sentiment analysis of text data. Four different classifiers were implemented on the same data sample and compared by training time, testing time, and classification accuracy. The study found that the best of the implemented methods is a 3D CNN model that uses the BERT tokenizer for text preprocessing. According to the authors, it was the use of BERT for text preprocessing that allowed this method to achieve the best results.
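The comparison described in this abstract (training time, testing time, and accuracy on one shared dataset) can be sketched as a small timing harness. The two toy classifiers below are hypothetical stand-ins chosen to keep the example self-contained; they are not the four models from the article, and the names `benchmark`, `MajorityClassifier`, and `KeywordClassifier` are assumptions.

```python
import time


class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = max(set(labels), key=labels.count)
        return self

    def predict(self, texts):
        return [self.label for _ in texts]


class KeywordClassifier:
    """Toy baseline: predicts 'pos' if any listed word occurs, else 'neg'."""

    def __init__(self, positive_words):
        self.positive_words = set(positive_words)

    def fit(self, texts, labels):
        return self  # nothing to learn; the word list is fixed

    def predict(self, texts):
        return [
            "pos" if set(t.lower().split()) & self.positive_words else "neg"
            for t in texts
        ]


def benchmark(models, train, test):
    """Fit and evaluate each model on the same split, timing both phases."""
    train_x, train_y = train
    test_x, test_y = test
    results = {}
    for name, model in models.items():
        t0 = time.perf_counter()
        model.fit(train_x, train_y)
        t1 = time.perf_counter()
        preds = model.predict(test_x)
        t2 = time.perf_counter()
        correct = sum(p == y for p, y in zip(preds, test_y))
        results[name] = {
            "train_s": t1 - t0,
            "test_s": t2 - t1,
            "accuracy": correct / len(test_y),
        }
    return results
```

Any model exposing `fit` and `predict` in this shape could be dropped into `models`, so every candidate is measured on the same data under the same conditions.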

Weakly Supervised Learning for Categorization of Medical Inquiries for Customer Service Effectiveness

Frontiers in Research Metrics and Analytics ◽

10.3389/frma.2021.683400 ◽

2021 ◽

Vol 6 ◽

Author(s):

Shikha Singhal ◽

Bharat Hegde ◽

Prathamesh Karmalkar ◽

Justna Muhith ◽

Harsha Gurulingappa

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Supervised Learning ◽

Language Processing ◽

Medical Information ◽

Service Providers ◽

Training Data ◽

Weakly Supervised Learning ◽

Weakly Supervised

With the growth of unstructured data in healthcare and pharmaceuticals, there has been a sharp rise in the adoption of natural language processing for generating actionable insights from text data sources. One of the key areas of our exploration is the Medical Information function within our organization. We receive a significant number of medical information inquiries in the form of unstructured text. An enterprise-level solution must deal with medical information interactions across multiple communication channels, which are nuanced with a variety of keywords and emotions unique to the pharmaceutical industry. There is a strong need for an effective solution that leverages the contextual knowledge of the medical information business, along with the digital tenets of natural language processing (NLP) and machine learning, to build an automated and scalable process that generates real-time insights on conversation categories. Traditional supervised learning methods rely on a large set of manually labeled training data, and such a dataset is difficult to obtain because of high labeling costs. Thus, the solution is incomplete without the ability to self-learn and improve. This necessitates techniques to automatically build relevant training data using a weakly supervised approach from textual inquiries across consumers, healthcare professionals, sales, and service providers. The solution has two fundamental layers of NLP and machine learning. The first layer leverages heuristics and a knowledge base to identify the potential categories and build annotated training data. The second layer, based on machine learning and deep learning, uses the training data generated by the heuristic approach to identify the categories and sub-categories associated with the verbatim text. Here, we present a novel approach harnessing the power of weakly supervised learning combined with multi-class classification for improved categorization of medical information inquiries.
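The two-layer design described in this abstract can be sketched roughly as follows: keyword heuristics assign weak labels (layer 1), and a classifier trained on those labels handles inquiries the heuristics cannot label (layer 2). The abstract does not list its heuristics, categories, or models, so the `RULES` table, the category names, and the choice of a small naive Bayes classifier are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword rules; the source does not publish its heuristics.
RULES = {
    "adverse_event": {"rash", "nausea", "headache"},
    "dosage": {"dose", "dosage", "mg"},
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def weak_label(text):
    """Layer 1: return a category if exactly one rule fires, else abstain."""
    tokens = set(tokenize(text))
    hits = [cat for cat, keywords in RULES.items() if tokens & keywords]
    return hits[0] if len(hits) == 1 else None


def train_naive_bayes(texts, labels):
    """Layer 2: collect per-class word counts from weakly labeled text."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Pick the class with the highest Laplace-smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


def build_classifier(inquiries):
    """Label inquiries with the heuristics, then train on the labeled subset."""
    labeled = [(t, weak_label(t)) for t in inquiries]
    labeled = [(t, y) for t, y in labeled if y is not None]
    if not labeled:
        raise ValueError("no inquiry matched any heuristic rule")
    texts, labels = zip(*labeled)
    return train_naive_bayes(texts, labels)
```

The point of the second layer is generalization: an inquiry containing none of the rule keywords can still be categorized from the surrounding vocabulary that the weakly labeled examples share with it.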
