Cue Phrase Classification Using Machine Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.327 ◽

1996 ◽

Vol 5 ◽

pp. 53-94 ◽

Cited By ~ 20

Author(s):

D. J. Litman

Keyword(s):

Machine Learning ◽

Language Processing ◽

Structural Information ◽

Plan Recognition ◽

Discourse Structure ◽

Anaphora Resolution ◽

Classification Models ◽

Feature Representations ◽

Learning Programs ◽

Manual Methods

Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique not only for automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.


Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similar to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be used easily for proteins with low sequence similarities.
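The summation step described in the abstract can be illustrated with a minimal sketch. This is not the Mol2vec implementation: the substructure identifiers, the toy embedding table, and the choice to skip unseen substructures are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def compound_vector(substructures: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Encode a compound as the sum of its substructure vectors."""
    total = [0.0] * dim
    for identifier in substructures:
        vector = embeddings.get(identifier)
        if vector is None:
            # Assumption for this sketch: ignore substructures that have
            # no learned embedding.
            continue
        for i, value in enumerate(vector):
            total[i] += value
    return total


# Hypothetical 3-dimensional embeddings for two made-up substructures.
toy_embeddings = {
    "sub_a": [1.0, 0.0, 2.0],
    "sub_b": [0.5, 1.5, -1.0],
}

# A compound in which "sub_a" occurs twice and one substructure is unseen.
vec = compound_vector(["sub_a", "sub_b", "sub_a", "unknown"],
                      toy_embeddings, dim=3)
print(vec)
```

Because the encoding is a plain sum, a substructure that occurs several times contributes several times, and the resulting vector has the same dimensionality as the substructure embeddings regardless of compound size.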


The Human Intelligence vs. Artificial Intelligence: Issues and Challenges in Computer Assisted Language Learning

International Journal of English Linguistics ◽

10.5539/ijel.v8n5p259 ◽

2018 ◽

Vol 8 (5) ◽

pp. 259

Author(s):

Mohammed Ali

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Human Brain ◽

Language Learning ◽

Language Processing ◽

Human Intelligence ◽

Computer Assisted ◽

Learning Programs ◽

Face Detection And Recognition ◽

Learning Software

In this study, the researcher has advocated the importance of human intelligence in language learning, since software or any Learning Management System (LMS) cannot be programmed to understand the human context as well as all the linguistic structures contextually. This study examined the extent to which language learning is perilous to machine learning and its programs such as Artificial Intelligence (AI), Pattern Recognition, and Image Analysis, which are used in many assistive learning techniques such as voice detection, face detection and recognition, and personalized assistants, besides language learning programs. The researcher argues that language learning is closely associated with human intelligence and human neural networks, and that no computer or software can claim to replace or replicate those functions of the human brain. This study thus posed a challenge to natural language processing (NLP) techniques that claim to have taught a computer how to understand the way humans learn, to understand text without any clue or calculation, to realize the ambiguity in human languages in terms of the juxtaposition between the context and the meaning, and also to automate the language learning process between computers and humans. The study cites evidence of deficiencies in such machine learning software and gadgets to prove that, in spite of all technological advancements, there remain areas of the human brain and human intelligence where a computer or its software cannot enter. These deficiencies highlight the limitations of AI and super intelligence systems of machines to prove that human intelligence would always remain superior.


Corpus-Based Methods for Recognizing the Gender of Anthroponyms

Names ◽

10.5195/names.2021.2238 ◽

2021 ◽

Vol 69 (3) ◽

pp. 16-27

Author(s):

Rogelio Nazar ◽

Irene Renau ◽

Nicolas Acosta ◽

Hernan Robledo ◽

Maha Soliman ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Information Extraction ◽

Machine Translation ◽

Language Processing ◽

Large Scale ◽

Anaphora Resolution ◽

Machine Learning Methods ◽

Manual Processing ◽

Large Corpus

This paper presents a series of methods for automatically determining the gender of proper names, based on their co-occurrence with words and grammatical features in a large corpus. Although the results obtained were for Spanish given names, the method presented here can be easily replicated and used for names in other languages. Most methods reported in the literature use pre-existing lists of first names that require costly manual processing and tend to become quickly outdated. Instead, we propose using corpora. Doing so offers the possibility of obtaining real and up-to-date name-gender links. To test the effectiveness of our method, we explored various machine-learning methods as well as another method based on simple frequency of co-occurrence. The latter produced the best results: 93% precision and 88% recall on a database of ca. 10,000 mixed names. Our method can be applied to a variety of natural language processing tasks such as information extraction, machine translation, anaphora resolution or large-scale delivery or email correspondence, among others.
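A frequency-of-co-occurrence classifier of the kind mentioned in the abstract might look like the sketch below. The marker words, the one-token context window, and the definitions of precision (correct labels over labeled names) and recall (correct labels over all names) are assumptions for illustration; the abstract does not specify the paper's actual features or scoring.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical gender-marking context words; the features actually used in
# the paper are not listed in the abstract.
MASCULINE_MARKERS = {"señor", "don"}
FEMININE_MARKERS = {"señora", "doña"}


def guess_gender(name: str, sentences: List[List[str]]) -> Optional[str]:
    """Label a name by which marker class precedes it more often."""
    counts: Counter = Counter()
    for tokens in sentences:
        for i in range(1, len(tokens)):
            if tokens[i] != name:
                continue
            previous = tokens[i - 1].lower()
            if previous in MASCULINE_MARKERS:
                counts["M"] += 1
            elif previous in FEMININE_MARKERS:
                counts["F"] += 1
    if counts["M"] == counts["F"]:
        return None  # no evidence or a tie: leave the name unlabeled
    return "M" if counts["M"] > counts["F"] else "F"


def precision_recall(gold: Dict[str, str],
                     sentences: List[List[str]]) -> Tuple[float, float]:
    """Score predictions against gold labels (definitions assumed above)."""
    predictions = {name: guess_gender(name, sentences) for name in gold}
    labeled = [n for n, g in predictions.items() if g is not None]
    correct = sum(1 for n in labeled if predictions[n] == gold[n])
    precision = correct / len(labeled) if labeled else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy tokenized corpus and gold labels, invented for this example.
corpus = [
    ["La", "señora", "Carmen", "llegó"],
    ["Doña", "Carmen", "habló"],
    ["El", "señor", "Pedro", "salió"],
    ["don", "Pedro", "respondió"],
    ["Alex", "llegó"],
]
gold_labels = {"Carmen": "F", "Pedro": "M", "Alex": "M"}
precision, recall = precision_recall(gold_labels, corpus)
print(precision, recall)
```

In the toy corpus, "Alex" never follows a marker and stays unlabeled, which lowers recall without affecting precision. That is one way precision can exceed recall, as it does in the figures reported in the abstract.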


Text Classification Models for the Automatic Detection of Nonmedical Prescription Medication Use from Social Media

10.1101/2020.04.13.20064089 ◽

2020 ◽

Author(s):

Ali Al-Garadi Mohammed ◽

Yuan-Chi Yang ◽

Haitao Cai ◽

Yucheng Ruan ◽

Karen O’Connor ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Language Processing ◽

Text Classification ◽

Prescription Medication ◽

The United States ◽

Language Models ◽

Classification Model ◽

Classification Models ◽

Active Monitoring

Prescription medication (PM) misuse/abuse has emerged as a national crisis in the United States, and social media has been suggested as a potential resource for performing active monitoring. However, automating a social media-based monitoring system is challenging, requiring advanced natural language processing (NLP) and machine learning methods. In this paper, we describe the development and evaluation of automatic text classification models for detecting self-reports of PM abuse from Twitter. We experimented with state-of-the-art bi-directional transformer-based language models, which utilize tweet-level representations that enable transfer learning (e.g., BERT, RoBERTa, XLNet, AlBERT, and DistilBERT), proposed fusion-based approaches, and compared the developed models with several traditional machine learning, including deep learning, approaches. Using a public dataset, we evaluated the performances of the classifiers on their abilities to classify the non-majority “abuse/misuse” class. Our proposed fusion-based model performs significantly better than the best traditional model (F1-score [95% CI]: 0.67 [0.64-0.69] vs. 0.45 [0.42-0.48]). We illustrate, via experimentation using differing training set sizes, that the transformer-based models are more stable and require less annotated data compared to the other models. The significant improvements achieved by our best-performing classification model over past approaches make it suitable for automated continuous monitoring of nonmedical PM use from Twitter.
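The abstract reports an F1-score for the non-majority class. As a reminder of how that metric is computed, here is a small sketch of per-class precision, recall, and F1; the label names and the toy predictions are invented, and the confidence-interval computation used in the paper is not reproduced.

```python
from typing import Sequence, Tuple


def class_f1(y_true: Sequence[str],
             y_pred: Sequence[str],
             positive: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a single class of interest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented labels: "abuse" plays the role of the minority class.
y_true = ["abuse", "other", "abuse", "other", "other", "abuse"]
y_pred = ["abuse", "abuse", "other", "other", "other", "abuse"]
p, r, f1 = class_f1(y_true, y_pred, positive="abuse")
print(p, r, f1)
```

Scoring only the minority class avoids the inflated numbers that overall accuracy would give when most posts belong to the majority class.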


Image Synthesis Based On Feature Description

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37812 ◽

2021 ◽

Vol 9 (8) ◽

pp. 2492-2494

Author(s):

Rohan Bolusani

Keyword(s):

Neural Network ◽

Machine Learning ◽

Language Processing ◽

Image Synthesis ◽

Generative Models ◽

Generative Adversarial Networks ◽

Feature Representations ◽

Adversarial Networks ◽

Text Feature ◽

Image Modelling

Generating realistic images from text is innovative and interesting, but modern-day machine learning models are still far from this goal. With research and development in the field of natural language processing, neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, in the field of machine learning, generative adversarial networks (GANs) have begun to generate extremely accurate images, especially in categories such as faces, album covers, and room interiors. In this work, the main goal is to develop a neural network to bridge these advances in text and image modelling. By essentially translating characters to pixels, the project will demonstrate the capability of generative models by taking detailed text descriptions and generating plausible images. Keywords: Deep Learning, Computer Vision, NLP, Generative Adversarial Networks


A Machine Learning Approach to Anaphora Resolution in Arabic

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v9i12.4786 ◽

2014 ◽

Vol 9 (12) ◽

pp. 1956

Author(s):

Abdullatif Abolohom ◽

Nazlia Omar

Keyword(s):

Machine Learning ◽

Learning Approach ◽

Anaphora Resolution ◽

Machine Learning Approach


Deep Learning Based High-Resolution Remote Sensing Image Classification

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i10.384 ◽

2017 ◽

Vol 7 (10) ◽

pp. 22

Author(s):

Sumit Kaur

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Deep Learning ◽

Image Classification ◽

Language Processing ◽

Object Perception ◽

Remote Sensing Image ◽

Research Area ◽

Remote Sensing Image Classification ◽

Unsupervised Algorithms

Deep learning is an emerging research area in the machine learning and pattern recognition field, which has been presented with the goal of drawing machine learning nearer to one of its unique objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and solving different kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and make it self-learning for hierarchical representation for classification. In recent years, it has attracted much attention due to its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering, and natural language processing. This paper presents a survey of different deep learning techniques for remote sensing image classification.


Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing - FeatureEng '05

10.3115/1610230 ◽

2005 ◽

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Feature Engineering


A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic (Preprint)

10.2196/preprints.25320 ◽

2020 ◽

Cited By ~ 1

Author(s):

Rohan Pandey ◽

Vaibhav Gautam ◽

Ridam Pal ◽

Harsh Bandhey ◽

Lovedeep Singh Dhingra ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

User Feedback ◽

Who Guidelines ◽

The Times ◽

The Right ◽

Local Languages

BACKGROUND: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, and effective, and that continuously learn the new patterns of misinformation. OBJECTIVE: We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS: We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. RESULTS: A total of 5026 people downloaded the app during the study window; among those, 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of integrated AI chatbot “Satya” increased, thus proving the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS: We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICAL TRIAL: Not applicable.


Race and Gender

The Oxford Handbook of Ethics of AI ◽

10.1093/oxfordhb/9780190067397.013.16 ◽

2020 ◽

pp. 251-269 ◽

Cited By ~ 2

Author(s):

Timnit Gebru

Keyword(s):

Machine Learning ◽

Language Processing ◽

The United States ◽

Error Rates ◽

Political Factors ◽

Recidivism Rates ◽

Race And Gender ◽

Decision Tools ◽

And Gender ◽

Technical Solutions

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.
