Classification and generation of grammatical errors.

The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in the digital and written form; the misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address the challenge, and in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms on extracted feature data, in order to classify and predict the grammaticality of a sentence.

Sentiment Analysis on Twitter Data of World Cup Soccer Tournament Using Machine Learning

IoT ◽

10.3390/iot1020014 ◽

2020 ◽

Vol 1 (2) ◽

pp. 218-239 ◽

Cited By ~ 2

Author(s):

Ravikumar Patel ◽

Kalpdrum Passi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Natural Language ◽

Language Processing ◽

Machine Learning Algorithms ◽

World Cup ◽

Part Of Speech ◽

Twitter Data ◽

In the derived approach, an analysis is performed on Twitter data for World Cup soccer 2014 held in Brazil to detect the sentiment of the people throughout the world using machine learning techniques. By filtering and analyzing the data using natural language processing techniques, sentiment polarity was calculated based on the emotion words detected in the user tweets. The dataset is normalized to be used by machine learning algorithms and prepared using natural language processing techniques like word tokenization, stemming and lemmatization, part-of-speech (POS) tagger, name entity recognition (NER), and parser to extract emotions for the textual data from each tweet. This approach is implemented using Python programming language and Natural Language Toolkit (NLTK). A derived algorithm extracts emotional words using WordNet with its POS (part-of-speech) for the word in a sentence that has a meaning in the current context, and is assigned sentiment polarity using the SentiWordNet dictionary or using a lexicon-based method. The resultant polarity assigned is further analyzed using naïve Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17% whereas random forest gives the best area under the receiver operating characteristics curve (AUC) of 0.97.

Appearance of New Terms in Accounting Language: A Preliminary Examination of Accounting Pronouncements and Financial Statements

Journal of Emerging Technologies in Accounting ◽

10.2308/jeta.2008.5.1.17 ◽

2008 ◽

Vol 5 (1) ◽

pp. 17-36 ◽

Cited By ~ 4

Author(s):

Margaret R. Garnsey ◽

Ingrid E. Fisher

Keyword(s):

Information Retrieval ◽

Natural Language ◽

Language Processing ◽

Preliminary Analysis ◽

Financial Statements ◽

Preliminary Examination ◽

Statistical Natural Language Processing ◽

Processing Techniques ◽

Initial Results

ABSTRACT: Accounting language evolves as the transactions and organizations it provides guidance for change. We provide a preliminary analysis of terms used in official accounting pronouncements and annual corporate financial statements. Initial results show statistical natural language-processing techniques provide a means of identifying new terms as they enter the lexicon. These techniques should be valuable in deriving a complete accounting lexicon as well as in constructing and maintaining an accounting thesaurus to support information retrieval.

Information Search Mechanisms for Government Entities using Machine Learning and Natural Language Processing Techniques

International Journal of Computer Applications ◽

10.5120/ijca2020920150 ◽

2020 ◽

Vol 176 (21) ◽

pp. 1-7

Author(s):

Ricardo Ponciano ◽

João Santos ◽

João Isento

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽

Information Search ◽

Computerized Answer Grading

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35044 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 618-619

Author(s):

Anurag Langan

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽

Computer Technology ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Grade Student ◽

Grading student answers is a tedious and time-consuming task. A study had found that almost on average around 25% of a teacher's time is spent in scoring the answer sheets of students. This time could be utilized in much better ways if computer technology could be used to score answers. This system will aim to grade student answers using the various Natural Language processing techniques and Machine Learning algorithms available today.

Extracting Data at Scale: Machine learning at the Natural History Museum

Biodiversity Information Science and Standards ◽

10.3897/biss.5.74031 ◽

2021 ◽

Vol 5 ◽

Author(s):

Ben Scott ◽

Laurence Livermore

Keyword(s):

Machine Learning ◽

Natural History ◽

Natural Language ◽

Functional Traits ◽

Language Processing ◽

Scientific Discovery ◽

Natural History Museum ◽

Grammatical Structure ◽

History Museum

The Natural History Museum holds over 80 million specimens and 300 million pages of scientific text. This information is a vital research tool to help solve the most important challenge humans face over the coming years – mapping a sustainable future for ourselves and the ecosystems on which we depend. Digitising these collections and providing the data in a structured, computable form is a mammoth challenge. As of 2020, less than 15% of available specimen information currently residing on specimen labels or physical registers is digitised and publicly available (Walton et al. 2020). Machine learning applications can deliver a step-change in our activities’ scope, scale, and speed (Borsch et al. 2020). As part of SYNTHESYS+, the Natural History Museum is leading on the development of a cloud-based workflow platform for natural science specimens, the Specimen Data Refinery (SDR) (Smith et al. 2019). The SDR will provide a series of Machine Learning (ML) models, ranging from semantic segmentation to identify regions of interest on labels, to natural language processing to extract locality and taxonomic text entities from the labels, and image analysis to identify specimen traits and collection quality metrics. Each ML task is atomic, with users of the SDR selecting which model would best extract data from their digitised specimen images, allowing the workflows to be used in different institutions worldwide. It also solves one of the key problems in developing ML-based applications: the rapidity at which models become obsolete. New ML models can be introduced into the workflow, with incremental changes to improve processing, without interruption or refactoring of the pipeline. Alongside specimens, digitised images of pages of scientific literature provide another vital source of data. Functional traits mediate the interactions between plant species and their environment and play roles in determining species’ range size and threatened status. Such information is contained within the taxonomic descriptions of species and a natural language processing library has been developed to locate and extract plant functional traits from these texts (Hoehndorf et al. 2016). The ML models allow complex interrelationships between taxa and trait entities to be inferred based on the grammatical structure of sentences, improving the accuracy and extent of data point extraction. These two projects, like many other applications of ML in natural history collections, are focused on the extraction of visible information, for example, a piece of text or a measurable trait. Given the image of the specimen or page, a person would be able to extract the self-same information. However, ML excels in pattern matching and inferring unknown characters from an entire corpus. At the museum, we have started exploring this space, with our voyagerAI project for identifying specimens collected on historical expeditions of scientific discovery (e.g., the voyages of the Beagle and Challenger). This process fills in the gaps in specimen provenance and identifies 'lost' specimens collected by some of the most famous names in biodiversity history. Developing new applications of ML to uncover scientific meaning and tell the narratives of our collections, will be at the forefront of our scientific innovation in the coming years. This presentation will give an overview of these projects, and our future plans for using ML to extract data at scale within the Natural History Museum.

A New Method to Identify Short-Text Authors Using Combinations of Machine Learning and Natural Language Processing Techniques

Procedia Computer Science ◽

10.1016/j.procs.2019.09.197 ◽

2019 ◽

Vol 159 ◽

pp. 428-436

Author(s):

Biveeken Vijayakumar ◽

Muhammad Marwan Muhammad Fuad

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽

New Method ◽

Short Text ◽

Artificial Intelligence and Investing

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch041 ◽

2011 ◽

pp. 237-240 ◽

Cited By ~ 1

Author(s):

Roy Rada

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Risk Management ◽

Natural Language ◽

Language Processing ◽

Asset Valuation ◽

Stock Valuation ◽

Knowledge Based ◽

The techniques of artificial intelligence include knowledgebased, machine learning, and natural language processing techniques. The discipline of investing requires data identification, asset valuation, and risk management. Artificial intelligence techniques apply to many aspects of financial investing, and published work has shown an emphasis on the application of knowledge-based techniques for credit risk assessment and machine learning techniques for stock valuation. However, in the future, knowledge-based, machine learning, and natural language processing techniques will be integrated into systems that simultaneously address data identification, asset valuation, and risk management.

Intelligent compilation of patent summaries using machine learning and natural language processing techniques

Advanced Engineering Informatics ◽

10.1016/j.aei.2019.101027 ◽

2020 ◽

Vol 43 ◽

pp. 101027 ◽

Cited By ~ 2

Author(s):

Amy J.C. Trappey ◽

Charles V. Trappey ◽

Jheng-Long Wu ◽

Jack W.C. Wang

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽

Answer Script Evaluation using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35070 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 849-852

Author(s):

Dr. K. Suresh

Keyword(s):

Machine Learning ◽

Natural Language ◽

Computational Methods ◽

Language Processing ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Text Extraction ◽

The current way of checking answer scripts is hectic for the college. They need to manually check the answers and allocate the marks to the students. Our proposed system uses Machine Learning and Natural Language Processing techniques to beat this. Machine learning algorithms use computational methods to find out directly from data without hopping on predetermined rules. NLP algorithms identify specific entities within the text, explore for key elements during a document, run a contextual search for synonyms and detect misspelled words or similar entries, and more. Our algorithm performs similarity checking and also the number of words associated with the question exactly matched between two documents. It also checks whether the grammar is correctly used or not within the student's answer. Our proposed system performs text extraction and evaluation of marks by applying Machine Learning and Natural Language Processing techniques.