Automated Brain Tumor Prediction System using Natural Language Processing (NLP)

Author(s):  
Gourav Sharma

In this paper, we propose an Automated Brain Tumor Prediction System that predicts brain tumors from symptoms shared across several diseases using Natural Language Processing (NLP). Term Frequency-Inverse Document Frequency (TF-IDF) is used to calculate term weights over the symptoms of the different diseases. The Cosine Similarity Measure and Euclidean Distance are used to calculate the angular and linear distance, respectively, between diseases and symptoms, yielding a ranking of brain tumor among the candidate diseases. A novel mathematical strategy is then used to predict the chance of a brain tumor from the queried symptoms. Under this strategy, the chance of a brain tumor is proportional to the similarity value obtained for brain tumor when the symptoms are queried, and inversely proportional both to the rank of brain tumor among the diseases and to its maximum similarity value, i.e., the similarity obtained when all brain tumor symptoms are present in the query.
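To make the ranking-and-chance computation concrete, here is a minimal sketch, assuming scikit-learn; the disease descriptions, the query, and the exact normalization of the chance score are illustrative assumptions, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Hypothetical symptom descriptions; the paper's disease corpus is not given.
diseases = {
    "brain tumor": "headache nausea vomiting blurred vision seizure confusion",
    "migraine":    "headache nausea sensitivity to light aura",
    "flu":         "fever cough sore throat fatigue headache",
}
names = list(diseases)
vectorizer = TfidfVectorizer()
disease_vecs = vectorizer.fit_transform(diseases.values())

def brain_tumor_chance(query: str) -> float:
    q = vectorizer.transform([query])
    sims = cosine_similarity(q, disease_vecs).ravel()     # angular closeness
    dists = euclidean_distances(q, disease_vecs).ravel()  # linear distance
    idx = names.index("brain tumor")
    rank = int((sims > sims[idx]).sum()) + 1              # rank among all diseases
    # Maximum similarity: the query that contains *all* brain tumor symptoms.
    max_sim = cosine_similarity(
        vectorizer.transform([diseases["brain tumor"]]), disease_vecs[idx]
    ).item()
    print({n: (round(s, 3), round(d, 3)) for n, s, d in zip(names, sims, dists)})
    # Chance proportional to similarity, inversely proportional to rank and max_sim.
    return sims[idx] / (rank * max_sim)

print(round(brain_tumor_chance("headache blurred vision seizure"), 3))
```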

2021
Author(s):  
Raphael Souza de Oliveira ◽  
Erick Giovani Sperandio Nascimento

The Brazilian legal system postulates the expeditious resolution of judicial proceedings; however, legal courts are working under budgetary constraints and with reduced staff. As a way to face these restrictions, artificial intelligence (AI) has been tackling many complex problems in natural language processing (NLP). This work aims to detect the degree of similarity between judicial documents that can be achieved within an inference group using unsupervised learning, by applying three NLP techniques: term frequency-inverse document frequency (TF-IDF), Word2Vec CBoW, and Word2Vec Skip-gram, the last two specialized with a Brazilian Portuguese corpus. We developed a template for grouping lawsuits that is computed from the cosine distance between each element of the group and its centroid. The Ordinary Appeal was chosen as the reference document type because it moves legal proceedings on to a higher court and because a substantial contingent of such lawsuits is awaiting judgment. After the data-processing steps, the content of each document was transformed into a vector representation using the three NLP techniques. We found that specialized word-embedding models such as Word2Vec perform better, making it possible to advance the current state of the art in NLP applied to the legal sector.
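As a rough sketch of the grouping template (not the authors' code), the following assumes gensim's Word2Vec and NumPy, with toy tokenized lawsuit excerpts standing in for the real corpus; it represents each document as the mean of its word vectors and reports each document's cosine distance to the group centroid.

```python
import numpy as np
from gensim.models import Word2Vec

# Placeholder tokenized excerpts; the real data are Ordinary Appeal documents.
docs = [
    "recurso ordinario trabalhista salario".split(),
    "recurso ordinario verbas rescisorias".split(),
    "recurso ordinario horas extras".split(),
]

# sg=1 selects Skip-gram; sg=0 would select CBoW.
model = Word2Vec(sentences=docs, vector_size=50, min_count=1, sg=1, seed=42)

def doc_vector(tokens):
    # Represent a document as the mean of its word vectors.
    return np.mean([model.wv[t] for t in tokens if t in model.wv], axis=0)

vecs = np.stack([doc_vector(d) for d in docs])
centroid = vecs.mean(axis=0)

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Small distances to the centroid indicate a cohesive group of similar lawsuits.
for doc, v in zip(docs, vecs):
    print(" ".join(doc), "->", round(cosine_distance(v, centroid), 4))
```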


2019
Vol 7 (1)
pp. 1831-1840
Author(s):  
Bern Jonathan ◽  
Jay Idoan Sihotang ◽  
Stanley Martin

Introduction: Natural Language Processing (NLP) is a part of Artificial Intelligence and Machine Learning concerned with the interaction between computers and human (natural) languages. Sentiment analysis is a part of NLP that is often used to analyze written text and classify it as expressing positive, negative, or neutral sentiment, which is useful for knowing whether or not users like something. Zomato is an application for rating restaurants, and each rating includes a review that can be used for sentiment analysis. On this basis, the authors set out to predict the sentiment of these reviews. Method: Preprocessing of the reviews consists of lowercasing, tokenization, removal of numbers and punctuation, stop-word removal, and lemmatization. The reviews are then vectorized with term frequency-inverse document frequency (TF-IDF). We processed 150,000 reviews: reviews rated above 3 were labeled positive, those rated below 3 negative, and those rated exactly 3 neutral. The authors used a split test with 80% of the data for training and 20% for testing, and precision, recall, and accuracy as the metrics for evaluating a random forest classifier. The accuracy of this research is 92%. Result: The precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, respectively; the recall is 99%, 89%, and 73%. Average precision and recall are 93% and 87%. The 10 words that most affect the results are: "bad", "good", "average", "best", "place", "love", "order", "food", "try", and "nice".
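A compact sketch of this pipeline, assuming NLTK and scikit-learn, might look as follows; the three toy reviews and ratings are placeholders for the 150,000-review data set, and details such as the tokenizer and lemmatizer are assumptions.

```python
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

nltk.download("stopwords"); nltk.download("wordnet"); nltk.download("punkt")
lemmatizer = WordNetLemmatizer()
stops = set(stopwords.words("english"))

def preprocess(text):
    # Lowercase, strip numbers/punctuation, tokenize, drop stop words, lemmatize.
    text = text.lower().translate(str.maketrans("", "", string.punctuation + string.digits))
    return " ".join(lemmatizer.lemmatize(t) for t in nltk.word_tokenize(text) if t not in stops)

reviews = ["The food was great and the place was nice", "Bad service, never again", "It was okay"]
ratings = [5, 1, 3]  # toy data; the paper uses 150,000 Zomato reviews
labels = ["positive" if r > 3 else "negative" if r < 3 else "neutral" for r in ratings]

X = TfidfVectorizer().fit_transform([preprocess(r) for r in reviews])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```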


2021
pp. 1-20
Author(s):  
Aakanksha Singhal ◽  
Sharma D.K.

Natural language processing (NLP) is the component of Artificial Intelligence and Machine Learning that handles communication between machines and human (natural) languages, and sentiment identification within NLP is mostly used to interpret the terms people write as carrying positive or negative emotion. Here, Shannon's entropy is used to help determine whether or not people like something. Zomato is a ranking program for restaurants, and each assessment involves a restaurant review that can be used for the entropy-based evaluation; on this basis, the authors report the expected outcome of the analysis. The method used to pre-process the reviews is to lowercase all terms, tokenize, remove numbers and punctuation, remove stop words, and lemmatize. Term frequency-inverse document frequency (TF-IDF) is then used to construct word vectors. The data gathered comprise 150,000 reviews: reviews rated above 3 are labeled positive, those rated below 3 negative, and those rated exactly 3 neutral. The authors use a split evaluation, with 80% of the data for training and 20% for testing, and accuracy, recall, and precision as the criteria for evaluating a random forest classifier. The accuracy of this analysis is 92 percent. Precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, and recall is 99%, 89%, and 73%, with average precision and recall of 93% and 87%. "Poor", "great", "fair", "better", "location", "care", "request", "food", "seek" and "pleasant" are the 10 terms that most influence the results.
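The abstract does not specify how Shannon's entropy is computed over the reviews, so the following is only a generic illustration of the measure applied to a review's term distribution.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    # H = -sum(p_i * log2(p_i)) over the relative frequencies of terms.
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

review = "the food was great great service and a great location".split()
print(round(shannon_entropy(review), 4))  # higher entropy = more varied vocabulary
```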


2019
Author(s):  
Matthew J. Lavin

This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). It explores the foundations of tf-idf and introduces some of the questions and concepts of computationally oriented text analysis.
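As a minimal worked example of the method the lesson covers, the following computes tf-idf from scratch using the common tf × log(N/df) weighting; note that library implementations such as scikit-learn use smoothed variants.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
N = len(docs)
# Document frequency: in how many documents each term appears.
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(doc):
    tf = Counter(doc)
    # Terms that appear in every document get weight 0, since log(N/N) = 0.
    return {t: (count / len(doc)) * math.log(N / df[t]) for t, count in tf.items()}

for doc in docs:
    print(tf_idf(doc))
```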


2020
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to filling this gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods, and then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
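A sketch of the automatic evaluation step, assuming NLTK's BLEU implementation; the token sequences below are made-up placeholders, since no Lumasaaba-English pairs are given in the abstract.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [
    [["the", "children", "are", "going", "to", "school"]],  # one reference list per hypothesis
]
hypotheses = [
    ["the", "children", "go", "to", "school"],  # model output
]

# Smoothing matters for short sentences and tiny test sets, as with
# a heavily under-resourced language pair.
smooth = SmoothingFunction().method1
print(round(corpus_bleu(references, hypotheses, smoothing_function=smooth), 4))
```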


Diabetes
2019
Vol 68 (Supplement 1)
pp. 1243-P
Author(s):  
JIANMIN WU ◽  
FRITHA J. MORRISON ◽  
ZHENXIANG ZHAO ◽  
XUANYAO HE ◽  
MARIA SHUBINA ◽  
...  

Author(s):  
Pamela Rogalski ◽  
Eric Mikulin ◽  
Deborah Tihanyi

In 2018, we overheard many CEEA-ACEG members stating that they had "found their people"; this led us to wonder what makes this evolving community unique. Using cultural historical activity theory to view the proceedings of CEEA-ACEG 2004-2018 in comparison with those of the geographically and intellectually adjacent ASEE, we used both machine-driven (Natural Language Processing, NLP) and human-driven (literature review of the proceedings) methods. Here, we hoped to build on surveys (most recently by Nelson and Brennan, 2018) to understand, beyond what members say about themselves, what makes the CEEA-ACEG community distinct, where it has come from, and where it is going. Engaging in the two methods of data collection quickly shifted our focus from an analysis of the data themselves to the characteristics of the data in terms of cultural historical activity theory. Our preliminary findings point to some unique characteristics of machine- and human-driven results, with the former, as might be expected, focusing on the micro-level (words and language patterns) and the latter on the macro-level (ideas and concepts). NLP generated data within the realms of "community" and "division of labour", while the review of proceedings centred on "subject" and "object"; both found "instruments", although NLP with greater granularity. With this new understanding of the relative strengths of each method, we have a revised framework for addressing our original question.

