A Text Mining using Web Scraping for Meaningful Insights

2021 ◽  
Vol 2089 (1) ◽  
pp. 012048
Author(s):  
Kishor Kumar Reddy C ◽  
P R Anisha ◽  
Nhu Gia Nguyen ◽  
G Sreelatha

Abstract This research uses Machine Learning and Natural Language Processing (NLP), together with the Natural Language Toolkit (NLTK), to build a Text Summarization tool that follows the Extractive approach to generate an accurate and fluent summary. The aim of the tool is to efficiently extract a concise and coherent version of a long input document, keeping only the main points and avoiding repetition of text or information already mentioned earlier. The text to be summarized can either be retrieved from the web through web scraping or entered manually on the platform. Summarization benefits users because long texts can be shortened, helping them refer to the input quickly and understand points that might otherwise be beyond their scope.
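The extractive approach the abstract describes can be sketched in plain Python: score each sentence by the frequency of its content words and keep the top scorers in their original order. This is a minimal, self-contained illustration of the general technique, not the paper's NLTK-based tool; the function name and the tiny stopword list are placeholders.

```python
import re
from collections import Counter

# Illustrative stopword list; a real tool would use NLTK's full list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "that", "it", "this"}

def extractive_summary(text, num_sentences=2):
    """Score sentences by the mean frequency of their non-stopword
    terms and return the top-scoring ones in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)
    scored = []
    for i, s in enumerate(sentences):
        terms = [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
        if terms:
            scored.append((sum(freq[w] for w in terms) / len(terms), i, s))
    # Pick the highest-scoring sentences, then restore document order.
    top = sorted(sorted(scored, reverse=True)[:num_sentences], key=lambda t: t[1])
    return " ".join(s for _, _, s in top)
```

Because only existing sentences are selected (never rewritten), the output stays faithful to the source text, which is the defining property of extractive summarization.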

Students’ lives are incomplete without exams, because exams help students evaluate themselves and thus progress in their studies. The first step in conducting such examinations is creating a question paper. Question-paper generation is still largely done the traditional way: lecturers and professors prepare papers manually, spending a great deal of time selecting which types of questions to set. Creating a question paper is difficult because it demands substantial resource utilization and effort, yet these tasks can be automated. The rapid development of new, exciting technologies can make this automation easier. Because the whole task involves using and manipulating textual data, we use Machine Learning and Natural Language Processing. In this solution, we provide our model with a textual paragraph from which questions are selectively generated, and we develop the multiple choices for users using a certain distinctive process.
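One common way such systems generate multiple-choice items is fill-in-the-blank: remove a key term from a sentence and offer other terms from the paragraph as distractors. The abstract does not specify its "distinctive process", so the following is only a toy stand-in showing the overall shape; every name here is illustrative.

```python
import re
import random
from collections import Counter

def make_mcq(paragraph, seed=0):
    """Blank out the most frequent content word of the first sentence
    and draw distractors from the rest of the paragraph (a toy
    stand-in for a real distractor-generation process)."""
    rng = random.Random(seed)
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    words = re.findall(r"[A-Za-z]{5,}", paragraph)       # content words: 5+ letters
    freq = Counter(w.lower() for w in words)
    candidates = re.findall(r"[A-Za-z]{5,}", sentences[0])
    answer = max(candidates, key=lambda w: freq[w.lower()])
    question = sentences[0].replace(answer, "_____", 1)
    distractors = list({w for w in words if w.lower() != answer.lower()})[:3]
    options = distractors + [answer]
    rng.shuffle(options)
    return question, options, answer
```

A production system would pick semantically close distractors (e.g. via word embeddings) rather than arbitrary co-occurring words, so that wrong options are plausible.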


Author(s):  
Janjanam Prabhudas ◽  
C. H. Pradeep Reddy

The enormous increase of information, along with the computational abilities of machines, has created innovative applications in natural language processing driven by machine learning models. This chapter surveys the trends of natural language processing, employing machine learning and its models in the context of text summarization. It is organized to help the researcher understand technical perspectives on feature representation and the models to consider before applying them to language-oriented tasks. Further, the chapter reviews the primary models of deep learning, their applications, and their performance in the context of language processing. Its primary focus is to illustrate the technical research findings and gaps of text summarization based on deep learning, along with state-of-the-art deep learning models for TS.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Aditya Borakati

Abstract Background In the context of the ongoing pandemic, e-learning has become essential to maintain existing medical educational programmes. Evaluation of such courses has thus far been on a small scale at single institutions. Further, systematic appraisal of the large volume of qualitative feedback generated by massive online e-learning courses manually is time consuming. This study aimed to evaluate the impact of an e-learning course targeting medical students collaborating in an international cohort study, with semi-automated analysis of feedback using text mining and machine learning methods. Method This study was based on a multi-centre cohort study exploring gastrointestinal recovery following elective colorectal surgery. Collaborators were invited to complete a series of e-learning modules on key aspects of the study and complete a feedback questionnaire on the modules. Quantitative data were analysed using simple descriptive statistics. Qualitative data were analysed using text mining with most frequent words, sentiment analysis with the AFINN-111 and syuzhet lexicons and topic modelling using Latent Dirichlet Allocation (LDA). Results A total of 1611 collaborators from 24 countries completed the e-learning course; 1396 (86.7%) were medical students; 1067 (66.2%) entered feedback. 1031 (96.6%) rated the quality of the course a 4/5 or higher (mean 4.56; SD 0.58). The mean sentiment score using AFINN was +1.54/5 (5: most positive; SD 1.19) and +0.287/1 (1: most positive; SD 0.390) using syuzhet. LDA generated topics consolidated into the themes: (1) ease of use, (2) conciseness and (3) interactivity. Conclusions E-learning can have high user satisfaction for training investigators of clinical studies and medical students. Natural language processing may be beneficial in analysis of large-scale educational courses.
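Lexicon-based sentiment scoring of the kind reported above (AFINN-111 assigns each word an integer valence from -5 to +5; a text's score is typically the mean or sum of its matched words) can be illustrated with a few lines of Python. The tiny lexicon below is a made-up excerpt for demonstration; the real AFINN-111 list contains roughly 2,500 scored terms.

```python
# Toy AFINN-style lexicon: each word carries an integer valence in [-5, 5].
# These few entries are illustrative, not the real AFINN-111 list.
LEXICON = {
    "good": 3, "great": 3, "excellent": 3, "helpful": 2, "clear": 1,
    "bad": -3, "confusing": -2, "slow": -2, "boring": -3,
}

def afinn_style_score(text):
    """Mean valence of the scored words in a piece of feedback,
    on the same -5..+5 scale as in the study's results."""
    words = text.lower().split()
    hits = [LEXICON[w.strip(".,!?")] for w in words if w.strip(".,!?") in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

Averaging over matched words only (rather than all words) keeps short and long feedback comments on a comparable scale.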


IJOSTHE ◽  
2017 ◽  
Vol 5 (1) ◽  
pp. 10
Author(s):  
Rajul Rai ◽  
Pradeep Mewada

With the development of the Internet and natural language processing, the use of regional languages for communication has also grown. Sentiment analysis is a natural language processing task that extracts useful information from various data forms, such as reviews, and categorizes them on the basis of polarity. As a sub-domain of opinion mining, sentiment analysis focuses on extracting people's emotions and opinions towards a particular topic from textual data. In this paper, sentiment analysis is performed on the IMDB movie review database. We examine sentiment expressions to classify the polarity of each movie review on a scale from negative to positive, perform feature extraction and ranking, and use these features to train a multilevel classifier that assigns each review its correct label. Movie reviews are thus classified into positive and negative classes with the help of machine learning. The proposed approach achieves a best accuracy of about 99%.
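The abstract does not name the classifier, so the sketch below uses a minimal bag-of-words multinomial Naive Bayes, one of the standard baselines for review polarity, purely to show the train/predict pipeline the paper describes. The class name and two-label scheme are illustrative assumptions.

```python
import math
from collections import Counter

class NaiveBayesSentiment:
    """Minimal bag-of-words multinomial Naive Bayes for two classes."""

    def __init__(self):
        self.counts = {"pos": Counter(), "neg": Counter()}
        self.docs = {"pos": 0, "neg": 0}

    def train(self, text, label):
        self.docs[label] += 1
        self.counts[label].update(text.lower().split())

    def predict(self, text):
        vocab = set(self.counts["pos"]) | set(self.counts["neg"])
        total_docs = sum(self.docs.values())
        best, best_lp = None, -math.inf
        for label in ("pos", "neg"):
            lp = math.log(self.docs[label] / total_docs)  # class prior
            n = sum(self.counts[label].values())
            for w in text.lower().split():
                # Laplace smoothing so unseen words don't zero out the score.
                lp += math.log((self.counts[label][w] + 1) / (n + len(vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

Real systems would add TF-IDF weighting, n-grams, and feature ranking on top of this skeleton, as the abstract's pipeline suggests.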


Author(s):  
Luca Cagliero ◽  
Paolo Garza ◽  
Moreno La Quatra

The recent advances in multimedia and web-based applications have eased access to large collections of textual documents. To automate document analysis, the research community has put considerable effort into extracting short summaries of document content. However, most early summarization methods were tailored to English-language corpora or to collections of documents all written in the same language. More recently, the joint efforts of the machine learning and natural language processing communities have produced more portable and flexible solutions that can be applied to documents written in different languages. This chapter first overviews the most relevant language-specific summarization algorithms. Then it presents the most recent advances in multi- and cross-lingual text summarization. The chapter classifies the presented methodologies, highlights their main pros and cons, and discusses perspectives for extending current research towards cross-lingual summarization systems.


2012 ◽  
Vol 24 (2) ◽  
pp. 117-126 ◽  
Author(s):  
Mahmuda Rahman

Key words: Natural language processing; C4.5 classification; DSS; machine learning; KNN clustering; SVM
DOI: http://dx.doi.org/10.3329/bjsr.v24i2.10768
Bangladesh J. Sci. Res. 24(2): 117-126, 2011 (December)


2017 ◽  
Vol 7 (3) ◽  
pp. 153-159
Author(s):  
Omer Sevinc ◽  
Iman Askerbeyli ◽  
Serdar Mehmet Guzel

Social media has been widely used in our daily lives and can, in essence, be considered a magic box providing great insights into worldwide trending topics. Inferences drawn from social media platforms such as Twitter and Facebook can be employed in a variety of different fields. Computer science technologies involving data mining, natural language processing (NLP), text mining and machine learning have recently been utilized for social media analysis. A comprehensive analysis of the social web can uncover public trends in any field; for instance, it may help to understand political tendencies or cultural and global beliefs. Twitter is one of the most dominant and popular social media tools and also provides a huge amount of data. Accordingly, this study proposes a new methodology that employs Twitter data to infer meaningful information and successfully identify prominent trending topics. Experimental results verify the feasibility of the proposed approach. Keywords: Social web mining, Twitter analysis, machine learning, text mining, natural language processing.
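At its simplest, trend-topic detection over a batch of tweets is frequency ranking of hashtags and content terms. The sketch below shows that baseline idea only; the sample tweets, function name, and stopword list are invented for illustration, and the study's actual methodology is more involved.

```python
from collections import Counter

# Illustrative stopword list; a real pipeline would use a full NLP stopword set.
STOPWORDS = {"the", "a", "is", "to", "of", "in", "and", "for", "on"}

def trending_terms(tweets, top_n=3):
    """Rank hashtags and plain terms across a batch of tweets by
    frequency -- the simplest form of trend-topic detection."""
    counts = Counter()
    for tweet in tweets:
        for token in tweet.lower().split():
            token = token.strip(".,!?")
            if token and token not in STOPWORDS:
                counts[token] += 1
    return [term for term, _ in counts.most_common(top_n)]
```

Production systems additionally normalize for baseline term frequency (a term trends when it spikes relative to its usual rate), but raw counts convey the core mechanism.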


Author(s):  
Rohan Pandey ◽  
Vaibhav Gautam ◽  
Ridam Pal ◽  
Harsh Bandhey ◽  
Lovedeep Singh Dhingra ◽  
...  

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information, matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP-based AI engine that could continuously incorporate user feedback to improve the relevance of information, (ii) bite-sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards increased health awareness in the community. RESULTS A total of 5026 people downloaded the app during the study window; of those, 1545 were active users. Our study shows that 3.4 times more females than males engaged with the app in Hindi, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of the integrated AI chatbot “Satya” increased, demonstrating the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL Not Applicable

