Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning

2020 ◽  
Vol 4 (1) ◽  
pp. 18-43
Author(s):  
Liuqing Li ◽  
Jack Geissinger ◽  
William A. Ingram ◽  
Edward A. Fox

Abstract
Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem for learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them, along with NLP pipelines, in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching, and we hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
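As a minimal illustration of the kind of component a student team might start from, the following Python sketch implements a simple frequency-based extractive summarizer; the tokenization, scoring, and sentence limit are simplifying assumptions and do not reproduce any particular team's pipeline.

import re
from collections import Counter

def summarize(text, max_sentences=3):
    # Split into sentences and count word frequencies over the whole text.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    # Score each sentence by the summed frequency of its words.
    scored = [(sum(freq[w] for w in re.findall(r'\w+', s.lower())), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:max_sentences]
    # Restore the original sentence order for readability.
    return ' '.join(s for _, _, s in sorted(top, key=lambda t: t[1]))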

2021 ◽  
Vol 50 (2-3) ◽  
pp. 17-22
Author(s):  
Johannes Brunzel

The article explains how the method of quantitative text analysis can be an essential means of improving business efficiency. It goes beyond listing the opportunities and risks of using artificial intelligence and big data analytics by deriving, in a practice-oriented way, the key developments in quantitative content analysis from the business and economics literature. The article then divides the main implementation steps into (1) collecting quantitative text data, (2) performing generic text analysis, and (3) performing natural language processing. As a main result, the article finds that while natural language processing approaches offer deeper and more complex insights, the potential of generic text analysis has not yet been exhausted, owing to its flexibility and comparatively simple applicability in a corporate context. In addition, managers face the dichotomous decision of whether programming-based or commercial solutions are appropriate for carrying out the text analysis.
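As a hedged sketch of step (2), generic text analysis, the following Python fragment counts occurrences of a small keyword dictionary across a corpus of documents; the keyword list and the in-memory corpus are illustrative assumptions, not a recommendation from the article.

import re
from collections import Counter

documents = ["Annual report text ...", "Press release text ..."]   # assumed in-memory corpus
keywords = {"innovation", "risk", "efficiency"}                     # assumed keyword dictionary

counts = Counter()
for doc in documents:
    tokens = re.findall(r"\w+", doc.lower())
    counts.update(t for t in tokens if t in keywords)               # generic frequency counting
print(counts.most_common())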


2018 ◽  
Vol 7 (3.33) ◽  
pp. 168
Author(s):  
Yonglak SHON ◽  
Jaeyoung PARK ◽  
Jangmook KANG ◽  
Sangwon LEE

The LOD data sets consist of RDF triples based on an ontology, a specification of existing facts, linked to previously published knowledge according to linked data principles. These structured LOD clouds form a large global data network, which gives users a more accurate foundation for delivering the desired information. However, when the same real-world object is identified differently across several LOD data sets, it is difficult to establish that the entries are in fact identical: objects with different URIs in the LOD data sets are presumed distinct, and their similarities must be examined closely before they can be judged identical. The aim of this study is the proposed model, RILE, which evaluates similarity by comparing the object values of specified predicates. Experiments with the model showed an improvement in the confidence level of the connection obtained by extracting the link value.
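As a hedged sketch of the underlying idea, the following Python fragment compares the object values of a few specified predicates for two resources with different URIs and averages a string-similarity score; the predicate list, similarity measure, and threshold are illustrative assumptions and do not reproduce the published RILE model.

from difflib import SequenceMatcher

def object_similarity(objects_a, objects_b, predicates):
    # objects_a / objects_b map a predicate URI to that resource's object literal.
    scores = [SequenceMatcher(None, objects_a[p], objects_b[p]).ratio()
              for p in predicates if p in objects_a and p in objects_b]
    return sum(scores) / len(scores) if scores else 0.0

same_entity = object_similarity(
    {"foaf:name": "Tim Berners-Lee", "dbo:birthYear": "1955"},
    {"foaf:name": "Timothy Berners-Lee", "dbo:birthYear": "1955"},
    ["foaf:name", "dbo:birthYear"],
) > 0.8   # assumed confidence threshold for judging the URIs identical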


2020 ◽  
Vol 34 (05) ◽  
pp. 8504-8511
Author(s):  
Arindam Mitra ◽  
Ishan Shrivastava ◽  
Chitta Baral

Natural Language Inference (NLI) plays an important role in many natural language processing tasks such as question answering. However, existing NLI modules that are trained on existing NLI datasets have several drawbacks. For example, they do not capture the notions of entity and role well and often end up making mistakes such as inferring “Peter signed a deal” from “John signed a deal”. As part of this work, we have developed two datasets that help mitigate such issues and make the systems better at understanding the notions of “entities” and “roles”. After training the existing models on the new datasets, we observe that the existing models do not perform well on one of the new benchmarks. We then propose a modification to the “word-to-word” attention function that has been uniformly reused across several popular NLI architectures. The resulting models perform as well as their unmodified counterparts on the existing benchmarks and perform significantly better on the new benchmarks that emphasize “roles” and “entities”.
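As a minimal NumPy sketch of the unmodified “word-to-word” attention referred to above (as used, for example, in decomposable-attention-style NLI models), each premise token attends to every hypothesis token through a dot product of their embeddings; the shapes and random inputs are illustrative, and the paper's proposed modification is not reproduced here.

import numpy as np

def word_to_word_attention(premise_vecs, hypothesis_vecs):
    # premise_vecs: (m, d) token embeddings; hypothesis_vecs: (n, d) token embeddings.
    scores = premise_vecs @ hypothesis_vecs.T                 # (m, n) alignment scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)             # softmax over hypothesis tokens
    return weights @ hypothesis_vecs                          # soft-aligned hypothesis vectors, (m, d)

aligned = word_to_word_attention(np.random.rand(5, 300), np.random.rand(7, 300))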


Author(s):  
Janjanam Prabhudas ◽  
C. H. Pradeep Reddy

The enormous growth of information, together with the computational abilities of machines, has created innovative natural language processing applications that invoke machine learning models. This chapter surveys trends in natural language processing that employ machine learning and its models in the context of text summarization. It is organized to help researchers understand the technical perspectives on feature representation and the associated models to consider before applying them to language-oriented tasks. It then reviews the primary deep learning models, their applications, and their performance in the context of language processing. The primary focus of this chapter is to illustrate the technical research findings and gaps in deep-learning-based text summarization, along with state-of-the-art deep learning models for TS.
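As one hedged example of applying a pretrained deep learning model to TS, the following Python fragment uses the Hugging Face transformers summarization pipeline; the library and model name are assumptions for illustration and are not the chapter's specific method.

from transformers import pipeline

# Load a pretrained abstractive summarizer (assumed model choice).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = "Long input document ..."
summary = summarizer(article, max_length=60, min_length=20, do_sample=False)
print(summary[0]["summary_text"])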


Author(s):  
Kanza Noor Syeda ◽  
Syed Noorulhassan Shirazi ◽  
Syed Asad Ali Naqvi ◽  
Howard J Parkinson ◽  
Gary Bamford

Due to modern powerful computing, the explosion in data availability, and advanced analytics, there should be opportunities to use a Big Data approach to proactively identify high-risk scenarios on the railway. In this chapter, we examine the need to develop machine intelligence for identifying heightened risk on the railway. In doing so, we explain the potential of a new data-driven approach for the railway, and we then focus the rest of the chapter on Natural Language Processing (NLP) and its potential for analysing accident data. We review and analyse investigation reports of railway accidents in the UK, published by the Rail Accident Investigation Branch (RAIB), aiming to reveal the presence of entities that are informative of causes and failures, whether human, technical, or external. We give an overview of a framework based on NLP and machine learning to analyse the raw text from RAIB reports, which would assist risk and incident analysis experts in studying the causal relationships between causes and failures and so improve overall safety in the rail industry.
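As a hedged sketch of the kind of entity and cause extraction such a framework might perform over RAIB report text, the following Python fragment combines spaCy named-entity recognition with a crude keyword lexicon for human, technical, and external causes; the lexicon, model choice, and matching rule are illustrative assumptions, not the framework described in the chapter.

import spacy

nlp = spacy.load("en_core_web_sm")          # assumes the small English model is installed
CAUSE_TERMS = {
    "human": ["driver error", "fatigue"],
    "technical": ["signal failure", "brake fault"],
    "external": ["flooding", "trespass"],
}

def extract_findings(report_text):
    doc = nlp(report_text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]      # named entities in the report
    causes = {category: [term for term in terms if term in report_text.lower()]
              for category, terms in CAUSE_TERMS.items()}        # crude cause tagging by keyword
    return entities, causes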

