Design of Link Evaluation Method to Improve Reliability based on Linked Open Big Data and Natural Language Processing

LOD data sets consist of RDF triples that are built on ontologies (specifications of existing facts) and that are linked to previously published knowledge according to Linked Data principles. These structured LOD clouds form a large global data network, which gives users a more accurate foundation for obtaining the information they want. However, when the same object is identified differently across several LOD data sets, it is difficult to establish that these descriptions are inherently identical. Objects with different URIs in LOD data sets are presumed to be different, so they must be closely examined for similarity before they can be judged identical. In this study, the proposed model, RILE, evaluates similarity by comparing the object values of existing specified predicates. Experiments with our model confirmed that extracting the link value improves the confidence level of the connection.
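
The abstract does not give RILE's scoring rule, so the following is only a minimal sketch of the general idea of judging a candidate link (for example an owl:sameAs-style link) by comparing the object values two resources hold for the same specified predicates. The dict representation, the predicate names (`rdfs:label`, `dbo:country`, `dbo:population`), the normalization step and the plain averaging are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch: score a candidate link between two LOD resources by
# comparing the object values they hold for the same specified predicates.
# Assumption: each resource is a dict {predicate: set of literal values};
# the averaging rule below is illustrative and is NOT RILE's actual formula.


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivially different literals match."""
    return " ".join(value.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_score(res_a: dict, res_b: dict, predicates: list) -> float:
    """Average object-value overlap over the specified predicates both resources use."""
    scores = []
    for p in predicates:
        if p in res_a and p in res_b:
            va = {normalize(v) for v in res_a[p]}
            vb = {normalize(v) for v in res_b[p]}
            scores.append(jaccard(va, vb))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    a = {"rdfs:label": {"Seoul"}, "dbo:country": {"South Korea"}}
    b = {"rdfs:label": {"seoul "}, "dbo:country": {"South Korea", "Korea"}}
    # label overlap 1.0, country overlap 0.5, population missing -> 0.75
    print(link_score(a, b, ["rdfs:label", "dbo:country", "dbo:population"]))
```

Predicates that only one resource uses are skipped rather than counted as mismatches; whether missing values should lower the score is a design choice that a real link evaluator would need to settle.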

Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning

Data and Information Management · 10.2478/dim-2020-0003 · 2020 · Vol 4 (1) · pp. 18-43

Author(s): Liuqing Li, Jack Geissinger, William A. Ingram, Edward A. Fox

Keyword(s): Big Data, Natural Language Processing, Graduate Students, Natural Language, Information Management, Language Processing, Problem Based Learning, Text Summarization, Data Sets, Student Teams

Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, which makes it complex and challenging to teach. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course that teaches NLP to both undergraduate and graduate students. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help them summarize two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). The student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem for learning NLP because it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them, along with their NLP pipelines, in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching, and we hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
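
The abstract leaves the teams' summarization methods open. As a point of reference, here is a hedged sketch of one of the simplest baselines a team might start from: frequency-based extractive summarization. It scores each sentence by the average corpus frequency of its content words and keeps the top `k` sentences in their original order. The stopword list, the regex sentence splitter and the function names are assumptions for illustration. The sketch does not describe what any team actually built.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real pipelines use fuller lists.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
             "was", "were", "for", "with", "by", "as", "at", "it", "this", "that"}


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list:
    """Lower-cased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, k: int = 2) -> str:
    """Extractive summary: the k sentences whose words are most frequent overall."""
    sentences = split_sentences(text)
    freqs = Counter(tokenize(text))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top k, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    text = "Flood hits city. Flood water rises in city. Cats sleep."
    print(summarize(text, k=2))
```

Averaging rather than summing the scores keeps long sentences from winning just by length; abstractive or neural summarizers would replace this scoring step entirely.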

Approaches to Quantitative Content Analysis (Ansätze zur quantitativen Inhaltsanalyse)

WiSt - Wirtschaftswissenschaftliches Studium · 10.15358/0340-1650-2021-2-3-17 · 2021 · Vol 50 (2-3) · pp. 17-22

Author(s): Johannes Brunzel

Keyword(s): Big Data, Natural Language Processing, Natural Language, Language Processing

The article explains to what extent quantitative text analysis can be an essential means of increasing business efficiency. In doing so, it goes beyond listing the opportunities and risks of using artificial intelligence and big data analytics: taking a practice-oriented view, it derives important developments in quantitative content analysis from the economics and management literature. The article then divides the most important implementation steps into (1) collecting quantitative text data, (2) performing generic text analysis, and (3) performing natural language processing. As a main result, the article finds that although natural language processing approaches offer further-reaching and more complex insights, the potential of generic text analysis has not yet been exhausted, owing to its flexibility and comparatively simple applicability in a business context. In addition, managers face a dichotomous decision about whether programming-based or commercial solutions are appropriate for carrying out text analysis.
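
The abstract contrasts "generic text analysis" with full NLP without defining it. One common reading in content analysis is dictionary-based word counting, where a text is scored by the share of its words that belong to predefined categories. The sketch below assumes that reading; the category names and word lists are hypothetical, and validated research dictionaries would be far larger.

```python
import re
from collections import Counter

# Hypothetical category dictionary; real studies rely on validated word lists.
DICTIONARY = {
    "innovation": {"innovate", "innovation", "novel", "new"},
    "risk": {"risk", "uncertain", "uncertainty", "threat"},
}


def category_shares(text: str, dictionary: dict) -> dict:
    """Share of all word tokens in `text` that fall into each dictionary category."""
    # Letters only; umlauts and sharp s included so German text also tokenizes.
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    if not tokens:
        return {category: 0.0 for category in dictionary}
    counts = Counter(tokens)
    return {
        category: sum(counts[w] for w in words) / len(tokens)
        for category, words in dictionary.items()
    }


if __name__ == "__main__":
    print(category_shares("We innovate despite risk and uncertainty.", DICTIONARY))
```

Exact token matching is what makes this approach simple and transparent, and also what limits it: "uncertain" does not match "uncertainty", which is the kind of gap lemmatization and other NLP steps are meant to close.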

Enhancing Natural Language Inference Using New and Expanded Training Data Sets and New Learning Models

Proceedings of the AAAI Conference on Artificial Intelligence · 10.1609/aaai.v34i05.6371 · 2020 · Vol 34 (05) · pp. 8504-8511

Author(s): Arindam Mitra, Ishan Shrivastava, Chitta Baral

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Question Answering, Training Data, Data Sets, Learning Models, New Learning, Word Attention, Attention Function

Natural Language Inference (NLI) plays an important role in many natural language processing tasks, such as question answering. However, NLI modules trained on existing NLI datasets have several drawbacks. For example, they do not capture the notions of entity and role well and often make mistakes such as inferring “Peter signed a deal” from “John signed a deal”. As part of this work, we developed two datasets that help mitigate such issues and make systems better at understanding the notions of “entities” and “roles”. After training existing models on the new dataset, we observe that they do not perform well on one of the new benchmarks. We then propose a modification to the “word-to-word” attention function, which has been uniformly reused across several popular NLI architectures. The resulting models perform as well as their unmodified counterparts on the existing benchmarks and perform significantly better on the new benchmarks that emphasize “roles” and “entities”.
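
The abstract does not spell out the authors' modification, so the sketch below shows only the conventional word-to-word (soft alignment) attention that many NLI architectures share. It is the baseline, not the proposed function. Each premise word scores every hypothesis word by a dot product, the scores are softmax-normalized, and the premise word is paired with the weighted average of the hypothesis vectors. One hedged reading of the entity problem: if "John" and "Peter" have similar embeddings, a dot-product score may align them strongly. The toy vectors and function names are assumptions for illustration.

```python
import math


def dot(u: list, v: list) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: list) -> list:
    """Numerically stable softmax (subtracts the maximum before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_to_word_attention(premise: list, hypothesis: list) -> list:
    """For each premise word vector, return the attention-weighted
    average of the hypothesis word vectors (baseline soft alignment)."""
    if not hypothesis:
        raise ValueError("hypothesis must contain at least one word vector")
    dim = len(hypothesis[0])
    aligned = []
    for a in premise:
        weights = softmax([dot(a, b) for b in hypothesis])
        aligned.append([sum(w * b[k] for w, b in zip(weights, hypothesis)) for k in range(dim)])
    return aligned


if __name__ == "__main__":
    premise = [[10.0, 0.0]]
    hypothesis = [[1.0, 0.0], [0.0, 1.0]]
    # Nearly all attention mass goes to the first hypothesis word.
    print(word_to_word_attention(premise, hypothesis))
```

In a trained model the vectors would be contextual encoder outputs and the aligned vectors would feed a comparison layer; this sketch isolates the attention step only.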

LIS4: Lesk Inspired Sense Specific Semantic Similarity using WordNet

Journal of Information & Knowledge Management · 10.1142/s0219649221500064 · 2021 · pp. 2150006

Author(s): Saravanakumar Kandasamy, Aswani Kumar Cherukuri

Keyword(s): Information Retrieval, Natural Language Processing, Natural Language, Semantic Similarity, Language Processing, Gold Standard, Question Answering, Knowledge Based, Benchmark Datasets, Processing Information

Quantifying semantic similarity between concepts is an essential part of domains such as Natural Language Processing, Information Retrieval, and Question Answering, where it helps systems better understand texts and their relationships. Over the last few decades, many measures have been proposed that draw on various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. WordNet's contribution to these domains is enormous because of its rich definitions of words and of all their relationships with other words. In this paper, we propose an approach to quantifying the similarity between concepts that exploits the synsets and gloss definitions of different concepts in WordNet. To calculate similarity, our method considers the gloss definitions, the contextual words that help define a word, the synsets of those contextual words, and the confidence with which a word occurs in another word's definition. An evaluation on several gold-standard benchmark datasets shows the effectiveness of our system in comparison with other existing taxonomical and definitional measures.
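
As background for the "Lesk inspired" part of the title, here is a minimal sketch of the classic gloss-overlap idea: two concepts are similar to the extent that their dictionary definitions share content words. It is a simplified Lesk-style measure using a Dice coefficient, not LIS4 itself. It omits the synsets of contextual words and the occurrence-confidence term the abstract describes. The glosses below are hypothetical stand-ins for WordNet definitions (with NLTK, real glosses come from `Synset.definition()`), and the stopword list is an assumption.

```python
import re

# Hypothetical toy glosses standing in for WordNet definitions.
GLOSSES = {
    "car": "a motor vehicle with four wheels used to carry passengers on roads",
    "bus": "a large motor vehicle that carries many passengers on roads",
    "banana": "an elongated yellow fruit with soft sweet flesh",
}

# Hypothetical stopword list tuned to the toy glosses above.
STOPWORDS = {"a", "an", "the", "with", "to", "on", "of", "that", "used", "many"}


def content_words(text: str) -> set:
    """Lower-cased alphabetic tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def gloss_overlap_similarity(c1: str, c2: str, glosses: dict) -> float:
    """Dice coefficient over the content words of two concepts' glosses."""
    g1, g2 = content_words(glosses[c1]), content_words(glosses[c2])
    if not g1 or not g2:
        return 0.0
    return 2 * len(g1 & g2) / (len(g1) + len(g2))


if __name__ == "__main__":
    print(gloss_overlap_similarity("car", "bus", GLOSSES))     # 4 shared words -> 8/13
    print(gloss_overlap_similarity("car", "banana", GLOSSES))  # no overlap -> 0.0
```

Exact word overlap misses paraphrases ("carry" vs. "carries"), which is one motivation for approaches like the one above that also consult the synsets of contextual words.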

EOR/IOR Screening with Big Data Analytics and Natural Language Processing for Unstructured Data: A Statistical Approach

10.2118/181117-ms · 2016

Author(s): Sardar Afra, Mohammadali Tarrahi

Keyword(s): Big Data, Natural Language Processing, Natural Language, Language Processing, Data Analytics, Statistical Approach, Big Data Analytics, Unstructured Data

Big Data and Natural Language Processing for Analysing Railway Safety

Innovative Applications of Big Data in the Railway Industry - Advances in Civil and Industrial Engineering · 10.4018/978-1-5225-3176-0.ch011 · 2018 · pp. 240-267

Author(s): Kanza Noor Syeda, Syed Noorulhassan Shirazi, Syed Asad Ali Naqvi, Howard J Parkinson, Gary Bamford

Keyword(s): Big Data, Natural Language Processing, Natural Language, Language Processing, Machine Intelligence, Data Availability, Accident Data, Data Driven Approach, Advanced Analytics, The UK

Modern powerful computing, the explosion in data availability, and advanced analytics should create opportunities to use a Big Data approach to proactively identify high-risk scenarios on the railway. In this chapter, we examine the need for developing machine intelligence to identify heightened risk on the railway. We first explain the potential of a new data-driven approach in the railway, and then focus the rest of the chapter on Natural Language Processing (NLP) and its potential for analysing accident data. We review and analyse investigation reports of railway accidents in the UK, published by the Rail Accident Investigation Branch (RAIB), aiming to reveal the presence of entities that are informative of causes and failures, such as human, technical, and external ones. We give an overview of a framework based on NLP and machine learning for analysing the raw text of RAIB reports, which would help risk and incident analysis experts study the causal relationships between causes and failures and thereby improve overall safety in the rail industry.

SEMblog

Ontology-Based Applications for Enterprise Systems and Knowledge Management - Advances in Knowledge Acquisition, Transfer, and Management · 10.4018/978-1-4666-1993-7.ch012 · 2013 · pp. 210-223

Author(s): Azleena Mohd Kassim, Yu-N Cheah

Keyword(s): Information Technology, Knowledge Management, Natural Language Processing, Natural Language, Language Processing, Human Intervention, Knowledge Based, Search Mechanism, Management Policies, Knowledge Identification

Information Technology (IT) is often employed to put knowledge management policies into operation. However, many of these tools require human intervention when it comes to deciding how knowledge is to be managed. The Semantic Web may be an answer to this issue, but many Semantic Web tools are not readily available to the regular IT user. Another problem is that typical efforts to apply or reuse knowledge through a search mechanism do not necessarily lead to other relevant pages. Blogging systems appear to address some of these challenges, but the browsing experience can be further enhanced by providing links to other relevant posts. In this chapter, the authors present a semantic blogging tool called SEMblog that identifies, organizes, and reuses knowledge based on the Semantic Web and ontologies. The SEMblog methodology brings together technologies such as Natural Language Processing (NLP), Semantic Web representations, and the ubiquity of the blogging environment to produce a more intuitive way to manage knowledge, especially in the areas of knowledge identification, organization, and reuse. Based on detailed comparisons with other similar systems, the uniqueness of SEMblog lies in its ability to automatically generate keywords and semantic links.
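
The abstract credits SEMblog with generating keywords automatically but does not say how. To make that step concrete, here is a sketch using TF-IDF, a standard technique swapped in for illustration and not necessarily the one SEMblog uses. Terms that are frequent in one post but rare across the collection score highest. The stopword list and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on",
             "with", "my", "i", "we", "about"}


def tokenize(text: str) -> list:
    """Lower-cased alphabetic tokens minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(posts: list, k: int = 3) -> list:
    """Return the top-k TF-IDF terms for each post in a small collection."""
    docs = [tokenize(p) for p in posts]
    n = len(docs)
    # Document frequency: number of posts containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)  # empty for an all-stopword post, so no division by zero
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    posts = ["ontology ontology semantic web", "blog post about web", "cooking pasta recipe"]
    print(tfidf_keywords(posts, k=2))
```

A term that appears in every post gets an IDF of zero and never becomes a keyword. Posts that share high-scoring keywords are natural candidates for the "links to other relevant posts" the abstract mentions, though the chapter's own linking relies on ontologies.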

A Comparison with Other Approaches

Instruction Modeling · 10.1093/oso/9780190910709.003.0008 · 2020 · pp. 168-187

Author(s): George A. Khachatryan

Keyword(s): Big Data, Cognitive Psychology, Natural Language Processing, Blended Learning, Language Processing, Usability Testing, Learning Science, Instructional Designers, Wide Range, Learning Programs

What are the relative merits of instruction modeling and other approaches to the design of blended learning programs? This chapter discusses several prevailing approaches, including applied learning science, personalization, and the use of big data in education. Many programs are designed around a single claimed feature of good instruction; terming such thinking “featurism,” this chapter argues that it is reductionist and less likely to be successful than more comprehensive approaches (such as instruction modeling). However, instruction modeling is not simply an alternative to other approaches: as the example of cognitive psychology illustrates, instruction modeling can often be fruitfully combined with other methods. Just as good software developers blend different approaches (e.g., using usability testing and the psychology of attention in designing interfaces), good instructional designers should draw on a wide range of techniques. This chapter discusses how instruction modeling can work in concert with big data, natural language processing, and other important approaches.
