A Collaborative Ecosystem for Digital Coptic Studies

Scholarship on underresourced languages brings a variety of challenges that make access to the full spectrum of source materials, and their evaluation, difficult. For Coptic in particular, large-scale analyses and any kind of quantitative work are hampered by the fragmentation of manuscripts, the highly fusional nature of an incorporational morphology, and the complications of dealing with influences from Hellenistic-era Greek, among other concerns. Many of these challenges, however, can be addressed using Digital Humanities tools and standards. In this paper, we outline some of the latest developments in Coptic Scriptorium, a DH project dedicated to bringing Coptic resources online in uniform, machine-readable, and openly available formats. Collaborative web-based tools create online 'virtual departments' in which scholars sparsely dispersed across the globe can collaborate, and natural language processing tools counterbalance the scarcity of trained editors by enabling machine processing of Coptic text to produce searchable, annotated corpora.

Comment: 9 pages; paper presented at the Stanford University CESTA Workshop "Collecting, Preserving and Disseminating Endangered Cultural Heritage for New Understandings Through Multilingual Approaches"


UGLEO: A WEB BASED INTELLIGENCE CHATBOT FOR STUDENT ADMISSION PORTAL USING MEGAHAL STYLE

Jurnal Ilmiah Informatika Komputer ◽

10.35760/ik.2018.v23i3.2373 ◽

2018 ◽

Vol 23 (3) ◽

pp. 175-191

Author(s):

Anneke Annassia Putri Siswadi ◽

Avinanta Tarigan

Keyword(s):

Markov Chain ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Information Need ◽

Web Based ◽

Markov Chain Method ◽

Information Center

To meet prospective students' need for information about admission, Gunadarma University already offers many kinds of services, such as a website, a book, a registration office, a Media Information Center, and a question-answering website (UG-Pedia), but these services are time-limited. A service is needed that can serve prospective students anytime and anywhere. Therefore, this research develops UGLeo, a web-based QA intelligent chatbot application for Gunadarma University's student admission portal. UGLeo is developed in the MegaHal style, which implements the Markov chain method. This research makes two modifications to the MegaHal style: to the structure of the natural language processing and to the structure of the database. UGLeo's replies achieve an accuracy of 65%. To increase this accuracy, further improvements are needed in both the natural language processing and the MegaHal style.
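MegaHal-style reply generation rests on Markov chains over words. The sketch below is a deliberately simplified first-order (bigram) version, not UGLeo's implementation: MegaHAL itself uses higher-order forward and backward models, and the helper names `train` and `reply`, the `<END>` marker, and the sample sentences are all hypothetical.

```python
import random
from collections import defaultdict

END = "<END>"  # hypothetical end-of-sentence marker

def train(sentences):
    """Build a first-order Markov chain mapping each word to its observed successors."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split() + [END]
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
    return chain

def reply(chain, keyword, max_len=20, seed=0):
    """Walk the chain forward from a keyword until END or max_len words."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    word = keyword.lower()
    if word not in chain:
        return ""  # keyword never seen in training
    out = [word]
    while len(out) < max_len:
        nxt = rng.choice(chain[word])  # duplicates make frequent successors more likely
        if nxt == END:
            break
        out.append(nxt)
        word = nxt
    return " ".join(out)

chain = train([
    "registration opens in march",
    "registration requires a photo",
    "the fee is paid online",
])
print(reply(chain, "registration"))
```

Seeding the walk with a keyword taken from the user's question is the basic way a MegaHal-style bot ties its generated reply to the input; every reply is a recombination of fragments seen in training.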


Machine-learning as a validated tool to characterize individual differences in free recall of naturalistic events.

10.31234/osf.io/uygzv ◽

2021 ◽

Author(s):

Xinxu Shen ◽

Troy Houser ◽

David Victor Smith ◽

Vishnu P. Murty

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Individual Difference ◽

Language Processing ◽

Large Scale ◽

High Reliability ◽

Difference Analysis ◽

Universal Sentence ◽

Natural Language Processing Tool

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields for characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared scoring reliability between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected the measures yielded by hand-scoring, and that results using USE outperformed those of another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on two clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, which shows its efficacy for assessing narrative recall in large-scale individual-difference analyses. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual-difference analyses in research using naturalistic stimuli.
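Embedding-based scoring of this kind is commonly done by comparing each recall sentence with each event description using cosine similarity. The sketch below assumes the sentence vectors have already been produced by an encoder such as USE; the toy three-dimensional vectors, the `score_recall` helper, and the 0.5 threshold are illustrative assumptions, not the authors' parameters.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_recall(event_vecs, recall_vecs, threshold=0.5):
    """Mark an event as recalled if any recall sentence is similar enough to it."""
    return [any(cosine(e, r) >= threshold for r in recall_vecs) for e in event_vecs]

# Toy stand-ins for encoder output: three scripted events, two recall sentences.
events = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
recalls = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.95]]
hits = score_recall(events, recalls)
print(hits, sum(hits) / len(hits))  # per-event hits, then proportion of events recalled
```

Here the second event is judged not recalled because neither recall vector points in its direction; in practice the threshold would have to be calibrated against hand-scored data.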


The Experience of Developing a Large-Scale Natural Language Processing System: Critique

The Kluwer International Series in Engineering and Computer Science - Natural Language Processing: The PLNLP Approach ◽

10.1007/978-1-4615-3170-8_7 ◽

1993 ◽

pp. 77-89 ◽

Cited By ~ 2

Author(s):

Stephen Richardson ◽

Lisa Braden-Harder

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Processing System ◽

Natural Language Processing System


Designing and Validating an Annotation Model of Dynamic Modality for English and Spanish: Issues and Problems

10.29007/pc58 ◽

2018 ◽

Author(s):

Julia Lavid ◽

Marta Carretero ◽

Juan Rafael Zamorano

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Reliability Study ◽

Annotation Scheme ◽

High Degree ◽

Difficult Cases

In this paper we set forth an annotation model for dynamic modality in English and Spanish, given its relevance not only for contrastive linguistics but also for practical annotation tasks in the Natural Language Processing (NLP) community. The proposed annotation scheme captures both the functional-semantic meanings and the language-specific realisations of dynamic modality in both languages. The scheme is validated through a reliability study performed on a randomly selected set of 120 sentences from the MULTINOT corpus, which yielded a high degree of inter-annotator agreement. We discuss our main findings and give particular attention to the difficult cases, which are currently being used to develop detailed guidelines for the large-scale annotation of dynamic modality in English and Spanish.
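The abstract does not state which agreement statistic the reliability study used. Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), is one common choice for two annotators, so the sketch below assumes it; the modality labels and the four annotated items are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)  # agreement expected by chance
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for four sentences.
a = ["ability", "ability", "volition", "none"]
b = ["ability", "volition", "volition", "none"]
print(round(cohens_kappa(a, b), 3))  # 0.636
```

Kappa discounts the agreement two annotators would reach by chance given their label frequencies, which is why it is usually preferred over raw percent agreement (0.75 in this toy example).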


Comparison of Templates with Word2vec in Finding Semantic Relations Between Words

Journal of Intelligent Systems with Applications ◽

10.54856/jiswa.201805007 ◽

2018 ◽

pp. 13-17

Author(s):

Kaan Ant ◽

Ugur Sogukpinar ◽

Mehmet Fatih Amasyali

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Semantic Relations ◽

Template Method ◽

Semantic Relationships ◽

Semantic Spaces

Databases that contain semantic relationships between words are becoming increasingly widespread as a way to make natural language processing more effective. Unlike the bag-of-words approach, semantic spaces give the distances between words, but they do not express relation types. This study shows how semantic spaces can be used to find the type of relationship between words and compares them with the template method. According to results obtained at a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of, and non-relational types.
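The template method can be pictured as lexical patterns that fire when a word pair occurs inside a telltale phrase. The sketch below assumes English patterns and a tiny in-memory corpus; the `TEMPLATES` table and the `classify_pair` helper are hypothetical and are not the patterns used in the study.

```python
import re
from collections import Counter

# Hypothetical English templates; {x} and {y} are filled with the word pair.
TEMPLATES = {
    "is_a": [r"{x} is an? {y}"],
    "made_of": [r"{x} (?:is|are) made of {y}"],
    "at_location": [r"{x} (?:is|are) (?:found )?in the {y}"],
    "opposite": [r"{x} is the opposite of {y}"],
}

def classify_pair(x, y, corpus):
    """Return the relation whose templates match (x, y) most often, else non_relational."""
    counts = Counter()
    for relation, patterns in TEMPLATES.items():
        for pattern in patterns:
            filled = pattern.format(x=re.escape(x), y=re.escape(y))
            regex = re.compile(r"\b" + filled + r"\b", re.IGNORECASE)  # whole words only
            counts[relation] += sum(len(regex.findall(s)) for s in corpus)
    relation, hits = counts.most_common(1)[0]
    return relation if hits > 0 else "non_relational"

corpus = ["A table is made of wood.", "Wood is a material.", "The table is in the kitchen."]
print(classify_pair("table", "wood", corpus))      # made_of
print(classify_pair("table", "material", corpus))  # non_relational
```

Templates only fire when the exact phrasing occurs, which helps explain why such a method can do well on concrete relations like made_of but needs very large corpora to find enough matches.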


INDRA-IPM: interactive pathway modeling using natural language with automated assembly

Bioinformatics ◽

10.1093/bioinformatics/btz289 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4501-4503 ◽

Cited By ~ 9

Author(s):

Petar V Todorov ◽

Benjamin M Gyori ◽

John A Bachman ◽

Peter K Sorger

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Web Service ◽

Language Processing ◽

Source Code ◽

Supplementary Information ◽

Expression Data ◽

Supplementary Data ◽

Automated Assembly ◽

Web Based

Summary: INDRA-IPM (Interactive Pathway Map) is a web-based pathway map modeling tool that combines natural language processing with automated model assembly and visualization. INDRA-IPM contextualizes models with expression data and exports them to standard formats.

Availability and implementation: INDRA-IPM is available at http://pathwaymap.indra.bio. Source code is available at http://github.com/sorgerlab/indra_pathway_map. The underlying web service API is available at http://api.indra.bio:8000.

Supplementary information: Supplementary data are available at Bioinformatics online.


Natural Language Processing in Large-Scale Neural Models for Medical Screenings

Frontiers in Robotics and AI ◽

10.3389/frobt.2019.00062 ◽

2019 ◽

Vol 6 ◽

Cited By ~ 1

Author(s):

Catharina Marie Stille ◽

Trevor Bekolay ◽

Peter Blouw ◽

Bernd J. Kröger

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Neural Models


YouTube as a Source of Information in Understanding Autonomous Vehicle Consumers: Natural Language Processing Study

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198119842110 ◽

2019 ◽

Vol 2673 (8) ◽

pp. 242-253 ◽

Cited By ~ 5

Author(s):

Subasish Das ◽

Anandi Dutta ◽

Tomas Lindheimer ◽

Mohammad Jalayer ◽

Zachary Elgart

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Automotive Industry ◽

Autonomous Vehicles ◽

Large Scale ◽

Keyword Search ◽

Autonomous Vehicle ◽

Perception Of Safety ◽

Automation Level

The automotive industry is currently experiencing a revolution with the advent and deployment of autonomous vehicles. Several countries are conducting large-scale testing of autonomous vehicles on private and even public roads. It is important to examine the attitudes and potential concerns of end users towards autonomous cars before mass deployment. To facilitate the transition to autonomous vehicles, the automotive industry produces many videos on its products and technologies. The largest video-sharing website, YouTube.com, hosts many videos on autonomous vehicle technology. Content analysis and text mining of comments on videos with large numbers of views can provide insight into potential end-user feedback. This study examines two questions: first, how do people view autonomous vehicles? Second, what polarities exist regarding (a) content and (b) automation level? The researchers found 107 videos on YouTube using a related keyword search and examined comments on the 15 most-viewed videos, which had a total of 60.9 million views and around 25,000 comments. The videos were manually clustered based on their content and automation level. This study used two natural language processing (NLP) tools to perform knowledge discovery from a bag of approximately seven million words. The key issues in the comment threads were mostly associated with efficiency, performance, trust, comfort, and safety. The perception of safety and risk increased in the textual content when videos presented full automation. Sentiment analysis shows mixed sentiments towards autonomous vehicle technologies; however, positive sentiments outweighed negative ones.
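The study does not name its two NLP tools. As a minimal sketch of bag-of-words knowledge discovery, the code below counts mentions of the issue terms reported in the abstract and assigns a crude lexicon-based polarity to each comment; the tokenizer, the sample comments, and the tiny `POSITIVE`/`NEGATIVE` word lists are illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter

ISSUE_TERMS = ["comfort", "efficiency", "performance", "safety", "trust"]
POSITIVE = {"great", "love", "amazing"}    # toy lexicon (assumption)
NEGATIVE = {"scary", "dangerous", "hate"}  # toy lexicon (assumption)

def tokenize(text):
    """Lowercase bag-of-words tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", text.lower())

def issue_counts(comments):
    """Count how often each issue term appears across all comments."""
    bag = Counter()
    for comment in comments:
        bag.update(tokenize(comment))
    return {term: bag[term] for term in ISSUE_TERMS}

def polarity(comment):
    """Label a comment by its positive minus negative lexicon hits."""
    tokens = tokenize(comment)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

comments = ["I don't trust it", "Safety first! Safety matters", "Great comfort, love it"]
print(issue_counts(comments))
print(Counter(polarity(c) for c in comments))
```

A lexicon this small misses negation ("I don't trust it" scores neutral), which is one reason real studies rely on larger, curated sentiment resources.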


An Integrated Approach to 3D Web Visualization of Cultural Heritage Heterogeneous Datasets

Remote Sensing ◽

10.3390/rs11212508 ◽

2019 ◽

Vol 11 (21) ◽

pp. 2508 ◽

Cited By ~ 3

Author(s):

Argyro-Maria Boutsi ◽

Charalabos Ioannidis ◽

Sofia Soile

Keyword(s):

Cultural Heritage ◽

Programming Languages ◽

Large Scale ◽

User Interaction ◽

Holistic Approach ◽

Integrated Approach ◽

3D Models ◽

Data Interoperability ◽

Web Based ◽

3D Scene

The evolution of high-quality 3D archaeological representations from niche products to integrated online media is not yet complete. Digital archives in the field often lack multimodal data interoperability, user interaction, and intelligibility. This paper presents a web-based cultural heritage archive that addresses these issues. The multi-resolution 3D models form the core of the visualization, on top of which supporting documentation data and multimedia content are spatially and logically connected. Our holistic approach focuses on dynamic manipulation of the 3D scene through the development of advanced navigation mechanisms and information retrieval tools. Users explore the multimodal content in a geo-referenced way through interactive annotation systems over cultural points of interest and automatic narrative tours. Multiple 3D and 2D viewpoints are enabled in real time to support data inspection. The implementation uses front-end programming languages, 3D graphics libraries, and visualization frameworks to efficiently handle asynchronous operations and preserve the accuracy of the original assets. The choice of Meteora in Greece, a UNESCO World Heritage Site, as a case study demonstrates the platform's applicability to complex geometries and large-scale historical environments.
