Networds: The impact of electronic text-processing utilities on writing

1994 ◽  
Vol 17 (2) ◽  
pp. 127-166
Author(s):  
William Dubie


Author(s):  
Andrei Mikheev

Electronic text is essentially just a sequence of characters, but most text-processing tools operate on linguistic units such as words and sentences. Tokenization is the process of segmenting text into words, and sentence splitting is the process of determining sentence boundaries. In this chapter we describe the major challenges of text tokenization and sentence splitting across different languages, and outline various computational approaches to tackling them.
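The two processes described above can be illustrated with a minimal rule-based sketch. This is not the chapter's method, just a toy example; the abbreviation list and regular expressions are invented for illustration, and production systems handle far more cases (numbers, quotes, language-specific rules).

```python
import re

# Toy abbreviation list (illustrative only) used to suppress false
# sentence boundaries after periods that belong to abbreviations.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

def tokenize(text):
    """Split text into word tokens (keeping internal apostrophes) and punctuation tokens."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

def split_sentences(text):
    """Split on sentence-final punctuation followed by whitespace+capital or end of text,
    skipping periods that belong to a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s+[A-Z]|\s*$)", text):
        candidate = text[start:match.end()].strip()
        words = candidate.split()
        # Keep scanning if the candidate ends in a listed abbreviation.
        if words and words[-1].lower() in ABBREVIATIONS:
            continue
        sentences.append(candidate)
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

For example, `split_sentences("Dr. Smith left. He returned.")` keeps "Dr." attached to its sentence instead of splitting after it.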


2022 ◽  
Vol 126 ◽  
pp. 107016
Author(s):  
Valeria A. Pfeifer ◽  
Emma L. Armstrong ◽  
Vicky Tzuyin Lai
Author(s):  
Snezhana Sulova ◽  
Boris Bankov

The impact of social networks on our lives keeps increasing because they provide content, generated and controlled by users, that is constantly evolving. They help spread news, statements, ideas and comments very quickly. Social platforms are currently one of the richest sources of customer feedback on a variety of topics. One frequently discussed topic is resort and holiday villages and the tourist services offered there. Customer comments are valuable to both travel planners and tour operators. The accumulation of opinions in the web space is a prerequisite for applying appropriate tools for their computer processing and for extracting useful knowledge from them. When working with unstructured data such as social media messages, there is no universal text-processing algorithm, because each social network and its resources have their own characteristics. In this article, we propose a new approach for automated analysis of a static set of historical user messages about holiday and vacation resorts published on Twitter. The approach is based on natural language processing techniques and the application of machine learning methods. The experiments are conducted using the software product RapidMiner.
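A typical first stage in this kind of tweet analysis is cleaning the raw message text before any machine learning is applied. The sketch below is illustrative only; the cleaning rules and the tiny stopword list are invented for the example, not taken from the article or from RapidMiner.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a full one.
STOPWORDS = {"the", "a", "an", "is", "are", "at", "on", "in", "and", "to"}

def preprocess_tweet(text):
    """Lowercase, strip URLs and @mentions, keep hashtag words, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]
```

The resulting token lists can then be fed to standard feature extraction (e.g. bag-of-words) and classification steps.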


2021 ◽  
Author(s):  
Rianto Rianto ◽  
Achmad Benny Mutiara ◽  
Eri Prasetyo Wibowo ◽  
Paulus Insap Santosa

Abstract Background: Stemming has long been used in data pre-processing to retrieve information by tracking affixed words back to their roots. In the Indonesian setting, existing stemming methods have been studied and shown to achieve high accuracy. However, few stemming methods exist for non-formal Indonesian text processing. This study introduces a new stemming method to address problems in pre-processing non-formal Indonesian text data, and aims to improve the accuracy of text classifier models by strengthening the stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed and its accuracy is evaluated. The experimental evaluation tested 550 Indonesian datasets using two different stemming methods. Findings: The results show that with the proposed stemming method, the text classifier model achieves higher accuracy than with the existing methods, with scores of 0.85 and 0.73, respectively. These results indicate that the proposed stemming method produces a classifier model with a smaller error rate, so it predicts the class of an object more accurately. Conclusion: The existing Indonesian stemming methods are still oriented towards formal Indonesian sentences and are therefore limited when applied to non-formal Indonesian sentences. This motivates developing a corpus that normalizes non-formal Indonesian into formal Indonesian to serve as the basis of a better stemming method. Using this corpus as a stemming resource improves the accuracy of the classifier model. In the future, the proposed corpus and stemming method can be used for various purposes, including text clustering, summarization, hate speech detection, and other Indonesian text-processing applications.
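The two-stage idea described in the conclusion, normalizing non-formal words to formal forms before stemming, can be sketched as follows. The slang dictionary and suffix list below are tiny invented samples for illustration; they are not the authors' corpus or their actual stemming rules.

```python
# Illustrative slang-to-formal normalization dictionary (invented sample).
SLANG_TO_FORMAL = {
    "gak": "tidak",    # informal "not"
    "udah": "sudah",   # informal "already"
    "bgt": "banget",   # informal "very"
}

# A few common Indonesian suffixes (illustrative, not exhaustive).
SUFFIXES = ("kan", "an", "nya", "i")

def normalize(word):
    """Map a non-formal word to its formal equivalent when known."""
    return SLANG_TO_FORMAL.get(word, word)

def stem(word):
    """Normalize first, then strip one suffix if the remainder stays long enough."""
    word = normalize(word)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

The point of the sketch is the ordering: without the normalization step, slang forms like "gak" never match the formal-language stemming rules at all.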


Target ◽  
2021 ◽  
Author(s):  
Samuel Läubli ◽  
Patrick Simianer ◽  
Joern Wuebker ◽  
Geza Kovacs ◽  
Rico Sennrich ◽  
...  

Abstract Widely used computer-aided translation (CAT) tools divide documents into segments, such as sentences, and arrange them side-by-side in a spreadsheet-like view. We present the first controlled evaluation of these design choices on translator performance, measuring speed and accuracy in three experimental text-processing tasks. We find significant evidence that sentence-by-sentence presentation enables faster text reproduction and within-sentence error identification compared to unsegmented text, and that a top-and-bottom arrangement of source and target sentences enables faster text reproduction compared to a side-by-side arrangement. For revision, on the other hand, we find that presenting unsegmented text results in the highest accuracy and time efficiency. Our findings have direct implications for best practices in designing CAT tools.


Author(s):  
Meftah Mohammed Charaf Eddine

In the field of machine translation of texts, ambiguity, both lexical (dictionary) and structural, remains one of the difficult problems. Researchers in this field use different approaches, the most important of which is machine learning in its various forms. The goal of the approach we propose in this article is to define a new concept of electronic text that is free from any lexical or structural ambiguity. We use a semantic coding system that attaches to the original electronic text (via the text editor interface) the meanings intended by the author: the author specifies the intended meaning of each word that could be a source of ambiguity. The proposed approach can be used with any type of electronic text (text-processing applications, web pages, email text, etc.). In the experiments we have conducted with it, the approach achieves a very high accuracy rate, and we can say that the problem of lexical and structural ambiguity can be completely solved. Under this new concept of electronic text, the text file contains not only the text but also, in the form of symbols, the exact meaning intended by the writer. These semantic symbols are used during machine translation to obtain a translated text completely free of lexical and structural ambiguity.
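One way to picture the semantic-coding idea is inline sense tags attached to ambiguous words. The tag syntax, function names, and sense labels below are invented for this sketch; the article does not specify its encoding format.

```python
import re

def annotate(text, senses):
    """Attach an author-chosen sense tag to each ambiguous word,
    e.g. 'bank' with sense 'fin' becomes 'bank<fin>'."""
    def tag(match):
        word = match.group(0)
        sense = senses.get(word.lower())
        return f"{word}<{sense}>" if sense else word
    return re.sub(r"\w+", tag, text)

def read_annotations(text):
    """Recover (word, sense) pairs from an annotated text, as a
    translation step would before choosing a target-language word."""
    return re.findall(r"(\w+)<(\w+)>", text)
```

A downstream translator reading `bank<fin>` no longer needs to disambiguate between the financial and the river sense: the author has already resolved it.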


2019 ◽  
Vol 31 (1) ◽  
pp. 53-81
Author(s):  
R. Brisco ◽  
R. I. Whitfield ◽  
H. Grierson

Abstract Selection of suitable computer-supported collaborative design (CSCD) technologies is crucial to facilitating successful projects. This paper presents the first systematic method for engineering design teams to evaluate and select the most suitable CSCD technologies by comparing technology functionality against project requirements established in peer-reviewed literature. The paper first presents 220 factors that influence successful CSCD. These factors were then systematically mapped and categorised to create CSCD requirement statements. The novel evaluation and selection method incorporates these requirement statements within a matrix and uses a discourse-analysis text-processing algorithm, applied to data from collaborative projects, to automatically populate the matrix with evidence of how technologies impact the success of CSCD in engineering design teams. The method was validated using data collected across three years of a student global design project. Its impact is the potential to change how engineering design teams consider the technology they use and how the selection of appropriate tools affects the success of their CSCD projects. The CSCD evaluation matrix is the first of its kind, enabling a systematic and justifiable comparison and technology selection, with the aim of best supporting engineering designers' collaborative design activity.
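At its core, the evaluation matrix scores candidate technologies against requirement statements. The sketch below shows only that core idea in the simplest possible form; the requirement phrasing, tool names, and binary satisfied/not-satisfied scoring are invented for illustration and are far cruder than the paper's method of 220 mapped factors.

```python
def score_technologies(requirements, technologies):
    """Rank technologies by how many requirement statements each satisfies.

    requirements: a set of requirement statements.
    technologies: dict mapping a tool name to the set of requirements it meets.
    Returns (name, score) pairs sorted best-first.
    """
    scores = {
        name: sum(1 for req in requirements if req in features)
        for name, features in technologies.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In the paper the matrix cells are populated automatically from project discourse data rather than filled in by hand, but the ranking step works on the same principle.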


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Abdul Ghafoor ◽  
Ali Shariq Imran ◽  
Sher Muhammad Daudpota ◽  
Zenun Kastrati ◽  
Abdullah ◽  
...  
