Electronic Text: Recently Published Documents

TOTAL DOCUMENTS: 211 (five years: 24)
H-INDEX: 10 (five years: 0)

Author(s):  
Meftah Mohammed Charaf Eddine

In machine translation, ambiguity at both the lexical (dictionary) and structural levels remains one of the hardest problems. Researchers in this field address it with a variety of approaches, most prominently machine learning in its various forms. The approach proposed in this article defines a new concept of electronic text that is free of lexical and structural ambiguity. We use a semantic coding system that attaches to the original electronic text (via the text editor interface) the meanings intended by the author: the author selects the intended sense of each word that could be a source of ambiguity. The proposed approach can be used with any type of electronic text (word-processing documents, web pages, email text, etc.). In the experiments we conducted with it, the approach achieved a very high accuracy rate, and we can say that the problem of lexical and structural ambiguity can be solved completely. Under this new concept of electronic text, the text file contains not only the text but also, encoded as symbols, the exact meanings intended by the writer. These semantic symbols are used during machine translation to produce a translated text entirely free of lexical and structural ambiguity.
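To make the idea concrete, here is a minimal sketch of how such author-driven semantic coding might look in practice. The inline tag syntax, the toy sense inventory, and the `annotate` helper are illustrative assumptions, not the author's actual encoding scheme.

```python
# Toy sense inventory: word -> {sense_id: gloss}. In the article's scheme
# the author picks the intended sense for each potentially ambiguous word.
AMBIGUOUS_SENSES = {
    "bank": {"bank#1": "financial institution", "bank#2": "river edge"},
    "bat":  {"bat#1": "flying mammal", "bat#2": "sports equipment"},
}

def annotate(text: str, choices: dict[str, str]) -> str:
    """Attach author-chosen sense symbols to ambiguous words.

    `choices` maps an ambiguous word to the sense id the author intends,
    e.g. {"bank": "bank#2"}. The output keeps the surface text and adds an
    inline symbol that a machine translator could consume.
    """
    out = []
    for token in text.split():
        bare = token.strip(".,;:!?").lower()
        sense = choices.get(bare)
        if sense and sense in AMBIGUOUS_SENSES.get(bare, {}):
            token = f"{token}⟦{sense}⟧"  # inline semantic symbol
        out.append(token)
    return " ".join(out)

print(annotate("He sat by the bank.", {"bank": "bank#2"}))
# -> He sat by the bank.⟦bank#2⟧
```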


2021 ◽  
Author(s):  
Gentian Gashi

Handwriting recognition is the process of automatically converting handwritten text into electronic text (letter codes) usable by a computer. The increased reliance on technology during the COVID-19 pandemic has underscored the importance of storing and digitising information accurately and efficiently. Interpreting handwriting remains difficult for both humans and computers because of varied styles and skewed characters. In this study, we analysed the association between filter size and a convolutional neural network's (CNN) classification accuracy. Testing was conducted on the publicly available MNIST database of handwritten digits (LeCun and Cortes, 2010), which consists of a training set (N=60,000) and a testing set (N=10,000). Using ANOVA, our results indicate a significant association (p &lt; .001 at the α = 0.05 level) between filter size and classification accuracy. However, this effect appears only when increasing the filter size from 1x1 to 2x2; larger filter sizes yielded no significant improvement, so a filter size above 2x2 cannot be recommended.
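For readers who want to reproduce the flavour of this experiment, the following hedged sketch trains the same small CNN on MNIST while varying only the convolutional filter (kernel) size. The architecture, epoch count, and other hyperparameters are illustrative assumptions, not the study's exact setup.

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of 28x28 digits.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add channel dim, scale to [0, 1]
x_test = x_test[..., None] / 255.0

def build_model(kernel_size: int) -> tf.keras.Model:
    """Small CNN whose only varying hyperparameter is the filter size."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, kernel_size, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

for k in (1, 2, 3, 5):  # filter sizes to compare
    model = build_model(k)
    model.fit(x_train, y_train, epochs=1, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{k}x{k} filters: test accuracy = {acc:.4f}")
```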


2021 ◽  
Author(s):  
Richard Robertson

Research problem: Digital libraries have invested significant resources in digitising and providing access to a growing number of books. The approaches taken to visualise digitised books online can affect the usability and usefulness of the book to the user. Previous usability studies focus on the digital library as a whole; this study narrows the focus to the digitised book, with the intention of identifying usability issues and investigating the effects a visualisation approach may have on users.  Methodology: An anonymous survey was conducted, employing the Interaction Triptych Framework (ITF) to frame the relationships between the user and digitised books. Two examples of digitised books, from the New Zealand Electronic Text Collection and the Internet Archive, were used. Participants from the library, archives, and history fields, as well as general users, were invited to take part.  Results: 132 participants began the survey, and 86 completed all of the required parts. Results suggest a slightly positive attitude towards the usability and usefulness of the examples, with Open Library rated higher for usability and both examples rated similarly for usefulness. Participant comments suggest many users appreciate features analogous to physical books with regard to aesthetics, learnability, and navigation, while for ease of use and reading, rich text appeared to be preferred over digital-image-based visualisation.  Implications: Digital libraries need to strive continually to improve the usability and usefulness of digitised books to satisfy their users. Further research is suggested: creating prototypes and conducting user testing to gain a deeper understanding of the relationship between users and digitised books online.




2021 ◽  
pp. 44-49
Author(s):  
В.И. Теркулов

The article gives a general description of the types of online dictionaries. A distinction is made between «reprint» dictionaries, which are electronic text copies of traditional dictionaries posted on the Internet; electronic versions of traditional dictionaries that have been given electronic markup, hyperlinks (hypertext) to related descriptions of meaning interpretations, and search forms; dictionary corpora (network guisauruses), i.e. sites for simultaneous search across different electronic versions of dictionaries (linguistic and encyclopedic); and network dictionaries created exclusively in electronic format. Topicality, interactivity, integration, multimedia, and hypertextuality are considered the basic features of the latter. The author puts forward the idea of creating a Large Scientific Integrated Network Dictionary of the Russian Language.


2021 ◽  
Vol 8 (4) ◽  
Author(s):  
Ilias Loumos

The present didactic intervention aims to highlight the effective use of electronic text corpora in teaching the Ancient Greek course. In particular, the teaching of a unit from the school textbook of the 3rd grade of lower secondary school is examined at the vocabulary and semantic level, using the Digital Resources for the Greek language (http://www.greek-language.gr/digitalResources/) and the Portal for the Greek Language (http://www.greek-language.gr/greekLang/index.html). Using electronic text corpora, students take part in the learning process in a critical way, building an interactive and communicative learning environment. The dynamic use of electronic text corpora in the teaching process can form a bridge between traditional and new literacy in the information and communication society.


2021 ◽  
Vol 33 (4) ◽  
pp. 131-146
Author(s):  
Alexander Vasilievich Kozachok ◽  
Sergey Alexandrovich Kopylov ◽  
Pavel Nikolaevich Gorbachev ◽  
Artur Evgenevich Gaynov ◽  
Boris Vladimirovich Kondrat’ev

The article presents an algorithm for marking electronic text documents based on embedding identifying information by changing the values of the intervals between words (inter-word distance shifting). The algorithm is intended to improve the protection of documents containing textual information against leakage through the channel formed by transferring documents printed on paper, as well as the corresponding electronic copies of paper documents. In developing the marking algorithm, existing tools for protecting paper documents from leakage were analysed, practical solutions in the field of protecting text documents were reviewed, and their advantages and disadvantages were identified. Inter-word distance shifting serves as the approach for embedding information in electronic documents: a normalised space is inserted in selected areas of text lines, and the remaining inter-word intervals are adjusted by calculated values. To ensure that the embedded marker is invariant to printing and subsequent scanning or photographing, algorithms for forming the embedding regions and the embedding matrix were developed. When forming the embedding regions, arrays of spaces are built from the text lines of the source document, consisting of pairs of four and two spaces, or of two spaces; the embedded information determines the places in these regions where the normalised space is inserted. When embedding a marker, an embedding matrix containing the word displacement values is formed and is embedded in the original document during printing. The developed marking algorithm makes it possible to introduce into the structure of an electronic text document a marker that is invariant to converting the electronic document to paper and back. In addition, the features and limitations of the algorithm are presented, and directions for further research are identified.
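As a rough illustration of the inter-word distance shifting principle (not the authors' algorithm, which operates on rendered spacing values, embedding regions, and an embedding matrix), the following toy sketch encodes one bit per inter-word gap by choosing between a plain and a widened space.

```python
# A widened inter-word distance is simulated here with an en space; the
# real algorithm shifts printed word positions by calculated amounts.
WIDE = "\u2002"

def embed(line: str, bits: str) -> str:
    """Embed one bit per inter-word gap: '1' -> widened, '0' -> plain."""
    words = line.split(" ")
    assert len(bits) <= len(words) - 1, "not enough gaps for the payload"
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        sep = WIDE if i < len(bits) and bits[i] == "1" else " "
        out.append(sep + word)
    return "".join(out)

def extract(line: str) -> str:
    """Recover the payload by classifying each inter-word gap."""
    bits = []
    for ch in line:
        if ch == WIDE:
            bits.append("1")
        elif ch == " ":
            bits.append("0")
    return "".join(bits)

marked = embed("the quick brown fox jumps over", "10110")
print(extract(marked))  # -> "10110"
```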


2020 ◽  
pp. 156-182
Author(s):  
Terry Walker ◽  
Peter J. Grund

This chapter explores speech representation structures in Early Modern English that exhibit a mixture of direct speech and indirect speech. Drawing data from an Electronic Text Edition of Depositions 1560–1760 (ETED), we chart the frequency and characteristics of different types of speech representation that overlap between direct and indirect speech (such as the mixture of third-person and first-person reference, and the use of reporting expression + that + direct speech representation). We show that accounting for such uses as “slipping,” free indirect speech, and/or signs of a system under development is less convincing. Instead, we argue that the mixture should be seen as exploitation of speech representation resources for various sociopragmatic and communicative purposes, such as disambiguating voices and shifting responsibilities for the speech report. The chapter thus contributes to the broader goal of enhancing our understanding of the sociopragmatics of speech representation in the history of English.


2020 ◽  
pp. 66-80
Author(s):  
Marcin Woliński ◽  
Witold Kieraś

The subject matter of this paper is Chronofleks, a computer system (http://chronofleks.nlp.ipipan.waw.pl/) modelling Polish inflection based on corpus material. The system visualises changes in the inflectional paradigms of individual lexemes over time and enables examination of the variability of the frequency of inflected form groups distinguished on various criteria. Feeding Chronofleks with corpus data required the development of IT tools providing an inflectional processing sequence for texts analogous to the ones used for the modern language; they comprise a transcriber, a morphological analyser, and a tagger. The work was performed on data from three historical periods (1601–1772, 1830–1918, and modern), elaborated in independent projects. Therefore, finding a common manner of describing data from the individual periods was a significant element of the work. Keywords: electronic text corpus – natural language processing – inflection of Polish – history of language
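The processing sequence described above can be pictured as a simple chain of stages. This is a hedged sketch only; the function names, the toy transcription rule, and the tag strings are placeholders, not Chronofleks's actual API or tagset.

```python
# Placeholder pipeline: transcriber -> morphological analyser -> tagger.

def transcribe(historical_text: str) -> str:
    """Normalise historical spelling towards a modern-like transcription."""
    # Toy rule standing in for a full transcription grammar.
    return historical_text.replace("x", "ks")

def analyse(text: str) -> list[tuple[str, list[str]]]:
    """Return each token with its candidate morphological tags."""
    # A real analyser would consult an inflectional lexicon.
    return [(tok, ["subst:sg:nom", "subst:sg:acc"]) for tok in text.split()]

def tag(analysed: list[tuple[str, list[str]]]) -> list[tuple[str, str]]:
    """Disambiguate: keep one tag per token (here, naively the first)."""
    return [(tok, tags[0]) for tok, tags in analysed]

corpus_line = "przyklad tekstu"
print(tag(analyse(transcribe(corpus_line))))
```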


Author(s):  
Elspeth Haston ◽  
Alex Hardisty

Digitisation is the process of converting analogue data about physical specimens to digital representation, including electronic text, images, and other forms. The term has been used diversely within the natural science collections community, and the outputs of different digitisation initiatives can be quite different. Digitisation of individual specimens provides explicit and precise details about each object curated in a collection. Because digitisation is driven by diverse aims, the needs of specific projects, and the particular practices and workflows of different institutions, the digitised output has a wide range of uses. Capturing and presenting data from future digitisation in standard formats is essential so that the data can be more easily understood, compared, analysed, and communicated via the Internet. By harmonising a framework that clarifies what is meant by different levels of digitisation (MIDS levels), as well as the minimum information to be captured at each level, it becomes easier to measure consistently the extent of digitisation achieved over time and to set priorities for the remaining work. Similarly, ensuring that enough data are captured, curated, and published is essential so that they are useful for the widest possible range of future research, teaching, and learning purposes. The Minimum Information about a Digital Specimen (MIDS) specification aims to address these problems. MIDS is a 'minimum specification', meaning that the information specified as necessary at each MIDS level is the minimum expected to be made digitally available following each major stage of digitisation; more is not precluded. From September 2020, the MIDS specification is the work topic of an approved TDWG Task Group.
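To illustrate the idea of cumulative levels, here is a hedged sketch of a MIDS-style level check. The field names and level contents below are invented for illustration and do not reproduce the published MIDS term list.

```python
# Hypothetical level definitions: each level adds required fields on top
# of the previous one (field names are illustrative, not the MIDS spec).
MIDS_LEVELS = {
    0: {"identifier", "institution"},                     # bare catalogue record
    1: {"identifier", "institution", "name", "license"},  # basic, findable
    2: {"identifier", "institution", "name", "license",
        "collecting_location", "collecting_date"},        # regular research use
}

def mids_level(record: dict) -> int:
    """Return the highest level whose required fields are all present."""
    present = {k for k, v in record.items() if v}
    achieved = -1
    for level in sorted(MIDS_LEVELS):
        if MIDS_LEVELS[level] <= present:
            achieved = level
    return achieved

specimen = {"identifier": "ABC:123", "institution": "NHM",
            "name": "Quercus robur", "license": "CC0"}
print(mids_level(specimen))  # -> 1
```

A check like this makes the extent of digitisation measurable over time: re-running it across a collection shows how many records have reached each level.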

