Automation of Archival Documents Meta Tagging

Daniil Filimonov ◽  
Andrey Svetlov ◽  
Oksana Gorban ◽  
Marina Kosova

The main goal of this project is to create a corpus of documents from the «Mikhailovsky stanichny ataman» archival fund. Corpus linguistics methods appear best suited to this case, since they involve processing a large number of texts to solve a wide variety of linguistic problems. Our group joined the team of philologists to provide the technical and software part of the project. Our main task is to create a document corpus engine, that is, software that stores a database of marked-up texts, executes queries against this database, and provides users with a convenient interface that does not require special qualifications in information technology. However, documents must be prepared for inclusion in the corpus: all texts must undergo special markup. There are many types of markup, and in previous publications [6; 9] our group has already described a solution to the problem of morphological tagging. This article is about meta tagging. Meta tagging refers to the assignment of certain descriptive attributes to a text. In the case of office documents, these are parameters such as the type of document (genre), author (compiler), addressee, and date and place of creation. Meta tagging is necessary for implementing the corpus search features, so that researchers can retrieve text samples with specified external parameters: for example, texts of a certain type, created in a certain period, addressed to a certain addressee, etc. The archives of the «Mikhailovsky stanichny ataman» fund mainly contain documents from the Chanceries of the Don Army from the mid-18th to the first third of the 19th century, so the variety of document types is limited. Moreover, these are mostly official documents written according to certain templates and forms, whose parameters can be extracted relatively easily through preliminary analysis. This work is also carried out by the team of philologists from VolSU under the guidance of Professor O.A. Gorban. The result of their systematization was a description of special speech markers of genre parameters for all document types in the archive. Thus, in our case there is no need for heavy methods of statistical analysis or machine learning; it is enough to search for certain markers in the document. The main marker in all reviewed documents is a direct indication of their type; other markers are auxiliary elements of meta tagging. The paper describes the application created to determine the type of a document and perform its meta tagging by searching the text for regular expressions derived from the markers.
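The marker search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' application: the marker words, the `TYPE_MARKERS` table, the `YEAR_MARKER` pattern and the `tag_document` function are all hypothetical names and values chosen for illustration, and the real markers come from the philologists' systematization.

```python
import re

# Hypothetical type markers (assumption): each document type is recognized
# by a direct mention of its name somewhere in the text.
TYPE_MARKERS = {
    "report": re.compile(r"\bрапорт\w*", re.IGNORECASE),
    "petition": re.compile(r"\bпрошени\w*", re.IGNORECASE),
    "decree": re.compile(r"\bуказ\w*", re.IGNORECASE),
}

# Hypothetical auxiliary marker (assumption): a four-digit year in 1700-1899.
YEAR_MARKER = re.compile(r"\b(1[78]\d{2})\b")


def tag_document(text: str) -> dict:
    """Return meta tags found in the text; missing tags are None."""
    doc_type = None
    earliest = None
    # The type whose marker occurs earliest in the text wins.
    for name, pattern in TYPE_MARKERS.items():
        match = pattern.search(text)
        if match and (earliest is None or match.start() < earliest):
            doc_type, earliest = name, match.start()
    year = YEAR_MARKER.search(text)
    return {"type": doc_type, "year": int(year.group(1)) if year else None}


print(tag_document("Рапортъ станичному атаману, 1795 года"))
```

Choosing the earliest marker is one possible tie-breaking rule when several type names occur in the same text; a real rule set would likely be stricter, since a bare prefix such as «указ» also matches unrelated words.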

Anatoly Komendantov ◽  
Alexander Matveev ◽  
Andrey Svetlov

The paper describes an add-on to the stemming tool MyStem by I. Segalovich. We designed the application to give MyStem a convenient graphical interface that is easy to learn and intuitive for users who do not specialize in information technology. It turned out that MyStem correctly processes outdated vocabulary if it is passed to the program in modern Cyrillic. In addition to the convenient interface, our program has an option to work with the outdated Cyrillic alphabet. When it is turned on, the letters zelo and omega, for instance, are replaced by «ks» and «o» respectively; only then is the text passed to MyStem for analysis, and afterwards the original characters are restored in the processed document. Our add-on thus intercepts the output of the MyStem tool, then reformats and analyzes it in a special way. In addition, the application lets the user resolve homonymy manually if the program tagged the morphological characteristics of a word incorrectly. The main purpose of this application is to prepare the morphological tagging of documents of the archival fund «Mikhailovsky Stanichny Ataman» in order to create a linguistic corpus. During the work on the application, we solved the problem of correctly processing texts containing outdated Cyrillic characters. To implement the functional and user-friendly graphical interface, we use the JavaFX platform (OpenJFX).
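The replace-then-restore step can be sketched as follows. This is a hedged Python illustration, not the authors' JavaFX add-on: `LEGACY_TO_MODERN`, `normalize`, `analyze_stub` and `tag_tokens` are hypothetical names, the yat entry is an assumption added for the example, and `analyze_stub` merely stands in for a call to a real analyzer such as MyStem. Instead of editing the analyzer output in place, the sketch keeps the original token next to the analysis of its modernized form, which has the same effect of preserving the old spelling in the result.

```python
# Illustrative mapping from outdated to modern Cyrillic letters.
# Only lowercase letters are handled in this sketch.
LEGACY_TO_MODERN = {
    "ѡ": "о",  # omega -> modern o (mentioned in the abstract)
    "ѣ": "е",  # yat -> e (assumption, not from the abstract)
}


def normalize(token: str) -> str:
    """Replace outdated letters so the analyzer sees modern Cyrillic."""
    return "".join(LEGACY_TO_MODERN.get(ch, ch) for ch in token)


def analyze_stub(token: str) -> dict:
    """Hypothetical stand-in for a morphological analyzer call."""
    return {"lemma": token.lower()}


def tag_tokens(tokens: list) -> list:
    """Analyze modernized forms but keep the original spelling in the result."""
    tagged = []
    for original in tokens:
        analysis = analyze_stub(normalize(original))
        tagged.append({"text": original, "analysis": analysis})
    return tagged


for item in tag_tokens(["ѡтъ", "лѣто"]):
    print(item["text"], item["analysis"]["lemma"])
```

Keeping the original and normalized forms side by side avoids tracking character offsets, which would otherwise shift whenever one outdated letter maps to two modern ones.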

2017 ◽  
Vol 4 (1) ◽  
pp. 137
Vlado Skračić

Dugi otok is the only large inhabited Adriatic island whose name consists of two words, is Croatian, and contains the noun island (Croat. otok). Almost all linguists and historians agree that the island was first mentioned by Constantine Porphyrogenitus (10th cent.) as Pizych, which can nowadays be recognised in the place names Čuh and Čuh Polje on Dugi otok near Proversa. With the disappearance of that settlement the name was forgotten, but none of the names of the newly founded settlements became the nesonym, as frequently occurred elsewhere in Croatian nesonymy. In archival documents and on historical maps the island is usually identified by a Romance compound: the geographical term insula/isola + the determinant Magna, Maiori, Grossa, Grande, Longa. The island was named Dugi only in the latter half of the 19th century. Neither the nesonym Dugi otok, the ethnic Dugootočanin, nor the ktetic dugootočki is used outside official usage.

2021 ◽  
Vol 01 (05) ◽  
pp. 13-22
T.V. Bogdanova

The study of the pre-revolutionary institution of governorship and its interaction with the central authorities is extremely important. Objective coverage of historical events calls for a broad interest in both national and local figures. Military and civilian governors of Imperial Russia were always at the center of the political, economic and cultural life of individual provinces. They had a significant impact on provincial life not only through their personal qualities, but also through the prevailing attitude towards them in the public consciousness. For local officials and ordinary people, the governor ranked second in importance after the monarch, and sometimes on the same level with him. However, this local perception of the governor did not exclude the fact that people could be appointed to the position merely by coincidence. The decisive role was played by the position taken by the monarch and his immediate entourage, and the real scope of power and the well-being of the region depended on the degree of trust the central authorities placed in a given governor. Those appointed to the governor's posts in the Finnish province (Old Finland) were not only talented leaders but also officials who necessarily had organizational, administrative and managerial experience. Based on preserved archival documents, the article tells about one of them, Ivan Ivanovich Vintere, whose administrative "rise" and "fall" reveal the peculiarities of interaction between the various levels of the vertical of power at the beginning of the 19th century.

2019 ◽  
Vol 7 (1) ◽  
pp. 82-85
Geetha Swaminathan

In the 21st century, the buzzword used in all fields is “Innovation”. It is no wonder that innovation comes up in day-to-day conversation and that organisations in the Information Technology (IT) sector strive to put it into practice. In a fast-moving technology environment, IT organisations are in a position to increase their capability for innovative products and services. IT companies reap many benefits from business innovation, but it also has apparent disadvantages. Despite all the benefits and drawbacks, these organisations must be in a position to survive in the global market, which is a great challenge for all of them. IT organisations consist of departments such as Development, Testing, Consulting, Networking, Infrastructure and Process, and share common platforms and legacy languages. Apart from that, they are moving into new technologies such as Digital, Mobile, IoT, Artificial Intelligence, Machine Learning and Cloud Computing. In all of the above-mentioned fields and areas, they need to innovate to sustain their business. This paper provides detailed results on the pros and cons of business innovation in IT organisations.

Евгения Константиновна Макаренко

Introduction. Evgeny Poselyanin (real surname Pogozhev), a well-known publicist and spiritual writer in pre-revolutionary Russia, having traveled the path of doubts in faith and experienced a spiritual revival in Optina Pustyn, became a participant in the discussion that unfolded between the intelligentsia and representatives of the Russian Orthodox Church at the beginning of the 20th century. The ecclesiastical nature of E. Poselyanin’s aesthetic consciousness determined the main task of all his work, which was to reproduce and transmit the spiritual world of Russian Orthodoxy. Aim and objectives. The work of Evgeny Nikolaevich Poselyanin, the famous spiritual writer and publicist of the late 19th – early 20th centuries, completely forgotten for several decades of the Soviet era, requires «rehabilitation» and serious scientific research. Material and methods. The article examines E. Poselyanin’s collection of biographies «Russian ascetics of the 19th century» (1900). The research is written in the vein of historical poetics. Results and discussion. Poselyanin’s literary activity reflected the most important spiritual and cultural searches of his contemporaries and the artistic and aesthetic tendencies of the late 19th – early 20th centuries. The religious revival of the early 20th century led to a shift in boundaries within Russian culture, bringing about a convergence and mutual influence of theology, philosophy and science with fiction, which was reflected in the transformation of traditional artistic and aesthetic forms. In the work of E. Poselyanin, one can trace how church themes and Orthodox content are clothed in literary forms that are characteristic of secular literature and depart from strict genre canons; these forms become more flexible genre formations, open to the expression and transmission of modern man’s experience of spiritual life. Conclusion. The book by E. Poselyanin «Russian ascetics of the 19th century» is a document of Russian spiritual life in the 18th – 19th centuries. In this collection of biographical sketches, the traditionalism of the saint’s life is eroded by genre innovations: the inclusion of structural elements from other artistic and journalistic church genres (paterics, sermons, church history) and from the fictionalized memoir and biographical prose popular in secular literature.

2021 ◽  
Vol 18 (3) ◽  
pp. 284-298
Elena M. Shabshaevich

The article presents a focused look at the professional relations of the composer and pianist Anton Grigoryevich Rubinstein (1829–1894) with his main Russian publishers, V.V. Bessel and P.I. Jurgenson. The article is based on musical and historical research concerning the history of the Bessel and Jurgenson publishing houses, works on copyright, A.G. Rubinstein’s correspondence, and archival documents from the Russian National Museum of Music. For the first time in musicology, it reveals some pages of the history of personal and business contacts among the three named persons, primarily the conflicts related to the rights to publish the composer’s works in Russia. The first documented contract for the publication of A.G. Rubinstein’s works was received by P.I. Jurgenson (for op. 82, 1868). However, the contract of A.G. Rubinstein with the trading house “Bessel and Co.”, concluded in 1871 (though Rubinstein’s first work had been published by Bessel two years earlier), was much more extensive and significant. Under this contract, more than fifty of A.G. Rubinstein’s works of various genres were to be published, so in the 1870s V.V. Bessel became the composer’s main Russian publisher. However, in 1879, A.G. Rubinstein unexpectedly changed his main publisher in Russia. This position was taken by P.I. Jurgenson, whose trading house also published an extensive list of Rubinstein’s compositions, as well as his literary works. This is evidenced by several notarized contracts between Rubinstein and the “P.I. Jurgenson” company, stored in the Russian National Museum of Music. Thus, the two leading Russian publishers of A.G. Rubinstein legally formalized their relations with the composer, which allows us to follow, in a reasoned and substantive way, the maturation of the institution of copyright for music publications in Russia in the last third of the 19th century. Using the example of A.G. Rubinstein, in comparison with the position of M.A. Balakirev, the article also raises the issue of granting copyright to a publisher not only in Russia, but also “forever and for all countries”. The comparative analysis of publications of the same composer by different publishing companies is also new to Russian musicology; it helps identify certain accents that publishers placed in popularizing A.G. Rubinstein’s works. The publication of the composer’s works by various publishers also highlights new aspects of his creative process and of the history of the creation, opus numbering, and titles of some of his works.

Vijender Kumar Solanki ◽  
Nguyen Ha Huy Cuong ◽  
Zonghyu (Joan) Lu

Machine learning is an emerging research domain that has produced a number of trends. Among them, opinion mining is a technology that attracts attention because it lets us analyze a domain of interest, for example customer reviews of a product or any upcoming trending information. Opinion mining and sentiment analysis are emerging terms and have become buzzwords in information technology. They are widely used by the corporate sector to take business to the next level. Two decades ago one would not have come across terms such as opinion mining or sentiment analysis, but over the last two decades they have given new life to the information technology domain as well as to business. The important question that comes to mind is why one should use sentiment analysis or opinion mining. Information technology has produced a number of new programming languages and innovations, and within that, data mining has brought this trend to users. The chapter covers three major concepts that come under machine learning: decision trees, Bayesian networks and support vector machines. The chapter describes the basic inputs and how adopting these technologies helps support stakeholders.

Tasneem Aamir

Digital enterprise transformation focuses on aligning processes, products, services, business models, and technologies to realize business value. Digital business integration in an organization uses information technology and its tools to drive and manage the life cycle of digital enterprise transformation. It draws on the practices and approaches of IT governance together with modern application tools and APIs. The millennium brought many advances in internet technologies, and these technologies run numerous applications and business services. The span of digital enterprises is expanding and continues to grow as they evolve on a web scale. This chapter is an effort to present an understanding of machine learning and automation around business intelligence and analytics on a web scale. The chapter provides a brief summary of the technologies used in digital enterprise transformation across all domains of an organization.
