machine processing
Recently Published Documents


TOTAL DOCUMENTS: 96 (FIVE YEARS: 22)

H-INDEX: 4 (FIVE YEARS: 1)

2021 ◽  
Vol 2021 (2) ◽  
pp. 19-23
Author(s):  
Anastasiya Ivanova ◽  
Aleksandr Kuz'menko ◽  
Rodion Filippov ◽  
Lyudmila Filippova ◽  
Anna Sazonova ◽  
...  

Building a chatbot based on a neural network requires machine processing of text, which in turn involves various methods and techniques for analyzing phrases and sentences. The article surveys the most popular solutions and models for analyzing data in text format: lemmatization, vectorization, and machine learning methods. Particular attention is paid to text processing techniques; after comparing them, the best method was identified and tested.
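The abstract does not name specific tools, so as a rough illustration of the lemmatization and vectorization steps it lists, here is a minimal Python sketch using spaCy and scikit-learn as stand-ins (not the libraries the article evaluated):

```python
# Minimal sketch of lemmatization + vectorization for chatbot text
# preprocessing. spaCy and scikit-learn are illustrative choices;
# the article does not name the libraries it compared.
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def lemmatize(text: str) -> str:
    """Reduce each token to its dictionary form, dropping punctuation."""
    return " ".join(tok.lemma_ for tok in nlp(text) if not tok.is_punct)

phrases = ["What are your opening hours?", "When do you open?"]
lemmas = [lemmatize(p) for p in phrases]

# TF-IDF vectorization turns the lemmatized phrases into feature
# vectors that a downstream classifier can consume.
vectors = TfidfVectorizer().fit_transform(lemmas)
print(vectors.shape)
```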


2021 ◽  
Vol 23 (1) ◽  
pp. 153-160
Author(s):  
I.O. Olaoye ◽  
Y.A. Salako ◽  
B.D. Odugbose ◽  
O.K. Owolarafe

The effects of processing conditions such as machine shaft speed, loading, and level of ripeness of the Spondias mombin fruit on the quality (i.e. moisture, ash, fibre, fat and protein contents) of the extracted juice were investigated in this study using a newly designed juice extractor for the Spondias mombin fruit. The moisture content of the extracted juice initially decreased as the shaft speed increased from 120 to 130 rpm and then increased as the shaft speed rose from 130 to 150 rpm. Increasing the loading from 5 to 15 kg per batch increased the moisture content of the juice at all shaft speeds. As the shaft speed and loading rate increased, the ash content of the juice also increased. Increasing the shaft speed likewise increased the fibre, fat and protein contents of the juice. The separate and interactive effects of the three factors on the quality parameters of the juice were significant (p < 0.05). Keywords: Hog plum, Ripeness, Machine, Processing, Juice, Quality
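The significance test the abstract reports (separate and interactive effects of three factors, p < 0.05) corresponds to a factorial ANOVA. A minimal sketch with statsmodels follows; the file and column names are hypothetical, since the study's dataset is not published here:

```python
# Hedged sketch of the factorial ANOVA implied by the abstract.
# Column names and the CSV file are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("juice_quality.csv")  # hypothetical columns:
                                       # speed_rpm, loading_kg, ripeness, moisture

# Three-factor model with all interaction terms; repeat per quality parameter.
model = ols(
    "moisture ~ C(speed_rpm) * C(loading_kg) * C(ripeness)", data=df
).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)  # effects with p < 0.05 are significant
```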


2021 ◽  
Vol 27 (3) ◽  
pp. 184-190
Author(s):  
Oksana Tyshchenko

The subject of the research is machine recognition of the handwritten materials of the Archival Card Index (ACI) — the lexical and phraseological materials of the dictionary commission of the All-Ukrainian Academy of Sciences, in particular the card index of the “Russian-Ukrainian Dictionary” of 1924–1933, edited by A. Krymsky and S. Yefremov. The study of the ACI should be considered in the context of the cultural and national revival in Ukraine in the 20th and early 21st centuries. The relevance and value of the ACI became a prerequisite for transferring its materials to digital format. In 2018 the Institute of Ukrainian Language of the NAS of Ukraine created the computer system “Archival Card Index”, which makes the materials accessible primarily as scanned images. The problem that needs urgent resolution is converting the handwriting to a typewritten, machine-readable format. The complexity of manual recognition, which requires considerable effort and time, motivates the study and application of the Transkribus resource, which relies on machine learning. The aim of the study is to clarify, through analysis, systematization, classification and description of the material, the features of preparing ACI cards for machine processing of texts. The scientific novelty of the study is that, for the first time, the issue of providing the HTR engine with ACI training data is addressed (uploading to the platform, segmenting images into lines and text regions, transcribing the content of each page). The main result is an account of the preparatory stage, whose tasks are to eliminate the flaws of automatic segmentation: non-text elements, non-substantive text elements, and incorrect automatic detection of text regions or lines. The prospects of a lexicographic toloka (crowdsourcing) in the card-recognition process are outlined, for which collective access to the collection of transcribed documents in Transkribus is envisaged. To recognize cards manually, and later to check and adjust automatically recognized ones, one can join the new project “All-Ukrainian Toloka: Archival Card Index” — an online platform on the “ACI” website.
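Transkribus stores segmentation results in PAGE XML. As a purely hypothetical illustration of the preparatory cleanup the abstract describes, the sketch below flags suspiciously small text lines (often non-text debris or mis-detected regions) for manual review; the thresholds and file name are invented, not taken from the paper:

```python
# Hypothetical sketch: scan a PAGE XML export (the segmentation format
# Transkribus uses) and flag very small text lines as candidates for
# manual removal. Thresholds are illustrative only.
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def suspect_lines(page_xml_path: str, min_width: int = 40, min_height: int = 8):
    tree = ET.parse(page_xml_path)
    for line in tree.iter(f"{{{NS['pc']}}}TextLine"):
        coords = line.find("pc:Coords", NS).get("points")  # "x1,y1 x2,y2 ..."
        xs, ys = zip(*(map(int, pt.split(",")) for pt in coords.split()))
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if w < min_width or h < min_height:
            yield line.get("id"), (w, h)  # candidate for manual cleanup

for line_id, size in suspect_lines("card_0001.xml"):  # hypothetical file
    print(line_id, size)
```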


Author(s):  
S. Kornienko ◽  
I. Ismakaeva

The article discusses the need for, and the problems of, organizing data sources for the study of the ideological-political and agitation-propaganda discourses of the “Reds” and “Whites” during the Civil War, based on materials from Perm province newspapers of 1918–1919. It is noted that the solution to these problems is determined by the tasks of the study and by the digital technologies used, and mainly comes down to ensuring the machine readability of the data sources and to structuring and organizing them in forms that allow machine processing. The main ways to solve these problems are the creation of complexes of digital sources based on source-oriented information systems, of arrays in the form of file collections of publications in text formats, and of data in tabular forms. It is shown that solving the problems of data organization creates the necessary conditions for the effective use of digital methods of analysis and for obtaining the expected results at the subsequent, analytical stages of the study.
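A minimal sketch of the organization step described here, turning a file collection of newspaper publications into a tabular, machine-processable form; the directory layout, file-naming convention, and column names are hypothetical:

```python
# Hedged sketch: collect plain-text newspaper files into one CSV table.
# Paths and the naming convention <year>_<paper>_<issue>.txt are invented.
from pathlib import Path
import csv

rows = []
for path in Path("perm_newspapers").glob("*.txt"):
    year, paper, issue = path.stem.split("_", 2)  # hypothetical convention
    rows.append({
        "year": year,
        "newspaper": paper,
        "issue": issue,
        "text": path.read_text(encoding="utf-8"),
    })

with open("corpus.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["year", "newspaper", "issue", "text"])
    writer.writeheader()
    writer.writerows(rows)  # tabular form ready for machine processing
```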


2020 ◽  
Vol Special Issue on Collecting,... ◽  
Author(s):  
Caroline T. Schroeder ◽  
Amir Zeldes

Scholarship on under-resourced languages brings with it a variety of challenges that make access to the full spectrum of source materials, and their evaluation, difficult. For Coptic in particular, large-scale analyses and quantitative work of any kind are difficult due to the fragmentation of manuscripts, the highly fusional nature of an incorporational morphology, and the complications of dealing with influences from Hellenistic-era Greek, among other concerns. Many of these challenges, however, can be addressed using Digital Humanities tools and standards. In this paper, we outline some of the latest developments in Coptic Scriptorium, a DH project dedicated to bringing Coptic resources online in uniform, machine-readable, and openly available formats. Collaborative web-based tools create online 'virtual departments' in which scholars dispersed sparsely across the globe can collaborate, and natural language processing tools counterbalance the scarcity of trained editors by enabling machine processing of Coptic text to produce searchable, annotated corpora. Comment: 9 pages; paper presented at the Stanford University CESTA Workshop "Collecting, Preserving and Disseminating Endangered Cultural Heritage for New Understandings Through Multilingual Approaches"
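As a generic illustration (not Coptic Scriptorium's actual pipeline) of what a searchable annotated corpus looks like at the file level, the sketch below writes token-level annotations in a CoNLL-style format; the tokens, lemmas, and tags are placeholders:

```python
# Illustrative sketch only: emit one token per line with lemma and POS
# tag, a common machine-readable corpus format. The analyses below are
# placeholders, not output of the project's tools.
tokens = [
    ("ⲁϥⲥⲱⲧⲙ", "ⲥⲱⲧⲙ", "V"),   # (surface form, lemma, POS) - placeholder
    ("ⲛϭⲓ", "ⲛϭⲓ", "PTC"),
]

with open("corpus.conll", "w", encoding="utf-8") as f:
    for i, (form, lemma, pos) in enumerate(tokens, start=1):
        f.write(f"{i}\t{form}\t{lemma}\t{pos}\n")
```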


Author(s):  
Bodhvi Gaur ◽  
Gurpreet Singh Saluja ◽  
Hamsa Bharathi Sivakumar ◽  
Sanjay Singh

Abstract A job seeker’s resume contains several sections, including educational qualifications, which capture the knowledge and skills relevant to the job. Machine processing of the education sections of resumes has been a difficult task. In this paper, we attempt to identify educational institutions’ names and degrees from a resume’s education section. Neural-network-based named entity recognition techniques usually require a significant amount of annotated data; a semi-supervised approach is used to overcome the lack of a large annotated dataset. We trained a deep neural network model on an initial (seed) set of resume education sections. This model is used to predict the entities of unlabeled education sections, and its output is rectified using a correction module. The education sections containing the rectified entities are added to the seed set, and the updated seed set is used for retraining, leading to better accuracy than the previously trained model. In this way, the approach provides high overall accuracy without the need for large amounts of annotated data. Our model achieved an accuracy of 92.06% on the named entity recognition task.
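The self-training loop the abstract describes can be sketched as follows. `train_ner`, `predict_entities`, and `rectify` are hypothetical stand-ins for the paper's model training, prediction, and correction module, so this is a structural sketch rather than the authors' code:

```python
# Structural sketch of the semi-supervised (self-training) loop.
# train_ner, predict_entities, and rectify are hypothetical helpers
# standing in for components the paper describes but does not publish.
def self_training(seed_set, unlabeled_batches):
    model = train_ner(seed_set)                      # initial model on seed set
    for batch in unlabeled_batches:                  # one retraining round per batch
        labeled = [(section, rectify(predict_entities(model, section)))
                   for section in batch]             # predict, then correct
        seed_set = seed_set + labeled                # augment the seed set
        model = train_ner(seed_set)                  # retrain -> better accuracy
    return model
```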


2020 ◽  
Author(s):  
Matheus Santana Carvalho ◽  
Benjamin Grando Moreira ◽  
Sueli Fischer Beckert

Metrology is responsible for studying the aspects involved in the application of measurements, an area common to engineering in the search for continuous improvement and quality in processes and products. Bringing together machine processing, the factory floor, and the development of new applications demands continuous technological development from the industrial sector and modernization of its processes. In this scenario, Metrology 4.0 applies new technologies to the traditional process, ensuring data quality and reliability and supporting decisions in real time. To innovate on traditional calibration models, this paper introduces software developed to compare a measuring tape with a standard using Computer Vision, and compares the results of this process with traditional calibration methods.
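A hedged sketch of the computer-vision idea: locate the graduation marks on a tape image and convert their pixel spacing to millimetres using a known scale factor. The OpenCV calls are standard, but the image file, thresholds, and scale factor are invented, and the paper's actual method may differ:

```python
# Hypothetical sketch: find graduation marks on a measuring-tape image
# and compare their spacing to the nominal 1 mm of the standard.
import cv2
import numpy as np

img = cv2.imread("tape.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Each graduation mark appears as a thin vertical contour.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
centers = sorted(cv2.boundingRect(c)[0] + cv2.boundingRect(c)[2] / 2
                 for c in contours)

PX_PER_MM = 12.5  # hypothetical, obtained by calibrating against the standard
spacings_mm = np.diff(centers) / PX_PER_MM
print("deviation from nominal 1 mm:", spacings_mm - 1.0)
```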


2020 ◽  
Vol 10 (16) ◽  
pp. 5505
Author(s):  
Liang-Ching Chen ◽  
Kuei-Hu Chang ◽  
Hsiang-Yu Chung

With the development of modern and advanced information and communication technologies (ICTs), Industry 4.0 has given rise to big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. In many cases where statistics-based corpus techniques are adopted to analyze English for specific purposes (ESP), researchers extract critical information by retrieving domain-oriented lexical units. However, even if corpus software embraces algorithms such as log-likelihood tests, log ratios, BIC scores, etc., the machine still cannot understand linguistic meaning. In many ESP cases, function words reduce the efficiency of corpus analysis, yet many studies still eliminate them manually. Manual annotation is inefficient and time-consuming, and can easily cause information distortion. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistics-based corpus machine-processing approach to refine big textual data. The paper uses COVID-19 news reports as a simulated example of big textual data to verify the efficacy of the machine optimizing process. The refined data show that the proposed approach can rapidly remove function words and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further use.
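The core operation, removing function words by machine before frequency analysis, can be illustrated with a plain stopword filter. NLTK is an illustrative choice here; the paper's own approach is statistics-based and more elaborate than a fixed stopword list:

```python
# Minimal sketch of machine removal of function words from news text
# before frequency analysis. A stand-in for the paper's richer,
# statistics-based refinement procedure.
from collections import Counter

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)

FUNCTION_WORDS = set(stopwords.words("english"))

def refine(text: str) -> list[str]:
    tokens = nltk.word_tokenize(text.lower())
    return [t for t in tokens if t.isalpha() and t not in FUNCTION_WORDS]

news = "The patients were hospitalized and the virus spread in the city."
print(Counter(refine(news)).most_common(5))  # content words only
```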


2020 ◽  
pp. 105-125
Author(s):  
Gulmira N. Kenzhebalina ◽  
Gulvira K. Shaikova ◽  
Maygul T. Shakenova ◽  
Inessa G. Akoyeva

The problem of recognizing manipulative texts in news is posed, using Kazakhstani media as an example. One of the most important tasks in studying a text for manipulativeness is to find what, although it has a linguistic representation, is pragmatically “obscured” and hidden from the addressee. It is emphasized that in such cases the danger of manipulation increases many times over, since it is addressed both to large social groups and to society as a whole. The relevance of the work lies in the fact that the destructive nature of manipulation can lead to large-scale social upheaval. The study is based on the hypothesis of a uniform structure of the manipulative text, framed in the context of big data research for subsequent machine processing. Based on the working definition of manipulativeness that was formed, an algorithm for analyzing media texts has been developed, which includes a complex of speech and language indicators. The work was carried out on a corpus of 1,000 media texts selected from 10,000 fragments of manipulative discourse. Both unambiguous and mixed components are distinguished, that is, linguistic units capable of performing dual functions due to their semantic polysemy. A four-component universal structure of manipulative texts is compiled, including language parameters extracted from manipulative texts.
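As a purely hypothetical sketch of the kind of indicator-based analysis described, the code below counts matches of speech and language indicators in a text. The indicator lexicon, component labels, and scoring rule are placeholders; the paper's actual algorithm and four-component structure are not published in this abstract:

```python
# Hypothetical indicator-based scoring of a media text. The lexicon
# and component names below are invented placeholders.
import re

INDICATORS = {
    "generalization": [r"\beveryone\b", r"\balways\b", r"\bno one\b"],
    "appeal_to_fear": [r"\bthreat\b", r"\bcatastrophe\b", r"\bcollapse\b"],
}

def indicator_hits(text: str) -> dict[str, int]:
    low = text.lower()
    return {
        component: sum(len(re.findall(p, low)) for p in patterns)
        for component, patterns in INDICATORS.items()
    }

sample = "Everyone knows the collapse is coming; no one will escape the threat."
print(indicator_hits(sample))  # per-component match counts
```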

