machine processing
Recently Published Documents


TOTAL DOCUMENTS: 96 (FIVE YEARS: 22)

H-INDEX: 4 (FIVE YEARS: 1)

2021 ◽  
Vol 2021 (2) ◽  
pp. 19-23
Author(s):  
Anastasiya Ivanova ◽  
Aleksandr Kuz'menko ◽  
Rodion Filippov ◽  
Lyudmila Filippova ◽  
Anna Sazonova ◽  
...  

Building a chatbot based on a neural network requires machine processing of text, which in turn involves various methods and techniques for analyzing phrases and sentences. The article surveys the most popular solutions and models for analyzing data in text format: lemmatization, vectorization, and machine learning methods. Particular attention is paid to text processing techniques; after comparing them, the best method was identified and tested.
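The abstract does not name specific tools, so as a rough illustration of the lemmatization and vectorization steps it lists, here is a minimal Python sketch using spaCy and scikit-learn as stand-ins (not the libraries the article evaluated):

```python
# Minimal sketch of lemmatization + vectorization for chatbot text
# preprocessing. spaCy and scikit-learn are illustrative choices;
# the article does not name the libraries it compared.
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def lemmatize(text: str) -> str:
    """Reduce each token to its dictionary form, dropping punctuation."""
    return " ".join(tok.lemma_ for tok in nlp(text) if not tok.is_punct)

phrases = ["What are your opening hours?", "When do you open?"]
lemmas = [lemmatize(p) for p in phrases]

# TF-IDF vectorization turns the lemmatized phrases into feature
# vectors that a downstream classifier can consume.
vectors = TfidfVectorizer().fit_transform(lemmas)
print(vectors.shape)
```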


2021 ◽  
Vol 23 (1) ◽  
pp. 153-160
Author(s):  
I.O. Olaoye ◽  
Y.A. Salako ◽  
B.D. Odugbose ◽  
O.K. Owolarafe

The effects of processing conditions such as machine shaft speed, loading, and level of ripeness of the Spondias mombin fruit on the quality (i.e. moisture, ash, fibre, fat and protein contents) of the extracted juice were investigated in this study using a newly designed juice extractor for the Spondias mombin fruit. The moisture content of the extracted juice initially decreased as the shaft speed increased from 120 to 130 rpm and then increased as the shaft speed rose from 130 to 150 rpm. Increasing the loading from 5 to 15 kg per batch increased the moisture content of the juice at all shaft speeds. As the shaft speed and loading rate increased, the ash content of the juice also increased. Increasing the shaft speed likewise increased the fibre, fat and protein contents of the juice. The separate and interactive effects of the three factors on the quality parameters of the juice were significant (p < 0.05). Keywords: Hog plum, Ripeness, Machine, Processing, Juice, Quality
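The significance test the abstract reports (separate and interactive effects of three factors, p < 0.05) corresponds to a factorial ANOVA. A minimal sketch with statsmodels follows; the file and column names are hypothetical, since the study's dataset is not published here:

```python
# Hedged sketch of the factorial ANOVA implied by the abstract.
# Column names and the CSV file are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("juice_quality.csv")  # hypothetical columns:
                                       # speed_rpm, loading_kg, ripeness, moisture

# Three-factor model with all interaction terms; repeat per quality parameter.
model = ols(
    "moisture ~ C(speed_rpm) * C(loading_kg) * C(ripeness)", data=df
).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)  # effects with p < 0.05 are significant
```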


2021 ◽  
Vol 27 (3) ◽  
pp. 184-190
Author(s):  
Oksana Tyshchenko

The subject of the research is machine recognition of the handwritten materials of the Archival Card Index (ACI) — the lexical and phraseological materials of the dictionary commission of the All-Ukrainian Academy of Sciences, in particular the card index of the “Russian-Ukrainian Dictionary” of 1924–1933, edited by A. Krymsky and S. Yefremov. The study of the ACI should be considered in the context of the cultural and national revival in Ukraine in the 20th and early 21st centuries. The relevance and value of the ACI became a prerequisite for transferring its materials to digital format. In 2018 the Institute of Ukrainian Language of the NAS of Ukraine created the computer system “Archival Card Index”, which makes the materials accessible primarily as scanned images. The problem that needs urgent resolution is converting the handwriting to a typewritten, machine-readable format. The complexity of manual recognition, which requires considerable effort and time, motivates the study and application of the Transkribus resource, which relies on machine learning. The aim of the study is to clarify, through analysis, systematization, classification and description of the material, the features of preparing ACI cards for machine processing of texts. The scientific novelty of the study is that, for the first time, the issue of providing the HTR engine with ACI training data is addressed (uploading to the platform, segmenting images into lines and text regions, transcribing the content of each page). The main result is an account of the preparatory stage, whose tasks are to eliminate the flaws of automatic segmentation: non-text elements, non-substantive text elements, and incorrect automatic detection of text regions or lines. The prospects of a lexicographic toloka (crowdsourcing) in the card-recognition process are outlined, for which collective access to the collection of transcribed documents in Transkribus is envisaged. To recognize cards manually, and later to check and adjust automatically recognized ones, one can join the new project “All-Ukrainian Toloka: Archival Card Index” — an online platform on the “ACI” website.
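Transkribus stores segmentation results in PAGE XML. As a purely hypothetical illustration of the preparatory cleanup the abstract describes, the sketch below flags suspiciously small text lines (often non-text debris or mis-detected regions) for manual review; the thresholds and file name are invented, not taken from the paper:

```python
# Hypothetical sketch: scan a PAGE XML export (the segmentation format
# Transkribus uses) and flag very small text lines as candidates for
# manual removal. Thresholds are illustrative only.
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def suspect_lines(page_xml_path: str, min_width: int = 40, min_height: int = 8):
    tree = ET.parse(page_xml_path)
    for line in tree.iter(f"{{{NS['pc']}}}TextLine"):
        coords = line.find("pc:Coords", NS).get("points")  # "x1,y1 x2,y2 ..."
        xs, ys = zip(*(map(int, pt.split(",")) for pt in coords.split()))
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if w < min_width or h < min_height:
            yield line.get("id"), (w, h)  # candidate for manual cleanup

for line_id, size in suspect_lines("card_0001.xml"):  # hypothetical file
    print(line_id, size)
```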


Author(s):  
S. Kornienko ◽  
I. Ismakaeva

The article discusses the need for, and the problems of, organizing data sources for the study of the ideological-political and agitation-propaganda discourses of the “Reds” and “Whites” during the Civil War, based on materials from Perm province newspapers of 1918–1919. It is noted that the solution to these problems is determined by the tasks of the study and by the digital technologies used, and mainly comes down to ensuring the machine readability of the data sources and to structuring and organizing them in forms that allow machine processing. The main ways to solve these problems are the creation of complexes of digital sources based on source-oriented information systems, of arrays in the form of file collections of publications in text formats, and of data in tabular forms. It is shown that solving the problems of data organization creates the necessary conditions for the effective use of digital methods of analysis and for obtaining the expected results at the subsequent, analytical stages of the study.
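A minimal sketch of the organization step described here, turning a file collection of newspaper publications into a tabular, machine-processable form; the directory layout, file-naming convention, and column names are hypothetical:

```python
# Hedged sketch: collect plain-text newspaper files into one CSV table.
# Paths and the naming convention <year>_<paper>_<issue>.txt are invented.
from pathlib import Path
import csv

rows = []
for path in Path("perm_newspapers").glob("*.txt"):
    year, paper, issue = path.stem.split("_", 2)  # hypothetical convention
    rows.append({
        "year": year,
        "newspaper": paper,
        "issue": issue,
        "text": path.read_text(encoding="utf-8"),
    })

with open("corpus.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["year", "newspaper", "issue", "text"])
    writer.writeheader()
    writer.writerows(rows)  # tabular form ready for machine processing
```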


2020 ◽  
Vol Special Issue on Collecting,... ◽  
Author(s):  
Caroline T. Schroeder ◽  
Amir Zeldes

Scholarship on under-resourced languages brings with it a variety of challenges that make access to the full spectrum of source materials, and their evaluation, difficult. For Coptic in particular, large-scale analyses and quantitative work of any kind are difficult due to the fragmentation of manuscripts, the highly fusional nature of an incorporational morphology, and the complications of dealing with influences from Hellenistic-era Greek, among other concerns. Many of these challenges, however, can be addressed using Digital Humanities tools and standards. In this paper, we outline some of the latest developments in Coptic Scriptorium, a DH project dedicated to bringing Coptic resources online in uniform, machine-readable, and openly available formats. Collaborative web-based tools create online 'virtual departments' in which scholars dispersed sparsely across the globe can collaborate, and natural language processing tools counterbalance the scarcity of trained editors by enabling machine processing of Coptic text to produce searchable, annotated corpora. Comment: 9 pages; paper presented at the Stanford University CESTA Workshop "Collecting, Preserving and Disseminating Endangered Cultural Heritage for New Understandings Through Multilingual Approaches"
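As a generic illustration (not Coptic Scriptorium's actual pipeline) of what a searchable annotated corpus looks like at the file level, the sketch below writes token-level annotations in a CoNLL-style format; the tokens, lemmas, and tags are placeholders:

```python
# Illustrative sketch only: emit one token per line with lemma and POS
# tag, a common machine-readable corpus format. The analyses below are
# placeholders, not output of the project's tools.
tokens = [
    ("ⲁϥⲥⲱⲧⲙ", "ⲥⲱⲧⲙ", "V"),   # (surface form, lemma, POS) - placeholder
    ("ⲛϭⲓ", "ⲛϭⲓ", "PTC"),
]

with open("corpus.conll", "w", encoding="utf-8") as f:
    for i, (form, lemma, pos) in enumerate(tokens, start=1):
        f.write(f"{i}\t{form}\t{lemma}\t{pos}\n")
```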


Author(s):  
Bodhvi Gaur ◽  
Gurpreet Singh Saluja ◽  
Hamsa Bharathi Sivakumar ◽  
Sanjay Singh

Abstract A job seeker’s resume contains several sections, including educational qualifications, which capture the knowledge and skills relevant to the job. Machine processing of the education sections of resumes has been a difficult task. In this paper, we attempt to identify educational institutions’ names and degrees from a resume’s education section. Neural-network-based named entity recognition techniques usually require a significant amount of annotated data; a semi-supervised approach is used to overcome the lack of a large annotated dataset. We trained a deep neural network model on an initial (seed) set of resume education sections. This model is used to predict the entities of unlabeled education sections, and its output is rectified using a correction module. The education sections containing the rectified entities are added to the seed set, and the updated seed set is used for retraining, leading to better accuracy than the previously trained model. In this way, the approach provides high overall accuracy without the need for large amounts of annotated data. Our model achieved an accuracy of 92.06% on the named entity recognition task.
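The self-training loop the abstract describes can be sketched as follows. `train_ner`, `predict_entities`, and `rectify` are hypothetical stand-ins for the paper's model training, prediction, and correction module, so this is a structural sketch rather than the authors' code:

```python
# Structural sketch of the semi-supervised (self-training) loop.
# train_ner, predict_entities, and rectify are hypothetical helpers
# standing in for components the paper describes but does not publish.
def self_training(seed_set, unlabeled_batches):
    model = train_ner(seed_set)                      # initial model on seed set
    for batch in unlabeled_batches:                  # one retraining round per batch
        labeled = [(section, rectify(predict_entities(model, section)))
                   for section in batch]             # predict, then correct
        seed_set = seed_set + labeled                # augment the seed set
        model = train_ner(seed_set)                  # retrain -> better accuracy
    return model
```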


2020 ◽  
Author(s):  
Matheus Santana Carvalho ◽  
Benjamin Grando Moreira ◽  
Sueli Fischer Beckert

Metrology is responsible for studying the aspects involved in the application of measurements, an area common to engineering in the search for continuous improvement and quality in processes and products. Bringing together machine processing, the factory floor, and the development of new applications demands continuous technological development from the industrial sector and modernization of its processes. In this scenario, Metrology 4.0 applies new technologies to the traditional process, ensuring data quality and reliability and supporting decisions in real time. To innovate on traditional calibration models, this paper introduces software developed to compare a measuring tape with a standard using Computer Vision, and compares the results of this process with traditional calibration methods.
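A hedged sketch of the computer-vision idea: locate the graduation marks on a tape image and convert their pixel spacing to millimetres using a known scale factor. The OpenCV calls are standard, but the image file, thresholds, and scale factor are invented, and the paper's actual method may differ:

```python
# Hypothetical sketch: find graduation marks on a measuring-tape image
# and compare their spacing to the nominal 1 mm of the standard.
import cv2
import numpy as np

img = cv2.imread("tape.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Each graduation mark appears as a thin vertical contour.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
centers = sorted(cv2.boundingRect(c)[0] + cv2.boundingRect(c)[2] / 2
                 for c in contours)

PX_PER_MM = 12.5  # hypothetical, obtained by calibrating against the standard
spacings_mm = np.diff(centers) / PX_PER_MM
print("deviation from nominal 1 mm:", spacings_mm - 1.0)
```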


2020 ◽  
Vol 10 (16) ◽  
pp. 5505
Author(s):  
Liang-Ching Chen ◽  
Kuei-Hu Chang ◽  
Hsiang-Yu Chung

With the development of modern and advanced information and communication technologies (ICTs), Industry 4.0 has given rise to big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. In many cases where statistics-based corpus techniques are adopted to analyze English for specific purposes (ESP), researchers extract critical information by retrieving domain-oriented lexical units. However, even if corpus software embraces algorithms such as log-likelihood tests, log ratios, BIC scores, etc., the machine still cannot understand linguistic meaning. In many ESP cases, function words reduce the efficiency of corpus analysis, yet many studies still eliminate them manually. Manual annotation is inefficient and time-consuming, and can easily cause information distortion. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistics-based corpus machine-processing approach to refine big textual data. The paper uses COVID-19 news reports as a simulated example of big textual data to verify the efficacy of the machine optimizing process. The refined data show that the proposed approach can rapidly remove function words and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further use.
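The core operation, removing function words by machine before frequency analysis, can be illustrated with a plain stopword filter. NLTK is an illustrative choice here; the paper's own approach is statistics-based and more elaborate than a fixed stopword list:

```python
# Minimal sketch of machine removal of function words from news text
# before frequency analysis. A stand-in for the paper's richer,
# statistics-based refinement procedure.
from collections import Counter

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)

FUNCTION_WORDS = set(stopwords.words("english"))

def refine(text: str) -> list[str]:
    tokens = nltk.word_tokenize(text.lower())
    return [t for t in tokens if t.isalpha() and t not in FUNCTION_WORDS]

news = "The patients were hospitalized and the virus spread in the city."
print(Counter(refine(news)).most_common(5))  # content words only
```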


2020 ◽  
pp. 105-125
Author(s):  
Gulmira N. Kenzhebalina ◽  
Gulvira K. Shaikova ◽  
Maygul T. Shakenova ◽  
Inessa G. Akoyeva

The problem of recognizing manipulative texts in news is posed, using Kazakhstani media as an example. One of the most important tasks in studying a text for manipulativeness is to find what, although it has a linguistic representation, is pragmatically “obscured” and hidden from the addressee. It is emphasized that in such cases the danger of manipulation increases many times over, since it is addressed both to large social groups and to society as a whole. The relevance of the work lies in the fact that the destructive nature of manipulation can lead to large-scale social upheaval. The study is based on the hypothesis of a uniform structure of the manipulative text, framed in the context of big data research for subsequent machine processing. Based on the working definition of manipulativeness that was formed, an algorithm for analyzing media texts has been developed, which includes a complex of speech and language indicators. The work was carried out on a corpus of 1,000 media texts selected from 10,000 fragments of manipulative discourse. Both unambiguous and mixed components are distinguished, that is, linguistic units capable of performing dual functions due to their semantic polysemy. A four-component universal structure of manipulative texts is compiled, including language parameters extracted from manipulative texts.
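As a purely hypothetical sketch of the kind of indicator-based analysis described, the code below counts matches of speech and language indicators in a text. The indicator lexicon, component labels, and scoring rule are placeholders; the paper's actual algorithm and four-component structure are not published in this abstract:

```python
# Hypothetical indicator-based scoring of a media text. The lexicon
# and component names below are invented placeholders.
import re

INDICATORS = {
    "generalization": [r"\beveryone\b", r"\balways\b", r"\bno one\b"],
    "appeal_to_fear": [r"\bthreat\b", r"\bcatastrophe\b", r"\bcollapse\b"],
}

def indicator_hits(text: str) -> dict[str, int]:
    low = text.lower()
    return {
        component: sum(len(re.findall(p, low)) for p in patterns)
        for component, patterns in INDICATORS.items()
    }

sample = "Everyone knows the collapse is coming; no one will escape the threat."
print(indicator_hits(sample))  # per-component match counts
```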

