Reduction with Application to Pattern Recognition in Large Databases

Author(s):  
Irina Perfilieva ◽  
Petr Hurtik

Author(s):  
Lior Shamir

Abstract Computing machines allow quantitative analysis of large databases of text, providing knowledge that is difficult to obtain without automation. This article describes Universal Data Analysis of Text (UDAT), a text analysis method that extracts a large set of numerical text content descriptors from text files and performs various pattern recognition tasks such as classification, measurement of similarity between classes, correlation between text and numerical values, and query by example. Unlike several previously proposed methods, UDAT is not based on word frequency or on links between certain keywords and topics. The method is implemented as an open-source software tool that can provide detailed reports about the quantitative analysis of sets of text files, as well as export the numerical text content descriptors as comma-separated values (CSV) files to allow statistical or pattern recognition analysis with external tools. It also allows the identification of specific text descriptors that differentiate between classes or correlate with numerical values, and it can be applied to problems related to knowledge discovery in domains such as literature and social media. UDAT is implemented as a command-line tool that runs on Windows, and the source code is available and can be compiled on Linux systems. UDAT can be downloaded from http://people.cs.ksu.edu/~lshamir/downloads/udat.
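Since UDAT exports its descriptors as CSV files precisely so that external tools can take over, a downstream classification step might look like the following Python sketch. The file name udat_features.csv and the label column class are hypothetical placeholders for illustration, not part of UDAT's documented output format:

```python
# Minimal sketch: classify texts from a UDAT-exported CSV of descriptors.
# Assumptions (not from UDAT itself): the file name "udat_features.csv",
# a header row, and a final column named "class" holding each text's label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("udat_features.csv")
X = df.drop(columns=["class"]).to_numpy()  # numerical text content descriptors
y = df["class"].to_numpy()                 # class label per text file

# Estimate classification accuracy with 5-fold cross-validation.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```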


2021 ◽  
Vol 2073 (1) ◽  
pp. 012009
Author(s):  
O D Montoya ◽  
D D Narváez ◽  
C A Ramírez Vanegas

Abstract The following article presents the analysis, through mathematical and physical techniques, of large databases, which are very common today because of the large number of variables involved (especially in the information and physics industries) and the amount of information that a process produces. This calls for an analysis that supports responsible decision-making, backed by scientific criteria; in our case a database from the forex system is used. Initially, different measures between the samples and their characteristics are studied and computed in order to make good predictions of the data and their behavior using different classification methods inspired by the basic sciences. The techniques are then explained in terms of the analysis of data components and the correlations that exist between the variables, an approach widely used in physical processes to determine the correlations between variables.
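As a rough illustration of the component-and-correlation approach the abstract describes, the Python sketch below computes a correlation matrix and a principal component decomposition for a table of forex variables. The file name forex.csv and its column layout are assumptions for illustration, not the authors' data:

```python
# Sketch: correlations between variables and principal component analysis
# of a forex feature table. "forex.csv" and its columns are hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("forex.csv")          # one row per sample, one column per variable
print(df.corr())                       # pairwise correlations between variables

# Standardize the variables, then project onto the leading components.
Z = StandardScaler().fit_transform(df.to_numpy())
pca = PCA(n_components=2)
scores = pca.fit_transform(Z)
print("explained variance ratio:", pca.explained_variance_ratio_)
```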


Author(s):  
G.Y. Fan ◽  
J.M. Cowley

In recent developments, the ASU HB5 has been modified so that the timing, positioning, and scanning of the finely focused electron probe can be entirely controlled by a host computer. This makes an asynchronous handshake possible between the HB5 STEM and the image processing system, which consists of a host computer (PDP 11/34), a DeAnza image processor (IP 5000) interfaced with a low-light-level TV camera, an array processor (AP 400), and various peripheral devices. This greatly facilitates the pattern recognition technique initiated by Monosmith and Cowley. Software called NANHB5 is under development; instead of employing a set of photo-diodes to detect strong spots on a TV screen, it uses various software techniques, including an on-line fast Fourier transform (FFT), to recognize patterns of greater complexity, taking advantage of the sophistication of our image processing system and the flexibility of computer software.
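The abstract does not give NANHB5's algorithms, but the core idea it names, using an on-line FFT to find strong spots rather than photo-diodes, can be sketched in Python as below. The 5-sigma threshold and the peak-picking rule are illustrative assumptions, not from the paper:

```python
# Sketch of FFT-based spot detection, as a software stand-in for the
# photo-diode approach: transform the frame, then keep peaks whose power
# stands well above the frame-wide mean. The 5-sigma cut is hypothetical.
import numpy as np

def find_strong_spots(frame: np.ndarray, n_sigma: float = 5.0):
    power = np.abs(np.fft.fftshift(np.fft.fft2(frame))) ** 2
    threshold = power.mean() + n_sigma * power.std()
    rows, cols = np.nonzero(power > threshold)
    return list(zip(rows.tolist(), cols.tolist()))  # spot coordinates in k-space

# Example on a synthetic frame: a periodic pattern plus weak noise.
y, x = np.mgrid[0:256, 0:256]
frame = np.sin(2 * np.pi * x / 16) + 0.1 * np.random.default_rng(0).normal(size=(256, 256))
print(find_strong_spots(frame)[:5])
```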


Author(s):  
L. Fei ◽  
P. Fraundorf

Interface structure is of major interest in microscopy. With high-resolution transmission electron microscopes (TEMs) and scanning probe microscopes, it is possible to reveal the structure of interfaces down to the unit cell, in some cases with atomic resolution. A. Ourmazd et al. proposed quantifying such observations by using vector pattern recognition to map chemical composition changes across the interface in TEM images with unit-cell resolution. The sensitivity of the mapping process, however, is limited by the repeatability of unit-cell images of the perfect crystal, and hence by the amount of delocalized noise, e.g., due to ion milling or beam radiation damage. Bayesian removal of noise, based on statistical inference, can be used to reduce the amount of non-periodic noise in images after acquisition. The basic principle of Bayesian phase-model background subtraction, according to our previous study, is that the optimum (rms-error-minimizing) Fourier phases of the noise can be obtained provided that the amplitudes of the noise are given, while the noise amplitude can often be estimated from the image itself.
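In its simplest reading, the stated principle, that the rms-error-minimizing noise phase can be chosen once the noise amplitude is known, amounts to amplitude subtraction in Fourier space with the noise phase set equal to the image phase. The Python sketch below follows that reading; the flat, high-frequency noise-amplitude estimate is an assumption for illustration, not the authors' estimator:

```python
# Rough sketch of Fourier-amplitude background subtraction: take the noise
# phase equal to the image phase (the rms-minimizing choice when only the
# noise amplitude is known), estimate a flat noise amplitude from the
# high-frequency part of k-space, and subtract it from the spectrum.
import numpy as np

def subtract_noise(image: np.ndarray) -> np.ndarray:
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Hypothetical noise-amplitude estimate: median amplitude over the
    # high-frequency region, where delocalized noise dominates the signal.
    k2 = np.fft.fftfreq(image.shape[0])[:, None] ** 2 \
       + np.fft.fftfreq(image.shape[1])[None, :] ** 2
    noise_amp = np.median(amplitude[k2 > 0.09])
    cleaned = np.maximum(amplitude - noise_amp, 0.0) * np.exp(1j * phase)
    return np.real(np.fft.ifft2(cleaned))
```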


1989 ◽  
Vol 34 (11) ◽  
pp. 988-989
Author(s):  
Erwin M. Segal

2008 ◽  
Author(s):  
Bradley C. Stolbach ◽  
Frank Putnam ◽  
Melissa Perry ◽  
Karen Putnam ◽  
William Harris ◽  
...  
