Generating an Ordered Data Set from an OCR Text File

2014 ◽  
Author(s):  
Jon Crump

This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.
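As a minimal sketch of the kind of parsing the tutorial describes (the record layout, the regular expression, and the OCR errors shown are illustrative assumptions, not the tutorial's actual source material):

```python
import re
from collections import OrderedDict

# Hypothetical raw OCR output: numbered entries, one per line, each
# beginning with an item number and a year. Purely illustrative.
raw_ocr = """\
1. 1236 Grant of land at Lincoln to the abbey.
2. 124O Confirmation of the above grant.
3. 1252 Licence to hold a weekly market.
"""

entry_pat = re.compile(r"^(\d+)\.\s+(\d{3}[\dO])\s+(.*)$")

records = OrderedDict()
for line in raw_ocr.splitlines():
    m = entry_pat.match(line.strip())
    if not m:
        continue  # skip lines the pattern cannot account for
    num, year, summary = m.groups()
    # Correct a common OCR confusion: the letter 'O' read in place of zero.
    year = int(year.replace("O", "0"))
    records[int(num)] = {"year": year, "summary": summary}

print(records[2])  # {'year': 1240, 'summary': 'Confirmation of the above grant.'}
```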

2000 ◽  
Vol 54 (4) ◽  
pp. 486-495 ◽  
Author(s):  
Rohit Bhargava ◽  
Shi-Qing Wang ◽  
Jack L. Koenig

FT-IR imaging employing a focal plane array (FPA) detector is often plagued by low signal-to-noise ratio (SNR) data. A mathematical transform that re-orders spectral data points into decreasing order of SNR is employed to reduce noise by retransforming the ordered data set using only a few relevant data points. This approach is shown to result in significant gains in terms of image fidelity by examining microscopically phase-separated composites termed polymer dispersed liquid crystals (PDLCs). The actual gains depend on the SNR characteristics of the original data. Noise is reduced by a factor greater than 5 if the noise in the initial data is sufficiently low. For a moderate absorbance level of 0.5 a.u., the achievable SNR by reducing noise is greater than 100 for a collection time of less than 4 min. The criteria for optimal application of a noise-reducing procedure employing the minimum noise fraction (MNF) transform are discussed and various variables in the process quantified. This noise reduction is shown to provide high-quality images for accurate morphological analysis. The coupling of mathematical transformation techniques with spectroscopic Fourier transform infrared (FT-IR) imaging is shown to result in high-fidelity images without increasing collection time or drastically modifying hardware.
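A minimal sketch of MNF-style noise reduction, assuming the noise covariance can be estimated from differences between spatially adjacent pixels (the cube shape, noise estimator, and number of retained components are illustrative, not the authors' exact procedure):

```python
import numpy as np

def mnf_denoise(cube, n_keep=10):
    """MNF-style denoising of a spectral image cube (ny, nx, nbands)."""
    ny, nx, nb = cube.shape
    X = cube.reshape(-1, nb).astype(float)
    X_mean = X.mean(axis=0)
    Xc = X - X_mean

    # Estimate noise from differences between horizontally adjacent
    # pixels (a common heuristic, and an assumption here); dividing by 2
    # compensates for the doubled variance of a difference.
    noise = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, nb)
    Cn = np.cov(noise, rowvar=False) / 2.0
    Cs = np.cov(Xc, rowvar=False)

    # Eigenvectors of Cn^-1 Cs, ordered by decreasing SNR, give the
    # MNF transform: components are re-ordered by signal quality.
    evals, evecs = np.linalg.eig(np.linalg.solve(Cn, Cs))
    order = np.argsort(evals.real)[::-1]
    V = np.real(evecs[:, order])

    # Forward transform, zero the low-SNR components, invert.
    T = Xc @ V
    T[:, n_keep:] = 0.0
    X_denoised = T @ np.linalg.inv(V) + X_mean
    return X_denoised.reshape(ny, nx, nb)

# Toy usage: a 32x32 image cube with 64 spectral channels.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 3, 64))[None, None, :] * rng.random((32, 32, 1))
noisy = clean + 0.1 * rng.standard_normal((32, 32, 64))
print(np.abs(mnf_denoise(noisy, n_keep=5) - clean).mean())
```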


2020 ◽  
Vol 24 (5) ◽  
pp. 1029-1042
Author(s):  
Jerry Lonlac ◽  
Engelbert Mephu Nguifo

Mining frequent simultaneous attribute co-variations in numerical databases is also known as the frequent gradual pattern problem. Few efficient algorithms for automatically extracting such patterns have been reported in the literature, and their main difference resides in the variation semantics used. However, in applications with temporal order relations, those algorithms fail to generate correct frequent gradual patterns because they do not take the temporal constraint into account in the mining process. In this paper, we propose an approach for extracting frequent gradual patterns for which the ordering of supporting objects matches the temporal order. This approach considerably reduces the number of gradual patterns within an ordered data set. The experimental results show the benefits of our approach.
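One way to make the temporal constraint concrete is to score a pattern by the longest chain of rows, taken in time order, along which every attribute varies as required. This longest-chain reading is an assumption for illustration, not necessarily the authors' exact definition:

```python
import numpy as np

def temporal_gradual_support(data, pattern):
    """Longest chain of rows, in temporal (row) order, along which every
    attribute varies as the pattern requires.

    data    : 2D array whose rows are already sorted by time
    pattern : list of (column, direction), direction '+' (increase)
              or '-' (decrease)
    """
    n = len(data)
    best = [1] * n  # best[i] = longest valid chain ending at row i
    for i in range(n):
        for j in range(i):
            ok = all(
                data[j, c] < data[i, c] if d == '+' else data[j, c] > data[i, c]
                for c, d in pattern
            )
            if ok:
                best[i] = max(best[i], best[j] + 1)
    return max(best)

# Toy usage: "attribute 0 increases and attribute 1 decreases" over
# time-ordered rows; row [23, 9.5] breaks the chain and is skipped.
rows = np.array([[20, 9.0], [25, 8.5], [23, 9.5], [30, 7.0], [35, 6.0]])
print(temporal_gradual_support(rows, [(0, '+'), (1, '-')]))  # 4
```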


Author(s):  
Brad Morantz

Mining a large data set can be time consuming, and without constraints the process can generate sets or rules that are invalid or redundant. Some methods, clustering for example, are effective but can be extremely time consuming for large data sets, and processing time grows rapidly as the set grows. In other situations, without guidance via constraints, the mining process might find morsels that have no relevance to the topic, or that are trivial and hence worthless; the knowledge extracted must be comprehensible to experts in the field (Pazzani, 1997). With time-ordered data, finding things that are in reverse chronological order might produce an impossible rule: certain actions always precede others, some things happen together while others are mutually exclusive, and sometimes there are maximum or minimum values that cannot be violated. Must an observation fit all of the requirements, or just most of them? And how many is "most"?

Constraints attenuate the amount of output (Hipp & Guntzer, 2002). By doing first-stage constrained mining, that is, going through the data and finding records that fulfill certain requirements before the next processing stage, time can be saved and the quality of the results improved. The second stage might also contain constraints to further refine the output, as in the sketch below. Constraints help to focus the search or mining process and attenuate the computational time; this has been shown empirically to improve cluster purity (Wagstaff & Cardie, 2000; Hipp & Guntzer, 2002). The theory behind these results is that the constraints help guide the clustering, showing where to connect and which connections to avoid. The application of user-provided knowledge, in the form of constraints, reduces the hypothesis space, and can reduce the processing time and improve the learning quality.
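As a minimal sketch of the two-stage idea, assuming simple precedence and range constraints on time-stamped records (the field names, values, and thresholds are illustrative):

```python
from datetime import date

# Hypothetical time-stamped records; names and values are illustrative.
records = [
    {"id": 1, "ordered": date(2020, 1, 5), "shipped": date(2020, 1, 7), "amount": 120.0},
    {"id": 2, "ordered": date(2020, 1, 9), "shipped": date(2020, 1, 8), "amount": 80.0},   # impossible order
    {"id": 3, "ordered": date(2020, 2, 1), "shipped": date(2020, 2, 3), "amount": -5.0},   # violates minimum
    {"id": 4, "ordered": date(2020, 2, 2), "shipped": date(2020, 2, 4), "amount": 300.0},
]

# Stage 1: constraints encoding domain knowledge -- temporal precedence
# ("shipped" cannot precede "ordered") and a minimum value.
constraints = [
    lambda r: r["ordered"] <= r["shipped"],
    lambda r: r["amount"] >= 0,
]

stage1 = [r for r in records if all(c(r) for c in constraints)]

# Stage 2: a further refining constraint before the expensive mining
# step (here, only high-value records go on to clustering or rules).
stage2 = [r for r in stage1 if r["amount"] >= 100.0]

print([r["id"] for r in stage2])  # [1, 4]
```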


Geophysics ◽  
1995 ◽  
Vol 60 (6) ◽  
pp. 1875-1886 ◽  
Author(s):  
Sara Rajasekaran ◽  
George A. McMechan

A new wave-equation-based prestack seismic processing system is proposed. This system has only two essential elements: velocity analysis and depth migration. The approach applies truly surface-consistent statics corrections, regardless of the amount of elevation change or of near-surface velocity variation. It uses tomography to estimate the details of shallow velocities, and a finite-difference solution of the two-way wave equation both for computation of image times and for data extrapolation in migration, as sketched below. A field data set that violates most of the assumptions of conventional common midpoint (CMP) processing, because of severe elevation changes and near-surface velocity variations, is successfully processed. The final depth section reveals a complicated fold-thrust geometry that was not visible after CMP processing.
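A minimal sketch of the two-way finite-difference extrapolation mentioned above, for the 2-D constant-density acoustic wave equation (the grid, velocity model, and source are illustrative assumptions):

```python
import numpy as np

# 2-D constant-density acoustic wave equation, second order in time and
# space: p_tt = v^2 (p_xx + p_zz). All model parameters are illustrative.
nz, nx = 200, 200
dx = 10.0                 # grid spacing (m)
dt = 0.001                # time step (s), chosen to satisfy CFL stability
v = np.full((nz, nx), 2000.0)
v[100:, :] = 3000.0       # a single flat interface

p_prev = np.zeros((nz, nx))
p_curr = np.zeros((nz, nx))
p_curr[5, nx // 2] = 1.0  # crude impulsive source near the surface

c2 = (v * dt / dx) ** 2
for _ in range(500):
    # Five-point Laplacian; np.roll gives periodic boundaries, which is
    # acceptable for a sketch but not for production imaging.
    lap = (
        -4.0 * p_curr
        + np.roll(p_curr, 1, axis=0) + np.roll(p_curr, -1, axis=0)
        + np.roll(p_curr, 1, axis=1) + np.roll(p_curr, -1, axis=1)
    )
    p_next = 2.0 * p_curr - p_prev + c2 * lap
    p_prev, p_curr = p_curr, p_next

print(p_curr.shape, float(np.abs(p_curr).max()))
```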


2014 ◽  
Vol 25 (1) ◽  
pp. 1-28
Author(s):  
Chun-Hee Lee ◽  
Chin-Wan Chung

Although there have been many compression schemes for reducing data effectively, most schemes do not consider the reordering of data. In the case of unordered data, changing the order of items in a given data set may improve the compression ratio compared to compressing the data in its original order. In the case of ordered data, however, the users need a mapping table that maps each original position to its changed position in order to recover the original order, so reordering ordered data may be disadvantageous in terms of space. In this paper, the authors consider two compression schemes, run-length encoding and a bucketing scheme, as bases for showing the impact of data reordering in compression schemes. The authors also propose various optimization techniques related to data reordering, and show that the compression schemes with data reordering are better than the original compression schemes in terms of the compression ratio.
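A minimal sketch of the run-length case, illustrating both the gain from reordering and the mapping table needed to recover the original order (the toy data is an assumption, not the authors' benchmark):

```python
def rle(seq):
    """Run-length encode a sequence into [value, run_length] pairs."""
    out = []
    for x in seq:
        if out and out[-1][0] == x:
            out[-1][1] += 1
        else:
            out.append([x, 1])
    return out

data = ["b", "a", "b", "a", "a", "b", "b", "a"]

# Reordering (here, a stable sort) lengthens the runs...
order = sorted(range(len(data)), key=lambda i: data[i])
reordered = [data[i] for i in order]

print(rle(data))       # [['b', 1], ['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 1]]
print(rle(reordered))  # [['a', 4], ['b', 4]] -- far fewer runs

# ...but for ordered data the permutation must be stored too, and the
# mapping table may outweigh the savings from the shorter encoding.
restored = [None] * len(data)
for new_pos, old_pos in enumerate(order):
    restored[old_pos] = reordered[new_pos]
assert restored == data
```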


1994 ◽  
Vol 144 ◽  
pp. 139-141 ◽  
Author(s):  
J. Rybák ◽  
V. Rušin ◽  
M. Rybanský

Fe XIV 530.3 nm coronal emission line observations have been used to estimate the rotation of the green solar corona. A homogeneous data set, created from measurements of the world-wide coronagraphic network, has been examined with the help of correlation analysis to reveal the averaged synodic rotation period as a function of latitude and time over the epoch from 1947 to 1991.

The values of the synodic rotation period obtained for this epoch are 27.52±0.12 days for the whole range of latitudes and 26.95±0.21 days for the ±30° latitude band. A differential rotation of the green solar corona, with local period maxima around ±60° and a minimum of the rotation period at the equator, was confirmed. No clear cyclic variation of the rotation has been found for the examined epoch, but monotonic trends over some time intervals are present.

A detailed investigation of the original data and their correlation functions has shown that the existence of sufficiently reliable tracers is not evident for the whole set of examined data. This should be taken into account in future, more precise estimations of the green corona rotation period.
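A minimal sketch of the kind of correlation analysis described, estimating the rotation period as the lag of the autocorrelation peak in a daily intensity series (the synthetic 27-day signal stands in for the coronagraphic measurements):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily green-line intensity at one latitude: a ~27-day
# rotational modulation plus noise. Purely illustrative.
days = np.arange(2000)
intensity = np.sin(2 * np.pi * days / 27.0) + 0.5 * rng.standard_normal(days.size)

x = intensity - intensity.mean()
acf = np.correlate(x, x, mode="full")[x.size - 1:]
acf /= acf[0]

# Look for the autocorrelation peak in a plausible window of synodic
# periods (20 to 35 days).
lo, hi = 20, 36
period = lo + int(np.argmax(acf[lo:hi]))
print(period)  # ~27
```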


Author(s):  
Jules S. Jaffe ◽  
Robert M. Glaeser

Although difference Fourier techniques are standard in X-ray crystallography, only recently have electron crystallographers been able to take advantage of this method. We have combined a high-resolution data set for frozen, glucose-embedded Purple Membrane (PM) with a data set collected from PM prepared in the frozen hydrated state, in order to visualize any differences in structure due to the different methods of preparation. The increased contrast of protein-ice versus protein-glucose may prove to be an advantage of the frozen hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose but ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure provides an ideal method for testing this hypothesis.
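A minimal sketch of the difference Fourier idea: assuming the two data sets share phases from a common reference, the inverse transform of the structure-factor difference localizes where the two structures differ (the grid size and values below are synthetic illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical structure factors on a 64x64 reciprocal-space grid for
# the two preparations: identical phases, slightly different amplitudes.
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, (64, 64)))
F_glucose = rng.random((64, 64)) * phases
F_hydrated = F_glucose + 0.05 * rng.random((64, 64)) * phases

# Difference map: inverse FT of (F1 - F2). Peaks mark regions where the
# structures differ (e.g., surface groups ordered in one state only).
diff_map = np.fft.ifft2(F_hydrated - F_glucose).real
print(diff_map.shape, float(np.abs(diff_map).max()))
```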


Author(s):  
D. E. Becker

An efficient, robust, and widely applicable technique is presented for computational synthesis of high-resolution, wide-area images of a specimen from a series of overlapping partial views. This technique can also be used to combine the results of various forms of image analysis, such as segmentation, automated cell counting, deblurring, and neuron tracing, to generate representations that are equivalent to processing the large wide-area image rather than the individual partial views. This can be a first step towards quantitation of the higher-level tissue architecture. The computational approach overcomes mechanical limitations of microscope stages, such as hysteresis and backlash, and automates a procedure that is currently done manually. One application is the high-resolution visualization and/or quantitation of large batches of specimens that are much wider than the field of view of the microscope.

The automated montage synthesis begins by computing a concise set of landmark points for each partial view. The type of landmarks used can vary greatly depending on the images of interest. In many cases, image analysis performed on each data set can provide useful landmarks; even when no such "natural" landmarks are available, image processing can often provide useful ones.
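As a sketch of one standard way to register two overlapping partial views, the snippet below uses phase correlation to recover the integer translation between them; this is an illustration of the registration step, not the author's landmark-based procedure:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer translation s such that b == roll(a, s)."""
    cross = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    cross /= np.abs(cross) + 1e-12  # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped indices to signed shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx

# Toy usage: two views of the same random "specimen", offset by (12, -7).
rng = np.random.default_rng(3)
scene = rng.random((256, 256))
view_a = scene
view_b = np.roll(scene, (12, -7), axis=(0, 1))
print(phase_correlation_shift(view_a, view_b))  # (12, -7)
```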

