EDIT DISTANCE WITH COMBINATIONS AND SPLITS AND ITS APPLICATIONS IN OCR NAME MATCHING

2009 ◽  
Vol 20 (06) ◽  
pp. 1047-1068 ◽  
Author(s):  
MANOLIS CHRISTODOULAKIS ◽  
GERHARD BREY

Approximate pattern matching has a wide range of applications and, depending on the type of approximation, there exist numerous algorithms for solving it. In this article we focus on texts which originate from OCRed documents, whose errors quite often have a particular form and are far from being random errors. We introduce a new variant of the edit distance metric, where apart from the traditional edit operations, two new operations are supported. The combination operation allows two or more symbols from a string x to be interpreted as a single symbol and then "matched" (or aligned) against a single symbol of a second string y. Its dual is the operation of a split, where a single symbol from x is broken down into a sequence of two or more other symbols, which can then be matched against an equal number of symbols from y. Our algorithm requires O(L) time for preprocessing, and O(mnk) time for computing the edit distance, where L is the total length of all the valid combinations/splits, m and n are the lengths of the two strings under comparison and k is an upper bound on the number of valid splits for any single symbol. The expected running time is O(mn).
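To make the two extra operations concrete, here is a minimal Python sketch of an edit-distance recurrence that allows combinations and splits. It rests on several assumptions. Every operation costs 1, including each combination and each split. A combined or split sequence must match the other string exactly. The function name `ocr_edit_distance` and the `(sequence, symbol)` / `(symbol, sequence)` rule format are hypothetical. This naive version also scans every rule at every cell, instead of using the preprocessing behind the O(mnk) bound.

```python
def ocr_edit_distance(x, y, combinations=(), splits=()):
    """Edit distance with insert/delete/substitute plus (hypothetical unit-cost)
    combinations, e.g. ("rn", "m"), and splits, e.g. ("m", "rn")."""
    m, n = len(x), len(y)
    INF = float("inf")
    # d[i][j] = cost of transforming x[:i] into y[:j]
    d = [[INF] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0:                      # delete x[i-1]
                best = min(best, d[i - 1][j] + 1)
            if j > 0:                      # insert y[j-1]
                best = min(best, d[i][j - 1] + 1)
            if i > 0 and j > 0:
                # match (cost 0) or substitute (cost 1)
                best = min(best, d[i - 1][j - 1] + (x[i - 1] != y[j - 1]))
                # combination: several symbols of x read as one symbol of y
                for seq, sym in combinations:
                    k = len(seq)
                    if k <= i and x[i - k:i] == seq and y[j - 1] == sym:
                        best = min(best, d[i - k][j - 1] + 1)
                # split: one symbol of x read as several symbols of y
                for sym, seq in splits:
                    k = len(seq)
                    if k <= j and x[i - 1] == sym and y[j - k:j] == seq:
                        best = min(best, d[i - 1][j - k] + 1)
            d[i][j] = best
    return d[m][n]
```

For example, `ocr_edit_distance("rnodern", "modern", combinations=[("rn", "m")])` returns 1. Plain edit distance between the same two strings is 2 (one substitution and one deletion).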

Author(s):  
Jeffrey L. Adler

For a wide range of transportation network path search problems, the A* heuristic significantly reduces both search effort and running time when compared to basic label-setting algorithms. The motivation for this research was to determine if additional savings could be attained by further experimenting with refinements to the A* approach. We propose a best neighbor heuristic improvement to the A* algorithm that yields additional benefits by significantly reducing the search effort on sparse networks. The level of reduction in running time improves as the average outdegree of the network decreases and the number of paths sought increases.
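For reference, here is a minimal Python sketch of the baseline A* search. It does not implement the best neighbor refinement, whose details are not given here. It assumes a directed graph stored as a dict of `(neighbor, cost)` lists and a consistent heuristic `h`. With `h = lambda v: 0` the search reduces to a basic label-setting (Dijkstra) search. That makes the difference in search effort visible as the number of settled nodes, which the function returns alongside the path.

```python
import heapq


def a_star(graph, source, target, h):
    """A* with a consistent heuristic h; returns (cost, path, nodes settled)."""
    g = {source: 0}
    parent = {source: None}
    heap = [(h(source), 0, source)]          # entries are (f = g + h, g, node)
    settled = set()
    while heap:
        _, gu, u = heapq.heappop(heap)
        if u in settled:                     # skip stale heap entries
            continue
        settled.add(u)
        if u == target:                      # rebuild the path via parent links
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return gu, path[::-1], len(settled)
        for v, w in graph.get(u, ()):
            nd = gu + w
            if nd < g.get(v, float("inf")):
                g[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd + h(v), nd, v))
    return float("inf"), [], len(settled)
```

On a small hypothetical network with planar coordinates, a straight-line heuristic settles fewer nodes than the zero heuristic while returning the same shortest path.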


Agronomy ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 1499
Author(s):  
Ján Jobbágy ◽  
Peter Dančanin ◽  
Koloman Krištof ◽  
Juraj Maga ◽  
Vlastimil Slaný

Recently, the development of agricultural technology has focused on achieving higher reliability and quality of work. The aim of this paper was to examine ways of evaluating the quality of work of wide-area irrigation machinery by monitoring the coefficients of irrigation uniformity and non-uniformity. The object of the research was pivot irrigation machinery equipped with sprinklers, with total lengths from 230 to 540 m. The parameter commonly applied to assess the quality of work of wide-range irrigators is the irrigation uniformity coefficient according to Heermann and Hein (CUH). Work quality was also evaluated through other parameters applicable in practice: the irrigation uniformity coefficients according to Christiansen (CU) and Wilcox and Swailes (Cws), and two parameters that we introduce, the coefficient ar (derived from the degree of unevenness according to Oehler) and the degree of uniformity γr (derived from the degree of non-uniformity according to Voigt). Further parameters used to determine the quality of work of wide-range irrigation machinery were the irrigation uniformity coefficients according to Hart and Reynolds (CUhr), Criddle (CUcr) and Beale and Howell (CUbh). In addition, the non-uniformity coefficient according to Oehler (a), the coefficient of variation according to Stefanelli (Cv), the degree of non-uniformity according to Voigt (γ) and the degree of non-uniformity according to Hofmeister (Ef) were evaluated. Field tests were performed during the growing season of cultivated crops (potatoes, corn and sugar beet) in the village of Trakovice (agricultural enterprise SLOV-MART, southwest Slovakia) and in the district of Piešťany (Agrobiop, joint stock company). 
During the research, the inlet operating parameters (speed stage, inlet pressure, irrigation dose), technical parameters (number of sprayers, total length, number of chassis) and weather conditions (wind speed and temperature) were recorded. The results were examined by one-way ANOVA according to the observed coefficient or input conditions and subsequently verified by Tukey and Duncan tests as needed. Irrigation uniformity values ranged from 67.58% (Cws) to 95.88% (CUbh) depending on the input conditions. Irrigation non-uniformity values ranged from 8.58% (a, Ef) to 32.42% (Cv). The results indicate a statistically significant effect of the site, and thus of the particular field conditions (p < 0.05). When the different irrigation uniformity coefficients were compared, the results showed a statistically significant effect only in the first test (p = 0.03). In the subsequent repeated measurements, the quality of work increased because all sprayers had been inspected and the influence of the wind was reduced.
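Several of the coefficients named above follow directly from catch-can depths. The Python sketch below uses the textbook definitions of these four:

* Christiansen's CU.
* The Heermann and Hein CUH, which weights each collector by its distance from the pivot.
* The Wilcox and Swailes Cws.
* The coefficient of variation Cv.

It is a minimal sketch, not the authors' procedure. The use of the population standard deviation is an assumption, and the function names are hypothetical. With these definitions Cws and Cv sum to 100%. That agrees with the extreme values reported above (67.58% + 32.42% = 100%).

```python
import statistics


def christiansen_cu(depths):
    """Christiansen CU (%) = 100 * (1 - sum|x_i - mean| / sum x_i)."""
    mean = statistics.mean(depths)
    return 100 * (1 - sum(abs(x - mean) for x in depths) / sum(depths))


def heermann_hein_cu(depths, distances):
    """Heermann-Hein CUH (%): each collector weighted by its distance S_i from the pivot."""
    weighted_mean = sum(s * v for s, v in zip(distances, depths)) / sum(distances)
    weighted_dev = sum(s * abs(v - weighted_mean) for s, v in zip(distances, depths))
    return 100 * (1 - weighted_dev / sum(s * v for s, v in zip(distances, depths)))


def wilcox_swailes_cws(depths):
    """Wilcox-Swailes Cws (%) = 100 * (1 - sd / mean), population SD assumed."""
    return 100 * (1 - statistics.pstdev(depths) / statistics.mean(depths))


def coefficient_of_variation(depths):
    """Cv (%) = 100 * sd / mean, population SD assumed."""
    return 100 * statistics.pstdev(depths) / statistics.mean(depths)
```

For hypothetical depths of 8 and 12 mm at equal distances, CU and CUH both come out at 80%. Moving the second collector three times farther out raises CUH, because the wetter collector now carries more weight.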


2021 ◽  
Vol 25 (2) ◽  
pp. 283-303
Author(s):  
Na Liu ◽  
Fei Xie ◽  
Xindong Wu

Approximate multi-pattern matching, in which the patterns contain variable-length wildcards, is an important and widely used problem. In this paper, we propose two suffix array-based algorithms to solve it. The suffix array is an efficient data structure for exact string matching, and existing studies have also applied it to approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles short exact-character segments of a pattern using dynamic programming, while the second, MMSA-L, handles long exact-character segments using the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that, in most cases, the two proposed algorithms are more time-efficient than the state-of-the-art algorithms they were compared with.
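Both algorithms build on the suffix array. As background only, here is a minimal Python sketch of the exact-matching primitive. It uses a naive O(n² log n) construction by sorting suffixes. Two binary searches then locate the contiguous block of suffixes that begin with the pattern. It handles neither wildcards nor edit distance, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern in text, found by binary search on sa."""
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

The search works because the length-m prefixes of lexicographically sorted suffixes are themselves sorted. For example, `find_occurrences("banana", build_suffix_array("banana"), "ana")` gives `[1, 3]`.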


1968 ◽  
Vol 90 (1) ◽  
pp. 45-50
Author(s):  
R. G. Fenton

The upper bound of the average ram pressure, based on an assumed radial flow velocity field, is derived for plane strain extrusion. Ram pressures are calculated for a complete range of reduction ratios and die angles, considering a wide range of frictional conditions. Results are compared with upper-bound ram pressures obtained from velocity fields other than the radial flow field, and it is shown that for a considerable range of reduction ratios and die angles, the radial flow field yields better (that is, lower) upper bounds for the average ram pressure.


2018 ◽  
Vol 26 (2) ◽  
pp. 237-267 ◽  
Author(s):  
Chao Qian ◽  
Yang Yu ◽  
Ke Tang ◽  
Yaochu Jin ◽  
Xin Yao ◽  
...  

In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms are often employed in noisy optimization, where reducing the negative effect of noise is a crucial issue. Sampling is a popular strategy for dealing with noise: to estimate the fitness of a solution, it evaluates the fitness multiple times independently and then uses the sample average to approximate the true fitness. Obviously, sampling can bring the fitness estimate closer to the true value, but it also increases the estimation cost. Previous studies mainly focused on empirical analysis and the design of efficient sampling strategies, while the impact of sampling remained unclear from a theoretical viewpoint. In this article, we show via rigorous running time analysis that sampling can speed up noisy evolutionary optimization exponentially. For the (1+1)-EA solving the OneMax and LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that, under a high noise level, sampling can reduce the running time from exponential to polynomial. The analysis also shows that a difference of one in the sample size can lead to an exponential difference in the expected running time, which calls for a careful choice of the sample size. Using two illustrative examples, we further prove that sampling can be more effective for noise handling than parent populations and threshold selection, two strategies that have been shown to be robust to noise. Finally, we show that sampling can be ineffective when noise does not have a negative impact.
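The setting analysed above can be simulated in a few lines. The following is a minimal Python sketch of the (1+1)-EA on OneMax under one-bit prior noise, with each fitness estimated as the average of `m` noisy evaluations. Several details are assumptions for illustration rather than the paper's exact setup:

* The noise probability `p`.
* Re-evaluating the parent in every iteration.
* Counting 2m evaluations per iteration.
* Stopping at the true optimum.

The function names are hypothetical.

```python
import random


def noisy_onemax(x, p, rng):
    """OneMax (number of 1-bits) under one-bit prior noise:
    with probability p, one uniformly chosen bit is flipped before evaluation."""
    y = list(x)
    if rng.random() < p:
        y[rng.randrange(len(y))] ^= 1
    return sum(y)


def sampled_fitness(x, p, m, rng):
    """Sampling: average of m independent noisy evaluations."""
    return sum(noisy_onemax(x, p, rng) for _ in range(m)) / m


def one_plus_one_ea(n, p, m, seed=0, max_iters=100_000):
    """(1+1)-EA on noisy OneMax; returns (fitness evaluations used, final solution)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    evaluations = 0
    for _ in range(max_iters):
        if sum(x) == n:  # stop once the true optimum is reached
            break
        # Standard bit-wise mutation: flip each bit independently with prob. 1/n.
        y = [b ^ 1 if rng.random() < 1 / n else b for b in x]
        # Parent and offspring are both re-estimated in every iteration.
        fx = sampled_fitness(x, p, m, rng)
        fy = sampled_fitness(y, p, m, rng)
        evaluations += 2 * m
        if fy >= fx:  # accept the offspring if its estimate is not worse
            x = y
    return evaluations, x
```

Calling `one_plus_one_ea(20, p=0.5, m=5)` with different values of `m` gives a rough empirical feel for how the sample size trades estimation cost against the ability to make progress. It says nothing about the asymptotic results proved in the paper.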


PEDIATRICS ◽  
1985 ◽  
Vol 76 (5) ◽  
pp. 741-749
Author(s):  
Thomas B. Newman

To investigate the recent 150% increase in the reported incidence of ventricular septal defects (VSDs) in the United States, the epidemiology of ventricular septal defects was examined. The apparent incidence of VSDs is highly dependent on case finding methods, and more complete diagnosis and reporting probably account for the increase in reported incidence. Variations in case ascertainment also account for the small differences in incidence in studies from different places. The several known risk factors for VSD, including a family history of congenital heart disease and exposure to certain drugs, infectious agents, and maternal metabolic disturbances, explain few cases. Incidence rates are similar in different races and seasons and are unrelated to maternal age, birth order, sex, and socioeconomic status. VSDs occur naturally in a wide range of mammals and in birds, which also have four-chambered hearts. Despite identical genes and similar prenatal environments, the concordance rate in identical twins is only about 10%. The consistency of incidence among individuals with widely differing genes and environments and the frequency of discordance in identical twins suggest that VSDs often occur as random errors in development, at a frequency largely determined by the complexity of normal cardiac morphogenesis. This hypothesis has two major implications: many VSDs are not preventable and parents need not feel responsible for VSDs in their children.


Author(s):  
Katrine Okholm Kryger ◽  
Séan Mitchell ◽  
Steph Forrester

The aim of this study was to measure the level of agreement of four portable football velocity and spin rate measurement systems (Jugs speed radar gun, 2-D high-speed video, TrackMan and adidas miCoach football) against a Vicon motion analysis system. One skilled male university football player performed 70 shots covering a wide range of ball velocities (12–30 m s−1) and spin rates (94–743 r/min). A Bland–Altman analysis was used to assess the level of agreement. For ball velocity, the 2-D high-speed video had the smallest systematic error, followed by the radar gun, TrackMan and miCoach football at 0.2, 0.4, 0.5 and 4.8 m s−1, respectively. A similar ranking was also observed for the random errors (95% confidence intervals: ±0.4, ±1.5, ±1.9 and ±6.0 m s−1). The first three systems all tracked ball velocity in >90% of shots, while the miCoach football tracked fewer shots (79%). For spin rate, the miCoach football had a much smaller systematic error (4 vs 38 r/min) and random error (95% confidence intervals: ±24 vs ±355 r/min) than TrackMan. The miCoach also successfully tracked spin rate in more shots than the TrackMan (79% vs 44%). These results indicate that 2-D high-speed video would be the preferred option for the field assessment of ball velocity; however, the radar gun and TrackMan may also be appropriate. A minimum of 10 frames of 2-D high-speed video, captured close to the ball starting position, was shown to be sufficient to provide a reliable measure of ball velocity. The miCoach ball is the preferred option for field assessment of ball spin rate.
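In a Bland–Altman analysis, the systematic error is the mean of the paired differences. The random error is usually reported as 1.96 standard deviations of those differences. Here is a minimal Python sketch of that calculation. The function name and the use of the sample standard deviation are assumptions, not details taken from the study.

```python
import statistics


def bland_altman(device, reference, z=1.96):
    """Bias (systematic error) and 95% limits of agreement for paired measurements."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)      # systematic error
    sd = statistics.stdev(diffs)       # sample SD of the differences
    return bias, (bias - z * sd, bias + z * sd)
```

With hypothetical velocities `bland_altman([10.2, 15.1, 20.3, 25.4], [10.0, 15.0, 20.0, 25.0])`, the bias is about 0.25 m s−1 and the limits lie about ±0.25 m s−1 around it.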


1995 ◽  
Vol 166 ◽  
pp. 371-371
Author(s):  
I.S. Guseva

Anomalous refraction remains the most critical problem in meridian astrometry, which measures large angles on the sky. I study slow quasi-periodic variations of refraction caused by processes in the middle and upper atmosphere, such as gravity waves, which cannot be detected and calibrated out using any ground-based meteorological measurements. For this study, nineteenth-century observations at large zenith distances of 80 to 90 degrees, made by V. Fuss at Pulkovo Observatory in 1867-1869 [1], were used. Deeming's method [2] of spectral analysis was applied to examine the characteristic variations of refraction over a wide range of periods. Very strong quasi-periodic processes with periods of 7-8, 11-14, 18-22 and 36-44 minutes and with amplitudes of 0.3 to 0.5 arcsec in the zenith were found when short sets of observations (1-5 days) were considered. They increase the random errors of astrometric observations with meridian circles, transit instruments, astrolabes, etc. The periods of the very slow variations (152, 122, 93, 82.5, 73, 61 and 50 days) are close to well-known periods found in other astronomical phenomena, for instance in solar activity and in Earth rotation. I also note that some of the long-period variations of refraction may cause quasi-systematic errors in astrometric measurements and catalogues.
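Deeming's method computes a Fourier power spectrum directly from unevenly spaced observations. No resampling onto a regular grid is needed. Here is a minimal Python sketch. It assumes the normalization P(ν) = |Σ_k (a_k − ā) exp(2πiνt_k)|² / N² and evaluates it on a grid of trial periods. The synthetic 20-minute signal and the function names are hypothetical illustrations, not Fuss's data.

```python
import cmath
import math
import random


def deeming_power(times, values, freq):
    """Deeming-style DFT power at frequency freq for unevenly spaced samples."""
    n = len(times)
    mean = sum(values) / n
    total = sum((v - mean) * cmath.exp(2j * math.pi * freq * t)
                for t, v in zip(times, values))
    return abs(total) ** 2 / n ** 2


def best_period(times, values, periods):
    """Return the trial period with the highest spectral power."""
    return max(periods, key=lambda p: deeming_power(times, values, 1.0 / p))


# Hypothetical example: a 20-minute oscillation of 0.4 arcsec,
# sampled at 200 random (unevenly spaced) times over 600 minutes.
rng = random.Random(42)
times = sorted(rng.uniform(0, 600) for _ in range(200))
values = [0.4 * math.sin(2 * math.pi * t / 20) for t in times]
periods = [5 + 0.1 * k for k in range(451)]  # 5 to 50 minutes
print(best_period(times, values, periods))
```

The peak should appear near the injected 20-minute period. Irregular sampling also suppresses the strict aliasing that a regularly spaced series would show.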

