2016 ◽ Vol 2016 (4) ◽ pp. 403-417
Author(s):  
Steven Hill ◽  
Zhimin Zhou ◽  
Lawrence Saul ◽  
Hovav Shacham

Abstract: In many online communities, it is the norm to redact names and other sensitive text from posted screenshots. Sometimes solid bars are used; sometimes a blur or other image transform is used. We consider the effectiveness of two popular image transforms - mosaicing (also known as pixelization) and blurring - for redaction of text. Our main finding is that we can use a simple but powerful class of statistical models - so-called hidden Markov models (HMMs) - to recover both short and indefinitely long instances of redacted text. Our approach builds on the success of HMMs in automatic speech recognition, where they are used to recover sequences of phonemes from utterances of speech. Here we use HMMs in an analogous way to recover sequences of characters from images of redacted text. We evaluate an implementation of our system against multiple typefaces, font sizes, grid sizes, pixel offsets, and levels of noise. We also decode numerous real-world examples of redacted text. We conclude that mosaicing and blurring, despite their widespread usage, are not viable approaches for text redaction.
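The decoding idea in the abstract, recovering a hidden character sequence from a sequence of observed image features, can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the two-character alphabet, the `dark`/`light` observation symbols, and every probability below are hypothetical values chosen only to show the mechanics of HMM decoding.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: hidden states are characters, observations are
# coarse summaries of a redacted image column (assumed, not from the paper).
STATES = ["a", "b"]
LOG_START = {"a": math.log(0.5), "b": math.log(0.5)}
LOG_TRANS = to_log({"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.8, "b": 0.2}})
LOG_EMIT = to_log({"a": {"dark": 0.9, "light": 0.1},
                   "b": {"dark": 0.2, "light": 0.8}})

decoded = viterbi(["dark", "light", "dark"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print("".join(decoded))
```

Working in log-probabilities avoids numerical underflow on long sequences, which matters if the text to be recovered is indefinitely long.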


Author(s):  
Jose Luis Oropeza-Rodriguez ◽  
Sergio Suárez-Guerra

Over the last 30 years, researchers have sought to communicate orally with computers, developing a large number of automatic speech recognition (ASR) algorithms to that end. As a result, software such as Dragon Dictate and IBM ViaVoice is already available for interacting with a computer by voice. In recent years, however, ASR has reported few important advances, partly because of the progress already achieved and partly because the scientific community working in this area has not found another tool as powerful as the hidden Markov model (HMM), despite the many alternatives proposed since HMMs appeared. This chapter presents the main elements required to build a practical ASR system using HMMs. The basic principles of continuous density hidden Markov models (CDHMMs) are also given.

