Smart Archive Generation Using Computer Vision, NLP and Big Data

Mapping Intimacies ◽

10.2118/207365-ms ◽

2021 ◽

Author(s):

Mohamed Mahdy Marzouk ◽

Mahmoud Mohamed ElZahed

Keyword(s):

Image Processing ◽

Big Data ◽

Data Storage ◽

Character Recognition ◽

Optical Character Recognition ◽

Data Analytics ◽

Business Processes ◽

Text Recognition ◽

Dense Network ◽

Digital Format

Abstract Gaining insights from the dense network of interrelated documents involved in E&P projects requires experience, knowledge, and awareness of the existence of the required data. This framework aims to facilitate the decision-making process in less time and at lower cost, without sacrificing the accuracy of the data, while decreasing the probability of human error. The high complexity of E&P projects results in a dense network of interrelated documents, which are produced to cover the various aspects and details of the project. Gaining insights from old data requires experience, knowledge, and awareness of the existence of the required data. Accordingly, the knowledge accumulated over time from various projects can be considered a key asset, since it can be leveraged to make more informed decisions. This paper presents a framework that aims to capture organizational knowledge locked in paper-based datasets and store it in a structured digital format that facilitates its retrieval and enables analyses that help uncover valuable insights. This research aims to generate valuable data from existing archives while causing minimal disturbance to existing business processes and workflows. The framework performs four main functions: image processing, text recognition, data analytics, and data storage. First, the text recognition module performs image processing to enhance the quality of the scanned files, then applies optical character recognition using an LSTM to extract the text contained in the images. The data analytics module then cleanses and mines the extracted text using Big Data analytics tools. Text matching and searching is performed on the Spark DataFrame using regular expressions to identify different attributes and their types. Finally, the data is stored in a SQL database. To measure the workflow's accuracy, a manual baseline was generated for a sample project. The accuracy is measured using field-level verification, which was found to be the most fit for purpose because it measures the accuracy of the workflow at the level of each field.
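
The later stages of the workflow described in this abstract (regular-expression attribute matching, SQL storage, and field-level verification against a manual baseline) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, patterns, sample OCR text, and table layout are all hypothetical, plain Python lists stand in for the Spark DataFrame, and an in-memory SQLite database stands in for the SQL database.

```python
import re
import sqlite3

# Hypothetical attributes and patterns; the abstract does not list the real ones.
FIELD_PATTERNS = {
    "well_name": re.compile(r"Well\s*Name\s*[:\-]\s*([A-Z0-9\-]+)", re.IGNORECASE),
    "report_date": re.compile(r"Date\s*[:\-]\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "depth_m": re.compile(r"Depth\s*[:\-]\s*(\d+(?:\.\d+)?)\s*m\b", re.IGNORECASE),
}


def extract_fields(text):
    """Return one record per page: the first match for each field, or None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record


def field_level_accuracy(extracted, baseline):
    """Fraction of documents in which each field equals the manual baseline."""
    scores = {}
    for field in baseline[0]:
        correct = sum(
            1 for got, want in zip(extracted, baseline) if got.get(field) == want[field]
        )
        scores[field] = correct / len(baseline)
    return scores


# Made-up OCR output for two scanned pages; the second contains a typical
# OCR confusion (letter O instead of digit 0) in the date.
ocr_pages = [
    "Well Name: AB-12\nDate: 2019-03-04\nDepth: 2150.5 m",
    "Well Name: CD-7\nDate: 2O19-05-11\nDepth: 1980 m",
]

# Manually keyed baseline for the same two pages (also made up).
baseline = [
    {"well_name": "AB-12", "report_date": "2019-03-04", "depth_m": "2150.5"},
    {"well_name": "CD-7", "report_date": "2019-05-11", "depth_m": "1980"},
]

extracted = [extract_fields(page) for page in ocr_pages]
accuracy = field_level_accuracy(extracted, baseline)

# Store the extracted records; unmatched fields are written as NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (well_name TEXT, report_date TEXT, depth_m TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (:well_name, :report_date, :depth_m)", extracted
)
conn.commit()

print(accuracy)
```

Scoring each field separately, rather than each document as a whole, shows which attribute the recognition or matching step handles poorly. In this made-up sample only the date field is affected.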

Advanced Data Analytics for Clinical Research Part I: What are the Tools?

Innovations Technology and Techniques in Cardiothoracic and Vascular Surgery ◽

10.1177/1556984520902783 ◽

2020 ◽

Vol 15 (2) ◽

pp. 114-119 ◽

Cited By ~ 3

Author(s):

Nicolas Zhou ◽

Erin M. Corsini ◽

Shida Jin ◽

Gregory R. Barbosa ◽

Trey Kell ◽

...

Keyword(s):

Big Data ◽

Clinical Research ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Data Analytics ◽

Relational Databases ◽

Big Data Analytics ◽

Rapid Expansion ◽

Data Infrastructure

The concept of Big Data is changing the way that clinical research can be performed. Cardiothoracic surgeons need to understand the dynamic digital transformation taking place in the healthcare industry. In the last decade, technological advances and Big Data analytics have become powerful tools for businesses. In healthcare, rapid expansion of Big Data infrastructure has occurred in parallel with attempts to reduce cost and improve outcomes. Many hospitals around the country are augmenting traditional relational databases with Big Data infrastructure. Advanced data capture and categorization tools such as natural language processing and optical character recognition are being developed for clinical and research use, while Internet of Things in the form of wearable technology serves as an additional source of data usable for research. As cardiothoracic surgeons seek ways to innovate, novel approaches to data acquisition and analysis enable a more rigorous level of investigatory efforts.

Pengantar dan Survey Tentang Optical Music Recognition

Jurnal ULTIMATICS ◽

10.31937/ti.v6i1.331 ◽

2014 ◽

Vol 6 (1) ◽

pp. 36-39

Author(s):

Kevin Purwito

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Optical Music Recognition ◽

Optical Character ◽

Digital Format ◽

Music Recognition ◽

Index Terms ◽

The Many ◽

Further Development ◽

Music Symbol

This paper describes one of the many extensions of Optical Character Recognition (OCR), namely Optical Music Recognition (OMR). OMR is used to convert sheet music into a digital format such as MIDI or MusicXML. Many musical symbols are commonly used in sheet music and therefore need to be recognized by OMR, such as the staff; treble, bass, alto and tenor clefs; sharps, flats and naturals; beams, staccato, staccatissimo, dynamics, tenuto, marcato, stopped notes, harmonics and fermatas; notes; rests; ties and slurs; and also mordents and turns. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of these four processes uses different methods and algorithms, and each still needs further research and development. Many applications already use OMR, but none gives a perfect result. Therefore, besides research and development on each OMR process, there is also a need for research and development on a combined recognizer, which combines the results from different OMR applications to increase the accuracy of the final result. Index Terms: Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer

A Proposed approach in applying optical character recognition for thermal image processing

2010 34th IEEE/CPMT International Electronic Manufacturing Technology Symposium (IEMT) ◽

10.1109/iemt.2010.5746767 ◽

2010 ◽

Author(s):

Chan Wai Ti ◽

Ir. Sim Kok Swee ◽

Tso Chih Ping

Keyword(s):

Image Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Thermal Image ◽

Optical Character ◽

Thermal Image Processing

14th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing, 5th International Conference on Big Data Analytics, Data Mining and Computational Intelligence and 9th International Conference on Theory and Practice in Modern Computing

10.33965/cgv_big_tpmc2020 ◽

2020 ◽

Keyword(s):

Data Mining ◽

Image Processing ◽

Computer Vision ◽

Big Data ◽

Computer Graphics ◽

Computational Intelligence ◽

Data Analytics ◽

Big Data Analytics ◽

Theory And Practice ◽

International Conference

Architecture of modern platforms for big data analytics

Advanced Information Technology ◽

10.17721/ait.2021.1.09 ◽

2021 ◽

pp. 67-74

Author(s):

Liudmyla Zubyk ◽

Yaroslav Zubyk

Keyword(s):

Big Data ◽

Data Analytics ◽

Business Processes ◽

Service Providers ◽

Big Data Analytics ◽

Business Value ◽

Third Party ◽

Unstructured Data ◽

Business Decision ◽

World Industry

Big data is one of the modern tools that have had a major impact on world industry. It also plays an important role in determining how businesses and organizations formulate their strategies and policies. However, very little academic research has been conducted into forecasting based on big data, owing to the difficulties in capturing, collecting, handling, and modeling unstructured data, which is normally characterized by its confidentiality. We define big data in the context of an ecosystem for future forecasting in business decision-making. It can be difficult for a single organization to possess all of the capabilities necessary to derive strategic business value from its findings. That is why different organizations build and operate their own analytics ecosystems or tap into existing ones. An analytics ecosystem comprising a symbiosis of data, applications, platforms, talent, partnerships, and third-party service providers lets organizations be more agile and adapt to changing demands. Organizations participating in analytics ecosystems can examine, learn from, and influence not only their own business processes, but also those of their partners. Architectures of popular platforms for forecasting based on big data are presented in this issue.

Literature Survey on Student Grade Calculation using Optical Character Recognition based Image Processing Techniques

Journal of VLSI Design and Signal Processing ◽

10.46610/jovdsp.2021.v07i01.005 ◽

2021 ◽

Vol 7 (1) ◽

pp. 34-41

Author(s):

Omkiran S G ◽

Samartha J V ◽

Shashank Nagraj Bhat ◽

Varun Gajanan Hegde ◽

Sumaiya M N

Keyword(s):

Image Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Literature Survey ◽

Image Processing Techniques ◽

Optical Character ◽

Processing Techniques

SCENE TEXT RECOGNITION BY USING EE-MSER AND OPTICAL CHARACTER RECOGNITION FOR NATURAL IMAGES

International Journal of Advance Engineering and Research Development ◽

10.21090/ijaerd.021219 ◽

2015 ◽

Vol 2 (12) ◽

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Natural Images ◽

Text Recognition ◽

Optical Character ◽

Scene Text ◽

Scene Text Recognition

Analytics-as-a-Service (AaaS)

Web Services ◽

10.4018/978-1-5225-7501-6.ch065 ◽

2019 ◽

pp. 1262-1281

Author(s):

Chitresh Verma ◽

Rajiv Pandey

Keyword(s):

Big Data ◽

Case Studies ◽

Data Analytics ◽

Data Science ◽

Business Processes ◽

Service Oriented Architecture ◽

Big Data Analytics ◽

Major Branch ◽

Implementation Level ◽

Industrial Level

Big Data Analytics is a major branch of data science in which huge amounts of raw data are processed to gain insight for relevant business processes. Integrating big data and its analytics with Service Oriented Architecture (SOA) is the need of the hour; such integration brings reusability and scalability to various business processes. This chapter explains the concepts of Big Data and Big Data Analytics at the implementation level. The chapter further describes Hadoop and its technologies, one of the popular frameworks for Big Data Analytics, and envisages integrating SOA with relevant case studies. The chapter demonstrates SOA integration with Big Data through two case studies of different scenarios, which connect real-world implementation with theory and enable a better understanding of industrial-level processes and practices.

The Image as Big Data Toolkit

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques ◽

10.4018/978-1-5225-3142-5.ch018 ◽

2018 ◽

pp. 497-548

Author(s):

Kerry E. Koitzsch

Keyword(s):

Image Processing ◽

Big Data ◽

Data Storage ◽

Future Development ◽

Rapid Evolution ◽

Design And Technology ◽

Open Source Framework ◽

Future Extension ◽

Development Plans ◽

Big Data Storage

This chapter is a brief introduction to the Image As Big Data Toolkit (IABDT), a Java-based open source framework for performing a variety of distributed image processing and analysis tasks. IABDT has been developed over the last two years in response to the rapid evolution of Big Data architectures and technologies and of distributed and image processing systems. This chapter presents an architecture for image analytics that uses Big Data storage and compression methods. A sample implementation of our image analytic architecture, the Image as Big Data Toolkit (IABDT), addresses some of the most frequent challenges experienced by the image analytics developer. Baseline applications developed with IABDT, the status of the toolkit, and directions for future extension, with emphasis on image display, presentation, and reporting case studies, are discussed to motivate our design and technology stack choices. Sample applications built using IABDT, as well as future development plans for IABDT, are discussed.
