Smart Archive Generation Using Computer Vision, NLP and Big Data

2021 ◽  
Author(s):  
Mohamed Mahdy Marzouk ◽  
Mahmoud Mohamed ElZahed

Abstract Gaining insights from the dense network of interrelated documents involved in E&P projects requires experience, knowledge, and awareness of the existence of the required data. This framework aims to facilitate the decision-making process in less time and at lower cost, without sacrificing the accuracy of the data, while decreasing the probability of human error. The high complexity of E&P projects results in a dense network of interrelated documents, which are produced to cover the various aspects and details of the project. Gaining insights from old data requires experience, knowledge, and awareness of the existence of the required data. Accordingly, the knowledge accumulated over time from various projects can be considered a key asset, since it can be leveraged to make more informed decisions. This paper presents a framework that aims to capture organizational knowledge locked in paper-based datasets and store it in a structured digital format that facilitates its retrieval and enables analyses that help uncover valuable insights. This research aims to generate valuable data from existing archives while causing minimal disturbance to existing business processes and workflows. The framework performs four main functions: image processing, text recognition, data analytics and data storage. First, the text recognition module performs image processing to enhance the quality of the scanned files, then applies LSTM-based optical character recognition to extract the text contained in the images. The data analytics module then cleanses and mines the extracted text using Big Data analytics tools. Text matching and searching is performed on the Spark DataFrame using regular expressions to identify different attributes and their different types. Finally, the data is stored in a SQL database. To measure the workflow's accuracy, a manual baseline was generated for a sample project.
The accuracy is measured using field-level verification, which was found to be the most fit-for-purpose method because it allows the accuracy of the workflow to be measured at the level of each field.
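
The extraction, storage and verification steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python `re` stands in for regular-expression matching on a Spark DataFrame, an in-memory SQLite table stands in for the SQL database, and the field names, patterns and sample texts are all hypothetical, since the abstract does not list them. Field-level accuracy is taken here to mean the fraction of documents in which a field's extracted value equals the manual baseline value.

```python
import re
import sqlite3

# Hypothetical attribute patterns; the abstract does not give the real ones.
PATTERNS = {
    "well_name": re.compile(r"Well\s*Name\s*:\s*([A-Za-z0-9\-]+)", re.IGNORECASE),
    "report_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "depth_m": re.compile(r"Depth\s*:\s*(\d+(?:\.\d+)?)\s*m\b", re.IGNORECASE),
}


def extract_fields(text):
    """Return the first match of each pattern, or None when a field is absent."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    return record


def store_records(conn, records):
    """Write extracted records to a SQL table (SQLite is an assumed stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(well_name TEXT, report_date TEXT, depth_m TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (:well_name, :report_date, :depth_m)",
        records,
    )
    conn.commit()


def field_level_accuracy(extracted, baseline):
    """Per field, the fraction of documents whose value equals the baseline."""
    accuracy = {}
    for field in PATTERNS:
        correct = sum(
            1 for ext, base in zip(extracted, baseline) if ext[field] == base[field]
        )
        accuracy[field] = correct / len(baseline)
    return accuracy


# Made-up OCR output; the second date contains a typical O-for-0 confusion.
OCR_TEXTS = [
    "Well Name: A-12\nDate: 2019-03-04\nDepth: 2150.5 m",
    "Well Name: B-07\nDate: 2O19-05-11\nDepth: 1800 m",
]

# Made-up manual baseline for the same two documents.
BASELINE = [
    {"well_name": "A-12", "report_date": "2019-03-04", "depth_m": "2150.5"},
    {"well_name": "B-07", "report_date": "2019-05-11", "depth_m": "1800"},
]

EXTRACTED = [extract_fields(text) for text in OCR_TEXTS]
ACCURACY = field_level_accuracy(EXTRACTED, BASELINE)
```

In this toy sample the OCR error in the second date prevents the date pattern from matching, so `report_date` scores 0.5 while the other two fields score 1.0. Reporting accuracy per field, rather than per document, makes it visible which attribute is responsible for the errors.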

Author(s):  
Nicolas Zhou ◽  
Erin M. Corsini ◽  
Shida Jin ◽  
Gregory R. Barbosa ◽  
Trey Kell ◽  
...  

The concept of Big Data is changing the way that clinical research can be performed. Cardiothoracic surgeons need to understand the dynamic digital transformation taking place in the healthcare industry. In the last decade, technological advances and Big Data analytics have become powerful tools for businesses. In healthcare, rapid expansion of Big Data infrastructure has occurred in parallel with attempts to reduce cost and improve outcomes. Many hospitals around the country are augmenting traditional relational databases with Big Data infrastructure. Advanced data capture and categorization tools such as natural language processing and optical character recognition are being developed for clinical and research use, while Internet of Things in the form of wearable technology serves as an additional source of data usable for research. As cardiothoracic surgeons seek ways to innovate, novel approaches to data acquisition and analysis enable a more rigorous level of investigatory efforts.


2014 ◽  
Vol 6 (1) ◽  
pp. 36-39
Author(s):  
Kevin Purwito

This paper describes Optical Music Recognition (OMR), one of the many extensions of Optical Character Recognition (OCR). OMR is used to convert sheet music into a digital format such as MIDI or MusicXML. Many musical symbols are commonly used in sheet music and therefore need to be recognized by OMR, such as staff; treble, bass, alto and tenor clef; sharp, flat and natural; beams, staccato, staccatissimo, dynamic, tenuto, marcato, stopped note, harmonic and fermata; notes; rests; ties and slurs; and also mordent and turn. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of these four processes uses different methods and algorithms, and each still needs further development and research. Many applications already use OMR, but none gives a perfect result. Therefore, besides development and research on each OMR process, there is also a need for development and research on a combined recognizer, which combines the results from different OMR applications to increase the accuracy of the final result. Index Terms: Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer


2021 ◽  
pp. 67-74
Author(s):  
Liudmyla Zubyk ◽  
Yaroslav Zubyk

Big data is one of the modern tools that have greatly impacted industry worldwide. It also plays an important role in determining the ways in which businesses and organizations formulate their strategies and policies. However, very limited academic research has been conducted into forecasting based on big data, due to the difficulties in capturing, collecting, handling, and modeling unstructured data, which is normally characterized by its confidentiality. We define big data in the context of an ecosystem for future forecasting in business decision-making. It can be difficult for a single organization to possess all of the necessary capabilities to derive strategic business value from its findings. That is why different organizations will build and operate their own analytics ecosystems or tap into existing ones. An analytics ecosystem comprising a symbiosis of data, applications, platforms, talent, partnerships, and third-party service providers lets organizations be more agile and adapt to changing demands. Organizations participating in analytics ecosystems can examine, learn from, and influence not only their own business processes, but also those of their partners. Architectures of popular platforms for forecasting based on big data are presented in this issue.


Web Services ◽  
2019 ◽  
pp. 1262-1281
Author(s):  
Chitresh Verma ◽  
Rajiv Pandey

Big Data Analytics is a major branch of data science in which huge amounts of raw data are processed to gain insight into relevant business processes. Integrating big data and its analytics with Service Oriented Architecture (SOA) is the need of the hour; such integration brings reusability and scalability to various business processes. This chapter explains the concepts of Big Data and Big Data Analytics at the implementation level. The chapter further describes Hadoop and its technologies, which form one of the popular frameworks for Big Data Analytics, and envisages integrating SOA with relevant case studies. The chapter demonstrates SOA integration with Big Data through two case studies of different scenarios, which connect real-world implementation with theory and enable a better understanding of industrial-level processes and practices.


Author(s):  
Kerry E. Koitzsch

This chapter is a brief introduction to the Image As Big Data Toolkit (IABDT), a Java-based open source framework for performing a variety of distributed image processing and analysis tasks. IABDT has been developed over the last two years in response to the rapid evolution of Big Data architectures and technologies and of distributed and image processing systems. This chapter presents an architecture for image analytics that uses Big Data storage and compression methods. IABDT, a sample implementation of this image analytics architecture, addresses some of the most frequent challenges experienced by the image analytics developer. Sample applications built using IABDT, the status of the toolkit, and plans for its future development, with emphasis on image display, presentation, and reporting case studies, are discussed to motivate our design and technology stack choices.

