OCR for Data Retrieval: An Analysis and Machine Learning Application Model for NGO Social Volunteering

Author(s):  
Ruchi Sharma ◽  
Parv Dave ◽  
Jay Chaudhary
2012 ◽  
pp. 1538-1550
Author(s):  
Ting Yu

This paper presents an integrated, distributed intelligent system capable of automatically estimating and updating large-scale economic models. The input-output model of economics uses a matrix representation of a nation’s (or a region’s) economy to predict how changes in one industry, and in demand from consumers, government, and foreign suppliers, affect the rest of the economy (Miller & Blair, 1985). To construct a model that faithfully reflects the underlying industry structure, multiple sources of data are collected and integrated. The system presented here facilitates this estimation process by integrating a series of components for data retrieval, data integration, machine learning, and quality checking. More importantly, the complexity of a national economy leads to extremely large models that represent every detail of the economy, which requires the system to process large amounts of data. This paper demonstrates that the major bottleneck is memory allocation; to make more memory available, the machine learning component is built on a distributed platform and constructs the matrix by analyzing historical and spatial data simultaneously. This system is the first distributed matrix estimation package for economic matrices of this size.
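The memory-driven design described above can be illustrated with a minimal sketch: the input-output technical coefficients a_ij = z_ij / x_j are estimated column block by column block, with each worker holding only its own block in memory. The block split and the worker pool here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def estimate_block(z_block, x_block):
    # Technical coefficients a_ij = z_ij / x_j for one column block:
    # each column j of transactions is divided by total output x_j.
    return z_block / x_block

def estimate_coefficients(Z, x, n_workers=2):
    # Split the transactions matrix Z column-wise so each worker
    # only ever holds one block in memory (the bottleneck noted above).
    col_blocks = np.array_split(Z, n_workers, axis=1)
    x_blocks = np.array_split(x, n_workers)
    with ThreadPoolExecutor(n_workers) as ex:
        blocks = list(ex.map(estimate_block, col_blocks, x_blocks))
    return np.hstack(blocks)
```

On a real cluster the same split would be distributed across machines rather than threads; the arithmetic per block is unchanged.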


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Jing Liu ◽  
Yulong Qiao

Intelligent internet data mining is an important application of the AIoT (Artificial Intelligence of Things), and it requires constructing large training sets from internet data, including images, videos, and other information. A hyperspectral database is likewise necessary for image processing and machine learning. The internet provides abundant hyperspectral data resources, but these data carry no class labels and are therefore of limited value for applications. It is thus important to label the class information of these hyperspectral data through machine-learning-based classification. In this paper, we present a quasiconformal mapping kernel machine-learning-based intelligent hyperspectral data classification algorithm for internet-based hyperspectral data retrieval. The contributions are threefold: a quasiconformal-mapping-based multiple kernel learning network framework is proposed for hyperspectral data classification; the Mahalanobis distance kernel function serves as the network node, with higher discriminative ability than Euclidean-distance-based kernel learning; and an objective function measuring class discriminative ability is proposed to seek the optimal parameters of the quasiconformal mapping projection. Experiments show that the proposed scheme is effective for hyperspectral image classification and retrieval.
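The two kernel ingredients named above can be sketched in their standard textbook form: a Gaussian kernel built on Mahalanobis rather than Euclidean distance, and a quasiconformal transform that rescales a base kernel by a positive factor c(.). The paper's exact network formulation may differ; this is only the generic construction, with `gamma` and `c` as illustrative parameters.

```python
import numpy as np

def mahalanobis_kernel(x, y, S_inv, gamma=1.0):
    # Gaussian kernel using the Mahalanobis distance d^T S^{-1} d,
    # which accounts for feature covariance unlike the Euclidean RBF.
    d = x - y
    return np.exp(-gamma * (d @ S_inv @ d))

def quasiconformal_kernel(x, y, c, base_kernel):
    # Quasiconformal mapping of a kernel: k~(x, y) = c(x) c(y) k(x, y),
    # where c(.) > 0 reshapes the induced metric (e.g. to expand
    # resolution near class boundaries).
    return c(x) * c(y) * base_kernel(x, y)
```

With S_inv learned from training data and c tuned by the discriminative objective the abstract mentions, such nodes can be stacked into a multiple-kernel network.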


2021 ◽  
Vol 14 (4) ◽  
pp. 2981-2992
Author(s):  
Antti Lipponen ◽  
Ville Kolehmainen ◽  
Pekka Kolmonen ◽  
Antti Kukkurainen ◽  
Tero Mielonen ◽  
...  

Abstract. Satellite-based aerosol retrievals provide a timely view of atmospheric aerosol properties, having a crucial role in the subsequent estimation of air quality indicators, atmospherically corrected satellite data products, and climate applications. However, current aerosol data products based on satellite data often have relatively large biases compared to accurate ground-based measurements and distinct uncertainty levels associated with them. These biases and uncertainties are often caused by oversimplified assumptions and approximations used in the retrieval algorithms due to unknown surface reflectance or fixed aerosol models. Moreover, the retrieval algorithms do not usually take advantage of all the possible observational data collected by the satellite instruments and may, for example, leave some spectral bands unused. Improving and re-processing the past and current operational satellite data retrieval algorithms would be a tedious and computationally expensive task. To overcome this burden, we have developed a model-enforced post-process correction approach to correct the existing operational satellite aerosol data products. Our approach combines the existing satellite aerosol retrievals and a post-processing step carried out with a machine-learning-based correction model for the approximation error in the retrieval. The developed approach allows for the utilization of auxiliary data sources, such as meteorological information, or additional observations such as spectral bands unused by the original retrieval algorithm. The post-process correction model can learn to correct for the biases and uncertainties in the original retrieval algorithms. As the correction is carried out as a post-processing step, it allows for computationally efficient re-processing of existing satellite aerosol datasets without fully re-processing the much larger original radiance data. 
We demonstrate with over-land aerosol optical depth (AOD) and Ångström exponent (AE) data from the Moderate Resolution Imaging Spectroradiometer (MODIS) of the Aqua satellite that our approach can significantly improve the accuracy of the satellite aerosol data products and reduce the associated uncertainties. For instance, in our evaluation, the number of AOD samples within the MODIS Dark Target expected error envelope increased from 63 % to 85 % when the post-process correction was applied. In addition to method description and accuracy results, we also give recommendations for validating machine-learning-based satellite data products.
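The key idea of the post-process correction — learn the approximation error of the existing retrieval rather than re-retrieve from radiances — can be sketched as follows. The regressor choice, feature layout, and function names are illustrative assumptions; the authors' actual model and predictors may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_correction(aod_retrieved, aux_features, aod_reference):
    # Train on the *error* (reference - retrieval), not the reference
    # itself, so the original physics-based retrieval is kept and only
    # its bias/uncertainty is modeled.
    X = np.column_stack([aod_retrieved, aux_features])
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X, aod_reference - aod_retrieved)
    return model

def apply_correction(model, aod_retrieved, aux_features):
    # Corrected product = original retrieval + predicted error.
    X = np.column_stack([aod_retrieved, aux_features])
    return aod_retrieved + model.predict(X)
```

Because only the small retrieved product plus auxiliary features enter the model, existing datasets can be re-processed cheaply, exactly as the abstract argues.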


2019 ◽  
Vol 4 (1) ◽  
pp. 3
Author(s):  
Chen Tao ◽  
Rongrong Shan ◽  
Hui Li ◽  
Dongsheng Wang ◽  
Wei Liu

In recent years, an increasing number of knowledge bases have been built using linked data, and datasets have grown substantially as a result. It is neither reasonable to store a large amount of triple data in a single graph, nor appropriate to store RDF in named graphs keyed by class URIs, because the many joins between graphs can cause performance problems. This paper presents an agglomerative-adapted partitioning approach for large-scale graphs, realized as a bottom-up merging process. The proposed algorithm partitions triple data at three levels: blank nodes, associated nodes, and inference nodes. Blank nodes and the classes/nodes involved in reasoning rules are better stored with an optimal neighbor node in the same partition than split into separate partitions. The merging of associated nodes starts with the node with the smallest cost and repeats until the target number of partitions is reached. Finally, the feasibility and rationality of the merging algorithm are analyzed in detail through bibliographic cases. In summary, the partitioning methods proposed in this paper can be applied in distributed storage, data retrieval, data export, and semantic reasoning over large-scale triple graphs. In future work, we will investigate setting the number of partitions automatically with machine learning algorithms.
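A generic bottom-up agglomerative merge of the kind described above can be sketched as follows: start with one partition per node and repeatedly merge the most strongly connected pair until the target count is reached, so tightly linked nodes end up co-located and cross-partition joins are minimized. The cost function and merge order here are plausible simplifications, not the paper's exact three-level algorithm.

```python
def merge_partitions(nodes, edge_weight, n_parts):
    # Start with singleton partitions (pure bottom-up agglomeration).
    parts = [{n} for n in nodes]
    while len(parts) > n_parts:
        # Pick the pair of partitions with the strongest total connection;
        # merging them first avoids cutting heavy edges between graphs.
        i, j = max(
            ((a, b) for a in range(len(parts)) for b in range(a + 1, len(parts))),
            key=lambda ab: sum(edge_weight(u, v)
                               for u in parts[ab[0]] for v in parts[ab[1]]),
        )
        parts[i] |= parts.pop(j)
    return parts
```

In an RDF setting, `edge_weight` would encode triple co-occurrence, blank-node attachment, or participation in shared reasoning rules.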


2021 ◽  
Author(s):  
Vitali Diaz ◽  
Ahmed A. A. Osman ◽  
Gerald A. Corzo Perez ◽  
Henny A. J. Van Lanen ◽  
Shreedhar Maskey ◽  
...  

Abstract. Crop yield is one of the variables used to assess the impact of droughts on agriculture. Crop growth models calculate yield together with variables related to plant development, which makes them suitable for crop yield estimation. However, these models are limited in that specific data are needed for computation. Given this limitation, machine learning (ML) models are often used instead, but their use with the spatial characteristics of droughts as input data remains limited. This research explored the spatial extent of drought (area) as input data for building an approach to predict seasonal crop yield. The ML approach comprises two components: the first includes polynomial regression (PR) models, and the second artificial neural network (ANN) models. The purpose was to evaluate both types of ML models (PR and ANN) and integrate them into one operational tool. The logic is as follows: ANN models give the most accurate predictions, but in practice, issues with data retrieval and processing can make the use of explicit equations, i.e. PR, preferable. The proposed approach provides these PR equations for early, preliminary estimates, which can be further improved when the ANN models are run with the final input data. The results indicated that the empirical equations (PR) produced good predictions when using drought area as the input, while ANN provided better estimates in general. This research will improve drought monitoring systems for assessing drought effects. Since drought areas can already be calculated within these systems, the prediction of drought effects can be integrated directly by following approaches such as the one presented here.
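The PR component of such a two-stage approach is just an explicit polynomial fit of yield against drought area, which is what makes it usable as a simple early-estimate equation before the ANN refinement. A minimal sketch, with the degree and variable names as assumptions:

```python
import numpy as np

def fit_yield_pr(drought_area, crop_yield, degree=2):
    # Fit yield = p(drought area); the returned coefficients ARE the
    # explicit PR equation that can be applied without the ANN.
    return np.polyfit(drought_area, crop_yield, degree)

def predict_yield(coeffs, drought_area):
    # Evaluate the fitted polynomial for a preliminary yield estimate.
    return np.polyval(coeffs, drought_area)
```

The ANN stage would then re-estimate yield from the same (final) inputs with a trained network when data retrieval and processing permit.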


2021 ◽  
Vol 13 (21) ◽  
pp. 4378
Author(s):  
Abdelaziz Htitiou ◽  
Abdelghani Boudhar ◽  
Abdelghani Chehbouni ◽  
Tarik Benabdelouahab

Many challenges prevail in cropland mapping over large areas, including dealing with massive volumes of datasets and computing capabilities. Accordingly, new opportunities have been opened at a breakneck pace with the launch of new satellites, the continuous improvements in data retrieval technology, and the upsurge of cloud computing solutions such as Google Earth Engine (GEE). Therefore, the present work is an attempt to automate the extraction of multi-year (2016–2020) cropland phenological metrics on GEE and use them as inputs, together with environmental covariates, in a trained machine-learning model to generate high-resolution cropland and crop-field probability maps in Morocco. The comparison of our phenological retrievals against the MODIS phenology product shows very close agreement, implying that the suggested approach accurately captures crop phenology dynamics, which allows better cropland classification. The entire country is mapped using a large volume of reference samples collected and labelled through visual interpretation of high-resolution imagery on Collect-Earth-Online, an online platform for systematically collecting geospatial data. The cropland classification product for the nominal year 2019–2020 showed an overall accuracy of 97.86% with a Kappa of 0.95. When compared to Morocco’s utilized agricultural land (SAU) areas, the cropland probability maps demonstrated the ability to accurately estimate sub-national SAU areas with an R-value of 0.9. Furthermore, analyzing cropland dynamics reveals a decrease of 2% in the 2019–2020 season relative to the 2018–2019 season, and of 5% between 2016 and 2020, which is partly driven by climate conditions, but even more so by the novel coronavirus disease 2019 (COVID-19), which impacted the planting and managing of crops due to government measures taken at the national level, such as a complete lockdown. Such results demonstrate how critical these methods and the associated maps are for scientific studies and decision-making related to food security and agriculture.
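Phenological metrics of the kind extracted above are typically derived from a vegetation-index time series. A minimal sketch using a relative-amplitude threshold on NDVI (a common convention; the paper's GEE metrics and threshold may differ):

```python
import numpy as np

def phenology_metrics(ndvi, threshold=0.5):
    # Start/end of season: first/last time step at which NDVI exceeds
    # a fraction of the seasonal amplitude above the minimum.
    ndvi = np.asarray(ndvi, dtype=float)
    level = ndvi.min() + threshold * (ndvi.max() - ndvi.min())
    above = np.flatnonzero(ndvi >= level)
    return {
        "sos": int(above[0]),               # start of season (index)
        "eos": int(above[-1]),              # end of season (index)
        "peak": int(np.argmax(ndvi)),       # time of maximum greenness
        "amplitude": float(ndvi.max() - ndvi.min()),
    }
```

Per-pixel metrics like these, stacked with environmental covariates, form the feature vector fed to the cropland classifier.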


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Yu Gao ◽  
Kai Zhang

Abstract. We are concerned with inverse scattering problems associated with incomplete measurement data, a challenging topic of increasing importance that arises in many practical applications. Based on a prototypical working model, we propose a machine-learning-based inverse scattering scheme that integrates a CNN (convolutional neural network) for the data retrieval. The proposed method can effectively cope with reconstruction under limited-aperture and/or phaseless far-field data. Numerical experiments verify the promising features of our new scheme.


2020 ◽  
Vol 60 (11) ◽  
pp. 46-60
Author(s):  
Vugar Hajimahmud Abdullayev ◽  

Models, methods and algorithms for cyber-social computing and machine learning imply the use of a similarity–difference metric over unitary-coded information for processing big data, in order to generate adequate actuator signals for controlling cyber-social critical systems. A set-theoretic method of data search is developed based on the similarity–difference of the frequency parameters of primitive elements, which makes it possible to determine the similarity of objects, the strategy for transforming one object into another, and the level of common interests and conflicts. Computational architectures for cyber-social computing and metric search for key data are created. Definitions of the fundamental concepts in the field of computing are given on the basis of metric relations between interacting processes and phenomena. A software application is proposed for calculating the similarity–difference of objects based on the formation of frequency vectors of two sets of primitive data. A high level of correlation between the application’s results and a well-known plagiarism detection system is shown. Key words: computing, cyber-social computing, decision making, unitary data codes, similarity–difference, data retrieval, plagiarism
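The frequency-vector comparison described above can be sketched with words as the primitive elements and cosine similarity as the metric; the paper's unitary-code metric is more elaborate, so this is only a simplified stand-in for the plagiarism-style comparison.

```python
from collections import Counter
import math

def similarity_difference(text_a, text_b):
    # Build frequency vectors of primitive elements (here: words)
    # and return (similarity, difference) with similarity in [0, 1].
    fa, fb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(fa[w] * fb[w] for w in fa)
    na = math.sqrt(sum(v * v for v in fa.values()))
    nb = math.sqrt(sum(v * v for v in fb.values()))
    sim = dot / (na * nb) if na and nb else 0.0
    return sim, 1.0 - sim
```

Identical documents score similarity 1 (difference 0); documents sharing no primitives score 0, which is the behavior a plagiarism comparison relies on.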


2020 ◽  
Author(s):  
Antti Lipponen ◽  
Ville Kolehmainen ◽  
Pekka Kolmonen ◽  
Antti Kukkurainen ◽  
Tero Mielonen ◽  
...  

Abstract. Satellite-based aerosol retrievals provide a timely global view of atmospheric aerosol properties for air quality, atmospheric characterization, and the correction of satellite data products and climate applications. Current aerosol data products based on satellite data, however, often have relatively large biases relative to accurate ground-based measurements and distinct levels of uncertainty associated with them. These biases and uncertainties are often caused by oversimplified assumptions and approximations used in the retrieval algorithms due to unknown surface reflectance or fixed aerosol models. Moreover, the retrieval algorithms do not usually take advantage of all the possible observational data collected by the satellite instruments and may, for example, leave some spectral bands unused. Improving and re-processing the past and current operational satellite data retrieval algorithms would be a tedious and computationally expensive task. To overcome this burden, we have developed a model-enforced post-process correction approach that can be used to correct existing operational satellite aerosol data products. Our approach combines the existing satellite aerosol retrievals and a post-processing step carried out with a machine-learning-based correction model for the approximation error in the retrieval. The developed approach allows for the utilization of auxiliary data sources, such as meteorological information, or additional observations such as spectral bands unused by the original retrieval algorithm. The post-process correction model can learn to correct for the biases and uncertainties in the original retrieval algorithms. As the correction is carried out as a post-processing step, it allows for computationally efficient re-processing of existing satellite aerosol datasets with no need to fully reprocess the much larger original radiance data. 
We demonstrate with over-land aerosol optical depth (AOD) and Ångström exponent (AE) data from the Moderate Resolution Imaging Spectroradiometer (MODIS) of the Aqua satellite that our approach can significantly improve the accuracy of the satellite aerosol data products and reduce the associated uncertainties. We also give recommendations for the validation of satellite data products constructed using machine-learning-based models.

