Software platform for automated preparation and analytical processing of heterogeneous datasets

Author(s):  
E. D. Avedyan ◽  
I. V. Voronkov

Summary: the article proposes a new software platform for automating the preprocessing and labeling of datasets, with the aim of subsequently solving analytical problems such as image classification and the processing of textual and parametric information using neural network technologies. The platform uses modern technologies and combines a large number of methods in a modular architecture that can be extended as analytical data processing tasks become more complex. The need to develop such a platform is dictated primarily by the fact that, at the current rate of data volume growth, a genuine transition to deep data analytics remains unattainable without such platforms, since confidentiality, controlled access to information, and the use of external data processing resources all have to be provided for.
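
As a sketch of the modular design the summary describes, consider a pluggable preprocessing pipeline; every name below is a hypothetical illustration rather than part of the platform's API:

```python
from typing import Callable, List

# A pipeline stage is any callable that maps a dataset to a dataset.
Stage = Callable[[list], list]

class PreprocessingPipeline:
    """Minimal modular pipeline: stages can be registered as analytical
    tasks become more complex, without changing the core."""
    def __init__(self) -> None:
        self.stages: List[Stage] = []

    def register(self, stage: Stage) -> "PreprocessingPipeline":
        self.stages.append(stage)
        return self

    def run(self, records: list) -> list:
        for stage in self.stages:
            records = stage(records)
        return records

# Hypothetical stages: normalize text fields, then attach labels.
normalize = lambda recs: [r.strip().lower() for r in recs]
label = lambda recs: [(r, "positive" if "good" in r else "other") for r in recs]

pipeline = PreprocessingPipeline().register(normalize).register(label)
print(pipeline.run(["  Good sample ", "Noisy SAMPLE"]))
```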

2020 ◽  
Vol 20 (2) ◽  
pp. 129-132
Author(s):  
Vugar Abdullayev ◽  
N. A. Ragimova ◽  
V. H. Abdullayev ◽  
T. K. Askerov

The objects of the research are tools that support the description and analytical processing of queries over environmental data. These tools are used for environmental monitoring, and the persons concerned with that monitoring need analytical processing of the environmental data. A star schema is used to describe the data. Analytical processing tools are required for the analysis and study of environmental data, and their results are used to speed up decision-making. The article also describes the structure of the analytical data processing tool. One of the central problems is therefore how to describe the data. For this purpose, a relational schema for the environmental data is defined, and the data description is implemented in multidimensional cubes. Because of the growth in data volume, the processing relies on multidimensional visualization methods, and a visual user interface has been created for the analytical processing of queries over large-scale data. The result of the research is a method for describing environmental data: a hypercube was obtained that made it possible to structure the environmental data and carry out analytical processing on them. To this end, the environmental data were described using a multidimensional visualization method, and OLAP technologies were used for the analytical processing itself; OLAP technologies allow aggregated data to be used and presented as a hypercube. The results of the research can serve as the basis for an environmental information system for environmental monitoring.
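
A minimal sketch of the hypercube-style aggregation described above, using pandas (the measurements, dimensions, and column names here are hypothetical illustrations, not data from the study):

```python
import pandas as pd

# Hypothetical environmental measurements: each row is one sensor reading.
readings = pd.DataFrame({
    "region":    ["north", "north", "south", "south", "south"],
    "pollutant": ["NO2", "SO2", "NO2", "NO2", "SO2"],
    "year":      [2019, 2019, 2019, 2020, 2020],
    "value":     [41.2, 12.5, 38.9, 44.1, 10.3],
})

# Roll the flat table up into a small "cube": one cell per
# (region, pollutant, year) combination, aggregated by mean.
cube = readings.pivot_table(
    index="region", columns=["pollutant", "year"],
    values="value", aggfunc="mean",
)
print(cube)
```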


2008 ◽  
Vol 392-394 ◽  
pp. 121-124 ◽  
Author(s):  
Hong Yun Wang ◽  
G.F. Guo ◽  
Y.X. Li ◽  
Xi Lin Zhu

This paper introduces a system based on a flame-cutter NC (numerical control) system and the LabVIEW software platform developed by NI (USA). It describes the composition of the NC machine, the partitioning of modules and their assignments, the definition of functions, the data processing for machining and control, the structure of the software, and the method of implementation on two CPUs. The system uses LabVIEW's multitasking capability so that the programmer can easily realize tasks that are difficult to achieve with traditional programming. Building the system interface and multitasking on the LabVIEW platform is a comparatively convenient and rapid approach.


2010 ◽  
Vol 2 (1) ◽  
pp. 99-116
Author(s):  
Katarzyna Rostek

Data Analytical Processing in Data Warehouses

The article presents issues connected with processing information from data warehouses (analytical enterprise databases) and the two basic types of analytical data processing in a data warehouse. For each type of analysis, the genesis, main definitions, scope of application, and real examples from business implementations are described. The author's own method of knowledge discovery in databases is also presented, together with practical guidelines for its proper and effective use in the enterprise.


Author(s):  
V. A. Sizov ◽  
A. D. Kirov

The article is devoted to the problem of developing an analytical data processing system for information security monitoring within the information security management system of modern companies that conduct their main activities in cyberspace and use cloud infrastructure. Based on an analysis of modern information technologies for ensuring the information security of cloud infrastructure, the most popular products in this area, and existing scientific approaches, a formalized approach is proposed for synthesizing an analytical data processing system (ASOD) for monitoring the information security of an informatization object that uses cloud infrastructure. The approach takes into account the usefulness of the information technologies employed from the viewpoint of information security. A general model of the structure of the information support of such a system is presented, together with a model of how the usefulness of an information technology depends on time and on the ratio of the skill level of an information security specialist to that of an attacker. In the first optimization model, the quality of the information security monitoring system serves as the criterion, under the following constraints: a limit on the time to decide on an incident; a limit on the quality of analysis of information security events by the system; and a compatibility constraint between the data analysis functions and the types of data about information security events. The results obtained for the second model show a logically consistent dependence of the usefulness of an information technology on time and on the specialist-to-attacker skill ratio. Particular models of the structure of the information support of the ASOD are presented; they make it possible to determine a rational structure of the information support according to particular criteria: the maximin criterion of the usefulness of the information support for monitoring the information security of an informatization object in the cloud infrastructure, and the criterion of maximum relevance of information support distributed over the nodes of the cloud infrastructure for systems with a low degree of centralization of management.
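
A minimal sketch of the maximin selection described above (all structure names, utility values, and the time constraint are hypothetical placeholders; the paper's actual models are richer):

```python
from dataclasses import dataclass

@dataclass
class SupportStructure:
    """One candidate structure of ASOD information support (hypothetical)."""
    name: str
    decision_time: float       # seconds to decide on an incident
    utility_by_scenario: dict  # scenario -> usefulness score

def maximin_choice(candidates, max_decision_time):
    """Pick the structure whose worst-case usefulness is largest,
    among those that satisfy the decision-time constraint."""
    feasible = [c for c in candidates if c.decision_time <= max_decision_time]
    if not feasible:
        raise ValueError("no structure satisfies the time constraint")
    return max(feasible, key=lambda c: min(c.utility_by_scenario.values()))

candidates = [
    SupportStructure("centralized", 30.0, {"phishing": 0.9, "ddos": 0.4}),
    SupportStructure("distributed", 45.0, {"phishing": 0.7, "ddos": 0.6}),
]
best = maximin_choice(candidates, max_decision_time=60.0)
print(best.name)  # -> "distributed": its worst case (0.6) beats 0.4
```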


2020 ◽  
Vol 14 (4) ◽  
pp. 534-546
Author(s):  
Tianyu Li ◽  
Matthew Butrovich ◽  
Amadou Ngom ◽  
Wan Shen Lim ◽  
Wes McKinney ◽  
...  

The proliferation of modern data processing tools has given rise to open-source columnar data formats. These formats help organizations avoid repeated conversion of data to a new format for each application. However, these formats are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) systems. As a result, DBMSs often fail to take advantage of full network bandwidth when transferring data. We aim to reduce or even eliminate this overhead by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format. We introduce relaxations to common analytical data formats to efficiently update records and rely on a lightweight transformation process to convert blocks to a read-optimized layout when they are cold. We also describe how to access data from third-party analytical tools with minimal serialization overhead. We implemented our storage engine based on the Apache Arrow format and integrated it into the NoisePage DBMS to evaluate our work. Our experiments show that our approach achieves comparable performance with dedicated OLTP DBMSs while enabling orders-of-magnitude faster data exports to external data science and machine learning tools than existing methods.
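
As a minimal sketch of the export path the paper optimizes, the following uses the pyarrow library to emit and re-read a columnar block in the open Arrow IPC format; it illustrates Arrow's near-zero serialization cost in general, not NoisePage's actual storage engine:

```python
import pyarrow as pa

# Build a columnar batch in the open Arrow format; an Arrow-native
# engine can hand such blocks to external tools without conversion.
batch = pa.record_batch(
    [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
    names=["id", "payload"],
)

# Stream the batch through the Arrow IPC format: the wire layout
# matches the in-memory layout, so (de)serialization work is minimal.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)

# A consumer (e.g. a data science tool) reads it back without copies.
reader = pa.ipc.open_stream(sink.getvalue())
print(reader.read_all())
```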


Author(s):  
David Gelernter

We've installed the foundation piles and are ready to start building Mirror Worlds. In this chapter we discuss (so to speak) the basement, in the next chapter we get to the attic, and the chapter after that fills in the middle region and glues the whole thing together. The basement we are about to describe is filled with lots of a certain kind of ensemble program. This kind of program, called a Trellis, makes the connection between external data and internal mirror-reality. The Trellis is, accordingly, a key player in the Mirror World cast. It's also a good example of ensemble programming in general, and, I'll argue, a highly significant gadget in itself. The hulking problem with which the Trellis does battle on the Mirror World's behalf is a problem that the real world, too, will be confronting directly and in person very soon. Floods of data are pounding down all around us in torrents. How will we cope? What will we do with all this stuff? When the encroaching electronification of the world pushes the downpour rate higher by a thousand or a million times or more, what will we do then? Concretely: I'm talking about realtime data processing. The subject in this chapter is fresh data straight from the sensor. We'd like to analyze this fresh data in "realtime"—to achieve some understanding of data values as they emerge. Raw data pours into a Mirror World and gets refined by a data distillery in the basement. The processed, refined, one-hundred-percent pure stuff gets stored upstairs in the attic, where it ferments slowly into history. (In the next chapter we move upstairs.) Trellis programs are the topic here: how they are put together, how they work. But there's an initial question that's too important to ignore. We need to take a brief trip outside into the deluge, to establish what this stuff is and where it's coming from. Data-gathering instruments are generally electronic. They are sensors in the field, dedicated to the non-stop, automatic gathering of measurements; or they are full-blown infomachines, waiting for people to sit down, log on and enter data by hand.


2000 ◽  
Vol 43 (6) ◽  
pp. 1477-1481
Author(s):  
G. O. Brown ◽  
D. Needham ◽  
M. L. Stone

2020 ◽  
Vol 493 (4) ◽  
pp. 6071-6078 ◽  
Author(s):  
Sarod Yatawatta

With ever-increasing data rates produced by modern radio telescopes like LOFAR and future telescopes like the SKA, many data-processing steps are overwhelmed by the amount of data that needs to be handled using limited compute resources. Calibration is one such operation that dominates the overall data processing computational cost; none the less, it is an essential operation to reach many science goals. Calibration algorithms do exist that scale well with the number of stations of an array and the number of directions being calibrated. However, the remaining bottleneck is the raw data volume, which scales with the number of baselines, and which is proportional to the square of the number of stations. We propose a ‘stochastic’ calibration strategy where we read only in a mini-batch of data for obtaining calibration solutions, as opposed to reading the full batch of data being calibrated. None the less, we obtain solutions that are valid for the full batch of data. Normally, data need to be averaged before calibration is performed to accommodate the data in size-limited compute memory. Stochastic calibration overcomes the need for data averaging before any calibration can be performed, and offers many advantages, including: enabling the mitigation of faint radio frequency interference; better removal of strong celestial sources from the data; and better detection and spatial localization of fast radio transients.
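
A toy illustration of the mini-batch idea, assuming a deliberately simplified scalar gain model (this is not the paper's calibration algorithm, just the stochastic-update pattern it relies on):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: observed = gain * model + noise, over a full batch of data.
true_gain = 1.7
model_vis = rng.normal(size=100_000)
observed = true_gain * model_vis + 0.1 * rng.normal(size=100_000)

# Stochastic calibration: estimate the gain from small mini-batches
# instead of loading the full batch, yet converge to a solution
# that is valid for all of it.
gain, lr, batch = 1.0, 0.05, 512
for step in range(2000):
    idx = rng.integers(0, model_vis.size, size=batch)  # read one mini-batch
    residual = observed[idx] - gain * model_vis[idx]
    gain += lr * np.mean(residual * model_vis[idx])    # gradient step

print(f"estimated gain: {gain:.3f}")  # close to the true value 1.7
```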


2020 ◽  
Vol 149 ◽  
pp. 02011
Author(s):  
Aleksey Raevich ◽  
Boris Dobronets ◽  
Olga Popova ◽  
Ksenia Raevich

Operational data marts, which essentially constitute slices of thematically narrow information, are designed to provide operational access to big data sources through the consolidation and ranking of information resources based on their relevance. Unlike operational data marts, which depend on their sources, analytical data marts are considered independent data sources created by users to structure data for the tasks being solved. The conceptual model of operational-analytical data marts therefore allows the two concepts to be combined into an analytical cluster that serves as the basis for the rapid design, development, and implementation of data models.

