Architecture of a Compact Data GRID Cluster for Teaching Modern Methods of Data Mining in the Virtual Computer Lab

2020 ◽  
Vol 226 ◽  
pp. 03004
Author(s):  
Mikhail Belov ◽  
Vladimir Korenkov ◽  
Nadezhda Tokareva ◽  
Eugenia Cheremisina

This paper discusses the architecture of a compact Data GRID cluster for teaching new methods of Big Data analytics in the Virtual Computer Lab. Its main purpose is to train highly qualified IT professionals able to efficiently solve problems of distributed data storage and processing, data mining, drawing insights, and mathematical modeling based on these data. The Virtual Computer Lab was created, and is successfully operated, by experts of the System Analysis and Control Department at Dubna State University in collaboration with the Laboratory of Information Technologies (Joint Institute for Nuclear Research).

2021 ◽  
Vol 18 (5) ◽  
pp. 24-40
Author(s):  
N. G. Kuftinova

The article discusses the problems of using data mining in a transport model as a digital platform for analysing data on traffic flows in a megapolis, as well as the prerequisites for creating, in the future, single data banks and an integrated environment for interaction of models of different levels as clusters of the digital economy, which will consider all modes of transport to assess transport demand and develop projects for organizing traffic in a megapolis.

The objective of the work is to study the processes of obtaining quantitative characteristics of objects of transport modelling when creating a single electronic environment by calculating the derived parameters of the transport network of a megapolis. Quantitative spatial characteristics of an object are associated with calculating the distance from the city centre and the nearest main street and are determined using geographic information systems, which entails the consequent problems of data unification and efficient data storage.

As part of achieving that objective, it is shown that a preprocessing and validation procedure is needed for all primary transport data, since data sources have different formats and tracking data require spatial interpolation. For this, it is recommended to use various methods of data analysis based on GIS technologies, digital terrain modelling, and the topology of the road network and other objects of the transport network of a megapolis. Besides, the use of intelligent data analysis should be preceded by formatting and grouping the source data in real time. The most common errors arise at the stage of the iterative process of obtaining quantitative characteristics of objects of transport modelling and of building the route that is optimal in terms of travel time along a certain transport network.

The existing trends of urban growth require global digitalization of all transport infrastructure objects, considering changes in the functions of the transport environment and in the intensity of traffic flows. This entails further development and implementation of new information technologies for data processing using neural networks and other digital technologies.
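As a rough illustration of the derived spatial parameters discussed above, the Python sketch below computes the distance of transport-network nodes from a hypothetical city centre (haversine formula) and a travel-time-optimal route with Dijkstra's algorithm. The coordinates, node names, edge weights, and the networkx dependency are illustrative assumptions, not taken from the article.

```python
# Illustrative sketch only: node coordinates and travel times are invented.
import math
import networkx as nx  # assumed dependency for shortest-path search

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical fragment of a transport network; edges are weighted by travel time (minutes).
G = nx.DiGraph()
G.add_weighted_edges_from(
    [("A", "B", 7), ("B", "C", 5), ("A", "C", 15), ("C", "D", 4)],
    weight="travel_time",
)
coords = {"A": (55.751, 37.618), "B": (55.760, 37.640),
          "C": (55.770, 37.660), "D": (55.780, 37.680)}
centre = (55.7558, 37.6173)  # assumed city-centre coordinates

# Derived spatial parameter: distance of each node from the city centre.
dist_from_centre = {n: haversine_km(*coords[n], *centre) for n in G.nodes}

# Travel-time-optimal route between two nodes (Dijkstra).
route = nx.shortest_path(G, "A", "D", weight="travel_time")
print(dist_from_centre, route)
```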


Author(s):  
Valery Maximov ◽  
Kseniya Reznikova ◽  
Dmitry Popov

There is practically no industry left where modern information technologies are not used. Data mining approaches are very popular today; this technology makes it possible to transform huge amounts of data into useful information. In the article, the authors present a definition of data mining technology and its frequently used methods. Popular data mining techniques include classification, clustering, machine learning, and prediction. The authors pay special attention to the k-means clustering method, whose essence is to partition a dataset into clusters. The results can be visualized so that scatter, which implies heterogeneity in the data, can be detected by the naked eye. By further investigating these variations, the analyst can find errors and weaknesses in the study area according to the task at hand. Accurate and complete data are essential in maritime activities. In shipbuilding, data analysis and well-founded operational decisions can affect the speed and quality of ship construction or even reduce production costs. In shipping and logistics, they can be used to optimize routes and improve the safety of seafarers. Effective use of data mining usually requires highly qualified database specialists and programmers. In this work, the authors demonstrate the use of the Orange Data Mining software tool. This program does not require programming skills from the user, which makes it useful for people far removed from writing program code. The article explores the application of the Orange Data Mining program for automated mining of marine data. The results obtained show that the program can be effectively used in maritime activities.
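For reference, the k-means partitioning described above can also be reproduced in code. The minimal Python sketch below uses scikit-learn on an invented two-feature dataset; the data, feature layout, and use of scikit-learn (rather than the Orange Data Mining GUI workflow the article actually employs) are illustrative assumptions.

```python
# Minimal k-means sketch; the data and the use of scikit-learn (instead of the
# Orange Data Mining GUI from the article) are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Invented two-feature dataset: two dense groups plus a few outliers.
data = np.vstack([
    rng.normal(loc=[10.0, 5.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[20.0, 12.0], scale=0.5, size=(50, 2)),
    [[15.0, 30.0], [16.0, 31.0]],  # outliers that show up as visible scatter
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
labels = model.labels_

# Points far from their cluster centre hint at heterogeneity worth investigating.
distances = np.linalg.norm(data - model.cluster_centers_[labels], axis=1)
suspect = np.argsort(distances)[-5:]
print("cluster sizes:", np.bincount(labels), "suspect rows:", suspect)
```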


Author(s):  
D. V. Gribanov

Introduction. This article is devoted to the legal regulation of digital asset turnover and the possibilities of using distributed computing and distributed data storage systems in the activities of public authorities and entities of public control. The author notes that some national and foreign scientists who study the "blockchain" technology (distributed computing and distributed data storage systems) emphasize its usefulness in different activities. The data validation procedure for digital transactions and the legal regulation of the creation, issuance, and turnover of digital assets need further attention.

Materials and methods. The research is based on general scientific methods (analysis, analogy, comparison) and particular methods of cognition of legal phenomena and processes (the method of interpretation of legal rules, the technical legal method, the formal legal method, and the formal logical method).

Results of the study. The author's analysis identified several advantages of using the "blockchain" technology in the sphere of public control: a particular validation system; data that have been entered into the distributed data storage system cannot be erased or forged; full transparency of the succession of actions taken while exercising governing powers; and automatic repetition of recurring actions. The need for fivefold validation of the exercise of governing powers is substantiated. The author stresses that this fivefold validation shall ensure complex control over the exercise of powers by civil society, the entities of public control, and the Russian Federation as a federal state holding sovereignty over its territory. The author has also conducted a brief analysis of judicial decisions concerning digital transactions.

Discussion and conclusion. The use of a distributed data storage system makes control easier to exercise owing to the reduced risk of forgery, replacement, or destruction of data. The author suggests defining a digital transaction not only as actions with digital assets, but also as actions toward modifying and supplementing information about legal facts with the purpose of establishing it in distributed data storage systems. The author also suggests using distributed data storage systems for independent validation of information about the activities of bodies of state authority. In the author's opinion, application of the "blockchain" technology may result not only in greater efficiency of public control, but also in a new form of public control: automatic control. It is concluded that there is currently no legislative basis for regulating legal relations concerning distributed data storage.


Pólemos ◽  
2020 ◽  
Vol 14 (1) ◽  
pp. 57-71
Author(s):  
Jeanne Gaakeer

This article addresses some of the risks involved in the use of information technologies such as profiling and data mining, by means of the German jurist-philosopher Juli Zeh's dystopian novel Leere Herzen.


Computers ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 142
Author(s):  
Obadah Hammoud ◽  
Ivan Tarkhanov ◽  
Artyom Kosmarski

This paper investigates the problem of distributed storage of electronic documents (both metadata and files) in decentralized blockchain-based b2b systems (DApps). The need to reduce the cost of implementing such systems and the insufficient elaboration of the issue of storing big data in DLT are considered. An approach to building such systems is proposed that optimizes the size of the required storage (by using erasure coding) while providing secure data storage in the geographically distributed systems of a company or a consortium of companies. The novelty of this solution is that we are the first to combine enterprise DLT with distributed file storage in which the availability of files is controlled. The results of our experiment demonstrate that the speed of the described DApp is comparable to known b2c torrent projects and justify the choice of Hyperledger Fabric and Ethereum Enterprise for its implementation. The test results obtained show that public blockchain networks are not suitable for creating such a b2b system. The proposed system solves the main challenges of distributed data storage by grouping data into clusters and managing them with a load balancer, while preventing data tampering by means of a blockchain network. The considered DApp storage methodology easily scales horizontally in terms of distributed file storage and can be deployed on cloud computing technologies, while minimizing the required storage space. We compare this approach with known methods of file storage in distributed systems, including central storage, torrents, IPFS, and Storj. The reliability of this approach is calculated and the result is compared to traditional solutions based on full backup.
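As a rough illustration of why erasure coding reduces the storage footprint relative to full replication, the Python sketch below implements a toy single-parity (XOR) scheme: a document is split into k data shards plus one parity shard, and any single missing shard can be reconstructed. The shard count, the payload, and the XOR scheme itself are simplifying assumptions for illustration; the paper does not specify which erasure code it uses.

```python
# Toy single-parity erasure coding sketch; the paper's actual code is not specified here.
from functools import reduce

def split_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split data into k equally sized shards and append one XOR parity shard."""
    size = -(-len(data) // k)                # shard size, rounded up
    padded = data.ljust(size * k, b"\0")     # pad so all shards have equal length
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))
    return shards + [parity]

def reconstruct(shards: list) -> list:
    """Recover a single missing shard (None) as the XOR of the remaining ones."""
    missing = shards.index(None)
    known = [s for s in shards if s is not None]
    recovered = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*known))
    return shards[:missing] + [recovered] + shards[missing + 1:]

payload = b"electronic document stored across a consortium"
stored = split_with_parity(payload, k=4)   # 5 shards in total
stored[2] = None                           # simulate the loss of one storage node
restored = reconstruct(stored)
assert b"".join(restored[:4]).rstrip(b"\0") == payload
# Overhead: (k + 1) / k = 1.25x storage, versus 3x for full triple replication.
```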


Author(s):  
Alexander G. Marchuk ◽  
Sergey Nikolaevich Troshkov

This paper describes the experience of solving the problem of finding chains in the De Bruijn graph using parallel computations and distributed data storage.
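As context for the problem, the serial Python sketch below builds the binary De Bruijn graph and traverses an Eulerian cycle with Hierholzer's algorithm, yielding one de Bruijn sequence (a chain visiting every edge once). The binary alphabet, the chosen order, and the purely sequential approach are assumptions for illustration and do not reflect the parallel, distributed method studied in the paper.

```python
# Serial illustration only; the paper's method relies on parallel computation
# and distributed data storage, which this sketch does not attempt to reproduce.

def de_bruijn_sequence(alphabet: str, n: int) -> str:
    """Return a de Bruijn sequence B(len(alphabet), n) via an Eulerian cycle."""
    # Nodes are (n-1)-length words; each outgoing symbol defines one edge (an n-length word).
    nodes = [""]
    for _ in range(n - 1):
        nodes = [w + c for w in nodes for c in alphabet]
    edges = {w: list(alphabet) for w in nodes}   # unused outgoing symbols per node

    # Hierholzer's algorithm: walk unused edges, backtrack when stuck.
    start = alphabet[0] * (n - 1)
    stack, cycle = [start], []
    while stack:
        node = stack[-1]
        if edges[node]:
            symbol = edges[node].pop()
            stack.append((node + symbol)[1:])    # follow the edge to the next node
        else:
            cycle.append(stack.pop())

    # The Eulerian cycle visits every n-length word exactly once.
    return start + "".join(node[-1] for node in reversed(cycle[:-1]))

print(de_bruijn_sequence("01", 3))  # length 2**3 + 2 string containing every 3-bit word
```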

