Design of technological strategy for Big Data with Hadoop software

Author(s):  
Alicia VALDEZ-MENCHACA ◽  
Laura VAZQUEZ-DE LOS SANTOS ◽  
Griselda CORTES-MORALES ◽  
Ana PAIZ-RIVERA

The objective of this research project is the design and implementation of a technological strategy for the use of big data technologies such as Apache Hadoop and its supporting software projects, with the aim of preparing medium-sized companies for new, innovative technologies. The methodology comprised an analysis of big data best practices and of the software needed to design and configure a big data environment on a Linux server for the technological proposal. As a first result, a roadmap for installing and configuring the Hadoop software on a Linux virtual machine has been obtained, together with the proposed technological strategy, whose main components include analysis of the technological architecture, selection of the processes or data to be analyzed, and installation of Hadoop, among others.
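The roadmap's individual steps are not reproduced in the abstract; as a hedged illustration, the Python sketch below shows the kind of smoke test such a roadmap might end with, assuming a pseudo-distributed Hadoop installation on the Linux virtual machine with the standard hdfs command-line client on the PATH (all paths and file names are hypothetical):

```python
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' subcommand and return its output.

    Assumes the standard Hadoop CLI is installed and configured,
    as the installation steps of such a roadmap would ensure.
    """
    result = subprocess.run(
        ["hdfs", "dfs", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Hypothetical smoke test: create a directory, upload a local
    # file, and list it back to confirm HDFS is reachable.
    hdfs("-mkdir", "-p", "/user/demo")
    hdfs("-put", "-f", "/tmp/sample.csv", "/user/demo/sample.csv")
    print(hdfs("-ls", "/user/demo"))
```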

2017 ◽  
Vol 2 (2) ◽  
pp. 7
Author(s):  
Kyle Jones

Colleges and universities are actively developing capacity for learning analytics, a sociotechnical system supported by an assemblage of educational data mining technologies and related Big Data practices. Like many Big Data technologies, learning analytics implicates privacy by surveilling behaviors captured in data-based systems and by aggregating and analyzing personal information. Issues of privacy are often linked to concerns about intellectual freedom. Consequently, librarians fervently argue for surveillance-free spaces and places to promote the conditions they believe are necessary to support intellectual freedom. In this commentary, I contrast this common view of intellectual freedom with a separate theory, “positive intellectual freedom,” to show how libraries may be able to participate in learning analytics practices while upholding intellectual freedom as a lodestar guiding practice and policy.


2021 ◽  
Vol 5 (1) ◽  
pp. 12
Author(s):  
Otmane Azeroual ◽  
Renaud Fabre

Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As with earlier massive-information challenges, big data technologies such as Hadoop should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After briefly recalling the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic. We characterize this situation as an endemic heterogeneous-data context and then outline the advantages of technologies such as Hadoop and their IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we observe that both are at work in a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify the selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in the opposite contexts of models of partial submodels and models of final exact systems. In the fourth part, we remark that in both of these opposite contexts, Hadoop solutions allow a large range of needs to be fulfilled, which fits the requirements previously identified for the heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions that, to the best of our knowledge, appear to be the most suitable for overcoming the massive information challenges of COVID-19.


2021 ◽  
Vol 348 ◽  
pp. 01003
Author(s):  
Abdullayev Vugar Hacimahmud ◽  
Ragimova Nazila Ali ◽  
Khalilov Matlab Etibar

The volume of information in the 21st century is growing at a rapid pace, and big data technologies are used to process modern information. This article discusses the use of big data technologies to implement the monitoring of social processes. Big data has characteristics and principles of its own, which are reflected here, and big data applications in several areas are also discussed. Particular attention is paid to the interactions between big data and sociology; to this end, digital sociology and the computational social sciences are considered. One of the main objects of study in sociology is social processes, and the article presents the types of social processes and their monitoring. As an example, monitoring of social processes at a university is implemented. The following technologies are used for realizing such monitoring: the 1010data products (1010edge, 1010connect, 1010reveal, 1010equities), products of the Apache Software Foundation (Apache Hive, Apache Chukwa, Apache Hadoop, Apache Pig), the MapReduce framework, the R language, the Pandas library, NoSQL, etc. In particular, this article examines the use of the MapReduce model for monitoring social processes at the university.
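The article's MapReduce job itself is not reproduced in the abstract; the sketch below is a minimal Hadoop Streaming-style illustration in Python, assuming hypothetical tab-separated event logs of the form student_id<TAB>event_type (say, library visits or course-system logins). The mapper emits a count of one per event type and the reducer sums them, which is the shape a frequency count for monitoring a social process would take under the MapReduce model.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper/reducer sketch: count monitored events by type.

Illustrative invocation (paths hypothetical):
  hadoop jar hadoop-streaming.jar -files monitor.py \
      -mapper "python3 monitor.py map" -reducer "python3 monitor.py reduce" \
      -input /logs/events -output /logs/event_counts
"""
import sys

def mapper():
    # Input lines: "student_id<TAB>event_type" (hypothetical log format).
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            print(f"{parts[1]}\t1")

def reducer():
    # Streaming sorts mapper output by key, so identical event types
    # arrive contiguously; sum the counts per event type.
    current, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(value or 0)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```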


Author(s):  
Сергей Юрьевич Золотов ◽  
Игорь Юрьевич Турчановский

An experiment on the use of Apache Big Data technologies in climate-system research is described. In the course of the experiment, four variants of solving a test problem were implemented. Speeding up computations with Apache Big Data technologies is quite achievable, and the most effective way to do so was found in the fourth variant. The essence of that solution comes down to converting the source datasets into a format suitable for storage in a distributed file system and applying the Spark SQL technology from the Apache Big Data stack for parallel data processing on computing clusters. The core of the Apache Big Data stack consists of two technologies: Apache Hadoop, for organizing distributed file storage of unlimited capacity, and Apache Spark, for organizing parallel computing on computing clusters. The combination of Apache Spark and Apache Hadoop is fully applicable to building big data processing systems. The main idea implemented by Spark is dividing data into separate parts (partitions) and processing these parts in the memory of many computers connected within a network. Data is sent only when needed, and Spark automatically detects when the exchange will take place. As the test problem, we chose the calculation of the monthly, annual, and seasonal trends in the temperature of the Earth's atmosphere for the period from 1960 to 2010 according to the NCEP/NCAR and JRA-55 reanalysis data. The first variant is the simplest implementation, without parallelism. The second variant adds parallel reading of data from the local file system, aggregation, and calculation of trends. The third variant runs the test problem on a two-node cluster: the reanalysis files were placed, in their original format, in Hadoop storage (HDFS), which combines the disk subsystems of the two computers. The disadvantage of this variant is that each reanalysis file is loaded completely into the random access memory of a worker process. The solution proposed in the fourth variant is to pre-convert the original file format into one from which reading out of HDFS is selective, based on the specified parameters.
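The abstract stops short of showing the fourth variant in code; a common realization of "selective reading from HDFS" is conversion to a columnar format such as Parquet. The following PySpark sketch is a hedged illustration under that assumption: the HDFS path and the columns year, month, lat, lon, temp are hypothetical, and the trend is computed as a simple least-squares slope of the annual mean temperature.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("temperature-trends").getOrCreate()

# Hypothetical layout: reanalysis converted to Parquet in HDFS with
# columns year, month, lat, lon, temp. Parquet lets Spark read only
# the columns and row groups a query needs -- the "selective reading"
# that distinguishes the fourth variant from loading whole files.
df = spark.read.parquet("hdfs:///reanalysis/ncep.parquet")

annual = (
    df.where((F.col("year") >= 1960) & (F.col("year") <= 2010))
      .groupBy("year")
      .agg(F.avg("temp").alias("mean_temp"))
)

# Least-squares slope of mean_temp over year gives the linear trend:
# slope = (E[xy] - E[x]E[y]) / (E[x^2] - E[x]^2).
stats = annual.agg(
    F.avg("year").alias("my"),
    F.avg("mean_temp").alias("mt"),
    F.avg(F.col("year") * F.col("mean_temp")).alias("myt"),
    F.avg(F.col("year") * F.col("year")).alias("myy"),
).first()

slope = (stats["myt"] - stats["my"] * stats["mt"]) / (stats["myy"] - stats["my"] ** 2)
print(f"annual trend: {slope:.4f} degrees per year")
```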


Author(s):  
Ebru Aydindag Bayrak ◽  
Pinar Kirci

This article presents a brief introduction to big data and big data analytics, as well as their roles in the healthcare system. A range of scientific studies on big data analytics in the healthcare system is reviewed. The definition of big data, the components of big data, medical big data sources, the big data technologies currently in use, and big data analytics in healthcare are examined under separate headings, and the historical development of big data analytics is also covered. Apache Hadoop, a well-known big data analytics technology, is explained briefly together with its core components and tools. Moreover, a glance at some big data analytics tools and platforms beyond the Hadoop ecosystem is given. The main goal is to help researchers and specialists by giving them a sense of the rising importance of big data analytics in healthcare systems.


2020 ◽  
Vol 23 (3) ◽  
pp. 514-525
Author(s):  
Alexander Sergeevich Kozitsin ◽  
Sergey Alexandrovich Afonin ◽  
Dmitiy Alekseevich Shachnev

The number of scientific journals published in the world is very large. It is therefore necessary to create software tools for analyzing the thematic links between journals. The algorithm presented in this paper uses co-authorship graphs to analyze the thematic proximity of journals. It is insensitive to the language of a journal and can find similar journals across languages, a task that is difficult for algorithms based on the analysis of full-text information. The algorithm was tested in the scientometric system IAS ISTINA. Using a special interface, a user selects one journal of interest, and the system then automatically generates a selection of journals that may also be of interest to the user. In the future, the algorithm can be adapted to search for similar conferences, collections of publications, and research projects. The use of such tools should increase the publication activity of young employees, the citation of articles, and cross-citation between journals. In addition, the algorithm's measures of thematic proximity between journals, collections, conferences, and research projects can be used to build rules in the ontology models of access control systems.
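The abstract does not spell the algorithm out; one plausible, language-independent reading is that two journals are thematically close when many of the same authors publish in both. The Python sketch below illustrates that idea under this assumption, using Jaccard similarity over hypothetical journal-to-author-set data; the actual graph algorithm in IAS ISTINA may differ.

```python
# Hypothetical input: journal -> set of author identifiers, extracted
# from a co-authorship graph. Comparing author IDs (not text) keeps the
# measure insensitive to a journal's language, as the abstract stresses.
journal_authors = {
    "J. Applied Informatics": {"a1", "a2", "a3", "a7"},
    "Prikladnaya Informatika": {"a2", "a3", "a7", "a9"},
    "Marine Biology Letters": {"a5", "a6"},
}

def jaccard(a, b):
    """Share of common authors among all authors of the two journals."""
    return len(a & b) / len(a | b)

def similar_journals(selected, top=5):
    """Rank other journals by author overlap with the selected one."""
    chosen = journal_authors[selected]
    scores = [
        (other, jaccard(chosen, authors))
        for other, authors in journal_authors.items()
        if other != selected
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]

print(similar_journals("J. Applied Informatics"))
```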


2019 ◽  
Vol 4 (2) ◽  
pp. 207-220
Author(s):  
김기수 ◽  
Yukun Hahm ◽  
장유림 ◽  
Jaejin Yi ◽  
HONGHOI KIM

1987 ◽  
Author(s):  
A. A. Balasco ◽  
J. I. Stevens ◽  
T. J. Lamb ◽  
J. J. Stahr ◽  
L. R. Woodland

Author(s):  
Julia Gonschorek ◽  
Anja Langer ◽  
Benjamin Bernhardt ◽  
Caroline Räbiger

This article gives insight into an ongoing dissertation at the University of Potsdam. The point of discussion is the spatial and temporal distribution of the emergencies handled by German fire brigades, which has not yet been sufficiently examined scientifically. The challenge lies in Big Data: enormous amounts of data that already exist (or can be collected in the future) and whose variables are linked to one another. Analyses and visualizations of these data can form a basis for strategic, operational, and tactical planning, as well as for prevention measures. The user-centered (geo-)visualization of fire brigade data in a form accessible to the general public is a scientific contribution to the research topic 'geovisual analytics and geographical profiling'. It may supplement antiquated methods such as so-called pin maps, as well as areas of engagement drawn freehand in GIS. For police work there are already numerous scientific projects, publications, and software solutions designed to meet the specific requirements of crime analysis and crime mapping; by adapting and extending these methods and techniques, civil security research can be tailored to the needs of fire departments. In this paper, a selection of appropriate visualization methods is presented and discussed.
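As one hedged illustration of the visualization methods under discussion, the Python sketch below replaces a pin map with a hexagonal-binning density plot of incident coordinates; the CSV layout (projected coordinate columns x and y) is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical incident log: one row per fire-brigade emergency with
# projected coordinates (columns "x", "y"); a real dataset would also
# carry timestamps for the temporal dimension discussed above.
incidents = pd.read_csv("incidents.csv")

fig, ax = plt.subplots(figsize=(8, 6))
# Hexagonal binning aggregates nearby incidents into shaded cells,
# revealing hotspots that individual pins would hide at scale.
hb = ax.hexbin(incidents["x"], incidents["y"], gridsize=40, cmap="viridis")
fig.colorbar(hb, ax=ax, label="incidents per cell")
ax.set_xlabel("easting (m)")
ax.set_ylabel("northing (m)")
ax.set_title("Emergency-call density (hexbin instead of a pin map)")
fig.savefig("incident_density.png", dpi=150)
```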

