High-Performance Extreme Learning Machines: A Complete Toolbox for Big Data Applications

IEEE Access, 2015, Vol 3, pp. 1011-1025
Author(s): Anton Akusok, Kaj-Mikael Bjork, Yoan Miche, Amaury Lendasse

2019, Vol 6 (1)
Author(s): Mahdi Torabzadehkashi, Siavash Rezaei, Ali HeydariGorji, Hosein Bobarshad, Vladimir Alves, ...

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data into the system. This cost is directly related to the distance between the processing engines and the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for processing data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Owing to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to 2.2× improvement in performance and up to 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks.
Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved by up to 5.4× and 8.9×, respectively.
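The "move process to data" argument can be made concrete with a toy model of data movement. The sketch below is hypothetical and does not reflect Catalina's actual software stack: it assumes each storage device holds one partition of text records, and compares the bytes that leave the devices when the host fetches everything and filters it, versus when each device filters locally and returns only a count (an assumed 8-byte integer).

```python
from dataclasses import dataclass


@dataclass
class StorageDevice:
    """Hypothetical storage device holding one partition of text records."""
    records: list[str]

    def read_all(self) -> list[str]:
        # Conventional path: every record is shipped to the host.
        return list(self.records)

    def count_in_place(self, keyword: str) -> int:
        # Computational-storage path: the filter runs on the device's own
        # processor, so only one integer leaves the device.
        return sum(keyword in r for r in self.records)


RESULT_BYTES = 8  # assumed size of one returned integer


def host_side(devices: list[StorageDevice], keyword: str) -> tuple[int, int]:
    """Fetch all data to the host, then filter. Returns (hits, bytes moved)."""
    hits = moved = 0
    for dev in devices:
        data = dev.read_all()
        moved += sum(len(r.encode("utf-8")) for r in data)
        hits += sum(keyword in r for r in data)
    return hits, moved


def in_storage(devices: list[StorageDevice], keyword: str) -> tuple[int, int]:
    """Filter on each device, ship only counts. Returns (hits, bytes moved)."""
    hits = moved = 0
    for dev in devices:
        hits += dev.count_in_place(keyword)
        moved += RESULT_BYTES
    return hits, moved


if __name__ == "__main__":
    devices = [
        StorageDevice([f"log line {i} {'ERROR' if i % 10 == 0 else 'ok'}" for i in range(1000)])
        for _ in range(4)
    ]
    print(host_side(devices, "ERROR"))   # same hit count, every record's bytes moved
    print(in_storage(devices, "ERROR"))  # (400, 32)
```

Both paths return the same answer; only the volume of data crossing the device boundary differs. The model ignores compute speed, which is why the paper's reported gains come from measurements rather than from byte counts alone.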


Author(s): Javier Conejero, Sandra Corella, Rosa M Badia, Jesus Labarta

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have been good demonstrators of this fact and have promoted the adoption of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging big data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds) and is a good alternative task-based programming model for big data applications. This article describes why we consider task-based programming models a good approach for big data applications. The article includes a comparison of Spark and COMPSs in terms of architecture, programming model, and performance. It focuses on the structural differences between the two frameworks, on their programming interfaces, and on their efficiency, measured with three widely known benchmarking kernels: Wordcount, Kmeans, and Terasort. These kernels enable the evaluation of the most important functionalities of both programming models and the analysis of different workflows and conditions. The main results of this comparison are that (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, as opposed to Spark, which requires existing algorithms to be adapted and rewritten by explicitly using its predefined functions; (2) COMPSs improves performance compared with Spark; and (3) COMPSs has been shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make them unique, thereby helping readers choose the right framework for each particular objective.
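To illustrate what "task-based" means for a kernel such as Wordcount, the sketch below splits the work into independent tasks (one per text block) followed by a reduction. It is not COMPSs code: in PyCOMPSs, ordinary functions are marked with a `@task` decorator and the runtime discovers dependencies and parallelism on its own. As a dependency-free stand-in, this sketch uses Python's standard `concurrent.futures`, where tasks must be submitted explicitly; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(block: str) -> Counter:
    """Task: count word occurrences in one block of text."""
    return Counter(block.split())


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Task: merge two partial counts (the reduction step)."""
    return a + b


def wordcount(blocks: list[str], workers: int = 4) -> Counter:
    """Run one counting task per block in parallel, then reduce the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each block becomes an independent task with no shared state.
        futures = [pool.submit(count_words, b) for b in blocks]
        partials = [f.result() for f in futures]
    # Sequential reduction here for simplicity; a task runtime could
    # also schedule these merges as a parallel tree of tasks.
    total = Counter()
    for p in partials:
        total = merge_counts(total, p)
    return total


if __name__ == "__main__":
    print(wordcount(["a b a", "b c", "a"]))  # Counter({'a': 3, 'b': 2, 'c': 1})
```

The user-facing logic stays close to a sequential loop over blocks, which is the programmability property the abstract attributes to COMPSs; a Spark version would instead express the same computation through its predefined transformations.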


Author(s): Chandu Thota, Gunasekaran Manogaran, Daphne Lopez, Revathi Sundarasekar

Cloud Computing is a new computing model that distributes computation across a resource pool. The need for a scalable database capable of expanding to accommodate growth has increased with the growth of data on the web. Well-known Cloud Computing vendors such as Amazon Web Services, Microsoft, Google, IBM and Rackspace offer cloud-based Hadoop and NoSQL database platforms to process Big Data applications. A variety of services run on top of cloud platforms, freeing users from the need to deploy their own systems. Nowadays, integrating Big Data with various cloud deployment models is a major concern for Internet companies, especially software and data services vendors that are just getting started. This chapter proposes an efficient integration architecture with comprehensive capabilities, including real-time and bulk data movement, bidirectional replication, metadata management, high-performance transformation, data services, and data quality for customer and product domains.


Electronics, 2021, Vol 10 (5), pp. 552
Author(s): Ania Cravero, Samuel Sepúlveda

The data generated in modern agricultural operations are provided by diverse elements, which allow a better understanding of the dynamic conditions of the crop, soil and climate, indicating that these processes will be increasingly data-driven. Big Data and Machine Learning (ML) have emerged as high-performance computing technologies that create new opportunities to unravel, quantify and understand agricultural processes through data. However, there are many challenges in integrating these technologies. Integration requires adapting ML for use with Big Data, and these adaptations must account for the increasing volume of data, its variety and data transmission speed. This paper provides information on the use of Big Data and ML for agriculture, identifying challenges, adaptations and the design of architectures for these systems. We conducted a Systematic Literature Review (SLR), which allowed us to analyze 34 real cases applied in agriculture. This review may be of interest to computer or data scientists and electronic or software engineers. The results show that manipulating large volumes of data is no longer a challenge, thanks to Cloud technologies. Challenges remain regarding (1) processing speed, due to little control over the data in its different stages (raw, semi-processed and processed data, i.e., value data); and (2) information visualization systems, which present technical data that farmers poorly understand.

