scholarly journals Understanding and optimizing the performance of distributed machine learning applications on apache spark

Author(s):  
Celestine Dunner ◽  
Thomas Parnell ◽  
Kubilay Atasu ◽  
Manolis Sifalakis ◽  
Haralampos Pozidis
Author(s):  
Ziyu Zhu ◽  
Xiaochun Tang ◽  
Quan Zhao

With the widespread using of GPU hardware facilities, more and more distributed machine learning applications have begun to use CPU-GPU hybrid cluster resources to improve the efficiency of algorithms. However, the existing distributed machine learning scheduling framework either only considers task scheduling on CPU resources or only considers task scheduling on GPU resources. Even considering the difference between CPU and GPU resources, it is difficult to improve the resource usage of the entire system. In other words, the key challenge in using CPU-GPU clusters for distributed machine learning jobs is how to efficiently schedule tasks in the job. In the full paper, we propose a CPU-GPU hybrid cluster schedule framework in detail. First, according to the different characteristics of the computing power of the CPU and the computing power of the GPU, the data is divided into data fragments of different sizes to adapt to CPU and GPU computing resources. Second, the paper introduces the task scheduling method under the CPU-GPU hybrid. Finally, the proposed method is verified at the end of the paper. After our verification for K-Means, using the CPU-GPU hybrid computing framework can increase the performance of K-Means by about 1.5 times. As the number of GPUs increases, the performance of K-Means can be significantly improved.


2021 ◽  
Author(s):  
Yunyan Guo ◽  
Zhipeng Zhang ◽  
Jiawei Jiang ◽  
Wentao Wu ◽  
Ce Zhang ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2993
Author(s):  
Ebtesam Alomari ◽  
Iyad Katib ◽  
Aiiad Albeshri ◽  
Tan Yigitcanlar ◽  
Rashid Mehmood

Digital societies could be characterized by their increasing desire to express themselves and interact with others. This is being realized through digital platforms such as social media that have increasingly become convenient and inexpensive sensors compared to physical sensors in many sectors of smart societies. One such major sector is road transportation, which is the backbone of modern economies and costs globally 1.25 million deaths and 50 million human injuries annually. The cutting-edge on big data-enabled social media analytics for transportation-related studies is limited. This paper brings a range of technologies together to detect road traffic-related events using big data and distributed machine learning. The most specific contribution of this research is an automatic labelling method for machine learning-based traffic-related event detection from Twitter data in the Arabic language. The proposed method has been implemented in a software tool called Iktishaf+ (an Arabic word meaning discovery) that is able to detect traffic events automatically from tweets in the Arabic language using distributed machine learning over Apache Spark. The tool is built using nine components and a range of technologies including Apache Spark, Parquet, and MongoDB. Iktishaf+ uses a light stemmer for the Arabic language developed by us. We also use in this work a location extractor developed by us that allows us to extract and visualize spatio-temporal information about the detected events. The specific data used in this work comprises 33.5 million tweets collected from Saudi Arabia using the Twitter API. Using support vector machines, naïve Bayes, and logistic regression-based classifiers, we are able to detect and validate several real events in Saudi Arabia without prior knowledge, including a fire in Jeddah, rains in Makkah, and an accident in Riyadh. The findings show the effectiveness of Twitter media in detecting important events with no prior knowledge about them.


Author(s):  
Tausifa Jan Saleem ◽  
Mohammad Ahsan Chishti

The rapid progress in domains like machine learning, and big data has created plenty of opportunities in data-driven applications particularly healthcare. Incorporating machine intelligence in healthcare can result in breakthroughs like precise disease diagnosis, novel methods of treatment, remote healthcare monitoring, drug discovery, and curtailment in healthcare costs. The implementation of machine intelligence algorithms on the massive healthcare datasets is computationally expensive. However, consequential progress in computational power during recent years has facilitated the deployment of machine intelligence algorithms in healthcare applications. Motivated to explore these applications, this paper presents a review of research works dedicated to the implementation of machine learning on healthcare datasets. The studies that were conducted have been categorized into following groups (a) disease diagnosis and detection, (b) disease risk prediction, (c) health monitoring, (d) healthcare related discoveries, and (e) epidemic outbreak prediction. The objective of the research is to help the researchers in this field to get a comprehensive overview of the machine learning applications in healthcare. Apart from revealing the potential of machine learning in healthcare, this paper will serve as a motivation to foster advanced research in the domain of machine intelligence-driven healthcare.


Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.


Author(s):  
Muhammad Junaid ◽  
Shiraz Ali Wagan ◽  
Nawab Muhammad Faseeh Qureshi ◽  
Choon Sung Nam ◽  
Dong Ryeol Shin

2021 ◽  
Vol 3 (2) ◽  
pp. 392-413
Author(s):  
Stefan Studer ◽  
Thanh Binh Bui ◽  
Christian Drescher ◽  
Alexander Hanuschkin ◽  
Ludwig Winkler ◽  
...  

Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and have a need for guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases are comprised of data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk of performance degradation over time. With each task of the process, this work proposes quality assurance methodology that is suitable to address challenges in machine learning development that are identified in the form of risks. The methodology is drawn from practical experience and scientific literature, and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support, but fails to address machine learning specific tasks. The presented work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.


Sign in / Sign up

Export Citation Format

Share Document