Technical challenges and perspectives in batch and stream big data machine learning

2017 ◽  
Vol 7 (1.3) ◽  
pp. 48 ◽  
Author(s):  
KVSN Rama Rao ◽  
Sivakannan S ◽  
M.A. Prasad ◽  
R. Agilesh Saravanan

Machine Learning plays a predominant role across various domains. However, traditional Machine Learning algorithms are becoming unsuitable for the majority of applications as data acquires new characteristics. Sensors, devices, servers, the Internet, social networking, smartphones, and the Internet of Things are the major sources of this data. Hence, there is a paradigm shift in Machine Learning with the advent of Big Data. Research efforts are evolving to deal with both batch and real-time streaming Big Data. In this paper, we highlight several research works that have contributed to Big Data Machine Learning.
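
A minimal sketch (not from the paper) of the batch-versus-stream distinction the abstract refers to, using scikit-learn's SGDClassifier as one example of a model that supports both a one-shot batch fit and incremental updates; the synthetic dataset and chunk size are assumptions for illustration.

```python
# Illustrative contrast between batch training and incremental (stream-oriented)
# training; the data and chunking are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
classes = np.unique(y)

# Batch learning: the full dataset must be available in memory at once.
batch_model = SGDClassifier(random_state=0).fit(X, y)

# Stream learning: the model is updated one mini-batch at a time,
# which is how continuously arriving sensor or log data would be consumed.
stream_model = SGDClassifier(random_state=0)
for start in range(0, len(X), 1_000):
    chunk_X, chunk_y = X[start:start + 1_000], y[start:start + 1_000]
    stream_model.partial_fit(chunk_X, chunk_y, classes=classes)

print("batch accuracy :", batch_model.score(X, y))
print("stream accuracy:", stream_model.score(X, y))
```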

2021 ◽  
Author(s):  
Rodrigo Chamusca Machado ◽  
Fabbio Leite ◽  
Cristiano Xavier ◽  
Alberto Albuquerque ◽  
Samuel Lima ◽  
...  

Objectives/Scope This paper presents how a Brazilian drilling contractor and a startup built a partnership to optimize the maintenance window of subsea blowout preventers (BOPs) using condition-based maintenance (CBM). It showcases examples of insights into the operational condition of BOP components, obtained by applying machine learning techniques to real-time and historical data, both structured and unstructured. Methods, Procedures, Process From the unstructured and structured historical data generated daily by BOP operations, a knowledge bank was built and used to develop models of normal functioning. This has been possible even without real-time data, as it has been tested with large sets of operational data collected from event-log text files. Software retrieves the data from the event loggers and creates a structured database comprising analog variables, warnings, alarms, and system information. Using machine learning algorithms, the historical data is then used to model the normal behavior of the target components. It is thereby possible to use event-logger or real-time data to identify abnormal operating moments and detect failure patterns. Critical situations are immediately transmitted to the RTOC (Real-Time Operations Center) and the management team, while less critical alerts are recorded in the system for further investigation. Results, Observations, Conclusions During the implementation period, the drilling contractor was able to identify a BOP failure using the detection algorithms and used 100% of the information generated by the system and its reports to plan equipment maintenance efficiently. The system has also been used intensively for incident investigation, helping to identify root causes through data analytics and feeding the findings back into the machine learning algorithms for future automated failure predictions. This development is expected to significantly reduce the risk of retrieving the BOP for corrective maintenance during operations, increase staff efficiency in maintenance activities, reduce the risk of downtime, improve the scope of maintenance performed during operational windows, and reduce the cost of spare-parts replacement during maintenance, all without impact on operational safety. Novel/Additive Information For the near future, the plan is to integrate the system with the Computerized Maintenance Management System (CMMS), checking historical maintenance, overdue maintenance, and certifications in the same place and at the same time as real-time operational data and insights are obtained. Using real-time data as input, we expect to extend the failure-prediction application to other BOP parts (such as regulators, shuttle valves, and SPM (sub-plate-mounted) valves) and to increase its applicability to other critical equipment on the rig.
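
The paper does not disclose which algorithms implement its "normal functioning models", so the following is only a hedged sketch of the general idea: train an unsupervised model (here IsolationForest, an assumption) on historical analog variables parsed from event-log files, then flag abnormal readings for escalation. The column names and values are hypothetical.

```python
# Sketch of a normal-behaviour / anomaly-detection model on BOP analog variables.
# Model choice, feature names, and data are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Structured records assumed to have been parsed from event-logger text files.
history = pd.DataFrame({
    "accumulator_pressure_psi": [3000, 2995, 3010, 2990, 3005, 2998],
    "regulator_pressure_psi":   [1500, 1498, 1503, 1497, 1501, 1499],
    "solenoid_current_ma":      [210, 208, 212, 209, 211, 210],
})

# Fit a model of normal functioning on historical data only.
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

# Score newly arriving (real-time or replayed event-log) readings.
incoming = pd.DataFrame({
    "accumulator_pressure_psi": [3002, 2400],   # second row: suspicious pressure drop
    "regulator_pressure_psi":   [1500, 1495],
    "solenoid_current_ma":      [209, 350],
})
flags = model.predict(incoming)                 # -1 marks an abnormal moment
for idx, flag in zip(incoming.index, flags):
    if flag == -1:
        print(f"alert: abnormal BOP reading at row {idx} -> escalate to RTOC")
```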


Author(s):  
Amitava Choudhury ◽  
Kalpana Rangra

The type and amount of data in human society are growing at an amazing speed, driven by emerging services such as cloud computing, the internet of things, and location-based services. The era of big data has arrived. As data has become a fundamental resource, how to manage and utilize big data better has attracted much attention. Especially with the development of the internet of things, processing large amounts of real-time data has become a great challenge in research and applications. Cloud computing technology has recently attracted much attention for its high performance, but how to use it for large-scale real-time data processing has not been thoroughly studied. In this chapter, various big data processing techniques are discussed.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Mohamed Ali Mohamed ◽  
Ibrahim Mahmoud El-henawy ◽  
Ahmad Salah

Sensors, satellites, mobile devices, social media, e-commerce, and the Internet, among others, saturate us with data. The Internet of Things, in particular, enables massive amounts of data to be generated more quickly. The Internet of Things is a term that describes the process of connecting computers, smart devices, and other data-generating equipment to a network and transmitting data. As a result, data is produced and updated on a regular basis to reflect changes in all areas and activities. As a consequence of this exponential growth of data, a new term and idea known as big data has been coined. Big data is required to illuminate the relationships between things, forecast future trends, and provide more information to decision-makers. The major problem at present, however, is how to effectively collect and evaluate massive amounts of diverse and complicated data. In some sectors and applications, machine learning models are the most frequently utilized methods for interpreting and analyzing data and obtaining important information. On their own, traditional machine learning methods are unable to handle big data problems successfully. This article gives an introduction to Spark architecture as a platform that machine learning methods may utilize to address issues regarding the design and execution of big data systems. The article focuses on three types of machine learning, namely regression, classification, and clustering, and how they can be applied on top of the Spark platform.
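
A minimal sketch of the three learning tasks the article covers, run on top of Spark through the DataFrame-based MLlib API; the tiny in-memory dataset and column names are placeholders, not the article's own examples.

```python
# Regression, classification, and clustering on Spark via the MLlib DataFrame API.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("ml-on-spark").getOrCreate()

# Placeholder data: two features, a binary label, and a continuous target.
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0, 3.5), (2.0, 1.0, 1.0, 5.1),
     (3.0, 4.0, 0.0, 7.2), (4.0, 3.0, 1.0, 9.0)],
    ["x1", "x2", "label", "target"],
)
features = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

# Regression: predict a continuous target.
reg_model = LinearRegression(featuresCol="features", labelCol="target").fit(features)

# Classification: predict a discrete label.
clf_model = LogisticRegression(featuresCol="features", labelCol="label").fit(features)

# Clustering: group rows without using labels.
kmeans_model = KMeans(k=2, featuresCol="features").fit(features)

print(reg_model.coefficients, clf_model.coefficients, kmeans_model.clusterCenters())
spark.stop()
```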


Author(s):  
Amit Kumar Tyagi ◽  
Poonam Chahal

With recent developments in technology and the integration of millions of internet of things devices, a large amount of data is being generated every day (known as Big Data). This data is needed to drive the growth of many organizations and of applications such as e-healthcare. We are also entering the era of a smart world, in which robotics will take a place in most applications (to solve the world's problems). Implementing robotics in applications such as medicine and automobiles is a goal of computer vision. Computer vision (CV) is realized through several components: artificial intelligence (AI), machine learning (ML), and deep learning (DL). Here, machine learning and deep learning techniques/algorithms are used to analyze Big Data. Today, organizations such as Google and Facebook use ML techniques to search for particular data or to recommend posts. Hence, the requirements of computer vision are fulfilled through these three terms: AI, ML, and DL.


Author(s):  
Mamoon Rashid ◽  
Harjeet Singh ◽  
Vishal Goyal ◽  
Nazir Ahmad ◽  
Neeraj Mogla

A lot of data is being generated and captured by Internet of Things (IoT)-based industrial devices, and it is real-time and unstructured in nature. IoT-based sensors are an effective solution for monitoring these industrial processes efficiently. However, real-time data storage and processing in IoT applications is still a big challenge. This chapter proposes a new big data pipeline solution for storing and processing IoT sensor data. The proposed big data processing platform uses Apache Flume to efficiently collect and transfer large amounts of IoT data from a cloud-based server into the Hadoop Distributed File System for storage of the IoT-based sensor data. Apache Storm is used for processing this real-time data. Next, the authors propose a hybrid prediction model that applies density-based spatial clustering of applications with noise (DBSCAN) to remove sensor-data outliers and then uses the Support Vector Machine (SVM) machine learning classification technique to provide more accurate fault detection in IoT industrial processes.
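
A hedged sketch of the hybrid DBSCAN + SVM idea, using scikit-learn in place of the full Flume/HDFS/Storm pipeline; the sensor features, parameter values, and fault labels are synthetic stand-ins, not the chapter's data.

```python
# Step 1: DBSCAN removes outlier sensor readings; Step 2: SVM classifies faults.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic readings: [temperature, vibration] from an industrial sensor.
readings = rng.normal(loc=[50.0, 1.2], scale=[2.0, 0.1], size=(200, 2))
readings[:5] += [40.0, 3.0]                      # inject a few outlier readings
labels = (readings[:, 0] > 52).astype(int)       # synthetic fault label

# DBSCAN marks sparse points as noise (cluster label -1); drop them as outliers.
noise_mask = DBSCAN(eps=1.5, min_samples=5).fit_predict(readings) == -1
clean_X, clean_y = readings[~noise_mask], labels[~noise_mask]

# Train an SVM fault classifier on the cleaned sensor data.
svm = SVC(kernel="rbf", gamma="scale").fit(clean_X, clean_y)
print(f"removed {noise_mask.sum()} outliers; training accuracy "
      f"{svm.score(clean_X, clean_y):.2f}")
```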


2021 ◽  
Author(s):  
Jasleen Kaur ◽  
Shruti Kapoor ◽  
Maninder Singh ◽  
Parvinderjit Singh Kohli ◽  
Urvinder Singh ◽  
...  

BACKGROUND Infectious diseases are a major cause of mortality across the globe. Tuberculosis is one such infectious disease, ranking among the top 10 causes of death in developing as well as developed countries. Biosensors have emerged as a promising approach to early detection of pathogenic infection with accuracy and precision. However, the main challenge with biosensors is real-time data monitoring, preferably with reversible and label-free measurements of certain analytes. Integrating a biosensor with an Artificial Intelligence (AI) approach would enable better acquisition of patient data in real time, allowing automatic detection and monitoring of Mycobacterium tuberculosis (M.tb) at an early stage. Here we propose a biosensor-based smart handheld device designed for automatic detection and real-time monitoring of M.tb from varied analytic sources, including DNA, proteins, and biochemical metabolites. The collected data would be continuously transferred to a connected cloud integrated with an AI-based clinical decision support system (CDSS), which may consist of a machine-learning-based analysis model useful for studying patterns of disease infection, progression, early detection, and treatment. The proposed system may be deployed at different collaborating centres for validation and real-time data collection. OBJECTIVE To propose a biosensor-based smart handheld device designed for automatic detection and real-time monitoring of M.tb from varied analytic sources, including DNA, proteins, and biochemical metabolites. METHODS The major challenges for the control and early detection of Mycobacterium tuberculosis were studied through a literature survey. Based on the observed challenges, a biosensor-based smart handheld device is proposed for automatic detection and real-time monitoring of M.tb from varied analytic sources, including DNA, proteins, and biochemical metabolites. RESULTS In this viewpoint, we propose a novel, application-based approach that combines AI-based machine learning algorithms with real-time data collected using biosensor technology. It can serve as a point-of-care system for early diagnosis of the disease that is low cost, simple, responsive, measurable, able to diagnose and distinguish between active and latent cases, requires only a single patient visit, causes minimal inconvenience, can evaluate a cough sample, requires minimal material aid and experienced staff, and is user-friendly. CONCLUSIONS The proposed combination of biosensor technology and AI-based machine learning analysis of real-time data can serve as a low-cost, user-friendly point-of-care system for early diagnosis of tuberculosis.
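
The viewpoint does not specify an analysis model, so the following is a purely illustrative sketch of the machine learning component of the proposed CDSS: a classifier trained on hypothetical biosensor-derived features (DNA, protein, and metabolite signal levels). The feature names, values, and the model choice are all assumptions.

```python
# Hypothetical classifier for biosensor readings in the envisioned CDSS.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder training set: normalized signal levels per analyte source.
samples = pd.DataFrame({
    "dna_probe_signal":  [0.90, 0.10, 0.80, 0.20, 0.85, 0.15, 0.70, 0.05],
    "antigen_signal":    [0.80, 0.20, 0.90, 0.10, 0.75, 0.25, 0.60, 0.10],
    "metabolite_signal": [0.70, 0.30, 0.60, 0.20, 0.80, 0.10, 0.65, 0.20],
    "mtb_positive":      [1, 0, 1, 0, 1, 0, 1, 0],
})
X = samples.drop(columns="mtb_positive")
y = samples["mtb_positive"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# In the envisioned system, each new reading streamed from the handheld
# biosensor to the cloud would be scored like this before alerting a clinician.
print("test accuracy:", model.score(X_test, y_test))
```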



2020 ◽  
Author(s):  
Sohini Sengupta ◽  
Sareeta Mugde

BACKGROUND India reported its first COVID-19 case on 30th January 2020, with practically no significant rise in the number of cases during February; from March 2020 onwards, however, there was a huge escalation, as in many other countries the world over. This research paper analyses COVID-19 data initially at a global level and then drills down to the scenario in India. Data is gathered from multiple authentic government websites. Variables such as gender, geographical location, and age are represented using Python and data visualization techniques. Time series analysis and other pattern-recognition techniques are deployed to bring more clarity to the current scenario, as the analysis is based entirely on real-time data (till 19th June 2020). Finally, machine learning algorithms are used to perform predictive analytics for the near-future scenario. A sigmoid model is used to estimate the day on which the number of active cases can be expected to reach its peak and when the curve will start to flatten; the strength of the sigmoid model lies in providing a specific date, which is a unique feature of the analysis in this paper. Certain feature engineering techniques are used to transform the data onto a logarithmic scale, which affords better comparison by removing data extremities and outliers. Based on the predictions for the short-term interval, the model can be tuned to forecast longer time intervals. Needless to say, many factors will influence the number of cases in the coming days, one being the extent to which citizens adhere to the rules and restrictions imposed by the government. OBJECTIVE Prediction of the number of positive COVID-19 cases in the next few months. METHODS Machine learning models: clustering and a sigmoid model. RESULTS The model predicts a maximum of 258,846 active cases. The curve flattens by day 154, i.e., 25th September, after which it turns downwards and the number of active cases eventually decreases. CONCLUSIONS A lot of research is ongoing with respect to vaccines, economic measures, precautions, and the reduction of COVID-19 cases. However, we are currently in a mid-COVID situation: India, along with many other countries, is still witnessing an upsurge in the number of cases at alarming daily rates. We have not yet reached the peak; therefore, curve flattening and downward growth are also yet to happen. Each day brings fresh information and large amounts of data, and there are many other machine-learning-based predictive models beyond the scope of this paper. At the end of the day, it is only the precautionary measures we take as responsible citizens that will help to flatten the curve: maintaining social distancing and taking the lockdown seriously are key. This study is based on real-time data and will be useful for key stakeholders such as government officials and healthcare workers in preparing a combat plan with stringent measures. The study will also help mathematicians and statisticians to predict outbreak numbers more accurately.
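
A minimal sketch of the sigmoid-fitting step that yields the peak and flattening dates; the paper's actual case counts are not reproduced here, so the data below is a synthetic placeholder and the parameter names are illustrative.

```python
# Fit a sigmoid (logistic) curve to cumulative case counts and read off the
# inflection day, around which daily new cases peak and the curve flattens.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, final_size, growth_rate, inflection_day):
    """Cumulative cases as a logistic function of time (days)."""
    return final_size / (1.0 + np.exp(-growth_rate * (t - inflection_day)))

# Synthetic placeholder series standing in for real cumulative case data.
days = np.arange(0, 60)
cases = 250_000 / (1.0 + np.exp(-0.15 * (days - 40))) \
        + np.random.default_rng(1).normal(0, 500, 60)

(final_size, growth_rate, inflection_day), _ = curve_fit(
    sigmoid, days, cases, p0=[cases.max(), 0.1, 30])

print(f"estimated final size ~ {final_size:,.0f}; "
      f"new cases peak around day {inflection_day:.0f}")
```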


The growth of web utilization has made business processes more challenging. Analysing the unorganized and enormous amount of online business data is not possible with traditional frameworks. Recent innovations advance analysis techniques that break down such large amounts of data using Big Data techniques; to improve the scalability and accuracy of analysing business strategies, they have been implemented on Hadoop with parallel processing. This paper presents an experimental study on IBM real-time data of one lakh (100,000) records to demonstrate the efficiency of the proposed Hadoop-based distributed query processing technique.
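
The paper's specific query-processing technique is not detailed in the abstract, so the following is only a hedged illustration of distributed query processing over data stored in Hadoop, here expressed with Spark SQL; the HDFS path, table name, and column names are hypothetical.

```python
# Illustrative distributed aggregation query over records stored in HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-query").getOrCreate()

# Read the (hypothetical) ~100,000-record transaction file from HDFS; Spark
# splits it into partitions and processes them in parallel across the cluster.
transactions = spark.read.csv("hdfs:///data/ibm_transactions.csv",
                              header=True, inferSchema=True)
transactions.createOrReplaceTempView("transactions")

# The SQL query is planned and executed in a distributed fashion.
top_products = spark.sql("""
    SELECT product_id, SUM(amount) AS revenue
    FROM transactions
    GROUP BY product_id
    ORDER BY revenue DESC
    LIMIT 10
""")
top_products.show()
spark.stop()
```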

