Review on Application of Machine Learning Algorithm for Data Science

Author(s):  
Kushal Rashmikant Dalal

Visualization has become a standard expectation for all forms of data: a dataset and its statistical variance are easiest to understand when presented graphically. Visualizing crime data can support the analysis and prevention of threats in society. According to recent surveys and records, India has experienced many crimes committed against women. Data visualization is a useful approach to analyzing and preventing such crimes, and the data technologies currently available are well suited to the task of visualization for women's safety. Efficient visualization, combined with an effective machine learning algorithm, can answer data-related queries in the field of data science. This paper presents the details of crimes against women through a graphical approach and illustrates how to issue alerts about unsafe levels in order to safeguard women.
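The paper does not include its pipeline, but the aggregation step behind such a visualization can be sketched as below. The records, category names, and alert threshold are all invented for illustration; real work would load official crime statistics, and a plotting library such as matplotlib would render the counts as a bar chart.

```python
from collections import Counter

# Hypothetical crime records: (year, category). Purely illustrative data.
records = [
    (2019, "harassment"), (2019, "harassment"), (2019, "kidnapping"),
    (2020, "harassment"), (2020, "dowry"), (2020, "harassment"),
    (2020, "kidnapping"), (2020, "harassment"),
]

# Aggregate counts per category -- the numbers a bar chart would display.
counts = Counter(category for _, category in records)

# Flag "unsafe" categories whose count exceeds an assumed alert threshold.
ALERT_THRESHOLD = 3
alerts = [cat for cat, n in counts.items() if n > ALERT_THRESHOLD]

for cat, n in counts.most_common():
    print(f"{cat}: {n}{'  <-- ALERT' if cat in alerts else ''}")
```

The alert list is what a notification component would consume to warn about unsafe levels.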


2021, Vol 11 (1)
Author(s):  
Minoru Kusaba ◽  
Chang Liu ◽  
Yukinori Koyama ◽  
Kiyoyuki Terakura ◽  
Ryo Yoshida

Abstract: In 1869, the first draft of the periodic table was published by the Russian chemist Dmitri Mendeleev. In terms of data science, his achievement can be viewed as a successful example of feature embedding based on human cognition: the chemical properties of all elements known at the time were compressed onto a two-dimensional grid for tabular display. In this study, we seek to answer the question of whether machine learning can reproduce or recreate the periodic table using the observed physicochemical properties of the elements. To achieve this goal, we developed a periodic table generator (PTG). The PTG is an unsupervised machine learning algorithm based on generative topographic mapping, which can automate the translation of high-dimensional data into a tabular form with varying layouts on demand. The PTG autonomously produced various arrangements of chemical symbols, organized as a two-dimensional array such as Mendeleev's periodic table or as a three-dimensional spiral table, according to the underlying periodicity in the given data. We further show what the PTG learned from the element data and how element features, such as melting point and electronegativity, are compressed into the lower-dimensional latent spaces.
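The PTG itself is built on generative topographic mapping, which is beyond a short snippet. As a deliberately crude stand-in, the sketch below uses a winner-take-all competitive-learning update (a stripped-down relative of the SOM/GTM family) to place a few elements, described by two made-up normalized features, onto a 2x3 grid. The element feature values, grid size, and training schedule are all assumptions, not the paper's method.

```python
import math
import random

# Toy element data: (symbol, [atomic_radius_norm, electronegativity_norm]).
# These feature values are invented for illustration.
elements = [
    ("Li", [0.9, 0.20]), ("Be", [0.7, 0.35]), ("B",  [0.5, 0.45]),
    ("Na", [1.0, 0.18]), ("Mg", [0.8, 0.30]), ("Al", [0.6, 0.38]),
]

random.seed(0)
ROWS, COLS = 2, 3
# One prototype vector per grid cell, randomly initialised in [0, 1]^2.
grid = {(r, c): [random.random(), random.random()]
        for r in range(ROWS) for c in range(COLS)}

def nearest(vec):
    """Grid cell whose prototype is closest (squared Euclidean) to vec."""
    return min(grid, key=lambda cell: sum((p - v) ** 2
                                          for p, v in zip(grid[cell], vec)))

# Competitive learning: pull the winning prototype toward each sample,
# with a learning rate that decays to zero over the epochs.
EPOCHS = 50
for epoch in range(EPOCHS):
    lr = 0.5 * (1 - epoch / EPOCHS)
    for _, vec in elements:
        cell = nearest(vec)
        grid[cell] = [p + lr * (v - p) for p, v in zip(grid[cell], vec)]

# Final "table": each symbol assigned to its nearest grid cell.
layout = {sym: nearest(vec) for sym, vec in elements}
print(layout)
```

A real GTM additionally fits a constrained Gaussian mixture over the latent grid, which is what lets the PTG vary the layout (planar or spiral) on demand.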


Author(s):  
Nilesh Kumar Sahu ◽  
Manorama Patnaik ◽  
Itu Snigdh

The precision of any machine learning algorithm depends on the data set: its suitability and its volume. Data and its characteristics have therefore become the predominant components of any predictive or precision-based domain such as machine learning. Feature engineering refers to the process of transforming and preparing this input data so that it is ready for training machine learning models. Several feature types, such as categorical, numerical, mixed, date, and time, are to be considered for feature extraction. Dataset characteristics, such as cardinality, missing data, and rare labels for categorical features, and distribution, outliers, and magnitude for numerical ones, are likewise treated as features. This chapter discusses the various data types and the techniques for applying them to feature engineering. It also focuses on the implementation of various data techniques for feature extraction.
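Three of the characteristics the chapter names, cardinality, rare labels, and missing data, can be sketched on a made-up table; the column names, values, and the rare-label cutoff below are assumptions for illustration only.

```python
from collections import Counter
from statistics import median

# Made-up raw rows; None marks a missing numeric value.
rows = [
    {"city": "Delhi",  "income": 52.0},
    {"city": "Delhi",  "income": None},
    {"city": "Mumbai", "income": 61.0},
    {"city": "Mumbai", "income": 58.0},
    {"city": "Pune",   "income": 47.0},
]

# Cardinality: number of distinct labels in the categorical feature.
cardinality = len({r["city"] for r in rows})

# Rare labels: group labels seen fewer than 2 times into an "Other" bucket.
city_counts = Counter(r["city"] for r in rows)
for r in rows:
    if city_counts[r["city"]] < 2:
        r["city"] = "Other"

# Missing data: impute missing numeric values with the observed median.
observed = [r["income"] for r in rows if r["income"] is not None]
fill = median(observed)
for r in rows:
    if r["income"] is None:
        r["income"] = fill
```

Each step turns a raw characteristic of the data into something a model can consume, which is the core of feature engineering as described above.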


Author(s):  
Man Tianxing ◽  
Ildar Raisovich Baimuratov ◽  
Natalia Alexandrovna Zhukova

With the development of Big Data, data analysis technology has developed actively and is now used in various subject fields, and more and more researchers without a computer science background use machine learning algorithms in their work. Unfortunately, datasets can be messy, and knowledge cannot be extracted from them directly, which is why they need preprocessing. Because of the diversity of the algorithms, it is difficult for researchers to find the most suitable one; most choose algorithms by intuition, and the result is often unsatisfactory. Therefore, this article proposes a recommendation system for data processing. The system consists of an ontology subsystem and an estimation subsystem: ontology technology is used to represent a taxonomy of machine learning algorithms, and information-theoretic criteria are used to form the recommendations. The system helps users apply data processing algorithms without specific knowledge of the data science field.
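The article's estimation subsystem is not spelled out here, but one simple information-theoretic criterion can be sketched: score candidate preprocessing results by Shannon entropy and recommend the one that retains the most information. The candidate names and data below are invented, and entropy maximization is only one plausible criterion, not necessarily the article's.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a discrete value sequence."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Two hypothetical discretizations of the same raw numeric column.
candidates = {
    "equal-width binning":     ["low", "low", "low", "low", "low", "high"],
    "equal-frequency binning": ["low", "low", "low", "high", "high", "high"],
}

# Recommend the preprocessing whose output preserves the most entropy.
recommended = max(candidates, key=lambda name: entropy(candidates[name]))
print(recommended)
```

In the full system, an ontology would first narrow the candidate set to algorithms applicable to the user's data type before such a score is computed.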


2021
Author(s):  
Bogi Haryo Nugroho ◽  
Brahmantiyo Aji Sumarto ◽  
Muhammad Arief Joenaedy ◽  
Huda Jassim Al-Aradi ◽  
Pajar Rahman Achmad

Abstract
Objective/scope: Analyzing and estimating water cut reliably has been a challenge. Current well test data are not sufficient to provide the information required to predict rate and water cut behavior; only for wells with stable, well-behaved production can water cut levels be estimated appropriately. Wells equipped with Electrical Submersible Pump (ESP) sensors, whose readings are recorded by real-time data acquisition, help fill this gap. The data are stored and available in KOC data repositories such as the Corporate Database, the Well Surveillance Management System (WSMS), and the Artificial Lift Management System (ALMS). Engineers currently spend this effort in spreadsheets, working across multiple data repositories. Combining the data into a single data set for analysis and presentation is feasible this way, but spreadsheets do not address a number of important tasks in a typical analyst's pipeline, and their design frequently complicates the analyses. A single-well analysis may take hours and a multi-well analysis days, which can be too late to plan and take preventive action. Given this situation, a collaboration was established between NFD-North Kuwait and the Information Management Team. In this first phase, the initiative is to design a conceptual integrated preventive system that provides an easy and quick tool to compute water cut estimates from well test and downhole sensor data using a data science approach.
Method, procedure, process: Five steps were applied in this initial work, including, but not limited to, user interviews, exercises, and data dissemination. They covered gathering full domain knowledge and defining the goal. Pain points were also mapped to solutions to identify the technical challenges and find ways to overcome them.
At the end of this stage, a data and process review was conducted and applied to a simple example to understand the requirements, demonstrate technical functionality, and verify technical feasibility. A conceptual design was then built from the requirements, features, and solutions gathered. An integrated system solution was recommended, including an intermediate layer for integration, data retrieval, running calculation-heavy processes in the background, model optimization, visual analytics, decision-making, and automation. A roadmap with complete planning of the different phases was then provided to achieve the objective.
Results, observations, conclusions: The processes, functionalities, requirements, and findings have been examined and elaborated. The conceptual design has demonstrated the value of ESP sensor data in helping to estimate continuous well water cut behavior, and the next implementation phase is expected to raise the confidence level of the results further. The design is promising for meeting the requirement to provide seamless, scalable, and easy-to-deploy automation tools for the data analytics workflow, with several major business benefits arising. The proposed solution combines technologies, implementation services, and project management. The proposed technology components are distributed across three layers: source data, a data science layer, and a visual analytics layer. Furthermore, a roadmap for the project, along with recommendations for each phase, has also been included.
Novel/additive information: Data science for exploration and production is a new area in which research and development will be required. A data-science-driven approach and the application of digital transformation enable an integrated preventive system that computes water cut estimates from well test and downhole sensor data.
In the next, larger-scale implementation, this system is expected to provide an automated workflow supporting engineers in their daily tasks, leveraging a Data to Decision (D2D) approach. Machine learning is a data analytics technique that teaches computers to do what comes naturally to humans: learn from experience. Machine learning algorithms use computational methods to learn information from data without relying on a predetermined equation as a model. Adding artificial intelligence and machine learning capability to the process requires knowledge of the input data, the impact of that data on the output, an understanding of the machine learning algorithm, and the construction of a model that meets the expected output.
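The estimation step, learning a relationship between a downhole sensor channel and tested water cut without a predetermined equation, can be sketched as a one-variable least-squares fit. The sensor channel (ESP motor current), all numeric values, and the linear model choice are assumptions for illustration; they do not describe KOC's actual system or data.

```python
# Synthetic paired observations: ESP motor current (A) from real-time
# acquisition vs. water cut fraction from the corresponding well tests.
current   = [41.0, 43.5, 45.0, 47.2, 49.8]
water_cut = [0.20, 0.28, 0.33, 0.40, 0.49]

# Ordinary least squares for a single predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
n = len(current)
mean_x = sum(current) / n
mean_y = sum(water_cut) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(current, water_cut))
         / sum((x - mean_x) ** 2 for x in current))
intercept = mean_y - slope * mean_x

def estimate_water_cut(sensor_reading):
    """Predict water cut from a new real-time sensor reading."""
    return slope * sensor_reading + intercept
```

In an integrated system this fit would run as the calculation-heavy background process, refreshed as new well tests arrive, with the estimates surfaced in the visual analytics layer.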


2018
Author(s):  
C.H.B. van Niftrik ◽  
F. van der Wouden ◽  
V. Staartjes ◽  
J. Fierstra ◽  
M. Stienen ◽  
...  

Author(s):  
Kunal Parikh ◽  
Tanvi Makadia ◽  
Harshil Patel

Dengue is unquestionably one of the biggest health concerns in India and many other developing countries, and unfortunately many people have lost their lives to it. Every year, approximately 390 million dengue infections occur around the world, of which about 500,000 are severe and around 25,000 result in death. Many factors can contribute to dengue, such as temperature, humidity, precipitation, and inadequate public health infrastructure, among others. In this paper, we propose a method to perform predictive analytics on a dengue dataset using KNN, a machine-learning algorithm. This analysis would help predict future cases and could save many lives.
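The KNN step can be sketched in a few lines: classify a query by majority vote among its k nearest training points. The feature triples (temperature, humidity, rainfall), labels, and k below are invented for illustration, not the paper's dataset.

```python
import math
from collections import Counter

# Toy training data: (temperature_C, humidity_pct, rainfall_mm) -> label.
train = [
    ((31.0, 85.0, 120.0), "outbreak"),
    ((30.5, 88.0, 140.0), "outbreak"),
    ((29.8, 80.0, 100.0), "outbreak"),
    ((24.0, 55.0,  10.0), "no outbreak"),
    ((22.5, 50.0,   5.0), "no outbreak"),
    ((25.0, 60.0,  20.0), "no outbreak"),
]

def knn_predict(query, k=3):
    """Classify by majority vote among the k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((30.0, 83.0, 110.0)))
```

Note that raw Euclidean distance lets the largest-magnitude feature (here rainfall) dominate; a real pipeline would scale the features first.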

