DATA MINING AND MACHINE LEARNING IN ASTRONOMY

2010 ◽  
Vol 19 (07) ◽  
pp. 1049-1106 ◽  
Author(s):  
NICHOLAS M. BALL ◽  
ROBERT J. BRUNNER

We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

Author(s):  
Baban. U. Rindhe ◽  
Nikita Ahire ◽  
Rupali Patil ◽  
Shweta Gagare ◽  
Manisha Darade

Heart-related diseases, or cardiovascular diseases (CVDs), have been the leading cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but across the whole world. There is therefore a need for a reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have used machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. After the brain, the heart is the next most important organ in the human body: it pumps blood and supplies it to all organs. Predicting the occurrence of heart disease is significant work in the medical field. Data analytics is useful for making predictions from large amounts of information, and it helps medical centers predict various diseases. A huge amount of patient-related data is maintained on a monthly basis, and the stored data can serve as a source for predicting the occurrence of future diseases. Data mining and machine learning techniques such as Artificial Neural Network (ANN), Random Forest, and Support Vector Machine (SVM) are used to predict heart diseases. Predicting and diagnosing heart disease has become a challenge faced by doctors and hospitals both in India and abroad. To reduce the large number of deaths from heart disease, a quick and efficient detection technique needs to be developed. Data mining techniques and machine learning algorithms play a very important role in this area. Researchers are accelerating their work to develop software, with the help of machine learning algorithms, that can help doctors with both the prediction and the diagnosis of heart disease.
The main objective of this research project is to predict the heart disease of a patient using machine learning algorithms.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the rapid development of the Internet and of big data analysis technology, data mining has played a positive role in promoting industry and academia. Classification is an important problem in data mining. This paper explores the background and theory of support vector machines (SVM) in data mining classification algorithms and analyzes and summarizes the research status of various improved SVM methods. According to the scale and characteristics of the data, different solution spaces are selected, and the solution of the dual problem is transformed into the classification surface of the original space to improve the algorithm speed. Research Process. Incorporating fuzzy membership into multicore learning, it is found that the time complexity of the original problem is determined by the dimension and the time complexity of the dual problem is determined by the quantity. Because dimension and quantity together constitute the scale of the data, different solution spaces can be chosen based on the scale and features of the data. The algorithm speed can be improved by transforming the solution of the dual problem into the classification surface of the original space. Conclusion. By improving the calculation rate of traditional machine learning algorithms, it is concluded that the accuracy of the fitting prediction between the predicted data and the actual value is as high as 98%, which can make the traditional machine learning algorithm meet the requirements of the big data era. It can be widely used in the context of big data.


2013 ◽  
Vol 4 (4) ◽  
pp. 47-57
Author(s):  
Yahya M. Tashtoush ◽  
Derar Darwish ◽  
Motasim Albdarneh ◽  
Izzat M. Alsmadi ◽  
Khalid Alkhatib

The readability metric is considered one of the most important factors that may affect the games business, in terms of evaluating games' quality in general and usability in particular. As games may go through many evolutions and be developed by many developers, code readability can significantly impact the time and resources required to build, update, or maintain such games. This paper introduces a new approach to detect readability for games built in Java or C++ for desktop and mobile environments. Based on data mining techniques, an approach for predicting the type of the game is proposed based on readability and some other software metrics or attributes. Another classifier is built to predict software readability in games applications based on several collected features. These classifiers are built using machine learning algorithms (J48 decision tree, support vector machine (SVM), and Naive Bayes (NB)) that are available in the WEKA data mining tool.


Author(s):  
Selami Bagriyanik ◽  
Adem Karahoca

COSMIC Function Point (CFP) measurement errors lead to budget, schedule, and quality problems in software projects. Therefore, it is important to identify and plan requirements engineers’ CFP training needs quickly and correctly. The purpose of this paper is to identify software requirements engineers’ CFP measurement competence development needs by using machine learning algorithms and requirements artifacts created by the engineers. The artifacts used were provided by a large service and technology company ecosystem in the telecommunications (Telco) sector. First, a feature set was extracted from the requirements model at hand. To prepare the data for educational data mining, requirements and CFP audit documents were converted into a CFP data set based on the designed feature set. This data set was used to train and test the machine learning models in two different experiment settings designed to reach statistically significant results. Ten different machine learning algorithms were used. Finally, algorithm performances were compared with a baseline and with each other to find the best-performing models on this data set. In conclusion, REPTree, OneR, and Support Vector Machines (SVM) with Sequential Minimal Optimization (SMO) achieved top performance in forecasting requirements engineers’ CFP training needs.


Author(s):  
Meenu Gupta ◽  
Vijender Kumar Solanki ◽  
Vijay Kumar Singh ◽  
Vicente García-Díaz

Data mining is used in various domains of research to identify new causes for effects in society around the globe. This article applies data mining for the same reason: to identify accident occurrences in different regions and the most valid reasons for accidents around the globe. Data mining and advanced machine learning algorithms are used in this research approach, and the article discusses hyperline, classification, pre-processing of the data, and training the machine with sample datasets collected from different regions, which contain structured and semi-structured data. We delve into machine learning and data mining classification algorithms to find or predict something novel about accident occurrences around the globe. To narrow the research task, we concentrate mainly on two basic and important classification algorithms: the support vector machine (SVM) and the CNB classifier. The discussion uses the WEKA tool for the CNB classifier, bag-of-words identification, word count, and frequency calculation.


Author(s):  
Shler Farhad Khorshid ◽  
Adnan Mohsin Abdulazeez ◽  
Amira Bibo Sallow

Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the method of finding or extracting information from massive databases or datasets, and it is a field of computer science with a lot of potential. It covers a wide range of areas, one of which is classification. Classification can be accomplished using a variety of methods or algorithms. Using MATLAB, five classification algorithms were compared. This paper presents a performance comparison among the classifiers: Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The data set was taken from the UCI Machine Learning Repository. The main objective of this study is to classify breast cancer cases using machine learning algorithms and to compare the algorithms based on their accuracy. The results revealed that Weighted K-NN (96.7%) has the highest accuracy among all the classifiers.
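As an illustration of how weighted K-NN differs from plain K-NN, the following is a minimal Python sketch, not the authors' MATLAB code. It assumes Euclidean distance and inverse-distance vote weights; the abstract does not state which weighting scheme was used, and the function name `weighted_knn_predict` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest neighbors.

    Assumption: each neighbor votes with weight 1 / (distance + eps).
    """
    # Euclidean distance from the query to every training point
    neighbors = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    neighbors.sort(key=lambda pair: pair[0])

    # Closer neighbors contribute larger votes
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Hypothetical toy data: one nearby "benign" point, two distant "malignant" points
train_X = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
train_y = ["benign", "malignant", "malignant"]

prediction = weighted_knn_predict(train_X, train_y, (0.5, 0.5), k=3)
print(prediction)
```

With k=3, an unweighted majority vote over these three points would pick "malignant" (two votes to one), whereas the distance-weighted vote favors the single close "benign" neighbor.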


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 485 ◽  
Author(s):  
Carlos A. Palacios ◽  
José A. Reyes-Suárez ◽  
Lorena A. Bearzotti ◽  
Víctor Leiva ◽  
Carolina Marchant

Data mining is employed to extract useful information and to detect patterns from often large data sets, closely related to knowledge discovery in databases and data science. In this investigation, we formulate models based on machine learning algorithms to extract relevant information predicting student retention at various levels, using higher education data and specifying the relevant variables involved in the modeling. Then, we utilize this information to help the process of knowledge discovery. We predict student retention at each of three levels during their first, second, and third years of study, obtaining models with an accuracy that exceeds 80% in all scenarios. These models allow us to adequately predict the level when dropout occurs. Among the machine learning algorithms used in this work are: decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which the random forest technique performs the best. We detect that secondary educational score and the community poverty index are important predictive variables, which have not been previously reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher education institutions around the world with similar conditions to the Chilean case, where dropout rates affect the efficiency of such institutions. Having the ability to predict dropout based on students’ data enables these institutions to take preventative measures and avoid dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.
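The class-balancing step mentioned at the end of this abstract can be sketched in Python as follows. The abstract does not say which balancing method was used, so random oversampling of the minority class is an assumption here, and the function name `random_oversample` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: simple random oversampling with replacement; other balancing
    methods (undersampling, synthetic sampling) would follow the same interface.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 retained students, 2 dropouts (one feature each)
rows = [[score] for score in (6.1, 5.8, 6.5, 5.9, 6.0, 6.3, 5.7, 6.2, 4.1, 4.4)]
labels = ["retained"] * 8 + ["dropout"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, balancing of this kind is normally applied to the training split only, so that the evaluation data keeps the original class proportions.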


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate prediction performance. Different techniques may provide different accuracies, and it is therefore imperative to use the most suitable method, the one that provides the best results. This research seeks to provide a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods: Five classification methods, including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory, are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply a regression method to predict the continuous MAP values using linear regression over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly, with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners.
Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
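The metrics reported in this abstract (root mean square error, and precision and recall after converting continuous MAP predictions into binary AHE predictions) can be sketched in Python as below. This is an illustrative sketch, not the authors' pipeline: the 60 mmHg cutoff, the per-sample thresholding rule, and the toy MAP values are all assumptions, since the abstract does not give the conversion rule.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall(pred_flags, true_flags):
    """Precision and recall for binary predictions (True = event predicted/observed)."""
    tp = sum(p and t for p, t in zip(pred_flags, true_flags))
    fp = sum(p and not t for p, t in zip(pred_flags, true_flags))
    fn = sum(t and not p for p, t in zip(pred_flags, true_flags))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Assumed cutoff: flag hypotension when MAP falls below 60 mmHg (not from the abstract)
AHE_THRESHOLD_MMHG = 60.0

# Hypothetical toy MAP values in mmHg
predicted_map = [72.0, 58.0, 55.0, 80.0, 61.0, 57.0]
observed_map = [70.0, 62.0, 50.0, 85.0, 59.0, 55.0]

# Convert continuous MAP values into binary flags with the assumed cutoff
pred_flags = [m < AHE_THRESHOLD_MMHG for m in predicted_map]
true_flags = [m < AHE_THRESHOLD_MMHG for m in observed_map]

error = rmse(predicted_map, observed_map)
precision, recall = precision_recall(pred_flags, true_flags)
print(f"RMSE={error:.2f} mmHg, precision={precision:.2f}, recall={recall:.2f}")
```

Recall measures how many true episodes were flagged, while precision measures how many flags were true episodes, which is why the two can diverge as in the 91% recall and 68% precision reported above.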

