scholarly journals Photometric Redshifts With Machine Learning, Lights and Shadows on a Complex Data Science Use Case

Author(s):  
Massimo Brescia ◽  
Stefano Cavuoti ◽  
Oleksandra Razim ◽  
Valeria Amaro ◽  
Giuseppe Riccio ◽  
...  

The importance of the current role of data-driven science is constantly increasing within Astrophysics, due to the huge amount of multi-wavelength data collected every day, characterized by complex and high-volume information requiring efficient and, as much as possible, automated exploration tools. Furthermore, to accomplish main and legacy science objectives of future or incoming large and deep survey projects, such as James Webb Space Telescope (JWST), James Webb Space Telescope (LSST), and Euclid, a crucial role is played by an accurate estimation of photometric redshifts, whose knowledge would permit the detection and analysis of extended and peculiar sources by disentangling low-z from high-z sources and would contribute to solve the modern cosmological discrepancies. The recent photometric redshift data challenges, organized within several survey projects, like LSST and Euclid, pushed the exploitation of the observed multi-wavelength and multi-dimensional data or ad hoc simulated data to improve and optimize the photometric redshifts prediction and statistical characterization based on both Spectral Energy Distribution (SED) template fitting and machine learning methodologies. They also provided a new impetus in the investigation of hybrid and deep learning techniques, aimed at conjugating the positive peculiarities of different methodologies, thus optimizing the estimation accuracy and maximizing the photometric range coverage, which are particularly important in the high-z regime, where the spectroscopic ground truth is poorly available. In such a context, we summarize what was learned and proposed in more than a decade of research.

2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Ferdinand Filip ◽  
...  

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Filip Ferdinand ◽  
...  

2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, requires a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.


Author(s):  
Sumi Helal ◽  
Flavia C. Delicato ◽  
Cintia B. Margi ◽  
Satyajayant Misra ◽  
Markus Endler

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4736
Author(s):  
Sk. Tanzir Mehedi ◽  
Adnan Anwar ◽  
Ziaur Rahman ◽  
Kawsar Ahmed

The Controller Area Network (CAN) bus works as an important protocol in the real-time In-Vehicle Network (IVN) systems for its simple, suitable, and robust architecture. The risk of IVN devices has still been insecure and vulnerable due to the complex data-intensive architectures which greatly increase the accessibility to unauthorized networks and the possibility of various types of cyberattacks. Therefore, the detection of cyberattacks in IVN devices has become a growing interest. With the rapid development of IVNs and evolving threat types, the traditional machine learning-based IDS has to update to cope with the security requirements of the current environment. Nowadays, the progression of deep learning, deep transfer learning, and its impactful outcome in several areas has guided as an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN along with improved performance in comparison to several other existing models. The unique contributions include effective attribute selection which is best suited to identify malicious CAN messages and accurately detect the normal and abnormal activities, designing a deep transfer learning-based LeNet model, and evaluating considering real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture along with empirical analyses shows that the proposed IDS greatly improves the detection accuracy over the mainstream machine learning, deep learning, and benchmark deep transfer learning models and has demonstrated better performance for real-time IVN security.


Author(s):  
Yun Peng ◽  
Byron Choi ◽  
Jianliang Xu

AbstractGraphs have been widely used to represent complex data in many applications, such as e-commerce, social networks, and bioinformatics. Efficient and effective analysis of graph data is important for graph-based applications. However, most graph analysis tasks are combinatorial optimization (CO) problems, which are NP-hard. Recent studies have focused a lot on the potential of using machine learning (ML) to solve graph-based CO problems. Most recent methods follow the two-stage framework. The first stage is graph representation learning, which embeds the graphs into low-dimension vectors. The second stage uses machine learning to solve the CO problems using the embeddings of the graphs learned in the first stage. The works for the first stage can be classified into two categories, graph embedding methods and end-to-end learning methods. For graph embedding methods, the learning of the the embeddings of the graphs has its own objective, which may not rely on the CO problems to be solved. The CO problems are solved by independent downstream tasks. For end-to-end learning methods, the learning of the embeddings of the graphs does not have its own objective and is an intermediate step of the learning procedure of solving the CO problems. The works for the second stage can also be classified into two categories, non-autoregressive methods and autoregressive methods. Non-autoregressive methods predict a solution for a CO problem in one shot. A non-autoregressive method predicts a matrix that denotes the probability of each node/edge being a part of a solution of the CO problem. The solution can be computed from the matrix using search heuristics such as beam search. Autoregressive methods iteratively extend a partial solution step by step. At each step, an autoregressive method predicts a node/edge conditioned to current partial solution, which is used to its extension. In this survey, we provide a thorough overview of recent studies of the graph learning-based CO methods. The survey ends with several remarks on future research directions.


2007 ◽  
Author(s):  
Jonathan P. Gardner ◽  
Xiaohui Fan ◽  
Gillian Wilson ◽  
Massimo Stiavelli ◽  
Lisa J. Storrie-Lombardi ◽  
...  

2020 ◽  
Vol 36 ◽  
pp. 49-62
Author(s):  
Nureni Olawale Adeboye ◽  
Peter Osuolale Popoola ◽  
Oluwatobi Nurudeen Ogunnusi

Data science is a concept to unify statistics, data analysis, machine learning and their related methods in order to analyze actual phenomena with data to provide better understanding. This article focused its investigation on acquisition of data science skills in building partnership for efficient school curriculum delivery in Africa, especially in the area of teaching statistics courses at the beginners’ level in tertiary institutions. Illustrations were made using Big data of selected 18 African countries sourced from United Nations Educational, Scientific and Cultural Organization (UNESCO) with special focus on some macro-economic variables that drives economic policy. Data description techniques were adopted in the analysis of the sourced open data with the aid of R analytics software for data science, as improvement on the traditional methods of data description for learning and thus open a new charter of education curriculum delivery in African schools. Though, the collaboration is not without its own challenges, its prospects in creating self-driven learning culture among students of tertiary institutions has greatly enhanced the quality of teaching, advancing students skills in machine learning, improved understanding of the role of data in global perspective and being able to critique claims based on data.


Sign in / Sign up

Export Citation Format

Share Document