Introduction to Computational Psychometrics: Towards a Principled Integration of Data Science and Machine Learning Techniques into Psychometrics

Author(s):  
Alina A. von Davier ◽  
Robert J. Mislevy ◽  
Jiangang Hao


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning acts as a bridge between business and data science: data science methods are applied to available data to extract valuable insights that serve business goals. Bollywood, which makes up a large part of Indian cinema, is a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, using machine learning techniques for classification and prediction. Building a classifier or prediction model begins with a learning stage, in which a training data set is used to fit the model with a chosen algorithm; the rules generated in this stage form the model and are used to predict future trends in different types of organizations. Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, are applied with the aim of obtaining efficient and effective results. All of these steps can be carried out with GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The models trained in the learning stage generate rules from the training data set that are then used to predict the success category of future movies. Conclusion: A comparative analysis is performed on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using this prediction, advertisement propaganda can be planned and the movie's release timed according to the predicted success rate to gain higher benefits. Discussion: Data mining is the process of discovering patterns and relationships in large data sets in order to solve business problems and predict forthcoming trends. This prediction can help production houses plan advertisement propaganda and costs, and by managing these factors they can make a movie more profitable.
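As a rough illustration of the workflow this abstract describes, the sketch below trains the named classifiers on a hypothetical movie data set and compares them by accuracy and confusion matrix. The file name, feature columns, and label values are assumptions for illustration only, not the paper's actual data or pipeline.

```python
# Minimal sketch: compare the classifiers named in the abstract on a hypothetical
# Bollywood movie data set. All file and column names are illustrative assumptions;
# categorical features are assumed to be already numerically encoded.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("bollywood_movies.csv")        # hypothetical training data
X = df.drop(columns=["verdict"])                # numeric features (budget, cast score, ...)
y = df["verdict"]                               # Blockbuster / Superhit / Hit / Average / Flop
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```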


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models so that algorithms can learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to making IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning), where the outcome is not known beforehand (unguided learning), or where learning results from the interaction between a model and its environment (reinforcement learning). This chapter answers the questions: How can machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are the characteristics of real-world IoT data that require data analytics?


Author(s):  
Jonathan M. Gumley ◽  
Hayden Marcollo ◽  
Stuart Wales ◽  
Andrew E. Potts ◽  
Christopher J. Carra

There is a growing need in the offshore floating production sector to develop reliable and robust means of continuously monitoring the integrity of mooring systems for FPSOs and FPUs, particularly in light of the upcoming introduction of API-RP-2MIM. Here, the limitations of the current range of monitoring techniques are discussed, including well-established technologies such as load cells, sonar, and visual inspection, within the context of the growing mainstream acceptance of data science and machine learning. Due to the large fleet of floating production platforms currently in service, there is a need for a readily deployable solution that can be retrofitted to existing platforms to passively monitor the performance of floating assets on their moorings, for which machine learning based systems have particular advantages. An earlier investigation, conducted in 2016 on a shallow water, single point moored FPSO, employed host facility data from in-service field measurements before and after a single mooring line failure event. This paper presents how the same machine learning techniques were applied to a deep water, semi-taut, spread moored system for which no host facility data were available, therefore requiring a calibrated hydrodynamic numerical model to be used as the basis for the training data set. The machine learning techniques applied to both real and synthetically generated data were successful in replicating the response of the original system, even with the latter subjected to different variations of artificial noise. Furthermore, utilizing a probability-based approach, it was demonstrated that replicating the response of the underlying system was a powerful technique for predicting changes in the mooring system.
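A minimal sketch of the general idea described above, assuming a model trained on synthetically generated responses from a calibrated numerical simulation and then checked against artificially noised inputs. The file names, features, targets, and noise levels are placeholders, not the authors' actual pipeline.

```python
# Hedged sketch: fit a regressor to simulator-generated vessel motions and mooring
# line tensions, then test how well it still replicates the response when artificial
# measurement noise is added. All inputs are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_sim = np.load("vessel_motions_simulated.npy")   # hypothetical simulator features
y_sim = np.load("line_tensions_simulated.npy")    # hypothetical simulator targets

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_sim, y_sim)

# Check robustness of the learned response under increasing artificial noise.
for noise_std in (0.0, 0.05, 0.10):
    X_noisy = X_sim + rng.normal(scale=noise_std * X_sim.std(axis=0), size=X_sim.shape)
    print(noise_std, r2_score(y_sim, model.predict(X_noisy)))
```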


2022 ◽  
pp. 209-232
Author(s):  
Xiang Li ◽  
Jingxi Liao ◽  
Tianchuan Gao

Machine learning is a broad field that draws on multiple disciplines, including mathematics, computer science, and data science. Some of its concepts, like deep neural networks, can be complicated and difficult to explain in a few words. This chapter focuses on essential methods, such as classification from supervised learning, clustering, and dimensionality reduction, that can be easily interpreted and explained in a way accessible to beginners. In this chapter, data for Airbnb (Air Bed and Breakfast) listings in London are used as the source data to study the effect of each machine learning technique. Using K-means clustering, principal component analysis (PCA), random forests, and other methods to build classification models from the features, the chapter predicts classification results and provides performance measurements to test the models.
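A minimal sketch of that workflow on public Airbnb London listings data might look as follows; the file name and the particular feature and label columns are illustrative assumptions rather than the chapter's exact setup.

```python
# Hedged sketch: scale features, apply PCA and K-means (unsupervised), then fit a
# random forest classifier (supervised). Column choices are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

listings = pd.read_csv("london_listings.csv")     # hypothetical local copy of listings data
features = listings[["price", "minimum_nights", "number_of_reviews",
                     "availability_365"]].dropna()

X = StandardScaler().fit_transform(features)

# Dimensionality reduction and clustering.
X_pca = PCA(n_components=2).fit_transform(X)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_pca)

# Classification: predict room_type (assumed label) from the same features.
y = listings.loc[features.index, "room_type"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```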


Author(s):  
Omar Farooq ◽  
Parminder Singh

Introduction: The emergence of concepts like Big Data, Data Science, Machine Learning (ML), and the Internet of Things (IoT) has expanded the potential of research in today's world. The continuous collection of data by IoT devices, sensors, and similar equipment puts tremendous pressure on the existing IoT network. Materials and Methods: This resource-constrained IoT environment is flooded with data acquired from millions of IoT nodes deployed at the device level. The limited resources of the IoT network have driven researchers towards data management. This paper focuses on data classification at the device level, edge/fog level, and cloud level using machine learning techniques. Results: The data coming from different devices is vast and varied, so it is essential to choose the right approach for classification and analysis. Doing so will help optimize the data at the device and edge/fog levels and improve the network's performance in the future. Conclusion: This paper presents data classification, machine learning approaches, and a proposed mathematical model for the IoT environment.


2018 ◽  
Vol 47 (6) ◽  
pp. 1081-1097 ◽  
Author(s):  
Jeremy Auerbach ◽  
Christopher Blackburn ◽  
Hayley Barton ◽  
Amanda Meng ◽  
Ellen Zegura

We estimate the cost and impact of a proposed anti-displacement program in the Westside of Atlanta (GA) with data science and machine learning techniques. This program intends to fully subsidize property tax increases for eligible residents of neighborhoods where two major urban renewal projects, a stadium and a multi-use trail, are underway. We first estimate household-level income eligibility for the program with data science and machine learning approaches applied to publicly available household-level data. We then forecast future property appreciation due to the urban renewal projects using random forests with historic tax assessment data. Combining these projections with household-level eligibility, we estimate the costs of the program for different eligibility scenarios. We find that our household-level data and machine learning techniques result in fewer eligible homeowners but significantly larger program costs, due to higher property appreciation rates than the original analysis, which was based on census and city-level data. Our methods have limitations, namely incomplete data sets, the accuracy of representative income samples, the availability of characteristic training set data for the property tax appreciation model, and challenges in validating the model results. The eligibility estimates and property appreciation forecasts we generated were also incorporated into an interactive tool for residents to determine program eligibility and view their expected increases in home values. Community residents have been involved with this work, which has provided greater transparency, accountability, and impact for the proposed program. Data collected from residents can also correct and update the information, which would increase the accuracy of the program estimates and validate the modeling, leading to a novel application of community-driven data science.
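A hedged sketch of the appreciation-forecast step might look like the following: a random forest fit on historic tax assessment records to project future assessed values. The file and column names are placeholders; the authors' actual feature set is not reproduced here.

```python
# Minimal sketch: random forest regression on (hypothetical) historic tax assessment
# data to forecast the next assessed value of a property.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

assessments = pd.read_csv("historic_tax_assessments.csv")   # placeholder file
X = assessments[["assessed_value_prior", "lot_size", "square_feet",
                 "year_built", "distance_to_stadium", "distance_to_trail"]]
y = assessments["assessed_value_next"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
rf = RandomForestRegressor(n_estimators=300, random_state=1).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, rf.predict(X_test)))
```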


Energies ◽  
2020 ◽  
Vol 13 (13) ◽  
pp. 3497 ◽  
Author(s):  
César Benavente-Peces ◽  
Nisrine Ibadah

Energy efficiency is a major concern for achieving sustainability in modern society. Smart cities' sustainability depends on the availability of energy-efficient infrastructures and services. Buildings make up most of a city, and they are responsible for most of its energy consumption and emissions to the atmosphere (40%). Smart cities need smart buildings to achieve sustainability goals, and building thermal modeling is essential in the race for energy efficiency. In this paper, we show how ICT and data science technologies and techniques can be applied to evaluate the energy efficiency of buildings. Specifically, we apply machine learning techniques to classify buildings based on their energy efficiency, focusing on single-family buildings in residential areas. Throughout the paper, we demonstrate the capabilities of machine learning techniques to classify buildings according to their energy efficiency, and we analyze and compare the performance of different classifiers. Furthermore, we introduce new parameters that have an impact on building thermal modeling, especially those concerning the environment where the building is located. We also highlight the growing relevance of ICT for data acquisition and for monitoring relevant parameters using wireless sensor networks. It is worth remarking that an appropriate and reliable dataset is needed to achieve the best results. Moreover, we demonstrate that reliable classification is feasible with a few featured parameters.
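As a rough illustration of such a classification experiment, the sketch below compares several classifiers on assumed thermal and environmental features of single-family buildings; the data file, feature names, and efficiency label are hypothetical, not the paper's dataset.

```python
# Hedged sketch: compare classifiers that assign buildings to an energy-efficiency
# class from thermal/environmental features. All names are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

buildings = pd.read_csv("building_thermal_features.csv")    # placeholder file
X = buildings[["wall_u_value", "window_area", "orientation_deg",
               "outdoor_mean_temp", "solar_irradiance"]]
y = buildings["efficiency_class"]

for clf in (RandomForestClassifier(), SVC(), KNeighborsClassifier()):
    scores = cross_val_score(clf, X, y, cv=5)                # 5-fold accuracy
    print(type(clf).__name__, scores.mean())
```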


2021 ◽  
Vol 3 ◽  
Author(s):  
Ahmed Al-Hindawi ◽  
Ahmed Abdulaal ◽  
Timothy M. Rawson ◽  
Saleh A. Alqahtani ◽  
Nabeela Mughal ◽  
...  

The SARS-CoV-2 virus, which causes COVID-19, has had an unprecedented impact on healthcare, requiring multidisciplinary innovation and novel thinking to minimize impact and improve outcomes. Wide-ranging disciplines have collaborated, including diverse clinicians (radiology, microbiology, and critical care), who are working increasingly closely with data science. This has been leveraged through the democratization of data science, with the increasing availability of easy-to-access open datasets, tutorials, programming languages, and hardware that make it significantly easier to create mathematical models. To address the COVID-19 pandemic, such data science has enabled modeling of the impact of the virus on populations and individuals for diagnostic, prognostic, and epidemiological ends. This has led to two large systematic reviews on the topic, which have highlighted two different ways in which this feat has been attempted: one using classical statistics and the other using more novel machine learning techniques. In this review, we debate the relative strengths and weaknesses of each method toward the specific task of predicting COVID-19 outcomes.
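The contrast the review draws could be illustrated, in minimal form, by comparing a classical statistical model with a machine learning model on the same outcome; the cohort file, predictors, and binary outcome below are hypothetical.

```python
# Hedged sketch: logistic regression (classical statistics) versus gradient boosting
# (machine learning) for a hypothetical binary COVID-19 outcome, compared by AUC.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

cohort = pd.read_csv("covid_cohort.csv")                     # placeholder cohort data
X = cohort[["age", "crp", "lymphocyte_count", "oxygen_saturation"]]
y = cohort["icu_admission"]                                  # hypothetical binary outcome

for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, auc.mean())
```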


2022 ◽  
Vol 25 (1) ◽  
pp. 45-57
Author(s):  
Luis Fernández-Revuelta Pérez ◽  
Álvaro Romero Blasco

Cost estimation may become increasingly difficult, slow, and resource-consuming when it cannot be performed analytically. If traditional cost estimation techniques are usable at all under those circumstances, they have important limitations. This article analyses the potential applications of data science to management accounting through the case of a cost estimation task posted on Kaggle, a Google data science and machine learning website. The multinational company Caterpillar posted a contest on Kaggle to estimate the price that a supplier would quote for manufacturing a number of industrial assemblies, given historical quotes for similar assemblies. Hitherto, this problem would have required reverse-engineering the supplier's accounting structure to establish the cost structure of each assembly, identifying non-obvious relationships among variables; this complex and tedious task is usually performed by human experts, adding subjectivity to the process. When extensive data exist, machine learning techniques can overcome some of those limitations. Applying machine learning to the data reveals non-obvious patterns and relationships that can be used to predict the costs of new assemblies with acceptable accuracy. This article discusses the advantages and limitations of this approach and its potential to transform cost estimation and, more widely, management accounting.
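A minimal sketch of this kind of Kaggle-style cost estimation task, assuming a simplified table of historical quotes; the file and column names are placeholders, not the actual competition schema.

```python
# Hedged sketch: learn supplier quotes from historical assembly quotes with a
# tree-based regressor, modelling cost on a log scale. All names are assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

quotes = pd.read_csv("historical_quotes.csv")        # placeholder quote history
X = pd.get_dummies(quotes.drop(columns=["cost"]))    # one-hot encode categorical specs
y = np.log1p(quotes["cost"])                         # log-transform the quoted cost

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
model = GradientBoostingRegressor().fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("RMSE on log cost:", rmse)
```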

