Green IoT primarily focuses on making the IoT more sustainable by reducing the large amount of energy that IoT devices require. Whether the goal is to increase device efficiency or to conserve energy, predictive analytics is the cornerstone for creating value and insight from large IoT datasets. This work provides predictive models, driven by data collected from various sensors, of the energy usage of appliances in an IoT-based smart home environment. Specifically, we address the prediction problem from two perspectives. First, an overall energy consumption model is developed using both linear and non-linear regression techniques to identify the most relevant features for predicting the energy consumption of appliances. The performance of the proposed models is assessed using a publicly available dataset comprising historical measurements from various humidity and temperature sensors, along with total energy consumption data from appliances in an IoT-based smart home setup. A comparison of the prediction results shows that LSTM regression outperforms the other linear and ensemble regression models, explaining a high share of the variability (R²) on both the training (96.2%) and test (96.1%) data for the selected features. Second, we develop a multi-step time-series model using the autoregressive integrated moving average (ARIMA) technique to forecast future energy consumption from past energy usage history. Overall, the proposed predictive models will enable consumers to minimize the energy usage of home appliances and energy providers to better plan and forecast future energy demand, facilitating green urban development.
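The training and test percentages quoted in this abstract read as explained-variability scores. As a minimal sketch, assuming the metric is the coefficient of determination (R²), the calculation could look like the following. The function name and the sample readings are hypothetical and are not taken from the dataset used in the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical appliance energy readings (Wh) and model predictions.
actual = [60.0, 50.0, 70.0, 90.0, 80.0]
predicted = [58.0, 54.0, 69.0, 88.0, 83.0]
score = r_squared(actual, predicted)
print(f"R^2 = {score:.3f}")
```

Computing the same score separately on training and test data, as the abstract reports, is a common way to check that a model is not overfitting.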
Predictive models are currently used for early intervention to help identify patients with a high risk of adverse events. Assessing the accuracy of such models is a crucial part of the development process. To measure the predictive performance of a scoring model, quantitative indices such as the K-S statistic and C-statistic are used. This paper discusses the relationship between Gini coefficients and event prevalence rates. The main contribution of the paper is the theoretical proof of the relationship between the Gini coefficient and event prevalence rate.
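For readers unfamiliar with the indices named in this abstract, the sketch below computes the C-statistic, the Gini coefficient and the K-S statistic for a small scored sample. It assumes the convention, common in scoring practice, that Gini = 2C − 1; the paper's own definitions may differ. All function names, scores and labels are hypothetical.

```python
def split_by_label(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def c_statistic(scores, labels):
    """Probability that a random event outscores a random non-event (ties count 0.5)."""
    pos, neg = split_by_label(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    # Assumed convention: Gini = 2 * C - 1.
    return 2.0 * c_statistic(scores, labels) - 1.0


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of events and non-events."""
    pos, neg = split_by_label(scores, labels)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical risk scores; label 1 marks an adverse event.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(c_statistic(scores, labels), gini(scores, labels), ks_statistic(scores, labels))
```

In this toy sample the event prevalence is 50%. The sketch only shows how the indices are computed and does not reproduce the paper's result on prevalence.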
BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values, that is, incomplete observations. These large databases are well suited to training machine-learning models, for instance for forecasting or for extracting biomarkers in biomedical settings. Such predictive approaches can use discriminative, rather than generative, modeling and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies for handling missing values have focused on inferential statistics. RESULTS: Here we conduct a systematic benchmark of missing-values strategies in predictive models, with a focus on large health databases: four electronic health record datasets, one population brain imaging dataset, one health survey and two intensive care datasets. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values, with the missing incorporated in attribute strategy, leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS: Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation at much lower computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
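The recommendation to add indicator columns can be illustrated with a minimal sketch. It uses mean imputation as the "simple strategy", which is an assumption; the benchmark compares several imputers. The function name and the toy table are hypothetical.

```python
def impute_with_indicator(rows):
    """Mean-impute each column and append one 0/1 'was missing' flag per column.

    Assumes every column has at least one observed value; missing entries are None.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))

    augmented = []
    for r in rows:
        values = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1 if r[j] is None else 0 for j in range(n_cols)]
        # The flags preserve the missingness pattern that imputation would erase.
        augmented.append(values + flags)
    return augmented


# Hypothetical two-feature table with missing entries.
table = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
augmented = impute_with_indicator(table)
for row in augmented:
    print(row)
```

A downstream model trained on the augmented table can then use the flags as features, which matters when the fact that a value is missing is itself informative.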
1. The prediction of species interactions is gaining momentum as a way to circumvent limitations in data volume. Yet, ecological networks are challenging to predict because they are typically small and sparse. Dealing with extreme class imbalance is a challenge for most binary classifiers, and there are currently no guidelines as to how predictive models can be trained for this specific problem. 2. Using simple mathematical arguments and numerical experiments in which a variety of classifiers (for supervised learning) are trained on simulated networks, we develop a series of guidelines related to the choice of measures to use for model selection, and the degree of unbiasing to apply to the training dataset. 3. Neither classifier accuracy nor the ROC-AUC are informative measures for the performance of interaction prediction. PR-AUC is a fairer assessment of performance. In some cases, even standard measures can lead to selecting a more biased classifier because the effect of connectance is strong. The amount of correction to apply to the training dataset depends on network connectance, on the measure to be optimized, and only weakly on the classifier. 4. These results reveal that training machines to predict networks is a challenging task, and that in virtually all cases, the composition of the training set needs to be experimented on before performing the actual training. We discuss these consequences in the context of the low volume of data.
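To show why accuracy and ROC-AUC can mislead on sparse networks, the sketch below scores a hypothetical set of 20 species pairs with connectance 0.1 (2 interactions). It uses average precision as a step-wise estimate of PR-AUC, which is an assumption; the paper may compute PR-AUC differently. All names and numbers are illustrative and are not the paper's simulations.

```python
def roc_auc(scores, labels):
    """Share of (interaction, non-interaction) pairs ranked in the right order."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true interaction."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# 20 candidate pairs, scores already in descending order; 2 true interactions.
scores = list(range(20, 0, -1))
labels = [0] * 20
labels[1] = 1  # ranked 2nd
labels[4] = 1  # ranked 5th

# A classifier that always predicts "no interaction" is right on every negative.
trivial_accuracy = labels.count(0) / len(labels)
auc = roc_auc(scores, labels)
ap = average_precision(scores, labels)
print(trivial_accuracy, round(auc, 3), ap)
```

Here a classifier that never predicts an interaction reaches 90% accuracy, and the scorer has a ROC-AUC near 0.89, yet its average precision is only 0.45.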
The research was carried out in the eastern province of Granma, an area noted for its scientific results on the productive performance of the buffalo species in Cuba. The objective was to estimate live weight from body measurements in buffalo calves from birth to eight months of age. Data were recorded on 1,302 animals, females and males born to 120 buffalo cows of the Buffalypso breed between 2008 and 2015. The body measurements, height at the withers (AC), body length (LC), thoracic perimeter (PT), abdominal perimeter (PA), pelvic width (AP), pelvic length (LP) and chest width (AT), were taken with a tape measure in cm, while live weight (PV) was determined with a digital platform scale; all measurements were taken every 30 days. The predictive models used were: Quetélet, PV = PT² × LC × 87.5; Crevat, PV = PT × LC × PA × 80; and Correa, PV = PT² × LC / 300. Comparison by sex showed highly significant differences (P < 0.001) for PA and significant differences (P < 0.05) for PT, PV, LP and LC, in favor of males. The model with the best fit (r² = 0.96, P > 0.001) combined three variables (PT, PA and LC), although thoracic perimeter alone also gave a high fit (r² = 0.94, P > 0.001). It is concluded that the high correlations between body measurements and live weight show that the variables studied can, alone or in combination, explain the behavior of live weight, but the proposed equation predicting PV (kg) from PT (cm) offers the greatest practical advantages for weighing.
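The three classical formulas named in this abstract are plain products of body measurements, so they can be written as one-line functions. The sketch below applies the arithmetic as the abstract states it. The abstract does not say which units each formula expects, so the inputs are arbitrary numbers chosen only to exercise the calculation, and the function names are hypothetical.

```python
def quetelet(pt, lc):
    """Quetelet: PV = PT^2 * LC * 87.5 (units as expected by the formula; not stated in the abstract)."""
    return pt ** 2 * lc * 87.5


def crevat(pt, lc, pa):
    """Crevat: PV = PT * LC * PA * 80."""
    return pt * lc * pa * 80.0


def correa(pt, lc):
    """Correa: PV = PT^2 * LC / 300."""
    return pt ** 2 * lc / 300.0


# Arbitrary illustrative inputs, not measurements from the study.
print(quetelet(1.0, 0.9))
print(crevat(1.0, 0.9, 1.1))
print(correa(40.0, 36.0))
```

Before comparing these estimates with weights from a scale, the unit convention of each formula would need to be confirmed against its original source.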
Computer vision-based automation has recently become popular for detecting and monitoring nutrient deficiencies in plants. The predictive models developed by various researchers were designed to run on embedded systems, with limited computational resources in mind. Nevertheless, the enormous popularity of smartphone technology has given ordinary farmers access to high computing resources. To serve smartphone users, this study proposes a framework that hosts high-end systems in the cloud, where processing is done and with which farmers can interact. With high computational power available, many studies have focused on applying convolutional neural network-based deep learning (CNN-based DL) architectures, including transfer learning (TL) models, to agricultural research. Ensembling various TL architectures has the potential to improve the performance of predictive models considerably. In this work, six TL architectures, viz. InceptionV3, ResNet152V2, Xception, DenseNet201, InceptionResNetV2, and VGG19, are considered, and various ensembles of them are used to diagnose nutrient deficiencies in rice plants. Two publicly available datasets, from Mendeley and Kaggle, are used in this study. The ensemble-based architecture raised the highest classification accuracy from 99.17% to 100% on the Mendeley dataset and from 90% to 92% on the Kaggle dataset.
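The abstract does not say how the transfer-learning models are combined. One common option is soft voting, where the class probabilities from each model are averaged and the highest average wins. The sketch below assumes that scheme; the function name, the three-class setup and the probability vectors are all hypothetical stand-ins for real network outputs.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models and pick the top class."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg


# Hypothetical softmax outputs of three models for one leaf image, three classes.
model_outputs = [
    [0.5, 0.3, 0.2],  # model 1 prefers class 0
    [0.2, 0.6, 0.2],  # model 2 prefers class 1
    [0.3, 0.5, 0.2],  # model 3 prefers class 1
]
winner, avg = soft_vote(model_outputs)
print(winner, [round(a, 3) for a in avg])
```

In this example the ensemble selects class 1 even though the first model alone would have chosen class 0, which is how an ensemble can correct an individual model's error.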