24-hour cloud cover calculation using a ground-based imager with machine learning

2021 ◽  
Vol 14 (10) ◽  
pp. 6695-6710
Author(s):  
Bu-Yo Kim ◽  
Joo Wan Cha ◽  
Ki-Ho Chang

Abstract. In this study, image data features and machine learning methods were used to calculate 24 h continuous cloud cover from image data obtained by a camera-based imager on the ground. The image data features were the time (Julian day and hour), solar zenith angle, and statistical characteristics of the red–blue ratio, blue–red difference, and luminance. These features were determined from the red, green, and blue brightness of images subjected to pre-processing involving masking removal and distortion correction. The collected image data were divided into training, validation, and test sets, which were used to optimize each machine learning method and evaluate its accuracy. The cloud cover calculated by each machine learning method was verified against human-eye observation data from a manned observatory. Supervised machine learning models suitable for nowcasting, namely support vector regression, random forest, gradient boosting machine, k-nearest neighbor, artificial neural network, and multiple linear regression, were employed and their results were compared. The best learning results were obtained by the support vector regression model, which had an accuracy, recall, and precision of 0.94, 0.70, and 0.76, respectively. Further, bias, root mean square error, and correlation coefficient values of 0.04 tenths, 1.45 tenths, and 0.93, respectively, were obtained for the cloud cover calculated using the test set. When the difference between the calculated and observed cloud cover was allowed to be within 0, 1, and 2 tenths, agreement rates of approximately 42 %, 79 %, and 91 %, respectively, were obtained. The proposed system, comprising a ground-based imager and machine learning methods, is expected to be suitable as an automated replacement for human-eye observations.
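
The sketch below is a minimal illustration, not the authors' code: it fits a support vector regression model to tabular sky-image features and scores it with the bias, RMSE, and correlation metrics the abstract reports. The feature layout, hyperparameters, and data are hypothetical placeholders.

```python
# Minimal SVR sketch for image-feature-based cloud cover (placeholder data).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X rows: [Julian day, hour, solar zenith angle, R/B ratio stats,
#          B-R difference stats, luminance stats]; y: observed cover (tenths).
rng = np.random.default_rng(0)
X = rng.random((500, 8))                      # placeholder feature matrix
y = rng.integers(0, 11, 500).astype(float)    # placeholder cloud cover in tenths

n_train = 400
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X[:n_train], y[:n_train])

pred = np.clip(model.predict(X[n_train:]), 0, 10)
obs = y[n_train:]
bias = np.mean(pred - obs)
rmse = np.sqrt(np.mean((pred - obs) ** 2))
corr = np.corrcoef(pred, obs)[0, 1]
print(f"bias={bias:.2f} tenths, RMSE={rmse:.2f} tenths, r={corr:.2f}")
```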


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Ching Lee Koo ◽  
Mei Jing Liew ◽  
Mohd Saberi Mohamad ◽  
Abdul Hakim Mohamed Salleh

The greatest statistical and computational challenge in genetic epidemiology today is to identify and characterize the genes that interact with other genes and with environmental factors to influence complex multifactorial diseases. These gene-gene interactions, also known as epistasis, cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the presence of multiple polymorphisms. Several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have therefore been applied to identify susceptibility genes in such common, multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Finally, the paper discusses each machine learning method and presents its strengths and weaknesses in detecting gene-gene interactions in complex human diseases.


2017 ◽  
Author(s):  
Fadhl M Alakwaa ◽  
Kumardeep Chaudhary ◽  
Lana X Garmire

ABSTRACT Metabolomics holds promise as a new technology for diagnosing highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is performed using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, an increasingly popular class of machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracy of the autoencoder, a deep learning (DL) framework, as well as six widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM), Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). The DL framework achieved the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, biological interpretation of the first hidden layer revealed eight commonly enriched significant metabolomics pathways (adjusted P-value < 0.05) that could not be discovered by the other machine learning methods. Among them, the protein digestion and absorption and ATP-binding cassette (ABC) transporter pathways were also confirmed in an integrated analysis of metabolomics and gene expression data from these samples. In summary, the deep learning method shows advantages for metabolomics-based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of autoencoder-based deep learning methods in the metabolomics research community for classification.
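
As a rough illustration of the autoencoder-plus-classifier idea (assumptions only, not the paper's pipeline), the sketch below trains a single-hidden-layer network to reconstruct placeholder metabolite profiles, then feeds the hidden-layer activations to a logistic regression scored by AUC. Data dimensions, hyperparameters, and the classifier choice are hypothetical.

```python
# Minimal autoencoder-style sketch with scikit-learn (placeholder data).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(271, 162))   # placeholder metabolite matrix
y = rng.integers(0, 2, 271)       # placeholder ER+/ER- labels

X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Autoencoder: a hidden layer trained to reconstruct its own input.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=2000, random_state=0)
ae.fit(X_tr, X_tr)

def encode(ae, X):
    # Hidden-layer activations: the learned low-dimensional representation.
    return np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

clf = LogisticRegression(max_iter=1000).fit(encode(ae, X_tr), y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(encode(ae, X_te))[:, 1])
print(f"AUC = {auc:.2f}")
```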


2021 ◽  
Vol 10 (3) ◽  
pp. 39
Author(s):  
Nirmalya Thakur ◽  
Chia Y. Han

This paper makes four scientific contributions to the field of fall detection in the elderly, with the aim of supporting their assisted living in future internet of things (IoT)-based pervasive living environments, such as smart homes. First, it presents and discusses a comprehensive comparative study, in which 19 different machine learning methods were used to develop fall detection systems, to deduce the optimal machine learning method for the development of such systems. This study was conducted on two different datasets, and the results show that, of all the machine learning methods, the k-NN classifier is best suited for the development of fall detection systems in terms of performance accuracy. Second, it presents a framework that overcomes the limitations of binary classifier-based fall detection systems by being able to detect falls and fall-like motions. Third, to increase trust in and reliance on fall detection systems, it introduces a novel methodology based on k-fold cross-validation and the AdaBoost algorithm that improves the performance accuracy of the k-NN classifier-based fall detection system to the extent that it outperforms all similar works in this field. This approach achieved performance accuracies of 99.87% and 99.66%, respectively, when evaluated on the two datasets. Finally, the proposed approach is also highly accurate in detecting the activity of standing up from a lying position, to infer whether a fall was followed by a long lie, which can cause minor to major health-related concerns. The above contributions address multiple research challenges in the field of fall detection that we identified after conducting a comprehensive review of related works, which is also presented in this paper.
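
A minimal sketch of the evaluation setup (assumptions only, not the authors' system) is shown below: a k-NN classifier scored on tabular motion features with stratified k-fold cross-validation. The paper's AdaBoost-based refinement, feature extraction, and datasets are not reproduced; everything here is a placeholder.

```python
# Minimal k-NN + k-fold cross-validation sketch (placeholder data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))    # placeholder motion features
y = rng.integers(0, 2, 1000)      # placeholder fall / no-fall labels

knn = KNeighborsClassifier(n_neighbors=5)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(knn, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```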


2021 ◽  
Vol 14 (1) ◽  
pp. 30
Author(s):  
Boyi Li ◽  
Adu Gong ◽  
Tingting Zeng ◽  
Wenxuan Bao ◽  
Can Xu ◽  
...  

The evaluation of mortality in earthquake-stricken areas is vital for the emergency response during rescue operations. Hence, an effective and universal approach for accurately predicting the number of casualties due to an earthquake is needed. To obtain a precise casualty prediction method that can be applied to regions with different geographical environments, a spatial division method based on regional differences and a zoning casualty prediction method based on support vector regression (SVR) are proposed in this study. This study comprises three parts: (1) evaluating the importance of influential features on seismic fatality based on random forest to select indicators for the prediction model; (2) dividing the study area into different grades of risk zones with a strata fault line dataset and WorldPop population dataset; and (3) developing a zoning support vector regression model (Z-SVR) with optimal parameters that is suitable for different risk areas. We selected 30 historical earthquakes that occurred in China’s mainland from 1950 to 2017 to examine the prediction performance of Z-SVR and compared its performance with those of other widely used machine learning methods. The results show that Z-SVR outperformed the other machine learning methods and can further enhance the accuracy of casualty prediction.
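
The sketch below is a minimal, hypothetical rendering of the two-step idea (not Z-SVR itself): ranking candidate predictors with random-forest importances, then fitting a separate SVR per risk zone. The feature names, zoning labels, and data are placeholders.

```python
# Minimal sketch: RF-based feature screening, then one SVR per risk zone.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
features = ["magnitude", "intensity", "depth", "pop_density", "time_of_day"]
X = rng.random((30, len(features)))   # placeholder earthquake records
y = rng.random(30) * 1000             # placeholder fatality counts
zone = rng.integers(0, 2, 30)         # placeholder risk-zone label per record

# Step 1: importance-based screening of candidate indicators.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
ranked = sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1])
print("feature importances:", ranked)

# Step 2: one SVR per risk zone, trained only on that zone's records.
zone_models = {}
for z in np.unique(zone):
    m = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))
    m.fit(X[zone == z], y[zone == z])
    zone_models[z] = m
```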


2019 ◽  
Vol 14 (3) ◽  
pp. 234-240 ◽  
Author(s):  
Wuritu Yang ◽  
Xiao-Juan Zhu ◽  
Jian Huang ◽  
Hui Ding ◽  
Hao Lin

Background: The location of proteins in a cell can provide important clues to their functions in various biological processes. Thus, the application of machine learning methods to the prediction of protein subcellular localization has become a hotspot in bioinformatics. As a key organelle, the Golgi apparatus is in charge of protein storage, packaging, and distribution. Objective: The identification of protein locations in the Golgi apparatus will provide in-depth insights into their functions. Thus, machine learning-based methods for predicting protein location in the Golgi apparatus have been extensively explored. The development of protein sub-Golgi apparatus localization prediction should be reviewed to provide comprehensive background for the field. Method: The benchmark datasets, feature extraction, machine learning methods, and published results were summarized. Results: We briefly introduced recent progress in protein sub-Golgi apparatus localization prediction using machine learning methods and discussed their advantages and disadvantages. Conclusion: We pointed out the perspectives of machine learning methods in protein sub-Golgi localization prediction.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Hao Wang

Currently, many methods for estimating the effects of conditions on a given biological target require either strong modelling assumptions or separate screens. Traditionally, covering many conditions and targets without performing all possible experiments has been achieved by data-driven experimentation or several mathematical methods, especially conventional machine learning methods. However, these methods still cannot completely avoid or replace manual labelling. This paper presents a meta-active machine learning method to address this problem. The project used nine traditional machine learning methods and compared their accuracy and running time. In addition, the paper compares the meta-active machine learning method (MAML) with a classical screening method and with progressive experiments. The results show that the proposed method yields the best experimental results on the current dataset.


2019 ◽  
Vol 19 (25) ◽  
pp. 2301-2317 ◽  
Author(s):  
Ruirui Liang ◽  
Jiayang Xie ◽  
Chi Zhang ◽  
Mengying Zhang ◽  
Hai Huang ◽  
...  

In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental, and lifestyle factors must be combined to study cancer, owing to the complexity and varied forms of the disease. The increasing availability and growth rate of 'big data' derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods to handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning, and naïve Bayes classifiers.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jing Xu ◽  
Xiangdong Liu ◽  
Qiming Dai

Abstract. Background: Hypertrophic cardiomyopathy (HCM) represents one of the most common inherited heart diseases. To identify key molecules involved in the development of HCM, gene expression patterns of heart tissue samples from HCM patients obtained from multiple microarray and RNA-seq platforms were investigated. Methods: The significant genes were obtained through the intersection of two gene sets corresponding to the differentially expressed genes (DEGs) identified within the microarray data and within the RNA-seq data. These genes were further ranked using the minimum-redundancy maximum-relevance feature selection algorithm. Moreover, the genes were assessed by three different machine learning methods for classification: support vector machines, random forest, and k-nearest neighbor. Results: Outstanding results were achieved by taking exclusively the top eight genes of the ranking into consideration. Since these eight genes were identified as candidate HCM hallmark genes, the interactions between them and known HCM disease genes were explored through a protein–protein interaction (PPI) network. Most candidate HCM hallmark genes were found to have direct or indirect interactions with known HCM disease genes in the PPI network, particularly the hub genes JAK2 and GADD45A. Conclusions: This study highlights transcriptomic data integration, in combination with machine learning methods, as a way of providing insight into the key hallmark genes in the genetic etiology of HCM.
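
Below is a minimal sketch of the gene-selection-plus-classification idea under stated assumptions: two DEG lists are intersected, the shared genes are ranked with a mutual-information filter (a simple stand-in for the mRMR algorithm used in the study), and three classifiers are scored on the top genes. The gene names, expression matrix, and labels are placeholders.

```python
# Minimal sketch: DEG intersection, filter-based ranking, three classifiers.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

deg_microarray = {"JAK2", "GADD45A", "MYH7", "NPPA", "FHL1", "ACTA1"}   # placeholder list
deg_rnaseq = {"JAK2", "GADD45A", "MYH7", "NPPB", "FHL1", "TNNT2"}       # placeholder list
shared = sorted(deg_microarray & deg_rnaseq)   # genes significant on both platforms

rng = np.random.default_rng(4)
X = rng.normal(size=(60, len(shared)))         # placeholder expression matrix
y = rng.integers(0, 2, 60)                     # placeholder HCM / control labels

# Mutual-information ranking as a stand-in for mRMR.
X_top = SelectKBest(mutual_info_classif, k=min(8, len(shared))).fit_transform(X, y)
for name, clf in [("SVM", SVC()),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("kNN", KNeighborsClassifier())]:
    acc = cross_val_score(clf, X_top, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```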


Energies ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4595
Author(s):  
Parisa Asadi ◽  
Lauren E. Beckingham

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation of these images is essential but, as with any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful approach to reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess their potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as a modified U-Net model, were applied to the extracted input features. The models' performances were then compared and contrasted to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising, with all classification algorithms achieving high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with a linear combination of focal and dice loss also performed well, with accuracies of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
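
The sketch below illustrates the feature-based segmentation idea under stated assumptions (not the study's workflow): per-pixel features are built from a grayscale slice with a few standard filters, and a random forest assigns each pixel to a phase. The VGG16 features, the U-Net model, and real CT data are not reproduced; the image, labels, and filter choices are placeholders.

```python
# Minimal sketch: filter-based per-pixel features + random forest segmentation.
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
image = rng.random((64, 64))              # placeholder grayscale CT slice
labels = rng.integers(0, 3, (64, 64))     # placeholder phase labels (e.g., pore/grain/clay)

# Per-pixel feature stack: raw intensity plus smoothed and edge responses.
feats = np.stack([
    image,
    ndimage.gaussian_filter(image, sigma=2),
    ndimage.median_filter(image, size=3),
    ndimage.sobel(image),
], axis=-1).reshape(-1, 4)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(feats, labels.ravel())
segmented = clf.predict(feats).reshape(image.shape)  # pixel-level phase map
```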

