H-GPR: A Hybrid Strategy for Large-Scale Gaussian Process Regression

Author(s):  
Naiqi Li ◽  
Yinghua Gao ◽  
Wenjie Li ◽  
Yong Jiang ◽  
Shu-Tao Xia
2019 ◽  
Vol 31 (1) ◽  
pp. 41-53


Author(s):  
Hashem Ahmadin ◽  
Karim Zare ◽  
Majid Monajjemi ◽  
Ali Shamel

Today, graphene oxide is produced at large scale by thermal and chemical reduction and by solution processing. Since the various methods for producing graphene each impart different properties to the product, the purpose of this research is to investigate graphite delamination using anionic surfactants and to produce graphene by means of computational methods. For the molecular dynamics analysis, the (minimum) force that each atom exerts on the other atoms was calculated first; this force is the negative gradient of the system's total energy with respect to the coordinates of the corresponding atom. A Bayesian method was used for the dynamic modeling, which averages over the dynamics parameters rather than relying on their point estimates. The Gaussian process dynamical model (GPDM) is fully defined by a set of low-dimensional representations of the data, with both the dynamics and the observation mapping modeled by Gaussian process regression. Then, using the Gaussian software, either in combination with empirical results or on its own, the molecular states, the reactions, and their mechanisms were simulated. The results indicated that the presence of benzene, ether, and carboxyl groups in the optimized structure facilitates the entry of the surfactants between the sheets and initiates the separation of graphene sheets that adhere to each other. Comparing the results of this study for the two surfactants showed that the interlayer gap opened during separation of the graphene plates differs between the two surfactants. Besides, the difference in the polarity of the surfactants resulted in a different final polarization of the surfactant-graphene system; this difference in polarity therefore causes the difference in solubility.
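The force calculation described above is a standard molecular dynamics step. A minimal sketch (not the authors' code; the Lennard-Jones pair potential and all parameters are illustrative assumptions) of computing each atom's force as the negative gradient of the total energy:

```python
import numpy as np

def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a configuration (coords: n_atoms x 3)."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            energy += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return energy

def forces(coords, h=1e-6):
    """Force on each atom: the negative numerical gradient of the energy
    with respect to that atom's coordinates (central differences)."""
    f = np.zeros_like(coords)
    for i in range(coords.shape[0]):
        for k in range(3):
            shifted = coords.copy()
            shifted[i, k] += h
            e_plus = lj_energy(shifted)
            shifted[i, k] -= 2 * h
            e_minus = lj_energy(shifted)
            f[i, k] = -(e_plus - e_minus) / (2 * h)
    return f

atoms = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0]])
print(forces(atoms))  # forces in reduced Lennard-Jones units
```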


2021 ◽  
Author(s):  
Yanghui Zhao ◽  
Bryan Riel

Seamounts are isolated underwater volcanoes with more than 100 m of relief. This kind of volcanism arises from the lithosphere or asthenosphere through fractional melting and is a direct manifestation of the tectonic-magmatic activity of the Earth's interior. While previous studies have quantified the global distribution of seamounts by their physical properties (e.g., height, semimajor axis, angle, etc.), these studies usually (1) assume an elliptical cone to model seamount shape, and (2) neglect the sediment coverage on the seamount, which results in significant uncertainties when comparing properties of seamounts near the continents, covered with thick sediments, to those in the open ocean, covered with thin sediments.

We apply large-scale Gaussian process regression to recover the seamount topography buried beneath sediments, in order to obtain an accurate distribution of volcanism in the South China Sea basin (with an average sediment thickness of 1.5 km) and the entire Pacific Ocean (with < 300 m of sediment). Specifically, we first use Tophat filtering to isolate the short-spatial-wavelength seamount topography above the long-wavelength seafloor. Subsequently, we apply Gaussian process regression to learn the seamount structure above the seafloor in order to extrapolate the structure beneath the sediment. Lastly, we compute the seamount volume above the sedimentary basement (i.e., the top boundary of the oceanic crust) and compare it to the volume above the seafloor. Our results show that, for the South China Sea, there is a significant increase in estimated seamount volume above the basement compared to above the seafloor. For the Pacific Ocean, owing to the thin sediment coverage, we observe negligible differences between the two volume estimates. Thus, analysis of seamount properties in marginal basins of the West Pacific with thick sediment coverage can lead to significant underestimation of volcanism intensity if sub-seafloor topography is not accounted for. For these marginal basins, which lack massive hotspots or apparent evidence of mantle plumes, normal plate tectonic processes are likely responsible for the intensive oceanic volcanism.
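As a rough illustration of the two-step workflow described above, the following 1-D sketch uses synthetic bathymetry; the Tophat window, the GP kernel and length scale, and the assumed 300 m sediment level are illustrative choices, not values from the study:

```python
import numpy as np
from scipy.ndimage import white_tophat
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic bathymetry: long-wavelength seafloor trend plus one seamount.
x = np.linspace(0, 100, 500)                      # distance along profile, km
seafloor = -4000 + 2.0 * x                        # regional trend, m
seamount = 1500 * np.exp(-((x - 50) ** 2) / 50)   # seamount relief, m
depth = seafloor + seamount

# Step 1: Tophat filtering removes the long-wavelength background,
# leaving the short-wavelength seamount relief.
relief = white_tophat(depth, size=150)

# Step 2: fit a GP only to the relief exposed above an assumed 300 m
# sediment level, then extrapolate the buried flanks everywhere.
exposed = relief > 300.0
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0), normalize_y=True)
gp.fit(x[exposed].reshape(-1, 1), relief[exposed])
pred, std = gp.predict(x.reshape(-1, 1), return_std=True)

# Compare volume proxies (per unit width) above basement vs. above seafloor.
dx = x[1] - x[0]
v_basement = np.sum(np.clip(pred, 0.0, None)) * dx
v_seafloor = np.sum(np.clip(relief - 300.0, 0.0, None)) * dx
print(f"above basement: {v_basement:.0f} m*km, above seafloor: {v_seafloor:.0f} m*km")
```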


2019 ◽  
Vol 12 (9) ◽  
pp. 5161-5181 ◽  
Author(s):  
Tongshu Zheng ◽  
Michael H. Bergin ◽  
Ronak Sutaria ◽  
Sachchida N. Tripathi ◽  
Robert Caldow ◽  
...  

Abstract. Wireless low-cost particulate matter sensor networks (WLPMSNs) are transforming air quality monitoring by providing particulate matter (PM) information at finer spatial and temporal resolutions. However, large-scale WLPMSN calibration and maintenance remain a challenge. The manual labor involved in initial calibration by collocation and routine recalibration is intensive. The transferability of the calibration models determined from initial collocation to new deployment sites is questionable, as calibration factors typically vary with the urban heterogeneity of operating conditions and aerosol optical properties. Furthermore, low-cost sensors can drift or degrade over time. This study presents a simultaneous Gaussian process regression (GPR) and simple linear regression pipeline to calibrate and monitor dense WLPMSNs on the fly by leveraging all available reference monitors across an area without resorting to pre-deployment collocation calibration. We evaluated our method for Delhi, where the PM2.5 measurements of all 22 regulatory reference and 10 low-cost nodes were available for 59 d from 1 January to 31 March 2018 (PM2.5 averaged 138±31 µg m−3 among 22 reference stations), using a leave-one-out cross-validation (CV) over the 22 reference nodes. We showed that our approach can achieve an overall 30 % prediction error (RMSE: 33 µg m−3) at a 24 h scale, and it is robust, as underscored by the small variability in the GPR model parameters and in the model-produced calibration factors for the low-cost nodes among the 22-fold CV. Of the 22 reference stations, high-quality predictions were observed for those stations whose PM2.5 means were close to the Delhi-wide mean (i.e., 138±31 µg m−3), and relatively poor predictions were observed for those nodes whose means differed substantially from the Delhi-wide mean (particularly on the lower end). We also observed washed-out local variability in PM2.5 across the 10 low-cost sites after calibration using our approach, which stands in marked contrast to the true wide variability across the reference sites. These observations revealed that our proposed technique (and, more generally, the geostatistical technique) requires high spatial homogeneity in the pollutant concentrations to be fully effective. We further demonstrated that our algorithm's performance is insensitive to training window size, as the mean prediction error rate and the standard error of the mean (SEM) for the 22 reference stations remained consistent at ∼30 % and ∼3 %–4 %, respectively, when an increment of 2 d of data was included in the model training. The markedly low training-data requirement of our algorithm enables the models to remain nearly up to date in the field, thus realizing the algorithm's full potential for dynamically surveilling large-scale WLPMSNs by detecting malfunctioning low-cost nodes and tracking drift with little latency. Our algorithm presented similarly stable 26 %–34 % mean prediction errors and ∼3 %–7 % SEMs over the sampling period when pre-trained on the current week's data and predicting 1 week ahead, and therefore it is suitable for online calibration. Simulations conducted using our algorithm suggest that, in addition to dynamic calibration, the algorithm can also be adapted for automated monitoring of large-scale WLPMSNs.
In these simulations, the algorithm was able to differentiate malfunctioning low-cost nodes (due to either hardware failure or the heavy influence of local sources) within a network by identifying aberrant model-generated calibration factors (i.e., slopes close to zero and intercepts close to the Delhi-wide mean of true PM2.5). The algorithm was also able to track the drift of low-cost nodes accurately, within 4 % error, for all the simulation scenarios. The simulation results showed that ∼20 reference stations are optimum for our solution in Delhi and confirmed that low-cost nodes can extend the spatial precision of a network by decreasing the extent of pure interpolation among only reference stations. Our solution has substantial implications for reducing the amount of manual labor needed for the calibration and surveillance of extensive WLPMSNs, improving the spatial comprehensiveness of PM evaluation, and enhancing the accuracy of WLPMSNs.
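For readers unfamiliar with the pipeline, the following sketch illustrates its core idea on synthetic data: fit a GP to the reference monitors each day, predict PM2.5 at a low-cost node's location, and regress the node's raw readings against those predictions to obtain its calibration slope and intercept. The coordinates, kernel, and noise levels are assumptions, not the study's settings:

```python
import numpy as np
from scipy import stats
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def true_field(xy, day):
    """Synthetic 'true' PM2.5 field varying smoothly in space and time."""
    return 138 + 31 * np.sin(0.1 * xy[:, 0] + 0.3 * day)

# Synthetic network: 22 reference stations, 1 low-cost node, 59 days.
ref_xy = rng.uniform(0, 50, size=(22, 2))   # station coordinates, km
node_xy = np.array([[25.0, 25.0]])

raw, gp_at_node = [], []
for day in range(59):
    ref_pm = true_field(ref_xy, day) + rng.normal(0, 5, 22)
    gp = GaussianProcessRegressor(
        kernel=RBF(length_scale=10.0) + WhiteKernel(noise_level=25.0),
        normalize_y=True,
    )
    gp.fit(ref_xy, ref_pm)                      # daily GP over references
    gp_at_node.append(gp.predict(node_xy)[0])   # GP estimate at the node
    # The low-cost node reads the truth with a gain error and an offset.
    raw.append(0.7 * true_field(node_xy, day)[0] + 20 + rng.normal(0, 8))

# Simple linear regression of raw readings onto the GP estimates gives
# the node's calibration factors (slope and intercept).
slope, intercept, r, _, _ = stats.linregress(raw, gp_at_node)
print(f"calibration: slope={slope:.2f}, intercept={intercept:.1f}, r={r:.2f}")
```

A malfunctioning node would show up in this scheme as a slope near zero with an intercept near the area-wide mean, which is exactly the signature the abstract describes.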


2019 ◽  
Vol 165 ◽  
pp. 208-218 ◽  
Author(s):  
Bingshui Da ◽  
Yew-Soon Ong ◽  
Abhishek Gupta ◽  
Liang Feng ◽  
Haitao Liu


2020 ◽  
Author(s):  
Marc Philipp Bahlke ◽  
Natnael Mogos ◽  
Jonny Proppe ◽  
Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets with potential applications in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied before. We employ Gaussian process regression because it can potentially cope with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and because it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor and highlighting the interesting question of the role of chemical intuition versus systematic or automated feature selection for machine learning in chemistry and materials science.
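As an illustration of the modeling setup this abstract describes, the following sketch fits a GP with per-prediction uncertainty to a simple two-feature descriptor (bridge angle and Cu-Cu distance); the data, target function, and kernel are synthetic stand-ins, not the authors' 257-complex data set:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Simple, chemically intuitive descriptor:
# [Cu-bridge-Cu angle (degrees), Cu-Cu distance (Angstrom)].
X = np.column_stack([
    rng.uniform(85, 105, 257),   # bridge angle
    rng.uniform(2.8, 3.4, 257),  # Cu-Cu distance
])
# Toy stand-in for the exchange coupling J (cm^-1), roughly linear in the
# bridge angle, as magnetostructural correlations would suggest.
J = -15 * (X[:, 0] - 97.5) + rng.normal(0, 10, 257)

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=[5.0, 0.2]) + WhiteKernel(noise_level=100.0),
    normalize_y=True,
)
gp.fit(X[:200], J[:200])

# GPR returns uncertainty estimates ("error bars") with each prediction.
mean, std = gp.predict(X[200:], return_std=True)
rmse = np.sqrt(np.mean((mean - J[200:]) ** 2))
print(f"test RMSE: {rmse:.1f} cm^-1, mean predictive sigma: {std.mean():.1f}")
```

Because the toy target is nearly linear in the descriptor, a plain linear ridge regression would do about as well here, mirroring the comparison the authors report for their sophisticated descriptors.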

