Big data and machine learning for materials science

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Jose F. Rodrigues ◽  
Larisa Florea ◽  
Maria C. F. de Oliveira ◽  
Dermot Diamond ◽  
Osvaldo N. Oliveira

Abstract: Herein, we review aspects of leading-edge research and innovation in materials science that exploit big data and machine learning (ML), two computer science concepts that combine to yield computational intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. However, the potential benefits of ML come at the cost of big data production; that is, the algorithms demand large volumes of data of various natures and from different sources, from material properties to sensor data. In the survey, we propose a roadmap for future developments with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to materials science, outlining processes, discussing pitfalls, and reviewing cases of success and failure.

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Sivaraman Ramachandramurthy ◽  
Srinivasan Subramaniam ◽  
Chandrasekeran Ramasamy

Big Data is the buzzword of the modern century. With the invasion of pervasive computing, we live in a data-centric environment, where we always leave a trail of data related to our day-to-day activities. Whether it is a visit to a shopping mall or a hospital, or simply surfing the Internet, we create voluminous data related to credit card transactions, user details, location information, and so on. These trails of data effectively define an individual and form the backbone of user profiling. With mobile phones and their easy, on-the-go access to online social networks, sensor data such as geotags, events, and the sentiments around them add to the already overwhelming data containers. With reductions in the cost of storage and computational devices, and with the increasing proliferation of the Cloud, storing and processing such data face few practical constraints. Eventually we end up with several exabytes of data, and analysing them for their usefulness has opened new frontiers of research. Effective distillation of these data is the need of the hour to improve the veracity of Big Data. This research targets the use of a Fuzzy Bayesian process model to improve the quality of information in Big Data.
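The abstract does not spell out the paper's Fuzzy Bayesian process model, but the general idea of combining fuzzy membership with Bayesian updating to score record veracity can be sketched. The membership function, likelihood values, and sensor range below are all invented for illustration.

```python
# Illustrative sketch only: a fuzzy membership function grades how plausible
# a reading is, and that grade drives a Bayesian update of the probability
# that the data source is reliable. All numbers here are hypothetical.

def triangular_membership(x, low, peak, high):
    """Degree (0..1) to which x is a 'plausible' value for this sensor."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

def update_trust(prior, membership, tpr=0.9, fpr=0.3):
    """One Bayesian update of P(source is reliable).

    The fuzzy membership interpolates between the 'consistent evidence'
    likelihood (tpr) and the 'inconsistent evidence' likelihood (fpr).
    """
    like_reliable = fpr + membership * (tpr - fpr)
    like_unreliable = tpr - membership * (tpr - fpr)
    evidence = like_reliable * prior + like_unreliable * (1 - prior)
    return like_reliable * prior / evidence

# Score a stream of readings against an expected range 18..28 (peak 23);
# the outlier 40.0 pulls the trust score down, the plausible readings raise it.
trust = 0.5
for reading in [22.5, 23.1, 40.0, 22.8]:
    m = triangular_membership(reading, 18, 23, 28)
    trust = update_trust(trust, m)
print(round(trust, 3))
```

A record-quality filter could then keep only sources whose trust stays above a chosen cutoff.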


2019 ◽  
Vol 8 (3) ◽  
pp. 3257-3263

Around 2.5 quintillion bytes of data have been created online, most of it generated in the last two years. This huge amount of data is produced from many different sources and devices, such as sensors capturing climate information, social networking sites, banking records, and e-commerce records. This data is known as Big Data. It is mainly characterized by the three V's: volume, velocity, and variety. Variety refers to the different formats of data originating from various data sources. Hence, Big Data's variety issue is significant in explaining some genuine challenges. The Semantic Web is used as an integrator to join information from different kinds of data sources, such as web services, relational databases, and spreadsheets, in various formats. The Semantic Web is an extension of the present Web that provides easier ways to search, reuse, combine, and share data. It can thus be seen as an integrator across diverse content, information applications, and systems. This paper is an effort to uncover the nature of big data and gives a brief survey of various Semantic Web-based methods and tools used to add value to today's big data. In addition, it discusses a case study on applying various machine learning functionalities to news articles and proposes a web-based framework for the classification and integration of news article big data using ontologies.
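The ontology-driven classification of news articles described above can be sketched in miniature. The concept hierarchy, categories, and articles below are invented; a production system would use an RDF/OWL ontology and a triple store rather than Python dicts.

```python
# Toy ontology-based classifier (stdlib only, hypothetical data): each
# concept points to its parent concept; roots of the hierarchy are the
# top-level news categories.
ONTOLOGY = {
    "goal": "football", "football": "sports", "sports": None,
    "inflation": "economy", "stocks": "economy", "economy": None,
    "election": "politics", "politics": None,
}

def top_category(concept):
    """Walk up the subclass hierarchy until a root category is reached."""
    while ONTOLOGY.get(concept) is not None:
        concept = ONTOLOGY[concept]
    return concept

def classify(article_text):
    """Count ontology concepts found in the text and vote on the category."""
    votes = {}
    for token in article_text.lower().split():
        word = token.strip(".,!?")
        if word in ONTOLOGY:
            cat = top_category(word)
            votes[cat] = votes.get(cat, 0) + 1
    return max(votes, key=votes.get) if votes else "unknown"

print(classify("A late goal settled the football match."))
print(classify("Stocks fell as inflation data surprised."))
```

The ontology is what lets "goal" and "football", harvested from quite different data sources, both resolve to the same integrated category.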


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Rafał Kozik ◽  
Marek Pawlicki ◽  
Michał Choraś

The recent advancement of malevolent techniques has created a situation where the traditional signature-based approach to cyberattack detection is rendered ineffective. New, improved, potent solutions are needed, incorporating Big Data technologies, effective distributed machine learning, and algorithms countering the data imbalance problem. Therefore, the major contribution of this paper is the proposal of a cost-sensitive distributed machine learning approach for cybersecurity. In particular, we propose and implement cost-sensitive distributed machine learning by means of distributed Extreme Learning Machines (ELM), distributed Random Forests, and distributed Random Boosted-Trees to detect botnets. The system's concept and architecture are based on a Big Data processing framework with data mining and machine learning techniques. As a practical use case, we consider the problem of botnet detection by analysing data in the form of NetFlows. The reported results are promising and show that the proposed system can be considered a useful tool for improving cybersecurity.
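The cost-sensitive principle behind such systems can be illustrated independently of the distributed learners the paper uses (ELM, Random Forests, boosted trees). The cost values below are assumptions, not the paper's: with imbalanced botnet traffic, a missed botnet flow is costlier than a false alarm, which shifts the decision threshold on P(botnet) well below 0.5.

```python
# Hedged sketch of cost-sensitive classification (illustrative costs).
COST_FP = 1.0   # cost of flagging a benign flow (assumed)
COST_FN = 10.0  # cost of missing a botnet flow (assumed)

def cost_sensitive_decision(p_botnet):
    """Flag a flow when the expected cost of ignoring it exceeds
    the expected cost of flagging it.

    Expected cost of predicting benign: p_botnet * COST_FN
    Expected cost of predicting botnet: (1 - p_botnet) * COST_FP
    """
    if p_botnet * COST_FN > (1 - p_botnet) * COST_FP:
        return "botnet"
    return "benign"

# The implied threshold is COST_FP / (COST_FP + COST_FN) = 1/11, roughly 0.09,
# so even low-probability botnet flows are routed to inspection.
for p in [0.05, 0.09, 0.10, 0.5]:
    print(p, cost_sensitive_decision(p))
```

In a distributed classifier the same effect is usually achieved by weighting the minority class during training rather than at decision time; the threshold view above is the simplest way to see why imbalance-aware costs matter.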


Author(s):  
E. B. Priyanka ◽  
S. Thangavel ◽  
D. Venkatesa Prabu

Big data and analytics may be new to some industries, but the oil and gas industry has long dealt with large quantities of data to make technical decisions. Oil producers can capture more detailed data in real time, at lower cost, and from previously inaccessible areas to improve oilfield and plant performance. Stream computing is a new way of analyzing high-frequency data for real-time complex event processing and for scoring data against a physics-based or empirical model for predictive analytics, without having to store the data. Hadoop MapReduce and other NoSQL approaches are a new way of analyzing massive volumes of data to support reservoir, production, and facilities engineering. Hence, this chapter enumerates the routing organization of IoT with smart applications aggregating real-time oil pipeline sensor data as big data subjected to machine learning algorithms on the Hadoop platform.
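The MapReduce pattern the chapter applies on Hadoop can be sketched in plain Python. Real deployments would run the same logic via Hadoop Streaming or Spark over many nodes; the sensor records below are invented (sensor id, pressure in psi).

```python
# Stdlib-only simulation of the map -> shuffle -> reduce phases, computing
# mean pipeline pressure per sensor from hypothetical readings.
from itertools import groupby
from operator import itemgetter

readings = [
    ("pipe-A", 410.0), ("pipe-B", 395.5), ("pipe-A", 415.2),
    ("pipe-B", 400.1), ("pipe-A", 408.8),
]

# Map phase: emit (key, value) pairs -- the identity here, but this is where
# parsing and filtering of raw sensor records would happen.
mapped = [(sensor, psi) for sensor, psi in readings]

# Shuffle phase: bring all values for the same key together.
mapped.sort(key=itemgetter(0))

# Reduce phase: aggregate each key's values into a mean pressure.
averages = {}
for key, group in groupby(mapped, key=itemgetter(0)):
    values = [v for _, v in group]
    averages[key] = round(sum(values) / len(values), 2)

print(averages)
```

The appeal of the model is that the map and reduce functions stay this simple while the framework handles partitioning the exabyte-scale input across the cluster.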


2021 ◽  
Vol 37 (2) ◽  
pp. 341-365
Author(s):  
Mattia Zeni ◽  
Ivano Bison ◽  
Fernando Reis ◽  
Britta Gauckler ◽  
Fausto Giunchiglia

Abstract: This article assesses the experience with i-Log at the European Big Data Hackathon 2019, a satellite event of the New Techniques and Technologies for Statistics (NTTS) conference, organised by Eurostat. i-Log is a system that enables capturing personal big data from smartphones' internal sensors to be used for time use measurement. It allows the collection of heterogeneous types of data, enabling new possibilities for sociological urban field studies. Sensor data such as those related to the location or the movements of the user can be used to investigate and gain insights into the time diaries' answers and assess their overall quality. The key idea is that the users' answers are used to train machine-learning algorithms, allowing the system to learn from the user's habits and to generate new time diaries' answers. In turn, these new labels can be used to assess the quality of existing ones, or to fill the gaps when the user does not provide an answer. The aim of this paper is to introduce the pilot study, the i-Log system and the methodological evidence that emerged during the survey.
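The gap-filling idea described above, diary answers training a model on sensor features, which then predicts missing answers, can be sketched minimally. The data, the single movement-level feature, and the 1-nearest-neighbour rule are all stand-ins invented for illustration; i-Log's actual pipeline is more elaborate.

```python
# Hypothetical sketch: past (sensor feature, diary answer) pairs act as
# training data, and a 1-nearest-neighbour lookup imputes the answer for a
# time slot the user left blank.
labelled = [  # (movement_level, diary answer the user gave)
    (0.1, "sleeping"), (0.2, "sleeping"),
    (1.5, "commuting"), (1.7, "commuting"),
    (0.6, "working"), (0.7, "working"),
]

def impute_answer(movement_level):
    """Predict the missing time-diary answer from the nearest labelled point."""
    nearest = min(labelled, key=lambda pair: abs(pair[0] - movement_level))
    return nearest[1]

print(impute_answer(0.18))  # close to the 'sleeping' examples
print(impute_answer(1.65))  # close to the 'commuting' examples
```

The same predictions, compared against answers the user did give, provide the quality check on existing labels that the abstract mentions.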


2020 ◽  
Author(s):  
Jin Soo Lim ◽  
Jonathan Vandermause ◽  
Matthijs A. van Spronsen ◽  
Albert Musaelian ◽  
Christopher R. O’Connor ◽  
...  

Restructuring of interfaces plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different compositions and morphologies at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast, large-scale machine-learning molecular dynamics, followed by our newly developed method for the complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.


Author(s):  
Leanne Findlay ◽  
Dafna Kohen

Affordability of child care is fundamental to parents', and in particular women's, decision to work. However, information on the cost of care in Canada is limited. The purpose of the current study was to examine the feasibility of using linked survey and administrative data to compare and contrast parent-reported child care costs from two different data sources. The linked file brings together data from the 2011 General Social Survey (GSS) and the annual tax files (TIFF) for the corresponding year (2010). Descriptive analyses were conducted to examine the socio-demographic and employment characteristics of respondents who reported using child care, and child care costs were compared. In 2011, parents who reported currently paying for child care (GSS) spent almost $6,700 per year ($7,500 for children aged 5 and under). According to the tax files, individuals claimed just over $3,900 per year ($4,700). Approximately one in four individuals who reported child care costs on the GSS did not report any amount on their tax file; about four in ten who claimed child care on the tax file did not report any cost on the survey. Multivariate analyses suggested that individuals with lower education, lower income, Indigenous identity, or self-employment were less likely to make a tax claim despite reporting child care expenses on the GSS. Further examination of child care costs by province and by type of care is necessary, as is research to determine the most accurate way to measure and report child care costs.


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.
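The abstract does not define the Binary Spectrum construction, so the multi-threshold binarisation below is only a hypothetical illustration of expanding one continuous feature into a binary vector; the paper's actual transformation may differ.

```python
# Hypothetical sketch of a binary feature expansion: one continuous value
# becomes a vector of threshold indicators, which a downstream classifier
# (e.g. an SVM) can then consume in place of the raw feature.
def binary_spectrum(value, lo, hi, n_bits=8):
    """Expand a continuous value in [lo, hi] into n_bits threshold flags."""
    step = (hi - lo) / n_bits
    thresholds = [lo + step * (i + 1) for i in range(n_bits)]
    return [1 if value >= t else 0 for t in thresholds]

# A reading of 0.55 on a 0..1 feature lights up the lower half of the spectrum.
print(binary_spectrum(0.55, 0.0, 1.0))
```

Such expansions can help a kernel classifier separate classes that are interleaved along the raw feature axis, which is one plausible reading of the accuracy gain reported in the abstract.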

