Predictive and generative machine learning models for photonic crystals

Nanophotonics ◽  
2020 ◽  
Vol 9 (13) ◽  
pp. 4183-4192 ◽  
Author(s):  
Thomas Christensen ◽  
Charlotte Loh ◽  
Stjepan Picek ◽  
Domagoj Jakobović ◽  
Li Jing ◽  
...  

Abstract The prediction and design of photonic features have traditionally been guided by theory-driven computational methods, spanning a wide range of direct solvers and optimization techniques. Motivated by enormous advances in the field of machine learning, there has recently been a growing interest in developing complementary data-driven methods for photonics. Here, we demonstrate several predictive and generative data-driven approaches for the characterization and inverse design of photonic crystals. Concretely, we built a data set of 20,000 two-dimensional photonic crystal unit cells and their associated band structures, enabling the training of supervised learning models. Using this data set, we demonstrate a high-accuracy convolutional neural network for band structure prediction, with orders-of-magnitude speedup compared to conventional theory-driven solvers. Separately, we demonstrate an approach to high-throughput inverse design of photonic crystals via generative adversarial networks, with the design goal of substantial transverse-magnetic band gaps. Our work highlights photonic crystals as a natural application domain and test bed for the development of data-driven tools in photonics and the natural sciences.
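As a rough illustration of the predictive model described above, the following is a minimal sketch, assuming a PyTorch setup, of a convolutional network that maps a two-dimensional unit-cell image to band frequencies sampled along a Brillouin-zone path; the input resolution, band count, and k-point count are placeholder assumptions, not the authors' architecture.

```python
# Minimal sketch (PyTorch): a CNN maps a 2D photonic-crystal unit cell
# (binary permittivity image) to the lowest few TM bands sampled at k-points.
# The 32x32 / 6-band / 31-k-point choices are illustrative assumptions.
import torch
import torch.nn as nn

class BandPredictor(nn.Module):
    def __init__(self, n_bands: int = 6, n_kpoints: int = 31):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, n_bands * n_kpoints),  # one frequency per (band, k)
        )
        self.n_bands, self.n_kpoints = n_bands, n_kpoints

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 32, 32) unit-cell image -> (batch, n_bands, n_kpoints)
        return self.head(self.features(x)).view(-1, self.n_bands, self.n_kpoints)

model = BandPredictor()
dummy_cells = torch.rand(8, 1, 32, 32)            # stand-in for real unit cells
print(model(dummy_cells).shape)                   # torch.Size([8, 6, 31])
```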

2021 ◽  
Vol 14 (3) ◽  
pp. 119
Author(s):  
Fabian Waldow ◽  
Matthias Schnaubelt ◽  
Christopher Krauss ◽  
Thomas Günter Fischer

In this paper, we demonstrate how a well-established machine learning-based statistical arbitrage strategy can be successfully transferred from equity to futures markets. First, we preprocess futures time series comprised of front months to render them suitable for our returns-based trading framework and compile a data set of 60 futures covering nearly 10 trading years. Next, we train several machine learning models to predict whether the h-day-ahead return of each future out- or underperforms the corresponding cross-sectional median return. We then enter long/short positions for the top/flop-k futures for a duration of h days and assess the financial performance of the resulting portfolio in an out-of-sample testing period. We find that the machine learning models yield statistically significant out-of-sample break-even transaction costs of 6.3 bp, a clear challenge to the semi-strong form of market efficiency. Finally, we discuss sources of profitability and the robustness of our findings.
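For readers who want to experiment with this kind of setup, here is a minimal sketch, assuming daily returns stored in a pandas date-by-future matrix, of the labeling step (out-/underperformance of the cross-sectional median over h days) and the top/flop-k portfolio-formation step; the column layout, k, and h are placeholder choices rather than the paper's exact pipeline.

```python
# Hedged sketch of labeling and portfolio formation: a future is labeled 1 if
# its h-day-ahead return exceeds the cross-sectional median, and positions go
# long the top-k / short the flop-k predicted probabilities.
import pandas as pd

def make_labels(returns: pd.DataFrame, h: int = 5) -> pd.DataFrame:
    """returns: dates x futures matrix of daily returns."""
    fwd = (1 + returns).rolling(h).apply(lambda x: x.prod(), raw=True) - 1
    fwd = fwd.shift(-h)                              # h-day-ahead forward return
    median = fwd.median(axis=1)                      # cross-sectional median per date
    return fwd.gt(median, axis=0).astype(int)        # 1 = outperforms the median

def form_positions(pred_prob: pd.Series, k: int = 5) -> pd.Series:
    """pred_prob: predicted out-performance probability per future on one date."""
    ranked = pred_prob.sort_values(ascending=False)
    pos = pd.Series(0.0, index=pred_prob.index)
    pos[ranked.index[:k]] = 1.0 / k                  # long the top-k
    pos[ranked.index[-k:]] = -1.0 / k                # short the flop-k
    return pos
```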


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract
Background: In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is a necessity to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand.
Methods: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic-regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao's garbled circuits, and a series of cryptographic engineering optimizations to improve the performance.
Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.
Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high-dimensional genome data distributed across a local area network.
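To make the model concrete, the following is a minimal plaintext sketch, assuming NumPy and a squared-error loss, of a logistic-regression-like classifier whose sigmoid is replaced by a clipped ReLU and which is trained by gradient descent; in the paper this computation is carried out under secure two-party computation, which is not shown here, and the learning rate, epoch count, and loss are placeholder assumptions.

```python
# Plaintext (non-secure) sketch of the model being trained: a logistic-
# regression-like classifier with a clipped-ReLU activation, fitted by
# gradient descent on a squared loss. Hyperparameters are assumptions.
import numpy as np

def clipped_relu(z: np.ndarray) -> np.ndarray:
    # Piecewise-linear surrogate for the sigmoid: 0 below -0.5, 1 above +0.5.
    return np.clip(z + 0.5, 0.0, 1.0)

def train(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        z = X @ w
        p = clipped_relu(z)                          # predicted probabilities
        active = (z > -0.5) & (z < 0.5)              # where the clip is not saturated
        grad = X.T @ ((p - y) * active) / len(y)     # squared-loss gradient
        w -= lr * grad
    return w

# Toy usage with random stand-in features (not real gene expression data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
w = train(X, y)
print(clipped_relu(X @ w)[:5])                       # first few predictions
```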


2021 ◽  
Vol 143 (3) ◽  
Author(s):  
Suhui Li ◽  
Huaxin Zhu ◽  
Min Zhu ◽  
Gang Zhao ◽  
Xiaofeng Wei

Abstract Conventional physics-based or experiment-based approaches to gas turbine combustion tuning are time consuming and cost intensive. Recent advances in data analytics provide an alternative method. In this paper, we present a cross-disciplinary study on the combustion tuning of an F-class gas turbine that combines machine learning with physics understanding. An artificial neural network (ANN) based model is developed to predict the combustion performance (outputs), including NOx emissions, combustion dynamics, combustor vibrational acceleration, and turbine exhaust temperature. The inputs of the ANN model are identified by analyzing the key operating variables that impact the combustion performance, such as the pilot and premixed fuel flows and the inlet guide vane angle. The ANN model is trained on field data from an F-class gas turbine power plant. The trained model describes the combustion performance with acceptable accuracy over a wide range of operating conditions. In combination with a genetic algorithm, the model is applied to optimize the combustion performance of the gas turbine. Results demonstrate that the data-driven method offers a promising alternative for combustion tuning at low cost and with fast turnaround.
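As a loose illustration of the optimization loop described above, the sketch below pairs a small neural-network surrogate (here scikit-learn's MLPRegressor fit on toy data, standing in for the field-data-trained ANN) with a simple genetic algorithm that searches normalized operating variables for a low combined penalty; the toy data, objective weights, and bounds are assumptions, not the authors' tuning setup.

```python
# Hedged sketch: a surrogate maps operating variables (pilot fuel flow,
# premixed fuel flow, inlet guide vane angle) to performance metrics, and a
# simple genetic algorithm searches for inputs minimizing a weighted objective.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Surrogate: 3 inputs -> 2 outputs (e.g. NOx, combustion-dynamics amplitude),
# fit here on synthetic data purely so the sketch runs end to end.
X_field = rng.uniform(0, 1, size=(500, 3))
y_field = np.c_[X_field @ [0.5, 1.0, -0.3], X_field @ [-0.2, 0.4, 0.8]]
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(X_field, y_field)

def objective(x: np.ndarray) -> float:
    nox, dynamics = surrogate.predict(x.reshape(1, -1))[0]
    return float(nox + 2.0 * max(dynamics, 0.0))     # weighted penalty (assumed)

def genetic_search(pop_size: int = 40, gens: int = 30, mutation: float = 0.1):
    pop = rng.uniform(0, 1, size=(pop_size, 3))      # normalized operating vars
    for _ in range(gens):
        scores = np.array([objective(ind) for ind in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]              # selection
        children = parents + mutation * rng.normal(size=parents.shape)  # mutation
        pop = np.clip(np.vstack([parents, children]), 0, 1)
    return pop[np.argmin([objective(ind) for ind in pop])]

print("suggested operating point:", genetic_search())
```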


Author(s):  
Xiaolong Guo ◽  
Yugang Yu ◽  
Gad Allon ◽  
Meiyan Wang ◽  
Zhentai Zhang

To support the 2021 Manufacturing & Service Operations Management (MSOM) Data-Driven Research Challenge, RiRiShun Logistics (a Haier group subsidiary focusing on logistics service for home appliances) provides MSOM members with logistics operational-level data for data-driven research. This paper provides a detailed description of the data associated with over 14 million orders from 149 clients (the consigners) associated with 4.2 million end consumers (the recipients and end users of the appliances) in China, involving 18,000 stock keeping units operated at 103 warehouses. Researchers are welcome to develop econometric models, data-driven optimization techniques, analytical models, and algorithm designs by using this data set to address questions suggested by company managers.


2022 ◽  
Vol 54 (9) ◽  
pp. 1-36
Author(s):  
Dylan Chou ◽  
Meng Jiang

In data-driven network intrusion detection (NID), attack classes are a small minority compared to normal traffic, and many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of machine learning models for intrusion detection by fitting them to unrepresentative “sandbox” datasets. This survey presents a taxonomy of eight main challenges and explores common datasets from 1999 to 2020. Trends in these challenges over the past decade are analyzed, and future directions are proposed for expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected from real-world networks.


Author(s):  
Maicon Herverton Lino Ferreira da Silva Barros ◽  
Geovanne Oliveira Alves ◽  
Lubnnia Morais Florêncio Souza ◽  
Élisson da Silva Rocha ◽  
João Fausto Lorenzato de Oliveira ◽  
...  

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory outcomes, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death from TB, thus aiding TB prognosis and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and an ensemble composed of the Random Forest (RF), GB, and Multi-layer Perceptron (MLP) models is the best model for predicting the cure class.
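A minimal sketch of the two controlled experiments, assuming a preprocessed numeric pandas DataFrame with a binary "outcome" column (1 = death, 0 = cure) and scikit-learn estimators: one run trains on the imbalanced data, the other on a majority-undersampled balance, and a soft-voting RF/GB/MLP ensemble mirrors the ensemble mentioned above; column names, the undersampling strategy, and hyperparameters are placeholders, not the authors' pipeline.

```python
# Hedged sketch of the imbalanced vs. balanced experiments plus an RF/GB/MLP
# soft-voting ensemble. Assumes df contains only numeric features + "outcome".
import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def balance(df: pd.DataFrame, target: str = "outcome") -> pd.DataFrame:
    # Undersample the majority class (cured) to match the minority (death).
    minority = df[df[target] == 1]
    majority = df[df[target] == 0].sample(len(minority), random_state=0)
    return pd.concat([minority, majority]).sample(frac=1.0, random_state=0)

def run_experiment(df: pd.DataFrame, balanced: bool) -> float:
    if balanced:
        df = balance(df)
    X, y = df.drop(columns=["outcome"]), df["outcome"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    ensemble = VotingClassifier(
        estimators=[("rf", RandomForestClassifier()),
                    ("gb", GradientBoostingClassifier()),
                    ("mlp", MLPClassifier(max_iter=1000))],
        voting="soft")
    ensemble.fit(X_tr, y_tr)
    return ensemble.score(X_te, y_te)                # held-out accuracy
```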


2021 ◽  
Vol 42 (12) ◽  
pp. 124101
Author(s):  
Thomas Hirtz ◽  
Steyn Huurman ◽  
He Tian ◽  
Yi Yang ◽  
Tian-Ling Ren

Abstract In a world where data is increasingly important for making breakthroughs, microelectronics is a field where data is sparse and hard to acquire. Only a few entities have the infrastructure required to automate the fabrication and testing of semiconductor devices, and this infrastructure is crucial for generating sufficient data for the use of new information technologies. This situation creates a divide between most researchers and industry. To address this issue, this paper introduces a widely applicable approach for creating custom datasets using simulation tools and parallel computing. The multi-I–V curves that we obtained were processed simultaneously using convolutional neural networks, which gave us the ability to predict a full set of device characteristics with a single inference. We demonstrate the potential of this approach through two concrete examples of useful deep learning models trained on the generated data. We believe that this work can act as a bridge between the state of the art in data-driven methods and more classical semiconductor research, such as device engineering, yield engineering, or process monitoring. Moreover, this research gives anybody the opportunity to start experimenting with deep neural networks and machine learning in the field of microelectronics, without the need for expensive experimental infrastructure.
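As an illustration of the kind of model described above, here is a minimal sketch, assuming PyTorch, of a 1D convolutional network that takes several simulated I–V curves as input channels and predicts a handful of device characteristics in a single inference; the channel count, curve length, and number of predicted parameters are placeholder assumptions rather than the authors' architecture.

```python
# Hedged sketch: a 1D CNN consumes a set of simulated I-V curves (one per
# channel) and regresses several device parameters in a single forward pass.
import torch
import torch.nn as nn

class DeviceCharacterizer(nn.Module):
    def __init__(self, n_curves: int = 4, curve_len: int = 128, n_params: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_curves, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),                          # 128 -> 64 samples
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),                          # 64 -> 32 samples
            nn.Flatten(),
            nn.Linear(32 * (curve_len // 4), 128), nn.ReLU(),
            nn.Linear(128, n_params),                 # predicted device characteristics
        )

    def forward(self, iv: torch.Tensor) -> torch.Tensor:
        # iv: (batch, n_curves, curve_len) -> (batch, n_params)
        return self.net(iv)

model = DeviceCharacterizer()
print(model(torch.rand(2, 4, 128)).shape)             # torch.Size([2, 5])
```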

