Federated Learning and Privacy

Centralized data collection can expose individuals to privacy risks and organizations to legal risks if data is not properly managed. Federated learning is a machine learning setting where multiple entities collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client's raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective. This article provides a brief introduction to key concepts in federated learning and analytics with an emphasis on how privacy technologies may be combined in real-world systems and how their use charts a path toward societal benefit from aggregate statistics in new domains and with minimized risk to individuals and to the organizations who are custodians of the data.
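As a rough illustration of the setting described above, the sketch below simulates a few rounds in which each client fits a one-parameter linear model on its own data and a server averages the resulting parameters, weighted by local dataset size. The averaging rule, the toy data, and the names `local_update`, `server_aggregate`, and `federated_round` are assumptions made for illustration; the article does not prescribe a specific algorithm.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one client


def local_update(w: float, data: Dataset, lr: float = 0.05, steps: int = 20) -> float:
    """Run a few gradient steps on the model y = w * x using only this client's data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def server_aggregate(updates: List[Tuple[float, int]]) -> float:
    """Average client parameters, weighted by the number of local examples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def federated_round(w: float, clients: List[Dataset]) -> float:
    # Only the updated parameter and an example count leave each client;
    # the raw (x, y) pairs are never sent to the server.
    updates = [(local_update(w, data), len(data)) for data in clients]
    return server_aggregate(updates)


# Hypothetical toy data: every client observes y = 3x at different points.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    [(1.0, 3.0)],
]

w_global = 0.0
for _ in range(10):
    w_global = federated_round(w_global, clients)
```

In this toy setup the shared parameter approaches 3 even though no client's data is pooled centrally. A deployed system would typically layer further privacy technologies on top of this exchange, which the sketch does not attempt to show.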

Detection of illicit cryptomining using network metadata

EURASIP Journal on Information Security ◽

10.1186/s13635-021-00126-1 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Michele Russo ◽

Nedim Šrndić ◽

Pavel Laskov

Keyword(s):

Machine Learning ◽

Computer Security ◽

False Alarm ◽

False Alarm Rate ◽

Real World ◽

Detection Rate ◽

World Systems ◽

Normal Network ◽

Compromised Systems ◽

Security Incidents

Abstract Illicit cryptocurrency mining has become one of the prevalent methods for monetization of computer security incidents. In this attack, victims’ computing resources are abused to mine cryptocurrency for the benefit of attackers. The most popular illicitly mined digital coin is Monero as it provides strong anonymity and is efficiently mined on CPUs. Illicit mining crucially relies on communication between compromised systems and remote mining pools using the de facto standard protocol Stratum. While prior research primarily focused on endpoint-based detection of in-browser mining, in this paper, we address network-based detection of cryptomining malware in general. We propose XMR-Ray, a machine learning detector using novel features based on reconstructing the Stratum protocol from raw NetFlow records. Our detector is trained offline using only mining traffic and does not require privacy-sensitive normal network traffic, which facilitates its adoption and integration. In our experiments, XMR-Ray attained 98.94% detection rate at 0.05% false alarm rate, outperforming the closest competitor. Our evaluation furthermore demonstrates that it reliably detects previously unseen mining pools, is robust against common obfuscation techniques such as encryption and proxies, and is applicable to mining in the browser or by compiled binaries. Finally, by deploying our detector in a large university network, we show its effectiveness in protecting real-world systems.
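For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how a detection rate and a false alarm rate are conventionally computed from confusion-matrix counts. The counts are hypothetical and are not taken from the paper; the function names are illustrative only.

```python
def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual mining flows that the detector flags."""
    return true_positives / (true_positives + false_negatives)


def false_alarm_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of benign flows that the detector wrongly flags."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical counts for illustration only (not the paper's data).
dr = detection_rate(true_positives=95, false_negatives=5)
far = false_alarm_rate(false_positives=2, true_negatives=1998)
```

The two rates use different denominators (mining flows versus benign flows), so a low false alarm rate can still mean many alerts in absolute terms when benign traffic is abundant.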

Software to Support Layout and Data Collection for Machine-Learning-Based Real-World Sensors

Communications in Computer and Information Science - HCI International 2019 - Posters ◽

10.1007/978-3-030-23528-4_28 ◽

2019 ◽

pp. 198-205

Author(s):

Ayane Saito ◽

Wataru Kawai ◽

Yuta Sugiura

Keyword(s):

Machine Learning ◽

Data Collection ◽

Real World ◽

Support Layout

Applying Machine Learning to the Classification of DC-DC Converters: Real-world data collection processing & Validation.

10.2172/1670255 ◽

2020 ◽

Author(s):

Benjamin Davis

Keyword(s):

Machine Learning ◽

Data Collection ◽

Real World ◽

Real World Data ◽

World Data

Detection of illicit cryptomining using network metadata

10.21203/rs.3.rs-607598/v1 ◽

2021 ◽

Author(s):

Michele Russo ◽

Nedim Šrndić ◽

Pavel Laskov

Keyword(s):

Machine Learning ◽

Computer Security ◽

False Alarm ◽

Real World ◽

Detection Rate ◽

Standard Protocol ◽

World Systems ◽

Normal Network ◽

Compromised Systems ◽

Security Incidents

Abstract Illicit cryptocurrency mining has become one of the prevalent methods for monetization of computer security incidents. In this attack, victims' computing resources are abused to mine cryptocurrency for the benefit of attackers. The most popular illicitly mined digital coin is Monero as it provides strong anonymity and is efficiently mined on CPUs. Illicit mining crucially relies on communication between compromised systems and remote mining pools using the de facto standard protocol Stratum. While prior research primarily focused on endpoint-based detection of in-browser mining, in this paper we address network-based detection of cryptomining malware in general. We propose XMR-Ray, a machine learning detector using novel features based on reconstructing the Stratum protocol from raw NetFlow records. Our detector is trained offline using only mining traffic and does not require privacy-sensitive normal network traffic, which facilitates its adoption and integration. In our experiments, XMR-Ray attained 98.94% detection rate at 0.05% false alarm rate, outperforming the closest competitor. Our evaluation furthermore demonstrates that it reliably detects previously unseen mining pools, is robust against common obfuscation techniques such as encryption and proxies, and is applicable to mining in the browser or by compiled binaries. Finally, by deploying our detector in a large university network, we show its effectiveness in protecting real-world systems.

Learning and Generalising Object Extraction Skill for Contact-rich Disassembly Tasks: An Introductory Study

10.21203/rs.3.rs-331448/v1 ◽

2021 ◽

Author(s):

Antonio Serrano Muñoz ◽

Nestor Arana-Arexolaleiba ◽

Dimitrios Chrysostomou ◽

Simon Bøgh

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Real World ◽

State Of The Art ◽

Learning Algorithms ◽

Robotic Manipulation ◽

Object Extraction ◽

Learning Agents ◽

Key Concepts ◽

Planning And Operation

Abstract Remanufacturing automation must be designed to be flexible and robust enough to overcome the uncertainties, the conditions of the products, and the complexities in the planning and operation of the process. Machine learning methods, particularly reinforcement learning, are presented as techniques to learn, improve, and generalise the automation of many robotic manipulation tasks (most of them related to grasping, picking, or assembly). However, they have been little exploited in remanufacturing, in particular in disassembly tasks. This work presents the state of the art of contact-rich disassembly using reinforcement learning algorithms and a study of how the object extraction skill generalises when applied to contact-rich disassembly tasks. The generalisation capabilities of two state-of-the-art reinforcement learning agents (trained in simulation) are tested and evaluated in simulation and in the real world while performing a disassembly task. Results show that at least one of the agents can generalise the contact-rich extraction skill. This work also identifies key concepts and gaps for the research and application of reinforcement learning algorithms in disassembly tasks.

Optimism in Active Learning

Computational Intelligence and Neuroscience ◽

10.1155/2015/973696 ◽

2015 ◽

Vol 2015 ◽

pp. 1-17 ◽

Cited By ~ 3

Author(s):

Timothé Collet ◽

Olivier Pietquin

Keyword(s):

Machine Learning ◽

Active Learning ◽

Real World ◽

State Of The Art ◽

Classification Error ◽

Exploration And Exploitation ◽

Training Set ◽

Learning Problem ◽

The Face ◽

Real World Datasets

Active learning is the problem of interactively constructing the training set used in classification in order to reduce its size. Ideally, it would successively add the instance-label pair that most decreases the classification error. However, the effect of adding a pair is not known in advance. It can still be estimated from the pairs already in the training set. The online minimization of the classification error involves a tradeoff between exploration and exploitation. This is a common problem in machine learning, for which multi-armed bandit methods based on Optimism in the Face of Uncertainty have proven very efficient in recent years. This paper introduces three algorithms for the active learning problem in classification using Optimism in the Face of Uncertainty. Experiments conducted on built-in problems and real-world datasets demonstrate that they compare favorably with state-of-the-art methods.
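The exploration and exploitation tradeoff mentioned above is often illustrated with the UCB1 rule for multi-armed bandits, a well-known instance of Optimism in the Face of Uncertainty. The sketch below is a generic bandit simulation and is not one of the paper's three active learning algorithms; the reward probabilities and the names `ucb1_choose` and `run_bandit` are hypothetical.

```python
import math
import random
from typing import List


def ucb1_choose(counts: List[int], sums: List[float], t: int) -> int:
    """Pick the arm with the highest optimistic estimate (mean plus bonus)."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before the bound is defined
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(probs: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Simulate Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for t in range(1, rounds + 1):
        arm = ucb1_choose(counts, sums, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical arms: the third arm has the highest expected reward.
pulls = run_bandit([0.2, 0.5, 0.8], rounds=2000)
```

The bonus term shrinks as an arm is pulled more often, so rarely tried arms keep being revisited (exploration) while pulls concentrate on the arm with the best observed mean (exploitation).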

Platform for Analysing and Encouraging Student Activity on Contest and E-learning Systems

OLYMPIADS IN INFORMATICS ◽

10.15388/ioi.2018.07 ◽

2018 ◽

Vol 12 ◽

pp. 85-98

Author(s):

Bojan Kostadinov ◽

Mile Jovanov ◽

Emil Stankov

Keyword(s):

Machine Learning ◽

Data Collection ◽

Educational Policy ◽

Learning Systems ◽

Data Sources ◽

Or Education ◽

Student Activity ◽

The World ◽

E Learning ◽

Analyse Data

Data collection and machine learning are changing the world. Whether in medicine, sports or education, companies and institutions are investing a lot of time and money in systems that gather, process and analyse data. Likewise, to improve competitiveness, many countries are changing their educational policy to support STEM disciplines. Therefore, it’s important to put effort into using various data sources to help students succeed in STEM. In this paper, we present a platform that can analyse students’ activity on various contest and e-learning systems, combine and process the data, and then present it in various ways that are easy to understand. This in turn enables teachers and organizers to recognize talented and hardworking students, identify issues, and/or motivate students to practice and work on areas where they’re weaker.

Debt Markets and Investments

10.1093/oso/9780190877439.001.0001 ◽

2019 ◽

Keyword(s):

Real World ◽

Level Of Detail ◽

Fixed Income ◽

Key Concepts ◽

Debt Markets ◽

Basic Concepts ◽

User Friendly ◽

Distinguishing Features ◽

Consistent Approach ◽

Complex Subject

This book provides an objective look into the dynamic world of debt markets, products, valuation, and analysis. It also provides an in-depth understanding about this subject from experts in the field, both practitioners and academics. The coverage extends from discussing basic concepts and their application to increasingly intricate and real-world situations. This volume spans the gamut from theoretical to practical, while attempting to offer a useful balance of detailed and user-friendly coverage. The book has several distinguishing features. It blends the contributions of a global array of scholars and practitioners into a single review of some of the most important topics in this area. The book follows an internally consistent approach in format and style. Hence, it is collectively much more than a compilation of chapters from an array of different authors. It presents theory without unnecessary abstraction, quantitative techniques using basic bond mathematics, and conventions at a useful level of detail. It also incorporates how investment professionals analyze and manage fixed income portfolios. The book emphasizes empirical evidence involving debt securities and markets so it is understandable to a wide array of readers. Each chapter contains discussion questions to help reinforce key concepts. The end of the book contains guideline answers to each question. Readers interested in a broad survey will benefit as will those looking for more in-depth presentations of specific areas within this field of study. In summary, the book provides a fresh look at this intriguing and dynamic but often complex subject.

Phenomenological Modelling

10.1093/oso/9780198782933.003.0009 ◽

2018 ◽

Author(s):

Ray Huffaker ◽

Marco Bittelli ◽

Rodolfo Rosa

Keyword(s):

Nonlinear Dynamics ◽

Time Series ◽

Real World ◽

Biological Diversity ◽

False Positives ◽

World Systems ◽

Economic Activities ◽

Essential Services ◽

Low Dimension ◽

Convergent Cross Mapping

Detecting causal interactions among climatic, environmental, and human forces in complex biophysical systems is essential for understanding how these systems function and how public policies can be devised that protect the flow of essential services to biological diversity, agriculture, and other core economic activities. Convergent Cross Mapping (CCM) detects causal networks in real-world systems diagnosed with deterministic, low-dimension, and nonlinear dynamics. If CCM detects correspondence between phase spaces reconstructed from observed time series variables, then the variables are determined to causally interact in the same dynamic system. CCM can give false positives by misconstruing synchronized variables as causally interactive. Extended (delayed) CCM screens for false positives among synchronized variables.

Predicting Future Occurrence of Acute Hypotensive Episodes Using Noninvasive and Invasive Features

Military Medicine ◽

10.1093/milmed/usaa418 ◽

2021 ◽

Vol 186 (Supplement_1) ◽

pp. 445-451

Author(s):

Yifei Sun ◽

Navid Rashedi ◽

Vikrant Vaze ◽

Parikshit Shah ◽

Ryan Halter ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Real World ◽

Short Term Memory ◽

Model Performance ◽

Learning Technologies ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Continuous Map

ABSTRACT Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods: Five classification methods including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply a regression method to predict the continuous MAP values using linear regression over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
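The evaluation steps described in this abstract (scoring continuous MAP predictions with root mean square error, then converting them into binary AHE predictions and scoring recall and precision) can be sketched as follows. The 60 mmHg cutoff, the sample values, and the function names are assumptions for illustration; the abstract does not state how AHE labels were derived from MAP.

```python
import math
from typing import List, Tuple

AHE_THRESHOLD_MMHG = 60.0  # hypothetical cutoff, not taken from the paper


def rmse(predicted: List[float], observed: List[float]) -> float:
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def to_ahe(map_values: List[float], threshold: float = AHE_THRESHOLD_MMHG) -> List[bool]:
    """Label a value as an AHE when MAP falls below the assumed threshold."""
    return [v < threshold for v in map_values]


def recall_precision(pred: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(p and a for p, a in zip(pred, actual))
    fp = sum(p and not a for p, a in zip(pred, actual))
    fn = sum(a and not p for p, a in zip(pred, actual))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical MAP values in mmHg, for illustration only.
observed_map = [72.0, 58.0, 55.0, 66.0, 59.0, 80.0]
predicted_map = [70.0, 61.0, 54.0, 58.0, 57.0, 78.0]

error = rmse(predicted_map, observed_map)
recall, precision = recall_precision(to_ahe(predicted_map), to_ahe(observed_map))
```

Because the binary labels depend on which side of the cutoff a prediction falls, a small regression error near the threshold can flip a label, which is one reason the continuous MAP prediction carries information that the binary prediction alone does not.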
