Benchmark dataset for mid-price forecasting of limit order book data with machine learning methods

2018 ◽  
Vol 37 (8) ◽  
pp. 852-866 ◽  
Author(s):  
Adamantios Ntakaris ◽  
Martin Magris ◽  
Juho Kanniainen ◽  
Moncef Gabbouj ◽  
Alexandros Iosifidis

2021 ◽  
pp. jfds.2021.1.074
Author(s):  
Charles Huang ◽  
Weifeng Ge ◽  
Hongsong Chou ◽  
Xin Du

Energies ◽  
2019 ◽  
Vol 12 (9) ◽  
pp. 1680 ◽  
Author(s):  
Moting Su ◽  
Zongyi Zhang ◽  
Ye Zhu ◽  
Donglan Zha ◽  
Wenying Wen

Natural gas has been proposed as a solution to increase the security of energy supply and reduce environmental pollution around the world. Being able to forecast natural gas prices benefits various stakeholders and has become a very valuable tool for all market participants in competitive natural gas markets. Machine learning algorithms have gradually become popular tools for natural gas price forecasting. In this paper, we investigate data-driven predictive models for natural gas price forecasting based on common machine learning tools, i.e., artificial neural networks (ANN), support vector machines (SVM), gradient boosting machines (GBM), and Gaussian process regression (GPR). We use cross-validation for model training and monthly Henry Hub natural gas spot price data from January 2001 to October 2018 for evaluation. Results show that these four machine learning methods perform differently in predicting natural gas prices. Overall, however, ANN shows better prediction performance than SVM, GBM, and GPR.


2019 ◽  
Author(s):  
Kersten Döring ◽  
Ammar Qaseem ◽  
Kiran K Telukunta ◽  
Michael Becer ◽  
Philippe Thomas ◽  
...  

Abstract

Motivation: Much effort has been invested in the identification of protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use open-source software is so far available for this task.

Method: We created a new benchmark dataset of 2,753 sentences from abstracts containing annotations of proteins, small molecules, and their relationships. Two kernel methods were applied to classify these relationships as functional or non-functional, named shallow linguistic and all-paths graph kernel. Furthermore, the benefit of interaction verbs in sentences was evaluated.

Results: The cross-validation of the all-paths graph kernel (AUC value: 84.2%, F1 score: 81.8%) shows slightly better results than the shallow linguistic kernel (AUC value: 81.6%, F1 score: 79.7%) on our benchmark dataset. Both models achieve state-of-the-art performance in the research area of relation extraction. Furthermore, the combination of shallow linguistic and all-paths graph kernel could further increase the overall performance. We used each of the two kernels to identify functional relationships in all PubMed abstracts (28 million) and provide the results, including recorded processing time.

Availability: The software for the tested kernels, the benchmark, the processed 28 million PubMed abstracts, all evaluation scripts, as well as the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline.

Author summary: Text mining aims at organizing large sets of unstructured text data to provide efficient information extraction. Particularly in the area of drug discovery, the knowledge about small molecules and their interactions with proteins is of crucial importance to understand the drug effects on cells, tissues, and organisms. This data is normally hidden in written articles, which are published in journals with a focus on life sciences. In this publication, we show how text mining methods can be used to extract data about functional interactions between small molecules and proteins from texts. We created a new dataset with annotated sentences of scientific abstracts for the purpose of training two diverse machine learning methods (kernels), and successfully classified compound-protein pairs as functional and non-functional relations, i.e. no interactions. Our newly developed benchmark dataset and the pipeline for information extraction are freely available for download. Furthermore, we show that the software can be easily up-scaled to process large datasets by applying the approach to 28 million abstracts.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 64722-64736 ◽  
Author(s):  
Paraskevi Nousi ◽  
Avraam Tsantekidis ◽  
Nikolaos Passalis ◽  
Adamantios Ntakaris ◽  
Juho Kanniainen ◽  
...  
