Are traditional forecasting models suitable for hotels in Italian cities?

Purpose – The aim of this paper is to assess the performance of several widely adopted models for forecasting Italian hotel occupancy. In particular, the paper tests the models on hotels located in urban areas, which typically experience both business and leisure demand, and whose demand is often affected by special events held in the hotels themselves or in their neighborhood. Design/methodology/approach – Several forecasting models that the literature reports as most suitable for hotel room occupancy data were selected. Historical data on occupancy in five Italian hotels were divided into a training set and a test set. The parameters of the models were trained and fine-tuned on the training data, yielding one specific parameter set for each of the five hotels considered. For each hotel, each method, with its best parameter choice, was used to forecast room occupancy in the test set. Findings – In the particular Italian market, models based on booking information outperform historical ones: pick-up models achieve the best results, but forecasts are in any case rather poor. Research limitations/implications – The main conclusion of the analysis is that pick-up models are the most promising. Nonetheless, none of the traditional forecasting models tested appears satisfactory in the Italian framework, although the data collected by the front offices can be rather rich. Originality/value – From a managerial point of view, the outcome of the study shows that traditional forecasting models can be considered only as a sort of “first aid” for revenue management decisions.
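The abstract does not say which pick-up variant was tested, so the following is only a minimal sketch of the classical additive pick-up idea, under assumed data: the forecast for an arrival date is the rooms already on the books plus the average number of rooms that past dates picked up from the same lead time. The record layout (`on_books`, `final`) and the function names are hypothetical.

```python
def average_pickup(history, lead):
    """Mean rooms picked up between `lead` days before arrival and arrival.

    `history` is a list of past arrival dates, each a dict with
    "on_books" (lead days -> rooms booked at that lead) and "final"
    (rooms occupied on the day). This layout is an assumption.
    """
    pickups = [h["final"] - h["on_books"][lead]
               for h in history if lead in h["on_books"]]
    if not pickups:
        raise ValueError("no history available for this lead time")
    return sum(pickups) / len(pickups)


def pickup_forecast(history, rooms_on_books, lead, capacity=None):
    """Additive pick-up forecast: current bookings plus average pick-up."""
    forecast = rooms_on_books + average_pickup(history, lead)
    if capacity is not None:
        forecast = min(forecast, capacity)  # cannot sell more than capacity
    return forecast


# Hypothetical booking history for three past arrival dates
history = [
    {"on_books": {7: 60, 3: 75}, "final": 90},
    {"on_books": {7: 50, 3: 70}, "final": 80},
    {"on_books": {7: 55, 3: 65}, "final": 85},
]

# A future date with 58 rooms booked one week before arrival
print(pickup_forecast(history, rooms_on_books=58, lead=7))
```

A model of this kind uses only booking curves, which is what distinguishes it from the purely historical models the paper compares it with.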

An Alternative Approach to Prediction of Critical Heat Flux: Projection Support Vector Regression

Volume 3: Next Generation Reactors and Advanced Reactors; Nuclear Safety and Security ◽ 10.1115/icone22-30747 ◽ 2014

Author(s): Botao Jiang ◽ Fuyu Zhao

Keyword(s): Heat Flux ◽ Support Vector Regression ◽ Critical Heat Flux ◽ Cooling System ◽ Prediction Method ◽ Training Data ◽ Support Vector ◽ Training Set ◽ Test Set ◽ Alternative Approach

Critical heat flux (CHF) is one of the most crucial design criteria in boiling systems such as evaporators, steam generators, fuel cooling systems and boilers. This paper presents an alternative CHF prediction method named projection support vector regression (PSVR), which combines the feature vector selection (FVS) method with support vector regression (SVR). In PSVR, the FVS method is first used to select a relevant subset (feature vectors, FVs) from the training data; both the training data and the test data are then projected into the subspace constructed by the FVs; finally, SVR is applied to estimate the projected data. An available CHF dataset taken from the literature is used in this paper. The CHF data are split into two subsets, the training set and the test set. The training set is used to train the PSVR model, and the test set is then used to evaluate the trained model. The predicted results of PSVR are compared with those of artificial neural networks (ANNs). The parametric trends of CHF are also investigated using the PSVR model. It is found that the results of the proposed method not only fit the general understanding, but also agree well with the experimental data. Thus, PSVR can be used successfully for prediction of CHF in contrast to ANNs.

Memorizing All for Implicit Discourse Relation Recognition

ACM Transactions on Asian and Low-Resource Language Information Processing ◽ 10.1145/3485016 ◽ 2022 ◽ Vol 21 (3) ◽ pp. 1-20

Author(s): Kashif Munir ◽ Hongxiao Bai ◽ Hai Zhao ◽ Junhan Zhao

Keyword(s): Performance Improvement ◽ Performance Enhancement ◽ Point Of View ◽ Training Data ◽ Full Model ◽ Training Set ◽ Baseline Model ◽ Data Sparsity ◽ Memory Mechanism ◽ Discourse Relations

Implicit discourse relation recognition is a challenging task due to the absence of the informative clues that explicit connectives would provide. An implicit discourse relation recognizer has to carefully handle the semantic similarity of sentence pairs and the severe data sparsity issue. In this article, we learn token embeddings that encode the structure of a sentence from a dependency point of view and use them to initialize a baseline model, making it a strong baseline. We then propose a novel memory component that tackles the data sparsity issue by allowing the model to master the entire training set, which yields a further performance improvement. The memory mechanism memorizes information by pairing the representations and discourse relations of all training instances, thus addressing the data-hungry nature of current implicit discourse relation recognizers. The proposed memory component, when attached to any suitable baseline, can help enhance performance. The experiments show that our full model, which memorizes the entire training data, provides excellent results on the PDTB and CDTB datasets, outperforming the baselines by a fair margin.

Machine learning for identifying resistance features of Klebsiella pneumoniae using whole-genome sequence single nucleotide polymorphisms

Journal of Medical Microbiology ◽ 10.1099/jmm.0.001474 ◽ 2021 ◽ Vol 70 (11)

Author(s): Wenjia Liu ◽ Nanjiao Ying ◽ Qiusi Mo ◽ Shanshan Li ◽ Mengjie Shao ◽ ...

Keyword(s): Machine Learning ◽ Drug Resistance ◽ Resistance Genes ◽ Type Species ◽ Whole Genome ◽ Training Set ◽ Test Set ◽ Content Type ◽ Machine Learning Methods ◽ Link Type

Introduction. Klebsiella pneumoniae, a gram-negative bacterium, is a common pathogen causing nosocomial infection. The drug-resistance rate of K. pneumoniae is increasing year by year, posing a severe threat to public health worldwide. K. pneumoniae has been listed as one of the pathogens causing the global crisis of antimicrobial resistance in nosocomial infections. We need to explore the drug resistance of K. pneumoniae for clinical diagnosis. Single nucleotide polymorphisms (SNPs) are of high density and have rich genetic information in whole-genome sequencing (WGS), which can affect the structure or expression of proteins. SNPs can be used to explore mutation sites associated with bacterial resistance. Hypothesis/Gap Statement. Machine learning methods can detect genetic features associated with the drug resistance of K. pneumoniae from whole-genome SNP data. Aims. This work used Fast Feature Selection (FFS) and Codon Mutation Detection (CMD) machine learning methods to detect genetic features related to drug resistance of K. pneumoniae from whole-genome SNP data. Methods. WGS data on resistance of K. pneumoniae strains to four antibiotics (tetracycline, gentamicin, imipenem, amikacin) were downloaded from the European Nucleotide Archive (ENA). Sequence alignments were performed with MUMmer 3 to complete SNP calling using the K. pneumoniae HS11286 chromosome as the reference genome. The FFS algorithm was applied to feature selection of the SNP dataset. The training set was constructed based on mutation sites with mutation frequency >0.995. Based on the original SNP training set, 70% of SNPs were randomly selected from each dataset as the test set to verify the accuracy of the training results. Finally, the resistance genes were obtained by the CMD algorithm and Venny. Results. The number of strains resistant to tetracycline, gentamicin, imipenem and amikacin was 931, 1048, 789 and 203, respectively. Machine learning algorithms were applied to the SNP training set and test set, and 28 and 23 resistance genes were predicted, respectively. The 28 resistance genes in the training set included 22 genes in the test set, which verified the accuracy of gene prediction. Among them, some genes (KPHS_35310, KPHS_18220, KPHS_35880, etc.) corresponded to known resistance genes (Eef2, lpxK, MdtC, etc.). Logistic regression classifiers were established based on the identified SNPs in the training set. The areas under the curve (AUCs) for the four antibiotics were 0.939, 0.950, 0.912 and 0.935, showing a strong ability to predict bacterial resistance. Conclusion. Machine learning methods can effectively be used to predict resistance genes and associated SNPs. The FFS and CMD algorithms have wide applicability. They can be used for the drug-resistance analysis of any microorganism with genomic variation and phenotypic data. This work lays a foundation for resistance research in clinical applications.

Improved feature decay algorithms for statistical machine translation

Natural Language Engineering ◽ 10.1017/s1351324920000467 ◽ 2020 ◽ pp. 1-21

Author(s): Alberto Poncelas ◽ Gideon Maillette de Buy Wenniger ◽ Andy Way

Keyword(s): Statistical Machine Translation ◽ Training Data ◽ Data Selection ◽ Training Set ◽ Excellent Performance ◽ Test Set ◽ Translation Quality ◽ Novel Approach ◽ Parallel Data ◽ Machine Learning Applications

In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. In a scenario where the test set is accessible when the model is being built, training instances can be selected so that they are the most relevant for the test set. Feature Decay Algorithms (FDA) are a technique for data selection that has demonstrated excellent performance in a number of tasks. This method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to undertake deeper research on how to select better training data instances. We give an overview of FDA and propose improvements in terms of speed and quality. Using German-to-English parallel data, we first create a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we obtain improvements in translation quality by extending FDA using information from the parallel corpus that is generally ignored.
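The selection idea described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation and does not include their speed or parallel-corpus extensions: every test-set n-gram starts with value 1, its value is multiplied by an assumed `decay` factor each time a selected sentence contains it, and candidate sentences are scored by the summed value of their test-set n-grams divided by sentence length. All names and the toy sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, max_n=2):
    """All n-grams of the token list for n = 1..max_n, as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def select_fda(pool, test_sentences, k, decay=0.5, max_n=2):
    """Greedily pick k sentences from `pool` that best cover test n-grams."""
    # Features are the n-grams that occur in the test set
    test_feats = set()
    for sentence in test_sentences:
        test_feats.update(ngrams(sentence.split(), max_n))

    times_selected = Counter()  # how often each feature is already covered
    remaining = list(range(len(pool)))
    chosen = []

    def score(i):
        tokens = pool[i].split()
        if not tokens:
            return 0.0
        feats = set(ngrams(tokens, max_n)) & test_feats
        # A feature's value decays exponentially with its selection count
        return sum(decay ** times_selected[f] for f in feats) / len(tokens)

    while remaining and len(chosen) < k:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        for f in set(ngrams(pool[best].split(), max_n)) & test_feats:
            times_selected[f] += 1
    return [pool[i] for i in chosen]


pool = ["the cat sat", "the cat sat", "a dog barked", "unrelated words here"]
test_sentences = ["the cat sat", "a dog barked"]
selected = select_fda(pool, test_sentences, k=2)
print(selected)
```

In this toy run the duplicate of the first sentence loses out to the sentence about the dog, because its n-grams have already been devalued; this is the diversity effect the abstract refers to.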

Forecasting the price trends of digital currency: a hybrid model integrating the stochastic index and grey Markov chain methods

Grey Systems Theory and Application ◽ 10.1108/gs-12-2019-0068 ◽ 2020 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Ming-Huan Shou ◽ Zheng-Xin Wang ◽ Dan-Dan Li ◽ Yi-Tong Zhou

Keyword(s): Markov Chain ◽ Decision Support ◽ Hybrid Model ◽ Decision Support Tool ◽ Training Set ◽ Test Set ◽ Content Type ◽ Support Tool ◽ Digital Currency ◽ Markov Chain Method

Purpose – Since its issuance in 2009, digital currency has enjoyed increasing popularity and has become one of the most important options for global investors. The purpose of this paper is to propose a hybrid model (KDJ–Markov chain) that integrates the advantages of the stochastic index (KDJ) and grey Markov chain methods and provides a useful decision support tool for investors participating in the digital currency market. Design/methodology/approach – Taking Litecoin's closing price prediction as an example, the closing prices from May 2 to June 20, 2017, are used as the training set, while those from June 21 to August 9, 2017, are used as the test set. In addition, an adaptive KDJ–Markov chain is proposed to enhance adaptability to dynamic transaction information. The paper verifies the effectiveness of both the KDJ–Markov chain method and the adaptive KDJ–Markov chain method. Findings – The results show that the proposed methods can provide a reliable foundation for market analysis and investment decisions. The accuracies on the training set and the test set are 76% and 78%, respectively. Practical implications – This study not only addresses the problems that the KDJ method cannot accurately predict the next day's state and that the grey Markov chain method cannot divide the states very well, but also provides two useful decision support tools that help investors make more scientific and reasonable decisions about digital currency, for which there are no existing methods to analyze the fluctuation. Originality/value – A new approach to analyzing the fluctuation of digital currency, for which there are no existing methods, is proposed based on the stochastic index (KDJ) and grey Markov chain methods. Both models have high accuracy.
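The abstract does not give the model's equations, so the sketch below covers only the generic Markov chain ingredient, not the KDJ indicator or the grey model: estimate first-order transition probabilities from a training sequence of states, predict the most likely next state, and measure accuracy on a sequence. The "up"/"down" state labels and the sample sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for current, row in counts.items()}


def predict_next(matrix, current):
    """Most probable next state, or None if `current` was never observed."""
    row = matrix.get(current)
    if not row:
        return None
    return max(row, key=row.get)


def accuracy(matrix, states):
    """Share of one-step-ahead predictions that match the sequence."""
    pairs = list(zip(states, states[1:]))
    hits = sum(predict_next(matrix, cur) == nxt for cur, nxt in pairs)
    return hits / len(pairs)


# Hypothetical daily price states for a training window
train = ["up", "up", "down", "up", "up", "down", "down", "up", "up", "up"]
matrix = transition_matrix(train)
print(matrix)
print(predict_next(matrix, "up"), accuracy(matrix, train))
```

Evaluating `accuracy` separately on a training window and a later test window mirrors the train/test protocol in the abstract, although the reported 76% and 78% come from the authors' hybrid model, not from a plain chain like this one.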

Crossvalidation in brain imaging analysis

10.1101/017418 ◽ 2015 ◽ Cited By ~ 3

Author(s): Nikolaus Kriegeskorte

Keyword(s): Brain Imaging ◽ Empirical Test ◽ Predictive Performance ◽ Training Data ◽ Multiple Models ◽ Imaging Analysis ◽ Training Set ◽ Test Set ◽ And Performance ◽ Performance Estimates

Crossvalidation is a method for estimating predictive performance and adjudicating between multiple models. On each of k folds of the process, k-1 of k independent subsets of the data (training set) are used to fit the parameters of each model and the left-out subset (test set) is used to estimate predictive performance. The method is statistically efficient, because training data are reused for testing and performance estimates combined across folds. The method requires no assumptions, provides nearly unbiased (slightly conservative) estimates of predictive performance, and is generally applicable because it amounts to a direct empirical test of each model.
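The fold procedure described above can be sketched in a few lines. This is a minimal illustration under assumed toy data, not code from the paper: two hypothetical candidate models (a constant mean predictor and a least-squares line) are each fitted on k-1 folds and scored by mean squared error on the left-out fold, and the per-fold errors are averaged.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Model A: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Model B: least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Mean squared error averaged over the k left-out folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)  # parameters fitted on k-1 folds only
        mse = sum((model(xs[i]) - ys[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Toy data: roughly linear with a small deterministic perturbation
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

mse_mean = cross_validate(fit_mean, xs, ys)
mse_line = cross_validate(fit_line, xs, ys)
print(mse_mean, mse_line)
```

Using the same seed for both calls means both models see identical folds, so the comparison between them reflects the models rather than the split.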

An artificial intelligence model (euploid prediction algorithm) can predict embryo ploidy status based on time-lapse data

Reproductive Biology and Endocrinology ◽ 10.1186/s12958-021-00864-4 ◽ 2021 ◽ Vol 19 (1)

Author(s): Bo Huang ◽ Wei Tan ◽ Zhou Li ◽ Lei Jin

Keyword(s): Artificial Intelligence ◽ Time Lapse ◽ Training Data ◽ Prediction Algorithm ◽ Preliminary Evaluation ◽ Training Set ◽ Data Set ◽ Test Set ◽ Validation Set ◽ Ploidy Status

Background. The association between time-lapse technology (TLT) and embryo ploidy status is not yet fully understood. TLT is non-invasive and produces large amounts of data. Artificial intelligence (AI) is a good choice for accurately predicting embryo ploidy status from TLT data; however, current AI work in this field needs to be strengthened. Methods. A total of 469 preimplantation genetic testing (PGT) cycles and 1803 blastocysts from April 2018 to November 2019 were included in the study. All embryo images were captured by a time-lapse microscope system during the 5 or 6 days after fertilization, before biopsy. All euploid and aneuploid embryos were used as the data set, which was divided into a training set, a validation set and a test set. The training set was mainly used for model training, the validation set for adjusting the hyperparameters and for preliminary evaluation of the model, and the test set for evaluating the model's generalization ability. For better verification, data other than the training data were used for external verification: a total of 155 PGT cycles from December 2019 to December 2020 and 523 blastocysts were included in the verification process. Results. The euploid prediction algorithm (EPA) was able to predict euploidy on the test dataset with an area under the curve (AUC) of 0.80. Conclusions. The TLT incubator has gradually become the choice of reproductive centers. Our AI model, EPA, can predict embryo ploidy well based on TLT data. We hope that this system can serve all in vitro fertilization and embryo transfer (IVF-ET) patients in the future, giving embryologists more non-invasive aids when selecting the best embryo to transfer.

Prediction of Ciliwung River Water Quality Using the Decision Tree Algorithm

Jurnal Air Indonesia ◽ 10.29122/jai.v12i2.4364 ◽ 2021 ◽ Vol 12 (2)

Author(s): Mohammad Haekal ◽ Henki Bayu Seta ◽ Mayanda Mega Santoni

Keyword(s): Data Mining ◽ Decision Tree ◽ Cross Validation ◽ Online Monitoring ◽ Training Set ◽ Microsoft Excel ◽ Test Set

To predict the water quality of the Ciliwung River, data obtained through online monitoring were processed using a data mining method. In this method, the monitoring data were first tabulated in Microsoft Excel and then processed into a decision tree with the Decision Tree algorithm using the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and has a very high level of accuracy. A total of 5,476 Ciliwung River water quality monitoring records were processed. Classification with the decision tree indicated that 1,059 records (19.3242%) showed the Ciliwung River as Not Polluted and 4,417 records (80.6758%) showed it as Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split 66%. All four test options showed very high accuracy, above 99%. From these results it can be predicted that the Ciliwung River is indicated as a polluted river with reference to Government Regulation of the Republic of Indonesia No. 82 of 2001. The results also show that using the WEKA application with the Decision Tree algorithm to process the monitoring data on three parameters (pH, DO and nitrate) is highly accurate and appropriate. Keywords: river water quality, data mining, Decision Tree algorithm, WEKA application.

Integrated waste water management for a small river basin - a case study

Water Science & Technology ◽ 10.2166/wst.1998.0442 ◽ 1998 ◽ Vol 38 (11) ◽ pp. 87-95

Author(s): R. Fenz ◽ M. Zessner ◽ N. Kreuzinger ◽ H. Kroiss

Keyword(s): Waste Water ◽ Water Management ◽ River Basin ◽ Rural Areas ◽ Urban Areas ◽ Waste Water Treatment ◽ Point Of View ◽ Political Issue ◽ The North ◽ Waste Water Management

In Austria, approximately 70% of the population is connected to sewerage and to biological waste water treatment plants. Whereas urban areas are already provided with these facilities to a very high extent, effort is still needed in rural areas to meet the requirements of Austrian legislation. How this task should be solved has provoked much controversy. The main question, which has become a political issue during the last 5 years, is whether centralised or decentralised sewage disposal systems are preferable from the ecological and economic point of view. The Institute for Water Quality and Waste Management was asked to elaborate a waste water management concept for the Lainsitz River Basin, a mainly rural area in the north of Austria discharging to the Elbe river. Both ecological and economic aspects were to be considered. This paper presents the methodology that was applied and the criteria that were decisive for the selection of the final solution.

Transition from Emotional to Rational Advertising for Food Products on the Romanian Market

Cercetari Agronomice in Moldova ◽ 10.1515/cerce-2017-0009 ◽ 2017 ◽ Vol 50 (1) ◽ pp. 101-108

Author(s): A.F. Jităreanu ◽ Elena Leonte ◽ A. Chiran ◽ Benedicta Drobotă

Keyword(s): Urban Areas ◽ Eating Habits ◽ Well Being ◽ Point Of View ◽ The Body ◽ Positive Effects ◽ New Methods ◽ Brand Advertising ◽ Tangible Evidence ◽ Health And Well Being

Advertising helps to establish a set of assumptions that the consumer will bring to all other aspects of their engagement with a given brand. Advertising provides tangible evidence of the financial credibility and competitive presence of an organization. Persuasion is becoming more important in advertising. In marketing, persuasive advertising acts to establish wants/motivations and beliefs/attitudes by helping to formulate a conception of the brand as being one which people like those in the target audience would or should prefer. Considering the changes in lifestyle and eating habits of a significant part of the population in urban areas in Romania, the paper aims to analyse how brands manage to differentiate themselves from competitors, to reposition themselves on the market and to influence consumers, meeting their increasingly varied needs. Food brands on the Romanian market have lately been trying to identify new methods of differentiation and new benefits for their buyers. Given that more and more consumers are concerned about what they eat and the health effects of products, brands struggle to highlight the fact that their products offer real benefits for the body. Advertisements have become more diversified and underline the positive effects, from the health and well-being point of view, that those foods offer (no additives and preservatives, use of natural ingredients, various vitamins and minerals, or the fact that they are dietary). The diversification of advertising messages is evident on the Romanian market, in the context of an increasing concern of the population for the growing level of information of some major consumer segments.
