Analysis and Prediction of Soccer Games: An Application to the Kaggle European Soccer Database

2020 ◽  
Vol 3 (1) ◽  
pp. 1
Author(s):  
Wuhuan Deng ◽  
Eric Zhong

The study of soccer game data has many applications for both fans and teams. Effective analytical work can not only help teams improve their offensive and defensive skills and strategies, but can also help fans place bets. In this work, the authors analyze the European League Dataset with statistical methods. Moreover, machine learning techniques are designed to predict game results based on in-game performance and pre-game odds provided by bookmakers. With rational feature engineering and model selection, our model achieves an overall accuracy of 95%.
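The abstract does not say how the pre-game odds enter the model. As a rough illustration only, one common way to turn decimal bookmaker odds into features is to invert them and normalise away the bookmaker margin. The sketch below assumes decimal odds for home win, draw and away win; the function names and the sample odds are hypothetical and are not taken from the paper or from the Kaggle database.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds into outcome probabilities that sum to 1."""
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    overround = sum(raw)  # typically above 1 because of the bookmaker margin
    return [p / overround for p in raw]


def predict_from_odds(home_odds, draw_odds, away_odds):
    """Baseline prediction: pick the outcome the bookmaker considers most likely."""
    probs = implied_probabilities(home_odds, draw_odds, away_odds)
    labels = ("home", "draw", "away")
    return labels[probs.index(max(probs))]


def accuracy(predictions, outcomes):
    """Fraction of predictions that match the observed outcomes."""
    if len(predictions) != len(outcomes) or not outcomes:
        raise ValueError("predictions and outcomes must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)


if __name__ == "__main__":
    # Illustrative (made-up) matches: (home, draw, away) odds and the actual result.
    matches = [
        ((1.50, 4.00, 6.00), "home"),
        ((3.20, 3.10, 2.30), "away"),
        ((2.10, 3.30, 3.40), "draw"),
    ]
    predictions = [predict_from_odds(*odds) for odds, _ in matches]
    outcomes = [result for _, result in matches]
    print(predictions, accuracy(predictions, outcomes))
```

An odds-only baseline of this kind is a useful point of comparison for any model that also uses in-game performance features.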

Nanoscale ◽  
2021 ◽  
Author(s):  
Susana I. L. Gomes ◽  
Mónica J. B. Amorim ◽  
Suman Pokhrel ◽  
Lutz Mädler ◽  
Matteo Fasano ◽  
...  

Based on a highly detailed materials characterisation database (including atomistic and multiscale modelling), single and univariate statistical methods, combined with machine learning techniques, revealed key descriptors of biological functions.


Author(s):  
Kayalvizhi S. ◽  
Thenmozhi D.

Catch phrases are the important phrases that precisely explain a document. They represent the context of the whole document. Judges and lawyers can also use them to retrieve relevant prior cases when assuring justice in the domain of law. Currently, catch phrases are extracted using statistical methods, machine learning techniques, and deep learning techniques. The authors propose a sequence-to-sequence (Seq2Seq) deep neural network to extract catch phrases from legal documents. They employ several layers, namely an embedding layer, an encoder-decoder layer, a projection layer, and a loss layer, to build the deep neural network. The methodology is evaluated on the IRLeD@FIRE-2017 dataset, where it obtains mean average precision and recall scores of 0.787 and 0.607, respectively. Results show that the proposed method outperforms the existing systems.
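For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mean average precision and recall can be computed for ranked catch-phrase predictions. It assumes average precision is normalised by the number of gold phrases, which is one common convention; the exact evaluation protocol of the IRLeD track may differ, and the function names and sample phrases are hypothetical.

```python
def average_precision(predicted, relevant):
    """Average precision of one ranked list against a set of gold phrases."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    seen = set()
    hits = 0
    score = 0.0
    for rank, phrase in enumerate(predicted, start=1):
        if phrase in relevant and phrase not in seen:
            seen.add(phrase)
            hits += 1
            score += hits / rank  # precision at this rank
    return score / len(relevant)


def recall(predicted, relevant):
    """Fraction of gold phrases that appear anywhere in the predictions."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(predicted)) / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (predicted, relevant) pairs, one per document."""
    if not runs:
        raise ValueError("at least one document is required")
    return sum(average_precision(p, r) for p, r in runs) / len(runs)


if __name__ == "__main__":
    # Made-up example: two documents with ranked predictions and gold phrases.
    runs = [
        (["breach of contract", "appeal", "damages"], ["breach of contract", "damages"]),
        (["bail", "custody"], ["custody", "arrest"]),
    ]
    print(mean_average_precision(runs))
    print([recall(p, r) for p, r in runs])
```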


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0252068
Author(s):  
David Guijo-Rubio ◽  
Javier Briceño ◽  
Pedro Antonio Gutiérrez ◽  
Maria Dolores Ayllón ◽  
Rubén Ciria ◽  
...  

Donor-Recipient (D-R) matching is one of the main challenges to be addressed nowadays. Due to the increasing number of recipients and the small number of donors in liver transplantation, the allocation method is crucial. In this paper, to establish a fair comparison, the United Network for Organ Sharing database was used with 4 different end-points (3 months, and 1, 2 and 5 years), with a total of 39,189 D-R pairs and 28 donor and recipient variables. Modelling techniques were divided into two groups: 1) classical statistical methods, including Logistic Regression (LR) and Naïve Bayes (NB), and 2) standard machine learning techniques, including Multilayer Perceptron (MLP), Random Forest (RF), Gradient Boosting (GB) or Support Vector Machines (SVM), among others. The methods were compared with the standard scores MELD, SOFT and BAR. For the 5-year end-point, LR (AUC = 0.654) outperformed several machine learning techniques, such as MLP (AUC = 0.599), GB (AUC = 0.600), SVM (AUC = 0.624) or RF (AUC = 0.644), among others. Moreover, LR also outperformed the standard scores. The same pattern was reproduced for the other 3 end-points. Complex machine learning methods were not able to improve the performance of liver allocation, probably due to the implicit limitations associated with the collection process of the database.
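The models above are compared by AUC, the area under the ROC curve. As a small illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. This is a generic definition, not the authors' evaluation code; the function name and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 1 = event occurred, 0 = it did not; scores are model outputs.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of cases, so for a data set of the size described here a sort-based computation or a library routine would be preferable. An AUC of 0.5 corresponds to random ranking, which puts the reported values between 0.599 and 0.654 in context.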


Agriculture ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. 436 ◽  
Author(s):  
Mohsen Niazian ◽  
Gniewko Niedbała

Classical univariate and multivariate statistics are the most common methods used for data analysis in plant breeding and biotechnology studies. Evaluation of genetic diversity, classification of plant genotypes, analysis of yield components, yield stability analysis, assessment of biotic and abiotic stresses, prediction of parental combinations in hybrid breeding programs, and analysis of in vitro-based biotechnological experiments are mainly performed by classical statistical methods. Despite successful applications, these classical statistical methods have low efficiency in analyzing data obtained from plant studies, as the genotype, environment, and their interaction (G × E) result in the nondeterministic and nonlinear nature of plant characteristics. Large-scale data flow, including phenomics, metabolomics, genomics, and big data, must be analyzed for efficient interpretation of results affected by G × E. Nonlinear nonparametric machine learning techniques are more efficient than classical statistical models in handling large amounts of complex and nondeterministic information with a “multiple-independent variables versus multiple-dependent variables” nature. Neural networks, partial least squares regression, random forest, and support vector machines are some of the most fascinating machine learning models that have been widely applied to analyze nonlinear and complex data in both classical plant breeding and in vitro-based biotechnological studies. The high interpretive power of machine learning algorithms has made them popular in the analysis of complex multifactorial plant characteristics. The classification of different plant genotypes with morphological and molecular markers, the modeling and prediction of important quantitative characteristics of plants, the interpretation of complex and nonlinear relationships of plant characteristics, and the prediction and optimization of in vitro breeding methods are examples of applications of machine learning in conventional plant breeding and in vitro-based biotechnological studies. Precision agriculture is possible through accurate measurement of plant characteristics using imaging techniques and then efficient analysis of reliable extracted data using machine learning algorithms. Perfect interpretation of high-throughput phenotyping data is possible through coupled machine learning-image processing. Some applied and potentially applicable capabilities of machine learning techniques in conventional and in vitro-based plant breeding studies have been discussed in this overview. Discussions are of great value for future studies and could inspire researchers to apply machine learning in new layers of plant breeding.


2020 ◽  
Author(s):  
Michael Tonderai Mapundu ◽  
Chodziwadziwa Kabudula ◽  
Eustasius Musenge ◽  
Turgay Celik

Abstract Background: The process of determining causes of death using verbal autopsies in areas where clinical services are limited has become a key issue in terms of accuracy of the cause of death (prone to errors and subjective) and quality of data, among many other drawbacks. This is mainly because there is no proper standard available for performing verbal autopsy, even though it is important for civil registration systems and the strengthening of health priorities. Physician diagnosis is the only gold standard in reviewing verbal autopsy narratives. In practice, conventional statistical methods are used to perform verbal autopsies due to their simplicity and transparency. However, complex machine learning models that can replace the traditional statistical methods can be found in the literature. There has not been much application of machine learning techniques in verbal autopsy to determine cause of death, despite the advances in technology. As such, there is a need for a thorough survey of recent literature on statistical and machine learning approaches applied in verbal autopsy to determine cause of death. Methods: A systematic review was conducted and included a search of six databases. Our study only included scientific articles published in the last decade that reported on verbal autopsy and: (1) algorithms; (2) statistical techniques; (3) machine learning; and (4) deep learning. The search yielded 110 articles; after meta analysis, we identified 85 articles as being relevant and discarded the other 25. We investigated and compared the most commonly used statistical and machine learning techniques in verbal autopsies (VAs), identified limitations of each of these techniques, proposed a guiding machine learning framework and pointed to future directions. Results: Eighty-five studies met the inclusion criteria. Apart from physician diagnosis, statistical methods are currently the most applied tools to determine cause of death from verbal autopsies. However, there has been little application of traditional machine learning and emerging techniques, even though they have shown promising results in other domains. Conclusions: Technological application of machine learning to determine cause of death should focus on effective strategies for pre-processing, transparency, robust feature engineering techniques and data balancing in order to attain optimal model performance.


2019 ◽  
Vol 2019 (3) ◽  
pp. 191-209 ◽  
Author(s):  
Se Eun Oh ◽  
Saikrishna Sunkam ◽  
Nicholas Hopper

Abstract Recent advances in Deep Neural Network (DNN) architectures have received a great deal of attention due to their ability to outperform state-of-the-art machine learning techniques across a wide range of applications, as well as to automate the feature engineering process. In this paper, we broadly study the applicability of deep learning to website fingerprinting. First, we show that unsupervised DNNs can generate low-dimensional informative features that improve the performance of state-of-the-art website fingerprinting attacks. Second, when used as classifiers, we show that they can exceed the performance of existing attacks across a range of application scenarios, including fingerprinting Tor website traces, fingerprinting search engine queries over Tor, defeating fingerprinting defenses, and fingerprinting TLS-encrypted websites. Finally, we investigate which site-level features of a website influence its fingerprintability by DNNs.

