A top-level model of case-based argumentation for explanation: Formalisation and experiments

2021 ◽  
pp. 1-36
Author(s):  
Henry Prakken ◽  
Rosa Ratsma

This paper proposes a formal top-level model of explaining the outputs of machine-learning-based decision-making applications and evaluates it experimentally with three data sets. The model draws on AI & law research on argumentation with cases, which models how lawyers draw analogies to past cases and discuss their relevant similarities and differences in terms of relevant factors and dimensions in the problem domain. A case-based approach is natural since the input data of machine-learning applications can be seen as cases. While the approach is motivated by legal decision making, it also applies to other kinds of decision making, such as commercial decisions about loan applications or employee hiring, as long as the outcome is binary and the input conforms to this paper's factor or dimension format. The model is top-level in that it can be extended with more refined accounts of similarities and differences between cases. It is shown to overcome several limitations of similar argumentation-based explanation models, which handle only binary features and do not represent the tendency of features towards particular outcomes. The results of the experimental evaluation studies indicate that the model may be feasible in practice, but that further development and experimentation are needed to confirm its usefulness as an explanation model. The main challenges here are selecting from a large number of possible explanations, reducing the number of features in the explanations, and adding more meaningful information to them. It also remains to be investigated how suitable our approach is for explaining non-linear models.
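
To illustrate the flavor of the approach (a minimal sketch, not the paper's formalism), the following Python snippet compares a focus case with a precedent in terms of shared and distinguishing factors; the loan-application cases and factor names are hypothetical.

    # Minimal sketch of factor-based case comparison: an explanation cites
    # a precedent plus the relevant similarities and differences. The full
    # model also handles dimensions and factor tendencies.

    def explain(focus, precedent):
        """Each case is a (set_of_factors, outcome) pair; the focus case
        has no outcome yet. Returns the ingredients of an explanation."""
        f_factors, _ = focus
        p_factors, p_outcome = precedent
        return {
            "shared": f_factors & p_factors,          # analogies to cite
            "precedent_only": p_factors - f_factors,  # candidate distinctions
            "focus_only": f_factors - p_factors,
            "cited_outcome": p_outcome,
        }

    # Hypothetical loan-application cases described by factors.
    focus = ({"stable_income", "high_debt"}, None)            # case to explain
    precedent = ({"stable_income", "collateral"}, "approve")  # past decision

    print(explain(focus, precedent))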

2021 ◽  
Author(s):  
Yong Bian

This study comprises three chapters on machine learning applications, each with a different empirical focus. The first chapter presents a new method and an application of it; the second examines salary issues among young economics professors; the third studies the publication value of scientific papers through text analysis, with attention to gender bias. In the first chapter, I discuss Double/Debiased Machine Learning (DML), a causal estimation method introduced by Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018), and apply it to an empirical analysis in education. I explain why DML is practically useful and what it does, and I apply a bootstrap procedure to improve on the built-in DML standard errors in the curriculum adoption application. As an extension of existing studies on how curriculum materials affect student achievement, my work compares the results of DML, kernel matching, and ordinary least squares (OLS). In my study, the DML estimators avoid the possible misspecification bias of linear models and obtain statistically significant results that improve upon the kernel matching results. In the second chapter, we analyze the effects of gender, PhD graduation school rank, and undergraduate major on young economics professors' salaries. The dataset is novel, containing detailed, time-varying research productivity measures and other demographic information on young economics professors from 28 of the top 50 public research universities in the United States. We apply DML to obtain consistent estimators under the high-dimensional set of control variables. Tracking the first 10 years of professional work experience, we find that these three factors have little effect on salaries in most experience years. However, the gender effect on salary in experience year 7 is both statistically significant and economically significant (large enough in magnitude to have practical meaning). In experience years 5 to 7, which are also near most faculty members' promotion years, the gender effects are pronounced. For both PhD graduation school rank and undergraduate major, the estimates for experience years 7 to 9 are large in magnitude but not statistically significant. Overall, the effects tend to grow with years of experience. We also discuss possible economic mechanisms and explanations. In the third chapter, we build machine learning and simple linear models to predict academic paper publication outcomes, as measured by journal H-indices, and we discuss the gender bias associated with these outcomes. We use a novel dataset with each paper's text content, associated H-index, authors' genders, and other information, collected from recently published economics journals. We apply term frequency-inverse document frequency (TF-IDF) vectorization and other Natural Language Processing (NLP) tools to transform the text content into numerical model inputs. We find that when using paper text content to predict an H-index, prediction accuracy is around 60% in our four-tier classification model and the root mean squared error is around 44 in our regression model. Moreover, when controlling for paper text, the causal effect of gender largely disappears: for papers with similar text content, gender does not influence the H-index. Additionally, we discuss the real-world interpretation of the models.
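
To make the DML idea concrete, here is a minimal sketch of the partialling-out estimator with two-fold cross-fitting on synthetic data; the data-generating process and nuisance models are illustrative stand-ins, not the chapter's actual specification.

    # DML for a partially linear model (Chernozhukov et al., 2018):
    # residualize outcome and treatment on controls via cross-fitting,
    # then regress residual on residual.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n, p = 2000, 20
    X = rng.normal(size=(n, p))                 # high-dimensional controls
    d = X[:, 0] + rng.normal(size=n)            # treatment, confounded by X
    y = 0.5 * d + X[:, 0] + rng.normal(size=n)  # true effect = 0.5

    res_y, res_d = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
        # Nuisance models E[y|X] and E[d|X], fit on the other fold.
        res_y[test] = y[test] - RandomForestRegressor().fit(X[train], y[train]).predict(X[test])
        res_d[test] = d[test] - RandomForestRegressor().fit(X[train], d[train]).predict(X[test])

    theta = res_d @ res_y / (res_d @ res_d)  # residual-on-residual regression
    print(f"DML estimate of the treatment effect: {theta:.3f}")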


2021 ◽  
Vol 12 (2) ◽  
pp. 136
Author(s):  
Arnan Dwika Diasmara ◽  
Aditya Wikan Mahastama ◽  
Antonius Rachmat Chrismanto

Abstract. Intelligent System of The Battle of Honor Board Game with Decision Making and Machine Learning. The Battle of Honor is a board game in which 2 players face each other, each trying to bring down the opponent's flag. The game requires a third party to act as referee because the players cannot see each other's pieces during play. The solution adopted here is to implement a Rule-Based System (RBS), in a system developed with Unity, to support the referee's role of making decisions based on the rules of the game. The researchers also developed an Artificial Intelligence (AI) opponent by applying Case-Based Reasoning (CBR). The CBR is supported by a nearest-neighbor algorithm that retrieves stored cases with a high degree of similarity. In the basic test, the highest CBR accuracy across the 3 testers was 97.101%. In the scenario test, the AI acting as referee was evaluated on colliding pieces and gave the correct decision in determining victory.

Keywords: The Battle of Honor, CBR, RBS, Unity, AI
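
To illustrate nearest-neighbor case retrieval of the kind used for the CBR opponent (the features, weights, and cases below are invented for the sketch, not taken from the paper):

    # Retrieve the most similar stored case and reuse its recorded move.

    def similarity(case_a, case_b, weights):
        """Weighted fraction of matching features between two cases."""
        total = sum(weights.values())
        score = sum(w for f, w in weights.items() if case_a[f] == case_b[f])
        return score / total

    case_base = [
        {"own_piece": "colonel", "enemy_seen": "flag_side", "move": "advance"},
        {"own_piece": "spy",     "enemy_seen": "center",    "move": "retreat"},
    ]
    weights = {"own_piece": 2, "enemy_seen": 1}   # hypothetical feature weights

    new_case = {"own_piece": "colonel", "enemy_seen": "center"}
    best = max(case_base, key=lambda c: similarity(new_case, c, weights))
    print("reuse move from most similar case:", best["move"])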


2021 ◽  
Vol 70 ◽  
pp. 409-472
Author(s):  
Marc-André Zöller ◽  
Marco F. Huber

Machine learning (ML) has become a vital part of many aspects of our daily life. However, building well-performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically, without extensive knowledge of statistics and machine learning. This paper combines a survey of current AutoML methods with a benchmark of popular AutoML frameworks on real data sets. Driven by the frameworks selected for evaluation, we summarize and review important AutoML techniques and methods for every step of building an ML pipeline. The selected AutoML frameworks are evaluated on 137 data sets from established AutoML benchmark suites.
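
As a toy illustration of the combined algorithm selection and hyperparameter optimization (CASH) problem at the core of such frameworks, the sketch below searches jointly over model choice and hyperparameters with plain scikit-learn; real AutoML systems add meta-learning, budget management, and ensembling on top.

    # Joint search over preprocessing, estimator choice, and hyperparameters.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

    # Two candidate model families, each with its own hyperparameter grid.
    search_space = [
        {"clf": [LogisticRegression(max_iter=5000)], "clf__C": [0.01, 0.1, 1, 10]},
        {"clf": [RandomForestClassifier()], "clf__n_estimators": [50, 200],
         "clf__max_depth": [3, None]},
    ]
    search = RandomizedSearchCV(pipe, search_space, n_iter=8, cv=3, random_state=0)
    print(search.fit(X, y).best_params_)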


Author(s):  
Dimitris Korobilis ◽  
Davide Pettenuzzo

Bayesian inference in economics is primarily perceived as a methodology for cases where the data are short, that is, not informative enough to obtain reliable econometric estimates of the quantities of interest. In these cases, prior beliefs, such as the experience of the decision-maker or results from economic theory, can be incorporated explicitly into the econometric estimation problem and enhance the desired solution. In contrast, fields such as computer science and signal processing have long used Bayesian inference and computation to tackle challenges associated with ultra-high-dimensional data. These fields have developed several novel Bayesian algorithms that have gradually become established in mainstream statistics and now hold a prominent position in machine learning applications across numerous disciplines. While traditional Bayesian algorithms are powerful enough to allow estimation of very complex problems (for instance, nonlinear dynamic stochastic general equilibrium models), they cannot cope computationally with the demands of rapidly growing economic data sets. Bayesian machine learning algorithms provide rigorous and computationally feasible solutions to various high-dimensional econometric problems, thus supporting modern decision-making in a timely manner.
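
As a small illustration of the kind of Bayesian shrinkage estimator involved, the sketch below fits scikit-learn's BayesianRidge to a synthetic regression with many predictors; it is a stand-in for the more specialized econometric algorithms the article surveys, not one of them.

    # Bayesian shrinkage regression where predictors outnumber what OLS
    # could handle comfortably at this sample size.
    import numpy as np
    from sklearn.linear_model import BayesianRidge

    rng = np.random.default_rng(1)
    n, p = 100, 80                                    # short sample, many predictors
    X = rng.normal(size=(n, p))
    beta = np.zeros(p); beta[:3] = (1.5, -2.0, 1.0)   # sparse true coefficients
    y = X @ beta + rng.normal(scale=0.5, size=n)

    model = BayesianRidge().fit(X, y)   # hierarchical priors shrink noise coefficients
    print("leading coefficient estimates:", np.round(model.coef_[:5], 2))
    print("posterior predictive mean/std:", model.predict(X[:1], return_std=True))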


2018 ◽  
Vol 7 (1.7) ◽  
pp. 201
Author(s):  
K Jayanthi ◽  
C Mahesh

Machine learning enables computers to help humans analyse large, complex data sets. Genetic and genomic data are one such case, requiring many analysis functions to be performed automatically by computers. Machine learning methods promise to make these data more useful for downstream tasks such as gene prediction, gene expression analysis, gene ontology annotation, gene finding, and gene editing. The purpose of this study is to explore machine learning applications and algorithms for genetic and genomic data. We conclude by covering the classification of machine learning problems into supervised, unsupervised, and semi-supervised settings; which type of method is suitable for which problems in genomics; applications of machine learning; and future prospects of machine learning in genomics.
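
A minimal sketch of a typical supervised task of this kind, classifying samples from gene-expression profiles, is shown below; the data are synthetic and the setup is illustrative only.

    # Supervised genomics example: predict a phenotype from expression levels.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(7)
    n_samples, n_genes = 60, 500
    X = rng.normal(size=(n_samples, n_genes))   # expression matrix (samples x genes)
    y = rng.integers(0, 2, size=n_samples)      # e.g. disease vs. healthy
    X[y == 1, :10] += 1.0                       # a few informative genes

    scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
    print("cross-validated accuracy:", scores.mean().round(2))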


2021 ◽  
Vol 11 (6) ◽  
pp. 2823
Author(s):  
Francisco Florez-Revuelta

This paper presents a new evolutionary approach, EvoSplit, for distributing multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers either divide a data set randomly or use iterative stratification, a method that aims to maintain the label (or label-pair) distribution of the original data set in the different subsets. With the same aim, this paper first introduces a single-objective evolutionary approach that seeks a split maximizing the similarity between those distributions independently. Second, a new multi-objective evolutionary algorithm is presented that maximizes the similarity considering both distributions (labels and label pairs) simultaneously. Both approaches are validated using well-known multi-label data sets as well as large image data sets currently used in computer vision and machine learning applications. EvoSplit improves the splitting of a data set in comparison to iterative stratification according to several measures: label distribution, label-pair distribution, examples distribution, and the number of folds and fold-label pairs with zero positive examples.
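
The core idea can be sketched as follows. This deliberately simplified variant uses a mutation-only hill climb over a subset-assignment mask and matches only the label distribution, whereas EvoSplit itself uses full evolutionary (and multi-objective) algorithms and also matches label-pair distributions.

    # Evolve a train/test assignment so label frequencies match across subsets.
    import numpy as np

    rng = np.random.default_rng(3)
    n, L = 200, 5
    Y = rng.integers(0, 2, size=(n, L))   # multi-label indicator matrix

    def fitness(mask):
        """Negative L1 distance between label frequencies in the two subsets."""
        a, b = Y[mask].mean(axis=0), Y[~mask].mean(axis=0)
        return -np.abs(a - b).sum()

    mask = rng.random(n) < 0.7            # initial ~70/30 split
    for _ in range(2000):                 # mutate one example's assignment
        child = mask.copy()
        child[rng.integers(n)] ^= True
        if fitness(child) >= fitness(mask):
            mask = child

    print("final label-distribution gap:", -fitness(mask))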


2018 ◽  
Vol 36 (3) ◽  
pp. 700
Author(s):  
Tiago Peres da Silva SUGUIURA ◽  
Omar Cléo Neves PEREIRA ◽  
Waenya Fernandez de CARVALHO ◽  
Isolde Terezinha Santos PREVIDELLI

Data sets with complex structures are increasingly common in dental research. As a consequence, the statistical methods used to analyze and interpret these data must be efficient and robust. Hierarchical structures are among the most common kinds of complex structure, and a proper approach is required. Multilevel modeling, used to study hierarchical structures, is a powerful tool that allows the collected data to be analyzed at several levels. The objective of this study is to review the literature on multilevel linear models and to illustrate a three-level model through a matrix procedure, without the use of specific software to estimate the parameters. With this model, we analyzed vertical gingival retraction under three conditions: naphazoline hydrochloride, aluminium chloride, and no substance. The intraclass correlation coefficient at the tooth level within patients showed that the hierarchical structure was important for accommodating the dependence within clusters.
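
For reference, a generic three-level random-intercept model of the kind described (measurements nested in teeth nested in patients) can be written as below; this is a sketch of the standard form, not the authors' exact specification.

    % measurements i, teeth j, patients k; fixed effects beta, random
    % intercepts u_k (patient level) and v_{jk} (tooth within patient)
    y_{ijk} = \beta_0 + \beta_1 x_{ijk} + u_k + v_{jk} + e_{ijk},
    \qquad u_k \sim N(0, \sigma_u^2), \quad v_{jk} \sim N(0, \sigma_v^2),
    \quad e_{ijk} \sim N(0, \sigma_e^2)

    % intraclass correlation for measurements on the same tooth
    % (tooth level within patients)
    \rho = \frac{\sigma_u^2 + \sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2}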


Challenges ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 2
Author(s):  
Tilman Klaeger ◽  
Sebastian Gottschall ◽  
Lukas Oehm

Much research is done on data analytics and machine learning for data coming from industrial processes. In practice, one finds many pitfalls restraining the application of these modern technologies, especially in brownfield applications. With this paper, we want to show the state of the art and what to expect when working with stock machines in the field. The paper reviews the literature on challenges for cyber-physical production systems (CPPS) in brownfield applications, combined with our own experience and findings gained while setting up such systems in processing and packaging machines as well as in other areas. A major focus of this paper is data collection, which tends to be more cumbersome than most people might expect. In addition, data quality for machine learning applications becomes a challenge once one leaves the laboratory and its academic data sets; topics here include missing ground truth and the lack of semantic description of the data. The last challenge covered is IT security and passing data through firewalls to enable the cyber part of CPPS. All of these findings show that the potential of data-driven production systems depends strongly on data collection if the proclaimed new automation systems, with more flexibility, improved human–machine interaction, better process stability and thus less waste during manufacturing, are to be built.


2020 ◽  
Author(s):  
Milena Čukić ◽  
Victoria López ◽  
Juan Pavón

BACKGROUND: Machine learning applications in health care have increased considerably in the recent past; this review focuses on an important application in psychiatry, the detection of depression. Since the advent of computational psychiatry, research based on functional magnetic resonance imaging has yielded remarkable results, but these tools tend to be too expensive for everyday clinical use.

OBJECTIVE: This review focuses on an affordable data-driven approach based on electroencephalographic (EEG) recordings. Web-based applications via public or private cloud-based platforms would be a logical next step. We aim to compare several approaches to the detection of depression from EEG recordings using various features and machine learning models.

METHODS: To detect depression, we reviewed published detection studies based on resting-state EEG with a final machine learning stage; to predict therapy outcomes, we reviewed a set of interventional studies that used some form of stimulation in their methodology.

RESULTS: We reviewed 14 detection studies and 12 interventional studies published between 2008 and 2019. As direct comparison was not possible due to the large diversity of theoretical approaches and methods used, we compared the studies on the steps in their analyses and the accuracies reported. We also compared possible drawbacks in terms of sample size, feature extraction, feature selection, classification, internal and external validation, and possible unwarranted optimism and reproducibility, and we suggested practices to avoid misinterpretation of results and over-optimism.

CONCLUSIONS: This review shows the need for larger data sets and more systematic procedures before the approach can be used for clinical diagnostics. Regulation of the pipeline and standard methodological requirements should become mandatory to increase the reliability and accuracy of the complete methodology, so that it can be translated to modern psychiatry.

