tree structures
Recently Published Documents


TOTAL DOCUMENTS

612
(FIVE YEARS 116)

H-INDEX

34
(FIVE YEARS 5)

2021 ◽  
Vol 13 (4) ◽  
pp. 1-11
Author(s):  
Stuti Nayak ◽  
Amrapali Zaveri ◽  
Pedro Hernandez Serrano ◽  
Michel Dumontier

While there exists an abundance of open biomedical data, the lack of high-quality metadata makes it challenging for others to find relevant datasets and to reuse them for another purpose. In particular, metadata are useful to understand the nature and provenance of the data. A common approach to improving the quality of metadata relies on expensive human curation, which is itself time-consuming and prone to error. Towards improving the quality of metadata, we use scientific publications to automatically predict metadata key:value pairs. For prediction, we use a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM). We focus our attention on the NCBI Disease Corpus, which is used for training the CNN and BiLSTM. We perform two different kinds of experiments with these two architectures: (1) we predict the disease names by using their unique ID in the MeSH ontology and (2) we use the tree structures of the MeSH ontology to move up in the hierarchy of these disease terms, which reduces the number of labels. We also apply various multi-label classification techniques in the above-mentioned experiments. We find that in both cases the CNN achieves the best results, predicting the superclasses for diseases with an accuracy of 83%.
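As a rough illustration of experiment (2), moving up a tree-numbered hierarchy can be sketched as truncating dot-separated tree numbers. This is a minimal sketch and not the authors' code: the function names `ancestor` and `coarsen_labels` are hypothetical, and the tree numbers are illustrative values in the MeSH dotted format rather than data from the paper.

```python
def ancestor(tree_number: str, depth: int) -> str:
    """Truncate a dot-separated tree number to at most `depth` levels."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ".".join(tree_number.split(".")[:depth])


def coarsen_labels(tree_numbers, depth):
    """Map every fine-grained label to its ancestor at the given depth."""
    return {tn: ancestor(tn, depth) for tn in tree_numbers}


# Illustrative tree numbers only (assumption); not taken from the paper's data.
labels = ["C04.588.180", "C04.588.274", "C04.557.337", "C10.228.140"]

mapping = coarsen_labels(labels, depth=1)
superclasses = sorted(set(mapping.values()))
print(superclasses)  # ['C04', 'C10']
```

Four distinct leaf labels collapse to two superclasses at depth 1, which is the sense in which moving up the hierarchy reduces the number of labels a classifier has to predict.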


2021 ◽  
Author(s):  
Benjamin Evans

Ensemble learning is one of the most powerful extensions for improving upon individual machine learning models. Rather than a single model being used, several models are trained and the predictions combined to make a more informed decision. Such combinations will ideally overcome the shortcomings of any individual member of the ensemble. Most machine learning competition winners feature an ensemble of some sort, and there is also sound theoretical proof of the performance of certain ensembling schemes. The benefits of ensembling are clear in both theory and practice.
Despite the great performance, ensemble learning is not a trivial task. One of the main difficulties is designing appropriate ensembles. For example, how large should an ensemble be? What members should be included in an ensemble? How should these members be weighted? Our first contribution addresses these concerns using a strongly-typed population-based search (genetic programming) to construct well-performing ensembles, where the entire ensemble (members, hyperparameters, structure) is automatically learnt. The proposed method was found, in general, to be significantly better than all base members and commonly used comparison methods trialled.
With automatically designed ensembles, there is a range of applications, such as competition entries, forecasting and state-of-the-art predictions. However, these applications often also require additional preprocessing of the input data. The ensemble above considers only the original training data, but in many machine learning scenarios a pipeline is required (for example, performing feature selection before classification). For the second contribution, a novel automated machine learning method is proposed based on ensemble learning. This method uses a random population-based search of appropriate tree structures, and as such is embarrassingly parallel, an important consideration for automated machine learning. The proposed method is able to achieve equivalent or improved results over the current state-of-the-art methods and does so in a fraction of the time (six times as fast).
Finally, while complex ensembles offer great performance, one large limitation is the interpretability of such ensembles. For example, why does a forest of 500 trees predict a particular class for a given instance? In an effort to explain the behaviour of complex models (such as ensembles), several methods have been proposed. However, these approaches tend to suffer from at least one of the following limitations: overly complex in the representation, local in their application, limited to particular feature types (i.e. categorical only), or limited to particular algorithms. For our third contribution, a novel model-agnostic method for interpreting complex black-box machine learning models is proposed. The method is based on strongly-typed genetic programming and overcomes the aforementioned limitations. Multi-objective optimisation is used to generate a Pareto frontier of simple and explainable models which approximate the behaviour of much more complex methods. We found the resulting representations are far simpler than existing approaches (an important consideration for interpretability) while providing equivalent reconstruction performance.
Overall, this thesis addresses two of the major limitations of existing ensemble learning, i.e. the complex construction process and the black-box models that are often difficult to interpret. A novel application of ensemble learning in the field of automated machine learning is also proposed. All three methods have shown performance at least equivalent to, or better than, existing methods.
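The Pareto frontier mentioned in the abstract can be illustrated with a small non-dominated filter over two objectives that are both minimised, model complexity and reconstruction error. This is a minimal sketch, not the thesis method (which uses strongly-typed genetic programming); the function name `pareto_front` and the candidate models with their scores are made up for illustration.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    Each candidate is a (name, complexity, error) tuple and both objectives
    are minimised. A candidate is dominated when another one is at least as
    good on both objectives and strictly better on at least one.
    """
    front = []
    for name, cx, err in candidates:
        dominated = any(
            (ocx <= cx and oerr <= err) and (ocx < cx or oerr < err)
            for _, ocx, oerr in candidates
        )
        if not dominated:
            front.append((name, cx, err))
    return sorted(front, key=lambda c: c[1])


# Hypothetical surrogate models: (name, number of nodes, reconstruction error).
models = [
    ("stump", 3, 0.30),
    ("small_tree", 7, 0.18),
    ("wide_tree", 9, 0.20),   # dominated by small_tree
    ("deep_tree", 31, 0.10),
    ("huge_tree", 63, 0.10),  # dominated by deep_tree
]

front = pareto_front(models)
print([name for name, _, _ in front])  # ['stump', 'small_tree', 'deep_tree']
```

Every model on the front is a different trade-off between simplicity and fidelity, so a user can pick the simplest explanation whose error is still acceptable.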


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Siham Mousa Alhaider

Purpose: This article studies the particle qad in Standard Arabic (SA) and Asiri Arabic (AA). In SA, qad is pronounced as [qæd], whereas in AA it is pronounced as [q?d] and written as qid. Qad in SA differs from qid in AA in its functional use and syntactic distribution. Accordingly, the study discusses the semantics and selection properties of qad/qid.
Design/methodology/approach: Contrasting analyses are presented to verify which syntactic analysis better suits extended projection principle (EPP) extension, and tree structures are provided to elucidate ongoing problematic configurations and to provide solutions.
Findings: The SA particle qad has three functions: (1) a probability modal, as in may or might; (2) a perfective auxiliary, as in have, has and had; and (3) indicating emphatic purpose, as in do, does and did. By contrast, qid in AA has two meanings: (1) have, has and had (perfective auxiliary); and (2) the past tense of the English copula was/became (a linking verb). Given this background, there has been a debate in the syntax literature about whether qid/qad is an adverb. The current article provides evidence indicating that qid and qad are not adverbs.
Research limitations/implications: The study is limited to the analysis of qid in the Asiri dialect. Further research is needed on the different branches of the Asiri dialects according to tribe, since tribes sometimes have different sounds for some words. No prior literature was found on the Asiri dialects in the designated area of study, namely the particle qid.
Practical implications: The study contributes to the Asiri linguistic heritage by documenting the syntactic and semantic properties of the particle qid. The study contributes to the linguistic field of the Arabic language and its varieties.
Social implications: The study offers a general review of the linguistic background of the Asir region. The study introduces the reader to the particle qad in SA and compares the two researched forms, qad in SA and qid in AA.
Originality/value: The paradoxical analysis between qad and qid is presented on all levels (semantics, functional use, selection properties and level of configuration (EPP)). The study also introduces the particle qid in AA, which had not been investigated before.


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3054
Author(s):  
Hector Eduardo Roman ◽  
Fabrizio Croccolo

We discuss network models as a general and suitable framework for describing the spreading of an infectious disease within a population. We consider two types of finite random structures as building blocks of the network, one based on percolation concepts and the second one on random tree structures. We study, as is done for the SIR model, the time evolution of the number of susceptible (S), infected (I) and recovered (R) individuals, in the presence of a spreading infectious disease, by incorporating a healing mechanism for infected individuals. In addition, we discuss in detail the implementation of lockdowns and how to simulate them. For percolation clusters, we present numerical results based on site percolation on a square lattice, while for random trees we derive new analytical results, which are illustrated in detail with a few examples. It is argued that such hierarchical networks can complement the well-known SIR model in most circumstances. We illustrate these ideas by revisiting USA COVID-19 data.
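To make the setting concrete, the following is a minimal sketch of a discrete-time stochastic SIR process on a random tree. It is not the authors' model: the tree construction (a random recursive tree), the per-step transmission probability `beta`, the per-step healing probability `gamma` and all function names are assumptions chosen for illustration, and lockdowns are not modelled.

```python
import random


def random_tree(n, rng):
    """Random recursive tree: node i attaches to a uniformly chosen earlier node."""
    neighbours = {i: [] for i in range(n)}
    for i in range(1, n):
        parent = rng.randrange(i)
        neighbours[i].append(parent)
        neighbours[parent].append(i)
    return neighbours


def count(state):
    """Return the (S, I, R) head counts for the current node states."""
    return tuple(sum(1 for s in state.values() if s == k) for k in "SIR")


def simulate_sir(neighbours, beta, gamma, steps, rng, seed_node=0):
    """Run the process and return (S, I, R) counts at steps 0..steps."""
    state = {v: "S" for v in neighbours}
    state[seed_node] = "I"
    history = [count(state)]
    for _ in range(steps):
        new_state = dict(state)
        for v, s in state.items():
            if s != "I":
                continue
            # An infected node may infect each susceptible neighbour.
            for u in neighbours[v]:
                if state[u] == "S" and rng.random() < beta:
                    new_state[u] = "I"
            # Healing mechanism: the infected node may recover this step.
            if rng.random() < gamma:
                new_state[v] = "R"
        state = new_state
        history.append(count(state))
    return history


rng = random.Random(0)  # fixed seed so the run is reproducible
tree = random_tree(200, rng)
history = simulate_sir(tree, beta=0.5, gamma=0.2, steps=30, rng=rng)
for t in (0, 10, 20, 30):
    print(t, history[t])
```

Because a tree has no cycles, the infection can only reach a node through the single path from the seed, which is one reason tree-like networks are analytically tractable.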


2021 ◽  
Vol 2090 (1) ◽  
pp. 012085
Author(s):  
Nobutoshi Ikeda

Tree graphs such as Cayley trees provide a stage to support the self-organization of fractal networks by the flow of walkers from the root vertex to the outermost shell of the tree graph. This network model is a typical example that demonstrates the ability of a random process on a network to generate fractality. However, the finite scale of the tree structure assumed in the model restricts the size of fractal networks. In this study, we removed the restriction on the size of the trees by introducing a lifetime τ (number of steps of random walks) of walkers. As a result, we successfully induced a size-independent fractal structure on a tree graph without a boundary. Our numerical results show that the mean number of offspring $n_b$ of the original tree structure determines the value of the fractal box dimension $d_b$ through the relation $d_b - 1 = (n_b - 1)^{-\theta}$. The lifetime τ controls the presence or absence of small-world and scale-free properties. The ideal fractal behaviour can be maintained by selecting an appropriate value of τ. The numerical results contribute to the development of a systematic method for generating fractal small-world and scale-free networks while controlling the value of the fractal box dimension. Unlike other models that use recursive rules to generate self-similar structures, this model specifically produces small-world fractal networks with scale-free properties.
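Reading the relation in the abstract as d_b - 1 = (n_b - 1)^(-θ), the box dimension can be evaluated directly once θ is known. The sketch below only evaluates that formula; the abstract does not report the exponent, so the value `theta = 0.5` is a placeholder assumption, and the function name `box_dimension` is hypothetical.

```python
def box_dimension(n_b: float, theta: float) -> float:
    """Evaluate d_b = 1 + (n_b - 1) ** (-theta) for mean offspring number n_b > 1."""
    if n_b <= 1:
        raise ValueError("the relation needs n_b > 1")
    return 1.0 + (n_b - 1.0) ** (-theta)


theta = 0.5  # placeholder value (assumption); not taken from the paper
for n_b in (2, 3, 5, 9):
    print(n_b, round(box_dimension(n_b, theta), 3))
```

Two properties follow from the formula regardless of the exponent: n_b = 2 always gives d_b = 2, and for any positive θ the box dimension decreases towards 1 as the mean number of offspring grows.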


2021 ◽  
Vol 929 (1) ◽  
pp. 012001
Author(s):  
E A Bataleva ◽  
K S Nepeina

Based on the analysis of deep geophysical (geoelectric and seismic) models of the Central Tien Shan, structures with a morphology resembling the crown of a palm tree or the shape of a flower were identified. Geoelectric models are considered along a series of regional profiles (75°, 76°, 76°30′). The length of the profiles intersecting all the main tectonic structures of the Tien Shan ranges from 75 to 250 km. Particular attention was paid to those zones of concentrated deformation where the tectonic regime combines the conditions of shear and lateral compression (transpression zones). The structure of the collisional-accretionary wedge of the Atbashi zone is considered in terms of the distribution of electrical and velocity characteristics of the geological section. Geoelectric models plotted along a series of regional profiles identify areas of increased electrical conductivity and show “flower structures”. The integral picture of the distribution and morphology of zones of increased electrical conductivity in the segments of the Earth’s crust of the Central Tien Shan may reflect a discretely localized manifestation of palm tree structures due to the evolution of transpressive suture zones during the Hercynian and Alpine tectogenesis.


2021 ◽  
Author(s):  
Jian Gao ◽  
Changgui Gu ◽  
Chuansheng Shen ◽  
Huijie Yang

Collective behaviors displaying a variety of fascinating movement patterns are thought to be products of complex interplay among individuals. Previous studies have proposed hierarchical leadership networks and the coexistence of compromise and leadership in pigeon flocks, but these conclusions have not been confirmed by theoretical or modeling studies. Here, using the same datasets but a more reasonable research route, we found a more concise leadership structure in pigeon flocks, namely a tree structure, which was verified by our modeling studies. We showed that each individual may follow a single pilot (leader) during collective flights of pigeon flocks, and that the single top leader of a flock determines the flight direction of the whole flock. Our results confirmed the leadership hypothesis and rejected the illusion of compromise between individuals at the same level. The findings shed light on the hierarchical leadership structure in pigeon flocks and have implications for artificial collective systems, e.g., autonomous formation control of multiple unmanned aerial vehicles and unmanned surface vehicles.
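A tree-shaped leadership structure of the kind described can be represented with one leader pointer per individual. The following is a minimal sketch under that assumption, not the authors' model: the flock data, the mapping name `follows` and the helper functions are hypothetical.

```python
def top_leader(follows, bird):
    """Follow leader pointers upward until reaching the bird with no leader."""
    seen = {bird}
    while follows[bird] is not None:
        bird = follows[bird]
        if bird in seen:
            raise ValueError("leadership relation contains a cycle, so it is not a tree")
        seen.add(bird)
    return bird


def rank(follows, bird):
    """Number of hops from `bird` up to the top leader (0 for the top leader)."""
    hops = 0
    while follows[bird] is not None:
        bird = follows[bird]
        hops += 1
    return hops


# Hypothetical flock: each bird follows exactly one leader; "A" follows nobody.
follows = {"A": None, "B": "A", "C": "A", "D": "B", "E": "D"}

print({bird: top_leader(follows, bird) for bird in follows})
print({bird: rank(follows, bird) for bird in follows})
```

With a single pointer per bird there is exactly one path from any individual to the top, so every bird resolves to the same top leader and no averaging between same-level individuals is needed.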

