Size Matters: The Impact of Training Size in Taxonomically-Enriched Word Embeddings

2019, Vol 9 (1), pp. 252-267
Author(s): Alfredo Maldonado, Filip Klubička, John Kelleher

Abstract: Word embeddings trained on natural corpora (e.g., newspaper collections, Wikipedia or the Web) excel at capturing thematic similarity (“topical relatedness”) on word pairs such as ‘coffee’ and ‘cup’ or ‘bus’ and ‘road’. However, they are less successful on pairs showing taxonomic similarity, like ‘cup’ and ‘mug’ (near synonyms) or ‘bus’ and ‘train’ (types of public transport). Conversely, purely taxonomy-based embeddings (e.g., those trained on a random walk of WordNet’s structure) outperform natural-corpus embeddings in taxonomic similarity but underperform them in thematic similarity. Previous work suggests that performance gains in both types of similarity can be achieved by enriching natural-corpus embeddings with taxonomic information from taxonomies like WordNet. This taxonomic enrichment can be done by combining natural-corpus embeddings with taxonomic embeddings. This paper conducts a deep analysis of this assumption and shows that both the size of the natural corpus and the random-walk coverage of the WordNet structure play a crucial role in the performance of combined (enriched) vectors in both similarity tasks. Specifically, we show that embeddings trained on medium-sized natural corpora benefit the most from taxonomic enrichment, whilst embeddings trained on large natural corpora only benefit from this enrichment when evaluated on taxonomic similarity tasks. The implication is that care has to be taken in controlling the size of the natural corpus and the size of the random walk used to train vectors. In addition, we find that, whilst the WordNet structure is finite and it is possible to fully traverse it in a single pass, the repetition of well-connected WordNet concepts in extended random walks effectively reinforces taxonomic relations in the learned embeddings.
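As a rough illustration of the random-walk idea in the abstract above, the following Python sketch generates pseudo-sentences by walking a tiny hand-made hypernym graph. The graph `TOY_TAXONOMY`, the function `random_walk_corpus`, and all parameter values are hypothetical stand-ins chosen for illustration; they are not WordNet and not the authors' pipeline. The resulting token sequences are the kind of input a word-embedding trainer could consume.

```python
import random
from collections import Counter

# Hypothetical toy taxonomy (NOT WordNet): each concept maps to its neighbours
# (hypernym and hyponym links treated as undirected edges).
TOY_TAXONOMY = {
    "vehicle": ["bus", "train"],
    "bus": ["vehicle"],
    "train": ["vehicle"],
    "container": ["cup", "mug"],
    "cup": ["container"],
    "mug": ["container"],
}

def random_walk_corpus(graph, num_walks, walk_length, seed=0):
    """Return a list of walks; each walk is a list of concept names."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    corpus = []
    for _ in range(num_walks):
        node = rng.choice(nodes)          # uniform random starting concept
        walk = [node]
        for _ in range(walk_length - 1):
            node = rng.choice(graph[node])  # step to a uniformly chosen neighbour
            walk.append(node)
        corpus.append(walk)
    return corpus

corpus = random_walk_corpus(TOY_TAXONOMY, num_walks=200, walk_length=5)

# Well-connected concepts recur more often than leaves as the walk count grows.
counts = Counter(word for walk in corpus for word in walk)
```

In this toy graph the hub concepts (`vehicle`, `container`) appear more often than any single leaf, which loosely mirrors the abstract's remark that extended walks repeat well-connected concepts.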

Author(s): Junhao Guo, Zikai Wu

The impact of special phenomena on dynamical processes still needs to be uncovered in a wider range of weighted network models. In this paper, we investigate the impact of the delay phenomenon on random walks by introducing a delayed random walk into a family of weighted m-triangulation networks. One and three traps are then deployed, respectively, on the networks in two rounds of investigation. In both rounds, the average trapping time (ATT) is used to measure trapping efficiency and is derived analytically by harnessing the iteration rule of the networks. The analytical solutions of ATT obtained in both investigations show that ATT increases sub-linearly with the size of the network no matter what value the parameter [Formula: see text] manipulating the delayed random walk takes. However, [Formula: see text] can quantitatively change both its leading scaling and prefactor. Thus, introducing the delay phenomenon can control trapping efficiency quantitatively. Besides, the parameters [Formula: see text] and [Formula: see text] governing the networks’ evolution quantitatively impact both the prefactor and the leading scaling of ATT simultaneously. In summary, this work may provide incremental insight into the impact of observed phenomena on special trapping processes and general random walks in complex systems.


2021, pp. 2150369
Author(s): Zikai Wu, Guangyao Xu

In this paper, we put forward a class of weighted extended tree-like fractals and use them as a test bed to unveil the impact of weight heterogeneity on random walks. Specifically, a family of weighted extended tree-like fractals is first proposed, parameterized by a growth parameter [Formula: see text] and a weight parameter [Formula: see text]. Then, we explore the standard weight-dependent walk on the networks by deploying three traps at the initial three nodes. To this end, we analytically derive the average trapping time (ATT) to measure the trapping efficiency. The results show that, depending on the values of [Formula: see text], ATT may grow sub-linearly, linearly or super-linearly with the network size. Besides, it can also quantitatively impact the leading behavior and pre-factor of ATT simultaneously. Finally, a more challenging mixed weight-dependent random walk that allows non-nearest-neighbor hopping is addressed. Analytical solutions of ATT derived under this new scenario imply that the weight parameter [Formula: see text] can still steer the leading behavior both qualitatively and quantitatively, and quantitatively affect the pre-factor of ATT. As to the stochastic parameter [Formula: see text] controlling the mixed random walk, it impacts only the pre-factor of ATT and has a negligible effect on the leading behavior of ATT. In summary, this work could further augment our understanding of random walks on networks.
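To make the notion of average trapping time on a weighted network concrete, the Python sketch below computes it exactly on a three-node toy graph, not on the fractals studied in the paper. It assumes the usual convention that a weight-dependent walker moves from node i to neighbour j with probability w_ij divided by the total weight at i, and that ATT is the mean first-passage time to the trap averaged over all non-trap starting nodes. The function name `average_trapping_time` and the toy graph are hypothetical.

```python
from fractions import Fraction

def average_trapping_time(weights, traps):
    """Exact ATT for a weight-dependent walk.

    weights: dict node -> dict neighbour -> positive integer edge weight.
    traps:   set of absorbing nodes.
    Solves (I - Q) T = 1 over the non-trap nodes by Gauss-Jordan elimination.
    """
    free = [n for n in sorted(weights) if n not in traps]
    index = {n: i for i, n in enumerate(free)}
    size = len(free)
    rows = []
    for n in free:
        strength = sum(weights[n].values())      # total weight at node n
        row = [Fraction(0)] * (size + 1)         # last column is the right-hand side
        row[index[n]] += 1
        for m, w in weights[n].items():
            if m not in traps:
                row[index[m]] -= Fraction(w, strength)
        row[size] = Fraction(1)
        rows.append(row)
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return sum(rows[i][size] for i in range(size)) / size

# Toy triangle: trap at node 0, unit weights to the trap, weight w = 2 between 1 and 2.
toy = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 2}, 2: {0: 1, 1: 2}}
att = average_trapping_time(toy, traps={0})  # by symmetry T = 1 + w, i.e. 3 for w = 2
```

On this toy graph a heavier edge between the two non-trap nodes keeps the walker away from the trap longer, so ATT grows with the weight; this is only a small-scale analogue of the weight effects the abstract describes.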


Energies, 2021, Vol 14 (4), pp. 878
Author(s): Oliwia Pietrzak, Krystian Pietrzak

This paper focuses on effects of implementing zero-emission buses in public transport fleets in urban areas in the context of electromobility assumptions. It fills the literature gap in the area of research on the impact of the energy mix of a given country on the issues raised in this article. The main purpose of this paper is to identify and analyse economic effects of implementing zero-emission buses in public transport in cities. The research area was the city of Szczecin, Poland. The research study was completed using the following research methods: literature review, document analysis (legal acts and internal documents), case study, ratio analysis, and comparative analysis of selected variants (investment variant and base variant). The conducted research study has shown that economic benefits resulting from implementing zero-emission buses in an urban transport fleet are limited by the current energy mix structure of the given country. An unfavourable energy mix may lead to increased emissions of SO2 and CO2 resulting from operation of this kind of vehicle. Therefore, achieving full effects in the field of electromobility in the given country depends on taking concurrent actions in order to diversify the power generation sources, and in particular on increasing the share of Renewable Energy Sources (RES).


2021, Vol 8 (1)
Author(s): Yahya Albalawi, Jim Buckley, Nikola S. Nikolov

Abstract: This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processing techniques improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.


2021, Vol 13 (1)
Author(s): Viktoriya Kolarova, Christine Eisenmann, Claudia Nobis, Christian Winkler, Barbara Lenz

Abstract. Introduction: The global Coronavirus (COVID-19) pandemic is having a great impact on all areas of everyday life, including travel behaviour. Various measures that focus on restricting social contacts have been implemented in order to reduce the spread of the virus. Understanding how daily activities and travel behaviour change during such a global crisis, and the reasons behind these changes, is crucial for developing suitable strategies for similar future events and analysing potential mid- and long-term impacts. Methods: In order to provide empirical insights into changes in travel behaviour during the first Coronavirus-related lockdown in 2020 for Germany, an online survey with a relatively representative sample of the German population was conducted a week after the start of the nationwide contact ban. The data was analysed using descriptive and inferential statistical analyses. Results and Discussion: The results suggest in general an increase in car use and a decrease in public transport use, as well as a more negative perception of public transport as a transport alternative during the pandemic. Regarding activity-related travel patterns, the findings show, firstly, that the majority of people go shopping less frequently; simultaneously, an increase in online shopping can be seen, and the characteristics of this group were analysed. Secondly, half of the adult population still left their home for leisure or to run errands; young adults were more active than all other age groups. Thirdly, the majority of the working population still went to work; one out of four people worked from home. Lastly, potential implications for travel behaviour and activity patterns as well as policy measures are discussed.


Mathematics, 2021, Vol 9 (10), pp. 1148
Author(s): Jewgeni H. Dshalalow, Ryan T. White

In a classical random walk model, a walker moves through a deterministic d-dimensional integer lattice one step at a time, without drifting in any direction. In a more advanced setting, a walker moves randomly over a randomly configured (non-equidistant) lattice, jumping a random number of steps. In some further variants, there is limited access to the walker’s moves. That is, the walker’s movements are not available in real time. Instead, the observations are limited to some random epochs, resulting in delayed information about the real-time position of the walker, its escape time, and its location outside a bounded subset of the real space. In this case we target the virtual first passage (or escape) time. Thus, unlike standard random walk problems, rather than the crossing of the boundary, we deal with the walker’s escape location arbitrarily distant from the boundary. In this paper, we give a short historical background on random walks, discuss various directions in the development of random walk theory, and survey most of our results obtained in the last 25–30 years, including the very recent ones dated 2020–21. Among different applications of such random walks, we discuss stock markets, stochastic networks, games, and queueing.
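The difference between the true and the virtual (observed) escape time can be illustrated with a deliberately simplified Python simulation. The sketch assumes a symmetric one-dimensional walk with unit steps and observation gaps drawn uniformly from 1 to `max_gap`; these choices, and the function name `virtual_escape`, are illustrative assumptions and are far simpler than the random lattices and random jump sizes surveyed in the paper.

```python
import random

def virtual_escape(bound, max_gap, rng):
    """Simulate a symmetric walk on the integers started at 0.

    The walker is only observed at random epochs separated by gaps drawn
    uniformly from 1..max_gap (an illustrative assumption).
    Returns (true_escape_time, observed_escape_time, observed_position), where
    escape means |position| >= bound.
    """
    position, time = 0, 0
    true_escape = None
    while True:
        gap = rng.randint(1, max_gap)        # steps until the next observation
        for _ in range(gap):
            position += rng.choice((-1, 1))
            time += 1
            if true_escape is None and abs(position) >= bound:
                true_escape = time           # first real crossing (unobserved)
        if abs(position) >= bound:           # observer sees the walker outside
            return true_escape, time, position

rng = random.Random(42)
trials = [virtual_escape(bound=5, max_gap=4, rng=rng) for _ in range(1000)]
mean_delay = sum(seen - true for true, seen, _ in trials) / len(trials)
```

Because the walker can cross the boundary and return before the next observation, the observed escape time is never earlier than the true one, and the observed position may lie beyond the boundary rather than on it.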


2021, Vol 11 (10), pp. 4703
Author(s): Renato Andara, Jesús Ortego-Osa, Melva Inés Gómez-Caicedo, Rodrigo Ramírez-Pisco, Luis Manuel Navas-Gracia, ...

This comparative study analyzes the impact of the COVID-19 pandemic on motorized mobility in eight large cities of five Latin American countries. Public institutions and private organizations have made public data available for a better understanding of the contagion process of the pandemic, its impact, and the effectiveness of the implemented health control measures. In this research, data from the IDB Invest Dashboard were used for traffic congestion as well as data from the Moovit© public transport platform. For the daily cases of COVID-19 contagion, those published by Johns Hopkins Hospital University were used. The analysis period runs from 9 March to 30 September 2020, approximately seven months. For each city, a descriptive statistical analysis of the loss and subsequent recovery of motorized mobility was carried out, evaluated in terms of traffic congestion and urban transport through the corresponding regression models. Traffic congestion recovers earlier and faster than urban transport, since the latter depends on the control measures imposed in each city. Public transportation does not appear to have been a determining factor in the spread of the pandemic in Latin American cities.


2021, Vol 11 (1)
Author(s): E. Aidman, M. Balin, K. Johnson, S. Jackson, G. M. Paech, ...

Abstract: Caffeine is widely used to promote alertness and cognitive performance under challenging conditions, such as sleep loss. Non-digestive modes of delivery typically reduce variability of its effect. In a placebo-controlled, 50-h total sleep deprivation (TSD) protocol we administered four 200 mg doses of caffeine-infused chewing gum during the night-time circadian trough and monitored participants' drowsiness during task performance with infra-red oculography. In addition to the expected reduction of sleepiness, caffeine was found to disrupt the degrading impact of sleepiness on performance errors in tasks ranging from standard cognitive tests to simulated driving. Real-time drowsiness data showed that caffeine produced only a modest reduction in sleepiness (compared to our placebo group) but substantial performance gains in vigilance and procedural decisions that were largely independent of the actual alertness dynamics achieved. The magnitude of this disrupting effect was greater for more complex cognitive tasks.


2020, Vol 4 (2), pp. 5
Author(s): Ioannis C. Drivas, Damianos P. Sakas, Georgios A. Giannakopoulos, Daphne Kyriaki-Manessi

In the Big Data era, search engine optimization deals with the encapsulation of datasets that are related to website performance in terms of architecture, content curation, and user behavior, with the purpose of converting them into actionable insights and improving visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks composed of valid, reliable, and consistent analytics that are practically useful for developing well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented in order to increase organic search engine visits based on the impact of multiple SEO factors. To this end, the authors examined 171 cultural heritage websites and the analytics data retrieved about their performance and the user experience within them. Massive amounts of Web-based collections are included and presented by cultural heritage organizations through their websites. Subsequently, users interact with these collections, producing behavioral analytics in a variety of different data types that come from multiple devices, with high velocity, in large volumes. Nevertheless, prior research efforts indicate that these massive cultural collections are difficult to browse and have low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics and improves the visibility of cultural heritage websites. One step further, the statistical results of the study are integrated into a predictive model that is composed of two stages. First, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model. Secondly, a micro-level data-driven agent-based model follows up. The purpose of the model is to predict the most effective combinations of factors that achieve enhanced visibility and organic traffic on cultural heritage organizations’ websites. To this end, the study contributes to the knowledge of researchers and practitioners in the big cultural analytics sector, with the purpose of implementing potential strategies for greater visibility and findability of cultural collections on the Web.

