The Distribution Patterns of Valency-changing Verbs: An Approach of Quantitative Linguistics

2021 ◽  
Vol 1 (2) ◽  
pp. 43-51
Author(s):  
Da Qi ◽  
Hua Wang

The present study explores the distribution patterns of valency-changing verbs from the perspective of quantitative linguistics. We took authentic spoken language data as the research material. The corpus used in this paper is a self-built spoken English corpus containing about 21,000 words. We annotated the corpus semi-manually with the help of spaCy, a natural language processing tool. From the annotation results and statistical data, we obtained a total of 217 valency-changing English verbs and 248 sentence components governed by them. The analysis led to the following conclusions: first, bivalent verbs are the most frequent among the three types of valency-changing verbs; second, after fitting all the language data to different probability distributions, we found that the rank-frequency distributions of all the valency-changing English verbs with different numbers of obligatory arguments obey the power law, and that the frequencies of bivalent valency-changing verbs also obey other kinds of distributions, such as the mixed Poisson distribution.
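As a rough illustration of what fitting a rank-frequency distribution to a power law can look like, the sketch below regresses log frequency on log rank by ordinary least squares. It is not the fitting procedure used in the study, which the abstract does not describe; the function name `fit_power_law` and the toy frequencies are assumptions made for this example only.

```python
import math


def fit_power_law(frequencies):
    """Fit f(r) = c * r**(-a) to ranked frequencies by least squares in log-log space.

    Assumes at least two strictly positive frequencies.
    Returns (a, c, r_squared).
    """
    # Rank 1 is the most frequent item.
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination of the log-log regression line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

    return -slope, math.exp(intercept), r_squared


# Toy frequencies that follow f(r) = 100 / r exactly (not data from the study).
toy_frequencies = [100 / rank for rank in range(1, 11)]
exponent, constant, r_squared = fit_power_law(toy_frequencies)
```

A log-log regression is only a quick diagnostic; a maximum-likelihood fit with a goodness-of-fit test would be the more careful choice for real corpus counts.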

2020 ◽  
Vol 6 (8(77)) ◽  
pp. 13-17
Author(s):  
Azimkhan Kurmankozhayev ◽  
Elmira Seilbekovna Yesbergenova

This paper presents an evaluation of the structural connection, identity and interchangeability of the main asymmetric types of theoretical distributions most often used to assess the distributions of various indicators in geology and technology. The method of empirical analysis and statistical inference was used, drawing on nonparametric facts about the distribution patterns. The empirical results of applying the lognormal, gamma and Weibull distributions are analyzed using extensive statistical data from literature and research sources. The characteristic features and statistical regularities inherent to these distributions are revealed, and the resulting statistical conclusions expose structural relationships between the functions of the lognormal, gamma and Weibull distributions. The identity and authenticity of the development of probabilistic frequencies in their application have been established, and the complex geometric "image" of asymmetry inherent to these types of distributions is generalized. The structural relationships and interchangeability of asymmetric distribution types are recommended as a way to increase the reliability and credibility of the choice of distribution under uncertainty and with insufficient statistical data, when solving problems associated with forecasts and with technological and computer developments.


2021 ◽  
Vol 21 (2) ◽  
pp. 1-25
Author(s):  
Pin Ni ◽  
Yuming Li ◽  
Gangmin Li ◽  
Victor Chang

Cyber-Physical Systems (CPS), as multi-dimensional complex systems that connect the physical world and the cyber world, have a strong demand for processing large amounts of heterogeneous data. These tasks include Natural Language Inference (NLI) tasks based on text from different sources. However, current research on natural language processing in CPS does not explore this field. Therefore, this study proposes a Siamese Network structure that combines Stacked Residual Bidirectional Long Short-Term Memory with the Attention mechanism and a Capsule Network for the NLI module in CPS, which is used to infer the relationship between text/language data from different sources. The model serves as the basic semantic understanding module in CPS for NLI tasks and is evaluated in detail on three main NLI benchmarks. Comparative experiments show that the proposed method achieves competitive performance, has a certain generalization ability, and can balance performance against the number of trained parameters.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 878
Author(s):  
C. T. J. Dodson ◽  
John Soldera ◽  
Jacob Scharcanski

Secure user access to devices and datasets is widely enabled by fingerprint or face recognition. Organization of the necessarily large secure digital object datasets, with objects having content that may consist of images, text, video or audio, involves efficient classification and feature retrieval processing. This usually will require multidimensional methods applicable to data that is represented through a family of probability distributions. Then information geometry is an appropriate context in which to provide for such analytic work, whether with maximum likelihood fitted distributions or empirical frequency distributions. The important provision is of a natural geometric measure structure on families of probability distributions by representing them as Riemannian manifolds. Then the distributions are points lying in this geometrical manifold, different features can be identified and dissimilarities computed, so that neighbourhoods of objects nearby a given example object can be constructed. This can reveal clustering and projections onto smaller eigen-subspaces which can make comparisons easier to interpret. Geodesic distances can be used as a natural dissimilarity metric applied over data described by probability distributions. Exploring this property, we propose a new face recognition method which scores dissimilarities between face images by multiplying geodesic distance approximations between 3-variate RGB Gaussians representative of colour face images, and also obtaining joint probabilities. The experimental results show that this new method is more successful in recognition rates than published comparative state-of-the-art methods.
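To make the idea of a geodesic distance between probability distributions concrete, the sketch below computes the closed-form Fisher-Rao distance between two univariate Gaussians. This is a deliberately simplified stand-in: the paper works with approximations for 3-variate RGB Gaussians, which this example does not reproduce, and the function name `fisher_rao_normal` is an assumption introduced here.

```python
import math


def fisher_rao_normal(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses the closed form for the univariate normal family, whose Fisher metric
    ds^2 = (d_mu^2 + 2 d_sigma^2) / sigma^2 is a rescaled hyperbolic half-plane.
    Both standard deviations must be strictly positive.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    numerator = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    denominator = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(numerator / denominator))


# Two Gaussians with equal means: the distance reduces to sqrt(2) * |ln(sigma2 / sigma1)|.
distance = fisher_rao_normal(0.0, 1.0, 0.0, 2.0)
```

Unlike the Euclidean distance between parameter vectors, this distance shrinks for a fixed difference in means as the standard deviations grow, reflecting that wide distributions are harder to tell apart.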


2021 ◽  
Author(s):  
Vidya Samadi ◽  
Rakshit Pally

Floods are among the most destructive natural hazards, affecting millions of people across the world and leading to severe loss of life and damage to property, critical infrastructure, and agriculture. The Internet of Things (IoT), machine learning (ML), and Big Data are exceptionally valuable tools for supporting catastrophe readiness and collecting large amounts of actionable data. The aim of this presentation is to introduce the Flood Analytics Information System (FAIS) as a data gathering and analytics system. The FAIS application is designed to integrate crowd intelligence, ML, and natural language processing of tweets to provide warnings, with the aim of improving flood situational awareness and risk assessment. FAIS has been beta tested during major hurricane events in the US, where successive storms caused extensive damage and disruption. The prototype successfully identifies a dynamic set of at-risk locations/communities using USGS river gauge height readings and geotagged tweets intersected with watershed boundaries. The list of prioritized locations can be updated as the river monitoring system and conditions change over time (typically every 15 minutes). The prototype also performs flood frequency analysis (FFA) using various probability distributions, with the associated uncertainty estimation, to assist engineers in designing safe structures. This presentation will discuss the FAIS functionalities and the real-time implementation of the prototype across the south and southeast USA. This research is funded by the US National Science Foundation (NSF).
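A minimal sketch of one common form of flood frequency analysis follows: fitting a Gumbel distribution to annual peak values by the method of moments and reading off a return level. The abstract does not say which distributions or estimators FAIS uses, so the choice of Gumbel, the function names, and the peak values below are all assumptions for illustration; the uncertainty estimation mentioned in the abstract is not covered.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_fit_moments(annual_maxima):
    """Estimate Gumbel (location, scale) from annual maxima by the method of moments."""
    mean = statistics.mean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period_years):
    """Value exceeded on average once every `return_period_years` years (period > 1)."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(non_exceedance))


# Made-up annual peak gauge heights in metres, for illustration only.
hypothetical_peaks = [3.1, 4.2, 2.8, 5.0, 3.7, 4.4, 3.3, 6.1, 3.9, 4.8]
location, scale = gumbel_fit_moments(hypothetical_peaks)
level_100yr = return_level(location, scale, 100)
```

With only a short record, the 100-year level is an extrapolation well beyond the data, which is why an uncertainty estimate normally accompanies it in practice.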


2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields, characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the scoring reliability between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected measures yielded by hand-scoring, and further that the results using USE outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, which shows the efficacy of our approach for assessing narrative recall in large-scale individual difference analysis. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analysis in research using naturalistic stimuli.


Author(s):  
Ángela Almela ◽  
Gema Alcaraz-Mármol ◽  
Arancha García-Pinar ◽  
Clara Pallejá

In this paper, the methods for developing a database of Spanish writing that can be used for forensic linguistic research are presented, including our data collection procedures. Specifically, the main instrument used for data collection has been translated into Spanish and adapted from Chaski (2001). It consists of ten tasks, by means of which the subjects are asked to write formal and informal texts about different topics. To date, 93 undergraduates from Spanish universities, as well as prisoners convicted of gender-based abuse, have participated in the study. A twofold analysis has been performed, since the data collected have been approached from a semantic and a morphosyntactic perspective. Regarding the semantic analysis, psycholinguistic categories have been used, many of them taken from the LIWC dictionary (Pennebaker et al., 2001). In order to obtain a more comprehensive depiction of the linguistic data, some other ad hoc categories have been created, based on the corpus itself, using a double-check method for their validation so as to ensure inter-rater reliability. Furthermore, as regards morphosyntactic analysis, the natural language processing tool ALIAS TATTLER is being developed for Spanish. Results show that it is possible to differentiate non-abusers from abusers with strong accuracy based on linguistic features.


2019 ◽  
Vol 9 (2) ◽  
pp. 103-115
Author(s):  
Ľuboš GAJDOŠ

The paper deals with a corpus analysis of negation in Chinese, namely the negatives bù 不 and méi/méiyǒu 没/没有. The adverbs BU and MEI are two of the most frequent negatives in Chinese. The aim of this study is to present statistical data together with linguistic analysis. The results provide empirical evidence of a discrepancy between “authentic” language data and linguistic prescription, with practical implications for second-language acquisition. The findings inter alia suggest a new approach to verb categorisation.

