Automatic Term Recognition Method for Military Domain

2021 ◽  
Vol 2078 (1) ◽  
pp. 012031
Author(s):  
Ani Song ◽  
Xiaoxia Jia ◽  
Wei Jiang

Abstract With the development of military intelligence, higher requirements are placed on automatic term recognition in the military field. Given the flexible and diverse naming conventions of military requirement documents and the absence of an annotated corpus, the method in this paper uses an existing military-domain core database and matches the data set against it using the Aho-Corasick algorithm and word-segmentation technology, so that the terms to be recognized in the data set can be divided into three types. Likely word-formation rules for military terms are summarized, and phrases conforming to these rules are extracted from the documents as the candidate term set. The core database and the TF-IDF method are used to score the candidate terms, and candidates whose score exceeds a threshold are selected iteratively as real terms. The experimental results show that the F1 value of this method reaches 0.719, which is better than the traditional C-value method. The proposed method can therefore achieve good automatic term recognition on unannotated military requirement documents.
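The TF-IDF scoring-and-thresholding step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pre-segmented token-list representation, the per-document maximum score, and the `select_terms` threshold are all assumptions.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Score candidate terms by TF-IDF across a small corpus.

    docs: list of token lists (pre-segmented documents).
    Returns {term: maximum TF-IDF score over documents}.
    """
    n_docs = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = {}
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] = max(scores.get(term, 0.0), (count / total) * idf)
    return scores

def select_terms(docs, threshold):
    """Keep candidates whose TF-IDF score exceeds the threshold."""
    return {t for t, s in tfidf_scores(docs).items() if s > threshold}
```

In the paper this selection is applied iteratively against the core database; here a single pass over a toy corpus suffices to show the mechanics.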

Terminology ◽  
2000 ◽  
Vol 6 (2) ◽  
pp. 175-194 ◽  
Author(s):  
Hideki Mima ◽  
Sophia Ananiadou

Technical terms are important for knowledge mining, especially as vast amounts of multilingual documents are available over the Internet. Thus, a domain- and language-independent method for term recognition is necessary to automatically recognize terms from Internet documents. The C-value/NC-value method is an efficient domain-independent multi-word term recognition method which combines linguistic and statistical knowledge. Although the C-value/NC-value method was originally based on the recognition of nested terms in English, our aim is to evaluate the application of the method to other languages and to show its feasibility for multi-language environments. In this article, we describe the application of the C-value/NC-value method to Japanese texts. Several experiments analysing the performance of the method on the NACSIS Japanese AI-domain corpus demonstrate that the method can be used to realize a practical domain- and language-independent term recognition system.
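The C-value component of the method scores a multi-word candidate by its length and frequency, discounting frequency that comes from longer candidates that contain it as a nested term. A minimal sketch of that statistic (candidate extraction and the linguistic filters are omitted, and the input counts are hypothetical):

```python
import math
from collections import defaultdict

def c_value(candidates):
    """Compute the C-value for multi-word candidate terms.

    candidates: {term (tuple of words): corpus frequency}.
    For a term nested inside longer candidates, its frequency is
    discounted by the average frequency of those containing terms.
    """
    containers = defaultdict(list)
    terms = list(candidates)
    for a in terms:
        for b in terms:
            if len(b) > len(a) and any(
                b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1)
            ):
                containers[a].append(candidates[b])
    scores = {}
    for a, freq in candidates.items():
        if containers[a]:
            freq = freq - sum(containers[a]) / len(containers[a])
        return_score = math.log2(len(a)) * freq
        scores[a] = return_score
    return scores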


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared in terms of mean squared error (MSE) using numerical simulations. The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, the results are illustrated on a real data set.
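A hedged sketch of the ML step for the generalized inverted exponential distribution, whose density is f(x) = (αλ/x²) e^(-λ/x) (1 - e^(-λ/x))^(α-1). The grid search below stands in for a proper numerical optimizer and is purely illustrative; the grids themselves are arbitrary.

```python
import math

def gie_logpdf(x, alpha, lam):
    """Log-density of the generalized inverted exponential distribution:
    f(x) = (alpha*lam/x**2) * exp(-lam/x) * (1 - exp(-lam/x))**(alpha-1).
    """
    u = math.exp(-lam / x)
    return (math.log(alpha * lam) - 2 * math.log(x)
            - lam / x + (alpha - 1) * math.log(1 - u))

def ml_grid(data, alphas, lams):
    """Crude ML estimation by grid search (a sketch, not an optimizer):
    pick the (alpha, lam) pair maximizing the log-likelihood."""
    best, best_ll = None, -math.inf
    for a in alphas:
        for l in lams:
            ll = sum(gie_logpdf(x, a, l) for x in data)
            if ll > best_ll:
                best, best_ll = (a, l), ll
    return best
```

With alpha = 1 the density reduces to the plain inverted exponential, which gives a handy sanity check on the log-density.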


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug-resistance problems in pathogenic bacteria, which makes them a good choice for treating bacterial infections; the study of cell wall lytic enzymes therefore aims at finding an efficient way to cure such infections. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break the cell wall, so identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: Our motivation is to predict the type of a cell lytic enzyme. Detecting the type by experimental methods is time-consuming, so an efficient computational method for predicting the type of cell lytic enzyme is proposed in this work. Method: We propose a computational method for discriminating endolysins from autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition, and the features with the largest confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors and used to predict the type of cell lytic enzyme. Results: The experimental results show that the overall accuracy reaches 97.06% when 44 features are selected, improving on Ding's method by nearly 4.5% ((97.06-92.9)/92.9%). The performance of the proposed method is stable when the number of selected features ranges from 40 to 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, while that of Chou's amphiphilic PseAAC method is 76.2%; the tripeptide optimal feature set thus improves the overall accuracy by nearly 18%. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The overall accuracy of the proposed method on the tripeptide optimal feature set is 94.12%, better than several existing methods, and the selected 44 features further improve the overall accuracy. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
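The tripeptide-composition representation used in the Method section can be sketched as below. Feature selection and SVM training (e.g. via a library such as scikit-learn) are omitted, so this shows only the 20³ = 8000-dimensional feature extraction.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def tripeptide_composition(seq):
    """Represent a protein sequence as its tripeptide composition:
    the relative frequency of each of the 8000 possible 3-mers.
    """
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = max(len(seq) - 2, 1)          # number of 3-mer windows
    return {"".join(t): counts["".join(t)] / total
            for t in product(AMINO_ACIDS, repeat=3)}
```

Each protein thus becomes a fixed-length vector regardless of its sequence length, which is what makes a standard SVM applicable downstream.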


Author(s):  
Ricardo Giglio ◽  
Thomas Lux

Abstract We investigate the network topology of a comprehensive data set of the world-wide population of corporate entities. In particular, we have extracted information on the boards of all companies listed in Bloomberg’s archive of company profiles in October, 2015, a total of almost 100,000 firms. We provide information on board membership overlaps at various levels, and, in particular, show that there exists a core of directors who accumulate a large number of seats and are highly connected among themselves both at the level of national networks and at the worldwide aggregated level.
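The board-overlap measurements described above amount to projecting a bipartite director-company relation onto the directors. A minimal sketch with hypothetical input data (the Bloomberg archive is obviously not reproduced here):

```python
from collections import defaultdict
from itertools import combinations

def board_overlap_network(boards):
    """Project a director-company bipartite relation onto directors:
    two directors are linked when they sit on a common board.

    boards: {company: set of director names}.
    Returns {frozenset({d1, d2}): number of shared boards}.
    """
    edges = defaultdict(int)
    for members in boards.values():
        for d1, d2 in combinations(sorted(members), 2):
            edges[frozenset((d1, d2))] += 1
    return dict(edges)

def seat_counts(boards):
    """Number of board seats each director accumulates."""
    seats = defaultdict(int)
    for members in boards.values():
        for d in members:
            seats[d] += 1
    return dict(seats)
```

A "core" director in the paper's sense is one with a high seat count who is also well connected in the projected network.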


1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer; the linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared to the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, with each spectrum having 840 spectral data points; the fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network: the true performance of the linear regression model was better than the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network; the linear network and MLR models gave almost the same results, and the true performance of the back-propagation network was better than the MLR and linear network models by 35.14%.
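For the linear network, the generalised delta rule reduces to the least-mean-squares update sketched below. The learning rate, epoch count and toy data are illustrative assumptions, not the paper's settings, and the real inputs would be NIR reflectance values rather than scalars.

```python
import random

def train_linear_network(X, y, lr=0.01, epochs=2000, seed=0):
    """Single-layer linear network trained with the delta rule:
    w_j += lr * (target - output) * x_j, plus a bias unit.
    """
    rng = random.Random(seed)
    n = len(X[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            out = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = yi - out                  # delta = target - output
            for j in range(n):
                w[j] += lr * err * xi[j]
            b += lr * err
    return w, b

def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

The back-propagation network applies the same rule through a hidden layer with a nonlinearity; only the single-layer case is shown here.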


2009 ◽  
Vol 32 (1) ◽  
pp. 87-88 ◽  
Author(s):  
Wim De Neys

Abstract Oaksford & Chater (O&C) rely on a data fitting approach to show that a Bayesian model captures the core reasoning data better than its logicist rivals. The problem is that O&C's modeling has focused exclusively on response output data. I argue that this exclusive focus is biasing their conclusions. Recent studies that focused on the processes that resulted in the response selection are more positive for the role of logic.


2021 ◽  
Vol 39 (1B) ◽  
pp. 1-10
Author(s):  
Iman H. Hadi ◽  
Alia K. Abdul-Hassan

Speaker recognition depends on specific predefined steps, the most important of which are feature extraction and feature matching. In addition, the category of the speaker's voice features has an impact on the recognition process. The proposed speaker recognition system makes use of biometric (voice) attributes to recognize the identity of the speaker. Long-term features were used, namely maximum frequency, pitch and zero-crossing rate (ZCR). In the feature-matching step, the fuzzy inner product between feature vectors was used to compute the matching value between a claimed speaker's voice utterance and test voice utterances. The experiments were implemented using the ELSDSR data set and showed that the recognition accuracy is 100% for text-dependent speaker recognition.
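A hedged sketch of two of the steps described above: computing the zero-crossing rate and matching feature vectors with a fuzzy similarity. The abstract does not give its exact fuzzy inner product, so the sum-of-min over sum-of-max form below is an assumption, as are the toy feature vectors.

```python
def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(signal) - 1, 1)

def fuzzy_match(u, v):
    """Fuzzy similarity of two non-negative feature vectors:
    sum(min)/sum(max), one common fuzzy-set overlap measure."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den else 0.0

def recognize(claimed, tests):
    """Index of the test utterance best matching the claimed features."""
    return max(range(len(tests)), key=lambda i: fuzzy_match(claimed, tests[i]))
```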


2017 ◽  
Vol 19 (2) ◽  
pp. 53-66 ◽  
Author(s):  
Michael Preston-Shoot

Purpose
The purpose of this paper is twofold: first, to update the core data set of self-neglect serious case reviews (SCRs) and safeguarding adult reviews (SARs), and accompanying thematic analysis; second, to respond to the critique in the Wood Report of SCRs commissioned by Local Safeguarding Children Boards (LSCBs) by exploring the degree to which the reviews scrutinised here can transform and improve the quality of adult safeguarding practice.

Design/methodology/approach
Further published reviews are added to the core data set from the websites of Safeguarding Adults Boards (SABs) and from contacts with SAB independent chairs and business managers. Thematic analysis is updated using the four domains employed previously. The findings are then further used to respond to the critique in the Wood Report of SCRs commissioned by LSCBs, with implications discussed for Safeguarding Adult Boards.

Findings
Thematic analysis within and recommendations from reviews have tended to focus on the micro context, namely, what takes place between individual practitioners, their teams and adults who self-neglect. This level of analysis enables an understanding of local geography. However, there are other wider systems that impact on and influence this work. If review findings and recommendations are to fully answer the question “why”, systemic analysis should appreciate the influence of national geography. Review findings and recommendations may also be used to contest the critique of reviews, namely, that they fail to engage practitioners, are insufficiently systemic and of variable quality, and generate repetitive findings from which lessons are not learned.

Research limitations/implications
There is still no national database of reviews commissioned by SABs, so the data set reported here might be incomplete. The Care Act 2014 does not require publication of reports but only a summary of findings and recommendations in SAB annual reports. This makes learning for service improvement challenging. Reading the reviews reported here against the strands in the critique of SCRs enables conclusions to be reached about their potential to transform adult safeguarding policy and practice.

Practical implications
Answering the question “why” is a significant challenge for SARs. Different approaches have been recommended, some rooted in systems theory. The critique of SCRs challenges those now engaged in SARs to reflect on how transformational change can be achieved to improve the quality of adult safeguarding policy and practice.

Originality/value
The paper extends the thematic analysis of available reviews that focus on work with adults who self-neglect, further building on the evidence base for practice. The paper also contributes new perspectives to the process of conducting SARs by using the analysis of themes and recommendations within this data set to evaluate the critique that reviews are insufficiently systemic, fail to engage those involved in reviewed cases and in their repetitive conclusions demonstrate that lessons are not being learned.


F1000Research ◽  
2020 ◽  
Vol 8 ◽  
pp. 2024
Author(s):  
Joshua P. Zitovsky ◽  
Michael I. Love

Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
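The contrast between the maximum-likelihood and pseudocount estimators of an allelic proportion can be made concrete in a few lines. The pseudocount value 0.5 is an illustrative choice, not the paper's setting, and apeglm's actual shrinkage is a full Bayesian GLM fit rather than this simple adjustment.

```python
def ml_proportion(k, n):
    """Maximum-likelihood estimate of the allelic proportion:
    reads supporting one allele (k) over total reads (n)."""
    return k / n

def pseudocount_proportion(k, n, pc=0.5):
    """Add a pseudocount to each allele, shrinking the estimate
    toward 0.5 and stabilising it at low read counts."""
    return (k + pc) / (n + 2 * pc)
```

At n = 2 with k = 0, the ML estimate is an extreme 0.0 while the pseudocount estimate is pulled toward 0.5, which is exactly the high-variance-at-small-counts problem the abstract describes.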

