Fast similarity search on a large speech data set with neighborhood graph indexing

Author(s):  
Kazuo Aoyama ◽  
Shinji Watanabe ◽  
Hiroshi Sawada ◽  
Yasuhiro Minami ◽  
Naonori Ueda ◽  
...  
2020 ◽  
Vol 8 (2) ◽  
pp. 117-141
Author(s):  
Alberto Rodríguez Márquez

The objective of this paper is to describe the prosodic features of the final intonation contours of minor intonational phrases (ip) and the tonemes of major intonational phrases (IP) in the Mexico City variety of Spanish. The speech data were taken from a spontaneous speech corpus of speakers from two social networks: neighborhood and labor. Final intonation contours of ip show a predominantly rising movement. These contours are generally produced with greater length in the last syllable of the ip, which represents the most significant difference between the two networks in the case of oxytone endings. Tonemes, on the other hand, are predominantly descending, although the circumflex accent accounts for an important number of cases within the data set. Tonemes produced by the neighborhood network are longer than those from the labor network.


2019 ◽  
Vol 8 (4) ◽  
pp. 590
Author(s):  
Chhayarani Ram Kinkar ◽  
Yogendra Kumar Jain

Natural language processing is a very active area of research and development, yet there is no single agreed-upon method that satisfies everyone for using natural language to operate electronic devices or in other practical applications. Some aspects, however, have been used for many years in formulating and solving the computational problems that arise in natural language processing. This paper describes a model in which numerical values are assigned to the words of a natural language speech data set, converting the information it contains into an intermediate numeric form as a structured data set. The intermediate numerical value of each word is then used to generate machine code that electronic devices can easily interpret to draw inferences from the data set. The designed model is useful for a number of practical applications and very simple to implement.


2019 ◽  
Author(s):  
Andrew Dalke

This paper describes the 10 years of work and research results of the chemfp project, available from http://chemfp.com/ . The project started as a way to promote the FPS format for cheminformatics fingerprint exchange. This is a line-oriented text format meant to be easy to read and write. It supports metadata such as the fingerprint type and data provenance. The chemfp package for Python was developed to provide the basic command-line tools and Python API for working with fingerprint data, because a format without useful tools will not be used. The similarity search performance improved by an order of magnitude over the decade, due to careful implementation and effective use of CPU hardware, including AVX2 support for faster popcount calculations than the built-in POPCNT instruction. The implementation details for high-performance search have rarely been discussed in the literature. As a result, many tools and published papers use implementations which are not close to the machine's capabilities. This paper describes those details to help with future optimization efforts. The most advanced version of chemfp evaluates about 130 million 1024-bit fingerprint Tanimotos per second on a single core of a standard x86-64 server machine. When combined with the BitBound algorithm, a k=1000 nearest-neighbor search of the 1.8 million 2048-bit Morgan fingerprints of ChEMBL 24 averages 27 ms/query and the same search of the 970 million PubChem fingerprints averages 220 ms/query, making chemfp one of the fastest similarity search tools available for CPUs. This appears to be several times faster than previously published work in the field, including in papers which use much more sophisticated data structures. A close analysis shows that nearly all earlier work assumes that the intersection popcount was the limiting performance factor, while on modern hardware uncompressed search is effectively memory bandwidth limited. For example, AVX2 search is 10% faster with memory prefetching, and the popcount evaluation time is far faster than fetching a random location in main memory. It proved difficult to evaluate existing tool performance because in the few cases where the tools were available, each used its own format, data sets, and search tasks. This paper introduces the chemfp benchmark data set to help make head-to-head comparisons easier in the future, and to help promote the FPS format. The FPS format is slow for tasks like web server reloads and command-line scripting. This paper also describes the FPB format, which is a binary application format for fast loads.
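
The Tanimoto and BitBound ideas mentioned in the abstract above can be illustrated with a minimal pure-Python sketch. This is not chemfp's implementation or API: the function names (`popcount`, `tanimoto`, `threshold_search`) are hypothetical, fingerprints are modeled as plain Python integers, and a threshold search is shown rather than the k-nearest search reported in the abstract. The pruning step relies on the standard bound that the Tanimoto score of two fingerprints cannot exceed the ratio of the smaller popcount to the larger one.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: popcount(a & b) / popcount(a | b)."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return popcount(a & b) / union


def threshold_search(query: int, fingerprints: list, threshold: float) -> list:
    """Return (index, score) pairs with score >= threshold, best first.

    Candidates are skipped without computing the intersection when the
    popcount bound min(|A|, |B|) / max(|A|, |B|) is already below the
    threshold (the BitBound-style pruning idea).
    """
    q_bits = popcount(query)
    hits = []
    for idx, fp in enumerate(fingerprints):
        fp_bits = popcount(fp)
        lo, hi = min(q_bits, fp_bits), max(q_bits, fp_bits)
        if hi > 0 and lo < threshold * hi:
            continue  # upper bound on the score is below the threshold
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((idx, score))
    hits.sort(key=lambda hit: -hit[1])
    return hits


if __name__ == "__main__":
    targets = [0b1111, 0b0111, 0b0001, 0b11110000]
    print(threshold_search(0b1111, targets, 0.7))
```

A production implementation would typically store fingerprints sorted by popcount, so that whole ranges can be skipped, and would use hardware popcount instructions instead of string conversion.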


2021 ◽  
Author(s):  
Bengt Ljungquist ◽  
Masood A Akram ◽  
Giorgio A Ascoli

Most functions of the nervous system depend on neuronal and glial morphology. Continuous advances in microscopic imaging and tracing software have made 3D reconstructions of arborizing dendrites, axons, and processes increasingly abundant, allowing their detailed study. However, efficient, large-scale methods to rank neural morphologies by similarity to an archetype are still lacking. Using the NeuroMorpho.Org database, we present similarity search software enabling fast morphological comparison of hundreds of thousands of neural reconstructions from any species, brain region, cell type, and preparation protocol. We compared the performance of different morphological measurements: 1) summary morphometrics calculated by L-Measure, 2) persistence vectors, a vectorized descriptor of branching structure, and 3) the combination of the two. In all cases, we also investigated the impact of applying dimensionality reduction using principal component analysis (PCA). We assessed qualitative performance by gauging the ability to rank neurons in order of visual similarity. Moreover, we quantified information content by examining explained variance and benchmarked the ability to identify occasional duplicate reconstructions of the same specimen. The results indicate that combining summary morphometrics and persistence vectors with applied PCA provides an information-rich characterization that enables efficient and precise comparison of neural morphology. The execution time scaled linearly with data set size, allowing seamless live searching through the entire NeuroMorpho.Org content in fractions of a second. We have deployed the similarity search function as an open-source online software tool both through a user-friendly graphical interface and as an API for programmatic access.


Author(s):  
Kazuo Aoyama ◽  
Atsunori Ogawa ◽  
Takashi Hattori ◽  
Takaaki Hori ◽  
Atsushi Nakamura

Author(s):  
Mikel Artetxe ◽  
Holger Schwenk

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of the 93 languages without any modification. Our experiments in cross-lingual natural language inference (XNLI data set), cross-lingual document classification (MLDoc data set), and parallel corpus mining (BUCC data set) show the effectiveness of our approach. We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages. Our implementation, the pre-trained encoder, and the multilingual test set are available at https://github.com/facebookresearch/LASER .


Author(s):  
L. WALAVALKAR ◽  
M. YEASIN ◽  
A. NARASIMHAMURTHY ◽  
R. SHARMA

Building computer vision systems for monitoring people and collecting valuable demographic information in a social environment is an important research problem. Such a system is expected to play an increasingly important role in enhancing the user experience and can significantly improve the intelligibility of a human computer interaction (HCI) system. For example, a robust gender classification system can provide a basis for passive surveillance and access to a smart building using demographic information, or can provide valuable consumer statistics in a public place. The option of an audio cue in addition to the visual cue promises a robust solution with high accuracy and ease of use in human computer interaction systems. This paper investigates gender classification using Support Vector Machines (SVMs). The visual (thumbnail frontal face) and the audio (features from speech data) cues were considered for designing the classifier. Three different representations of the data, namely raw data, principal component analysis (PCA), and non-negative matrix factorization (NMF), were used for the experiments with the visual signal. For speech, mel-cepstral coefficients and pitch were used. The best overall classification rates obtained using the SVM for the visual and speech data were 95.31% and 100%, respectively, on a data set collected in a laboratory environment. The performance of the SVM was compared with two simple classifiers, namely the nearest prototype neighbor and the k-nearest neighbor, on all feature sets. The SVM outperformed the other two classifiers on all data sets. To further understand the robustness issues, the proposed approach has been applied to a large balanced (roughly equal distribution of gender, ethnicity, and age group) database consisting of 8000 faces collected in a real-world environment. While the results are very promising, they indicate that more work is needed to draw a statistically meaningful conclusion.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Jia Fu ◽  
Sen Yang ◽  
Fei He ◽  
Ling He ◽  
Yuanyuan Li ◽  
...  

Background: Schizophrenia is a chronic and severe mental disease, which largely influences the daily life and work of patients. Clinically, schizophrenia with negative symptoms is usually misdiagnosed. The diagnosis is also dependent on the experience of clinicians. It is urgent to develop an objective and effective method to diagnose schizophrenia with negative symptoms. Recent studies have shown that impaired speech could be considered as an indicator to diagnose schizophrenia. The literature on schizophrenic speech detection has mainly been based on feature engineering, in which effective feature extraction is difficult because of the variability of speech signals. Methods: This work designs a novel Sch-net neural network based on a convolutional neural network, which is the first work for end-to-end schizophrenic speech detection using deep learning techniques. The Sch-net adds two components, skip connections and a convolutional block attention module (CBAM), to the convolutional backbone architecture. The skip connections enrich the information used for the classification by merging low- and high-level features. The CBAM highlights the effective features by giving learnable weights. The proposed Sch-net combines the advantages of the two components, which can avoid the procedure of manual feature extraction and selection. Results: We validate our Sch-net through ablation experiments on a schizophrenic speech data set that contains 28 patients with schizophrenia and 28 healthy controls. Comparisons with models based on feature engineering and deep neural networks are also conducted. The experimental results show that the Sch-net performs well on the schizophrenic speech detection task, achieving 97.68% accuracy on the schizophrenic speech data set. To further verify the generalization of our model, the Sch-net is tested on the open-access LANNA children's speech database for specific language impairment (SLI) detection. The results show that our model achieves 99.52% accuracy in classifying patients with SLI and healthy controls. Our code will be available at https://github.com/Scu-sen/Sch-net. Conclusions: Extensive experiments show that the proposed Sch-net can provide aided information for the diagnosis of schizophrenia and specific language impairment.

