An Unsupervised Data-Driven Cross-Lingual Method for Building High Precision Sentiment Lexicons

Author(s):  
Pierluca Sangiorgi ◽  
Agnese Augello ◽  
Giovanni Pilato
Author(s):  
Petya Osenova ◽  
Kiril Simov

The data-driven Bulgarian WordNet: BTBWN

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronizing the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but the missing senses of a lemma have also been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective, with a view to ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.


2020 ◽  
Vol 34 (08) ◽  
pp. 13369-13375
Author(s):  
Zheyuan Ryan Shi ◽  
Yiwen Yuan ◽  
Kimberly Lo ◽  
Leah Lizarondo ◽  
Fei Fang

Food waste and food insecurity are two challenges that coexist in many communities. To mitigate these problems, food rescue platforms match excess food with the communities in need, and leverage external volunteers to transport the food. However, the external volunteers bring significant uncertainty to the food rescue operation. We work with a large food rescue organization to predict the uncertainty and to find ways to reduce the human dispatcher's workload and the redundant notifications sent to volunteers. We make two main contributions. (1) We train a stacking model which predicts, with high precision and AUC, whether a rescue will be claimed. This model can help the dispatcher better plan for backup options and alleviate their uncertainty. (2) We develop a data-driven optimization algorithm to compute the optimal intervention and notification scheme. The algorithm uses a novel counterfactual data generation approach and the branch and bound framework. Our result reduces the number of notifications and interventions required in the food rescue operation. We are working with the organization to deploy our results in the near future.
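The two evaluation measures named in this abstract, precision and AUC, can be illustrated with a minimal sketch. It does not reproduce the authors' stacking model: the labels, scores, and the 0.5 decision threshold below are invented for illustration, and the function names are hypothetical.

```python
def precision_at_threshold(labels, scores, threshold=0.5):
    """Fraction of predicted positives (score >= threshold) that are truly positive."""
    predicted_positive = [y for y, s in zip(labels, scores) if s >= threshold]
    if not predicted_positive:
        raise ValueError("no example scored at or above the threshold")
    return sum(predicted_positive) / len(predicted_positive)


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 = rescue was claimed, 0 = it was not.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_precision = precision_at_threshold(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise form of AUC is quadratic in the number of examples, which is acceptable for a small illustration; a sort-based computation would be the usual choice for larger data.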


2018 ◽  
Vol 97 (4) ◽  
Author(s):  
Re-Bing Wu ◽  
Bing Chu ◽  
David H. Owens ◽  
Herschel Rabitz

2018 ◽  
Vol 15 (5) ◽  
pp. 805-819 ◽  
Author(s):  
Likun Wang ◽  
Chaofeng Chen ◽  
Zhengyang Li ◽  
Wei Dong ◽  
Zhijiang Du ◽  
...  

Energies ◽  
2020 ◽  
Vol 13 (15) ◽  
pp. 3791
Author(s):  
Yong Li ◽  
Jue Yang ◽  
Wei Long Liu ◽  
Cheng Lin Liao

The lithium-ion battery is a complicated non-linear system with multiple electrochemical processes, including mass and charge conservation as well as electrochemical kinetics. Calculations with the electrochemical model depend on an in-depth understanding of the physicochemical characteristics and parameters, which can be costly and time-consuming to obtain. We investigated electrochemical modeling, reduction, and identification methods for the lithium-ion battery from the electrode level to the system level. A reduced 9th-order linear model was proposed using electrode-level physicochemical modeling and a cell-level mathematical reduction method. A data-driven predictor-based subspace identification algorithm was presented for estimating the lithium-ion battery model at the system level. The effectiveness of the proposed modeling and identification methods was validated in an experimental study based on LiFePO4 cells. The accuracy and dynamic characteristics of the identified model were found to be closely related to the operating State of Charge (SOC) range. Experimental results showed that the proposed methods perform well, with high precision and good robustness, in the SOC range of 90% to 10%, and that the tracking error increases significantly within the higher (100–90%) and lower (10–0%) SOC ranges. Moreover, statistical analysis revealed that, to achieve an optimal balance between high precision and low complexity, the 6th-, 3rd-, and 5th-order battery models are the optimal choices in the SOC ranges of 90% to 100%, 90% to 10%, and 10% to 0%, respectively.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Jingbo Zhou ◽  
Laisheng Pan ◽  
Yuehua Li ◽  
Renjie Du ◽  
Fuxiang Zhang

2021 ◽  
Vol 12 ◽  
Author(s):  
Kenji Sagae

Recent work on the application of neural networks to language modeling has shown that models based on certain neural architectures can capture syntactic information from utterances and sentences even when not given an explicitly syntactic objective. We examine whether a fully data-driven model of language development that uses a recurrent neural network encoder for utterances can track how child language utterances change over the course of language development in a way that is comparable to what is achieved using established language assessment metrics that use language-specific information carefully designed by experts. Given only transcripts of child language utterances from the CHILDES Database and no pre-specified information about language, our model captures not just the structural characteristics of child language utterances, but how these structures reflect language development over time. We establish an evaluation methodology with which we can examine how well our model tracks language development compared to three known approaches: Mean Length of Utterance, the Developmental Sentence Score, and the Index of Productive Syntax. We discuss the applicability of our model to data-driven assessment of child language development, including how a fully data-driven approach supports the possibility of increased research in multilingual and cross-lingual issues.
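Of the three baseline metrics named above, Mean Length of Utterance is the simplest to state, and a minimal sketch may make it concrete. MLU is conventionally counted in morphemes; as a loudly labeled simplification, this sketch counts whitespace-separated words instead. The sample utterances are invented and are not taken from the CHILDES Database.

```python
def mean_length_of_utterance(utterances):
    """Mean number of whitespace-separated tokens per non-empty utterance.

    Assumption: words stand in for morphemes, which is a simplification
    of the conventional morpheme-based MLU.
    """
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("need at least one non-empty utterance")
    return sum(lengths) / len(lengths)


# Invented sample utterances (not real transcript data).
sample_utterances = ["more juice", "doggie go out", "no"]
sample_mlu = mean_length_of_utterance(sample_utterances)  # (2 + 3 + 1) / 3
```

A morpheme-level count would need a morphological analysis of each utterance, which is exactly the kind of language-specific information the abstract's data-driven model does without.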


Author(s):  
Hans Aulin ◽  
Per Tunestal ◽  
Thomas Johansson ◽  
Bengt Johansson

A high-precision torque sensor is used for extracting combustion timing information from cylinder-individual pressure estimates constructed from the torque measurements. A combination of physics-based and data-driven modeling is used: the physical part of the model is based on equations describing the contributions of inertial and gas forces, while the flexing of the crankshaft, which has rather complex dynamics, is modeled using the data-driven approach. The first part of the study shows the derivation of the models and how well the torque at the sensor position can be estimated from the pressures in the four cylinders. The second part demonstrates how cylinder-individual torque and pressure can be reconstructed by inverting the pressure-to-torque model. Going from measured torque to the pressure in each cylinder is not trivial, since the inverted model is ill-conditioned around top dead centre, which causes large errors where precision is most needed. A parameterized combustion model is therefore introduced to improve the signal-to-noise ratio in the estimated parameters. The proposed method for detecting combustion demonstrated good results, with a coefficient of determination of 0.95 against “true” combustion phasing.
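The abstract does not say how the coefficient of determination was computed. The following sketch assumes the common definition, one minus the ratio of the residual sum of squares to the total sum of squares. Some studies report the squared Pearson correlation instead, which can differ.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: 'observed' plays the role of the reference ("true")
    combustion phasing and 'predicted' the estimate derived from torque.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

Under this definition a perfect estimate scores 1, and an estimate no better than always predicting the mean of the reference values scores 0.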


Author(s):  
J. C. Russ ◽  
T. Taguchi ◽  
P. M. Peters ◽  
E. Chatfield ◽  
J. C. Russ ◽  
...  

Conventional SAD patterns as obtained in the TEM present difficulties for the identification of materials such as asbestiform minerals, although diffraction data is considered important for this purpose. The preferred orientation of the fibers and the spotty patterns that are obtained do not readily lend themselves to measurement of the integrated intensity values for each d-spacing, and even the d-spacings may be hard to determine precisely because the true center location for the broken rings requires estimation. We have implemented an automatic method for diffraction pattern measurement to overcome these problems. It automatically locates the center of patterns with high precision, measures the radius of each ring of spots in the pattern, and integrates the density of spots in that ring. The resulting spectrum of intensity vs. radius is then used just as a conventional X-ray diffractometer scan would be, to locate peaks and produce a list of d,I values suitable for search/match comparison to known or expected phases.
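The intensity vs. radius spectrum described above can be sketched as a radial integration around a known center. This is an illustration under stated assumptions, not the authors' implementation: the automatic center-finding step is omitted, the center is taken as given, and radii are binned to the nearest whole pixel. The conversion from ring radius to d-spacing uses the standard camera-constant relation R·d = λL. The function names and the toy image are hypothetical.

```python
import math


def radial_profile(image, center):
    """Sum pixel intensities in rings of one-pixel width around `center`.

    `image` is a list of rows; `center` is (x, y) in pixel coordinates.
    Returns a list whose index is the rounded radius in pixels.
    """
    cx, cy = center
    sums = {}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            radius = int(round(math.hypot(x - cx, y - cy)))
            sums[radius] = sums.get(radius, 0) + value
    if not sums:
        raise ValueError("image is empty")
    return [sums.get(r, 0) for r in range(max(sums) + 1)]


def d_spacing(ring_radius, camera_constant):
    """d = (lambda * L) / R, with camera_constant = lambda * L in matching units."""
    if ring_radius <= 0:
        raise ValueError("ring radius must be positive")
    return camera_constant / ring_radius


# Invented 5x5 pattern: a central beam plus four spots on a ring of radius 2.
toy_pattern = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 5, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
toy_profile = radial_profile(toy_pattern, center=(2, 2))
```

Peaks in the returned profile mark ring radii, and each radius then maps to a d-spacing once the camera constant of the instrument is known.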

