Local Translation Services for Neglected Languages

2021 ◽  
Author(s):  
David Noever ◽  
Josh Kalin ◽  
Matthew Ciolino ◽  
Dom Hambrick ◽  
Gerry Dozier

The availability of computationally lightweight but high-quality translators prompts consideration of new applications that address neglected languages. For projects with protected or personal data, translators for less popular or low-resource languages require specific compliance checks before posting to a public translation API. In these cases, locally run translators can render reasonable, cost-effective solutions if built as an army of offline, small-scale pair translators. Like handling a specialist’s dialect, this research illustrates translating two historically interesting but obfuscated languages: 1) hacker-speak (“l33t”) and 2) reverse (or “mirror”) writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution highlights a fluent translator of hacker-speak in under 50 megabytes and demonstrates a companion text generator for augmenting future datasets with more than a million bilingual sentence pairs. A primary motivation stems from the need to understand and archive the evolution of the international computer community, one that continuously refines its talent for speaking openly but in hidden contexts. These bilingual sentence pairs train deep learning models built on a long short-term memory recurrent neural network (LSTM-RNN). The work extends a previous demonstration that an English-to-foreign translation service can be built from as few as 10,000 bilingual sentence pairs. It further solves the equivalent translation problem in twenty-six additional (non-obfuscated) languages and rank-orders those models quantitatively by proficiency, with Italian the most successful and Mandarin Chinese the most challenging. For neglected languages, the method prototypes novel services for smaller niche translations such as Kabyle (an Algerian dialect with 5-7 million speakers) that most enterprise translators have yet to support. One anticipates extending this approach to other important dialects, such as translating technical (medical or legal) jargon for processing health records, or handling the many dialects collected from specialized domains (mixed languages like “Spanglish”, acronym-laden Twitter feeds, or urban slang).
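
As an illustration of how small a training-data pipeline for such a pair translator can be, the sketch below generates English/“l33t” bilingual sentence pairs with a rule-based substitution map. The `LEET_MAP` table and `make_pairs` helper are illustrative assumptions for a “lite” vocabulary, not the paper’s actual generator.

```python
# Hypothetical sketch: generating English/"l33t" bilingual sentence pairs for
# seq2seq training-data augmentation, assuming a simple "lite" substitution
# vocabulary. The paper's actual mapping and generator are not specified here.

LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

def to_leet(sentence: str) -> str:
    """Apply character-level substitutions to produce hacker-speak."""
    return "".join(LEET_MAP.get(ch.lower(), ch) for ch in sentence)

def make_pairs(corpus):
    """Yield (source, target) sentence pairs suitable for seq2seq training."""
    for sentence in corpus:
        yield sentence, to_leet(sentence)

if __name__ == "__main__":
    english = ["the quick brown fox", "deep learning models translate text"]
    for src, tgt in make_pairs(english):
        print(f"{src}\t{tgt}")
```

Because the mapping is deterministic, a generator like this can expand any monolingual English corpus into the million-plus bilingual pairs the abstract mentions.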

Computers ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 158
Author(s):  
Filipa Esgalhado ◽  
Beatriz Fernandes ◽  
Valentina Vassilenko ◽  
Arnaldo Batista ◽  
Sara Russo

Photoplethysmography (PPG) is widely used in wearable devices due to its convenience and cost-effectiveness. Several biomarkers can be extracted from this signal, such as heart rate and respiration rate. In typical acquisition scenarios, PPG is an artefact-ridden signal, so the designated classification algorithms must be able to reduce the effect of the noise component on the classification. Within the selected classification algorithm, hyperparameter adjustment is of utmost importance. This study aimed to develop a deep learning model for robust PPG wave detection, which includes finding each beat’s temporal limits, from which the peak can be determined. A study database consisting of 1100 records was created from experimental PPG measurements performed on 47 participants. Different deep learning models were implemented to classify the PPG: Long Short-Term Memory (LSTM), Bidirectional LSTM, and Convolutional Neural Network (CNN). The Bidirectional LSTM and the CNN-LSTM were investigated using the PPG Synchrosqueezed Fourier Transform (SSFT) as the models’ input. Accuracy, precision, recall, and F1-score were evaluated for all models. The CNN-LSTM algorithm with an SSFT input was the best-performing model, with accuracy, precision, and recall of 0.894, 0.923, and 0.914, respectively. This model has been shown to be competent in PPG detection and delineation tasks on noise-corrupted signals, which justifies the use of this innovative approach.
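
For readers unfamiliar with the architecture, here is a minimal PyTorch sketch of a CNN-LSTM classifier operating on an SSFT-style time-frequency input. The `CNNLSTM` class, layer sizes, and per-time-step output head are assumptions for illustration, not the authors’ exact model.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Minimal sketch of a CNN-LSTM over an SSFT time-frequency input;
    all layer sizes are illustrative assumptions."""
    def __init__(self, freq_bins=64, hidden=128, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(freq_bins, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):              # x: (batch, freq_bins, time)
        z = self.cnn(x)                # (batch, 64, time)
        z = z.transpose(1, 2)          # (batch, time, 64)
        out, _ = self.lstm(z)          # (batch, time, 2*hidden)
        return self.head(out)          # per-time-step beat/no-beat logits

model = CNNLSTM()
logits = model(torch.randn(8, 64, 256))  # 8 records, 64 SSFT bins, 256 steps
print(logits.shape)                      # torch.Size([8, 256, 2])
```

Emitting a logit per time step matches the delineation task: contiguous positive spans mark each beat’s temporal limits, from which the peak can be located.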


Atmosphere ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 924
Author(s):  
Moslem Imani ◽  
Hoda Fakour ◽  
Wen-Hau Lan ◽  
Huan-Chin Kao ◽  
Chi Ming Lee ◽  
...  

Despite the great significance of precise wind speed forecasting for the development of new and clean energy technology and for stable grid operation, the stochasticity of wind speed makes prediction a complex and challenging task. Accurate short-term wind power forecasting is crucial for improving the security and economic performance of power grids. In this paper, a Long Short-Term Memory (LSTM) deep learning model is proposed for wind speed prediction. Because wind speed time series are nonlinear and stochastic, the mutual information (MI) approach was used to find the best subset of the data by maximizing the joint MI between the subset and the target output. To enhance accuracy and reduce input characteristics and data uncertainties, rough set and interval type-2 fuzzy set theory are combined in the proposed deep learning model. Wind speed data from an international airport station in Bandar Abbas City, on the southern coast of Iran, were used as the original input dataset for the optimized deep learning model. Based on the statistical results, the rough set LSTM (RST-LSTM) model showed better prediction accuracy than the fuzzy and original LSTM models, as well as traditional neural networks, with the lowest error on training and testing datasets over different time horizons. The suggested model can support the optimization of the control approach and the smooth operation of the power system. The results confirm the superior capabilities of deep learning techniques for wind speed forecasting, which could also inspire new applications in meteorology assessment.
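
A minimal sketch of the MI-based input selection step, assuming lagged wind-speed values as the candidate features and scikit-learn’s `mutual_info_regression` as the estimator; the lag depth, top-k cutoff, and synthetic series are assumptions, and the paper’s exact subset-selection procedure may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def build_lag_matrix(series: np.ndarray, max_lag: int = 12):
    """Stack lagged copies of the series as features; target is one step ahead."""
    X = np.column_stack(
        [series[max_lag - l - 1 : -l - 1] for l in range(max_lag)]
    )
    y = series[max_lag:]
    return X, y

rng = np.random.default_rng(0)
wind = rng.gamma(2.0, 3.0, size=2000)        # synthetic stand-in wind series
X, y = build_lag_matrix(wind)
mi = mutual_info_regression(X, y, random_state=0)
top_k = np.argsort(mi)[::-1][:4]             # indices of the 4 best lags
print("selected lags:", top_k + 1, "MI scores:", np.round(mi[top_k], 3))
```

The selected lag columns would then form the input window for the downstream LSTM.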


Author(s):  
Claire Brenner ◽  
Jonathan Frame ◽  
Grey Nearing ◽  
Karsten Schulz

Summary: Evaporation is a decisive process in the global water, energy, and carbon cycles. Data on the spatio-temporal dynamics of evaporation are therefore of great importance for climate modelling, for estimating the impacts of the climate crisis, and not least for agriculture.

In this work, we apply two machine learning and deep learning methods to predict evaporation at daily and half-hourly resolution for sites of the FLUXNET dataset. The Long Short-Term Memory network is a recurrent neural network that explicitly accounts for storage effects and analyses time series of the input variables (analogous to physically based water balance models). It is contrasted with models built with XGBoost, a decision-tree method that in this case receives information only for the time step to be predicted (analogous to physically based energy balance models). By comparing the two modelling approaches, we examine to what extent accounting for storage effects benefits the modelling.

The analyses show that both modelling approaches achieve good results and exhibit higher model quality than an evaluated reference dataset. Comparing the two models, the LSTM shows better agreement with the observations on average over all 153 sites examined. However, the quality of the evaporation prediction depends on the vegetation class of the site; warm, dry sites with short vegetation in particular are better represented by the LSTM, whereas in wetlands, for example, XGBoost agrees better with the observations. The relevance of storage effects therefore appears to vary between ecosystems and sites.

The presented results underline the potential of artificial intelligence methods for describing evaporation.
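
A compact sketch of the two input framings being compared: an LSTM that sees a window of past forcings (storage effects) versus XGBoost, which sees only the current time step. The `ETLSTM` network, feature count, window length, and synthetic data are assumptions, not the study’s configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from xgboost import XGBRegressor

n_feat, window = 6, 30                       # assumed forcings and history length
rng = np.random.default_rng(1)
X_seq = rng.normal(size=(512, window, n_feat)).astype("float32")  # past days
y = rng.normal(size=(512,)).astype("float32")                     # ET target

# (a) LSTM: input is the full sequence; predict from the last hidden state.
class ETLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)

lstm_pred = ETLSTM()(torch.from_numpy(X_seq))

# (b) XGBoost: input is only the current day's forcings.
xgb = XGBRegressor(n_estimators=200, max_depth=4)
xgb.fit(X_seq[:, -1, :], y)
xgb_pred = xgb.predict(X_seq[:, -1, :])
print(lstm_pred.shape, xgb_pred.shape)
```

Holding everything else fixed and varying only whether the model sees the history is what isolates the value of storage effects in the comparison.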


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 495
Author(s):  
Imayanmosha Wahlang ◽  
Arnab Kumar Maji ◽  
Goutam Saha ◽  
Prasun Chakrabarti ◽  
Michal Jasinski ◽  
...  

This article experiments with deep learning methodologies in echocardiography (echo), a promising and vigorously researched imaging technique. The paper involves two different kinds of classification in echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) is performed using 2D echo images, 3D Doppler images, and videographic images. Secondly, the different types of regurgitation, namely Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), and a combination of the three, are classified using videographic echo images. Two deep-learning methodologies are used for these purposes: a Recurrent Neural Network (RNN) based methodology (Long Short-Term Memory (LSTM)) and an Autoencoder-based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguishes this work from existing work using Support Vector Machines (SVM), and the application of deep-learning methodologies is among the first in this particular field. Deep-learning methodologies were found to outperform the SVM methodology in normal-or-abnormal classification. Overall, the VAE performs better on 2D and 3D Doppler images (static images), while the LSTM performs better on videographic images.
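
As a rough illustration of the recurrent approach on videographic input, the sketch below classifies a clip into one of the four regurgitation classes from per-frame feature vectors. The `EchoClipLSTM` class, feature dimension, and upstream feature extraction are assumptions, not the authors’ architecture.

```python
import torch
import torch.nn as nn

class EchoClipLSTM(nn.Module):
    """Illustrative sketch: classify an echo video clip (MR/AR/TR/combined)
    from a sequence of per-frame feature vectors."""
    def __init__(self, frame_feat=256, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(frame_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frames):           # frames: (batch, n_frames, frame_feat)
        _, (h_n, _) = self.lstm(frames)  # h_n: (1, batch, hidden)
        return self.head(h_n[-1])        # one logit vector per clip

clip = torch.randn(2, 40, 256)           # 2 clips of 40 frames each
print(EchoClipLSTM()(clip).shape)        # torch.Size([2, 4])
```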


2021 ◽  
Vol 13 (10) ◽  
pp. 1953
Author(s):  
Seyed Majid Azimi ◽  
Maximilian Kraus ◽  
Reza Bahmanyar ◽  
Peter Reinartz

In this paper, we address various challenges in multi-pedestrian and vehicle tracking in high-resolution aerial imagery through an intensive evaluation of a number of traditional and Deep Learning based Single- and Multi-Object Tracking methods. We also describe our proposed Deep Learning based Multi-Object Tracking method, AerialMPTNet, which fuses appearance, temporal, and graphical information using a Siamese Neural Network, a Long Short-Term Memory, and a Graph Convolutional Neural Network module for more accurate and stable tracking. Moreover, we investigate the influence of Squeeze-and-Excitation layers and Online Hard Example Mining on the performance of AerialMPTNet. To the best of our knowledge, we are the first to use these two techniques for regression-based Multi-Object Tracking. Additionally, we study and compare the L1 and Huber loss functions. In our experiments, we extensively evaluate AerialMPTNet on three aerial Multi-Object Tracking datasets, namely the AerialMPT and the KIT AIS pedestrian and vehicle datasets. Qualitative and quantitative results show that AerialMPTNet outperforms all previous methods on the pedestrian datasets and achieves competitive results on the vehicle dataset. In addition, the Long Short-Term Memory and Graph Convolutional Neural Network modules enhance tracking performance. Moreover, using Squeeze-and-Excitation and Online Hard Example Mining significantly helps in some cases while degrading the results in others. According to the results, the L1 loss yields better results than the Huber loss for most scenarios. The presented results provide deep insight into the challenges and opportunities of the aerial Multi-Object Tracking domain, paving the way for future research.
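
For readers comparing the two regression losses, a minimal sketch of how each penalizes the offset between predicted and ground-truth box coordinates, using PyTorch’s built-in `L1Loss` and `HuberLoss`; the box values below are synthetic stand-ins, not numbers from the paper.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[10.2, 5.1, 48.0, 30.5]])  # predicted box (x, y, w, h)
gt   = torch.tensor([[10.0, 5.0, 50.0, 30.0]])  # ground-truth box

l1    = nn.L1Loss()(pred, gt)               # constant gradient, robust to outliers
huber = nn.HuberLoss(delta=1.0)(pred, gt)   # quadratic near zero, linear beyond delta
print(float(l1), float(huber))
```

The L1 loss keeps a constant gradient even for large offsets, while the Huber loss down-weights small residuals, which is one plausible reason the two behave differently across tracking scenarios.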


Publications ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 27
Author(s):  
Yaniasih Yaniasih ◽  
Indra Budi

Classifying citations according to function has many benefits for information retrieval tasks, scholarly communication studies, and the development of ranking metrics. Many citation function classification schemes have been proposed, but most have not been systematically designed through an extensive literature-based compilation process. Many schemes were also not evaluated properly before being used in classification experiments on large datasets. This paper aimed to build and evaluate new citation function categories based upon sufficient scientific evidence. A total of 2153 citation sentences were collected from Indonesian journal articles for our dataset. To identify the new categories, a literature survey was conducted, the meanings of the categories were analysed and grouped, and categories were then selected based on the dataset’s characteristics and the purpose of the classification. The evaluation used five criteria: coherence, ease, utility, balance, and coverage. Fleiss’ kappa and automatic classification metrics using machine learning and deep learning algorithms were used to assess the criteria. These methods resulted in five citation function categories. The scheme’s coherence and ease of use were quite good, as indicated by an inter-annotator agreement value of 0.659 and a Long Short-Term Memory (LSTM) F1-score of 0.93. According to the balance and coverage criteria, the scheme still needs to be improved. The research data were limited to food science journals published in Indonesia. Future research will involve classifying citation function using a massive dataset collected from various scientific fields and representative countries, as well as applying improved annotation schemes and deep learning methods.
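
A small sketch of the inter-annotator agreement computation behind a Fleiss’ kappa value, assuming three annotators and the five function categories, using statsmodels’ `fleiss_kappa`; the counts below are made up for illustration.

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# Each row counts how many of the 3 annotators assigned one citation sentence
# to each of the 5 hypothetical function categories (rows must sum to 3).
ratings = np.array([
    [3, 0, 0, 0, 0],   # all three annotators agree on category 1
    [2, 1, 0, 0, 0],
    [0, 0, 3, 0, 0],
    [1, 1, 1, 0, 0],   # full disagreement
    [0, 0, 0, 3, 0],
])
print("Fleiss' kappa:", round(fleiss_kappa(ratings), 3))
```

A kappa of 0.659, as reported, falls in the range usually read as substantial agreement, supporting the coherence and ease criteria.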


2021 ◽  
Vol 366 (1) ◽  
Author(s):  
Zhichao Wen ◽  
Shuhui Li ◽  
Lihua Li ◽  
Bowen Wu ◽  
Jianqiang Fu

Author(s):  
Saeed Vasebi ◽  
Yeganeh M. Hayeri ◽  
Peter J. Jin

Relatively recent increases in computational power and extensive traffic data availability have provided a unique opportunity to re-investigate drivers’ car-following (CF) behavior. Classic CF models assume drivers’ behavior is influenced only by their preceding vehicle. Recent studies have indicated that considering surrounding vehicles’ information (e.g., multiple preceding vehicles) could affect CF models’ performance. An in-depth investigation of surrounding vehicles’ contribution to CF modeling performance has not been reported in the literature. This study uses a deep-learning model with long short-term memory (LSTM) to investigate to what extent considering surrounding vehicles could improve CF models’ performance. This investigation helps to select the right inputs for traffic flow modeling. Five CF models are compared in this study (i.e., classic, multi-anticipative, adjacent-lanes, following-vehicle, and all-surrounding-vehicles CF models). Performance of the CF models is compared in relation to accuracy, stability, and smoothness of traffic flow. The CF models are trained, validated, and tested on a large publicly available dataset. The average mean square errors (MSEs) for the classic, multi-anticipative, adjacent-lanes, following-vehicle, and all-surrounding-vehicles CF models are 1.58 × 10⁻³, 1.54 × 10⁻³, 1.56 × 10⁻³, 1.61 × 10⁻³, and 1.73 × 10⁻³, respectively. However, the results show insignificant performance differences between the classic CF model and the multi-anticipative or adjacent-lanes models in relation to accuracy, stability, or smoothness. The following-vehicle CF model shows performance similar to the multi-anticipative model. The all-surrounding-vehicles CF model underperformed all the other models.
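
A minimal sketch of how the five input framings differ when fed to an LSTM-based CF model: each variant feeds the network a history of per-time-step feature vectors, and only the set of vehicles contributing features changes. The `CFLSTM` network, vehicle counts, and features per vehicle are illustrative assumptions, not the study’s implementation.

```python
import torch
import torch.nn as nn

FEATS_PER_VEHICLE = 3   # assumed: relative position, relative speed, speed

VARIANTS = {
    "classic": 1,              # preceding vehicle only
    "multi_anticipative": 3,   # several preceding vehicles
    "adjacent_lanes": 3,       # preceding + adjacent-lane leaders
    "following_vehicle": 2,    # preceding + follower
    "all_surrounding": 6,      # every neighbouring vehicle
}

class CFLSTM(nn.Module):
    def __init__(self, n_vehicles, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_vehicles * FEATS_PER_VEHICLE, hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)   # next-step acceleration (or speed)

    def forward(self, x):                  # x: (batch, history, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

for name, n in VARIANTS.items():
    model = CFLSTM(n)
    x = torch.randn(4, 50, n * FEATS_PER_VEHICLE)  # 4 samples, 50-step history
    print(name, model(x).shape)
```

Keeping the recurrent core fixed while widening the input vector is what allows a like-for-like comparison across the five variants.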

