User Profiles for Personalizing Digital Libraries

Author(s):  
Giovanni Semeraro ◽  
Pierpaolo Basile ◽  
Marco de Gemmis ◽  
Pasquale Lops

Exploring digital collections to find information relevant to a user’s interests is a challenging task. Information preferences vary greatly across users; therefore, filtering systems must be highly personalized to serve each user’s individual interests. Algorithms designed to solve this problem base their relevance computations on user profiles in which representations of the users’ interests are maintained. The main focus of this chapter is the adoption of machine learning to build user profiles that capture user interests from documents. These profiles are used for intelligent document filtering in digital libraries. This work proposes exploiting the knowledge stored in machine-readable dictionaries to obtain accurate user profiles that describe user interests by referring to concepts in those dictionaries. The main aim of the proposed approach is to show a real-world scenario in which the combination of machine learning techniques and linguistic knowledge helps to achieve intelligent document filtering.

2020 ◽  
Author(s):  
Arnaud Adam ◽  
Isabelle Thomas

Transport geography has always been characterized by a lack of accurate data, leading to surveys often based on samples that are not spatially representative. However, the current deluge of data collected through sensors promises to overcome this scarcity. We consider one example here: since April 1st, 2016, a GPS tracker has been mandatory in every truck circulating in Belgium for the purpose of kilometre taxes. Every 30 seconds, this tracker records the position of the truck (as well as other information such as speed and direction), which makes individual taxation of trucks possible. This contribution uses a one-week exhaustive database containing all trucks circulating in Belgium in order to understand transport flows within the country, as well as the spatial effects of the taxation on truck circulation.

Machine learning techniques are applied to over 270 million GPS points to detect truck stops, transforming GPS sequences into a complete Origin-Destination matrix. Machine learning makes it possible to accurately classify stops that differ in nature (leisure stops, (un-)loading areas, or congested roads). Based on this matrix, we first propose an overview of the daily traffic, as well as an evaluation of the number of stops made in every Belgian place. Second, GPS sequences and stops are combined to characterise the sub-trajectories of each truck (first/last miles and transit) by their fiscal debit. This individual characterisation, as well as its variation in space and time, is discussed here: is the individual taxation system always efficient in space and time?

This contribution helps to better understand the circulation of trucks in Belgium, the places where they stop, and the importance of these locations from a fiscal point of view. What potential modifications of truck routes would lead to a more sustainable kilometre taxation? This contribution illustrates that combining big data and machine learning opens new roads for accurately measuring and modelling transportation.
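
As a rough illustration of how GPS sequences can be turned into stops and then into origin-destination pairs, the following Python sketch uses a simple speed-and-dwell-time rule. It is not the machine learning classifier the abstract describes. Only the 30-second sampling interval comes from the abstract; the speed threshold, the minimum dwell time, the `(timestamp, lat, lon, speed)` layout, the function names `detect_stops` and `od_pairs`, and the sample coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

# One GPS fix: (timestamp in s, latitude, longitude, speed in km/h).
# This tuple layout is an assumption, not the format of the real database.
Fix = Tuple[int, float, float, float]
# One stop: (start timestamp, end timestamp, latitude, longitude).
Stop = Tuple[int, int, float, float]


def _flush(run: List[Fix], stops: List[Stop], min_dwell_s: int) -> None:
    """Record the run of slow fixes as a stop if it lasted long enough."""
    if run and run[-1][0] - run[0][0] >= min_dwell_s:
        stops.append((run[0][0], run[-1][0], run[0][1], run[0][2]))


def detect_stops(fixes: List[Fix],
                 max_speed_kmh: float = 2.0,   # assumed threshold
                 min_dwell_s: int = 300) -> List[Stop]:  # assumed threshold
    """Group consecutive near-stationary fixes into stops."""
    stops: List[Stop] = []
    run: List[Fix] = []
    for fix in fixes:
        if fix[3] <= max_speed_kmh:
            run.append(fix)
            continue
        _flush(run, stops, min_dwell_s)
        run = []
    _flush(run, stops, min_dwell_s)  # close a run still open at the end
    return stops


def od_pairs(stops: List[Stop]):
    """Chain consecutive stops into (origin, destination) coordinate pairs."""
    return [((a[2], a[3]), (b[2], b[3])) for a, b in zip(stops, stops[1:])]


# Made-up track sampled every 30 s: long stop, drive, short halt, long stop.
track = (
    [(t, 50.85, 4.35, 0.0) for t in range(0, 601, 30)]
    + [(t, 50.90, 4.40, 60.0) for t in range(630, 1231, 30)]
    + [(t, 51.00, 4.50, 0.0) for t in range(1260, 1381, 30)]
    + [(1410, 51.05, 4.55, 50.0)]
    + [(t, 51.20, 4.40, 1.0) for t in range(1440, 1801, 30)]
)
stops = detect_stops(track)
pairs = od_pairs(stops)
```

In this made-up track the two-minute halt is ignored because it is shorter than the assumed dwell threshold. A fixed rule like this cannot tell a congested road from an unloading area, which is the kind of distinction the abstract assigns to the learned classifier.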


Vehicle price prediction has become a popular research topic, and it requires considerable effort and knowledge from experts in the field. A considerable number of distinct attributes must be examined to make the prediction reliable and accurate. To predict the price of used vehicles, a well-defined model was developed using three machine learning techniques: Artificial Neural Network, Support Vector Machine and Random Forest. These techniques were applied not to individual items but to the whole group of data items. The data set was taken from a web portal and used for the prediction. The data were collected using a web scraper written in the PHP programming language. Machine learning algorithms of varying performance were compared to obtain the best result on the given data set. The final prediction model was integrated into a Java application.


Author(s):  
Matthias Mühlbauer ◽  
Hubert Würschinger ◽  
Dominik Polzer ◽  
Nico Hanenkamp

Predicting power consumption increases the transparency and understanding of a cutting process, which opens up various opportunities. Besides the planning and optimization of manufacturing processes, there are application areas in different kinds of deviation detection and condition monitoring. Due to the complicated stochastic processes that occur during cutting, analytical approaches quickly reach their limits. Since the 1980s, approaches for predicting time or energy consumption have used empirical models. Nevertheless, most existing models consider only static snapshots and cannot capture the dynamic load fluctuations during the entire milling process. This paper describes a data-driven approach to a more detailed prediction of the power consumption of a milling process using machine learning techniques. To increase accuracy, we used separate models and machine learning algorithms for the different operations of the milling machine to predict the required time and energy. Merging the individual models finally allows an accurate forecast of the load profile of the milling process for a specific machine tool. The method presented covers the whole pipeline, from data acquisition through preprocessing and model building to validation.


Author(s):  
Abikoye Oluwakemi Christiana ◽  
Benjamin Aruwa Gyunka ◽  
Akande Noah

The open-source nature of the Android operating system has attracted wide adoption of the system by many types of developers. This has in turn fostered an exponential proliferation of devices running the Android OS across different sectors of the economy. Although this development has brought great technological advancement and made business (e-commerce) and social interaction easier, these devices have also become strong channels for an uncontrolled rise in cyberattacks and espionage against business infrastructures and individual users of these mobile devices. Different cyberattack techniques exist, but attacks through malicious applications have taken the lead over other attack methods such as social engineering. Android malware has evolved in sophistication and intelligence to the point that it has become highly resistant to existing detection systems, especially signature-based ones. Machine learning techniques have become a more competent choice for combating the sophistication and novelty of emerging Android malware. Models created via machine learning methods work by first learning the existing patterns of malware behaviour and then using this knowledge to separate or identify similar behaviour in unknown attacks. This paper provides a comprehensive review of machine learning techniques and their applications in Android malware detection as found in contemporary literature.


Author(s):  
Aleksei Netšunajev ◽  
Sven Nõmm ◽  
Aaro Toomela ◽  
Kadri Medijainen ◽  
Pille Taba

This paper analyses the sentence writing test to support the diagnosis of Parkinson's disease. Digitization of drawing and writing tests has become a trend in which the synergy between machine learning techniques on the one side and the knowledge base of neurology and psychiatry on the other leads to sophisticated results in computer-aided diagnostics. Such rapid progress has a drawback: in many cases, decisions made by a machine learning algorithm are difficult to explain in a language familiar to a human practitioner. The method proposed in this paper employs unsupervised learning techniques to segment the sentence into individual characters. A feature engineering process is then applied to describe the writing of each letter using a set of kinematic and pressure parameters. Following feature selection, the applicability of different machine learning classifiers is evaluated. To guarantee that the results can be interpreted by a human, two major guidelines are established. The first is to keep the dimensionality of the feature set low. The second is that the features describing the writing process must have a clear physical meaning. Features describing the amount and smoothness of the motion observed during writing, along with letter size, are considered. The resulting algorithm does not take into account any semantic information or language particularities and may therefore be easily adapted to any language based on the Latin or Cyrillic alphabets.
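
To make the idea of a small, physically interpretable feature set concrete, here is a minimal Python sketch that computes a handful of kinematic, size and pressure features for one already-segmented letter. The abstract does not give the exact feature definitions, so the `(time, x, y, pressure)` sample layout, the function name `letter_features`, and each formula below (in particular the speed-variation proxy for smoothness) are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

# One pen sample: (time in s, x, y, pressure). The layout is assumed.
Sample = Tuple[float, float, float, float]


def letter_features(samples: List[Sample]) -> Dict[str, float]:
    """Compute a low-dimensional feature set for one segmented letter."""
    if len(samples) < 2:
        raise ValueError("need at least two pen samples")
    xs = [s[1] for s in samples]
    ys = [s[2] for s in samples]

    # Amount of motion: total pen path length, plus per-segment speeds.
    path_length = 0.0
    speeds = []
    for (t0, x0, y0, _), (t1, x1, y1, _) in zip(samples, samples[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        path_length += dist
        speeds.append(dist / (t1 - t0))

    # Smoothness proxy (assumed): mean absolute change in speed between
    # consecutive segments; lower values mean smoother motion.
    changes = [abs(b - a) for a, b in zip(speeds, speeds[1:])]
    speed_variation = sum(changes) / len(changes) if changes else 0.0

    return {
        "path_length": path_length,
        "mean_speed": sum(speeds) / len(speeds),
        "speed_variation": speed_variation,
        "width": max(xs) - min(xs),    # letter size
        "height": max(ys) - min(ys),   # letter size
        "mean_pressure": sum(s[3] for s in samples) / len(samples),
    }


# Made-up pen trace with three samples taken one second apart.
demo = letter_features([(0.0, 0.0, 0.0, 0.5),
                        (1.0, 3.0, 4.0, 0.7),
                        (2.0, 3.0, 8.0, 0.9)])
```

Each value in the returned dictionary has a direct physical reading (distance travelled, speed, extent, pen force), which is what keeps a classifier trained on such features explainable to a practitioner.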


2020 ◽  
Vol 9 (1) ◽  
pp. 1000-1004

The automatic extraction of bibliographic data remains a difficult task to this day, because scientific publications do not follow a standard format and every publication has its own template. There are many “regular expression” techniques and “supervised machine learning” techniques for extracting the full details of the references listed in the bibliographic section, but there is not much difference in their success rates. Our idea is to find out whether unsupervised machine learning techniques can help increase the success rate. This paper presents a technique for separating and automatically extracting the individual components of references, such as authors, reference titles and publication details, using an “unsupervised technique” and the “Named-Entity Recognition” (NER) technique, and for linking these references to their corresponding full-text articles with the assistance of Google.
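
The template problem described above can be seen in a small Python sketch of the regular-expression baseline the abstract mentions. This is not the unsupervised method the paper proposes. The pattern handles a single "Authors (Year). Title. Venue." layout; the pattern itself, the function name `parse_reference`, and both sample references are made up for illustration.

```python
import re
from typing import Dict, Optional

# Matches only one assumed citation layout: Authors (Year). Title. Venue.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<venue>.+?)\.?$"
)


def parse_reference(text: str) -> Optional[Dict[str, str]]:
    """Split a reference into components, or return None if it does not fit."""
    match = REFERENCE_PATTERN.match(text.strip())
    return match.groupdict() if match else None


# A made-up reference in the layout the pattern expects.
author_year = parse_reference(
    "Doe, J. and Roe, R. (2019). A made-up title for testing. "
    "Journal of Examples, 12(3), 45-67."
)

# The same made-up reference in a numbered layout: the pattern fails.
numbered = parse_reference(
    '[1] J. Doe and R. Roe, "A made-up title for testing," '
    "Journal of Examples, vol. 12, no. 3, pp. 45-67, 2019."
)
```

The first call yields the four named components, while the second returns `None` because the year is not in parentheses after the authors. Every additional template needs its own pattern, which is the limitation that motivates trying learning-based extraction.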


Author(s):  
Anisha C. D ◽  
Arulanand N

Myopathy and neuropathy are non-progressive and progressive neuromuscular disorders that weaken the muscles and nerves, respectively. Electromyography (EMG) signals are biosignals obtained from individual muscle cells. EMG-based diagnosis of neuromuscular disorders is a safe and reliable method, and integrating EMG signals with machine learning techniques improves diagnostic accuracy. The proposed system analyses a clinical raw EMG dataset obtained from the publicly available PhysioNet database. The two-channel raw EMG recordings of healthy, myopathy and neuropathy subjects are divided into samples. Time Domain (TD) features are extracted from the samples of each subject and annotated with a class label representing the state of the individual. The annotated features are split into training and testing sets in the standard 70:30 ratio. A comparative classification analysis is performed on the complete annotated feature set and on the prominent feature set obtained using the Pearson correlation technique. The features are scaled using the standard scaler technique, and the analysis is repeated on the scaled annotated feature set and the scaled prominent feature set. The hyperparameter space of the classifiers is chosen by trial and error. The hyperparameters are then tuned using Bayesian optimization, and the optimal parameters obtained are fed to the tuned classifier. The classification algorithms considered are Random Forest and Multi-Layer Perceptron Neural Network (MLPNN). The performance of the classifiers on the test data is evaluated using the Accuracy, Confusion Matrix, F1 Score, Precision and Recall metrics. The evaluation results show that Random Forest performs better than MLPNN on non-scaled TD features, with an accuracy of 96%, while MLPNN outperforms Random Forest on scaled TD features, with an accuracy of 97%, which is higher than that of existing systems. The inference from the evaluation results is that classifiers tuned with Bayesian optimization achieve improved accuracy, providing a robust diagnostic model for neuromuscular disorder diagnosis.
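
The preprocessing steps in this abstract (time-domain feature extraction, a 70:30 split, standard scaling) can be sketched in plain Python as follows. The abstract does not say which TD features were used, so the four computed here (mean absolute value, root mean square, waveform length, zero crossings) are common choices assumed for illustration, as are the function names and the fixed shuffle seed. Classifier training and Bayesian optimization are left out.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = List[float]


def td_features(window: Sequence[float]) -> Row:
    """Four common time-domain features of one EMG sample window."""
    n = len(window)
    steps = list(zip(window, window[1:]))
    mav = sum(abs(v) for v in window) / n             # mean absolute value
    rms = math.sqrt(sum(v * v for v in window) / n)   # root mean square
    wl = sum(abs(b - a) for a, b in steps)            # waveform length
    zc = sum(1 for a, b in steps if a * b < 0)        # zero crossings
    return [mav, rms, wl, float(zc)]


def split_70_30(rows: Sequence[Row], seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle the rows and split them 70:30 into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def standard_scale(train: List[Row], test: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Z-score both sets using the mean and population std of the train set."""
    columns = list(zip(*train))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # avoid /0
        for c, m in zip(columns, means)
    ]

    def scale(rows: List[Row]) -> List[Row]:
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


# Made-up window and feature rows, only to exercise the functions.
example = td_features([1.0, -1.0, 2.0, -2.0])
train_rows, test_rows = split_70_30([[float(i), 1.0] for i in range(10)])
scaled_train, scaled_test = standard_scale([[1.0, 10.0], [3.0, 10.0]],
                                           [[2.0, 10.0]])
```

Computing the scaling statistics on the training set alone keeps information about the test set out of the model. Scaling matters more for a multi-layer perceptron than for a tree ensemble, which may be related to the reported difference between the scaled and non-scaled results.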

