Research on Recommendation Algorithm Based on Ranking Learning

After analyzing the limitations of logistic regression and support vector machines, the author chooses the learning-to-rank approach to solve the news recommendation problem. The article proposes two news recommendation methods, one based on the Bayesian optimization criterion and one based on RankSVM. It also proposes two methods to handle dynamic changes in user interest and to address recommendation novelty and diversity. The experimental results show that both methods achieve the desired results, and that the overall performance of the method based on the Bayesian optimization criterion is better than that of the method based on RankSVM.
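The pairwise idea behind learning to rank can be illustrated with a minimal sketch. It is not the article's implementation: the linear scoring function, the feature vectors, and the function names are hypothetical, and reading the "Bayesian optimization criterion" as a BPR-style logistic pairwise loss is an assumption that the abstract does not confirm.

```python
import math

def score(weights, features):
    """Linear relevance score: dot product of weights and item features."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_hinge_loss(weights, preferred, other, margin=1.0):
    """RankSVM-style loss: zero once `preferred` outscores `other` by `margin`."""
    diff = score(weights, preferred) - score(weights, other)
    return max(0.0, margin - diff)

def pairwise_logistic_loss(weights, preferred, other):
    """BPR-style loss (assumed reading): -log(sigmoid(score difference))."""
    diff = score(weights, preferred) - score(weights, other)
    return math.log1p(math.exp(-diff))

# Hypothetical example: a clicked article should rank above a skipped one.
weights = [1.0, 0.0]
clicked = [2.0, 0.5]
skipped = [0.0, 0.5]
print(pairwise_hinge_loss(weights, clicked, skipped))
print(pairwise_logistic_loss(weights, clicked, skipped))
```

Both losses depend only on the score difference within a pair, which is what separates ranking objectives from the pointwise classification losses of logistic regression and standard SVMs.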

Research on News Recommendation Algorithm Based on User Interest and Timeliness Modeling

The 2nd International Conference on Computing and Data Science ◽ 10.1145/3448734.3450933 ◽ 2021

Author(s): Zhongtai Qin ◽ Mingjun Zhang

Keyword(s): User Interest ◽ Recommendation Algorithm ◽ News Recommendation

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽ 10.2174/0929866526666191002104735 ◽ 2020 ◽ Vol 27 (4) ◽ pp. 329-336 ◽ Cited By ~ 1

Author(s): Lei Xu ◽ Guangmin Liang ◽ Baowen Chen ◽ Xu Tan ◽ Huaikun Xiang ◽ ...

Keyword(s): Support Vector Machine ◽ Cell Wall ◽ Experimental Results ◽ Computational Method ◽ Lytic Enzyme ◽ Support Vector ◽ Lytic Enzymes ◽ Data Set ◽ Optimal Feature ◽ Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria, whereas the use of antibiotics is making the problem of drug resistance more serious. Cell lytic enzymes are therefore a good choice for treating bacterial infections, and the study of cell wall lytic enzymes aims at finding an efficient way to treat them. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes.

Objective: In this article, our motivation is to predict the type of a cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful. However, detecting the type by experimental methods is time consuming. Thus, an efficient computational method for predicting the type of a cell lytic enzyme is proposed in our work.

Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with the larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of a cell lytic enzyme.

Results: With the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06 − 92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results thus also demonstrate that the overall accuracy improves by nearly 18 percentage points when the tripeptide optimal feature set is used.

Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, in which a support vector machine is used to predict the type of a cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of a cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
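As a rough illustration of the feature representation and of the improvement arithmetic above, the following sketch computes a tripeptide composition vector and a relative accuracy gain. It makes several assumptions not stated in the abstract: the alphabet is restricted to the 20 standard amino acids, counts are normalized to frequencies, and windows containing other symbols are skipped. The confidence-based feature selection and the SVM training are not shown, and all names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 20**3 = 8000
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}

def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of overlapping tripeptide frequencies."""
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 2):
        idx = INDEX.get(sequence[i:i + 3])
        if idx is not None:  # skip windows with non-standard symbols
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]

def relative_improvement(new_accuracy, old_accuracy):
    """Relative gain in percent, e.g. (97.06 - 92.9) / 92.9 * 100."""
    return (new_accuracy - old_accuracy) / old_accuracy * 100.0

vector = tripeptide_composition("ACDEFG")  # toy sequence, not real data
print(len(vector), vector[INDEX["ACD"]])
print(round(relative_improvement(97.06, 92.9), 2))
```

Note that the gain over Ding's method is a relative figure, while the 94.12% versus 76.2% comparison is a difference in percentage points.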

Performance Evaluation of IMERG GPM Products during Tropical Storm Imelda

Atmosphere ◽ 10.3390/atmos12060687 ◽ 2021 ◽ Vol 12 (6) ◽ pp. 687

Author(s): Salman Sakib ◽ Dawit Ghebreyesus ◽ Hatim O. Sharif

Keyword(s): Stage Iv ◽ Tropical Storm ◽ Average Correlation ◽ Acceptable Range ◽ Precipitation Estimates ◽ Overall Performance ◽ Early Late ◽ Statistical Metrics ◽ Better Than

Tropical Storm Imelda struck the southeast coastal regions of Texas from 17 to 19 September 2019 and delivered precipitation above 500 mm over about 6000 km². The performance of the three IMERG (Early-, Late-, and Final-run) GPM satellite-based precipitation products was evaluated against Stage-IV radar precipitation estimates. Basic and probabilistic statistical metrics, such as CC, RMSE, RBIAS, POD, FAR, CSI, and PSS, were employed to assess the performance of the IMERG products. The products captured the event adequately, with a fairly high POD value of 0.9. The best product (Early-run) showed an average correlation coefficient of 0.60. The algorithm used to produce the Final-run improved the quality of the data by removing systematic errors that occurred in the near-real-time products. All three IMERG products had an RMSE of less than 5 mm over roughly three-quarters of the area (ranging from 73% to 76%) in estimating Tropical Storm Imelda. The Early-run product showed a much better RBIAS relative to the Final-run product. The overall performance was poor, as areas with an acceptable range of RBIAS (i.e., between −10% and 10%) in all three IMERG products were only 16% to 17% of the total area. Overall, the Early-run product was found to be better than the Late- and Final-run products.
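The categorical and continuous metrics named above can be sketched as follows, using their common textbook definitions. The rain/no-rain threshold of 1.0 mm, the function names, and the sample values are assumptions for illustration only; the abstract does not state the thresholds or formulas the authors used.

```python
import math

def _ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def categorical_scores(satellite, reference, threshold=1.0):
    """POD, FAR, CSI, and PSS from a rain/no-rain contingency table."""
    hits = misses = false_alarms = correct_negatives = 0
    for s, r in zip(satellite, reference):
        s_wet, r_wet = s >= threshold, r >= threshold
        if s_wet and r_wet:
            hits += 1
        elif r_wet:
            misses += 1
        elif s_wet:
            false_alarms += 1
        else:
            correct_negatives += 1
    pod = _ratio(hits, hits + misses)                 # probability of detection
    far = _ratio(false_alarms, hits + false_alarms)   # false alarm ratio
    csi = _ratio(hits, hits + misses + false_alarms)  # critical success index
    pofd = _ratio(false_alarms, false_alarms + correct_negatives)
    return {"POD": pod, "FAR": far, "CSI": csi, "PSS": pod - pofd}

def rmse(satellite, reference):
    """Root mean square error between paired estimates."""
    n = len(reference)
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(satellite, reference)) / n)

def rbias_percent(satellite, reference):
    """Relative bias in percent: total error divided by total reference."""
    return _ratio(sum(s - r for s, r in zip(satellite, reference)), sum(reference)) * 100.0

# Hypothetical rainfall values in mm (not data from the study).
sat = [0.0, 2.0, 5.0, 0.5, 3.0]
ref = [0.0, 1.5, 4.0, 2.0, 0.2]
scores = categorical_scores(sat, ref)
print(scores, rmse(sat, ref), rbias_percent(sat, ref))
```

Under these definitions, an RBIAS between −10% and 10% at a grid cell corresponds to the "acceptable range" mentioned in the abstract.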

NLOS Multipath Classification of GNSS Signal Correlation Output Using Machine Learning

Sensors ◽ 10.3390/s21072503 ◽ 2021 ◽ Vol 21 (7) ◽ pp. 2503

Author(s): Taro Suzuki ◽ Yoshiharu Amano

Keyword(s): Machine Learning ◽ Satellite System ◽ Training Data ◽ Support Vector ◽ Positioning Errors ◽ Automated Method ◽ Global Navigation Satellite ◽ Better Than ◽ Signal Correlation

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in a global navigation satellite system (GNSS). We use the GNSS signal correlation output, which is the most primitive GNSS signal processing output, to detect NLOS multipath based on machine learning. The shape of the multi-correlator outputs is distorted by NLOS multipath, and features of this shape are used to discriminate NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. We also propose an automated method of collecting LOS and NLOS signal training data for machine learning. The evaluation of the proposed NLOS detection method in an urban environment confirmed that the NN was better than the SVM, and that 97.7% of NLOS signals were correctly discriminated.

Landslide hazard assessment based on Bayesian optimization–support vector machine in Nanping City, China

Natural Hazards ◽ 10.1007/s11069-021-04862-y ◽ 2021

Author(s): Wei Xie ◽ Wen Nie ◽ Pooya Saffari ◽ Luis F. Robledo ◽ Pierre-Yves Descote ◽ ...

Keyword(s): Support Vector Machine ◽ Hazard Assessment ◽ Landslide Hazard ◽ Bayesian Optimization ◽ Support Vector ◽ Landslide Hazard Assessment

Zonation of Landslide Susceptibility in Ruijin, Jiangxi, China

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18115906 ◽ 2021 ◽ Vol 18 (11) ◽ pp. 5906

Author(s): Xiaoting Zhou ◽ Weicheng Wu ◽ Ziyu Lin ◽ Guiliang Zhang ◽ Renxiang Chen ◽ ...

Keyword(s): Environmental Factors ◽ Landslide Susceptibility ◽ Urban Areas ◽ Support Vector ◽ Susceptibility Map ◽ Human Society ◽ Learning Approaches ◽ Prevention Measures ◽ Landslide Occurrence ◽ Better Than

Landslides are one of the major geohazards threatening human society. The objective of this study was to conduct a landslide hazard susceptibility assessment for Ruijin, Jiangxi, China, and to provide technical support to the local government for implementing disaster reduction and prevention measures. Machine learning approaches, e.g., random forests (RFs) and support vector machines (SVMs), were employed, and multiple geo-environmental factors such as land cover, NDVI, landform, rainfall, lithology, and proximity to faults, roads, and rivers were utilized. For categorical factors, three processing approaches were proposed: simple numerical labeling (SNL), weight assignment (WA)-based, and frequency ratio (FR)-based. The 19 geo-environmental factors were then converted into raster layers to constitute three 19-band datasets, DS1, DS2, and DS3, one from each of the three processing approaches. Next, 155 observed landslides that occurred in the past decades were vectorized, of which 70% were randomly selected to compose a training set (TS1) and the remaining 30% to form a validation set (VS1). A number of non-landslide (no-risk) samples distributed over the whole study area were identified in low-slope (<1–3°) zones such as urban areas and croplands, and were added to TS1 and VS1 in the same ratio. For comparison, we used the FR approach to identify no-risk samples in both flat and non-flat areas, and merged them with the field-observed landslides to constitute another pair of training and validation sets (TS2 and VS2) using the same ratio of 7:3. The RF algorithm was applied to model the probability of landslide occurrence using DS1, DS2, and DS3 as predictive variables and TS1 and TS2 for training, to obtain the SNL-based, WA-based, and FR-based RF models, respectively.

Verified against VS1 and VS2, the three models have similar overall accuracy (OA) and Kappa coefficient (KC), which are 89.61%, 91.47%, and 94.54%, and 0.7926, 0.8299, and 0.8908, respectively. All of them are much better than the three models obtained by the SVM algorithm, with OA of 81.79%, 82.86%, and 83%, and KC of 0.6337, 0.655, and 0.660. New case verification with the 26 recent landslide events of 2017–2020 revealed that the landslide susceptibility map from WA-based RF modeling was able to properly identify the high and very high susceptibility zones where 23 new landslides had occurred, and performed better than the SNL-based and FR-based RF modeling, though the latter has a slightly higher OA and KC. Hence, we concluded that all three RF models achieve reasonable risk prediction, but WA-based and FR-based RF modeling deserve a recommendation for application elsewhere. The results of this study may serve as a reference for the local authorities in the prevention and early warning of landslide hazards.
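The 7:3 split and the two validation metrics reported above can be sketched as follows. This is an illustration under assumptions, not the study's code: the random seed, the rounding of the 70% cut, and the toy labels are hypothetical, and the kappa formula is the standard Cohen's kappa.

```python
import random

def split_70_30(samples, seed=0):
    """Randomly split samples into 70% training and 30% validation (assumed rounding: floor)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what class frequencies alone would give."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    p_observed = overall_accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one class is present
    return (p_observed - p_expected) / (1.0 - p_expected)

# 155 landslide IDs, as in the abstract; the IDs themselves are placeholders.
train_ids, val_ids = split_70_30(range(155))

# Toy labels (1 = landslide, 0 = no-risk), not results from the study.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
print(len(train_ids), len(val_ids))
print(overall_accuracy(observed, predicted), cohen_kappa(observed, predicted))
```

Kappa is reported alongside OA because it discounts the agreement expected by chance, which matters when landslide and no-risk samples are unevenly represented.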

An Improved News Recommendation Algorithm Based on Text Similarity

2020 3rd International Conference on Smart BlockChain (SmartBlock) ◽ 10.1109/smartblock52591.2020.00031 ◽ 2020

Author(s): Yihang Gao ◽ Hui Zhao ◽ Qian Zhou ◽ Meikang Qiu ◽ Meiqin Liu

Keyword(s): Text Similarity ◽ Recommendation Algorithm ◽ News Recommendation

Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain

Applied Sciences ◽ 10.3390/app11020796 ◽ 2021 ◽ Vol 11 (2) ◽ pp. 796

Author(s): Alhanoof Althnian ◽ Duaa AlSaeed ◽ Heyam Al-Baity ◽ Amani Samha ◽ Alanoud Bin Dris ◽ ...

Keyword(s): Empirical Evaluation ◽ Classification Performance ◽ Support Vector ◽ Robust Model ◽ Original Distribution ◽ C4.5 Decision Tree ◽ Dataset Size ◽ Overall Performance ◽ Medical Domain ◽ The Impact

Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six widely used models in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, f-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model's robustness to a limited dataset does not necessarily imply that it provides the best performance compared to other models.
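Most of the evaluation measures listed above follow directly from a binary confusion matrix, as in the minimal sketch below. The function name and the counts are hypothetical, the definitions are the standard ones, and AUC is omitted because it requires ranked scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-score, and specificity from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for a 20-sample validation set (not from the study).
metrics = binary_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```

On small datasets each metric rests on only a handful of counts, so a single misclassified sample can shift it noticeably.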

Deep Learning Methods for Classification of Certain Abnormalities in Echocardiography

Electronics ◽ 10.3390/electronics10040495 ◽ 2021 ◽ Vol 10 (4) ◽ pp. 495

Author(s): Imayanmosha Wahlang ◽ Arnab Kumar Maji ◽ Goutam Saha ◽ Prasun Chakrabarti ◽ Michal Jasinski ◽ ...

Keyword(s): Deep Learning ◽ Short Term Memory ◽ Support Vector ◽ Variational Autoencoder ◽ Different Types ◽ Static Images ◽ Long Short Term Memory ◽ 2D And 3D ◽ Better Than

This article experiments with deep learning methodologies in echocardiogram (echo), a promising and vigorously researched technique in the preponderance field. The paper involves two different kinds of classification in echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) is performed using 2D echo images, 3D Doppler images, and videographic images. Secondly, different types of regurgitation, namely Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), and a combination of the three, are classified using videographic echo images. Two deep-learning methodologies are used for these purposes: a Recurrent Neural Network (RNN) based methodology (Long Short Term Memory (LSTM)) and an Autoencoder based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguishes this work from existing work using SVM (Support Vector Machine), and the application of deep-learning methodologies is the first of many in this particular field. It was found that deep-learning methodologies perform better than the SVM methodology in normal or abnormal classification. Overall, VAE performs better on 2D and 3D Doppler images (static images), while LSTM performs better on videographic images.

Application of Artificial Intelligence (AI) for Sustainable Highway and Road System

Symmetry ◽ 10.3390/sym13010060 ◽ 2020 ◽ Vol 13 (1) ◽ pp. 60

Author(s): Md Arifuzzaman ◽ Muhammad Aniq Gul ◽ Kaffayatullah Khan ◽ S. M. Zakir Hossain

Keyword(s): Experimental Data ◽ High Performance ◽ Adhesion Force ◽ Model Development ◽ Real Life ◽ Bayesian Optimization ◽ Support Vector ◽ Adhesive Properties ◽ Bayesian Optimization Algorithm ◽ The Mean

Several environmental factors, such as temperature differential, moisture, and oxidation, affect the extended life of modified asphalt and influence its desired adhesive properties. Knowledge of the properties of asphalt adhesives can help to provide a more resilient and durable asphalt surface. In this study, a hybrid of a Bayesian optimization algorithm and a support vector regression approach is recommended to predict the adhesion force of asphalt. The effects of three important variables, viz., conditions (fresh, wet, and aged), binder types (base, 4% SB, 5% SB, 4% SBS, and 5% SBS), and carbon nanotube doses (0.5%, 1.0%, and 1.5%), on adhesive force are taken into consideration. Real-life experimental data (405 specimens) are considered for model development. Using atomic force microscopy, the nanoscale adhesive strength of the test specimens is determined according to the functional groups on the asphalt. It is found that the model predictions overlap with the experimental data with a high R² of 90.5%, and that the relative deviations are scattered around the zero line. Besides, the mean, median, and standard deviation of the experimental and predicted values are very close. In addition, the mean absolute error, root mean square error, and fractional bias values were found to be low, indicating the high performance of the developed model.