training samples
Recently Published Documents


TOTAL DOCUMENTS

1351
(FIVE YEARS 700)

H-INDEX

30
(FIVE YEARS 12)

Author(s):  
Raheem Sarwar ◽  
Saeed-Ul Hassan

The authorship identification task aims to identify the original author of an anonymous text sample from a set of candidate authors. It has several application domains, such as digital text forensics and information retrieval, and these domains are not limited to a specific language. However, most authorship identification studies focus on English, and limited attention has been paid to Urdu. Moreover, existing Urdu authorship identification solutions lose accuracy as the number of training samples per candidate author decreases and as the number of candidate authors increases. Consequently, these solutions are inapplicable to real-world cases. In addition, owing to the unavailability of reliable POS taggers and sentence segmenters, all existing authorship identification studies on Urdu text are limited to word n-gram features. To overcome these limitations, we formulate a stylometric feature space that is not limited to word n-gram features. Based on this feature space, we use an authorship identification solution that transforms each text sample into a point set, retrieves candidate text samples, and relies on a nearest-neighbors classifier to predict the original author of the anonymous text sample. To evaluate our solution, we create a significantly larger corpus than those used in existing studies and conduct several experimental studies, which show that our solution overcomes the limitations of existing studies and achieves an accuracy of 94.03%, higher than all previous authorship identification works.


2022 ◽  
Vol 18 (1) ◽  
pp. 1-24
Author(s):  
Yi Zhang ◽  
Yue Zheng ◽  
Guidong Zhang ◽  
Kun Qian ◽  
Chen Qian ◽  
...  

Gait, the walking manner of a person, has been perceived as a physical and behavioral trait for human identification. Compared with cameras and wearable sensors, Wi-Fi-based gait recognition is more attractive because Wi-Fi infrastructure is available almost everywhere and can sense passively without requiring on-body devices. However, existing Wi-Fi sensing approaches impose strong assumptions: fixed user walking trajectories, sufficient training data, and identification of already known users only. In this article, we present GaitSense, a Wi-Fi-based human identification system, to overcome these unrealistic assumptions. To deal with various walking trajectories and speeds, GaitSense first extracts target-specific features that best characterize gait patterns and applies novel normalization algorithms to eliminate gait-irrelevant perturbations in the signals. On this basis, GaitSense reduces the training effort in new deployment scenarios through transfer learning and data augmentation techniques. GaitSense also identifies illegal users by anomaly detection, making the system readily available for real-world deployment. Our implementation and evaluation with commodity Wi-Fi devices demonstrate consistent identification accuracy across various deployment scenarios with few training samples, pushing the limit of gait recognition with Wi-Fi signals.


Geophysics ◽  
2022 ◽  
pp. 1-44
Author(s):  
Yuhang Sun ◽  
Yang Liu ◽  
Mi Zhang ◽  
Haoran Zhang

AVO (amplitude variation with offset) inversion and neural networks are widely used to invert elastic parameters. With more constraints from well log data, neural network-based inversion may estimate elastic parameters with greater precision and resolution than traditional AVO inversion. However, neural network approaches require a massive number of reliable training samples. Furthermore, because the lack of low-frequency information in seismic gathers leads to multiple solutions of the inverse problem, both inversions rely heavily on proper low-frequency initial models. To mitigate the dependence of inversions on accurate training samples and initial models, we propose solving inverse problems with the recently developed invertible neural networks (INNs). Unlike conventional neural networks, which address the ambiguous inverse problem directly, INNs learn the well-defined forward modeling and use additional latent variables to increase the uniqueness of solutions. Motivated by these networks, we propose an INN-based AVO inversion method that can reliably invert low- to medium-frequency velocities and densities with randomly generated, easy-to-access datasets rather than trustworthy training samples or well-prepared initial models. Tests on synthetic and field data show that our method is feasible, robust to noise, and practicable.


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 128
Author(s):  
Zhenwei Guan ◽  
Feng Min ◽  
Wei He ◽  
Wenhua Fang ◽  
Tao Lu

Forest fire detection from videos or images is vital to forest firefighting. Most deep learning-based approaches rely on converging the image loss, which ignores the differing content of different fire scenes. In fact, images with complex content always have higher entropy. From this perspective, we propose a novel feature-entropy-guided neural network for forest fire detection, which balances the content complexity of different training samples. Specifically, when calculating the classification loss, a larger weight is given to the features of samples from a high-entropy source. In addition, we propose a color attention neural network, which mainly consists of several repeated multiple-blocks of color-attention modules (MCM). Each MCM can adequately extract the color feature information of fire. The experimental results show that our proposed method outperforms the state-of-the-art methods.
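
The entropy-based weighting described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the entropy is taken to be the Shannon entropy of a softmax-normalized feature vector, and the weights are the per-sample entropies normalized to a batch mean of 1. The function names `shannon_entropy` and `entropy_weighted_loss` are hypothetical and are not taken from the paper.

```python
import math

def shannon_entropy(features):
    """Shannon entropy (in nats) of a feature vector after softmax normalization.

    Assumption: the paper's "feature entropy" is approximated this way.
    """
    m = max(features)
    exps = [math.exp(f - m) for f in features]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weighted_loss(batch_features, batch_probs, batch_labels):
    """Cross-entropy over a batch, with higher-entropy samples weighted more.

    batch_features: one feature vector per sample
    batch_probs:    one predicted class-probability vector per sample
    batch_labels:   one integer class index per sample
    """
    entropies = [shannon_entropy(f) for f in batch_features]
    mean_h = sum(entropies) / len(entropies)
    # Hypothetical weighting: entropy relative to the batch mean.
    # Fall back to uniform weights if every entropy is zero.
    if mean_h > 0:
        weights = [h / mean_h for h in entropies]
    else:
        weights = [1.0] * len(entropies)
    losses = [-math.log(p[y]) for p, y in zip(batch_probs, batch_labels)]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

Because the weights average to 1 over the batch, the weighted loss stays on the same scale as the unweighted cross-entropy; only the relative contribution of each sample changes.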


Water ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 244
Author(s):  
Arsalan Ghorbanian ◽  
Seyed Ali Ahmadi ◽  
Meisam Amani ◽  
Ali Mohammadzadeh ◽  
Sadegh Jamali

Mangroves, as unique coastal wetlands with numerous benefits, are endangered mainly due to the coupled effects of anthropogenic activities and climate change. Therefore, acquiring reliable and up-to-date information about these ecosystems is vital for their conservation and sustainable blue carbon development. In this regard, the joint use of remote sensing data and machine learning algorithms can assist in producing accurate mangrove ecosystem maps. This study investigated the potential of artificial neural networks (ANNs) with different topologies and specifications for mangrove classification in Iran. To this end, multi-temporal synthetic aperture radar (SAR) and multi-spectral remote sensing data from Sentinel-1 and Sentinel-2 were processed in the Google Earth Engine (GEE) cloud computing platform. Afterward, ANN topologies and specifications, namely the number of layers and neurons, the learning algorithm, the type of activation function, and the learning rate, were examined for mangrove ecosystem mapping. The results indicated that an ANN model with four hidden layers, 36 neurons in each layer, the adaptive moment estimation (Adam) learning algorithm, the rectified linear unit (ReLU) activation function, and a learning rate of 0.001 produced the most accurate mangrove ecosystem map (F-score = 0.97). Further analysis revealed that although the accuracy of the ANN models declined when a limited number of training samples was used, they still produced satisfactory results. Additionally, it was observed that ANN models were highly resistant to wrong labels in the training samples, and only the ANN model with the Adam learning algorithm produced an accurate mangrove ecosystem map when no data standardization was performed. Moreover, further investigations showed the higher potential of multi-temporal and multi-source remote sensing data compared to single-source and mono-temporal (e.g., single season) data for accurate mangrove ecosystem mapping. Overall, the high potential of the proposed method, along with the use of open-access satellite images and big geo-data processing platforms (i.e., GEE, Google Colab, and scikit-learn), makes the proposed approach efficient and applicable to other study areas for all interested users.


2022 ◽  
Author(s):  
Zhuoxuan Xia ◽  
Lingcao Huang ◽  
Chengyan Fan ◽  
Shichao Jia ◽  
Zhanjun Lin ◽  
...  

Abstract. The important Qinghai-Tibet Engineering Corridor (QTEC) covers the part of the highway and railway underlain by permafrost. The permafrost on the QTEC is sensitive to climate warming and human disturbance and is suffering accelerating degradation. Retrogressive thaw slumps (RTSs) are slope failures caused by the thawing of ice-rich permafrost. They typically retreat and expand at high rates, damaging infrastructure and releasing carbon preserved in frozen ground. RTSs are common along this critical corridor but remain poorly investigated. To compile the first comprehensive inventory of RTSs, this study uses an iterative semi-automatic method built on deep learning to delineate thaw slumps in the 2019 PlanetScope CubeSat images over a ~54,000 km² corridor area. The method assesses every image pixel using DeepLabv3+ with limited training samples, and the deep-learning-identified thaw slumps are then manually inspected based on their geomorphic features and temporal changes. The inventory includes 875 RTSs, of which 474 are clustered in the Beiluhe region and 38 are near roads or railway lines. The dataset is available at https://doi.org/10.1594/PANGAEA.933957 (Xia et al., 2021), with the Chinese version at https://data.tpdc.ac.cn/zh-hans/disallow/50de2d4f-75e1-4bad-b316-6fb91d915a1a/. These RTSs tend to be located on north-facing slopes with gradients of 1.2°–18.1° and at medium elevations ranging from 4511 to 5212 m a.s.l. They tend to develop on land that receives relatively low annual solar radiation (from 2900 to 3200 kWh m⁻²), is covered by alpine meadow, and is underlain by silt loam. The results provide a significant and fundamental benchmark dataset for quantifying thaw slump changes in this vulnerable region, which is undergoing strong climatic warming and extensive human activities.


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Shi Song-men

The diagnosis of new diseases is a challenging problem. In the early stage of the emergence of a new disease, few case samples are available, which may lead to low accuracy of intelligent diagnosis. Because the support vector machine (SVM) handles small-sample problems well, it is selected as the intelligent diagnosis method. Updating the standard SVM diagnosis model requires retraining on all samples, which incurs large storage and computation costs and makes it difficult to adapt to changing conditions. To solve this problem, this paper proposes a new disease diagnosis method based on fuzzy SVM (FSVM) incremental learning. According to SVM theory, the support vector set and the boundary sample set related to the SVM diagnosis model are extracted. Only these sample sets are considered in incremental learning, which preserves accuracy while reducing the cost of computation and storage. To reduce the impact of noise points caused by the reduction of training samples, FSVM is used to update the diagnosis model, which improves generalization. The simulation results on the banana dataset show that the proposed method can improve the classification accuracy from 86.4% to 90.4%. Finally, the method is applied to the diagnosis of COVID-19, where the diagnostic accuracy reaches 98.2%, whereas the traditional SVM reaches only 84%. The model is updated as the number of case samples increases. When the training samples increase to 400, only 77 samples participate in training, so the amount of computation needed to update the model is small.


2022 ◽  
Author(s):  
Zhiwen Yan ◽  
Ying Chen ◽  
Jinlong Song ◽  
Jia Zhu ◽  
Jianbo Li

Abstract. Pit and fissure sealant is applied to children aged seven to twelve years to prevent caries in molars. In this paper, we propose a new detection framework to identify whether children need pit and fissure sealing. We divide the framework into two parts: molar detection and molar classification. According to the characteristics of teeth, we propose using a clustering method to filter the bounding boxes in the object detection part. In each region divided by clustering, we keep only one detection box per category. In the classification part, we propose a noise filtering layer based on the wavelet transform for feature extraction. During training, we map the training samples to another space based on metric learning to increase the distance between categories and improve classification accuracy.


Author(s):  
Houjie Li ◽  
Min Yang ◽  
Yu Zhou ◽  
Ruirui Zheng ◽  
Wenpeng Liu ◽  
...  

Partial label learning is a new weakly supervised learning framework. In this framework, the real category label of a training sample is usually concealed in a set of candidate labels, which leads to lower accuracy of learning algorithms compared with traditional strongly supervised cases. Recently, it has been found that metric learning technology can be used to improve the accuracy of partial label learning algorithms. However, because it is difficult to ascertain similar pairs from training samples, there are at present few metric learning algorithms for the partial label learning framework. In view of this, this paper proposes a similar-pair-free partial label metric learning algorithm. The main idea of the algorithm is to define two probability distributions on the training samples, i.e., the probability distribution determined by the distance of sample pairs and the probability distribution determined by the similarity of the candidate label sets of sample pairs; the metric matrix is then obtained by minimizing the KL divergence of the two probability distributions. The experimental results on several real-world partial label datasets show that the proposed algorithm improves the accuracy of the k-nearest neighbor partial label learning algorithm (PL-KNN) more than the existing partial label metric learning algorithms, by up to 8 percentage points.
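
The objective sketched in this abstract, a KL divergence between a distance-based and a label-similarity-based distribution over sample pairs, can be illustrated as follows. This is only a sketch under assumptions the abstract does not state: distances are plain squared Euclidean distances (i.e., an identity metric matrix), pair similarity is the Jaccard overlap of candidate label sets, and the divergence is taken from the label distribution to the distance distribution. All function names are hypothetical, and the optimization over the metric matrix is omitted.

```python
import math
from itertools import combinations

def pair_distribution(scores):
    """Normalize non-negative pair scores into a probability distribution."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all pair scores are zero; cannot normalize")
    return {pair: s / total for pair, s in scores.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a shared set of keys; eps guards against log(0)."""
    return sum(pv * math.log((pv + eps) / (q[k] + eps))
               for k, pv in p.items() if pv > 0)

def pair_objective(samples, candidate_sets):
    """KL divergence between label-based and distance-based pair distributions."""
    pairs = list(combinations(range(len(samples)), 2))
    # Assumed distance-based score: exp(-squared Euclidean distance).
    dist_scores = {
        (i, j): math.exp(-sum((a - b) ** 2
                              for a, b in zip(samples[i], samples[j])))
        for i, j in pairs
    }
    # Assumed label-based score: Jaccard similarity of candidate label sets.
    label_scores = {
        (i, j): len(candidate_sets[i] & candidate_sets[j])
                / len(candidate_sets[i] | candidate_sets[j])
        for i, j in pairs
    }
    return kl_divergence(pair_distribution(label_scores),
                         pair_distribution(dist_scores))
```

In this sketch the objective is small when samples that share candidate labels are also close together, and large when they are far apart, which is the behavior a learned metric would be trained to encourage.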


Author(s):  
Xiaoyu He ◽  
Yong Wang ◽  
Shuang Zhao ◽  
Chunli Yao

Abstract. Currently, convolutional neural networks (CNNs) have made remarkable achievements in skin lesion classification because of their end-to-end feature representation abilities. However, precise skin lesion classification is still challenging because of the following three issues: (1) insufficient training samples, (2) inter-class similarities and intra-class variations, and (3) lack of the ability to focus on discriminative skin lesion parts. To address these issues, we propose a deep metric attention learning CNN (DeMAL-CNN) for skin lesion classification. In DeMAL-CNN, a triplet-based network (TPN) is first designed based on deep metric learning, which consists of three weight-shared embedding extraction networks. TPN adopts a triplet of samples as input and uses the triplet loss to optimize the embeddings, which can not only increase the number of training samples, but also learn the embeddings robust to inter-class similarities and intra-class variations. In addition, a mixed attention mechanism considering both the spatial-wise and channel-wise attention information is designed and integrated into the construction of each embedding extraction network, which can further strengthen the skin lesion localization ability of DeMAL-CNN. After extracting the embeddings, three weight-shared classification layers are used to generate the final predictions. In the training procedure, we combine the triplet loss with the classification loss as a hybrid loss to train DeMAL-CNN. We compare DeMAL-CNN with the baseline method, attention methods, advanced challenge methods, and state-of-the-art skin lesion classification methods on the ISIC 2016 and ISIC 2017 datasets, and test its generalization ability on the PH2 dataset. The results demonstrate its effectiveness.
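
The hybrid loss mentioned in this abstract can be sketched in a few lines. The sketch makes several assumptions that the abstract does not specify: Euclidean distance between embeddings, a margin of 1.0, cross-entropy as the classification loss averaged over the three branches, and a simple weighted sum with a hypothetical `weight` parameter. The function names are illustrative only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0,
               euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def cross_entropy(probs, label):
    """Cross-entropy of one predicted probability vector against its true class."""
    return -math.log(probs[label])

def hybrid_loss(embeddings, predictions, labels, margin=1.0, weight=1.0):
    """Triplet loss plus weighted mean classification loss over the three branches.

    embeddings:  (anchor, positive, negative) embedding vectors
    predictions: class-probability vectors for the same three samples
    labels:      true class indices for the same three samples
    """
    anchor, positive, negative = embeddings
    cls = sum(cross_entropy(p, y)
              for p, y in zip(predictions, labels)) / len(labels)
    return triplet_loss(anchor, positive, negative, margin) + weight * cls
```

A triplet whose negative is already farther from the anchor than the positive by more than the margin contributes zero triplet loss, so in that case only the classification term drives the update.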

