PU Learning
Recently Published Documents


Total documents: 54 (five years: 26)
H-index: 5 (five years: 2)

Author(s): Guangxin Su, Weitong Chen, Miao Xu

Positive-unlabeled (PU) learning addresses binary classification when only positive (P) and unlabeled (U) data are available, without negative (N) data. Existing PU methods perform well on balanced datasets. However, in real applications such as financial fraud detection or medical diagnosis, data are often imbalanced, and it remains unclear whether existing PU methods perform well in that setting. In this paper, we explore this problem and propose a general learning objective for PU learning that specifically targets imbalanced data. With this objective, state-of-the-art PU methods based on optimizing a consistent risk can be adapted to handle the imbalance. We show theoretically that, in expectation, optimizing our learning objective is equivalent to learning a classifier on oversampled balanced data with both P and N data available, and we further provide an estimation error bound. Finally, experimental results validate the effectiveness of our proposal compared with state-of-the-art PU methods.
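For readers unfamiliar with the "consistent risk" that the abstract refers to, the following is a minimal sketch of the standard PU risk estimator with an optional non-negative correction. It is not the imbalance-aware objective proposed in the paper, whose form is not given here. The function names, the choice of sigmoid loss, and the assumption that the class prior `prior` is known are all illustrative.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)), with label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def pu_risk(scores_p, scores_u, prior, non_negative=True):
    """Estimate classification risk from positive and unlabeled scores only.

    scores_p: classifier outputs on labeled positive samples
    scores_u: classifier outputs on unlabeled samples
    prior:    assumed known fraction of positives in the unlabeled data
    """
    # Risk of treating positives as positive, weighted by the class prior.
    risk_p_pos = prior * mean([sigmoid_loss(s, +1) for s in scores_p])
    # Risk of treating unlabeled data as negative ...
    risk_u_neg = mean([sigmoid_loss(s, -1) for s in scores_u])
    # ... minus the share of that risk contributed by hidden positives.
    risk_p_neg = prior * mean([sigmoid_loss(s, -1) for s in scores_p])
    negative_part = risk_u_neg - risk_p_neg
    if non_negative:
        # Clamp at zero: a true risk can never be negative.
        negative_part = max(0.0, negative_part)
    return risk_p_pos + negative_part
```

Without the clamp, the estimate can drop below zero on finite samples, which is one reason the non-negative variant is commonly used when training flexible models.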


2021, Vol 9
Author(s): Zeeshan Shirazi, Lei Wang, Valery G. Bondur

Wildfire is one of the most common natural hazards in the world, and estimating fire risk for the purpose of risk reduction is an important part of disaster studies. The aim of this research was to develop a machine learning workflow for South East China that monitors fire risk over a large region. The workflow learns from a grid file database containing time series of several important environmental parameters, largely extracted from remote sensing data products, and classifies areas as fire risk or non-fire risk for a couple of weeks into the future. The study employed a fire threshold and the transductive PU learning method to identify reliable non-fire (negative) training samples from the grid file database, using fire (positive) training samples labeled with the MODIS MCD14ML fire location product. Separate models were trained for the three natural vegetation land covers in the study area: evergreen broadleaf forest, mixed forest, and woody savannas. On the test dataset, all three models exhibited high sensitivity (>80%), identifying the majority of fires for every land cover. However, the use of the reliable negatives identified through the fire threshold and PU learning process resulted in low precision and accuracy. During model verification, the mixed forest model performed best, with 70% of verification fires falling within the classified fire zone. The better representation of mixed forest in the training samples made this model more reliable than the others. Improving the individual models constructed for different land covers and combining them can provide fire classification for a larger region. There is room to improve the spatial precision of fire cell classification; introducing finer-scale features that correlate more strongly with fire activity and exhibit high spatial variability seems a viable way forward.
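The idea of picking "reliable negatives" from unlabeled data can be illustrated with a generic two-step PU heuristic. The sketch below is not the study's fire threshold or its transductive PU procedure, which the abstract does not specify. The function name, the centroid-distance rule, and the `keep_fraction` parameter are hypothetical choices made only for illustration.

```python
def select_reliable_negatives(positives, unlabeled, keep_fraction=0.3):
    """Return the unlabeled samples least similar to the known positives.

    positives, unlabeled: lists of equal-length numeric feature tuples
    keep_fraction:        hypothetical share of unlabeled samples to keep
    """
    dims = len(positives[0])
    # Centroid of the labeled positive (e.g. fire) samples.
    centroid = [
        sum(p[d] for p in positives) / len(positives) for d in range(dims)
    ]

    def squared_distance(sample):
        return sum((sample[d] - centroid[d]) ** 2 for d in range(dims))

    # Samples farthest from the positive centroid come first.
    ranked = sorted(unlabeled, key=squared_distance, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy usage with two features per grid cell (values are made up).
fires = [(1.0, 1.0), (1.2, 0.8)]
cells = [(1.1, 0.9), (5.0, 5.0), (0.0, 4.0), (1.0, 1.1)]
negatives = select_reliable_negatives(fires, cells, keep_fraction=0.5)
```

A classifier would then be trained on the positives and the selected negatives. How aggressively negatives are selected affects the balance between sensitivity and precision, so `keep_fraction` would need tuning on real data.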


Author(s): Michał Karwatowski, Maciej Wielgosz, Marcin Pietroń, Kamil Piętak, Dominik Żurek

Author(s): Yuda Gao, Bin Shi, Bo Dong, Yiyang Wang, Lingyun Mi, ...

Author(s): Vaishnavi Muralidharan, Nandan Sudarsanam, Balaraman Ravindran
