HAN Yongsheng, QI Zhiquan, TIAN Yingjie
Accepted: 2025-03-05
Learning from Label Proportions (LLP) is a weakly labeled learning problem in which instance-level label information is aggregated into bags, so that only the label proportions of each bag are available. LLP therefore belongs to the family of bag-based learning problems, in which the instances within a bag are related. As in standard classification, the aim is not only to learn a classifier that recovers the instance-level labels of the training data, but also to generalize this label prediction to unseen data. However, because statistical estimates are ambiguous or approximate and labels may be noisy, a more realistic setting for this framework provides interval-valued proportion information rather than the real-valued proportions assumed in LLP. In such scenarios, standard LLP methods fail to yield a satisfactory label predictor. In this paper, we propose a new learning framework called Bounded Label Proportions (BLP) to address this problem. In addition, we present a robust algorithm for BLP based on Random Forest (RF), named BLPForest, which naturally handles multi-class and high-dimensional problems. For comparison, we divide our experiments into two parts. In the first part, we reduce BLPForest to the standard LLP problem in order to examine the relationship between these two closely related learning problems. The results show that BLPForest has an advantage even when real-valued proportion information is given, which mainly stems from the use of the RF algorithm. In the second part, we choose large multi-class datasets with much higher dimensionality and deliberately add noise to the proportion information of each bag. Across the experiments, BLPForest achieves the best accuracy in most cases. Finally, we provide the corresponding discussion and analysis.
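To make the difference between the two kinds of supervision concrete, the following minimal sketch computes the exact label proportions of a bag (the LLP setting) and widens them into intervals (the BLP setting). It is an illustration only, not the BLPForest algorithm: the function names, the symmetric tolerance `eps`, and the toy bag are assumptions introduced here and do not come from the paper.

```python
from collections import Counter


def bag_proportions(labels, classes):
    """Exact per-class label proportions of one bag (LLP-style supervision)."""
    if not labels:
        raise ValueError("a bag must contain at least one instance")
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in classes}


def to_bounds(proportions, eps):
    """Widen each exact proportion into an interval [lo, hi], clipped to [0, 1].

    The symmetric tolerance `eps` is a hypothetical way to model uncertainty;
    eps = 0 gives lo == hi, i.e. the real-valued LLP case.
    """
    return {
        c: (max(0.0, p - eps), min(1.0, p + eps))
        for c, p in proportions.items()
    }


def satisfies_bounds(predicted_labels, bounds):
    """Check whether the predicted labels of a bag respect every class interval."""
    props = bag_proportions(predicted_labels, list(bounds))
    return all(lo <= props[c] <= hi for c, (lo, hi) in bounds.items())


# Toy bag with three classes (illustrative data only).
bag = ["a", "a", "b", "c"]
exact = bag_proportions(bag, ["a", "b", "c"])
bounds = to_bounds(exact, eps=0.1)
```

Under these assumptions, a candidate labeling of the bag is consistent with the BLP supervision whenever `satisfies_bounds` returns `True`; setting `eps=0` reduces the check to matching the exact LLP proportions.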