DataScience - Teacher Student Model Semi-supervised
In "Billion-scale semi-supervised learning for image classification" (Facebook AI Research), the authors explain why the student model is not trained on D and D̂ combined:
"Remark: It is possible to use a mixture of data in D and D̂ for training like in previous approaches [34]. However, this requires for searching for optimal mixing parameters, which depend on other parameters. This is resource-intensive in the case of our large-scale training. Additionally, as shown later in our analysis, taking full advantage of large-scale unlabelled data requires adopting long pre-training schedules, which adds some complexity when mixing is involved."
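To make the trade-off in the remark concrete, here is a minimal sketch contrasting the two ways of feeding data to the student. It is not the paper's code: the function names, the `mix_ratio` parameter, and the toy data are all hypothetical, chosen only for illustration. The mixed variant needs a mixing ratio that has to be tuned, while the two-stage variant (pre-train on D̂, then fine-tune on D, which is how I read the paper's pipeline) has no such parameter.

```python
import random

def mixed_batches(labelled, pseudo_labelled, batch_size, mix_ratio, num_batches, seed=0):
    """Mixing approach: every batch draws a fraction `mix_ratio` from D
    and the remainder from D-hat. `mix_ratio` is the extra hyperparameter
    that would have to be searched for (hypothetical name)."""
    rng = random.Random(seed)
    n_labelled = round(batch_size * mix_ratio)
    for _ in range(num_batches):
        batch = rng.choices(labelled, k=n_labelled)
        batch += rng.choices(pseudo_labelled, k=batch_size - n_labelled)
        yield batch

def sequential_batches(labelled, pseudo_labelled, batch_size,
                       pretrain_batches, finetune_batches, seed=0):
    """Two-stage approach: pre-train on D-hat only, then fine-tune on D only.
    No mixing ratio is involved; the two schedules are set independently."""
    rng = random.Random(seed)
    for _ in range(pretrain_batches):
        yield "pretrain", rng.choices(pseudo_labelled, k=batch_size)
    for _ in range(finetune_batches):
        yield "finetune", rng.choices(labelled, k=batch_size)

# Toy stand-ins for D (labelled) and D-hat (teacher pseudo-labelled).
D = [("d", i) for i in range(10)]
D_hat = [("dhat", i) for i in range(100)]

for batch in mixed_batches(D, D_hat, batch_size=8, mix_ratio=0.25, num_batches=2):
    print([source for source, _ in batch])

for stage, batch in sequential_batches(D, D_hat, batch_size=4,
                                       pretrain_batches=2, finetune_batches=1):
    print(stage, [source for source, _ in batch])
```

In the mixed version, changing the length of the schedule or the size of D̂ would plausibly shift the best `mix_ratio`, which is the "depend on other parameters" issue the remark mentions. In the two-stage version the pre-training length can be extended without touching the fine-tuning stage.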