Abstract: Knowledge distillation (KD) aims to transfer knowledge from a larger teacher model to a smaller student model via soft labels, producing an efficient neural network. In general, the ...
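For reference, a minimal sketch of the standard soft-label KD objective (in the style of Hinton et al., not necessarily the exact formulation this paper uses) is given below; the temperature `T`, the blend weight `alpha`, and the function name `kd_loss` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-label KD: KL divergence between temperature-softened teacher and
    student distributions, blended with the hard-label cross-entropy.
    (Illustrative sketch; hyperparameters T and alpha are assumptions.)"""
    # Soft targets: teacher class probabilities at temperature T
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays comparable
    # to the hard-label term as T varies
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```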