We introduce FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Our smallest variant outperforms ...