Fast Inference for Language Models
314W, PAB
About the Event: The second monthly PyData Yerevan meetup is approaching. Vladimir Orshulevich, an NLP research engineer at Unum, will walk us through the next chapter with his talk “Fast Inference for Language Models”. This is a great opportunity to learn how to speed up inference for vanilla Hugging Face models using TensorRT, ONNX, and Nvidia NGC Container […]