Triton Inference Guides
Triton Inference tutorial directory
Each guide links to the source code for its example. In most cases, you will need to clone the repository to follow along with the tutorial's walkthrough, as sketched below.
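A typical setup step looks like the following; the repository URL shown here is an illustrative assumption, so substitute the repository linked from the specific guide you are following:

```bash
# Clone the repository containing the tutorial source code.
# (URL is illustrative; use the one linked from your guide.)
git clone https://github.com/triton-inference-server/tutorials.git
cd tutorials
```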
📄️ FasterTransformer GPT-J and GPT-NeoX 20B
Deploy GPT-J or GPT-NeoX using NVIDIA Triton Inference Server with the FasterTransformer backend