Personally, if you’re considering it already, kubernetes might be something to look into. It’s a lot. Like a lot to learn. But I can honestly say I could do it for a job now with how much I’ve learned. Then it’s less about how to set up machines and more about just reapplying your infrastructure.
LLMs use a ton of VRAM, the more VRAM you have the better.
If you just need an API, then TabbyAPI is pretty great.
If you need a full UI, then Oogabooga’s TextGenration WebUI is a good place to start