OpenJet is a terminal-based AI coding assistant that sets up your local LLM stack in seconds. It gives you a private coding workflow on your own hardware with zero API bills and zero configuration hell.
I have been thinking that the barrier to setting up local LLMs should be lowered so people can get the most out of their hardware and models. That's what OpenJet is about: it auto-detects your hardware and configures the llama.cpp server with the best model and parameters for it.
Using OpenJet, I get ~38-40 tok/s without configuring anything (all I did was run the install command from the GitHub repo). Setup: RTX 3090, 240k context, Qwen3.5-27B-Q4_K_M.
The default Ollama configuration gives 16 tok/s for the same prompt on the same hardware, so OpenJet is roughly 2.4x faster.
You don't have to worry about any configuration settings. People who don't know how many GPU layers to offload, or what KV cache quantisation is, won't miss out on the performance boost those settings provide.
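For context, this is roughly the kind of llama.cpp server invocation you'd otherwise hand-tune. The flags are real llama-server options, but the model path and values here are illustrative guesses, not OpenJet's actual output:

```shell
# -ngl 99: offload all model layers to the GPU
# -c 240000: context window size in tokens
# --cache-type-k/-v q8_0: quantise the KV cache to 8-bit,
#   roughly halving its VRAM footprint vs f16
# -fa: flash attention (needed for V-cache quantisation;
#   on recent builds this flag may take an on/off argument)
llama-server -m ./Qwen3.5-27B-Q4_K_M.gguf \
  -ngl 99 -c 240000 -fa \
  --cache-type-k q8_0 --cache-type-v q8_0
```

Getting any one of these wrong (e.g. too few GPU layers, so part of the model runs on CPU) is what typically halves your tok/s without any obvious error.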
I hope this helps solve the problems people run into setting up their local LLMs and getting the most out of their hardware. If you've got any suggestions to make it more accessible, I'm happy to chat.
Try it out: https://github.com/L-Forster/open-jet