
The 3 hardest technical problems I hit building an AI agent that calls real APIs

by safa

Not a launch post. Just things I wish someone had written down before I spent a month figuring them out.

1. LLMs send partial payloads on write operations

You ask the agent to update a record. It sends only the fields you mentioned in the prompt. The PUT request goes through, returns 200, and you've silently wiped every field you didn't specify.

The fix: before every write call, fetch the current resource state via the companion GET endpoint and deep-merge the LLM's payload on top. The LLM only needs to specify what's changing — the executor fills in the rest.
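A minimal sketch of the merge step (the endpoint, field names, and data here are made up for illustration; the GET call itself is elided):

```python
def deep_merge(current: dict, patch: dict) -> dict:
    """Recursively overlay the LLM's partial payload on the current state."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# State fetched from the companion GET endpoint (illustrative data)
current = {"name": "Acme", "address": {"city": "Paris", "zip": "75001"}, "tier": "pro"}

# The LLM only mentions the field the user asked to change
patch = {"address": {"city": "Lyon"}}

# This becomes the PUT body: "name", "tier", and "zip" survive
full_payload = deep_merge(current, patch)
```

The deep merge matters for nested objects: a shallow `{**current, **patch}` would still wipe `zip` the moment the LLM sends a partial `address`.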

2. LLMs hallucinate success when API calls fail

A tool returns a 404. The agent says "done, the record was updated!"

The fix: explicitly prefix every error response with "Error:" and add one line to the system prompt: "If a tool returns a message starting with Error:, report it directly. Do not assume success." Without this, the agent will confidently lie every time.
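One way to enforce the prefix is to wrap every tool so exceptions come back to the model as text rather than propagating. This is a generic sketch, not the post's actual executor; the decorator name and the stub tool are invented:

```python
def safe_tool(fn):
    """Wrap a tool so any failure returns 'Error: ...' text the model can see."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # The "Error:" prefix is exactly what the system prompt keys on
            return f"Error: {fn.__name__} failed: {exc}"
    return wrapper

@safe_tool
def update_record(record_id):
    # Stand-in for a real API call that came back 404
    raise LookupError(f"record {record_id} not found (404)")

result = update_record("abc123")
# result is an "Error:"-prefixed string, so the agent reports the failure
# instead of inventing a success message
```

The key design point is that the tool never raises into the agent loop; the failure travels through the same channel as a normal result, where the prompt rule can act on it.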

3. Query parameters break in subtle ways

The LLM passes query params as a plain string instead of a dict. The request fires, looks fine in logs, returns nothing. No error. Just silence.

The fix: coerce string inputs to dicts in the tool executor and be extremely explicit in the field description about the expected shape — including a concrete example.
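A possible coercion layer, assuming the LLM sends either a real dict, a JSON string, or a raw querystring (the function name and shapes are illustrative):

```python
import json
from urllib.parse import parse_qs

def coerce_params(raw) -> dict:
    """Accept a dict, a JSON string, or an 'a=1&b=2' querystring; always return a dict."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        raw = raw.strip()
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
        # Fall back to querystring parsing; flatten single-value lists
        return {k: v[0] if len(v) == 1 else v for k, v in parse_qs(raw).items()}
    raise TypeError(f"unsupported params type: {type(raw).__name__}")
```

Pairing this with a field description like "params: a JSON object mapping parameter names to values, e.g. {\"status\": \"open\"}" covers both ends: the description reduces how often the model sends the wrong shape, and the coercion rescues the cases where it does anyway.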

None of this shows up in tutorials or documentation. You only find it by shipping something real and watching it break.

If you're building anything that connects an LLM to a real API — what failure modes have you hit?



Replies

Joséphine Roux

Query parameter formatting has caused me more confusion than I expected. I kept checking logs thinking the request was fine, but the structure was slightly off and results were empty.

safa

@josephine_roux Totally. The worst part is everything looks correct—no errors, clean logs, just empty responses. It’s that silent failure mode that burns the most time.

Shyun Bill

The "silent wipe" on partial updates is a total nightmare that still catches the best of us! It’s wild how these models will lie about success just to stay "helpful," so that "Error:" prefix is a total lifesaver. These real-world quirks are exactly why shipping is the only way to actually learn how this stuff works. Have you dealt with the model getting overwhelmed when a GET request returns way too much data?