
Gemini CLI: A Developer’s Mental Model
by Pasha Simakov, 2025-08-21
Today’s developers don’t need to be experts in AI system architecture. You don’t have to understand every detail of transformers, tokenization, or distributed serving. But to be an effective user of modern AI tools, you do need a working mental model of the agentic loop — the back-and-forth between client and server that quietly shapes every interaction. Without it, prompts can feel unpredictable or “magical.” With it, you can reason about what’s happening, phrase tasks more effectively, and use these tools with confidence.
When Simple Clients Aren’t So Simple
Not all API clients are created equal. Some are lightweight wrappers that simply pass a request along and return a response. Others, like database clients, are far more elaborate. They hide vast amounts of machinery — scrollable datasets, transaction management, connection pooling — behind deceptively simple method calls. As developers, we rarely stop to consider what happens under the hood, so long as the client delivers the capabilities we expect.
A similar evolution is unfolding right now in front of our eyes with Agentic LLM Clients. At first glance, the Gemini CLI looks like a thin shell around a powerful API: you type a prompt, press Enter, and get a reply. But just as with database drivers, the reality is richer and more complex. Beneath the surface, the LLM Client is orchestrating hidden back-and-forth exchanges, managing state, invoking tools, compressing conversation context, and recovering gracefully from errors—all without you lifting a finger.
That invisible choreography is the subject of this article.
Why the Details Matter
The Gemini CLI looks like a thin wrapper around the Gemini API Server. You type a prompt, it streams back an answer. But the truth is far more layered. What looks like one synchronous call is, in fact, a multi-turn collaboration between client and server, stitched together by invisible logic in the CLI.
Every user interaction involves three actors, each with a distinct role:
The Developer — you, typing the prompt. From your perspective, there’s one request and you expect one response.
The Gemini CLI Client — the orchestrator. It’s stateful, keeps track of conversation context, and decides when to execute tools, retry, or summarize. It’s also where all agentic behavior resides: it executes local commands, manages turn compression, and actively drives the flow forward rather than waiting passively. Because it’s open source, we can read the code, debug it, and even change its behavior (`client.ts`, `geminiChat.ts`, `turn.ts`).
The Gemini API Server — the stateless but powerful brain, the LLM model. It only sees what the client sends in a single request. It is optimized for reasoning and generation, but deliberately avoids holding memory or state between calls. Unlike the client, we cannot inspect or modify the server — it’s a black box.
Your single keystroke sets off a quiet dance between these actors. What looks atomic to you is often many hidden steps under the hood.
Three short examples in the appendix make this clear. Even a simple “Hello” involves hidden context setup and a next-speaker check (Example 1). A request like “find all filenames containing `foo`” triggers a local filesystem detour before the answer comes back (Example 2). A chained query like “find all filenames containing `foo`; do they contain `bar`?” requires multiple turns, tool calls, and parallel searches (Example 3).
As a preview, here’s the Agentic Loop sequence diagram from Example 3: Chaining Tools — a pattern you’ll quickly become familiar with.
Inside the Gemini CLI Agentic Loop
When you type a prompt into the Gemini CLI and hit Enter, the client executes the following routine:
- Initialize
  - Collect environment and IDE context.
  - List all available local tools the LLM can call if need be — whether those are local MCP servers, local shell utilities, or anything else. What matters most is that the LLM understands their purpose and how to invoke them, so it can delegate tasks cleanly instead of guessing or improvising.
  - Append this to the conversation context.
- Start a Turn
  - Append your prompt to the conversation context.
  - Send the full conversation context to the Gemini API Server.
- Process the Response
  - If the server replies with text, stream it to the terminal.
  - If the server replies with a tool request, pause and execute the tool locally.
  - Capture tool output, append it to the conversation context, and send it back to the server so execution can continue with the next steps of the overall strategy.
- Repeat the Loop
  - Continue until the server provides a final text answer.
  - Afterward, quietly ask: “who speaks next?”
  - If the answer is “user,” the turn ends.
- Handle Special Cases
  - If the conversation context grows too large, the client triggers context compression. This isn’t a vague summary — the client asks the server to produce a structured `<state_snapshot>` with fields like `<overall_goal>`, `<key_knowledge>`, and `<current_plan>` (see the sketch after this list). The old turns are discarded, replaced by this compact state plus the most recent exchanges. The effect is like checkpointing in a database: the session stays coherent, but lean enough to keep going without hitting context limits.
  - If the server runs out of tokens mid-thought, request continuation. If a call fails, retry with backoff or switch to a fallback model. This resilience ensures you rarely see raw errors — the client absorbs most of them on your behalf.
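To make the compression step concrete, here is a rough sketch of what such a snapshot might look like once spliced back into the conversation context. The field names come from the article above; the field contents are invented for illustration and are not taken from a real session.

```ts
// A hypothetical compressed context, expressed as the text block the
// client substitutes for the discarded turns. All contents are invented.
const stateSnapshot = `
<state_snapshot>
  <overall_goal>Rename ConfigLoader to SettingsLoader across the repo</overall_goal>
  <key_knowledge>TypeScript project; tests run via "npm test"; CI requires lint to pass</key_knowledge>
  <current_plan>1. [DONE] Rename the class. 2. [IN PROGRESS] Update imports. 3. [TODO] Run tests.</current_plan>
</state_snapshot>`;
```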
Each developer-visible exchange — your prompt and the model’s reply — is in fact a Turn. But under the hood, a Turn often hides multiple steps: one or more server calls, local tool executions, retries, and checks for completeness. The client is the conductor, orchestrating all of this before handing control back to you.
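Put together, the whole routine fits in a screenful of pseudocode. The sketch below is a simplification: every helper is a placeholder, not a real Gemini CLI API, and the production logic in `client.ts`, `geminiChat.ts`, and `turn.ts` also handles streaming, retries, and fallback models.

```ts
// A minimal sketch of the agentic loop. Every helper below is a
// placeholder, NOT a real Gemini CLI API; error handling, retries,
// and token-limit continuations are omitted.
type Message = { role: "user" | "model" | "tool"; text: string };
type ServerReply = { text: string; toolCalls: { name: string; args: unknown }[] };

declare function sendToServer(context: Message[]): Promise<ServerReply>;
declare function executeToolLocally(call: { name: string; args: unknown }): Promise<string>;
declare function compressContext(context: Message[]): Promise<Message[]>; // -> <state_snapshot>
declare function askNextSpeaker(context: Message[]): Promise<"user" | "model">;
declare function isTooLarge(context: Message[]): boolean;
declare function streamToTerminal(text: string): void;

async function runTurn(context: Message[], prompt: string): Promise<void> {
  context.push({ role: "user", text: prompt });
  while (true) {
    if (isTooLarge(context)) context = await compressContext(context);
    const reply = await sendToServer(context); // one round trip per iteration
    if (reply.toolCalls.length > 0) {
      // Local detour: run each requested tool, append its output,
      // then loop back to the server with the results.
      for (const call of reply.toolCalls) {
        context.push({ role: "tool", text: await executeToolLocally(call) });
      }
      continue;
    }
    streamToTerminal(reply.text);
    context.push({ role: "model", text: reply.text });
    // The hidden check: only yield control when the model says it's
    // the user's turn to speak.
    if ((await askNextSpeaker(context)) === "user") return;
  }
}
```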
Takeaways for Developers
The Gemini CLI hides a surprising amount of orchestration behind each prompt, but you don’t need to memorize every internal detail to use it effectively. What helps most is a clear mental model — a few rules of thumb that guide how you phrase tasks, what you can expect from the system, and where you should focus your attention.
- The Client is the agent. It’s the part you can inspect, debug, and extend. The Server is powerful but opaque — a black box you can only interact with indirectly.
- One prompt can mean many turns. A simple request may unfold into multiple back-and-forth exchanges between client and server, plus local tool calls of which you see only hints.
- State lives locally. The client maintains conversation context, compresses it when needed, and decides how much context to send upstream. The server only knows what it’s given in that one call.
- Think about execution. Use a developer’s heuristic: if your prompt feels like a project spec, expect the client to orchestrate a long loop. Sometimes it’s faster and more reliable to give milestones instead of one giant goal. If you’re unsure, you can even ask the model directly: “If I asked this, how would you execute it?”
- Be intentional. The client is designed to recover gracefully from errors, but clarity in your request reduces retries and unnecessary tool calls. A little foresight in how you phrase a task often translates into faster, more predictable results.
Closing Thoughts
As engineers, we know that simple interfaces often mask deep systems. The Gemini CLI Client is no different. A single keystroke at your shell launches a cascade of invisible interactions across `client.ts`, `geminiChat.ts`, and `turn.ts`.
The beauty is that you don’t need to think about this while working — but knowing it helps you reason about what’s happening when things get slow or complex. Understanding the hidden choreography — and the role of a Turn — can give you a new appreciation for the design, and a sharper intuition for how to phrase prompts when building with LLMs.
When you notice the model hesitating, looping, or suddenly compressing, these aren’t glitches — they’re the machinery at work. Understanding this makes you less surprised and more in control.
Agentic Loop Examples
Example 1
Prompt: Hello!
Heuristic: Even the simplest prompts carry hidden setup work — context gathering and verification steps you never see. When you just want a quick response, keep in mind that the client may still perform extra checks before handing control back, so expect a small amount of invisible overhead.
Actual Agentic Loop:
- Pre-flight (local): The Client collects environment info (OS, working directory, and definitions of available tools) into its local conversation context — no network yet.
- API Call #1 (Greeting): The Client sends the combined conversation context (workspace context + tool definitions + `Hello!`) to the Server. The Server replies with “Hi there! How can I help you today?”.
- API Call #2 (Hidden Check): The Client asks the Server “Who should speak next?”. The Server confirms: “user”.
Key Insight: What looks like one exchange is really two API calls, with the first carrying both context and your prompt together.
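For the curious, here is roughly what those two requests look like on the wire. The shapes follow the Gemini API’s `generateContent` request format; the system text, tool list, and next-speaker wording are paraphrased for illustration, not copied from the CLI.

```ts
// API Call #1: context + tools + the user's prompt, all in one request.
const call1 = {
  systemInstruction: { parts: [{ text: "…workspace context (OS, cwd, …)…" }] },
  tools: [{ functionDeclarations: [/* glob, search_file_content, shell, … */] }],
  contents: [{ role: "user", parts: [{ text: "Hello!" }] }],
};

// API Call #2: the hidden next-speaker check, sent after the greeting
// streams back. The prompt wording here is a paraphrase.
const call2 = {
  contents: [
    { role: "user", parts: [{ text: "Hello!" }] },
    { role: "model", parts: [{ text: "Hi there! How can I help you today?" }] },
    { role: "user", parts: [{ text: "Based on the conversation, who should speak next: 'user' or 'model'? Answer in JSON." }] },
  ],
};
```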
Example 2
Prompt: find all filenames that contain `foo`
Heuristic: For simple tool-driven prompts, expect the server to choose the best tool (the `glob` tool in this case), the client to execute the tool locally, and then report the results back to the server. You can make the process smoother by phrasing the request in a way that clearly signals a single tool action — e.g., “List all files with names containing `foo`.”
Actual Agentic Loop:
- Pre-flight (local): The Client collects environment info (OS, working directory, and definitions of available tools) into its local conversation context — no network yet.
- API Call #1 (Request): The Client sends the combined conversation context (workspace context + tool definitions + prompt) to the server. The Server responds with a `functionCall` to `glob`.
- Local Detour: The Client executes `glob` locally and finds matches.
- API Call #2 (Report): Results are sent back as a `functionResponse`. The Server formats the user-facing answer.
- API Call #3 (Hidden Check): Confirms the turn is finished.
Key Insight: A single prompt becomes three API calls + a local tool run.
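The `functionCall`/`functionResponse` handshake at the heart of this example looks roughly like the sketch below. The part shapes follow the Gemini API’s function-calling format; `runGlob` is a stand-in for the real tool, and the exact role used for tool output varies slightly across SDK versions.

```ts
// What the Server sends back in API Call #1: a request to run a tool.
const modelReply = {
  role: "model",
  parts: [{ functionCall: { name: "glob", args: { pattern: "**/*foo*" } } }],
};

// Stand-in for the CLI's local glob execution.
declare function runGlob(pattern: string): string[];

// What the Client appends and sends back in API Call #2.
const toolReport = {
  role: "user", // tool-output role conventions vary by SDK version
  parts: [{
    functionResponse: {
      name: "glob",
      response: { files: runGlob("**/*foo*") },
    },
  }],
};
```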
Example 3
Prompt: find all filenames that contain `foo`; do any of them contain `bar`?
Heuristic: For multi-part prompts, expect the server to split the work into sub-steps and choose the best tool for each: first gathering candidates (filenames containing `foo`, using the `glob` tool), then testing them (for occurrences of `bar` in their contents, using the `search_file_content` tool). The client executes the necessary tool calls locally and reports the results back to the server after each step. To guide the process more clearly, you can phrase the request as a two-sentence plan — e.g., “First list all files with `foo`. Then check which of those contain `bar`.”
Actual Agentic Loop:
- Pre-flight (local): The Client collects environment info (OS, working directory, and definitions of available tools) into its local conversation context — no network yet.
- API Call #1 (Prompt): The Client sends the combined conversation context (workspace context + tool definitions + prompt) to the server. The Server plans and asks the client to call `glob` first.
- Local Detour 1: The Client executes `glob` locally.
- API Call #2 (Report): Results are returned to the server. The Server issues two parallel `search_file_content` calls next.
- Local Detour 2: The Client executes both searches concurrently.
- API Call #3 (Answer): Results are returned to the server; the Server synthesizes the final answer.
- API Call #4 (Hidden Check): Confirms the user speaks next.
Key Insight: A single prompt becomes four API calls + three local tool executions.
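Local Detour 2 is ordinary client-side concurrency: both tool calls issued by the server are executed together, and their outputs are reported back in one batch. A minimal sketch, assuming a hypothetical `searchFileContent` helper and invented file names:

```ts
// Placeholder for the real search_file_content tool.
declare function searchFileContent(args: { pattern: string; path: string }): Promise<string>;

async function runParallelSearches(): Promise<string[]> {
  // The two calls issued by the server in API Call #2; paths are invented.
  const calls = [
    { pattern: "bar", path: "src/foo_utils.ts" },
    { pattern: "bar", path: "test/foo_helpers.ts" },
  ];
  // Execute both searches concurrently, then report the combined
  // results back to the server (API Call #3).
  return Promise.all(calls.map((call) => searchFileContent(call)));
}
```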
PS: Written with and about CLI Version 0.1.22
Gemini CLI Masterclass Articles
I’m Pasha Simakov, a Google software engineer passionate about building intelligent systems that help developers work faster and smarter. I’ve been leading adoption of Gemini CLI in software development teams, and I want to share some of the lessons I’ve learned along the way. Beyond my own projects, I also teach these techniques through 1-on-1 mentoring and group masterclasses.
Here are all of my articles on Gemini CLI and related topics: