A Field Note about command intent

The first of our Field Notes, in which we share tidbits of information that we find useful during our day-to-day work.

In this note, we explore improving the command experience of our in-house research chat assistant with natural language command input.


Mark Piper, Principal Explorer

04 October, 2024

Back in May, we introduced Nimbot, our proof-of-concept chatbot designed to make researching topics easier.

One of the things we discussed in the post was command input. Specifically, we discussed using slash commands to direct user input. It looked a little something like this: 

Nimbot the Destroyer

Essentially, you type /save https://distantfield.space/ to save an interesting link and Nimbot goes off and saves it. Or maybe you smash /fetch 0ea7aeb7-b7f0-4f09-b88a-2311f729ce4c into the chatbot and Nimbot fetches the item from our database, and so on.

One of the things we learnt after months of use, especially when collaborating within a Google Chat Space, is that it didn’t really work well. You easily broke the flow of both your own thinking and the conversation at hand. It’s hard to put into words, but it’s almost like you find yourself having to stop, evaluate reality and then wonder how you would ask the Terminator.

In between jobs over the last month or so, we’ve been working towards moving Nimbot from proof-of-concept quality to reliable, production-ready code (which can be deployed into any Google Workspace account to help teams track intelligence or information items).

We’ll cover more in subsequent field notes, but today I want to touch on modifying the code for command intent. How can we make a better flow for our experience? 

A quick side note: we have renamed Nimbot to Fungate as part of this work. There are multiple reasons for this, but one of them is that we believe there needs to be a clear distinction between AI-driven workflows and human-driven ones. This isn’t to say that natural language workflows are wrong; it’s just that we believe all cards should be on the table, so users can keep at the forefront of their minds that it is code executing in the background, not a sentient being. Even a sentient fluffy dog.

The Idea

To avoid having to type slash commands and break flow, and to maximise what we can pass through to Gemini in general, we introduced a command intent AI query via Google Gemini (currently gemini-1.5-flash-002) when we refactored Fungate’s code.

At a high level, it looks something like this… 

Fungate intent concept.

What’s happening here? Let’s go through step by step: 

  1. The user request comes in via the Google Chat API service. It might be text, a card or a file, and it might be a quoted reply to an existing message. We do some initial work to determine the basic context of the request.

  2. We then gather a sample of the content (to keep token use down) and fire it off to getGeminiIntent(), a wrapper function that calls our internal Gemini API service with a per-function prompt.

  3. The prompt is processed and the response is returned in a structured format, which our code compares to a map of known handlers. If the intent is known, we fire the request off to the handler. If it’s unknown, we fire it off to a Gemini passthrough handler for kicks. A rough sketch of this step follows below.
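To make that concrete, here is a minimal sketch of what the wrapper and response parsing could look like. The names callGeminiApi and INTENT_PROMPT are illustrative stand-ins for our internal service and the prompt (sketched in the next section), not the production code.

```javascript
// Minimal sketch of the intent step. callGeminiApi() stands in for the
// internal Gemini API wrapper; INTENT_PROMPT is sketched further below.
async function getGeminiIntent(sampleText) {
  // Ask Gemini to classify the sampled request against our known intents.
  const raw = await callGeminiApi(`${INTENT_PROMPT}\n\n${sampleText}`);

  // The response comes back in a two-line, line-delimited format:
  //   line 1: the intent keyword (e.g. "save", "fetch", "summarise", "unknown")
  //   line 2: the argument pulled from the request (e.g. a URL or item ID)
  const [intent, argument] = raw.trim().split('\n');
  return {
    intent: (intent || '').trim().toLowerCase(),
    argument: (argument || '').trim(),
  };
}
```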

The prompt itself is fairly straightforward. Like all good prompts, we try to be as specific as possible about our request and intended output. We currently return a two-line (line-delimited) format, but this could be just as effective with JSON structured output.

Here are the core components of the prompt: 
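The production prompt isn’t reproduced here, but a minimal sketch along the lines we describe, with an assumed intent list and wording, might look like this:

```javascript
// Illustrative sketch of INTENT_PROMPT. The wording and intent list are
// assumptions based on the description above, not the production prompt.
const INTENT_PROMPT = `You are a command intent classifier for a research chat assistant.
Classify the user request below as exactly one of these intents:
save, fetch, summarise, unknown.

Respond with exactly two lines and nothing else:
Line 1: the intent keyword.
Line 2: the argument for the intent (for example a URL or item ID), or NONE.

User request:`;
```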

Is it perfect? No, but in testing it’s right about 99% of the time and that’s good enough to get working with. 

The Flow

So what does it look like for the explorers? Fungate can be interacted with in two key modes: direct messages (DMs) and spaces. In DM mode, the only real difference is that the default intent is save and there’s no need to @ the bot. For this example, we’re going to assume we are in a thread, within a space.
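As a quick illustrative sketch of that DM default (the fallback logic here is an assumption; the event shape follows the Google Chat API, where space.type is "DM" for direct messages):

```javascript
// Illustrative sketch: in DM mode we fall back to the save intent by default.
function defaultIntentFor(event) {
  const isDirectMessage = event.space && event.space.type === 'DM';
  return isDirectMessage ? 'save' : 'unknown';
}
```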

Our example starts with someone posting an interesting link. It might be worth a full read, but maybe we want a quick summary first.

Summarising content with Fungate.

Note, we didn’t have to say ‘summarise’ in order to initiate the summary. Our intent prompt successfully got Gemini to guess that what we wanted was a summary. Naturally, we could just say ‘summarise’ if we wanted.

Ok, looks useful. Let’s save it for later into our intelligence database, SporeCore…

Fungate knows how to stash things. 

Not only are we using natural language to process requests, but we are also able to keep everything as a reply or in the thread, making for a much more enjoyable experience as a user of the system.

The command flow works for all the original slash commands (/fetch, /save, /summarise etc.), but if it’s not something we’ve built in, we can also just push the request onwards into Gemini Flash. For example, maybe we have a string to decode; we can do it right in the chat:

A completely unexpected intent.

Final Thoughts 

When we set about moving the commands from the classic slash commands to the intent engine, we had no idea how it was going to go. As we have written extensively in the past, LLMs are great at many things, and not so great at others. It turns out, however, that sentiment analysis and understanding the intent of a request are two things they are very good at. This meant that adapting our code to leverage Gemini Flash took minutes, not hours, of effort.

Leveraging the natural language capabilities into Fungate has completely changed the way it feels to use our assistant. The resulting chat space or DM no longer looks out of place with slash commands everywhere and broken flows. 
It’s also worth noting that the overall code implementation is very small. We take a small sample of the user request, send it through our standard Gemini call, and then read a basic JavaScript map of known intents and associated functions. That’s it. Given the size of the prompt and the current cost of Gemini Flash, we don’t even need to think about pricing with the current workloads.
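For a sense of scale, a minimal sketch of that map and dispatch might look like the following. The handler names and the sampleContent() helper are illustrative assumptions, not Fungate’s actual code.

```javascript
// Illustrative sketch of the intent map and dispatch; handler names and
// sampleContent() are assumptions, not the production implementation.
const INTENT_HANDLERS = {
  save: handleSave,           // stash a link or item into SporeCore
  fetch: handleFetch,         // pull an item back out of the database
  summarise: handleSummarise, // summarise a link or quoted message
};

async function routeRequest(message) {
  // Small sample of the user request to keep token use down.
  const sample = sampleContent(message);
  const { intent, argument } = await getGeminiIntent(sample);

  // Known intent: fire the request off to its handler.
  // Unknown intent: push it onwards to the Gemini Flash passthrough.
  const handler = INTENT_HANDLERS[intent] || handleGeminiPassthrough;
  return handler(message, argument);
}
```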

