
Engineering with AI 2: Prompt Engineering

By Eric Koyanagi

Using the OpenAI API to create a basic assistant is scary-easy, but what's all this "prompt engineering" people keep talking about? Let's understand how we can optimize our prompts based on OpenAI's guide.

How much detail is too much...?

OpenAI isn't like Google, where terse, keyword-style queries often work just as well. If you want a lot of detail in a response, you have to ask for it explicitly. "Summarize this message" obviously isn't as good as "Summarize the transcript in two paragraphs". But we can get more specific than that: "You read customer messages. Provide a one sentence summary of the customer's mood, then a detailed summary of at least one paragraph describing their message. Focus this summary on any issues the customer is having."

This is very much like natural language programming; you have to coerce the transformer model into your desired output, so you need to be specific about what you want. You can also be a bit creative in how you describe something like length: using a specific number of words, a number of bullet points, and phrases like "at least" can help shape your output.
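To make that concrete, here's roughly what a prompt like that looks like in code. This is a minimal sketch assuming the official openai Python package (the v1 client) and an API key in the OPENAI_API_KEY environment variable; the model name is just a placeholder you'd swap for whatever fits your budget.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You read customer messages. Provide a one sentence summary of the "
    "customer's mood, then a detailed summary of at least one paragraph "
    "describing their message. Focus this summary on any issues the "
    "customer is having."
)

def summarize(customer_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0.2,      # keep summaries consistent from run to run
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": customer_message},
        ],
    )
    return response.choices[0].message.content

print(summarize("My order arrived broken and support hasn't replied in a week."))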

Another facet of "detail" is asking the AI to pretend. It's already been shown that LLMs might be better at math when pretending to be Star Trek characters (other than Tom Paris I assume), and...no one really knows why. We do know it's a powerful shortcut because LLM's are natural language specialists. When you tell it to "act like something", it gives it shorthand context on the sort of style it should adopt. Based on how the transformer model works, we know that the model can pay more attention to some words than other. So if you ask the AI to pretend to be an engineer, it's more likely to pull from the vector cloud of engineering-related words.

Another facet of detail is the use of delimiters. GPT can understand what's intended when we use delimiters and placeholders. Again, it's a natural language specialist, so it can do this with some flexibility. The examples listed in the docs do a great job explaining this, and I don't want to just regurgitate what I think is a pretty simple concept. Delimiters also let you shape the output, which is useful if you're feeding that output back into some other system or process.

You can also "delimit" specific steps you want to take. For example:

Step 1: provide a one sentence summary of the customer's mood.
Step 2: make a detailed summary of at least 3 bullet points describing their message. Focus this summary on any issues the customer is having.
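Here's a rough sketch of how those steps plus a delimiter come together, reusing the same client setup as above. The triple quotes tell the model exactly where the customer's message begins and ends, so instructions and data can't bleed into each other.

STEPPED_PROMPT = (
    "You will be given a customer message delimited by triple quotes.\n"
    "Step 1: provide a one sentence summary of the customer's mood.\n"
    "Step 2: make a detailed summary of at least 3 bullet points describing "
    "their message. Focus this summary on any issues the customer is having."
)

def summarize_in_steps(customer_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": STEPPED_PROMPT},
            {"role": "user", "content": f'"""{customer_message}"""'},
        ],
    )
    return response.choices[0].message.content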

Taken together, there's a lot of granularity in how specific you can be in asking for a result...you're only constrained by your creativity and patience (and in tweaking the parameters mentioned in the last article).

Embeddings

Let's say you have a very specific use case that involves data outside OpenAI's knowledge. In other words, it's data so fresh or so specific that OpenAI hasn't had a chance to dig its claws into it just yet. Training a model at this scale is such an immense undertaking (it can cost hundreds of millions of dollars) that there's no way it can always have fresh data. Embeddings are vector representations of words clustered based on their relatedness. We've talked about the idea of the vector database in other articles, so we won't review more than that!

Not only does the API allow us to obtain embeddings, we can use them to pull the most relevant text into our prompts, giving the model data that's outside its training. For example, maybe you're making a chat bot that you want to load up with internal company documents so that coworkers can ask the bot questions. You can also run analysis on the vectors OpenAI returns using classic algorithms or machine learning. It would be fun to imagine a way to visualize these in something like Unity, with a dimensional slider that peels back each layer...but maybe that's a project for another time.
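Here's a rough sketch of what that could look like: fetch embeddings for a handful of internal documents, embed the question, and pull the closest match into the prompt. It assumes the openai v1 client plus numpy, and "text-embedding-3-small" as the embedding model; the documents and question are invented for illustration.

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

documents = [
    "Expense reports are due by the 5th of each month.",
    "Update the VPN client before connecting to the office network remotely.",
]
question = "When do I need to submit my expenses?"

doc_vectors = embed(documents)
question_vector = embed([question])[0]

# Cosine similarity: higher means more related.
scores = doc_vectors @ question_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(question_vector)
)
context = documents[int(scores.argmax())]

# The best-matching document becomes extra context in the chat prompt.
print(context)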

Prompt Chaining

It's one thing if you have a simple one-off query with a single response, but sometimes you need interactive layers. By chaining prompts and coercing output into some conventional format like JSON, we can turn a conversation into a sort of state machine. When the state changes, we can feed that state into a new prompt that guides the user toward more detailed information. See the example in their playground for a more detailed view. It also breaks the prompt into steps and uses delimiters to improve clarity. What's impressive is that the bot can actually help beyond the explicit options listed there. For example, when it asks for the model number and the user replies that they don't know, it might prompt them to check the back of the device. That's the magic of the LLM in action...and, to be fair, also part of the risk.

That's why we are careful to include a caveat: if the user starts asking about anything else, don't answer and end the chat.
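Here's a bare-bones sketch of one link in such a chain: the first prompt classifies the message into a JSON "state", and that state picks the next, more specific prompt. The intents, follow-up prompts, and model name are all placeholders, and a real version would need to handle the model occasionally returning something that isn't valid JSON.

import json
from openai import OpenAI

client = OpenAI()

TRIAGE_PROMPT = (
    'You are a tech support assistant. Reply with JSON only, in the form '
    '{"intent": "billing" | "hardware" | "other"}. If the user asks about '
    'anything else, use "other".'
)

FOLLOW_UPS = {
    "billing": "Ask the user for their invoice number, then summarize their billing issue.",
    "hardware": "Ask the user for the model number printed on the back of the device.",
    "other": "Politely say you can only help with billing or hardware questions, then end the chat.",
}

def ask(system: str, user: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content

user_message = "My router keeps rebooting itself."
state = json.loads(ask(TRIAGE_PROMPT, user_message)).get("intent", "other")
print(ask(FOLLOW_UPS.get(state, FOLLOW_UPS["other"]), user_message))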

As this becomes more complex, it starts to resemble more traditional application design...and it's well-suited to no-code interfaces. I believe native tools will eventually materialize that let prompt chains be visualized like a behavior tree. This article explores how these tools might look, referencing work done by Google and the University of Washington (here).

Forcing Robots to "Think"

When dealing with math especially, the LLM might fail on reasonably simple things. It's a language model, not a math model, right? Asking it to be explicit about its steps (sort of like showing your work) can help, coercing it into 'thinking' about the problem before just spitting out an answer. For example, maybe you need the AI to check a pricing calculation. Instead of just asking it if the data is correct, break it into two steps: first ask it to compute the proper value itself, then have it compare that value to the provided answer to check for correctness.

You can also accomplish this with prompt chaining. In this case, you'd ask the AI to compute the correct value first. With this prompt, it can't be biased by the supplied answer since we don't provide it. We can then follow up with a second prompt to check for correctness. "Correctness" might be objective or it might be subjective; that's the power of LLMs. You can define the criteria in natural language, and the LLM has a better chance with subjective interpretations than hard math, anyway.
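A sketch of that chained version might look like this. The pricing problem, the customer's figure, and the prompts are invented for illustration; the ask() helper follows the same pattern as the chaining example above.

from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
    )
    return response.choices[0].message.content

problem = (
    "Seat licenses cost $12 per month, the customer has 40 seats, "
    "and they get a 10% discount for paying annually."
)
customer_total = "$5,184 per year"

# Step 1: the model works out its own total. The customer's figure is
# deliberately left out of this prompt so it can't bias the calculation.
model_total = ask(
    "Work through the pricing problem step by step, then state the final yearly total.",
    problem,
)

# Step 2: only now reveal the customer's figure and ask for a verdict.
print(ask(
    "Compare the total you are given with the customer's figure. Reply with "
    "'correct' or 'incorrect' plus a one sentence explanation.",
    f"Computed total: {model_total}\nCustomer's figure: {customer_total}",
))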

Speaking of hard math, there is one other technique for computations that can improve accuracy. You can simply declare in the system role that the AI can write and run Python code by wrapping it in a delimiter, and that it should use this to perform calculations. GPT will happily comply (in theory) if you have a use case that really does need mathematics. This makes a sort of sense. The LLM is a language specialist, not a math person. But code is language, so it's much happier writing code that does the math than doing the math itself.
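Here's a sketch of that trick: the system role says the model can wrap Python in triple backticks, and our own code pulls that block out and runs it rather than trusting the model's mental arithmetic. The model name, the exact wording, and the regex are assumptions, and exec() on model-generated code should only ever happen in a sandbox.

import re
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You can write and run Python code by enclosing it in triple backticks. "
    "Use this to perform any calculations instead of doing arithmetic yourself, "
    "and print the final result."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What does a $2,500 deposit grow to after 7 years at 4.3% compounded yearly?"},
    ],
)
answer = response.choices[0].message.content

# Pull out the first fenced code block and execute it ourselves.
match = re.search(r"```(?:python)?\s*(.*?)```", answer, re.DOTALL)
if match:
    exec(match.group(1))  # sandbox this in anything beyond a toy script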

Honestly, I can relate with that.

Conclusion and Next Steps

There's more to explore with embeddings and prompt chaining. Before we dig deeper, let's build a real app using everything we've learned so far, with a few sprinkles of conventional software engineering.

Written By
Eric Koyanagi

I've been a software engineer for over 15 years, working in both startups and established companies in a range of industries from manufacturing to adtech to e-commerce. Although I love making software, I also enjoy playing video games (especially with my husband) and writing articles.
