This Small Corner

Eric Koyanagi's tech blog. At least it's free!

Engineering with AI 1: Creating SOWs with AI

By Eric Koyanagi

In traditional software engineering, humans act as translators, mediums between abstract business logic and physical electrons moving around on a computer. Our job is ultimately to send instructions to a CPU -- it's a (generally) deterministic world where code affects a system in predictable ways based on how well an engineer translates those requirements into code.

The idea of deep learning specifically is to eliminate this translation layer by creating synthetic brains conditioned to respond to inputs in desirable ways. It isn't just about how AI tools "automate" coding, it's the core idea that deep learning offers a novel (but not always better) way to "interact" with the compute resources...by essentially conditioning these mini-brains to "understand" language. Because we have no freakin' clue why neural nets even work, all we can do is throw quotes around words like "learn" and "know", no matter what exaggerated claims you might see about how these models "know" things.

Prompt Engineering with OpenAI

There's already a wealth of tools you can use to engineer bespoke AI workflows. SageMaker is an example; as a cloud-based solution, it can tap into other AWS resources like Mechanical Turk and leverage the scale Amazon is famous for. However, the first question to consider in creating custom AI pipelines is a fairly classic one in engineering: do it yourself, or use an API?

Let's remember that "GPT" stands for "generative pre-trained transformer". That "pre-trained" bit is important, because they've already done the (massively) difficult part in training their AI against Internet-scale data. You might need your own AI stack with your own vector databases and bespoke configurations, but for something basic...you can tap into GPT's already-trained model.

What this means is that you can't force GPT to "train" itself on your input like you might expect with a more traditional machine learning or deep learning pipeline. This might seem like a non-starter if you're looking for a specific pipeline, because you'd want the AI to be trained on some very specific set of text or documents.

However, you have to think on a larger scale. GPT has been trained on such a vast realm of text that the entire point is for it to be enough of a generalist to accomplish what you need with the right prompting.

To understand this, we're going to start here and read every bit of documentation published by OpenAI itself. If you're really the impatient type, head straight to the playground here to get a glimpse of what this platform is capable of without writing a line of code.

A Basic Example

The first thing you have to understand is that prompt engineering isn't entirely science. The goal is to "condition" the model's tone and content, shaping a more desirable (and specific) outcome from user prompts. Let's see a simple example in Node.js:

import OpenAI from "openai";


const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});


const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      // The system role "conditions" the model before it ever sees the user's request.
      role: "system",
      content: "You are a developer responsible for generating statements of work (SOWs) for PHP development projects. You always ask for 20% down and 80% after completion. You are friendly and thank the client for the work.",
    },
    {
      // The user role carries the actual request, just like typing into ChatGPT.
      role: "user",
      content: "Create a SOW for 50 hours of WP work focusing on the backend and optimizations",
    },
  ],
  temperature: 1,
  max_tokens: 256,
  top_p: 1,
  frequency_penalty: 0,
  presence_penalty: 0,
});
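
Assuming OPENAI_API_KEY is set in your environment, the generated text comes back on the first choice of the response:

// The assistant's reply is the message content of the first choice.
console.log(response.choices[0].message.content);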

Note that all-important "system" role. This is very different from a simple ChatGPT session, where you only have an input for a query.

The "system" role allows you to "direct" the model into a more specific response. This is where you can define "what you want" at a very high level. The user-provided query will then return a response based on this system role.

In this example, the response I get looks something like this:

Statement of Work


Project Title: WordPress Backend Development and Optimization Project


Project Overview:

This statement of work outlines the scope of work, deliverables, timeline, and payment terms for the WordPress backend development and optimization project.


Scope of Work:

The scope of work includes:

- Backend development to enhance website performance and functionality

- Database optimization for improved speed and efficiency

- Implementation of caching mechanisms to reduce load times

- Security enhancements to protect the website from vulnerabilities

- Code refactoring for better maintainability and scalability

- Integration of monitoring tools for performance tracking

- Documentation on backend changes and optimizations made


Deliverables:

The deliverables for this project include:

- Optimized backend architecture for improved performance

- Enhanced database structure and indexing for faster query execution

- Implemented caching strategies to speed up page loading times

- Enhanced security measures to protect against threats and attacks

- Refactored codebase for easier maintenance and future updates

- Monitoring tools set up for tracking website performance metrics


Timeline:

This project is estimated to be completed within 50 hours of backend development and optimization work. The timeline for each milestone will be as follows:

- Week 1: Backend analysis and planning for optimizations

- Week 2-3: Database optimization and caching implementation

- Week 4-5: Security enhancements and code refactoring

- Week 6: Integration of monitoring tools and final testing


Payment Terms:

Payment for this project will be divided into two milestones:

- 20% upfront payment upon signing of the agreement

- 80% final payment upon completion and approval of the project


Acceptance:

By signing below, the client acknowledges and agrees to the outlined scope of work, deliverables, timeline, and payment terms for the WordPress backend development and optimization project.


Client Signature: ____________________________


Developer Signature: ____________________________


Date: ____________________________


Thank you for choosing us to handle your WordPress backend development and optimization project. We are dedicated to delivering a high-quality solution that enhances the performance and security of your website. Please feel free to contact us if you have any questions or require further clarification. We appreciate the opportunity to work with you on this project.


Okay, this is obviously very boilerplate. However, we can see that the result honors both the payment terms and the directive to "be friendly and thank the client for the work". Asking the model to create a SOW without that little bit omits the last paragraph, which is a bit more polite than the default. Specificity is important!

OpenAI Params: Temperature and More

Intuitively, we already see how critical it is to set the right "system" role. Better prompts will be more specific, too, but we really want to be sure that the "system" role is a solid high level description of what we're trying to achieve. Beyond that, let's dig into the other parameters we can tune (docs):


Temperature determines how volatile the assistant role is (i.e. the answers the bot returns). The hotter we go, the more creative...while cooler temperatures will result in a lot more predictable responses. Sometimes you want more creativity in answers, but in this specific example a lower temperature probably makes more sense. We want to have a level of certainty over what's spit out in documents like a SOW.


Frequency and presence penalties are used to tune how often repetitive tokens appear. The frequency penalty grows with how many times a token has already been used, coercing the model into more diverse language instead of hammering on a word it keeps repeating. The presence penalty is similar, but it's a flat, one-time penalty applied to any token that has appeared at all, nudging the model toward a broader vocabulary overall. This really needs an example to make sense. If an AI were writing dialog, the frequency penalty is concerned with a character saying something like "shit! shit! shit!". That might be natural depending on the context, but a higher frequency penalty discourages a string of the same word repeated so close together. The presence penalty, on the other hand, is concerned with the entire dialog: if a character muttered the word "shit" even once, a higher presence penalty pushes the model toward a different word entirely, creating a broader vocabulary in general. I don't know why my mind immediately went to a vulgar example, but it's actually a good illustration of why these parameters require flexibility...because sometimes language does repeat like this and that's what you want. In a SOW, for example, we can probably tune these a bit lower (especially the presence penalty) because we might need some terms to repeat (like "PHP").


logprobs is used for debugging and will return the probabilities of your output tokens and a (short) list of alternate tokens and their probabilities. This can help us understand what's going on behind the scenes and how confident the model is in its choices, which in turn helps us tune further.


top_p is an alternative to temperature -- for example, setting this to "0.1" means that only the top 10% of tokens "with the most probability mass" are considered in output. One important tip the docs note is to adjust either temperature or top_p...but not both.


model is simply the model to use -- this example uses gpt-3.5-turbo. A request that puts several of these parameters together is sketched below.
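
Here's a minimal sketch of a tuned SOW request, reusing the openai client from the basic example. The specific values are just starting points for this use case (not recommendations from the docs), and the logprobs options assume you're on a model that supports returning them:

const tunedResponse = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      role: "system",
      content: "You are a developer responsible for generating statements of work (SOWs) for PHP development projects. You always ask for 20% down and 80% after completion. You are friendly and thank the client for the work.",
    },
    {
      role: "user",
      content: "Create a SOW for 50 hours of WP work focusing on the backend and optimizations",
    },
  ],
  temperature: 0.3,       // cooler temperature, since we want predictable contract language
  // top_p is left at its default -- adjust temperature or top_p, not both
  frequency_penalty: 0.2, // gently discourage immediate repeats of the same token
  presence_penalty: 0,    // terms like "PHP" are allowed to repeat freely
  max_tokens: 512,
  logprobs: true,         // return per-token probabilities for debugging
  top_logprobs: 2,        // ...along with the top alternate tokens for each position
});

// Each output token comes back with its log probability and alternates.
console.log(tunedResponse.choices[0].logprobs.content[0]);

If the alternates show the model wavering on key terms (say, the payment split), that's a hint the "system" prompt needs to be more specific.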

Next Steps

First, we can play with different models -- GPT-4 is vastly more powerful than 3.5, but do we really need it for something relatively simple? The more detailed we make the "system" role, the better GPT-4 will do compared to 3.5. Once you finally get this tuned to a reasonably happy place, you can integrate it into an actual application. You might throw some Node.js code into a Lambda or run it with a more traditional API infrastructure.
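
If you go the Lambda route, the handler can be a thin wrapper around the same call. Here's a rough sketch, assuming an API Gateway-style event and an ESM (.mjs) handler -- the event shape and names are placeholders, not a prescribed setup:

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const SYSTEM_PROMPT =
  "You are a developer responsible for generating statements of work (SOWs) " +
  "for PHP development projects. You always ask for 20% down and 80% after " +
  "completion. You are friendly and thank the client for the work.";

// Hypothetical Lambda handler: the request body carries the user's SOW prompt.
export const handler = async (event) => {
  const { request } = JSON.parse(event.body);

  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0.3,
    max_tokens: 512,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: request },
    ],
  });

  return {
    statusCode: 200,
    body: JSON.stringify({ sow: response.choices[0].message.content }),
  };
};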

In this example, you'd probably want to format the output as a PDF, perhaps injecting the body into some template so it has a bit more style and branding. You'd also want to save the output to a classic database or memory store...unless you really trust GPT, you want a log of output you can review, because the model can always benefit from more tuning. That could mean changing params or tweaking the prompt, so you probably want to save the "system" prompt used and think about how to version it properly. This way, you can easily review historic results and compare them against different system prompts.
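
Here's a rough sketch of the kind of record worth keeping, assuming the response object from the earlier example is in scope. The field names and the flat file are placeholders for whatever database or memory store you actually use:

import { appendFile } from "node:fs/promises";

// Pair every generated SOW with a version label for the exact system prompt
// and params that produced it, so historic output can be compared across changes.
const record = {
  promptVersion: "sow-system-v1",   // bump whenever the system prompt or params change
  params: { model: "gpt-3.5-turbo", temperature: 1, max_tokens: 256 },
  output: response.choices[0].message.content,
  createdAt: new Date().toISOString(),
};

// A JSON-lines file stands in for a real database or memory store here.
await appendFile("sow-log.jsonl", JSON.stringify(record) + "\n");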

Other key next steps might be experimenting with "prompt chaining", fine-tuning, and function calling...all concepts that will be covered in the next article.



Written By
Eric Koyanagi

I've been a software engineer for over 15 years, working in both startups and established companies in a range of industries from manufacturing to adtech to e-commerce. Although I love making software, I also enjoy playing video games (especially with my husband) and writing articles.
