Prompt Engineering In Production - The Structured Approach

Large Language Models (LLMs) like GPT-4 have transformed the way we interact with AI, but achieving reliable and consistent results in real-world applications requires a more structured approach to prompting, one that keeps operational variability to a minimum. This guide is about prompt "engineering", namely the systematic and programmatic use of prompts. To learn how to improve the reasoning of LLMs through the use of language, see our In-Context Learning (ICL) guide.

Define a Demonstration Set

A demonstration set plays a pivotal role in prompt engineering, as it serves as a reference point for designing, testing, and evaluating prompt candidates. Comprising carefully crafted input-output pairs, the demonstration set allows us to gauge the effectiveness of each prompt candidate in generating the desired response from an LLM. These pairs encompass the expected input, which is the data we want the LLM to process, and the expected output, which is the accurate and relevant response we aim to obtain.

The demonstration set serves three primary functions:

  1. Measuring the accuracy of our prompt: By comparing the LLM's generated output to the expected output in the demonstration set, we can quantitatively assess the accuracy of each prompt candidate. This helps us identify the most effective prompts and refine them further to improve their performance.
  2. Specifying the expected shape of prompt inputs and outputs: The demonstration set also provides a clear outline of what the inputs and outputs should look like, ensuring that the prompts are well-suited to the task at hand.
  3. Providing exemplars for few-shot prompting if necessary: In cases where few-shot prompting is employed, the demonstration set can offer a subset of examples to be included with the prompt. This additional context enables the LLM to better understand the problem and generate more accurate responses.

For instance, if we're developing a prompt to convert temperatures from Celsius to Fahrenheit, our demonstration set might include input-output pairs like ("30°C", "86°F") and ("100°C", "212°F"). By evaluating the performance of various prompt candidates against this demonstration set, we can ensure that our final prompt consistently and accurately converts temperatures, ultimately enhancing the practical utility of the LLM.
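To make this concrete, here is a minimal sketch in Python of a demonstration set and a scoring helper. The `call_llm` argument is a hypothetical stand-in for whatever client you use to query your model, and the exact-match check is only suitable for tasks with a single correct answer, such as this conversion task.

```python
# A tiny demonstration set for Celsius-to-Fahrenheit conversion:
# each pair holds the expected input and the expected output.
demonstration_set = [
    ("30°C", "86°F"),
    ("100°C", "212°F"),
    ("0°C", "32°F"),
]

def evaluate_prompt(prompt_template, demos, call_llm):
    """Return the fraction of demonstration pairs the prompt answers correctly.

    `call_llm` is a placeholder for a function that sends a prompt string to
    your LLM of choice and returns the generated text.
    """
    correct = 0
    for expected_input, expected_output in demos:
        prompt = prompt_template.format(expected_input)
        answer = call_llm(prompt)
        if expected_output in answer:
            correct += 1
    return correct / len(demos)

# e.g. accuracy = evaluate_prompt("Convert {} to Fahrenheit.", demonstration_set, my_llm_client)
```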

Create Prompt Candidates

To create prompt candidates, we first need to identify the specific behavior or output we want the LLM to produce. Then, we generate a diverse set of potential prompts, each designed to evoke the target response in a slightly different way. This variety is crucial, as it allows us to test and iterate on the most effective prompts, increasing the likelihood of obtaining accurate and reliable results from the LLM.

Let's say we are aiming to build a prompt that can provide us with a summary of a movie plot. We might create the following prompt candidates:

  1. "Summarize the plot of the movie '{}'."
  2. "In a few sentences, describe the story of '{}'."
  3. "What is the storyline of the film '{}'?"

By generating these distinct, yet related prompt candidates, we set the stage for prompt testing. This step enables us to compare and analyze the performance of each candidate, ultimately helping us select the one that most effectively elicits accurate and concise summaries from the LLM. As we experiment with various prompts, we may also uncover patterns and insights that inform the creation of even better prompts, further refining our approach and increasing the overall reliability of the LLM's output.
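As a sketch of how this selection step could look, the candidates can be kept as a list of templates and scored with the `evaluate_prompt` helper from the previous example. The demonstration pairs here are illustrative placeholders, and for free-form outputs like summaries you would swap the exact-match check for a similarity or rubric-based score; `call_llm` is again a hypothetical model client.

```python
# Candidate templates for the movie-summary task; "{}" is filled with the movie title.
prompt_candidates = [
    "Summarize the plot of the movie '{}'.",
    "In a few sentences, describe the story of '{}'.",
    "What is the storyline of the film '{}'?",
]

# Illustrative placeholder pairs; in practice these are real titles and reference summaries.
movie_demos = [
    ("Movie A", "Reference summary of Movie A."),
    ("Movie B", "Reference summary of Movie B."),
]

def pick_best_prompt(candidates, demos, call_llm):
    """Score every candidate against the demonstration set and return the best one."""
    scores = {c: evaluate_prompt(c, demos, call_llm) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# e.g. best_prompt, scores = pick_best_prompt(prompt_candidates, movie_demos, my_llm_client)
```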

Use Advanced Prompting Techniques

Advanced prompting techniques offer powerful tools for extracting diverse and nuanced responses from LLMs, enabling more granular control over the LLM's behavior and focus. By employing these techniques, we can fine-tune the LLM's output to better align with our specific needs and requirements.

Prompt Alternating

Prompt alternating involves switching between two user-given prompts during output generation. For example, if we want to elicit opinions on two different topics, such as cats and dogs, we could alternate between the prompts "Tell me something interesting about cats" and "Tell me something interesting about dogs". This alternation keeps the responses diverse and prevents the LLM from becoming fixated on a single topic. Prompt alternating is widely used for synthetic dataset generation.

Python 3 Example for Prompt Alternating
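A minimal sketch of the idea, assuming a hypothetical `call_llm` function that wraps your model client. Since hosted APIs generally do not expose token-level control mid-generation, this version simply cycles through the prompts across successive calls:

```python
from itertools import cycle, islice

prompts = [
    "Tell me something interesting about cats.",
    "Tell me something interesting about dogs.",
]

def alternate_prompts(prompts, rounds, call_llm):
    """Cycle through the prompts for the given number of rounds,
    collecting one generation per round to keep the outputs diverse."""
    outputs = []
    for prompt in islice(cycle(prompts), rounds):
        outputs.append(call_llm(prompt))
    return outputs

# e.g. six rounds yields three cat facts and three dog facts, interleaved:
# facts = alternate_prompts(prompts, rounds=6, call_llm=my_llm_client)
```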

Prompt Editing

With prompt editing, we can dictate how many tokens the LLM generates before switching to another prompt. For instance, if we want a brief comparison of two movies, we could set the token limit to 10 and use prompts like "Compare the plot of Movie A to Movie B" and "Discuss the acting in Movie A and Movie B". This yields diverse output that touches on multiple aspects of the comparison. Of course, choosing a small token limit produces extremely short outputs, which can then be further used for ICL.

Python3 Example for Prompt Editing
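A rough sketch of this behaviour, assuming your client exposes a `max_tokens` parameter (as most completion APIs do) and using the same hypothetical `call_llm` wrapper as before:

```python
prompts = [
    "Compare the plot of Movie A to Movie B.",
    "Discuss the acting in Movie A and Movie B.",
]

def prompt_editing(prompts, token_limit, call_llm):
    """Generate at most `token_limit` tokens per prompt before switching
    to the next one, returning one short snippet per prompt."""
    return [call_llm(prompt, max_tokens=token_limit) for prompt in prompts]

# e.g. two roughly ten-token snippets, one per aspect of the comparison:
# snippets = prompt_editing(prompts, token_limit=10, call_llm=my_llm_client)
```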

Dynamic Prompting

Dynamic prompting involves adapting and adjusting prompts in real time based on the user's input or other contextual information, which enables the LLM to generate more relevant and targeted responses. This works well in part because LLMs are very good at classification tasks, so the routing decision can even be delegated to the model itself. For example, we can use conditional statements to select a prewritten prompt based on the user's preference:

Python3 Example for Dynamic Prompts
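A minimal sketch of the conditional-routing idea; the prompt library and the `call_llm` wrapper are illustrative placeholders:

```python
# Prewritten prompt templates keyed by user preference (illustrative).
PROMPT_LIBRARY = {
    "short": "Summarize the plot of '{}' in one sentence.",
    "detailed": "Describe the plot of '{}' in detail, including the main characters.",
    "spoiler_free": "Describe the premise of '{}' without revealing any spoilers.",
}

def build_dynamic_prompt(movie_title, preference):
    """Select the template matching the user's preference,
    falling back to the short variant for unknown preferences."""
    template = PROMPT_LIBRARY.get(preference, PROMPT_LIBRARY["short"])
    return template.format(movie_title)

prompt = build_dynamic_prompt("Movie A", preference="spoiler_free")
# response = call_llm(prompt)  # hypothetical model call, as before
```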

Pre- and Postprocessing

Preprocessing and postprocessing are essential components of deploying LLMs effectively in production environments. These stages substantially improve the quality of the LLM's input and output, ensuring that the generated content is both relevant and contextually appropriate, and ultimately lead to more reliable and accurate AI systems.

Preprocessing

Preprocessing, such as chunking, involves preparing and transforming the raw input data into a format that the LLM can understand and process efficiently. This may include cleaning the data, tokenizing the text, or converting it into suitable embeddings. In the example below, we preprocess user input by removing special characters with regular expressions.

Python3 Example for Preprocessing
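A minimal sketch using Python's standard `re` module; the allow-list of characters is an illustrative choice and should be adapted to your domain:

```python
import re

def preprocess(user_input: str) -> str:
    """Remove special characters via a simple allow-list and collapse
    repeated whitespace before the text is inserted into a prompt."""
    cleaned = re.sub(r"[^A-Za-z0-9.,!?'\s-]", "", user_input)
    return re.sub(r"\s+", " ", cleaned).strip()

print(preprocess("Tell me about cats 🐱🐱 <<right now>>"))
# -> "Tell me about cats right now"
```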

Postprocessing

Postprocessing encompasses refining the LLM's output to meet specific requirements, such as filtering out sensitive information, formatting the text, or validating the response for accuracy and consistency. Here, we postprocess the LLM's output to remove any excessive whitespace.

Python3 Example for Postprocessing
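Again a minimal sketch with the standard `re` module, collapsing runs of spaces and blank lines in the generated text:

```python
import re

def postprocess(llm_output: str) -> str:
    """Collapse runs of spaces and tabs, limit consecutive blank lines to one,
    and trim leading and trailing whitespace from the model's output."""
    cleaned = re.sub(r"[ \t]+", " ", llm_output)
    cleaned = re.sub(r"\n\s*\n+", "\n\n", cleaned)
    return cleaned.strip()

raw_output = "  The film follows   a hacker\n\n\nwho questions his reality.  "
print(postprocess(raw_output))
```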

Sparring Time with Opsie!

Opsie is our (imaginary) external audit & consulting sparring partner who answers all the naïve and uncomfortable questions. Let’s spar!

Q: Is there a risk of overfitting our LLM to specific prompt candidates, limiting its broader applicability or ability to handle less predictable queries?

Opsie: There is a potential risk of overfitting the LLM to specific prompt candidates. It's like studying for a test by memorizing the answers to previous exams; if a new question arises, the student (or model, in this case) may not perform well. This can be mitigated by using diverse and comprehensive training data and periodically validating the model's generalization capabilities.

Q: Advanced prompting techniques could be seen as forcing the model to behave in a certain way. Is there a danger that we might lose some of the richness and novelty that LLMs can bring by thinking outside the box?

Opsie: It's a delicate balance to maintain. Over-restricting the LLM could indeed stifle its creativity or the 'thinking outside the box' capability. Some direction is needed to ensure that the LLM's outputs align with the user's intent.

Q: Prompt Alternating: Does this technique lead to a loss of depth in the model's responses since it keeps switching between topics?

Opsie: This technique could potentially lead to a loss of depth if not managed appropriately. This could be mitigated by considering the context and designing the sequence of alternating prompts such that the continuity and depth are preserved.

Q: Prompt Editing: By controlling the length of the response to each prompt, aren't we risking that some topics might not be covered as comprehensively as they should be?

Opsie: The risk of not covering some topics comprehensively due to controlling the length of responses is a real one. This could be addressed by iterating on the prompts, based on the initial responses, to dive deeper into the required areas.

Q: Prompt Fusing: If we fuse several prompts into a single one, couldn't that confuse the model?

Opsie: This could lead to confusion if not done carefully. The fused prompts should be semantically coherent, and the resulting responses should be tested for understandability and relevance.

Q: Dynamic Prompts: Might these potentially lead to unpredictability in the model's responses, making it harder for users to understand what to expect?

Opsie: This can lead to unpredictability in the model's responses, but it also offers the opportunity for more creative and context-aware responses. User feedback and rigorous testing are essential to manage this unpredictability.

Q: With Preprocessing and Postprocessing: Is there a risk of losing important information during these processes? How can we make sure that the model doesn't omit critical nuances or context?

Opsie: There could indeed be a risk of losing important information during these processes. It's important to design these processes in such a way that they preserve the key information and context. This again involves rigorous testing and iteration.

Q: How scalable is this approach? Can we maintain these levels of prompt engineering as the model size and user base grow?

Opsie: The scalability of the approach is indeed a valid concern. As the model size and user base grow, the prompt engineering effort could potentially grow as well. This can be managed by using automated methods for prompt generation and evaluation, and also by leveraging community contributions for creating and refining prompts.

Opsie: Well, I have to jump now, lads. I have dinner with a few chickens I met in law school ages ago.

So How To Do Prompt Engineering In Production?

The process of prompt engineering involves two critical steps: defining a demonstration set and creating prompt candidates. A demonstration set, composed of input-output pairs, serves as a reference point to measure the accuracy of prompts, specify the expected shape of inputs and outputs, and provide exemplars for few-shot prompting when necessary. Prompt candidates are potential prompts generated to elicit the desired behavior or output from an LLM. For production environments, preprocessing and postprocessing play a vital role in the ongoing maintenance and performance of your LLMs.

As user demands evolve and new data becomes available, these components help maintain the accuracy, flexibility, and relevance of your LLM setups, ensuring that they continue to meet the changing needs of your users.

Start Your Project Today

If this work is of interest to you, then we’d love to talk to you. Please get in touch with our experts and we can chat about how we can help you.

Send us a message and we’ll get right back to you.
