Improving LLM’s Reasoning In Production - The Structured Approach

This guide on achieving better reasoning performance of LLMs complements our guide on prompt engineering which explains how to improve LLM setups by systematic and programmatic means to enable flexibility while keeping the amount of operational variability to a minimum. Of course, the language component of a prompt can also be inspected and improved which is called In-context learning (ICL). There are a lot of papers on how to achieve better LLM reasoning performance through this technique that is all about the choice of words and their sequence in a prompt. For anyone interested, the known NLP-technique Retrieval-Augmented Generation (RAG) is a sub technique within the concept of ICL. ReAct and ReWOO are usually always combined with an API to facilitate dynamic context exploitation for deeper user information needs.

Chain of Thought (CoT) Prompting (Zero shot and Few shot Examples)

Chain of Thought Zero Shot Example Prompt from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac2efa338e7d6eb96ab22_Untitled (6).png

Chain of Thought Few Shot Example Prompt from the paper:

Zero-shot CoT uses special instructions to trigger CoT prompting without any examples. For example, the instruction “Let’s think step by step” works well for many tasks. The final answer is extracted from the last step of the thinking process.

Few-shot CoT uses some examples of questions and answers with reasoning chains to guide LLMs to use CoT prompting. The examples are given before the actual question. The final answer is also extracted from the last step of the thinking process.

In a nutshell, CoT prompting improves the performance of LLMs on many tasks, especially math and logic problems. However, it does not work for all tasks and sometimes gives wrong or multiple answers.

Reflexion Prompting (Reflect on x first and then reply)

Reflexion Prompt Example from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac29da4b01c81f001a769_Untitled (1).png

Reflexion works by making the agents verbally reflect on the feedback they receive from the tasks, and use their own reflections to improve their performance. Reflexion does not require changing the model weights, and can handle different kinds of feedback, such as numbers or words. It outperforms baseline agents and is especially effective for tasks such as making decisions, writing code. Because the LLM will flag the output as “reflected”, eg. giving it a CoT-style wrong answer plus a REFLECTED correction input. The model will thus be on the lookout for minor errors and answer with a REFLECTED output when doing other reasoning tasks. It is hence obviously not very useful in settings for use cases where being perceived as human is important for the success of the product.

Directional Stimulus Prompt (Do x while taking y related to x into account)‍

Directional Stimulus Prompt Example from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac2aa9cd8d1e54d9fc174_Untitled (2).png

Directional stimulus prompts act as instance-specific additional inputs to guide LLMs in generating desired outcomes, such as including specific keywords in a generated summary. It’s very simple and straightforward. Like Prompt Alternating, you are most likely already using this technique in your setups already!

Tree of Thoughts (ToT) Prompting (Multiple experts vote on the right answer)

Tree of Thoughts Prompt Example from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac3adf60de0f185288b7b_Untitled (8).png

Tree of Thoughts technique is effective for solving logic problems, such as finding the most likely location of a lost watch.It asks the LLM to imagine three different experts who are trying to answer a question based on a short story. The LLM has to generate the steps of thinking for each expert, as well as their critiques and likelihoods of their assertions and it also has to follow the rules of science and physics, and backtrack or correct itself if it finds a flaw in its logic. ToT is fun for riddles and very specific problems. It’s quite an academic technique and only useful for very specific and conversational AI use cases.

Reasoning And Acting (ReAct) Prompting (Combining reasoning step with possibility to act)

ReAct Prompt Example from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac35a5bf4d19bcbe68207_Untitled (7).png

ReAct prompts consist of four components: a primary prompt instruction, ReAct steps, reasoning thoughts, and action commands. Setting up relevant sources of knowledge and APIs that can help the LLM perform the actions is vital. We do not recommend using ReAct in high-stakes environments.

Reasoning WithOut Observation (ReWOO) Prompting (Experts with clearly separated roles, eg. Planner and Solver)

Reasoning WithOut Observation Prompt Example from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac32d5d6af89f75fe18ad_Untitled (4).png

ReWOO performs better than ReAct despite not relying on current and previous observations. ReAct suffers from tool failures, action loops, and lengthy prompts, while ReWOO can generate reasonable plans but sometimes has incorrect expectations or wrong conclusions. Improving the tool responses and the Solver prompt to enhance the reasoning performance is vital to have good ReWOO performance.

CoH is easy to optimize and does not rely on reinforcement learning or reward functions, unlike previous methods.

Chain of Density (CoD) Prompting (Summarizing recursively and keeping equal word length)

Chain of Density Prompt Example from the paper:

https://cdn.prod.website-files.com/63811f0b654325b008a7c1dc/650ac3413c2b2cc68da6baf5_Untitled (5).png

CoD is a big deal for data mining. We assume that semantic compression is a golden goose to get better and better data from sparse heterogeneous datasets.

Here’s what the prompt instructs the LLM to do:

Identify informative entities from the article that are missing from the previous summary.
Write a new summary of the same length that covers every entity and detail from the previous summary plus the missing entities.
Repeat these two steps five times, making each summary more concise and entity-dense than the previous one.

The purpose of this CoD is to write effective summaries that capture the main points and details of a text recursively, making output more dense every iteration. It uses fusion, compression, and removal of uninformative phrases to make space for additional entities. Of course, the output can be saved in JSON-format for further processing and function calling.

‍

Sparring Time With Opsie!

Opsie is our proprietary internal premise control sparring partner.

How do you prevent the model from overly focusing on the hint and missing the broader context?

Handling edge cases requires incorporating fallback mechanisms that trigger alternative prompts or additional processing layers. Continuous monitoring and retraining with edge case data can help models learn to balance hints with broader context effectively.

What mechanisms would you put in place to detect and address these issues proactively?

Proactive detection and addressing involve implementing monitoring and exception-handling mechanisms. Using watchdog timers, anomaly detection algorithms, and fallback strategies to simpler prompt methods can help. Continuous logging and analysis of tool performance data enable proactive adjustments.

How do you ensure that critical nuances and contextual richness are not lost while creating entity-dense summaries through semantic compression?

Maintaining nuances and context involves using attention mechanisms to weigh key elements more heavily and incorporating redundancy checks where the original context is referenced. This ensures critical information is preserved during compression phases.

How do you measure the trade-off between precision and performance in time-sensitive applications?

Measuring the trade-off between precision and performance involves defining metrics and thresholds. Adaptive iteration approaches, where the depth of summarization is dynamically adjusted based on the context’s complexity and required precision, help maintain efficiency.

How do you seamlessly integrate these prompting techniques with Retrieval-Augmented Generation to leverage their combined strengths?

Seamless integration with RAG involves optimizing the retrieval phase to align with the specific needs of the prompt method being applied. Unified embeddings, context-aware retrieval systems, query expansion, relevance feedback, and dynamic context adjustment enhance retrieval accuracy and relevance.

How To Improve Reasoning For LLMs?

Each approach has different advantages and disadvantages. You need to tinker and work your way up to achieve alignment of your use case and the input/output of your setup. So which technique is best? In our view, CoT, Directional Stimuli and RAG are the best way to build e-commerce chatbots where low latency answers are required. For high-stakes chatbots, for example in legal case research or research and development academic settings, agentic techniques like ReAct and ReWOO are best to get deep context exploitation results to the users.

Each of these techniques offers unique ways to enhance the performance of LLMs, making them more flexible, reliable, and context-aware. They are all based on the principle of prompt engineering, which involves the strategic use of prompts to guide the model's responses pragmatically and with minimal variability.

Let's Work Together Starting Today

If this work is of interest to you, then we’d love to talk to you. Please get in touch with our experts and we can chat about how we can help you get more out of your IT.

Send us a message and we’ll get right back to you. ->

Cloud

What is MIG? Multi-Instance GPU Benefits Explained

Multi-Instance GPU (MIG) is a new technology that allows a physical GPU to be partitioned into separate instances, providing significant benefits for AI deployments and GPU utilization. With MIG, a single GPU can be divided into multiple instances, each with its own high-bandwidth memory, cache, and compute cores. This enables fine-grained GPU provisioning, allowing IT and DevOps teams to allocate the right-sized GPU instance for each workload, optimizing resource utilization and improving performance.

Contract Review & Case Discovery

What is CaseFleet? Digitalization With AI In Case Management

CaseFleet is a tool that merges benefits of artificial intelligence with everyday case management processes. By automating tasks such as document organization, data extraction, and case analysis, CaseFleet improves efficiency and accuracy, saving valuable time and reducing manual work. With its ability to quickly analyze large volumes of data and provide valuable insights, CaseFleet empowers legal professionals to make more informed decisions and streamline their workflows.

Education

How AI is Transforming Employee Onboarding and E-Learning

Organizations are leveraging AI to revolutionize employee onboarding and e-learning. AI introduces innovative solutions that streamline processes and enhance learning experiences.

Education

What is Absorb LMS? E-Learning and Artificial Intelligence - A Perfect Match?

With Absorb LMS, administrators can use natural language to perform tasks and gather information, making LMS administration faster and more efficient. The AI-powered search functionality provides highly relevant search results, while AI-driven search optimization and Absorb Pinpoint transform video lessons into microlearning courses. With AI-powered transcription and search, learners can easily find the information they need, and organizations can gain valuable insights into training gaps and learner engagement. Overall, Absorb LMS and AI are enhancing learning experiences, driving engagement, and simplifying administration tasks.

Marketing

What Is Optimizely? How Marketers Use AI For Automated A/B Testing And Better Business Decision-Making

Optimizely provides a digital experience platform offering A/B and multivariate testing, personalization, and feature toggles, alongside content management and digital commerce. Its AI-powered DXP enhances the digital experience lifecycle with enterprise-ready applications and use cases.

Marketing

What Is MarketMuse? - Artificial Intelligence Use Cases for SEO

Market Muse is an AI-powered content planning and optimization tool that revolutionizes content marketing strategies. By utilizing AI and machine learning, Market Muse analyzes content, suggests topics to cover, and provides data-driven insights for content marketing strategies.

Our Work

We Replaced Four Facebook Ad Managers With OpenAI, Amazon Reviews, and Slack

We built a custom Slack Bot that analyzes Amazon reviews to create targeted FB ads, uses DALL·E for matching images, crunches ad data from Google Sheets, and predicts future ad performance. The total headcount of the creative team was reduced from five to only one in-house creative who controls and monitors the new workflow.

Marketing

Boosting Average Order Value with AI - How Zalando, Amazon, and Stitch Do It

Implementing AI in e-commerce can significantly increase average order value (AOV) by leveraging personalized product recommendations, optimizing pricing strategies, and automating customer support processes. AI-powered chatbots can provide instant assistance and product expertise, guiding customers towards higher-value purchases. AI can also analyze customer data to identify patterns and trends, allowing companies to create targeted marketing campaigns and deliver personalized messaging and offers.

Marketing

What Is Jasper? - AI in Sales and Marketing To Increase Revenue

Jasper AI is an innovative writing assistant that uses artificial intelligence to help users make money online. With its advanced natural language processing and machine learning algorithms, Jasper AI can generate high-quality, original content quickly and efficiently. Whether it's creating blog posts, social media content, or product descriptions, Jasper AI offers a wide range of features and templates to streamline content creation and maximize earning potential. From affiliate marketing to offering writing services, Jasper AI provides users with 24 different ways to generate income online.

Machine Learning