Effectively Harnessing LLMs: Non-Technical Strategies from a Technical Founder
A Step-by-Step Guide to Efficiently Apply the Power of Large Language Models to Accomplish Complex Tasks
Welcome to A Founder’s Life for Me! I’m Alek, and based on my experiences building companies, I provide practical recommendations on how to build your company and career.
Dealing with the shortcomings of AI.
Artificial intelligence (AI) has become a powerful technology over the last two years. Yet, as the founder of an AI company, I’m constantly dealing with its shortcomings.
You’ll leave this article with:
examples of tasks that out-of-the-box large language models (LLMs) struggle with
strategies to improve the results you’re getting from LLMs
Subscribe below to continue learning from my experiences as a founder through weekly 10-minute reads.
LLMs struggle with complex deductive reasoning.
My software company, SolidlyAI, analyzes client meetings to help B2B teams drive more revenue and retain their clients. Solidly needs to know “who said what” in every client meeting to perform its analysis. I’ll spare you some of the details, but one task left to the LLM is to identify the speakers from a call transcript like this:
Speaker A: “Hey Ginny, how’s it going?”
Speaker B: “Howdy.”
Speaker C: “Hey Fred. Not bad. How are you doing, George?”
Speaker B: “Things are good here. Not too hot outside today.”
[conversation continues]
To understand “who said what,” the names of Speaker A, Speaker B, and Speaker C need to be identified. This task requires more deductive reasoning than LLMs can easily accomplish today.1
V1: Start simple with your LLM prompts.
I always start simple when working with LLM prompts. I tell the AI what its task is going to be with as few instructions as possible:
AI Instructions:
You are a helpful AI assistant who is an expert in speaker inference. I will send you call transcripts with unidentified speakers (e.g., ‘Speaker A,’ ‘Speaker B,’ ‘Speaker C’). You will respond with your best guess at the names of these speakers.
This provides the LLM with an understanding of the goal and the flexibility to determine how to achieve it; exactly how it figures out who is who is left to the AI’s ‘judgment.’ Sometimes, the simple solution works, and you don’t need to spend more time on it! But, especially with more complex tasks, the simple solution leads to three main challenges:
Accuracy: Because of the complexity of the task, the LLM very often gets the answers wrong.
In the example above, the AI might incorrectly respond with “Speaker A is Ginny.”
Reliability: LLMs aren’t deterministic. You will get different answers if you send the AI the same question multiple times.
In the example above, sometimes the AI will say, “Speaker B is Ginny,” and sometimes the AI will say, “Speaker C is Ginny.”
Explainability: The LLM won’t explain its reasoning.
In the example above, if the AI responds with “Speaker C is Ginny,” we don’t know why it made that decision.
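To make this concrete, here’s roughly what V1 looks like in code. It’s a minimal sketch assuming OpenAI’s Python SDK; the model name is a placeholder, and any LLM API follows the same pattern. Running the same prompt a few times also makes the reliability problem easy to see:

# Minimal sketch of the V1 prompt, assuming OpenAI's Python SDK
# (pip install openai). The model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a helpful AI assistant who is an expert in speaker inference. "
    "I will send you call transcripts with unidentified speakers "
    "(e.g., 'Speaker A,' 'Speaker B,' 'Speaker C'). You will respond with "
    "your best guess at the names of these speakers."
)

# The example transcript from above.
transcript = """Speaker A: "Hey Ginny, how's it going?"
Speaker B: "Howdy."
Speaker C: "Hey Fred. Not bad. How are you doing, George?"
Speaker B: "Things are good here. Not too hot outside today."
"""

# Send the same transcript several times: the answers will often differ
# from run to run, which is the reliability challenge in action.
for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    print(response.choices[0].message.content)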
For Solidly to effectively understand what happens on a call, I needed to solve these challenges.
V2: Ask the LLM for an explanation.
An initial solution to these challenges is to ask the AI to show its work:
AI Instructions:
You are a helpful AI assistant who is an expert in speaker inference. I will send you call transcripts with unidentified speakers (e.g., ‘Speaker A,’ ‘Speaker B,’ ‘Speaker C’). You will respond with your best guess at the names of these speakers. You will support your answer with evidence from the call.
Requiring the AI to explain itself helps solve the explainability challenge, but it only marginally helps solve the accuracy and reliability challenges. I would receive responses like:
1. “Speaker A is Ginny because they introduce themself as Ginny.”
2. “Speaker B is Ginny because they respond after a question is directed at Ginny.”
3. “Speaker C is Ginny because they respond to the question directed at Ginny.”
Answer #3 is correct, but I’d only get that answer 40% of the time. The accuracy and reliability were still falling short. But now I understood the reasons behind the wrong answers.
V3: Set specific rules to fix identified issues.
Now that I understood why the AI was incorrectly identifying the speakers, I could explain the bad behavior back to the AI:
AI Instructions:
You are a helpful AI assistant who is an expert in speaker inference. I will send you call transcripts with unidentified speakers (e.g., ‘Speaker A,’ ‘Speaker B,’ ‘Speaker C’). You will respond with your best guess at the names of these speakers. You will support your answer with evidence from the call. You strictly follow these instructions:
1. When using introductions as evidence, only use first-person sentences (e.g., “I am Ginny.”)
2. When using responses to questions as evidence, you will only infer the name of the speaker if they clearly respond to the original question. As an example, if Speaker A asks “How are you, Ginny?” and Speaker B replies but doesn’t answer the question, you can’t use this as evidence that Speaker B is Ginny.
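Mechanically, each round of this is just appending another numbered rule to the system prompt. A sketch of one way to keep a growing ruleset manageable (the helper and the exact rule strings are illustrative, and the base prompt is abbreviated here):

# Illustrative: keep the ruleset as data so each failure mode observed
# in testing becomes one more numbered rule in the system prompt.
BASE_PROMPT = (
    "You are a helpful AI assistant who is an expert in speaker inference. "
    "...You strictly follow these instructions:"  # abbreviated; full text above
)

RULES = [
    "When using introductions as evidence, only use first-person sentences "
    "(e.g., 'I am Ginny.')",
    "When using responses to questions as evidence, only infer the name of "
    "the speaker if they clearly respond to the original question.",
    # ...each new failure mode found in testing becomes a new rule
]

def build_system_prompt() -> str:
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(RULES, start=1))
    return f"{BASE_PROMPT}\n{numbered}"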
After trial and error on many different call transcripts, my ruleset grew, and so did the accuracy and reliability of the AI’s responses. Speaker identification accuracy increased to 60%.
I still wasn’t fully satisfied with the performance. As transcripts got longer, performance declined; there was often contradictory evidence throughout the call. Ginny might respond to a question directed at Fred, and Fred might role-play as George. People might talk over each other, resulting in a disjointed and hard-to-decipher transcript. Meetings are messy.
V4: Go beyond using only an LLM.
Instead of relying solely on an LLM, I augment it with a separate algorithm to determine the speaker. Rather than asking the LLM to perform the entire task of identifying speakers, I ask it to do the partial task of extracting the evidence:
AI Instructions:
You are a helpful AI assistant who is an expert in conversation analysis. I will send you call transcripts with unidentified speakers (e.g., ‘Speaker A,’ ‘Speaker B,’ ‘Speaker C’). You will analyze the transcript and respond with a list of supporting evidence. You will use the following as evidence:
1. If a speaker introduces themself in the first person (e.g., “I am Ginny”)
2. If a speaker responds to a question directed at a named person
[other rules…]
Here is an example format of your response:
Speaker A is likely Ginny because: [evidence 1, evidence 2, …]
Speaker A is likely Fred because: [evidence 1, evidence 2, …]
Instead of asking the AI to respond with the final answer, I only ask it to identify supporting evidence. From there, I assign a name to a speaker through a deterministic ruleset. The rule could be something like, “Only assign a name to a speaker if you have more than five pieces of evidence.”2
The LLM is more accurate and reliable because I’m asking it to perform a simpler task: returning a summary of the evidence from the call. From there, my ruleset is easy to understand and refine. I have full control over the rules, and my rules will always produce the same results. So, answers can be tuned for accuracy and reliability.
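As a sketch, the deterministic pass can be as simple as counting evidence per candidate. The threshold and data shape below are illustrative (the actual response format is the JSON in the footnote):

# Illustrative deterministic pass over the LLM's evidence output.
# `candidates` maps a candidate name to the evidence the LLM found for it,
# e.g. {"Ginny": ["evidence 1", "evidence 2"], "Fred": ["evidence 1"]}.
MIN_EVIDENCE = 5  # the "more than five pieces of evidence" rule above

def assign_speaker(candidates: dict[str, list[str]]) -> str | None:
    """Return the best-supported name, or None if nothing clears the bar.
    Unlike the LLM, the same input always produces the same output."""
    if not candidates:
        return None
    name, evidence = max(candidates.items(), key=lambda item: len(item[1]))
    return name if len(evidence) > MIN_EVIDENCE else None

print(assign_speaker({"Ginny": ["e1", "e2", "e3", "e4", "e5", "e6"],
                      "Fred": ["e1", "e2"]}))  # -> Ginny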
Start simple and iterate.
We live in a world where AI can automate tasks it never could before, but we need to be mindful of its limits. The best way to do this is to start simple and measure the results you get. From there, iterate.
LLMs and other forms of AI are great at creating structured data out of unstructured data, which is incredibly powerful. However, if you ask an LLM to perform tasks requiring complex deductive reasoning, you’ll struggle to get the results you want. To get an AI to perform tasks more effectively, you can:
Ask it to explain itself
Add rules to reduce the frequency of wrong answers
Simplify the task you’re asking it to achieve
Thanks for reading! Questions or ideas for topics? Email me.
As of April 2024. I’m sure AI will eventually be able to complete this task easily. If you’re not familiar with what I mean by LLMs, try ChatGPT, Perplexity, or Claude.
I am glossing over data structures here for simplicity. I’m actually asking the AI to respond in a JSON format. For each speaker, the AI is responding with a list of potential names:
{
  "speakers": [
    {
      "speaker_a_candidates": [
        { "name": "Ginny", "evidence": ["evidence 1", "evidence 2", "…"] },
        { "name": "Fred", "evidence": ["evidence 1", "evidence 2", "…"] }
      ]
    }
  ]
}