By Sam Lijin - @sxlijin

This post will be interesting to you if:

you're trying to get structured output from an LLM,

you've tried response_format: "json" and function calling and been disappointed by the results,

you're tired of stacking regex on regex on regex to extract JSON from an LLM,

you're trying to figure out what your options are.

Everyone using LLMs in production runs into this problem sooner or later: what we really want is a magical black box that returns JSON in exactly the format we want. Unfortunately, LLMs return English, not JSON, and it turns out that converting English to JSON is kinda hard.

Here's every framework we could find that solves this problem, and how they compare.

(Disclaimer: as a player in this space, we're a little biased!)

Note: we've omitted LangChain from this list because we haven't heard of anyone using it in production - look no further than the top posts of all time on /r/LangChain.

*: Honorable mention to Microsoft's AICI, which is building a shim for cooperative constraints implemented in Python/JS on a WASM runtime. We haven't included it in the list because it's lower-level than the others and setup is very involved.

1: Applying constraints to OpenAI models can be very error-prone, because the OpenAI API does not expose sufficient information about the underlying model operations for the framework to actually apply constraints effectively. See this discussion about limitations from the LMQL documentation.

2: Transformers refers to "HuggingFace Transformers"

3: Constrained streaming generation produces partial objects, but there's no good way of interacting with them, since they are not yet parse-able. We only consider a framework to support streaming if it allows interacting with the partial objects (e.g., when streaming back an object with properties foo and bar, you can access obj.foo before bar has been streamed to the client).

Most of our criteria are pretty self-explanatory, but there are two that we want to call out:

Does it handle/prevent malformed JSON? If so, how?

LLMs make a lot of the same mistakes that humans do when producing JSON (e.g. a } in the wrong place or a missing comma), so it's important that the framework can help you handle these errors.

A lot of frameworks "solve" this by feeding the malformed JSON back into the LLM and asking it to repair the JSON. This kinda works, but it's also slow and expensive. If your LLM calls individually take multiple seconds already, you don't really want to make that even slower!
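To make the cost of that approach concrete, here is a minimal sketch of the repair loop (hypothetical code; `call_llm` stands in for whatever client you use to send a prompt and get text back):

```python
import json
from typing import Callable


def get_json_with_repair(call_llm: Callable[[str], str], prompt: str,
                         max_attempts: int = 3) -> dict:
    """Ask the LLM for JSON; on a parse failure, feed the error back and retry.

    `call_llm` is a hypothetical stand-in for a real LLM client. Note that
    every repair round is another full LLM round-trip: slow and expensive.
    """
    output = call_llm(prompt)
    for _ in range(max_attempts):
        try:
            return json.loads(output)
        except json.JSONDecodeError as e:
            output = call_llm(
                f"The following is invalid JSON ({e.msg}). "
                f"Return only the corrected JSON:\n{output}"
            )
    raise ValueError("model never produced valid JSON")
```

If the first call already takes a few seconds, each repair attempt doubles your latency (and your token bill).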

Two techniques exist for handling or preventing this: actually parse the malformed JSON (the approach BAML takes), or constrain the LLM's token generation so that valid JSON is guaranteed (this is what Outlines, Guidance, and a few others do).

Parsing the malformed JSON is our preferred approach: it most closely aligns with what the LLM was designed to do (emit tokens), it's fast (parsing takes microseconds), and it's flexible (it works with any LLM). It does have limitations: it can't magically make sense of completely nonsensical JSON, after all.
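To give a flavor of error-tolerant parsing (a toy sketch, not BAML's actual parser): a pre-pass can strip common LLM artifacts like markdown fences, surrounding prose, and trailing commas before handing the text to a strict parser.

```python
import json
import re


def lenient_json_loads(text: str) -> dict:
    """Recover a JSON object from typical LLM output.

    A toy sketch of error-tolerant parsing (not BAML's real parser): strips
    markdown code fences and prose around the object, and removes trailing
    commas, before delegating to the strict stdlib parser.
    """
    # Drop markdown code fences like ```json ... ```
    text = re.sub(r"```(?:json)?", "", text)
    # Keep only the outermost {...} span, discarding surrounding prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    text = text[start : end + 1]
    # Remove trailing commas before a closing brace/bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```

A real implementation handles far more failure modes (unquoted keys, truncated output, schema coercion), but the key property is the same: no extra LLM calls.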

Applying constraints to LLM token generation, by contrast, can be robust, but it has its own issues: doing this efficiently requires applying runtime transforms to the model itself, so it only works with self-hosted models (e.g. Llama, Transformers) and does not work with API-only models like OpenAI's ChatGPT or Anthropic's Claude.
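Constrained generation works at the logits level: before each token is sampled, the framework masks out every token that would make the output invalid under a grammar. A minimal sketch of the idea, using a toy vocabulary rather than a real model (frameworks like Outlines derive the allowed set from a regex/grammar and apply the mask inside the model's sampling loop, which is why they need access to the model itself):

```python
def constrained_sample(logits: dict[str, float], allowed: set[str]) -> str:
    """Greedy-pick the highest-scoring token among those the grammar allows.

    Toy sketch of constrained decoding: `logits` maps each vocabulary token
    to its raw score, and `allowed` is the set of tokens that keep the
    output valid at this position.
    """
    masked = {tok: score for tok, score in logits.items() if tok in allowed}
    if not masked:
        raise ValueError("grammar allows no available token")
    return max(masked, key=masked.get)
```

Even though the model might prefer to open with "Sure", the mask forces it to emit a token that starts a valid JSON object.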

Can you see the actual prompt? Do you have full control over the prompt?

You might remember this from "Fuck You, Show Me The Prompt".

Prompts are how we "program" LLMs to give us output.

The best way to get an LLM to return structured data is to craft a prompt designed to return data matching your specific schema. To do that, you need to see the prompt that's actually being sent to the model, and you need to be able to try different prompts.

Most frameworks, unfortunately, bake in hardcoded prompt templates, which prevents you from doing either.
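What "full control" looks like in practice: the entire prompt, schema included, is rendered by code you own and can print before anything is sent. A hypothetical, minimal example (the schema text and prompt wording here are made up for illustration):

```python
# Hypothetical schema rendered as text the model will see verbatim.
SCHEMA = """{
  "name": string,
  "skills": string[]
}"""


def build_prompt(resume_text: str) -> str:
    """Render the exact prompt sent to the model -- no hidden template."""
    return (
        "Parse the following resume.\n"
        f"Resume:\n---\n{resume_text}\n---\n"
        f"Output JSON format (only include these fields):\n{SCHEMA}\n"
        "Output JSON:"
    )


# Inspect before sending; swap in variants to experiment.
print(build_prompt("John Doe ..."))
```

If a framework hides this function from you, you can't debug why the model ignores your schema, and you can't iterate on the wording.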

Example code

For each framework listed above, we've included example code from the framework's documentation showing how you would use it.

From baml-examples/fastapi-starter/fast_api_starter/app.py :

```python
from baml_client import b

resume = """John Doe
[...]
Experience:
Software Engineer Intern
[...]"""

async def async_call():
    parsed = await b.ExtractResume(resume)

async def streamed_call():
    stream = b.stream.ExtractResume(resume)
    async for partial in stream:
        print(partial)  # An object with autocomplete for the partial Resume type
    response = await stream.get_final_result()  # Autocompletes to the full Resume type
```

From baml-examples/fastapi-starter/baml_src/extract_resume.baml :

```baml
class Resume {
  name string
  education Education[]
  skills string[]
}

class Education {
  school string
  degree string
  year int
}

function ExtractResume(raw_text: string) -> Resume {
  client GPT4
  prompt #"
    Parse the following resume and return a structured representation
    of the data in the schema below.

    Resume:
    ---
    {{raw_text}}
    ---

    Output JSON format (only include these fields, and no others):
    {{ ctx.output_format(prefix=null) }}

    Output JSON:
  "#
}
```

From baml-examples/nextjs-starter/app/api/example_baml/route.ts :

```typescript
import b from './baml_client'
import { Role } from './baml_client/types';

// Async call
const result = await b.ClassifyMessage({
  convo: [{ role: Role.Customer, content: "I want to cancel my subscription" }]
});

// Streamed call
const stream = b.stream.ClassifyMessage({
  convo: [{ role: Role.Customer, content: "I want to cancel my subscription" }]
});
for await (const partial of stream) {
  console.log(partial); // Autocompletes to a Category[]
}
const final = await stream.get_final_result(); // Autocompletes to a Category[]
```

From baml-examples/nextjs-starter/baml_src/classify_message.baml :

```baml
enum Category {
  Refund
  CancelOrder
  TechnicalSupport
  AccountIssue
  Question
}

class Message {
  role Role
  content string
}

enum Role {
  Customer
  Assistant
}

template_string PrintMessage(msg: Message, prefix: string?) #"
  {{ _.role('user' if msg.role == "Customer" else 'assistant') }}
  {% if prefix %}
  {{ prefix }}
  {% endif %}
  {{ msg.content }}
"#

function ClassifyMessage(convo: Message[]) -> Category[] {
  client GPT4
  prompt #"
    {# Prompts are auto-dedented and trimmed. We use Jinja for our prompt
       syntax (but we added some static analysis to make sure it's valid!) #}
    {{ ctx.output_format(prefix="Classify with the following json:") }}
    {% for c in convo %}
    {{ PrintMessage(c, 'This is the message to classify:' if loop.last and convo|length > 1 else null) }}
    {% endfor %}
    {{ _.role('assistant') }}
    JSON array of categories that match:
  "#
}
```

From baml-ruby-starter/examples.rb :

```ruby
require_relative "baml_client/client"

b = Baml::BamlClient.from_directory("baml_src")
input = "Can't access my account using my usual login credentials"
classified = b.ClassifyMessage(input: input)
puts classified.categories
```

From baml-ruby-starter/baml_src/classify_message.baml :

```baml
enum Category {
  Refund
  CancelOrder
  TechnicalSupport
  AccountIssue
  Question
}

class MessageFeatures {
  categories Category[]
}

function ClassifyMessage(input: string) -> MessageFeatures {
  client GPT4Turbo
  prompt #"
    {# _.role("system") starts a system message #}
    {{ _.role("system") }}
    Classify the following INPUT.
    {{ ctx.output_format }}

    {# This starts a user message #}
    {{ _.role("user") }}
    INPUT: {{ input }}

    Response:
  "#
}
```

From simple_prediction.py :

```python
import enum

import instructor
from openai import OpenAI
from pydantic import BaseModel

# The docs assume an Instructor-patched OpenAI client:
client = instructor.patch(OpenAI())


class Labels(str, enum.Enum):
    SPAM = "spam"
    NOT_SPAM = "not_spam"


class SinglePrediction(BaseModel):
    """Correct class label for the given text"""
    class_label: Labels


def classify(data: str) -> SinglePrediction:
    return client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        response_model=SinglePrediction,
        messages=[
            {
                "role": "user",
                "content": f"Classify the following text: {data}",
            },
        ],
    )  # type: ignore


prediction = classify("Hello there I'm a nigerian prince and I want to give you money")
assert prediction.class_label == Labels.SPAM
```

From simple_prediction/index.ts :

```typescript
import { z } from "zod"

// `client` is an Instructor-wrapped OpenAI client; its setup is omitted in the docs.

enum CLASSIFICATION_LABELS {
  "SPAM" = "SPAM",
  "NOT_SPAM" = "NOT_SPAM"
}

const SimpleClassificationSchema = z.object({
  class_label: z.nativeEnum(CLASSIFICATION_LABELS)
})

const createClassification = async (data: string) => {
  const classification = await client.chat.completions.create({
    messages: [{ role: "user", content: `"Classify the following text: ${data}` }],
    model: "gpt-3.5-turbo",
    response_model: { schema: SimpleClassificationSchema, name: "SimpleClassification" },
    max_retries: 3,
    seed: 1
  })
  return classification
}

const classification = await createClassification(
  "Hello there I'm a nigerian prince and I want to give you money"
)
// OUTPUT: { class_label: 'SPAM' }
console.log({ classification })
assert(
  classification?.class_label === CLASSIFICATION_LABELS.SPAM,
  `Expected ${classification?.class_label} to be ${CLASSIFICATION_LABELS.SPAM}`
)
```

From examples/sentiment/demo.py :

```python
import asyncio
import sys

from dotenv import dotenv_values
import schema as sentiment
from typechat import Failure, TypeChatJsonTranslator, TypeChatValidator, create_language_model, process_requests


async def main():
    env_vals = dotenv_values()
    model = create_language_model(env_vals)
    validator = TypeChatValidator(sentiment.Sentiment)
    translator = TypeChatJsonTranslator(model, validator, sentiment.Sentiment)

    async def request_handler(message: str):
        result = await translator.translate(message)
        if isinstance(result, Failure):
            print(result.message)
        else:
            result = result.value
            print(f"The sentiment is {result.sentiment}")

    file_path = sys.argv[1] if len(sys.argv) == 2 else None
    await process_requests("????> ", file_path, request_handler)


if __name__ == "__main__":
    asyncio.run(main())
```

From examples/sentiment/schema.py :

```python
from dataclasses import dataclass
from typing_extensions import Literal, Annotated, Doc


@dataclass
class Sentiment:
    """The following is a schema definition for determining the sentiment of some user input."""

    sentiment: Annotated[Literal["negative", "neutral", "positive"], Doc("The sentiment for the text")]
```

From examples/sentiment/src/main.ts :

```typescript
import assert from "assert";
import dotenv from "dotenv";
import findConfig from "find-config";
import fs from "fs";
import path from "path";
import { createJsonTranslator, createLanguageModel } from "typechat";
import { processRequests } from "typechat/interactive";
import { createTypeScriptJsonValidator } from "typechat/ts";
import { SentimentResponse } from "./sentimentSchema";

const dotEnvPath = findConfig(".env");
assert(dotEnvPath, ".env file not found!");
dotenv.config({ path: dotEnvPath });

const model = createLanguageModel(process.env);
const schema = fs.readFileSync(path.join(__dirname, "sentimentSchema.ts"), "utf8");
const validator = createTypeScriptJsonValidator<SentimentResponse>(schema, "SentimentResponse");
const translator = createJsonTranslator(model, validator);

// Process requests interactively or from the input file specified on the command line
processRequests("????> ", process.argv[2], async (request) => {
  const response = await translator.translate(request);
  if (!response.success) {
    console.log(response.message);
    return;
  }
  console.log(`The sentiment is ${response.data.sentiment}`);
});
```

From examples/sentiment/src/sentimentSchema.ts :

```typescript
export interface SentimentResponse {
  sentiment: "negative" | "neutral" | "positive"; // The sentiment of the text
}
```

From examples/Sentiment/Program.cs :

```csharp
using Microsoft.TypeChat;

namespace Sentiment;

public class SentimentApp : ConsoleApp
{
    JsonTranslator<SentimentResponse> _translator;

    public SentimentApp()
    {
        OpenAIConfig config = Config.LoadOpenAI();
        // Although this sample uses config files, you can also load config from environment variables
        // OpenAIConfig config = OpenAIConfig.LoadFromJsonFile("your path");
        // OpenAIConfig config = OpenAIConfig.FromEnvironment();
        _translator = new JsonTranslator<SentimentResponse>(new LanguageModel(config));
    }

    public override async Task ProcessInputAsync(string input, CancellationToken cancelToken)
    {
        SentimentResponse response = await _translator.TranslateAsync(input, cancelToken);
        Console.WriteLine($"The sentiment is {response.Sentiment}");
    }
}
```

From examples/Sentiment/SentimentSchema.cs :

```csharp
using System.Text.Json.Serialization;
using Microsoft.TypeChat.Schema;

namespace Sentiment;

public class SentimentResponse
{
    [JsonPropertyName("sentiment")]
    [JsonVocab("negative | neutral | positive")]
    public string Sentiment { get; set; }
}
```

From the Marvin docs: