Announcing Logfire.jl & TextPrompts.jl - Observability and Prompt Management for Julia GenAI

I’ve been jealous of the Python ecosystem’s incredible tooling for GenAI applications - the ability to see what’s happening under the hood, track costs, debug weird responses. So I’m excited to bring some of that to Julia!

Fun fact: You can use this combo from Julia, Python, and TypeScript, making it easier to share your work.

Logfire.jl

Logfire.jl is an unofficial SDK for Pydantic Logfire, a service built by the team behind Pydantic (probably one of the most popular Python packages, and the one that makes Python a bit more like Julia). Docs: Logfire - Pydantic Logfire Documentation
It provides OpenTelemetry-based observability for LLM apps with native PromptingTools.jl integration.

using Logfire, PromptingTools

Logfire.configure(service_name = "my-app")
Logfire.instrument_promptingtools!()

aigenerate("Hello!")  # automatically traced with tokens, cost, latency

What I love about it:

  • Flexible instrumentation - choose which models to trace, set different service names per environment
  • Manual spans with do-block syntax for tracing any code (see the nested-span sketch after this list):
    with_span("my-pipeline") do
        # your code here
    end
    
  • Error tracking - wrong API key? Provider out of credits or hitting rate limits? These are such a pain to debug on remote servers. Now every error is captured with full context. No more guessing what went wrong.
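
For example, here is a minimal sketch of wrapping a small two-step pipeline in nested spans so each stage shows up as its own trace node (it only uses configure, instrument_promptingtools!, with_span, and aigenerate from the snippets above; the span names and prompts are made up for illustration):

using Logfire, PromptingTools

Logfire.configure(service_name = "my-app")
Logfire.instrument_promptingtools!()

with_span("report-pipeline") do
    with_span("summarize") do
        aigenerate("Summarize this text: ...")  # traced as a child span of report-pipeline
    end
    with_span("classify") do
        aigenerate("Classify the sentiment of: ...")  # sibling span in the same pipeline
    end
end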

On pricing: Logfire cloud has a generous free tier (hundreds of thousands of conversations/month). But you can also export to local backends like Jaeger or Langfuse - all the observability without logs leaving your machine.
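
If you go the local route, here is a sketch of what that could look like (this assumes Logfire.jl honors the standard OpenTelemetry OTLP environment variables; I haven't double-checked the exact knobs, so see the package docs for the authoritative configuration):

# Assumption: standard OTel env vars are respected and a local OTLP collector
# (e.g. Jaeger) is listening on the default port - adjust to your setup.
ENV["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"

using Logfire
Logfire.configure(service_name = "my-app")  # traces now go to your local backend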

For experts: it has a powerful SQL-based Explore mode, solid navigation and organization features, and an easy way to copy out or download your traces (or any of their attributes).

Give it a try!



TextPrompts.jl

TextPrompts.jl stores prompts as text files with optional TOML metadata. Two perks: it makes it easier to version and evolve your prompts (use your favorite AI agent; they love to auto-optimize text-based prompts), and it catches placeholder typos before they turn into confusing LLM responses.

using TextPrompts
using PromptingTools: SystemMessage, UserMessage, aigenerate

prompt = load_prompt("system.txt")
sys_msg = prompt(; language = "Julia", task = "explain macros") |> SystemMessage
user_msg = UserMessage("...")
aigenerate([sys_msg, user_msg])
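
The snippet above assumes a system.txt roughly like the one below: a TOML header between --- markers, then the prompt body with {placeholder} slots (the field names here are just an example; the full script further down uses the same format):

# Hypothetical contents of system.txt, written from Julia for convenience
write("system.txt", """
---
title = "Julia Explainer"
version = "1.0.0"
---

You are an expert {language} developer.
Your current task: {task}.
Be concise and practical.
""")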

The power combo: Version your prompts in git, trace every call in Logfire, point your coding agent at both → continuous prompt improvement loop.

Both packages are cross-language compatible with their Python/TypeScript versions. Feedback welcome!

Example to run

# =============================================================================
# Logfire.jl + TextPrompts.jl - Simple Example
# =============================================================================
#
# This example uses a temporary environment to demonstrate Logfire with TextPrompts.
#
# In practice, prompts are saved as .txt files with TOML metadata headers.
# The metadata can include version, changelog, and notes for a coding agent
# to track what worked, what to avoid, etc. - creating a feedback loop where
# the agent improves prompts based on Logfire traces.
#
# WHY TextPrompts.format()?
# =========================
# The most common error is an unfilled placeholder or a typo in a variable name.
# Format catches these BEFORE the LLM call - not in a confusing response.
#
# CROSS-LANGUAGE:
# ===============
# Logfire + TextPrompts are available in Python, TypeScript, and Julia.
# Your prompts and observability setup work across all three languages!
#
# Run: julia scratch.jl

# =============================================================================
# Step 1: Temporary environment setup
# =============================================================================
using Pkg
Pkg.activate(temp = true)
Pkg.add(["Logfire", "TextPrompts", "PromptingTools", "DotEnv"])

using DotEnv
DotEnv.load!()

using Logfire
using TextPrompts
using PromptingTools
using PromptingTools: SystemMessage, UserMessage

# =============================================================================
# Step 2: Configure Logfire
# =============================================================================
Logfire.configure(service_name = "textprompts-example")
Logfire.instrument_promptingtools!()

# =============================================================================
# Step 3: Create a prompt template
# =============================================================================
# In practice, this would be a .txt file in your prompts/ directory.
# The TOML header tracks metadata - version, author, and agent notes.

prompt_content = """
---
title = "Julia Expert Assistant"
version = "1.1.0"
description = "System prompt for Julia coding assistant"

# Agent notes (for coding agents improving this prompt):
# - v1.1.0: Added "be concise" - responses were too long
# - v1.0.0: Initial version
# - TODO: Add examples of good responses
# - AVOID: Don't ask clarifying questions, just answer
---

You are an expert {language} developer.
Your specialty is {specialty}.
Be concise and practical. No preamble.
"""

# For this example, we write it to a temp file
prompt_file = tempname() * ".txt"
write(prompt_file, prompt_content)

# =============================================================================
# Step 4: Load and use the prompt
# =============================================================================
template = load_prompt(prompt_file)

println("Loaded: $(template.meta.title) v$(template.meta.version)")
println("Placeholders: $(template.placeholders)")

# Format the prompt - this validates all placeholders are filled!
system_msg = template(; language = "Julia", specialty = "DataFrames.jl") |> SystemMessage
user_msg = UserMessage("How do I filter rows where column :age > 30?")

# Call the LLM - automatically traced by Logfire
response = aigenerate([system_msg, user_msg]; model = "gpt4om")
println("\nResponse:\n", response.content)

# =============================================================================
# Step 5: See why Format matters
# =============================================================================
println("\n" * "="^50)
println("Catching placeholder typos:")
try
    # Typo: 'langauge' instead of 'language'
    template(; langauge = "Julia", specialty = "DataFrames.jl")
catch e
    println("Error caught BEFORE LLM call: ", e.message)
end

# =============================================================================
# Step 6: API errors are also traced
# =============================================================================
println("\n" * "="^50)
println("API errors captured in Logfire:")
try
    # Bad api_kwargs - this will fail at the API level
    # Logfire traces the error so you can debug later!
    with_span("api-error-example") do span
        aigenerate(
            "Hello";
            model = "gpt4om",
            api_kwargs = (; max_tookens = 100, temperatur = 0.5)  # typos!
        )
    end
catch e
    println("API error caught: ", first(string(e), 100), "...")
    println("Check Logfire - the failed span shows exactly what went wrong!")
end

# =============================================================================
# Cleanup
# =============================================================================
Logfire.shutdown!()

println("\nDone! Check traces at https://logfire.pydantic.dev")
println("""
The workflow:
  1. Store prompts as .txt files with metadata headers
  2. Agent notes in metadata track what works/what to avoid
  3. Logfire captures every LLM call
  4. Point your coding agent at traces + prompts
  5. Agent improves prompts, bumps version, logs changes
  6. Repeat - continuous prompt improvement!
""")

