LLM usage is optional and must be explicitly configured. You can maintain full control by running local models on your own systems through Ollama, LM Studio, or similar tools. No data is sent to external services unless you configure a cloud provider.

Configuration via settings

Configure AI settings through the web interface:
  1. Go to Settings → Self-Hosting
  2. Scroll to the AI Provider section
  3. Configure:
    • OpenAI Access Token - Your API key
    • OpenAI URI Base - Custom endpoint (leave blank for OpenAI)
    • OpenAI Model - Model name (required for custom endpoints)
Settings in the UI override environment variables.
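If you prefer environment variables over the UI, the same three settings map to the variables used throughout this page; a minimal .env sketch (remember that saved UI settings take precedence):
# Core AI settings as environment variables
OPENAI_ACCESS_TOKEN=sk-proj-...
# Optional, only needed for non-OpenAI endpoints:
# OPENAI_URI_BASE=http://localhost:11434/v1
# OPENAI_MODEL=llama3.1:8b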

OpenAI-compatible API

Sure supports any OpenAI-compatible API endpoint, giving you flexibility to use:
  • OpenAI - Direct access to GPT models
  • Ollama - Run models locally on your hardware
  • LM Studio - Local model hosting with a GUI
  • OpenRouter - Access to multiple providers (Anthropic, Google, etc.)
  • Other providers - Groq, Together AI, Anyscale, Replicate, and more

OpenAI

OPENAI_ACCESS_TOKEN=sk-proj-...
# No other configuration needed
Recommended models:
  • gpt-4.1 - Default, best balance of speed and quality
  • gpt-5 - Latest model, highest quality
  • gpt-4o-mini - Cheaper, good quality
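One way to sanity-check your key before starting Sure is to list models against OpenAI's standard endpoint (assumes curl and that OPENAI_ACCESS_TOKEN is exported in your shell):
# Returns a JSON list of models if the key is valid
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_ACCESS_TOKEN"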

Ollama (local)

# Dummy token (Ollama doesn't need authentication)
OPENAI_ACCESS_TOKEN=ollama-local

# Ollama API endpoint
OPENAI_URI_BASE=http://localhost:11434/v1

# Model you pulled
OPENAI_MODEL=llama3.1:8b
Install and run Ollama:
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama
ollama serve

# Pull a model
ollama pull llama3.1:8b
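Before pointing Sure at Ollama, you can confirm the OpenAI-compatible endpoint responds; a quick smoke test, assuming the model tag matches the one you pulled:
# Minimal chat completion against Ollama's OpenAI-compatible API
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Say hello"}]}'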

LM Studio (local)

  1. Download from lmstudio.ai
  2. Download a model through the UI
  3. Start the local server
  4. Configure Sure:
OPENAI_ACCESS_TOKEN=lmstudio-local
OPENAI_URI_BASE=http://localhost:1234/v1
OPENAI_MODEL=your-model-name
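The OPENAI_MODEL value must match the identifier LM Studio reports for the loaded model; one way to look it up is to query the local server's model list (default port assumed):
# Use the "id" field from the response as OPENAI_MODEL
curl -s http://localhost:1234/v1/models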

OpenRouter

Access multiple providers through a single API:
OPENAI_ACCESS_TOKEN=your-openrouter-api-key
OPENAI_URI_BASE=https://openrouter.ai/api/v1
OPENAI_MODEL=google/gemini-2.5-flash
Recommended models:
  • google/gemini-2.5-flash - Fast and capable
  • anthropic/claude-sonnet-4.5 - Excellent reasoning
  • anthropic/claude-haiku-4.5 - Fast and cost-effective
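Model slugs change over time; at the time of writing you can list what's currently available from OpenRouter's public models endpoint:
# Use the "id" field from the response as OPENAI_MODEL
curl -s https://openrouter.ai/api/v1/models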

AI cache management

Sure caches AI-generated results (like auto-categorization and merchant detection) to avoid redundant API calls and costs.

What is the AI cache?

When AI rules process transactions, Sure stores:
  • Enrichment records - Which attributes were set by AI (category, merchant, etc.)
  • Attribute locks - Prevents rules from re-processing already-handled transactions
This caching means:
  • Transactions won’t be sent to the LLM repeatedly
  • Your API costs are minimized
  • Processing is faster on subsequent rule runs

When to reset the AI cache

You might want to reset the cache when:
  • Switching LLM models - Different models may produce better categorizations
  • Improving prompts - After system updates with better prompts
  • Fixing miscategorizations - When AI made systematic errors
  • Testing - During development or evaluation of AI features
Resetting the AI cache will cause all transactions to be re-processed by AI rules on the next run. This will incur API costs if using a cloud provider.

How to reset the AI cache

Via UI (recommended):
  1. Go to Settings → Rules
  2. Click the menu button (three dots)
  3. Select Reset AI cache
  4. Confirm the action
The cache is cleared asynchronously in the background.

Automatic reset: The AI cache is automatically cleared for all users when the OpenAI model setting is changed, so the new model processes transactions fresh.

What happens when cache is reset

  • AI-locked attributes are unlocked - Transactions can be re-enriched
  • AI enrichment records are deleted - The history of AI changes is cleared
  • User edits are preserved - If you manually changed a category after AI set it, your change is kept

Evaluation system

Test and compare different LLMs for your specific use case. The eval system helps you benchmark models for transaction categorization, merchant detection, and chat assistant functionality. See the evaluation framework documentation for details on:
  • Running evaluations
  • Comparing models
  • Creating custom datasets
  • Langfuse integration for tracking experiments

Docker compose example

services:
  sure:
    environment:
      - OPENAI_ACCESS_TOKEN=ollama-local
      - OPENAI_URI_BASE=http://ollama:11434/v1
      - OPENAI_MODEL=llama3.1:8b
    depends_on:
      - ollama
  
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Uncomment if you have an NVIDIA GPU
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

volumes:
  ollama_data:
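
With this stack the model still has to be pulled inside the Ollama container before Sure can use it; a typical first run might look like:
# Start the stack, then pull the model into the ollama service
docker compose up -d
docker compose exec ollama ollama pull llama3.1:8b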