Custom endpoints
When the built-in providers do not cover your needs, you can add custom OpenAI-compatible endpoints. This lets you connect to self-hosted models, API proxies, internal gateways, or any provider that implements the OpenAI chat completions API format.
Custom endpoints use the custom-* provider ID pattern and appear alongside the built-in providers in the agent model selector.
When to Use Custom Endpoints
Custom endpoints are useful for several scenarios:
- Self-hosted models: connect to vLLM, TGI, LocalAI, LM Studio, or any other inference server running on your network
- API proxies and gateways: route requests through a corporate proxy, rate-limiting gateway, or logging middleware
- Unlisted providers: connect to a new provider that Ironspire does not yet include as a built-in option
- Development and testing: point agents at a mock server or staging endpoint during development
- Cost optimisation: use a provider that offers better pricing for your specific workload
Any server that accepts OpenAI-format chat completion requests (POST /v1/chat/completions) should work as a custom endpoint.
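In practice, "OpenAI-format" means a JSON POST body like the one sketched below. The URL and model ID are placeholders for illustration, not values Ironspire requires:

```python
import json

# Minimal sketch of an OpenAI-format chat completion request.
# Substitute your own base URL and model ID.
base_url = "https://my-server.example.com/v1"
payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,
}

# The request is sent as POST {base_url}/chat/completions with a JSON body.
url = f"{base_url}/chat/completions"
body = json.dumps(payload)
```

Any server that can accept this request and return an OpenAI-shaped response is a candidate for a custom endpoint.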
Adding a Custom Endpoint
- Open Settings with Ctrl+, (or Cmd+, on macOS)
- Go to the Providers tab
- Scroll to the bottom of the provider list
- Click Add Custom Endpoint
- Fill in the configuration fields (detailed below)
- Click Save
Ironspire runs a health check immediately after saving. If the endpoint is reachable and responds correctly, the status dot turns green.
Configuration Fields
| Field | Required | Description |
|---|---|---|
| Name | Yes | A display name for the endpoint (shown in the model selector) |
| Base URL | Yes | The root URL of the API (e.g. https://my-server.example.com/v1) |
| API Key | No | An authentication token sent as a Bearer header. Leave blank if the endpoint does not require authentication. |
| Model ID | Yes | The model identifier sent in the request body (e.g. meta-llama/Llama-3.1-70B-Instruct) |
| Display Name | No | A friendly model name shown in the UI. Defaults to the Model ID if left blank. |
| Context Window | No | The model's context window size in tokens. Used for compaction calculations and the context gauge. |
| Max Output | No | The maximum number of output tokens. Used to set the max_tokens parameter. |
The Base URL should point to the root of the API, not to the completions path itself: Ironspire appends the completions segment automatically.
If your endpoint uses a non-standard path (e.g. /api/generate instead of /v1/chat/completions), set the Base URL to everything up to but not including the final completions segment; Ironspire appends that final segment itself.
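Roughly, the fields above map onto an outgoing request as sketched here. The dictionary keys and values are illustrative, not Ironspire's internal schema:

```python
# How the configuration fields shape an outgoing request (illustrative only).
endpoint = {
    "base_url": "https://my-server.example.com/v1",    # Base URL field
    "api_key": "sk-example",                           # API Key field (optional)
    "model_id": "meta-llama/Llama-3.1-70B-Instruct",   # Model ID field
    "max_output": 4096,                                # Max Output field
}

# The API key, when set, is sent as a Bearer header.
headers = {"Content-Type": "application/json"}
if endpoint["api_key"]:
    headers["Authorization"] = f"Bearer {endpoint['api_key']}"

# Model ID and Max Output land in the request body; the completions
# path is appended to the Base URL.
url = endpoint["base_url"].rstrip("/") + "/chat/completions"
body = {
    "model": endpoint["model_id"],
    "max_tokens": endpoint["max_output"],
    "messages": [{"role": "user", "content": "ping"}],
}
```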
Provider Quirks
Not all OpenAI-compatible servers behave identically. The Provider Quirks section lets you adjust how Ironspire communicates with the endpoint.
Stream Toggle
Toggle streaming on or off. Most modern inference servers support streaming (stream: true in the request body), but some older or simpler servers only support non-streaming responses.
- Streaming on (default): Ironspire sends stream: true and processes server-sent events in real time. You see tokens appear as they are generated.
- Streaming off: Ironspire waits for the complete response. The entire reply appears at once. Use this if your server does not support streaming or returns malformed stream events.
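With streaming on, the server sends a sequence of server-sent events, each carrying a small delta of the reply. A simplified sketch of how such a stream is consumed, parsing a canned sample rather than a live HTTP response:

```python
import json

# Canned sample of a streamed chat completion, in the standard SSE shape:
# each event is a "data: " line, and "[DONE]" marks the end of the stream.
sample_stream = """\
data: {"choices": [{"delta": {"content": "Hel"}}]}

data: {"choices": [{"delta": {"content": "lo!"}}]}

data: [DONE]
"""

tokens = []
for line in sample_stream.splitlines():
    if not line.startswith("data: "):
        continue  # skip the blank separator lines between events
    data = line[len("data: "):]
    if data == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    tokens.append(delta.get("content", ""))

print("".join(tokens))  # → Hello!
```

A real client appends each delta to the visible reply as it arrives, which is why tokens appear incrementally in the UI.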
Tool Call Format
Select how the endpoint handles tool calls (function calling). Ironspire supports two formats:
| Format | Description |
|---|---|
| OpenAI-native (default) | Standard OpenAI tool call format with tool_calls in the assistant message and tool role for results. Use this for servers that fully implement the OpenAI tools API. |
| Legacy function | Older format using function_call in the assistant message and function role for results. Use this for servers that have not adopted the newer tools format. |
If your server does not support tool calling at all, agents using this endpoint will operate in chat-only mode (no MCP tools, no file access).
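The difference between the two formats is easiest to see in the message shapes themselves. These follow the public OpenAI API; the function name and arguments are made up for illustration:

```python
# OpenAI-native format: a list of tool_calls, each with an id,
# and results carried in a "tool" role message.
native = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    }],
}
native_result = {"role": "tool", "tool_call_id": "call_1", "content": "2°C"}

# Legacy format: a single function_call (no id, no list),
# and results carried in a "function" role message.
legacy = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}
legacy_result = {"role": "function", "name": "get_weather", "content": "2°C"}
```

The legacy format only supports one function call per assistant turn, which is why servers that implement it may behave differently when an agent tries to invoke several tools at once.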
Editing a Custom Endpoint
- Open Settings > Providers
- Find your custom endpoint card
- Click the card to expand it
- Edit any field
- Changes save automatically
Ironspire re-runs the health check after each edit. If you change the base URL or API key, the health status resets and re-validates.
Removing a Custom Endpoint
- Open Settings > Providers
- Find your custom endpoint card
- Click the delete icon (trash) in the card header
- Confirm the deletion in the modal
Deleting a custom endpoint immediately disconnects any agents that are using it. Reassign those agents to a different provider before deleting, or they will show a provider error on the next message.
Agents that were using the deleted endpoint retain their conversation history. You can reassign them to a new provider and continue chatting without losing any messages.
Validation and Health Checks
When you save a custom endpoint, Ironspire performs several checks:
- URL validation: confirms the base URL is well-formed and uses HTTPS (or HTTP for localhost/private network addresses)
- Connection test: sends a lightweight request to the endpoint to verify it is reachable
- Authentication check: if an API key is provided, confirms the server does not reject it
- Model verification: sends a minimal chat completion request to verify the model ID is valid
The health dot reflects the result:
| Dot | Meaning |
|---|---|
| Green | All checks passed |
| Yellow | Endpoint reachable but rate-limited |
| Red | Connection failed, authentication rejected, or model not found |
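The mapping from check results to dot colour can be pictured as a small classifier. The HTTP status codes below are the conventional ones for each failure mode, not an exhaustive list of what Ironspire inspects:

```python
def health_dot(status_code: int) -> str:
    """Map an HTTP status from the health-check request to a dot colour.

    Illustrative only: any 2xx passes, 429 means rate-limited, and
    everything else (401/403 auth rejections, 404 unknown model,
    5xx server errors) shows red.
    """
    if 200 <= status_code < 300:
        return "green"
    if status_code == 429:
        return "yellow"
    return "red"
```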
Health checks continue to run periodically during active sessions, so the status stays current.
Example: Connecting to LM Studio
LM Studio provides a local OpenAI-compatible server. Here is a typical configuration:
| Field | Value |
|---|---|
| Name | LM Studio |
| Base URL | http://localhost:1234/v1 |
| API Key | (leave blank) |
| Model ID | meta-llama/Llama-3.1-8B-Instruct |
| Context Window | 131072 |
| Streaming | On |
| Tool Call Format | OpenAI-native |
After saving, the LM Studio endpoint appears in the model selector for any agent. You can assign it to specific agents while keeping other agents on cloud providers.
Example: Connecting to a Corporate Proxy
If your organisation routes all API traffic through an internal proxy:
| Field | Value |
|---|---|
| Name | Internal Gateway |
| Base URL | https://ai-gateway.corp.example.com/v1 |
| API Key | Your internal auth token |
| Model ID | gpt-4o (or whatever model the proxy routes to) |
| Streaming | On |
| Tool Call Format | OpenAI-native |
This is especially useful in enterprise environments where direct access to external APIs is restricted.