Claude Native API Integration
Instructions
The Claude series models can be accessed via the official native API. Before use, make sure to install or upgrade the `anthropic` dependency:
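For Python, a typical install or upgrade looks like this (package name per the official SDK):

```shell
pip install -U anthropic
```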
For non-Claude models, please use the OpenAI API format instead.
Models Information
Model | Claude Opus 4 | Claude Sonnet 4 | Claude Sonnet 3.7 | Claude Sonnet 3.5 | Claude Haiku 3.5 | Claude Opus 3 | Claude Haiku 3 |
---|---|---|---|---|---|---|---|
Extended Thinking | Yes | Yes | Yes | No | No | No | No |
Context Window | 200K | 200K | 200K | 200K | 200K | 200K | 200K |
Max Output | 32000 tokens | 64000 tokens | 64000 tokens | 8192 tokens | 8192 tokens | 4096 tokens | 4096 tokens |
Training Cut-off | Mar 2025 | Mar 2025 | Nov 2024 | Apr 2024 | Jul 2024 | Aug 2023 | Aug 2023 |
- For models version 3.5 and above, if you need output longer than 4096 tokens, be sure to explicitly specify the `max_tokens` parameter, referring to the Max Output column in the table above.
- For Sonnet 3.7, you can increase the max output from 64K to 128K by passing `extra_headers={"anthropic-beta": "output-128k-2025-02-19"}`. See the “Streaming 128K” example below.
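As a sketch, the two settings above can be combined in one request payload. The model id and prompt below are illustrative placeholders, not values from this page:

```python
# Sketch: request settings for long Sonnet 3.7 outputs.
# The model id and the prompt are illustrative placeholders.
request_kwargs = {
    "model": "claude-3-7-sonnet-20250219",  # assumed model id
    "max_tokens": 128000,                   # above 64K only with the beta header below
    "extra_headers": {"anthropic-beta": "output-128k-2025-02-19"},
    "messages": [{"role": "user", "content": "Write a long report."}],
}

# With the official SDK, this would be passed as:
# client.messages.create(**request_kwargs)  # requires `anthropic` and a valid key
```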
Claude 4 New Features
New Refusal Stop Reason
Claude 4 models introduce a new `refusal` stop reason for content that the model declines to generate for safety reasons:
When migrating to Claude 4, you should update your application to handle `refusal` stop reasons.
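A minimal handler might branch on the stop reason. This is a sketch; the reason strings other than `refusal` (`end_turn`, `max_tokens`, `tool_use`, `stop_sequence`) are the existing Messages API values:

```python
def handle_stop_reason(stop_reason: str) -> str:
    """Map a Messages API stop_reason to an application-level action (sketch)."""
    if stop_reason == "refusal":
        # New in Claude 4: the model declined to generate the content.
        return "declined"
    if stop_reason == "max_tokens":
        # Output was truncated; consider raising max_tokens and retrying.
        return "truncated"
    # e.g. "end_turn", "tool_use", "stop_sequence"
    return "ok"
```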
Extended Thinking
With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude’s full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.
While the API is consistent across Claude 3.7 and 4 models, streaming responses for extended thinking might return in a “chunky” delivery pattern, with possible delays between streaming events.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.
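Extended thinking is enabled through the `thinking` request parameter. A minimal sketch, assuming a placeholder model id (note that `budget_tokens` must be lower than `max_tokens`):

```python
# Sketch: enabling extended thinking on a Claude 4 request.
thinking_request = {
    "model": "claude-sonnet-4-20250514",  # assumed model id
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},  # budget < max_tokens
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}
```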
Interleaved Thinking
Claude 4 models support interleaving tool use with extended thinking, allowing for more natural conversations where tool uses and responses can be mixed with regular messages.
Interleaved thinking is in beta. To enable interleaved thinking, add the beta header `interleaved-thinking-2025-05-14` to your API request:
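A sketch of such a request, combining the beta header with thinking and a tool definition (the model id and the `get_weather` tool are hypothetical, for illustration only):

```python
# Sketch: opting into interleaved thinking (beta) together with a tool.
interleaved_request = {
    "model": "claude-sonnet-4-20250514",  # assumed model id
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "extra_headers": {"anthropic-beta": "interleaved-thinking-2025-05-14"},
    "tools": [
        {   # hypothetical tool definition for illustration
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
}
```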
Endpoint: `POST /v1/messages`
Usage
Request Body
Request Parameters
Name | Location | Type | Required | Description |
---|---|---|---|---|
x-api-key | header | string | No | Bearer AIHUBMIX_API_KEY |
Content-Type | header | string | No | application/json |
body | body | object | No | Request body |
» model | body | string | Yes | Model name (see the table above) |
» messages | body | [object] | Yes | Conversation messages |
»» role | body | string | No | Message role, e.g. user or assistant |
»» content | body | string | Yes | Message content |
» max_tokens | body | number | Yes | Maximum number of tokens to generate |
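Putting the parameters together, a raw HTTP request could be sketched as follows. The base URL is an assumption, and `AIHUBMIX_API_KEY` is a placeholder for your actual key:

```python
# Sketch: a minimal /v1/messages request matching the parameter table above.
url = "https://aihubmix.com/v1/messages"  # assumed base URL
headers = {
    "x-api-key": "AIHUBMIX_API_KEY",  # placeholder key
    "Content-Type": "application/json",
}
body = {
    "model": "claude-sonnet-4-20250514",  # assumed model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}

# import requests
# resp = requests.post(url, headers=headers, json=body)  # requires a valid key
```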
Response Example
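A representative Messages API response body has the following shape (illustrative values, not captured from a live call):

```json
{
  "id": "msg_01ABC...",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {"type": "text", "text": "Hello! How can I help you today?"}
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 10, "output_tokens": 12}
}
```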
Response Results
Status Code | Status Description | Description | Data Model |
---|---|---|---|
200 | OK | none | Inline |
Migrating to Claude 4
If you’re migrating from Claude 3.7 to Claude 4 models, please note the following changes:
Update Model Names
Handle New Stop Reasons
Update your application to handle the new `refusal` stop reason:
Remove Unsupported Features
- Token-efficient tool use: only available in Claude Sonnet 3.7, no longer supported in Claude 4
- Extended output: the `output-128k-2025-02-19` beta header is only available in Claude Sonnet 3.7

If you’re migrating from Claude Sonnet 3.7, we recommend removing these beta headers from your requests:
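As a sketch, these are the beta header values to drop (the header names are the published Sonnet 3.7 beta identifiers):

```python
# Beta headers that applied to Claude Sonnet 3.7 only:
sonnet_37_beta_headers = {
    "anthropic-beta": "token-efficient-tools-2025-02-19,output-128k-2025-02-19",
}

# For Claude 4, simply omit them:
claude_4_extra_headers = {}  # neither beta feature is supported
```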
Using Claude in Applications (Example: Lobe-Chat)
Here’s how you can configure Claude models in a third-party application like Lobe-Chat:
1. Navigate to the settings page and select Claude as your model provider.
2. Enter your API key from AiHubMix.
3. Set the API proxy endpoint to:
4. (Recommended) Enable the “Client Request Mode” option.
5. Add your chosen model to the model list. It’s recommended to copy the model name from AiHubMix’s settings page and paste it into the application.