Gemini Native API

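This guide explains how to call Google Gemini models through AcceleAI, using either the native SDK or the OpenAI-compatible format. It covers displaying reasoning, multimedia processing, image generation, code execution, context caching, function calling, and token usage statistics.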

Native SDK Calls

Install the Google GenAI SDK:


pip install google-genai

Configure the client:


from google import genai

# Point the SDK at the AcceleAI gateway instead of Google's default endpoint.
client = genai.Client(
    api_key="<ACCELE_AI_API_KEY>",
    http_options={"base_url": "https://api.acceleai.cn/gemini"}
)

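Basic call example:


from google.genai import types  # used by the configuration examples later in this guide

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain the theory of relativity in simple terms"
)

print(response.text)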

Calling via the OpenAI-Compatible Format


from openai import OpenAI

client = OpenAI(
    api_key="<ACCELE_AI_API_KEY>",
    base_url="https://api.acceleai.cn/v1"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Hello, please introduce the Gemini models"}
    ]
)

print(response.choices[0].message.content)

Displaying Reasoning

Gemini 2.5 series models have built-in chain-of-thought reasoning.

Native method


response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the steps to solve this math problem: x^2 + 5x + 6 = 0",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    )
)

for part in response.candidates[0].content.parts:
    if not part.text:
        continue  # skip parts that carry no text
    if part.thought:
        print(f"Thought: {part.text}")
    else:
        print(f"Answer: {part.text}")
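
When the thought summary and the final answer need to be stored or displayed separately, the loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: `split_thoughts` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for the parts found in `response.candidates[0].content.parts`.

```python
from types import SimpleNamespace


def split_thoughts(parts):
    """Separate thought text from answer text in a list of response parts."""
    thoughts, answers = [], []
    for part in parts:
        text = getattr(part, "text", None)
        if not text:
            # Skip parts without text (for example images or tool output).
            continue
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answers.append(text)
    return "\n".join(thoughts), "\n".join(answers)


# Stand-in parts for illustration only (hypothetical data).
demo_parts = [
    SimpleNamespace(text="Factor the quadratic.", thought=True),
    SimpleNamespace(text=None, thought=None),
    SimpleNamespace(text="x = -2 or x = -3", thought=None),
]
thoughts, answer = split_thoughts(demo_parts)
print("Thoughts:", thoughts)
print("Answer:", answer)
```

With a real response, the call would presumably be `split_thoughts(response.candidates[0].content.parts)`.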

OpenAI-compatible method

Use the reasoning_effort parameter to control reasoning intensity:


response = client.chat.completions.create(
    model="gemini-2.5-flash",
    reasoning_effort="high",  # low / medium / high
    messages=[
        {"role": "user", "content": "Analyze the time complexity of this algorithm"}
    ]
)

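Model differences:

Gemini 2.5 Flash: hybrid model; supports thinking_budget control (0-16384 tokens, default 1024), and thinking can be turned off
Gemini 2.5 Pro: reasoning-only model; the thinking process cannot be turned off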
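
The limits above can be enforced client-side before a request is sent. This is a minimal sketch that takes the range and default straight from the list above; the function names and the lookup table are hypothetical, and the numbers should be treated as this guide's values rather than a guarantee about the upstream API.

```python
# Values copied from the list above (assumptions of this sketch).
FLASH_MIN_BUDGET = 0
FLASH_MAX_BUDGET = 16384
FLASH_DEFAULT_BUDGET = 1024

# Whether thinking can be switched off, per the list above.
CAN_DISABLE_THINKING = {
    "gemini-2.5-flash": True,
    "gemini-2.5-pro": False,
}


def resolve_flash_thinking_budget(requested=None):
    """Clamp a requested thinking budget into the documented Flash range."""
    if requested is None:
        return FLASH_DEFAULT_BUDGET
    return max(FLASH_MIN_BUDGET, min(FLASH_MAX_BUDGET, int(requested)))


def can_disable_thinking(model):
    """Return True only for models listed as able to turn thinking off."""
    return CAN_DISABLE_THINKING.get(model, False)


print(resolve_flash_thinking_budget(50000))
print(can_disable_thinking("gemini-2.5-pro"))
```

The resolved value would then be passed as `thinking_budget` in `types.ThinkingConfig`; a budget of 0 is what turns thinking off on the Flash model.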

Multimedia File Processing

Small files (up to 20MB)

Pass the data directly with inline_data:


# Read the raw bytes. The SDK base64-encodes inline data when it builds the
# request, so encoding it here as well would corrupt the image.
with open("image.jpg", "rb") as f:
    image_data = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Content(
            parts=[
                types.Part(text="Describe this image"),
                types.Part(
                    inline_data=types.Blob(
                        mime_type="image/jpeg",
                        data=image_data
                    )
                )
            ]
        )
    ]
)

Supported MIME types include image/jpeg, image/png, audio/m4a, video/mp4, and others. The media resolution parameter (MEDIA_RESOLUTION_LOW, MEDIA_RESOLUTION_MEDIUM) can be used to reduce token consumption.

Large files (over 20MB)

Upload the file with the Files API, then reference it:


myfile = client.files.upload(file="path/to/large_video.mp4")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=["Describe the main content of this video", myfile]
)

print(response.text)

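Files uploaded through the Files API are deleted automatically after 48 hours. They can also be removed manually by calling the delete endpoint.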
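
The choice between the two paths can be made from the file size alone. The sketch below encodes the 20MB threshold and the 48-hour retention described in this section; the helper names are hypothetical, and it assumes "20MB" means 20 MiB, which the guide does not specify.

```python
import mimetypes
from datetime import datetime, timedelta, timezone

INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB
FILE_TTL = timedelta(hours=48)         # retention stated in this section


def choose_transport(size_bytes):
    """Return 'inline' for small payloads and 'files_api' for large ones."""
    return "inline" if size_bytes <= INLINE_LIMIT_BYTES else "files_api"


def guess_mime_type(path, default="application/octet-stream"):
    """Guess the MIME type from the file extension using the standard library."""
    mime, _ = mimetypes.guess_type(path)
    return mime or default


def expected_expiry(uploaded_at):
    """Estimate when an uploaded file will be deleted automatically."""
    return uploaded_at + FILE_TTL


print(choose_transport(5 * 1024 * 1024), guess_mime_type("image.jpg"))
print(expected_expiry(datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

In practice the size would come from `os.path.getsize(path)`. The result selects between the `inline_data` and `client.files.upload` examples shown in this section.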

Image Generation

Gemini 3 Pro can generate images at up to 4K resolution:


response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="A serene watercolor painting of a mountain sunrise",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="4K"
        )
    )
)

# Write the returned image bytes to disk.
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)

Note: In streaming mode, only reasoning content is returned, without image data. Use a non-streaming request to generate images.
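
If a response carries more than one image part, writing every part to the same output.png keeps only the last one. The sketch below saves each image under its own name. `save_image_parts` and the extension table are hypothetical, and the `SimpleNamespace` objects stand in for real response parts.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Hypothetical mapping from MIME type to file extension.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}


def save_image_parts(parts, out_dir, stem="output"):
    """Write every image part to its own file and return the saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue  # text-only part
        ext = EXTENSIONS.get(blob.mime_type, ".bin")
        path = out_dir / f"{stem}_{len(saved)}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved


# Stand-in parts for illustration only (hypothetical data).
demo_parts = [
    SimpleNamespace(text="Here is your painting.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(mime_type="image/png", data=b"png-bytes")),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(mime_type="image/jpeg", data=b"jpg-bytes")),
]
with tempfile.TemporaryDirectory() as tmp:
    names = [p.name for p in save_image_parts(demo_parts, tmp)]
print(names)
```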

Code Execution

Enable the automatic code interpreter:


response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Calculate the sum of the first 100 prime numbers",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution)]
    )
)

for part in response.candidates[0].content.parts:
    # Every part exposes these attributes and unused ones are None,
    # so check for None rather than using hasattr().
    if part.executable_code is not None:
        print(f"Code:\n{part.executable_code.code}")
    if part.code_execution_result is not None:
        print(f"Result:\n{part.code_execution_result.output}")

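Context Caching

Gemini 2.5 series models support implicit context caching, with no extra configuration required from the developer. The cache is hit automatically when a request's content, model, and parameters are identical.

The default TTL is 1 hour
After a cache hit, cached tokens are billed at only 25% of the input price
Check for cache hits through response.usage_metadata.cached_content_token_count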
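
The effect of a cache hit on the input bill can be estimated from the token counts. This is a minimal sketch using the 25% rate stated above. It assumes the prompt token count already includes the cached tokens, and the price used in the demo is a placeholder, not a real price.

```python
CACHED_PRICE_RATIO = 0.25  # cached tokens cost 25% of the input price (from this section)


def estimate_input_cost(prompt_tokens, cached_tokens, price_per_million):
    """Estimate input cost when part of the prompt was served from cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    billable = uncached + cached_tokens * CACHED_PRICE_RATIO
    return billable * price_per_million / 1_000_000


def cache_hit_ratio(prompt_tokens, cached_tokens):
    """Fraction of the prompt that was read from cache."""
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


# Placeholder numbers: 10,000 prompt tokens, 8,000 of them cached,
# at a hypothetical price of 1.0 per million input tokens.
print(estimate_input_cost(10_000, 8_000, 1.0))
print(cache_hit_ratio(10_000, 8_000))
```

With a real response, the two counts would come from `response.usage_metadata.prompt_token_count` and `response.usage_metadata.cached_content_token_count`. The latter may be `None` when nothing was cached, so it should be defaulted to 0 first.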

Function Calling

When calling functions through the OpenAI-compatible format, you must set tool_choice="auto"; otherwise the request fails with an error:


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "What is the weather like in Beijing today?"}],
    tools=tools,
    tool_choice="auto"
)
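
The request above only asks the model which function to call; the application still has to run that function and send the result back. The sketch below shows the local half of that loop. The `get_weather` stub, the registry, and the helper names are hypothetical, and it assumes the gateway returns tool calls in the standard OpenAI chat completions shape (a function name plus a JSON string of arguments).

```python
import json


def get_weather(city):
    """Hypothetical stub; a real implementation would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Maps the tool names declared in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Decode the JSON arguments, call the matching function, return a JSON string."""
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json or "{}")
    result = registry[name](**kwargs)
    return json.dumps(result, ensure_ascii=False)


def tool_result_message(tool_call_id, content):
    """Build the follow-up message that carries a tool result back to the model."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


output = run_tool_call("get_weather", '{"city": "Beijing"}')
print(tool_result_message("call_1", output))
```

With the OpenAI SDK, each entry in `response.choices[0].message.tool_calls` exposes `id`, `function.name`, and `function.arguments`, which map directly onto the parameters used here.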

Token Usage Statistics

Native method


metadata = response.usage_metadata
print(f"Input tokens: {metadata.prompt_token_count}")
print(f"Output tokens: {metadata.candidates_token_count}")
print(f"Thinking tokens: {metadata.thoughts_token_count}")
print(f"Total tokens: {metadata.total_token_count}")

OpenAI-compatible method


usage = response.usage
print(f"Input tokens: {usage.prompt_tokens}")
print(f"Output tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")
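
Because the two call styles report usage under different field names, logging code may be simpler if both are mapped to one shape. The following is a minimal sketch; `normalize_usage` is a hypothetical helper, and the `SimpleNamespace` objects stand in for `response.usage_metadata` and `response.usage`.

```python
from types import SimpleNamespace


def normalize_usage(usage):
    """Map native or OpenAI-compatible usage objects to a common dict."""
    if hasattr(usage, "prompt_token_count"):
        # Native SDK field names.
        return {
            "input": usage.prompt_token_count or 0,
            "output": usage.candidates_token_count or 0,
            "thinking": getattr(usage, "thoughts_token_count", None),
            "total": usage.total_token_count or 0,
        }
    # OpenAI-compatible field names; thinking tokens are not reported separately here.
    return {
        "input": usage.prompt_tokens or 0,
        "output": usage.completion_tokens or 0,
        "thinking": None,
        "total": usage.total_tokens or 0,
    }


# Stand-in usage objects for illustration only (hypothetical numbers).
native = SimpleNamespace(prompt_token_count=12, candidates_token_count=30,
                         thoughts_token_count=50, total_token_count=92)
compat = SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)
print(normalize_usage(native))
print(normalize_usage(compat))
```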

The temperature range for Gemini 2.5 series models is 0 to 2.