Cloudflare Workers AI - Running AI Inference at the Edge

2025.12.05

What is Workers AI

Cloudflare Workers AI is a service for running AI models on Cloudflare’s edge network. Inference runs on Cloudflare’s infrastructure close to your users, which keeps latency low, and you call models directly from a Worker without sending data to a separate AI provider.

Supported Models

Text Generation (LLM)

Model       | Features
Llama 3 8B  | General purpose, high performance
Mistral 7B  | Fast, efficient
Gemma 7B    | Google developed, lightweight
Phi-2       | Microsoft developed, compact

Image & Vision

Model               | Use Case
Stable Diffusion XL | Image generation
LLaVA               | Image understanding
CLIP                | Image classification

Audio

Model   | Use Case
Whisper | Speech recognition
TTS     | Text-to-speech

Basic Usage

Text Generation

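// src/index.ts
// Env describes the bindings declared in wrangler.toml.
// The `Ai` type comes from @cloudflare/workers-types.
interface Env {
  AI: Ai;
}

export default {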
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Tell me 3 benefits of TypeScript' }
      ]
    });

    return Response.json(response);
  }
};
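The handler above returns the raw model output as JSON. For non-streaming text generation, the generated text sits in the `response` field of that object, so a Worker that only needs the text can unwrap it. The following is a minimal sketch under a few assumptions: the `prompt` query parameter and the helper names `buildMessages`, `extractText`, and `handleTextRequest` are hypothetical, and the small `AiBinding` interface stands in for the real `Ai` type so the snippet is self-contained.

```typescript
// Minimal structural type for the AI binding so this sketch stands alone.
// In a real project, use the `Ai` type from @cloudflare/workers-types.
interface AiBinding {
  run(model: string, inputs: unknown): Promise<unknown>;
}

interface TextEnv {
  AI: AiBinding;
}

type ChatMessage = { role: 'system' | 'user'; content: string };

// Hypothetical helper: build the messages array from a user prompt.
function buildMessages(prompt: string): ChatMessage[] {
  return [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: prompt.trim() }
  ];
}

// Hypothetical helper: pull the generated text out of the model output.
// Returns an empty string if the output has no string `response` field.
function extractText(output: unknown): string {
  if (typeof output === 'object' && output !== null && 'response' in output) {
    const text = (output as { response?: unknown }).response;
    if (typeof text === 'string') return text;
  }
  return '';
}

// Hypothetical handler: reads ?prompt=... and replies with plain text.
async function handleTextRequest(request: Request, env: TextEnv): Promise<Response> {
  const prompt = new URL(request.url).searchParams.get('prompt');
  if (!prompt) {
    return new Response('Missing ?prompt= query parameter', { status: 400 });
  }
  const output = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
    messages: buildMessages(prompt)
  });
  return new Response(extractText(output), {
    headers: { 'content-type': 'text/plain; charset=utf-8' }
  });
}
```

To use it, call `handleTextRequest(request, env)` from the `fetch` method of the default export.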

Streaming Response

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'user', content: 'Explain the future of AI' }
      ],
      stream: true
    });

    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' }
    });
  }
};
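On the client side, the streamed body has to be decoded and split into events. The sketch below assumes the stream uses the server-sent events format with one `data: {"response":"..."}` line per token and a final `data: [DONE]` line; verify this against the actual output of your model before relying on it. The names `parseSseChunk` and `readStream` are hypothetical.

```typescript
interface ParsedChunk {
  tokens: string[]; // text fragments found in complete lines
  done: boolean;    // true once the [DONE] sentinel was seen
  rest: string;     // trailing partial line to prepend to the next chunk
}

// Parse all complete lines in `buffer`; keep the last (possibly partial) line.
function parseSseChunk(buffer: string): ParsedChunk {
  const lines = buffer.split('\n');
  const rest = lines.pop() ?? '';
  const tokens: string[] = [];
  let done = false;

  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blank lines and comments
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    try {
      const parsed = JSON.parse(payload) as { response?: unknown };
      if (typeof parsed.response === 'string') tokens.push(parsed.response);
    } catch {
      // Ignore events that are not valid JSON.
    }
  }
  return { tokens, done, rest };
}

// Read a streamed response body to the end and return the concatenated text.
async function readStream(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let text = '';

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseChunk(buffer);
    text += parsed.tokens.join('');
    buffer = parsed.rest;
    if (parsed.done) break;
  }
  return text;
}
```

Keeping the unfinished last line in `rest` matters because network chunks can end in the middle of an event.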

Image Generation

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
      prompt: 'A futuristic city with flying cars, cyberpunk style',
      num_steps: 20
    });

    return new Response(response, {
      headers: { 'content-type': 'image/png' }
    });
  }
};

Image Analysis

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const imageData = await request.arrayBuffer();

    const response = await env.AI.run('@cf/llava-hf/llava-1.5-7b-hf', {
      image: [...new Uint8Array(imageData)],
      prompt: 'What is in this image?',
      max_tokens: 512
    });

    return Response.json(response);
  }
};

Speech Recognition

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const audioData = await request.arrayBuffer();

    const response = await env.AI.run('@cf/openai/whisper', {
      audio: [...new Uint8Array(audioData)]
    });

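    return Response.json({
      text: response.text,
      // Not every Whisper variant reports a detected language;
      // this field may be undefined and is then omitted from the JSON.
      language: response.detected_language
    });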
  }
};

Integration with Vectorize

// RAG (Retrieval Augmented Generation) implementation
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const question = await request.text();

    // Vectorize the question
    const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: question
    });

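    // Search for similar documents in the vector DB.
    // query() resolves to an object with a `matches` array; metadata is
    // only included when requested via returnMetadata.
    const results = await env.VECTORIZE.query(embedding.data[0], {
      topK: 3,
      returnMetadata: 'all'
    });

    // Have the LLM answer using the retrieved context
    const context = results.matches
      .map(m => String(m.metadata?.text ?? ''))
      .join('\n');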

    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: `Answer using the following context:\n${context}` },
        { role: 'user', content: question }
      ]
    });

    return Response.json(response);
  }
};
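The query side only works if the index already holds vectors whose metadata carries the original text. A minimal ingest sketch follows, assuming documents are embedded with the same model used for questions and stored under a `text` metadata key. The `Doc` shape and the names `toVectors` and `ingest` are hypothetical, and the small `IngestEnv` interface stands in for the real binding types so the snippet is self-contained.

```typescript
interface Doc {
  id: string;
  text: string;
}

interface VectorRecord {
  id: string;
  values: number[];
  metadata: { text: string };
}

// Structural stand-ins for the AI and Vectorize bindings.
interface IngestEnv {
  AI: { run(model: string, inputs: unknown): Promise<unknown> };
  VECTORIZE: { upsert(vectors: VectorRecord[]): Promise<unknown> };
}

// Pair each document with its embedding, keeping the text as metadata
// so it can be returned at query time.
function toVectors(docs: Doc[], embeddings: number[][]): VectorRecord[] {
  if (docs.length !== embeddings.length) {
    throw new Error('Number of embeddings does not match number of documents');
  }
  return docs.map((doc, i) => ({
    id: doc.id,
    values: embeddings[i],
    metadata: { text: doc.text }
  }));
}

// Embed all documents in one call and upsert them into the index.
// Returns the number of vectors written.
async function ingest(docs: Doc[], env: IngestEnv): Promise<number> {
  const output = (await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: docs.map(d => d.text)
  })) as { data: number[][] };

  const vectors = toVectors(docs, output.data);
  await env.VECTORIZE.upsert(vectors);
  return vectors.length;
}
```

Using the same embedding model for ingest and query keeps both sets of vectors in the same space, which is what makes the similarity search meaningful.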

Pricing

Pay-as-you-go:
- Text generation: $0.011 / 1,000 neurons
- Image generation: $0.01 / image
- Speech recognition: $0.01 / minute

Free tier:
- Up to 10,000 neurons per day free
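As a rough illustration of how the neuron figures combine, the sketch below estimates a daily cost from the numbers listed above. It assumes the free daily allocation is subtracted before billing and that the rest is charged at the per-1,000-neuron rate; the function name `estimateDailyCostUsd` is hypothetical, and actual rates should be checked against the current pricing page.

```typescript
// Figures taken from the pricing list above; they may change over time.
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1000_NEURONS = 0.011;

// Estimate the cost in USD for one day of usage, measured in neurons.
// Assumption: the free allocation is deducted first, the rest is billed.
function estimateDailyCostUsd(neuronsUsed: number): number {
  if (!Number.isFinite(neuronsUsed) || neuronsUsed < 0) {
    throw new RangeError('neuronsUsed must be a non-negative number');
  }
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

For example, 50,000 neurons in a day leaves 40,000 billable neurons, which works out to 40 × $0.011 = $0.44 under these assumptions. Usage at or below 10,000 neurons costs nothing.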

Deployment

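# wrangler.toml
[ai]
binding = "AI"

# Only needed for the Vectorize example; the index name is a placeholder
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-index"

# Deploy (run in a terminal, not part of wrangler.toml)
npx wrangler deploy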

Use Cases

✓ Chatbots
✓ Content generation
✓ Image processing pipelines
✓ Audio transcription
✓ Retrieval Augmented Generation (RAG)
✓ Content moderation

Summary

Cloudflare Workers AI runs AI inference on Cloudflare’s edge network. Low latency, global distribution, and a single env.AI.run() call per model make it a practical way to add text, image, and audio features to a Worker.
