Cloudflare Workers AI - Running AI Inference at the Edge

What is Workers AI

Cloudflare Workers AI is a service for running AI models on Cloudflare’s edge network. Inference runs inside Cloudflare’s network, close to the user, which keeps latency low and means requests do not have to be routed to a separate, centralized cloud region.

Supported Models

Text Generation (LLM)

Model	Features
Llama 3 8B	General purpose, high performance
Mistral 7B	Fast, efficient
Gemma 7B	Google developed, lightweight
Phi-2	Microsoft developed, compact

Image & Vision

Model	Use Case
Stable Diffusion XL	Image generation
LLaVA	Image understanding
CLIP	Image classification

Audio

Model	Use Case
Whisper	Speech recognition
TTS	Text-to-speech

Basic Usage

Text Generation

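// src/index.ts
// Env describes the bindings declared in wrangler.toml.
// The Ai type is provided by @cloudflare/workers-types.
interface Env {
  AI: Ai;
}

export default {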
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Tell me 3 benefits of TypeScript' }
      ]
    });

    return Response.json(response);
  }
};

Streaming Response

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'user', content: 'Explain the future of AI' }
      ],
      stream: true
    });

    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' }
    });
  }
};
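
A client that consumes the streamed response has to pull the generated tokens out of the event stream. The following is a minimal sketch of that step, assuming each event arrives as a `data: {"response":"..."}` line and the stream ends with `data: [DONE]`. That wire format is an assumption, not something stated in this article, and `extractTokens` is a hypothetical helper name.

```typescript
// Assumed wire format (not confirmed by the article):
//   data: {"response":"Hel"}
//   data: [DONE]
function extractTokens(sseChunk: string): string[] {
  const tokens: string[] = [];
  for (const line of sseChunk.split('\n')) {
    const trimmed = line.trim();
    // Skip blank separator lines and anything that is not a data field
    if (!trimmed.startsWith('data:')) continue;
    const payload = trimmed.slice('data:'.length).trim();
    // Stop at the end-of-stream sentinel
    if (payload === '[DONE]') break;
    try {
      const parsed = JSON.parse(payload) as { response?: unknown };
      if (typeof parsed.response === 'string') tokens.push(parsed.response);
    } catch {
      // Ignore payloads that are not complete JSON (e.g. an event split across chunks)
    }
  }
  return tokens;
}

const sampleStream =
  'data: {"response":"Hel"}\n\ndata: {"response":"lo"}\n\ndata: [DONE]\n\n';
const joinedText = extractTokens(sampleStream).join('');
console.log(joinedText); // prints "Hello"
```

In a real client an event can be split across network chunks, so incomplete lines would need to be buffered until the next chunk arrives; this sketch simply skips them.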

Image Generation

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', {
      prompt: 'A futuristic city with flying cars, cyberpunk style',
      num_steps: 20
    });

    return new Response(response, {
      headers: { 'content-type': 'image/png' }
    });
  }
};

Image Analysis

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const imageData = await request.arrayBuffer();

    const response = await env.AI.run('@cf/llava-hf/llava-1.5-7b-hf', {
      image: [...new Uint8Array(imageData)],
      prompt: 'What is in this image?',
      max_tokens: 512
    });

    return Response.json(response);
  }
};
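
The image and audio examples pass the request body to the model as a plain array of byte values. A small guard in front of that conversion can reject empty or oversized uploads before any inference is attempted. This is a sketch only: `toByteArray` is a hypothetical helper, and the 5 MiB default limit is an arbitrary assumption rather than a documented Workers AI limit.

```typescript
// Convert a request body to the number[] form used by the examples,
// rejecting empty or oversized input first.
// The 5 MiB default is an illustrative assumption, not a platform limit.
function toByteArray(buffer: ArrayBuffer, maxBytes: number = 5 * 1024 * 1024): number[] {
  if (buffer.byteLength === 0) {
    throw new Error('empty request body');
  }
  if (buffer.byteLength > maxBytes) {
    throw new Error(`body exceeds ${maxBytes} bytes`);
  }
  return Array.from(new Uint8Array(buffer));
}

// Usage with a 4-byte stand-in for an uploaded file
const demoBuffer = new ArrayBuffer(4);
new Uint8Array(demoBuffer).set([137, 80, 78, 71]);
const demoBytes = toByteArray(demoBuffer);
```

Inside a Worker, the result would take the place of `[...new Uint8Array(imageData)]`, with the thrown error mapped to a 400 response.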

Speech Recognition

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const audioData = await request.arrayBuffer();

    const response = await env.AI.run('@cf/openai/whisper', {
      audio: [...new Uint8Array(audioData)]
    });

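    // The transcript is returned as `text`. The documented output for this
    // model has no detected-language field, so the word count is returned instead.
    return Response.json({
      text: response.text,
      wordCount: response.word_count
    });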
  }
};

Integration with Vectorize

// RAG (Retrieval Augmented Generation) implementation
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const question = await request.text();

    // Vectorize the question
    const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: question
    });

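    // Search for similar documents in the vector DB.
    // query() resolves to an object whose `matches` array holds the results;
    // metadata is only included when returnMetadata is requested.
    const { matches } = await env.VECTORIZE.query(embedding.data[0], {
      topK: 3,
      returnMetadata: 'all'
    });

    // Have the LLM answer using the retrieved context
    const context = matches.map(m => m.metadata?.text ?? '').join('\n');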

    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: `Answer using the following context:\n${context}` },
        { role: 'user', content: question }
      ]
    });

    return Response.json(response);
  }
};
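
Joining every retrieved document into the prompt can pull in weak matches or overrun the model's context window. One possible refinement of the context-building step is sketched below, assuming each match carries a similarity `score` and an optional `metadata.text` string. The `ScoredMatch` shape, the `buildContext` helper, and both default thresholds are hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape for illustration: only the fields this helper reads.
interface ScoredMatch {
  score: number;
  metadata?: { text?: string };
}

// Keep matches above a score threshold, in the order given,
// until a character budget is used up.
function buildContext(
  matches: ScoredMatch[],
  minScore: number = 0.5,
  maxChars: number = 2000
): string {
  const parts: string[] = [];
  let used = 0;
  for (const m of matches) {
    const text = m.metadata?.text;
    // Drop weak matches and matches without stored text
    if (m.score < minScore || !text) continue;
    // Stop once the next document would exceed the budget
    if (used + text.length > maxChars) break;
    parts.push(text);
    used += text.length;
  }
  return parts.join('\n');
}

const demoContext = buildContext([
  { score: 0.9, metadata: { text: 'A' } },
  { score: 0.2, metadata: { text: 'B' } }, // below threshold
  { score: 0.8 },                          // no text stored
  { score: 0.7, metadata: { text: 'C' } }
]);
console.log(JSON.stringify(demoContext)); // prints "A\nC"
```

Suitable thresholds depend on the distance metric configured for the index and on the model's context length, so the defaults here should be treated as placeholders.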

Pricing

Pay-as-you-go:
- Text generation: $0.011 / 1,000 neurons
- Image generation: $0.01 / image
- Speech recognition: $0.01 / minute

Free tier:
- Up to 10,000 neurons per day free
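
To make the neuron figures concrete, the sketch below estimates one day's charge from the two numbers listed above. It assumes the free allowance is subtracted first and the remainder is billed at the per-1,000-neuron rate; the exact billing rules are not spelled out in this article, and `estimateDailyCostUsd` is a hypothetical helper.

```typescript
// Figures taken from the pricing list above
const PRICE_PER_1K_NEURONS_USD = 0.011;
const FREE_NEURONS_PER_DAY = 10000;

// Assumption: the daily free allowance is deducted before billing.
function estimateDailyCostUsd(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * PRICE_PER_1K_NEURONS_USD;
}

// 25,000 neurons in a day: 15,000 billable, i.e. 15 x $0.011
const demoCost = estimateDailyCostUsd(25000);
console.log(demoCost.toFixed(3)); // prints "0.165"
```

Under the same assumption, a day that stays at or below 10,000 neurons costs nothing.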

Deployment

# wrangler.toml
[ai]
binding = "AI"

# Deploy
npx wrangler deploy

Use Cases

✓ Chatbots
✓ Content generation
✓ Image processing pipelines
✓ Audio transcription
✓ Retrieval Augmented Generation (RAG)
✓ Content moderation

Summary

Cloudflare Workers AI runs AI inference on Cloudflare’s edge network. Low latency, global distribution, and a single env.AI.run() call shared across text, image, and audio models keep AI application development simple.
