AI & Machine Learning

Cloudflare's edge-native AI stack removes the usual deployment work. Run 50+ open-source models without provisioning GPUs or maintaining a model-serving framework: call ai.run() and the model is served from Cloudflare's network.

Workers AI: Inference Without Infrastructure

Workers AI gives you instant access to 50+ open-source models — LLMs, embeddings, image generation, and more — running on Cloudflare's global GPU network.

from js import Response

# The Workers runtime passes bindings (including AI) to the handler through env
async def on_fetch(request, env):
    ai = env.AI

    response = await ai.run(
        "@cf/meta/llama-3-8b-instruct",
        messages=[
            {"role": "user", "content": "Explain quantum computing"}
        ]
    )

    return Response.json(response)

AI via the Python SDK

from cloudflare import Cloudflare

client = Cloudflare()  # reads CLOUDFLARE_API_TOKEN from the environment by default

# Run inference from any Python application
result = client.ai.run(
    "@cf/meta/llama-3-8b-instruct",
    account_id="your-account-id",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is edge computing?"}
    ]
)

AI Translation

Translate text between languages using Meta's M2M100 model, which Workers AI exposes as @cf/meta/m2m100-1.2b:

from cloudflare import Cloudflare

client = Cloudflare()

# Translate a single string with Meta's M2M100 model
result = client.ai.run(
    "@cf/meta/m2m100-1.2b",
    account_id="your-account-id",
    text="Hello, how are you?",
    source_lang="english",
    target_lang="spanish",
)
print(result)  # response object wrapping the translation, e.g. "Hola, ¿cómo estás?"

# Batch translation
texts = ["Hello", "Goodbye", "Thank you"]
translations = []
for text in texts:
    result = client.ai.run(
        "@cf/meta/m2m100-1.2b",
        account_id="your-account-id",
        text=text,
        source_lang="english",
        target_lang="japanese",
    )
    translations.append(result)
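
The loop above issues one inference call per string, so repeated strings cost repeated calls. A minimal sketch of one way to avoid that, assuming a hypothetical translate_one callable that wraps the client.ai.run call shown above (the stub used here is purely illustrative and makes no network request):

```python
from typing import Callable, Dict, List


def translate_batch(texts: List[str], translate_one: Callable[[str], str]) -> List[str]:
    """Translate texts in order, calling translate_one once per distinct string."""
    cache: Dict[str, str] = {}
    for text in texts:
        if text not in cache:
            cache[text] = translate_one(text)
    return [cache[text] for text in texts]


# Hypothetical stand-in for a function that would call client.ai.run(...)
calls: List[str] = []


def fake_translate(text: str) -> str:
    calls.append(text)
    return text.upper()


translated = translate_batch(["Hello", "Goodbye", "Hello"], fake_translate)
```

Passing the translator in as a callable keeps the batching logic independent of the SDK, so it can be exercised locally before wiring in real inference calls.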

Vectorize: Edge-Native Vector Database

Building RAG applications or semantic search? Vectorize gives you a globally distributed vector database that works seamlessly with Workers AI.

Generate & Store Embeddings

# Inside a Python Worker handler, where env holds the AI and Vectorize bindings
embedding = await env.AI.run(
    "@cf/baai/bge-base-en-v1.5",
    text="Python is awesome"
)

await env.VECTORIZE_INDEX.insert([{
    "id": "1",
    "values": embedding.data[0],
    "metadata": {
        "language": "python",
        "sentiment": "positive"
    }
}])
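
An index is created with a fixed number of dimensions, so every vector inserted into it has to have that length. A minimal sketch of a pre-insert check, assuming the 768-dimensional output of @cf/baai/bge-base-en-v1.5; the build_record helper is hypothetical and not part of any Cloudflare API:

```python
from typing import Any, Dict, List

# Assumption: @cf/baai/bge-base-en-v1.5 returns 768-dimensional embeddings
EXPECTED_DIMS = 768


def build_record(
    doc_id: str,
    values: List[float],
    metadata: Dict[str, Any],
    expected_dims: int = EXPECTED_DIMS,
) -> Dict[str, Any]:
    """Return a record in the id/values/metadata shape used above, or raise."""
    if len(values) != expected_dims:
        raise ValueError(
            f"vector {doc_id!r} has {len(values)} dimensions, expected {expected_dims}"
        )
    return {"id": doc_id, "values": list(values), "metadata": dict(metadata)}


record = build_record("1", [0.0] * EXPECTED_DIMS, {"language": "python"})
```

Failing early with a clear message is usually easier to debug than a rejected insert further down the pipeline.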

Semantic Search via the SDK

from cloudflare import Cloudflare
from typing import List, Dict

client = Cloudflare()

# Create a vector index
index = client.vectorize.indexes.create(
    account_id="your-account-id",
    name="product-search",
    dimensions=768,  # Must match the embedding model: bge-base-en-v1.5 outputs 768 dimensions
    metric="cosine",
)

# Index documents with embeddings
def index_documents(documents: List[Dict]):
    vectors = []
    for doc in documents:
        # The default Cloudflare client is synchronous, so no await is needed
        embedding_response = client.ai.run(
            "@cf/baai/bge-base-en-v1.5",
            account_id="your-account-id",
            text=doc["content"],
        )
        vectors.append({
            "id": doc["id"],
            "values": embedding_response.data[0],
            "metadata": {
                "title": doc["title"],
                "content": doc["content"],
                "category": doc.get("category", "general"),
            }
        })

    # Batch insert
    client.vectorize.indexes.insert(
        account_id="your-account-id",
        index_name="product-search",
        vectors=vectors,
    )

# Search for similar documents
def search(query: str, top_k: int = 5):
    # Embed the query with the same model used at indexing time
    embedding_response = client.ai.run(
        "@cf/baai/bge-base-en-v1.5",
        account_id="your-account-id",
        text=query,
    )

    results = client.vectorize.indexes.query(
        account_id="your-account-id",
        index_name="product-search",
        vector=embedding_response.data[0],
        top_k=top_k,
    )
    return results
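
The index above is created with metric="cosine", which ranks vectors by the angle between them rather than by their magnitude. As an illustration only (Vectorize performs this ranking itself; the helper names below are hypothetical), the idea can be sketched in plain Python:

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query: List[float], vectors: Dict[str, List[float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs, highest similarity first."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional index; real embeddings have hundreds of dimensions
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.1]}
top = rank([1.0, 0.0], toy_index, top_k=2)
```

Here "c" outranks "b" even though it is a longer vector, because it points in nearly the same direction as the query.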

RAG Pattern: AI-Powered API

Combine Workers AI and Vectorize to build a complete Retrieval-Augmented Generation (RAG) pipeline:

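from js import Response

# The Workers runtime passes bindings (AI, Vectorize) to the handler through env
async def on_fetch(request, env):
    if request.method == "POST":
        # Convert the parsed JSON body into a Python dict so data["query"] works
        data = (await request.json()).to_py()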

        # Generate embedding for the query
        embedding = await env.AI.run(
            "@cf/baai/bge-base-en-v1.5",
            text=data["query"]
        )

        # Semantic search for relevant context; metadata is only
        # returned with matches when returnMetadata is requested
        results = await env.VECTORIZE_INDEX.query(
            embedding.data[0], topK=5, returnMetadata="all"
        )

        # Build context from matches
        context = "\n".join([
            r.metadata.content for r in results.matches
        ])

        # Generate response with context
        response = await env.AI.run(
            "@cf/meta/llama-3-8b-instruct",
            messages=[
                {"role": "system", "content": f"Context: {context}"},
                {"role": "user", "content": data["query"]}
            ]
        )

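        return Response.json({"answer": response.response})

    # Reject anything other than POST instead of falling through and returning None
    return Response.new("Method Not Allowed", status=405)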
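
The handler above joins every match into the system prompt, which can grow past what the model accepts. A minimal sketch of one way to cap the context, assuming matches arrive best-first; the helper names are hypothetical and the default 2,000-character budget is an arbitrary illustration, not a documented limit:

```python
from typing import Dict, List


def build_context(chunks: List[str], max_chars: int = 2000) -> str:
    """Join chunks in order, stopping before the character budget is exceeded."""
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        cost = len(chunk) + (1 if selected else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break
        selected.append(chunk)
        used += cost
    return "\n".join(selected)


def build_messages(query: str, chunks: List[str], max_chars: int = 2000) -> List[Dict[str, str]]:
    """Assemble the system and user messages in the shape used by the handler."""
    context = build_context(chunks, max_chars)
    return [
        {"role": "system", "content": f"Context: {context}"},
        {"role": "user", "content": query},
    ]


messages = build_messages("What is edge computing?", ["aaaa", "bbbb", "cccc"], max_chars=9)
```

A character budget is a rough stand-in for a token budget; a production version would more likely count tokens with the tokenizer of the model in use.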

Migration Reference

You're Using	Replace With	Why Switch
Transformers + CUDA	Workers AI	No GPU management, 50+ models ready
LangChain + OpenAI	LangChain + Workers AI	LangChain pre-installed, multiple models
ChromaDB / FAISS	Vectorize	Managed vector search, global distribution
Hugging Face Inference API	Workers AI	Lower latency, integrated platform