AI & Machine Learning
Cloudflare's edge-native AI stack removes the usual deployment work. Run 50+ open-source models with no GPU provisioning and no model-serving framework: call ai.run() and you're in production.
Workers AI: Inference Without Infrastructure
Workers AI gives you instant access to 50+ open-source models (LLMs, embeddings, image generation, and more) running on Cloudflare's global GPU network.
from js import Response

async def on_fetch(request, env):
    # Bindings such as AI arrive on the env object passed to the handler
    ai = env.AI
    response = await ai.run(
        "@cf/meta/llama-3-8b-instruct",
        messages=[
            {"role": "user", "content": "Explain quantum computing"}
        ]
    )
    return Response.json(response)

AI via the Python SDK
from cloudflare import Cloudflare

client = Cloudflare()

# Run inference from any Python application
result = client.ai.run(
    "@cf/meta/llama-3-8b-instruct",
    account_id="your-account-id",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is edge computing?"}
    ]
)

AI Translation
Translate text between languages using Meta's M2M100 model, served from the edge:
from cloudflare import Cloudflare

client = Cloudflare()

# Translate text using Meta's M2M model
result = client.ai.run(
    "@cf/meta/m2m100-1.2b",
    account_id="your-account-id",
    text="Hello, how are you?",
    source_lang="english",
    target_lang="spanish",
)
print(result)  # response carrying the translation, e.g. "Hola, ¿cómo estás?"

# Batch translation: one request per input string
texts = ["Hello", "Goodbye", "Thank you"]
translations = []
for text in texts:
    result = client.ai.run(
        "@cf/meta/m2m100-1.2b",
        account_id="your-account-id",
        text=text,
        source_lang="english",
        target_lang="japanese",
    )
    translations.append(result)

Vectorize: Edge-Native Vector Database
Building RAG applications or semantic search? Vectorize gives you a globally distributed vector database that works seamlessly with Workers AI.
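To make "semantic search" concrete: a vector index stores one embedding per item and, at query time, ranks the stored vectors by their similarity to the query embedding. The sketch below is a toy, in-memory illustration of cosine-similarity ranking in plain Python. It is not how Vectorize is implemented, and the names `cosine_similarity` and `top_k` are hypothetical helpers introduced only for this example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], index: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models return far more dimensions.
toy_index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

Here `matches` holds the two closest ids with their scores. A managed index does the same ranking at scale, typically with approximate rather than exhaustive search.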
Generate & Store Embeddings
# In a Python Worker, inside on_fetch(request, env)
embedding = await env.AI.run(
    "@cf/baai/bge-base-en-v1.5",
    text="Python is awesome"
)
await env.VECTORIZE_INDEX.insert([{
    "id": "1",
    "values": embedding.data[0],
    "metadata": {
        "language": "python",
        "sentiment": "positive"
    }
}])

Semantic Search via the SDK
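An index only accepts vectors whose length matches the `dimensions` it was created with, so a mismatch between the embedding model and the index is worth catching before any insert. Below is a minimal sketch of that check in plain Python, with a small batching helper. The helper names (`make_vector_record`, `chunked`) and the batch size are hypothetical, and the default of 768 assumes the output size of `@cf/baai/bge-base-en-v1.5`.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence

# Assumption: the index was created for 768-dimensional vectors, which is the
# output size of bge-base-en-v1.5. Change this to match your own index.
INDEX_DIMENSIONS = 768


def make_vector_record(
    doc_id: Any,
    values: Sequence[float],
    metadata: Optional[Dict[str, Any]] = None,
    dimensions: int = INDEX_DIMENSIONS,
) -> Dict[str, Any]:
    """Build an {id, values, metadata} record, rejecting wrong-sized embeddings."""
    if len(values) != dimensions:
        raise ValueError(
            f"embedding for {doc_id!r} has {len(values)} dimensions, "
            f"index expects {dimensions}"
        )
    return {
        "id": str(doc_id),
        "values": [float(v) for v in values],
        "metadata": dict(metadata or {}),
    }


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder embeddings stand in for real model output.
records = [
    make_vector_record(i, [0.0] * 768, {"title": f"doc {i}"}) for i in range(5)
]
batches = list(chunked(records, 2))  # batch size of 2 is arbitrary here
```

Each batch can then be passed to whichever insert call you use, so a bad embedding fails locally instead of midway through an upload.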
from cloudflare import Cloudflare
from typing import List, Dict

client = Cloudflare()

# Create a vector index
index = client.vectorize.indexes.create(
    account_id="your-account-id",
    name="product-search",
    dimensions=768,  # bge-base-en-v1.5 returns 768-dimensional embeddings
    metric="cosine",
)

# Index documents with embeddings
# (Cloudflare() is the synchronous client, so calls are not awaited)
def index_documents(documents: List[Dict]):
    vectors = []
    for doc in documents:
        embedding_response = client.ai.run(
            "@cf/baai/bge-base-en-v1.5",
            account_id="your-account-id",
            text=doc["content"],
        )
        vectors.append({
            "id": doc["id"],
            "values": embedding_response.data[0],
            "metadata": {
                "title": doc["title"],
                "content": doc["content"],
                "category": doc.get("category", "general"),
            }
        })
    # Batch insert
    client.vectorize.indexes.insert(
        account_id="your-account-id",
        index_name="product-search",
        vectors=vectors,
    )

# Search for similar documents
def search(query: str, top_k: int = 5):
    embedding_response = client.ai.run(
        "@cf/baai/bge-base-en-v1.5",
        account_id="your-account-id",
        text=query,
    )
    results = client.vectorize.indexes.query(
        account_id="your-account-id",
        index_name="product-search",
        vector=embedding_response.data[0],
        top_k=top_k,
    )
    return results

RAG Pattern: AI-Powered API
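The step that ties retrieval to generation is plain string assembly: the matched snippets become a context block in the system message, followed by the user's question. Here is a minimal sketch of that step, independent of any Cloudflare API. The helper names and the `max_chars` budget are hypothetical additions for illustration; the Worker in this section simply joins every match.

```python
from typing import Dict, List


def build_context(snippets: List[str], max_chars: int = 2000) -> str:
    """Join retrieved snippets, most relevant first, within a character budget."""
    kept: List[str] = []
    used = 0
    for snippet in snippets:
        text = snippet.strip()
        if not text:
            continue  # skip matches that carry no content
        cost = len(text) + (1 if kept else 0)  # +1 for the newline separator
        if used + cost > max_chars:
            break  # stop at the first snippet that would exceed the budget
        kept.append(text)
        used += cost
    return "\n".join(kept)


def build_messages(
    query: str, snippets: List[str], max_chars: int = 2000
) -> List[Dict[str, str]]:
    """Produce a chat-style message list with the context in the system message."""
    context = build_context(snippets, max_chars)
    return [
        {"role": "system", "content": f"Context: {context}"},
        {"role": "user", "content": query},
    ]


messages = build_messages(
    "What is edge computing?",
    [
        "Edge computing runs code close to users.",
        "",
        "Workers run on Cloudflare's network.",
    ],
)
```

Capping the context is one way to keep the prompt inside a model's input limit. Whether to truncate, re-rank, or summarize instead is a design choice this sketch leaves open.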
Combine Workers AI and Vectorize to build a complete Retrieval-Augmented Generation (RAG) pipeline:
from js import Response

async def on_fetch(request, env):
    if request.method == "POST":
        data = await request.json()
        # Generate embedding for the query
        embedding = await env.AI.run(
            "@cf/baai/bge-base-en-v1.5",
            text=data["query"]
        )
        # Semantic search for relevant context
        results = await env.VECTORIZE_INDEX.query(
            embedding.data[0], topK=5
        )
        # Build context from matches
        context = "\n".join([
            r.metadata.content for r in results.matches
        ])
        # Generate response with context
        response = await env.AI.run(
            "@cf/meta/llama-3-8b-instruct",
            messages=[
                {"role": "system", "content": f"Context: {context}"},
                {"role": "user", "content": data["query"]}
            ]
        )
        return Response.json({"answer": response.response})

Migration Reference
| You're Using | Replace With | Why Switch |
|---|---|---|
| Transformers + CUDA | Workers AI | No GPU management, 50+ models ready |
| LangChain + OpenAI | LangChain + Workers AI | LangChain pre-installed, multiple models |
| ChromaDB / FAISS | Vectorize | Managed vector search, global distribution |
| Hugging Face Inference API | Workers AI | Lower latency, integrated platform |