Many companies eventually need to create a document search system to let customers search a support wiki, help employees find answers in internal docs, or pull relevant clauses from a legal archive. This is a great use case for an AI document search agent.
But if you’ve been assigned to build one, the project sounds straightforward until you hit the challenging parts: embedding uploaded files without blocking HTTP traffic, streaming answers before the model finishes thinking, and keeping API keys out of version control.
This tutorial builds a document search agent using the Laravel AI SDK, which lets you connect your application to AI providers like OpenAI, Anthropic, and Gemini. The agent accepts file uploads, processes them asynchronously, stores vector embeddings for semantic search, and streams answers to the browser in real time.
We deploy it to Laravel Cloud because the infrastructure this agent requires comes preconfigured: dedicated queue workers, a pgvector-enabled PostgreSQL database, and encrypted environment variable storage for API keys.
What the Laravel AI SDK Gives You
The Laravel AI SDK is a first-party package that unifies OpenAI, Anthropic, Gemini, and many other providers behind a single interface. Install it with:
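Assuming the package is published on Packagist as laravel/ai (check the SDK's own documentation for the exact name), installation is a Composer command followed by a migration, since the SDK ships tables used for conversation history:

```shell
composer require laravel/ai
php artisan migrate
```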
Then add at least one provider key to .env:
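For example, with OpenAI as the provider (the key value below is a placeholder):

```ini
OPENAI_API_KEY=sk-your-key-here
```

Keys for Anthropic or Gemini follow the same pattern; add whichever providers you have accounts for.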
Three SDK features drive the architecture of this tutorial:
- Queue support: ->queue() dispatches an agent call as a background job and returns immediately, so the HTTP response does not wait on the model.
- Streaming: ->stream() returns a server-sent events response, so the browser receives tokens as the model generates them.
- Similarity search: the built-in SimilaritySearch tool embeds a user's query at runtime and runs a vector similarity query against your database, without you writing the SQL.
Each of those features requires infrastructure. A queue worker for ->queue(). A WebSocket server for ->broadcastOnQueue(). A vector-capable database for SimilaritySearch. You configure each one as part of the build, not as an afterthought.
Setting Up the Document Model
Create a migration with a vector column for storing embeddings. The index() call creates an approximate nearest-neighbor index, which makes similarity queries faster as your documents table grows:
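Here is a sketch of that migration. The table name, columns, and 1536-dimension count (matching OpenAI's text-embedding-3-small model) are assumptions, and the exact fluent syntax for the vector column and ANN index may differ slightly between Laravel versions:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('documents', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->text('content');
            // 1536 dimensions matches OpenAI's text-embedding-3-small output
            $table->vector('embedding', dimensions: 1536);
            $table->timestamps();

            // Approximate nearest-neighbor index keeps similarity
            // queries fast as the documents table grows
            $table->index('embedding', algorithm: 'hnsw');
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('documents');
    }
};
```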
The vector column type requires the pgvector extension. Laravel's cloud hosting platform, Laravel Cloud, bundles pgvector with its managed PostgreSQL database, so vector search works in production without any extra database configuration. For local development, add the extension to your Docker Compose setup.
Add the embedding cast to the Document model so Eloquent handles array serialization correctly:
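A minimal Document model with the cast might look like this. The SDK may ship a dedicated vector cast class; the plain array cast here is an assumption:

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Document extends Model
{
    protected $fillable = ['title', 'content', 'embedding'];

    protected function casts(): array
    {
        return [
            // Round-trips the embedding between a PHP array
            // and the database's vector column
            'embedding' => 'array',
        ];
    }
}
```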
When a user uploads a file, do not generate the embedding inside the HTTP request. A 10-page PDF typically takes three to five seconds to chunk and embed. Dispatch a job instead:
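A sketch of the upload endpoint; the controller name is an assumption, and real PDF text extraction is elided (this example stores the raw upload contents):

```php
<?php

namespace App\Http\Controllers;

use App\Jobs\ProcessDocument;
use App\Models\Document;
use Illuminate\Http\Request;

class DocumentController extends Controller
{
    public function store(Request $request)
    {
        $request->validate(['file' => ['required', 'file']]);

        // Create the record without an embedding; the job fills it in later
        $document = Document::create([
            'title' => $request->file('file')->getClientOriginalName(),
            'content' => $request->file('file')->get(),
        ]);

        // Embed on a queue worker, not inside this HTTP request
        ProcessDocument::dispatch($document);

        // 202 Accepted: processing continues in the background
        return response()->json(['id' => $document->id], 202);
    }
}
```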
Inside ProcessDocument, use the AI SDK's Embeddings class to generate and store the vector:
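The job might look like the following. The Embeddings class comes from the SDK, but the namespace and method chain shown here are assumptions; consult the SDK docs for the real signature:

```php
<?php

namespace App\Jobs;

use App\Models\Document;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use Laravel\Ai\Embeddings; // namespace is an assumption

class ProcessDocument implements ShouldQueue
{
    use Queueable;

    public int $timeout = 120; // embedding can exceed the 60-second default

    public function __construct(public Document $document) {}

    public function handle(): void
    {
        // Generate the vector for this document's text
        // (method names are assumptions)
        $embedding = Embeddings::for($this->document->content)->generate();

        $this->document->update(['embedding' => $embedding]);
    }
}
```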
The ProcessDocument job runs on a queue worker, not on a web server process. That separation is the point. Embedding a batch of 50 documents can take several minutes, and that work should not compete with incoming HTTP requests for CPU time.
Building the Document Search Agent
Generate the agent class with the following Artisan command:
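Assuming the SDK registers a make:agent generator (the command name is an assumption):

```shell
php artisan make:agent DocumentSearchAgent
```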
The agent needs two capabilities: a SimilaritySearch tool to retrieve relevant chunks from the documents table, and the RemembersConversations trait so users can ask follow-up questions without restating context:
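A sketch of the agent class. SimilaritySearch, RemembersConversations, and the minSimilarity(0.6) threshold come from the SDK as described in this tutorial; the namespaces, the in() constructor, and the instructions()/tools() method names are assumptions:

```php
<?php

namespace App\Agents;

use App\Models\Document;
use Laravel\Ai\Concerns\RemembersConversations; // namespace assumed
use Laravel\Ai\Tools\SimilaritySearch;          // namespace assumed

class DocumentSearchAgent
{
    use RemembersConversations;

    public function instructions(): string
    {
        return 'Answer using only the retrieved document excerpts. '
            .'If no excerpt is relevant, say you could not find an answer.';
    }

    public function tools(): array
    {
        return [
            // Embeds the user's query, searches the documents table,
            // and drops matches below 0.6 similarity
            SimilaritySearch::in(Document::class)->minSimilarity(0.6),
        ];
    }
}
```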
At runtime, SimilaritySearch embeds the user's query and runs whereVectorSimilarTo on the documents table automatically. The minSimilarity(0.6) threshold filters out weak matches before they reach the model.
The RemembersConversations trait stores conversation history in the database using the tables migrated during installation. Start a new conversation with forUser() and continue it on subsequent requests with continue():
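In practice that could look like this; make() and prompt() are assumed method names, while forUser() and continue() are the SDK methods described above:

```php
use App\Agents\DocumentSearchAgent;

// First request: start a conversation for the authenticated user
$response = DocumentSearchAgent::make()
    ->forUser($request->user())
    ->prompt('What does the refund policy say about digital goods?');

// Follow-up request: resume the same conversation by its id,
// so "that" below resolves against the earlier exchange
$response = DocumentSearchAgent::make()
    ->continue($conversationId)
    ->prompt('Does that apply to annual subscriptions too?');
```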
Streaming Responses to the Browser
A document search agent typically takes three to eight seconds to run the similarity query, retrieve chunks, and compose a response. A blank page for that duration drives users away. Stream the response instead.
Return ->stream() directly from a route:
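A minimal route sketch; the path and the assumed make() call shape are illustrative:

```php
use App\Agents\DocumentSearchAgent;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::get('/ask', function (Request $request) {
    // Returning the stream emits a server-sent events response
    return DocumentSearchAgent::make()
        ->forUser($request->user())
        ->stream($request->query('question'));
});
```

A GET route is used here because the browser's EventSource API only issues GET requests.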
The Laravel AI SDK sends a server-sent events response automatically when you return the stream from a route. On the frontend, open an EventSource connection to receive tokens as they arrive.
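A frontend sketch of that connection; the /ask endpoint and the #answer element are assumptions, and the exact event payload shape depends on the SDK:

```javascript
// Open an SSE connection to the streaming route
const question = 'What does the refund policy say about digital goods?';
const source = new EventSource('/ask?question=' + encodeURIComponent(question));

const answerEl = document.querySelector('#answer');

source.onmessage = (event) => {
  // Append each token chunk as it arrives
  answerEl.textContent += event.data;
};

// The server closes the stream when the answer is complete
source.onerror = () => source.close();
```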
For workloads where you want the agent to run in a background queue and broadcast tokens as they generate, use ->broadcastOnQueue():
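For example (the call shape around broadcastOnQueue() itself is an assumption):

```php
use App\Agents\DocumentSearchAgent;

// Dispatches the agent call to the queue and returns immediately;
// tokens are broadcast over WebSockets as the model generates them
DocumentSearchAgent::make()
    ->forUser($request->user())
    ->broadcastOnQueue($request->input('question'));
```

The frontend then listens on the user's broadcast channel with Laravel Echo instead of an EventSource connection.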
This requires Laravel Reverb for the WebSocket layer. On Laravel Cloud, Reverb runs inside your application cluster. Add REVERB_APP_KEY, REVERB_APP_SECRET, and REVERB_HOST to your environment's variable settings, and broadcasting works without additional infrastructure setup.
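The values below are placeholders:

```ini
REVERB_APP_KEY=your-app-key
REVERB_APP_SECRET=your-app-secret
REVERB_HOST=your-app.example.com
```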
Running Queue Workers on Laravel Cloud
Both the ProcessDocument job and queued agent calls require active queue workers. In development, php artisan queue:work handles this. In production, the queue configuration determines whether your AI feature holds up under load.
Create a dedicated worker cluster in Laravel Cloud rather than attaching workers as background processes to your app cluster. Embedding jobs and agent calls are memory-intensive and long-running. When they share a cluster with your web processes, a spike in document uploads slows every incoming HTTP request. Dedicated workers eliminate that contention.
Configure the worker process with an extended timeout:
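For example, the dedicated worker's process command might be:

```shell
php artisan queue:work --timeout=120
```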
Set --timeout=120 because agent calls and embedding jobs routinely exceed the default 60-second limit. A job that silently times out produces confusing failures. Set the timeout higher than your slowest expected job and let the queue driver surface the error clearly.
One scaling consideration: each application replica spawns the configured background processes independently. If you configure 10 queue:work processes across five replicas, you end up with 50 total workers. For most applications, start with two or three worker processes per replica and scale based on queue depth.
Set OPENAI_API_KEY and other provider keys in the Laravel Cloud environment settings panel, not in a .env file committed to your repository. Laravel Cloud encrypts those variables at rest and injects them into every process in the environment, including the worker cluster. That applies to preview environments too, so you can test agent behavior against a real queue worker before the code reaches production.
After each deployment, Laravel Cloud restarts queue workers automatically, so you do not need to run php artisan queue:restart yourself.
The Cloud CLI lets you handle all of this from your terminal. Install it globally and authenticate:
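The exact install command is an assumption here; check the Laravel Cloud documentation for the current distribution method:

```shell
# Package name is an assumption
composer global require laravel/cloud-cli

# Authenticate the CLI with your Laravel Cloud account
cloud login
```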
On your first deploy, run cloud ship. It reads your .env file, detects installed packages, including Laravel Reverb, and walks you through environment setup step by step. After that, redeploying is a single command:
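Assuming ship reuses the saved configuration on later runs, a redeploy is just:

```shell
cloud ship
```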
You can also set or update environment variables without leaving the terminal:
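The subcommand name here is an assumption; run cloud --help for the actual syntax:

```shell
# Set a provider key in the target environment (subcommand name assumed)
cloud env:set OPENAI_API_KEY=sk-your-key-here
```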
Ship It Today
The agent you built here processes files asynchronously, retrieves context with vector search, streams answers to the browser, and persists conversation history across requests. Each of those capabilities required an infrastructure decision, and those decisions are now part of the application rather than a note at the bottom of a README.
The Laravel AI SDK provides ->queue(), ->stream(), ->broadcastOnQueue(), SimilaritySearch, and RemembersConversations as first-class primitives. Laravel Cloud provides dedicated worker clusters, encrypted environment variables, automatic worker restarts, and preview environments that match production exactly.
New Laravel Cloud accounts receive $5 in usage credits. That is enough to deploy this agent, spin up a managed PostgreSQL database with pgvector, and verify that the queue workers and vector search behave correctly in a real environment before you commit to anything.
To go further with the AI SDK, Building Multi-Agent Workflows with the Laravel AI SDK covers orchestrating multiple agents together. Are you more of a video learner? Here’s a list of all our videos about AI and Laravel.
