Agent Framework

RAG · Python · Apache 2.0

Haystack Agents

by deepset · deepset-ai/haystack

Haystack Agents is the agent layer of deepset's production-focused LLM framework. It lets you compose tool-using agents and RAG pipelines from typed components — retrievers, rerankers, generators, tools — with swappable backends and first-class evaluation.

Compared to more freewheeling frameworks, Haystack emphasizes stable contracts between components, explicit pipeline graphs, and tight feedback loops for quality. If you are shipping LLM features into a real product — not just demoing — Haystack's bias toward production ergonomics pays back quickly.

Stars: 18k+ (active since 2019)

Backed by: deepset (Series B funded)

Language: Python 3.9+ (typed components)

License: Apache 2.0 (open source)

Quick Install

pip install haystack-ai

Key Features

Typed Components

Every retriever, reranker, generator, and tool declares input/output sockets. Pipelines validate connections at build time, not runtime.

Agent Primitive

The Agent class wraps a ChatGenerator and a toolset, running the tool-call loop until a final message — no boilerplate agent state machines required.
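To make the loop concrete, here is a framework-free sketch of a tool-call loop in plain Python. It is not Haystack's implementation: `scripted_model` stands in for a ChatGenerator and simply requests one tool call before answering, and every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Message:
    role: str  # "user", "assistant", or "tool"
    text: str = ""
    tool_call: Optional[ToolCall] = None


def run_tool_loop(
    model: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., str]],
    messages: List[Message],
    max_steps: int = 5,
) -> List[Message]:
    """Call the model, execute any requested tool, and repeat until a plain text reply."""
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if reply.tool_call is None:
            return messages  # final assistant message: stop looping
        result = tools[reply.tool_call.name](**reply.tool_call.arguments)
        messages.append(Message(role="tool", text=str(result)))
    raise RuntimeError("no final answer within max_steps")


def scripted_model(messages: List[Message]) -> Message:
    """Stand-in for an LLM: asks for the weather tool once, then answers."""
    tool_results = [m for m in messages if m.role == "tool"]
    if not tool_results:
        return Message(role="assistant", tool_call=ToolCall("get_weather", {"city": "Berlin"}))
    return Message(role="assistant", text=f"Forecast: {tool_results[-1].text}")


def get_weather(city: str) -> str:
    return f"sunny in {city}"


history = run_tool_loop(scripted_model, {"get_weather": get_weather}, [Message("user", "Weather in Berlin?")])
print(history[-1].text)  # Forecast: sunny in Berlin
```

The `max_steps` cap is a design choice worth keeping in any real loop: it bounds cost when a model keeps requesting tools without converging.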

Production Evaluation

Built-in evaluators for retrieval (MRR, recall) and generation (faithfulness, semantic similarity) let you measure pipeline quality before deploying.

Swappable LLMs

Generators for OpenAI, Anthropic, Cohere, Hugging Face, Ollama, vLLM, and any OpenAI-compatible endpoint. Swap without touching the pipeline wiring.

Document Stores

First-class integrations with Qdrant, Weaviate, Pinecone, Elasticsearch, OpenSearch, pgvector, and more — all behind the same DocumentStore interface.

Pipeline Serialization

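Export any pipeline to YAML with Pipeline.dumps() for reproducible deployments, then load it in production with Pipeline.loads(). The YAML captures component settings and connections, so the deployment config contains no application code.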

Execution Brief

Use this page as a rollout checklist, not just reference text.

Tool Mapping Lens

Organize Tools by Workflow Phase

Catalog-oriented pages work best when users can map discovery, evaluation, and rollout in a clear path instead of reading an undifferentiated list.

Define the job-to-be-done first
Group tools by stage
Prioritize by adoption friction

Actionable Utility Module

Skill Implementation Board

Use this board for Haystack Agents before rollout. Capture the inputs, apply one decision rule, execute the checklist, and log the outcome.

Input: Objective

Deliver one measurable improvement to a production RAG or agent workflow built with Haystack Agents.

Input: Baseline Window

20-30 minutes

Input: Fallback Window

8-12 minutes

Decision Trigger	Action	Expected Output
One workflow objective and a release owner are defined	Run a preview execution with fixed acceptance criteria.	Go or hold decision backed by repeatable evidence.
Output quality falls below baseline or retries increase	Limit scope, isolate the root issue, and rerun a controlled test.	One confirmed correction path before wider rollout.
Checks pass for two consecutive replay windows	Promote to broader traffic with the fallback path active.	Stable rollout with low operational surprise.
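The table reduces to a small decision function. This is one possible reading of it, not a prescribed implementation: `PreviewRun`, `decide`, and the idea of comparing quality and retry counts against a baseline are assumptions, and the return values reuse the rollout|patch|hold vocabulary from the output template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreviewRun:
    quality: float  # e.g. share of eval questions answered acceptably
    retries: int    # retries or manual corrections observed in the run
    passed: bool    # did the run meet the fixed acceptance criteria?


def decide(baseline_quality: float, baseline_retries: int, runs: List[PreviewRun]) -> str:
    """Map preview runs to a next step: 'patch', 'rollout', or 'hold'."""
    if not runs:
        return "hold"  # no evidence yet
    latest = runs[-1]
    # Quality below baseline or more retries: isolate and fix before going wider.
    if latest.quality < baseline_quality or latest.retries > baseline_retries:
        return "patch"
    # Checks passed in two consecutive replay windows: promote.
    if len(runs) >= 2 and all(run.passed for run in runs[-2:]):
        return "rollout"
    # Otherwise keep collecting evidence.
    return "hold"
```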

Execution Steps

Record objective, owner, and stop condition.
Execute one controlled preview run.
Measure quality, latency, and correction burden.
Promote only when pass criteria are stable.

Output Template

tool=haystack-agents
objective=
preview_result=pass|fail
primary_metric=
next_step=rollout|patch|hold
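If you log these records from scripts, a small helper can enforce the allowed values. A hypothetical sketch: `render_record` and the `haystack-agents` tool identifier are illustrative assumptions, not part of any Haystack API.

```python
ALLOWED = {
    "preview_result": {"pass", "fail"},
    "next_step": {"rollout", "patch", "hold"},
}


def render_record(objective: str, preview_result: str, primary_metric: str,
                  next_step: str, tool: str = "haystack-agents") -> str:
    """Render one log record in key=value form, rejecting values outside the template."""
    fields = {
        "tool": tool,
        "objective": objective,
        "preview_result": preview_result,
        "primary_metric": primary_metric,
        "next_step": next_step,
    }
    for key, allowed in ALLOWED.items():
        if fields[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {fields[key]!r}")
    return "\n".join(f"{key}={value}" for key, value in fields.items())


print(render_record("reduce escalations on SSO tickets", "pass", "faithfulness=0.93", "rollout"))
```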

What Is Haystack Agents?

Haystack is a Python framework by deepset for building LLM-powered applications — search, RAG, agents, and hybrids of all three. Haystack Agents specifically refers to the agent-building primitives introduced in Haystack 2.x, which layer tool-calling behavior on top of the framework's typed Pipeline and Component model.

The framework's core philosophy is composability with contracts. Each component declares its input and output types explicitly, and the Pipeline validates that connections are well-formed at construction time. This catches a whole class of wiring bugs that plague more dynamic frameworks, and makes Haystack pipelines safer to evolve over months of production use.

Haystack Agents give you two primary shapes: a ReAct-style reasoning loop for open-ended tool use, and a tool-call loop driven by native OpenAI / Anthropic function-calling APIs. The Agent class handles the loop, message formatting, and tool invocation so you can focus on defining tools and guarding them with validation.

The deepset team has invested heavily in evaluation. Haystack includes evaluators for retrieval quality, answer correctness, and end-to-end faithfulness. Because evaluators are ordinary components, scoring a pipeline takes a short script rather than a separate tool, which means teams can tie CI pipelines to quality thresholds. Most agent frameworks still leave that as an exercise to the reader.

How to Get Better Results with Haystack Agents

Install Haystack with pip install haystack-ai. Add the integrations you need, for example pip install qdrant-haystack for Qdrant or pip install ollama-haystack for local models served by Ollama. Each integration is a separate package, so the core stays lean.

Define your tools. A tool is a Python function decorated with @tool or wrapped in a Tool instance, with a clear name, description, and parameters schema. The agent uses the description to decide when to call the tool, so write it like a product spec, not a dev comment.

Instantiate a ChatGenerator (OpenAIChatGenerator, AnthropicChatGenerator, OllamaChatGenerator, etc.) and pass it, together with your tools, to Agent. Call agent.run(messages=[ChatMessage.from_user("...")]) and read the final assistant message from the end of the returned messages list.

Wrap the agent in a Pipeline if you need RAG. Connect a retriever and reranker upstream, have them hand context to the agent, and optionally add a Router downstream for structured outputs. Serialize the pipeline to YAML for deployment.

Treat this page as a decision map. Build a shortlist fast, then run a focused second pass for security, ownership, and operational fit.

When a team keeps one shared selection rubric, tool adoption speeds up because evaluators stop debating criteria every time a new option appears.

Worked Examples

Internal support agent over company docs

Index your help-center articles into Qdrant via a Haystack indexing pipeline
Build a query pipeline: text embedder -> QdrantEmbeddingRetriever -> TransformersSimilarityRanker -> Agent
Define tools: search_tickets (Zendesk API), create_jira_issue, send_email
Back the agent with an OpenAIChatGenerator using gpt-4o-mini and give it a clear system prompt
User asks "ticket 12345 is about SSO — find the root cause and file a Jira"
Agent retrieves SSO docs from Qdrant, calls search_tickets, reasons, and calls create_jira_issue with a composed summary

Outcome: An internal agent that combines documentation retrieval with real-world tool use, running on a stack your ops team can deploy and monitor like any other FastAPI service.

Quality-gated production deployment

You have a Haystack RAG pipeline ready for production
Write a labeled eval set of 200 Q/A pairs drawn from real user questions
Run the pipeline over the eval set and score its outputs with retrieval and faithfulness evaluators (e.g. DocumentMRREvaluator and FaithfulnessEvaluator)
CI asserts retrieval MRR@5 > 0.75 and faithfulness > 0.9 before merging
A PR that swaps in a cheaper LLM fails the gate and gets rejected automatically
Only pipelines meeting the bar ship to production — quality regressions are caught pre-merge
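The gate itself can be a few lines of plain Python in the CI job. A hypothetical sketch: it assumes an earlier step has already computed `mrr` and `faithfulness` scores for the eval set, and it only encodes the thresholds from the example (strictly greater than 0.75 and 0.9).

```python
THRESHOLDS = {"mrr": 0.75, "faithfulness": 0.9}


def quality_gate(scores: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for metric, minimum in thresholds.items():
        value = scores.get(metric)
        # A missing metric counts as a failure so a broken eval step cannot pass silently.
        if value is None or value <= minimum:
            failures.append(f"{metric}={value} (needs > {minimum})")
    return failures


def exit_code(scores: dict) -> int:
    """Map gate results to a process exit code for CI (0 = pass, 1 = fail)."""
    failures = quality_gate(scores)
    for failure in failures:
        print(f"quality gate failed: {failure}")
    return 1 if failures else 0
```

In the CI step, `sys.exit(exit_code(scores))` turns a failed gate into a failed check, which is what blocks the merge.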

Outcome: A quality safety net that turns "LLM in production" from a gut feeling into a measurable metric gated by CI.

Frequently Asked Questions

What is Haystack and Haystack Agents?

Haystack is an open-source Python framework by deepset for building LLM applications — including RAG pipelines, search systems, and tool-using agents. Haystack Agents is the agent layer introduced in Haystack 2.x that lets you assemble a ReAct-style or tool-calling agent on top of Haystack's typed Pipeline and Component primitives. It focuses on production concerns: evaluation, observability, deployment, and swappable backends.

How does Haystack compare to LangChain and LlamaIndex?

LangChain has the widest integration surface and fastest-moving ecosystem, but its abstractions have shifted repeatedly. LlamaIndex is focused on retrieval and index structures. Haystack's strength is typed components, explicit pipelines, and a production ethos inherited from deepset's search background. If you value clear contracts, stable APIs, and strong evaluation tooling, Haystack is often the steadier choice for production.

How do I build an agent with Haystack?

Install haystack-ai via pip. Create tool functions and wrap them with @tool or the Tool class. Instantiate a ChatGenerator (OpenAI, Anthropic, local vLLM, etc.), pass it and the tools to Agent, then call agent.run(messages=[...]). The agent loops internally, calling tools until it produces a final assistant message. Pipelines let you wrap an agent alongside retrievers for hybrid RAG + agent flows.

Does Haystack support local LLMs?

Yes. Haystack integrates with Ollama, Hugging Face Transformers, llama.cpp, vLLM, and any OpenAI-compatible endpoint. Swap the generator component and the rest of the pipeline stays the same. This makes Haystack particularly well-suited to on-prem or regulated environments where you cannot call a hosted API.

What does evaluation look like in Haystack?

Haystack ships evaluators for retrieval (recall, MRR, NDCG), generation (semantic answer similarity, faithfulness, context relevance), and end-to-end pipeline metrics. You define a labeled eval set once and score every pipeline change against it with the same evaluator components. That tight loop between build and measure is one of Haystack's biggest production advantages.

Is Haystack suitable for enterprise deployments?

Yes. deepset offers deepset Cloud (hosted Haystack) and deepset Studio (a no-code builder that exports Haystack pipelines). Self-hosted, Haystack runs well in FastAPI services behind an API gateway. It has first-class support for pipeline serialization to YAML, making deployments reproducible across dev, staging, and prod environments.
