Web Research Agent¶
Runs an end-to-end web research workflow for a given query and returns a structured report plus references. You can constrain research to specific URLs and/or include local documents, and choose separate LLM and embedding models to balance speed, output quality, and strategic reasoning. The node composes a report request from your inputs and calls a backend research service to generate results.

Usage¶
Use this node when you need a concise or in-depth research report with cited sources. Typical flow: provide your research query, choose a report type and format, optionally add task context, and decide whether to target specific URLs, include local documents, or allow broader web search. Adjust model choices for fast summaries versus deep reasoning. The output includes the final report and a references list.
Inputs¶
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| query | True | STRING | Your research question or topic. This is the main prompt that the agent investigates. | What are the latest trends and risks in using RAG for enterprise search? |
| report_type | True | CHOICE | The style/purpose of the report. Options are the available keys from report_types (e.g., overview, deep_dive, competitive_analysis; exact options depend on environment). | deep_dive |
| report_format | True | CHOICE | The output format for the report. Options are the available keys from report_formats (e.g., markdown, bullet_summary; exact options depend on environment). | markdown |
| task_context | False | STRING | Additional context to tailor the research (e.g., form inputs, audience, constraints). Appended to the query to guide the agent. | Audience: CTOs in fintech; Emphasize compliance and cost trade-offs. |
| embed_model | True | CHOICE | Embedding model used to retrieve relevant context. Provided as a choice from available embedding models. | text-embedding-3-small |
| fast_llm_model | True | CHOICE | LLM for fast operations like quick summaries. | gpt-4o-mini |
| smart_llm_model | True | CHOICE | LLM for higher-quality reasoning and composing the main report. | gpt-4o-2024-08-06 |
| strategic_llm_model | True | CHOICE | LLM for strategy-heavy tasks (planning the approach, complex reasoning). | o1-preview |
| source_urls | False | STRING | Limit research to specific sources (no page traversal). Provide a single URL or a comma-separated list. | https://openai.com/research, https://arxiv.org |
| combine_source_urls_with_web | False | BOOLEAN | If true, also perform a general web search in addition to the provided source_urls. | true |
| local_documents_path | False | STRING | Filesystem path to a folder of local documents to include in the research corpus. | /workspace/docs/rag-evaluations |
| combine_local_documents_with_web | False | BOOLEAN | If true, also perform web search in addition to local documents. | false |
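The node composes a report request from these inputs before calling the backend. The sketch below illustrates how such a request might be assembled; the function name, field names, and request shape are assumptions for illustration, not the node's actual schema.

```python
def compose_report_request(query, report_type, report_format,
                           task_context=None, source_urls=None):
    """Assemble a research request dict from node inputs (illustrative sketch)."""
    # task_context, when present, is appended to the query to guide the agent
    full_query = f"{query}\n\nContext: {task_context}" if task_context else query
    request = {
        "query": full_query,
        "report_type": report_type,
        "report_format": report_format,
    }
    if source_urls:
        # source_urls arrives as a comma-separated string; split and trim it
        request["source_urls"] = [u.strip() for u in source_urls.split(",") if u.strip()]
    return request

req = compose_report_request(
    "What are the latest trends and risks in using RAG for enterprise search?",
    "deep_dive",
    "markdown",
    task_context="Audience: CTOs in fintech",
    source_urls="https://openai.com/research, https://arxiv.org",
)
print(req["source_urls"])  # -> ['https://openai.com/research', 'https://arxiv.org']
```

Note how the comma-separated `source_urls` string becomes a list of trimmed URLs, and how `task_context` is folded into the query rather than sent as a separate field.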
Outputs¶
| Field | Type | Description | Example |
|---|---|---|---|
| report | STRING | The generated research report in the selected format. | # RAG in Enterprise Search: 2025 Outlook 1. Summary... |
| references | STRING | A serialized list or text block of sources cited in the report. | ["https://arxiv.org/abs/2401.12345", "https://openai.com/blog/..." ] |
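Because `references` is delivered as a string that may hold either a serialized JSON array or a plain text block, downstream nodes typically need to normalize it. A minimal sketch, assuming no particular serialization is guaranteed:

```python
import json

def parse_references(references: str) -> list[str]:
    """Normalize the references output into a list of source strings.

    Tries JSON first (the serialized-list case), then falls back to
    treating the value as a newline-separated text block.
    """
    try:
        parsed = json.loads(references)
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    except json.JSONDecodeError:
        pass
    # Plain-text fallback: one source per non-empty line
    return [line.strip() for line in references.splitlines() if line.strip()]

refs = parse_references('["https://arxiv.org/abs/2401.12345", "https://example.com/post"]')
print(refs)  # -> ['https://arxiv.org/abs/2401.12345', 'https://example.com/post']
```

The same function handles a text-block payload such as `"https://a.example\nhttps://b.example"`, returning one entry per line.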
Important Notes¶
- Report scope: If only source_urls are provided and web search is not combined, research is constrained to those URLs; adding local_documents_path enables a hybrid mode that includes local files.
- Input formatting: source_urls must be comma-separated; no automatic site crawling or pagination is performed for a base URL.
- Context handling: task_context is appended to the query to guide the agent’s focus and tone.
- Model selection: The node internally prefixes model names with a provider identifier before sending them to the backend (e.g., openai:model-name). Ensure chosen models are available in your environment.
- Service dependency: This node calls an external research service and may take up to ~180 seconds to complete; failures from the service propagate as errors.
- Access limits: This node may be subject to workspace or plan limits; exceeding limits can block execution.
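The provider-prefixing behavior mentioned above can be pictured with a short sketch. The helper name and the exact prefixing rule are assumptions for illustration; only the `openai:model-name` shape comes from the notes:

```python
def prefix_model(model_name: str, provider: str = "openai") -> str:
    """Prefix a bare model name with a provider identifier for the backend,
    e.g. 'gpt-4o-mini' -> 'openai:gpt-4o-mini' (illustrative sketch)."""
    if ":" in model_name:
        # Already carries a provider prefix; leave it untouched
        return model_name
    return f"{provider}:{model_name}"

print(prefix_model("gpt-4o-mini"))  # -> openai:gpt-4o-mini
```

The practical takeaway is that you select bare model names in the node's inputs; the provider qualifier is added for you, so a model that does not exist under that provider will fail at the backend.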
Troubleshooting¶
- Non-200 error from research service: Verify your workspace has access and that the backend endpoint is configured. Try again later or simplify the query.
- Timeouts or long waits: Reduce scope (fewer source_urls, shorter query), choose faster models, or try again during lower load.
- Empty or low-quality results: Provide clearer task_context, pick a more capable smart_llm_model, add targeted source_urls or local documents.
- Invalid source_urls parsing: Ensure URLs are comma-separated with no trailing commas and include a scheme (https://).
- Local documents not included: Confirm local_documents_path exists and is accessible; set combine_local_documents_with_web if you also expect web sources.
- Unexpected format: Check the selected report_format and report_type; choose a different option better aligned with your needs.
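To catch `source_urls` problems before running the node, a pre-flight check like the following can help. This is a hypothetical validator, not part of the node; it applies the rules above (comma-separated entries, no trailing commas, `http`/`https` scheme required):

```python
from urllib.parse import urlparse

def validate_source_urls(raw: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated source_urls string and report malformed entries."""
    valid, problems = [], []
    for part in raw.split(","):
        url = part.strip()
        if not url:
            problems.append("empty entry (trailing or doubled comma)")
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"missing or unsupported scheme: {url}")
            continue
        valid.append(url)
    return valid, problems

urls, issues = validate_source_urls("https://openai.com/research, arxiv.org,")
print(urls)    # -> ['https://openai.com/research']
print(issues)  # -> ['missing or unsupported scheme: arxiv.org',
               #     'empty entry (trailing or doubled comma)']
```

Running this before execution surfaces the two most common failures from the list above: bare hostnames without `https://` and stray trailing commas.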