Web URLs

Fetches content from web URLs indexed via the Carbon Web Scraper integration. Given selected Carbon file IDs, it queries the Carbon service and returns a single concatenated string containing the scraped text, segmented per file.

Usage

Use this node to pull text content from one or more web URLs that were previously indexed with the Carbon Web Scraper. It is typically placed early in a workflow to gather web-sourced data for downstream tasks such as retrieval-augmented generation, summarization, or analysis. Provide the Carbon file IDs (from a file/URL picker) and, optionally, a query to filter matching chunks on the server side.

Inputs

Field	Required	Type	Description	Example
file_ids	True	CARBON_FILE_IDS	JSON string of Carbon file IDs associated with the Web Scraper integration. These refer to the web URLs whose content should be retrieved.	["102938", "574839", "998877"]
query	False	STRING	Optional server-side filter to narrow the selection of returned content (e.g., matching keywords within the indexed data).	error handling best practices
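
As a minimal sketch of the file_ids format described in the table, the snippet below serializes a list of IDs into the JSON string the node expects. The variable names are illustrative and the IDs are the example values from the table; in a real workflow the picker normally supplies this string.

```python
import json

# Example IDs copied from the table above; in practice they come from the file/URL picker.
selected_ids = ["102938", "574839", "998877"]

# file_ids is passed as a JSON string, not as a native list.
file_ids = json.dumps(selected_ids)
print(file_ids)
```

Using json.dumps rather than hand-writing the string avoids quoting mistakes such as single quotes or trailing commas, which are not valid JSON.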

Outputs

Field	Type	Description	Example
Output	STRING	A single concatenated string containing the retrieved content. Each file’s content is wrapped in a simple tag structure, for example: ....	...content... ...content chunk...
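
If a downstream step needs one string per file, the wrappers can be split apart again. The sketch below is a hedged illustration only: the exact tag names the node emits are not shown here, so the tag is a parameter, and the name "file" in the sample is a hypothetical placeholder. Check a real Output value and substitute the actual tag.

```python
import re


def split_sections(output: str, tag: str) -> list[str]:
    """Return the text inside each <tag>...</tag> wrapper, in document order."""
    name = re.escape(tag)
    # Non-greedy match so adjacent wrappers are captured separately;
    # DOTALL lets the content span multiple lines.
    pattern = re.compile(rf"<{name}\b[^>]*>(.*?)</{name}>", re.DOTALL)
    return [body.strip() for body in pattern.findall(output)]


# Hypothetical sample: "file" is a placeholder tag name, not the node's confirmed format.
sample = "<file>first page text</file>\n<file>second page text</file>"
sections = split_sections(sample, "file")
print(sections)
```

A regular expression is enough for flat, non-nested wrappers. If the real output nests tags inside one another, a proper parser would be a safer choice.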

Important Notes

Integration binding: This node is bound to the Web Scraper integration; ensure the provided file_ids correspond to web URLs indexed by Carbon.
Input format: file_ids must be a valid JSON string (e.g., an array of IDs). Invalid JSON will cause the node to fail.
Hidden identity: salt_user_id is injected by the platform and should not be manually set.
Output structure: The result is a single STRING containing multiple sections, not a list of strings.
Filtering: The optional query input filters results on the server side before returning content.
Timeouts: The request to the Carbon service has a default timeout; large requests or service issues may cause timeouts.
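
To catch a malformed file_ids value before the node runs, a small pre-check like the one below can help. This is a sketch under assumptions: the helper name parse_file_ids is hypothetical, and requiring a JSON array is inferred from the examples on this page rather than from a documented schema.

```python
import json


def parse_file_ids(raw: str) -> list[str]:
    """Decode a file_ids JSON string and return the IDs as strings."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"file_ids is not valid JSON: {exc.msg}") from exc
    # Assumption: the node expects a JSON array of IDs, as in the examples.
    if not isinstance(value, list):
        raise ValueError("file_ids must be a JSON array of IDs")
    return [str(item) for item in value]


print(parse_file_ids('["123", "456"]'))
```

Running the check upstream turns a node failure into an early, readable error message.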

Troubleshooting

Invalid file_ids JSON: If you see a JSON decoding error, ensure file_ids is a properly formatted JSON string (e.g., ["123", "456"]).
Empty output: If the Output is empty, verify that the provided file IDs exist and belong to the Web Scraper integration, and that your query (if set) is not overly restrictive.
Service request failed: Network or service errors indicate the Carbon service may be unreachable or down. Retry later and verify service health and connectivity.
Unexpected content format: The output is a single concatenated string with wrappers. If a downstream node expects a list, add a parsing step to split by tags.
Slow or timed-out requests: Reduce the number of file IDs, narrow overly broad queries, or try again later. Very large datasets can increase response time.
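
One way to reduce the number of file IDs per request is to split a long ID list into smaller batches and run the node once per batch. The helper below is a hypothetical sketch: batch_file_ids and the batch size are illustrative, and whether several runs suit your workflow depends on how the outputs are combined downstream.

```python
import json
from typing import Iterator


def batch_file_ids(ids: list[str], batch_size: int) -> Iterator[str]:
    """Yield file_ids JSON strings, each holding at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield json.dumps(ids[start:start + batch_size])


# Illustrative IDs only; each yielded string can be used as one file_ids input.
batches = list(batch_file_ids(["1", "2", "3", "4", "5"], batch_size=2))
print(batches)
```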