Web URLs
Creates a Carbon connector configured for the Web Scraper integration, which ingests content from one or more web URLs through the standard Carbon data-connector flow. Inputs, outputs, sync behavior, and error handling are all inherited from the base Carbon data node.

Usage
Use this node when you need to ingest public web pages into your Carbon data workspace for downstream search, RAG, or analytics. Place it at the beginning of a data ingestion workflow, configure the standard Carbon connector settings (as exposed by the base Carbon data node), and connect its output to nodes that index, process, or consume the ingested content.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| urls | True | Not specified | One or more web URLs to scrape and ingest. Exact format and multiplicity are defined by the base Carbon data node. | https://example.com, https://example.com/docs/* |
| connector_authentication | False | Not specified | Standard Carbon connector credentials or token input as defined by the base Carbon data node. | Not specified |
| ingestion_options | False | Not specified | Common ingestion settings (e.g., crawl depth, include/exclude patterns, scheduling) exposed by the base Carbon data node, if available. | Not specified |
| destination_dataset | False | Not specified | Target dataset or collection within Carbon where scraped content will be stored, as defined by the base Carbon data node. | web_knowledge_base |
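
The `urls` example above shows a comma-separated list, but the format the node actually accepts is defined by the base Carbon data node. As a rough sketch under that assumption, a list like this could be cleaned and checked before it is passed to the node. The helper name `parse_url_list` is hypothetical and is not part of Carbon or of this node.

```python
from urllib.parse import urlparse


def parse_url_list(raw: str) -> list[str]:
    """Split a comma-separated string into cleaned, de-duplicated web URLs.

    Assumption: entries are separated by commas, as in the example column.
    Raises ValueError for entries that are not absolute http(s) URLs.
    """
    urls: list[str] = []
    for entry in raw.split(","):
        candidate = entry.strip()
        if not candidate:
            continue  # tolerate trailing commas and blank entries
        parts = urlparse(candidate)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {candidate!r}")
        if candidate not in urls:
            urls.append(candidate)  # keep first occurrence, preserve order
    return urls


if __name__ == "__main__":
    print(parse_url_list("https://example.com, https://example.com/docs/*"))
```

Wildcard entries such as `https://example.com/docs/*` pass this check unchanged. Whether the node treats them as patterns depends on the base Carbon data node.
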
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| connector_output | Not specified | The standard output produced by Carbon data connector nodes (e.g., a handle or reference to the created/updated dataset or sync job). Exact type and structure are defined by the base Carbon data node. | Not specified |
Important Notes
- This node is a thin wrapper that selects the Web Scraper integration. All core behavior (inputs, outputs, validation, and execution) is defined by the base Carbon data node.
- The display name for this node is "Web URLs".
- Any required authentication, workspace, or dataset configuration must be provided via the standard inputs exposed by the base Carbon data node.
- The exact input and output types are not specified in this file and may vary with the version of the base Carbon data node.
Troubleshooting
- If inputs are unclear or missing, refer to the base Carbon data node documentation for the full list of required/optional parameters.
- If ingestion fails, verify that the URLs are accessible and valid, and check that your Carbon credentials (if required) are correctly configured.
- If no data appears downstream, ensure the destination dataset/workspace exists and that the sync job completed successfully as reported by the base Carbon data node.
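
One way to act on the "accessible and valid" check above is to probe each URL before running the workflow. The following is a minimal sketch that uses only the Python standard library and does not call Carbon. The names `http_status` and `find_unreachable` are hypothetical helpers, and the `fetch` parameter exists only so the probe can be swapped out or stubbed.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def find_unreachable(
    urls: list[str],
    fetch: Callable[[str], int] = http_status,
) -> dict[str, str]:
    """Map each URL that fails the probe to a short reason."""
    problems: dict[str, str] = {}
    for url in urls:
        try:
            status = fetch(url)
        except OSError as err:  # covers URLError, DNS failures, timeouts
            problems[url] = f"request failed: {err}"
            continue
        if status >= 400:
            problems[url] = f"HTTP {status}"
    return problems
```

An empty result from `find_unreachable([...])` means every URL answered with a status below 400. Some servers reject `HEAD` requests, so a reported `HTTP 405` does not necessarily mean the page is unavailable to the scraper. Wildcard patterns should be reduced to a concrete URL before probing.
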