Web URLs

Creates a Carbon connector configured for the Web Scraper integration. It ingests content from one or more web URLs through the standard data-connector flow of the base Carbon data node, from which it inherits its inputs, outputs, sync behavior, and error handling.

Usage

Use this node to ingest public web pages into your Carbon data workspace for downstream search, RAG, or analytics. Place it at the start of a data ingestion workflow, configure the standard Carbon connector settings exposed by the base Carbon data node, and connect its output to nodes that index, process, or consume the ingested content.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| urls | True | Not specified | One or more web URLs to scrape and ingest. Exact format and multiplicity are defined by the base Carbon data node. | https://example.com, https://example.com/docs/* |
| connector_authentication | False | Not specified | Standard Carbon connector credentials or token input as defined by the base Carbon data node. | |
| ingestion_options | False | Not specified | Common ingestion settings (e.g., crawl depth, include/exclude patterns, scheduling) exposed by the base Carbon data node, if available. | Not specified |
| destination_dataset | False | Not specified | Target dataset or collection within Carbon where scraped content will be stored, as defined by the base Carbon data node. | web_knowledge_base |
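
As a rough illustration of the inputs listed above, the values could be gathered and checked before they are handed to the node. This is only a sketch: the field names and the required/optional split come from the table, but the dictionary shape, the `check_inputs` helper, and the example values are assumptions and not the node's actual API.

```python
# Hypothetical pre-flight check. Field names mirror the Inputs table;
# the dict layout and this helper are assumptions for illustration only.
REQUIRED_FIELDS = {"urls"}
OPTIONAL_FIELDS = {
    "connector_authentication",
    "ingestion_options",
    "destination_dataset",
}


def check_inputs(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []
    # Required fields must be present and non-empty.
    for name in sorted(REQUIRED_FIELDS):
        if not inputs.get(name):
            problems.append(f"missing required input: {name}")
    # Flag names that do not appear in the table at all.
    for name in sorted(set(inputs) - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unknown input: {name}")
    return problems


example_inputs = {
    "urls": ["https://example.com", "https://example.com/docs/*"],
    "destination_dataset": "web_knowledge_base",
}
print(check_inputs(example_inputs))
```

Because the real types are defined by the base Carbon data node, a check like this can only catch missing or misspelled field names, not wrongly typed values.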

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| connector_output | Not specified | The standard output produced by Carbon data connector nodes (e.g., a handle or reference to the created/updated dataset or sync job). Exact type and structure are defined by the base Carbon data node. | Not specified |

Important Notes

  • This node is a thin wrapper that selects the Web Scraper integration. All core behavior (inputs, outputs, validation, and execution) is defined by the base Carbon data node.
  • The display name for this node is "Web URLs".
  • Any required authentication, workspace, or dataset configuration must be provided via the standard inputs exposed by the base Carbon data node.
  • The exact input and output types are not specified in this file and may vary with the version of the base Carbon data node.

Troubleshooting

  • If inputs are unclear or missing, refer to the base Carbon data node documentation for the full list of required/optional parameters.
  • If ingestion fails, verify that the URLs are accessible and valid, and check that your Carbon credentials (if required) are correctly configured.
  • If no data appears downstream, ensure the destination dataset/workspace exists and that the sync job completed successfully as reported by the base Carbon data node.
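
The URL validity check mentioned above can be done locally before running the node. The sketch below uses only Python's standard library; `find_invalid_urls` is a hypothetical helper, not part of Carbon or this node. It checks syntax only (an http or https scheme plus a host), so a wildcard path such as `/docs/*` passes, and it does not confirm that a page is actually reachable.

```python
from urllib.parse import urlparse


def find_invalid_urls(urls):
    """Return the entries that are not well-formed http(s) URLs."""
    invalid = []
    for url in urls:
        try:
            parts = urlparse(url.strip())
        except ValueError:
            # urlparse rejects some malformed hosts, e.g. unbalanced brackets.
            invalid.append(url)
            continue
        # Require an http(s) scheme and a non-empty host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            invalid.append(url)
    return invalid


candidates = [
    "https://example.com",
    "https://example.com/docs/*",
    "example.com",             # no scheme
    "ftp://example.com/file",  # unsupported scheme for this check
]
print(find_invalid_urls(candidates))
```

Any entries the helper returns are worth correcting before looking into credentials or dataset configuration.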