Package Configuration

The xdr_package.yaml file is the central configuration file for the XDR Data Engine. It defines the system's behavior, data processing pipelines, and schema configurations.

Global Settings

global_settings:
  ingestion_template_s3_bucket: "development-config-bucket-afterburner"
  schema_common_version: "v001.001.005"
  schema_output_path: "../.xdr_schema_output/"
  ingestion_pipeline_output_path: "../.xdr_ingestion_pipelines_output"
  upload_ingestion_template_output_path: "../.xdr_ingestion_template_output"
  default_profile: "ghostburner"
  target_path: "~/.xdr/xdr_targets.yaml"

Organizations Configuration

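Define each organization and its cluster under the organisations key (note the British spelling of the key). In this example cluster_name is left empty: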

organisations:
  - org_id: "org123"
    cluster_name: ""
  - org_id: "org456"
    cluster_name: ""

Schema Configuration

Build Settings

build_schemas:
  no_cluster_declarations_needed: true
  use_replicated_merge_tree: false

apply_schemas:
  do_add_roles: false
  do_add_columns: true

Schema Definitions

schemas:
  logs_alerts:
    name: "logs_alerts"
    meta_schema: "logs_alerts.csv"
    meta_schema_version: "v001.000.000"
    derived_schema_file_path: "logs_alerts/logs_alerts_sub.csv"
    additional_fields_config: "logs_alerts/logs_alerts_add.csv"

Ingestion Pipeline Configuration

Pipeline Globals

ingestion_pipeline_globals:
  ingestion_data_dir: "/vector-data-dir"
  ingestion_mtls_path: "/etc/vector_tls"
  kafka_mtls_path: "/etc/vector_mtls"

Pipeline Definition

Each pipeline has a name and a list of one or more stages. Each stage has its own name, description, and config:

ingestion_pipelines:
  - name: "logs_alerts"
    stages:
      - name: "finalise"
        description: "Finalization stage"
        config:
          kafka_source_topic: "logs_alerts"
          kafka_sink_topic: "logs_alerts_load"
          # ... additional configuration
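
To illustrate a pipeline with more than one stage, here is a minimal sketch. The "normalise" stage and the "logs_alerts_raw" topic are hypothetical names introduced for illustration, and the idea that one stage's sink topic feeds the next stage's source topic is an assumption, not something this document confirms:

```yaml
ingestion_pipelines:
  - name: "logs_alerts"
    stages:
      # Hypothetical first stage, shown only to illustrate the list shape.
      - name: "normalise"
        description: "Normalisation stage (illustrative)"
        config:
          kafka_source_topic: "logs_alerts_raw"   # hypothetical topic name
          kafka_sink_topic: "logs_alerts"         # assumed to feed the next stage
      # The stage from the example above, unchanged.
      - name: "finalise"
        description: "Finalization stage"
        config:
          kafka_source_topic: "logs_alerts"
          kafka_sink_topic: "logs_alerts_load"
```

Only the keys already shown in this document are used here; any further per-stage configuration is omitted.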

Vector Templates

List the Vector transformation templates, each with a file name and a version:

vector_templates:
  core:
    - name: "000-source-file.yml"
      version: "v001.000.000"
    - name: "001-source-kafka-aws-saas.yml"
      version: "v001.000.000"

Best Practices

  1. Version Management

    • Use semantic versioning for schemas and templates, written in the zero-padded form used throughout this file (for example v001.000.000)

    • Document version changes

    • Maintain backward compatibility

  2. Pipeline Configuration

    • Group related transformations into stages

    • Use descriptive names for pipelines and stages

    • Document pipeline dependencies

  3. Schema Management

    • Keep schema definitions organized

    • Use consistent naming conventions

    • Document schema relationships

  4. Resource Management

    • Configure appropriate resource limits

    • Monitor pipeline performance

    • Adjust configurations based on usage patterns

Common Tasks

Adding a New Schema

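  1. Define the schema as a new entry in the schemas section and set its name

  2. Specify the meta schema file and version (meta_schema, meta_schema_version)

  3. Add the derived schema file, if any (derived_schema_file_path)

  4. Configure additional fields if needed (additional_fields_config)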
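
The steps above can be sketched as a new entry next to logs_alerts. The schema name "logs_network" and its file paths are hypothetical; they only mirror the naming pattern of the existing entry and would need to match real files in your package:

```yaml
schemas:
  logs_alerts:
    name: "logs_alerts"
    meta_schema: "logs_alerts.csv"
    meta_schema_version: "v001.000.000"
    derived_schema_file_path: "logs_alerts/logs_alerts_sub.csv"
    additional_fields_config: "logs_alerts/logs_alerts_add.csv"
  # Hypothetical new schema, following the same layout as logs_alerts.
  logs_network:
    name: "logs_network"                                            # step 1
    meta_schema: "logs_network.csv"                                 # step 2
    meta_schema_version: "v001.000.000"                             # step 2
    derived_schema_file_path: "logs_network/logs_network_sub.csv"   # step 3
    additional_fields_config: "logs_network/logs_network_add.csv"   # step 4
```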

Creating a New Pipeline

  1. Add a pipeline definition under ingestion_pipelines

  2. Configure the required stages (name, description, config)

  3. Define the transformation steps

  4. Set appropriate resource limits
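
As a rough sketch of steps 1 and 2, a second pipeline could be added alongside logs_alerts as shown here. The pipeline name "logs_network" and its topic names are hypothetical. The keys for transformation steps and resource limits (steps 3 and 4) are not shown in this document, so they are left as a placeholder comment rather than guessed:

```yaml
ingestion_pipelines:
  - name: "logs_alerts"
    stages:
      - name: "finalise"
        description: "Finalization stage"
        config:
          kafka_source_topic: "logs_alerts"
          kafka_sink_topic: "logs_alerts_load"
  # Hypothetical new pipeline, mirroring the structure of logs_alerts.
  - name: "logs_network"
    stages:
      - name: "finalise"
        description: "Finalization stage"
        config:
          kafka_source_topic: "logs_network"        # hypothetical topic name
          kafka_sink_topic: "logs_network_load"     # hypothetical topic name
          # Transformation steps and resource limits go here; their keys
          # are not documented in this section.
```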

Updating Vector Templates

  1. Add the new template under vector_templates

  2. Specify its version number

  3. Reference the template in the pipeline configurations

  4. Test the transformations before deployment
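
Steps 1 and 2 might look like the following. The file name "002-transform-example.yml" is a hypothetical placeholder that follows the numbering pattern of the existing entries. How a pipeline stage refers to a template (step 3) is not shown in this document, so it is not illustrated here:

```yaml
vector_templates:
  core:
    - name: "000-source-file.yml"
      version: "v001.000.000"
    - name: "001-source-kafka-aws-saas.yml"
      version: "v001.000.000"
    # Hypothetical new template entry.
    - name: "002-transform-example.yml"
      version: "v001.000.000"
```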
