CLI Beta Features

Introduction

Welcome to the Vector Ingestion Pipeline Builder! This tool facilitates the configuration and deployment of ingestion pipelines using Vector.

Description

The Vector Ingestion Pipeline Builder simplifies the process of configuring and deploying ingestion pipelines by providing a clean and intuitive command-line interface (CLI).

Configuration Files

To get started with the Vector Ingestion Pipeline Builder, follow these steps:

  1. Configuration: Create an xdr_package.yaml file with the necessary settings, and define a target configuration file (xdr_targets.yaml) for each environment.

  2. Usage: Use the provided CLI commands to build ingestion pipelines and to upload configurations, templates, and MaxMind GeoIP databases to your desired storage location.

xdr_package.yaml

The xdr_package.yaml file serves as the central configuration file for the Vector Ingestion Pipeline Builder. It contains global settings, such as file paths and bucket names, as well as definitions for ingestion pipelines and templates.

Example xdr_package.yaml format:

global_settings:
  ingestion_template_s3_bucket: ghostburner-config-bucket
  ingestion_pipeline_output_path: ../.xdr_ingestion_pipelines_output 
  default_target: localhost
  target_path: ~/.xdr/xdr_targets.yaml


ip_config_standard_enrichment:
  # Paths to standard enrichment files
  - name: hypersec-enrichment-ch-array-fields.csv
    version: 100.000.000 
  # Other enrichment configurations...

ip_config_receiver:
  # Receiver configuration paths
  - name: hypersec-receiver-event-category-map.csv
    version: 100.000.000 
  # Other receiver configurations...

vector_enrichment_maxmind_geoip:
- name: GeoLite2-Country.mmdb
  version: 100.000.000
- name: GeoLite2-City.mmdb
  version: 100.000.000
- name: GeoLite2-ASN.mmdb
  version: 100.000.000

vector_templates:
  core:
  - name: 000-source-file.yml
    version: 100.000.000
  - name: 000-source-test-file-with-receiver-emulation.yml
    version: 100.000.000
  - name: 001-source-kafka-aws-saas.yml
    version: 100.000.000
  - name: 101-transform-flatten-message.yml
    version: 100.000.000

ingestion_pipeline_globals:
  ingestion_data_dir: "/vector-data-dir"    # Tells Vector pods where to store checkpoint and state data.
  ingestion_mtls_path: "/etc/vector_tls"
  kafka_mtls_path: "/etc/vector_mtls"
  kafka_brokers: "b-2.ghostburner.ra5n0s.c3.kafka.ap-southeast-2.amazonaws.com:9094,b-3.ghostburner.ra5n0s.c3.kafka.ap-southeast-2.amazonaws.com:9094,b-1.ghostburner.ra5n0s.c3.kafka.ap-southeast-2.amazonaws.com:9094"

ingestion_pipelines:
  - name: logs_nxlog_windows
    stages:
      - name: finalise
        description: "Finalization stage for processing Windows NXLog data"
        config:
          kafka_source_topic: "logs_nxlog_windows"
          kafka_source_topic_suffix: "_land"
          kafka_sink_topic: "logs_nxlog_windows_load"
          kafka_consumer_group: "vector-logs-nxlog-windows-finalize"
          container_config:
            min_replicas: 1
            max_replicas: 9
            target_memory_util_percentage: 125
            target_CPU_utilization_percentage:  125
            persistence_size: 1Gi
          config_files:
            ip_fields: '/standard_enrichment_files/hypersec-enrichment-ip-fields.csv'
            extract_fields_csv: '/standard_enrichment_files/hypersec-enrichment-extract-fields.csv'
            field_setter_iteration_group: 'finalise'
          steps:
            - id: hs-xdr-vector-ct-all-main
              type: base
            - id: 001-source-kafka-aws-saas
              type: source
            - id: 101-transform-flatten-message
              type: transform
            - id: 104-transform-extract-fields
              type: transform
            - id: 105-transform-junk-filter
              type: filter
            - id: 115-transform-timestamp-load-field
              type: transform
            - id: 119-transform-split-ip-field
              type: transform
            - id: 129-transform-event-hash-field
              type: transform
            - id: 202-sink-kafka-aws-saas
              type: sink
      - name: load_ch
        description: "ClickHouse loading stage"
        config:
          kafka_source_topic: "logs_nxlog_windows"
          kafka_source_topic_suffix: "_load"
          kafka_consumer_group: "vector_logs_nxlog_windows-load-ch-1"
          container_config:
            min_replicas: 1
            max_replicas: 1
            target_memory_util_percentage: 125
            targetCPUUtilizationPercentage:  125
            persistence_size: 1Gi
          config_files:
            json_fields: '/standard_enrichment_files/hypersec-enrichment-json-fields.csv'
            supported_timestamp_format: '["%FT%X%.3fZ", "%FT%X%.6fZ", "%FT%X%.9fZ", "%F %X%.3f", "%F %X%.6f", "%F %X%.9f","%FT%X%.3f", "%FT%X%.6f", "%FT%X%.9f","%FT%XZ","%FT%X","%F %X", "%FT%X"]'
            remap_fields: '/standard_enrichment_files/hypersec-enrichment-ch-remap-fields.csv'
          steps:
            - id: hs-xdr-vector-ct-all-main
              type: base
            - id: 001-source-kafka-aws-saas
              type: source
            - id: 101-transform-flatten-message
              type: transform
            - id: 108-transform-ch-json-remap
              type: transform
            - id: 102-transform-remap-fields
              type: transform
            - id: 116-transform-fields-to-camel-case
              type: transform
            - id: 109-transform-ch-custom-transformations
              type: transform
            - id: 201-sink-clickhouse-saas
              type: sink
      - name: load_os
        description: "OpenSearch loading stage"
        config:
          kafka_source_topic: "logs_nxlog_windows"
          kafka_source_topic_suffix: "_load"
          kafka_consumer_group: "vector-logs-nxlog-windows-load-os-1"
          container_config:
            min_replicas: 1
            max_replicas: 1
            target_memory_util_percentage: 125
            targetCPUUtilizationPercentage:  125
            persistence_size: 1Gi
          config_files:
            remap_fields: '/standard_enrichment_files/hypersec-enrichment-ch-remap-fields.csv'
            opensearch_endpoint: 'https://vpc-ghostburner-vc3t7e5zqeeu4nsa6vrcz6hwfy.ap-southeast-2.es.amazonaws.com'
          steps:
            - id: hs-xdr-vector-ct-all-main
              type: base
            - id: 001-source-kafka-aws-saas
              type: source
            - id: 101-transform-flatten-message
              type: transform
            - id: 111-transform-os-set-data-stream-fields
              type: transform
            - id: 102-transform-remap-fields
              type: transform
            - id: 116-transform-fields-to-camel-case
              type: transform
            - id: 204-sink-opensearch-aws-saas-stream
              type: sink
      - name: load_s3
        description: "S3 loading stage"
        config:
          kafka_source_topic: "logs_nxlog_windows_load"
          kafka_source_topic_suffix: ""
          kafka_consumer_group: "vector-logs-nxlog-windows-load-s3"
          container_config:
            min_replicas: 1
            max_replicas: 1
            target_memory_util_percentage: 125
            targetCPUUtilizationPercentage:  125
            persistence_size: 1Gi
          sink_config:
            sink_aws_s3_bucket_name: 'ghostburner-archive'
            sink_aws_s3_region: 'ap-southeast-2'
            sink_aws_s3_batch_max_bytes: "21474836480"
            sink_aws_s3_batch_timeout_secs: "3600"
          steps:
            - id: hs-xdr-vector-ct-all-main
              type: base
            - id: 001-source-kafka-aws-saas
              type: source
            - id: 101-transform-flatten-message
              type: transform
            - id: 207-sink-aws-s3
              type: sink
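
In every stage above, the steps list begins with a base template, follows with a source, runs any number of transforms or filters, and ends with a sink. A minimal sketch of a validator for that ordering (the function name and the rule itself are assumptions drawn from the examples, not a documented contract of the tool):

```python
def validate_stage_steps(steps):
    """Check that a stage's steps follow the base -> source ->
    transforms/filters -> sink ordering seen in the xdr_package.yaml
    examples (an observed convention, not a documented contract)."""
    types = [step["type"] for step in steps]
    if len(types) < 3:
        return False
    return (
        types[0] == "base"
        and types[1] == "source"
        and types[-1] == "sink"
        and all(t in ("transform", "filter") for t in types[2:-1])
    )

# An abridged version of the "finalise" stage from the example above
finalise_steps = [
    {"id": "hs-xdr-vector-ct-all-main", "type": "base"},
    {"id": "001-source-kafka-aws-saas", "type": "source"},
    {"id": "101-transform-flatten-message", "type": "transform"},
    {"id": "105-transform-junk-filter", "type": "filter"},
    {"id": "202-sink-kafka-aws-saas", "type": "sink"},
]
```

A stage that lacks a sink, or that places a transform before its source, would fail this check.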

Target Configuration Files

Target configuration files provide specific settings for different environments, such as development, staging, and production. These files override settings defined in the xdr_package.yaml file to tailor the pipeline setup for each environment.

Example xdr_targets.yaml format:

default_target: dev
targets:
  dev:
    # Development environment configurations
    ch_host: localhost
    ch_port: 8123
    hunt_config_path: ../.xdr_hunt_config/first_ten_windows_audit_hunt/hunt
    hunt_rules_path: ../.xdr_hunt_config/first_ten_windows_audit_hunt/rules
    ip_config_bucket_name: xdr_config_bucket
    ip_config_bucket_region: ap-southeast-2
    ip_config_standard_enrichment_path: /etc/vector/vector_templates/standard_enrichment_files
    ip_config_geo_ip_path: vector_templates/geoip
    ip_config_receiver_path: vector_templates/vector_receiver
    ip_templates_path: /etc/vector/vector_templates
    # Other environment configurations....
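
Conceptually, resolving the effective configuration is a shallow merge: start from the package-level settings and let the selected target's keys win. A hypothetical sketch of that merge (resolve_settings is an illustrative name, and the sample values are placeholders, not part of the CLI):

```python
def resolve_settings(global_settings, targets, target_name):
    # Start from the package-level defaults, then let the selected
    # target's values override them (shallow, key-by-key merge).
    merged = dict(global_settings)
    merged.update(targets.get(target_name, {}))
    return merged

# Illustrative values only: ch_port 9000 stands in for a package-level default.
package_globals = {"default_target": "localhost", "ch_port": 9000}
targets = {"dev": {"ch_host": "localhost", "ch_port": 8123}}

effective = resolve_settings(package_globals, targets, "dev")
# effective["ch_port"] is now 8123, taken from the "dev" target
```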

Template Details

Global Settings

  • ingestion_template_s3_bucket: Specifies the S3 bucket name where ingestion templates and configuration files are stored.

Vector Core Configuration

  • ch_array_fields: Path to the CSV file defining fields that should be treated as arrays within ClickHouse.

  • ch_event_subschema_map: CSV file specifying the subschema mapping for events.

  • ch_fields_with_invalid_char: Lists fields with characters that are invalid for ClickHouse and need special handling.

  • ch_remap_fields: Defines how certain fields should be remapped in the ClickHouse schema.

  • domain_fields: Specifies the fields related to domain names for enrichment or processing.

  • extract_fields: Lists fields where specific values need to be extracted from composite fields.

  • filter_whitelist: Whitelist of fields and their acceptable values for filtering purposes.

  • ip_fields: Defines fields to be treated or processed as IP addresses.

  • json_fields: Specifies which fields contain JSON data and require special processing.

  • logs_syslog_remap_fields: Syslog field mappings specific to logs ingestion and processing.

  • maxmind_list: Path to the CSV file listing fields enriched using MaxMind databases.

  • public_suffix_list: CSV file mapping domain names to their respective public suffixes.

  • time_fields: Lists fields related to timestamps and their formats.

  • ts_match_tz: Path to the file defining mappings between timestamps and timezone data.

  • tz_offset_mapping: Specifies mappings between geographic locations and their timezone offsets.

  • update_datatype: Defines fields that require datatype updates or transformations.

  • hypersec_receiver_event_category_map: Maps event categories to predefined classifications within the ClickHouse schema.

Vector Enrichment MaxMind GeoIP

  • name: The name of the MaxMind GeoIP database file.

  • version: The version of the GeoIP database being used.

Vector Templates

Core Templates

  • name: Unique identifier for the vector template file.

  • version: The version of the template, used to manage updates and compatibility.
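
Core template names in the examples follow an NNN-kind-description.yml pattern, where the numeric prefix loosely groups templates by role (0xx sources, 1xx transforms, 2xx sinks). A sketch that parses this observed naming convention (the regex and helper are illustrative, not part of the tool):

```python
import re

# "NNN-kind-description.yml", e.g. "001-source-kafka-aws-saas.yml".
# This mirrors the naming seen in the examples; it is an observed
# convention, not a documented contract.
TEMPLATE_NAME = re.compile(
    r"^(?P<ordinal>\d{3})-(?P<kind>source|transform|filter|sink)-"
    r"(?P<slug>.+?)(?:\.ya?ml)?$"
)

def parse_template_name(name):
    m = TEMPLATE_NAME.match(name)
    if m is None:
        # e.g. base templates such as "hs-xdr-vector-ct-all-main"
        return None
    return int(m.group("ordinal")), m.group("kind"), m.group("slug")
```

Base templates do not carry a numeric prefix and fall through to None.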

Custom Templates

  • name: The name of a custom vector template not included in the core list.

  • version: The version of the custom template for management and reference.

Ingestion Pipeline CLI Commands

  • build-ingestion-pipelines: Builds ingestion pipelines based on the configuration.

Example:

xdrcli build-ingestion-pipelines --xdr_package_file_path /path/to/xdr_package.yaml --ingestion_pipeline_output_path /path/to/output --xdr_root_log_path /path/to/logs --target production --target_file_path /path/to/.xdr/xdr_targets.yaml

Options:

  • --xdr_package_file_path: Path to the xdr_package.yaml configuration file. (Required)

  • --ingestion_pipeline_output_path: Path to the directory where ingestion output will be stored. (Required)

  • --xdr_root_log_path: Path to the common directory for logs. (Optional, default: current directory)

  • --target: Target name for the specific environment. (Optional, default: None)

  • --target_file_path: Path to the target's configuration file. (Optional, default: None)

  • upload-vector-core-config: Uploads core configuration files to an S3 bucket.

Example:

xdrcli upload-vector-core-config --aws-profile myprofile --log_path /path/to/logs

Options:

  • --aws-profile: AWS profile to use for authentication. (Optional, default: None)

  • --log_path: Path to the directory where logs will be stored. (Optional, default: current directory)

  • upload-vector-templates: Uploads vector templates to a specified S3 bucket.

Example:

xdrcli upload-vector-templates --aws-profile myprofile --log_path /path/to/logs

Options:

  • --aws-profile: AWS profile to use for authentication. (Optional, default: None)

  • --log_path: Path to the directory where logs will be stored. (Optional, default: current directory)

  • upload-vector-maxmind-mmdb: Uploads MaxMind GeoIP databases to an S3 bucket.

Example:

xdrcli upload-vector-maxmind-mmdb --aws-profile myprofile --log_path /path/to/logs

Options:

  • --aws-profile: AWS profile to use for authentication. (Optional, default: None)

  • --log_path: Path to the directory where logs will be stored. (Optional, default: current directory)
