CLI Beta Features
Introduction
Welcome to the Vector Ingestion Pipeline Builder! This tool facilitates the configuration and deployment of ingestion pipelines using Vector.
Description
The Vector Ingestion Pipeline Builder simplifies the process of configuring and deploying ingestion pipelines by providing a clean and intuitive command-line interface (CLI).
Configuration Files
To get started with the Vector Ingestion Pipeline Builder, follow these steps:
1. Configuration: Create an xdr_package.yaml file with the necessary settings, and define a target configuration file, xdr_targets.yaml, for each environment.
2. Usage: Use the provided CLI commands to build ingestion pipelines and to upload configurations, templates, and MaxMind GeoIP databases to your desired storage location, as sketched below.
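As a rough end-to-end sketch, a typical run of the commands documented under Ingestion Pipeline CLI Commands below might look like the following; the paths, target name, and AWS profile are placeholders, and the exact order may vary for your setup:
# Build the pipeline configurations for a chosen target
xdrcli build-ingestion-pipelines --xdr_package_file_path ~/.xdr/xdr_package.yaml --ingestion_pipeline_output_path ../.xdr_ingestion_pipelines_output --target dev --target_file_path ~/.xdr/xdr_targets.yaml
# Upload supporting assets to the configured S3 bucket
xdrcli upload-vector-core-config --aws-profile myprofile
xdrcli upload-vector-templates --aws-profile myprofile
xdrcli upload-vector-maxmind-mmdb --aws-profile myprofile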
xdr_package.yaml
The xdr_package.yaml file serves as the central configuration file for the Vector Ingestion Pipeline Builder. It contains global settings, such as file paths and bucket names, as well as definitions for ingestion pipelines and templates.
Example xdr_package.yaml format:
global_settings:
ingestion_template_s3_bucket: ghostburner-config-bucket
ingestion_pipeline_output_path: ../.xdr_ingestion_pipelines_output
default_target: localhost
target_path: ~/.xdr/xdr_targets.yaml
ip_config_standard_enrichment:
# Paths to standard enrichment files
- name: hypersec-enrichment-ch-array-fields.csv
version: 100.000.000
# Other enrichment configurations...
ip_config_receiver:
# Receiver configuration paths
- name: hypersec-receiver-event-category-map.csv
version: 100.000.000
# Other receiver configurations...
vector_enrichment_maxmind_geoip:
- name: GeoLite2-Country.mmdb
version: 100.000.000
- name: GeoLite2-City.mmdb
version: 100.000.000
- name: GeoLite2-ASN.mmdb
version: 100.000.000
vector_templates:
core:
- name: 000-source-file.yml
version: 100.000.000
- name: 000-source-test-file-with-receiver-emulation.yml
version: 100.000.000
- name: 001-source-kafka-aws-saas.yml
version: 100.000.000
- name: 101-transform-flatten-message.yml
version: 100.000.000
ingestion_pipeline_globals:
  ingestion_data_dir: "/vector-data-dir" # Directory where Vector pods store checkpointing and state data.
ingestion_mtls_path: "/etc/vector_tls"
kafka_mtls_path: "/etc/vector_mtls"
kafka_brokers: "b-2.ghostburner.ra5n0s.c3.kafka.ap-southeast-2.amazonaws.com:9094,b-3.ghostburner.ra5n0s.c3.kafka.ap-southeast-2.amazonaws.com:9094,b-1.ghostburner.ra5n0s.c3.kafka.ap-southeast-2.amazonaws.com:9094"
ingestion_pipelines:
- name: logs_nxlog_windows
stages:
- name: finalise
description: "Finalization stage for processing Windows NXLog data"
config:
kafka_source_topic: "logs_nxlog_windows"
kafka_source_topic_suffix: "_land"
kafka_sink_topic: "logs_nxlog_windows_load"
kafka_consumer_group: "vector-logs-nxlog-windows-finalize"
container_config:
min_replicas: 1
max_replicas: 9
target_memory_util_percentage: 125
target_CPU_utilization_percentage: 125
persistence_size: 1Gi
config_files:
ip_fields: '/standard_enrichment_files/hypersec-enrichment-ip-fields.csv' #
extract_fields_csv: '/standard_enrichment_files/hypersec-enrichment-extract-fields.csv' #
field_setter_iteration_group: 'finalise' #
steps:
- id: hs-xdr-vector-ct-all-main
type: base
- id: 001-source-kafka-aws-saas
type: source
- id: 101-transform-flatten-message
type: transform
- id: 104-transform-extract-fields
type: transform
- id: 105-transform-junk-filter
type: filter
- id: 115-transform-timestamp-load-field
type: transform
- id: 119-transform-split-ip-field
type: transform
- id: 129-transform-event-hash-field
type: transform
- id: 202-sink-kafka-aws-saas
type: sink
- name: load_ch
description: "ClickHouse loading stage"
config:
kafka_source_topic: "logs_nxlog_windows"
kafka_source_topic_suffix: "_load"
kafka_consumer_group: "vector_logs_nxlog_windows-load-ch-1"
container_config:
min_replicas: 1
max_replicas: 1
target_memory_util_percentage: 125
targetCPUUtilizationPercentage: 125
persistence_size: 1Gi
config_files:
json_fields: '/standard_enrichment_files/hypersec-enrichment-json-fields.csv'
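            # The supported_timestamp_format entries below appear to be chrono/strftime-style patterns;
            # e.g. "%FT%X%.3fZ" would match an ISO-8601 timestamp such as 2024-01-02T03:04:05.123Z.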
supported_timestamp_format: '["%FT%X%.3fZ", "%FT%X%.6fZ", "%FT%X%.9fZ", "%F %X%.3f", "%F %X%.6f", "%F %X%.9f","%FT%X%.3f", "%FT%X%.6f", "%FT%X%.9f","%FT%XZ","%FT%X","%F %X", "%FT%X"]'
remap_fields: '/standard_enrichment_files/hypersec-enrichment-ch-remap-fields.csv'
steps:
- id: hs-xdr-vector-ct-all-main
type: base
- id: 001-source-kafka-aws-saas
type: source
- id: 101-transform-flatten-message
type: transform
- id: 108-transform-ch-json-remap
type: transform
- id: 102-transform-remap-fields
type: transform
- id: 116-transform-fields-to-camel-case
type: transform
- id: 109-transform-ch-custom-transformations
type: transform
- id: 201-sink-clickhouse-saas
type: sink
- name: load_os
description: "OpenSearch loading stage"
config:
kafka_source_topic: "logs_nxlog_windows"
kafka_source_topic_suffix: "_load"
kafka_consumer_group: "vector-logs-nxlog-windows-load-os-1"
container_config:
min_replicas: 1
max_replicas: 1
target_memory_util_percentage: 125
targetCPUUtilizationPercentage: 125
persistence_size: 1Gi
config_files:
remap_fields: '/standard_enrichment_files/hypersec-enrichment-ch-remap-fields.csv'
opensearch_endpoint: 'https://vpc-ghostburner-vc3t7e5zqeeu4nsa6vrcz6hwfy.ap-southeast-2.es.amazonaws.com'
steps:
- id: hs-xdr-vector-ct-all-main
type: base
- id: 001-source-kafka-aws-saas
type: source
- id: 101-transform-flatten-message
type: transform
- id: 111-transform-os-set-data-stream-fields
type: transform
- id: 102-transform-remap-fields
type: transform
- id: 116-transform-fields-to-camel-case
type: transform
- id: 204-sink-opensearch-aws-saas-stream
type: sink
- name: load_s3
description: "S3 loading stage"
config:
kafka_source_topic: "logs_nxlog_windows_load"
kafka_source_topic_suffix: ""
kafka_consumer_group: "vector-logs-nxlog-windows-load-s3"
container_config:
min_replicas: 1
max_replicas: 1
target_memory_util_percentage: 125
targetCPUUtilizationPercentage: 125
persistence_size: 1Gi
sink_config:
            sink_aws_s3_bucket_name: 'ghostburner-archive'
sink_aws_s3_region: 'ap-southeast-2'
sink_aws_s3_batch_max_bytes: "21474836480"
sink_aws_s3_batch_timeout_secs: "3600"
steps:
- id: hs-xdr-vector-ct-all-main
type: base
- id: 001-source-kafka-aws-saas
type: source
- id: 101-transform-flatten-message
type: transform
- id: 207-sink-aws-s3
            type: sink
Target Configuration Files
Target configuration files provide specific settings for different environments, such as development, staging, and production. These files override settings defined in the xdr_package.yaml file to tailor the pipeline setup for each environment.
Example xdr_targets.yaml format:
xdr_targets.yaml
default_target: dev
targets:
dev:
# Development environment configurations
ch_host: localhost
ch_port: 8123
hunt_config_path: ../.xdr_hunt_config/first_ten_windows_audit_hunt/hunt
hunt_rules_path: ../.xdr_hunt_config/first_ten_windows_audit_hunt/rules
ip_config_bucket_name: xdr_config_bucket
ip_config_bucket_region: ap-southeast-2
ip_config_standard_enrichment_path: /etc/vector/vector_templates/standard_enrichment_files
ip_config_geo_ip_path: vector_templates/geoip
ip_config_receiver_path: vector_templates/vector_receiver
ip_templates_path: /etc/vector/vector_templates
    # Other environment configurations...
Template Details
Global Settings
ingestion_template_s3_bucket
Specifies the S3 bucket name where ingestion templates and configuration files are stored.
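In the example xdr_package.yaml above, this setting appears under global_settings:
global_settings:
  ingestion_template_s3_bucket: ghostburner-config-bucket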
Vector Core Configuration
ch_array_fields
Path to the CSV file defining fields that should be treated as arrays within ClickHouse.
ch_event_subschema_map
CSV file specifying the subschema mapping for events.
ch_fields_with_invalid_char
Lists fields with characters that are considered invalid for ClickHouse and need special handling.
ch_remap_fields
Defines how certain fields should be remapped in the ClickHouse schema.
domain_fields
Specifies the fields related to domain names for enrichment or processing.
extract_fields
Lists fields where specific values need to be extracted from composite fields.
filter_whitelist
Provides a whitelist of fields and their acceptable values for filtering purposes.
ip_fields
Defines fields to be treated or processed as IP addresses.
json_fields
Specifies which fields contain JSON data and require special processing.
logs_syslog_remap_fields
Contains syslog field mappings used during syslog log ingestion and processing.
maxmind_list
Path to the CSV file listing fields enriched using MaxMind databases.
public_suffix_list
Provides the CSV file mapping domain names to their respective public suffixes.
time_fields
Lists fields related to timestamps and their formats.
ts_match_tz
Path to the file defining mappings between timestamps and timezone data.
tz_offset_mapping
Specifies mappings between geographic locations and their timezone offsets.
update_datatype
Defines fields that require datatype updates or transformations.
hypersec_receiver_event_category_map
Maps event categories to predefined classifications within the ClickHouse schema.
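Several of these entries surface in the config_files blocks of the pipeline stages shown earlier, where each key points to a CSV under /standard_enrichment_files/. For example (paths taken from the example xdr_package.yaml above):
config_files:
  ip_fields: '/standard_enrichment_files/hypersec-enrichment-ip-fields.csv'
  json_fields: '/standard_enrichment_files/hypersec-enrichment-json-fields.csv'
  remap_fields: '/standard_enrichment_files/hypersec-enrichment-ch-remap-fields.csv'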
Vector Enrichment MaxMind GeoIP
name
The name of the MaxMind GeoIP database file.
version
The version of the GeoIP database being used.
Vector Templates
Core Templates
name
Unique identifier for the vector template file.
version
Specifies the version of the template to help manage updates and compatibility.
Custom Templates
name
Defines the name of a custom vector template not included in the core list.
version
Indicates the version of the custom template for management and reference.
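A minimal sketch of how a custom template might be declared alongside the core list; the custom: key and the template file name are assumptions for illustration and are not confirmed by this document:
vector_templates:
  core:
    - name: 000-source-file.yml
      version: 100.000.000
  custom:
    - name: 150-transform-my-custom-step.yml  # hypothetical custom template
      version: 100.000.000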
Ingestion Pipeline CLI Commands
build-ingestion-pipelines: Builds ingestion pipelines based on the configuration.
Example:
xdrcli build-ingestion-pipelines --xdr_package_file_path /path/to/xdr_package.yaml --ingestion_pipeline_output_path /path/to/output --xdr_root_log_path /path/to/logs --target production --target_file_path /path/to/.xdr/xdr_targets.yaml
Options:
--xdr_package_file_path: Path to the xdr_package.yaml configuration file. (Required)
--ingestion_pipeline_output_path: Path to the directory where ingestion output will be stored. (Required)
--xdr_root_log_path: Path to the common directory for logs. (Optional, default: current directory)
--target: Target name for the specific environment. (Optional, default: None)
--target_file_path: Path to the target's configuration file. (Optional, default: None)
upload-vector-core-config: Uploads core configuration files to an S3 bucket.
Example:
xdrcli upload-vector-core-config --aws-profile myprofile --log_path /path/to/logs
Options:
--aws-profile: AWS profile to use for authentication. (Optional, default: None)
--log_path: Path to the directory where logs will be stored. (Optional, default: current directory)
upload-vector-templates: Uploads vector templates to a specified S3 bucket.
Example:
xdrcli upload-vector-templates --aws-profile myprofile --log_path /path/to/logs
Options:
--aws-profile: AWS profile to use for authentication. (Optional, default: None)
--log_path: Path to the directory where logs will be stored. (Optional, default: current directory)
upload-vector-maxmind-mmdb: Uploads MaxMind GeoIP databases to an S3 bucket.
Example:
xdrcli upload-vector-maxmind-mmdb --aws-profile myprofile --log_path /path/to/logs
Options:
--aws-profile: AWS profile to use for authentication. (Optional, default: None)
--log_path: Path to the directory where logs will be stored. (Optional, default: current directory)