XDR Data Engine Overview
What is XDR Data Engine?
XDR Data Engine is a toolset for managing and packaging security data processing components. It automates the handling of Beats schemas, Vector templates, and XDR configurations.
Prerequisites and Dependencies
Required Access
GitLab account with access to HyperSec repositories
GitLab access token with registry read permissions
Access to the following package repositories:
Python Wheel Package
Derive Schemas Package
Meta Schemas Package
Ingestion Pipelines Package
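If the wheel is published to GitLab's PyPI-compatible package registry (an assumption; this guide only says "GitLab registry"), pip can install it with the access token embedded in the index URL. The sketch below builds that URL. The project ID, token name, and token are placeholders you must supply, and `gitlab_pypi_index_url` is a hypothetical helper, not part of the package. The result would be passed to `pip install --index-url <url> <package_name>`.

```python
from urllib.parse import quote


def gitlab_pypi_index_url(project_id: str, token_name: str, token: str,
                          host: str = "gitlab.com") -> str:
    """Build a pip index URL for a GitLab project-level PyPI registry.

    All parts are percent-encoded with no safe characters, so a token
    containing '@' or ':' cannot break the URL, and a project path such
    as "group/project" becomes the URL-encoded form GitLab accepts.
    """
    user = quote(token_name, safe="")
    secret = quote(token, safe="")
    project = quote(project_id, safe="")
    return f"https://{user}:{secret}@{host}/api/v4/projects/{project}/packages/pypi/simple"
```

Avoid printing or logging the resulting URL, since it contains the token in clear text.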
System Requirements
Python 3.11 or higher
Docker and Docker Compose
Git
curl, wget, unzip
4 GB RAM minimum (8 GB recommended)
20 GB disk space minimum
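A small preflight check can confirm these requirements before running the setup script. The sketch below is an illustration, not part of the package. `check_prerequisites` is a hypothetical helper, it treats the 20 GB figure as free space at the given path (an assumption), and it does not check RAM or Docker Compose, since those need platform-specific calls.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)
REQUIRED_TOOLS = ("docker", "git", "curl", "wget", "unzip")
MIN_FREE_DISK_GB = 20  # assumption: the requirement refers to free space


def check_prerequisites(path: str = ".") -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_gb:.1f} GB free at {path!r}, need {MIN_FREE_DISK_GB} GB")
    return problems
```

Calling `check_prerequisites()` and printing each entry gives a quick list of what is missing on the current machine.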
Core Components
1. Schema Management
Build and manage ClickHouse schemas for security data
Version control and track schema changes
Automated schema updates and migrations
Performance optimization through intelligent indexing
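Versioned schemas with automated migrations are commonly implemented by numbering each change and applying only those newer than the version already recorded. The sketch below shows that generic pattern. The `Migration` type, the `events` table, and the DDL strings are illustrative assumptions, not the Schema Builder's actual design, and the ClickHouse statements are only stored here, never executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    version: int      # monotonically increasing schema version
    description: str
    sql: str          # ClickHouse DDL (not executed in this sketch)


def pending_migrations(applied_version: int,
                       migrations: list[Migration]) -> list[Migration]:
    """Return migrations newer than applied_version, oldest first.

    Raises ValueError on duplicate version numbers, because the apply
    order would then be ambiguous.
    """
    versions = [m.version for m in migrations]
    if len(versions) != len(set(versions)):
        raise ValueError("duplicate migration versions")
    return sorted((m for m in migrations if m.version > applied_version),
                  key=lambda m: m.version)


# Hypothetical migrations, deliberately listed out of order.
MIGRATIONS = [
    Migration(2, "add src_ip column",
              "ALTER TABLE events ADD COLUMN IF NOT EXISTS src_ip IPv6"),
    Migration(1, "create events table",
              "CREATE TABLE IF NOT EXISTS events (ts DateTime64(3), msg String) "
              "ENGINE = MergeTree ORDER BY ts"),
    Migration(3, "add skip index on src_ip",
              "ALTER TABLE events ADD INDEX IF NOT EXISTS idx_src_ip src_ip "
              "TYPE bloom_filter GRANULARITY 4"),
]
```

With schema version 1 already applied, `pending_migrations(1, MIGRATIONS)` returns the version 2 and version 3 migrations in that order. The `IF NOT EXISTS` clauses keep each statement safe to re-run if a previous attempt failed partway.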
2. Data Pipeline Management
Configure and manage Vector-based data ingestion pipelines
Template-based configuration for consistent data processing
Support for multiple data sources and formats
Efficient data routing and transformation
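Template-based configuration generally means keeping one parameterized pipeline definition and filling in per-source values. As an illustration only, the sketch below renders a Vector TOML fragment with Python's `string.Template`. The template text, placeholder names, database, and table are hypothetical and not taken from the Ingestion Pipelines Package. Vector's `file` source and `clickhouse` sink are real component types, but option names should be checked against the Vector documentation for your version.

```python
from string import Template

# Hypothetical template: one file source feeding one ClickHouse sink.
VECTOR_TEMPLATE = Template("""\
[sources.${name}_in]
type = "file"
include = ["${log_path}"]

[sinks.${name}_out]
type = "clickhouse"
inputs = ["${name}_in"]
endpoint = "${clickhouse_url}"
database = "${database}"
table = "${table}"
""")


def render_pipeline(**values: str) -> str:
    """Fill the template; substitute() raises KeyError if a placeholder has no value."""
    return VECTOR_TEMPLATE.substitute(**values)


config = render_pipeline(
    name="syslog",
    log_path="/var/log/syslog",
    clickhouse_url="http://localhost:8123",
    database="xdr",
    table="syslog_events",
)
```

Using `substitute` rather than `safe_substitute` makes a missing value fail loudly instead of leaving a literal `${...}` in the generated config.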
3. Hunt Framework
Execute and manage threat hunting operations
Parallel processing capabilities
Real-time hunt status monitoring
Flexible hunt configuration options
4. CLI Tools
Comprehensive command-line interface for all operations
Automated workflow support
Configuration management
Health monitoring and diagnostics
Package Structure
The XDR Data Engine consists of several key packages:
Core Python Package
Main CLI tool and core functionality
Available as a wheel package from the GitLab package registry
Schema Packages
Derive Schemas: Base schema definitions
Meta Schemas: Schema metadata and relationships
Used for data structure management
Ingestion Pipeline Package
Vector templates and configurations
Data transformation rules
Pipeline definitions
XDE Overview
The main system components are:
Schema Builder
Creates and maintains data schemas
Manages schema versions
Handles schema migrations
Vector Pipeline Manager
Manages data ingestion
Handles data routing
Processes transformations
Hunt Framework
Executes hunting operations
Manages hunt scheduling
Processes results
API Servers
Schema management API
Hunt operation API
Configuration API
Configuration Hierarchy
Settings are applied in the following order:
Environment Variables
CLI Parameters
xdr_package.yaml settings
xdr_targets.yaml settings
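One reading of this list is decreasing precedence: a value from an earlier source overrides the same key in a later one. That reading is an assumption; if the list is instead the order in which layers are applied (later wins), reverse it. Under the first reading, Python's `collections.ChainMap` models the lookup directly, because it returns the value from the first mapping that contains the key. The `XDE_` environment prefix and all key names below are hypothetical, and YAML parsing is omitted to keep the sketch dependency-free.

```python
from collections import ChainMap
from collections.abc import Mapping

ENV_PREFIX = "XDE_"  # hypothetical prefix; real variable names are not documented here


def env_settings(environ: Mapping[str, str]) -> dict[str, str]:
    """Turn XDE_LOG_LEVEL=debug into {"log_level": "debug"}."""
    return {key[len(ENV_PREFIX):].lower(): value
            for key, value in environ.items() if key.startswith(ENV_PREFIX)}


def resolve_settings(environ: Mapping[str, str], cli: Mapping[str, str],
                     package_yaml: Mapping[str, str],
                     targets_yaml: Mapping[str, str]) -> ChainMap:
    """Look keys up in order: env, CLI, xdr_package.yaml, xdr_targets.yaml.

    Pass only CLI flags the user actually set; a key mapped to None
    would still shadow the YAML layers.
    """
    return ChainMap(env_settings(environ), dict(cli),
                    dict(package_yaml), dict(targets_yaml))


# In real use, pass os.environ and the parsed YAML files.
settings = resolve_settings(
    environ={"XDE_LOG_LEVEL": "debug", "HOME": "/root"},
    cli={"log_level": "info", "threads": "8"},
    package_yaml={"threads": "4", "version": "1.2.0"},
    targets_yaml={"version": "0.9.0", "target": "prod"},
)
```

Here `settings["log_level"]` comes from the environment, `threads` from the CLI, `version` from `xdr_package.yaml`, and `target` from `xdr_targets.yaml`, each taken from the highest layer that defines it.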
Best Practices
1. Installation and Setup
Use the provided setup script
Keep packages updated
Follow version control best practices
Document custom configurations
2. Schema Management
Version all schemas
Test changes in development
Monitor performance impacts
Keep schema documentation updated
3. Pipeline Configuration
Use templates consistently
Validate changes in test environment
Monitor pipeline performance
Document transformations
4. Hunt Operations
Set appropriate timeouts
Configure thread limits
Monitor execution status
Document hunt rules
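The timeout, thread-limit, and status-monitoring advice can be illustrated with Python's `concurrent.futures`. The sketch below runs hypothetical hunt callables on a bounded thread pool and records a status per hunt. `run_hunts`, `max_threads`, `timeout_s`, and the status strings are illustrative names, not the Hunt Framework's API. Python threads cannot be forcibly stopped, so the timeout bounds how long the caller waits, not how long a hunt keeps running.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def run_hunts(hunts: dict[str, Callable[[], object]], max_threads: int = 4,
              timeout_s: float = 30.0) -> dict[str, str]:
    """Run hunts in parallel and return {hunt_name: status}."""
    # No `with` block: exiting it would wait for every hunt, defeating the timeout.
    pool = ThreadPoolExecutor(max_workers=max_threads)
    futures = {pool.submit(fn): name for name, fn in hunts.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    status = {}
    for fut in done:
        status[futures[fut]] = "failed" if fut.exception() else "completed"
    for fut in not_done:
        # cancel() only succeeds for hunts still queued behind the thread limit.
        status[futures[fut]] = "cancelled" if fut.cancel() else "timed out"
    # Return immediately; still-running hunts finish in the background
    # (the interpreter joins those threads at exit).
    pool.shutdown(wait=False)
    return status


demo = run_hunts(
    {"fast": lambda: "ok",
     "broken": lambda: 1 / 0,
     "slow": lambda: time.sleep(1.0)},
    max_threads=3, timeout_s=0.3,
)
```

In this demo, `fast` completes, `broken` fails with `ZeroDivisionError`, and `slow` is still running when the 0.3-second wait expires, so it is reported as timed out.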
Getting Started
For detailed setup instructions, see the Quick Start Guide.
Additional resources:
Package Configuration
Schema Management
Support and Resources
Release Notes
GitLab Repository: https://gitlab.com/hypersec-repo/hyperstack/xdr-data-engine