Hunt Overview

High Level Framework

Hunt Framework.drawio

The Hunt Framework is a component of the XDR system that lets security teams create, schedule, and execute cyber detection logic across customer databases. This document describes the framework's components and functionality, with a focus on the checkpointing mechanism that lets hunts resume where they left off instead of reprocessing data.

TODO: Migrate confluence page content: https://hypersec.atlassian.net/wiki/spaces/HYPXDR/pages/1642889217/Hunt+Framework

Checkpointing

Checkpointing is a pivotal feature of the Hunt Framework, designed to ensure efficient execution of hunts by tracking their progress across runs. This mechanism helps avoid redundant data processing and ensures that each hunt resumes from where it left off, particularly beneficial in case of failures or interruptions.

The process of checkpointing is best understood through a detailed look at its execution flow across four specific runs:

Hunt Checkpoint Workflow

Run 1

  • Pre Checkpoint: null - Marks the initial state before the hunt begins.

  • Execution Inputs:

    • start_time: 12:00:00 - Scheduled start time of the hunt.

    • log_buffer: 60 sec - The buffer time allocated for log processing.

    • table_name, rule_name: Identifiers for the hunt's target table and rule.

  • Rule Execution: 12:01:00 - The actual execution time of the rule.

  • Post Checkpoint: 11:59:00 - Updated to reflect successful execution and progress.

Run 2

  • Pre Checkpoint: 11:59:00 - Carried over from the post checkpoint of Run 1.

  • Execution Inputs: Adjustments are made to account for the new execution time.

  • Rule Execution: 12:21:00 - The rule fails, so the checkpoint is not moved forward.

  • Post Checkpoint: Remains at 11:59:00 due to rule failure, indicating no progress was made.

Run 3

  • Pre Checkpoint: 11:59:00 - Unchanged from Run 2, because the rule failed in that run.

  • Execution Inputs: Adjustments are made to account for the new execution time.

  • Rule Execution: 12:21:00 - The execution is successful, moving the checkpoint forward.

  • Post Checkpoint: 12:19:00 - Updated to reflect successful execution and progress.

Run 4

  • Pre Checkpoint: 12:19:00 - Updated to reflect the last successful checkpoint.

  • Execution Inputs: Similar to previous runs, with necessary adjustments for time.

  • Rule Execution: Successfully completed at 12:31:00.

  • Post Checkpoint: 12:29:00 - Further updated to reflect the new state of progression.
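The four runs above can be replayed with a short sketch. It is illustrative only, not the framework's actual implementation: the names `run_hunt` and `replay` are hypothetical, the rule "post checkpoint = start_time minus log_buffer" is inferred from the Run 1 values, and the start times for Runs 2 to 4 (12:10:00, 12:20:00, 12:30:00) are assumptions, since the source only states the start time of Run 1.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def run_hunt(
    pre_checkpoint: Optional[datetime],
    start_time: datetime,
    log_buffer: timedelta,
    execute_rule: Callable[[Optional[datetime], datetime], None],
) -> Optional[datetime]:
    """Run one hunt and return its post checkpoint (hypothetical helper)."""
    # Upper bound of the query window: hold back log_buffer for late-arriving logs.
    window_end = start_time - log_buffer
    try:
        # The rule covers data after pre_checkpoint up to window_end.
        # A pre_checkpoint of None means no earlier run has succeeded yet.
        execute_rule(pre_checkpoint, window_end)
    except Exception:
        # Failure: keep the old checkpoint so the next run retries the same data.
        return pre_checkpoint
    # Success: advance the checkpoint to the end of the processed window.
    return window_end


def replay(start_times, fail_on, log_buffer=timedelta(seconds=60)):
    """Replay scheduled runs in order and collect each post checkpoint."""
    checkpoint = None
    history = []
    for run_number, start_time in enumerate(start_times, start=1):
        def rule(window_start, window_end, n=run_number):
            # Simulated rule: raises for the run numbers listed in fail_on.
            if n in fail_on:
                raise RuntimeError(f"rule failed in run {n}")

        checkpoint = run_hunt(checkpoint, start_time, log_buffer, rule)
        history.append(checkpoint.strftime("%H:%M:%S") if checkpoint else None)
    return history


# Assumed schedule: one run every 10 minutes, with Run 2 failing.
day = datetime(2024, 1, 1)
starts = [day.replace(hour=12, minute=m) for m in (0, 10, 20, 30)]
history = replay(starts, fail_on={2})
print(history)  # ['11:59:00', '11:59:00', '12:19:00', '12:29:00']
```

Because a failed run returns the checkpoint unchanged, the next successful run automatically covers the window that the failed run missed.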

Significance of Checkpointing

Checkpointing plays a critical role in maintaining efficiency in hunt operations, especially across distributed query engines, by offering:

  • Resilience: Allows hunts to resume from the last successful checkpoint, preventing unnecessary data reprocessing.

  • Efficiency: Reduces the system's workload by avoiding duplicated efforts and focusing only on new or unprocessed data.

  • Accuracy: Guarantees comprehensive coverage of all relevant data up until the most recent checkpoint.

Deployment and Operations

Implementing hunt operations and the checkpointing feature requires careful planning and setup. Key steps include configuring hunt YAML files, deploying the Hunt Scheduler, and ensuring executable rules are accurately defined and tested.

It’s recommended that users regularly review hunt outcomes, adjust configurations as necessary, and monitor system performance to ensure the Hunt Framework continues to deliver optimal results in detecting and responding to cyber threats.

Hunt Scheduling Workflow

Event Loop Initialization

  • At the beginning of the asyncio program, an event loop is initialized. This loop acts as a manager for asynchronous operations.

Coroutines and Tasks

  • Coroutines, defined with async def, represent units of work.

  • Tasks are created from these coroutines to schedule and execute them concurrently within the event loop.
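As a minimal sketch of coroutines and tasks, the example below starts an event loop with `asyncio.run`, wraps each coroutine in a task, and waits for all of them. The coroutine `run_query` and the customer and rule names are placeholders for illustration, not part of the Hunt Framework.

```python
import asyncio


async def run_query(customer: str, rule_name: str) -> str:
    # Stand-in for a non-blocking database call (hypothetical).
    await asyncio.sleep(0.01)
    return f"{customer}:{rule_name}"


async def main() -> list[str]:
    customers = ["customer_a", "customer_b"]
    rules = ["rule_1", "rule_2"]
    # create_task schedules each coroutine on the running event loop.
    tasks = [
        asyncio.create_task(run_query(c, r)) for c in customers for r in rules
    ]
    # gather waits for every task and returns results in submission order.
    return await asyncio.gather(*tasks)


# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
results = asyncio.run(main())
print(results)
```

All four tasks wait on their simulated I/O at the same time, so the total runtime is close to one query's latency rather than four.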

Distribution of Work

  • When run with many tasks, such as 40 queries and 40 customers, asyncio interleaves the coroutines on the event loop's single thread, while blocking calls are handed to worker threads so that they do not hold up the loop.

Threading Pool Execution

  • The event loop hands blocking work to a thread pool, which comprises multiple worker threads capable of executing tasks concurrently.

  • Each task submitted to the thread pool runs on one of its worker threads, allowing multiple blocking tasks to run at the same time without impeding the event loop. Tasks beyond the pool size wait in a queue until a worker becomes free.
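The hand-off of blocking work to a thread pool can be sketched with `loop.run_in_executor`. This is an assumption about how such a scheduler might be wired, not a description of the actual Hunt Scheduler: `blocking_query`, `run_all`, the pool size of 8, and the `hunt` thread name prefix are all hypothetical.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(customer: str, query: str) -> tuple[str, str, str]:
    # Stand-in for a blocking database driver call (hypothetical).
    time.sleep(0.05)
    return customer, query, threading.current_thread().name


async def run_all(customers, queries, max_workers=8):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="hunt") as pool:
        # run_in_executor submits each call to the pool and returns an
        # awaitable future, so the event loop itself is never blocked.
        futures = [
            loop.run_in_executor(pool, blocking_query, c, q)
            for c in customers
            for q in queries
        ]
        return await asyncio.gather(*futures)


customers = [f"customer_{i}" for i in range(4)]
queries = [f"query_{i}" for i in range(4)]
results = asyncio.run(run_all(customers, queries))
worker_names = {name for _, _, name in results}
print(len(results), "jobs ran on", len(worker_names), "worker threads")
```

The pool size caps how many blocking calls run at once: the 16 jobs here share at most 8 worker threads, and the remaining jobs queue until a worker is free.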

Concurrency and Parallelism

  • Coroutines run concurrently by taking turns on the event loop whenever one of them awaits, while blocking tasks overlap on worker threads as they wait on I/O. Because work continues while other tasks are waiting, overall throughput improves.

Task Distribution

  • Blocking tasks submitted to the thread pool are picked up by whichever worker thread is free; the event loop submits them and awaits their results.

  • The event loop manages the scheduling and execution of these tasks, ensuring optimal utilization of system resources.

Event Loop Management

  • Continuously monitoring for events, such as completion of I/O operations or firing of timer events, the event loop schedules and dispatches tasks accordingly.
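To illustrate how the event loop dispatches on events, the sketch below registers a timer callback with `loop.call_later` and runs a coroutine that waits on simulated I/O. The delays and the names `on_timer` and `io_task` are chosen for illustration only.

```python
import asyncio


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    events: list[str] = []
    done = loop.create_future()

    def on_timer() -> None:
        # Invoked by the event loop once the timer expires.
        events.append("timer fired")
        done.set_result(None)

    async def io_task() -> None:
        # Stand-in for an I/O wait; the loop resumes this coroutine on completion.
        await asyncio.sleep(0.01)
        events.append("io finished")

    # Register a timer event 50 ms from now, then wait on the shorter I/O task.
    loop.call_later(0.05, on_timer)
    await io_task()
    # Suspend until the timer callback resolves the future.
    await done
    return events


events = asyncio.run(main())
print(events)  # ['io finished', 'timer fired']
```

While `main` is suspended on an `await`, the loop is free to run whichever callback or task becomes ready next, which is why the shorter I/O wait completes before the timer fires.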
