Receiver Setup

Introduction

The receiver component is the initial point of data capture in our ingestion pipeline. It listens on designated ports to collect incoming data, processes it through various transformations, and forwards it to the appropriate destinations. This document provides a comprehensive overview of the receiver's architecture, configuration, and the logic behind its control field mappings.

Key Concepts

Components

The receiver comprises four main components:

  1. Source: Captures incoming data.

  2. Timestamp Transforms: Normalizes timestamp fields.

  3. Event Category Mapping: Assigns events to categories based on specific criteria.

  4. Sink: Forwards processed data to destinations like Kafka or Prometheus.

Control Fields

  • .tags.event.category: Specifies the category of the event.

  • .tags.event.org_id: Identifies the originating organization of the event.

Detailed Explanation

The receiver's four components each correspond to a configuration file:

  1. Source Configuration (hypersec-receiver-source.yml): Sets up the listener on specified ports.

  2. Timestamp Transform (hypersec-receiver-transform-timestamp.yml): Normalizes and validates timestamp fields.

  3. Event Category Transform (hypersec-receiver-transform-event-category.yml): Maps events to categories.

  4. Sink Configuration (hypersec-receiver-sink.yml): Defines where to send the processed data.

1. Source Configuration

The source configuration sets up Vector to listen for incoming data.

Explanation:

  • Type: Specifies that Vector will receive data from another Vector instance.

  • Address: The network address and port where Vector listens for incoming data.

  • TLS Configuration: Ensures secure communication using mutual TLS authentication.

2. Timestamp Transformation

This step ensures that timestamp fields are normalized and consistent across all events, regardless of the format or source time zone. It includes logic for parsing, validating, and adjusting timestamps to a standardized format that can be used uniformly throughout the ingestion pipeline.

Key Points:

  • Setting timestamp_received: Captures the exact time when the event was ingested into the system.

  • Normalization Logic: Parses and validates timestamps from various fields and formats, converting them into a standard format (ISO 8601).

Timestamp Normalization Process:

  1. Identify Timestamp Fields: Uses an enrichment table (time_fields) to determine which fields contain timestamp data. The table (hypersec-enrichment-time-fields.csv) lists fields like EventReceivedTime, @timestamp, event.created, and others, each marked whether it should update the main .timestamp field.

  2. Enrichment Table Details:

    • Each entry in the table has:

      • iteration_group: Identifies the group for which the field should be processed (e.g., receiver).

      • time_field: Specifies the field containing timestamp data.

      • set_timestamp: A boolean indicating whether this field should overwrite the primary .timestamp field.

  3. Parse Timestamps: Attempts to parse each identified timestamp field. The logic handles different formats, including:

    • Epoch Timestamps: Converts timestamps from seconds, milliseconds, or nanoseconds into a standard UTC string.

    • String-based Timestamps: Uses a list of supported formats ($SUPPORTED_TIMESTAMP_FORMAT) to attempt parsing. If a timezone offset is not provided in the timestamp string, the transform logic appends the appropriate offset based on matched fields.

  4. Timezone Matching:

    • Uses the hypersec-enrichment-ts-match-tz.csv enrichment table to determine timezone adjustments. If a specific match_field and match_field_value combination exists (e.g., tags.event.category matches logs_beats_winlogbeat), it will use the defined timezone and offset.

    • If no match is found, the system checks for a timezone from tags.collector.timezone or defaults to UTC.

  5. Timezone Offset Application:

    • Converts the timestamp using the identified timezone offset from hypersec-enrichment-tz-offset-mapping.csv, mapping time zone identifiers (like Africa/Accra) to their respective UTC offsets.

    • The offset is applied to adjust timestamps for accurate event time representation.

  6. Validate Timestamps: Validates parsed timestamps to ensure they are not set in the future:

    • Compares each parsed timestamp against a threshold (e.g., 30 minutes beyond the current time).

    • If a timestamp is invalid or in the future, it is replaced with the current time (now()), and an error tag (tags.event.error) is appended to indicate the issue.

  7. Fallback Mechanism: If parsing fails or results in an invalid timestamp, the current time (now()) is used as a fallback. This ensures that every event has a usable timestamp, even if the original data is incomplete or incorrect.

  8. Final Timestamp Assignment:

    • Updates the .timestamp field with the value from the parsed time field if set_timestamp is true.

    • Ensures a standardized timestamp format is used across all events for downstream processing.
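The epoch-handling and future-validation steps above can be sketched in Python. This is an illustrative sketch only: the actual pipeline implements this logic as a Vector VRL transform, and the unit-detection thresholds shown here are assumptions chosen to distinguish seconds, milliseconds, and nanoseconds by magnitude.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold mirroring the documented "30 minutes beyond now" check.
FUTURE_THRESHOLD = timedelta(minutes=30)

def normalize_epoch(value):
    """Convert an epoch value in seconds, milliseconds, or nanoseconds
    into an ISO 8601 UTC string, guessing the unit from magnitude."""
    value = float(value)
    if value > 1e17:        # nanoseconds
        value /= 1e9
    elif value > 1e11:      # milliseconds
        value /= 1e3
    return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()

def validate_or_fallback(ts, now=None):
    """Replace a timestamp that lies too far in the future with now();
    the pipeline additionally appends an error tag (tags.event.error)."""
    now = now or datetime.now(timezone.utc)
    if ts > now + FUTURE_THRESHOLD:
        return now, "timestamp_in_future"
    return ts, None
```

The magnitude-based unit guess is a common heuristic; the real transform may instead rely on field-specific knowledge from the enrichment table.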

Environment Variables:

  • FIELD_SETTER_ITERATION_GROUP: Specifies the group for enrichment table lookups.

  • VECTOR_ENRICHMENT_PATH: Path where enrichment files like hypersec-enrichment-time-fields.csv, hypersec-enrichment-ts-match-tz.csv, and hypersec-enrichment-tz-offset-mapping.csv are stored.

  • SUPPORTED_TIMESTAMP_FORMAT: A list of timestamp string formats that the system attempts to parse.
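The string-format parsing loop driven by SUPPORTED_TIMESTAMP_FORMAT can be sketched as follows. The format list here is an illustrative subset, not the real contents of the environment variable, and the real implementation is a VRL transform rather than Python.

```python
from datetime import datetime, timezone

# Illustrative subset of SUPPORTED_TIMESTAMP_FORMAT (assumption).
SUPPORTED_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
]

def try_parse(value, default_tz=timezone.utc):
    """Try each supported format in order; attach the default timezone
    when the parsed value carries no offset of its own."""
    for fmt in SUPPORTED_FORMATS:
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=default_tz)
        return dt
    return None  # caller falls back to now()
```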

3. Event Category Mapping

This step assigns a category to each event to ensure that events are correctly routed and processed downstream. By default, the hypercollector or the log collection method should set the .tags.event.category field to the appropriate value. This ensures that events are routed correctly without additional processing.

Explanation of Control Field Logic

  • Purpose: To categorize events for downstream processing and analytics, ensuring they are routed to the correct Kafka topics and handled appropriately.

  • Control Fields:

    • .tags.event.category: Used to route events to the correct topic in Kafka. It should ideally be set by the hypercollector or collection method.

    • .tags.event.org_id and .org_id: Identify the source organization of the event.

When to Use Event Category Mapping:

There are three primary scenarios where you need to use the receiver's event category mapping functionality:

  1. Lack of Control at Collection:

    • If you cannot set the .tags.event.category during data collection, you can leverage the receiver's category mapping to assign categories based on event content.

  2. Splitting a Source into Multiple Categories:

    • If you need to split events from a single source into separate categories for downstream processing, the mapping functionality allows you to assign different categories based on specific event characteristics.

  3. Overriding Existing Categories:

    • If you need to override the category set by the hypercollector or source—for instance, to standardize categories or correct misclassifications—the mapping logic can replace the existing .tags.event.category value.

High-Level Logic:

  • Load Mapping Rules:

    • Fetch rules from the event_category_map enrichment table.

  • Apply Rules:

    • Iterate over the rules to find a matching condition based on event fields.

  • Set or Override Category:

    • Once a match is found, set the .tags.event.category field accordingly and update is_mapped to true to prevent further mapping. This will override any existing category value.
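The load-match-set loop above can be sketched in Python. For simplicity this sketch uses flattened key names (e.g., "tags.event.type" as a single dict key) rather than nested objects, and it stands in for the actual VRL transform.

```python
def apply_category_mapping(event, rules):
    """First matching rule wins: set (possibly overriding) the target
    field and mark the event as mapped to stop further rules."""
    for rule in rules:
        if event.get(rule["match_field"]) == rule["match_field_value"]:
            event[rule["set_field"]] = rule["set_value"]
            event["is_mapped"] = True
            break
    return event
```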

Mapping CSV File

The event_category_map enrichment table contains specific mappings to override or set the event category when necessary. Below is an example of the mapping CSV file:

iteration_group,match_field,match_field_value,set_field,set_value,comment
receiver,tags.event.type,event.linux.host,tags.event.category,logs_syslog_linux,HyperCollector default integrated Linux host syslog data (TCP 12205)
receiver,tags.event.type,event.linux.audit,tags.event.category,logs_syslog_linux_audit,HyperCollector default integrated Linux audit data (TCP 12201)
receiver,tags.event.type,event.netflow,tags.event.category,logs_netflow,"NetFlow v9, v5, IPFIX, sFlow"
receiver,SourceModuleType,im_msvistalog,tags.event.category,logs_nxlog_windows,HyperCollector default integrated NXLog CE Windows
receiver,tags.event.type,event.hypersec.windows,tags.event.category,logs_hypersec_windows,HyperSec native Windows agent data
receiver,tags.event.type,internal.metric,tags.event.category,logs_hypercol_metric,HyperCollector metrics
receiver,tags.event.type,internal.log,tags.event.category,logs_hypercol_internal,HyperCollector internal logs
receiver,tags.event.type,event.linux.syslog,tags.event.category,logs_syslog_linux,HyperCollector Linux syslog data

Explanation of Fields:

  • iteration_group: Specifies the group of mappings to apply; in this case, it's receiver.

  • match_field: The field in the event data to match.

  • match_field_value: The value to match within the specified field.

  • set_field: The field to set when a match is found.

  • set_value: The value to assign to the set field.

  • comment: Provides additional context about the mapping.

Example Usage:

The mapping file is used to split syslog Linux events from Linux audit events or to override categories when necessary.
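Loading the mapping file reduces to reading the CSV and keeping only the rows for the relevant iteration group. A minimal sketch, assuming the CSV content is available as text (in practice Vector loads it from VECTOR_ENRICHMENT_PATH as an enrichment table):

```python
import csv
import io

def load_rules(csv_text, iteration_group="receiver"):
    """Read mapping rows and keep only those for the requested group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["iteration_group"] == iteration_group]
```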

Default Logic for Unmapped Sources

For sources that are not explicitly mapped:

  • Default Assignment:

    • The .tags.event.category is expected to be set by the hypercollector or the collection method. If the category is already present in the event data, no further action is needed.

  • Metadata Checks:

    • For core hypercollector sources, there is logic to detect event category values from .@metadata fields carrying the event category or event name.

    • If the event has no category in its .@metadata fields but the .tags.event.type field is present, the receiver will map .tags.event.type to .tags.event.category or set it to logs_syslog if appropriate.

  • Fallback to Unmatched:

    • If the category cannot be determined, the receiver assigns unmatched to .tags.event.category.
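The default resolution order for unmapped sources can be sketched in Python. This uses flattened key names, and the metadata key name below is a placeholder (the exact .@metadata field names are implementation-specific); the real logic lives in the VRL transform.

```python
def resolve_category(event):
    """Default logic for unmapped sources: existing category, then
    metadata, then event type, then the literal 'unmatched'."""
    if event.get("tags.event.category"):
        return event["tags.event.category"]
    meta = event.get("@metadata.event_category")  # placeholder key name
    if meta:
        return meta
    if event.get("tags.event.type"):
        return event["tags.event.type"]
    return "unmatched"
```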

Additional Processing Steps

  • Format Category Field: Replace dots and dashes with underscores in the category name to ensure consistency in topic naming and downstream processing.

  • Organization ID Handling: Set .tags.event.org_id and .org_id: Ensure both fields are set, defaulting to unknown if necessary.
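Both post-processing steps are straightforward string and default handling. A minimal sketch, again with flattened key names standing in for the nested event structure:

```python
def finalize_control_fields(event):
    """Normalize the category name and guarantee org_id fields exist."""
    cat = event.get("tags.event.category", "unmatched")
    # Dots and dashes become underscores for consistent topic naming.
    event["tags.event.category"] = cat.replace(".", "_").replace("-", "_")
    org = event.get("tags.event.org_id") or event.get("org_id") or "unknown"
    event["tags.event.org_id"] = org
    event["org_id"] = org
    return event
```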

Example Scenarios

Scenario 1: Unmapped Source, Category Set by HyperCollector or Custom Source

Given:

  • An event from a source not listed in the event_category_map.

  • The event includes the field .tags.event.category set by the hypercollector.

Process:

  1. Mapping Check:

    • No matching rule is found in the event_category_map.

  2. Default Logic:

    • Check for Existing Category: The receiver finds that .tags.event.category is already set.

    • Use Existing Category: The receiver uses this value without further mapping.

  3. Outcome:

    • The event is processed with the existing category, ensuring correct routing.

Scenario 2: Unmapped Source, Category Not Set by HyperCollector or Custom Source

Given:

  • An event from a source not listed in the event_category_map.

  • The event does not include the field .tags.event.category.

Process:

  1. Mapping Check:

    • No matching rule is found in the event_category_map.

  2. Default Logic:

    • Check for Existing Category: The receiver finds that .tags.event.category is not set.

    • Metadata Checks: Attempts to extract category information from .@metadata fields but finds none.

    • Event Type Check: Attempts to extract category information from the .tags.event.type field but finds none.

    • Set Category to Unmatched: The receiver assigns unmatched to .tags.event.category.

  3. Outcome:

    • The event is forwarded to the unmatched topic for review and potential action.

Scenario 3: HyperCollector Sets Category, Overridden by Mapping

Given:

  • An event from a source listed in the event_category_map.

  • The event includes the field .tags.event.category set by the hypercollector.

Process:

  1. Mapping Check:

    • A matching rule is found in the event_category_map based on event fields (e.g., tags.event.type).

  2. Override Category:

    • Apply Mapping: The receiver overrides the existing .tags.event.category with the value specified in the mapping (set_value).

    • Update is_mapped: Sets is_mapped to true to prevent further mapping.

  3. Outcome:

    • The event is processed with the new category from the mapping, ensuring it is routed according to the overridden value.

Explanation:

  • Reason for Overriding:

    • This approach is used when there is a need to standardize categories, correct misclassifications, or enforce specific routing rules that differ from the source-provided category.

  • Impact:

    • The receiver's mapping logic takes precedence over the category set by the hypercollector or source when a mapping rule matches.

Example:

  • Event Details:

    • .tags.event.type is event.linux.audit.

    • .tags.event.category is initially set to logs_syslog_linux by the hypercollector.

  • Mapping Rule:

    • Match on tags.event.type equal to event.linux.audit.

    • Set tags.event.category to logs_syslog_linux_audit.

  • Process:

    • The mapping rule matches, and the receiver overrides .tags.event.category to logs_syslog_linux_audit.

  • Outcome:

    • The event is routed to the logs_syslog_linux_audit topic, ensuring it is processed appropriately for audit logs.

4. Sink Configuration

Defines the destinations where processed data is sent:

Key Points:

  • Kafka Sink: Sends events to Kafka topics based on their category.

  • Dynamic Topic Assignment: Uses the .tags.event.category field to determine the topic name.

  • TLS Configuration: Ensures secure communication with Kafka brokers.

  • Buffering: Configures how data is buffered before being sent to Kafka.
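Dynamic topic assignment reduces to deriving the topic name from the category field. A minimal sketch; the "_land" suffix matches the routing example given later in this document (an event with category web_access_logs is sent to web_access_logs_land), and the actual sink uses Vector's templated topic syntax rather than Python.

```python
def topic_for(event, suffix="_land"):
    """Derive the Kafka topic from the event's category field."""
    return event.get("tags.event.category", "unmatched") + suffix
```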

Environment Variables and Configurations

  • VECTOR_DATA_DIR: Directory for Vector's state and buffers.

  • VECTOR_MTLS_PATH: Path to TLS certificates for secure communication.

  • VECTOR_ENRICHMENT_PATH: Path to enrichment tables used in transformations.

  • KAFKA_BROKERS: List of Kafka broker addresses.

  • KAFKA_MTLS_PATH: Path to Kafka TLS certificates.

  • VECTOR_BUFFER_SIZE: Maximum size for the disk buffer.

Examples

Scenario: Event Category Mapping

Given:

  • An event with the field .source.type equal to apache_access.

  • The event_category_map contains a rule:

    • Match Field: source.type

    • Match Value: apache_access

    • Set Field: tags.event.category

    • Set Value: web_access_logs

Process:

  1. Mapping Check: The event's source.type matches apache_access.

  2. Category Assignment: Sets .tags.event.category to web_access_logs.

  3. Topic Routing: The event is sent to the Kafka topic web_access_logs_land.

Scenario: Timestamp Normalization

Given:

  • An event with a timestamp field event_time containing 2023-10-15T12:34:56.

  • No timezone information is present.

Process:

  1. Timestamp Parsing: Attempts to parse event_time using supported formats.

  2. Timezone Assignment:

    • Checks for timezone mappings in the time_match_timezone table (hypersec-enrichment-ts-match-tz.csv).

    • Uses the default timezone from tags.collector.timezone or defaults to UTC.

  3. Timestamp Validation: Ensures the timestamp is not in the future.

  4. Final Assignment: Normalizes event_time and updates .timestamp if configured.

Conclusion

The receiver component is a crucial part of our ingestion pipeline, responsible for capturing incoming data, normalizing timestamps, categorizing events, and forwarding them to the appropriate destinations. By leveraging Vector's capabilities and our custom configurations, we ensure that data is processed efficiently and accurately.

Key Takeaways:

  • Modular Design: The receiver is organized into sources, transforms, and sinks for clarity and maintainability.

  • Flexible Configuration: Environment variables and enrichment tables allow for dynamic adjustments without code changes.

  • Robust Mapping Logic: The control field mappings ensure that events are correctly categorized and routed.

  • Security: TLS configurations secure communications between components.

For further details or assistance, please refer to the configuration files or contact the technical team.