Receiver Setup
Introduction
The receiver component is the initial point of data capture in our ingestion pipeline. It listens on designated ports to collect incoming data, processes it through various transformations, and forwards it to the appropriate destinations. This document provides a comprehensive overview of the receiver's architecture, configuration, and the logic behind its control field mappings.
Key Concepts
Components
The receiver comprises four main components:
Source: Captures incoming data.
Timestamp Transforms: Normalizes timestamp fields.
Event Category Mapping: Assigns events to categories based on specific criteria.
Sink: Forwards processed data to destinations like Kafka or Prometheus.
Control Fields
.tags.event.category: Specifies the category of the event.
.tags.event.org_id: Identifies the originating organization of the event.
Detailed Explanation
The receiver has four key components:
Source Configuration (hypersec-receiver-source.yml): Sets up the listener on specified ports.
Timestamp Transform (hypersec-receiver-transform-timestamp.yml): Normalizes and validates timestamp fields.
Event Category Transform (hypersec-receiver-transform-event-category.yml): Maps events to categories.
Sink Configuration (hypersec-receiver-sink.yml): Defines where to send the processed data.
1. Source Configuration
The source configuration sets up Vector to listen for incoming data.
Explanation:
Type: Specifies that Vector will receive data from another Vector instance.
Address: The network address and port where Vector listens for incoming data.
TLS Configuration: Ensures secure communication using mutual TLS authentication.
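As a sketch, a Vector source of type vector with mutual TLS might look like the following. The port, component id, and certificate filenames are illustrative assumptions, not the shipped contents of hypersec-receiver-source.yml:

```yaml
sources:
  receiver_in:                        # hypothetical component id
    type: vector                      # receive events from another Vector instance
    address: "0.0.0.0:12200"          # listener address/port (illustrative)
    tls:
      enabled: true
      ca_file: "${VECTOR_MTLS_PATH}/ca.crt"        # cert filenames are assumptions
      crt_file: "${VECTOR_MTLS_PATH}/server.crt"
      key_file: "${VECTOR_MTLS_PATH}/server.key"
      verify_certificate: true        # require and verify client certs (mutual TLS)
```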
2. Timestamp Transformation
This step ensures that timestamp fields are normalized and consistent across all events, regardless of the format or source time zone. It includes logic for parsing, validating, and adjusting timestamps to a standardized format that can be used uniformly throughout the ingestion pipeline.
Key Points:
Setting timestamp_received: Captures the exact time when the event was ingested into the system.
Normalization Logic: Parses and validates timestamps from various fields and formats, converting them into a standard format (ISO 8601).
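A minimal VRL sketch of the ingest-time capture; the transform id, input id, and the field's exact location are assumptions for illustration:

```yaml
transforms:
  set_received_time:                  # hypothetical component id
    type: remap
    inputs: ["receiver_in"]           # assumed upstream source id
    source: |
      # Record the moment the event entered the pipeline.
      .timestamp_received = now()
```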
Timestamp Normalization Process:
Identify Timestamp Fields: Uses an enrichment table (time_fields) to determine which fields contain timestamp data. The table (hypersec-enrichment-time-fields.csv) lists fields like EventReceivedTime, @timestamp, event.created, and others, each marked whether it should update the main .timestamp field.
Enrichment Table Details:
Each entry in the table has:
iteration_group: Identifies the group for which the field should be processed (e.g., receiver).
time_field: Specifies the field containing timestamp data.
set_timestamp: A boolean indicating whether this field should overwrite the primary .timestamp field.
Parse Timestamps: Attempts to parse each identified timestamp field. The logic handles different formats, including:
Epoch Timestamps: Converts timestamps from seconds, milliseconds, or nanoseconds into a standard UTC string.
String-based Timestamps: Uses a list of supported formats ($SUPPORTED_TIMESTAMP_FORMAT) to attempt parsing. If a timezone offset is not provided in the timestamp string, the transform logic appends the appropriate offset based on matched fields.
Timezone Matching:
Uses the hypersec-enrichment-ts-match-tz.csv enrichment table to determine timezone adjustments. If a specific match_field and match_field_value combination exists (e.g., tags.event.category matches logs_beats_winlogbeat), it will use the defined timezone and offset.
If no match is found, the system checks for a timezone from tags.collector.timezone or defaults to UTC.
Timezone Offset Application:
Converts the timestamp using the identified timezone offset from hypersec-enrichment-tz-offset-mapping.csv, mapping time zone identifiers (like Africa/Accra) to their respective UTC offsets.
The offset is applied to adjust timestamps for accurate event time representation.
Validate Timestamps: Validates parsed timestamps to ensure they are not set in the future:
Compares each parsed timestamp against a threshold (e.g., 30 minutes beyond the current time).
If a timestamp is invalid or in the future, it is replaced with the current time (now()), and an error tag (tags.event.error) is appended to indicate the issue.
Fallback Mechanism: If parsing fails or results in an invalid timestamp, the current time (now()) is used as a fallback. This ensures that every event has a usable timestamp, even if the original data is incomplete or incorrect.
Final Timestamp Assignment:
Updates the .timestamp field with the value from the parsed time field if set_timestamp is true.
Ensures a standardized timestamp format is used across all events for downstream processing.
Environment Variables:
FIELD_SETTER_ITERATION_GROUP: Specifies the group for enrichment table lookups.
VECTOR_ENRICHMENT_PATH: Path where enrichment files like hypersec-enrichment-time-fields.csv, hypersec-enrichment-ts-match-tz.csv, and hypersec-enrichment-tz-offset-mapping.csv are stored.
SUPPORTED_TIMESTAMP_FORMAT: A list of timestamp string formats that the system attempts to parse.
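Putting the steps above together, a simplified sketch of the transform could look like the following. The table and environment variable names come from this document, but the VRL is illustrative only: the shipped hypersec-receiver-transform-timestamp.yml iterates over every time_fields row and every $SUPPORTED_TIMESTAMP_FORMAT entry, whereas this sketch parses a single field with one ISO 8601 format, and the component ids are assumptions:

```yaml
enrichment_tables:
  time_fields:
    type: file
    file:
      path: "${VECTOR_ENRICHMENT_PATH}/hypersec-enrichment-time-fields.csv"
      encoding:
        type: csv

transforms:
  normalize_timestamp:
    type: remap
    inputs: ["receiver_in"]           # assumed upstream component id
    source: |
      parsed = now()                  # fallback if parsing fails
      row, err = get_enrichment_table_record("time_fields",
        {"iteration_group": "${FIELD_SETTER_ITERATION_GROUP}"})
      if err == null {
        raw = string(get!(., split(string!(row.time_field), "."))) ?? ""
        p, perr = parse_timestamp(raw, format: "%+")   # "%+" = ISO 8601 / RFC 3339
        if perr == null {
          # Reject timestamps more than 30 minutes in the future.
          if to_unix_timestamp!(p) <= to_unix_timestamp(now()) + 1800 {
            parsed = p
          } else {
            .tags.event.error = "timestamp_in_future"
          }
        } else {
          .tags.event.error = "timestamp_parse_failed"
        }
        # Only overwrite .timestamp when the table row says so.
        if string!(row.set_timestamp) == "true" {
          .timestamp = parsed
        }
      }
```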
3. Event Category Mapping
This step assigns a category to each event to ensure that events are correctly routed and processed downstream. By default, the hypercollector or the log collection method should set the .tags.event.category field to the appropriate value. This ensures that events are routed correctly without additional processing.
Explanation of Control Field Logic
Purpose: To categorize events for downstream processing and analytics, ensuring they are routed to the correct Kafka topics and handled appropriately.
Control Fields:
.tags.event.category: Used to route events to the correct topic in Kafka. It should ideally be set by the hypercollector or collection method.
.tags.event.org_id and .org_id: Identify the source organization of the event.
When to Use Event Category Mapping:
There are three primary scenarios where you need to use the receiver's event category mapping functionality:
Lack of Control at Collection:
If you cannot set the .tags.event.category during data collection, you can leverage the receiver's category mapping to assign categories based on event content.
Splitting a Source into Multiple Categories:
If you need to split events from a single source into separate categories for downstream processing, the mapping functionality allows you to assign different categories based on specific event characteristics.
Overriding Existing Categories:
If you need to override the category set by the hypercollector or source, for instance to standardize categories or correct misclassifications, the mapping logic can replace the existing .tags.event.category value.
High-Level Logic:
Load Mapping Rules:
Fetch rules from the event_category_map enrichment table.
Apply Rules:
Iterate over the rules to find a matching condition based on event fields.
Set or Override Category:
Once a match is found, set the .tags.event.category field accordingly and update is_mapped to true to prevent further mapping. This will override any existing category value.
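The three steps above might be sketched in VRL as follows. This is illustrative only: the real transform also honors the set_field column, whereas this sketch hardcodes tags.event.category, and the component ids are assumptions:

```yaml
transforms:
  map_event_category:
    type: remap
    inputs: ["normalize_timestamp"]   # assumed upstream id
    source: |
      .is_mapped = false
      # Load all "receiver" rules, then apply the first one that matches.
      rules, err = find_enrichment_table_records("event_category_map",
        {"iteration_group": "receiver"})
      if err == null {
        for_each(array!(rules)) -> |_i, rule| {
          if !bool!(.is_mapped) {
            val = get!(., split(string!(rule.match_field), "."))
            if val == rule.match_field_value {
              # Overrides any category the collector already set.
              .tags.event.category = rule.set_value
              .is_mapped = true
            }
          }
        }
      }
```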
Mapping CSV File
The event_category_map enrichment table contains specific mappings to override or set the event category when necessary. Below is an example of the mapping CSV file:
iteration_group,match_field,match_field_value,set_field,set_value,comment
receiver,tags.event.type,event.linux.host,tags.event.category,logs_syslog_linux,HyperCollector default integrated Linux host syslog data (TCP 12205)
receiver,tags.event.type,event.linux.audit,tags.event.category,logs_syslog_linux_audit,HyperCollector default integrated Linux audit data (TCP 12201)
receiver,tags.event.type,event.netflow,tags.event.category,logs_netflow,"NetFlow v9, v5, IPFIX, sFlow"
receiver,SourceModuleType,im_msvistalog,tags.event.category,logs_nxlog_windows,HyperCollector default integrated NXLog CE Windows
receiver,tags.event.type,event.hypersec.windows,tags.event.category,logs_hypersec_windows,HyperSec native Windows agent data
receiver,tags.event.type,internal.metric,tags.event.category,logs_hypercol_metric,HyperCollector metrics
receiver,tags.event.type,internal.log,tags.event.category,logs_hypercol_internal,HyperCollector internal logs
receiver,tags.event.type,event.linux.syslog,tags.event.category,logs_syslog_linux,HyperCollector Linux syslog data
Explanation of Fields:
iteration_group: Specifies the group of mappings to apply; in this case, it's receiver.
match_field: The field in the event data to match.
match_field_value: The value to match within the specified field.
set_field: The field to set when a match is found.
set_value: The value to assign to the set field.
comment: Provides additional context about the mapping.
Example Usage:
The mapping file is used to split syslog Linux events from Linux audit events or to override categories when necessary.
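For reference, such a CSV can be declared as a Vector file enrichment table; the filename below is an assumption for illustration, not necessarily the deployed one:

```yaml
enrichment_tables:
  event_category_map:
    type: file
    file:
      # Hypothetical filename; adjust to the deployed CSV.
      path: "${VECTOR_ENRICHMENT_PATH}/hypersec-enrichment-event-category-map.csv"
      encoding:
        type: csv
```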
Default Logic for Unmapped Sources
For sources that are not explicitly mapped:
Default Assignment:
The .tags.event.category field is expected to be set by the hypercollector or the collection method. If the category is already present in the event data, no further action is needed.
Metadata Checks:
For core hypercollector sources, there is logic to detect event category values from .@metadata fields carried with the event.
If the event has no category set in its .@metadata fields but the .tags.event.type field is present, the receiver will map .tags.event.type to .tags.event.category or set it to logs_syslog if appropriate.
Fallback to Unmatched:
If the category cannot be determined, the receiver assigns unmatched to .tags.event.category.
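A condensed VRL sketch of this fallback chain; the .@metadata checks are omitted for brevity and the component ids are assumptions:

```yaml
transforms:
  default_category:
    type: remap
    inputs: ["map_event_category"]    # assumed upstream id
    source: |
      # Runs only when no mapping rule set a category.
      if !exists(.tags.event.category) {
        if exists(.tags.event.type) {
          .tags.event.category = .tags.event.type   # or logs_syslog where appropriate
        } else {
          .tags.event.category = "unmatched"
        }
      }
```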
Additional Processing Steps
Format Category Field: Replace dots and dashes with underscores in the category name to ensure consistency in topic naming and downstream processing.
Organization ID Handling: Ensure both .tags.event.org_id and .org_id are set, defaulting to unknown if necessary.
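These two steps might be expressed in VRL as follows (a sketch; the component ids are assumptions):

```yaml
transforms:
  finalize_fields:
    type: remap
    inputs: ["default_category"]      # assumed upstream id
    source: |
      # Replace dots and dashes so the category is safe for topic naming.
      .tags.event.category = replace(string!(.tags.event.category), r'[.-]', "_")
      # Default both org id fields to "unknown" when unset.
      if !exists(.tags.event.org_id) { .tags.event.org_id = "unknown" }
      if !exists(.org_id) { .org_id = "unknown" }
```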
Example Scenarios
Scenario 1: HyperCollector or Custom Source Sets Category, Unmapped Source
Given:
An event from a source not listed in the event_category_map.
The event includes the field .tags.event.category set by the hypercollector.
Process:
Mapping Check:
No matching rule is found in the event_category_map.
Default Logic:
Check for Existing Category: The receiver finds that .tags.event.category is already set.
Use Existing Category: The receiver uses this value without further mapping.
Outcome:
The event is processed with the existing category, ensuring correct routing.
Scenario 2: HyperCollector or Custom Source Does Not Set Category, Unmapped Source
Given:
An event from a source not listed in the event_category_map.
The event does not include the field .tags.event.category.
Process:
Mapping Check:
No matching rule is found in the event_category_map.
Default Logic:
Check for Existing Category: The receiver finds that .tags.event.category is not set.
Metadata Checks: Attempts to extract category information from .@metadata fields but finds none.
Event Type Check: Attempts to derive a category from .tags.event.type but finds none.
Set Category to Unmatched: The receiver assigns unmatched to .tags.event.category.
Outcome:
The event is forwarded to the unmatched topic for review and potential action.
Scenario 3: HyperCollector Sets Category, Overridden by Mapping
Given:
An event from a source listed in the event_category_map.
The event includes the field .tags.event.category set by the hypercollector.
Process:
Mapping Check:
A matching rule is found in the event_category_map based on event fields (e.g., tags.event.type).
Override Category:
Apply Mapping: The receiver overrides the existing .tags.event.category with the value specified in the mapping (set_value).
Update is_mapped: Sets is_mapped to true to prevent further mapping.
Outcome:
The event is processed with the new category from the mapping, ensuring it is routed according to the overridden value.
Explanation:
Reason for Overriding:
This approach is used when there is a need to standardize categories, correct misclassifications, or enforce specific routing rules that differ from the source-provided category.
Impact:
The receiver's mapping logic takes precedence over the category set by the hypercollector or source when a mapping rule matches.
Example:
Event Details:
.tags.event.type is event.linux.audit.
.tags.event.category is initially set to logs_syslog_linux by the hypercollector.
Mapping Rule:
Match on tags.event.type equal to event.linux.audit.
Set tags.event.category to logs_syslog_linux_audit.
Process:
The mapping rule matches, and the receiver overrides .tags.event.category to logs_syslog_linux_audit.
Outcome:
The event is routed to the logs_syslog_linux_audit topic, ensuring it is processed appropriately for audit logs.
4. Sink Configuration
Defines the destinations where processed data is sent:
Key Points:
Kafka Sink: Sends events to Kafka topics based on their category.
Dynamic Topic Assignment: Uses the .tags.event.category field to determine the topic name.
TLS Configuration: Ensures secure communication with Kafka brokers.
Buffering: Configures how data is buffered before being sent to Kafka.
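A hedged sketch of what hypersec-receiver-sink.yml could look like. The _land topic suffix matches the web_access_logs_land example in the Examples section; the certificate filenames and component ids are assumptions:

```yaml
sinks:
  kafka_out:
    type: kafka
    inputs: ["default_category"]      # assumed upstream id
    bootstrap_servers: "${KAFKA_BROKERS}"
    topic: "{{ tags.event.category }}_land"   # dynamic topic from the category field
    encoding:
      codec: json
    tls:
      enabled: true
      ca_file: "${KAFKA_MTLS_PATH}/ca.crt"    # cert filenames are assumptions
      crt_file: "${KAFKA_MTLS_PATH}/client.crt"
      key_file: "${KAFKA_MTLS_PATH}/client.key"
    buffer:
      type: disk                      # persist to disk before sending
      max_size: ${VECTOR_BUFFER_SIZE}
```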
Environment Variables and Configurations
VECTOR_DATA_DIR: Directory for Vector's state and buffers.
VECTOR_MTLS_PATH: Path to TLS certificates for secure communication.
VECTOR_ENRICHMENT_PATH: Path to enrichment tables used in transformations.
KAFKA_BROKERS: List of Kafka broker addresses.
KAFKA_MTLS_PATH: Path to Kafka TLS certificates.
VECTOR_BUFFER_SIZE: Maximum size for the disk buffer.
Examples
Scenario: Event Category Mapping
Given:
An event with the field .source.type equal to apache_access.
The event_category_map contains a rule:
Match Field: source.type
Match Value: apache_access
Set Field: tags.event.category
Set Value: web_access_logs
Process:
Mapping Check: The event's source.type matches apache_access.
Category Assignment: Sets .tags.event.category to web_access_logs.
Topic Routing: The event is sent to the Kafka topic web_access_logs_land.
Scenario: Timestamp Normalization
Given:
An event with a timestamp field event_time containing 2023-10-15T12:34:56.
No timezone information is present.
Process:
Timestamp Parsing: Attempts to parse event_time using supported formats.
Timezone Assignment:
Checks for timezone mappings in time_match_timezone.
Uses the default timezone from tags.collector.timezone or defaults to UTC.
Timestamp Validation: Ensures the timestamp is not in the future.
Final Assignment: Normalizes event_time and updates .timestamp if configured.
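This scenario can be reduced to a small VRL sketch. For simplicity it assumes the collector timezone is already expressed as a UTC offset string, whereas the real pipeline maps zone names to offsets via hypersec-enrichment-tz-offset-mapping.csv; the component ids are assumptions:

```yaml
transforms:
  parse_event_time_example:           # hypothetical, for illustration only
    type: remap
    inputs: ["receiver_in"]           # assumed source id
    source: |
      # "2023-10-15T12:34:56" carries no offset, so append one before parsing.
      tz = if exists(.tags.collector.timezone) {
        string!(.tags.collector.timezone)   # assumed to be an offset, e.g. "+02:00"
      } else {
        "+00:00"                            # default to UTC
      }
      .timestamp = parse_timestamp!(string!(.event_time) + tz,
        format: "%Y-%m-%dT%H:%M:%S%:z")
```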
Conclusion
The receiver component is a crucial part of our ingestion pipeline, responsible for capturing incoming data, normalizing timestamps, categorizing events, and forwarding them to the appropriate destinations. By leveraging Vector's capabilities and our custom configurations, we ensure that data is processed efficiently and accurately.
Key Takeaways:
Modular Design: The receiver is organized into sources, transforms, and sinks for clarity and maintainability.
Flexible Configuration: Environment variables and enrichment tables allow for dynamic adjustments without code changes.
Robust Mapping Logic: The control field mappings ensure that events are correctly categorized and routed.
Security: TLS configurations secure communications between components.
For further details or assistance, please refer to the configuration files or contact the technical team.