The ADBox anomaly detection pipeline can be easily customized by creating a use case. In this context, a use case is a sequence of actions to be performed, together with the characteristics of the desired outcome. Examples of "informal" use cases are:
- Create and train a detector on Linux resource usage data from March.
- Create and train a detector on Linux resource usage data from March, and apply it to predict anomalies on May 3rd.
- Use detector X for real-time detection.
- Use detector X for batch detection on batches of size 10.
These informal use cases can be translated into concrete actions using the provided YAML template, as explained in the following section.
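As an illustration, the second use case above could be expressed in the YAML template roughly as follows. This is a sketch under two assumptions. First, keys omitted from the file are assumed to keep their default values; if your ADBox version expects the complete file, start from the full template and change only these keys. Second, the year 2024 is hypothetical, because the use case names only a month and a day.

```yaml
training:
  # All data from March; the year 2024 is a hypothetical choice.
  index_date: "2024-03-*"
prediction:
  run_mode: "historical"
  index_date: "2024-05-03"
  # "default" selects the most recently trained detector, assumed here
  # to be the one produced by the training block above.
  detector_id: "default"
  # Cover the whole of May 3rd instead of the current date.
  start_time: "2024-05-03T00:00:00Z"
  end_time: "2024-05-03T23:59:59Z"
```

The default `columns` (`data.cpu_usage_%` and `data.memory_usage_%`) already cover Linux resource usage, so they are left unset.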
The YAML file for detector training and prediction includes parameters to configure the training and prediction processes. Below is a guide explaining the purpose of each parameter, its default value, and format.
Training parameters (under the `training` key):

`index_date`
- Description: Data source index from which the training data is fetched.
- Default value: `default`, which is processed as `{current_year}-*-*` (fetches all data for the current year).
- Format: `YYYY-MM-DD`, optionally with wildcards (`*`).
`categorical_features`
- Description: Specifies whether the input features include categorical features.
- Default value: `False` (features are numerical by default).
- Format: Boolean (`True` or `False`).
`columns`
- Description: List of columns used as features to train the detector.
- Default value: `data.cpu_usage_%`, `data.memory_usage_%`.
- Format: List of strings.
`aggregation`
- Description: Specifies whether the column values should be aggregated.
- Default value: `True`.
- Format: Boolean (`True` or `False`).
`aggregation_config`
- Description: Configuration for data aggregation. Required if `aggregation` is `True`.
- fill_na_method: Method used to handle null values.
  - Default value: `Zero`.
  - Format: String, one of `"Linear"`, `"Previous"`, `"Subsequent"`, `"Zero"`, or `"Fixed"`.
- padding_value: Value used when `fill_na_method` is `Fixed`.
  - Default value: `0`.
  - Format: Integer (used only when `fill_na_method` is `Fixed`).
- granularity: Granularity at which the input data is aggregated.
  - Default value: `1min`.
  - Format: String (`"1min"`, `"5min"`, `"1s"`, `"1hour"`, etc.).
- features: Key-value pairs mapping each feature to its aggregation methods.
  - Default value: `data.cpu_usage_%: ["average", "max"]`, `data.memory_usage_%: ["average", "max"]`.
  - Format: Mapping from a string to a list of strings. Aggregation methods are selected from `"average"`, `"max"`, `"min"`, `"count"`, or `"sum"`.
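For example, the following sketch aggregates the data into 5-minute intervals and replaces null values with a fixed value instead of zero. The padding value `-1` (an arbitrary sentinel), the `"5min"` granularity, and the chosen aggregation methods are illustrative, not recommendations. The two feature keys are the default `columns`.

```yaml
training:
  aggregation: true
  aggregation_config:
    # Null values are replaced by padding_value instead of zero.
    fill_na_method: "Fixed"
    padding_value: -1
    # One aggregated value per 5-minute interval.
    granularity: "5min"
    features:
      data.cpu_usage_%:
        - "average"
        - "min"
      data.memory_usage_%:
        - "max"
        - "count"
```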
`train_config`
- Description: Configuration for model training.
- window_size: Size of the training window.
  - Default value: `10`.
  - Format: Integer.
- epochs: Number of training epochs.
  - Default value: `30`.
  - Format: Integer.
`display_name`
- Description: Name of the detector.
- Default value: `default`, which is converted to `detector_<current timestamp>`.
- Format: String.
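A sketch of a training block that overrides the training configuration and gives the detector an explicit name so that it is easier to recognize later. The window size, epoch count, and name are arbitrary illustrative values.

```yaml
training:
  train_config:
    # Longer window and more epochs than the defaults (10 and 30).
    window_size: 20
    epochs: 50
  # Used instead of the generated detector_<current timestamp> name.
  display_name: "linux_resources_march"
```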
Prediction parameters (under the `prediction` key):

`run_mode`
- Description: Specifies the detection run mode.
- Default value: `default` (the `"historical"` run mode).
- Format: String (`"historical"`, `"batch"`, or `"realtime"`):
  - HISTORICAL: Performs detection on historical data.
  - BATCH: Performs detection on current data in batches.
  - REALTIME: Performs detection on real-time data.
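The third use case, real-time detection with a specific detector X, could be sketched as follows. `<detector-X-id>` is a hypothetical placeholder for the ID of an already trained detector. How detector IDs are retrieved depends on your ADBox setup and is not covered here.

```yaml
prediction:
  run_mode: "realtime"
  # Hypothetical placeholder; replace with the ID of detector X.
  detector_id: "<detector-X-id>"
```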
`index_date`
- Description: Date string used to fetch data from the Wazuh index.
- Default value: `default` (the current day).
- Format: `YYYY-MM-DD`, optionally with wildcards (`*`).
`detector_id`
- Description: ID of the detector to use for prediction.
- Default value: `default` (the most recently trained detector).
- Format: String.
`start_time`
- Description: Start time for detection.
- Default value: `default` (the start of the current date).
- Format: `YYYY-MM-DDTHH:MM:SSZ`.
`end_time`
- Description: End time for detection.
- Default value: `default` (the current timestamp).
- Format: `YYYY-MM-DDTHH:MM:SSZ`.
`batch_size`
- Description: Batch size for the `batch` run mode.
- Default value: `10`.
- Format: Integer.
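Similarly, the fourth use case, batch detection with detector X on batches of size 10, might look like this, again with a hypothetical detector ID placeholder. Because 10 is also the default `batch_size`, the key is set only to make the intent explicit.

```yaml
prediction:
  run_mode: "batch"
  # Hypothetical placeholder; replace with the ID of detector X.
  detector_id: "<detector-X-id>"
  # Equal to the default, set explicitly for clarity.
  batch_size: 10
```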
The complete template with the default values:

```yaml
training:
  index_date: "default"
  categorical_features: false
  columns:
    - "data.cpu_usage_%"
    - "data.memory_usage_%"
  aggregation: true
  aggregation_config:
    fill_na_method: "Zero"
    granularity: "1min"
    features:
      data.cpu_usage_%:
        - "average"
        - "max"
      data.memory_usage_%:
        - "average"
        - "max"
  train_config:
    window_size: 10
    epochs: 30
  display_name: "default"
prediction:
  run_mode: "default"
  index_date: "default"
  detector_id: "default"
  start_time: "default"
  end_time: "default"
  batch_size: 10
```