Threat Hunting with behavior prediction

We are constantly looking for signals that help us predict how we will be attacked, so that we can act proactively to reduce the risk. Cyber Threat Intelligence (CTI) is the discipline that collects, analyzes and contextualizes information about cyber threats in order to understand who the adversaries are, how they operate, what they are after and which techniques they use, so that attacks can be anticipated and defenses strengthened. CTI enables security teams to make better-informed decisions.

Technique Inference Engine (TIE), presented by Matthew Turner (a MITRE engineer) at the recent Artificial Intelligence for Cybersecurity (AICS 2025) workshop, is an innovative tool that has the potential to change the way we conduct threat hunting and tackle complex adversarial campaigns. TIE uses machine learning algorithms and probabilistic reasoning to analyze large volumes of threat reports and extract the behavioral patterns that tend to appear together in malicious activity.

TIE aims to infer the techniques and tactics used by adversaries, even when they try to hide their tracks. This would allow security analysts to identify and track adversary campaigns over time, even as the adversaries change their tools and methods. The model is intended to help “predict” an adversary's next moves, so that defenders can take proactive measures to protect their systems.

Technique Inference Engine
TIE is a predictive model developed by the MITRE Center for Threat-Informed Defense, designed to support threat analysts and incident response teams. Built on the MITRE ATT&CK® framework, TIE takes the techniques already observed and infers additional techniques that were probably used in the same adversarial campaign, even if they are not explicitly reported in the available CTI.
In short: TIE helps answer the question, “What else should I be looking for?”

Value Points
One of the biggest challenges in threat hunting is dealing with incomplete information. CTI reports often do not document all the TTPs (tactics, techniques and procedures) involved in an intrusion. TIE addresses this problem by applying recommendation models (like those used by Netflix or Spotify, but applied to cybersecurity) to predict techniques that are likely to be in use but have not yet been seen.
We can separate TIE into three elements of value:
⦁ The largest known public dataset of CTI reports labeled with ATT&CK techniques.

⦁ Algorithms such as Weighted Matrix Factorization (WMF) and Bayesian Personalized Ranking (BPR) for technique inference.

⦁ A simple and effective web interface for entering observed techniques and receiving prioritized suggestions.

Good dataset, good results
TIE was trained on the largest publicly known dataset of threat intelligence reports (CTI) tagged with ATT&CK techniques. It has a total of:
⦁ 6,236 CTI reports.

⦁ 43,899 technique observations.

⦁ 96% coverage of the ATT&CK Enterprise v15 framework.

⦁ Main sources: OpenCTI, TRAM, ATT&CK Flows, Adversary Emulation Plans and Campaigns.
This dataset reflects both the variety of techniques reported and the biases and incompleteness typical of manual analysis.

Modeling the problem
TIE models the problem as a “collaborative recommendation task” with implicit feedback. Instead of trying to guess “whether or not a technique is part of a campaign”, it focuses on predicting a ranking of likely techniques, given a partial input of observed techniques.
⦁ A binary matrix A (reports × techniques) is constructed, with ones where a technique is observed in a report and implicit zeros everywhere else (a zero may mean the technique was truly absent, or simply that it was not reported).

⦁ The objective is to complete the matrix: given what I have observed, which other techniques are likely?
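As a rough illustration of the matrix described above, the following Python sketch builds A from a handful of reports. The report names and the technique lists assigned to them are invented for the example and are not taken from the TIE dataset.

```python
# Hypothetical toy data: report IDs mapped to the ATT&CK technique IDs they mention.
reports = {
    "report-a": ["T1059", "T1566", "T1105"],
    "report-b": ["T1566", "T1204"],
    "report-c": ["T1059", "T1105", "T1027"],
}

def build_matrix(reports):
    """Return (report_ids, technique_ids, A), where A[i][j] is 1 if
    report i mentions technique j and 0 otherwise."""
    report_ids = sorted(reports)
    technique_ids = sorted({t for techs in reports.values() for t in techs})
    column = {t: j for j, t in enumerate(technique_ids)}
    A = [[0] * len(technique_ids) for _ in report_ids]
    for i, r in enumerate(report_ids):
        for t in reports[r]:
            A[i][column[t]] = 1
    return report_ids, technique_ids, A

report_ids, technique_ids, A = build_matrix(reports)
print(technique_ids)
for r, row in zip(report_ids, A):
    print(r, row)
```

Every zero in a row is only an "implicit" zero: the sketch cannot tell whether the technique was absent or just not written down, which is exactly the ambiguity the recommender models are meant to handle.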
As a simple analogy, imagine walking into a huge library. We know which books some people have read, and we would like to guess which other books they might like. That is where “recommender systems” like WMF and BPR come in. They try to “guess hidden connections” between the things we have observed (e.g., techniques used by an attacker) and things that are probably related to them (other techniques we haven’t seen yet).
Weighted Matrix Factorization (WMF)
This approach works like sorting items into boxes by shared taste. Each attack report (CTI) and each ATT&CK technique is represented as a point on a map. The model tries to place reports and techniques that often appear together close to each other. If certain techniques were observed in a campaign, the model looks on that “map” to see which other techniques are nearby and suggests them as likely.
The observation matrix mentioned above is factored into latent vectors for reports and for techniques. The loss function penalizes errors on absences less than errors on observations, which reflects low confidence in the zeros. The result is fast training, low computational cost and good performance.
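A minimal sketch of the idea, assuming a squared-error loss in which observed entries get weight 1 and zeros get a smaller weight `w0`. It is trained here with plain stochastic gradient descent for brevity; production WMF implementations usually use alternating least squares, and this is not the TIE training code. The toy matrix and all hyperparameters are assumptions chosen for illustration.

```python
import random

# Hypothetical toy matrix: two groups of reports using two disjoint groups
# of techniques. Report 1 is "missing" technique 2, report 3 technique 5.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]

def train_wmf(A, k=2, w0=0.1, lam=0.01, lr=0.05, epochs=500, seed=0):
    """Factor A into report vectors U and technique vectors V by minimizing
    sum_ij w_ij * (A_ij - U_i . V_j)^2 plus L2 regularization, where
    w_ij = 1 for observed entries and w0 < 1 for zeros."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                w = 1.0 if A[i][j] else w0  # low confidence in zeros
                err = A[i][j] - sum(U[i][f] * V[j][f] for f in range(k))
                for f in range(k):
                    u, v = U[i][f], V[j][f]
                    U[i][f] += lr * (w * err * v - lam * u)
                    V[j][f] += lr * (w * err * u - lam * v)
    return U, V

def score(U, V, i, j):
    """Predicted affinity between report i and technique j."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def recommend(A, U, V, i):
    """Unobserved techniques for report i, most likely first."""
    candidates = [j for j, a in enumerate(A[i]) if not a]
    return sorted(candidates, key=lambda j: score(U, V, i, j), reverse=True)

U, V = train_wmf(A)
print(recommend(A, U, V, 1))
```

Because the zero at report 1, technique 2 is only weakly penalized, the model is free to give it a high score based on what the similar report 0 contains, so technique 2 should come out ahead of the techniques from the other group.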
Bayesian Personalized Ranking (BPR)
BPR, by contrast, is the “this is better than that” strategy. The model does not try to predict yes or no, but to produce an ordering. It learns that if report A used technique X but not technique Y, then report A probably “prefers” X over Y. Thousands of such comparisons are generated, and the way techniques are ordered is adjusted accordingly. In effect, the model learns as if it were playing a “Which is more likely?” video game many times over, until it finds good rankings.
BPR uses negative sampling: the model is trained to rank observed techniques above sampled unobserved ones. This improves the relative order of predictions, but the model can be biased towards very frequent techniques. It also requires more training time than WMF.
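The pairwise training loop can be sketched as follows, assuming the standard BPR objective (maximize the log-sigmoid of the score difference between an observed and a sampled unobserved technique) with uniform negative sampling. The toy matrix and hyperparameters are assumptions, and this is an illustration rather than the TIE implementation.

```python
import math
import random

# Hypothetical toy matrix (reports x techniques), for illustration only.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]

def train_bpr(A, k=2, lam=0.01, lr=0.05, steps=20000, seed=0):
    """Learn report vectors U and technique vectors V so that, for each
    report, observed techniques score higher than unobserved ones."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    pos = [[j for j in range(m) if A[i][j]] for i in range(n)]
    neg = [[j for j in range(m) if not A[i][j]] for i in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        if not pos[i] or not neg[i]:
            continue
        p = rng.choice(pos[i])  # observed technique
        q = rng.choice(neg[i])  # sampled unobserved technique
        x = sum(U[i][f] * (V[p][f] - V[q][f]) for f in range(k))
        g = 1.0 / (1.0 + math.exp(x))  # gradient of ln(sigmoid(x)) w.r.t. x
        for f in range(k):
            u, vp, vq = U[i][f], V[p][f], V[q][f]
            U[i][f] += lr * (g * (vp - vq) - lam * u)
            V[p][f] += lr * (g * u - lam * vp)
            V[q][f] += lr * (-g * u - lam * vq)
    return U, V

def score(U, V, i, j):
    """Ranking score of technique j for report i (only the order matters)."""
    return sum(a * b for a, b in zip(U[i], V[j]))

U, V = train_bpr(A)
ranking = sorted(range(len(A[1])), key=lambda j: score(U, V, 1, j), reverse=True)
print(ranking)  # techniques ordered from most to least likely for report 1
```

Note that the scores are not probabilities; only their relative order is meaningful. Each step touches a single (report, observed, unobserved) triple, which is one reason this style of training tends to need many more updates than WMF.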

Simple to use
The tool is written in JavaScript and needs no backend: everything runs in the user’s browser. The input is a list of observed techniques, and the output is the top inferred techniques, sorted by strength of association.
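To give a feel for what happens between input and output, here is a Python sketch of one common way a factorization model can handle a brand-new list of techniques: keep the learned technique vectors fixed, fit a single vector for the new input ("folding in"), and rank the remaining techniques by score. The embeddings, the function names and the hyperparameters are all hypothetical, and the browser tool itself may well work differently.

```python
# Hypothetical 2-D technique embeddings, standing in for factors learned offline.
V = {
    "T1059": (1.0, 0.0),
    "T1105": (0.9, 0.1),
    "T1027": (0.8, 0.0),
    "T1566": (0.0, 1.0),
    "T1204": (0.1, 0.9),
}

def fold_in(observed, V, w0=0.1, lam=0.01, lr=0.1, steps=300):
    """Fit a vector u for a new input by gradient descent on a weighted
    squared error: weight 1 for observed techniques, w0 for the rest."""
    k = len(next(iter(V.values())))
    u = [0.0] * k
    for _ in range(steps):
        grad = [-lam * x for x in u]
        for t, v in V.items():
            target = 1.0 if t in observed else 0.0
            w = 1.0 if t in observed else w0
            err = target - sum(x * y for x, y in zip(u, v))
            for f in range(k):
                grad[f] += w * err * v[f]
        u = [x + lr * g for x, g in zip(u, grad)]
    return u

def suggest(observed, V, top_n=20):
    """Return up to top_n unobserved techniques, highest score first."""
    u = fold_in(observed, V)
    scores = {
        t: sum(x * y for x, y in zip(u, v))
        for t, v in V.items()
        if t not in observed
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

for technique, s in suggest({"T1059", "T1105"}, V):
    print(technique, round(s, 3))
```

With these made-up embeddings, the two observed techniques sit close to T1027, so it is ranked first among the suggestions.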

Input of 3 techniques on the web interface

Output of 20 predicted techniques on the web interface

Conclusions
Threats evolve constantly, and the information available about them is enormous yet almost never complete. In that context, tools such as TIE represent a key step forward for cybersecurity teams. The approach expands visibility into adversarial campaigns without relying exclusively on what is explicitly documented in CTI reports.
For organizations, this means having an ally that helps prioritize the search, anticipate attacker movements and make more informed decisions in less time. The open design and ease of use make it a resource to integrate into any modern threat hunting strategy.

References
– Technique Inference Engine – https://center-for-threat-informed-defense.github.io/technique-inference-engine/#/
– TIE GitHub Project – https://github.com/center-for-threat-informed-defense/technique-inference-engine
– Technique Inference Engine: A Recommender Model to Support Cyber Threat Hunting, Matthew J. Turner, Mike Carenzo, Jackie Lasky, James Morris-King, James Ross (March 2025) – https://arxiv.org/abs/2503.04819v1
