Email-borne malicious URLs, attachments leading to phishing or malware, and conversational payloads such as Business Email Compromise (BEC) are among the most concerning attack vectors facing organizations today. But this very channel is also one of the most exploited for data leaks and insider threats. Traditional Data Loss Prevention (DLP) solutions often rely on static rules, keyword dictionaries, or pattern matching (e.g., credit card numbers, social security numbers). While these methods can block obvious leaks, they fail to catch nuanced, context-rich scenarios.
To overcome these limitations, a new generation of DLP must go beyond signatures and patterns: it must understand the deeper meaning of an email, i.e. its intent, and use contextual reasoning to detect data leakage.
In this blog we share the approach the NACE™ engine takes to detect data loss in email messages.
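To make the limitation of pattern matching concrete, here is a minimal sketch of a signature-style check. The regular expressions, the function name `pattern_dlp`, and the sample sentences are illustrative assumptions, not rules taken from any particular DLP product.

```python
import re

# Illustrative patterns only: a US SSN-like token and a 16-digit card-like number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def pattern_dlp(body: str) -> bool:
    """Return True if the body contains an SSN-like or card-like token."""
    return bool(SSN_PATTERN.search(body) or CARD_PATTERN.search(body))


# An obvious leak is flagged because it matches a surface pattern.
obvious = "Customer card on file: 4111 1111 1111 1111"
# A context-dependent leak has no matching token, so the check misses it.
nuanced = "Sending the unreleased term sheet to my personal account before I leave."

print(pattern_dlp(obvious), pattern_dlp(nuanced))
```

The second message may well be sensitive, but nothing in its surface tokens says so. Deciding that requires knowing what the text is about and who is receiving it.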
We define DLP identification as intent identification with contextual reasoning. An email message m can be broken into two components:
m = {b, h}, where b is the body text and h is the set of SMTP headers.
The objective is to learn a function f such that:
f(m) → {0, 1}, where 1 = sensitive data leakage and 0 = benign communication.
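The formulation above can be sketched in code. The following is a minimal illustration, assuming Python's standard `email` package for parsing; the names `Mail`, `parse_email`, and `Classifier` are hypothetical and only mirror the symbols m, b, h, and f.

```python
from dataclasses import dataclass
from email import message_from_string, policy
from typing import Callable, Dict


@dataclass(frozen=True)
class Mail:
    """m = {b, h}: body text b and header set h."""
    body: str
    headers: Dict[str, str]


# f(m) -> {0, 1}: any callable with this shape can play the role of f.
Classifier = Callable[[Mail], int]


def parse_email(raw: str) -> Mail:
    """Split a raw RFC 5322 message into body text and lower-cased headers."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    # Repeated header names collapse to the last value in this simple sketch.
    headers = {name.lower(): str(value) for name, value in msg.items()}
    return Mail(body=body.strip(), headers=headers)


raw = (
    "From: alice@example.com\n"
    "To: bob@partner.example\n"
    "Subject: Q3 numbers\n"
    "\n"
    "Please find the draft attached.\n"
)
mail = parse_email(raw)
print(mail.headers["to"], "|", mail.body)
```

Keeping the body and headers as separate fields lets the classifier reason over content and routing context independently.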
Since sensitivity is rarely determined by surface tokens alone, classification must incorporate semantic, thematic, and topic identification to isolate intent, along with contextual reasoning using SMTP headers, to detect sensitive data leakage. We define the conditional probability of an email containing sensitive data as:
P(y=1∣b,h,c)
where:
Data Loss Prevention detection by NACE™
Unlike traditional DLP systems that approximate this probability using pattern-matching functions, NACE™ takes a reasoning-driven approach that leverages topic modeling, intent analysis, heuristic header checks, and contextual reasoning.
Thus, the decision boundary is not defined by token-level features but by higher-order semantics and contextual reasoning about whether an email leaks sensitive data.
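As a rough illustration of combining a body-level intent signal with a header-level check into a single decision, consider the sketch below. It is not NACE™'s implementation: the logistic combination, the weights, the threshold, the external-recipient heuristic, and the `intent_model` callable are all assumptions chosen for demonstration.

```python
import math
from email.utils import parseaddr
from typing import Callable, Dict


def domain(address: str) -> str:
    """Extract the lower-cased domain from an address such as 'Bob <bob@x.example>'."""
    return parseaddr(address)[1].rpartition("@")[2].lower()


def external_recipient(headers: Dict[str, str]) -> float:
    """Hypothetical header heuristic: 1.0 if the first recipient's domain differs from the sender's."""
    sender = domain(headers.get("from", ""))
    recipient = domain(headers.get("to", ""))
    return 1.0 if sender and recipient and sender != recipient else 0.0


def leak_probability(
    body: str,
    headers: Dict[str, str],
    intent_model: Callable[[str], float],  # hypothetical semantic scorer returning a value in [0, 1]
    w_intent: float = 4.0,                 # illustrative weights, not learned values
    w_external: float = 2.0,
    bias: float = -5.0,
) -> float:
    """Toy estimate of P(y=1 | body, headers) via a logistic combination of signals."""
    z = bias + w_intent * intent_model(body) + w_external * external_recipient(headers)
    return 1.0 / (1.0 + math.exp(-z))


def classify(body, headers, intent_model, threshold: float = 0.5) -> int:
    """f(m): 1 if the estimated leak probability reaches the threshold, else 0."""
    return int(leak_probability(body, headers, intent_model) >= threshold)


def sensitive(_body: str) -> float:
    """Stand-in for a real intent model: always reports high sensitivity."""
    return 1.0


outbound = {"from": "alice@example.com", "to": "Bob <bob@partner.example>"}
internal = {"from": "alice@example.com", "to": "carol@example.com"}
print(classify("draft", outbound, sensitive), classify("draft", internal, sensitive))
```

With these toy weights, the same sensitive body is flagged when it leaves the organization and passes when it stays internal. That shows how header context, rather than any token in the body, can move the decision.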
Conclusion
Data Loss Prevention (DLP) cannot rely solely on surface-level patterns or keywords; effective detection requires understanding the deeper intent of an email and using contextual reasoning across both the email body and headers. The NACE™ Intent-based threat prevention™ AI platform achieves this by combining topic modeling, intent analysis, heuristic header checks, and contextual reasoning, creating a decision boundary based on higher-order semantics rather than simple token-level features.