Extending the NACE™ Intent-based Threat Prevention™ AI Platform to Data Loss Prevention (DLP)

Written by Abhishek Singh, Co-Founder and CTO | Sep 18, 2025 1:43:31 AM

Malicious URLs, attachments that lead to phishing or malware, and conversational payloads such as Business Email Compromise (BEC) are among the most concerning email attack vectors facing organizations today. Yet email is also one of the most exploited channels for data leaks and insider threats. Traditional Data Loss Prevention (DLP) solutions often rely on static rules, keyword dictionaries, or pattern matching (e.g., credit card numbers, Social Security numbers). While these methods can block obvious leaks, they fail to catch nuanced, context-rich scenarios such as:

An employee accidentally sharing intellectual property disguised in plain text.
A business partner sending sensitive financial updates that resemble harmless project notes.
Threat actors exfiltrating data using subtle social engineering and disguised payloads.

To overcome these limitations, a new generation of DLP must go beyond signatures and patterns: it must understand the deeper meaning of an email, i.e., its intent, and use contextual reasoning to detect data leakage.

In this blog, we describe the approach the NACE™ engine uses to detect sensitive data leakage in email messages.

Detection of DLP by NACE™

We frame DLP detection as intent identification combined with contextual reasoning. An email message can be broken into two components:

m = {b, h}, where b is the body text and h is the set of SMTP headers.
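To make the notation concrete, here is a minimal sketch of splitting a message m into its body b and header set h. It assumes raw RFC 5322 input and Python's standard email package. ParsedEmail and split_message are hypothetical names used only for illustration and are not part of NACE™. The sketch keeps only the text/plain parts of the body.

```python
from email import message_from_string
from email.message import Message
from typing import List, NamedTuple, Tuple


class ParsedEmail(NamedTuple):
    """m = {b, h}: body text b and the list of top-level (name, value) header pairs h."""
    body: str
    headers: List[Tuple[str, str]]


def _part_text(part: Message) -> str:
    """Decode one text part, undoing any Content-Transfer-Encoding (base64, quoted-printable)."""
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", errors="replace")


def split_message(raw: str) -> ParsedEmail:
    """Hypothetical helper: split a raw message into b (all text/plain parts) and h (its headers)."""
    msg = message_from_string(raw)
    texts = [
        _part_text(part)
        for part in msg.walk()  # yields msg itself first, then any MIME sub-parts
        if part.get_content_type() == "text/plain"
    ]
    return ParsedEmail(body="\n".join(texts), headers=list(msg.items()))


if __name__ == "__main__":
    raw = (
        "From: alice@corp.example\n"
        "To: bob@partner.example\n"
        "Subject: Q3 numbers\n"
        "\n"
        "Draft Q3 revenue figures are below.\n"
    )
    m = split_message(raw)
    print(m.headers)
    print(m.body)
```

HTML parts and attachments are simply ignored here. A production DLP pipeline would also need to extract text from those.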

The objective is to learn a function f such that:

f(m) → {0, 1}, where 1 = sensitive data leakage and 0 = benign communication.

Since sensitivity is rarely determined by surface tokens alone, classification must combine semantic, thematic, and topic analysis to isolate intent with contextual reasoning over the SMTP headers. We define the conditional probability that an email leaks sensitive data as:

P(y=1∣b,h,c)

where:

b is the email body,
h represents the SMTP headers, and
c represents the context derived from reasoning over intent and header analysis.
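The post does not say how P(y=1∣b,h,c) is computed. One simple way to approximate it is a logistic model over per-component scores, with f(m) obtained by thresholding the result. The sketch below is purely illustrative and uses an intent-by-header interaction term as a stand-in for the context c. The Signals fields, weights, bias, and 0.5 threshold are all hypothetical assumptions, not NACE™ parameters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical per-message component scores, each in [0, 1]."""
    topic: float   # topic modeling T(b): strength of match to a sensitive topic
    intent: float  # intent analysis: best cosine similarity to a sensitive-intent embedding
    header: float  # heuristic analysis H(h): header risk, e.g. share of external recipients


# Hypothetical parameters; a real system would fit these on labelled email.
BIAS = -4.0
W_TOPIC, W_INTENT, W_HEADER = 1.5, 3.0, 1.0
W_CONTEXT = 3.0  # weight on the intent x header interaction, standing in for context c


def p_leak(s: Signals) -> float:
    """Approximate P(y = 1 | b, h, c) with a logistic model over the component scores."""
    z = (
        BIAS
        + W_TOPIC * s.topic
        + W_INTENT * s.intent
        + W_HEADER * s.header
        + W_CONTEXT * s.intent * s.header  # sensitive intent AND risky headers together
    )
    return 1.0 / (1.0 + math.exp(-z))


def f(s: Signals, threshold: float = 0.5) -> int:
    """f(m) -> {0, 1}: 1 = sensitive data leakage, 0 = benign communication."""
    return 1 if p_leak(s) >= threshold else 0


if __name__ == "__main__":
    external = Signals(topic=0.8, intent=0.9, header=1.0)
    internal = Signals(topic=0.8, intent=0.9, header=0.0)
    print(f(external), f(internal))  # 1 0
```

With these made-up weights, the same sensitive body is flagged when its headers indicate an external recipient but not when it stays internal. This shows a context-dependent decision rather than a purely token-level one.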

Data Loss Prevention detection by NACE™

Unlike traditional DLP systems that approximate this probability using pattern-matching functions, NACE™ takes a reasoning-driven approach leveraging:

Topic modeling: to infer topics T(b).
Intent analysis: to assess whether the email body b involves sensitive data or potential data leakage, NACE™ combines response generation from a fine-tuned LLM with similarity analysis, computing cosine similarity against pre-stored embeddings that represent sensitive intents.
Heuristic analysis: to capture SMTP header information H(h).
Contextual reasoning: to jointly model (b, h, c) and determine whether an email leaks sensitive data.
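The post does not describe NACE™'s embeddings, model, or header rules. The sketch below is a hypothetical illustration of two of the mechanisms listed above. The first is cosine similarity between a body embedding and pre-stored sensitive-intent embeddings. The second is a simple header heuristic H(h) that measures the share of recipients outside the organization's domain. The toy 4-dimensional vectors, intent names, and the header_risk rule are all assumptions. A real system would obtain embeddings from an embedding model.

```python
import math
from email.utils import getaddresses
from typing import Dict, List, Sequence, Tuple

# Hypothetical pre-stored embeddings for sensitive intents. Real embeddings would come
# from an embedding model and be much larger; these toy vectors are illustrative only.
SENSITIVE_INTENTS: Dict[str, List[float]] = {
    "share_financials": [0.9, 0.1, 0.0, 0.2],
    "send_source_code": [0.1, 0.9, 0.3, 0.0],
    "forward_customer_pii": [0.0, 0.2, 0.9, 0.1],
}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_intent(body_embedding: Sequence[float]) -> Tuple[str, float]:
    """Intent analysis: the stored sensitive intent most similar to the body, with its score."""
    scored = (
        (name, cosine_similarity(body_embedding, emb))
        for name, emb in SENSITIVE_INTENTS.items()
    )
    return max(scored, key=lambda pair: pair[1])


def header_risk(headers: Dict[str, str], internal_domain: str) -> float:
    """Hypothetical heuristic H(h): fraction of To/Cc/Bcc recipients whose domain
    is not exactly internal_domain (0.0 when there are no recipients)."""
    # Skip absent headers: passing empty strings to getaddresses can make it reject the whole list.
    fields = [headers[name] for name in ("To", "Cc", "Bcc") if headers.get(name)]
    recipients = [addr for _, addr in getaddresses(fields) if addr]
    if not recipients:
        return 0.0
    external = [
        addr for addr in recipients
        if addr.rpartition("@")[2].lower() != internal_domain.lower()
    ]
    return len(external) / len(recipients)


if __name__ == "__main__":
    name, score = closest_intent([0.85, 0.15, 0.05, 0.2])
    print(name, round(score, 3))
    print(header_risk({"To": "bob@partner.example, carol@corp.example"}, "corp.example"))  # 0.5
```

In a fuller pipeline, the similarity score and header score would be two of the inputs to the contextual-reasoning step rather than final verdicts on their own.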

Thus, the decision boundary is defined not by token-level features but by higher-order semantics and contextual reasoning.

Conclusion

Data Loss Prevention (DLP) cannot rely solely on surface-level patterns or keywords; effective detection requires understanding the deeper intent of an email and using contextual reasoning across both the email body and headers. The NACE™ Intent-based Threat Prevention™ AI platform achieves this by combining topic modeling, intent analysis, heuristic header checks, and contextual reasoning, creating a decision boundary based on higher-order semantics rather than simple token-level features.
