Extending NACE™ Intent-based Threat Prevention™ AI Platform for Multiple Languages

Written by Abhishek Singh, Co-Founder and CTO | Jul 9, 2025 6:57:17 PM

Our core technology, NACE™, is an Intent-based Threat Prevention™ AI platform designed to detect evasive malicious attachments and URLs from first principles. This removes the reliance on detecting malicious payloads or phishing landing URLs, which makes it immune to evasion techniques intended to conceal them.

The AI technologies and techniques in NACE™ perform semantic and thematic analysis to determine the purpose or deeper meaning (the intent) of an email by analyzing text from its subject, body, and attachments. The intent is derived using fine-tuned classifiers, similarity analysis, hierarchical topic modeling, phrase-based topic modeling, and a cross-encoder-based semantic re-ranker. The contextual relationship between the derived intent and the auxiliary information from call-to-action URLs, deep file parsing of attachments, and SMTP headers is then used to determine whether the attachment or URLs are malicious or benign, or whether the email conversation is a BEC scam.

Since intent forms the core feature of detection, NACE™ has been extended to understand semantics in languages other than English. This blog shares some details on how NACE™ handles emails in multiple languages and the design decisions made to support them.

Design Considerations

To identify the language of an email, we experimented with LangDetect, FastText, and Lingua (in both standard and low-accuracy modes). We benchmarked each library across a range of input sizes to assess its computational efficiency. The results show that FastText has the lowest average latency and the most favorable scaling characteristics, with inference time increasing sublinearly with input length. Lingua in standard mode took longer than FastText but scaled linearly and predictably. The low-accuracy variant of Lingua reduced latency modestly while preserving acceptable performance, making it a viable option for time-sensitive pipelines. In contrast, LangDetect incurred the highest latency and showed greater variance in execution time, especially on longer inputs, likely due to its reliance on probabilistic models and less optimized internal logic.

In email processing, where throughput and low-latency inference are critical, FastText emerged as the most performant choice for language identification because of its minimal and consistent inference time. Lingua offers a configurable trade-off between accuracy and speed, which makes it suitable for workflows where detection precision matters more than response time.

Figure 1: Average Inference Time vs. Input Size for Language Detection Libraries
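
A latency comparison of this kind can be approximated with a small timing harness. The following is a minimal sketch, not the benchmark used for Figure 1: the function names, input sizes, repeat count, and the `toy_detector` stand-in are all assumptions made for illustration, and the only requirement on a detector is that it is a callable taking a string and returning a language code.

```python
import statistics
import time
from typing import Callable, Dict, List

# A detector is any callable that maps text to a language code.
Detector = Callable[[str], str]


def make_input(base: str, n_chars: int) -> str:
    """Repeat `base` until the result is exactly `n_chars` characters long."""
    reps = n_chars // len(base) + 1
    return (base * reps)[:n_chars]


def benchmark(
    detectors: Dict[str, Detector],
    sizes: List[int],
    base_text: str,
    repeats: int = 20,
) -> Dict[str, Dict[int, float]]:
    """Return the average latency in milliseconds per detector and input size."""
    results: Dict[str, Dict[int, float]] = {}
    for name, detect in detectors.items():
        results[name] = {}
        for size in sizes:
            text = make_input(base_text, size)
            detect(text)  # warm-up call so one-time setup cost is not measured
            timings = []
            for _ in range(repeats):
                start = time.perf_counter()
                detect(text)
                timings.append(time.perf_counter() - start)
            results[name][size] = statistics.mean(timings) * 1000.0
    return results


def toy_detector(text: str) -> str:
    """Hypothetical placeholder, NOT a real language detector."""
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return "en" if ascii_chars >= len(text) / 2 else "other"


if __name__ == "__main__":
    table = benchmark({"toy": toy_detector}, [100, 1000, 10000], "Please review the invoice. ")
    for name, by_size in table.items():
        for size, ms in by_size.items():
            print(f"{name:>8} {size:>6} chars  {ms:.4f} ms")
```

In practice each entry in `detectors` would wrap a real library call, for example langdetect's `detect`, a FastText language-identification model's `predict`, or a Lingua detector's `detect_language_of`. Those wrappers are left out so the sketch runs without extra dependencies.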

Intent-Preserving Translation

Based on these experimental evaluations, FastText emerged as the most effective model for identifying the language of incoming emails. To ensure accurate intent identification across multilingual inputs, the NACE™ platform implements a language normalization pipeline. The process starts by extracting text from the subject, body, and attachments of each email. The extracted content is preprocessed to remove noise such as punctuation, hyphens, and special characters. If the extracted text is short, it is combined with the subject before being passed to the language classification stage. Language detection is performed using FastText, along with shallow neural network and Naive Bayes models. If the detected language is English, the email proceeds directly to the NACE™ engine, which performs semantic analysis and identifies the embedded thematics (the intent of the email) for use as a feature set in threat detection. Otherwise, an LLM is invoked with fine-tuned parameters to translate the email content into English while preserving its semantic and thematic context.

Figure 2: Multilingual Email Normalization and Language Conversion Pipeline

This translation step retains the email's purpose and meaning (its intent), which enables uniform downstream processing and significantly enhances the platform's ability to detect phishing, BEC, emails with malicious attachments, and other classes of email-based threats across diverse languages.
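
The normalization flow described in this section can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the NACE™ implementation: `clean_text`, `normalize_email`, `NormalizedEmail`, and the 50-character threshold for "short" text are hypothetical, and the language detector and LLM translator are passed in as plain callables so no specific library or model API is implied.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable

SHORT_TEXT_THRESHOLD = 50  # assumed cutoff for "short" text; not from the source


def clean_text(text: str) -> str:
    """Replace punctuation and symbol characters with spaces, then collapse whitespace."""
    kept = [
        " " if unicodedata.category(ch)[0] in ("P", "S") else ch
        for ch in text
    ]
    return " ".join("".join(kept).split())


@dataclass
class NormalizedEmail:
    language: str        # language code reported by the detector
    english_text: str    # text handed to downstream intent analysis
    was_translated: bool


def normalize_email(
    subject: str,
    body: str,
    attachment_text: str,
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str, str], str],
    short_threshold: int = SHORT_TEXT_THRESHOLD,
) -> NormalizedEmail:
    """Clean the text, detect its language, and translate non-English content."""
    text = clean_text(f"{body} {attachment_text}")
    if len(text) < short_threshold:
        # Short text gives the detector little signal, so add the subject.
        text = clean_text(f"{subject} {text}")

    language = detect_language(text)
    if language == "en":
        # English goes straight to semantic analysis.
        return NormalizedEmail(language, text, was_translated=False)

    # Anything else is translated first (an LLM in the pipeline described above).
    return NormalizedEmail(language, translate_to_english(text, language), was_translated=True)
```

Passing the detector and translator as parameters keeps the routing logic testable with simple stubs and leaves the choice of language-identification model and translation model open.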

Summary

The rise of AI tools like FraudGPT and WormGPT has significantly enhanced the capabilities of threat actors, enabling them to launch sophisticated phishing, BEC, and malicious attachment campaigns in multiple languages with ease. This dramatically expands the attack surface for organizations by making multi-language social engineering attacks more scalable and convincing. With intent as its core detection feature, NACE™ addresses this challenge by extending its semantic understanding to multiple languages, ensuring robust threat prevention in an increasingly complex linguistic landscape.
