Tags:

Phishing, Malware, Technical Leadership

Harnessing Language Models to Stop Evasive Malicious Email Attachments/URLs

Abhishek Singh, Co-Founder and CTO

Jul 15, 2024 3:38:13 PM

Introduction

The HP Q1 2024 Threat Report [2] highlights that 80% of malware is delivered via email, with 12% bypassing detection technologies to reach endpoints. The 2023 Verizon Data Breach Report also indicates that 35% of ransomware infections originated from email. The "Inhospitality" malspam campaign [3] used email as a delivery vector to target the hotel industry, resulting in stolen passwords and compromised accounts. A recent LockBit Black ransomware campaign [4] used 1,500 unique IP addresses to send emails with ZIP attachments containing executable code. Email clearly remains a significant and persistent threat vector for cyber attacks.

The Problem

Two primary factors contribute to malicious attachments and URLs bypassing email technology to reach the endpoint:

Volume and Cost: The volume of email and the cost of detection technologies such as sandbox scanning lead to selective scanning, which in turn allows inadvertent bypasses.

Reliance on Malicious Payload: Detection technologies, including manual and auto-generated signatures, sandboxing, and machine learning models (such as decision trees and neural networks trained on malicious and benign samples), are all designed to identify threats based on the characteristics of the malicious payload.

However, evasive [1] multi-stage malware hides its payload from all of these technologies. Its techniques include downloaders and droppers, sleep calls, checks for debugger information, version, and execution environment, obfuscation, password-protected archives, and signed files. Phishing URLs similarly hide behind redirects, CAPTCHAs, QR codes, and obfuscated HTML pages. In each case the malicious payload stays hidden while it is being analyzed, so the threat evades detection and reaches the endpoint.

Case Study: Reliance on Malicious Payloads

To illustrate the reliance on a malicious payload, Figure 1.0 shows the SHA256 hash of an email that was misclassified as benign. The email contained an obfuscated HTML attachment with a phishing URL that was reached through redirection.

Figure 1.0 VirusTotal score of the email delivering the malicious HTML attachment

The HTML attachment also had a detection score of zero in VirusTotal and was missed by 64 scanning technologies.

Figure 2.0 VirusTotal score of the malicious HTML attachment

The HTML attachment contains JavaScript code, shown in Figure 3.0, that decrypts an AES-encrypted URL using a specified key and initialization vector (IV). It uses DOM manipulation to construct the destination URL dynamically from the encrypted data and from a Base64-encoded email address held in a hidden element of the HTML. Because the URL exists only in encrypted form and the email address is concealed in the DOM, a scanner inspecting the file sees no phishing link. On page load, the script redirects the user to the dynamically generated URL.

<p id="uxo" style="display:none;">c2hlcnJpLnJ1c2hpbkBhcmhlYXJ0LmNvbQ==</p>

<script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js"></script>

<script>

</script>

Figure 3.0 Obfuscated HTML file containing the redirected URL
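The indicators visible in the snippet above (a hidden element holding Base64 text and a script include for crypto-js) can be surfaced with a few lines of static triage. The following is a minimal sketch using only the Python standard library; the class name `AttachmentTriage` and the helper `try_base64` are hypothetical, the hidden-element tracking is deliberately simplified (it does not handle nested tags), and it is not how NACE or any VirusTotal engine works.

```python
import re
from base64 import b64decode
from html.parser import HTMLParser


class AttachmentTriage(HTMLParser):
    """Collect two static indicators: hidden text and external script sources."""

    def __init__(self):
        super().__init__()
        self.hidden_text = []
        self.script_sources = []
        self._in_hidden = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if "display:none" in style:
            self._in_hidden = True
        if tag == "script" and attrs.get("src"):
            self.script_sources.append(attrs["src"])

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the hidden region.
        self._in_hidden = False

    def handle_data(self, data):
        if self._in_hidden and data.strip():
            self.hidden_text.append(data.strip())


def try_base64(text):
    """Return the decoded string if text is valid Base64 for printable UTF-8."""
    try:
        decoded = b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    return decoded if decoded.isprintable() else None


# The markup shown in Figure 3.0; the inline script body is elided in the source.
SAMPLE = """<p id="uxo" style="display:none;">c2hlcnJpLnJ1c2hpbkBhcmhlYXJ0LmNvbQ==</p>
<script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js"></script>
<script>
</script>"""

triage = AttachmentTriage()
triage.feed(SAMPLE)

decoded_hidden = [d for d in map(try_base64, triage.hidden_text) if d]
loads_crypto = any("crypto-js" in src for src in triage.script_sources)
hides_email = any(
    re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", d) for d in decoded_hidden
)
print(f"crypto library loaded: {loads_crypto}, hidden email address: {hides_email}")
```

Neither indicator is malicious on its own, which is consistent with the zero-detection result reported above; together they only suggest that the attachment deserves a closer look.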

The decrypted URL is hxxps[:]//297a902c.92be49460795772ebc6eb006.workers.dev, which sits behind Cloudflare (a Content Delivery Network provider). After redirection it finally leads to hxxps[:]//hpjhq[.]com/?kgubvsqi, which has been classified as phishing by just one of the 93 detection technologies present in VirusTotal.

Figure 4.0 VirusTotal score of the redirected URL
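The URLs in this case study are written in defanged form (hxxps, [:], [.]) so that they cannot be clicked by accident. For readers who need to move such indicators in and out of their own tooling, here is a small sketch of the conversion in both directions. The function names `refang` and `defang` are illustrative, and the defanging style simply mirrors the one used in this post; other reports use slightly different conventions.

```python
import re
from urllib.parse import urlsplit


def refang(indicator: str) -> str:
    """Turn a defanged indicator back into a normal URL string."""
    restored = re.sub(r"^hxxp", "http", indicator.strip(), flags=re.IGNORECASE)
    return restored.replace("[:]", ":").replace("[.]", ".")


def defang(url: str) -> str:
    """Defang a URL: hxxp scheme, [:] separator, [.] in the host only."""
    parts = urlsplit(url)
    prefix_len = len(parts.scheme) + len("://") + len(parts.netloc)
    scheme = parts.scheme.replace("http", "hxxp")
    host = parts.netloc.replace(".", "[.]")
    # Path and query are copied unchanged from the original string.
    return f"{scheme}[:]//{host}{url[prefix_len:]}"


# Landing URL exactly as written in the case study (kept defanged here).
LANDING = "hxxps[:]//hpjhq[.]com/?kgubvsqi"
round_trip = defang(refang(LANDING))
print(round_trip == LANDING)
```

A refanged indicator should only be handed to lookup or blocklist tooling, never opened in a browser.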

To summarize the attack chain delivered by this email: because only the landing URL contains a malicious payload, only that final stage was classified as malicious, while every preceding stage was classified as benign.

Figure 5.0 Current state: stages of attack delivery, with the verdict for each object
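The verdict pattern in the case study can be written down as a toy model. The sketch below assumes a detector that flags a stage only when the malicious payload is visible at that stage. The stage names come from the case study, while the `Stage` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    payload_visible: bool  # can a scanner see the malicious payload here?


# The four stages of the case study, in delivery order.
CHAIN = [
    Stage("email", False),
    Stage("obfuscated HTML attachment", False),
    Stage("redirect URL (workers.dev)", False),
    Stage("landing URL", True),
]


def payload_based_verdicts(chain):
    """A payload-dependent detector flags only stages that expose the payload."""
    return ["malicious" if stage.payload_visible else "benign" for stage in chain]


def first_blockable_stage(chain):
    """Index of the earliest stage such a detector can block, or None."""
    for index, stage in enumerate(chain):
        if stage.payload_visible:
            return index
    return None


verdicts = payload_based_verdicts(CHAIN)
blocked_at = first_blockable_stage(CHAIN)
for stage, verdict in zip(CHAIN, verdicts):
    print(f"{stage.name}: {verdict}")
```

In this model the earliest blockable stage is the last one, which means the email and its attachment have already reached the user by the time any verdict turns malicious.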

Generative AI for Evasion

Generative AI tools like FraudGPT and WormGPT can adapt, learn, and facilitate the crafting of payloads that use evasion techniques. The list of such techniques keeps growing and includes adding sleep calls, waiting for mouse clicks before execution, checking for debuggers and analysis environments, obfuscating code to conceal features, and using redirects and CAPTCHAs. These methods aim to hide malicious payloads or behaviors during scans by detection technologies, resulting in misclassification as benign.

Introducing Neural Analysis and Correlation Engine (NACE)

To address the challenge of detecting evasive malicious attachments and URLs, we have designed and implemented a patent-pending Neural Analysis and Correlation Engine (NACE). It does not require a malicious payload or behavior for its decision-making. For the attack discussed in the case study, as shown in Figure 6.0, NACE detected the email as malicious and blocked it at inception, without needing the subsequent landing URL or its associated payload.

Figure 6.0 NACE detected the email as malicious, blocking it at the inception

NACE detects malicious attachments and URLs by understanding the semantics of the email and using what it learns as features, instead of relying on the final malicious payload, which typically appears only in the later stages of attachment analysis. NACE employs a layered approach using multiple current generative AI-based models, such as Meta Llama 3 (a transformer-based architecture) and CLIP (Contrastive Language-Image Pre-Training), to derive deeper meaning, i.e. the semantic and thematic structure embedded within the email's body, the text in the attachment, images in the email, and the subject. This learned semantic and thematic structure is used as a feature set to detect malicious attachments, URLs, and identity-based attacks.
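The four inputs named above (subject, body, attachment text, and images) can be pulled out of a raw message with Python's standard `email` package. The sketch below covers only that collection step, under the assumption that each input would later be handed to a language or vision model. The `EmailInputs` type and `collect_inputs` function are hypothetical names, the demo message is invented, and nothing here reflects NACE's internal pipeline, which the post does not describe.

```python
from dataclasses import dataclass, field
from email import message_from_string, policy
from email.message import EmailMessage


@dataclass
class EmailInputs:
    subject: str = ""
    body_text: list = field(default_factory=list)
    attachment_text: list = field(default_factory=list)
    image_names: list = field(default_factory=list)


def collect_inputs(raw: str) -> EmailInputs:
    """Split a raw message into the parts a semantic model would read."""
    msg = message_from_string(raw, policy=policy.default)
    out = EmailInputs(subject=str(msg["subject"] or ""))
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        is_attachment = part.get_content_disposition() == "attachment"
        if part.get_content_maintype() == "image":
            out.image_names.append(part.get_filename() or "(inline image)")
        elif part.get_content_maintype() == "text":
            target = out.attachment_text if is_attachment else out.body_text
            target.append(part.get_content())
    return out


# Invented demo message: a lure body, an HTML attachment, and a logo image.
demo = EmailMessage()
demo["Subject"] = "Invoice overdue"
demo["From"] = "billing@example.com"
demo["To"] = "user@example.org"
demo.set_content("Please review the attached invoice today.")
demo.add_attachment(
    "<html><body>Sign in to view</body></html>",
    subtype="html",
    filename="invoice.html",
)
demo.add_attachment(
    b"placeholder image bytes", maintype="image", subtype="png", filename="logo.png"
)

inputs = collect_inputs(demo.as_string())
print(inputs.subject, len(inputs.body_text), len(inputs.attachment_text), inputs.image_names)
```

Keeping the inputs separate, rather than concatenating them, leaves room to send text to a language model and images to an image-text model such as CLIP.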

Since NACE does not rely on the malicious payload, the attachments it detects can be of any type, including ransomware, cryptominers, QR codes, backdoors, data miners, keyloggers, downloaders, droppers, launchers, and remote access trojans. Malicious call-to-action URLs can be phishing URLs or URLs hosting malware, and may employ evasion techniques such as redirects, CAPTCHAs, QR codes, embedding in files, or delivery as part of a brand impersonation attack. If malicious attachments or URLs are sent in east-west (internal) traffic, the sending email account is marked as compromised. In subsequent blogs, we plan to share the finer details of our NACE technology.
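The east-west rule can be sketched as a simple filter over scanned messages. This sketch assumes that "east-west" means the sender and every recipient belong to the organization's own domains, and that a malicious verdict is already available for each message. The `ScannedEmail` type, the function names, and the example.com addresses are all illustrative rather than part of NACE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannedEmail:
    sender: str
    recipients: tuple
    malicious: bool  # verdict produced by an upstream detector


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].strip().lower()


def is_east_west(email: ScannedEmail, internal_domains: frozenset) -> bool:
    """Assumed definition: internal sender and only internal recipients."""
    return (
        domain_of(email.sender) in internal_domains
        and len(email.recipients) > 0
        and all(domain_of(r) in internal_domains for r in email.recipients)
    )


def compromised_accounts(emails, internal_domains: frozenset) -> set:
    """Senders of malicious internal-to-internal mail are marked compromised."""
    return {
        e.sender
        for e in emails
        if e.malicious and is_east_west(e, internal_domains)
    }


INTERNAL = frozenset({"example.com"})
traffic = [
    ScannedEmail("alice@example.com", ("bob@example.com",), malicious=True),
    ScannedEmail("carol@example.com", ("dave@example.com",), malicious=False),
    ScannedEmail("mallory@example.net", ("bob@example.com",), malicious=True),
]
flagged = compromised_accounts(traffic, INTERNAL)
print(sorted(flagged))
```

The external sender is not flagged as a compromised account because the message is inbound rather than internal; it would simply be blocked as a malicious email.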


Table 1.0 Comparison of Detection Technologies

When examining emails with NACE, we classified many as malicious that VirusTotal showed were detected by fewer than 3 out of 64 detection technologies, suggesting they had bypassed current detection systems. The appendix contains some of these hashes. NACE successfully detected evasive malware including HTML smuggling campaigns, Microsoft credential phishing scams, and MS Office remote template injection attacks. NACE has also detected APT attacks (shown in Figure 7.0) targeting a defense-related organization that traditional detection technologies missed.

Figure 7.0 APT Detected by NACE

Learning

Traditional technologies like signature-based detection, sandboxing, and machine learning/deep learning rely on examining malicious payloads, which often remain hidden during analysis due to evasion techniques. This allows multi-stage, evasive, AI-generated malicious attachments and call-to-action URLs to reach endpoints. Endpoint technologies (NGAV, EDR, XDR, MDR, Active Defense) focus on detection and remediation, which makes post-execution dwell time critical to their approach. NACE aims to detect and prevent current evasive malicious attachments and URLs at the inception stage, before they reach the endpoint. It does this by learning from the semantic and thematic structures embedded in emails rather than relying on the malicious payload, and that reliance is the root cause of evasion in current technologies.

Interested in learning more about NACE™? Our security experts are here to help you stop evasive threats, malicious email, and AI-powered attacks.

References:

[1] Abhishek Singh, Zheng Bu, "Hot Knives Through Butter: Evading File-based Sandboxes", Black Hat 2013, https://media.blackhat.com/us-13/US-13-Singh-Hot-Knives-Through-Butter-Evading-File-based-Sandboxes-WP.pdf

[2] HP 2024, Q1 Threat Insights Report, HP Wolf Security Threat Insights Report Q3 2023

[3] “Inhospitality” malspam campaign targets hotel industry, https://news.sophos.com/en-us/2023/12/19/inhospitality-malspam-campaign-targets-hotel-industry/

[4] Botnet sent millions of emails in LockBit Black ransomware campaign, https://www.bleepingcomputer.com/news/security/botnet-sent-millions-of-emails-in-lockbit-black-ransomware-campaign/

Appendix

Examples of Malicious SHA256 detected by NACE

[Fewer than 3 out of 64 detection technologies in VT classified them as malicious]

4c11dc76cf3909c164870911d25ed528f075e7905b8f1a9359b0291ac495fe16
5a4c73a9ae37a46b141c840180c2730164f01c28b4c218cd62f501c107310ff7
60139f7b1692c70c761e3bde1e52a954d5e4411d6676cbf9f0d2d926edae0fda
02b4211c449f6ee34f6211d5eae7fb0ec573ab304a209354111e7d2ee3a23464
efd9496b19d13e5f56fe6356371e1f0caff2093cd2b30d5604cb8c3c8cfa4881
997111c5541b47d7fc94d4e9ea587a19fa29225103964297b1995128607890bf
a79ddc3e079195d5c76e82975c1214b3309f356a4a17e93c8ba29e0fc9d57f08
218492736e0cea9ab4a689b3ab15a4c2eac4017b9629f5c716e69a5f5162794e
1ebd24d84ef56395ffb2d55b9e6f138a6d4356e95c1bf43b87b37a9900101c1f

[Zero-Detection on VT, at the time of writing blog]

7a960d10bd82ff045010683f914494be0f2b39d1ec7f6d610c999304a76155e3
2613ed1a01711b3f67cc00c22f3a85775ae5fff036e079cf358882c9f5728ad4
753d42902c846ee3b8af5c0ca2a8d41c76acd68f9bc6b8bbefe3ba7bd5c35065
9db759050b19d10d031d0aff204d17b092cbbd867dbd789276f90b9a0e49d39f
4fb65a37b319e41a7ee942b30c9dd04074d74bed76c7fe7188b320b3442270e9
69a03ff4311b98a26c892f2b10fc22a077729d5aa9cf53a5996323db2f5e96b2
b876c5049004fbd7d8d83f81b5248723b1ba5d83cb697d0ec8295e355349bea3
193dfae1b388ddf98587814531b68853fdfa1af2b6a937c11368208a3729fbca
fcc4c06d8498918d1683d9f885e63bbcde396c89c6fcc5c317ed8cefc0a08793
fb41849965ae6cc086a246e311f92bdd6862e60e181441f3352003e431dc7526
e6a261a39600f8ed55b52080d4349006321a64f61b5d23a4c3a728ea9e92777e
d635da3a0e3613b18b66eee9905343188094beb3af774e63152dbc883eeca7cd
c22b9f4b7a5160639f49353bd74dd44a3952916480b3a42f7a1a686ca702727b
a0085d9c9fd5e014654ba3b68a6540247ea4ee49be7d7425d8e724153a04e6fe
983366cadd9d203454bede3ff2d5b0082c5e04e26b17e19a1bb058fb969faac2
96e2a2f43bcd67a9b6db9eb0ef2452b460a9d958c8427f091c61beff1d37d4f7
5a30049a33f3ddddaca69c36aa305569fef2da4138bdded372d9fa8918bfdf6d
4696af33dea35c6a7c60b267a14c6eca9f7ba75f6fa39d2c4a5a91da09c51f37
0732807f565edada40d740c9eef8e6daaa0faf106f91bfa18d8c942bd3122818
0496905c63c668c6b80001a07f2578c73684b7917541e52fd32b207aa5a8c360
b9a309c08336acbf290aab2554a02443685fdbb391d5a9b983fdd3a8dd5b994a
bb664e769e1065ac3e0131ce9fee20e19cc8e9df0dad6c5fe6cc65c42bbff099
ee5dc654e7989cb5f3a1404a6ba844e7e0fe7b8ea6c2c64067f80bd676ae1345
27dc01e7ceca8602ab2aeb1858f22b8d04b260098d5ae6bef79859cdd6c0fa95
23abbde5fe52f0503db0b8eb2d5f7d58d0a4f2181f4593946b05fe268a3aaf60
dcacbaf55e9b4863ef8a34f47589b0ea179aa22f17fedbf5f1396eb8ef3c2b90
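Readers who want to load these hashes into their own lookup or blocklist tooling may find it useful to normalize them first, since hashes copied from a web page can pick up stray whitespace or capitalization. The following is a minimal sketch; `normalize_sha256` and `load_hashes` are hypothetical helper names.

```python
import re

# A SHA256 digest in hex is exactly 64 characters from 0-9 and a-f.
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def normalize_sha256(value: str):
    """Return the lowercase hash, or None if value is not a SHA256 hex digest."""
    candidate = value.strip().lower()
    return candidate if SHA256_RE.fullmatch(candidate) else None


def load_hashes(lines):
    """Normalize, validate, and de-duplicate hashes while preserving order."""
    seen = set()
    hashes = []
    for line in lines:
        digest = normalize_sha256(line)
        if digest and digest not in seen:
            seen.add(digest)
            hashes.append(digest)
    return hashes


FIRST = "4c11dc76cf3909c164870911d25ed528f075e7905b8f1a9359b0291ac495fe16"
pasted = [f"  {FIRST}\n", FIRST.upper(), "", "not-a-hash"]
cleaned = load_hashes(pasted)
print(len(cleaned))
```

Hex digests are case-insensitive, so lowercasing changes only the representation and not the value being looked up.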
