Detecting Malware in Network Traffic From TLS Fingerprints
2021-04-06
Introduction
An increasing share of malware uses encrypted traffic to communicate with its command and control server. This makes it difficult to detect malware from the payload when it communicates over a TLS connection. However, when a client application establishes a connection with its server, it first performs a TLS handshake. The initial handshake messages are not encrypted, and they help in detecting malware because they provide a set of observable features.
To initiate communication via TLS, the client application first sends a client hello packet, to which the server responds with a server hello packet. In the client hello packet, the client sends the supported TLS versions, an ordered list of offered cipher suites, compression methods, and additional parameters such as supported elliptic curves and signature algorithms. The fingerprint of a client hello packet is constructed by concatenating these field values into a string separated by a delimiter and computing a hash of that string, such as SHA-512 (JA3, for example, uses MD5). The server hello packet contains the following fields of interest for fingerprinting: the selected TLS protocol version, the selected cipher suite and compression method, and a list of extensions. The server typically chooses the highest-preference cipher suite that it can satisfy.
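The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the choice of "," and "-" as delimiters, and the sample field values are hypothetical, and real tools such as JA3 define their own field selection and hash.

```python
import hashlib


def client_hello_fingerprint(tls_version, cipher_suites, compression_methods,
                             elliptic_curves, signature_algorithms,
                             field_sep=",", value_sep="-"):
    """Join client hello fields into one string and hash it.

    The field order and the delimiters are assumptions made for this sketch.
    """
    fields = [
        str(tls_version),
        value_sep.join(str(v) for v in cipher_suites),
        value_sep.join(str(v) for v in compression_methods),
        value_sep.join(str(v) for v in elliptic_curves),
        value_sep.join(str(v) for v in signature_algorithms),
    ]
    fingerprint_string = field_sep.join(fields)
    digest = hashlib.sha512(fingerprint_string.encode("ascii")).hexdigest()
    return fingerprint_string, digest


# Hypothetical client hello values, written as decimal code points.
fingerprint_string, digest = client_hello_fingerprint(
    tls_version=771,                    # 0x0303, TLS 1.2
    cipher_suites=[4865, 4866, 49195],  # order matters: it is the client's preference
    compression_methods=[0],
    elliptic_curves=[29, 23],
    signature_algorithms=[1027, 2052],
)
print(fingerprint_string)
print(digest)
```

Because the cipher suite list is ordered, two clients that offer the same suites in a different order produce different fingerprints.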
Detecting malware from TLS fingerprints
The following are a few methods that can be used to detect malware from a TLS fingerprint string.
Using a fingerprint database to detect client application
Every client application has a TLS fingerprint associated with it. A database of known fingerprints and their associated processes can be built and used to identify the process behind an observed fingerprint. A connection is treated as suspicious when an unknown fingerprint is encountered. Cisco provides an open database of fingerprints and processes as part of its tool mercury.
Benign applications (such as those found on an enterprise network) are updated regularly and use the latest security features and protocols. For example, benign traffic typically comes from the latest version of the Firefox browser, whereas malware often uses an older version. Malware takes advantage of a bug or security loophole in the older version of the application. A fingerprint associated with an older version of an application can therefore signal the presence of malicious activity.
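A minimal sketch of such a lookup is shown below. The dictionary layout, the fingerprint keys, and the process names are all hypothetical; this is not the format of the mercury database, only an illustration of the decision rule described above.

```python
# Hypothetical in-memory database: fingerprint -> processes known to produce it.
KNOWN_FINGERPRINTS = {
    "fp-0001": ["firefox"],
    "fp-0002": ["chrome", "edge"],
}


def check_fingerprint(fingerprint, database):
    """Return a (verdict, processes) pair for an observed fingerprint.

    An unknown fingerprint is flagged as suspicious, as described above.
    """
    processes = database.get(fingerprint)
    if not processes:
        return "suspicious", []
    return "known", list(processes)


print(check_fingerprint("fp-0002", KNOWN_FINGERPRINTS))
print(check_fingerprint("fp-9999", KNOWN_FINGERPRINTS))
```

Note that a known fingerprint can map to more than one process, so a match narrows down the candidates without always identifying a single application.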
Classifier based approach
Malware sometimes uses older versions of security protocols, and the cipher suites it offers differ from those offered by benign applications. Features such as the offered cipher suites, the encryption algorithms, and the advertised TLS extensions can therefore form the feature set for a machine learning classifier, such as logistic regression or a random forest, that detects malware from the client hello packet. Such a technique was used in Anderson et al., 2018.
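One way to turn a client hello into classifier input is a fixed-length binary vector that marks which cipher suites and extensions were offered. The sketch below shows only this feature-building step, with hypothetical field names and sample values; it is not the feature set from Anderson et al., and the resulting vectors would still need to be passed to a classifier of your choice.

```python
def build_vocabulary(client_hellos, field):
    """Collect every distinct value seen in `field`, in sorted order."""
    return sorted({value for hello in client_hellos for value in hello[field]})


def encode_client_hello(hello, cipher_vocab, extension_vocab):
    """Encode one client hello as [tls_version, cipher bits..., extension bits...]."""
    offered_ciphers = set(hello["cipher_suites"])
    offered_extensions = set(hello["extensions"])
    cipher_bits = [1 if c in offered_ciphers else 0 for c in cipher_vocab]
    extension_bits = [1 if e in offered_extensions else 0 for e in extension_vocab]
    return [hello["tls_version"]] + cipher_bits + extension_bits


# Hypothetical samples: a modern client and a client offering legacy suites.
modern_hello = {"tls_version": 771, "cipher_suites": [4865, 49195], "extensions": [0, 10, 43]}
legacy_hello = {"tls_version": 769, "cipher_suites": [5, 10], "extensions": [0]}
samples = [modern_hello, legacy_hello]

cipher_vocab = build_vocabulary(samples, "cipher_suites")
extension_vocab = build_vocabulary(samples, "extensions")

feature_matrix = [encode_client_hello(h, cipher_vocab, extension_vocab) for h in samples]
print(feature_matrix)
```

Each row of `feature_matrix`, paired with a benign or malicious label, would be one training example for a model such as logistic regression or a random forest.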
Drawbacks
A significant drawback of the fingerprint-based approach is the high number of false positives it generates. A fingerprint can match more than one process, and in that case it is difficult to determine which process generated it. For example, Abuse.ch publishes a set of fingerprints that belong to malware, but also cautions that these fingerprints might match benign processes.
While each browser, TLS library, or product produces a characteristic client hello fingerprint, the parameters used by an application are not statically defined. For example, browsers alter their behavior based on hardware support, operating system, or user preferences, and they also allow users to disable cipher suites. Chrome, for instance, prioritizes the ChaCha20 cipher on devices that lack AES hardware acceleration. These optimizations can result in several valid sets of ciphers and extensions for the same version of an application, which makes it difficult to fingerprint.
In Anderson et al., 2020, knowledge of the processes that produce a fingerprint, together with the destination context in which the fingerprint is observed, is used to identify the process generating the fingerprint with a naive Bayes classifier. A caveat here is that the detection approach depends on the fingerprint database. The database must be updated regularly to identify processes accurately, which may not be possible in all network settings.
One may wonder why client fingerprints are used rather than server fingerprints. The use of server fingerprints also suffers from the problem of false positives. Moreover, a server application may choose a random cipher suite instead of the highest-preference cipher suite offered by the client. In this way, a malicious server can evade detection.
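The idea of combining a fingerprint with destination context can be illustrated with a toy naive Bayes ranking. This is a simplified sketch, not the method from the paper: the observation format, the two context features (destination host and port), the Laplace smoothing, and all sample data are assumptions made for illustration.

```python
from collections import Counter


def rank_processes(observations, fingerprint, context, smoothing=1.0):
    """Rank candidate processes for a fingerprint seen in a destination context.

    observations: list of (fingerprint, process, context_tuple) records.
    Context features are treated as independent given the process (naive Bayes).
    """
    matching = [(proc, ctx) for fp, proc, ctx in observations if fp == fingerprint]
    if not matching:
        return {}
    process_counts = Counter(proc for proc, _ in matching)
    scores = {}
    for proc, count in process_counts.items():
        score = count / len(matching)  # prior: P(process | fingerprint)
        for i, value in enumerate(context):
            seen_values = {ctx[i] for _, ctx in matching} | {value}
            hits = sum(1 for p, ctx in matching if p == proc and ctx[i] == value)
            # Laplace-smoothed likelihood: P(feature value | process)
            score *= (hits + smoothing) / (count + smoothing * len(seen_values))
        scores[proc] = score
    total = sum(scores.values())
    return {proc: score / total for proc, score in scores.items()}


# Hypothetical knowledge base: one fingerprint shared by two processes.
observations = [
    ("fp1", "firefox", ("example.com", 443)),
    ("fp1", "firefox", ("example.com", 443)),
    ("fp1", "firefox", ("example.org", 443)),
    ("fp1", "malware_x", ("203.0.113.7", 8443)),
    ("fp1", "malware_x", ("203.0.113.7", 8443)),
]

print(rank_processes(observations, "fp1", ("203.0.113.7", 8443)))
print(rank_processes(observations, "fp1", ("example.com", 443)))
```

The same fingerprint is attributed to different processes depending on where the connection is headed, which is how destination context helps separate processes that share a fingerprint.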
References
- TLS fingerprinting with JA3 and JA3S
- This site has a good overview of a TLS connection.
- Anderson, B., Paul, S., and McGrew, D. (2018). Deciphering malware’s use of TLS (without decryption). Journal of Computer Virology and Hacking Techniques, 14(3), 195-211.
- Anderson, B., and McGrew, D. (2020). Accurate TLS Fingerprinting using Destination Context and Knowledge Bases. arXiv preprint arXiv:2009.01939.