I wanted a simple but reliable solution for malware scanning untrusted files that are downloaded to my NAS / Homelab.
Important Context
A few years ago I was building a file ingress solution for a client.
This solution involved 3rd parties uploading files to a web platform, and the platform needed to pass the files through a security scanner before they were processed by downstream systems.
While testing the file ingress system, I noticed that the product chosen (not my choice) to perform the scans always seemed to give the “ALL CLEAR” to larger files (100+ MB).
After some poking and prodding, I found it was simply ignoring any file over a certain limit.
It didn’t crash, it didn’t give an error message, it didn’t give a “file too large” warning. It would claim there was no virus, without having even checked!
In this situation, it was not possible to control the size of the files being ingested, and auditing requirements were not compatible with software that so blatantly lied.
So I got on a call with one of the vendor’s engineers, who told me to my face that files over 2 GB (the maximum the scanner’s limit could be raised to) couldn’t contain viruses.
If this sounds absurd to you, dear reader, that’s because it is. Malware can come in all shapes and sizes, even hiding in files larger than a mere 2 GB.
But alas, this led me to the realization that hardly any antivirus solution (the ones that scan files for malware signatures, at least) scans files above that size, no matter what you configure the limit to be. It’s a hardcoded maximum.
If this information - that scanners ignore files over a certain size - should leak to malware developers, I fear all hope is lost…
In all seriousness, I do understand the reasoning here.
File scanners work by comparing the contents of a file against a database of strings, ‘signatures’ that indicate the presence of known malware. This signature database is huge, and the larger the file, the slower the scan and the more memory it takes to perform.
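To make that cost concrete, here is a deliberately naive sketch of signature matching in Go. The `Signature` type, the signature names, and the byte patterns are all made up for illustration. Real engines use far more efficient multi-pattern matching and much richer signature formats, but the basic shape is the same: every signature has to be checked against the file's contents.

```go
package main

import (
	"bytes"
	"fmt"
)

// Signature pairs a (hypothetical) malware name with the byte pattern
// that identifies it.
type Signature struct {
	Name    string
	Pattern []byte
}

// matchSignatures returns the names of all signatures whose pattern
// occurs anywhere in data. The work grows with both the size of data
// and the number of signatures.
func matchSignatures(data []byte, sigs []Signature) []string {
	var hits []string
	for _, s := range sigs {
		if bytes.Contains(data, s.Pattern) {
			hits = append(hits, s.Name)
		}
	}
	return hits
}

func main() {
	// Invented signatures, not taken from any real database.
	sigs := []Signature{
		{Name: "Example.Test.A", Pattern: []byte("EVIL-PAYLOAD")},
		{Name: "Example.Test.B", Pattern: []byte{0xde, 0xad, 0xbe, 0xef}},
	}
	data := []byte("header...EVIL-PAYLOAD...trailer")
	fmt.Println(matchSignatures(data, sigs)) // prints [Example.Test.A]
}
```

Note that this version holds the whole file in memory, which is exactly the kind of design that makes very large files painful.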
What I don’t get is why I can’t increase the limit, speed be damned. If I need a file scanned, then I want it scanned.
I also get the argument that malware is less commonly found in sizable files. A malware developer would want something that can be downloaded quickly and reliably, on machines with either plenty of or very little free space.
But again, a ‘preference’ is not a certainty.
Also, I get that signature scanners might not be the best solution, and that a heuristics-based approach might be more appropriate, but that’s beside the point.
Problem Statement
As my Homelab / NAS (Network Attached Storage) grew and matured, I found myself in need of a similar solution. I had files coming in whose contents and origins I didn’t necessarily trust, and before I started using them I wanted to do some security-related checking.
Problem is: I’m cheap, and don’t want to shell out $$ for proper endpoint protection software.
Thus, I needed a way to scan files for malware upon ingestion, and notify on the result.
The solution needs to reliably detect new files as they come in, either scan each file or give a warning that it couldn’t (e.g. too large), and perhaps report some other metadata.
Solution
What I ended up with is a custom Go command-line application, creatively named av-scanner.
It has 2 modes of operation:
- Sweeping: Go through all the files in a given directory and scan them.
- Watching (experimental): Watch the filesystem for new files to be created, and then scan that particular file.
It would be inefficient to constantly sweep and rescan all files, so it uses a SQLite database to record every file it has previously scanned, with some smarts to determine whether a rescan is needed.
The scanner collects various bits of metadata to accompany a scan result (produced using ClamAV), including the size and a detected MIME type (determined from the file’s magic number via the file command, since the file extension can’t be trusted).
Once it’s collected all this information and recorded it to the database, a notification is sent out (using ntfy).
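For anyone unfamiliar with ntfy: publishing a message is just an HTTP POST to a topic URL, with the message as the request body and optional headers such as Title. The sketch below builds such a request in Go. The `newNotification` helper, the `NTFY_URL` environment variable, and the message text are assumptions for this example, not how av-scanner is configured.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// newNotification builds an ntfy publish request: the body is the
// message text and the Title header sets the notification title.
func newNotification(topicURL, title, message string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, topicURL, strings.NewReader(message))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Title", title)
	return req, nil
}

func main() {
	// NTFY_URL is a hypothetical variable holding the full topic URL,
	// e.g. the address of your ntfy server followed by the topic name.
	topicURL := os.Getenv("NTFY_URL")
	if topicURL == "" {
		fmt.Println("set NTFY_URL to your ntfy topic URL to send a test notification")
		return
	}
	req, err := newNotification(topicURL, "Scan result", "skipped: big.iso (too large to scan)")
	if err != nil {
		fmt.Fprintln(os.Stderr, "building request failed:", err)
		os.Exit(1)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "sending notification failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println("ntfy responded with", resp.Status)
}
```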
Get it for yourself
Now that this project is working reasonably well for me, I’m happy to open-source it.
It’s free for anyone to use, constructively criticise, etc.
You can find the scanner, along with installation and usage instructions here.
Support and Contributions
You can open an Issue on GitHub, but this tool is provided with NO warranties or guarantees.
I make no promises to help you if things go wrong.
However, if you have ideas that can improve this tool, please open a PR and I’ll be happy to take a look.
Side Notes
This solution is actually my second attempt at this. I previously had a hacky bash script that did most of the same things but was a little unreliable.
I wanted to improve parts of it and thought that a proper programming language using a CLI framework would be a better option.
This old version didn’t use a database, and instead relied on saving a special log file and using a base64 encoding of the filename as an identifier.
I also had a lot of trouble getting the watch mode (with inotifywait) to trigger reliably, and thus spent more development effort on the sweep mode.
The old (legacy) shell script is saved in the GitHub Repo as a historical artifact.
