How does the Trellix virus-scanning engine work?

The Trellix virus-scanning engine is a complex data analyzer. The exact process of analysis depends on the object (often a file) being scanned and the type of viruses being sought. However, the following stages describe the general approach that the virus-scanning engine uses.

Identifying the type of the object

This stage determines which type of object is being scanned. Files that contain executable code, for example, need to be scanned.

Different types of files in Microsoft Windows systems, for example, are distinguished by their file extensions, such as .EXE and .TXT. However, any file can be renamed to hide its true identity, so the contents of the file must first be determined.

Each type of object requires its own processing. If objects of that type cannot be infected with a virus, no further scanning is needed. For example, a picture stored as a bitmap file cannot be infected.
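
The idea of identifying an object by its contents rather than its file name can be sketched as follows. This is a minimal illustration, not the engine's actual logic: the table of leading "magic" bytes is a small sample of well-known file headers, and the names identify_type, needs_scanning, and SKIPPED_TYPES are hypothetical.

```python
# Illustrative table of well-known leading bytes; a real engine
# recognizes far more formats than this sample.
MAGIC_SIGNATURES = [
    (b"MZ", "windows-executable"),
    (b"PK\x03\x04", "zip-archive"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-document"),  # legacy .DOC container
    (b"BM", "bitmap-image"),
]

# Hypothetical policy: types that this sketch treats as not needing a scan.
SKIPPED_TYPES = {"bitmap-image"}


def identify_type(data: bytes) -> str:
    """Return a type label based on the leading bytes, ignoring the file name."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def needs_scanning(data: bytes) -> bool:
    """Unknown types are scanned to be safe; only listed types are skipped."""
    return identify_type(data) not in SKIPPED_TYPES


# A file could be named "notes.txt" and still begin with an executable header.
renamed_executable = b"MZ\x90\x00" + b"\x00" * 60
print(identify_type(renamed_executable))
print(needs_scanning(b"BM" + b"\x00" * 52))
```

Because the decision is based on the data itself, renaming a file does not change the result.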

Decoding the object

This stage decodes the contents of the object so that the virus scanner "understands" what it is looking at. For example, a compressed Zip file cannot be interpreted until it has been expanded back to its original contents. The same applies to files that are not compressed. For example, the engine must decode a Microsoft Word document (.DOC) file to find any macro viruses.

File decoding can become quite complex when a file contains further encoded files. For example, a Zip archive file might contain a mixture of other archives and document files. After the engine decodes the original Zip file, the engine must also decode and separately scan the files inside.
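
The nested decoding described above can be sketched with Python's standard zipfile module. This is an illustration under stated assumptions, not the engine's implementation: walk_objects and build_nested_archive are hypothetical names, and the max_depth limit is an added safeguard that the text above does not mention.

```python
import io
import zipfile


def build_nested_archive() -> bytes:
    """Create an in-memory Zip that contains a text file and another Zip."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as zf:
        zf.writestr("inner.txt", b"inner contents")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as zf:
        zf.writestr("readme.txt", b"outer contents")
        zf.writestr("nested.zip", inner.getvalue())
    return outer.getvalue()


def walk_objects(name: str, data: bytes, max_depth: int = 5):
    """Yield (path, bytes) for each non-archive object, expanding nested Zips."""
    if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                member_path = f"{name}/{info.filename}"
                # Each member is decoded in turn and may itself be an archive.
                yield from walk_objects(member_path, zf.read(info), max_depth - 1)
    else:
        # Not an archive (or depth limit reached): hand the object on for scanning.
        yield name, data


for path, contents in walk_objects("outer.zip", build_nested_archive()):
    print(path, len(contents))
```

Every object that the generator yields would then be passed separately to the later scanning stages.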

Looking for the virus

This complex stage of virus scanning is controlled by the virus definition (DAT) files. The scan.dat file contains thousands of different drivers. Each driver has detailed instructions on how to find a particular virus or type of virus.

The engine can find a simple virus by starting from a known place in the file, then searching for its virus signature. Often, the engine needs to search only a small part of a file to determine that the file is free from viruses.
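
A search that starts from a known place and examines only a small part of the file might look like the following sketch. The Signature record, its start and window fields, and the harmless marker string are all invented for illustration; they do not reflect the format of the DAT files.

```python
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    name: str       # label reported when the pattern is found
    pattern: bytes  # byte sequence to look for
    start: int      # offset where the search begins
    window: int     # number of bytes examined from that offset


# Hypothetical entry using a harmless marker string.
SIGNATURES = [
    Signature("Demo.Test.A", b"HARMLESS-DEMO-MARKER", start=16, window=64),
]


def find_signature(data: bytes) -> Optional[str]:
    """Return the name of the first matching signature, or None if clean."""
    for sig in SIGNATURES:
        # Only a small slice of the file is inspected, not the whole object.
        region = data[sig.start : sig.start + sig.window]
        if sig.pattern in region:
            return sig.name
    return None


sample = b"\x00" * 20 + b"HARMLESS-DEMO-MARKER" + b"\x00" * 100
print(find_signature(sample))
print(find_signature(b"\x00" * 200))
```

Restricting the search to a window keeps the check fast, at the cost of missing a pattern that lies outside the expected region.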

A virus signature is a sequence of characters that uniquely identifies the virus, such as a message that the virus might display on the screen, or a fragment of computer code. Care is taken when choosing these signatures to avoid falsely detecting viruses inside clean files. More complex viruses avoid detection by simple signature scanning by using two popular techniques:

Encryption — The data inside the virus is encrypted so that antivirus scanners cannot see the messages or computer code of the virus. When the virus is activated, it converts itself into a working version, then executes.

Polymorphism — This process is similar to encryption, except that when the virus replicates itself, it changes its appearance.
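
The effect of both techniques on a plain signature search can be shown with a toy example. The single-byte XOR encoding below is an assumption chosen for brevity and stands in for whatever encoding a real virus might use; the marker string is harmless, and xor_bytes is a hypothetical helper.

```python
MARKER = b"HARMLESS-DEMO-MARKER"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Encode or decode data by XOR-ing every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


plain_body = b"header" + MARKER + b"trailer"

# "Encryption": the stored form no longer contains the marker.
encoded_body = xor_bytes(plain_body, 0x5A)
print(MARKER in plain_body)
print(MARKER in encoded_body)

# Decoding with the same key restores the working version.
print(xor_bytes(encoded_body, 0x5A) == plain_body)

# "Polymorphism": a different key per copy gives each copy a different appearance.
copy_one = xor_bytes(plain_body, 0x5A)
copy_two = xor_bytes(plain_body, 0x3C)
print(copy_one != copy_two)
```

In this toy model, a scanner that searches the stored bytes for the marker finds nothing, even though every copy decodes to the same content.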

Using heuristic analysis

Using only virus signatures, the engine cannot detect a new virus because its signature is not yet known. Therefore, the engine can use an additional technique: heuristic analysis.

Programs, documents, or email messages that carry a virus often have distinctive features. They might attempt unprompted modification of files, invoke mail clients, or use other means to replicate themselves. The engine analyzes the program code to detect these kinds of computer instructions. The engine also searches for "legitimate" non-virus-like behavior, such as prompting the user before taking action, and thereby avoids raising false alarms.

By using these techniques, the engine can detect many new viruses.
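
One way to picture heuristic analysis is as a score that rises with virus-like features and falls with legitimate ones. The sketch below assumes that an earlier analysis step has already reduced the program to a list of abstract action names; the action names, weights, and threshold are all invented for illustration.

```python
# Hypothetical weights: positive for virus-like features, negative for
# behavior that legitimate programs typically show.
WEIGHTS = {
    "modify_files_unprompted": 3,
    "invoke_mail_client": 2,
    "copy_self": 3,
    "prompt_user": -3,
}

ALERT_THRESHOLD = 5  # assumed cut-off for raising an alarm


def heuristic_score(actions: list[str]) -> int:
    """Sum the weights of all recognized actions; unknown actions count as 0."""
    return sum(WEIGHTS.get(action, 0) for action in actions)


def is_suspicious(actions: list[str]) -> bool:
    return heuristic_score(actions) >= ALERT_THRESHOLD


worm_like = ["copy_self", "invoke_mail_client", "modify_files_unprompted"]
installer = ["prompt_user", "modify_files_unprompted"]
print(heuristic_score(worm_like), is_suspicious(worm_like))
print(heuristic_score(installer), is_suspicious(installer))
```

The negative weight for prompting the user is what keeps the installer-like example below the threshold, which mirrors how the search for legitimate behavior avoids false alarms.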

Calculating the checksum

This stage identifies the virus exactly. The engine performs a mathematical calculation over the virus data to produce a checksum, a number that is unique to that virus. The engine then compares this checksum against previously calculated values in one of the DAT files (scan.dat).
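
The comparison against previously calculated values can be sketched as a table lookup. The text does not say which calculation the engine uses, so SHA-256 from Python's hashlib module is an assumption here, and the in-memory dictionary merely stands in for the values stored in the DAT file.

```python
import hashlib
from typing import Optional


def checksum(data: bytes) -> str:
    """Reduce the data to a fixed-size value (SHA-256 is assumed here)."""
    return hashlib.sha256(data).hexdigest()


# Harmless placeholder bodies for two closely related demo variants.
VARIANT_A = b"demo variant A body"
VARIANT_B = b"demo variant B body"

# Stand-in for the previously calculated values.
KNOWN_CHECKSUMS = {
    checksum(VARIANT_A): "Demo.Test.A",
    checksum(VARIANT_B): "Demo.Test.B",
}


def identify_exactly(virus_data: bytes) -> Optional[str]:
    """Return the exact variant name, or None if the checksum is unknown."""
    return KNOWN_CHECKSUMS.get(checksum(virus_data))


print(identify_exactly(VARIANT_B))
print(identify_exactly(b"demo variant C body"))
```

Because the two variants differ by a single character, their checksums differ, which is what allows the lookup to tell closely related variants apart.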

Cleaning

This stage cleans the object. Usually, the engine can clean an infected file satisfactorily. However, some viruses can alter or destroy data to an extent where a file cannot be fixed. The engine can easily clean macro viruses by erasing the macro from the infected document.

Executable viruses are more complex. The engine must restore the original path of execution through the program so that the virus does not become active. For example, a virus might append itself to the end of an executable program file. To run, the virus must divert the path of execution away from the original code to itself. After becoming active, the virus redirects the path of execution to the original code to avoid suspicion. The engine can disable this virus by removing the diversion to the virus code. To clean the file, the engine then erases the virus code.
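
The two cleaning steps for an appended virus, removing the diversion and then erasing the added code, can be illustrated with a toy model. ToyProgram is not a real executable format, and the sketch assumes that the original entry point and the offset where the appended code begins are already known.

```python
from dataclasses import dataclass


@dataclass
class ToyProgram:
    entry_point: int  # offset in code where execution starts
    code: bytes


def simulate_appended_code(program: ToyProgram, extra: bytes) -> ToyProgram:
    """Model the infected state: extra bytes at the end, entry point diverted."""
    return ToyProgram(entry_point=len(program.code), code=program.code + extra)


def clean(program: ToyProgram, original_entry: int, virus_start: int) -> ToyProgram:
    """Restore the original path of execution, then erase the appended bytes."""
    # Step 1: remove the diversion so that execution starts in the original code.
    restored_entry = original_entry
    # Step 2: erase everything from the start of the appended code onward.
    restored_code = program.code[:virus_start]
    return ToyProgram(entry_point=restored_entry, code=restored_code)


original = ToyProgram(entry_point=0, code=b"ORIGINAL-CODE")
infected = simulate_appended_code(original, b"APPENDED-BYTES")
cleaned = clean(infected, original_entry=0, virus_start=len(original.code))
print(cleaned == original)
```

If only the first step were applied, the appended bytes would remain in the file but would never run; the second step returns the file to its original size.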