COMDATs, TCE and ICF

Quote from MS Knowledge Base article Q151501:

The linker version 5.00.7022 that was shipped with Visual C++ 5.0 does two types of optimizations to decrease the image size and increase the program speed. Transitive COMDAT Elimination (TCE) and Identical COMDAT Folding (ICF). TCE removes unreferenced COMDATs, while ICF folds identical COMDATs into one copy. The linker option for TCE is /OPT:REF and the option for ICF is /OPT:ICF.

Whilst I particularly like the phrase "Transitive COMDAT Elimination", the term function-level linking (as used in the Microsoft IDE) is probably the better description. Basically, each function is packaged separately within the OBJ file and can then be selectively included by the linker.
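To make the two optimizations concrete, here is a small C sketch that treats each COMDAT as a node in a reference graph. TCE keeps only what is transitively reachable from a root such as the entry point; ICF then maps each kept COMDAT onto an earlier kept one with identical bytes and identical references. The `Comdat` struct, `eliminate_unreferenced` and `fold_identical` are hypothetical illustrations, not how link.exe is written, and a real linker compares relocations more carefully than this literal comparison of target indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMDATS 16
#define MAX_REFS 4

/* Hypothetical in-memory model of a COMDAT: its raw bytes and the
   indices of the other COMDATs its relocations refer to. */
typedef struct {
    const char *name;
    const unsigned char *bytes;
    size_t size;
    int refs[MAX_REFS];
    int nrefs;
} Comdat;

/* TCE: keep[i] becomes true only if COMDAT i is reachable from one of the
   roots by following references transitively (iterative depth-first walk). */
void eliminate_unreferenced(const Comdat *c, int n, const int *roots, int nroots, bool *keep)
{
    int stack[MAX_COMDATS];
    int top = 0;
    assert(n <= MAX_COMDATS);
    for (int i = 0; i < n; i++) keep[i] = false;
    for (int r = 0; r < nroots; r++) {
        if (!keep[roots[r]]) { keep[roots[r]] = true; stack[top++] = roots[r]; }
    }
    while (top > 0) {
        int cur = stack[--top];
        for (int k = 0; k < c[cur].nrefs; k++) {
            int t = c[cur].refs[k];
            if (!keep[t]) { keep[t] = true; stack[top++] = t; } /* each index pushed once */
        }
    }
}

/* Two COMDATs are "identical" here if bytes and reference lists match. */
static bool identical(const Comdat *a, const Comdat *b)
{
    if (a->size != b->size || a->nrefs != b->nrefs) return false;
    if (memcmp(a->bytes, b->bytes, a->size) != 0) return false;
    for (int k = 0; k < a->nrefs; k++)
        if (a->refs[k] != b->refs[k]) return false;
    return true;
}

/* ICF: fold[i] is the index of the kept COMDAT whose single copy is used
   for COMDAT i (itself if nothing earlier is identical). */
void fold_identical(const Comdat *c, int n, const bool *keep, int *fold)
{
    for (int i = 0; i < n; i++) {
        fold[i] = i;
        if (!keep[i]) continue;
        for (int j = 0; j < i; j++) {
            if (keep[j] && fold[j] == j && identical(&c[i], &c[j])) { fold[i] = j; break; }
        }
    }
}
```

For example, if `main` references `f` and `g`, and `f` and `g` each consist of a single `ret` byte (0xC3), an unreferenced COMDAT is dropped by TCE and `g` is folded into `f` by ICF.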

Why use function level linking?

Running MASM against your assembly source produces object files, and the linker then combines these files into a single executable file. In general, the linker works at the granularity of whole object files: an object file named on the command line is always included, and an object file inside a static library is pulled in whole as soon as any external symbol it defines is referenced.
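The whole-object rule can be sketched as the classic library-member selection loop below. One reference to any symbol a member defines drags in the entire member, along with everything that member in turn references. The `Member` struct and `select_members` are hypothetical names for illustration; a real linker also tracks symbols already defined by the command-line objects and reports unresolved externals, both omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 8
#define MAX_WORK 64

/* Hypothetical model of one static-library member (one OBJ file). */
typedef struct {
    const char *name;
    const char *defines[MAX_SYMS]; /* public symbols the member defines */
    int ndefines;
    const char *needs[MAX_SYMS];   /* external symbols the member references */
    int nneeds;
} Member;

static bool defines_symbol(const Member *m, const char *sym)
{
    for (int i = 0; i < m->ndefines; i++)
        if (strcmp(m->defines[i], sym) == 0) return true;
    return false;
}

/* Resolves the symbols in roots by pulling whole members out of lib.
   included[i] is set for every member linked in; returns how many. */
int select_members(const Member *lib, int n, const char *const *roots, int nroots, bool *included)
{
    const char *work[MAX_WORK];
    int nwork = 0, count = 0;
    for (int i = 0; i < n; i++) included[i] = false;
    for (int r = 0; r < nroots; r++) {
        assert(nwork < MAX_WORK);
        work[nwork++] = roots[r];
    }
    while (nwork > 0) {
        const char *sym = work[--nwork];
        bool resolved = false;
        for (int i = 0; i < n && !resolved; i++)
            resolved = included[i] && defines_symbol(&lib[i], sym);
        if (resolved) continue;          /* already satisfied by a linked member */
        for (int i = 0; i < n; i++) {
            if (included[i] || !defines_symbol(&lib[i], sym)) continue;
            /* One reference is enough to link the ENTIRE member, including
               every other function it defines, plus whatever those need. */
            included[i] = true;
            count++;
            for (int k = 0; k < lib[i].nneeds; k++) {
                assert(nwork < MAX_WORK);
                work[nwork++] = lib[i].needs[k];
            }
            break;
        }
    }
    return count;
}
```

With a member util.obj defining `strlen_`, `strcpy_` and `strcat_`, a reference to `strlen_` alone links in all three functions, plus whichever member supplies the `memcpy_` that util.obj needs.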

If every function within an object file is used by an application, this is the behaviour that we want. When reusing code, however, our object files often contain many functions that are never referenced. For this reason, Microsoft added the /Gy switch to their compiler: it packages each function separately so that the linker, with /OPT:REF and /OPT:ICF, can keep redundant and identical code out of the final executable.

MASM support

There is, unfortunately, no support (that I could find) within MASM for function level linking. This is a shame, because MASM programmers are generally looking for small, tight code. Of course you could just create an OBJ file for each function and combine the lot into a static library, but this would completely destroy the modularity of your source files.

How Visual C++ does it

Playing around with the /Gy switch in Visual C++ and analysing the results with DUMPBIN revealed the following:

  • Each function is in its own COFF section, but each of these sections has the same name, so that the linker combines them into a single section within the executable.
  • Each COFF section header that contains a packaged function is flagged as "communal" (the IMAGE_SCN_LNK_COMDAT flag).
  • Each COFF section header has a matching symbol record in the COFF symbol table, and each of these has an auxiliary record whose Selection field specifies how the linker should treat duplicate copies of the packaged function.
  • The first symbol in the symbol table that follows a section header symbol and is marked as being in that section is the COMDAT symbol, i.e. the section will only be included if this symbol is referenced.
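The layout described in the list above can be checked programmatically. The sketch below reads raw little-endian COFF records as laid out in the Microsoft PE/COFF specification: `Characteristics` sits at offset 36 of the 40-byte section header, `IMAGE_SCN_LNK_COMDAT` (0x1000) is the "communal" flag, symbol and auxiliary records are 18 bytes each, and the `Selection` byte sits at offset 14 of the auxiliary section-definition record. The helper names are hypothetical, and recognising the section symbol purely by section number, static storage class and the presence of an aux record is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants and offsets from the Microsoft PE/COFF specification. */
#define IMAGE_SCN_LNK_COMDAT   0x00001000u /* section header "communal" flag */
#define IMAGE_SYM_CLASS_STATIC 3           /* storage class of a section symbol */
#define COFF_RECORD_SIZE       18          /* symbols and aux records are 18 bytes */

static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Characteristics is the last field (offset 36) of the 40-byte section header. */
int section_is_comdat(const unsigned char *sechdr)
{
    return (rd32(sechdr + 36) & IMAGE_SCN_LNK_COMDAT) != 0;
}

/* Returns the index of the COMDAT symbol for 1-based section secnum: the first
   symbol after the section's own symbol that is in the same section. The
   Selection byte (offset 14 of the section symbol's aux record) is stored in
   *selection. Returns -1 if no such symbol exists. */
long find_comdat_symbol(const unsigned char *symtab, uint32_t nsyms, int16_t secnum, uint8_t *selection)
{
    long section_sym = -1;
    assert(symtab != NULL && selection != NULL);
    for (uint32_t i = 0; i < nsyms; i++) {
        const unsigned char *s = symtab + (size_t)i * COFF_RECORD_SIZE;
        int16_t sec = (int16_t)rd16(s + 12);   /* SectionNumber */
        uint8_t naux = s[17];                  /* NumberOfAuxSymbols */
        if (section_sym < 0) {
            if (sec == secnum && s[16] == IMAGE_SYM_CLASS_STATIC && naux >= 1 && i + 1 < nsyms) {
                section_sym = (long)i;
                *selection = s[COFF_RECORD_SIZE + 14];   /* aux record follows directly */
            }
        } else if (sec == secnum) {
            return (long)i;
        }
        i += naux;   /* step over this symbol's aux records */
    }
    return -1;
}
```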

Mimicking Visual C++ with MASM

Sections

The first problem is how to place each function in its own COFF section whilst keeping the name the same so that the linker will combine the sections. MASM can produce COFF sections using the SEGMENT directive. If we place each function in its own segment, each with its own name, we get the sections we are looking for. However, each one also ends up as a separate section in the executable, and because sections are aligned on page boundaries in memory, each function then costs at least 4 KB. If we give each segment the same name instead, MASM treats them as one segment and we get only one section in the object file, so the functions can no longer be included individually.

The solution is to borrow the mechanism that the linker uses to produce the import table. If you append a '$' sign and some characters to a segment name, the linker strips the '$' and everything that follows it when deciding which section of the executable the contents belong to, and it orders the pieces alphabetically by the text after the '$'. The assembler, on the other hand, treats each full name as a different segment. This also allows us to end up in the standard code section, ".text".
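The naming rule can be sketched as two small helpers: one decides which executable section a name such as `.text$01` contributes to, and one orders the contributions by full name, so pieces sharing a prefix are arranged by the text after the `$` (a name with no suffix sorts first). The function names are hypothetical, and this models only the grouping behaviour described here, not the linker's full section-merging logic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the executable section name: everything before the first '$'. */
size_t image_section_name_len(const char *objname)
{
    const char *dollar;
    assert(objname != NULL);
    dollar = strchr(objname, '$');
    return dollar ? (size_t)(dollar - objname) : strlen(objname);
}

/* Nonzero if two object-file section names merge into the same image section. */
int same_image_section(const char *a, const char *b)
{
    size_t la = image_section_name_len(a), lb = image_section_name_len(b);
    return la == lb && strncmp(a, b, la) == 0;
}

static int by_name(const void *x, const void *y)
{
    return strcmp(*(const char *const *)x, *(const char *const *)y);
}

/* Lays out contributions to one grouped section in order of full name,
   which for a shared prefix means alphabetical order of the '$' suffix. */
void order_contributions(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], by_name);
}
```

So `.text$01` and `.text$02` from the same OBJ file stay separate sections for the assembler and linker, yet both land in `.text` in the executable.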

The segment definition we are looking for, therefore, looks like this:

_TEXT$xx SEGMENT BYTE FLAT 'CODE'
myFunc PROC
    ; ... function body ...
    ret
myFunc ENDP
_TEXT$xx ENDS

Where xx is a unique identifier for each function to be included in the OBJ file.

Using this method, we get a long way towards the output that Visual C++ produces. The section headers are created, the section header symbols and auxiliary symbols are inserted in the symbol table, and the first symbol in each section is the function name, as required. All we are short of now are the correct flags.

Flags

To enable the linker to work with the sections that we have created, we need to set the missing flags in the object file: the "communal" flag in each section header and the selection flag in the auxiliary record of each section symbol. MASM is not capable of doing this, so I have written a utility to do it as a post-assembly step. It is available free of charge, with all source code (i.e. it's not supported!), by clicking the link below. Aside from the executable and the C++ source code, I've included an assembly project demonstrating the effects.
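The utility itself is not reproduced here, but the core of such a post-assembly pass can be sketched in C. This hypothetical `enable_comdats` works on an OBJ file already loaded into memory, sets `IMAGE_SCN_LNK_COMDAT` on every section whose short name starts with a given prefix, and writes a Selection value into the auxiliary record of that section's symbol. Assumptions: section names of 8 characters or fewer (longer names live in the string table and are not handled), the section symbol is the first static symbol with an aux record in that section, and `IMAGE_COMDAT_SELECT_NODUPLICATES` is the chosen selection. The real EnableCOMDATs tool may do any of this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the Microsoft PE/COFF specification. */
#define IMAGE_SCN_LNK_COMDAT             0x00001000u
#define IMAGE_COMDAT_SELECT_NODUPLICATES 1
#define IMAGE_SYM_CLASS_STATIC           3

static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;         p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16); p[3] = (unsigned char)(v >> 24);
}

/* Hypothetical post-assembly pass: flags matching sections as COMDATs and sets
   the Selection byte of their section symbols' aux records.
   Returns the number of sections patched, or -1 for a malformed file. */
int enable_comdats(unsigned char *obj, size_t size, const char *prefix)
{
    if (size < 20) return -1;                           /* COFF file header */
    uint16_t nsec = rd16(obj + 2);                      /* NumberOfSections */
    size_t symptr = rd32(obj + 8);                      /* PointerToSymbolTable */
    uint32_t nsyms = rd32(obj + 12);                    /* NumberOfSymbols */
    size_t sechdrs = 20 + (size_t)rd16(obj + 16);       /* skip optional header */
    size_t plen = strlen(prefix);
    int patched = 0;
    assert(plen <= 8);
    if (sechdrs + (size_t)nsec * 40 > size || symptr + (size_t)nsyms * 18 > size) return -1;

    for (uint16_t s = 0; s < nsec; s++) {
        unsigned char *hdr = obj + sechdrs + (size_t)s * 40;
        if (memcmp(hdr, prefix, plen) != 0) continue;
        wr32(hdr + 36, rd32(hdr + 36) | IMAGE_SCN_LNK_COMDAT);   /* Characteristics */

        /* Find this section's symbol; section numbers are 1-based. */
        for (uint32_t i = 0; i < nsyms; i++) {
            unsigned char *sym = obj + symptr + (size_t)i * 18;
            if ((int16_t)rd16(sym + 12) == s + 1 && sym[16] == IMAGE_SYM_CLASS_STATIC
                && sym[17] >= 1 && i + 1 < nsyms) {
                sym[18 + 14] = IMAGE_COMDAT_SELECT_NODUPLICATES;  /* aux Selection */
                break;
            }
            i += sym[17];   /* skip aux records */
        }
        patched++;
    }
    return patched;
}
```

Choosing NODUPLICATES makes the linker report a multiply-defined-symbol error if two object files supply the same function; IMAGE_COMDAT_SELECT_ANY (2) would instead keep one copy silently.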

Download EnableCOMDATs.zip

Notes

Use of /Zi flag

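The MASM /Zi flag includes debug information in the created object file. This debug information references everything within the assembly, so the linker treats every packaged function as referenced and includes all of them in the final executable. Assemble without /Zi if you want unreferenced functions to be removed.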