Opera File Formats

Introduction

This document describes the new binary file formats introduced with Opera version 5 for the various files used in the cache and cookie management.

The new generic format that these files are based on is structured as a sequence of tagged records with a given length. Each record may contain a number of different data types such as strings and integers, as well as arbitrary binary data, such as new records in this format.

The generic format is NOT backwards compatible with Opera 3.x and earlier, but is intended to be reasonably backwards compatible for version 4.0 and later. This means that if new fields are added by a future version, older versions will still be able to read the information that they understand, while ignoring the fields they do not understand.

There are some limits, but they are mostly concerned with the number of significant bits in the integers that are used to indicate record lengths. The formats are presently used to store the disk cache index file (dcache4.url), the visited links file (vlink4.dat), the download rescue file (download.dat) and the cookie file (cookies4.dat). The formats are described in the following sequence: the generic binary format, the cache file formats, and the cookie file format.

The formats for the windows history, news files and global history files are the same as the ones used by Opera v3.x and are outside the scope of this document.

The Generic Binary Format

Data types

Integers used in the format are unsigned, and stored in big-endian/network style (Most Significant Byte first). Integers stored inside the records are also stored in the big-endian format, but may be signed, and may be truncated.

The following datatypes are used in this document:

int*                 Signed integer of * bits
uint*                Unsigned integer of * bits
byte                 8 bit unsigned value
string               Sequence of characters (not null terminated)
time_t               uint32 representing a time value in seconds since 00:00 Jan 1, 1970 GMT. The representation may be increased past 32 bit in the future
tag_id_type          Unsigned integer whose size is selected by the idtag_length header field. The application must convert from this type to its internal unsigned integer representation, preferably uint32. For more information, see the file header format.
payload_length_type  Unsigned integer whose size is selected by the length_length header field. The application must convert from this type to its internal unsigned integer representation, preferably uint32. For more information, see the file header format.
record               Separately defined sequence of fields

Data records

The general record format has this form:

struct record
{
  // application specific tag to identify content type
  tag_id_type tag_id;
  // length of payload
  payload_length_type length;
  // Payload/content of the record
  byte payload[length];
};

NOTE: The number of bytes in tag and length may change, see below.

The fields of each record have the following meanings:

tag_id

The identifier of the record. This value is application specific, and can be used to indicate the meaning of the payload content.

The actual content type of the record depends on the definitions used for the actual file or super-record.

Tag_id values in which the MSB (Most Significant Bit) is set to 1 are reserved for records with an implicit length of zero: the tag_id field is NOT followed by a length field or a payload buffer. Such records are used as Boolean flags: True if present, False if not present.

In the binary storage of a file this means that the MSB of the internal storage integer must be stored as the MSB of the first byte in the tag field. This places a limit on how many tags can be used for a given tag_id integer length. When a file is read into a program, the program must take care to move the MSB of the binary stored tag to a common (internal) bit position, such as the MSB of the program's own unsigned integers.

bytes  Max id available (excluding MSB)
1      0x7f
2      0x7fff
3      0x7fffff
4      0x7fffffff

While it is technically possible to use the same tag id (without the MSB) for a normal record and a flag record (such as 0x0001 [16 bit tag] for the payload record and 0x8001 for the flag record), this is not encouraged.
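The MSB handling described above can be sketched as follows. This is a minimal illustration, not Opera's implementation: the function name read_tag is hypothetical, and the choice of a 32 bit internal representation (MSB_VALUE = 0x80000000) is an assumption based on the uint32 recommendation in this document.

```python
MSB_VALUE = 0x80000000  # flag bit in the internal 32 bit representation (assumed)

def read_tag(data, offset, idtag_length):
    """Return (internal_tag, is_flag, next_offset) for the tag stored at offset."""
    raw = data[offset:offset + idtag_length]
    if len(raw) != idtag_length:
        raise ValueError("truncated tag")
    stored = int.from_bytes(raw, "big")        # tags are stored big-endian
    stored_msb = 1 << (8 * idtag_length - 1)   # MSB of the first stored byte
    is_flag = bool(stored & stored_msb)
    tag = stored & (stored_msb - 1)            # tag id without the stored MSB
    if is_flag:
        tag |= MSB_VALUE                       # move the MSB to the internal position
    return tag, is_flag, offset + idtag_length
```

With a 1 byte tag, the stored byte 0x8b becomes the internal value 0x8000000b, which matches the "(0x000b | MSB_VALUE)" notation used for flag records.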

length

This field is the number of bytes in the payload that immediately follow the field. It may be zero.

payload

The payload is a sequence of bytes of the length indicated by the length field.

The meaning of the contents is indicated by the definition for the given record or file structure. Examples of organization may be an array of records, unsigned integers, signed integers, or characters.

It is recommended that only records of the types described here are used if the type of the data varies, as variable (un-tagged) type formats tend to be inflexible and difficult to maintain across versions, especially when compatibility with older versions is desired.

Single item integers (signed or unsigned) may be truncated (zero bytes removed), but arrays of integers must always use a fixed number of bytes to represent values and derive the number of items from the payload length. If the number of bytes needed to represent the values changes in a future version a new tag should be used.
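As an illustration of the truncation rule, the sketch below decodes a single unsigned integer of any stored width and an array of fixed-width unsigned integers. The function names are hypothetical. Signed values are left out on purpose, because this document does not say how the sign of a truncated signed integer is recovered.

```python
def decode_uint(payload):
    """Decode a single, possibly truncated, big-endian unsigned integer."""
    # Removed leading zero bytes do not change the value; an empty payload is 0.
    return int.from_bytes(payload, "big")

def decode_uint_array(payload, item_size):
    """Decode an array of fixed-width big-endian unsigned integers."""
    if item_size <= 0 or len(payload) % item_size != 0:
        raise ValueError("payload length is not a multiple of the item size")
    # The number of items is derived from the payload length.
    return [int.from_bytes(payload[i:i + item_size], "big")
            for i in range(0, len(payload), item_size)]
```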

File Format

These elements are not stored as records but directly in binary:

uint32 file_version_number;
uint32 app_version_number;
// number of bytes in the id tag, presently 1
uint16 idtag_length;   
// number of bytes in the length part of a record, presently 2
uint16 length_length;  
// array of records, number determined by length of file
struct record items[]; 

The present version number of the file format (file_version_number) is 0x00001000, where the lower 12 bits (bitmask 0x00000fff) represent the minor version number, the rest is the major version number. Changes in the minor version must not be used if the file format is changed in such a manner that older versions of the software cannot read the file successfully. If the major version number is newer (or older) than the application can read, it must not read the file.
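The version rule can be written down in a few lines. This is a sketch under the assumption that the reader supports exactly major version 1 (the major part of 0x00001000); the names split_file_version, can_read and SUPPORTED_MAJOR are illustrative only.

```python
MINOR_MASK = 0x00000FFF   # lower 12 bits hold the minor version
SUPPORTED_MAJOR = 1       # major part of the present value 0x00001000 (assumed reader)

def split_file_version(file_version_number):
    """Return (major, minor) for a file_version_number."""
    return file_version_number >> 12, file_version_number & MINOR_MASK

def can_read(file_version_number):
    """A file is readable only if its major version matches the reader's."""
    major, _minor = split_file_version(file_version_number)
    return major == SUPPORTED_MAJOR
```

A file with version 0x00001005 differs only in the minor number and is still read, while 0x00002000 is rejected.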

The integer sizes are absolute for a given major version, and the integer size for the file version number is fixed in any version.

The "app_version_number" is the version number of the application and is independent of the file_version_number. It may be used by the application to determine necessary actions needed to provide forward or backward compatibility that is outside the scope of the file formats. The interpretation of the application version number is application dependent.

The "idtag_length" and "length_length" fields give the number of bytes used in the records for the idtags, as defined by the tag_id_type, and the payload length fields, as defined by payload_length_type, respectively.

Specifically, the values of these fields define tag_id_type and payload_length_type as the following integer types:

Value  tag_id_type (from idtag_length)  payload_length_type (from length_length)
1      uint8                            uint8
2      uint16                           uint16
3      uint24                           uint24
4      uint32                           uint32

The application's internal representation of these types is not defined, but uint32 is recommended. How an application should handle "idtag_length" or "length_length" values larger than 4, or values larger than its internal unsigned integer size, is not defined, but the application should implement the rules specified in the forward compatibility guide for such situations.

Present versions of Opera 4.x use idtag_length=1 (uint8) and length_length=2 (uint16).

After the header, only records follow. The organization of the records and their interpretation is application specific.
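Putting the header and record layout together, a whole file image could be split into its top-level records roughly as shown below. This is a minimal sketch, not a reference parser: read_file and the dictionary keys are hypothetical names, sizes above 4 bytes are simply rejected, and flag records are reported with the MSB stripped and an is_flag marker instead of being moved to an internal bit position.

```python
import struct

# file_version_number, app_version_number, idtag_length, length_length (big-endian)
HEADER = struct.Struct(">IIHH")

def read_file(data):
    """Return (header dict, list of (tag, is_flag, payload)) for a file image."""
    if len(data) < HEADER.size:
        raise ValueError("file shorter than the 12 byte header")
    file_version, app_version, idtag_length, length_length = HEADER.unpack_from(data, 0)
    if not 1 <= idtag_length <= 4 or not 1 <= length_length <= 4:
        raise ValueError("unsupported tag or length size")
    header = {"file_version": file_version, "app_version": app_version,
              "idtag_length": idtag_length, "length_length": length_length}
    flag_bit = 1 << (8 * idtag_length - 1)
    records = []
    pos = HEADER.size
    while pos < len(data):
        if pos + idtag_length > len(data):
            raise ValueError("truncated tag")
        tag = int.from_bytes(data[pos:pos + idtag_length], "big")
        pos += idtag_length
        if tag & flag_bit:                      # flag record: no length, no payload
            records.append((tag & (flag_bit - 1), True, b""))
            continue
        if pos + length_length > len(data):
            raise ValueError("truncated length field")
        length = int.from_bytes(data[pos:pos + length_length], "big")
        pos += length_length
        if pos + length > len(data):
            raise ValueError("payload runs past the end of the file")
        records.append((tag, False, data[pos:pos + length]))
        pos += length
    return header, records
```

Because record payloads use the same binary representation as the file body, the same loop (without the header step) can be reused on a payload that holds nested records.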

Forward Compatibility

An older version of an application using this file format that is NOT able to use long integers should nevertheless try to process the file, but should skip any record whose tag value exceeds the application's own integer range, i.e. the integer overflows. However, if the length of a record exceeds the application's limits on integers or buffer capabilities, it must not continue to process the file.

All applications must ignore tag values that they do not understand.
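One way to honour the "ignore what you do not understand" rule is a table-driven decoder that silently skips unknown tags. The sketch below assumes the records have already been split into (tag, is_flag, payload) tuples. The function name, the HANDLERS table and the latin-1 decoding of strings are illustrative assumptions; this document does not specify a character encoding.

```python
def apply_known_fields(records, handlers):
    """Decode the fields listed in handlers; collect the tags that were skipped."""
    result = {}
    skipped = []
    for tag, is_flag, payload in records:
        handler = handlers.get((tag, is_flag))
        if handler is None:
            skipped.append(tag)      # unknown tag: ignore it and keep reading
            continue
        name, decode = handler
        result[name] = decode(payload)
    return result, skipped

# Example handler table for two of the common cache tags (hypothetical names).
HANDLERS = {
    (0x03, False): ("url", lambda payload: payload.decode("latin-1")),
    (0x0B, True): ("form_query", lambda payload: True),
}
```

A reader written against this table keeps working when a newer writer adds a field: the new tag ends up in the skipped list and the known fields are still returned.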

The Cache File Formats

This section details the record tags and formats used for the visited link file (vlink4.dat), the disk cache index file (dcache4.url) and the download rescue file (download.dat). The present app_version_number of these files is 0x00020000 (major version 2, minor 0).

These files use records (of different tag values) which contain a sequence of records with tags from the same set of tag ids. The different files use these tags for their records:

File           Tag id  Version number
Disk cache     0x01    0x00020000
Visited Links  0x02    0x00020000
Download       0x41    0x00020000

Each file consists of records of ONLY this type, with the exception of the Disk cache index file, which also contains a single record with the id 0x40, which contains a 5 character string used to find the next free cache file number (oprXXXXX).

Each record is again a sequence of records with the same binary representation format as the records in the file.

Common Elements Between All Files

These elements are used by all of the cache related files. In the case of the visited links, these are the only fields presently used.

NOTE: "(0x0001 | MSB_VALUE)" means that the most significant bit in the local unsigned integer is to be set. If 32 bit values are used, that means the tag's value is 0x80000001.

Tag ID                  Contents  Meaning
0x0003                  string    The name of the URL, fully qualified
0x0004                  time_t    Last visited
(0x000b | MSB_VALUE)    flag      The URL is a result of a form query
0x0022                  record    Contains the name and last visited time of relative link in the document. May repeat

Content tags of relative link (tag 0x0022) records

Tag ID                  Contents  Meaning
0x0023                  string    The name of the relative link
0x0024                  time_t    Last visited
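To make the nesting concrete, the sketch below decodes the payload of one visited links record (tag 0x02) using the common tags listed above. It assumes the present Opera sizes (idtag_length=1, length_length=2) and latin-1 strings; the function names and dictionary keys are hypothetical.

```python
def iter_records(data, idtag_length=1, length_length=2):
    """Yield (tag, is_flag, payload) for each record in data."""
    flag_bit = 1 << (8 * idtag_length - 1)
    pos = 0
    while pos < len(data):
        tag = int.from_bytes(data[pos:pos + idtag_length], "big")
        pos += idtag_length
        if tag & flag_bit:                       # flag record, no length or payload
            yield tag & (flag_bit - 1), True, b""
            continue
        length = int.from_bytes(data[pos:pos + length_length], "big")
        pos += length_length
        if pos + length > len(data):
            raise ValueError("payload runs past the end of the data")
        yield tag, False, data[pos:pos + length]
        pos += length

def decode_visited_link(payload):
    """Decode the fields of a visited links entry; unknown tags are ignored."""
    entry = {"url": None, "last_visited": None,
             "form_query": False, "relative_links": []}
    for tag, is_flag, value in iter_records(payload):
        if is_flag:
            if tag == 0x0B:
                entry["form_query"] = True
        elif tag == 0x03:
            entry["url"] = value.decode("latin-1")   # encoding is an assumption
        elif tag == 0x04:
            entry["last_visited"] = int.from_bytes(value, "big")
        elif tag == 0x22:                            # nested relative link record
            link = {"name": None, "last_visited": None}
            for sub_tag, sub_flag, sub_value in iter_records(value):
                if sub_flag:
                    continue
                if sub_tag == 0x23:
                    link["name"] = sub_value.decode("latin-1")
                elif sub_tag == 0x24:
                    link["last_visited"] = int.from_bytes(sub_value, "big")
            entry["relative_links"].append(link)
    return entry
```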

Fields Used by Disk Cache and Download Rescue Files

Tag ID                  Contents  Meaning
0x0005                  time_t    Localtime, when the file was last loaded, not GMT
0x0007                  uint8     Status of load:
                                    2  Loaded
                                    4  Loading aborted
                                    5  Loading failed
0x0008                  uint32    Content size
0x0009                  string    MIME type of content
0x000a                  string    Character set of content
(0x000c | MSB_VALUE)    flag      File is downloaded and stored locally on user's disk, and is not part of the disk cache directory
0x000d                  string    Name of file (cache files: only local to cache directory)
(0x000f | MSB_VALUE)    flag      Always check if modified
0x0010                  record    Contains the HTTP protocol specific information

Fields used only by the download rescue file

Tag ID                  Contents  Meaning
0x0028                  time_t    Identifies the time when the loading of the last/previous segment of the downloaded file started.
0x0029                  time_t    Identifies the time when the loading of the last/previous segment of the downloaded file was stopped.
0x002A                  uint32    How many bytes were in the previous segment of the file being downloaded. If the time the loading ended is not known, this value will be assumed to be zero (0) and the download speed set to zero (unknown).

Fields used in the HTTP protocol specific record

All methods are GET by default; at present it is not possible to cache POST requests.

Tag ID                  Contents  Meaning
0x0015                  string    HTTP date header
0x0016                  time_t    Expiry date
0x0017                  string    Last modified date
0x0018                  string    MIME type of document
0x0019                  string    Entity tag
0x001A                  string    Moved to URL (Location header)
0x001B                  string    Response line text
0x001C                  uint32    Response code
0x001D                  string    Refresh URL
0x001E                  uint32    Refresh delta time
0x001F                  string    Suggested file name
0x0020                  string    Content Encodings
0x0021                  string    Content Location
0x0025                  uint32    Together with tag 0x0026 (both must be present) this identifies the User Agent string last used to load the resource. This value identifies the User Agent string.
                                  This value is used internally, and should not be modified.
0x0026                  uint32    Together with tag 0x0025 (both must be present) this identifies the User Agent string last used to load the resource. This value identifies the User Agent sub version.
                                  This value is used internally, and should not be modified.
(0x0030 | MSB_VALUE)    flag      Reserved for future use
(0x0031 | MSB_VALUE)    flag      Reserved for future use

Cookie File format

This section describes the record tags and formats used for the storage of cookies (cookies4.dat). The present app_version_number of this file type is 0x00002000 (major version 2, minor 0).

The cookie file is organized as a tree of domain name components. Each domain component holds a tree of path components, and each path component may contain a number of cookies.

NOTE: The components are a sequence of records, terminated with a flag record, not a single record.

Structure

Domain components

The domain components are used to organize the cookies for each server and domain for which cookies or cookie filtering capabilities are defined.

A domain component is started with a domain record, which holds the domain name and some flags for that particular domain. It is then followed by a path component holding the cookies and subdirectory path components (and cookies), followed with a path component terminator and any number of subdomain components before it is terminated by a domain-end flag record.

For example, cookies for the domain www.opera.com will be stored in this manner:

["com" record]
  ["opera" record]
    ["www" record]
      [cookies]
      [Path components]
      [Path component terminator]
      [other domains]
    [end of domain flag ("www")]
  [end of domain flag ("opera")]
[end of domain flag ("com")]

All names of domain components are non-dotted, except IP addresses. An IP address can only be stored as the complete address in dotted quad form, e.g. "10.11.12.13"; it is stored at the top level and cannot contain any subdomains.

A Domain Record uses the tag "0x01" and contains a sequence of these fields:

Tag ID                  Contents  Meaning
0x001E                  string    The name of the domain part
0x001F                  int8      How cookies are filtered for this domain. If not present, the filtering of the parent domain is used.
                                    1. All cookies from this domain are accepted.
                                    2. No cookies from this domain are accepted.
                                    3. All cookies from this server are accepted. Overrides 1 and 2 for higher level domains.
                                    4. No cookies from this server are accepted. Overrides 1 and 2 for higher level domains.
                                  Domain settings apply to all subdomains, except those with a server specific selection.
0x0021                  int8      Handling of cookies that have explicit paths which do not match the URL setting the cookies. If enabled in the privacy preferences the default is to warn the user, but when warning is enabled such cookies can be filtered by their domains: value 1 indicates reject, and 2 indicates accept automatically.
0x0025                  int8      While in the "Warn about third party cookies" mode, this field can be used to automatically filter such cookies.
                                    1. All third party cookies from this domain are accepted.
                                    2. No third party cookies from this domain are accepted.
                                    3. All third party cookies from this server are accepted. Overrides 1 and 2 for higher level domains.
                                    4. No third party cookies from this server are accepted. Overrides 1 and 2 for higher level domains.
                                  Domain settings apply to all subdomains, except those with a server specific selection.

This record can be followed by zero or more path components defining top-level paths on servers in the domain, and is always terminated by a path component terminator record. Then zero or more domain components may follow.

A domain component is terminated by a (0x0004 | MSB_VALUE) flag record.

Path Components

The path components organize the cookies defined for a given directory in a given domain, as well as any subdirectories of this directory that have cookies defined.

Except for the path component starting immediately after the domain component record, each path component always starts with a path record, and is then followed by any number of cookie records and subdirectory path components.

The path record uses the record id "0x0002" and contains this field:

Tag IDContentsMeaning
0x001DstringThe name of the path part

The path component terminator is the (0x0005 | MSB_VALUE) flag record.

Cookie Records

The cookie entries are stored in records of type "0x0003" and have the following field records:

Tag ID                  Contents  Meaning
0x0010                  string    The name of the cookie
0x0011                  string    The value of the cookie
0x0012                  time_t    Expiry date
0x0013                  time_t    Last used
0x0014                  string    Comment/Description of use (RFC 2965)
0x0015                  string    URL for Comment/Description of use (RFC 2965)
0x0016                  string    The domain received with version=1 cookies (RFC 2965)
0x0017                  string    The path received with version=1 cookies (RFC 2965)
0x0018                  string    The port limitations received with version=1 cookies (RFC 2965)
(0x0019 | MSB_VALUE)    flag      The cookie will only be sent to HTTPS servers.
0x001A                  int8+     Version number of cookie (RFC 2965)
(0x001B | MSB_VALUE)    flag      This cookie will only be sent to the server that sent it.
(0x001C | MSB_VALUE)    flag      Reserved for delete protection: Not yet implemented
(0x0020 | MSB_VALUE)    flag      This cookie will not be sent if the path is only a prefix of the URL. If the path is /foo, /foo/bar will match but not /foobar.
(0x0022 | MSB_VALUE)    flag      If true, this cookie was set as the result of a password login form, or by a URL that was retrieved using a cookie that can be tracked back to such a cookie.
(0x0023 | MSB_VALUE)    flag      If true, this cookie was set as the result of a HTTP authentication login, or by a URL that was retrieved using a cookie that can be tracked back to such a cookie.
(0x0024 | MSB_VALUE)    flag      In "Display Third party cookies" mode this flag will be set if the cookie was set by a third party server, and only these cookies will be sent if the URL is a third party. Cookies that were received when loading a URL from the server directly will not be sent to third party URLs in this mode. The reverse is NOT true.
                                  NOTE: If a third party server redirects back to the first party server, the redirected URL is considered third party.
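The domain, path and cookie components described in this section can be walked with two small stacks. The sketch below is an illustration under several assumptions: 1 byte tags and 2 byte lengths, latin-1 strings, and path parts joined with "/". The names walk_cookies, fields, text and rec are hypothetical, and the sample data is constructed by hand, not taken from a real cookies4.dat.

```python
def iter_records(data, idtag_length=1, length_length=2):
    """Yield (tag, is_flag, payload) for each record in data."""
    flag_bit = 1 << (8 * idtag_length - 1)
    pos = 0
    while pos < len(data):
        tag = int.from_bytes(data[pos:pos + idtag_length], "big")
        pos += idtag_length
        if tag & flag_bit:
            yield tag & (flag_bit - 1), True, b""
            continue
        length = int.from_bytes(data[pos:pos + length_length], "big")
        pos += length_length
        if pos + length > len(data):
            raise ValueError("payload runs past the end of the data")
        yield tag, False, data[pos:pos + length]
        pos += length

def fields(payload):
    """Map tag -> payload for the non-flag fields of one record."""
    return {tag: value for tag, is_flag, value in iter_records(payload) if not is_flag}

def text(value):
    return value.decode("latin-1")   # encoding is an assumption

def walk_cookies(body):
    """Yield (domain, path, name, value) for every cookie record in body."""
    domains = []   # open domain name parts, outermost first
    paths = []     # one list of open path parts per open domain
    for tag, is_flag, payload in iter_records(body):
        if is_flag:
            if tag == 0x04 and domains:                 # end of domain component
                domains.pop()
                paths.pop()
            elif tag == 0x05 and paths and paths[-1]:   # end of a named path component
                paths[-1].pop()
            # 0x05 with no open path part closes the domain's unnamed first path
        elif tag == 0x01:                               # domain record
            domains.append(text(fields(payload).get(0x1E, b"")))
            paths.append([])
        elif tag == 0x02 and paths:                     # path record
            paths[-1].append(text(fields(payload).get(0x1D, b"")))
        elif tag == 0x03 and domains:                   # cookie record
            f = fields(payload)
            yield (".".join(reversed(domains)), "/" + "/".join(paths[-1]),
                   text(f.get(0x10, b"")), text(f.get(0x11, b"")))

def rec(tag, payload):
    """Build one record with a 1 byte tag and a 2 byte length (test helper)."""
    return bytes([tag]) + len(payload).to_bytes(2, "big") + payload

sample = (rec(0x01, rec(0x1E, b"com")) + b"\x85"
          + rec(0x01, rec(0x1E, b"opera")) + b"\x85"
          + rec(0x01, rec(0x1E, b"www"))
          + rec(0x03, rec(0x10, b"sid") + rec(0x11, b"abc"))
          + rec(0x02, rec(0x1D, b"docs"))
          + rec(0x03, rec(0x10, b"lang") + rec(0x11, b"en"))
          + b"\x85\x85" + b"\x84\x84\x84")
found = list(walk_cookies(sample))
```

Here found holds one cookie for www.opera.com at path "/" and one at path "/docs". The walker only reads names and values; the filtering fields and cookie flags would be handled in the same way through their tags.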