File Format Specification

Version: 1.2.0 · Extension: .zt

Motivation

Most tensor formats equate "tensor" with "flat array of one dtype." This works for dense weights but breaks down for sparse matrices, quantized weight groups, and newer numeric types. Such formats either cannot express these cases or force you to flatten everything into separate arrays held together by naming conventions. See the introduction for a feature comparison across formats.

Design Goals

zTensor starts from a different premise: a tensor is a composite object, not a flat buffer. The format is built around three principles:

  • Simple. The file is a flat sequence of data blobs followed by a single metadata index. No nested containers, no back-patching, no code execution. A minimal reader is a few dozen lines.
  • Performant. Data blobs are 64-byte aligned for direct memory-mapping and SIMD access. Metadata is separated from bulk data, so a reader can enumerate every object's name, shape, and type without touching any data bytes.
  • Extensible. New object layouts, data types, and metadata fields can be added without breaking existing readers. The format uses optional fields and an open type system rather than a fixed enum, so additions never require a change to the container and need at most a minor version increment (see Version Policy).

Key Concepts

A .zt file contains named objects. An object is the format's abstraction for a tensor: instead of treating a tensor as a flat array of one dtype, an object is a composite with a shape, a format that describes its layout, and one or more components that hold the actual bytes.

A component is a single contiguous blob on disk. It has a storage type (dtype) that determines byte width, and an optional logical type (type) that says what those bytes mean. For a standard float32 object, dtype is "f32" and type is absent, so no interpretation is needed. For an FP8 object, dtype is "u8" (because FP8 is stored as raw bytes) while type is "f8_e4m3fn" (telling the reader how to decode them). This separation keeps the storage layer stable (readers always know how many bytes to read) while the logical layer can grow to cover new numeric formats without any changes to the container.

This design makes a dense object simple (one component named "data"), but also supports sparse matrices (separate values, indices, indptr components) and quantized weights (separate packed_weight, scales, zeros components) without special-casing in the container format. The object knows its own structure.

The rest of this document is organized from abstract to concrete: the manifest schema (what you write), the type system (how bytes are interpreted), the object formats (what layouts exist), and finally the binary layout (how it's arranged on disk).

1. Manifest Schema

The manifest is a CBOR-encoded map (RFC 7049) stored near the end of the file. Its location and how to find it are described in Binary Layout.

Root

{
  "version": "1.2.0",
  "attributes": {
    "framework": "PyTorch",
    "license": "Apache-2.0"
  },
  "objects": {
    "layer1.weight": { "..." : "..." },
    "layer1.bias": { "..." : "..." }
  }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| version | string | Yes | Spec version (e.g., "1.2.0"). |
| attributes | map | No | Arbitrary key-value metadata for the whole file. |
| objects | map | Yes | Named object definitions. Keys are object names (e.g., "layer1.weight"). |

Object

Each entry in objects describes one logical object.

| Field | Type | Description |
| --- | --- | --- |
| shape | [uint64] | Dimensions (e.g., [1024, 768]). |
| format | string | Layout: dense, sparse_csr, sparse_coo, quantized_group, etc. See Object Formats. |
| attributes | map | Optional per-object metadata. |
| components | map | One entry per data blob. Keys are role names (e.g., "data", "scales"). |

Component

A component points to a single contiguous blob on disk and describes how to interpret its bytes.

{
  "dtype": "u8",
  "type": "f8_e4m3fn",
  "offset": 1024,
  "length": 4096,
  "encoding": "raw",
  "digest": "sha256:8f4a..."
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| dtype | string | required | Storage type, one of the 13 fixed primitives. Determines byte width. |
| type | string | null | Logical type. When absent, the logical type equals dtype. |
| offset | uint64 | required | Absolute byte offset in the file. Must be a multiple of 64. |
| length | uint64 | required | Bytes on disk. For "zstd" encoding, this is the compressed size. |
| uncompressed_length | uint64 | null | Original size before compression. Required when encoding is "zstd". |
| encoding | string | "raw" | "raw" or "zstd". |
| digest | string | null | Checksum of the stored (possibly compressed) bytes. Format: "algorithm:hex" (e.g., "sha256:8f4a..."). |
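
Because the digest covers the stored bytes, a reader can check it before decompressing anything. The sketch below is one possible way to do that in Python. The helper name verify_digest is hypothetical, and it assumes the algorithm label (such as "sha256") is a name that Python's hashlib recognizes; the specification itself does not list the permitted algorithms.

```python
import hashlib


def verify_digest(stored: bytes, digest: str) -> bool:
    """Compare stored (possibly compressed) bytes against an "algorithm:hex" digest.

    Assumption: the algorithm label is a hashlib name such as "sha256".
    """
    algorithm, sep, expected_hex = digest.partition(":")
    if not sep or not expected_hex:
        raise ValueError(f"malformed digest: {digest!r}")
    try:
        hasher = hashlib.new(algorithm)
    except ValueError:
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}") from None
    hasher.update(stored)
    return hasher.hexdigest() == expected_hex.lower()
```

A caller would read component.length bytes at component.offset and pass them in unchanged, before any zstd decompression.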

2. Type System

Storage types (dtype)

A closed set of 13 hardware-native primitives. Every component must use one of these as its dtype. The dtype alone determines the byte width of each element.

| Category | Types | Encoding |
| --- | --- | --- |
| Float | f64, f32, f16, bf16 | IEEE 754 / BFloat16 |
| Signed integer | i64, i32, i16, i8 | Two's complement |
| Unsigned integer | u64, u32, u16, u8 | Unsigned |
| Boolean | bool | 1 byte. 0x00 = false, 0x01 = true. |

Logical types (type)

An open, extensible set that gives meaning to raw dtype bytes. When type is absent, the logical type is the same as dtype, and no extra interpretation is needed.

Simple types: one storage element per logical element (1:1):

| type | dtype | Notes |
| --- | --- | --- |
| f8_e4m3fn | u8 | NVIDIA / OCP FP8 |
| f8_e5m2 | u8 | OCP FP8 |
| f8_e4m3fnuz | u8 | AMD FP8 |
| f8_e5m2fnuz | u8 | AMD FP8 |
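
To illustrate what "giving meaning to raw dtype bytes" involves, here is a minimal sketch that decodes a single u8 storage element as f8_e4m3fn. It assumes the usual OCP E4M3 bit layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, NaN when exponent and mantissa bits are all ones); this document does not restate that layout, and the function name is hypothetical.

```python
import math


def decode_f8_e4m3fn(byte: int) -> float:
    """Decode one u8 storage element as an f8_e4m3fn value (assumed OCP layout)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN pattern; the format has no infinities
    if exponent == 0:
        # Subnormal: no implicit leading one, fixed exponent of 2**-6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)
```

The container never needs this logic: a reader that does not know f8_e4m3fn can still load the component as plain u8 elements.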

Compound types: multiple storage elements per logical element:

| type | dtype | Ratio | Notes |
| --- | --- | --- | --- |
| complex64 | f32 | 2:1 | Interleaved [real, imag] pairs |
| complex128 | f64 | 2:1 | Interleaved [real, imag] pairs |
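
As a small illustration of a compound type, the following sketch turns the raw bytes of a complex64 component into Python complex numbers. It relies on the interleaved [real, imag] layout from the table and on the little-endian rule from the Byte order section; the function name is hypothetical.

```python
import struct


def decode_complex64(raw: bytes) -> list:
    """Decode little-endian f32 storage holding interleaved [real, imag] pairs."""
    if len(raw) % 8 != 0:
        raise ValueError("complex64 data must be a whole number of 8-byte pairs")
    floats = struct.unpack(f"<{len(raw) // 4}f", raw)
    # Even positions are real parts, odd positions are imaginary parts.
    return [complex(re, im) for re, im in zip(floats[0::2], floats[1::2])]
```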

Computing data size:

  • Simple: product(shape) * byte_size(dtype)
  • Compound: product(shape) * ratio * byte_size(dtype)
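
The two formulas can be folded into one helper, sketched below. The byte widths follow from the storage type names, and the ratio table mirrors the compound types listed above. Treating any other logical type as ratio 1 is an assumption of this sketch, and the function and table names are hypothetical.

```python
from math import prod

# Byte width of each of the 13 storage types.
DTYPE_BYTES = {
    "f64": 8, "f32": 4, "f16": 2, "bf16": 2,
    "i64": 8, "i32": 4, "i16": 2, "i8": 1,
    "u64": 8, "u32": 4, "u16": 2, "u8": 1,
    "bool": 1,
}

# Storage elements per logical element for the compound types listed above.
COMPOUND_RATIO = {"complex64": 2, "complex128": 2}


def expected_data_size(shape, dtype, logical_type=None):
    """Uncompressed size in bytes: product(shape) * ratio * byte_size(dtype)."""
    # Assumption: logical types missing from COMPOUND_RATIO are 1:1 with storage.
    ratio = COMPOUND_RATIO.get(logical_type, 1)
    return prod(shape) * ratio * DTYPE_BYTES[dtype]
```

For example, expected_data_size([1024, 768], "f32") gives 3145728 bytes, which a reader could compare against a raw component's length (or against uncompressed_length for a compressed one).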

Readers that encounter an unrecognized type MAY fall back to loading raw dtype elements.

3. Object Formats

Each object's format field selects how its components are interpreted. All index components (indices, indptr, coords) across sparse formats MUST use dtype: "u64".

dense

Standard contiguous array in row-major (C-contiguous) order.

| Component | Description |
| --- | --- |
| data | The data elements. |

Readers SHOULD memory-map data when encoding is "raw".

sparse_csr

Compressed Sparse Row.

| Component | Description |
| --- | --- |
| values | Non-zero elements. |
| indices | Column index for each value. |
| indptr | Row pointers (length = rows + 1). |
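
To show how the three components fit together, here is a minimal sketch that expands an already-decoded CSR object into a dense list of rows. It assumes a 2-D shape and the conventional CSR meaning of indptr (row r owns entries indptr[r] up to, but not including, indptr[r + 1]); the function name is hypothetical.

```python
def csr_to_dense(shape, values, indices, indptr):
    """Expand decoded CSR components into a dense row-major list of rows."""
    rows, cols = shape
    if len(indptr) != rows + 1:
        raise ValueError("indptr must have rows + 1 entries")
    if len(values) != len(indices) or indptr[-1] != len(values):
        raise ValueError("values, indices, and indptr disagree on nnz")
    dense = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        # Entries indptr[r] .. indptr[r + 1] - 1 belong to row r.
        for k in range(indptr[r], indptr[r + 1]):
            dense[r][indices[k]] = values[k]
    return dense
```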

sparse_coo

Coordinate list.

| Component | Description |
| --- | --- |
| values | Non-zero elements. |
| coords | Coordinate indices, stored as a flat array of length ndim * nnz in Structure-of-Arrays order: all row indices first, then all column indices, etc. |
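
The Structure-of-Arrays order means that dimension d occupies the slice [d * nnz, (d + 1) * nnz) of coords. The sketch below, with hypothetical function names, splits the flat array per dimension and pairs each value with its coordinate tuple.

```python
def split_coo_coords(coords, ndim):
    """Split a flat SoA coords array into one index list per dimension."""
    if ndim <= 0 or len(coords) % ndim != 0:
        raise ValueError("coords length must be a multiple of ndim")
    nnz = len(coords) // ndim
    return [coords[d * nnz:(d + 1) * nnz] for d in range(ndim)]


def coo_entries(values, coords, ndim):
    """Return (coordinate tuple, value) pairs for every stored element."""
    axes = split_coo_coords(coords, ndim)
    if axes and len(axes[0]) != len(values):
        raise ValueError("coords and values disagree on nnz")
    return [(tuple(axis[k] for axis in axes), values[k]) for k in range(len(values))]
```

With coords = [0, 1, 2, 2, 0, 1] and ndim = 2, the row indices are [0, 1, 2] and the column indices are [2, 0, 1].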

quantized_group

Block-wise quantization (e.g., GPTQ). Packed weights with separate scale and zero-point arrays.

| Component | Description |
| --- | --- |
| packed_weight | Quantized data (e.g., i32 packing 8 x 4-bit values). |
| scales | Per-group scaling factors. |
| zeros | Per-group zero-points. |

Quantization parameters (bits, group_size, packing) are stored in the object's attributes.

Example: 4-bit GPTQ, shape [4096, 4096]:

{
  "shape": [4096, 4096],
  "format": "quantized_group",
  "attributes": {
    "bits": 4,
    "group_size": 128,
    "packing": "8_per_i32"
  },
  "components": {
    "packed_weight": { "dtype": "i32", "offset": 1024, "length": 8388608 },
    "scales": { "dtype": "f16", "offset": 8389632, "length": 262144 },
    "zeros": { "dtype": "f16", "offset": 8651776, "length": 262144 }
  }
}
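
The lengths in this example can be reproduced with a little arithmetic. The sketch below assumes one scale and one zero-point per group of group_size elements, zero-points stored unpacked as f16, and total element count divisible by both the packing factor and the group size. Those assumptions match the numbers in the example but are not stated as rules by the format; the function name is hypothetical.

```python
from math import prod


def quantized_group_lengths(shape, bits, group_size,
                            word_bytes=4, scale_bytes=2, zero_bytes=2):
    """Byte lengths of packed_weight, scales, and zeros under the assumptions above."""
    elements = prod(shape)
    values_per_word = (word_bytes * 8) // bits   # 8 values per i32 at 4 bits
    groups = elements // group_size              # one scale and zero per group
    return {
        "packed_weight": elements // values_per_word * word_bytes,
        "scales": groups * scale_bytes,
        "zeros": groups * zero_bytes,
    }
```

For shape [4096, 4096], bits 4, and group_size 128 this yields 8388608, 262144, and 262144 bytes. Each offset in the example is also the previous offset plus the previous length (1024 + 8388608 = 8389632, and 8389632 + 262144 = 8651776), and all three are multiples of 64, so no padding is needed between these blobs.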

4. Binary Layout

The on-disk format is append-only: data blobs are written sequentially, then the manifest is appended at the end. This makes writing simple (no seeking back to patch headers) and keeps all metadata in one place.

File structure

+---------------------------------------+ <-- Offset 0
| Magic Header (8 bytes) | "ZTEN1000"
+---------------------------------------+
| |
| Component Blob A | <-- offset % 64 == 0
| |
+---------------------------------------+
| Zero Padding (0-63 bytes) |
+---------------------------------------+
| Component Blob B | <-- offset % 64 == 0
+---------------------------------------+
| ... |
+---------------------------------------+
| CBOR Manifest (variable length) |
+---------------------------------------+
| Manifest Size (8 bytes, uint64 LE) |
+---------------------------------------+
| Magic Footer (8 bytes) | "ZTEN1000"
+---------------------------------------+ <-- EOF

Byte order

All multi-byte values in a .zt file are Little-Endian: structural integers (manifest size), component data, and all dtype elements. Writers on big-endian hosts MUST byte-swap before writing.

The only exception is CBOR's own internal length prefixes, which are Big-Endian per RFC 7049. This is handled transparently by any CBOR library.

Alignment and padding

  • Every component blob starts at an offset divisible by 64. This enables direct memory-mapping and SIMD access without copying.
  • Gaps between blobs are filled with 0x00 bytes.
  • The magic footer repeats the header (ZTEN1000) so readers can detect truncated files.
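
A writer following this layout could look roughly like the sketch below. It is not a reference implementation: encode_manifest is a hypothetical callback that receives the final blob placements and returns the CBOR-encoded manifest bytes (a real writer would build the full manifest and encode it with a CBOR library), and the blobs are assumed to be already little-endian and already compressed if needed.

```python
import io
import struct

MAGIC = b"ZTEN1000"
ALIGN = 64


def pad_to_alignment(f) -> int:
    """Write 0x00 bytes until the position is a multiple of 64; return that position."""
    pos = f.tell()
    padding = -pos % ALIGN
    f.write(b"\x00" * padding)
    return pos + padding


def write_container(f, blobs, encode_manifest):
    """Append-only write: header, aligned blobs, manifest, manifest size, footer."""
    f.write(MAGIC)
    placements = {}
    for name, blob in blobs.items():
        offset = pad_to_alignment(f)
        f.write(blob)
        placements[name] = (offset, len(blob))
    manifest = encode_manifest(placements)        # hypothetical CBOR-encoding callback
    f.write(manifest)
    f.write(struct.pack("<Q", len(manifest)))     # manifest size, uint64 LE
    f.write(MAGIC)
    return placements
```

Since the header is only 8 bytes long, the first blob is preceded by 56 padding bytes and lands at offset 64. Nothing is ever patched after it is written, which is what lets the writer work on a non-seekable stream that tracks its own position.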

5. Reading a File

Algorithm

  1. Seek to EOF - 16. Read 16 bytes.
  2. Verify the last 8 bytes are ZTEN1000. If not, the file is corrupt or not a .zt file.
  3. Decode the first 8 bytes as uint64 LE to get manifest_size.
  4. If manifest_size > 1 GB, abort (prevents denial-of-service via oversized manifests).
  5. Seek to EOF - 16 - manifest_size. Read manifest_size bytes.
  6. Decode the buffer as CBOR to get the manifest.
  7. To load a component:
    • Seek to component.offset.
    • Read component.length bytes.
    • If encoding is "zstd", decompress (using uncompressed_length to pre-allocate).
    • Interpret the resulting bytes as component.dtype elements.
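
Steps 1 to 5 and the raw-encoding part of step 7 can be sketched with the Python standard library alone. The sketch below stops short of step 6: it returns the manifest as undecoded bytes, which a caller would hand to a CBOR library of their choice. The function names are hypothetical, "1 GB" is interpreted here as 2**30 bytes, and the extra check that the manifest does not overlap the header is an addition of this sketch.

```python
import io
import struct

MAGIC = b"ZTEN1000"
MAX_MANIFEST_SIZE = 1 << 30  # assumed reading of the "1 GB" limit


def read_manifest_bytes(f) -> bytes:
    """Locate the manifest via the 16-byte trailer and return its raw CBOR bytes."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < len(MAGIC) + 16:
        raise ValueError("file too small to be a .zt file")
    f.seek(file_size - 16)
    tail = f.read(16)
    if tail[8:] != MAGIC:
        raise ValueError("bad magic footer: corrupt, truncated, or not a .zt file")
    (manifest_size,) = struct.unpack("<Q", tail[:8])
    if manifest_size > MAX_MANIFEST_SIZE:
        raise ValueError("manifest size exceeds limit")
    start = file_size - 16 - manifest_size
    if start < len(MAGIC):
        raise ValueError("manifest size is larger than the file allows")
    f.seek(start)
    return f.read(manifest_size)


def read_raw_component(f, component) -> bytes:
    """Read the stored bytes of a component whose encoding is "raw"."""
    if component.get("encoding", "raw") != "raw":
        raise NotImplementedError("this sketch handles raw components only")
    f.seek(component["offset"])
    data = f.read(component["length"])
    if len(data) != component["length"]:
        raise ValueError("component extends past end of file")
    return data
```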

Security rules

  • No code execution. Parsers MUST NOT evaluate or execute any data. No pickle, no eval.
  • Bounds checking. offset + length MUST NOT exceed the file size.
  • Decompression limits. When uncompressed_length is present, reject values exceeding a reasonable maximum before decompressing.
  • Padding bytes. Writers MUST set all padding to 0x00. Readers MAY ignore padding content.
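
The bounds and decompression rules lend themselves to a check that runs on every component before any data is read. The sketch below is one way to express them; the 4 GiB cap stands in for the "reasonable maximum", which the specification leaves to the implementation, and the function name is hypothetical.

```python
ALIGN = 64
MAX_UNCOMPRESSED = 1 << 32  # hypothetical cap; the spec leaves the value to the reader


def validate_component(component, file_size, max_uncompressed=MAX_UNCOMPRESSED):
    """Raise ValueError if a component entry violates the rules above."""
    offset = component["offset"]
    length = component["length"]
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if offset % ALIGN != 0:
        raise ValueError("offset must be a multiple of 64")
    if offset + length > file_size:
        raise ValueError("component extends past end of file")
    encoding = component.get("encoding", "raw")
    if encoding not in ("raw", "zstd"):
        raise ValueError(f"unknown encoding: {encoding!r}")
    if encoding == "zstd":
        uncompressed = component.get("uncompressed_length")
        if uncompressed is None:
            raise ValueError("uncompressed_length is required for zstd")
        if uncompressed > max_uncompressed:
            raise ValueError("uncompressed_length exceeds limit")
```

Python integers do not overflow, so offset + length is safe as written; an implementation using fixed-width integers would need a checked addition here.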

Appendix: Version Policy

Minor version increments (e.g., 1.1 to 1.2) only add optional fields or new logical types. Readers MUST ignore unknown fields. Major version increments (e.g., 1.x to 2.x) may change the container structure or manifest schema.