Project Graph

PRG File Format Specification

  • MIME Type: application/vnd.project-graph
  • File Extension: .prg
  • Author: zty012 <z@2y.nz>
  • Version: 0.1
  • Status: Draft

1. Introduction

The PRG file format is a container-based format for storing diagrams created in the Project Graph application. It leverages the ubiquitous ZIP archive format to bundle a primary serialized graph description (stage.msgpack) with its associated binary attachments (images, documents, etc.).

The design goals of this format are:

  • Portability: To serve as a single, shareable file containing all project assets.
  • Interoperability: To be based on well-established standards (ZIP, MessagePack) for ease of implementation.
  • Extensibility: To allow for future evolution of the format while maintaining backward compatibility.

2. Overall Container Structure

A PRG file MUST be a valid ZIP archive. The structure within the ZIP filesystem is as follows:

stage.msgpack
attachments/
  <uuid>.<ext>

  • All paths within the ZIP archive MUST use the forward slash (/) as the directory separator.
  • All filenames MUST be encoded using UTF-8.
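
The container rules above can be checked mechanically. The following is a minimal sketch, not a reference implementation; the function name check_prg_container and the wording of the messages are illustrative assumptions. It uses only Python's standard zipfile module and inspects entry names without extracting anything.

```python
import io
import zipfile

STAGE_ENTRY = "stage.msgpack"  # root-level entry required by this specification
UTF8_FLAG = 0x800              # ZIP general-purpose bit 11: name is UTF-8


def check_prg_container(data: bytes) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return ["not a valid ZIP archive"]
    with zipfile.ZipFile(stream) as archive:
        infos = archive.infolist()

    problems = []
    # orig_filename is the name as stored, before any platform normalization.
    names = [info.orig_filename for info in infos]
    if STAGE_ENTRY not in names:
        problems.append("missing stage.msgpack at the archive root")
    for info in infos:
        name = info.orig_filename
        if "\\" in name:
            problems.append(f"backslash used as separator: {name!r}")
        # Pure-ASCII names are valid UTF-8 even without the flag.
        if not name.isascii() and not info.flag_bits & UTF8_FLAG:
            problems.append(f"non-ASCII name not flagged as UTF-8: {name!r}")
    return problems
```

A caller would typically reject the file if the returned list is non-empty.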

3. The Stage File (stage.msgpack)

3.1. Format & Encoding

The stage.msgpack file MUST exist at the root of the ZIP container. Its content MUST be a valid MessagePack byte stream that encodes an array produced by the Graphif Serializer.

3.2. Content Schema

The deserialized content of stage.msgpack MUST be an array. Each element in this array MUST be an object representing a Stage Object.

3.2.1. Stage Object

A Stage Object (an object) MUST contain the following entry:

  • uuid (Key): A string value representing the unique identifier for this stage.

The Stage Object MAY contain any number of other entries to define the graph's nodes, edges, properties, and references to attachments. The specific schema for these entries is defined by the Graphif Serializer specification.

Any implementation that encounters a Stage Object with an unknown structure SHOULD ignore the unrecognized entries and continue processing.
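
As an illustration of the schema rules in this section, the sketch below validates content that has already been deserialized into plain Python values (the MessagePack and Graphif Serializer decoding step is assumed to have happened elsewhere and is not shown). The names validate_stage_content, recognized_entries, and KNOWN_KEYS are hypothetical.

```python
from typing import Any

# The only key this specification itself defines. Everything else belongs
# to the Graphif Serializer schema and is treated as opaque here.
KNOWN_KEYS = frozenset({"uuid"})


def validate_stage_content(content: Any) -> list:
    """Check the array-of-objects shape and the mandatory uuid entry."""
    if not isinstance(content, list):
        raise ValueError("stage content MUST be an array")
    for index, item in enumerate(content):
        if not isinstance(item, dict):
            raise ValueError(f"element {index} is not an object")
        if not isinstance(item.get("uuid"), str):
            raise ValueError(f"element {index} has no string 'uuid' entry")
    return content


def recognized_entries(stage_object: dict, known_keys=KNOWN_KEYS) -> dict:
    """Keep only the entries this reader understands; ignore the rest."""
    return {key: value for key, value in stage_object.items() if key in known_keys}
```

Unknown entries are skipped rather than rejected, which matches the SHOULD-level guidance above; a real reader would extend KNOWN_KEYS with the entries it supports.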

4. The Attachments Directory (attachments/)

4.1. Purpose

The attachments/ directory is an optional container for any binary or text files referenced by the stage(s), such as images, documents, or other media.

4.2. Naming Convention

Files within the attachments/ directory MUST be named using a Universally Unique Identifier (UUID) followed by a file extension that implies the file's MIME type.

Examples:

  • attachments/b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png
  • attachments/f8a3d2b1-4e5c-6789-0123-456789abcdef.pdf
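
A reader might check this naming convention as sketched below. The helper parse_attachment_name is hypothetical, and two details are assumptions rather than requirements of this specification: the UUID is expected in its canonical hyphenated form, and everything after the first dot is treated as the extension.

```python
import re
import uuid

# attachments/<stem>.<ext>, with no further directory levels
ATTACHMENT_RE = re.compile(r"^attachments/([^/.]+)\.([^/]+)$")


def parse_attachment_name(entry_name: str):
    """Return (uuid, extension) for a well-formed attachment name, else None."""
    match = ATTACHMENT_RE.match(entry_name)
    if match is None:
        return None
    stem, extension = match.groups()
    try:
        parsed = uuid.UUID(stem)
    except ValueError:
        return None
    # uuid.UUID also accepts forms without hyphens; insist on the canonical one.
    if str(parsed) != stem.lower():
        return None
    return str(parsed), extension
```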

4.3. Referencing Attachments

Stage Objects reference these files by their UUID filename (without the extension). For example, an ImageNode object would reference the attachment b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png using the string "b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77".
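
Resolving such a reference amounts to a lookup over the archive's entry names. The following sketch assumes the names have already been listed (for example via the ZIP central directory); find_attachment is a hypothetical helper, and returning the first match is an assumption, since this specification does not say what happens if two attachments share a UUID.

```python
from typing import Iterable, Optional


def find_attachment(reference: str, entry_names: Iterable[str]) -> Optional[str]:
    """Return the archive entry whose UUID stem equals the reference, if any."""
    prefix = f"attachments/{reference}."
    for name in entry_names:
        # The remainder after the prefix is the extension; it must not
        # contain another directory level.
        if name.startswith(prefix) and "/" not in name[len(prefix):]:
            return name
    return None
```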

5. Security Considerations

Implementors and users of this format should be aware of several security-related aspects:

  1. ZIP Container Risks:

    • Compression Bombs: A small PRG file may decompress to an extremely large amount of data, causing denial of service. Implementations MUST impose reasonable limits on the number of extracted files and the total uncompressed size.
    • Path Traversal: Maliciously crafted ZIP entries could have names like ../../../some_important_file. Implementations MUST NOT extract files to the filesystem and MUST instead read them directly from the ZIP stream into memory or a controlled environment.
  2. Attachment Risks: The attachments directory can contain any file type. The application processing the PRG file is responsible for handling each attachment in a secure manner (e.g., running script files in a sandbox, scanning attachments for malware).
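
The ZIP container mitigations can be sketched as follows. This is one possible approach under stated assumptions, not a mandated algorithm: the function name read_prg_entries and the default limits are placeholders, and each implementation should choose limits that suit its environment. Entries are read into memory only, so a hostile name such as ../../x is never used as a filesystem path, and the size limit is enforced on the bytes actually decompressed rather than on the size declared in the archive.

```python
import io
import zipfile

MAX_FILES = 10_000                    # placeholder limit, not from the spec
MAX_TOTAL_BYTES = 512 * 1024 * 1024   # placeholder limit, not from the spec


def read_prg_entries(data: bytes,
                     max_files: int = MAX_FILES,
                     max_total_bytes: int = MAX_TOTAL_BYTES) -> dict:
    """Read all file entries into memory, enforcing count and size limits."""
    entries = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = [info for info in archive.infolist() if not info.is_dir()]
        if len(infos) > max_files:
            raise ValueError("too many entries in archive")
        for info in infos:
            remaining = max_total_bytes - total
            # Read at most one byte more than the budget allows, so an
            # oversized entry is detected without decompressing all of it.
            with archive.open(info) as stream:
                payload = stream.read(remaining + 1)
            if len(payload) > remaining:
                raise ValueError("total uncompressed size limit exceeded")
            total += len(payload)
            entries[info.filename] = payload  # name is a dict key, never a path
    return entries
```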

6. Future Considerations

This section outlines potential extensions to the format for future discussion and development.

6.1. Metadata (metadata.msgpack)

A future version may introduce a dedicated file at the root of the container (e.g., metadata.msgpack) to store information about the project itself, decoupling it from the stage data.

6.2. Versioning (versions/ directory)

A versions/ directory could be introduced to store historical snapshots of the stage.msgpack file, enabling built-in version control and audit trails. Each snapshot could be a copy of stage.msgpack named by a timestamp or commit hash.

6.3. Sub-Stages (sub/ directory)

To avoid the inefficiency of nested ZIP files (storing a .prg inside another .prg), a sub/ directory could store additional Stage files. This would allow for complex, multi-stage projects within a single container.

[Directory listing: the root stage.msgpack and a <uuid>.<ext> attachment, followed by six sub-stage stage.msgpack files nested under sub/. The intermediate directory names are not preserved.]

Sub-stages can be nested to arbitrary depth, with each sub-stage having its own stage.msgpack file. References between stages MUST use the UUID of the sub-stage.

6.3.1. Representing Sub-Stages

Because a UUID is unique, a single UUID is sufficient to represent a sub-stage.

6.4. Workspace Settings (settings.msgpack)

Application-specific settings (e.g., default node styles, view preferences) could be stored in a root-level file like settings.msgpack, making the PRG file a self-contained workspace.

6.5. Anchors

To represent a node or a region, we need to design an anchor syntax that can uniquely identify elements within the PRG file. The proposed syntax uses UUIDs to reference specific nodes or regions.

Node UUIDs MUST be separated by ;.

# wrapped for readability
file:///home/user/project.prg#
ec32d43d-7890-4e45-a28f-b31bca4dafea;
6c95db6b-b64e-49e8-a3b0-3a39ee2588c9;
49828d7f-7d05-48d9-bcb5-a699782f9880

This example references three nodes:

  • ec32d43d-7890-4e45-a28f-b31bca4dafea
  • 6c95db6b-b64e-49e8-a3b0-3a39ee2588c9
  • 49828d7f-7d05-48d9-bcb5-a699782f9880
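
Parsing the proposed anchor syntax could look like the sketch below. Since the syntax is only a proposal, the details are assumptions: parse_anchor is a hypothetical name, empty segments (such as a trailing ;) are skipped, and each segment must parse as a UUID.

```python
import uuid


def parse_anchor(url: str):
    """Split a PRG anchor URL into (file URL, list of node UUID strings)."""
    base, _, fragment = url.partition("#")
    node_ids = []
    for part in fragment.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing or doubled separator
        node_ids.append(str(uuid.UUID(part)))  # raises ValueError if malformed
    return base, node_ids
```

Applied to the example above (with the line wrapping removed), this yields file:///home/user/project.prg and the three node UUIDs in order.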

6.6. Fractional indexing

Source: Realtime editing of ordered sequences | Figma Blog

Instead of OT, Figma uses a trick that’s often used to implement reordering on top of a database. Every object has a real number as an index and the order of the children for an element of the tree is determined by sorting all children by their index. To insert between two objects, just set the index for the new object to the average index of the two objects on either side. We use arbitrary-precision fractions instead of 64-bit doubles so that we can’t run out of precision after lots of edits.

We can also use such numeric indices to identify nodes in a stage, which would make it easier to represent the order of nodes (z-index).
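
The midpoint rule from the quoted passage can be sketched with Python's arbitrary-precision fractions.Fraction. This is an illustration only; index_between is a hypothetical helper, and the handling of a missing neighbour (stepping by 1 past the existing end) is an assumption that the quoted source does not specify.

```python
from fractions import Fraction
from typing import Optional


def index_between(before: Optional[Fraction], after: Optional[Fraction]) -> Fraction:
    """Return an index that sorts strictly between the two neighbours.

    None means "no neighbour on that side" (insert at the start or end).
    """
    if before is None and after is None:
        return Fraction(0)      # first element in an empty sequence
    if before is None:
        return after - 1        # insert before the first element
    if after is None:
        return before + 1       # insert after the last element
    if not before < after:
        raise ValueError("neighbours must be in strictly increasing order")
    return (before + after) / 2  # exact average, no loss of precision


# Usage: order nodes by z-index by sorting on their fractional index.
bottom = index_between(None, None)
top = index_between(bottom, None)
middle = index_between(bottom, top)
z_order = sorted([("top", top), ("bottom", bottom), ("middle", middle)],
                 key=lambda item: item[1])
```

Because Fraction is exact, repeated insertion between the same two neighbours never collapses two indices into one, although the numerators and denominators grow with each insertion.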