PRG File Format Specification
- MIME Type: application/vnd.project-graph
- File Extension: .prg
- Author: zty012 <z@2y.nz>
- Version: 0.1
- Status: Draft
1. Introduction
The PRG file format is a container-based format for storing diagrams created in the Project Graph application. It leverages the ubiquitous ZIP archive format to bundle a primary serialized graph description (stage.msgpack) with its associated binary attachments (images, documents, etc.).
The design goals of this format are:
- Portability: To serve as a single, shareable file containing all project assets.
- Interoperability: To be based on well-established standards (ZIP, MessagePack) for ease of implementation.
- Extensibility: To allow for future evolution of the format while maintaining backward compatibility.
2. Overall Container Structure
A PRG file MUST be a valid ZIP archive. The structure within the ZIP filesystem is as follows:
- All paths within the ZIP archive MUST use the forward slash (/) as the directory separator.
- All filenames MUST be encoded using UTF-8.
3. The Stage File (stage.msgpack)
3.1. Format & Encoding
The stage.msgpack file MUST exist at the root of the ZIP container. Its content MUST be a binary stream that is a valid MessagePack serialization of an array produced by the Graphif Serializer.
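The container rules stated so far can be checked with the Python standard library alone. The following is a minimal sketch, not a reference implementation: the function name `check_prg_container` and the wording of the problem messages are hypothetical, and the check covers only the ZIP validity, root-level stage.msgpack, and forward-slash requirements.

```python
import io
import zipfile

STAGE_FILE = "stage.msgpack"  # required root-level entry (section 3.1)


def check_prg_container(data: bytes) -> list[str]:
    """Return a list of structural problems found in a PRG container (empty if none)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return ["not a valid ZIP archive"]
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if STAGE_FILE not in zf.namelist():
            problems.append(f"missing {STAGE_FILE} at the container root")
        for info in zf.infolist():
            # orig_filename keeps the name as stored, before any platform normalization.
            if "\\" in info.orig_filename:
                problems.append(f"backslash in path: {info.orig_filename!r}")
    return problems
```

An empty result means only that these structural checks passed; the content of stage.msgpack still has to be decoded and validated separately.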
3.2. Content Schema
The deserialized content of stage.msgpack MUST be an array. Each element in this array MUST be an object representing a Stage Object.
3.2.1. Stage Object
A Stage Object (an object) MUST contain the following entry:
- uuid (Key): A string value representing the unique identifier for this stage.
The Stage Object MAY contain any number of other entries to define the graph's nodes, edges, properties, and references to attachments. The specific schema for these entries is defined by the Graphif Serializer specification.
Any implementation that encounters a Stage Object with an unknown structure SHOULD ignore the unrecognized entries and continue processing.
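As an illustration of the schema rules in this section, the sketch below validates content that has already been decoded from MessagePack into plain Python values (a list of dicts). The decoding step itself is left to whichever MessagePack library the implementation uses. The function name `validate_stage_content` is hypothetical, and raising ValueError on violations is an assumed error-handling choice.

```python
from typing import Any


def validate_stage_content(content: Any) -> list[str]:
    """Return the uuid of every Stage Object, raising ValueError on schema violations."""
    if not isinstance(content, list):
        raise ValueError("stage content must be an array")
    uuids = []
    for index, stage in enumerate(content):
        if not isinstance(stage, dict):
            raise ValueError(f"element {index} is not an object")
        stage_uuid = stage.get("uuid")
        if not isinstance(stage_uuid, str):
            raise ValueError(f"element {index} has no string 'uuid' entry")
        # Entries other than 'uuid' are not inspected, so unrecognized ones are ignored.
        uuids.append(stage_uuid)
    return uuids
```

Because only the uuid entry is inspected, a Stage Object carrying entries this code does not know about still passes, which matches the SHOULD-ignore rule above.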
4. The Attachments Directory (attachments/)
4.1. Purpose
The attachments/ directory is an optional container for any binary or text files referenced by the stage(s), such as images, documents, or other media.
4.2. Naming Convention
Files within the attachments/ directory MUST be named using a Universally Unique Identifier (UUID) followed by a file extension that implies the file's MIME type.
Example:
attachments/b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png
attachments/f8a3d2b1-4e5c-6789-0123-456789abcdef.pdf
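One way to check this naming convention is sketched below. It assumes the canonical hyphenated UUID spelling is required and uses Python's `mimetypes` table as a stand-in for "an extension that implies the MIME type". Both are assumptions of this sketch rather than requirements of the format, and `parse_attachment_name` is a hypothetical name.

```python
import mimetypes
import uuid
from pathlib import PurePosixPath
from typing import Optional

ATTACHMENTS_DIR = "attachments"


def parse_attachment_name(path: str) -> tuple[str, Optional[str]]:
    """Split 'attachments/<uuid>.<ext>' into (uuid, guessed MIME type)."""
    p = PurePosixPath(path)
    if p.parent.as_posix() != ATTACHMENTS_DIR or not p.suffix:
        raise ValueError(f"not an attachment path: {path!r}")
    try:
        canonical = str(uuid.UUID(p.stem))
    except ValueError:
        raise ValueError(f"file name is not a UUID: {path!r}") from None
    # uuid.UUID accepts several spellings, so compare against the canonical form.
    if canonical != p.stem.lower():
        raise ValueError(f"UUID is not in canonical form: {path!r}")
    mime_type, _ = mimetypes.guess_type(p.name)
    return canonical, mime_type
```

The guessed MIME type may be None for extensions the local table does not know; how to treat such attachments is left to the application.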
4.3. Referencing Attachments
Stage Objects reference these files by their UUID filename (without the extension). For example, an ImageNode object would reference the attachment b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png using the string "b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77".
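Resolving such a reference means finding the entry under attachments/ whose name, minus the extension, equals the UUID string. A minimal sketch follows. The helper name `find_attachment` is hypothetical, and returning the first match is an assumption, since the format does not say what happens if two extensions share one UUID.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Optional


def find_attachment(zf: zipfile.ZipFile, attachment_uuid: str) -> Optional[str]:
    """Return the ZIP entry name of the attachment with the given UUID, or None."""
    for name in zf.namelist():
        p = PurePosixPath(name)
        if p.parent.as_posix() == "attachments" and p.stem == attachment_uuid:
            return name
    return None


# Usage with an in-memory archive:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("attachments/b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png", b"")
with zipfile.ZipFile(buf) as zf:
    found = find_attachment(zf, "b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77")
    missing = find_attachment(zf, "00000000-0000-0000-0000-000000000000")
```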
5. Security Considerations
Implementors and users of this format should be aware of several security-related aspects:
- ZIP Container Risks:
  - Compression Bombs: A small PRG file may decompress to an extremely large amount of data, causing denial of service. Implementations MUST impose reasonable limits on the number of extracted files and the total uncompressed size.
  - Path Traversal: Maliciously crafted ZIP entries could have names like ../../../some_important_file. Implementations MUST NOT extract files to the filesystem; instead, read them directly from the ZIP stream into memory or a controlled environment.
- Attachment Risks: The attachments directory can contain any file type. The application processing the PRG file is responsible for handling each attachment securely (e.g., running script files in a sandbox, scanning attachments for malware).
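The two ZIP container mitigations can be combined in a single reader that never touches the filesystem. The sketch below is one possible approach. The limit values and the name `read_prg_entries` are assumptions chosen for illustration, not values mandated by this specification.

```python
import io
import zipfile

MAX_ENTRIES = 10_000                 # assumed limit; tune per application
MAX_TOTAL_BYTES = 512 * 1024 * 1024  # assumed limit; tune per application
CHUNK_SIZE = 64 * 1024


def read_prg_entries(data: bytes, max_entries: int = MAX_ENTRIES,
                     max_total_bytes: int = MAX_TOTAL_BYTES) -> dict[str, bytes]:
    """Read every ZIP entry into memory, enforcing entry-count and size limits."""
    entries = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = [info for info in zf.infolist() if not info.is_dir()]
        if len(infos) > max_entries:
            raise ValueError("too many entries in container")
        for info in infos:
            chunks = []
            with zf.open(info) as stream:
                # Count bytes as they are decompressed instead of trusting declared sizes.
                while chunk := stream.read(CHUNK_SIZE):
                    total += len(chunk)
                    if total > max_total_bytes:
                        raise ValueError("uncompressed size limit exceeded")
                    chunks.append(chunk)
            # Entry names are used only as dictionary keys, never as filesystem paths.
            entries[info.filename] = b"".join(chunks)
    return entries
```

Since entry names never reach the filesystem, a name such as ../../../some_important_file is just an unusual dictionary key and cannot overwrite anything.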
6. Future Considerations
This section outlines potential extensions to the format for future discussion and development.
6.1. Metadata (metadata.msgpack)
A future version may introduce a dedicated file at the root of the container (e.g., metadata.msgpack) to store information about the project itself, decoupling it from the stage data.
6.2. Versioning (versions/ directory)
A versions/ directory could be introduced to store historical snapshots of the stage.msgpack file, enabling built-in version control and audit trails. Each snapshot could be a copy of stage.msgpack named by a timestamp or commit hash.
6.3. Sub-Stages (sub/ directory)
To avoid the inefficiency of nested ZIP files (storing a .prg inside another .prg), a sub/ directory could store additional Stage files. This would allow for complex, multi-stage projects within a single container.
Sub-stages can be nested to arbitrary depth, with each sub-stage having its own stage.msgpack file. References between stages MUST be done using the UUID of the sub-stage.
6.3.1. Representing Sub-Stages
Because UUIDs are unique, a single UUID is sufficient to represent a sub-stage.
6.4. Workspace Settings (settings.msgpack)
Application-specific settings (e.g., default node styles, view preferences) could be stored in a root-level file like settings.msgpack, making the PRG file a self-contained workspace.
6.5. Anchors
To represent a node or a region, we need to design an anchor syntax that can uniquely identify elements within the PRG file. The proposed syntax uses UUIDs to reference specific nodes or regions.
Node UUIDs MUST be separated by a semicolon (;).
# wrapped for readability
file:///home/user/project.prg#
ec32d43d-7890-4e45-a28f-b31bca4dafea;
6c95db6b-b64e-49e8-a3b0-3a39ee2588c9;
49828d7f-7d05-48d9-bcb5-a699782f9880
This example references three nodes:
ec32d43d-7890-4e45-a28f-b31bca4dafea
6c95db6b-b64e-49e8-a3b0-3a39ee2588c9
49828d7f-7d05-48d9-bcb5-a699782f9880
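Parsing the proposed anchor syntax amounts to splitting the URL fragment on semicolons. The sketch below assumes the fragment holds nothing but UUIDs (the syntax for regions is not yet defined), and `parse_anchor` is a hypothetical name.

```python
import uuid
from urllib.parse import urlsplit


def parse_anchor(url: str) -> list[str]:
    """Return the node UUIDs listed in the fragment of a PRG anchor URL."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return []
    # uuid.UUID raises ValueError for any part that is not a well-formed UUID.
    return [str(uuid.UUID(part)) for part in fragment.split(";")]


nodes = parse_anchor(
    "file:///home/user/project.prg#"
    "ec32d43d-7890-4e45-a28f-b31bca4dafea;"
    "6c95db6b-b64e-49e8-a3b0-3a39ee2588c9;"
    "49828d7f-7d05-48d9-bcb5-a699782f9880"
)
```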
6.6. Fractional indexing
Source: Realtime editing of ordered sequences | Figma Blog
Instead of OT, Figma uses a trick that’s often used to implement reordering on top of a database. Every object has a real number as an index and the order of the children for an element of the tree is determined by sorting all children by their index. To insert between two objects, just set the index for the new object to the average index of the two objects on either side. We use arbitrary-precision fractions instead of 64-bit doubles so that we can’t run out of precision after lots of edits.
A similar scheme could be used in PRG: assigning each node in a stage a numeric index would make it easier to represent the order of nodes (z-index).
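The averaging step described in the quoted passage can be sketched with Python's `fractions.Fraction`, which provides arbitrary-precision rationals. This is only an illustration of the idea, not the Figma implementation or a part of the PRG format, and the node names and the helper `index_between` are hypothetical.

```python
from fractions import Fraction


def index_between(before: Fraction, after: Fraction) -> Fraction:
    """Return an index strictly between two neighbours (their average)."""
    if not before < after:
        raise ValueError("'before' must be smaller than 'after'")
    return (before + after) / 2


# Two nodes with initial z-order a < b.
z_index = {"a": Fraction(1), "b": Fraction(2)}
# Insert c between a and b, then d between a and c.
z_index["c"] = index_between(z_index["a"], z_index["b"])
z_index["d"] = index_between(z_index["a"], z_index["c"])
order = sorted(z_index, key=z_index.get)
```

Inserting a node changes only the new node's index, so existing nodes never need to be renumbered.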