Rendering Optimization Strategies
A summary of all discussed optimization techniques for achieving high-performance rendering (e.g., 144fps with large design documents).
Transform & Geometry
-
Transform Cache
- Store
local_transformand derivedworld_transform. - Use dirty flags and top-down updates.
- Store
-
Geometry Cache
- Cache
local_bounds,world_bounds. - Used for culling, layout, and hit-testing.
- Cache
-
Flat Scene Graph + Parent Pointers
- Flat arena with parent/children relationships.
- Enables O(1) access and traversal.
Rendering Pipeline
-
GPU Acceleration (Skia Backend::GL/Vulkan)
- Use hardware compositing, filters, transforms.
-
Scene-Level Picture Caching
- Use
SkPictureto record full-scene vector draw ops. - Serves as the always-up-to-date canonical snapshot.
- Resolution-independent; ideal for rerendering or tile regeneration.
- Use
-
Tile-Based Raster Cache (Hybrid Rendering)
- Render the full viewport, take snapshot. debounced (after no more changes. e.g. 150ms)
- Divide the snapshot into fixed-size tiles (e.g., 512×512).
- When new area discovered, render the cached, non-overlapping parts with tile cache. only render newly discovered area.
- Repeat step 1.
- Optional padding per tile to account for effects (blur, shadows).
-
Dynamic Mode Switching (Picture vs Tile)
- Render from
SkPicturedirectly during normal zoom or active edits. - Fallback to raster tiles for zoomed-out or complex views.
- Tile invalidation/redraw is driven by zoom level, camera transform, or frame budget.
- Render from
-
Dirty & Re-Cache Strategy
- Nodes marked dirty will trigger re-recording of affected picture regions or tiles.
- Use change tracking to only re-record minimum needed areas.
- Recording large subtrees is expensive—optimize granularity based on tree structure.
-
Scene Cache Config / Strategy
-
Defines how scene caching is organized.
-
Properties include:
-
depth:0→ Entire scene is one cache.1→ Cache per top-level container.n→ Cache at depthn, chunking deeper layers.
-
mode:AlwaysPicture,Hybrid,AlwaysTile -
tile_size,tile_padding -
zoom_threshold_for_tiles -
frame_budget_threshold_ms -
use_bbh,enable_lod, etc.
-
-
Cache accessors like
get_picture_cache_by_id()support scoped re-rendering.
-
-
Will-Change Optimization
-
Nodes marked with "will-change" are expected to become dirty soon.
-
Examples:
- Image node waiting on async src resolution
- Text node waiting on font availability
-
Tree holders of such nodes are chunked for localized re-recording.
-
Prevents re-recording full subtrees—minimizes recording cost.
-
-
Flattened Render Command List
-
Scene is compiled into a flat list of
RenderCommandstructs with resolved:- Transform
- Clip bounds
- Opacity
- Z-order
-
Enables non-recursive rendering and independent layer recording.
-
Required for tiling at arbitrary depths and for caching subtrees.
Example:
Logical Tree:
Frame
└── Group
├── Rect1
├── Rect2
└── Rect3
Flattened:
[
RenderCommand { node_id: Rect1, transform: ..., clip: ..., z: ... },
RenderCommand { node_id: Rect2, transform: ..., clip: ..., z: ... },
RenderCommand { node_id: Rect3, transform: ..., clip: ..., z: ... },
]- Each command can be grouped and recorded separately into its own
SkPicture. - Nesting is preserved logically via sort order, but rendering is flat.
- This model is essential for dynamic caching, parallel planning, and GPU-aware scheduling.
-
-
Dirty-Region Culling
- Use camera’s
visible_rectto cullworld_bounds. - Optional: accelerate with quadtree or BVH.
- Use camera’s
-
Minimize Canvas State Changes
- Reuse transforms and paints.
- Precompute common values like DPI × Zoom × ViewMatrix.
- Tight Bounds for
save_layerOperations
- Always provide explicit bounds to
save_layercalls instead of using unbounded layers. - Unbounded
save_layercreates full-canvas offscreen buffers, which can be 100× or more larger than necessary. - Compute tight bounds that include all visual content:
- Base shape bounds (in local coordinates)
- Effect expansions (drop shadows: offset + spread + 3× blur radius)
- Transform to world coordinates before passing to
save_layer
- Critical for blend mode isolation: Blend modes require
save_layerfor isolation semantics. Using tight bounds reduces offscreen buffer size dramatically. - Coordinate space consistency: Ensure bounds are in the correct coordinate space (world space) when
save_layeris called, accounting for transforms applied before the layer. - Future optimization: Consider pre-computing blend mode isolation bounds in the geometry stage alongside render bounds for unified geometry computation.
- Text & Path Caching
- Cache laid-out paragraphs and SVG paths keyed by node ID.
- Each entry stores a hash of the text/style or path string and the current font repository generation.
- Caches are invalidated when fonts or the original data change.
- Hit testing reuses these paths for
path.containschecks.
- Render Pass Flattening
- Group nodes with same blend/composite states.
- Sort draw calls for fewer GPU flushes.
Zoom & Interaction Optimization
Scaling-specific optimization tricks for real-world authoring UX (zoom/pinch).
- Adaptive Interactive Resolution (LOD While Zooming)
- Maintain
interactive_scales ∈ (0, 1] - Effective render resolution:
effective_dpr = device_dpr * s - Drive
sby zoom velocity, not zoom level:- High
|d(zoom)/dt|→ lowers(faster) - Near-zero velocity → ramp
s→ 1 (sharpen)
- High
Practical heuristics:
- Hysteresis: Enter low-res when
v > V_hi, exit whenv < V_lo - Settle timer: After last zoom input, wait 80–150ms, then refine
- Smoothing: EMA on velocity to avoid flicker:
v_smooth = lerp(v_smooth, v_raw, α)(α≈0.2–0.4)
Suggested scale mapping (simple, stable):
s = clamp(1 / (1 + k * v_smooth), s_min, 1)- Typical:
s_min=0.25–0.5,ktuned to device
-
Two-Phase Rendering: "Fast Preview" Then "Refine"
During active zoom:
- Render coarse:
- Reuse cached tiles even if they're "wrong density" and just scale them
- Prefer nearest/linear sampling for speed
- Skip expensive effects (see below)
After settle:
- Render final:
- Regenerate tiles at full
effective_dprfor the current zoom - Do it progressively (center first)
- Regenerate tiles at full
- Progressive Refinement Ordering (Perceptual Priority)
When restoring full quality, update tiles in this order:
- Tiles near focal point (pinch center / cursor)
- Tiles in the visible viewport
- Tiles in a margin ring (prefetch)
Implementation detail:
- Compute
priority = distance(tile_center, focal_point) - Process in a min-heap / bucketed rings
- Crossfade to Hide "Pop"
To avoid harsh swaps from blurry→sharp:
- Keep old tile texture around for ~80–120ms
- Blend old → new with a short fade (or swap on vsync boundaries)
This is especially important for text and thin strokes.
- Effect LOD: Degrade Expensive Passes Only While Scaling
While zooming, selectively replace/skip heavy operations:
- Drop
saveLayer+ImageFilterchains (blur, shadows, backdrop) → cheap placeholder - Reduce blur sigma / shadow blur radius proportionally to
s - Replace complex paths with simplified geometry (optional)
- Clamp very thin strokes to a minimum pixel width to prevent shimmer
After settle: restore exact effects.
Key point: This is UX-acceptable because users perceive motion first, fidelity second.
- Stable Tile Identity: Cache in World-Space, Not Zoom-Space
Avoid "cache per zoom level" keys. Prefer:
- Tile key based on world coordinates at a reference density
- During zoom:
- Sample existing tiles (scaled)
- Only re-render when zoom exceeds your density budget
If GPU supports it, use mipmaps for zoom-out so tiles remain usable.
- Snap Bounds to Reduce Thrash
If zoom changes continuously, tiny float differences can cause re-raster storms.
- Snap tile bounds / layer bounds to integer pixels in device space
- Quantize
effective_dprto a small set (e.g.{0.5, 0.67, 0.75, 1.0})
This greatly increases cache hit rate during pinch.
- Temporal Throttling: Don't Re-Render on Every Wheel Tick
During active zoom:
- Rate-limit expensive re-raster to e.g. 30–60 Hz
- Still update transform (scale) every frame for responsiveness
- "Refine" pass runs only after settle
This keeps input feel crisp even on heavy documents.
- Text-First Refinement (Optional but Huge for Authoring Tools)
People notice text blur more than shape blur.
After settle (or even during slow zoom), prioritize:
- Glyph atlas/text tiles
- Selection/caret overlays at full resolution
Even if the scene is still refining, the editor feels "sharp".
- Interaction Overlays Rendered at Full-Res
Always render UI overlays separately at native resolution:
- Selection bounds, handles, guides, cursor, rulers
- Snapping hints, hover highlights
Even if content is temporarily low-res, the tool still feels precise.
Image Optimization
- LoD / Mipmapped Image Swapping
- Use lower-res versions of images at low zoom.
- Prevents high GPU bandwidth use at low visibility.
- ImageRepository with Transform-Aware Access
- Pick image resolution based on projected screen size.
- TODO: currently we select mipmap levels solely by the size of the drawing rectangle. This is a temporary strategy until a proper cache invalidation mechanism based on zoom is introduced.
Text & Glyph Optimization
-
Glyph Cache (Atlas or Paragraph Caching)
- Cache rasterized or vector glyphs used across the document.
- Prevents redundant layout or rendering of text.
- Essential for high-DPI or frequently zoomed views.
Engine-Level
- Precomputed World Transforms
- Avoid recalculating transforms per draw call.
- Essential for random-access rendering.
- Flat Table Architecture
- All node data (transforms, bounds, styles) stored in flat maps.
- Enables fast diffing, syncing, and concurrent access.
- Callback-Based Traversal with Fn/FnMut
- Owner controls child behavior via inlined, zero-cost closures.
- Scene Planner & Scheduler
- A dynamic system that builds the flat render list per frame.
- Reacts to scene changes, memory pressure, or frame budget changes.
- Drives the decision to re-record, cache, evict, or downgrade fidelity.
Optional Advanced
- Multithreaded Scene Update
- Parallelize transform/bounds resolution.
- CRDT-Ready Data Stores
- Flat table model enables future collaboration support.
- BVH or Quadtree Spatial Index
- Build dynamic index from
world_boundsfor fast spatial queries.
With Compromises
Practical, UX-safe tradeoffs that simplify implementation and improve performance, especially under load. These techniques sacrifice exactness for speed — but in ways users won’t notice.
-
Quantize Camera Transform
Instead of using fully continuous float precision for the camera position and zoom, round them to the nearest N units (e.g., 0.1 for position, 0.01 for zoom):
This list is designed to help evolve a renderer from minimal single-threaded mode to scalable, GPU-friendly real-time performance.