Stable-Layers: Fine-Tuning Image Layer Decomposition with VLM-Scored RL

Abstract

We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision by fine-tuning a pretrained layer decomposition model using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered, we apply Flow-GRPO with LoRA adaptation, sampling multiple candidate decompositions per image, scoring them with a VLM, and optimising the policy from group-relative advantages.
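As an illustration of the group-relative advantages mentioned above, the following minimal sketch standardises the rewards of candidates sampled for one image against their own group mean and standard deviation. This is the commonly used GRPO normalisation; the function name, the `eps` constant, and the example scores are assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one group of candidates for the same image.

    `eps` is an assumed stabiliser that avoids division by zero when every
    candidate in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical VLM scores for four candidate decompositions of one image.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])
```

Candidates scoring above the group mean receive positive advantages and those below receive negative ones. When all rewards in a group are identical, every advantage is zero and the group contributes no learning signal.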

The key challenge lies in designing a reliable reward signal: VLMs that score samples in isolation tend to compress their judgements into a narrow band, leaving GRPO with little within-group variance to learn from. We address this with a two-stage evaluation pipeline that pairs structured per-sample scoring across five edit-centric criteria with a grid-based calibration step in which the VLM re-scores all candidates side-by-side. Trained entirely on unlabelled images, Stable-Layers produces decompositions with stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error than the base model.
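The sketch below shows one way the two stages could feed a single reward, and why the calibration step matters for within-group variance. How the stages are actually combined is not described here, so the averaging of the five criteria, the linear blend, the `grid_weight` value, and all scores are purely illustrative assumptions.

```python
from statistics import mean, pstdev


def stage_one_score(criterion_scores):
    """Average one candidate's five per-criterion scores (assumed in [0, 1])."""
    if len(criterion_scores) != 5:
        raise ValueError("expected five criterion scores")
    return mean(criterion_scores)


def two_stage_rewards(per_sample_criteria, grid_scores, grid_weight=0.5):
    """Blend isolated scores with side-by-side grid scores (assumed linear blend)."""
    if len(per_sample_criteria) != len(grid_scores):
        raise ValueError("one grid score is needed per candidate")
    isolated = [stage_one_score(c) for c in per_sample_criteria]
    return [
        (1.0 - grid_weight) * s + grid_weight * g
        for s, g in zip(isolated, grid_scores)
    ]


# Hypothetical scores: stage one clusters near 0.7, the grid pass spreads out.
criteria = [
    [0.7, 0.7, 0.7, 0.7, 0.7],
    [0.7, 0.7, 0.7, 0.7, 0.8],
    [0.7, 0.7, 0.7, 0.8, 0.8],
    [0.7, 0.7, 0.7, 0.7, 0.6],
]
grid = [0.3, 0.6, 0.9, 0.2]

isolated_only = [stage_one_score(c) for c in criteria]
rewards = two_stage_rewards(criteria, grid)
spread_before = pstdev(isolated_only)
spread_after = pstdev(rewards)
```

In this toy group the isolated scores differ by at most 0.06, while the blended rewards span a much wider range, which gives a group-relative method more signal to separate good candidates from bad ones.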

Qualitative Comparison

Side-by-side decompositions on held-out images. Columns show the input, composite, and individual layers on white backgrounds. The base Qwen-Image-Layered model frequently leaves Layer 0 degenerate and duplicates the composite across foreground slots; Stable-Layers isolates distinct semantic elements with cleaner alpha masks and less colour bleed into transparent regions.
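For reference, rendering a layer on a white background amounts to the standard "over" compositing operator with an opaque white backdrop. The sketch below assumes straight (non-premultiplied) RGBA values in [0, 1]; the function names and the nested-list image layout are illustrative and not taken from the paper's code.

```python
def over_white(rgba):
    """Composite one straight-alpha RGBA pixel over opaque white."""
    r, g, b, a = rgba
    # Each channel is a blend of the layer colour and white (1.0).
    return tuple(a * c + (1.0 - a) for c in (r, g, b))


def layer_on_white(layer):
    """Apply `over_white` to every pixel of a layer given as rows of RGBA tuples."""
    return [[over_white(px) for px in row] for row in layer]


# A hypothetical 1x3 layer: transparent, half-transparent black, opaque colour.
preview = layer_on_white([[
    (0.2, 0.4, 0.6, 0.0),
    (0.0, 0.0, 0.0, 0.5),
    (0.2, 0.4, 0.6, 1.0),
]])
```

Fully transparent pixels render as pure white regardless of their stored colour, so in such a view any colour bleed shows up only where the alpha mask is not exactly zero.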

Per-Layer Reconstruction Error

L1 error of each predicted layer against held-out references, grouped by the number of layers L. Lower is better. Stable-Layers reduces mean error across all settings and consistently improves the earlier layers, with the largest gains on the dominant first layer (Pred 0).

Panels: L = 2 (n = 3 samples) · L = 3 (n = 29 samples) · L = 4 (n = 90 samples).
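A per-layer L1 error of this kind can be computed as the mean absolute difference between each predicted layer and its reference. The sketch below is a minimal illustration that assumes layers are flattened to equal-length lists of values in [0, 1] and already matched by index; whether the reported metric averages over RGB or RGBA channels is not stated here, and the function names are hypothetical.

```python
def l1_error(pred, ref):
    """Mean absolute difference between two flattened layers."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("layers must be non-empty and the same size")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def per_layer_l1(pred_layers, ref_layers):
    """L1 error for each (predicted, reference) layer pair, in layer order."""
    if len(pred_layers) != len(ref_layers):
        raise ValueError("layer counts must match")
    return [l1_error(p, r) for p, r in zip(pred_layers, ref_layers)]


# Hypothetical two-layer example with two values per layer.
errors = per_layer_l1(
    [[0.0, 1.0], [0.5, 0.5]],
    [[0.0, 0.0], [0.5, 0.5]],
)
```

The first entry of `errors` corresponds to Pred 0, the second to Pred 1, and so on, matching the per-layer grouping described above.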

BibTeX

@article{rowles2026stablelayers,
  title   = {Stable-Layers: Fine-Tuning Image Layer Decomposition Models
             with VLM-Scored Reinforcement Learning},
  author  = {Rowles, Ciara and Adithyan, Reshinth and Pinnaparaju, Nikhil
             and Voleti, Vikram and Boss, Mark},
  journal = {Preprint},
  year    = {2026}
}
