
Menu Test: Why GLM-Image Beats Diffusion Models at Legible Pricing
A practical menu benchmark you can run at home—testing price readability, alignment, and typography using GLM-Image with a clear scoring rubric.
The real-world problem: prices + alignment
Menus are a brutal test:
- lots of small text
- currency + decimals
- tight columns and spacing
GLM-Image is designed to handle information-dense layouts and to render text more reliably, which it attributes to its hybrid architecture. (Hugging Face)
The menu benchmark (simple but revealing)
What you generate (3 menu types)
- Coffee shop (short)
- Bistro dinner menu (medium)
- Cocktail menu (dense)
The rubric (score each 0–5)
- Legibility: can you read every item and price?
- Numeric accuracy: do prices match exactly?
- Column alignment: are dots/columns consistent?
- Hierarchy: headings vs items vs descriptions
- No hallucinated items: does the menu contain only what you asked for, with no invented dishes?
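If you want to record rubric scores consistently, one option is a small data structure that rejects anything outside 0–5. This is a minimal sketch; the class name `MenuScore` and its field names are illustrative choices, not part of GLM-Image or any library.

```python
from dataclasses import dataclass, fields


@dataclass
class MenuScore:
    """One rubric sheet for one generated menu image (hypothetical helper)."""
    legibility: int
    numeric_accuracy: int
    column_alignment: int
    hierarchy: int
    no_hallucinated_items: int

    def __post_init__(self):
        # Enforce the 0-5 range from the rubric for every criterion.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 5:
                raise ValueError(f"{f.name} must be between 0 and 5, got {value}")

    def total(self) -> int:
        # Maximum possible total is 25 (five criteria, five points each).
        return sum(getattr(self, f.name) for f in fields(self))


# Placeholder values for illustration only, not a measured result.
example = MenuScore(legibility=5, numeric_accuracy=4, column_alignment=3,
                    hierarchy=5, no_hallucinated_items=5)
```

Scoring each image by hand and storing the result this way keeps the three menu types comparable.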
3 copy-paste menu prompts
Tip: put all required text in quotes. (GitHub)
A) Coffee menu
Clean cafe menu board, minimalist typography, white background. Title: "COFFEE". Items and prices exactly: "Espresso — $2.50", "Americano — $3.00", "Latte — $4.25", "Cappuccino — $4.25", "Mocha — $4.75". Footer: "Oat milk +$0.75". Perfect punctuation and numerals, aligned columns.
B) Bistro menu (two columns)
Elegant restaurant menu on textured cream paper. Left column heading "STARTERS" with: "Soup of the Day — $8", "Caesar Salad — $12". Right column heading "MAINS" with: "Roast Chicken — $24", "Seared Salmon — $28", "Mushroom Risotto — $22". Use consistent em dashes and right-aligned prices.
C) Cocktail menu (dense)
Cocktail menu, dark background, gold accents, high contrast typography. Title: "COCKTAILS". List exactly with prices: "Negroni — $14", "Old Fashioned — $15", "Margarita — $13", "Espresso Martini — $16", "Paloma — $13". Keep every letter readable, no extra words.
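If you plan to vary the items or prices, it can help to build the prompt from a list so the quoted text and your expected answers come from the same source. The sketch below rebuilds prompt A; the function name `build_menu_prompt` and its parameters are assumptions made for illustration.

```python
EM_DASH = "\u2014"  # the separator used between item and price in the prompts


def build_menu_prompt(style, title, items, closing):
    """Assemble a menu prompt with every required string in quotes.

    items is a list of (name, price) pairs, e.g. ("Latte", "$4.25").
    """
    entries = ", ".join(f'"{name} {EM_DASH} {price}"' for name, price in items)
    return f'{style} Title: "{title}". Items and prices exactly: {entries}. {closing}'


coffee_items = [
    ("Espresso", "$2.50"),
    ("Americano", "$3.00"),
    ("Latte", "$4.25"),
    ("Cappuccino", "$4.25"),
    ("Mocha", "$4.75"),
]

coffee_prompt = build_menu_prompt(
    style="Clean cafe menu board, minimalist typography, white background.",
    title="COFFEE",
    items=coffee_items,
    closing='Footer: "Oat milk +$0.75". Perfect punctuation and numerals, aligned columns.',
)
```

The same `coffee_items` list can then serve as the answer key when you score numeric accuracy.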
How to compare against diffusion-only models
Run the same prompts in:
- SDXL / Flux / any diffusion-only model you have
…and score them with the same rubric. You'll usually see diffusion models "stylize" text into near-text: shapes that look like lettering but don't spell the requested words or prices.
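To keep the comparison fair, you can check every model's output against the same answer key. The sketch below assumes you type out by hand what each image actually shows, one "Item, dash, price" entry per line; it does no OCR, and the function names are hypothetical.

```python
import re

# Matches lines like "Negroni <dash> $14" with an em dash, en dash, or hyphen.
LINE = re.compile(r"^(?P<name>.+?)\s*[\u2014\u2013-]\s*(?P<price>\$\d+(?:\.\d{2})?)$")


def parse_transcript(text):
    """Turn a hand-typed transcript into {item name: price string}."""
    found = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # unreadable or garbled lines are skipped
            found[match.group("name")] = match.group("price")
    return found


def compare_to_expected(expected, transcript):
    """Report wrong prices, missing items, and items the model invented."""
    found = parse_transcript(transcript)
    return {
        "wrong_price": {name: (price, found[name])
                        for name, price in expected.items()
                        if name in found and found[name] != price},
        "missing": [name for name in expected if name not in found],
        "extra": [name for name in found if name not in expected],
    }
```

Items whose lines cannot be parsed end up under `missing`, which is a reasonable stand-in for "not legible" when you assign the rubric scores.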
Publishable result format (for your blog)
- 1 image grid per menu type
- A score table (GLM-Image vs others)
- Notes: where errors happen (prices, currency, alignment)
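For the score table, a plain-text layout is often enough to paste into a post. The following is a rough sketch that assumes you have one 0–5 score per criterion per model; `score_table` is a hypothetical helper, and the model names and numbers in the usage line are placeholders rather than results.

```python
CRITERIA = [
    "Legibility",
    "Numeric accuracy",
    "Column alignment",
    "Hierarchy",
    "No hallucinated items",
]


def score_table(scores):
    """Format {model name: {criterion: score}} as an aligned text table."""
    header = ["Model", *CRITERIA, "Total"]
    rows = []
    for model, by_criterion in scores.items():
        values = [by_criterion[c] for c in CRITERIA]
        rows.append([model, *map(str, values), str(sum(values))])

    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in [header, *rows]) for i in range(len(header))]

    def fmt(row):
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    rule = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(header), rule, *(fmt(r) for r in rows)])


# Placeholder data to show the shape of the input, not measured scores.
placeholder = {
    "Model A": dict.fromkeys(CRITERIA, 5),
    "Model B": dict.fromkeys(CRITERIA, 2),
}
table = score_table(placeholder)
```

Printing `table` gives one row per model with a total out of 25, which pairs naturally with the image grid for each menu type.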