Transformers AR Stage Deep Dive: What Are the 256→4K Tokens?

The two-stage token plan

GLM-Image's AR generator (initialized from GLM-4-9B-0414) produces its tokens in two steps:

first, a compact encoding of about 256 tokens
then an expanded sequence of 1K–4K tokens, tied to high-resolution outputs (GitHub)

Think of it as: outline → detailed blueprint.
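To get a feel for the scale of the expansion, here is a minimal arithmetic sketch. It assumes, purely for illustration, that each token count is laid out on a square grid; the source does not state the actual grid shape or how aspect ratios are handled, and `square_grid_side` is a hypothetical helper, not part of GLM-Image.

```python
import math


def square_grid_side(num_tokens: int) -> int:
    """Side length of a square grid that holds exactly num_tokens tokens.

    Assumption for illustration only: tokens sit on a square grid.
    """
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError(f"{num_tokens} is not a perfect square")
    return side


# 256 is the compact stage; 1024 and 4096 bracket the "1K-4K" expanded stage.
for tokens in (256, 1024, 4096):
    side = square_grid_side(tokens)
    factor = tokens // 256
    print(f"{tokens} tokens -> {side}x{side} grid ({factor}x the compact stage)")
```

Under that assumption, the expanded stage carries 4 to 16 times as many positions as the outline, which is the extra room the "blueprint" has to work with.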

Why token expansion helps typography

Typography needs:

consistent strokes across letters
consistent spacing across words
consistent alignment across blocks

A “blueprint” stage can reserve space for text blocks and maintain hierarchy (headline > subhead > body).

What you can control (as a user)

In most workflows you don't edit these tokens directly, but you do influence them through the prompt:

explicit layout instructions
clear hierarchy language
exact quoted text (GitHub)
limiting each block to a reasonable length
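The last two points can be checked before a prompt is sent. The sketch below pulls out every double-quoted string and flags the ones that look too long. The helper names and the 8-word threshold are hypothetical choices for illustration, not limits documented for GLM-Image.

```python
import re

# Hypothetical threshold; tune it against your own results.
MAX_WORDS_PER_BLOCK = 8


def quoted_blocks(prompt: str) -> list[str]:
    """Return every double-quoted string in the prompt, in order."""
    return re.findall(r'"([^"]*)"', prompt)


def overlong_blocks(prompt: str, max_words: int = MAX_WORDS_PER_BLOCK) -> list[str]:
    """Return the quoted blocks that contain more than max_words words."""
    return [b for b in quoted_blocks(prompt) if len(b.split()) > max_words]
```

If `overlong_blocks` returns anything, consider splitting that text across two zones or trimming it.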

A “token-friendly” layout prompt

Use numbered blocks to force structure:

Poster layout with four zones:
(1) Top headline: "[HEADLINE]"
(2) Subheadline: "[SUBHEAD]"
(3) Center image: [describe subject]
(4) Footer bar: "[CTA]" and "[URL]"
Use clean alignment, consistent kerning, no typos.
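If you generate many posters, the template above can be filled in programmatically so the zone numbering and quoting stay identical from run to run. This is a minimal sketch; `build_poster_prompt` is a hypothetical helper that only builds the prompt string and does not call any model.

```python
def build_poster_prompt(headline: str, subhead: str, subject: str, cta: str, url: str) -> str:
    """Fill the four-zone poster template with concrete text."""
    zones = [
        f'(1) Top headline: "{headline}"',
        f'(2) Subheadline: "{subhead}"',
        f"(3) Center image: {subject}",
        f'(4) Footer bar: "{cta}" and "{url}"',
    ]
    parts = [
        "Poster layout with four zones:",
        *zones,
        "Use clean alignment, consistent kerning, no typos.",
    ]
    return " ".join(parts)


prompt = build_poster_prompt(
    headline="Summer Sale",
    subhead="Up to 40% off",
    subject="a beach umbrella on white sand",
    cta="Shop now",
    url="example.com",
)
print(prompt)
```

Only the rendered text goes inside quotes; the image description in zone 3 stays unquoted so it is not mistaken for text to draw.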

Debugging when AR “wanders”

If the model adds extra words:

reduce creative adjectives
re-assert “exactly this text and nothing else”
shorten the text per block
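One way to make this loop less manual is to compare the text you asked for with the text that actually appeared. The sketch below assumes you already have the rendered text as a string, for example from an OCR tool or by typing it out; that step is outside this example. `extra_words` and `tighten` are hypothetical helpers.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extra_words(expected_blocks: list[str], rendered_text: str) -> list[str]:
    """Words in rendered_text that the expected blocks do not account for."""
    budget = Counter(w for block in expected_blocks for w in words(block))
    extras = []
    for w in words(rendered_text):
        if budget[w] > 0:
            budget[w] -= 1  # this occurrence was requested
        else:
            extras.append(w)  # the model added it
    return extras


def tighten(prompt: str) -> str:
    """Re-assert the exact-text constraint at the end of the prompt."""
    return prompt.rstrip() + " Render exactly this text and nothing else."
```

If `extra_words` comes back non-empty, regenerate with `tighten(prompt)` and, if that is not enough, shorten the longest block.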


