Skill: Nano Banana Prompting

What

Gemini image models ("Nano Banana" = Flash, "Nano Banana Pro" = Pro) respond to narrative-driven structured prompts, not keyword stuffing. These are language models that generate images — prompt them like you're briefing a cinematographer, not tagging a search engine.

Model Landscape (as of March 2026)

Community Name	Model	Notes
Nano Banana	Gemini 2.5 Flash Image	Original. Good quality, fast.
Nano Banana Pro	Gemini 3 Pro Image	Best quality. Best text rendering. Google Search grounding.
Nano Banana 2	Gemini 3.1 Flash Image	Pro-level quality at Flash speed. Now default across Gemini apps.

Max reference images: Up to 14 images can be sent alongside a prompt with Nano Banana Pro. Assign clear roles: "Image A for pose, Image B for style, Image C for environment."

Why

Gemini's image generation sits on top of its full multimodal encoder. It REASONS about the prompt (up to 32K tokens of input). This means:

Long, structured prompts work better than short keyword lists
Natural language descriptions beat comma-separated tags
The model builds a 3D mental model of the scene before rendering
It can handle complex spatial relationships if you describe them clearly

When to Use Nano Banana for Accuracy

Nano Banana is the BEST choice for images requiring precision — maps, charts, data visualizations, infographics, technical illustrations, architectural diagrams. The model's reasoning engine means it can:

Place labels at correct geographic positions on maps
Render data-accurate bar charts if you specify exact values
Draw trade route lines between named points
Create multi-layer infographics with proper spatial hierarchy

For GatorSquare investigations: Maps of oil routes, shipping lanes, military positions, financial flow diagrams, timeline visualizations — all of these should go through Nano Banana with structured narrative prompts, not keyword-based generation.

Always specify: "16:9 widescreen cinematic frame" for all GatorSquare images.

Keyword-style prompting (from Midjourney/SD era) actively hurts Gemini output.

How

Prompt Structure

Follow this order: Subject → Action/Context → Style/Medium → Lighting/Color → Camera/Composition

Or the simpler version: [What] + [Doing What] + [Where] + [How It Looks] + [Technical]
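As an illustration of the ordering above, here is a minimal Python sketch that assembles a prompt slot by slot. The `PanelPrompt` class and its field names are hypothetical helpers invented for this example, not part of any Gemini API; the sample wording borrows phrases used elsewhere in this skill.

```python
from dataclasses import dataclass


@dataclass
class PanelPrompt:
    """One field per slot: Subject, Action/Context, Style, Lighting, Camera."""
    subject: str
    action: str
    style: str
    lighting: str
    camera: str

    def render(self) -> str:
        # Emit the slots in the recommended order as full sentences,
        # not as a comma-separated tag list.
        parts = [self.subject, self.action, self.style, self.lighting, self.camera]
        sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
        return " ".join(sentences)


prompt = PanelPrompt(
    subject="A navy blue tweed blazer with brass buttons on a wooden hanger",
    action="Hanging from a brass hook on a white plaster wall",
    style="Editorial product photograph",
    lighting="Warm golden afternoon light from a window on the left",
    camera="16:9 widescreen cinematic frame, eye-level medium shot",
).render()
print(prompt)
```

Keeping the slots as named fields makes it harder to skip one by accident, and the fixed join order keeps the subject at the start of the prompt.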

The 15-50 Word Sweet Spot

For standard panels, 15-50 words of core description is optimal. Too short = ambiguous. Too long = confused model. BUT: for establishing shots or complex compositions, longer is fine — Gemini handles it.
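A quick way to check a draft against this range is to count whitespace-separated words. The function below is a hypothetical helper, and treating "words" as whitespace-separated tokens is an assumption of this sketch.

```python
def sweet_spot_status(core_description: str, low: int = 15, high: int = 50) -> str:
    """Classify a core description against the 15-50 word range."""
    words = len(core_description.split())
    if words < low:
        return f"too short ({words} words): likely ambiguous"
    if words > high:
        # Longer is acceptable for establishing shots or complex compositions.
        return f"long ({words} words): fine for establishing shots, trim for standard panels"
    return f"ok ({words} words)"


print(sweet_spot_status("a dog"))
```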

Affirmative Over Negative

YES: "Clean neutral grey background, studio lighting"
NO: "Don't add busy backgrounds, no dark lighting"
In testing, affirmative framing beat or tied negative framing in every case. It never lost.

Defensive Prompting

When you need a specific style maintained, explicitly ban shortcuts:

"Do NOT add any artistic filter, do NOT stylize beyond the specified style"
"Maintain the exact Disney Pixar 3D animation style — do not shift to photorealism or 2D illustration"
Models take the easiest shortcut to satisfy a prompt. Defensive lines close those exits.

Color Hex Codes Over Color Names

Hex codes reduce ambiguity. "accent color #FF6B35" beats "orange accent". Use hex for any color that matters.
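One way to keep hex codes consistent across prompts is to hold them in a small palette and generate the color clause from it. This is a sketch only: `color_clause` and the palette role names are hypothetical, and the second color value is an arbitrary example.

```python
import re

# Six-digit hex codes only, e.g. #FF6B35.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def color_clause(palette: dict[str, str]) -> str:
    """Render 'role #HEX' pairs, rejecting anything that is not a hex code."""
    for role, code in palette.items():
        if not HEX_COLOR.fullmatch(code):
            raise ValueError(f"{role!r} is not a 6-digit hex code: {code!r}")
    return ", ".join(f"{role} {code.upper()}" for role, code in palette.items())


clause = color_clause({"accent color": "#ff6b35", "background color": "#F5F5F5"})
print(clause)
```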

Two-Step Method for Text in Images

When you need text rendered in the image (signs, labels, titles):

First confirm the exact text content in pure text mode
Then ask the model to render that confirmed text into the image
This separates spelling accuracy from image composition, dramatically reducing typos.
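The two steps can be sketched as a small function. To avoid implying any particular client library, the text call and the image call are passed in as plain callables: `confirm_text` and `generate_image` are placeholders you would wire to your own model client, and the instruction wording is an assumption.

```python
from typing import Callable


def render_text_in_two_steps(
    draft_text: str,
    scene: str,
    confirm_text: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    # Step 1: settle the exact wording in a text-only exchange.
    confirmed = confirm_text(
        "Return this text exactly as it should appear, "
        f"corrected for spelling only: {draft_text}"
    ).strip()
    # Step 2: pass the confirmed wording, quoted, into the image prompt.
    prompt = f'{scene} The sign reads exactly: "{confirmed}".'
    return generate_image(prompt)


# Stub callables stand in for real model calls in this example.
result = render_text_in_two_steps(
    draft_text="OPEN DAILY",
    scene="A shop front.",
    confirm_text=lambda _prompt: " OPEN DAILY ",
    generate_image=lambda prompt: prompt.encode("utf-8"),
)
print(result.decode("utf-8"))
```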

Material Specificity

Don't say generic things. Be surgically specific:

NO: "a suit jacket" → YES: "navy blue tweed blazer with brass buttons"
NO: "a hospital bed" → YES: "white-sheeted NHS hospital bed with chrome side rails, head elevated 30 degrees"
NO: "warm lighting" → YES: "warm golden afternoon light from a window behind the bed, backlighting the subjects"

Identity Block Pattern (for character consistency)

Create a fixed text block per character. Repeat it VERBATIM in every prompt where that character appears. Only change action/pose/camera between frames:

IDENTITY BLOCK (Ovi):
"A 3-year-old South Asian girl with black wavy chin-length bob, sparkly butterfly clip on the RIGHT side, gap-tooth, enormous dark brown eyes, round toddler cheeks. Wearing purple pajamas with white stars, fluffy bunny slippers. Carrying a stuffed bunny toy."

FRAME 1: [IDENTITY BLOCK] + "mid-stride running away from camera, arms pumping"
FRAME 2: [IDENTITY BLOCK] + "at the bed edge, hands gripping white sheet, looking up"
FRAME 3: [IDENTITY BLOCK] + "pulling herself onto the bed, feet dangling"

Keep character tokens at the START of the prompt — token position affects weight.
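A minimal sketch of the pattern in Python: the identity block is stored once as a constant and every frame prompt is built from it, so the wording cannot drift between frames. `frame_prompt` is a hypothetical helper; capitalizing the delta so it reads as its own sentence is a choice made for this example.

```python
OVI = (
    "A 3-year-old South Asian girl with black wavy chin-length bob, "
    "sparkly butterfly clip on the RIGHT side, gap-tooth, enormous dark brown eyes, "
    "round toddler cheeks. Wearing purple pajamas with white stars, "
    "fluffy bunny slippers. Carrying a stuffed bunny toy."
)


def frame_prompt(identity_block: str, delta: str) -> str:
    """Identity block first (verbatim), then the per-frame action/pose/camera."""
    delta = delta.strip().rstrip(".")
    if not delta:
        raise ValueError("each frame needs an action, pose, or camera delta")
    return f"{identity_block} {delta[0].upper()}{delta[1:]}."


frames = [
    frame_prompt(OVI, "mid-stride running away from camera, arms pumping"),
    frame_prompt(OVI, "at the bed edge, hands gripping white sheet, looking up"),
    frame_prompt(OVI, "pulling herself onto the bed, feet dangling"),
]
for f in frames:
    print(f)
```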

SCHEMA Framework (Advanced)

Three tiers of prompt control (from SCHEMA paper, validated across 4,800+ images):

BASE (~5% control): Simple, short prompts. Use for transitions/simple scenes.
MEDIO (~50% control): Structured with subject, action, style, lighting, camera.
AVANZATO (~95% control): Full 7-component prompts with mandatory compliance directives.

This maps to the GatorSquare model allocation:

Flash panels = BASE/MEDIO prompts (short, delta-focused, chain carries the rest)
Pro panels = AVANZATO prompts (full structured prompts with defensive lines)

Anatomical Anchoring (Hands & Bodies)

Extra hands/limbs are the #1 artifact. Prevent them proactively:

Always pair "hands" with an action verb + spatial anchor: "right hand resting flat on table," "left hand gripping coffee cup handle," "both hands clasped in lap"
Never mention "hands" alone: the model hallucinates extra hands when they lack a purpose
Specify finger count for close-ups: "five-fingered hand" or "hand with five distinct digits"
Describe occlusion: "left forearm partially hidden behind table edge" — occlusion implies depth and prevents duplicate limb generation
Anchor body parts to fixed surfaces: "elbow resting on armrest," "feet flat on floor" — floating limbs breed duplicates
For multi-character scenes: specify EACH character's hand state. "Character A: hands in lap. Character B: right hand on table, left hand holding cup."
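The pairing rule (hand + action verb + spatial anchor, for every character) can be enforced with a small data structure. This is a sketch under assumptions: `HandState` and `hand_clause` are hypothetical names, and the output format simply mirrors the "Character A: ..." example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandState:
    side: str    # "left", "right", or "both"
    action: str  # action verb phrase, e.g. "resting flat"
    anchor: str  # spatial anchor, e.g. "on table"

    def render(self) -> str:
        if self.side not in ("left", "right", "both"):
            raise ValueError(f"unknown side: {self.side!r}")
        if not self.action.strip() or not self.anchor.strip():
            raise ValueError("a hand needs both an action verb and a spatial anchor")
        noun = "hands" if self.side == "both" else "hand"
        return f"{self.side} {noun} {self.action.strip()} {self.anchor.strip()}"


def hand_clause(states_by_character: dict[str, list[HandState]]) -> str:
    """One sentence per character; a character with no hand state is an error."""
    sentences = []
    for name, states in states_by_character.items():
        if not states:
            raise ValueError(f"{name} has no hand state defined")
        sentences.append(f"{name}: " + ", ".join(s.render() for s in states) + ".")
    return " ".join(sentences)


clause = hand_clause({
    "Character A": [HandState("both", "clasped", "in lap")],
    "Character B": [
        HandState("right", "resting flat", "on table"),
        HandState("left", "gripping", "coffee cup handle"),
    ],
})
print(clause)
```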

Multi-Character Scene Management

When two or more characters appear:

Label each character by name at prompt start: "TWO people visible: RAHUL (buzz cut, leather jacket) on the LEFT, DEV (glasses, hoodie) on the RIGHT."
Describe spatial relationship: "Rahul sits across the table from Dev, approximately 3 feet apart"
Assign distinct actions to each: "Rahul is gesturing with his right hand while Dev leans back with arms crossed"
Never leave a character's pose undefined — undefined characters default to generic poses that duplicate nearby elements
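The four rules above can be sketched as a builder that refuses to produce a prompt when any character lacks an action. `Character` and `multi_character_prompt` are hypothetical names for illustration; the output reuses the Rahul and Dev wording from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    look: str      # short visual label, e.g. "buzz cut, leather jacket"
    position: str  # e.g. "left" or "right"
    action: str    # what this character is doing


COUNT_WORDS = {2: "TWO", 3: "THREE", 4: "FOUR"}


def multi_character_prompt(characters: list[Character], spatial: str) -> str:
    if len(characters) < 2:
        raise ValueError("use this builder for two or more characters")
    for c in characters:
        if not c.action.strip():
            raise ValueError(f"{c.name} has no defined pose or action")
    count = COUNT_WORDS.get(len(characters), str(len(characters)))
    # Roster first, so names and positions sit at the start of the prompt.
    roster = ", ".join(
        f"{c.name.upper()} ({c.look}) on the {c.position.upper()}" for c in characters
    )
    actions = " ".join(f"{c.name} {c.action.strip().rstrip('.')}." for c in characters)
    return f"{count} people visible: {roster}. {spatial.strip().rstrip('.')}. {actions}"


scene = multi_character_prompt(
    [
        Character("Rahul", "buzz cut, leather jacket", "left",
                  "is gesturing with his right hand"),
        Character("Dev", "glasses, hoodie", "right", "leans back with arms crossed"),
    ],
    spatial="Rahul sits across the table from Dev, approximately 3 feet apart",
)
print(scene)
```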

Chained Prompt Discipline

For ref_panels chained prompts (panel-02 onward):

Limit to 1-2 changes per prompt — the model carries everything else from the previous frame
Restate character identity block even in chained prompts — chaining carries VISUAL style, not character identity
Keep chained prompts under 80 words of core description — the reference image does most of the work
"Same scene" + delta only: what MOVED, what CHANGED expression, what shifted position. Nothing else.
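A small guard can enforce these limits before a chained prompt is sent. This sketch makes two assumptions: the 80-word limit is applied to the delta only (the restated identity block is not counted), and the identity block is placed first, in line with the rule about character tokens at the start. `chained_prompt` is a hypothetical helper.

```python
def chained_prompt(
    identity_block: str,
    changes: list[str],
    max_changes: int = 2,
    max_words: int = 80,
) -> str:
    """Build 'identity block + Same scene + delta' for panel-02 onward."""
    if not 1 <= len(changes) <= max_changes:
        raise ValueError(f"expected 1-{max_changes} changes, got {len(changes)}")
    delta = " ".join(c.strip().rstrip(".") + "." for c in changes)
    if len(delta.split()) >= max_words:
        raise ValueError(f"delta must stay under {max_words} words")
    return f"{identity_block} Same scene. {delta}"


example = chained_prompt(
    "A 3-year-old girl in purple pajamas with white stars.",
    ["She has moved to the bed edge", "Her expression changes to a wide smile"],
)
print(example)
```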

Reference Image Role Assignment

When sending multiple reference images:

Explicitly state each image's role: "Reference Image 1: character face and build. Reference Image 2: character expressions and emotions. Reference Image 3: previous panel for scene continuity."
Prioritize quality over quantity: 2-3 clear references beat 6 mediocre ones
High-resolution references only: low-detail refs cause hallucinated features
The LAST image's aspect ratio is adopted — always send the ref you want to match last
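The ordering and role rules can be sketched as a helper that moves the aspect-ratio reference to the end and numbers the roles to match the send order. `Reference`, `order_references`, and the file names are hypothetical; the default cap of 14 is the Nano Banana Pro limit quoted earlier in this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    path: str
    role: str
    match_aspect: bool = False  # True for the one image whose aspect ratio to adopt


def order_references(refs: list[Reference], max_refs: int = 14):
    """Return (refs in send order, role text numbered to match that order)."""
    if len(refs) > max_refs:
        raise ValueError(f"at most {max_refs} reference images, got {len(refs)}")
    chosen = [r for r in refs if r.match_aspect]
    if len(chosen) > 1:
        raise ValueError("only one reference can set the aspect ratio")
    # The aspect-ratio reference goes last; the others keep their order.
    ordered = [r for r in refs if not r.match_aspect] + chosen
    roles = " ".join(
        f"Reference Image {i}: {r.role}." for i, r in enumerate(ordered, start=1)
    )
    return ordered, roles


ordered, roles = order_references([
    Reference("prev_panel.png", "previous panel for scene continuity", match_aspect=True),
    Reference("face.png", "character face and build"),
    Reference("expressions.png", "character expressions and emotions"),
])
print(roles)
```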

Common Mistakes

Keyword stuffing ("4k, hyper-realistic, cinematic, octane render, trending on artstation") — Gemini ignores or misinterprets this
Negative prompts from Stable Diffusion era ("no extra fingers, no deformed faces") — use affirmative instead
Under-specifying materials ("a dress" vs "a floor-length emerald silk gown with gold embroidery")
Changing character description wording between frames — use EXACT same identity block
Putting character description at the END of prompt instead of the beginning
Overloading prompts: cramming 200+ words of conflicting instructions into one prompt. Gemini starts ignoring half of the instructions when prompts get too long. Focus on subject + action + style.
Undefined hand positions — mentioning characters without specifying what their hands are doing. Every visible character needs hand-state defined.
Full scene rewrites in chained prompts — when ref_panels is set, describe only what CHANGED. Full rewrites fight the reference image.
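Some of these mistakes are mechanical enough to catch with a rough pre-flight check. The sketch below is a heuristic only: the tag list is taken from the keyword-stuffing example above, the negative-phrase pattern and the two-tag threshold are assumptions, and `lint_prompt` is a hypothetical helper that will miss many cases.

```python
import re

# Tags from the keyword-stuffing example; extend as needed.
TAG_WORDS = ("4k", "hyper-realistic", "cinematic", "octane render", "trending on artstation")
# Negative phrasing typical of Stable Diffusion style prompts.
NEGATIVE = re.compile(r"\b(?:no|don't|without)\b", re.IGNORECASE)


def lint_prompt(prompt: str, max_words: int = 200) -> list[str]:
    warnings = []
    lowered = prompt.lower()
    hits = [t for t in TAG_WORDS if t in lowered]
    # One tag may be legitimate (e.g. "cinematic frame"); several suggest stuffing.
    if len(hits) >= 2:
        warnings.append("keyword stuffing: " + ", ".join(hits))
    if NEGATIVE.search(prompt):
        warnings.append("negative phrasing: restate affirmatively")
    if len(prompt.split()) > max_words:
        warnings.append(f"overloaded: more than {max_words} words")
    return warnings


print(lint_prompt("4k, hyper-realistic, cinematic, octane render, trending on artstation"))
print(lint_prompt("no extra fingers, no deformed faces"))
print(lint_prompt("Clean neutral grey background, studio lighting"))
```

The pattern deliberately leaves defensive "Do NOT ..." lines alone, since those are recommended above for holding a style.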

References

Max Woolf's deep analysis: minimaxir.com (Nano Banana prompts)
SCHEMA paper: arxiv.org/abs/2602.18903
Google DeepMind Prompt Guide: deepmind.google/models/gemini-image/prompt-guide/
awesome-nanobanana-pro (GitHub): curated prompt library by ZeroLu
Charlie Hills' Substack: multi-part prompting series

Cleanup After Download

After downloading any generated image from the browser:

Move immediately from Downloads to
Delete the original from Downloads
Downloads is transit, not storage. Clean up every time.
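The move-then-delete routine can be done in one call, since moving a file removes it from its original location. The destination folder is not specified in this note, so the sketch takes it as a required argument; `file_away` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def file_away(downloaded: Path, destination_dir: Path) -> Path:
    """Move a downloaded image out of Downloads; the original is removed by the move."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / downloaded.name
    if target.exists():
        # Refuse to overwrite an existing file with the same name.
        raise FileExistsError(target)
    shutil.move(str(downloaded), str(target))
    return target
```

A call would look like `file_away(Path.home() / "Downloads" / "panel-01.png", destination_dir)`, where both the file name and `destination_dir` are yours to supply.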
