Skill: Nano Banana Prompting
What
Gemini image models ("Nano Banana" = Flash, "Nano Banana Pro" = Pro) respond best to narrative-driven, structured prompts, not keyword stuffing. They are language models that generate images, so prompt them as if you were briefing a cinematographer, not tagging a search engine.
Model Landscape (as of March 2026)
| Community Name | Model | Notes |
|---|---|---|
| Nano Banana | Gemini 2.5 Flash Image | Original. Good quality, fast. |
| Nano Banana Pro | Gemini 3 Pro Image | Best quality. Best text rendering. Google Search grounding. |
| Nano Banana 2 | Gemini 3.1 Flash Image | Pro-level quality at Flash speed. Now default across Gemini apps. |
Max reference images: Up to 14 images can be sent alongside a prompt with Nano Banana Pro. Assign clear roles: "Image A for pose, Image B for style, Image C for environment."
Why
Gemini's image generation sits on top of its full multimodal encoder. It REASONS about the prompt (up to 32K tokens of input). This means:
- Long, structured prompts work better than short keyword lists
- Natural language descriptions beat comma-separated tags
- The model builds a 3D mental model of the scene before rendering
- It can handle complex spatial relationships if you describe them clearly
When to Use Nano Banana for Accuracy
Nano Banana is the BEST choice for images requiring precision — maps, charts, data visualizations, infographics, technical illustrations, architectural diagrams. The model's reasoning engine means it can:
- Place labels at correct geographic positions on maps
- Render data-accurate bar charts if you specify exact values
- Draw trade route lines between named points
- Create multi-layer infographics with proper spatial hierarchy
For GatorSquare investigations: Maps of oil routes, shipping lanes, military positions, financial flow diagrams, timeline visualizations — all of these should go through Nano Banana with structured narrative prompts, not keyword-based generation.
Always specify: "16:9 widescreen cinematic frame" for all GatorSquare images.
Keyword-style prompting (from Midjourney/SD era) actively hurts Gemini output.
How
Prompt Structure
Follow this order: Subject → Action/Context → Style/Medium → Lighting/Color → Camera/Composition
Or the simpler version: [What] + [Doing What] + [Where] + [How It Looks] + [Technical]
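A minimal sketch of that ordering, assuming you assemble prompts in Python. `PromptParts` and `build_prompt` are hypothetical helper names, not part of any Gemini SDK, and the example values are illustrative phrases borrowed from this guide.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """The five slots, in the order recommended above."""
    subject: str
    action_context: str
    style_medium: str
    lighting_color: str
    camera_composition: str


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots into sentences, preserving slot order."""
    ordered = [
        parts.subject,
        parts.action_context,
        parts.style_medium,
        parts.lighting_color,
        parts.camera_composition,
    ]
    sentences = [s.strip().rstrip(".") for s in ordered if s.strip()]
    return ". ".join(sentences) + "." if sentences else ""


# Illustrative values only.
example = build_prompt(PromptParts(
    subject="A man in a navy blue tweed blazer with brass buttons",
    action_context="sitting beside a white-sheeted hospital bed",
    style_medium="Disney Pixar 3D animation style",
    lighting_color="warm golden afternoon light from a window behind the bed",
    camera_composition="16:9 widescreen cinematic frame",
))
```

Keeping the slots as named fields makes it harder to drift back into an unordered tag list.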
The 15-50 Word Sweet Spot
For standard panels, 15-50 words of core description works best. Shorter prompts are ambiguous; longer ones risk conflicting instructions. Establishing shots and complex compositions are the exception: longer prompts are fine there, because Gemini can handle them.
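The word range can be checked mechanically before sending a prompt. This is a rough sketch: it counts whitespace-separated words, which is only an approximation of "core description", and `sweet_spot_status` is a hypothetical helper name.

```python
def core_word_count(description: str) -> int:
    """Count whitespace-separated words in the core description."""
    return len(description.split())


def sweet_spot_status(description: str, low: int = 15, high: int = 50) -> str:
    """Classify a description against the 15-50 word range for standard panels."""
    n = core_word_count(description)
    if n < low:
        return "too short"
    if n > high:
        return "long"  # acceptable for establishing shots, worth a second look otherwise
    return "ok"
```

A "long" result is a prompt to review, not an error, since establishing shots are allowed to run longer.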
Affirmative Over Negative
- YES: "Clean neutral grey background, studio lighting"
- NO: "Don't add busy backgrounds, no dark lighting"
- In testing, affirmative framing matched or beat negative framing in every comparison. It never lost.
Defensive Prompting
When you need a specific style maintained, explicitly ban shortcuts:
- "Do NOT add any artistic filter, do NOT stylize beyond the specified style"
- "Maintain the exact Disney Pixar 3D animation style — do not shift to photorealism or 2D illustration"
- Models take the easiest shortcut to satisfy a prompt. Defensive lines close those exits.
Color Hex Codes Over Color Names
Hex codes reduce ambiguity. "accent color #FF6B35" beats "orange accent". Use hex for any color that matters.
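One way to keep hex codes well-formed is to validate them before they reach the prompt. A small sketch, assuming six-digit `#RRGGBB` codes only; `color_phrase` is a hypothetical helper.

```python
import re


def color_phrase(role: str, hex_code: str) -> str:
    """Return a phrase such as 'accent color #FF6B35', rejecting malformed codes."""
    if not re.fullmatch(r"#[0-9A-Fa-f]{6}", hex_code):
        raise ValueError(f"Not a six-digit hex color: {hex_code!r}")
    return f"{role} color {hex_code.upper()}"
```

For example, `color_phrase("accent", "#ff6b35")` yields `accent color #FF6B35`, while a color name like `"orange"` raises instead of slipping through.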
Two-Step Method for Text in Images
When you need text rendered in the image (signs, labels, titles):
- First confirm the exact text content in pure text mode
- Then ask the model to render that confirmed text into the image
This separates spelling accuracy from image composition, dramatically reducing typos.
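The two steps can be kept as two separate prompt builders so the confirmed string is the only thing passed to the render step. The prompt wording below is an assumption for illustration, not wording taken from Google's documentation, and both function names are hypothetical.

```python
def confirm_text_prompt(text: str) -> str:
    """Step 1 (text mode): ask the model to echo the exact string for review."""
    return f'Repeat this text back exactly, character for character: "{text}"'


def render_text_prompt(confirmed_text: str, placement: str) -> str:
    """Step 2 (image mode): render only the string confirmed in step 1."""
    return (
        f'Render the text "{confirmed_text}" {placement.strip()}, '
        "spelled exactly as written."
    )
```

The second function should receive the string that came back from step 1, not a retyped copy.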
Material Specificity
Don't say generic things. Be surgically specific:
- NO: "a suit jacket" → YES: "navy blue tweed blazer with brass buttons"
- NO: "a hospital bed" → YES: "white-sheeted NHS hospital bed with chrome side rails, head elevated 30 degrees"
- NO: "warm lighting" → YES: "warm golden afternoon light from a window behind the bed, backlighting the subjects"
Identity Block Pattern (for character consistency)
Create a fixed text block per character. Repeat it VERBATIM in every prompt where that character appears. Only change action/pose/camera between frames:
IDENTITY BLOCK (Ovi):
"A 3-year-old South Asian girl with black wavy chin-length bob, sparkly butterfly clip on the RIGHT side, gap-tooth, enormous dark brown eyes, round toddler cheeks. Wearing purple pajamas with white stars, fluffy bunny slippers. Carrying a stuffed bunny toy."
FRAME 1: [IDENTITY BLOCK] + "mid-stride running away from camera, arms pumping"
FRAME 2: [IDENTITY BLOCK] + "at the bed edge, hands gripping white sheet, looking up"
FRAME 3: [IDENTITY BLOCK] + "pulling herself onto the bed, feet dangling"
Keep character tokens at the START of the prompt — token position affects weight.
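The pattern above can be sketched in Python by storing the identity block once and prepending it to each frame's delta. `frame_prompt` is a hypothetical helper; the block text and frame deltas are copied from the example above.

```python
# Fixed identity block: defined once, reused verbatim in every frame.
OVI_IDENTITY = (
    "A 3-year-old South Asian girl with black wavy chin-length bob, "
    "sparkly butterfly clip on the RIGHT side, gap-tooth, enormous dark brown eyes, "
    "round toddler cheeks. Wearing purple pajamas with white stars, "
    "fluffy bunny slippers. Carrying a stuffed bunny toy."
)

FRAME_DELTAS = [
    "mid-stride running away from camera, arms pumping",
    "at the bed edge, hands gripping white sheet, looking up",
    "pulling herself onto the bed, feet dangling",
]


def frame_prompt(identity_block: str, delta: str) -> str:
    """Put the identity block first, then the per-frame action/pose/camera delta."""
    return f"{identity_block} {delta.strip()}"


frames = [frame_prompt(OVI_IDENTITY, delta) for delta in FRAME_DELTAS]
```

Because the block is a single constant, wording cannot drift between frames.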
SCHEMA Framework (Advanced)
Three tiers of prompt control (from SCHEMA paper, validated across 4,800+ images):
- BASE (~5% control): Simple, short prompts. Use for transitions/simple scenes.
- MEDIO (~50% control): Structured with subject, action, style, lighting, camera.
- AVANZATO (~95% control): Full 7-component prompts with mandatory compliance directives.
This maps to GatorSquare model allocation:
- Flash panels = BASE/MEDIO prompts (short, delta-focused, chain carries the rest)
- Pro panels = AVANZATO prompts (full structured prompts with defensive lines)
Anatomical Anchoring (Hands & Bodies)
Extra hands/limbs are the #1 artifact. Prevent them proactively:
- Always pair "hands" with an action verb + spatial anchor: "right hand resting flat on table," "left hand gripping coffee cup handle," "both hands clasped in lap"
- Never mention "hands" alone — the model hallucinates extra hands when hands lack purpose
- Specify finger count for close-ups: "five-fingered hand" or "hand with five distinct digits"
- Describe occlusion: "left forearm partially hidden behind table edge" — occlusion implies depth and prevents duplicate limb generation
- Anchor body parts to fixed surfaces: "elbow resting on armrest," "feet flat on floor" — floating limbs breed duplicates
- For multi-character scenes: specify EACH character's hand state. "Character A: hands in lap. Character B: right hand on table, left hand holding cup."
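The "every character gets a hand state" rule is easy to enforce in code. A minimal sketch, assuming hand states are kept in a dict keyed by character label; `hand_state_line` is a hypothetical helper.

```python
def hand_state_line(hand_states: dict[str, str]) -> str:
    """Build one 'Name: hand state.' sentence per character; refuse empty states."""
    missing = [name for name, state in hand_states.items() if not state.strip()]
    if missing:
        raise ValueError(f"No hand state for: {', '.join(missing)}")
    return " ".join(
        f"{name}: {state.strip().rstrip('.')}."
        for name, state in hand_states.items()
    )
```

Called with the two characters from the last bullet, it reproduces the line "Character A: hands in lap. Character B: right hand on table, left hand holding cup."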
Multi-Character Scene Management
When two or more characters appear:
- Label each character by name at prompt start: "TWO people visible: RAHUL (buzz cut, leather jacket) on the LEFT, DEV (glasses, hoodie) on the RIGHT."
- Describe spatial relationship: "Rahul sits across the table from Dev, approximately 3 feet apart"
- Assign distinct actions to each: "Rahul is gesturing with his right hand while Dev leans back with arms crossed"
- Never leave a character's pose undefined — undefined characters default to generic poses that duplicate nearby elements
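A sketch of those rules as a prompt builder, assuming scenes of two to four characters. `Character`, `COUNT_WORDS`, and `scene_prompt` are hypothetical names; the spatial-relationship sentence is left for the caller to append.

```python
from dataclasses import dataclass

COUNT_WORDS = {2: "TWO", 3: "THREE", 4: "FOUR"}


@dataclass
class Character:
    name: str
    look: str      # short distinguishing features
    position: str  # e.g. "left", "right"
    action: str    # what this character is doing; must not be empty


def scene_prompt(characters: list[Character]) -> str:
    """Name and place every character up front, then give each a distinct action."""
    if len(characters) not in COUNT_WORDS:
        raise ValueError("This sketch handles 2 to 4 characters")
    for c in characters:
        if not c.action.strip():
            raise ValueError(f"{c.name} has no defined action")
    roster = ", ".join(
        f"{c.name.upper()} ({c.look}) on the {c.position.upper()}" for c in characters
    )
    header = f"{COUNT_WORDS[len(characters)]} people visible: {roster}."
    actions = " ".join(f"{c.name} {c.action.strip().rstrip('.')}." for c in characters)
    return f"{header} {actions}"
```

Raising on an empty action is the code equivalent of "never leave a character's pose undefined".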
Chained Prompt Discipline
For ref_panels chained prompts (panel-02 onward):
- Limit to 1-2 changes per prompt — the model carries everything else from the previous frame
- Restate character identity block even in chained prompts — chaining carries VISUAL style, not character identity
- Keep chained prompts under 80 words of core description — the reference image does most of the work
- "Same scene" + delta only: what MOVED, what CHANGED expression, what shifted position. Nothing else.
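The chained-prompt limits above can be enforced with a small builder. This is a sketch under stated assumptions: `chained_prompt` is a hypothetical helper, the word limit is applied to the delta only (not the restated identity block), and words are counted by whitespace.

```python
def chained_prompt(
    identity_block: str,
    changes: list[str],
    max_changes: int = 2,
    max_words: int = 80,
) -> str:
    """Restate the identity block, then 'Same scene' plus at most two deltas."""
    cleaned = [c.strip().rstrip(".") for c in changes if c.strip()]
    if not 1 <= len(cleaned) <= max_changes:
        raise ValueError(f"Expected 1 to {max_changes} changes, got {len(cleaned)}")
    delta = " ".join(f"{c}." for c in cleaned)
    if len(delta.split()) >= max_words:
        raise ValueError(f"Delta must stay under {max_words} words")
    return f"{identity_block.strip()} Same scene. {delta}"
```

A third change raises an error, which is a cue to split the edit across two panels.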
Reference Image Role Assignment
When sending multiple reference images:
- Explicitly state each image's role: "Reference Image 1: character face and build. Reference Image 2: character expressions and emotions. Reference Image 3: previous panel for scene continuity."
- Prioritize quality over quantity: 2-3 clear references beat 6 mediocre ones
- High-resolution references only: low-detail refs cause hallucinated features
- The LAST image's aspect ratio is adopted — always send the ref you want to match last
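Role assignment and send order can be handled together. A sketch, assuming references are tracked as simple records; `Reference`, `order_references`, and `role_text` are hypothetical names, and the 14-image cap is the Nano Banana Pro limit stated earlier in this guide.

```python
from dataclasses import dataclass

MAX_REFERENCES = 14  # limit stated earlier for Nano Banana Pro


@dataclass
class Reference:
    path: str
    role: str
    match_aspect: bool = False  # True for the one ref whose aspect ratio to adopt


def order_references(refs: list[Reference]) -> list[Reference]:
    """Move the aspect-ratio reference to the end, keeping the others in order."""
    if len(refs) > MAX_REFERENCES:
        raise ValueError(f"At most {MAX_REFERENCES} reference images")
    if sum(r.match_aspect for r in refs) > 1:
        raise ValueError("Only one reference can set the aspect ratio")
    # sorted() is stable and False sorts before True, so the flagged ref goes last.
    return sorted(refs, key=lambda r: r.match_aspect)


def role_text(ordered: list[Reference]) -> str:
    """Number the references in send order and state each one's role."""
    return " ".join(
        f"Reference Image {i}: {r.role.strip().rstrip('.')}."
        for i, r in enumerate(ordered, start=1)
    )
```

Numbering the roles after reordering keeps the prompt text consistent with the order in which the images are actually sent.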
Common Mistakes
- Keyword stuffing ("4k, hyper-realistic, cinematic, octane render, trending on artstation") — Gemini ignores or misinterprets this
- Negative prompts from Stable Diffusion era ("no extra fingers, no deformed faces") — use affirmative instead
- Under-specifying materials ("a dress" vs "a floor-length emerald silk gown with gold embroidery")
- Changing character description wording between frames — use EXACT same identity block
- Putting character description at the END of prompt instead of the beginning
- Overloading prompts — cramming 200+ words of conflicting instructions. Gemini starts ignoring half when prompts get too long. Focus on subject + action + style.
- Undefined hand positions — mentioning characters without specifying what their hands are doing. Every visible character needs hand-state defined.
- Full scene rewrites in chained prompts — when ref_panels is set, describe only what CHANGED. Full rewrites fight the reference image.
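Some of these mistakes can be caught with a rough pre-flight check. The sketch below is a heuristic, not a validator: `lint_prompt` is a hypothetical helper, the tag list comes from the keyword-stuffing example above (minus "cinematic", which this guide uses deliberately in the 16:9 frame line), and the negative-phrasing pattern only catches tag-style "no X" fragments, so deliberate defensive lines such as "Do NOT add any artistic filter" are not flagged.

```python
import re

STUFFING_TAGS = ("4k", "hyper-realistic", "octane render", "trending on artstation")

# Tag-style negatives: "no extra fingers" at the start or after , . ;
NEGATIVE_TAG = re.compile(r"(?:^|[,.;]\s*)no\s+\w+", re.IGNORECASE)


def lint_prompt(prompt: str, max_words: int = 200) -> list[str]:
    """Return warnings for keyword stuffing, tag-style negatives, and overloading."""
    warnings = []
    lowered = prompt.lower()
    found = [tag for tag in STUFFING_TAGS if tag in lowered]
    if found:
        warnings.append("keyword stuffing: " + ", ".join(found))
    if NEGATIVE_TAG.search(prompt):
        warnings.append("negative phrasing: restate affirmatively")
    if len(prompt.split()) >= max_words:
        warnings.append(f"overloaded: {max_words}+ words")
    return warnings
```

An empty list means none of these three heuristics fired; it does not mean the prompt is good.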
References
- Max Woolf's deep analysis: minimaxir.com (Nano Banana prompts)
- SCHEMA paper: arxiv.org/abs/2602.18903
- Google DeepMind Prompt Guide: deepmind.google/models/gemini-image/prompt-guide/
- awesome-nanobanana-pro (GitHub): curated prompt library by ZeroLu
- Charlie Hills' Substack: multi-part prompting series
Cleanup After Download
After downloading any generated image from the browser:
- Move immediately from Downloads to
- Delete the original from Downloads
- Downloads is transit, not storage. Clean up every time.
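The move-then-delete routine can be scripted. A minimal sketch using the standard library; `file_download` is a hypothetical helper, and the destination folder is a required argument because this guide does not name it.

```python
import shutil
from pathlib import Path


def file_download(downloaded: Path, destination_dir: Path) -> Path:
    """Move a downloaded image out of Downloads; the original is removed by the move."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / downloaded.name
    if target.exists():
        # Refuse to overwrite an existing file with the same name.
        raise FileExistsError(target)
    shutil.move(str(downloaded), str(target))
    return target
```

Because `shutil.move` relocates the file rather than copying it, nothing is left behind in Downloads after a successful call.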