aibody.art

This ComfyUI workflow is a simple text-to-image setup for running GGUF models. In the screenshot, it uses SWIFT! Fast and Detailed ZIT, a GGUF model from the Z Image Turbo (ZIT) ecosystem.

The workflow is intentionally minimal. It loads the GGUF model, loads CLIP, loads VAE, processes a positive and negative prompt, sets the image resolution and sampling steps, generates the image through KSampler, decodes the latent image with VAE, and finally saves the result.

It is a good starter workflow for users who want to test GGUF models in ComfyUI without building a complex node system from scratch.

Download Workflow: HERE

or HERE

 


What This Workflow Does

This workflow generates an image from a text prompt.

The basic generation path is:

GGUF Model → CLIP → Positive / Negative Prompt → KSampler → VAE Decode → Save Image

In the screenshot, the final image is generated at:

1088 × 1600 px

This is a vertical portrait format, suitable for fashion-style images, character portraits, editorial images, social media posts, and cover-style generations.


1. Unet Loader (GGUF)

The first node is Unet Loader (GGUF).

This node loads the main image generation model in GGUF format. In the screenshot, the selected model is:

swiftFastAndDetailed_v10Preview.gguf

This is the core model responsible for image generation. It controls the overall image quality, structure, detail level, style behavior, and how the prompt is interpreted.

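The GGUF format matters because it stores the model weights in quantized form, which reduces file size and VRAM use, usually at a small cost in precision. The Unet Loader (GGUF) node is not part of core ComfyUI; it is provided by the ComfyUI-GGUF custom node pack. In this workflow, the GGUF model is loaded directly through this node and then sent into the KSampler.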

In simple terms:

Unet Loader (GGUF) = loads the main AI image model

2. Load CLIP

The Load CLIP node loads the text encoder.

In the screenshot, the selected CLIP file is:

qwen_3_4b.safetensors

The type is set to:

lumina2

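The text encoder is responsible for understanding the text prompt. It converts the written prompt into conditioning data that the image model can use. Despite the node's name, the file loaded here is a Qwen3 4B language model used as the text encoder rather than an original CLIP model; ComfyUI uses "CLIP" as its general label for text encoders.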

Without CLIP, the model would not know what the user wants to generate. The text prompt, for example:

white uniform shirt
black pleated skirt
realistic snapshot
cinematic

is converted into a mathematical representation and then passed into the sampler.

In simple terms:

Load CLIP = loads the text understanding system

3. Load VAE

The Load VAE node loads the VAE model.

In the screenshot, the selected VAE file is:

ae.safetensors

VAE stands for Variational Autoencoder. In ComfyUI workflows, VAE is usually responsible for decoding the latent image into a normal visible image.

The KSampler does not directly create a normal image file. It creates a latent representation of the image. The VAE then converts that latent data into an actual image that can be previewed and saved.

The process looks like this:

latent image → VAE Decode → final image

In simple terms:

Load VAE = loads the decoder that turns latent data into a visible image

4. Load LoRA (Model and CLIP)

The screenshot also shows a Load LoRA (Model and CLIP) node.

However, this node appears to be inactive and is not part of the final generation path. It is greyed out in the screenshot, which suggests it is bypassed or muted: the workflow can support LoRA, but the current generation does not rely on it.

LoRA is used to modify the behavior of the base model. It can add a specific style, character type, visual theme, pose behavior, clothing style, lighting style, or detail enhancement.

In this workflow, LoRA is optional.

You can use it if you want to add an extra style or concept to the model. But for a clean and simple GGUF workflow, it can remain disabled.

In simple terms:

Load LoRA = optional style or concept modifier

5. CLIP Text Encode — Positive Prompt

The CLIP Text Encode (Positive Prompt) node contains the main prompt.

This is where the user describes what should appear in the image.

In the screenshot, the positive prompt describes a portrait-style image of a young woman wearing a white shirt and black pleated skirt, with photographic and cinematic quality keywords.

The prompt includes terms such as:

high quality
realistic snapshot
cinematic
amateur photo

This node takes the text prompt and sends it through CLIP. The output becomes the positive conditioning, which is then connected to the positive input of the KSampler.

In simple terms:

Positive Prompt = tells the model what to generate

6. CLIP Text Encode — Negative Prompt

The CLIP Text Encode (Negative Prompt) node contains the negative prompt.

This tells the model what it should avoid.

In the screenshot, the negative prompt includes:

underage
lowres
blurry
bad anatomy
deformed hands
extra fingers
low quality
artifacts
child

The negative prompt helps reduce common AI image problems such as blurry images, bad anatomy, broken hands, extra fingers, artifacts, or low-quality output. How strongly it acts depends on the CFG value set in the KSampler.

This text is also processed through CLIP, but the result is connected to the negative input of the KSampler.

In simple terms:

Negative Prompt = tells the model what to avoid

7. Width

The Width node controls the width of the generated image.

In the screenshot, the value is:

1088

This means the final image will be 1088 pixels wide.

The width value is sent into the latent image setup, which defines the canvas size before the image generation starts.

In simple terms:

Width = image width in pixels

8. Height

The Height node controls the height of the generated image.

In the screenshot, the value is:

1600

Together with the width, this creates a final image size of:

1088 × 1600 px

This is a vertical portrait ratio, useful for portrait photography, fashion images, character renders, social media covers, and vertical editorial-style images.

In simple terms:

Height = image height in pixels

9. Steps

The Steps node controls how many sampling steps the model uses during generation.

In the screenshot, the value is:

10

This is a relatively low number of steps, but it makes sense for Turbo-style models, which are designed to generate good results with fewer steps than older diffusion models.

More steps can sometimes improve detail, but they also increase generation time. For this workflow, 10 steps is a fast and practical setting.

In simple terms:

Steps = how many denoising iterations the sampler performs

10. KSampler

The KSampler is the main generation node.

This is where the actual image generation happens.

The KSampler receives:

model
positive conditioning
negative conditioning
latent image
seed
steps
CFG
sampler
scheduler
denoise

In the screenshot, the KSampler settings are:

seed: 396242236693412
control after generate: randomize
steps: 10
cfg: 1.0
sampler_name: euler
scheduler: simple
denoise: 1.00

Seed

The seed controls randomness.

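Using the same seed, prompt, model, resolution, and settings reproduces the same image, although small differences can appear across different hardware or software versions. In this workflow, control after generate is set to randomize, so the seed is replaced with a new random value after each run and every generation produces a different result.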

Seed = random starting point for the image
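
As a rough illustration of why a seed makes results repeatable, the sketch below uses Python's standard random module as a stand-in for the sampler's noise source. The make_noise helper is hypothetical and is not how ComfyUI generates latent noise internally; it only shows that the same seed yields the same starting noise.

```python
import random


def make_noise(seed, count=4):
    """Return `count` Gaussian samples from a generator seeded with `seed`.

    Hypothetical stand-in for the sampler's initial noise, for illustration only.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


seed = 396242236693412  # seed value shown in the screenshot
same_a = make_noise(seed)
same_b = make_noise(seed)
other = make_noise(seed + 1)

print(same_a == same_b)  # compares two runs with the same seed
print(same_a == other)   # compares runs with different seeds
```

To repeat a result in the workflow itself, the seed control would need to be switched from randomize to fixed before generating again.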

Steps

The sampler uses 10 steps.

Steps = 10

This keeps the generation fast.

CFG

CFG is set to:

1.0

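CFG controls how strongly the model follows the prompt. In many traditional Stable Diffusion workflows, CFG values are often higher, such as 5–8. Turbo-style models, however, are usually run at low CFG values. At exactly 1.0 the guided result equals the positive-prompt prediction, so the negative prompt has little or no influence on the output.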

CFG = prompt guidance strength
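
The effect of the CFG value can be sketched with the standard classifier-free guidance formula, uncond + cfg * (cond - uncond). The example below is a minimal illustration on made-up numbers, not ComfyUI's internal code: apply_cfg is a hypothetical helper, and the two short lists stand in for the model's predictions for the positive and negative prompts.

```python
def apply_cfg(cond, uncond, cfg):
    """Blend two predictions with classifier-free guidance.

    cond   -- prediction for the positive prompt (made-up values here)
    uncond -- prediction for the negative prompt (made-up values here)
    cfg    -- guidance scale
    """
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, -0.5, 0.25]
uncond = [0.5, 0.5, 0.25]

at_one = apply_cfg(cond, uncond, 1.0)  # the uncond terms cancel out
at_six = apply_cfg(cond, uncond, 6.0)  # pushed away from uncond

print(at_one)
print(at_six)
```

With cfg set to 1.0 the formula collapses to the positive prediction alone, which is why a negative prompt carries little weight at this setting. Higher values push the result further away from the negative prediction.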

Sampler

The sampler is set to:

euler

Euler is a simple and fast sampler. It is often used in workflows focused on speed.

Sampler = Euler
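
To give a feel for what an Euler sampler does, here is a toy one-dimensional sketch of the Euler update used in diffusion sampling: estimate a direction from the current value and the denoised prediction, then step to the next noise level. Everything in it is an assumption for illustration: the toy_denoiser always predicts the same target value, and the linearly spaced sigma list is not ComfyUI's simple scheduler.

```python
def euler_sample(x, sigmas, denoiser):
    """Run plain Euler steps from sigmas[0] down to sigmas[-1]."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma)     # model's guess at the clean value
        d = (x - denoised) / sigma        # direction toward higher noise
        x = x + d * (sigma_next - sigma)  # step to the next noise level
    return x


TARGET = 0.75  # hypothetical "clean image" value


def toy_denoiser(x, sigma):
    """Stand-in for the diffusion model: always predicts TARGET."""
    return TARGET


steps = 10
# Assumed schedule: noise levels evenly spaced from 1.0 down to 0.0.
sigmas = [1.0 - i / steps for i in range(steps + 1)]

result = euler_sample(2.0, sigmas, toy_denoiser)
print(result)
```

Each step needs only one model evaluation, which is part of why Euler is a common choice for speed-focused workflows.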

Scheduler

The scheduler is set to:

simple

The scheduler determines the noise level assigned to each sampling step, that is, how quickly noise is removed between the first step and the last.

Scheduler = Simple

Denoise

Denoise is set to:

1.00

A denoise value of 1.00 means the image is generated fully from noise. This is standard for text-to-image generation.

Denoise 1.00 = full generation from scratch

11. Latent Image

The workflow generates the image in latent space first.

A latent image is not a normal visible image yet. It is an internal representation used by the diffusion model during generation.

The latent size is based on:

width: 1088
height: 1600

The KSampler creates the latent result, and then the VAE converts it into a visible image.

In simple terms:

Latent Image = the hidden image representation before VAE decoding
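
The latent is usually much smaller than the final image, because the VAE works at a reduced spatial resolution. The sketch below computes the latent grid size under the assumption of an 8x downscale per side, which is common for this family of VAEs but is not stated in the screenshot. The latent_size helper is hypothetical.

```python
def latent_size(width, height, downscale=8):
    """Return the latent grid size for a given pixel size.

    `downscale` is an assumed VAE factor (8 is common); check the model's
    documentation for the real value.
    """
    if width % downscale or height % downscale:
        raise ValueError("width and height should be multiples of the downscale factor")
    return width // downscale, height // downscale


print(latent_size(1088, 1600))  # (136, 200)
```

Under that assumption the sampler works on a 136 × 200 grid rather than on 1088 × 1600 pixels, which is one reason generation in latent space is fast.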

12. VAE Decode

The VAE Decode step converts the latent image into a normal image.

Although the VAE Decode node is partly hidden in the screenshot, its function is clear from the connections.

It receives:

samples from KSampler
VAE from Load VAE

Then it outputs the final visible image.

The process is:

KSampler samples → VAE Decode → image

In simple terms:

VAE Decode = turns the generated latent into a visible image

13. Save Image

The final node is Save Image.

This node saves the generated image to disk.

In the screenshot, the filename prefix is:

Image-%date:yyyyMMddhhmmss%

This means the saved image file will start with Image- and then include the current date and time. The timestamp makes files easy to sort and identify. ComfyUI also appends a counter to the filename, so new generations do not overwrite previous images.

The preview inside the Save Image node shows the final generated image at:

1088 × 1600 px

In simple terms:

Save Image = saves the final generated image file
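
The filename prefix pattern can be approximated in plain Python as follows. This is a sketch of how the %date:...% placeholder might expand, not ComfyUI's own code: expand_prefix is a hypothetical helper, and the token meanings (in particular hh as a 24-hour value) are assumptions.

```python
import re
from datetime import datetime

# Assumed meaning of each token in the %date:...% pattern.
TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "hh": "%H", "mm": "%M", "ss": "%S"}


def expand_prefix(prefix, now):
    """Replace every %date:FORMAT% placeholder with the formatted time."""
    def expand_date(match):
        pattern = match.group(1)
        # Translate the tokens into strftime codes; "yyyy" is matched first.
        strftime_pattern = re.sub(
            "yyyy|MM|dd|hh|mm|ss", lambda m: TOKENS[m.group()], pattern
        )
        return now.strftime(strftime_pattern)

    return re.sub(r"%date:([^%]+)%", expand_date, prefix)


example_time = datetime(2025, 1, 2, 3, 4, 5)
print(expand_prefix("Image-%date:yyyyMMddhhmmss%", example_time))
# Image-20250102030405
```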

Full Workflow Process

The entire workflow can be understood as a simple step-by-step generation pipeline:

1. Unet Loader loads the GGUF model.
2. Load CLIP loads the text encoder.
3. Load VAE loads the image decoder.
4. Positive Prompt is converted into positive conditioning.
5. Negative Prompt is converted into negative conditioning.
6. Width and Height define the image size.
7. Steps defines how many sampling steps will be used.
8. KSampler generates the image in latent space.
9. VAE Decode converts the latent image into a visible image.
10. Save Image saves the final result.
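
The steps above can also be written down as a node graph in the style of ComfyUI's API workflow format, where each node has a class type and inputs, and a link is a pair of source node ID and output index. The sketch below is an approximation built from the screenshot values, not the downloadable workflow file: the node IDs are arbitrary, the EmptyLatentImage node is an assumption (the article does not name the latent node), the prompts are shortened, and the separate Width, Height, and Steps nodes are folded into plain values.

```python
workflow = {
    "1": {"class_type": "UnetLoaderGGUF",  # from the ComfyUI-GGUF node pack
          "inputs": {"unet_name": "swiftFastAndDetailed_v10Preview.gguf"}},
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "qwen_3_4b.safetensors", "type": "lumina2"}},
    "3": {"class_type": "VAELoader",
          "inputs": {"vae_name": "ae.safetensors"}},
    "4": {"class_type": "CLIPTextEncode",  # positive prompt (shortened)
          "inputs": {"text": "white uniform shirt, black pleated skirt, "
                             "realistic snapshot, cinematic",
                     "clip": ["2", 0]}},
    "5": {"class_type": "CLIPTextEncode",  # negative prompt (shortened)
          "inputs": {"text": "lowres, blurry, bad anatomy", "clip": ["2", 0]}},
    "6": {"class_type": "EmptyLatentImage",  # assumed latent node
          "inputs": {"width": 1088, "height": 1600, "batch_size": 1}},
    "7": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["4", 0],
                     "negative": ["5", 0], "latent_image": ["6", 0],
                     "seed": 396242236693412, "steps": 10, "cfg": 1.0,
                     "sampler_name": "euler", "scheduler": "simple",
                     "denoise": 1.0}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0],
                     "filename_prefix": "Image-%date:yyyyMMddhhmmss%"}},
}


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and value[0] not in graph:
                missing.append((node_id, name))
    return missing


print(dangling_links(workflow))
```

Writing the graph out this way makes the data flow explicit: the two loaders feed the prompt encoders and the sampler, and only the sampler's latent output reaches VAE Decode and Save Image.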

Recommended Settings from the Screenshot

Based on the screenshot, the workflow uses the following setup:

Model: swiftFastAndDetailed_v10Preview.gguf
CLIP: qwen_3_4b.safetensors
VAE: ae.safetensors
Sampler: Euler
Scheduler: Simple
CFG: 1.0
Steps: 10
Denoise: 1.00
Resolution: 1088 × 1600

These settings are designed for fast generation while keeping the image detailed and clean.


Who Is This Workflow For?

This workflow is useful for users who:

- want to run GGUF models in ComfyUI,
- want a simple text-to-image workflow,
- are testing Z Image Turbo-style models,
- want fast image generation,
- want to understand the basic ComfyUI generation pipeline,
- prefer a clean node setup without too many advanced modules.

It is especially useful as a beginner-friendly workflow because every important part of the image generation process is visible and easy to understand.


What Can Be Added Later?

This workflow is simple, but it can be expanded.

Possible upgrades include:

- image upscale after generation,
- Face Detailer for improving faces,
- Hand Detailer for better hands,
- active LoRA support,
- batch generation,
- image preview before saving,
- automatic prompt saving,
- ControlNet,
- IPAdapter,
- custom resolution presets,
- high-resolution fix workflow.

These additions can make the workflow more powerful, but the current version is better for learning and testing because it stays clean and easy to read.


Summary

This ComfyUI workflow is a simple and practical GGUF text-to-image setup. It uses a GGUF UNet model, CLIP text encoding, VAE decoding, positive and negative prompts, resolution controls, KSampler generation, and final image saving.

The main advantage of this workflow is simplicity. It shows the full generation path clearly:

model → prompt → sampler → VAE → saved image

For users who want to test the SWIFT! Fast and Detailed ZIT GGUF model or similar Z Image Turbo GGUF models, this is a good starting point. It is fast, easy to understand, and simple to modify later.
