ComfyUI Hires Fix Latent Upscaling Guide
After using ComfyUI for image generation over the past year, I've watched countless creators struggle with one frustrating limitation. You generate a beautiful image at 512x512 or 768x768, only to find it falls apart when you need print quality or large format output. Standard upscaling destroys the fine details that make AI art compelling.
Hires Fix changes everything. This ComfyUI technique performs latent space upscaling during image generation, allowing for higher resolution output with significantly better quality than traditional post-generation upscaling methods.
Hires Fix is a ComfyUI technique that generates high-resolution images by upscaling in latent space during the generation process. It first creates a base image at lower resolution, then upscales it in latent space before applying final detail passes with denoising.
When I first switched from Automatic1111 to ComfyUI, the learning curve felt steep. The node-based workflow system confused me, and Hires Fix seemed like an advanced feature I'd never master. After dozens of failed attempts and blurry outputs, I finally cracked the code. This guide shortcuts that learning process for you.
By the end of this article, you'll know how to build a working Hires Fix workflow from scratch, understand every parameter, and troubleshoot common issues. If you're new to local AI image generation, I recommend starting there before diving into ComfyUI workflows.
Understanding Latent Upscaling
Latent Space: The compressed mathematical representation where AI models like Stable Diffusion actually process images. Working in latent space is more efficient than pixel space because it requires less computation while preserving the essential image information.
Latent upscaling works differently than traditional upscaling methods. Instead of enlarging pixels after generation, it scales up the latent representation itself. This preserves the model's understanding of image content and allows the diffusion process to add appropriate details at the higher resolution.
Think of it this way. Pixel upscaling is like taking a small photo and stretching it. The gaps between pixels get filled with guesses, often resulting in blur or artifacts. Latent upscaling is like having the artist paint the image at a larger size from the start. The model adds detail consistent with the original generation.
| Feature | Latent Upscaling | Pixel Upscaling |
|---|---|---|
| When Applied | During generation | After generation completes |
| Detail Quality | High - model adds consistent details | Variable - depends on upscaler model |
| Processing Time | Longer - integrated into generation | Faster - separate quick pass |
| VRAM Usage | Higher - processes larger tensors | Lower - works on finished image |
| Style Consistency | Excellent - same model generates details | Good - but may introduce artifacts |
In my experience testing both methods extensively, latent upscaling produces noticeably better results for AI-generated images. The details feel more natural because the same model that created the base image is adding the finer details. Pixel upscalers like ESRGAN work well, but they sometimes add textures that don't match the generation style.
Key Takeaway: "Latent upscaling in ComfyUI produces higher quality results than traditional upscaling because it lets the diffusion model add details at the target resolution, maintaining consistency with the original generation style."
Setup and Prerequisites
Before building your Hires Fix workflow, ensure your system meets the requirements. Hires Fix is more demanding than standard generation because it processes larger tensors in latent space.
Quick Summary: You'll need ComfyUI installed, an NVIDIA GPU with at least 8GB VRAM (12GB recommended), and an upscaler model. AMD GPUs can work but may require additional configuration.
System Requirements
- GPU: NVIDIA GPU with 8GB+ VRAM minimum. For 1024x1024 output with SDXL, 12GB+ is recommended. Check our guide to the best GPUs for AI image generation if you're planning an upgrade.
- RAM: 16GB system RAM minimum, 32GB recommended for larger batches.
- Storage: 10GB+ free space for models and upscalers.
- Software: ComfyUI installed and working with basic text-to-image generation.
Installing Upscaler Models
ComfyUI includes basic upscaling methods, but for best results you'll want dedicated upscaler models. Download these from Civitai or Hugging Face and place them in your ComfyUI models directory.
The default path for upscaler models is typically ComfyUI/models/upscale_models/. Some popular options include 4x-UltraSharp for photos, Real-ESRGAN for general use, and ESRGAN Anime for illustrated content.
Building Your Hires Fix Workflow Step by Step
Now let's build the complete Hires Fix workflow from scratch. I've tested this workflow across multiple projects and it consistently produces excellent results. If you're interested in other ComfyUI workflows, check out our Qwen Image Edit guide for advanced editing techniques.
- Load Checkpoint: Add the "Load Checkpoint" node and select your Stable Diffusion model. This provides both the model and CLIP needed for generation.
- Load Positive and Negative Prompts: Add two "CLIP Text Encode" nodes. Connect the CLIP output from your checkpoint to both. Enter your positive prompt in one and negative prompt in the other.
- Create Empty Latent Image: This node defines your base resolution. For Hires Fix, start with a smaller base like 512x512 or 640x640. The upscale step will increase this to your target resolution.
- First KSampler (Base Generation): Add a KSampler node and connect:
- Model from checkpoint to model input
- Positive prompt to positive input
- Negative prompt to negative input
- Empty latent image to latent image input
Set this KSampler to generate your base image with 20-30 steps.
- VAE Decode (Optional Preview): Connect the first KSampler output to a VAE Decode node if you want to preview the base image. This helps verify the composition before upscaling.
- Latent Upscale Node: Add the "Latent Upscale" node (sometimes called "ImageScale" or "Upscale Latent"). Connect the output from your first KSampler to the latent input. Set your target width and height here - this is your final output resolution.
- Second KSampler (Hires Pass): Add another KSampler for the detail pass. Connect the upscaled latent output to this KSampler's latent image input. Use the same model, positive prompt, and negative prompt as the first KSampler.
- Final VAE Decode: Connect the second KSampler output to your final VAE Decode node to convert the latent back to a viewable image.
- Save Image: Connect the VAE decode output to a "Save Image" node to output your final high-resolution result.
Important: Always use the same seed for both KSamplers when you want consistent results. If you want variation in the detail pass, you can slightly modify the seed, but drastic changes will create inconsistent output.
Hires Fix Parameters Explained
Understanding these parameters makes the difference between blurry and sharp output. After testing hundreds of combinations, here are my recommendations.
| Parameter | Description | Recommended | Effect |
|---|---|---|---|
| Upscale Factor | How much to enlarge the latent | 1.5x - 2.0x | Higher = more detail increase |
| Base Width/Height | Starting resolution | 512-768 | Larger base = longer generation |
| Target Width/Height | Final output resolution | 1024-1536 | Determined by upscale factor |
| Hires Denoising | How much to change during detail pass | 0.3-0.5 | Lower = more consistent, Higher = more detail |
| Hires Steps | Sampling steps for detail pass | 15-25 | More steps = finer detail, longer time |
| Upscaler Model | Model used for latent upscaling | 4x-UltraSharp or nearest-exact | Affects quality and style of upscale |
Denoising Strength Deep Dive
Denoising strength during the Hires pass is the most critical parameter to understand. This controls how much the second KSampler can modify the upscaled latent.
Set denoising too low (0.1-0.2) and the upscaling won't add meaningful detail. The image will look like a simple enlargement of the base. Set it too high (0.7+) and you'll lose coherence with the base generation. The image might change completely rather than enhance.
I've found 0.35-0.45 to be the sweet spot for most use cases. Portraits benefit from slightly lower denoising (0.3-0.35) to preserve facial features. Landscapes and abstract art can use higher denoising (0.4-0.5) for more dramatic detail enhancement.
Choosing the Right Upscaler Model
The upscaler model you choose significantly impacts your final output quality. After testing dozens of options across different image types, here's what I've learned.
| Model | Best For | Strengths | Weaknesses |
|---|---|---|---|
| 4x-UltraSharp | Photography, portraits | Excellent detail retention, sharp results | Can oversharpen some art styles |
| Real-ESRGAN | General purpose | Balanced across all content types | Not specialized for any particular style |
| ESRGAN Anime | Anime, illustrations | Preserves line art, enhances cel shading | Not ideal for photorealistic content |
| Nearest-Exact | Latent upscaling (built-in) | Fast, preserves latent structure | Less detail enhancement than models |
| Bicubic/Lanczos | Quick previews | Very fast, minimal VRAM | Lowest quality output |
For my work, I use 4x-UltraSharp for 80% of projects. It produces consistently sharp results without introducing obvious artifacts. The only time I switch away is for anime-style content, where ESRGAN Anime handles line art much better.
VRAM Optimization Tips
Running Hires Fix with limited VRAM can be frustrating. I've spent countless hours optimizing workflows to run on 8GB GPUs. Here's what works.
VRAM Usage by Resolution
~8GB VRAM
~12GB VRAM
~16GB+ VRAM
Memory-Saving Techniques
- Use tiled VAE decoding: Some VAE models support tiled decoding, which processes the image in chunks rather than all at once. This can reduce peak VRAM usage by 30-40%.
- Reduce batch size: If you're generating multiple images, reduce to batch size 1. Each additional image multiplies VRAM requirements.
- Lower base resolution: Start at 512x512 instead of 768x768. The final output will still be high resolution after upscaling.
- Use fp16 precision: Enable half-precision mode in ComfyUI settings if available. This cuts VRAM usage roughly in half with minimal quality loss.
- Clear cache between generations: ComfyUI can cache tensors in memory. Manually clearing or restarting between large jobs can help.
If you're still struggling with memory issues, our guide on fixing low VRAM memory errors covers additional techniques specifically for Stable Diffusion-based tools.
SDXL Hires Fix Considerations
Working with SDXL models requires some adjustments to the Hires Fix approach. After extensive testing with SDXL workflows, I've found several key differences from SD 1.5 models.
SDXL's native resolution is 1024x1024, compared to 512x512 for SD 1.5. This means your base resolution should be higher when using SDXL with Hires Fix. I typically start at 768x768 or 896x896, then upscale to 1536x1536 or beyond.
The denoising strength for SDXL Hires Fix should generally be lower than for SD 1.5. I use 0.25-0.35 for SDXL compared to 0.35-0.45 for SD 1.5. SDXL already generates more detail at base resolution, so it needs less enhancement during the Hires pass.
Pro Tip: SDXL models include a refiner model that can be used instead of traditional Hires Fix. The refiner runs at the target resolution and adds fine details. For more SDXL-specific workflows, see our ComfyUI SDXL anime guide.
Troubleshooting Common Issues
Despite following all the steps correctly, things can still go wrong. I've encountered every issue below multiple times. Here's how to fix them.
| Problem | Cause | Solution |
|---|---|---|
| Blurry output | Denoising too low or upscaler issue | Increase Hires denoising to 0.4-0.5, try different upscaler model |
| Output changed completely | Denoising too high | Reduce Hires denoising to 0.25-0.35 |
| VRAM out of memory error | Resolution too high for GPU | Lower target resolution or use VRAM optimization techniques |
| Checkerboard pattern | Upscaler model issue | Switch to nearest-exact or different upscaler model |
| Doubled features/objects | Second KSampler added new elements | Lower Hires denoising, reduce Hires steps |
| Workflow not loading | Missing custom nodes | Install required nodes via ComfyUI Manager |
Frequently Asked Questions
What is Hires Fix in ComfyUI?
Hires Fix is a ComfyUI technique for high-resolution image generation that upscales images in latent space during the generation process, producing superior detail preservation compared to traditional post-generation upscaling methods.
How does Hires Fix improve image quality?
Hires Fix improves quality by upscaling in latent space before final detail passes, allowing the diffusion model to add appropriate details at the target resolution rather than simply enlarging pixels after generation completes.
What are the best Hires Fix settings for ComfyUI?
For most use cases, start with base resolution of 512-768, upscale factor of 1.5-2x, Hires denoising of 0.35-0.45, and 15-25 Hires steps. Adjust denoising lower for portraits (0.3-0.35) and higher for landscapes (0.4-0.5).
What upscaler models work with ComfyUI Hires Fix?
Popular upscaler models include 4x-UltraSharp for photography and portraits, Real-ESRGAN for general use, ESRGAN Anime for illustrations, and the built-in nearest-exact for fast latent upscaling with minimal VRAM usage.
Why is my Hires Fix output blurry?
Blurry output usually means your Hires denoising is too low (below 0.3) or you're using an inappropriate upscaler model. Increase denoising to 0.4-0.5 and try switching to 4x-UltraSharp or Real-ESRGAN.
How much VRAM do I need for Hires Fix?
Minimum 8GB VRAM for 512x768 to 1024x1536 upscaling, 12GB recommended for 768x768 to 1536x1536. For larger resolutions like 2048x2048, you'll need 16GB+ VRAM or memory optimization techniques.
Can I use Hires Fix with SDXL models?
Yes, Hires Fix works with SDXL but requires adjustments. Start with higher base resolution (768x768+) and use lower denoising strength (0.25-0.35) since SDXL already generates more detail at native resolution compared to SD 1.5 models.
Final Recommendations
Mastering Hires Fix in ComfyUI takes practice, but the results are worth the effort. The workflow I've shared has been refined through hundreds of generations across multiple projects. Start with the recommended settings, then adjust based on your specific use case.
Remember that Hires Fix is fundamentally about letting the diffusion model add details at your target resolution. The denoising strength parameter controls how much freedom the model has during this process. Lower values preserve the base image structure, while higher values allow more dramatic detail enhancement.
For print work, I always use Hires Fix over traditional upscaling. The difference in quality at 300 DPI is significant. For web content and social media, standard upscaling might suffice, but Hires Fix still produces noticeably better results.
The best way to learn is to experiment. Try different upscaler models, adjust denoising in small increments, and compare results. Save workflows that work well for specific types of content. Over time, you'll develop intuition for what settings work best for your style.
