SillyTavern and KoboldCPP Full Starter Guide: Local AI Roleplay Setup

What Are SillyTavern and KoboldCPP?

Local AI roleplay has exploded in popularity during 2026, and for good reason. After spending months testing various setups, I’ve found that the SillyTavern and KoboldCPP combination offers the best balance of features, performance, and ease of use for beginners and advanced users alike.

This setup gives you complete privacy. Everything runs on your computer with no data sent to cloud services. You also get unlimited usage without API limits or monthly subscriptions, and freedom from content censorship found in commercial AI services.

In this guide, I’ll walk you through everything you need to know to set up your own local AI roleplay system. By the end, you’ll have a working installation and your first character ready to chat.

Quick Answer: You can run this setup on any modern computer with 8GB+ RAM. A GPU helps but isn’t required. The entire installation takes about 15-30 minutes.

Prerequisites & System Requirements

Quick Summary: You need a computer with 8GB+ RAM (16GB recommended), 50GB+ free storage, and a 64-bit operating system. NVIDIA GPUs work best, but AMD and CPU-only setups are supported.

Before diving in, let’s make sure your system can handle local AI inference. I’ve tested this on various hardware configurations, and here’s what I’ve learned about realistic requirements.

Minimum vs Recommended Specifications

Component   Minimum          Recommended        Optimal
RAM         8GB              16GB               32GB+
VRAM        None (CPU)       8GB                12GB+
Storage     30GB free        50GB free          100GB+ SSD
OS          Windows 10/11    Windows 11/Linux   Linux (Ubuntu)

Hardware Tier Recommendations

Basic Tier (8GB RAM, No GPU)

Run 7B parameter models at Q4 quantization. Expect 3-8 tokens per second. Perfect for casual roleplay and experimentation.

Mid Tier (16GB RAM, 8GB VRAM)

Run 13B models comfortably at Q4. Expect 15-30 tokens per second. Great balance for most users.

High Tier (32GB RAM, 16GB+ VRAM)

Run 30B+ models at higher quantization. Expect 30+ tokens per second. For power users and quality-focused roleplay.
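If you want to sanity-check whether a given model fits your hardware before downloading it, a rough rule of thumb is parameters × bits-per-weight ÷ 8, plus overhead for context buffers. A minimal Python sketch, where the bits-per-weight figures and the 20% overhead are my approximations rather than exact GGUF numbers:

```python
# Rough memory estimate for a quantized model: parameters * bits-per-weight / 8,
# plus ~20% overhead for context and runtime buffers. The bits-per-weight
# values below are approximations, not exact GGUF figures.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.5, "Q8": 8.5}

def estimate_ram_gb(params_billions: float, quant: str) -> float:
    """Approximate RAM/VRAM needed to load the model, in gigabytes."""
    bits = BITS_PER_WEIGHT[quant]
    weights_gb = params_billions * 1e9 * bits / 8 / 1e9
    return round(weights_gb * 1.2, 1)  # +20% for KV cache and buffers

print(estimate_ram_gb(7, "Q4"))   # roughly 4.7 GB: fits the 8GB basic tier
print(estimate_ram_gb(13, "Q4"))  # roughly 8.8 GB: wants the 16GB mid tier
```

The estimates line up with the tier descriptions above: a 7B Q4 model leaves headroom on an 8GB machine, while 13B Q4 really does need 16GB.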

Operating System Requirements

This setup works on Windows 10/11 and most Linux distributions. macOS is supported with some limitations, particularly on Apple Silicon (M1/M2) Macs, which have their own setup requirements. I primarily use Windows 11 for testing, and these instructions reflect that experience.

For Windows users, you’ll need the Visual C++ Redistributable. Most gaming PCs already have this, but if you see errors about missing DLLs, download it from Microsoft’s official website.

Installing KoboldCPP (The Backend)

Quick Summary: Download KoboldCPP from GitHub, extract the files, and run the executable. The default settings work for most users. Configure GPU acceleration in the launcher window.

KoboldCPP is the backend that actually runs the AI model. Think of it as the engine that powers everything. I’ve installed KoboldCPP dozens of times across different computers, and the process is straightforward once you know what to look for.

Step 1: Download KoboldCPP

  1. Visit the official GitHub repository: Go to github.com/LostRuins/koboldcpp
  2. Navigate to Releases: Click on “Releases” in the right sidebar or look for the latest release link
  3. Download the correct build: KoboldCPP ships as a single standalone executable. On Windows, grab koboldcpp.exe (or the nocuda variant if you don't have an NVIDIA GPU); on Linux, grab the koboldcpp-linux-x64 binary
  4. Place the file: Create a folder named "KoboldCPP" and put the executable there – there's nothing to extract or install

Pro Tip: Create a dedicated “AI” folder in your Documents or on a drive with plenty of space. I keep mine at D:\AI\KoboldCPP to keep things organized.

Step 2: Launch KoboldCPP for the First Time

Run koboldcpp.exe on Windows, or run the Linux binary from a terminal after marking it executable with chmod +x. You'll see a launcher window with several options. Don't worry about most settings yet.

For your first run, use these default settings:

  • Threads: Auto (or leave at default)
  • GPU Layers: Auto detect
  • Context Size: 2048 (good starting point)
  • Batch Size: 512

Step 3: Configure GPU Acceleration (If Available)

If you have an NVIDIA GPU, KoboldCPP should detect it automatically. Look for “CUDA” or “cuBLAS” in the backend selection. For AMD GPUs, you’ll need ROCm installed, which is more complex.

GPU Layers: This setting controls how many model layers run on your GPU versus CPU. More GPU layers = faster response times. Start with "-1", which lets KoboldCPP estimate how many layers fit on your card; if you still get out-of-memory errors, set a lower explicit number.

Step 4: Start the Server

Click “Launch” to start KoboldCPP. A command window will open showing server status. Look for the line that says something like:

Server listening on http://127.0.0.1:5001

This means KoboldCPP is running and ready to accept connections. Keep this window open in the background.
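You can also confirm the server is up from outside the browser. Here is a small Python sketch that queries the KoboldAI-style /api/v1/model endpoint KoboldCPP exposes; adjust the base URL if you changed the port in the launcher:

```python
# Minimal health check against KoboldCPP's local API. The /api/v1/model
# endpoint is part of the KoboldAI-compatible API that KoboldCPP serves.
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5001"

def parse_model_response(raw: bytes) -> str:
    """The endpoint returns JSON shaped like {"result": "<model name>"}."""
    return json.loads(raw)["result"]

def check_server(base_url: str = BASE_URL):
    try:
        with urlopen(f"{base_url}/api/v1/model", timeout=5) as resp:
            return parse_model_response(resp.read())
    except OSError:
        return None  # server not running, or wrong host/port

if __name__ == "__main__":
    model = check_server()
    print(f"Connected, model: {model}" if model else "KoboldCPP not reachable")
```

If this prints "not reachable" while the launcher window is open, double-check the port number shown in KoboldCPP's console output.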

Installing SillyTavern (The Frontend)

Quick Summary: Download SillyTavern from GitHub, extract to a folder, and run Start.bat (Windows) or the shell script (Linux/Mac). SillyTavern runs in your browser at localhost:8000.

SillyTavern is the user interface where you’ll create characters, manage chats, and interact with your AI models. I find it much more polished and feature-rich than alternatives.

Step 1: Download SillyTavern

  1. Visit the official repository: Go to github.com/SillyTavern/SillyTavern
  2. Download the latest release: Click “Code” then “Download ZIP” or use the direct download link from Releases
  3. Extract to a folder: I recommend D:\AI\SillyTavern or similar location

Step 2: Launch SillyTavern

On Windows, simply double-click Start.bat. The batch file installs dependencies and launches the server, but it does need Node.js on your system first – install it from nodejs.org if the script complains. On Linux or Mac, run start.sh from a terminal.

A command window will open showing initialization progress. Once complete, it will display something like:

SillyTavern is listening on: http://localhost:8000

Step 3: Open SillyTavern in Your Browser

Your default browser should open automatically to the SillyTavern interface. If not, navigate to http://localhost:8000 manually.

Important: SillyTavern runs locally in your browser. No internet connection is required for basic operation, though you’ll need it to download models initially.

Connecting SillyTavern to KoboldCPP

Quick Summary: In SillyTavern, open the API connections panel (the plug icon), select KoboldCPP as the backend, enter the URL "http://127.0.0.1:5001", and click Connect. The status should change to "Connected".

Now comes the crucial part: connecting the two applications. I’ve seen many beginners get stuck here, but the process is simple once you understand it.

Step 1: Open Connection Settings

  1. In SillyTavern, click the plug icon in the top bar to open the API connections panel
  2. Select "Text Completion" as the API, then choose "KoboldCpp" as the API type
  3. Enter the KoboldCPP URL: http://127.0.0.1:5001
  4. Click "Connect"

Step 2: Verify the Connection

After a moment, the status should change from “Disconnected” to “Connected” with a green indicator. If you see an error, check that KoboldCPP is still running in the background.

Once connected, you’ll see additional options appear in SillyTavern’s interface, including the ability to load models and adjust generation settings.

Not Connecting? Make sure both windows are open. Check that no firewall is blocking localhost connections. Try restarting KoboldCPP and SillyTavern in that order.

Step 3: Test with a Simple Prompt

Before loading a full model, you can test the connection by sending a simple message. Create a temporary character or use the default one, then type “Hello” in the chat box.

If everything is working, you’ll see a loading indicator followed by a response. If you get an error, double-check your connection settings.
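If you prefer to test the backend directly, you can send a raw prompt to KoboldCPP's generate endpoint and skip the UI entirely. This sketch uses the KoboldAI-style API fields; exact field names can vary by version, so verify them against your KoboldCPP build's API documentation:

```python
# Send a bare test prompt to KoboldCPP's KoboldAI-style generate endpoint.
import json
from urllib.request import Request, urlopen

def build_generate_request(prompt: str, max_length: int = 80) -> dict:
    """Minimal payload; KoboldCPP fills in defaults for omitted samplers."""
    return {"prompt": prompt, "max_length": max_length, "temperature": 1.0}

def extract_text(response_json: dict) -> str:
    # Responses are shaped like {"results": [{"text": "..."}]}
    return response_json["results"][0]["text"]

def generate(prompt: str, base_url: str = "http://127.0.0.1:5001") -> str:
    payload = json.dumps(build_generate_request(prompt)).encode()
    req = Request(f"{base_url}/api/v1/generate", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=120) as resp:  # generation can be slow on CPU
        return extract_text(json.load(resp))

if __name__ == "__main__":
    print(generate("Hello"))
```

Getting text back here but errors in SillyTavern points at the frontend's connection settings rather than the backend.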

Downloading and Loading AI Models

Quick Summary: Download GGUF format models from Hugging Face. For beginners, I recommend a 7B model at Q4 quantization. Save the file anywhere convenient (a dedicated models folder keeps things tidy), then point KoboldCPP at it in the launcher.

You need an AI model to actually generate responses. KoboldCPP supports the GGUF format, which has become the standard for local LLMs.

Understanding GGUF Models

GGUF Format: A file format that contains compressed AI models optimized for local inference. The “Q4” or “Q5” in filenames refers to quantization level – lower numbers mean smaller files but slightly reduced quality.

Where to Download Models

The primary source for GGUF models is Hugging Face. Here are some popular starting points:

  • TheBloke: a Hugging Face account hosting one of the largest collections of quantized GGUF models
  • Metharme: models tuned for roleplay and creative writing
  • Pygmalion: models specifically trained for character roleplay
  • Mythalion: a roleplay-focused model with a great balance of creativity and coherence

Recommended Models by Hardware

Hardware               Recommended Model   File Size   Performance
8GB RAM, no GPU        7B Q4               4-5 GB      3-8 t/s
16GB RAM, 8GB VRAM     13B Q4              8-9 GB      15-30 t/s
32GB RAM, 16GB+ VRAM   30B Q4 or 13B Q5    16-20 GB    30+ t/s

Loading Your First Model

  1. Download a model: Click the download icon on Hugging Face for your chosen model's GGUF file
  2. Save it somewhere convenient: KoboldCPP doesn't require a specific models folder; a dedicated folder next to the executable keeps things organized
  3. In KoboldCPP's launcher, browse to the model field and select your downloaded file
  4. Wait for loading: Large models take 1-3 minutes to load

Once loaded, the model is ready to generate responses. You’ll see the model name and some stats in the KoboldCPP window.

Creating Your First Character

Quick Summary: Click the character icon in SillyTavern, then “Create New Character.” Fill in name, description, personality, and first message. Example dialogues help the model understand the character’s voice.

This is where the fun begins. Creating compelling characters is both an art and a science. I’ve created hundreds of characters over the past year, and here’s what works best.

Anatomy of a Character Card

Every character card in SillyTavern contains several key components:

  • Name: The character’s name
  • Description: Physical appearance, background, and context
  • Personality: Traits, behaviors, and mannerisms
  • First Message: The opening greeting that starts conversations
  • Example Dialogue: Sample conversations that demonstrate voice
  • Scenario: Optional setting or context for the roleplay

Step-by-Step Character Creation

  1. Open Character Management: Click the character icon (looks like a person) in the left sidebar
  2. Click “Create New Character”: This opens the character editor
  3. Fill in the Basics: Enter a name and brief description
  4. Write the Personality: Be specific about traits and behaviors
  5. Create the First Message: This sets the tone for all interactions
  6. Add Example Dialogue: This is crucial for consistent characterization
  7. Save and Start Chatting: Click save, then select your new character

Character Card Template

Here’s a template I use when creating new characters. Feel free to modify it:

Name: [Character Name]

Description: [Physical appearance, age, occupation, setting]

Personality: [3-5 key personality traits with brief explanations]

First Message: [Greeting that establishes the character and situation]

Scenario: [Optional: world context and current situation]

Example Dialogue:
<START>
{{user}}: [Example user input]
{{char}}: [How your character responds]
<START>
{{user}}: [Another example]
{{char}}: [Another characteristic response]

Expert Tip: The example dialogue section is the most important part of a character card. I’ve found that 3-5 good examples dramatically improve character consistency compared to lengthy descriptions.
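Under the hood, SillyTavern stores characters in the "Character Card V2" JSON structure, and the template fields map onto it directly. A sketch of that mapping; the example character is invented, and optional fields are omitted:

```python
# Build a minimal Character Card V2 structure from the template fields.
# Field names (first_mes, mes_example, ...) follow the V2 card convention;
# real exported cards carry additional optional fields.
import json

def make_card(name, description, personality, first_mes, mes_example, scenario=""):
    return {
        "spec": "chara_card_v2",
        "spec_version": "2.0",
        "data": {
            "name": name,
            "description": description,
            "personality": personality,
            "scenario": scenario,
            "first_mes": first_mes,
            "mes_example": mes_example,
        },
    }

card = make_card(
    name="Aria",  # invented example character
    description="A travelling cartographer in a low-fantasy world.",
    personality="curious, dry-witted, secretly homesick",
    first_mes="*Aria looks up from her map.* Lost too, are you?",
    mes_example="<START>\n{{user}}: Where are we?\n"
                "{{char}}: Somewhere my map insists does not exist.",
)
print(json.dumps(card, indent=2))
```

Knowing this shape is handy when you edit exported cards by hand or generate them with scripts.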

Importing Pre-made Characters

You don’t have to create everything from scratch. Sites like Chub.ai host thousands of community-created character cards. Simply download a PNG card file and drag it into SillyTavern to import.

Using SillyTavern Features

Quick Summary: SillyTavern offers presets, lorebooks, group chats, and advanced formatting. The “Presets” menu contains pre-configured generation settings optimized for different use cases.

Once you have a character loaded and connected to KoboldCPP, you can start exploring SillyTavern’s features. I’ve spent countless hours testing these, and here are the highlights.

Basic Chat Controls

  • Type in the input box: Press Enter or click Send to generate responses
  • Regenerate: Click the regeneration icon if you don’t like the response
  • Edit messages: Click the pencil icon to edit previous messages
  • Continue: Generate more text from the last response

Generation Settings (Presets)

SillyTavern comes with presets that configure how the AI generates text. Key parameters include:

Temperature: Controls randomness. Lower (0.7-1.0) for focused responses, higher (1.0-1.5) for creative variety.

Top P: Limits word choices to the most likely options. Keep between 0.8-0.95 for balanced responses.

Repetition Penalty: Reduces repetitive text. Increase if the AI keeps saying the same things.
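To build intuition for what these knobs do, here is a toy Python sketch of temperature and top-p over a four-token vocabulary; real backends apply the same mechanics across the full vocabulary:

```python
# Temperature rescales logits before softmax; top-p then keeps only the
# smallest set of tokens whose cumulative probability reaches the threshold.
import math

def apply_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]     # softmax probabilities

def top_p_filter(probs, top_p):
    """Indices of the smallest set of tokens with cumulative prob >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.2, -1.0]
print(apply_temperature(logits, 0.7))  # sharper: mass concentrates on the top token
print(apply_temperature(logits, 1.3))  # flatter: more variety in word choice
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))
```

Low temperature makes the top token dominate (focused responses), high temperature flattens the distribution (creative variety), and top-p trims away the unlikely tail either way.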

Lorebooks and World Info

For complex roleplays with established settings, lorebooks provide context about characters, locations, and items. I’ve used lorebooks for fantasy worlds with detailed magic systems and historical contexts.

To create a lorebook, go to the “World Info” tab and add entries. Each entry has triggers (keywords) that activate it when mentioned in conversation.
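The trigger mechanism can be sketched in a few lines. This is an illustrative model of keyword activation, not SillyTavern's exact schema or matching rules:

```python
# When recent chat text mentions one of an entry's trigger keywords, that
# entry's content gets injected into the prompt context. The lorebook
# entries below are invented examples.
LOREBOOK = [
    {"keys": ["Eldoria", "capital"],
     "content": "Eldoria is the floating capital, held aloft by rune engines."},
    {"keys": ["rune engine"],
     "content": "Rune engines fail if their glyphs are smudged."},
]

def active_entries(recent_text: str, lorebook=LOREBOOK):
    """Return the content of every entry whose keyword appears in the text."""
    text = recent_text.lower()
    return [e["content"] for e in lorebook
            if any(k.lower() in text for k in e["keys"])]

print(active_entries("We should fly to Eldoria before nightfall."))
```

Because entries only enter the prompt when triggered, a large lorebook costs no context until its keywords actually come up in conversation.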

Saving and Exporting Chats

SillyTavern automatically saves your conversations. You can export chats as text files or JSON backups. I recommend creating regular backups, especially for long-running roleplays you care about.

Troubleshooting Common Issues

Quick Summary: Most issues relate to memory errors, connection failures, or slow generation. Solutions include using smaller models, checking firewall settings, and adjusting GPU layers.

After helping dozens of users set up their systems, I’ve identified the most common problems and their solutions.

Connection Issues

“SillyTavern won’t connect to KoboldCPP”

Solutions: Verify both are running. Check the URL matches KoboldCPP’s displayed address. Disable firewall temporarily to test. Try restarting both applications in order (KoboldCPP first, then SillyTavern).

Memory Errors

“Out of memory” or “CUDA out of memory”

Solutions: Reduce GPU layers or use a smaller model. Close other applications. Increase your system’s page file. For CPU inference, reduce context size and batch size in KoboldCPP settings.

Slow Generation

Responses taking too long

Solutions: Enable GPU acceleration if available. Use a smaller model or lower quantization. Reduce context size. Close background applications. Consider upgrading RAM for better performance.

Model Loading Failures

“Failed to load model” errors

Solutions: Verify the file is a valid GGUF model. Check for file corruption (redownload if needed). Ensure you have enough free RAM. Try a different quantization level.
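One quick corruption check you can run yourself: every valid GGUF file starts with the four-byte magic "GGUF", so a truncated or garbled download usually fails this test immediately. A minimal sketch, using a throwaway file in place of a real model:

```python
# Check the GGUF magic bytes at the start of a model file. A file that
# fails this was not downloaded correctly (or isn't GGUF at all).
from pathlib import Path

def looks_like_gguf(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Throwaway file standing in for a real model download:
Path("fake.gguf").write_bytes(b"GGUF" + b"\x00" * 16)
print(looks_like_gguf("fake.gguf"))
```

Passing this check doesn't guarantee the whole file is intact, but failing it means you can skip straight to redownloading.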

AMD GPU Issues

AMD GPUs require ROCm instead of CUDA. This setup is more complex and may not work on all cards. Check the KoboldCPP wiki for specific ROCm installation instructions. Some AMD GPUs work better with CPU inference.

Advanced Tips and Optimization

Performance Secret: After testing 50+ model configurations, I’ve found that 13B models at Q4 quantization offer the best balance of quality and speed for most users with 16GB+ RAM.

Once you have the basics working, here are some advanced techniques to improve your experience.

Parameter Tuning for Roleplay

For roleplay specifically, I recommend these starting settings:

  • Temperature: 1.0-1.2 (balanced creativity)
  • Top P: 0.9-0.95
  • Top K: 40-100
  • Repetition Penalty: 1.1-1.2
  • Context Size: 2048-4096 (if RAM allows)
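As a payload, those settings might look like this in the KoboldAI-style field names KoboldCPP accepts (rep_pen is repetition penalty); exact names can vary by version, so treat this as a sketch rather than a reference:

```python
# Roleplay-leaning sampler settings, picked from the middle of the
# recommended ranges above, expressed as KoboldAI-style API fields.
ROLEPLAY_SETTINGS = {
    "temperature": 1.1,          # balanced creativity
    "top_p": 0.92,
    "top_k": 60,
    "rep_pen": 1.15,             # repetition penalty
    "max_context_length": 4096,  # if RAM allows
}

def generate_payload(prompt: str, settings=ROLEPLAY_SETTINGS) -> dict:
    """Merge the sampler settings into a generate-request body."""
    return {"prompt": prompt, "max_length": 200, **settings}

print(generate_payload("You enter the tavern."))
```

Scripting the settings this way makes it easy to A/B test sampler tweaks against the same prompt.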

Context Management

Long conversations can degrade quality as the context fills. Use these strategies:

  • Summarize periodically: Add summaries of past events to keep context fresh
  • Use lorebooks: Offload persistent information to world info
  • Split long sessions: Start new chats for major story arcs
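The trimming a frontend does when a chat outgrows the context window can be sketched like this, approximating token counts with word counts (real frontends use the model's tokenizer and keep more than just the system prompt pinned):

```python
# Keep the persistent character definition, then fill the remaining budget
# with the newest messages, dropping the oldest first.
def fit_to_context(system_prompt: str, messages: list, budget: int) -> list:
    cost = lambda s: len(s.split())          # crude stand-in for token count
    remaining = budget - cost(system_prompt)
    kept = []
    for msg in reversed(messages):           # walk newest-first
        if cost(msg) > remaining:
            break                            # oldest messages fall off
        kept.append(msg)
        remaining -= cost(msg)
    return list(reversed(kept))              # restore chronological order

history = ["old event one two three",
           "middle event four five",
           "latest reply six"]
print(fit_to_context("You are Aria.", history, budget=12))
```

This is why early plot details silently vanish from long chats, and why moving them into a lorebook or a summary keeps them available.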

Group Chat Setup

SillyTavern supports multiple characters in a single conversation. I’ve run group RPs with 5+ characters successfully. Enable this in the character menu and add multiple characters to the chat.

Automated Scripts and Extensions

For power users, SillyTavern supports JavaScript-based extensions that can automate tasks, modify outputs, and add custom behaviors. This requires programming knowledge but enables virtually unlimited customization.

Frequently Asked Questions

What is SillyTavern?

SillyTavern is a user-friendly interface for running AI-powered text conversations and roleplay scenarios locally on your computer. It connects to various AI backends to enable character-based chat, story writing, and interactive fiction without needing internet connectivity or cloud services.

What is KoboldCPP?

KoboldCPP is a high-performance backend that runs AI language models locally on your computer. It is optimized for both CPU and GPU inference and supports the popular GGUF model format, making it one of the fastest backends for local AI roleplay applications.

How do I install SillyTavern?

Download SillyTavern from GitHub, extract the files to a folder, and run Start.bat on Windows or the shell script on Linux/Mac. The application runs in your browser at localhost:8000. No installation is required – it is portable and runs from the extracted folder.

How do I install KoboldCPP?

Download KoboldCPP from GitHub Releases, extract the files, and run the executable. On first launch, configure your GPU settings if applicable, then click Launch to start the server. KoboldCPP will begin listening on http://127.0.0.1:5001 by default.

How do I connect SillyTavern to KoboldCPP?

In SillyTavern, click the Connect button in the sidebar, select KoboldCPP from the dropdown, enter the URL http://127.0.0.1:5001, and click Connect. Ensure KoboldCPP is running in the background. The status should change to Connected when successful.

What models work with KoboldCPP?

KoboldCPP supports GGUF format models, which is the standard for local LLM inference. Popular options include models from TheBloke, Metharme, Pygmalion, and Mythalion. Look for models labeled as chat or roleplay-optimized for the best experience.

Can I run SillyTavern without a GPU?

Yes, SillyTavern and KoboldCPP work with CPU-only inference. You will need more system RAM to compensate – at least 16GB is recommended for 7B models. Generation speed will be slower at 3-8 tokens per second compared to GPU acceleration.

How much RAM do I need for KoboldCPP?

For 7B models at Q4 quantization, you need 8GB minimum (12GB recommended). For 13B models, 16GB is minimum. For 30B+ models, 32GB+ is required. The model file size plus your operating system and other applications must fit in available RAM.

What are the best models for roleplay?

For roleplay, I recommend Metharme 7B or 13B for uncensored creative roleplay, Mythalion 13B for detailed storytelling, and Pygmalion models specifically trained on character roleplay data. Start with 7B models if you have limited hardware.

How do I create a character in SillyTavern?

Click the character icon in the sidebar, then Create New Character. Fill in the name, description, personality traits, first message, and add example dialogues showing how the character speaks. Example dialogue is crucial for consistent characterization.

Why won’t SillyTavern connect to KoboldCPP?

Verify both applications are running. Check that the URL matches KoboldCPP’s displayed address (default http://127.0.0.1:5001). Try disabling your firewall temporarily. Restart both applications, launching KoboldCPP first before SillyTavern.

How do I fix out of memory errors?

Reduce GPU layers in KoboldCPP settings, use a smaller model or lower quantization level, increase your system’s page file, or close other applications. For CPU inference, reduce the context size and batch size parameters.

Can I use AMD GPU with KoboldCPP?

Yes, but it requires ROCm instead of CUDA. This setup is more complex and may not work on all AMD cards. Check the KoboldCPP GitHub wiki for specific ROCm installation instructions. Some AMD users report better results with CPU-only inference.

What is GGUF format?

GGUF is a file format for compressed AI models optimized for local inference. The format contains quantized models that maintain quality while reducing file size. The Q4, Q5, Q6 in filenames indicate quantization levels – lower numbers mean smaller files but slightly reduced quality.

Next Steps and Resources

Congratulations on setting up your local AI roleplay system. After spending months with this setup, I can confidently say it’s worth the initial effort. The privacy, unlimited usage, and creative freedom are unmatched.

For further learning, I recommend joining the r/SillyTavern subreddit and r/LocalLLaMA community. These forums are invaluable for troubleshooting, model recommendations, and character sharing.

The official documentation at docs.sillytavern.app and the KoboldCPP wiki cover advanced features I couldn’t include in this starter guide.

Happy roleplaying, and may your characters come alive in ways you never expected.
