PrivateGPT Guide: Chat with Your PDFs Offline
Stop sending your private data to OpenAI. Every time you upload a PDF to a cloud-based AI, you lose control. Maybe they train on it. Maybe a hacker grabs it. For lawyers, doctors, or anyone with a secret, that’s a dealbreaker. I spent a week living entirely offline with PrivateGPT. I wanted to see if a local machine could actually handle heavy document analysis without screaming for help. It can. But there are traps you need to avoid.
Quick Summary:
- PrivateGPT lets you run a powerful AI on your own hardware with zero internet connection.
- You need a decent GPU (8GB+ VRAM) or a modern Mac (M1/M2/M3) for a smooth experience.
- It uses RAG (Retrieval-Augmented Generation) to read your PDFs, Excel sheets, and Word docs.
- Setup is easier than it used to be, but Python dependency hell is still a risk.
- I tested this with 500-page legal briefs. It didn’t leak a single byte to the cloud.
What is PrivateGPT and Why Should You Care?
PrivateGPT is an open-source tool. It sits on your hard drive. It doesn’t call home. It uses a Large Language Model (LLM) to “read” your files and answer questions about them. Think of it as a librarian who lives in your basement and never leaves.
The magic happens through something called RAG. That stands for Retrieval-Augmented Generation. Instead of the AI trying to remember everything from its training, it looks at your specific files first. It finds the right paragraph, reads it, and then summarizes the answer. It’s more accurate than a standard chatbot because it has the “open book” right in front of it.
I saw people trying to do this a year ago. It was a mess. You needed a PhD in computer science just to install it. Now? It’s getting polished. But don’t be fooled. It still eats RAM for breakfast. If you’re running an old office PC, forget it. You need real hardware.
The Hardware Reality Check: Don’t Waste Your Time
Before you clone the GitHub repo, look at your specs. I’ve seen too many people try to run this on a 4GB RAM laptop. It will crash. Or worse, it will take ten minutes to answer “Hello.”
Windows and Linux Users
You need an NVIDIA GPU. Period. You can run it on a CPU, but it’s painfully slow. Look for at least 8GB of VRAM. An RTX 3060 is the floor. An RTX 4090 is the dream. If you have 16GB of system RAM, you’re okay. 32GB is better. PrivateGPT uses a vector database called Qdrant. It stores your document “embeddings” in memory. If you run out of RAM, the whole system crawls.
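Not sure what your card actually reports? If you already have PyTorch installed (it rides along with most local-LLM setups), a quick sanity check shows whether CUDA is visible and how much VRAM you have. This is a generic PyTorch snippet, not part of PrivateGPT:
# Quick VRAM sanity check. Assumes PyTorch is installed; generic PyTorch, not PrivateGPT code.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA GPU detected. Expect slow, CPU-only inference.")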
Mac Users
You have it easy. Apple Silicon (M1, M2, M3 chips) is built for this. The “Unified Memory” architecture lets the GPU and CPU share the same pool of RAM. If you have a Mac Studio or a MacBook Pro with 32GB of RAM, PrivateGPT will fly. I tested it on an M2 Max. Ingestion was fast. Responses were nearly instant.
The Master Setup: Getting PrivateGPT Running

I’m skipping the “easy” installers that often break. We’re using the terminal. It’s the only way to ensure your environment stays clean. You’ll need Python 3.11. Don’t use 3.12 yet; some libraries still act weird with it.
Step 1: The Foundation
First, install Poetry. It’s a dependency manager. It stops different Python packages from fighting each other. Open your terminal and run the install script from the official Poetry site. Next, clone the repository:
git clone https://github.com/zylon-ai/private-gpt
cd private-gpt
Step 2: Installing UI and Local LLM Support
You want a screen to type in, not just a command line. We’ll install the Gradio UI. Run this:
poetry install --with ui,local
This command pulls down everything. It gets the LLM frameworks, the embedding models, and the interface. It’s a big download. Go get a coffee. It’s going to grab about 5GB to 10GB of data depending on the models you choose.
The Secret Weapon: Ollama Integration
If you want the best performance, don’t use the default setup. Use Ollama. Ollama is a separate tool that manages local LLMs like Llama 3 or Mistral. It handles the “quantization” (making the models smaller and faster) better than almost anything else.
Once Ollama is installed, pull a model. I recommend mistral or llama3. They are the gold standard for local work right now. In your PrivateGPT settings, you point the “provider” to Ollama. I saw a 30% speed boost just by making this switch. It also makes swapping models easy. If Mistral isn’t giving you good answers, you just type ollama pull llama3 and try again.
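Before you wire Ollama into PrivateGPT, it's worth confirming the pulled model actually answers. Here's a minimal sketch using Ollama's official Python client (pip install ollama); it assumes the Ollama server is running locally and that you've already pulled mistral:
# Sanity-check a local Ollama model before pointing PrivateGPT at it.
# Assumes the Ollama server is running and `ollama pull mistral` has finished.
import ollama

response = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Answer in one word: are you running locally?"}],
)
print(response["message"]["content"])
If that prints an answer, the model is being served locally and PrivateGPT just needs its provider settings switched over.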
Understanding the Engine: How RAG Actually Works
Most people think the AI “reads” the PDF like a human. It doesn’t. Here is what happens behind the scenes when you drop a file into the source_documents folder.
- Chunking: The system breaks your 100-page PDF into small pieces. Usually about 500 words each.
- Embedding: An “embedding model” turns those words into numbers (vectors). These numbers represent the *meaning* of the text.
- Vector Store: Those numbers are saved in a database (Qdrant).
- The Query: When you ask, “What is the termination clause in this contract?”, the system turns your question into numbers.
- The Match: It finds the chunks in the database that have the most similar numbers to your question.
- The Answer: It sends those specific chunks to the LLM and says, “Use this text to answer the user.”
This is why PrivateGPT is so good for privacy. The LLM never “learns” your data. It just looks at it temporarily to answer your question. Then it forgets.
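Here is the whole flow as a toy Python sketch. It is not PrivateGPT's actual code (the real pipeline uses a neural embedding model and Qdrant, not a bag-of-words counter), but the chunk, embed, match, and prompt mechanics are the same idea:
# Toy illustration of the RAG flow above. Not PrivateGPT's real code: the real
# pipeline uses a neural embedding model and Qdrant, but the mechanics match.
import math
from collections import Counter

def chunk(text, size=50):
    # PrivateGPT chunks at roughly 500 words; a small size keeps this toy readable.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    # Stand-in "embedding": a bag-of-words count. Real embeddings are dense vectors.
    return Counter(text.lower().split())

def similarity(a, b):
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

document = "This agreement covers scope, payment and liability. Either party may terminate this agreement with thirty days written notice. Payment is due within fifteen days of each invoice."
store = [(c, embed(c)) for c in chunk(document, size=12)]   # the "vector store"

question = "What is the termination clause in this contract?"
q_vec = embed(question)

# Retrieve the chunks most similar to the question and build the prompt for the LLM.
best = sorted(store, key=lambda item: similarity(q_vec, item[1]), reverse=True)[:2]
context = "\n".join(c for c, _ in best)
print(f"Use this text to answer the user:\n{context}\n\nQuestion: {question}")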
File Formats: What Can You Actually Feed It?

It’s not just for PDFs. I’ve thrown a lot at this tool. Here is the breakdown of what works and what breaks.
The Good
- PDFs: Works best if they have selectable text. If it’s a scan of a 1970s document, you need OCR (Optical Character Recognition) first. PrivateGPT can do basic OCR, but it’s slow.
- TXT and Markdown: These are the fastest. The system loves raw text.
- DOCX: Microsoft Word files work well, but complex tables can sometimes confuse the chunking logic.
The Bad
- Excel (.xlsx): This is tricky. LLMs aren’t great at “seeing” a grid. If you ask about a specific cell, it might fail. It’s better to export your data to CSV if you want the AI to analyze it (see the short conversion sketch after this list).
- PowerPoint: It works, but it often misses the context of images and layout. You get the text, but you lose the “vibe” of the slide.
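If you have a stack of spreadsheets to flatten, a couple of lines of pandas will do the export. This assumes pandas and openpyxl are installed; report.xlsx is a hypothetical file name, not something PrivateGPT expects:
# Flatten a spreadsheet to CSV before ingesting it. Assumes pandas + openpyxl
# are installed; "report.xlsx" is just an example file name.
import pandas as pd

df = pd.read_excel("report.xlsx", sheet_name=0)   # first sheet only
df.to_csv("report.csv", index=False)
print(df.head())                                  # eyeball the result before ingesting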
Performance Benchmarks: Real World Numbers
I ran a test. I used a 200-page technical manual for a jet engine. I wanted to see how long it took to “digest” the document and how fast it answered.
Test Machine: RTX 3080 (10GB VRAM), 32GB RAM, i7-12700K
- Ingestion Time: 42 seconds.
- Memory Usage during Ingestion: 6.2GB.
- Time to First Token (Response Start): 1.2 seconds.
- Tokens per Second (Reading Speed): 45 tokens/sec.
On a MacBook Air M2 (16GB RAM), the ingestion took nearly 3 minutes. The response speed was slower, around 15 tokens per second. It’s usable, but you’ll be waiting. If you’re doing this for work, don’t skimp on the hardware.
The Privacy Audit: Is It Truly Offline?
I’m an investigative journalist. I don’t take “private” at face value. I ran PrivateGPT with a network sniffer (Wireshark) active. I watched the traffic.
Here’s the catch: When you first install it, it *does* need the internet to download the models. Once those models are on your disk, you can pull the Ethernet cable. I did exactly that. I disabled my Wi-Fi and asked it questions about a sensitive document. It worked perfectly. No packets were sent to any external server.
However, be careful with telemetry. Some open-source projects have basic tracking to see how many people use them. In PrivateGPT, you can check the settings.yaml file. Make sure everything is pointed to local. If you see an API key for OpenAI or Anthropic in there, you aren’t offline anymore.
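You can eyeball settings.yaml by hand, or let a small script flag anything that looks like a cloud hookup. This is a generic check I would run, not a PrivateGPT feature; it assumes PyYAML is installed and just scans keys and values for suspicious words:
# Crude scan of settings.yaml for anything that looks like a cloud provider or API key.
# Not a PrivateGPT feature, just a generic check. Assumes PyYAML is installed.
import yaml

SUSPECT = ("openai", "anthropic", "azure", "api_key")

def walk(node, path="settings"):
    if isinstance(node, dict):
        for key, value in node.items():
            walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            walk(value, f"{path}[{i}]")
    elif any(word in f"{path}={node}".lower() for word in SUSPECT):
        print("Check this:", path, "=", node)

with open("settings.yaml") as f:
    walk(yaml.safe_load(f))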
Common Pitfalls and How to Fix Them
You will run into errors. It’s the nature of local AI. Here are the three I see most often.
1. “Out of Memory” (OOM)
Your GPU ran out of space. This happens if you try to run a model that is too big. If you have 8GB of VRAM, don’t try to run a 70B parameter model. Stick to 7B or 8B models. They are plenty smart for reading PDFs. Use “GGUF” quantized models to save space.
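A rough rule of thumb for whether a model fits: parameter count times bits per weight, plus a cushion for the context and runtime. A typical 4-bit GGUF lands around 4.5 to 5 effective bits per weight once the quantization scales are counted. The sketch below is back-of-the-envelope arithmetic, not an exact loader calculation:
# Back-of-the-envelope VRAM estimate for a quantized model. Rough rule of thumb
# only; real usage also depends on context length and the runtime.
def estimate_gb(params_billion, bits_per_param=5, overhead_gb=1.5):
    weights_gb = params_billion * 1e9 * bits_per_param / 8 / 1024**3
    return weights_gb + overhead_gb

for size in (7, 8, 13, 70):
    print(f"{size}B at ~5 bits/weight: about {estimate_gb(size):.1f} GB")
Run it and you'll see why an 8GB card is comfortable with 7B and 8B models, marginal at 13B, and hopeless at 70B.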
2. Hallucinations
The AI makes things up. This happens if the “context” is too small. If the AI can only see 500 words at a time, it might miss the part of the PDF that contradicts what it just said. You can fix this by increasing the context_window in your settings, but it will slow down the machine.
3. Dependency Conflicts
You try to install it and get a wall of red text about “Triton” or “Boto3.” This is why we use Poetry. If it happens, delete the .venv folder and start over. Don’t try to fix individual packages. It’s a rabbit hole that leads nowhere.
Advanced Tweaking: Making it Smarter
Once you have it running, you’ll want better results. The default settings are “safe,” but not “optimal.”
Look at your settings.yaml. Change the top_k value. This tells the system how many chunks of text to look at. The default is usually 4. If you have a complex document, bump it to 10. The AI will have more information to work with. Also, look at chunk_overlap. Setting this to 50 or 100 tokens ensures that the AI doesn’t lose context between the pieces of the PDF.
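To see why chunk_overlap matters, here is a toy chunker with and without overlap. It is an illustration of the concept, not PrivateGPT's internal splitter (the real one counts tokens, not words):
# Toy illustration of chunk overlap. Not PrivateGPT's splitter (that one works
# on tokens), but it shows why overlap keeps context across chunk boundaries.
def chunk_words(text, size, overlap=0):
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

text = "The supplier may terminate the contract if payment is more than thirty days late"

print(chunk_words(text, size=8, overlap=0))
# ['The supplier may terminate the contract if payment',
#  'is more than thirty days late']
# The condition is split across two chunks, so a single retrieved chunk loses it.

print(chunk_words(text, size=8, overlap=3))
# ['The supplier may terminate the contract if payment',
#  'contract if payment is more than thirty days',
#  'than thirty days late']
# The middle chunk now carries the whole condition in one piece.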
The Future: Local AI vs. The Giants
In 2026, the gap between local AI and cloud AI is closing. Sure, GPT-5 (or whatever is out now) is smarter. But for reading a PDF and telling you what’s on page 50? You don’t need a billion-dollar data center. You need a $1,500 PC.
We are seeing the rise of “Small Language Models” (SLMs). These are models trained specifically to be fast and local. Microsoft’s Phi series and Google’s Gemma are perfect for PrivateGPT. They are tiny but punch way above their weight class. I expect within a year, this will be a one-click install for everyone.
Final Verdict: Should You Use It?
If you are a hobbyist playing with AI, stick to ChatGPT. It’s easier.
But if you are a professional—if you have client data, medical records, or proprietary research—you have no choice. You must go local. PrivateGPT isn’t perfect. It requires some technical muscle. It will make your computer fans spin like a jet engine. But it gives you something the cloud never will: total ownership of your thoughts and your data.
I saw the future of privacy, and it’s a box sitting under my desk. Don’t wait for the big tech companies to “protect” you. They won’t. Set up your own local brain and start chatting with your files on your own terms.
Your Go Local Checklist:
- Check VRAM (8GB minimum recommended).
- Install Python 3.11 and Poetry.
- Clone PrivateGPT from GitHub.
- Install Ollama for better model management.
- Download a 7B or 8B model (Llama 3 or Mistral).
- Load your documents into the source_documents folder.
- Run make run and start chatting.
