ChatGPT Parser

Convert ChatGPT conversation exports to RAG-optimized markdown files.

Published Jan 14, 2026


# ChatGPT Parser

Convert ChatGPT conversation exports to RAG-optimized markdown files.

## Features

- Parse ChatGPT JSON exports to markdown with YAML frontmatter
- Handle all conversation branches (regenerated responses)
- Support all content types: text, images, code execution, web searches, reasoning
- Cross-platform filename sanitization
- Date-based directory organization (YYYY_MM/)
- Summary index of all exported conversations (`summary.json`)
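
To make the branch handling concrete: a ChatGPT export is commonly a list of conversations, each holding a `mapping` of node IDs to nodes with `parent` and `children` fields, and a regenerated response appears as a node with more than one child. The sketch below assumes that layout and enumerates every root-to-leaf path. `iter_branches` is a hypothetical helper for illustration, not the parser's actual API.

```python
from typing import Dict, Iterator, List


def iter_branches(mapping: Dict[str, dict]) -> Iterator[List[str]]:
    """Yield every root-to-leaf path of node IDs in a conversation tree.

    Assumes each node is a dict with "parent" (None for the root) and
    "children" (a list of node IDs).
    """
    roots = [nid for nid, node in mapping.items() if node.get("parent") is None]
    # Iterative depth-first walk; each stack entry is the path taken so far.
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        children = mapping[path[-1]].get("children") or []
        if not children:
            yield path  # a leaf closes one complete branch
            continue
        # Push in reverse so the first child is explored first.
        for child in reversed(children):
            stack.append(path + [child])


# Tiny example: the user message "u1" has two regenerated answers.
example = {
    "root": {"parent": None, "children": ["u1"]},
    "u1": {"parent": "root", "children": ["a1", "a2"]},
    "a1": {"parent": "u1", "children": []},
    "a2": {"parent": "u1", "children": []},
}
branches = list(iter_branches(example))
print(branches)
```

Each yielded path would correspond to one output file, which is consistent with the `-path-1` / `-path-2` suffixes in the output structure.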

## Installation

### Using Conda

```bash
# Create environment
conda env create -f environment.yml

# Activate environment
conda activate chatgpt-parser

# Install package
pip install -e .
```

### Using pip

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package and its dependencies (editable mode)
pip install -e .
```

## Usage

You can run the parser in two ways:

### Method 1: Direct Python Execution (No Installation)

**Recommended for a quick one-off run. Run the commands from the repository root so that the `src` package is importable:**

```bash
# Basic usage
python -m src.cli conversations.json output/

# With options
python -m src.cli conversations.json output/ --verbose
python -m src.cli conversations.json output/ --overwrite
python -m src.cli conversations.json output/ --quiet

# Help
python -m src.cli --help
```

### Method 2: Install as Command (Optional)

**If you want a `chatgpt-parser` command:**

```bash
# Install the package first
pip install -e .

# Now you can use the command directly
chatgpt-parser conversations.json output/

# With options
chatgpt-parser conversations.json output/ --verbose
chatgpt-parser conversations.json output/ --overwrite

# Help
chatgpt-parser --help
```

**Note**: The `chatgpt-parser` command is only available after running `pip install -e .`

## Output Structure

```
output/
├── 2025_01/
│   ├── 2025_01_15_Conversation-Title-path-1.md
│   ├── 2025_01_15_Conversation-Title-path-2.md
│   └── 2025_01_18_Another-Conversation.md
├── 2025_02/
│   └── ...
├── assets/
│   ├── 2025_01/
│   │   ├── image1.png
│   │   └── image2.png
│   └── 2025_02/
│       └── ...
└── summary.json
```
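
The naming scheme in the tree above can be sketched as a small path builder. This is an illustration only: `output_path` is a hypothetical function, and the rule that the `-path-N` suffix appears only for multi-branch conversations is inferred from the example tree rather than confirmed by the source.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def output_path(created: datetime, safe_title: str, branch: Optional[int] = None) -> PurePosixPath:
    """Build 'YYYY_MM/YYYY_MM_DD_Title[-path-N].md' relative to the output directory."""
    month_dir = created.strftime("%Y_%m")
    stem = f"{created.strftime('%Y_%m_%d')}_{safe_title}"
    if branch is not None:
        stem += f"-path-{branch}"  # assumed: suffix only for multi-branch conversations
    return PurePosixPath(month_dir) / f"{stem}.md"


first = output_path(datetime(2025, 1, 15, 14, 30, 22, tzinfo=timezone.utc), "Conversation-Title", branch=1)
second = output_path(datetime(2025, 1, 18, tzinfo=timezone.utc), "Another-Conversation")
print(first)
print(second)
```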

## Markdown Format

Each conversation is exported as markdown with YAML frontmatter:

```markdown
---
conversation_id: "uuid-abc"
title: "Conversation Title"
created: "2025-01-15T14:30:22Z"
updated: "2025-01-16T10:15:33Z"
model: "gpt-4o"
branch: 1
total_branches: 2
message_count: 42
has_images: true
has_code_execution: true
has_web_search: false
has_reasoning: false
---

# Conversation Title

## User
*2025-01-15 14:30:22*

Message content...

## Assistant
*2025-01-15 14:32:15* | Model: gpt-4o

Response content...
```
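
If you want to filter files before indexing them (for example, only conversations with `has_images: true`), the frontmatter can be read without extra dependencies. The following is a minimal sketch that handles only flat `key: value` scalars like those in the sample above; `split_frontmatter` is a hypothetical helper, and a real YAML parser is the safer choice for anything more complex.

```python
from typing import Dict, Tuple, Union

Scalar = Union[str, int, bool]


def _parse_scalar(raw: str) -> Scalar:
    """Convert a raw frontmatter value to bool, int, or unquoted string."""
    if raw in ("true", "false"):
        return raw == "true"
    if raw.isdigit():
        return int(raw)
    return raw.strip('"')


def split_frontmatter(text: str) -> Tuple[Dict[str, Scalar], str]:
    """Split a generated file into (metadata, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, Scalar] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # skip lines that are not key: value pairs
        meta[key.strip()] = _parse_scalar(raw.strip())
    return {}, text  # no closing delimiter: treat the whole file as body


sample = '''---
title: "Conversation Title"
created: "2025-01-15T14:30:22Z"
branch: 1
has_images: true
---

# Conversation Title
'''
meta, body = split_frontmatter(sample)
print(meta["title"], meta["branch"], meta["has_images"])
```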

## RAG Integration

### Using LangChain

```python
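# Recent LangChain releases moved these classes into separate packages
# (langchain-community, langchain-openai); Chroma also needs chromadb installed.
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings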

# Load markdown files
loader = DirectoryLoader(
    "output/",
    glob="**/*.md",
    loader_cls=UnstructuredMarkdownLoader
)
documents = loader.load()

# Create vector store (OpenAIEmbeddings reads OPENAI_API_KEY from the environment)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

# Query
results = vectorstore.similarity_search("your query here")
```

### Using LlamaIndex

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex  # llama-index >= 0.10

# Load documents
documents = SimpleDirectoryReader("output/", recursive=True).load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("your query here")
```
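
If you would rather chunk per message before embedding, independent of either framework, the `## User` / `## Assistant` headings from the Markdown Format section give natural split points. The sketch below is one possible approach, not part of the parser; `split_messages` is a hypothetical name, and a message whose own content contains a line such as `## User` would be split incorrectly.

```python
import re
from typing import Dict, List

# Matches the "## User" / "## Assistant" headings of the generated markdown.
ROLE_HEADING = re.compile(r"^## (User|Assistant)[ \t]*$", re.MULTILINE)


def split_messages(body: str) -> List[Dict[str, str]]:
    """Split a conversation body into one chunk per message."""
    matches = list(ROLE_HEADING.finditer(body))
    chunks = []
    for i, match in enumerate(matches):
        # A message runs until the next role heading or the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        chunks.append({
            "role": match.group(1).lower(),
            "text": body[match.end():end].strip(),
        })
    return chunks


body = """# Conversation Title

## User
*2025-01-15 14:30:22*

Message content...

## Assistant
*2025-01-15 14:32:15* | Model: gpt-4o

Response content...
"""
chunks = split_messages(body)
for chunk in chunks:
    print(chunk["role"], "->", chunk["text"].splitlines()[-1])
```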

## CLI Reference

```
# Direct Python execution (no installation required)
python -m src.cli INPUT_FILE OUTPUT_DIR [OPTIONS]

# OR after installing (pip install -e .)
chatgpt-parser INPUT_FILE OUTPUT_DIR [OPTIONS]

Arguments:
  INPUT_FILE          Path to conversations.json export file
  OUTPUT_DIR          Directory for markdown output

Options:
  -v, --verbose       Enable verbose (DEBUG) logging
  -q, --quiet         Suppress informational output (errors only)
  --overwrite         Overwrite existing markdown files
  --help              Show help message and exit
  --version           Show version and exit
```
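
For readers who want to wrap or mirror this interface, the reference above could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the actual `src/cli.py` may use a different library, the version string is a placeholder, and the rule that `--quiet` wins over `--verbose` is a guess.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Declare the arguments and options listed in the CLI reference."""
    parser = argparse.ArgumentParser(
        prog="chatgpt-parser",
        description="Convert ChatGPT conversation exports to markdown.",
    )
    parser.add_argument("input_file", metavar="INPUT_FILE", help="Path to conversations.json export file")
    parser.add_argument("output_dir", metavar="OUTPUT_DIR", help="Directory for markdown output")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose (DEBUG) logging")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress informational output (errors only)")
    parser.add_argument("--overwrite", action="store_true", help="Overwrite existing markdown files")
    # "0.0.0" is a placeholder; the real version is not stated in this README.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


def log_level(verbose: bool, quiet: bool) -> int:
    """Map the two flags to a logging level; quiet wins if both are set (assumed)."""
    if quiet:
        return logging.ERROR
    return logging.DEBUG if verbose else logging.INFO


args = build_parser().parse_args(["conversations.json", "output/", "--verbose"])
print(args.input_file, args.output_dir, logging.getLevelName(log_level(args.verbose, args.quiet)))
```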

## Troubleshooting

### Issue: "Invalid JSON at line X"
The `conversations.json` file is corrupted or malformed. Check the file encoding (should be UTF-8) and ensure it's valid JSON.
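
Both checks can be done with the standard library before running the parser. The helper below is a sketch (`find_json_error` is a hypothetical name); it reports the first encoding or syntax problem using the position attributes that `json.JSONDecodeError` provides.

```python
import json
from typing import Optional


def find_json_error(raw: bytes) -> Optional[str]:
    """Return a description of the first problem, or None if the data is valid."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 at byte {exc.start}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
    return None


ok = find_json_error(b'[{"title": "ok"}]')
broken = find_json_error(b'[\n  {"title": "missing brace"\n]')
bad_encoding = find_json_error(b'["caf\xe9"]')  # Latin-1 bytes, not UTF-8
print(ok, broken, bad_encoding, sep="\n")
```

To check a real export, pass `Path("conversations.json").read_bytes()` (from `pathlib`) to the function.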

### Issue: "No valid conversations found"
The JSON file may be empty or all conversations failed validation. Run with `--verbose` to see detailed error messages.

### Issue: "Filename too long"
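Conversation titles are automatically truncated to 140 characters. If you still see this error, the full path (output directory plus filename) may exceed a platform limit, such as the default 260-character path limit on Windows. Try a shorter output directory path.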
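
For reference, cross-platform sanitization with a 140-character cap might look like the sketch below. The exact rules the parser applies are not documented here, so the character set, the hyphen separator, and the `untitled` fallback are all assumptions, and `sanitize_title` is a hypothetical name.

```python
import re
import unicodedata

MAX_TITLE_LENGTH = 140  # the truncation limit stated above
# Characters rejected by Windows, plus "/" (POSIX) and ASCII control characters.
UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def sanitize_title(title: str, max_length: int = MAX_TITLE_LENGTH) -> str:
    """Turn a conversation title into a filename-safe, hyphen-separated name."""
    text = unicodedata.normalize("NFC", title)
    text = UNSAFE.sub(" ", text)
    text = re.sub(r"\s+", "-", text.strip())  # collapse whitespace runs to hyphens
    text = text[:max_length].strip("-. ")     # no leading/trailing dots or hyphens
    if not text or text.upper() in WINDOWS_RESERVED:
        text = f"untitled-{text}".rstrip("-")  # avoid empty and reserved names
    return text


print(sanitize_title('Fix: "path/to/file" issues?'))
print(sanitize_title("CON"))
```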

### Issue: "Permission denied" when writing files
Check that you have write permissions in the output directory. On Windows, avoid writing to system directories.
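
A quick way to test this before a long run is to try creating a temporary file in the target directory. This is a sketch with a hypothetical helper name (`can_write`); note that it creates the directory if it does not exist yet.

```python
import tempfile
from pathlib import Path


def can_write(directory: str) -> bool:
    """Return True if a file can actually be created inside `directory`."""
    path = Path(directory)
    try:
        path.mkdir(parents=True, exist_ok=True)
        # Creating a real temporary file also catches restrictions that a
        # plain permission-bit check can miss; it is deleted on close.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


writable = can_write(tempfile.gettempdir())
print(writable)
```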

## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/contract/test_cli_interface.py
```

### Project Structure

```
src/
├── models/          # Data models (Conversation, MessageNode, Content)
├── parsers/         # JSON parsing, tree traversal, content extraction
├── writers/         # Markdown generation, asset copying, summary
├── utils/           # Utilities (logging, filename sanitization)
└── cli.py          # CLI entry point

tests/
├── fixtures/        # Sample conversation data
├── contract/        # CLI and output format tests
├── integration/     # Full pipeline tests
└── unit/           # Component unit tests
```

## License

MIT