ChatGPT Parser

Convert ChatGPT conversation exports to RAG-optimized markdown files.

Published Jan 14, 2026

# ChatGPT Parser

Convert ChatGPT conversation exports to RAG-optimized markdown files.

## Features

- Parse ChatGPT JSON exports to markdown with YAML frontmatter
- Handle all conversation branches (regenerated responses)
- Support all content types: text, images, code execution, web searches, reasoning
- Cross-platform filename sanitization
- Date-based directory organization (YYYY_MM/)
- Comprehensive summary index
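
Branch handling can be pictured as enumerating every root-to-leaf path in a conversation's message tree. The sketch below assumes each conversation stores its nodes in a `mapping` dictionary whose entries carry `parent` and `children` keys. The node ids and the `enumerate_branches` helper are hypothetical, and the parser's own traversal may differ.

```python
from typing import Dict, List


def enumerate_branches(mapping: Dict[str, dict]) -> List[List[str]]:
    """Return every root-to-leaf path of node ids in a conversation tree."""
    # Roots are the nodes that have no parent.
    roots = [node_id for node_id, node in mapping.items() if node.get("parent") is None]
    branches: List[List[str]] = []

    # Iterative depth-first walk, so very long conversations cannot hit
    # Python's recursion limit.
    stack = [[root_id] for root_id in reversed(roots)]
    while stack:
        path = stack.pop()
        children = mapping[path[-1]].get("children") or []
        if not children:
            branches.append(path)  # a leaf closes one complete branch
            continue
        # Push in reverse so children are visited in their original order.
        for child_id in reversed(children):
            stack.append(path + [child_id])
    return branches


# Toy tree (hypothetical ids): one user prompt whose answer was regenerated once.
toy_mapping = {
    "root": {"parent": None, "children": ["user-1"]},
    "user-1": {"parent": "root", "children": ["assistant-1a", "assistant-1b"]},
    "assistant-1a": {"parent": "user-1", "children": []},
    "assistant-1b": {"parent": "user-1", "children": []},
}

branches = enumerate_branches(toy_mapping)
for number, path in enumerate(branches, start=1):
    print(f"branch {number}: {' -> '.join(path)}")
```

Each regenerated response adds a sibling under the same parent, so a conversation with one regeneration yields two paths that share everything up to the fork.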

## Installation

### Using Conda

```bash
# Create environment
conda env create -f environment.yml

# Activate environment
conda activate chatgpt-parser

# Install package
pip install -e .
```

### Using pip

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
```

## Usage

You can run the parser in two ways:

### Method 1: Direct Python Execution (No Installation)

**Recommended for quick, one-off use. Run these commands from the repository root so that Python can find the `src` package:**

```bash
# Basic usage
python -m src.cli conversations.json output/

# With options
python -m src.cli conversations.json output/ --verbose
python -m src.cli conversations.json output/ --overwrite
python -m src.cli conversations.json output/ --quiet

# Help
python -m src.cli --help
```

### Method 2: Install as Command (Optional)

**If you want a `chatgpt-parser` command:**

```bash
# Install the package first
pip install -e .

# Now you can use the command directly
chatgpt-parser conversations.json output/

# With options
chatgpt-parser conversations.json output/ --verbose
chatgpt-parser conversations.json output/ --overwrite

# Help
chatgpt-parser --help
```

**Note**: The `chatgpt-parser` command is only available after running `pip install -e .`

## Output Structure

```
output/
├── 2025_01/
│   ├── 2025_01_15_Conversation-Title-path-1.md
│   ├── 2025_01_15_Conversation-Title-path-2.md
│   └── 2025_01_18_Another-Conversation.md
├── 2025_02/
│   └── ...
├── assets/
│   ├── 2025_01/
│   │   ├── image1.png
│   │   └── image2.png
│   └── 2025_02/
│       └── ...
└── summary.json
```
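
The naming pattern in the tree above can be sketched as follows. This assumes files are named `YYYY_MM_DD_Title.md` inside a `YYYY_MM/` directory, with a `-path-N` suffix added only when a conversation has more than one branch. `build_output_path` is a hypothetical helper, and the parser's exact rules may differ.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_output_path(
    created: datetime,
    title: str,
    branch: int = 1,
    total_branches: int = 1,
) -> PurePosixPath:
    """Build a relative output path such as 2025_01/2025_01_15_Title-path-1.md."""
    month_dir = created.strftime("%Y_%m")
    day_prefix = created.strftime("%Y_%m_%d")
    # Join the words of the title with hyphens (assumed from the example names).
    slug = "-".join(title.split())
    # Assumption: the branch suffix appears only for multi-branch conversations.
    suffix = f"-path-{branch}" if total_branches > 1 else ""
    return PurePosixPath(month_dir) / f"{day_prefix}_{slug}{suffix}.md"


first = build_output_path(
    datetime(2025, 1, 15, 14, 30, 22, tzinfo=timezone.utc),
    "Conversation Title",
    branch=1,
    total_branches=2,
)
second = build_output_path(
    datetime(2025, 1, 18, tzinfo=timezone.utc),
    "Another Conversation",
)
print(first)
print(second)
```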

## Markdown Format

Each conversation is exported as markdown with YAML frontmatter:

```markdown
---
conversation_id: "uuid-abc"
title: "Conversation Title"
created: "2025-01-15T14:30:22Z"
updated: "2025-01-16T10:15:33Z"
model: "gpt-4o"
branch: 1
total_branches: 2
message_count: 42
has_images: true
has_code_execution: true
has_web_search: false
has_reasoning: false
---

# Conversation Title

## User
*2025-01-15 14:30:22*

Message content...

## Assistant
*2025-01-15 14:32:15* | Model: gpt-4o

Response content...
```
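
The frontmatter makes it possible to filter files before indexing them. Below is a minimal standard-library sketch that reads only the flat `key: value` layout shown above. It is not a general YAML parser, `split_frontmatter` is a hypothetical helper, and a real pipeline would more likely use a YAML library.

```python
from typing import Dict, Tuple, Union

Scalar = Union[str, int, bool]


def split_frontmatter(text: str) -> Tuple[Dict[str, Scalar], str]:
    """Split a document into (metadata, body); metadata is empty if none is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text

    metadata: Dict[str, Scalar] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so timestamps keep their colons.
        key, separator, raw_value = line.partition(":")
        if not separator:
            continue
        value = raw_value.strip()
        if value in ("true", "false"):
            metadata[key.strip()] = value == "true"
        elif value.isdigit():
            metadata[key.strip()] = int(value)
        else:
            metadata[key.strip()] = value.strip('"')

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


sample = """---
conversation_id: "uuid-abc"
title: "Conversation Title"
created: "2025-01-15T14:30:22Z"
branch: 1
total_branches: 2
has_code_execution: true
has_web_search: false
---

# Conversation Title
"""

metadata, body = split_frontmatter(sample)
print(metadata["title"], metadata["branch"], metadata["has_code_execution"])
```

With the metadata as a dictionary, selecting conversations that contain code execution or belong to a given branch becomes a plain dictionary lookup.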

## RAG Integration

### Using LangChain

```python
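# Current LangChain releases ship these classes in the langchain-community and
# langchain-openai packages; older releases exposed them under `langchain.*`.
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings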

# Load markdown files
loader = DirectoryLoader(
    "output/",
    glob="**/*.md",
    loader_cls=UnstructuredMarkdownLoader
)
documents = loader.load()

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

# Query
results = vectorstore.similarity_search("your query here")
```

### Using LlamaIndex

```python
# llama-index 0.10+ import path; older releases used `from llama_index import ...`
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents
documents = SimpleDirectoryReader("output/", recursive=True).load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("your query here")
```
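
Independent of either framework, the `## User` and `## Assistant` headings described under Markdown Format give natural chunk boundaries. Here is a minimal standard-library sketch, assuming every message starts with one of those two headings. The `split_messages` helper is hypothetical and not part of the parser, and a message whose own text contains such a heading line would be split incorrectly.

```python
import re
from typing import Dict, List

# A role heading on a line of its own, as in the documented markdown format.
ROLE_HEADING = re.compile(r"^## (User|Assistant)[ \t]*$", re.MULTILINE)


def split_messages(body: str) -> List[Dict[str, str]]:
    """Split a conversation body into one chunk per message."""
    matches = list(ROLE_HEADING.finditer(body))
    messages = []
    for index, match in enumerate(matches):
        # A message runs until the next role heading or the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(body)
        messages.append(
            {
                "role": match.group(1).lower(),
                "text": body[match.end():end].strip(),
            }
        )
    return messages


sample_body = """# Conversation Title

## User
*2025-01-15 14:30:22*

Message content...

## Assistant
*2025-01-15 14:32:15* | Model: gpt-4o

Response content...
"""

messages = split_messages(sample_body)
for message in messages:
    print(message["role"], "->", len(message["text"]), "characters")
```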

## CLI Reference

```
# Direct Python execution (no installation required)
python -m src.cli INPUT_FILE OUTPUT_DIR [OPTIONS]

# OR after installing (pip install -e .)
chatgpt-parser INPUT_FILE OUTPUT_DIR [OPTIONS]

Arguments:
  INPUT_FILE          Path to conversations.json export file
  OUTPUT_DIR          Directory for markdown output

Options:
  -v, --verbose       Enable verbose (DEBUG) logging
  -q, --quiet         Suppress informational output (errors only)
  --overwrite         Overwrite existing markdown files
  --help              Show help message and exit
  --version           Show version and exit
```
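
The interface above maps naturally onto `argparse`. The sketch below is an assumption about how such a CLI could be declared, not the contents of `src/cli.py`. The mutually exclusive `--verbose`/`--quiet` group and the `0.0.0` version string are illustrative choices.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Declare the documented arguments and options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="chatgpt-parser",
        description="Convert ChatGPT conversation exports to markdown files.",
    )
    parser.add_argument("input_file", metavar="INPUT_FILE",
                        help="Path to conversations.json export file")
    parser.add_argument("output_dir", metavar="OUTPUT_DIR",
                        help="Directory for markdown output")

    # Assumption: verbose and quiet cannot be combined.
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("-v", "--verbose", action="store_true",
                           help="Enable verbose (DEBUG) logging")
    verbosity.add_argument("-q", "--quiet", action="store_true",
                           help="Suppress informational output (errors only)")

    parser.add_argument("--overwrite", action="store_true",
                        help="Overwrite existing markdown files")
    # Placeholder version string; argparse adds --help on its own.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


args = build_arg_parser().parse_args(["conversations.json", "output/", "--verbose"])
print(args.input_file, args.output_dir, args.verbose, args.quiet, args.overwrite)
```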

## Troubleshooting

### Issue: "Invalid JSON at line X"
The conversations.json file is corrupted or malformed. Check the file encoding (should be UTF-8) and ensure it's valid JSON.
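
Both checks can be run outside the parser. The sketch below uses only the standard library to report whether a file decodes as UTF-8 and where JSON parsing first fails. `describe_json_problem` is a hypothetical helper, and its messages are not the parser's own.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def describe_json_problem(path: Path) -> Optional[str]:
    """Return a description of the first problem found, or None if the file is valid."""
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 at byte {exc.start}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Demonstration on two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('[{"title": "ok"}]', encoding="utf-8")

    bad = Path(tmp) / "bad.json"
    bad.write_text('[{"title": "ok"},\n{"title": }]', encoding="utf-8")

    good_result = describe_json_problem(good)
    bad_result = describe_json_problem(bad)

print(good_result)
print(bad_result)
```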

### Issue: "No valid conversations found"
The JSON file may be empty or all conversations failed validation. Run with `--verbose` to see detailed error messages.

### Issue: "Filename too long"
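Conversation titles are automatically truncated to 140 characters. If you still see this error, the full path (output directory plus filename) may exceed a platform limit, such as the default 260-character path limit on Windows. Try a shorter or less deeply nested output directory.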
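
To illustrate what truncation and cross-platform sanitization can involve, here is a minimal sketch. Only the 140-character limit comes from this document. The set of replaced characters, the reserved-name handling, and the `sanitize_title` helper itself are assumptions, not the parser's actual rules.

```python
import re

MAX_TITLE_LENGTH = 140  # truncation limit stated in this document

# Device names that Windows refuses as file names.
RESERVED_WINDOWS_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sanitize_title(title: str, max_length: int = MAX_TITLE_LENGTH) -> str:
    """Turn a conversation title into a filename-safe, length-limited slug."""
    # Replace characters that are invalid on Windows, path separators, and
    # control characters with spaces.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", title)
    # Collapse runs of whitespace into single hyphens.
    cleaned = "-".join(cleaned.split())
    # Truncate, then drop trailing dots and hyphens (Windows rejects trailing dots).
    cleaned = cleaned[:max_length].rstrip(".-")
    # Avoid empty names and reserved device names.
    if not cleaned or cleaned.upper() in RESERVED_WINDOWS_NAMES:
        cleaned = f"conversation-{cleaned}".rstrip("-")
    return cleaned


print(sanitize_title('What is a/b? "quoted"'))
print(len(sanitize_title("x" * 300)))
print(sanitize_title("CON"))
```

Truncating the title bounds the filename only. The directory portion of the path is outside the sanitizer's control, which is why a deep output directory can still trigger the error.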

### Issue: "Permission denied" when writing files
Check that you have write permissions in the output directory. On Windows, avoid writing to system directories.

## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/contract/test_cli_interface.py
```

### Project Structure

```
src/
├── models/          # Data models (Conversation, MessageNode, Content)
├── parsers/         # JSON parsing, tree traversal, content extraction
├── writers/         # Markdown generation, asset copying, summary
├── utils/           # Utilities (logging, filename sanitization)
└── cli.py          # CLI entry point

tests/
├── fixtures/        # Sample conversation data
├── contract/        # CLI and output format tests
├── integration/     # Full pipeline tests
└── unit/           # Component unit tests
```

## License

MIT
