# ChatGPT Parser
Convert ChatGPT conversation exports to RAG-optimized markdown files.
## Features
- Parse ChatGPT JSON exports to markdown with YAML frontmatter
- Handle all conversation branches (regenerated responses)
- Support all content types: text, images, code execution, web searches, reasoning
- Cross-platform filename sanitization
- Date-based directory organization (YYYY_MM/)
- Comprehensive summary index
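Branch handling can be pictured as enumerating every root-to-leaf path through a conversation's message tree. The sketch below assumes each conversation carries a `mapping` of node id to a node with `parent` and `children` fields, as ChatGPT exports typically do. The function name `iter_branches` and the toy data are hypothetical and are not taken from this project's source.

```python
from typing import Dict, Iterator, List


def iter_branches(mapping: Dict[str, dict]) -> Iterator[List[str]]:
    """Yield every root-to-leaf path of node ids in a conversation tree.

    Assumes each node looks like {"parent": <id or None>, "children": [<id>, ...]}.
    """
    roots = [node_id for node_id, node in mapping.items() if node.get("parent") is None]
    # An explicit stack avoids hitting the recursion limit on very long conversations.
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        children = mapping[path[-1]].get("children") or []
        if not children:
            yield path  # a leaf closes one branch
            continue
        for child in reversed(children):
            stack.append(path + [child])


# Toy tree: one user prompt ("q") with two regenerated answers ("a1", "a2").
example = {
    "root": {"parent": None, "children": ["q"]},
    "q": {"parent": "root", "children": ["a1", "a2"]},
    "a1": {"parent": "q", "children": []},
    "a2": {"parent": "q", "children": []},
}
branches = list(iter_branches(example))
print(branches)
```

Under this reading, each yielded path would become one output file, and the number of paths would be the `total_branches` value in the frontmatter.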
## Installation
### Using Conda
```bash
# Create environment
conda env create -f environment.yml
# Activate environment
conda activate chatgpt-parser
# Install package
pip install -e .
```
### Using pip
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e .
```
## Usage
You can run the parser in two ways:
### Method 1: Direct Python Execution (No Installation)
**Recommended if you just want to use it quickly:**
```bash
# Basic usage
python -m src.cli conversations.json output/
# With options
python -m src.cli conversations.json output/ --verbose
python -m src.cli conversations.json output/ --overwrite
python -m src.cli conversations.json output/ --quiet
# Help
python -m src.cli --help
```
### Method 2: Install as Command (Optional)
**If you want a `chatgpt-parser` command:**
```bash
# Install the package first
pip install -e .
# Now you can use the command directly
chatgpt-parser conversations.json output/
# With options
chatgpt-parser conversations.json output/ --verbose
chatgpt-parser conversations.json output/ --overwrite
# Help
chatgpt-parser --help
```
**Note**: The `chatgpt-parser` command is only available after running `pip install -e .`
## Output Structure
```
output/
├── 2025_01/
│ ├── 2025_01_15_Conversation-Title-path-1.md
│ ├── 2025_01_15_Conversation-Title-path-2.md
│ └── 2025_01_18_Another-Conversation.md
├── 2025_02/
│ └── ...
├── assets/
│ ├── 2025_01/
│ │ ├── image1.png
│ │ └── image2.png
│ └── 2025_02/
│ └── ...
└── summary.json
```
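The layout above suggests a naming rule of `YYYY_MM/YYYY_MM_DD_<title>[-path-N].md`. The following is a minimal sketch of that rule, inferred only from the sample tree; `build_output_path` is a hypothetical helper, and the assumption that the `-path-N` suffix appears only for multi-branch conversations comes from the example filenames, not from the project's code.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def build_output_path(created: datetime, safe_title: str, branch: Optional[int] = None) -> PurePosixPath:
    """Return YYYY_MM/YYYY_MM_DD_<title>[-path-N].md relative to the output directory."""
    month_dir = created.strftime("%Y_%m")
    stem = f"{created.strftime('%Y_%m_%d')}_{safe_title}"
    if branch is not None:
        stem += f"-path-{branch}"  # assumed: suffix only when a conversation has several branches
    return PurePosixPath(month_dir) / f"{stem}.md"


created = datetime(2025, 1, 15, 14, 30, 22, tzinfo=timezone.utc)
print(build_output_path(created, "Conversation-Title", branch=1))
print(build_output_path(datetime(2025, 1, 18, tzinfo=timezone.utc), "Another-Conversation"))
```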
## Markdown Format
Each conversation is exported as markdown with YAML frontmatter:
```markdown
---
conversation_id: "uuid-abc"
title: "Conversation Title"
created: "2025-01-15T14:30:22Z"
updated: "2025-01-16T10:15:33Z"
model: "gpt-4o"
branch: 1
total_branches: 2
message_count: 42
has_images: true
has_code_execution: true
has_web_search: false
has_reasoning: false
---
# Conversation Title
## User
*2025-01-15 14:30:22*
Message content...
## Assistant
*2025-01-15 14:32:15* | Model: gpt-4o
Response content...
```
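Downstream tools often need the frontmatter as structured metadata, for example to filter on `has_code_execution` before indexing. Here is a small standard-library sketch that separates the frontmatter from the body. It handles only the flat `key: value` lines shown above, not general YAML, and `split_frontmatter` is a hypothetical name rather than part of this package.

```python
from typing import Dict, Tuple, Union

Scalar = Union[str, int, bool]


def split_frontmatter(text: str) -> Tuple[Dict[str, Scalar], str]:
    """Split a document into (frontmatter dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter, treat everything as body
    meta: Dict[str, Scalar] = {}
    for line in lines[1:end]:
        key, sep, raw = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        value = raw.strip()
        if value in ("true", "false"):
            meta[key.strip()] = value == "true"
        elif value.isdigit():
            meta[key.strip()] = int(value)
        else:
            meta[key.strip()] = value.strip('"')
    return meta, "\n".join(lines[end + 1:])


sample = (
    '---\ntitle: "Conversation Title"\ncreated: "2025-01-15T14:30:22Z"\n'
    "branch: 1\nhas_images: true\n---\n# Conversation Title\n"
)
meta, body = split_frontmatter(sample)
print(meta["title"], meta["branch"], meta["has_images"])
```

For anything beyond flat scalars, a real YAML parser would be the safer choice.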
## RAG Integration
### Using LangChain
```python
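# Recent LangChain releases ship these integrations in separate packages
# (langchain-community, langchain-openai); older releases imported them
# from `langchain` directly.
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings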
# Load markdown files
loader = DirectoryLoader(
"output/",
glob="**/*.md",
loader_cls=UnstructuredMarkdownLoader
)
documents = loader.load()
# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)
# Query
results = vectorstore.similarity_search("your query here")
```
### Using LlamaIndex
```python
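# llama-index 0.10+ exposes these under `llama_index.core`;
# older releases used `from llama_index import ...`.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex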
# Load documents
documents = SimpleDirectoryReader("output/", recursive=True).load_data()
# Create index
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("your query here")
```
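For retrieval, it can help to index one chunk per message instead of embedding a whole conversation at once. The sketch below splits an exported body on the `## User` and `## Assistant` headings shown in the Markdown Format section. `split_turns` is a hypothetical helper, not part of this parser or of either library, and it would mis-split a message whose own content contains such a heading line.

```python
import re
from typing import Dict, List

# A turn starts at a line that is exactly "## User" or "## Assistant".
TURN_HEADING = re.compile(r"^## (User|Assistant)[ \t]*$", re.MULTILINE)


def split_turns(markdown_body: str) -> List[Dict[str, str]]:
    """Split a conversation body into one {"role", "text"} chunk per message."""
    matches = list(TURN_HEADING.finditer(markdown_body))
    turns = []
    for i, match in enumerate(matches):
        # Each turn runs until the next heading, or to the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown_body)
        turns.append({"role": match.group(1).lower(), "text": markdown_body[match.end():end].strip()})
    return turns


body = (
    "# Title\n## User\n*2025-01-15 14:30:22*\nHello\n"
    "## Assistant\n*2025-01-15 14:32:15* | Model: gpt-4o\nHi there\n"
)
turns = split_turns(body)
for turn in turns:
    print(turn["role"], "->", turn["text"].splitlines()[-1])
```

Each chunk can then be passed to whichever embedding pipeline is in use, with the role kept as metadata.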
## CLI Reference
```
# Direct Python execution (no installation required)
python -m src.cli INPUT_FILE OUTPUT_DIR [OPTIONS]
# OR after installing (pip install -e .)
chatgpt-parser INPUT_FILE OUTPUT_DIR [OPTIONS]
Arguments:
INPUT_FILE Path to conversations.json export file
OUTPUT_DIR Directory for markdown output
Options:
-v, --verbose Enable verbose (DEBUG) logging
-q, --quiet Suppress informational output (errors only)
--overwrite Overwrite existing markdown files
--help Show help message and exit
--version Show version and exit
```
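For readers who want to script around the tool, the interface above can be mirrored with `argparse`. This is only an illustration of the documented arguments: the real `src/cli.py` may use a different library, the version string `0.0.0` is a placeholder, and letting `--quiet` win over `--verbose` is an assumption.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments; --help is added by argparse."""
    parser = argparse.ArgumentParser(
        prog="chatgpt-parser",
        description="Convert ChatGPT conversation exports to markdown files.",
    )
    parser.add_argument("input_file", metavar="INPUT_FILE", help="Path to conversations.json export file")
    parser.add_argument("output_dir", metavar="OUTPUT_DIR", help="Directory for markdown output")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose (DEBUG) logging")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress informational output (errors only)")
    parser.add_argument("--overwrite", action="store_true", help="Overwrite existing markdown files")
    # "0.0.0" is a placeholder; the real version string is not given in this README.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


def log_level(args: argparse.Namespace) -> int:
    """Map the flags to a logging level (assumed precedence: quiet, then verbose)."""
    if args.quiet:
        return logging.ERROR
    return logging.DEBUG if args.verbose else logging.INFO


args = build_parser().parse_args(["conversations.json", "output/", "--verbose"])
print(args.input_file, args.output_dir, logging.getLevelName(log_level(args)))
```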
## Troubleshooting
### Issue: "Invalid JSON at line X"
The `conversations.json` file is corrupted or malformed. Check that the file is UTF-8 encoded and contains valid JSON.
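To locate the problem independently of the parser, the export can be checked with the standard library alone. The helper below is a hypothetical sketch (`describe_json_error` is not part of this package); it reports an encoding failure or the line and column of the first JSON syntax error.

```python
import json
from typing import Optional


def describe_json_error(raw: bytes) -> Optional[str]:
    """Return None if `raw` is valid UTF-8 JSON, else a short description of the first problem."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 at byte {exc.start}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


print(describe_json_error(b'[{"title": "ok"}]'))
print(describe_json_error(b'[\n  {"title": "broken",}\n]'))
```

On a real export, pass the file's bytes, for example `describe_json_error(Path("conversations.json").read_bytes())` with `Path` imported from `pathlib`.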
### Issue: "No valid conversations found"
The JSON file may be empty or all conversations failed validation. Run with `--verbose` to see detailed error messages.
### Issue: "Filename too long"
Conversation titles are automatically truncated to 140 characters. If you still see this error, it may be a platform-specific issue with deep directory nesting.
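A rough picture of what sanitization plus truncation involves is sketched below. Only the 140-character limit comes from this README; the set of stripped characters, the hyphen-for-space rule (inferred from the sample filenames), and the `untitled` fallback are assumptions, and `sanitize_title` is a hypothetical name. Reserved Windows device names such as `CON` are not handled here.

```python
import re

MAX_TITLE_LENGTH = 140  # limit quoted in this README
# Characters rejected in Windows filenames, plus ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_title(title: str, max_length: int = MAX_TITLE_LENGTH) -> str:
    """Return a filename-safe, length-limited version of a conversation title."""
    cleaned = INVALID_CHARS.sub("", title)
    cleaned = re.sub(r"\s+", "-", cleaned.strip())  # whitespace runs become single hyphens
    cleaned = cleaned[:max_length].rstrip(" .-")  # Windows rejects trailing dots and spaces
    return cleaned or "untitled"  # assumed fallback when nothing usable remains


print(sanitize_title("Conversation Title"))
print(sanitize_title('What is "C:\\temp"?'))
print(len(sanitize_title("x" * 500)))
```

Note that the limit applies to the title only, so a long output directory prefix can still push the full path past a platform's maximum.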
### Issue: "Permission denied" when writing files
Check that you have write permissions in the output directory. On Windows, avoid writing to system directories.
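A quick way to confirm the diagnosis is to try creating a file in the target directory before running the parser. The helper below is a hypothetical sketch (`is_writable` is not part of this package); note that it creates the directory if it does not exist yet.

```python
import tempfile
from pathlib import Path


def is_writable(directory: Path) -> bool:
    """Return True if a file can actually be created inside `directory`."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        # Creating a real file tests effective permissions, not just mode bits.
        with tempfile.TemporaryFile(dir=directory):
            pass
    except OSError:  # PermissionError is a subclass of OSError
        return False
    return True


# Demo inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    print(is_writable(Path(tmp) / "output"))
```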
## Development
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test file
pytest tests/contract/test_cli_interface.py
```
### Project Structure
```
src/
├── models/ # Data models (Conversation, MessageNode, Content)
├── parsers/ # JSON parsing, tree traversal, content extraction
├── writers/ # Markdown generation, asset copying, summary
├── utils/ # Utilities (logging, filename sanitization)
└── cli.py # CLI entry point
tests/
├── fixtures/ # Sample conversation data
├── contract/ # CLI and output format tests
├── integration/ # Full pipeline tests
└── unit/ # Component unit tests
```
## License
MIT