Automated Document Compliance Auditor
    
Loading actions...
Skill content
Main instructions and any bundled files for this skill.
Automated Document Compliance Auditor
A GenAI-powered tool that scans contracts and regulatory filings for missing clauses and suggests remediation using Anthropic's Claude API.
Overview
The Automated Document Compliance Auditor is a Flask-based web application that helps organizations ensure their documents comply with various regulations such as GDPR and HIPAA. It analyzes documents to identify missing clauses and provides AI-powered suggestions for remediation using Anthropic's Claude API.

Key Features
- Document Processing: Extract text from PDF, DOCX, and TXT files
- Rule-based Compliance Checking: Detect missing clauses using regex and keyword patterns
- AI-Powered Suggestions: Generate remediation text using Anthropic's Claude API
- Interactive UI: Real-time highlighting and inline editing with dark mode support
- Domain-specific Compliance: Support for GDPR, HIPAA, and other standards
- Error Handling: Centralized error handling system with user-friendly feedback
- Performance Optimization: Caching, pagination, and background task processing
- Security: Input validation, CSRF protection, and rate limiting
- API Access: RESTful API for programmatic access to all features
- PDF Export: Generate PDF reports for compliance results
Application Screenshots
Homepage/Dashboard
The main landing page showing the application overview and navigation options
Document List View
Browse uploaded documents with filtering and sorting options
Document Upload Interface
Upload new documents for compliance checking
Document Detail View
View document content with compliance issues highlighted
Compliance Check Results
View detailed compliance issues and get AI-powered suggestions
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Client Browser │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Flask Web Server │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Routes │───▶│ Services │───▶│ Document Parser │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Templates │ │ Rule Engine │ │ PDF Export Service │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │
└───────────────────────────┼─────────────────────────────────────┘
│
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌───────────────────┐
│ MongoDB │ │ Anthropic API │ │ Cache System │
│ (Document DB) │ │ (Claude LLM) │ │ (Flask-Caching) │
└─────────────────┘ └─────────────────┘ └───────────────────┘
Technology Stack
- Backend: Python with Flask
- Frontend: HTML, CSS, JavaScript with HTMX for interactivity
- Database: MongoDB for document storage
- Text Processing: PyPDF2, python-docx for document parsing
- AI Integration: Anthropic Claude API for generating suggestions
- Styling: Bootstrap 5 for responsive design with dark mode support
- Caching: In-memory caching with Flask-Caching
- Security: Flask-WTF for CSRF protection, input sanitization with Bleach
- API: RESTful API with rate limiting via Flask-Limiter
- PDF Generation: ReportLab for PDF report generation
- Background Processing: APScheduler for handling long-running tasks
Getting Started
Prerequisites
- Python 3.9+
- MongoDB
- Anthropic API key (for AI suggestions)
Installation
- Clone the repository
git clone https://github.com/sylvester-francis/Automated-Document-Compliance-Auditor.git
cd Automated-Document-Compliance-Auditor
- Create and activate a virtual environment
python -m venv venv
# On macOS/Linux
source venv/bin/activate
# On Windows
# venv\Scripts\activate
- Install the dependencies
pip install -r requirements.txt
- Set up MongoDB
Make sure MongoDB is running on your system. You can install it following the official MongoDB installation guide.
- Create the instance directory and .env file
mkdir -p instance
touch instance/.env
Edit the .env file and add the following configuration:
SECRET_KEY=your-secret-key
MONGO_URI=mongodb://localhost:27017/compliance_auditor
ANTHROPIC_API_KEY=your-anthropic-api-key
USE_MOCK_LLM=False # Set to True to use mock LLM service instead of Claude API
API_KEY=your-api-key # For accessing the API endpoints
MAX_CONTENT_LENGTH=10485760 # Maximum file size (10MB)
ALLOWED_EXTENSIONS=pdf,docx,txt # Allowed file extensions
Note: You'll need to obtain an Anthropic API key from Anthropic's website. If you don't have one, you can set
USE_MOCK_LLM=Trueto use the mock LLM service for testing.
- Run the application
python app.py
- Access the application
Open your browser and navigate to http://localhost:5006
Docker Deployment
The application can also be deployed using Docker for easier setup and consistent environments.
Using Docker Compose (Recommended)
- Clone the repository
git clone https://github.com/sylvester-francis/Automated-Document-Compliance-Auditor.git
cd Automated-Document-Compliance-Auditor
- Set your Anthropic API key as an environment variable
export ANTHROPIC_API_KEY=your_anthropic_api_key
# Alternatively, to use the mock LLM service (no API key required)
export USE_MOCK_LLM=True
- Start the application with Docker Compose
docker-compose up -d
- Access the application
Open your browser and navigate to http://localhost:5006
Using Docker without Compose
- Build the Docker image
docker build -t document-compliance-auditor .
- Run the container
docker run -p 5006:5006 \
-e MONGO_URI=your_mongo_uri \
-e ANTHROPIC_API_KEY=your_api_key \
-e SECRET_KEY=your_secret_key \
document-compliance-auditor
Note: When using Docker without Compose, you'll need to set up MongoDB separately and provide the correct connection URI.
Usage
Document Management
-
Upload Documents
- Click the "Upload New Document" button on the documents list page
- Select a file (PDF, DOCX, or TXT) from your computer
- The system will process the document and extract text and metadata
-
Browse Documents
- Use the search bar to find documents by filename or content
- Filter documents by type (PDF, DOCX, TXT) using the dropdown menu
- Sort documents by date, name, or compliance score
- Toggle between ascending and descending order
-
View Document Details
- Click on a document card to view its details
- Navigate between document content, compliance issues, and metadata using the tabs
- Toggle between light and dark mode using the theme switch in the navigation bar
Compliance Checking
-
Run Compliance Check
- Click the "Run Compliance Check" button on the document view page
- The system will analyze the document against selected compliance standards
- View the compliance score and issues found
-
Review Compliance Issues
- Issues are highlighted in the document content
- Click on an issue to see details and suggestions
- Generate AI-powered suggestions using the "Generate Suggestion (Claude)" button
-
Export Compliance Report
- Click the "Export PDF Report" button to generate a PDF report
- The report includes document details, compliance score, issues, and suggestions
API Access
All functionality is also available through the API. See the API Documentation section for details.
Project Structure
Automated-Document-Compliance-Auditor/
├── app/ # Flask application
│ ├── __init__.py # App initialization
│ ├── config.py # Configuration settings
│ ├── extensions.py # Flask extensions
│ ├── models/ # Data models
│ │ ├── __init__.py
│ │ ├── compliance.py # Compliance models
│ │ └── document.py # Document models
│ ├── routes/ # View functions
│ │ ├── __init__.py
│ │ ├── api.py # API endpoints
│ │ ├── compliance.py # Compliance checking routes
│ │ ├── documents.py # Document management routes
│ │ └── main.py # Main routes
│ ├── services/ # Business logic
│ │ ├── __init__.py
│ │ ├── bulk_processor.py # Batch document processing
│ │ ├── document_classifier.py # Document type classification
│ │ ├── document_service.py # Document handling
│ │ ├── extraction_service.py # Text extraction
│ │ ├── llm_service.py # LLM integration with mock support
│ │ ├── pdf_exporter.py # PDF export generation
│ │ ├── rule_engine.py # Compliance rules
│ │ └── seed_service.py # Data seeding
│ ├── static/ # Static assets
│ │ ├── css/ # Stylesheets
│ │ ├── js/ # JavaScript files
│ │ └── img/ # Images
│ ├── templates/ # Jinja2 templates
│ │ ├── base.html # Base template
│ │ ├── index.html # Homepage
│ │ ├── about.html # About page
│ │ ├── compliance/ # Compliance templates
│ │ │ ├── debug.html # Debug page
│ │ │ ├── results.html # Results page
│ │ │ ├── results_partial.html # HTMX partial for results
│ │ │ └── suggestions_partial.html # HTMX partial for suggestions
│ │ ├── components/ # Reusable UI components
│ │ │ └── pagination.html # Pagination component
│ │ ├── documents/ # Document templates
│ │ │ ├── bulk_upload.html # Bulk upload form
│ │ │ ├── list.html # Document list
│ │ │ ├── list_partial.html # HTMX partial for document list
│ │ │ ├── upload.html # Upload form
│ │ │ └── view.html # Document viewer
│ │ └── reports/ # Report templates
│ │ ├── compliance_pdf.html # Compliance report template
│ │ └── document_pdf.html # Document report template
│ └── utils/ # Utility functions
│ ├── __init__.py
│ ├── background_tasks.py # Background task processing
│ ├── cache.py # Caching utilities
│ ├── document_extractor.py # Document extraction utilities
│ ├── error_handler.py # Centralized error handling
│ ├── form_validation.py # Input validation
│ ├── pagination.py # Pagination utilities
│ ├── pdf_export.py # PDF export utilities
│ ├── pdf_utils.py # PDF utility functions
│ ├── rate_limiter.py # API rate limiting
│ ├── security.py # Security utilities
│ └── text_processing.py # Text processing utilities
├── instance/ # Instance-specific files
│ ├── uploads/ # Uploaded documents
│ └── temp/ # Temporary files
├── screenshots/ # Application screenshots
├── static/ # Global static files
│ └── images/ # Image assets
│ └── screenshots/ # Screenshot images for documentation
├── testdocuments/ # Test document files
├── tests/ # Test suite
│ ├── __init__.py
│ ├── conftest.py # Test configuration
│ ├── test_api.py # API tests
│ ├── test_document_service.py # Document service tests
│ ├── test_extraction_service.py # Extraction service tests
│ ├── test_routes.py # Route tests
│ ├── test_rule_engine.py # Rule engine tests
│ └── test_utils.py # Utility tests
├── app.py # Application entry point
├── app.log # Application logs
├── Dockerfile # Docker configuration
├── docker-compose.yml # Docker Compose configuration
├── requirements.txt # Python dependencies
└── README.md # Project documentation
Portfolio Project Notes
This project demonstrates:
- Full-stack development with Python (Flask) and modern frontend techniques (HTMX)
- Integration of NLP techniques and AI technologies
- Document processing and text analysis
- Database design and integration
- User interface design for complex data visualization
Features in Detail
Document Processing
The system extracts text from various document formats (PDF, DOCX, TXT) and splits it into paragraphs for analysis. It uses PyPDF2 for PDF extraction and python-docx for DOCX files, with specialized utilities in the utils module.
Compliance Rules Engine
The rules engine (rule_engine.py) checks documents against predefined compliance rules using:
- Regular expression matching for specific clause patterns
- Keyword detection for important compliance terms
- Severity classification (High, Medium, Low)
AI-Powered Suggestions
When a compliance issue is detected, the system generates remediation suggestions using Anthropic's Claude API (llm_service.py), providing context-appropriate clause examples that would satisfy compliance requirements. A fallback mock service is integrated directly into the LLM service and can be enabled by setting USE_MOCK_LLM=True in your environment variables or .env file.
Interactive User Interface
The interface provides:
- Document uploading and management
- Real-time compliance checking
- Highlighted issues in the document view
- Detailed compliance reports
- Interactive suggestion generation with Claude
- Debug tools for testing API integration
Recent Improvements
-
Error Handling:
- Implemented a centralized error handling system with custom
AppErrorclass - Added decorators for route error handling with user-friendly feedback
- Implemented a centralized error handling system with custom
-
User Experience:
- Added toast notification system for improved user feedback
- Implemented dark mode support for better accessibility
- Enhanced mobile responsiveness for all device sizes
-
Performance Optimization:
- Added document caching to improve retrieval speed
- Implemented pagination for document lists to handle large datasets
- Added background task processing for long-running operations
-
Security Enhancements:
- Implemented input validation and sanitization to prevent XSS attacks
- Added CSRF protection for all forms
- Implemented rate limiting to prevent abuse
- Added API key authentication for API endpoints
-
Feature Additions:
- Created a RESTful API for programmatic access to all features
- Added PDF export functionality for compliance reports
- Implemented advanced search and filtering for documents
- Added health check endpoints for monitoring
-
Code Quality:
- Fixed metadata loading and compliance score display issues
- Consolidated LLM services by integrating mock functionality
- Added configuration options for toggling features
- Improved error handling and debugging information
Development
Code Quality
This project uses ruff and flake8 for code quality checks. To run these checks locally:
- Run ruff:
# Navigate to your project directory
cd /Users/sylvester/Desktop/Automated-Document-Compliance-Auditor
# Activate virtual environment
source venv/bin/activate
# Run ruff on the entire codebase
ruff check .
# To automatically fix some issues
ruff check --fix .
- Run flake8:
# Run flake8 on the entire codebase
flake8 .
Known Issues
- PDF export occasionally fails with large documents
- Some HIPAA rules need refinement for better accuracy
- Mobile view has alignment issues on small screens
- MongoDB connection pooling needs optimization
CI/CD Pipeline
This project includes a GitHub Actions workflow for continuous integration and deployment. The workflow is defined in .github/workflows/ci-cd.yml and includes the following stages:
- Lint: Runs ruff and flake8 to check code quality
- Test: Runs pytest with coverage reporting
- Build: Builds and pushes a Docker image to DockerHub (on main/master branch)
- Deploy: Deploys the application to production (on main/master branch)
The CI/CD pipeline uses GitHub Container Registry (GHCR) to store Docker images, which is free for public repositories. The pipeline automatically handles authentication using GitHub Actions' built-in secrets.
If you're using the deployment step, you'll need to set up the following GitHub secrets:
DEPLOY_USER: SSH username for deployment (if using SSH deployment)DEPLOY_HOST: SSH host for deployment (if using SSH deployment)
Future Enhancements
- Support for additional document formats (HTML, XML, etc.)
- More compliance standards (SOX, CCPA, etc.)
- Machine learning model for document classification
- Custom compliance rules with a rule builder interface
- Analytics dashboard with compliance trends
- Integration with document management systems
- Multi-language support
- Collaborative review features
- Automated scheduled compliance checks
- Advanced prompt engineering for more precise suggestions
API Documentation
The application provides a RESTful API for programmatic access to all features. API endpoints are secured with API key authentication and rate limiting.
Authentication
All API requests require an API key to be included in the request headers:
X-API-Key: your-api-key
Generating an API Key
To generate and configure an API key for the application:
- Create a secure random API key:
python -c "import secrets; print(secrets.token_hex(32))"
- Add the API key to your
.envfile in theinstancedirectory:
# Create the instance directory if it doesn't exist
mkdir -p instance
# Add the API key to your .env file
echo "API_KEY=your_generated_key_here" >> instance/.env
- Restart the application to load the new API key from the environment.
For security best practices:
- Generate a unique API key for each client or service
- Rotate API keys periodically
- Never share API keys in public repositories or insecure channels
Endpoints
GET /api/documents- List all documents with pagination and filteringGET /api/documents/{document_id}- Get a specific document by IDGET /api/documents/{document_id}/compliance- Get compliance information for a documentPOST /api/documents/{document_id}/check- Check compliance for a documentGET /api/documents/{document_id}/export/pdf- Export a document as PDFGET /api/documents/{document_id}/compliance/export/pdf- Export compliance report as PDFGET /api/rules- List all compliance rulesGET /api/stats- Get application statistics
Related Skills
Frontend Typescript Linting.mdc
TypeScript and ESLint rules that MUST be followed when creating, modifying, or reviewing any file under apps/frontend/, including .ts, .tsx, .js, and .jsx files. Also apply when discussing frontend li...
2. Apply Deepthink Protocol (reason about dependencies
risks