DevOps and IT Ops automation - CI/CD, monitoring, incident management, and infrastructure workflows
Automate DevOps workflows including CI/CD pipelines, monitoring, incident management, and infrastructure operations. Based on n8n's IT Ops workflow templates.
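This skill covers GitHub CI/CD notifications and a staged deployment pipeline, alert routing and alert templates, the incident response workflow and postmortem template, and infrastructure automation (provisioning, scheduled maintenance, and Kubernetes). A sample request and output follow the configurations.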

    workflow: "GitHub CI/CD Notifications"
    triggers:
      - github_push
      - github_pull_request
      - github_workflow_run

    on_push:
      action:
        - trigger_ci: if_main_branch
        - notify_slack:
            channel: "#deployments"
            message: |
              📦 *New Push to {branch}*
              Commit: `{commit_sha_short}`
              Author: {author}
              Message: {commit_message}
              [View Diff]({compare_url})

    on_pr_opened:
      action:
        - notify_slack:
            channel: "#code-review"
            message: |
              🔀 *New Pull Request*
              Title: {pr_title}
              Author: {author}
              Branch: {head} → {base}
              [Review PR]({pr_url})
        - assign_reviewers: based_on_codeowners
        - run_ci_checks

    on_workflow_complete:
      action:
        - notify_slack:
            message: |
              {status_emoji} *Build {status}*
              Workflow: {workflow_name}
              Branch: {branch}
              Duration: {duration}
              {if_failed: [View Logs]({logs_url})}
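The `on_workflow_complete` message above has two conditional parts: the status emoji and the log link that only appears on failure. A minimal sketch of that rendering, assuming the status values are `"success"` and `"failed"` and that the run arrives as a plain dict keyed by the placeholder names (the function name and the fallback emoji are hypothetical):

```python
# Emoji follow the sample output later in this document; the status
# strings "success" and "failed" are assumptions for illustration.
STATUS_EMOJI = {"success": "✅", "failed": "❌"}


def render_build_message(run: dict) -> str:
    """Render the on_workflow_complete Slack message for one workflow run."""
    status = run["status"]
    lines = [
        f"{STATUS_EMOJI.get(status, 'ℹ️')} *Build {status}*",
        f"Workflow: {run['workflow_name']}",
        f"Branch: {run['branch']}",
        f"Duration: {run['duration']}",
    ]
    # The log link is appended only for failed runs, mirroring {if_failed: ...}.
    if status == "failed":
        lines.append(f"[View Logs]({run['logs_url']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_build_message({
        "status": "failed",
        "workflow_name": "CI",
        "branch": "main",
        "duration": "3m 42s",
        "logs_url": "https://example.com/logs",
    }))
```

Building the message from a list of lines keeps the optional link out of the template string, so a successful run never needs a `logs_url` at all.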

    deployment_pipeline:
      stages:
        build:
          trigger: push_to_main
          steps:
            - checkout_code
            - install_dependencies
            - run_tests
            - build_artifact
            - push_to_registry

        staging:
          trigger: build_success
          steps:
            - deploy_to_staging
            - run_integration_tests
            - notify_qa

        production:
          trigger: manual_approval
          steps:
            - create_backup
            - deploy_to_production
            - run_smoke_tests
            - notify_team

        rollback:
          trigger: deployment_failed OR manual
          steps:
            - revert_to_previous
            - notify_team
            - create_incident
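The stage ordering above can be sketched as a small runner. Stage and step names are copied from the pipeline; everything else is an assumption made for illustration: steps are modelled as a hypothetical `run_step` callable that returns `True` on success, and `deployment_failed` is taken to mean a failed step in the production stage.

```python
from typing import Callable, Dict, List

# Stage and step names come from the pipeline above.
PIPELINE: Dict[str, List[str]] = {
    "build": ["checkout_code", "install_dependencies", "run_tests",
              "build_artifact", "push_to_registry"],
    "staging": ["deploy_to_staging", "run_integration_tests", "notify_qa"],
    "production": ["create_backup", "deploy_to_production",
                   "run_smoke_tests", "notify_team"],
}
ROLLBACK: List[str] = ["revert_to_previous", "notify_team", "create_incident"]


def run_pipeline(run_step: Callable[[str], bool],
                 approved: bool = False) -> List[str]:
    """Run stages in order and return the steps that were executed."""
    executed: List[str] = []
    for stage, steps in PIPELINE.items():
        # Production is gated on manual approval.
        if stage == "production" and not approved:
            break
        for step in steps:
            executed.append(step)
            if run_step(step):
                continue
            # Assumption: only a production failure triggers the rollback
            # stage; earlier failures simply stop the pipeline.
            if stage == "production":
                for rollback_step in ROLLBACK:
                    run_step(rollback_step)
                    executed.append(rollback_step)
            return executed
    return executed


if __name__ == "__main__":
    # Simulate a failing smoke test after an approved production deploy.
    print(run_pipeline(lambda step: step != "run_smoke_tests", approved=True))
```

Returning the executed steps makes the control flow easy to inspect in a dry run before wiring real deploy commands into `run_step`.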

    alert_routing:
      sources:
        - prometheus
        - datadog
        - cloudwatch
        - new_relic

      severity_levels:
        critical:
          response_time: 5_minutes
          channels: [pagerduty, slack_urgent, sms]
          escalation: immediate
        high:
          response_time: 15_minutes
          channels: [slack_alerts, email]
          escalation: after_15_minutes
        medium:
          response_time: 1_hour
          channels: [slack_alerts]
        low:
          response_time: 24_hours
          channels: [slack_logging]

      routing_rules:
        - if: service == "payments"
          team: payments_oncall
          severity_boost: +1
        - if: service == "auth"
          team: security_oncall
        - default:
            team: platform_oncall
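To make the routing rules concrete, here is a minimal sketch that resolves an alert to a team, a final severity, and its channels. It assumes `severity_boost: +1` raises the alert by one level and is capped at `critical`; the source does not spell out that behaviour, and the function name is hypothetical.

```python
from typing import Dict, List

# Ascending order, so a boost of +1 moves one step to the right.
SEVERITIES: List[str] = ["low", "medium", "high", "critical"]

# Channels per severity, copied from severity_levels above.
CHANNELS: Dict[str, List[str]] = {
    "critical": ["pagerduty", "slack_urgent", "sms"],
    "high": ["slack_alerts", "email"],
    "medium": ["slack_alerts"],
    "low": ["slack_logging"],
}


def route_alert(service: str, severity: str) -> dict:
    """Apply the routing rules to one alert."""
    team, boost = "platform_oncall", 0  # default rule
    if service == "payments":
        team, boost = "payments_oncall", 1
    elif service == "auth":
        team = "security_oncall"
    # Assumption: the boost is capped at the highest level.
    level = min(SEVERITIES.index(severity) + boost, len(SEVERITIES) - 1)
    final = SEVERITIES[level]
    return {"team": team, "severity": final, "channels": CHANNELS[final]}


if __name__ == "__main__":
    print(route_alert("payments", "high"))
```

With these rules a `high` alert from the payments service is delivered on the `critical` channels, which is the practical effect of the boost.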

    alert_templates:
      infrastructure:
        cpu_high:
          title: "🔥 High CPU Usage"
          body: |
            Server: {host}
            CPU: {cpu_percent}%
            Duration: {duration}
            Threshold: {threshold}%
            [View Dashboard]({grafana_url})

        memory_critical:
          title: "💾 Critical Memory"
          body: |
            Server: {host}
            Memory: {memory_percent}%
            Available: {available_mb}MB
            [SSH to Server]({ssh_link})

        disk_full:
          title: "💿 Disk Space Critical"
          body: |
            Server: {host}
            Disk: {disk_percent}%
            Available: {available_gb}GB
            Suggestion: Clean logs or expand volume

      application:
        error_spike:
          title: "📈 Error Rate Spike"
          body: |
            Service: {service}
            Error Rate: {error_rate}%
            Normal: {baseline}%
            Top Errors:
            {top_errors}

        latency_high:
          title: "🐢 High Latency"
          body: |
            Service: {service}
            P99 Latency: {p99_ms}ms
            Threshold: {threshold_ms}ms
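Templates like these fail at send time if the monitoring payload lacks one of the placeholders. The sketch below checks a payload against the `cpu_high` body before rendering; it relies on Python's standard `string.Formatter` to list placeholder names, and the helper names and sample payload are hypothetical.

```python
from string import Formatter
from typing import List

# Body of the cpu_high template above.
CPU_HIGH_BODY = (
    "Server: {host}\n"
    "CPU: {cpu_percent}%\n"
    "Duration: {duration}\n"
    "Threshold: {threshold}%\n"
    "[View Dashboard]({grafana_url})"
)


def missing_fields(template: str, payload: dict) -> List[str]:
    """Return placeholder names used by the template but absent from the payload."""
    wanted = [name for _, name, _, _ in Formatter().parse(template) if name]
    return [name for name in wanted if name not in payload]


def render(template: str, payload: dict) -> str:
    """Render the template, failing early with a readable error if fields are missing."""
    missing = missing_fields(template, payload)
    if missing:
        raise KeyError("alert payload is missing: " + ", ".join(missing))
    return template.format(**payload)


if __name__ == "__main__":
    print(render(CPU_HIGH_BODY, {
        "host": "web-1",
        "cpu_percent": 93,
        "duration": "10m",
        "threshold": 85,
        "grafana_url": "https://grafana.example.com/d/cpu",
    }))
```

Checking up front turns a half-rendered alert into an explicit error that names the missing fields.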

    incident_workflow:
      detection:
        sources: [monitoring, user_report, automated_check]

      triage:
        auto_severity:
          - if: affects_payments
            severity: critical
          - if: affects_auth
            severity: critical
          - if: affects_api AND error_rate > 10%
            severity: high

      response:
        critical:
          - create_incident_channel: "#inc-{timestamp}"
          - page_oncall: immediately
          - notify_stakeholders: [engineering_lead, product]
          - start_war_room: zoom_link
          - create_status_page: incident
        high:
          - create_incident_channel
          - notify_oncall: slack
          - create_ticket: jira

      communication:
        internal:
          frequency: every_30_minutes
          channel: incident_channel
          template: |
            📊 *Incident Update*
            Status: {status}
            Impact: {impact}
            Next update: {next_update_time}
            Current actions:
            {action_items}
        external:
          channel: status_page
          template: customer_facing_update

      resolution:
        steps:
          - confirm_resolution
          - update_status_page: resolved
          - notify_stakeholders
          - schedule_postmortem
          - close_incident_channel: after_24h
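The triage rules above are evaluated in order, and the first match decides the severity. A minimal sketch of that evaluation together with the matching response actions follows. The source defines no fallback severity, so the `"medium"` default here is an assumption, as are the function names and the empty action list for severities without a `response` entry.

```python
from typing import Dict, List

# Response actions per severity, taken from the response section above.
RESPONSE: Dict[str, List[str]] = {
    "critical": ["create_incident_channel", "page_oncall", "notify_stakeholders",
                 "start_war_room", "create_status_page"],
    "high": ["create_incident_channel", "notify_oncall", "create_ticket"],
}


def auto_severity(affects_payments: bool = False, affects_auth: bool = False,
                  affects_api: bool = False, error_rate: float = 0.0,
                  default: str = "medium") -> str:
    """Return the severity chosen by the first matching triage rule."""
    if affects_payments or affects_auth:
        return "critical"
    # error_rate is a percentage, matching "error_rate > 10%".
    if affects_api and error_rate > 10:
        return "high"
    return default  # assumption: the source does not define a fallback


def response_plan(severity: str) -> List[str]:
    """Look up the response actions for a severity (empty if none are defined)."""
    return RESPONSE.get(severity, [])


if __name__ == "__main__":
    severity = auto_severity(affects_api=True, error_rate=14.5)
    print(severity, response_plan(severity))
```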

    postmortem_template:
      sections:
        summary:
          - incident_title
          - duration
          - severity
          - impact

        timeline:
          format: |
            | Time | Event |
            |------|-------|
            | {time} | {event} |

        root_cause:
          - what_happened
          - why_it_happened
          - contributing_factors

        impact:
          - users_affected
          - revenue_impact
          - sla_breach

        resolution:
          - how_it_was_fixed
          - time_to_detect
          - time_to_resolve

        action_items:
          format: |
            | Action | Owner | Due Date | Status |
            |--------|-------|----------|--------|

        lessons_learned:
          - what_went_well
          - what_went_poorly
          - lucky_breaks
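The `timeline`, `time_to_detect`, and `time_to_resolve` fields can be filled from a list of timestamped events. The sketch below renders the timeline table in the format shown above and computes both durations in whole minutes. Measuring both from the start of the incident is an assumption (some teams measure resolution time from detection), and the function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def timeline_table(events: List[Tuple[datetime, str]]) -> str:
    """Render events, sorted by time, as the Markdown timeline table."""
    rows = ["| Time | Event |", "|------|-------|"]
    for when, what in sorted(events):
        rows.append(f"| {when:%H:%M} | {what} |")
    return "\n".join(rows)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


if __name__ == "__main__":
    started = datetime(2024, 1, 1, 10, 0)
    detected = datetime(2024, 1, 1, 10, 12)
    resolved = datetime(2024, 1, 1, 11, 5)
    print(timeline_table([
        (resolved, "Fix deployed"),
        (started, "Error rate rises"),
        (detected, "Alert fires"),
    ]))
    # Assumption: both durations are measured from the start of the incident.
    print("time_to_detect:", minutes_between(started, detected), "min")
    print("time_to_resolve:", minutes_between(started, resolved), "min")
```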

    provisioning_workflow:
      trigger: jira_ticket OR slack_request

      steps:
        1. validate_request:
            check: [budget_approval, security_review]

        2. create_infrastructure:
            terraform:
              - vpc
              - security_groups
              - ec2_instances
              - load_balancer

        3. configure_server:
            ansible:
              - base_configuration
              - security_hardening
              - monitoring_agent
              - application_setup

        4. validate:
            - health_check
            - security_scan
            - performance_baseline

        5. notify:
            slack: "✅ Server {hostname} is ready"
            include: [ssh_access, dashboard_link]

    maintenance_automation:
      tasks:
        certificate_renewal:
          schedule: "30 days before expiry"
          action:
            - request_new_cert: letsencrypt
            - deploy_cert
            - verify_ssl
            - notify: if_failure

        security_patching:
          schedule: "weekly"
          action:
            - check_updates
            - if_critical: immediate_patch
            - else: schedule_maintenance_window

        log_rotation:
          schedule: "daily"
          action:
            - rotate_logs
            - compress_old
            - upload_to_s3
            - delete_local: older_than_7_days

        backup_verification:
          schedule: "weekly"
          action:
            - restore_to_test_env
            - run_integrity_checks
            - report_status
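Two of the schedules above depend on date arithmetic: renewing certificates 30 days before expiry and deleting local logs older than 7 days. A minimal sketch of both checks, using hypothetical function names and a plain dict of log names to dates in place of a real filesystem scan:

```python
from datetime import date, timedelta
from typing import Dict, List

RENEWAL_WINDOW = timedelta(days=30)  # "30 days before expiry"


def renewal_due(expires_on: date, today: date) -> bool:
    """True once the certificate is within 30 days of its expiry date."""
    return today >= expires_on - RENEWAL_WINDOW


def stale_logs(log_dates: Dict[str, date], today: date, keep_days: int = 7) -> List[str]:
    """Names of logs strictly older than keep_days (delete_local: older_than_7_days)."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(name for name, written in log_dates.items() if written < cutoff)


if __name__ == "__main__":
    today = date(2024, 3, 1)
    print(renewal_due(date(2024, 3, 31), today))
    print(stale_logs({"app.log.1": date(2024, 2, 20),
                      "app.log.2": date(2024, 2, 27)}, today))
```

Passing `today` in explicitly keeps both checks deterministic, which makes them easy to test from a scheduler.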

    kubernetes_automation:
      deployment:
        trigger: docker_image_pushed
        steps:
          - update_manifest: with_new_image_tag
          - apply_to_staging
          - run_tests
          - if_success: apply_to_production

      scaling:
        trigger: metric_threshold
        rules:
          - if: cpu > 80%
            action: scale_up
            max_replicas: 10
          - if: cpu < 20%
            action: scale_down
            min_replicas: 2

      rollback:
        trigger: health_check_failed
        action:
          - kubectl_rollout_undo
          - notify_team
          - create_incident
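The scaling rules above reduce to a small decision function. This sketch assumes each evaluation changes the replica count by one; the source gives only the thresholds and the bounds, so the step size and the function name are hypothetical.

```python
def desired_replicas(current: int, cpu_percent: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Apply the scaling rules to the current replica count."""
    if cpu_percent > 80:
        # scale_up, never beyond max_replicas
        return min(current + 1, max_replicas)
    if cpu_percent < 20:
        # scale_down, never below min_replicas
        return max(current - 1, min_replicas)
    return current


if __name__ == "__main__":
    for cpu in (95, 50, 10):
        print(cpu, desired_replicas(4, cpu))
```

Changing one replica at a time is the conservative choice for a sketch; a production autoscaler would typically size the change from the metric itself.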

Request: "Set up deployment notifications for GitHub Actions"

Output:

# GitHub Actions Deployment Notifications

## n8n Workflow

```yaml
trigger: GitHub Webhook
events: [workflow_run]
```

Build Started:

    🚀 *Deployment Started*
    Branch: main
    Commit: abc1234
    Author: @developer
    Triggered by: Push
    [View Workflow](https://github.com/...)

Build Success:

    ✅ *Deployment Successful*
    Environment: Production
    Duration: 3m 42s
    Version: v1.2.3
    Changes:
    • Feature X
    • Bug fix Y
    [View Deployment](https://app.example.com)

Build Failed:

    ❌ *Deployment Failed*
    Stage: Test
    Error: npm test failed
    [View Logs](https://github.com/...)
    [Retry](https://github.com/...)

Slack settings:

    channel: "#deployments"
    mention_on_failure: "@oncall"
    thread_replies: true

---
*DevOps Automation Skill - Part of Claude Office Skills*