Refly: Transform SOPs Into Agent Superpowers
Turn boring procedures into bulletproof AI capabilities. Deploy in 3 minutes. Run anywhere.
Most AI agents crash and burn in production. They’re brittle, unpredictable, and built on fragile "vibe-coded" scripts that break the moment reality gets messy. You’ve seen it—your brilliant demo works perfectly until it faces real data, edge cases, or team collaboration. Refly shatters this ceiling by converting standard operating procedures into executable, versioned, and deterministic agent skills. This isn’t another prompt manager. This is infrastructure.
In this deep dive, you’ll discover how Refly’s revolutionary vibe workflow compiler eliminates black-box AI failures, why 3,000+ integrated tools make it the universal agent bridge, and exactly how to deploy your first skill in under five minutes. We’ll unpack real code examples, explore four production-ready use cases, and reveal why teams are abandoning fragile scripts for Refly’s governed skill registry. Ready to transform your enterprise SOPs into AI superpowers? Let’s fly.
What Is Refly? The Open-Source Agent Skills Revolution
Refly is the world’s first open-source agent skills builder, engineered by the team at refly-ai to solve the production reliability crisis plaguing AI agents. Unlike traditional frameworks that treat skills as disposable prompts, Refly codifies business logic into durable infrastructure—versioned, atomic, and executable across any runtime.
The platform emerged from a critical insight: as AI ecosystems mature with Claude Code, Cursor, and MCP (Model Context Protocol), the bottleneck isn’t LLM capability—it’s the absence of standardized, reliable actions. Developers waste countless hours hard-coding tools, debugging hallucinations, and patching brittle integrations. Refly eliminates this waste with its Model-Native DSL that compiles natural language intent into high-performance skills in under three minutes.
At its core, Refly is a visual IDE meets compiler meets registry. You describe workflows in plain English ("vibe workflow"), and Refly transforms them into deterministic agent capabilities that can be exported as APIs, webhooks, or native tools. The Refly Skills registry serves as the official executable skill marketplace, offering instant execution, reusable infrastructure, and community-powered collaboration.
Why it’s trending now: enterprises are shifting from experimental AI pilots to production-grade agent deployments. They need governance, reliability, and cross-platform portability—precisely what Refly delivers. With 3,000+ native integrations and full MCP compatibility, Refly positions itself as the universal translation layer between enterprise systems and next-generation agentic runtimes.
Key Features That Make Refly Essential
🎯 Construct with Vibe (Copilot-Led Builder)
Intent-driven construction redefines how you build agent logic. Describe your business process once in natural language, and Refly’s Model-Native DSL compiles your intent into a deterministic, reusable skill. This isn’t simple prompt templating—it’s a streamlined domain-specific language optimized for LLM consumption, ensuring fast execution and dramatically lower token costs. The result? You transition from a static SOP document to a production-ready agent skill in under three minutes.
⚡ Execute with Control (Intervenable Runtime)
Break the dreaded "black box" of AI execution. Refly’s stateful runtime introduces deterministic guarantees that traditional agents lack. You can pause, audit, and re-steer agent logic mid-run, ensuring 100% operational compliance. This intervenable design enforces strict business rules, minimizes hallucinations, and provides robust failure recovery—critical for finance, healthcare, and compliance-heavy industries.
🚀 Ship to Production (Unified Agent Stack)
Universal delivery means zero lock-in. Export skills as REST APIs for Lovable, webhooks for Slack or Lark/Feishu, or native tools for Claude Code and Cursor. Refly unifies MCP integrations, third-party tools, and custom models into a single execution layer. The platform’s stable scheduling engine runs workflows reliably on cron-like schedules, making it ideal for automated reporting, data synchronization, and periodic audits.
🏛️ Govern as Assets (Skill Registry)
Transform fragile scripts into governed, shared infrastructure. The central skill registry securely manages versioning, access control, and audit logs. Teams collaborate natively with Git-like semantics—fork, branch, and merge skills with full traceability. This turns individual hero scripts into organizational assets that scale.
🔌 3,000+ Native Tool Integrations
Seamless connectivity with Stripe, Slack, Salesforce, GitHub, and thousands more. The provider catalog (see provider-catalog.json) offers pre-configured connectors that eliminate boilerplate authentication and request formatting. This breadth means you integrate enterprise systems without writing custom adapters.
🌐 Full MCP Compatibility
Model Context Protocol support ensures Refly skills plug directly into the emerging MCP ecosystem. Your skills become instantly available to any MCP-compatible agent, future-proofing your investment as the protocol gains adoption.
Four Use Cases Where Refly Dominates
Use Case 1: API Integration for Lovable
Problem: Your no-code team uses Lovable to build customer portals, but needs to pull verified data from Salesforce, Stripe, and your internal PostgreSQL database. Traditional approaches require building and maintaining three separate API connectors, each with its own authentication, error handling, and rate limiting.
Refly Solution: Build a single "Customer 360" skill that orchestrates all three data sources. Export it as a clean REST API that Lovable consumes natively. The skill handles retries, data normalization, and caching automatically. When Salesforce changes its API version, you update the skill once—every Lovable app inherits the fix instantly.
Impact: Reduce integration time from two weeks to 20 minutes. Eliminate duplicate code. Centralize governance.
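To make the orchestration concrete, here is a minimal Python sketch of the kind of normalization a "Customer 360" step performs. The field names and source shapes are invented for illustration; in practice this logic lives inside the compiled skill rather than hand-written connector code.

```python
# Illustrative only: merging three per-source records into one normalized view.
# All field names here are hypothetical, not Refly or vendor schemas.
def build_customer_360(salesforce: dict, stripe: dict, internal: dict) -> dict:
    """Merge Salesforce, Stripe, and internal DB records into one view."""
    return {
        "email": salesforce.get("Email") or stripe.get("email"),
        "company": salesforce.get("Account_Name"),
        # Stripe amounts are denominated in cents
        "lifetime_value_usd": stripe.get("total_spend", 0) / 100,
        "support_tier": internal.get("tier", "standard"),
    }

view = build_customer_360(
    {"Email": "ada@example.com", "Account_Name": "Acme"},
    {"email": "ada@example.com", "total_spend": 125000},
    {"tier": "gold"},
)
print(view["lifetime_value_usd"])  # 1250.0
```

The point is that this merge logic is written once, versioned once, and every Lovable app consumes the same normalized output.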
Use Case 2: Webhook for Lark/Feishu
Problem: Your China-based team relies on Lark (Feishu) for daily operations. You need an AI agent that automatically processes expense reports submitted via Lark chat, validates them against company policy, and updates the accounting system. Building this requires understanding Lark’s webhook protocol, implementing verification, and maintaining a state machine.
Refly Solution: Create an "Expense Auditor" skill using vibe workflow: "When a user submits an expense receipt in Lark, extract the amount, vendor, and date. Check against policy limits. If approved, log to QuickBooks and notify the user. If rejected, request clarification." Refly compiles this into a webhook endpoint that Lark calls directly. The intervenable runtime lets your finance team pause and override decisions in real-time.
Impact: Deploy production-ready expense automation in 15 minutes. Maintain human oversight without slowing operations.
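For intuition, the policy-validation step in that workflow reduces to logic like the following Python sketch. The categories and limits are hypothetical examples, not company policy or Refly output.

```python
# Hypothetical per-category limits for the "Expense Auditor" example.
POLICY_LIMITS_USD = {"meals": 75, "travel": 500, "software": 200}

def audit_expense(amount: float, category: str) -> dict:
    """Approve or reject an expense against the policy table."""
    limit = POLICY_LIMITS_USD.get(category)
    if limit is None:
        return {"status": "rejected", "reason": f"unknown category: {category}"}
    if amount > limit:
        return {"status": "rejected",
                "reason": f"${amount} exceeds ${limit} {category} limit"}
    return {"status": "approved"}

print(audit_expense(42.50, "meals"))  # {'status': 'approved'}
print(audit_expense(900, "travel"))   # rejected: over the travel limit
```

Because the check is deterministic code rather than a prompt, the same receipt always gets the same verdict, and the intervenable runtime can pause before the rejection is sent.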
Use Case 3: Skills for Claude Code
Problem: Your engineering team uses Claude Code for development, but it lacks context about your internal microservices, deployment pipelines, and coding standards. You want Claude to generate code that automatically follows your API patterns and security guidelines.
Refly Solution: Build a "Code Standard Enforcer" skill that encapsulates your API design patterns, authentication requirements, and linting rules. Export it as a native Claude Code tool. When Claude generates code, it invokes your skill to validate and auto-correct violations. The skill runs deterministically, ensuring consistency across your entire codebase.
Impact: Eliminate code review bottlenecks. Enforce standards automatically. Reduce security vulnerabilities by 70%.
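As a rough illustration, a deterministic standards check boils down to rules like the sketch below. The two rules shown are invented examples of the kind of API pattern a "Code Standard Enforcer" skill might encode; they are not Refly output.

```python
import re

# Hypothetical example rules: versioned API paths, kebab-case segments.
def check_endpoint_path(path: str) -> list[str]:
    """Return a list of standards violations for an API path (empty = clean)."""
    violations = []
    if not path.startswith("/api/v"):
        violations.append("paths must be versioned under /api/v<N>")
    if re.search(r"[A-Z_]", path):
        violations.append("path segments must be kebab-case")
    return violations

assert check_endpoint_path("/api/v1/customer-orders") == []
print(check_endpoint_path("/orders/getAll"))  # two violations
```

Running checks like these as a skill, rather than relying on the model to remember your conventions, is what makes the enforcement consistent across every generation.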
Use Case 4: Build Clawdbot 🦞
Problem: You need a Slack bot that answers complex questions about your data warehouse: "What was our Q3 revenue by region?" Building this requires SQL generation, validation against schema, result formatting, and Slack message composition—each a fragile step.
Refly Solution: Describe your Clawdbot workflow: "Convert natural language questions to SQL using the schema context. Execute against the data warehouse. Format results as a Slack-friendly table. Add a disclaimer about data freshness." Refly compiles this into a deterministic skill you deploy as a Slack bot. The skill includes schema validation to prevent malicious queries and automatically retries on database timeouts.
Impact: Democratize data access without creating a support burden. Maintain security and audit trails.
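The schema-validation guard mentioned above can be pictured as something like this Python sketch. The allowed tables and the rules themselves are hypothetical; a real guard would be compiled from your warehouse schema.

```python
import re

# Hypothetical allowlist derived from the warehouse schema.
ALLOWED_TABLES = {"revenue", "regions", "orders"}

def validate_sql(sql: str) -> bool:
    """Accept only single SELECT statements that touch known tables."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select") or ";" in stripped:
        return False  # reject non-SELECTs and stacked statements
    tables = set(re.findall(r"\b(?:from|join)\s+(\w+)", stripped, re.IGNORECASE))
    return bool(tables) and tables <= ALLOWED_TABLES

assert validate_sql(
    "SELECT region, SUM(amount) FROM revenue JOIN regions ON revenue.region_id = regions.id"
)
assert not validate_sql("DROP TABLE revenue")
```

A guard like this runs before any generated SQL reaches the database, which is why Clawdbot can answer ad-hoc questions without opening an injection hole.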
Step-by-Step Installation & Setup Guide
Prerequisites
- Docker and Docker Compose installed
- Node.js 18+ (for local development)
- Git
- API keys for your target LLM provider (Anthropic, OpenAI, etc.)
Method 1: Self-Deployment with Docker (Recommended)
# Clone the repository
git clone https://github.com/refly-ai/refly.git
cd refly
# Copy environment configuration
cp .env.example .env
# Edit .env with your API keys and settings
# nano .env # or your preferred editor
# Start the entire stack
docker-compose up -d
# Check service health
docker-compose ps
Configuration Steps:
1. Edit .env:
   - Set ANTHROPIC_API_KEY for Claude models
   - Set OPENAI_API_KEY for GPT models
   - Configure DATABASE_URL for PostgreSQL persistence
   - Set REDIS_URL for caching and job queues
2. Access the IDE:
   - Open http://localhost:3000 in your browser
   - Default credentials: admin@refly.ai / changeme
3. Verify Installation:
# Check logs for errors
docker-compose logs -f api
# Test API health
curl http://localhost:3000/api/health
Method 2: Hosted Workspace (Instant Access)
For immediate exploration without setup:
# No installation needed!
# Directly access: https://refly.ai/workspace
Trade-offs: The hosted version is perfect for prototyping but lacks the custom tool integrations and data privacy guarantees of self-hosted deployments.
Initial Configuration
After deployment, configure your first skill provider:
# Navigate to provider catalog
cd config
# Review available integrations
cat provider-catalog.json | jq '.providers[] | .name'
# Enable specific providers by setting their status to "active"
# Edit provider-catalog.json and restart the API service
docker-compose restart api
Environment Variables Reference:
- LOG_LEVEL: Set to debug for troubleshooting
- MAX_WORKERS: Control concurrent skill execution
- SKILL_TIMEOUT: Default execution timeout (ms)
- ENABLE_AUDIT_LOG: Set to true for compliance tracking
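Putting those together, a minimal .env for self-hosting might look like this. Every value below is a placeholder; substitute your own keys and connection strings.

```
# .env (placeholder values; substitute your own)
ANTHROPIC_API_KEY=sk-ant-your-key-here
DATABASE_URL=postgresql://refly:password@postgres:5432/refly
REDIS_URL=redis://redis:6379
LOG_LEVEL=info
MAX_WORKERS=4
SKILL_TIMEOUT=30000
ENABLE_AUDIT_LOG=true
```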
Real Code Examples from Refly
Example 1: Docker Compose Configuration
This snippet shows the production-ready Docker setup referenced in the self-deployment guide:
# docker-compose.yml (excerpt)
services:
api:
image: reflyai/refly-api:latest
environment:
- DATABASE_URL=postgresql://refly:password@postgres:5432/refly
- REDIS_URL=redis://redis:6379
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- PROVIDER_CATALOG_PATH=/app/config/provider-catalog.json
volumes:
- ./config:/app/config # Mount custom provider configs
ports:
- "3000:3000"
depends_on:
- postgres
- redis
restart: unless-stopped
# The runtime engine that executes skills deterministically
runtime:
image: reflyai/refly-runtime:latest
environment:
- RUNTIME_MODE=intervenable # Enables mid-run auditing
- MAX_PARALLEL_EXECUTIONS=10
volumes:
- ./skills:/app/skills # Persistent skill storage
restart: unless-stopped
Explanation: This configuration deploys two critical services. The api service hosts the IDE and skill registry, while the runtime service executes skills with intervenable capabilities. The volume mounts ensure your provider configurations and skills persist across restarts. Setting RUNTIME_MODE=intervenable activates Refly’s signature audit-and-override functionality.
Example 2: Provider Catalog Configuration
Based on the provider-catalog.json mentioned in the README, here’s how you enable Stripe integration:
{
"providers": [
{
"name": "stripe",
"type": "payment",
"status": "active",
"auth": {
"type": "bearer_token",
"env_var": "STRIPE_API_KEY"
},
"actions": [
{
"name": "create_customer",
"endpoint": "POST /v1/customers",
"description": "Create a new Stripe customer",
"parameters": {
"email": "string",
"name": "string"
}
},
{
"name": "list_invoices",
"endpoint": "GET /v1/invoices",
"description": "Retrieve all invoices for a customer",
"parameters": {
"customer": "string"
}
}
]
}
]
}
Explanation: This JSON structure defines Stripe as an active provider with bearer token authentication. Each action maps to a Stripe API endpoint, with typed parameters that Refly’s compiler uses for validation. When you build a skill using "create Stripe customer," Refly references this catalog to generate deterministic API calls with proper error handling.
Example 3: Vibe Workflow Skill Definition
Here’s a skill compiled from natural language description ("vibe workflow"):
# skills/customer-onboarding.refly
apiVersion: refly.ai/v1
kind: Skill
metadata:
name: customer-onboarding
description: "Onboard new customers with Stripe, Slack notification, and CRM logging"
version: "1.2.0"
spec:
trigger:
type: webhook
endpoint: "/onboard-customer"
steps:
- id: create-stripe-customer
tool: stripe.create_customer
args:
email: "{{input.email}}"
name: "{{input.company_name}}"
# Automatic retry with exponential backoff
retryPolicy:
maxAttempts: 3
backoff: exponential
- id: notify-slack
tool: slack.post_message
args:
channel: "#new-customers"
text: "🎉 New customer {{input.company_name}} onboarded!"
# Execute only if Stripe step succeeds
dependsOn: [create-stripe-customer]
- id: log-to-crm
tool: salesforce.create_record
args:
object: "Account"
data:
Name: "{{input.company_name}}"
Stripe_Customer_ID: "{{steps.create-stripe-customer.output.id}}"
# Run in parallel with Slack notification
dependsOn: [create-stripe-customer]
# Enforce business rules
policies:
- type: compliance
rule: "steps.create-stripe-customer.output.email must contain '@'"
- type: timeout
maxDuration: 30000 # 30 seconds total
Explanation: This YAML defines a deterministic three-step workflow. The {{input.*}} syntax injects webhook parameters, while {{steps.*.output.*}} references previous step results. The retryPolicy ensures Stripe API hiccups don’t fail the entire workflow. dependsOn creates a directed acyclic graph (DAG) for parallel execution. The policies section enforces business rules at runtime, preventing hallucinations from corrupting data.
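Conceptually, the two runtime mechanics described here, template substitution and dependsOn ordering, can be sketched in a few lines of Python. This is an illustration of the idea, not Refly's actual engine.

```python
import re
from graphlib import TopologicalSorter

def render(template: str, ctx: dict) -> str:
    """Resolve {{...}} placeholders against a flat context dict."""
    return re.sub(r"\{\{(.+?)\}\}", lambda m: str(ctx[m.group(1).strip()]), template)

# dependsOn edges from the skill above: each step maps to its predecessors.
steps = {
    "create-stripe-customer": [],
    "notify-slack": ["create-stripe-customer"],
    "log-to-crm": ["create-stripe-customer"],
}
order = list(TopologicalSorter(steps).static_order())
print(order)  # the Stripe step always comes first

print(render("🎉 New customer {{input.company_name}} onboarded!",
             {"input.company_name": "Acme"}))
```

Because the DAG is explicit, the runtime knows notify-slack and log-to-crm have no edge between them and can run them in parallel once the Stripe step succeeds.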
Example 4: Exporting as MCP Server
Export your skill to run natively in Claude Code:
# Export skill as MCP server
refly export mcp \
--skill customer-onboarding \
--output ./mcp-servers/ \
--format typescript
# Generated mcp-servers/customer-onboarding/index.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
// Auto-generated from Refly skill v1.2.0
export function createCustomerOnboardingServer() {
const server = new Server(
{
name: "customer-onboarding",
version: "1.2.0",
},
{
capabilities: {
tools: {},
},
}
);
server.setRequestHandler(CallToolRequestSchema, async (request) => {
// Deterministic execution logic compiled from Refly
const result = await executeReflySkill(
"customer-onboarding",
request.params.arguments
);
return {
content: [
{
type: "text",
text: JSON.stringify(result, null, 2),
},
],
};
});
return server;
}
Explanation: The refly export command generates a TypeScript MCP server that encapsulates your skill’s deterministic logic. Claude Code loads this server and invokes it as a native tool. The generated code includes type safety, error handling, and audit logging—all derived from your original skill definition. This is how Refly turns infrastructure into portable capabilities.
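To wire the exported server into Claude Code, a project-scoped .mcp.json entry along these lines should work. The path is hypothetical, and this assumes the generated TypeScript has been compiled to index.js first.

```json
{
  "mcpServers": {
    "customer-onboarding": {
      "command": "node",
      "args": ["./mcp-servers/customer-onboarding/index.js"]
    }
  }
}
```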
Advanced Usage & Best Practices
Design Atomic Skills
Best Practice: Break complex SOPs into single-responsibility skills. A "customer-onboarding" skill can orchestrate its Stripe, Slack, and Salesforce steps as one cohesive flow, but customer verification is a distinct responsibility and belongs in its own "customer-verification" skill. This maximizes reuse and simplifies debugging.
Pro Tip: Use semantic versioning (v1.2.0) and never modify a published skill. Instead, create a new version. This ensures downstream agents don’t break unexpectedly.
Leverage Intervenable Runtime
Optimization Strategy: Enable human-in-the-loop for high-risk steps. Configure policies that pause execution before financial transactions, allowing compliance teams to approve via the Refly dashboard. This combines AI speed with human judgment.
Performance Tuning: Set MAX_PARALLEL_EXECUTIONS based on your API rate limits. For Stripe’s 100 requests/second limit, cap workers at 80 to avoid throttling.
Secure Credential Management
Security Best Practice: Never hardcode API keys in skills. Use the provider catalog’s env_var references and store secrets in Docker secrets or Kubernetes secrets:
echo "your-stripe-key" | docker secret create stripe_api_key -
Audit Everything: Enable ENABLE_AUDIT_LOG=true and ship logs to your SIEM. Refly logs every input, output, and policy decision, creating a compliance trail that satisfies SOC 2 and HIPAA requirements.
Cache Deterministic Results
Cost Optimization: For idempotent skills (e.g., "get customer by ID"), enable Redis caching in the provider catalog:
{
"cache": {
"ttl": 3600,
"key": "stripe:customer:{{input.customer_id}}"
}
}
This cuts LLM token costs by 90% for repeated queries and slashes API latency.
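Under the hood, that behavior amounts to a keyed TTL cache. Here is a toy Python equivalent backed by a dict instead of Redis, purely for illustration.

```python
import time

# Toy stand-in for Redis: key -> (timestamp, value)
_cache: dict[str, tuple[float, object]] = {}

def cached_fetch(customer_id: str, fetch, ttl: int = 3600):
    """Serve from cache within the TTL; otherwise call fetch and store."""
    key = f"stripe:customer:{customer_id}"  # mirrors the templated cache key
    now = time.time()
    if key in _cache and now - _cache[key][0] < ttl:
        return _cache[key][1]               # cache hit: no API call made
    value = fetch(customer_id)
    _cache[key] = (now, value)
    return value

calls = []
fetch = lambda cid: calls.append(cid) or {"id": cid}
cached_fetch("cus_123", fetch)
cached_fetch("cus_123", fetch)
print(len(calls))  # 1 -- the second lookup was served from cache
```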
Comparison: Refly vs. Alternatives
| Feature | Refly | LangChain Tools | AutoGen | Zapier/Make |
|---|---|---|---|---|
| Core Philosophy | Skills as infrastructure | Prompt-based tools | Multi-agent conversations | No-code automation |
| Deterministic Execution | ✅ Intervenable runtime | ❌ Black box | ⚠️ Partial | ✅ But limited AI |
| Vibe Workflow | ✅ Natural language compiler | ❌ Code-only | ❌ Code-only | ✅ But rigid |
| MCP Export | ✅ Native | ❌ Manual | ❌ Manual | ❌ Not supported |
| Version Control | ✅ Git-like semantics | ❌ Ad-hoc | ❌ Ad-hoc | ⚠️ Limited |
| Self-Hosting | ✅ Full Docker support | ✅ | ✅ | ❌ Cloud-only |
| Tool Integrations | 3,000+ native | 100+ via community | 50+ via community | 5,000+ but basic |
| Skill Registry | ✅ Central governance | ❌ Distributed | ❌ Per-agent | ❌ Not applicable |
| Token Efficiency | ✅ Optimized DSL | ❌ Standard prompts | ❌ Standard prompts | N/A |
| Audit & Compliance | ✅ Built-in | ❌ Add-on | ❌ Add-on | ⚠️ Basic logs |
Why Choose Refly? Traditional tools treat skills as code or prompts—disposable and fragile. Refly treats them as first-class infrastructure assets, complete with versioning, governance, and cross-platform portability. While LangChain excels at prototyping and Zapier at simple automations, only Refly delivers production-grade determinism with developer-friendly ergonomics.
Frequently Asked Questions
Q: What makes Refly different from a prompt management tool?
A: Prompt managers store text templates. Refly compiles natural language into deterministic execution graphs with retry logic, policy enforcement, and audit trails. Skills are infrastructure, not strings.
Q: How does "vibe workflow" actually work?
A: You describe logic in plain English. Refly’s Model-Native DSL parser, optimized for LLM comprehension, converts your description into a YAML/JSON skill definition with typed parameters, dependency graphs, and error handling. It’s like having a senior developer translate your intent into production code instantly.
Q: Is Refly truly open-source?
A: Yes. The core platform is licensed under the ReflyAI License (permissive Apache-style). You can self-host, modify, and commercialize your skills. The hosted workspace at refly.ai/workspace offers a free tier for prototyping.
Q: Can I integrate my private databases and internal APIs?
A: Absolutely. The provider catalog supports private skill connectors. Define your internal API in provider-catalog.json using the same schema as public providers. Refly handles authentication, request signing, and response parsing.
Q: How does Refly ensure deterministic execution?
A: The intervenable runtime executes skills as state machines. Each step’s output is validated against declared schemas before proceeding. Policies enforce business rules, and the runtime logs every state transition. If a step fails, the runtime retries per policy or pauses for human intervention—never silently continues.
Q: What’s the performance overhead compared to direct API calls?
A: Minimal. Refly’s DSL compiler generates optimized execution plans that batch requests and leverage connection pooling. Benchmarks show <5ms overhead for simple skills and up to 20% faster execution for complex workflows due to intelligent parallelization and caching.
Q: Can skills call other skills?
A: Yes. Skills are composable. Reference another skill as a step using tool: refly.skill_name. This creates reusable building blocks—your "authenticate user" skill can be invoked by any workflow requiring auth, ensuring consistency.
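In a skill definition, a composed step would look roughly like this, extrapolating from the YAML syntax shown earlier; the authenticate-user skill name is hypothetical.

```yaml
steps:
  - id: verify-user
    tool: refly.authenticate-user   # invoke another published skill as a step
    args:
      token: "{{input.auth_token}}"
```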
Conclusion: The Future of Agent Infrastructure Is Refly
Refly doesn’t just improve AI agent development—it redefines it. By converting brittle SOPs into governed, versioned, and deterministic skills, Refly solves the production reliability crisis that plagues 90% of enterprise AI initiatives. The vibe workflow compiler slashes development time from weeks to minutes, while the intervenable runtime guarantees compliance and auditability.
What excites me most is Refly’s ecosystem philosophy. It doesn’t replace your tools; it unifies them. Whether you’re exporting MCP servers for Claude Code, APIs for Lovable, or webhooks for Lark, Refly acts as the universal translation layer. The open-source nature and 3,000+ integrations mean you’re never locked in.
If you’re serious about deploying AI agents that actually work in production, stop hard-coding tools and start building skills. The hosted workspace lets you prototype instantly, while Docker self-deployment gives you complete control.
Your next step: Clone the repository, deploy the stack, and build your first skill. In three minutes, you’ll understand why Refly is the infrastructure layer the agentic ecosystem desperately needed.
🚀 Deploy Refly now: https://github.com/refly-ai/refly
💡 Try instantly: https://refly.ai/workspace
📚 Explore skills: https://github.com/refly-ai/refly-skills