Testing
Testing your plugin
Plugins are pure Python — every existing testing pattern applies. The three levels of confidence: unit-test the handlers in isolation, integration-test through a real
PluginContext, and manual-test against a running gateway with a real agent.Three levels of testing
- Unit — test handler functions directly. Fastest, narrowest scope. No Flowly imports needed.
- Integration — boot a real
PluginContext, register your plugin against an isolatedToolRegistry+HookRegistry, then dispatch tool calls or fire hooks programmatically. Catches wiring bugs. - Manual end-to-end — install in a real Flowly profile, restart the gateway, drive the agent from the desktop chat. Catches real-world UX bugs that unit/integration miss.
Most plugins only need unit + manual. Integration tests pay off once the plugin becomes complex enough that boot-time wiring is non-trivial.
Unit tests for handlers
Your handlers are just Python functions. Test them like any other function — no Flowly machinery needed.
python
# tests/test_my_plugin_handlers.py
import pytest
from pathlib import Path
# Import your plugin handlers directly
from my_plugin import (
weather_handler,
slash_handler,
_extract_paths_from_exec,
)
@pytest.mark.asyncio
async def test_weather_handler_default_city(monkeypatch):
# Mock the network call
async def fake_get(url, **kwargs):
class _R:
text = "Berlin: ⛅️ +14°C"
def raise_for_status(self): pass
return _R()
monkeypatch.setattr("httpx.AsyncClient.get", fake_get)
result = await weather_handler("Berlin")
assert "+14°C" in result
def test_slash_help_returned_on_empty_args():
out = slash_handler("")
assert out.startswith("/auto-commit") # help text starts here
def test_slash_status_with_no_config(tmp_path, monkeypatch):
monkeypatch.setenv("FLOWLY_HOME", str(tmp_path))
monkeypatch.delenv("FLOWLY_AUTO_COMMIT_PATHS", raising=False)
out = slash_handler("status")
assert "DISABLED" in out
def test_extract_paths_from_heredoc():
cmd = "cat > /tmp/file.py << 'EOF'\nprint('x')\nEOF\n"
paths = _extract_paths_from_exec({"command": cmd})
assert "/tmp/file.py" in pathsWrite tests directly against the parsing/scrape helpers (
_extract_paths_from_exec, _was_recently_written). They have the most regression risk when you tweak the implementation, and they're the cheapest to test.Integration tests with PluginContext
When you want to verify that register(ctx) wires everything up correctly, build a minimal context and run it:
python
# tests/test_my_plugin_integration.py
import pytest
from flowly.agent.hooks import HookRegistry, ToolHookContext
from flowly.agent.tools.registry import ToolRegistry
from flowly.plugins import PluginManager
@pytest.fixture
def loaded_plugin(tmp_path, monkeypatch):
"""Boot your plugin against a fresh registry inside an isolated home."""
monkeypatch.setenv("FLOWLY_HOME", str(tmp_path / "home"))
tools = ToolRegistry()
hooks = HookRegistry()
mgr = PluginManager(tool_registry=tools, hook_registry=hooks)
mgr.discover_and_load(
enabled={"my-plugin"},
disabled=set(),
)
return tools, hooks, mgr
def test_my_plugin_registers_expected_surfaces(loaded_plugin):
tools, hooks, mgr = loaded_plugin
info = next(p for p in mgr.list_plugins() if p["key"] == "my-plugin")
assert info["enabled"] is True
assert "my_tool" in info["tools"]
assert "post_tool_call" in info["hooks"]
assert "my-cmd" in info["commands"]
@pytest.mark.asyncio
async def test_hook_runs_on_tool_call(loaded_plugin):
tools, hooks, _ = loaded_plugin
# Fire a synthetic post_tool_call event
ctx = ToolHookContext(
tool_name="write_file",
params={"path": "/some/path"},
success=True,
)
await hooks.fire("post_tool_call", ctx)
# Assert your plugin's side effects (e.g. log file written, state updated)
# ...The pattern: build a fresh, isolated PluginManager, point it at a temp FLOWLY_HOME, force-enable your plugin, then drive hook/tool calls programmatically.
Reference
The Flowly source includes integration tests at
tests/test_disk_cleanup_plugin.py showing this pattern end-to-end.Manual end-to-end test recipe
The most reliable way to catch real-world bugs:
bash
# 1. Install fresh
flowly plugins remove my-plugin --yes 2>/dev/null
flowly plugins install /path/to/my-plugin
flowly plugins enable my-plugin
# 2. Restart the gateway with debug logging
LOGURU_LEVEL=DEBUG flowly gateway --port 18790
# 3. In another terminal, watch your plugin's audit log
tail -f ~/.flowly/my-plugin/log.jsonl
# 4. Drive the agent from desktop chat / Telegram / web
# — test the slash commands manually
# — ask the agent to do something that triggers your hook
# — observe the audit log for hook_fired events
# 5. Inspect persistent state
cat ~/.flowly/my-plugin/config.jsonChecklist
flowly plugins listshows your plugin asenabledafter gateway start- Slash commands respond — including help, status, and error paths
- The hook fires for the expected tools (verify via
hook_firedaudit events) - Skip filters log a reason — never silently return
- Successful actions write a clearly-named audit event
- Plugin survives a misconfigured environment (missing env var, missing files)
Useful fixtures
Patterns we've found valuable across plugin test suites:
python
@pytest.fixture
def isolated_flowly_home(tmp_path, monkeypatch):
"""Each test gets its own pristine $FLOWLY_HOME."""
home = tmp_path / "home"
home.mkdir()
monkeypatch.setenv("FLOWLY_HOME", str(home))
return home
@pytest.fixture
def temp_git_repo(tmp_path):
"""A throwaway git repo for plugins that touch git."""
import subprocess
repo = tmp_path / "repo"
repo.mkdir()
for cmd in [
["git", "init"],
["git", "config", "user.email", "t@local"],
["git", "config", "user.name", "test"],
]:
subprocess.run(cmd, cwd=repo, check=True, capture_output=True)
return repo
@pytest.fixture
def fresh_audit_log(isolated_flowly_home, monkeypatch):
"""Returns a Path to your plugin's log.jsonl — guaranteed empty."""
log = isolated_flowly_home / "my-plugin" / "log.jsonl"
yield log
# Optionally print contents on test failure for debugging
if log.exists():
print("Audit log contents:")
print(log.read_text())Combine
isolated_flowly_home + temp_git_repo + a real PluginManager for hermetic integration tests that mirror actual production behavior without touching your real ~/.flowly/.