
Graphify — Codebase Knowledge Graph

Maps your entire codebase into a queryable knowledge graph — agents query "how does X connect to Y?" in ~300 tokens instead of reading thousands of lines of source code. 50-150x token savings.

Video Demo

Interactive graph view — click nodes, explore connections, search across 4,000+ code entities in real time.

Overview

Graphify (safishamsi/graphify, 50K+ stars on GitHub) turns codebases into interactive knowledge graphs. It extracts entities, concepts, and relationships from 31+ programming languages via AST parsing (tree-sitter), plus docs, PDFs, and images via AI analysis. Every relationship is tagged EXTRACTED, INFERRED, or AMBIGUOUS — you always know what was found vs guessed.
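
To make the tagging concrete, here is a minimal Python sketch of how a consumer might keep only relationships at a chosen confidence level. The Edge record, its field names, and the sample node names are assumptions for illustration; they are not Graphify's actual graph.json schema.

```python
from dataclasses import dataclass


# Hypothetical edge record; field names are illustrative, not Graphify's schema.
@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    confidence: str  # "EXTRACTED", "INFERRED", or "AMBIGUOUS"


def filter_edges(edges, allowed=("EXTRACTED",)):
    """Return only the edges whose confidence tag is in `allowed`."""
    allowed_set = set(allowed)
    return [e for e in edges if e.confidence in allowed_set]


# Made-up sample data covering all three tags.
edges = [
    Edge("delegate_tool", "phase3.db", "reads", "EXTRACTED"),
    Edge("delegate_tool", "scheduler", "coordinates_with", "INFERRED"),
    Edge("report_writer", "phase3.db", "may_write", "AMBIGUOUS"),
]

trusted = filter_edges(edges)  # found directly in code
review = filter_edges(edges, allowed=("INFERRED", "AMBIGUOUS"))  # worth a second look
print(len(trusted), len(review))
```

An agent that needs certainty could restrict itself to the EXTRACTED edges and treat the rest as leads to verify.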

Agents query the graph instead of grepping through source files. A question like "what connects the phase3 database to the delegation tool?" costs ~300 tokens via graph traversal vs ~15,000 tokens reading source files, a 50x saving. For complex dependency chains, savings reach 150x or more.

Key Features

Interactive Graph View: Click any node to see its source file, type, community, and connections. Search, filter, zoom, and drag — a real-time network map of your codebase.

Community Detection: Leiden algorithm groups related code into communities — reveals hidden architecture you didn't know existed.

Graph Queries: BFS/DFS traversal answers "what depends on X?" in seconds. Shortest-path finds the connection chain between any two modules.

Confidence Tags: Every relationship is EXTRACTED (directly found in code), INFERRED (AI-detected pattern), or AMBIGUOUS (uncertain) — no guessing what's real.

Incremental Updates: SHA256 caching means nightly rebuilds only process changed files. Code changes are re-parsed with AST extraction only (free); doc changes trigger AI re-analysis.
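
The incremental update idea can be sketched in a few lines of Python: hash each file, compare against the digest stored from the previous run, and reprocess only the files whose digest changed. This is a sketch under stated assumptions, not Graphify's implementation; the function names and the in-memory cache dictionary are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, cache: dict) -> list:
    """Return the paths whose digest differs from the cached one.

    The cache (path string -> digest) is updated in place, so a second
    call with unchanged files returns an empty list.
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    return changed


# Demo on a throwaway file: new, then unchanged, then edited.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "module.py"
    src.write_text("x = 1\n")
    cache = {}
    first = changed_files([src], cache)   # new file: needs processing
    second = changed_files([src], cache)  # unchanged: skipped
    src.write_text("x = 2\n")
    third = changed_files([src], cache)   # edited: needs processing again

print(len(first), len(second), len(third))
```

A real build would persist the cache between runs (for example as a JSON file) so that a nightly rebuild can skip everything that has not changed since the last one.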

How It Works

Build: Running graphify . scans the codebase in the current directory, extracts AST nodes via tree-sitter, runs AI semantic analysis on docs, and outputs three files: graph.json (queryable), graph.html (interactive), and GRAPH_REPORT.md (highlights).

Query: Running graphify query "how does X work?" traverses the graph with BFS and returns the relevant nodes and connections. Use graphify path "A" "B" for dependency tracing and graphify explain "Module" for deep dives.

Nightly Rebuild: A cron job at 02:00 runs incremental updates on all three graphs (tools, skills, consultancy). Git hooks auto-rebuild on commit for real-time freshness during development.

MCP Server: The graph is exposed as MCP tools (query_graph, get_node, get_neighbors, shortest_path) for direct agent access — no CLI needed.
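
The traversal behind the Query step and the graph tools can be illustrated with a small Python sketch: a depth-limited BFS for "what is connected to X?" and a BFS-based shortest path for "how does A reach B?". The function names get_neighbors and shortest_path mirror the tool names listed above, but their signatures, the bfs_reachable helper, the directed adjacency map, and the node names are all assumptions made for this example.

```python
from collections import deque

# Hypothetical directed adjacency map; node names are illustrative only.
GRAPH = {
    "delegation_tool": ["task_queue"],
    "task_queue": ["phase3_store"],
    "phase3_store": ["phase3.db"],
    "phase3.db": [],
    "report_writer": ["phase3_store"],
}


def get_neighbors(graph, node):
    """Return the nodes directly reachable from `node`."""
    return list(graph.get(node, []))


def bfs_reachable(graph, start, max_depth=2):
    """Breadth-first traversal: nodes within `max_depth` hops of `start`."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in get_neighbors(graph, node):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def shortest_path(graph, start, goal):
    """Return the fewest-hops path from `start` to `goal`, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in get_neighbors(graph, node):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


print(bfs_reachable(GRAPH, "delegation_tool"))
print(shortest_path(GRAPH, "delegation_tool", "phase3.db"))
```

Because each call returns a handful of node names rather than file contents, the answer an agent receives stays small regardless of how large the underlying source files are.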

Token Savings

Average 157x fewer tokens per query vs reading raw source files; the four sample queries below range from 75x to 167x. A typical agent session with 20 codebase questions saves ~1 million tokens, worth ~$0.15 per session at DeepSeek pricing or ~$15 at Anthropic pricing.
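
The arithmetic behind these figures can be sketched in Python. The token counts come from the text; the split of ~1 million saved tokens into 20 questions of ~50K each and the per-million-token prices (inferred from the ~$0.15 and ~$15 totals) are assumptions, and the helper names are hypothetical.

```python
def savings_ratio(raw_tokens: int, graph_tokens: int) -> float:
    """How many times fewer tokens the graph query uses."""
    return raw_tokens / graph_tokens


def session_cost_saved(tokens_saved: int, usd_per_million: float) -> float:
    """Dollar value of the saved tokens at a given price per million tokens."""
    return tokens_saved / 1_000_000 * usd_per_million


# From the text: ~15,000 tokens of raw source vs ~300 tokens via the graph.
ratio = savings_ratio(15_000, 300)

# Assumption: ~1 million tokens over 20 questions is ~50K tokens saved each.
tokens_saved = 20 * 50_000

# Assumed prices per million tokens, implied by the session totals above.
low = session_cost_saved(tokens_saved, 0.15)
high = session_cost_saved(tokens_saved, 15.0)

print(f"{ratio:.0f}x, ${low:.2f} to ${high:.2f} per session")
```

Actual savings depend on how much source an agent would otherwise read and on the provider's current pricing, so treat the result as an order-of-magnitude estimate.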

| Query | Raw Source | Graph | Savings |
|---|---|---|---|
| "What does phase3_delegate() do?" | ~15K tokens | ~200 tokens | 75x |
| "Which tools depend on phase3.db?" | ~50K tokens | ~300 tokens | 167x |
| "Trace delegation flow" | ~80K tokens | ~500 tokens | 160x |
| "Find all sqlite3 usage" | ~30K tokens | ~250 tokens | 120x |

Configuration

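Install via pip with pip install graphifyy --break-system-packages (the package name is graphifyy; the command is graphify). Then run graphify install --platform hermes to register Graphify as a Hermes skill. Build the graph once per codebase by running graphify . from the directory you want to map. After that, nightly updates run automatically via cron, with no manual intervention needed.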

Requirements

Python 3.10+. DeepSeek API key (or any OpenAI-compatible provider) for semantic doc/image analysis. Code-only extraction is free (AST). Deployed to all tiers — the graph maps your specific deployment configuration.