graphify: Turn Codebases into Queryable Knowledge Graphs

graphify is an open-source AI coding assistant skill that integrates with tools like Claude Code, Cursor, and Gemini CLI. It transforms any code folder, SQL database schema, R scripts, documents, images, or videos into a queryable knowledge graph. This helps developers gain a holistic understanding of their codebase, including application logic, database structures, and infrastructure, making complex projects more approachable.

Project Overview

Developers stepping into a large, unfamiliar project often face a daunting challenge: the codebase looks like an impenetrable tangle. Interface documentation is outdated, database dependencies are a guessing game, and microservice call chains are a nightmare to trace. The open-source project graphify tackles this problem head-on by converting these chaotic codebases into structured knowledge graphs, and it works across a wide range of languages and AI tools.

What graphify Brings to the Table

At its core, graphify is an AI coding assistant skill that plugs into popular AI programming environments such as Claude Code, Codex, OpenCode, Cursor, or Gemini CLI. It scans one or more specified directories, then parses and indexes everything from application code and SQL schemas to shell scripts, R scripts, PDF documents, and even images and videos. The output is a queryable knowledge graph, so you can ask natural-language questions such as “Which database tables does this API endpoint use?”, “Which modules call this specific function?”, or “What downstream services depend on this microservice?”
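To make the idea concrete, here is a minimal sketch of what answering such questions involves, assuming the graph is stored as (subject, relation, object) triples. The entity names, the relation names such as `reads_table` and `calls`, and the `KnowledgeGraph` class are all hypothetical illustrations, not graphify's API or storage format. The first two questions above reduce to following edges forward or backward.

```python
from collections import defaultdict

# Hypothetical triples for illustration only; not graphify's data format.
EDGES = [
    ("api:/login", "reads_table", "table:users"),
    ("api:/login", "writes_table", "table:sessions"),
    ("module:auth", "calls", "func:hash_password"),
    ("module:signup", "calls", "func:hash_password"),
    ("module:auth", "implements", "api:/login"),
]


class KnowledgeGraph:
    """A tiny in-memory graph of (subject, relation, object) triples."""

    def __init__(self, triples):
        self.out = defaultdict(list)  # subject -> [(relation, object)]
        self.inc = defaultdict(list)  # object -> [(relation, subject)]
        for s, r, o in triples:
            self.out[s].append((r, o))
            self.inc[o].append((r, s))

    def objects(self, subject, relations):
        """Follow outgoing edges whose relation is in `relations`."""
        return sorted(o for r, o in self.out[subject] if r in relations)

    def subjects(self, obj, relation):
        """Follow incoming edges of a single relation (a reverse lookup)."""
        return sorted(s for r, s in self.inc[obj] if r == relation)


kg = KnowledgeGraph(EDGES)
# "Which database tables does this API endpoint use?"
print(kg.objects("api:/login", {"reads_table", "writes_table"}))
# "Which modules call this specific function?"
print(kg.subjects("func:hash_password", "calls"))
```

Keeping both an outgoing and an incoming index means reverse questions ("who calls this?") cost the same as forward ones, instead of requiring a scan over every edge.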

While it might sound like an advanced code search, the real power lies in the graph structure. Unlike traditional full-text search, which typically returns a list of files, graphify allows you to visualize and explore the interconnected web of entities. This relational view makes dependencies and relationships immediately apparent, offering a far richer understanding than simply finding keywords.
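The difference from full-text search shows up most clearly in multi-hop questions. A keyword search for `inventory-service` finds only files that mention it by name, while "what depends on this service?" needs a traversal across edges. The sketch below runs a breadth-first walk over a hypothetical service dependency map (every service name is invented for illustration) to collect all direct and transitive dependents.

```python
from collections import deque

# Hypothetical service graph: each service -> the services it depends on.
DEPENDS_ON = {
    "web-frontend": ["api-gateway"],
    "api-gateway": ["auth-service", "orders-service"],
    "orders-service": ["inventory-service"],
    "reporting-job": ["orders-service"],
    "auth-service": [],
    "inventory-service": [],
}


def downstream_dependents(graph, target):
    """Return every service that depends on `target`, directly or transitively."""
    # Invert the edges so we can walk from a service to its callers.
    callers = {name: [] for name in graph}
    for caller, deps in graph.items():
        for dep in deps:
            callers.setdefault(dep, []).append(caller)

    # Breadth-first search outward from the target through its callers.
    seen, queue = set(), deque([target])
    while queue:
        for caller in callers.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)


print(downstream_dependents(DEPENDS_ON, "inventory-service"))
```

The `seen` set also keeps the walk from looping forever if services have circular dependencies.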

Practical Scenarios Where graphify Shines

Onboarding New Developers to Legacy Systems: Imagine a new team member needing to understand a massive monorepo. Feed the entire repository to graphify, generate a graph in minutes, and then let them directly query confusing modules, perhaps asking, “What files and tables are involved in the user login process?”
Pre-Refactoring Dependency Analysis: Before breaking down a large module into smaller microservices, graphify can map out all current code dependencies, providing a clear blueprint for defining new service boundaries.
Understanding Research Papers or Technical Documentation: Instead of flipping through pages, you can index relevant PDFs and code examples into the graph. Then, search by concept, making information retrieval significantly faster and more targeted.
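For the pre-refactoring scenario, one very rough starting point is to group modules that are linked by any dependency path: modules in different groups share no code path, which makes each group a candidate service boundary. The sketch below does this with union-find over a hypothetical list of module dependencies. It illustrates the general idea only and is not something graphify is documented to do; real boundary decisions would also weigh how tightly modules are coupled and who owns which data.

```python
# Hypothetical module-level dependencies extracted from a monolith.
MODULE_DEPS = [
    ("billing.invoice", "billing.tax"),
    ("billing.tax", "billing.rates"),
    ("catalog.search", "catalog.index"),
    ("catalog.index", "catalog.products"),
    ("users.profile", "users.auth"),
]


def candidate_boundaries(edges):
    """Group modules into connected components using union-find.

    Modules with no dependency path between them end up in different
    groups; each group is a (very rough) candidate service boundary.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Treat each dependency as an undirected link and merge the two sets.
    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for module in parent:
        groups.setdefault(find(module), []).append(module)
    return sorted(sorted(group) for group in groups.values())


for group in candidate_boundaries(MODULE_DEPS):
    print(group)
```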

Getting Started with graphify

Because graphify is Python-based, installation is straightforward: a simple pip install graphify (ideally inside a virtual environment). The next step is loading it into your chosen AI coding tool; detailed instructions are available on the GitHub repository. It currently supports major AI programming assistants, including Claude Code, Cursor, and Gemini CLI. From there, you point graphify at a directory path, and it automatically scans, indexes, and generates the graph file.
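graphify's internal indexing pipeline is not described here, but the basic "scan a directory, index it, write a graph file" loop can be sketched for the simplest case: Python sources and their import statements. The function names, the JSON layout, and the `imports` relation below are assumptions made for illustration; graphify's own graph file format may well differ.

```python
import ast
import json
import tempfile
from pathlib import Path


def build_import_graph(root):
    """Scan `root` for .py files and record which modules each file imports."""
    root = Path(root)
    nodes, edges = set(), []
    for path in sorted(root.rglob("*.py")):
        # Turn "pkg/mod.py" into the module name "pkg.mod".
        source = ".".join(path.relative_to(root).with_suffix("").parts)
        nodes.add(source)
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Absolute "from x import y" only; relative imports are skipped.
                targets = [node.module]
            else:
                continue
            for target in targets:
                nodes.add(target)
                edges.append({"source": source, "relation": "imports", "target": target})
    return {"nodes": sorted(nodes), "edges": edges}


def write_graph(graph, out_file):
    """Persist the graph as JSON so other tools can load and query it."""
    Path(out_file).write_text(json.dumps(graph, indent=2), encoding="utf-8")


# Demo on a throwaway two-file project.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "app.py").write_text("import os\nfrom db import connect\n", encoding="utf-8")
    Path(tmp, "db.py").write_text("import sqlite3\n", encoding="utf-8")
    demo_graph = build_import_graph(tmp)
    write_graph(demo_graph, Path(tmp, "graph.json"))
print(demo_graph["nodes"])
```

On a real project the call would look like `write_graph(build_import_graph("path/to/project"), "graph.json")`, where both paths are placeholders.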

A notable feature is that graphify goes beyond source code. It can parse SQL database schemas (DDL statements) to understand table relationships and even process container and infrastructure configurations like Docker Compose files or Kubernetes YAML. Integrating these non-code assets into the same unified graph is particularly valuable for modern cloud-native applications, offering a truly comprehensive view.
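As an illustration of how DDL yields table relationships, the sketch below pulls foreign-key references out of `CREATE TABLE` statements with regular expressions and turns each one into an edge. This is a simplified assumption about the general approach, not graphify's parser: a regex like this misses quoted identifiers, schema-qualified names, and constraints added later with `ALTER TABLE`, so a robust tool would rely on a proper SQL parser.

```python
import re

# Example schema; the table names are invented for illustration.
DDL = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    created_at TIMESTAMP
);
CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    FOREIGN KEY (order_id) REFERENCES orders (id),
    FOREIGN KEY (product_id) REFERENCES products (id)
);
"""

# Capture the table name and its body up to the closing ");".
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)\s*\((.*?)\);",
    re.IGNORECASE | re.DOTALL,
)
# Both inline "REFERENCES t(col)" and "FOREIGN KEY ... REFERENCES t (col)".
REF_RE = re.compile(r"REFERENCES\s+(\w+)", re.IGNORECASE)


def foreign_key_edges(ddl):
    """Return (child_table, "references", parent_table) triples found in `ddl`."""
    edges = []
    for table, body in TABLE_RE.findall(ddl):
        for parent in REF_RE.findall(body):
            edges.append((table, "references", parent))
    return edges


for edge in foreign_key_edges(DDL):
    print(edge)
```

Edges like these can sit in the same graph as code-level edges, which is what lets a question about an API endpoint reach the tables behind it.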

The Upsides and Downsides

The advantages are compelling: multi-modal input support, integration with mainstream AI tools, and fast, intuitive graph querying. With over 70,000 stars on GitHub, the project clearly has a large, active community, which is a good sign for ongoing maintenance.

However, it's not without its limitations. Firstly, it requires some initial configuration; it's not entirely plug-and-play, as you need an existing AI coding environment. Secondly, building the graph for extremely large codebases can be slow, especially if they contain numerous image and video files. Lastly, the accuracy of natural language queries ultimately depends on the underlying AI model; if the model itself has comprehension gaps, the answers might not be perfectly precise.

Practical Advice for Adoption

If you're considering graphify, I'd suggest starting with a smaller project, such as a personal application, to get a feel for the generated graph structure. Also, be selective about which directories you index: excluding large dependency folders like node_modules or massive datasets can significantly reduce build time and storage requirements.
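The exclusion advice applies to any directory scan. How graphify itself is configured to skip paths is covered in its repository documentation; the sketch below only shows the general technique with Python's standard `os.walk`, pruning excluded directories so the walk never descends into them. The `DEFAULT_EXCLUDES` set is an assumed example list, not graphify's default.

```python
import os
import tempfile

# Assumed example list of directories to skip; adjust per project.
DEFAULT_EXCLUDES = {"node_modules", ".git", "venv", ".venv", "__pycache__", "dist", "build"}


def files_to_index(root, excludes=DEFAULT_EXCLUDES):
    """Yield file paths under `root`, never descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # In a top-down walk, editing dirnames in place stops os.walk
        # from entering the removed directories at all.
        dirnames[:] = [d for d in dirnames if d not in excludes]
        for name in filenames:
            yield os.path.join(dirpath, name)


# Demo: a tiny tree where node_modules should be skipped entirely.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["src/app.js", "node_modules/lodash/index.js", "README.md"]:
        path = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass  # create an empty file
    indexed = sorted(os.path.relpath(p, tmp) for p in files_to_index(tmp))
print(indexed)
```

Pruning at the directory level matters more than filtering file names afterwards: the walk never lists the thousands of files inside `node_modules`, so the saving applies to scan time as well as graph size.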

For team environments, graphify can serve as a shared knowledge asset. Every team member can query the graph through their own AI tools, which eases the familiar problem of neglected documentation. It won't entirely replace well-written documentation, but it makes the code itself far more approachable and understandable.

Frequently Asked Questions