Heimdall uses tree-sitter for AST-based code analysis, with regex fallbacks for languages without full parser support. This page covers what’s extracted for each language and how to extend support.
Language Support Matrix
| Language | Grammar | Symbol Extraction | Call Graph | Status |
|---|---|---|---|---|
| Rust | tree-sitter-rust | Functions, structs, traits, enums, methods | Full | ✅ Full |
| Python | tree-sitter-python | Functions, classes, methods | Full | ✅ Full |
| JavaScript | tree-sitter-javascript | Functions, classes, arrow functions, methods | Full | ✅ Full |
| TypeScript | tree-sitter-typescript | Functions, classes, methods, interfaces | Full | ✅ Full |
| Go | tree-sitter-go | Functions, methods, structs, interfaces | Full | ✅ Full |
| Java | tree-sitter-java | Classes, methods, constructors, interfaces | Full | ✅ Full |
| Ruby | regex fallback | Methods, classes, modules | Basic | ⚠️ Basic |
| PHP | regex fallback | Functions, classes | Basic | ⚠️ Basic |
| C | regex fallback | Functions, structs, typedefs, macros | Basic | ⚠️ Basic |
| C++ | regex fallback | Classes, functions, methods, namespaces | Basic | ⚠️ Basic |
| C# | regex fallback | Classes, methods, properties, interfaces | Basic | ⚠️ Basic |
| Swift | regex fallback | Functions, classes, structs, protocols | Basic | ⚠️ Basic |
| Kotlin | regex fallback | Functions, classes, objects | Basic | ⚠️ Basic |
| Scala | regex fallback | Functions, classes, traits, objects | Basic | ⚠️ Basic |
| Shell/Bash | regex fallback | Functions, aliases, exports | Basic | ⚠️ Basic |
- ✅ Full: Tree-sitter AST parsing, complete symbol table, accurate call graphs
- ⚠️ Basic: Regex-based heuristics, best-effort symbol extraction, no call graph
Tree-Sitter Grammars
Tree-sitter provides robust, incremental parsers for supported languages. Heimdall uses the following grammars:
Rust
Grammar: `tree-sitter-rust`
Version: Latest
Extraction: `src/index/symbols.rs:189-293`
What’s extracted:
- Functions (`fn main()`, `pub async fn handler()`)
- Structs (`pub struct Config`)
- Traits (`pub trait Provider`)
- Enums (`pub enum Status`)
- Impl blocks and methods
- Visibility modifiers (`pub`, private)
- Entry points (`main()`, route handlers in `routes/` files)
Example extracted symbols:
- `User` (struct, public)
- `new` (method, public)
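As an illustration, a small file like the one below would be expected to yield the `User` and `new` symbols from the example above. The file is hypothetical and not taken from the Heimdall repository. `main` is included only so the snippet runs on its own; by the rules on this page it would itself be reported as a function and an entry point.

```rust
// Hypothetical input file for the Rust extractor.

/// Expected symbol: `User` (struct, public).
pub struct User {
    pub name: String,
}

impl User {
    /// Expected symbol: `new` (method, public).
    pub fn new(name: &str) -> Self {
        User { name: name.to_string() }
    }
}

fn main() {
    let user = User::new("ada");
    println!("created user {}", user.name);
}
```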
Python
Grammar: `tree-sitter-python`
Version: Latest
Extraction: `src/index/symbols.rs:299-350`
What’s extracted:
- Functions (`def hello():`)
- Classes (`class MyClass:`)
- Methods (functions inside class bodies)
- Async functions (`async def handler():`)
- Public/private based on naming (`_private_method`)
Example extracted symbols:
- `UserService` (class, public)
- `create_user` (method, public)
- `_validate_email` (method, private)
JavaScript / TypeScript
Grammar: `tree-sitter-javascript`, `tree-sitter-typescript`
Extraction: `src/index/symbols.rs:356-460`
What’s extracted:
- Function declarations (`function foo() {}`)
- Class declarations (`class Bar {}`)
- Arrow functions (`const handler = () => {}`)
- Methods (`method() {}`)
- Exported symbols (`export function ...`, `export class ...`)
Example extracted symbols:
- `AuthService` (class, exported)
- `login` (method, public)
- `#generateToken` (method, private)
- `validateToken` (function, exported)
Go
Grammar: `tree-sitter-go`
Extraction: `src/index/symbols.rs:466-542`
What’s extracted:
- Functions (`func Process()`)
- Methods (`func (s *Service) Handle()`)
- Structs (`type Config struct`)
- Interfaces (`type Reader interface`)
- Public/private based on capitalization (`Public` vs `private`)
Example extracted symbols:
- `UserRepository` (struct, public)
- `FindByID` (method, public, entry point)
- `validate` (method, private)
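The naming-based visibility rules described for Python (leading underscore) and Go (capitalization) can be sketched as two small helpers. This is a minimal sketch, not Heimdall's code: the `Visibility` enum and both function names are assumptions made for illustration.

```rust
/// Hypothetical visibility type; the real representation in Heimdall may differ.
#[derive(Debug, PartialEq)]
enum Visibility {
    Public,
    Private,
}

/// Python convention: a leading underscore marks a name as private.
/// How dunder names such as `__init__` are treated is not stated in the docs;
/// this sketch treats them as private as well.
fn python_visibility(name: &str) -> Visibility {
    if name.starts_with('_') {
        Visibility::Private
    } else {
        Visibility::Public
    }
}

/// Go rule: identifiers that start with an uppercase letter are exported.
fn go_visibility(name: &str) -> Visibility {
    match name.chars().next() {
        Some(c) if c.is_uppercase() => Visibility::Public,
        _ => Visibility::Private,
    }
}

fn main() {
    println!("{:?}", python_visibility("_validate_email"));
    println!("{:?}", go_visibility("FindByID"));
}
```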
Java
Grammar: `tree-sitter-java`
Extraction: `src/index/symbols.rs:548-626`
What’s extracted:
- Classes (`public class User`)
- Interfaces (`public interface Service`)
- Methods (`public void save()`)
- Constructors (`public User()`)
- Public/private/protected modifiers
Example extracted symbols:
- `UserController` (class, public)
- `createUser` (method, public, entry point)
- `validate` (method, private)
What’s Extracted for Each Language
Symbol Types
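Each extracted symbol includes a name, a kind (function, class, method, and so on), and a visibility. Symbols can also be flagged as entry points and carry a list of calls.
Entry Point Detection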
Heimdall marks symbols as entry points using heuristics:
| Language | Entry Point Criteria |
|---|---|
| Rust | fn main(), public functions in routes/ files, functions starting with handle_ |
| Python | def main(), functions in views/ or routes/ files |
| JavaScript/TypeScript | Functions in files containing route, handler, or api in path |
| Go | func main(), public functions in handler/ or api/ files |
| Java | public static void main(), public methods in *Controller or *Handler files |
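The heuristics in the table can be read as a check on the symbol's name, visibility, and file path. The sketch below is one possible encoding under stated assumptions: `Lang`, `is_entry_point`, and `is_controller_or_handler_file` are hypothetical names, paths use `/` separators, and the substring checks are a simplification of whatever matching Heimdall actually performs.

```rust
/// Language groups from the entry-point table.
#[derive(Clone, Copy, Debug)]
enum Lang {
    Rust,
    Python,
    JsTs,
    Go,
    Java,
}

/// True when the file name (minus `.java`) ends in `Controller` or `Handler`.
fn is_controller_or_handler_file(path: &str) -> bool {
    let file = path.rsplit('/').next().unwrap_or(path);
    let stem = file.strip_suffix(".java").unwrap_or(file);
    stem.ends_with("Controller") || stem.ends_with("Handler")
}

/// Hypothetical helper applying the table's heuristics to one symbol.
fn is_entry_point(lang: Lang, path: &str, name: &str, is_public: bool) -> bool {
    match lang {
        Lang::Rust => {
            name == "main" || name.starts_with("handle_") || (is_public && path.contains("routes/"))
        }
        Lang::Python => name == "main" || path.contains("views/") || path.contains("routes/"),
        Lang::JsTs => ["route", "handler", "api"].iter().any(|k| path.contains(*k)),
        Lang::Go => {
            name == "main" || (is_public && (path.contains("handler/") || path.contains("api/")))
        }
        // `static` is not modeled here; a public `main` is treated as the entry point.
        Lang::Java => is_public && (name == "main" || is_controller_or_handler_file(path)),
    }
}

fn main() {
    let cases = [
        (Lang::Rust, "src/routes/users.rs", "list_users", true),
        (Lang::Java, "app/UserController.java", "createUser", true),
        (Lang::Go, "internal/api/users.go", "validate", false),
    ];
    for (lang, path, name, is_public) in cases {
        let hit = is_entry_point(lang, path, name, is_public);
        println!("{:?} {} {} -> {}", lang, path, name, hit);
    }
}
```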
Call Graph Construction
For tree-sitter-supported languages, Heimdall extracts call relationships:
- Find all call expressions in the AST (`call_expression`, `method_invocation`, etc.)
- Extract the callee identifier
- Match against known function/method symbols
- Store in `Symbol.calls` vector
For example, if `process_user` calls `validate_id` and `store_user`, the `process_user` symbol will have `calls = ["validate_id", "store_user"]`.
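The last two steps (matching callees against known symbols and storing the result) can be sketched without a parser. In this sketch the callee identifiers are passed in directly rather than collected from a tree-sitter AST, and the trimmed-down `Symbol` struct and `resolve_calls` helper are assumptions; only the `calls` field is named in the text above.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down symbol record.
#[derive(Debug)]
struct Symbol {
    name: String,
    calls: Vec<String>,
}

/// Keeps only the callees that match a known symbol, preserving their order.
fn resolve_calls(callees: &[&str], known: &HashSet<&str>) -> Vec<String> {
    callees
        .iter()
        .filter(|c| known.contains(**c))
        .map(|c| c.to_string())
        .collect()
}

fn main() {
    // Callee identifiers as they might come out of the AST walk for `process_user`.
    let callees = ["validate_id", "println", "store_user"];
    let known: HashSet<&str> = ["process_user", "validate_id", "store_user"]
        .iter()
        .copied()
        .collect();

    let symbol = Symbol {
        name: "process_user".to_string(),
        calls: resolve_calls(&callees, &known),
    };
    println!("{} calls {:?}", symbol.name, symbol.calls);
}
```

Unknown callees such as `println` are dropped here because they do not match any indexed symbol.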
Regex Fallback Languages
For languages without tree-sitter support, Heimdall uses regex-based extraction.
Ruby
Extraction: `src/index/symbols.rs:869-915`
Patterns:
Example extracted symbols:
- `UserService` (class)
- `create` (method)
- `valid_email?` (method)
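To give a feel for this kind of line-based heuristic, the sketch below pulls `class`, `module`, and `def` names out of Ruby source. It is not Heimdall's implementation: it uses plain string matching from the standard library in place of regular expressions so that it stays dependency-free, and `extract_ruby_symbols` is a hypothetical name.

```rust
/// Returns (name, kind) pairs for lines that start with `class`, `module`, or `def`.
fn extract_ruby_symbols(source: &str) -> Vec<(String, &'static str)> {
    let mut symbols = Vec::new();
    for line in source.lines() {
        let line = line.trim_start();
        for (keyword, kind) in [("class ", "class"), ("module ", "module"), ("def ", "method")] {
            if let Some(rest) = line.strip_prefix(keyword) {
                // `def self.create` is treated like `def create`.
                let rest = rest.strip_prefix("self.").unwrap_or(rest);
                // Take identifier characters, including Ruby's `?`/`!` suffixes and `::`.
                let name: String = rest
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || matches!(c, '_' | ':' | '?' | '!'))
                    .collect();
                if !name.is_empty() {
                    symbols.push((name, kind));
                }
            }
        }
    }
    symbols
}

fn main() {
    let source = "class UserService\n  def create(params)\n  end\n\n  def valid_email?(email)\n  end\nend\n";
    for (name, kind) in extract_ruby_symbols(source) {
        println!("{} ({})", name, kind);
    }
}
```

Like any best-effort heuristic, this would also match keywords that appear at the start of a line inside a heredoc or multi-line string.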
C/C++
Extraction: `src/index/symbols.rs:952-1064`
Patterns:
Notes and limitations:
- Function pointer types may cause false positives
- Template specializations not fully supported
- Preprocessor macros parsed separately
C#
Extraction: `src/index/symbols.rs:1066-1127`
Patterns:
Adding New Language Support
Option 1: Tree-Sitter Grammar
For full AST support:
Step 1: Add the tree-sitter dependency to `Cargo.toml`:
src/index/symbols.rs:
extract_with_tree_sitter:
Option 2: Regex Fallback
For simpler support:
Step 1: Define regex patterns in `src/index/symbols.rs`:
extract_symbols_regex:
Language Detection
Heimdall infers language from file extensions:
Static Analysis Rule Coverage
Static analysis rules in `src/pipeline/static_analysis/mod.rs` use language filters:
When adding a new language, update the relevant `languages` filters to include it.
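To illustrate how extension-based detection and a rule's `languages` filter might fit together, here is a minimal sketch. The extension list, the lowercase language names, the `Rule` struct, and both function names are assumptions; the real mapping and rule shape in Heimdall may differ.

```rust
use std::path::Path;

/// Maps a file extension to a language name. The extension list is illustrative.
fn detect_language(path: &str) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?;
    match ext {
        "rs" => Some("rust"),
        "py" => Some("python"),
        "js" | "jsx" => Some("javascript"),
        "ts" | "tsx" => Some("typescript"),
        "go" => Some("go"),
        "java" => Some("java"),
        "rb" => Some("ruby"),
        _ => None,
    }
}

/// Hypothetical rule shape with a language filter.
struct Rule {
    id: &'static str,
    languages: Vec<&'static str>,
}

/// A rule applies only to files whose detected language is in its filter.
fn rule_applies(rule: &Rule, path: &str) -> bool {
    match detect_language(path) {
        Some(lang) => rule.languages.contains(&lang),
        None => false,
    }
}

fn main() {
    let rule = Rule { id: "example-rule", languages: vec!["python", "ruby"] };
    for path in ["app/models/user.rb", "src/main.rs", "README"] {
        println!("{} on {} -> {}", rule.id, path, rule_applies(&rule, path));
    }
}
```

In this sketch, a newly supported language stays invisible to a rule until its name is added to that rule's `languages` list.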
Performance Considerations
- Tree-sitter parsing: ~10-50ms per file (depends on file size)
- Regex fallback: ~1-5ms per file
- Memory: Symbol index is held in memory during scans (~1-5MB for typical repos)
Optimization strategies:
- Indexing only changed files in incremental scans
- Sampling strategy (index entry points + changed files)
- Parallel indexing (Heimdall uses `rayon` for this)
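The parallel indexing idea can be sketched with the standard library alone. Note the substitution: Heimdall uses `rayon`, whereas this sketch uses `std::thread::scope` so that it runs without extra dependencies. `index_file` is a hypothetical stand-in for per-file symbol extraction, and `index_all` is likewise an illustrative name.

```rust
use std::thread;

/// Stand-in for per-file symbol extraction: counts lines that start with `fn `.
fn index_file(source: &str) -> usize {
    source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count()
}

/// Indexes files in parallel, one chunk per worker, preserving input order.
fn index_all(files: &[&str], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division, with a floor of 1 so that `chunks` never receives 0.
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|f| index_file(f)).collect::<Vec<_>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let files = ["fn a() {}\nfn b() {}", "struct S;", "  fn c() {}"];
    println!("{:?}", index_all(&files, 2));
}
```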
Testing
Run language extraction tests:
Related Files
- `src/index/symbols.rs`: Symbol extraction for all languages
- `src/index/callgraph.rs`: Call graph construction
- `src/pipeline/static_analysis/mod.rs`: Static analysis rules with language filters