Ingestion & Knowledge Graphs

The AI module keeps long-running context by mirroring your repository into a Cognee-powered knowledge graph and persisting conversations in local storage.

CLI Commands

# Scan the current project (skips .git/, .fuzzforge/, virtualenvs, caches)
fuzzforge ingest --path . --recursive

# Alias - identical behaviour
fuzzforge rag ingest --path . --recursive

The command gathers files using the filters defined in ai/src/fuzzforge_ai/ingest_utils.py. By default it includes common source, configuration, and documentation file types while skipping temporary and dependency directories.
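As a rough illustration only (the authoritative include/exclude lists live in ai/src/fuzzforge_ai/ingest_utils.py), the skip behaviour resembles a pruned find over the project tree. The directory names below are stand-ins for two of the skipped locations:

```shell
# Build a throwaway project tree with one source file plus two
# directories that ingestion always skips (.git/ and .fuzzforge/).
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/.git" "$tmp/.fuzzforge"
touch "$tmp/src/app.py" "$tmp/.git/config" "$tmp/.fuzzforge/cache.db"

# Only src/app.py survives the filter; the skipped directories'
# contents are excluded, mirroring the default behaviour described above.
find "$tmp" -type f -not -path '*/.git/*' -not -path '*/.fuzzforge/*'
```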

Customising the File Set

Use CLI flags to override the defaults:

fuzzforge ingest --path backend --file-types .py --file-types .yaml --exclude node_modules --exclude dist

Command Options

fuzzforge ingest exposes several flags (see cli/src/fuzzforge_cli/commands/ingest.py):

  • --recursive / -r – Traverse sub-directories.
  • --file-types / -t – Repeatable flag to whitelist extensions (-t .py -t .rs).
  • --exclude / -e – Repeatable glob patterns to skip (-e tests/**).
  • --dataset / -d – Write into a named dataset instead of <project>_codebase.
  • --force / -f – Clear previous Cognee data before ingesting, skipping the confirmation prompt that is otherwise shown.

All runs automatically skip .fuzzforge/** and .git/** so the tool never re-ingests its own cache folders or version-control metadata.
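The flags compose. For example, a forced re-ingest of a single sub-tree into a custom dataset might look like the following (the dataset name and extensions here are illustrative, not defaults):

```shell
fuzzforge ingest --path backend \
  --recursive \
  --file-types .py --file-types .toml \
  --exclude 'dist/**' \
  --dataset backend_docs \
  --force
```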

Dataset Layout

  • Primary dataset: <project>_codebase
  • Additional datasets: create ad-hoc buckets such as insights via the ingest_to_dataset tool
  • Storage location: .fuzzforge/cognee/project_<id>/
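On disk, this layout looks roughly like the tree below. The project id shown is a placeholder; the real one is assigned deterministically by Cognee:

```text
.fuzzforge/
└── cognee/
    └── project_a1b2c3/      # deterministic per-project id (example value)
        ├── data/
        └── system/
```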

Persistence Details

  • Every dataset lives under .fuzzforge/cognee/project_<id>/{data,system}. These directories can safely be backed up to long-lived storage (they contain only embeddings and metadata).
  • Cognee assigns deterministic IDs per project; if you move the repository, copy the entire .fuzzforge/cognee/ tree to retain graph history.
  • HybridMemoryManager ensures answers from Cognee are written back into the ADK session store so future prompts can refer to the same nodes without repeating the query.
  • All Cognee processing runs locally against the files you ingest. No external service calls are made unless you configure a remote Cognee endpoint.
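To make the relocation point concrete, here is a minimal sketch of moving a project while keeping graph history. Temporary directories stand in for the old and new checkout locations, and `project_demo` is an illustrative id:

```shell
# Simulate an existing checkout with Cognee state...
old=$(mktemp -d); new=$(mktemp -d)
mkdir -p "$old/.fuzzforge/cognee/project_demo/data"
touch "$old/.fuzzforge/cognee/project_demo/data/graph.db"

# ...and a fresh location. Copy the ENTIRE cognee/ tree, not just one
# dataset, so the deterministic project ids keep resolving.
mkdir -p "$new/.fuzzforge"
cp -R "$old/.fuzzforge/cognee" "$new/.fuzzforge/"
```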

Prompt Examples

You> refresh the project knowledge graph for ./backend
Assistant> Kicks off `fuzzforge ingest` with recursive scan

You> search project knowledge for "prefect workflow" using INSIGHTS
Assistant> Routes to Cognee `search_project_knowledge`

You> ingest_to_dataset("Design doc for new scanner", "insights")
Assistant> Adds the provided text block to the `insights` dataset

Environment Template

The CLI writes a template at .fuzzforge/.env.template when you initialise a project. Keep it in source control so collaborators can copy it to .env and fill in secrets.

# Core LLM settings
LLM_PROVIDER=openai
LITELLM_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-your-key

# FuzzForge backend (Prefect-powered)
FUZZFORGE_MCP_URL=http://localhost:8010/mcp

# Optional: knowledge graph provider
LLM_COGNEE_PROVIDER=openai
LLM_COGNEE_MODEL=gpt-5-mini
LLM_COGNEE_API_KEY=sk-your-key

Add comments or project-specific overrides as needed; the agent reads these variables on startup.
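The intended onboarding flow is simply copying the committed template to a private .env and filling in secrets. The sketch below simulates that in a temporary directory (the template contents are abbreviated from the example above):

```shell
# Simulate a fresh checkout containing the committed template.
tmp=$(mktemp -d); cd "$tmp"
mkdir -p .fuzzforge
printf 'LLM_PROVIDER=openai\nOPENAI_API_KEY=\n' > .fuzzforge/.env.template

# Collaborator copies the template, then fills in their real key
# (here via sed; editing .env by hand works just as well).
cp .fuzzforge/.env.template .env
sed -i.bak 's/^OPENAI_API_KEY=$/OPENAI_API_KEY=sk-your-key/' .env
grep OPENAI_API_KEY .env
```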

Tips

  • Re-run ingestion after significant code changes to keep the knowledge graph fresh.
  • Large binary assets are skipped automatically—store summaries or documentation if you need them searchable.
  • Set FUZZFORGE_DEBUG=1 to surface verbose ingest logs during troubleshooting.
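For a one-off troubleshooting run, the debug variable can be set inline rather than exported:

```shell
FUZZFORGE_DEBUG=1 fuzzforge ingest --path . --recursive
```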