Problem/Motivation

Whenever we're working on a Drupal project with drupal/core as a dependency, we rely on the model's training data (or a web search) to answer our Drupal-specific coding questions correctly, because the IDE is not able to search within Drupal's code base for answers.

That can mean you're working on a Drupal 11 project while the model still answers with the Drupal 10 API documentation it was trained on, or even with Drupal 7-specific documentation.

What we want is to make sure you get the right documentation for the version you're working on.

Proposed resolution

I'd like to create a proof of concept that:
- Works fully locally (no expensive API calls)
- Grabs version-specific Drupal documentation
- Connects your IDE to that documentation through an MCP server
- Lets you query it in natural language to get the right documentation

Remaining tasks

TBD.

User interface changes

TBD.

Comments

ronaldtebrake created an issue. See original summary.

ronaldtebrake commented:


Work in progress here: https://github.com/ronaldtebrake/surge-mcp-dev

The current process:

Ingestion (Maintainers only):

  • Parse Drupal's *.api.php files to extract API documentation
  • Generate vector embeddings using Ollama's embedding model
  • Export embeddings to JSON files and commit to Git
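The project does the parsing step with nikic/php-parser. Purely as an illustration of what gets extracted, here is a simplified Python sketch that swaps the real parser for a regular expression: it pairs each `/** ... */` docblock with the function declaration directly after it, which is how hooks are stubbed in *.api.php files. The names `ApiEntry` and `extract_api_entries` are hypothetical, and a regex will miss cases a real PHP parser handles (classes, methods, attributes).

```python
import re
from dataclasses import dataclass

# A /** ... */ docblock (containing no nested "*/") directly followed by a
# function declaration, which is how hooks are stubbed in *.api.php files.
DOCBLOCK_FUNCTION = re.compile(
    r"/\*\*(?P<doc>(?:(?!\*/).)*)\*/\s*function\s+(?P<name>\w+)\s*\(",
    re.DOTALL,
)


@dataclass
class ApiEntry:
    name: str     # e.g. "hook_example_alter"
    summary: str  # first line of the docblock
    doc: str      # full docblock with the " * " decoration removed


def clean_docblock(raw):
    """Remove the leading '*' (and one following space) from each docblock line."""
    lines = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("*"):
            line = line[1:]
            if line.startswith(" "):
                line = line[1:]
        lines.append(line)
    return "\n".join(lines).strip()


def extract_api_entries(php_source):
    """Pair every function in a *.api.php file with its preceding docblock."""
    entries = []
    for match in DOCBLOCK_FUNCTION.finditer(php_source):
        doc = clean_docblock(match.group("doc"))
        summary = doc.split("\n", 1)[0]
        entries.append(ApiEntry(match.group("name"), summary, doc))
    return entries
```

Docblocks that aren't followed by a function (such as the `@file` header) are skipped. Each entry's `doc` text is what would then be sent to the embedding model.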

Hydration (End Users):

  • Download pre-computed JSON embeddings from Git
  • Load them into the Redis vector store for fast searching
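The issue doesn't describe the layout of the JSON files, so the one below is hypothetical. This Python sketch uses an in-memory dict as a stand-in for Redis: it groups records by Drupal version, checks that every embedding has the same dimension (a Redis vector index is created with a fixed dimension), and normalizes each vector to unit length so that cosine similarity later reduces to a dot product.

```python
import json
import math
from collections import defaultdict

# Hypothetical file layout (not specified in the issue):
# [{"id": "...", "version": "11.2", "text": "...", "embedding": [0.1, ...]}, ...]


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in vec]


def hydrate(json_text):
    """Group records by version, checking dimensions and normalizing vectors."""
    records = json.loads(json_text)
    store = defaultdict(list)
    dim = None
    for rec in records:
        vec = rec["embedding"]
        if dim is None:
            dim = len(vec)  # the first record fixes the index dimension
        elif len(vec) != dim:
            raise ValueError(f"{rec['id']}: expected {dim} dimensions, got {len(vec)}")
        store[rec["version"]].append(
            {"id": rec["id"], "text": rec["text"], "embedding": normalize(vec)}
        )
    return dict(store)
```

Rejecting mismatched dimensions up front catches files produced with a different embedding model before they end up in the same index.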

Runtime (End Users):

  • Your IDE sends a natural language query to the MCP server
  • Server automatically detects your Drupal version from composer.lock (or uses a manually set version)
  • Server generates an embedding for your query using Ollama
  • Searches the Redis vector store for documentation similar to your query, limited to your Drupal version
  • Returns relevant API documentation with code examples for your specific version
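A Python sketch of the two runtime steps that don't need Ollama or Redis: reading the drupal/core version from composer.lock (installed packages are listed under `packages` and `packages-dev`) and ranking only that version's documents by cosine similarity. Assumptions: the store is a plain dict mapping a hypothetical "major.minor" key to records with `id` and `embedding` fields, and the query embedding has already been produced (by Ollama in the real setup). Redis would normally do the filtering and ranking itself.

```python
import json
import math
import re


def detect_core_version(composer_lock_text, override=None):
    """Return drupal/core's 'major.minor' from composer.lock, or a manual override."""
    if override:
        return override
    lock = json.loads(composer_lock_text)
    for pkg in lock.get("packages", []) + lock.get("packages-dev", []):
        if pkg.get("name") == "drupal/core":
            match = re.match(r"v?(\d+)\.(\d+)", pkg.get("version", ""))
            if match:
                return f"{match.group(1)}.{match.group(2)}"
    raise LookupError("no numeric drupal/core version found in composer.lock")


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(store, version, query_embedding, k=3):
    """Return the ids of the k documents for `version` most similar to the query."""
    docs = store.get(version, [])  # only this version's documents are considered
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d["embedding"]), reverse=True)
    return [d["id"] for d in ranked[:k]]
```

In use, `detect_core_version` would be called once with the contents of the project's composer.lock, and its result passed to `search` along with each query embedding. Filtering by version before ranking is what would keep a Drupal 10 project from being shown 11.2-only hooks.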

Key Components

  • MCP PHP SDK (mcp/sdk): Handles MCP protocol communication between your IDE and the server
  • PHP-Parser (nikic/php-parser): Parses Drupal's *.api.php files to extract functions, hooks, classes, and documentation
  • DocBlock Extractor: Extracts documentation and code examples from PHP docblocks
  • Docker: Containerizes Redis and Ollama services for easy setup and management
  • Redis (with Redis Stack): Vector database for fast semantic search of embeddings
  • Ollama: Free, local LLM service for generating vector embeddings (no API keys required)
  • Pre-computed Embeddings: JSON files containing Drupal API documentation embeddings (stored in Git)
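In the actual tool the MCP PHP SDK takes care of the protocol. To show roughly what travels between the IDE and the server, here is a Python sketch that answers a single MCP `tools/call` request (a JSON-RPC 2.0 message). The tool name `search_drupal_docs` and its `query` argument are hypothetical, `search_fn` stands in for the embed-and-search step, and the other MCP methods a real server must handle (such as `initialize` and `tools/list`) are left out.

```python
import json

TOOL_NAME = "search_drupal_docs"  # hypothetical tool name


def handle_message(raw, search_fn):
    """Answer a JSON-RPC 2.0 'tools/call' request the way an MCP server would."""
    request = json.loads(raw)
    request_id = request.get("id")
    params = request.get("params", {})
    if request.get("method") != "tools/call":
        # initialize, tools/list, etc. are out of scope for this sketch.
        error = {"code": -32601, "message": "Method not found"}
    elif params.get("name") != TOOL_NAME:
        error = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
    else:
        results = search_fn(params.get("arguments", {}).get("query", ""))
        # Tool results are returned as a list of content items.
        result = {"content": [{"type": "text", "text": "\n\n".join(results)}]}
        return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "error": error})
```

The IDE only ever sees text content coming back, which is why the server can swap in documentation for a different Drupal version without the IDE noticing.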

This way we can later split the hydration and runtime parts off into Surge.
We then don't need to ship all the dependencies; we only have to load the embeddings for the relevant version into the vector store and set up the MCP server so it can connect to it.

Here you can see the MCP server returning Drupal 11-specific documentation; hook_entity_duplicate was only introduced in 11.2.x (https://www.drupal.org/node/3268812):
[Screenshot: Drupal 11-specific documentation]

For a Drupal 10 repository:
[Screenshot: Drupal 10-specific documentation]

ronaldtebrake commented:


And the same on Drupal 10 without the MCP server:

[Screenshot: Drupal 10 without the MCP server]

ronaldtebrake commented:

Status: Active » Closed (won't fix)

Closed for now: this would very likely work better as a Skill that is only loaded when needed.
