How I Made devUML - Part 3: Integrating AI for Intuitive Diagram Generation
Creating a natural language interface for UML diagramming with Google's Gemini model

Girish Koundinya
In my previous posts, I explored how devUML handles bidirectional sync and version control. Today, I want to share the most transformative feature we've added: AI-powered diagram generation from natural language descriptions.
Creating UML diagrams has traditionally required specialized knowledge—not just understanding the notation, but also mastering diagramming tools. I wanted to remove these barriers and make UML accessible to everyone, from seasoned architects to those new to software design.
The Google Connection: Why We Chose Gemini
When I started building devUML's AI capabilities, I needed to find the right model that could generate valid, consistent Mermaid syntax. After evaluating several options, Google's Gemini emerged as the clear winner for three key reasons:
1. Consistent Mermaid Compliance: Gemini demonstrated superior ability to generate syntactically correct Mermaid code, especially for complex diagrams with many entities and relationships.
2. Understanding of Diagrams as Visual Structures: Unlike text-only models, Gemini has multimodal capabilities that help it understand diagrams as visual structures rather than just text, resulting in more logical and readable layouts.
3. Cost-Effectiveness at Scale: When comparing costs across providers, Gemini offered the best balance of quality and pricing for our specific use case, making it economically viable to scale.
This decision has proven crucial to providing a reliable diagramming experience. Users don't see the complexity behind the scenes—they just know that their natural language descriptions reliably convert to accurate diagrams.
The Art of Structured Responses: Embracing JSON
One of the most important lessons I learned during development was the value of structured responses from AI models. Early implementations faced a significant challenge: inconsistent formatting that was difficult to parse reliably.
The breakthrough came when I shifted to requiring structured JSON responses from the model. This approach provides several critical benefits:
- Predictable Parsing: JSON responses eliminate ambiguity in parsing, making the system more robust.
- Clean Separation of Concerns: By separating explanation text from diagram code, we avoid contaminating the Mermaid syntax.
- Additional Metadata: The structured format allows us to include helpful metadata like suggested titles.
Here's an example of the response format we require:
```json
{
  "title": "Sales Process Flow: Lead to Contract",
  "explanation": "This diagram illustrates the complete sales process with approval gates for pricing and legal review.",
  "mermaidCode": "flowchart TD\n A[Lead Generation] --> B[Qualification]\n B --> C[Discovery Calls]..."
}
```
By enforcing this structure, we've dramatically reduced rendering errors and improved the overall reliability of the system.
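Even a syntactically valid JSON response can be missing a field, so it helps to verify the three required keys before rendering. The following is a minimal sketch of such a check, not devUML's actual code; `hasRequiredFields` is an illustrative name:

```javascript
// Check that a parsed response carries the three fields the format requires.
function hasRequiredFields(response) {
  const required = ['title', 'explanation', 'mermaidCode'];
  // Every key must be present and hold a non-empty string
  return required.every(
    key => typeof response[key] === 'string' && response[key].length > 0
  );
}

const good = {
  title: 'Sales Process Flow: Lead to Contract',
  explanation: 'Illustrates the sales process with approval gates.',
  mermaidCode: 'flowchart TD\n A[Lead Generation] --> B[Qualification]'
};
const bad = { title: 'Untitled' }; // missing explanation and mermaidCode

console.log(hasRequiredFields(good)); // true
console.log(hasRequiredFields(bad));  // false
```

A check like this lets the application reject a malformed response early and re-prompt, instead of handing incomplete data to the renderer.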
Ensuring Mermaid Compliance with Precision Prompting
The heart of devUML's AI capabilities is our carefully crafted prompt template. After numerous iterations and refinements, we've developed a prompt structure that consistently produces valid Mermaid diagrams:
````
You are a Mermaid diagram expert. Always generate Mermaid compliant code. Always suggest a clear, professional title for the diagram.

Current Diagram:
```mermaid
flowchart TB
    A[Start]
    B[Define Purpose and Scope]
    C[Identify Steps and Sequence]
    D[Choose Appropriate Symbols and Shapes]
    E[Draw the Flowchart]
    F[Test and Refine]
    G[End]
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
```

Context: {diagramContext}
Previous: {previousMessages}
User Request: {userRequest}

Provide your response in the following JSON format without any additional text or explanation:
{
  "title": "Concise Professional Diagram Title",
  "explanation": "Your explanation of changes or diagram details",
  "mermaidCode": "updated mermaid code that is valid and correct"
}

Make sure your response is ONLY the JSON with no other text around it.
````
Let's break down the key elements that make this prompt effective:
1. Clear Role Definition
The prompt begins by establishing the AI's role as a "Mermaid diagram expert" and explicitly instructs it to "always generate mermaid compliant code." This sets clear expectations from the start.
2. Sample Diagram as a Template
A critical innovation in our approach is providing a sample diagram in the system prompt, even for new diagram creation. This gives the model a clear reference for both syntax and formatting style. By showing a well-structured flowchart with proper node definitions and connections, we essentially say "make your response look like this."
This technique of "learning by example" proved far more effective than lengthy instructions about Mermaid syntax rules.
3. Contextual Awareness
The prompt includes:
- The current diagram (if one exists)
- Context about the diagram type or domain
- Previous conversation history (when available)
This contextual information helps the AI generate diagrams that build on existing work rather than starting from scratch each time.
4. Explicit Format Requirements
Perhaps most critically, we specify the exact JSON format required and emphasize that the response should contain "ONLY the JSON with no other text around it." This strict formatting requirement has eliminated countless parsing headaches.
5. Focused User Request
We clearly separate the user's request from other instructions, making it easy for the AI to identify exactly what changes or additions are being requested.
The Technical Implementation
With our prompt template defined, the implementation required careful handling of responses to ensure reliable parsing and rendering:
```javascript
async createDiagram(task, context = {
  currentDiagram: '',
  previousMessages: [],
  isFirstMessage: false,
  files: []
}) {
  // Define sample diagram to use when no current diagram exists
  const SAMPLE_DIAGRAM = `flowchart TB
    A[Start]
    B[Define Purpose and Scope]
    C[Identify Steps and Sequence]
    D[Choose Appropriate Symbols and Shapes]
    E[Draw the Flowchart]
    F[Test and Refine]
    G[End]
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G`;

  // Format the prompt with current context
  const prompt = `You are a Mermaid diagram expert. Always generate Mermaid compliant code. Always suggest a clear, professional title for the diagram.

Current Diagram:
\`\`\`mermaid
${context.currentDiagram || SAMPLE_DIAGRAM}
\`\`\`

Context: ${context.diagramContext || 'general software diagram'}
Previous: ${JSON.stringify(context.previousMessages)}
User Request: ${task}

Provide your response in the following JSON format without any additional text or explanation:
{
  "title": "Concise Professional Diagram Title",
  "explanation": "Your explanation of changes or diagram details",
  "mermaidCode": "updated mermaid code that is valid and correct"
}

Make sure your response is ONLY the JSON with no other text around it.`;

  // Call the Gemini API
  const result = await this.generateContent([{ text: prompt }]);

  if (result.success) {
    try {
      // Extract the JSON response
      let responseText = result.data.trim();
      const jsonMatch = responseText.match(/\{[\s\S]*\}/);
      if (jsonMatch) {
        responseText = jsonMatch[0];
      } else {
        throw new Error('Response does not contain valid JSON');
      }

      // Parse the JSON
      const parsedResponse = JSON.parse(responseText);

      // Extract the components
      return {
        success: true,
        data: {
          explanation: parsedResponse.explanation,
          content: parsedResponse.mermaidCode,
          suggestedName: parsedResponse.title
        }
      };
    } catch (error) {
      console.error('Error parsing Gemini response:', error);
      return { success: false, error: 'Failed to parse response' };
    }
  }

  return result;
}
```
This implementation does several important things:
- Defines a sample diagram to use when creating new diagrams from scratch
- Formats the prompt with all necessary context
- Extracts the JSON response using regex to handle any potential text wrapping
- Parses the JSON to extract the diagram components
- Returns a standardized response format for the rest of the application
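The JSON-extraction step is worth seeing in isolation, because models occasionally ignore the "ONLY the JSON" instruction and wrap the payload in prose or markdown fences. Here is a standalone, illustrative sketch of that step using the same greedy-brace regex:

```javascript
// Pull the JSON object out of a model response, tolerating extra wrapping
// text before or after it. The greedy match spans from the first "{" to
// the last "}" in the response.
function extractJson(responseText) {
  const jsonMatch = responseText.trim().match(/\{[\s\S]*\}/);
  if (!jsonMatch) {
    throw new Error('Response does not contain valid JSON');
  }
  return JSON.parse(jsonMatch[0]);
}

// Simulated model output that ignored the "ONLY the JSON" instruction
const raw = 'Sure! Here is the diagram:\n```json\n' +
  '{"title": "Demo", "explanation": "Example", "mermaidCode": "flowchart TD\\n A --> B"}' +
  '\n```';

const parsed = extractJson(raw);
console.log(parsed.title); // "Demo"
```

Note that a greedy match assumes the response contains a single JSON object; if the model ever emitted two, a stricter parser would be needed.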
Handling Edge Cases and Validation
Even with a carefully crafted prompt, there are still cases where the generated Mermaid code might have issues. To handle these edge cases, we implemented validation and cleaning steps:
```javascript
// Basic validation of Mermaid syntax
function validateMermaidSyntax(code) {
  try {
    // Use Mermaid's own parser to validate
    mermaid.parse(code);
    return { isValid: true, error: null };
  } catch (error) {
    return {
      isValid: false,
      error: error.message.replace(/^Parse error:?\s*/i, '')
    };
  }
}

// Process the received Mermaid content
function processMermaidContent(content) {
  // Clean up by ensuring consistent line breaks
  return content
    .replace(/\\n/g, '\n')
    .replace(/\r\n/g, '\n')
    .replace(/\r/g, '\n')
    .split('\n')
    .map(line => line.trim())
    .join('\n');
}
```
These utility functions ensure that even if there are minor issues with the generated code, we can catch and often fix them before rendering the diagram for the user.
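One way to compose the two steps is a guard that cleans the code first, validates it, and falls back to the last known-good diagram on failure. The following is a hypothetical sketch, not devUML's actual API; the validator is injected as a parameter so the example runs without the mermaid library (in the real system it would be the Mermaid-backed validator above):

```javascript
// Clean, then validate; on failure keep the previous diagram instead of
// rendering broken code.
function prepareDiagram(rawContent, previousDiagram, validate) {
  // Normalize escaped "\n" sequences and platform-specific line breaks
  const cleaned = rawContent
    .replace(/\\n/g, '\n')
    .replace(/\r\n?/g, '\n')
    .split('\n')
    .map(line => line.trim())
    .join('\n');

  const { isValid, error } = validate(cleaned);
  if (!isValid) {
    // Surface the parser error as a warning, keep the last known-good diagram
    return { content: previousDiagram, warning: error };
  }
  return { content: cleaned, warning: null };
}

// Toy validator standing in for Mermaid's parser
const stubValidate = code =>
  code.startsWith('flowchart')
    ? { isValid: true, error: null }
    : { isValid: false, error: 'Unknown diagram type' };

const ok = prepareDiagram('flowchart TD\\n A --> B', 'flowchart TD', stubValidate);
// ok.content is 'flowchart TD\nA --> B': the literal "\n" was normalized
// and each line trimmed
```

Falling back to the previous diagram means a bad generation never replaces a working one on screen; the user just sees a warning and can retry the request.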
The Results: A Seamless Experience
The combination of Google's Gemini model, structured JSON responses, and our precision prompt template with a sample diagram has created a remarkably reliable diagramming experience. Users can simply describe what they want in natural language, and in seconds they have a professionally formatted diagram that meets their needs.
Here's a real example of a user request and the resulting diagram:
User Request:
"Design a sales process flow from lead to contract. Include stages: qualification, discovery calls, technical validation, pricing, legal review. Show approval gates for pricing and legal, plus key decision points for deal sizes and stakeholder sign-offs."
Generated Diagram:
```mermaid
flowchart TD
    A[Lead Generation] --> B[Qualification]
    B --> C[Discovery Calls]
    C --> D[Technical Validation]
    D --> E{Deal Size?}
    E -->|Small| F[Standard Pricing]
    E -->|Medium| G[Custom Pricing]
    E -->|Large| H[Strategic Pricing]
    F --> I{Pricing Approval}
    G --> I
    H --> I
    I -->|Rejected| D
    I -->|Approved| J[Legal Review]
    J --> K{Legal Approval}
    K -->|Rejected| L[Revisions]
    L --> J
    K -->|Approved| M{Stakeholder Sign-off}
    M -->|Rejected| N[Address Concerns]
    N --> M
    M -->|Approved| O[Contract Execution]
```
What's remarkable is that this entire flow happened in seconds, with no manual tweaking of the diagram syntax required.
Lessons Learned and Future Directions
Integrating AI into devUML has taught me several valuable lessons:
1. Structured outputs are crucial for reliability. By enforcing a JSON format, we eliminated many parsing and handling issues.
2. Show, don't just tell. Including a sample diagram in the prompt was far more effective than lengthy syntax instructions.
3. Precise prompting matters. The specific wording and structure of our prompt template directly impacts the quality and consistency of the generated diagrams.
4. Choose the right model for your specific task. Google's Gemini proved to be the best fit for generating Mermaid code, but other tasks might benefit from different models.
5. Validation is still necessary. Even with a great model and prompt, validation provides an essential safety net.
As we continue to develop devUML, we're exploring several exciting directions:
- Cross-diagram intelligence: Helping users maintain consistency across multiple related diagrams
- Code and documentation generation: Automatically creating starter code or documentation based on diagrams
- Enhanced visualization options: Expanding beyond standard UML to support more specialized diagram types
By focusing on a reliable, consistent experience powered by AI, we're making diagramming more accessible to everyone involved in software design and documentation.
In my next post, I'll dive into how we've used these AI capabilities to create a learning experience that helps users become better software designers. Stay tuned!
