Language Detection
The Language Detection system in ConstruQtor parses the fenced code blocks in AI-generated output and decides whether each block's info string names a real file path or is only a language identifier.
The Problem It Solves
Different AI providers use different conventions for code blocks:
| Provider | Code Block Style | Issue |
|---|---|---|
| OpenAI | ```py | Creates file “py” if not detected |
| Gemini | ```python:main.py | Works correctly |
| Claude | ```python | Creates file “python” if not detected |
| DeepSeek | ```js | Creates file “js” if not detected |
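The `language:path` style in the table can be told apart from a bare language tag by splitting the info string on its first colon. The sketch below is only an illustration: `split_info_string` is a hypothetical helper name, not a documented ConstruQtor function.

```python
from typing import Optional, Tuple


def split_info_string(info: str) -> Tuple[str, Optional[str]]:
    """Split a fence info string into (language, path).

    Hypothetical helper: the path is None when the info string
    carries no ":path" part, or when that part is empty.
    """
    language, sep, path = info.strip().partition(":")
    return language, (path or None) if sep else None


print(split_info_string("python:main.py"))  # language plus an explicit path
print(split_info_string("py"))              # bare language tag, no path
```

A bare tag such as `py` yields no path, which is the case where a naive writer would create a file named after the language.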
The Solution (v1.0.0)
ConstruQtor ships a keyword set of 400+ language identifiers. An info string found in this set is treated as a language label, not a filename. The set covers:
Python variants (25+)
python, py, py3, pyw, pyi, pyc, pyx, pxd, pxi,
python3, python2, ipython, cython, pyrex, jython...
JavaScript/TypeScript (40+)
javascript, js, jsx, mjs, cjs, es6, es5, es7,
typescript, ts, tsx, mts, cts, d.ts, node...
Infrastructure-as-Code (30+)
terraform, tf, tfvars, hcl, ansible, puppet,
k8s, kubernetes, helm, docker, dockerfile...
All GitHub Linguist IDs (300+)
Every language identifier from GitHub's Linguist v4.5.2+
Generic Markers (20+)
code, snippet, output, console, terminal, shell,
plaintext, text, raw, diff, patch...
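A lookup against such a set can be sketched as follows. The set here is a small, hand-picked sample of the identifiers listed above, and the names `LANGUAGE_KEYWORDS` and `is_language_keyword` are assumptions made for illustration. Lowercasing before the lookup is also an assumption about how case is handled.

```python
# Abbreviated sample of the keyword set; the real set has 400+ entries.
LANGUAGE_KEYWORDS = frozenset({
    "python", "py", "py3", "pyw", "pyi",          # Python variants
    "javascript", "js", "jsx", "typescript", "ts", "tsx", "d.ts",
    "terraform", "tf", "hcl", "k8s", "helm",      # infrastructure-as-code
    "code", "snippet", "output", "text", "diff",  # generic markers
})


def is_language_keyword(identifier: str) -> bool:
    """Return True when the identifier is a known language label."""
    return identifier.strip().lower() in LANGUAGE_KEYWORDS
```

A `frozenset` gives constant-time membership tests, so the size of the keyword list does not affect per-block cost.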
How It Works
1. Code Block Detection:
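import re
# Extracts the info string after the opening fence: "python:src/main.py" or "py".
# MULTILINE lets ^ match at each line start; DOTALL lets the body span lines.
pattern = re.compile(r'^```(\S+)?\s*\n(.*?)```', re.MULTILINE | re.DOTALL)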
2. Filename Validation:
def _is_valid_filename(identifier):
    # ✅ Real file: "src/main.py" (has path + extension)
    # ✅ Known file: "Dockerfile" (extensionless but valid)
    # ❌ Keyword: "py" (in language_keywords set)
    # ❌ Keyword: "typescript" (in language_keywords set)
    ...
3. Smart Fallback:
- If identifier is a language keyword → Skip (don’t create file)
- If identifier has valid path/extension → Write file
- If identifier is known extensionless file → Write file
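The three steps above can be combined into one small pipeline. This is a sketch under stated assumptions, not ConstruQtor's actual implementation: the two sets are tiny stand-ins, `files_to_write` and `is_valid_filename` are hypothetical names, and the regex is a line-anchored variant of the pattern shown in step 1. Because `dockerfile` appears both among the language keywords and among the known extensionless files, the sketch assumes the known-file check runs first. It also assumes that a bare word with no path separator and no extension is skipped even when it is not in the keyword set.

```python
import re
from pathlib import PurePosixPath

# Built programmatically so the example can live inside a fenced block.
FENCE = "`" * 3

# Hypothetical, heavily abbreviated stand-ins for the real sets.
LANGUAGE_KEYWORDS = {"python", "py", "js", "typescript", "ts", "text"}
KNOWN_EXTENSIONLESS = {"dockerfile", "makefile", "jenkinsfile"}

# Group 1: optional info string; group 2: block body up to the closing fence.
BLOCK_RE = re.compile(r"^`{3}(\S+)?[ \t]*\n(.*?)^`{3}", re.MULTILINE | re.DOTALL)


def is_valid_filename(identifier: str) -> bool:
    """Decide whether an identifier should be treated as a file to write."""
    name = PurePosixPath(identifier).name
    if name.lower() in KNOWN_EXTENSIONLESS:       # e.g. "Dockerfile"
        return True
    if identifier.lower() in LANGUAGE_KEYWORDS:   # e.g. "py", "typescript"
        return False
    # Otherwise require a directory component or a file extension.
    return "/" in identifier or bool(PurePosixPath(name).suffix)


def files_to_write(text: str) -> dict:
    """Map each valid filename found in the info strings to its block body."""
    files = {}
    for info, body in BLOCK_RE.findall(text):
        candidate = info.rpartition(":")[2]       # drop a leading "language:"
        if candidate and is_valid_filename(candidate):
            files[candidate] = body
    return files


sample = "\n".join([
    FENCE + "py", "print(1)", FENCE, "",
    FENCE + "python:src/main.py", "print(2)", FENCE,
    FENCE + "Dockerfile", "FROM scratch", FENCE, "",
])
print(sorted(files_to_write(sample)))
```

For the sample text, the block tagged `py` is skipped, while `src/main.py` and `Dockerfile` are returned with their bodies.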
Known Extensionless Files
The system also recognizes legitimate files without extensions:
known_extensionless_files = {
    # Build files
    'dockerfile', 'containerfile', 'makefile', 'gnumakefile',
    'rakefile', 'gemfile', 'guardfile', 'brewfile',
    # Go ecosystem
    'go.mod', 'go.sum', 'go.work',
    # Package managers
    'package.json', 'cargo.toml', 'pyproject.toml',
    # CI/CD
    'jenkinsfile', 'procfile',
    # Config dotfiles
    '.gitignore', '.dockerignore', '.env', '.editorconfig',
}
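Every entry in the set is lowercase, so a lookup presumably has to normalize case, and for nested paths it would compare only the final path component. The sketch below illustrates that idea with a shortened copy of the set; `is_known_extensionless` is a hypothetical name, and matching on the basename is an assumption.

```python
from pathlib import PurePosixPath

# Shortened copy of the set above, for illustration only.
KNOWN_EXTENSIONLESS_FILES = {"dockerfile", "makefile", "go.mod", ".gitignore", ".env"}


def is_known_extensionless(path: str) -> bool:
    """Match the final path component case-insensitively against the set."""
    return PurePosixPath(path).name.lower() in KNOWN_EXTENSIONLESS_FILES
```

With this approach `Dockerfile`, `dockerfile`, and `services/api/Dockerfile` all resolve to the same entry.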
Provider Compatibility
| Provider | Tested | Result |
|---|---|---|
| OpenAI GPT-4/GPT-4o | ✅ | BULLETPROOF |
| Google Gemini | ✅ | BULLETPROOF |
| Anthropic Claude | ✅ | BULLETPROOF |
| DeepSeek Coder | ✅ | BULLETPROOF |
| Qwen/Qwen2.5-Coder | ✅ | BULLETPROOF |
Configuration
Language detection is built into ConstruQtor and requires no configuration. It runs automatically on every code block, regardless of which AI provider produced the output.
Introduced In
v1.0.0-stable - December 30, 2025