Dec 30, 2025

Language Detection – 400+ Keywords

Bulletproof code block parsing with comprehensive language identifier coverage for all AI providers


The Language Detection system in ConstruQtor parses the code blocks in AI-generated responses and decides whether the identifier after each opening fence is a real file path (write the file) or only a language identifier (do not create a file).

The Problem It Solves

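Different AI providers label code blocks differently, and a parser that treats every fence identifier as a filename breaks on most of them:

| Provider | Code block style   | Result without keyword detection                  |
|----------|--------------------|---------------------------------------------------|
| OpenAI   | ```py              | Creates a file named "py"                         |
| Gemini   | ```python:main.py  | Works correctly (language and path are explicit)  |
| Claude   | ```python          | Creates a file named "python"                     |
| DeepSeek | ```js              | Creates a file named "js"                         |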
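The failures in the table come from treating the whole fence identifier as a filename. The sketch below uses a hypothetical helper, split_info_string (not ConstruQtor's actual code), that splits the Gemini-style "language:path" form on the first colon. A bare token such as "py" or "python" is returned as a candidate path, and nothing in its shape says whether it names a language or a file; that ambiguity is what the keyword set resolves.

```python
from typing import Optional, Tuple


def split_info_string(info: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a fence info string into (language, candidate_path).

    Hypothetical helper for illustration only.
    "python:main.py" -> ("python", "main.py")
    "py"             -> (None, "py")   # ambiguous: language or file name?
    """
    if not info:
        return None, None
    if ":" in info:
        # Explicit "language:path" form: split on the first colon only.
        language, _, path = info.partition(":")
        return language or None, path or None
    # Bare token: a naive parser would write a file with this exact name.
    return None, info


for info in ["py", "python:main.py", "python", "js"]:
    print(repr(info), "->", split_info_string(info))
```

Only the explicit form carries both pieces of information; every bare form needs an outside rule to decide whether to write a file.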

The Solution (v1.0.0)

A comprehensive keyword set of 400+ language identifiers that covers:

Python variants (25+)

python, py, py3, pyw, pyi, pyc, pyx, pxd, pxi,
python3, python2, ipython, cython, pyrex, jython...

JavaScript/TypeScript (40+)

javascript, js, jsx, mjs, cjs, es6, es5, es7,
typescript, ts, tsx, mts, cts, d.ts, node...

Infrastructure-as-Code (30+)

terraform, tf, tfvars, hcl, ansible, puppet,
k8s, kubernetes, helm, docker, dockerfile...

All GitHub Linguist IDs (300+)

Every language identifier from GitHub's Linguist v4.5.2+

Generic Markers (20+)

code, snippet, output, console, terminal, shell,
plaintext, text, raw, diff, patch...
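One way to hold these categories is a single frozenset built from per-category groups, with lookups lowercased so that "PY" and "Python" hit the same entries. The sketch below uses small hypothetical subsets of the categories above (the real set has 400+ entries), and the lowercasing and whitespace stripping are assumptions rather than documented ConstruQtor behavior.

```python
# Hypothetical subsets of each category; the full set has 400+ entries.
PYTHON = {"python", "py", "py3", "pyw", "pyi", "python3", "ipython", "cython"}
JS_TS = {"javascript", "js", "jsx", "mjs", "cjs", "typescript", "ts", "tsx", "d.ts", "node"}
IAC = {"terraform", "tf", "tfvars", "hcl", "ansible", "k8s", "helm"}
GENERIC = {"code", "snippet", "output", "console", "terminal", "shell",
           "plaintext", "text", "raw", "diff", "patch"}

# Union of all categories; a frozenset gives O(1) average membership checks.
LANGUAGE_KEYWORDS = frozenset().union(PYTHON, JS_TS, IAC, GENERIC)


def is_language_keyword(identifier: str) -> bool:
    """True if the fence identifier names a language rather than a file (case-insensitive)."""
    return identifier.strip().lower() in LANGUAGE_KEYWORDS


print(is_language_keyword("Python"))   # True
print(is_language_keyword("main.py"))  # False
```

Note that an entry such as "d.ts" looks like a filename, so the keyword lookup has to run before any extension-based check.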

How It Works

1. Code Block Detection:

import re
# Extracts the info string ("python:src/main.py" or "py") and the block body
pattern = re.compile(r'^```(\S+)?\s*\n(.*?)```', re.MULTILINE | re.DOTALL)

2. Filename Validation:

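def _is_valid_filename(identifier):
    name = (identifier or '').lower()
    base = name.rsplit('/', 1)[-1]
    # ❌ Keyword: "py", "typescript" (in language_keywords set)
    if name in language_keywords:
        return False
    # ✅ Known file: "Dockerfile" (extensionless but valid)
    # ✅ Real file: "src/main.py" (has path + extension)
    return base in known_extensionless_files or '.' in base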

3. Smart Fallback:

  • If identifier is a language keyword → Skip (don’t create file)
  • If identifier has valid path/extension → Write file
  • If identifier is known extensionless file → Write file
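The three steps can be combined into one pass over a response. The sketch below extracts every fenced block with the step 1 pattern (compiled with re.MULTILINE so ^ matches at each line start, and re.DOTALL so the body can span lines), keeps the path part of a "language:path" identifier, and applies the fallback rules in the order listed. The keyword and filename sets are tiny hypothetical stand-ins, and the names target_path and files_to_write, the colon split, and the case-insensitive basename match are assumptions for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Optional

# Tiny hypothetical stand-ins for the full keyword and known-file sets.
LANGUAGE_KEYWORDS = frozenset({"python", "py", "js", "typescript", "ts", "text", "shell"})
KNOWN_EXTENSIONLESS_FILES = frozenset({"dockerfile", "makefile", "jenkinsfile", ".gitignore", "go.mod"})

# Step 1 pattern; MULTILINE lets ^ match at each line start, DOTALL lets .*? span lines.
FENCE = re.compile(r'^```(\S+)?\s*\n(.*?)```', re.MULTILINE | re.DOTALL)


def target_path(info: Optional[str]) -> Optional[str]:
    """Return the path to write for a fence identifier, or None to skip (sketch)."""
    if not info:
        return None
    # Assumed "language:path" convention (e.g. python:main.py): keep the path part.
    candidate = info.partition(":")[2] or info
    path = PurePosixPath(candidate)
    if candidate.lower() in LANGUAGE_KEYWORDS:
        return None               # language keyword -> skip, don't create a file
    if path.name.lower() in KNOWN_EXTENSIONLESS_FILES:
        return candidate          # known extensionless file -> write
    if path.suffix:
        return candidate          # has a path/extension -> write
    return None                   # anything else is not trusted as a filename


def files_to_write(response: str) -> Dict[str, str]:
    """Map each writable path in an AI response to its code block body."""
    files = {}
    for match in FENCE.finditer(response):
        path = target_path(match.group(1))
        if path is not None:
            files[path] = match.group(2)
    return files


sample = (
    "```py\nprint('hi')\n```\n"
    "```python:src/main.py\nprint('main')\n```\n"
    "```Dockerfile\nFROM python:3.12\n```\n"
)
print(sorted(files_to_write(sample)))
```

Because pathlib treats a leading dot as part of the name, ".gitignore" has no suffix and is only accepted through the known-file set, which is one reason dotfiles belong in that list.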

Known Extensionless Files

The system also recognizes well-known filenames, most of which have no extension (such as Dockerfile, Makefile, and dotfiles like .gitignore):

known_extensionless_files = {
    # Build files
    'dockerfile', 'containerfile', 'makefile', 'gnumakefile',
    'rakefile', 'gemfile', 'guardfile', 'brewfile',
    
    # Go ecosystem
    'go.mod', 'go.sum', 'go.work',
    
    # Package managers
    'package.json', 'cargo.toml', 'pyproject.toml',
    
    # CI/CD
    'jenkinsfile', 'procfile',
    
    # Config dotfiles
    '.gitignore', '.dockerignore', '.env', '.editorconfig'
}

Provider Compatibility

| Tested provider       | Result      |
|-----------------------|-------------|
| OpenAI GPT-4 / GPT-4o | BULLETPROOF |
| Google Gemini         | BULLETPROOF |
| Anthropic Claude      | BULLETPROOF |
| DeepSeek Coder        | BULLETPROOF |
| Qwen/Qwen2.5-Coder    | BULLETPROOF |

Configuration

This system is built into ConstruQtor and requires no configuration. It works automatically with any AI provider.

Introduced In

v1.0.0-stable - December 30, 2025