# MD040 - Code blocks should have a language specified

Aliases: `fenced-code-language`

## What this rule does

Ensures that every fenced code block (```) specifies the language it contains. Optionally, it also enforces consistent language labels and restricts which languages are allowed.

## Why this matters

- **Syntax highlighting**: Editors and renderers can color-code the syntax correctly
- **Clarity**: Readers immediately know what language they're looking at
- **Consistency**: Using the same label (e.g., always `bash` instead of mixing `sh`/`bash`/`zsh`) keeps documentation uniform
- **Tools**: Some tools (for example, documentation test runners and code formatters) use the language hint to decide which blocks to process or validate

## Examples

### ✅ Correct

````markdown
```python
def hello():
    print("Hello, world!")
```

```javascript
console.log("Hello, world!");
```

```bash
echo "Hello, world!"
```
````

### ❌ Incorrect

<!-- rumdl-disable MD040 MD031 -->

````markdown
```
def hello():
    print("Hello, world!")
```

```
console.log("Hello, world!");
```
````

<!-- rumdl-enable MD040 MD031 -->

### 🔧 Fixed

````markdown
```text
def hello():
    print("Hello, world!")
```

```text
console.log("Hello, world!");
```
````

> **Note**: The fix adds `text` as a default language hint when none is specified.
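
To make the check concrete, here is a minimal Python sketch of how missing language hints could be detected. It is an illustration only, not rumdl's implementation: the function name `find_unlabeled_fences` and the simplified fence-matching regex are assumptions, and the sketch ignores fences nested inside block quotes or list items.

```python
import re

# Simplified fence pattern (an assumption for this sketch): up to 3 spaces of
# indentation, then 3 or more backticks or tildes, then an optional info string.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def find_unlabeled_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no info string."""
    missing = []
    open_fence = None  # (fence character, fence length) while inside a block
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE_RE.match(line)
        if not match:
            continue
        marker, rest = match.group(1), match.group(2).strip()
        if open_fence is None:
            # Opening fence: whatever follows the marker is the info string.
            open_fence = (marker[0], len(marker))
            if not rest:
                missing.append(lineno)
        elif marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not rest:
            # Closing fence: same character, at least as long, nothing after it.
            open_fence = None
    return missing
```

Tracking the open fence matters because a closing fence also has no info string and would otherwise be reported as a block without a language.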

## Configuration

```toml
[MD040]
# Language label normalization mode
# - "disabled" (default): Only check for missing language
# - "consistent": Normalize to most prevalent alias per language
style = "disabled"

# Override preferred label for specific languages
# Keys are GitHub Linguist canonical names, values are your preferred alias
preferred-aliases = { Shell = "bash", JavaScript = "js" }

# Restrict which languages are allowed (empty = allow all)
# Uses GitHub Linguist canonical language names
allowed-languages = ["Python", "Shell", "JavaScript", "TypeScript", "JSON", "YAML"]

# Block specific languages (ignored if allowed-languages is non-empty)
disallowed-languages = ["Java", "C++"]

# Action for unknown language labels not in GitHub Linguist
# - "ignore" (default): Silently ignore unknown languages
# - "warn": Emit a warning for unknown languages
# - "error": Treat unknown languages as errors
unknown-language-action = "ignore"
```

### Consistent Mode

When `style = "consistent"`, the rule ensures all code blocks that refer to the same language use the same label. For example, suppose your document contains:

````markdown
```bash
echo "one"
```

```sh
echo "two"
```

```bash
echo "three"
```
````

The rule will flag `sh` as inconsistent because `bash` is more prevalent (2 occurrences vs 1).

**With `--fix`**, inconsistent labels are automatically normalized to the most prevalent one:

````markdown
```bash
echo "one"
```

```bash
echo "two"
```

```bash
echo "three"
```
````
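
The prevalence logic described above can be sketched in Python. This is a rough illustration under stated assumptions rather than rumdl's actual code: `ALIAS_TO_LANGUAGE` is a hypothetical excerpt of the Linguist alias data, `inconsistent_labels` is an invented name, and how ties are broken is not specified in this document (the sketch keeps whichever label appears first).

```python
from collections import Counter

# Hypothetical excerpt of an alias table; the real rule uses GitHub Linguist data.
ALIAS_TO_LANGUAGE = {
    "sh": "Shell",
    "bash": "Shell",
    "zsh": "Shell",
    "js": "JavaScript",
    "javascript": "JavaScript",
}


def inconsistent_labels(labels):
    """Map each minority label to the most prevalent label for its language."""
    counts = {}
    for label in labels:
        language = ALIAS_TO_LANGUAGE.get(label.lower())
        if language is not None:
            # Count how often each label is used, grouped by canonical language.
            counts.setdefault(language, Counter())[label] += 1

    fixes = {}
    for per_label in counts.values():
        # Assumption: on a tie, the label seen first wins.
        winner, _ = per_label.most_common(1)[0]
        for label in per_label:
            if label != winner:
                fixes[label] = winner
    return fixes
```

For the three blocks in the example, `inconsistent_labels(["bash", "sh", "bash"])` returns `{"sh": "bash"}`.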

### Preferred Aliases

Use `preferred-aliases` to override which label is used regardless of prevalence:

```toml
[MD040]
style = "consistent"
preferred-aliases = { Shell = "sh" }  # Always use "sh" instead of "bash"
```
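
As a small Python sketch of the override, assuming the per-language label counts have already been collected: the helper name `target_label` and its parameters are hypothetical, and only the precedence (preferred alias first, prevalence second) is taken from the description above.

```python
from collections import Counter


def target_label(language, label_counts, preferred_aliases):
    """Pick the label that every block of `language` should use."""
    # A configured preferred alias wins regardless of how often it appears.
    if language in preferred_aliases:
        return preferred_aliases[language]
    # Otherwise fall back to the most prevalent label in the document.
    return label_counts.most_common(1)[0][0]
```

With `Counter({"bash": 2, "sh": 1})` and `{"Shell": "sh"}`, the function returns `"sh"` even though `bash` is more common; with an empty mapping it returns `"bash"`.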

### Language Restrictions

Restrict which languages can appear in your documentation:

```toml
[MD040]
# Only allow these languages
allowed-languages = ["Python", "Shell", "JSON"]
```

Or block specific languages:

```toml
[MD040]
# Block these languages (only takes effect if allowed-languages is empty)
disallowed-languages = ["Java", "C++"]
```
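
The precedence between the two options can be summarized in a few lines of Python. This is a sketch of the documented behavior only: the function name `is_language_permitted` is hypothetical, and exact matching on canonical names is an assumption.

```python
def is_language_permitted(language, allowed, disallowed):
    """Apply the allow/deny precedence to a canonical language name."""
    if allowed:
        # A non-empty allow list is authoritative; the deny list is ignored.
        return language in allowed
    # With no allow list, everything is permitted except denied languages.
    return language not in disallowed
```

For example, `is_language_permitted("Java", [], ["Java", "C++"])` is `False`, while `is_language_permitted("Python", ["Python"], ["Python"])` is `True` because the deny list is ignored once an allow list is set.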

### Unknown Languages

By default, language labels not recognized by GitHub Linguist are silently ignored. Use `unknown-language-action` to change this behavior:

```toml
[MD040]
# Warn about unknown languages
unknown-language-action = "warn"

# Or treat unknown languages as errors instead (set only one value per table)
# unknown-language-action = "error"
```

This is useful for enforcing that all language labels are valid and will receive proper syntax highlighting on GitHub.
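
The three actions can be pictured with a short Python sketch. Everything here is illustrative: `KNOWN_LABELS` is a hypothetical excerpt standing in for the Linguist data, `check_unknown_label` is an invented name, the severity strings and message are made up, and case-insensitive matching is an assumption.

```python
# Hypothetical excerpt of known labels; the real list comes from GitHub Linguist.
KNOWN_LABELS = {"python", "python3", "sh", "bash", "zsh", "js", "ts", "json", "yaml"}


def check_unknown_label(label, action="ignore"):
    """Return (severity, message) for an unknown label, or None if nothing is reported."""
    if action not in ("ignore", "warn", "error"):
        raise ValueError(f"unsupported unknown-language-action: {action!r}")
    # Known labels are never reported; unknown ones are skipped when ignoring.
    if label.lower() in KNOWN_LABELS or action == "ignore":
        return None
    severity = "warning" if action == "warn" else "error"
    return (severity, f"unknown language label '{label}'")
```

A misspelled label such as `pythn` yields nothing with the default action, a warning with `"warn"`, and an error with `"error"`.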

## Linguist Integration

This rule uses [GitHub Linguist](https://github.com/github-linguist/linguist) as the source of truth for language names and aliases. This ensures compatibility with GitHub's syntax highlighting.

Common language mappings:

- `sh`, `bash`, `zsh`, `shell-script` → Shell
- `js`, `node` → JavaScript
- `ts` → TypeScript
- `python`, `python3` → Python
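
A lookup over these mappings might look like the following Python sketch. It only covers the four languages listed above, and the names `LANGUAGE_ALIASES`, `ALIAS_INDEX`, and `canonical_language` are hypothetical. Taking the first word of the info string follows the CommonMark convention; lowercasing before lookup is an assumption.

```python
# The mappings listed above, keyed by canonical language name.
LANGUAGE_ALIASES = {
    "Shell": ["sh", "bash", "zsh", "shell-script"],
    "JavaScript": ["js", "node"],
    "TypeScript": ["ts"],
    "Python": ["python", "python3"],
}

# Reverse index: alias -> canonical name.
ALIAS_INDEX = {
    alias: language
    for language, aliases in LANGUAGE_ALIASES.items()
    for alias in aliases
}
# Canonical names (lowercased) also resolve to themselves.
ALIAS_INDEX.update({language.lower(): language for language in LANGUAGE_ALIASES})


def canonical_language(info_string):
    """Resolve a fence info string to a canonical language name, or None."""
    words = info_string.split()
    if not words:
        return None  # no language specified at all
    # Only the first word of the info string names the language.
    return ALIAS_INDEX.get(words[0].lower())
```

Building the reverse index once keeps each lookup a single dictionary access, which is convenient when every fence in a document needs to be resolved.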

## Automatic fixes

- Missing language: Adds `text` as the default
- Inconsistent labels (when `style = "consistent"`): Normalizes to the preferred/prevalent label

## Learn more

- [CommonMark fenced code blocks](https://spec.commonmark.org/0.31.2/#fenced-code-blocks) - Technical specification
- [GitHub Flavored Markdown](https://github.github.com/gfm/#info-string) - Language hints in code blocks
- [GitHub Linguist](https://github.com/github-linguist/linguist) - Language detection and aliases

## Related rules

- [MD046](md046.md) - Code block style should be consistent
- [MD048](md048.md) - Code fence style should be consistent
- [MD031](md031.md) - Code blocks should be surrounded by blank lines
