Architecture
conda-completion uses a hybrid Python/Rust design that splits the work into two distinct phases: manifest generation, and completion on every TAB press.
The two-phase design
flowchart TD
subgraph python ["Phase 1: Python (manifest generation)"]
direction TB
A["conda completion generate"] --> B["Call generate_parser()"]
B --> C["Walk argparse tree"]
C --> D["Include plugin commands"]
D --> D2["Resolve package metadata"]
D2 --> E["Write completion.msgpack\n+ versions.index/store"]
end
E --> F[("completion.msgpack\nversions.index/store\n(cache directory)")]
subgraph rust ["Phase 2: Rust (runs on every TAB)"]
direction TB
G["_conda_completer"] --> H["Read completion.msgpack"]
H --> I["Walk cwd for project context"]
I --> J["Read global state"]
J --> K["Prefix/substring/fuzzy match"]
end
F --> G
style python fill:#306998,color:#fff
style rust fill:#dea584,color:#000
style F fill:#f5f5f5,stroke:#333
Phase 1: Generation (Python). conda completion generate calls
conda’s generate_parser() function, which loads all registered plugin
subcommands. The resulting argparse tree is walked recursively to extract
commands, flags, positional arguments, help text, and mutually exclusive
groups. The command also resolves package metadata from configured channels via
conda’s SubdirData API to extract package names and versions, reusing fresh package metadata
when available. The output is a completion.msgpack manifest plus
versions.index and versions.store, all stored in your platform’s
cache directory.
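
The recursive walk described for Phase 1 can be sketched as follows. This is an illustration under stated assumptions, not conda-completion's actual code: the `walk_parser` function and the `demo` parser are hypothetical, and the walk relies on private argparse attributes (`_actions`, `_mutually_exclusive_groups`, `_group_actions`) because argparse offers no public API for traversing a parser tree.

```python
import argparse


def walk_parser(parser):
    """Recursively extract flags, positionals, and subcommands from a parser."""
    node = {"flags": [], "positionals": [], "subcommands": {}}
    for action in parser._actions:  # private attribute, used here for illustration
        if isinstance(action, argparse._SubParsersAction):
            # Each choice maps a subcommand name to its own ArgumentParser.
            for name, subparser in action.choices.items():
                node["subcommands"][name] = walk_parser(subparser)
        elif action.option_strings:
            node["flags"].append(
                {"names": list(action.option_strings), "help": action.help}
            )
        else:
            node["positionals"].append({"name": action.dest, "help": action.help})
    # Record each mutually exclusive group as a list of its first flag names.
    node["exclusive_groups"] = [
        [a.option_strings[0] for a in group._group_actions if a.option_strings]
        for group in parser._mutually_exclusive_groups
    ]
    return node


# Hypothetical stand-in for the tree that conda's generate_parser() returns.
demo = argparse.ArgumentParser(prog="demo")
sub = demo.add_subparsers(dest="cmd")
install = sub.add_parser("install", help="Install packages")
install.add_argument("packages", nargs="*", help="Package specs")
group = install.add_mutually_exclusive_group()
group.add_argument("--name", "-n", help="Environment name")
group.add_argument("--prefix", "-p", help="Environment path")

tree = walk_parser(demo)
```

The resulting nested dictionary is the kind of structure that could then be serialized into a manifest; the real manifest schema is not specified here.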
Phase 2: Completion (Rust). On every TAB press, the shell calls
_conda_completer, a statically linked Rust binary. It reads the
manifest, examines the current command line, and outputs matching
candidates in the format your shell expects. No Python process is
started. Package name completion uses a three-stage matching strategy
(prefix, substring, then fuzzy similarity) to handle typos.
Why this split?
Argparse introspection requires importing conda and all its plugins.
That means loading Python, resolving imports, and initializing the plugin
system. That work is too slow for an interactive TAB press.
By running Python once and caching the result as msgpack, the hot path becomes a simple binary file read in Rust, with no Python startup cost.
Plugin awareness
conda’s generate_parser() function (in conda.cli.conda_argparse)
calls configure_parser_plugins(), which discovers all registered conda
plugins via entry points and adds their subcommands to the parser tree.
conda-completion introspects the tree after this step, so any plugin
that registers conda_subcommands is included when the manifest is
generated.
For example, installing conda-workspaces adds workspace, ws, and
task subcommands. After running conda completion generate, those
subcommands appear in the manifest with full flag and positional
argument details.
Automatic regeneration
conda-completion registers a conda_post_commands hook that fires after
install, remove, and update. The hook hashes the set of registered
plugin entry point names and compares it to the hash stored in the
manifest. If they differ (a plugin entry point was added or removed), the
manifest is regenerated without prompting.
For plugins installed through conda, conda workspace <TAB> can offer
the new subcommands after the install command finishes, without a manual
conda completion generate step.
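
The comparison performed by the hook can be sketched like this. The hash algorithm and the way names are joined are not stated above, so SHA-256 over sorted, newline-joined names is an assumption, and both function names are hypothetical.

```python
import hashlib


def plugin_fingerprint(entry_point_names):
    """Hash a set of entry point names, independent of discovery order."""
    joined = "\n".join(sorted(set(entry_point_names)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def needs_regeneration(current_names, stored_hash):
    """Return True when the plugin set differs from the manifest's record."""
    return plugin_fingerprint(current_names) != stored_hash


# Hypothetical example: a new plugin entry point appears after an install.
stored = plugin_fingerprint(["conda-completion"])
changed = needs_regeneration(["conda-completion", "conda-workspaces"], stored)
```

Sorting before hashing means a change in discovery order alone does not trigger a regeneration; only adding or removing a name does.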
Contextual completions
Static command trees are not enough. When you type conda install --name <TAB>, you want to see your actual environment names, not a
generic placeholder.
The Rust binary reads project and global files directly:
| Source | What it provides |
|---|---|
| | Workspace-style environment names, task names, channels |
| | Environment name, channels |
| | Environment names, command names |
| | Environment names, command names |
| | Locked environment names and channels |
| | Channel names |
| | All registered environment names |
| | Configured channel names |
| | Tool names for arguments explicitly marked as |
The binary walks upward from the working directory to find project files
and checks fixed locations for global state. conda.toml support follows
the emerging manifest used by conda-workspaces, not a formal conda standard.
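
The upward walk can be illustrated with a short Python sketch (the real completer is the Rust binary; Python is used here only for readability). The `PROJECT_FILES` tuple and `find_project_files` are hypothetical, and stopping at the nearest directory that contains any project file is an assumption about the lookup policy.

```python
import tempfile
from pathlib import Path

# Hypothetical list; these file names are the ones mentioned elsewhere on this page.
PROJECT_FILES = ("conda.toml", "pixi.toml", "pyproject.toml", "environment.yml")


def find_project_files(start, names=PROJECT_FILES):
    """Return project files from the nearest ancestor directory that has any."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        found = [directory / n for n in names if (directory / n).is_file()]
        if found:
            return found
    return []


# Usage: a project file two levels above the working directory is still found.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pixi.toml").write_text("")
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found_names = [p.name for p in find_project_files(nested)]
```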
Stat-based file cache
Parsing TOML and YAML on every TAB press would be wasteful when
most files rarely change between keystrokes. The binary maintains a
stat cache (context_cache.msgpack) that stores (mtime, size) tuples
for every source file.
On each invocation:
stat() every source file (one syscall each)
Compare against cached tuples
On a cache hit (all stats match): read pre-parsed candidates from the cache file. No TOML/YAML parsing at all.
On a cache miss: re-parse only the changed file(s), merge with cached results, and write the updated cache atomically (write to .tmp, then rename).
This turns the hot path from “parse 5-8 files” into “5-8 stat syscalls plus one small cache read.”
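
The steps above can be sketched in Python as follows. This is a minimal illustration, not the Rust implementation: it stores the cache as JSON rather than msgpack to stay within the standard library, and `stat_key`, `load_candidates`, the `parse` callback, and the `context_cache.json` file name are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def stat_key(path):
    """Return [mtime_ns, size] for a file, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return [st.st_mtime_ns, st.st_size]


def load_candidates(sources, cache_path, parse):
    """Return {source: candidates}, re-parsing only files whose stat changed."""
    try:
        cache = json.loads(Path(cache_path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        cache = {}
    dirty = False
    result = {}
    for src in map(str, sources):
        key = stat_key(src)
        entry = cache.get(src)
        if entry is None or entry["stat"] != key:
            # Cache miss: re-parse just this file and merge it into the cache.
            entry = {"stat": key, "candidates": parse(src) if key else []}
            cache[src] = entry
            dirty = True
        result[src] = entry["candidates"]
    if dirty:
        # Atomic update: write a sibling .tmp file, then rename over the cache.
        tmp = f"{cache_path}.tmp"
        Path(tmp).write_text(json.dumps(cache))
        os.replace(tmp, cache_path)
    return result


with tempfile.TemporaryDirectory() as tmp_dir:
    src = Path(tmp_dir, "environment.yml")
    src.write_text("name: demo\n")
    cache_file = Path(tmp_dir, "context_cache.json")
    calls = []

    def parse(path):
        # Toy parser: record the call and return the environment name.
        calls.append(path)
        return [Path(path).read_text().split(": ")[1].strip()]

    first = load_candidates([src], cache_file, parse)
    second = load_candidates([src], cache_file, parse)  # cache hit, parse not called
    src.write_text("name: demo-renamed\n")  # file size changes
    third = load_candidates([src], cache_file, parse)  # cache miss, one re-parse
```

`os.replace` performs the rename step, so a reader never sees a half-written cache file.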
Shell integration
Each supported shell gets a small script that wires the shell’s
completion system to _conda_completer. The scripts are generated by
conda completion init <shell> and installed into your RC file by
conda completion install <shell>.
The scripts differ per shell but follow the same pattern:
Define a completion function
The function calls _conda_completer with the current command line state (words and cursor position)
Parse the output into the shell’s native completion format
The output format varies by shell:
bash: one candidate per line, no descriptions
zsh: group\tcandidate:description (grouped and colon-separated)
fish: candidate\tdescription (tab-separated)
PowerShell: candidate\tdescription (wrapped in CompletionResult)
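
The per-shell formats listed above can be sketched as a single formatting function. This is an illustrative Python rendering of the documented line formats only; `format_candidates` and its triple-based input are hypothetical, and any escaping the real binary applies (for example, colons inside zsh descriptions) is not modeled.

```python
def format_candidates(shell, candidates):
    """Render (group, candidate, description) triples for one shell."""
    lines = []
    for group, candidate, description in candidates:
        if shell == "bash":
            # bash: candidate only, no description.
            lines.append(candidate)
        elif shell == "zsh":
            # zsh: group, a tab, then candidate:description.
            lines.append(f"{group}\t{candidate}:{description}")
        elif shell in ("fish", "powershell"):
            # fish and PowerShell: candidate, a tab, then description.
            lines.append(f"{candidate}\t{description}")
        else:
            raise ValueError(f"unsupported shell: {shell}")
    return "\n".join(lines)


items = [("commands", "install", "Install packages")]
zsh_output = format_candidates("zsh", items)
```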
Dependency philosophy
The Rust binary uses a minimal set of dependencies:
serde + rmp-serde for the msgpack manifest and caches
toml for project files (conda.toml, pixi.toml, pyproject.toml)
serde-saphyr for YAML files (environment.yml, .condarc, lockfiles; pure Rust, no unsafe)
fs-err for better I/O error messages
This keeps the binary small and startup fast. Heavier frameworks like
clap_complete or full conda type libraries (rattler) were deliberately
avoided to stay within the performance budget.
Design decisions
msgpack over TOML for the manifest. The manifest is a derived artifact, never hand-edited. msgpack is smaller and faster to deserialize than TOML. It is already used in conda’s sharded repodata.
Indexed package-version data. completion.msgpack (~500KB, command
tree plus package names) is loaded for normal completion invocations.
versions.index maps package names to byte ranges in versions.store,
and the store record for one package is loaded only when = appears in
the current word. This keeps the common TAB press fast while avoiding a
full version-map deserialization for one package.
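
The index-plus-store idea can be sketched as follows. The actual on-disk layout of versions.index and versions.store is not described here, so this Python sketch assumes a simple name to (offset, length) mapping and JSON-encoded records; all function names are hypothetical.

```python
import io
import json


def build_store(versions_by_package):
    """Return (index, store_bytes); index maps name -> (offset, length)."""
    index, store = {}, io.BytesIO()
    for name, versions in sorted(versions_by_package.items()):
        record = json.dumps(versions).encode("utf-8")
        index[name] = (store.tell(), len(record))
        store.write(record)
    return index, store.getvalue()


def read_versions(index, store_file, name):
    """Seek to one package's record and decode only that byte range."""
    if name not in index:
        return []
    offset, length = index[name]
    store_file.seek(offset)
    return json.loads(store_file.read(length))


def wants_versions(current_word):
    """Version data is only needed once '=' appears in the current word."""
    return "=" in current_word


index, blob = build_store({"numpy": ["1.26.4", "2.0.0"], "python": ["3.12.3"]})
numpy_versions = read_versions(index, io.BytesIO(blob), "numpy")
```

Only the bytes for the requested package are decoded, which is the property the design relies on.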
Argparse introspection over a new hookspec. Introspecting conda’s existing argparse tree reuses plugin metadata without a conda-completion-specific API. A dedicated hookspec would require plugin maintainers to add a new hook implementation.
stat() over content hashing. stat() is one syscall per file.
Content hashing requires reading the entire file before deciding
whether to parse it. The only false-negative case (content changes
without mtime/size changing) is vanishingly rare in editing workflows.
Damerau-Levenshtein over Jaro-Winkler for fuzzy matching. Damerau-Levenshtein handles insertions, deletions, substitutions, and transpositions as single-cost operations. Jaro-Winkler fails on partial matches where string lengths differ significantly. The three-stage strategy (prefix > substring > similarity) ensures fuzzy matching only fires when nothing else matches.
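
A minimal Python sketch of the three-stage strategy follows. It uses the optimal string alignment variant of Damerau-Levenshtein (each substring is edited at most once), which may differ from the variant in the Rust binary; the `max_distance=2` cutoff and both function names are assumptions made for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def match(query, names, max_distance=2):
    """Prefix first, then substring, then fuzzy; later stages run only if earlier ones find nothing."""
    prefix = [n for n in names if n.startswith(query)]
    if prefix:
        return prefix
    substring = [n for n in names if query in n]
    if substring:
        return substring
    scored = sorted((osa_distance(query, n), n) for n in names)
    return [n for dist, n in scored if dist <= max_distance]


names = ["numpy", "numba", "scipy", "python"]
typo_matches = match("nupmy", names)  # transposed "pm" falls through to the fuzzy stage
```

Because a swap of two adjacent characters counts as one edit, a typo like "nupmy" stays within a small distance of "numpy" while unrelated names are filtered out by the cutoff.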