Conda Channel Naming Discrepancies Analysis Report#
Executive Summary#
This report analyzes naming differences between the main conda channel and the conda-forge channel for Python packages. The analysis compared the PyPI mappings for packages on main and on conda-forge and identified the cases where the conda package names differ. It was carried out in mid-August 2025.
Study Overview#
Channel data was downloaded from cf-graph-countyfair (grayskull_pypi_mapping.json) for the conda-forge channel and from an internal Anaconda metadata source for the main channel. Both sources contain the conda package name and the PyPI name, so the following analysis was done:
import subprocess

import pandas as pd

# main_df: pandas DataFrame from the main channel
# cf_df: pandas DataFrame from the conda-forge channel
# Merge the data by PyPI name.
# main_df stores the package name as `name` and cf_df as `conda_name`, so there is no collision.
mdf = pd.merge(main_df, cf_df, on="pypi_name")
not_found_df = mdf[mdf.name != mdf.conda_name].sort_values(by="name")
# Because there could be other names (aliases or older names), check
# whether each of the discrepant names also exists.
conda_search_results = {}
for x in not_found_df.conda_name:
    if x in conda_search_results:
        continue
    print(f"Looking up {x} on main...")
    # Note: the original Jupyter notebook of the analysis ran:
    #   query = !conda search {x}
    query = subprocess.run(["conda", "search", x], capture_output=True, text=True)
    # Store the output as a list of lines, like the notebook's `!` syntax returns.
    conda_search_results[x] = (query.stdout + query.stderr).splitlines()
# Find any that are also found on main (these are noted below)
found_on_main = [
    k
    for k, v in conda_search_results.items()
    if not any("PackagesNotFoundError" in line for line in v)
]
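The analysis code takes `cf_df` as given. As a minimal sketch of how the conda-forge side could be prepared, the helper below flattens a mapping file into rows holding `pypi_name` and `conda_name`. It assumes the JSON is an object whose values are records containing those two fields; that shape, the helper name `rows_from_mapping`, and the sample entries are assumptions for illustration, not something this report specifies.

```python
import json


def rows_from_mapping(json_text):
    """Flatten an assumed {key: {"pypi_name": ..., "conda_name": ...}} mapping into rows."""
    mapping = json.loads(json_text)
    rows = []
    for entry in mapping.values():
        # Skip entries that lack either field rather than guessing a value.
        if "pypi_name" in entry and "conda_name" in entry:
            rows.append(
                {"pypi_name": entry["pypi_name"], "conda_name": entry["conda_name"]}
            )
    return rows


# Made-up sample in the assumed shape; these are not real channel data.
sample = json.dumps(
    {
        "example-pkg": {"pypi_name": "example-pkg", "conda_name": "example_pkg"},
        "incomplete": {"pypi_name": "incomplete"},
    }
)
cf_rows = rows_from_mapping(sample)
print(cf_rows)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(cf_rows)` to obtain a frame with the two columns the merge relies on.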
Data Overview#
72 packages were found with discrepancies between the conda package names on the main channel and on the conda-forge channel. This was done by collecting information on main’s packages (from a data store internal to Anaconda) and comparing it against conda-forge’s cf-graph-countyfair. cf-graph-countyfair is the data repository for conda-forge’s automation; it stores the dependency graph and its introspection. Prefix’s parselmouth repository (which stores similar mapping data) was not used in this comparison.
A few of these are not listed below because the data mapping was incorrect.
Key Findings#
Discrepancies fall into several categories. In some cases the name of the package was changed, which has its own challenges because the ‘correct’ conda package has to be selected.
Naming Pattern Categories#
Prefix/Suffix Standardization Differences#
# | Main Channel Name | Conda-Forge Name | PyPI Name | Notes |
---|---|---|---|---|
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | main also has |
6 | | | | main also has |
7 | | | | |
8 | | | | |
9 | | | | |
10 | | | | main also has |
Vendor/Project Name Clarification#
# | Main Channel Name | Conda-Forge Name | PyPI Name | Notes |
---|---|---|---|---|
1 | | | | |
2 | | | | main also has |
3 | | | | |
4 | | | | main also has |
5 | | | | |
6 | | | | |
7 | | | | |
Hyphen vs Underscore Standardization#
# | Main Channel Name | Conda-Forge Name | PyPI Name | Notes |
---|---|---|---|---|
1 | | | | main also has |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | main also has |
7 | | | | |
8 | | | | |
9 | | | | |
10 | | | | main also has |
11 | | | | |
12 | | | | |
Package Family Consolidation#
# | Main Channel Name | Conda-Forge Name | PyPI Name | Notes |
---|---|---|---|---|
1 | | | | main also has |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | |
7 | | | | |
8 | | | | |
9 | | | | main also has |
10 | | | | main also has |
11 | | | | This is odd, as conda-forge maintains both pandera and pandera-core, which point to the same PyPI project, pandera. |
12 | | | | |
13 | | | | |
14 | | | | |
15 | | | | |
16 | | | | |
17 | | | | |
18 | | | | |
19 | | | | |
20 | | | | |
21 | | | | |
22 | | | | |
23 | | | | Old main variants; main now uses pytorch. |
24 | | | | Old main variants; main now uses pytorch. |
25 | | | | main also has |
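Two of the categories above, hyphen versus underscore and prefix/suffix differences, can be detected mechanically. The following is a rough sketch rather than the method used for this report: `normalize` applies the PyPI-style name normalization (lowercase, with runs of `-`, `_` and `.` collapsed to `-`), and the affix lists are hypothetical examples, since the report does not enumerate which prefixes and suffixes occur.

```python
import re

# Hypothetical affixes chosen for illustration only.
PREFIXES = ("python-", "py-")
SUFFIXES = ("-python", "-py")


def normalize(name):
    """Lowercase the name and collapse runs of '-', '_' and '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_affixes(name):
    """Remove at most one known prefix and one known suffix from a normalized name."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name


def classify(main_name, cf_name):
    """Label a pair of conda names with a coarse discrepancy category."""
    if main_name == cf_name:
        return "same"
    a, b = normalize(main_name), normalize(cf_name)
    if a == b:
        # The names differ only in separators (or letter case).
        return "separator"
    if strip_affixes(a) == strip_affixes(b):
        return "prefix/suffix"
    return "other"


# Made-up name pairs, not taken from the tables above.
print(classify("foo_bar", "foo-bar"))
print(classify("python-foo", "foo"))
print(classify("foo", "bar-core"))
```

Pairs labelled `other` would still need manual review, which is where the vendor/project and package-family categories come in.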
Impact Analysis#
There are discrepancies even within a channel: conda-forge maintains both pandera and pandera-core.
There are very few discrepancies between the two channels. Main has more than 2k different PyPI projects, and there are only about 20 real differences.
Because of main’s longevity, it still carries older packages that are no longer maintained; main appears to be moving toward conda-forge package names wherever possible.
Research Questions for Further Investigation#
Is there disagreement between cf-graph-countyfair and parselmouth?
If there is, would we expect it to be significant?
Should there be an effort to identify packages that are no longer current?
This applies to cases where the conda package name has changed within a channel.
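The first question could be approached with a simple diff once both sources are reduced to a common shape. The sketch below assumes each source has already been loaded into a plain dictionary from PyPI name to conda name; the function name and the sample entries are hypothetical, and no claim is made here about the actual file formats of either repository.

```python
def mapping_disagreements(source_a, source_b):
    """Return {pypi_name: (conda_name_a, conda_name_b)} for projects mapped differently."""
    # Only projects present in both sources can disagree.
    shared = source_a.keys() & source_b.keys()
    return {
        pypi_name: (source_a[pypi_name], source_b[pypi_name])
        for pypi_name in sorted(shared)
        if source_a[pypi_name] != source_b[pypi_name]
    }


# Made-up entries for illustration only.
source_a = {"example-pkg": "example_pkg", "other-pkg": "other-pkg"}
source_b = {"example-pkg": "example-pkg", "other-pkg": "other-pkg", "extra": "extra"}
print(mapping_disagreements(source_a, source_b))
```

Projects present in only one source are ignored here; counting them separately would show how much the two sources differ in coverage.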
Recommendations#
Because there are so few discrepancies, conda-pypi could use the conda-forge mapping with little impact.
As we move toward a sustainable solution (a continually updated ecosystem mapping), mappings should be maintained on a channel-by-channel basis.
For the short-term MVP, conda-pypi should hard-code the known exceptions while a long-term sustainable solution is decided on and implemented.
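The short-term recommendation amounts to a lookup that consults a small table of hard-coded exceptions before falling back to the conda-forge mapping. The sketch below is one possible shape for that lookup, not conda-pypi's actual API: the function name, the `(channel, pypi_name)` key layout, and all sample entries are assumptions for illustration.

```python
def conda_name_for(pypi_name, channel, cf_mapping, overrides):
    """Resolve a PyPI name to a conda name, preferring per-channel overrides."""
    key = (channel, pypi_name)
    if key in overrides:
        return overrides[key]
    # Fall back to the conda-forge mapping; None means the project is unmapped.
    return cf_mapping.get(pypi_name)


# Made-up data: one conda-forge mapping entry and one exception for main.
cf_mapping = {"example-pkg": "example-pkg"}
overrides = {("main", "example-pkg"): "example_pkg"}

print(conda_name_for("example-pkg", "main", cf_mapping, overrides))
print(conda_name_for("example-pkg", "conda-forge", cf_mapping, overrides))
print(conda_name_for("unknown-pkg", "main", cf_mapping, overrides))
```

Keying the overrides by channel keeps the hard-coded list compatible with a later move to per-channel mappings.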
Conclusion#
There are very few instances of name differences between main and conda-forge. Though it would be optimal to have an index per channel, in the short term conda-pypi could just use the conda-forge mappings. For extra coverage, we could hard-code the small list of exceptions found by this report.