Making conda fast again

Explaining how we’ve created the mamba prototype, a solver for conda environments that is hopefully fast enough to support a conda-forge with hundreds of thousands of packages.

Mar 28, 2019

You might have seen the announcement on Twitter: at QuantStack we’ve been working on a prototype of a conda-compatible package manager called mamba. Conda is a great tool for distributing data science packages. The community-led conda-forge comprises tons of awesome packages. The Anaconda company supplies us with recent and well-integrated compilers. And conda-build is simply amazing for building binaries across different platforms (Windows, Linux and OS X). At QuantStack we use conda all the time to package Python, C++, Julia and R packages, and ship them to clients around the world.

However, due to the growth of conda-forge (it has over 60,000 packages for Linux right now), many users have become all too familiar with the “conda: Solving environment” spinner. It’s been frustratingly slow for a while now.

At QuantStack, one of our main areas of expertise is building high-performance applications for customers, mainly using C++.

To make conda faster, we propose to:

Build a Python extension in C++ with pybind11, compiled with all optimizations enabled
Use the existing libsolv library, which powers package managers such as Fedora’s DNF and openSUSE’s zypper and (like conda) performs SAT solving to satisfy all package dependencies correctly
Parse repodata.json (already 35 MB of JSON for conda-forge) with simdjson, a library built for high-speed JSON parsing
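To give a feel for what the parsing step has to produce, here is a minimal sketch in plain Python (not mamba’s C++ code, and not simdjson). It assumes the usual repodata.json layout with a top-level "packages" mapping from file names to records; the package names, versions and the index_by_name helper are made up for illustration.

```python
import json
from collections import defaultdict

# A tiny, made-up fragment in the shape of a repodata.json file.
# Package names, versions and builds are illustrative only.
REPODATA = """
{
  "info": {"subdir": "linux-64"},
  "packages": {
    "foo-1.0-0.tar.bz2": {"name": "foo", "version": "1.0", "build": "0",
                          "build_number": 0, "depends": ["bar >=2.0"]},
    "foo-1.1-0.tar.bz2": {"name": "foo", "version": "1.1", "build": "0",
                          "build_number": 0, "depends": ["bar >=2.0"]},
    "bar-2.0-0.tar.bz2": {"name": "bar", "version": "2.0", "build": "0",
                          "build_number": 0, "depends": []}
  }
}
"""


def index_by_name(repodata_text):
    """Group package records by name (hypothetical helper, not a conda API)."""
    repodata = json.loads(repodata_text)
    index = defaultdict(list)
    for filename, record in repodata["packages"].items():
        # Keep only what a solver needs: version, file name and dependencies.
        index[record["name"]].append(
            (record["version"], filename, record.get("depends", []))
        )
    return dict(index)


index = index_by_name(REPODATA)
for name, candidates in sorted(index.items()):
    print(name, [version for version, _, _ in candidates])
```

With tens of thousands of records, this loop over every package is exactly the part that benefits from a fast parser and compiled code.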

With the prototype, we manage to solve environments in seconds, as demonstrated in the following video:

This prototype is already available on conda-forge. Existing conda users can install it easily by executing

conda install mamba -c conda-forge/label/mamba-alpha -c conda-forge

The source code for all of mamba can be found on GitHub: https://github.com/QuantStack/mamba

The code reuses as much of conda as possible. We re-implemented only the repository parsing and the solving. Thanks to the existing libsolv abstractions, mamba consists of roughly 300 lines of Python, mostly adapted from conda, and 600 lines of C++ for parsing the JSON and adding all rules to libsolv. We try to keep this library as small as possible, which also makes it easier to debug and reason about.
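To illustrate what “adding rules” to a SAT-based solver means, here is a toy sketch in Python. It is not libsolv’s API or algorithm: every concrete package build becomes a boolean variable, each requirement becomes a clause, and a brute-force search stands in for a real SAT solver. The package names and the solve function are hypothetical.

```python
from itertools import product

# Each concrete package build is one boolean variable: installed or not.
PACKAGES = ["app-1.0", "lib-1.0", "lib-2.0"]

# Rules in conjunctive normal form. A literal is (package, wanted_value);
# a clause holds if at least one of its literals matches the assignment.
CLAUSES = [
    # The user asked for app.
    [("app-1.0", True)],
    # app-1.0 depends on "lib >=2.0": either app is absent or lib-2.0 is present.
    [("app-1.0", False), ("lib-2.0", True)],
    # Only one version of lib may be installed at a time.
    [("lib-1.0", False), ("lib-2.0", False)],
]


def solve(packages, clauses):
    """Try every assignment and return the first set of packages that fits.

    Exponential in the number of packages; real solvers are far smarter.
    """
    for values in product([False, True], repeat=len(packages)):
        assignment = dict(zip(packages, values))
        if all(
            any(assignment[pkg] == wanted for pkg, wanted in clause)
            for clause in clauses
        ):
            return sorted(pkg for pkg, installed in assignment.items() if installed)
    return None  # No assignment satisfies every rule.


solution = solve(PACKAGES, CLAUSES)
print(solution)
```

The brute-force loop only works for a handful of packages, which is the whole reason to hand the rules to a dedicated solver instead.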

We’re currently ironing out some low-hanging bugs, and we are actively looking into ways to further fund this work. We already have some promising leads, but if you know of an organization or company willing to sponsor some days of development for these tools, that would be great. We’re doing this with the goal of upstreaming the work into the original conda package manager at some point in the future.

In the meantime, the Anaconda team has released a very interesting blog post with tips on how to make conda faster: https://www.anaconda.com/understanding-and-improving-condas-performance/

                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /█████████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/

Ongoing work

We still have some work to do:

libsolv had never been run on Windows until we at QuantStack made a Windows port a week ago. We’re currently upstreaming the changes. If you happen to know the Windows equivalent of fcntl(store->pagefd, F_SETFD, FD_CLOEXEC), we’d be glad to hear from you on this PR: https://github.com/openSUSE/libsolv/pull/306
Thankfully, the libsolv maintainers (especially Michael Schroeder) have already implemented conda version matching that exactly follows the Python specifications (https://github.com/openSUSE/libsolv/commit/67d113f336327f3e1adc384bee2990951b2b13c1)! However, we have not yet had the time to make use of it. We definitely need to integrate this work to get the exact version ordering that conda users expect.
We need to verify that the conda test suite is passing so that we get a chance at upstreaming this work eventually. This includes evaluating the optimization strategies used by libsolv vs conda.
We want to cache parsed repository data in .solv files, libsolv’s binary format. This caching format makes loading a repository a matter of milliseconds.
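The caching idea from the last item can be sketched in a few lines of Python. This is only an illustration of the pattern (parse once, reuse a binary dump while it is newer than the source file); it uses pickle as a stand-in and has nothing to do with the actual .solv format. The load_repodata helper and the file names are assumptions made for the example.

```python
import json
import os
import pickle
import tempfile


def load_repodata(json_path, cache_path):
    """Return (data, cache_hit), reusing a binary cache while it is fresh."""
    source_mtime = os.path.getmtime(json_path)
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= source_mtime:
        # The cache is at least as new as the JSON file: skip parsing entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f), True
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Store the parsed result so the next run can skip the JSON parser.
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data, False


# Small demonstration with a throwaway, made-up repodata file.
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "repodata.json")
    cache_path = os.path.join(tmp, "repodata.cache")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"packages": {}}, f)
    first, first_hit = load_repodata(json_path, cache_path)
    second, second_hit = load_repodata(json_path, cache_path)

print(first_hit, second_hit)
```

Only load pickle files you wrote yourself; a purpose-built format like .solv avoids that concern and is designed to be loaded quickly.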

About QuantStack

QuantStack is located in the center of Europe (Paris). We build Open Source Software for a living — from creating fresh conda packages to robot applications, from high performance computing to interactive C++ and Jupyter widgets. If you’re interested in our services, do not hesitate to drop us a line. http://quantstack.net/

Written by Wolf Vollprecht
