The mamba project and the CZI grant

As announced by the Chan Zuckerberg Initiative, the mamba project has been selected for funding as part of Essential Open Source Software for Science, grant cycle 4 (EOSS-4). We want to let everybody know what awesome things this funding enables. Here are our 6 deliverables!

Oct 19, 2021

For those who don’t know yet: mamba is a fast, cross-platform and language-agnostic package manager widely used in the scientific space. Mamba works with conda packages and works great in tandem with the conda-forge channel, a huge collection of community-maintained packages for Windows, macOS and Linux.

Network of mirrors support in mamba

We want mamba to be able to understand the concept of a mirror. In the (Linux) software packaging world, mirrors are very common: many universities and internet providers host mirrors of repositories to enable fast access from anywhere in the world. We would like mamba to understand that there are, for example, 3 hosts to choose from when fetching packages from the conda-forge channel.

We are basing this work on an existing library called librepo. Unfortunately, we couldn’t use librepo directly because it’s written in C and depends on the GLib library. We are therefore reimplementing its core ideas in C++, in particular mirror handling, automatic mirror sorting and fastest-mirror selection.

The initial parts are already being implemented in the powerloader toolkit, with the aim of making the tool so general that it could even be useful for other package managers. Furthermore (and this is not part of librepo), powerloader can download from and upload to S3 buckets and OCI registries. Yep, that’s right: OCI registries, as in Docker registries, can actually host any kind of binary artifact, and they are already used quite successfully by the Homebrew project.
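To make the mirror sorting and fastest-mirror selection described above a bit more concrete, here is a minimal Python sketch. It is not powerloader's or librepo's code: the `Mirror` class, the `rank_mirrors` and `fetch_with_fallback` functions, and the idea of passing in a `download` callable are all assumptions made for illustration. Latencies are assumed to have been measured beforehand, with `None` meaning that the probe failed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Mirror:
    """Hypothetical record for one host serving the same channel."""
    base_url: str
    latency_ms: Optional[float] = None  # None means the probe failed
    failures: int = 0                   # download errors seen so far


def rank_mirrors(mirrors: Iterable[Mirror]) -> List[Mirror]:
    """Drop unreachable mirrors, then sort by fewest failures and lowest latency."""
    reachable = [m for m in mirrors if m.latency_ms is not None]
    return sorted(reachable, key=lambda m: (m.failures, m.latency_ms))


def fetch_with_fallback(
    path: str,
    mirrors: Iterable[Mirror],
    download: Callable[[str], bytes],
) -> bytes:
    """Try each mirror in ranked order and return the first successful download."""
    errors = []
    for mirror in rank_mirrors(mirrors):
        url = f"{mirror.base_url.rstrip('/')}/{path.lstrip('/')}"
        try:
            return download(url)
        except OSError as exc:
            # Remember the failure so this mirror ranks lower next time.
            mirror.failures += 1
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))
```

The `download` argument can be any function that takes a URL and returns bytes, which keeps the selection logic independent of the transport (HTTP, S3, OCI registry and so on).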

A faster package management experience

To make the package management experience faster we want to investigate a couple of things.

  • The first idea is to use the zchunk library to download only the new parts of the repodata.json file. The repodata doesn’t change that much over time (only new metadata is added), but for the conda-forge channel it currently weighs ~13 MB gzipped. That’s a lot of data to download before we get to an “interactive” state. Using zchunk has the potential to dramatically shrink that. Again, we’re inspired by librepo here, and the initial implementation of this is in the powerloader toolkit.
  • Another nice speed improvement is to extract packages in parallel. Currently (due to limitations in the way libarchive works) decompression is done in a single thread. That can be a limiting factor, especially on a very fast network connection. A contributor has already explored multi-process decompression in a pull request, with very encouraging results (30–80% speedup). We’re looking forward to implementing this and rolling it out soon.
  • If time permits, we would also like to investigate a library called taskflow to see if we can also parallelise the file-linking step (when files are linked from the cache into the final environment destination).
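The chunked-download idea from the first bullet can be sketched in a few lines of Python. This is only an illustration of the principle, not the zchunk format or its API: it assumes fixed-size chunks identified by their SHA-256 digest, and the names `split_chunks`, `chunk_index`, `plan_download` and `rebuild` are made up for this example.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for this sketch


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_index(data: bytes, size: int = CHUNK_SIZE) -> List[str]:
    """Return the ordered list of chunk digests (the small 'header' a client fetches first)."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_chunks(data, size)]


def plan_download(remote_index: List[str], local_chunks: Dict[str, bytes]) -> List[int]:
    """Return the positions of the remote chunks that are not in the local cache."""
    return [i for i, digest in enumerate(remote_index) if digest not in local_chunks]


def rebuild(
    remote_index: List[str],
    local_chunks: Dict[str, bytes],
    fetched: Dict[int, bytes],
) -> bytes:
    """Reassemble the new file from cached chunks plus the freshly fetched ones."""
    parts = []
    for i, digest in enumerate(remote_index):
        chunk = local_chunks[digest] if digest in local_chunks else fetched[i]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {i} does not match its digest")
        parts.append(chunk)
    return b"".join(parts)
```

Because repodata mostly grows by having new metadata added, a client that already holds the old chunks would only need to fetch the chunk index and the few chunks that changed.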

Improved error message handling

Solver error messages can be a pain — oftentimes the “real reason” is hidden between the lines. While mamba/libsolv already somewhat improve the situation over the conda default solver error messages, they are still far from perfect! An error message these days can look something like

package python-3.6.13-hffdb5ce_0_cpython requires openssl >=1.1.1j,<1.1.2a, but none of the providers can be installed

which does not really explain why openssl of this version cannot be installed.

One solver that was designed with great care for error messages is PubGrub, originally developed for the Dart language’s package manager. There is also a very nice blog post explaining how it works, as well as great documentation.

We want to deeply analyse how PubGrub does its magic. This involves two parts. First, we would like to try the Rust implementation of PubGrub and see if we can make it work on conda metadata (and if we get good error messages!). Second, we would like to see if we can integrate similar methods for producing error messages into libsolv, or if we should find other ways (for example, additionally running the Rust PubGrub implementation if solving fails).
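To illustrate what a more helpful message could contain, here is a small Python sketch that lists, for each candidate version of a package, which requirement rules it out. It is neither PubGrub nor libsolv, only a toy: versions are simplified to integer tuples, ranges are half-open (`>=lo,<hi`), and the package names and the `explain_rejections` function are hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
Range = Tuple[Version, Version]  # (lower bound inclusive, upper bound exclusive)


def fmt_version(version: Version) -> str:
    """Render (1, 2, 0) as '1.2.0'."""
    return ".".join(str(part) for part in version)


def explain_rejections(
    package: str,
    candidates: List[Version],
    constraints: Dict[str, Range],
) -> List[str]:
    """For every candidate version, name the requesters whose range excludes it."""
    report = []
    for candidate in sorted(candidates):
        blockers = [
            f"{requester} needs {package} >={fmt_version(lo)},<{fmt_version(hi)}"
            for requester, (lo, hi) in constraints.items()
            if not (lo <= candidate < hi)
        ]
        if blockers:
            report.append(
                f"{package} {fmt_version(candidate)} rejected: " + "; ".join(blockers)
            )
        else:
            report.append(f"{package} {fmt_version(candidate)} is acceptable")
    return report
```

A real solver has to deal with transitive dependencies and with conda's much richer version syntax, which is exactly why we want to study how PubGrub derives its explanations.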

Package Signing for conda-forge

In 2021, conda and mamba gained the ability to cryptographically verify signatures attached to package metadata. We would like conda-forge to adopt the content-trust framework and to cryptographically sign each build artifact. This will also help with the mirroring of channels, as the digital signatures can be used to verify the authenticity of packages served by a mirror.

For this work, we would like to extend the tooling available in conda-forge to add a secret key to each Azure pipeline that can be used to sign a release, and then coordinate the rollout.
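The sign-then-verify flow can be sketched as follows. This is not conda's content-trust implementation: to keep the example runnable with only the Python standard library, it uses an HMAC (a keyed hash with a shared secret) as a stand-in for a real public-key signature. With real asymmetric signatures, verifiers would hold only a public key and could not sign anything themselves. The metadata layout and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import json


def canonical(metadata: dict) -> bytes:
    """Serialize metadata deterministically so signer and verifier hash the same bytes."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_metadata(key: bytes, metadata: dict) -> str:
    """CI side: produce a signature over the package metadata (HMAC stand-in)."""
    return hmac.new(key, canonical(metadata), hashlib.sha256).hexdigest()


def verify_artifact(key: bytes, metadata: dict, signature: str, artifact: bytes) -> bool:
    """Client side: check the metadata signature, then the artifact digest it records."""
    expected = sign_metadata(key, metadata)
    if not hmac.compare_digest(expected, signature):
        return False  # metadata was altered or signed with another key
    return hashlib.sha256(artifact).hexdigest() == metadata.get("sha256")
```

Since the signed metadata records the artifact's digest, a client can download the artifact from any mirror and still detect tampering.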

Increase the sandbox-ability of virtual environments

We have some crazy ideas on how to sandbox virtual environments. It is by far the most “experimental” milestone on this list, as we are not sure how we can make this work in practice. The different operating systems offer different ways of sandboxing environments: for example, on Linux a simple way to sandbox would be to use chroot.

We would also like to selectively restrict network access, access to the home folder and so on, to make sure that a randomly installed package cannot get access to sensitive data.

We think this can be implemented e.g. via ad-hoc containerization (Docker or systemd-nspawn and similar solutions). Hopefully we can also find some good ideas in other packaging systems such as Flatpak (on Linux) and figure out the native solutions on Windows and macOS.

Organizing PackagingCon 2021 and 2022

Last but not least — as part of the CZI grant we promised to organise PackagingCon! It’s a completely new event in 2021, but we’ve already been able to secure ~60 talks from many different packaging ecosystems.

PackagingCon 2021 is happening on November 9th and 10th. All the details can be found on https://packaging-con.org and the current schedule on https://pretalx.com/packagingcon-2021/schedule/.

The idea of PackagingCon is to bring together these different ecosystems and figure out what common problems we all have. Ideally we can then work together on problems to make package management more fun and easy for all of us — and not reinvent the wheel at every corner!

Thanks for reading

We hope that these goals are well received by the community. You can reach out to us in one of the chat rooms: the conda-forge Gitter or the new mamba-org Gitter. Please do let us know your feedback!

Also, we are always looking forward to helping new contributors with their first PRs! Feel free to interact with us on GitHub.

Written by Wolf Vollprecht

I work as a scientific and robotics software developer for QuantStack in Paris and Berlin. We do Open Source for a living!
