Informational: Conda: Creating Portable and Reproducible Projects with Conda

Overview

Reproducibility in a Python project means that your code runs the same way on a colleague’s machine, a remote server, or your own computer six months from now. Using conda as a package and environment manager is one of the most effective ways to reach this goal.

Detailed Information

Dedicated Project Directories

Every new project should live in its own unique directory. This folder acts as the "source of truth" for all scripts, data, and configuration files associated with that specific task.
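
As a minimal sketch, a new project folder could be set up as shown below. The project name and the data and scripts subfolders are assumptions chosen for illustration, not a required layout.

```shell
# Hypothetical project name; replace it with your own.
project_dir="data-analysis-project"

# Create the project folder with illustrative subfolders for data and scripts.
mkdir -p "$project_dir/data" "$project_dir/scripts"

# Keep the environment definition at the top level of the project.
touch "$project_dir/environment.yml"

# Show what was created.
ls -A "$project_dir"
```

Keeping environment.yml at the top level means that conda commands run from the project directory can find it without a longer path.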

Defining the environment.yml File

Instead of installing packages manually via the command line, you should define your environment in a declarative environment.yml file. This file acts as a blueprint for the conda environment.

Essential Components:

  • Name: A unique identifier for the environment.

  • Channels: The locations where conda looks for packages (e.g., defaults, conda-forge).

  • Dependencies: The specific packages and version numbers required.

Example Structure:

YAML

name: data-analysis-project
channels:
    - conda-forge
    - defaults
dependencies:
    - python=3.10
    - pandas=2.1.0
    - scikit-learn
    - pip:
        - specific-utility-pkg==1.0.2

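  • Benefit: Enables one-command setup. Anyone with this file can run conda env create -f environment.yml to recreate your workspace, reducing "it works on my machine" errors. Packages listed without a version (scikit-learn in the example) resolve to whatever release is current when the environment is created, so pin them when exact reproduction matters.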

  • How-To: Create a Virtual Conda Environment From a YML File via Command Line Interface:
    https://tdx.umsystem.edu/TDClient/36/DoIT/KB/ArticleDet?ID=2215
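
The one-command setup can be sketched as a small shell helper. This assumes conda is installed and on your PATH, that environment.yml is in the current directory, and that the environment name matches the name field of the example file. The function name create_project_env is hypothetical.

```shell
# Build the environment from environment.yml in the current directory,
# then confirm that it exists and that its Python interpreter starts.
# Assumes conda is on PATH; the function name is illustrative only.
create_project_env() {
    conda env create -f environment.yml &&
    conda env list &&
    conda run -n data-analysis-project python --version
}
```

Call create_project_env from the project directory. In an interactive shell, conda activate data-analysis-project then switches into the new environment.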

Iterative Updates and Maintenance

As a project progresses, you will inevitably need new libraries. Rather than just running conda install, you should update your environment.yml file and then update the environment.

Action       Command                              Benefit
Add Package  Edit .yml dependencies               Keeps the blueprint synchronized with the actual code.
Update Env   conda env update -f environment.yml  Ensures your local environment matches the documented requirements.
Prune Env    conda env update --prune             Removes packages no longer listed in the .yml, keeping the environment lean.
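
After editing the dependencies list, the update and prune steps can be combined in one call. The sketch below assumes conda is on your PATH and that the command runs from the project directory. The function name sync_project_env is hypothetical.

```shell
# Re-sync an existing environment after environment.yml has been edited.
# Assumes conda is on PATH and environment.yml is in the current directory.
sync_project_env() {
    # --prune asks conda to remove packages that are no longer listed in the file.
    conda env update -f environment.yml --prune &&
    # List the installed packages to check the result.
    conda list -n data-analysis-project
}
```

Running sync_project_env after each edit keeps the installed packages and the file from drifting apart.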

Finalizing for Portability

Upon project completion, it is best practice to "freeze" your dependencies. While you might start with broad versions (e.g., python=3.10), the final version of your file should ideally include specific version strings.

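  • Exporting: With the environment active, you can generate a fully pinned file using conda env export > environment.yml. This overwrites the existing file, and the export records platform-specific build strings, so the result may not resolve on a different operating system.

  • Benefit: Supports long-term stability. Even if a library releases a breaking update three years from now, your project will continue to use the version it was built for, preserving your results.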

  • How-To: Backup a Virtual Conda Environment Configuration via Command Line Interface
    https://tdx.umsystem.edu/TDClient/36/DoIT/KB/ArticleDet?ID=2211
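
One possible way to freeze the environment without overwriting the hand-edited blueprint is sketched below. It assumes conda is on your PATH. The output filename environment.lock.yml and the function name freeze_project_env are hypothetical choices, not conda conventions.

```shell
# Export pinned versions of everything installed in the environment.
# Assumes conda is on PATH; the output filename is an illustrative choice
# that leaves the hand-edited environment.yml untouched.
freeze_project_env() {
    # --no-builds keeps version numbers but drops platform-specific build strings.
    conda env export -n data-analysis-project --no-builds > environment.lock.yml
}
```

Dropping build strings usually makes the file easier to reuse on another operating system, at the cost of slightly less exact pinning. If you only want the packages you explicitly requested, conda env export --from-history is an alternative.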

For Additional Assistance

Conda User Guide Creating Projects: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/creating-projects.html
