Overview
Reproducibility means your Python code runs exactly the same way on your colleague’s machine, a remote server, or your own computer six months from now. Using conda as a package and environment manager is one of the most effective ways to achieve it.
Detailed Information
Dedicated Project Directories
Every new project should live in its own directory. This folder acts as the "source of truth" for all scripts, data, and configuration files associated with that project, including the environment.yml file described below.
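If you start projects often, a small script can create the directory and a starter environment.yml in one step. The sketch below assumes a hypothetical layout with data/ and scripts/ subfolders and a minimal template; scaffold_project, SUBDIRS, and ENV_TEMPLATE are illustrative names, not part of conda.

```python
from pathlib import Path

# Hypothetical layout: adjust the subfolders to suit your project.
SUBDIRS = ("data", "scripts")

# Minimal starter file; edit the dependencies before first use.
ENV_TEMPLATE = """\
name: {name}
channels:
  - conda-forge
dependencies:
  - python=3.10
"""


def scaffold_project(root: Path, name: str) -> Path:
    """Create root/name with SUBDIRS and a starter environment.yml.

    Existing folders and an existing environment.yml are left untouched,
    so the function is safe to run more than once.
    """
    project = root / name
    for sub in SUBDIRS:
        # parents=True also creates the project folder itself.
        (project / sub).mkdir(parents=True, exist_ok=True)
    env_file = project / "environment.yml"
    if not env_file.exists():
        env_file.write_text(ENV_TEMPLATE.format(name=name), encoding="utf-8")
    return project
```

For example, scaffold_project(Path.cwd(), "data-analysis-project") creates the project folder inside the current working directory. Because an existing environment.yml is never overwritten, re-running the script cannot discard edits you have already made.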
Defining the environment.yml File
Instead of installing packages manually via the command line, you should define your environment in a declarative environment.yml file. This file acts as a blueprint for the conda environment.
Essential Components:
- Name: A unique identifier for the environment.
- Channels: The locations where conda looks for packages, listed in order of priority (e.g., conda-forge, defaults).
- Dependencies: The packages required, with version numbers where needed. Packages available only from PyPI go in a nested pip: list.
Example Structure:
YAML
name: data-analysis-project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pandas=2.1.0
  - scikit-learn
  - pip
  - pip:
      - specific-utility-pkg==1.0.2
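The three essential components can be checked mechanically before you share the file. This sketch assumes environment.yml has already been parsed into a Python dict (for example with the third-party PyYAML package's yaml.safe_load, which is not covered in this article); check_environment and EXAMPLE_ENV are hypothetical names, and the checks cover only the structure described above.

```python
import re
from typing import Any

REQUIRED_KEYS = ("name", "channels", "dependencies")

# The example environment.yml as a YAML parser would return it,
# with pip listed among the conda dependencies.
EXAMPLE_ENV: dict[str, Any] = {
    "name": "data-analysis-project",
    "channels": ["conda-forge", "defaults"],
    "dependencies": [
        "python=3.10",
        "pandas=2.1.0",
        "scikit-learn",
        "pip",
        {"pip": ["specific-utility-pkg==1.0.2"]},
    ],
}


def _package_name(spec: str) -> str:
    """'pandas=2.1.0' -> 'pandas'; 'scikit-learn' -> 'scikit-learn'."""
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0]


def check_environment(env: dict[str, Any]) -> list[str]:
    """Return structural problems; an empty list means none were found."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in env]
    deps = env.get("dependencies") or []
    conda_names = set()
    has_pip_section = False
    for dep in deps:
        if isinstance(dep, dict):
            # Nested mappings should only be the {"pip": [...]} section.
            for key, value in dep.items():
                if key != "pip":
                    problems.append(f"unexpected nested key '{key}'")
                    continue
                has_pip_section = True
                if not isinstance(value, list) or not value:
                    problems.append("'pip' section must be a non-empty list")
        else:
            conda_names.add(_package_name(str(dep)))
    if has_pip_section and "pip" not in conda_names:
        problems.append("list 'pip' itself as a conda dependency")
    return problems
```

check_environment(EXAMPLE_ENV) returns an empty list. If the pip sub-list is not indented under - pip:, a YAML parser reads pip as having no value, and the function reports that the pip section is empty.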
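- Benefit: Enables one-command setup. Anyone with this file can run conda env create -f environment.yml to recreate your environment, avoiding most "it works on my machine" errors. Packages listed without a version (such as scikit-learn above) install whatever version is current at that time, which is why the file is pinned at the end of the project.
- How-To: Create a Virtual Conda Environment From a YML File via Command Line Interface: https://tdx.umsystem.edu/TDClient/36/DoIT/KB/ArticleDet?ID=2215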
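If you automate setup (for example in a project bootstrap script), the same command can be run from Python. A minimal sketch, assuming conda is on your PATH; create_env_command and create_env are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def create_env_command(env_file: Path, conda: str = "conda") -> list[str]:
    """Build the argument list for `conda env create -f <env_file>`."""
    return [conda, "env", "create", "-f", str(env_file)]


def create_env(env_file: Path) -> None:
    """Create the environment described by env_file.

    Raises RuntimeError if conda is not on PATH and
    subprocess.CalledProcessError if conda reports a failure.
    """
    conda = shutil.which("conda")
    if conda is None:
        raise RuntimeError("conda was not found on PATH")
    # Using the resolved path also finds conda.bat on Windows.
    subprocess.run(create_env_command(env_file, conda), check=True)
```

Calling create_env(Path("environment.yml")) from the project directory is equivalent to typing the command in a terminal. Passing the command as a list, rather than a single string with shell=True, avoids quoting problems with paths that contain spaces.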
Iterative Updates and Maintenance
As a project progresses, you will inevitably need new libraries. Rather than running conda install directly, add the package to environment.yml first and then update the environment from the file, so the file always reflects what is installed.
| Action | Command | Benefit |
|---|---|---|
| Add Package | Edit the dependencies list in environment.yml | Keeps the blueprint synchronized with the actual code. |
| Update Env | conda env update -f environment.yml | Ensures your local environment matches the documented requirements. |
| Prune Env | conda env update -f environment.yml --prune | Removes packages the .yml no longer requires, keeping the environment lean. |
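The edit-then-update cycle in the table can also be scripted. This sketch works on the parsed dict form of environment.yml (writing it back to disk would need a YAML library such as PyYAML, which is not shown) and only builds the conda env update command rather than running it; add_dependency and update_env_command are hypothetical names.

```python
import re
from typing import Any


def _package_name(spec: str) -> str:
    """Extract the bare package name from a spec like 'pandas=2.1.0'."""
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0]


def add_dependency(env: dict[str, Any], spec: str) -> bool:
    """Add a conda spec unless that package is already listed.

    Returns True if the dependency list changed. Entries inside the
    nested {"pip": [...]} section are not checked or modified.
    """
    deps = env.setdefault("dependencies", [])
    new_name = _package_name(spec)
    for dep in deps:
        if isinstance(dep, str) and _package_name(dep) == new_name:
            return False
    # Insert before the first nested mapping so conda packages stay together.
    pip_index = next(
        (i for i, dep in enumerate(deps) if isinstance(dep, dict)), len(deps)
    )
    deps.insert(pip_index, spec)
    return True


def update_env_command(env_file: str = "environment.yml", prune: bool = False) -> list[str]:
    """Build `conda env update -f <env_file> [--prune]` as an argument list."""
    cmd = ["conda", "env", "update", "-f", env_file]
    if prune:
        cmd.append("--prune")
    return cmd
```

Keeping the two steps separate mirrors the table: the file changes first, and only then is the environment brought in line with it.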
Finalizing for Portability
Upon project completion, it is best practice to "freeze" your dependencies. While you might start with broad versions (e.g., python=3.10), the final version of your file should ideally include specific version strings.
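- Exporting: With the project environment activated, you can generate a fully pinned file using conda env export > environment.yml. The export lists every installed package with its exact version and build string; build strings are often platform-specific, so adding --no-builds makes the file more portable across operating systems. The export also records a prefix: line with the environment's path on your machine, which is commonly removed before sharing.
- Benefit: Long-term stability. Even if a library releases a breaking update three years from now, recreating the environment from this file installs the versions your project was built with (provided they remain available on the listed channels), preserving your results.
- How-To: Backup a Virtual Conda Environment Configuration via Command Line Interface: https://tdx.umsystem.edu/TDClient/36/DoIT/KB/ArticleDet?ID=2211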
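Two small checks help when finalizing: confirming that no dependency is left without a version, and dropping the machine-specific prefix: line from the export. In this sketch export_environment assumes conda is on your PATH and exports the currently active environment; the other helpers are pure functions, and all names are hypothetical. missing_versions only flags entries with no version constraint at all; it does not judge whether an existing constraint such as python=3.10 is specific enough.

```python
import re
import subprocess
from pathlib import Path

# Any of these characters signals a version constraint in a conda or pip spec.
_VERSION_CHARS = re.compile(r"[=<>!~ ]")


def missing_versions(dependencies: list) -> list[str]:
    """Return conda and pip specs that carry no version constraint."""
    missing = []
    for dep in dependencies:
        # A nested {"pip": [...]} mapping holds pip specs; strings are conda specs.
        specs = (dep.get("pip") or []) if isinstance(dep, dict) else [dep]
        for spec in specs:
            if not _VERSION_CHARS.search(spec.strip()):
                missing.append(spec)
    return missing


def strip_prefix(exported: str) -> str:
    """Remove the top-level 'prefix:' line that records this machine's env path."""
    lines = exported.splitlines(keepends=True)
    return "".join(line for line in lines if not line.startswith("prefix:"))


def export_command(no_builds: bool = True) -> list[str]:
    """Build `conda env export [--no-builds]` as an argument list."""
    cmd = ["conda", "env", "export"]
    if no_builds:
        cmd.append("--no-builds")
    return cmd


def export_environment(path: Path, no_builds: bool = True) -> None:
    """Export the active environment to path, without the prefix line."""
    result = subprocess.run(
        export_command(no_builds), check=True, capture_output=True, text=True
    )
    path.write_text(strip_prefix(result.stdout), encoding="utf-8")
```

Running missing_versions on the dependency list before archiving a project gives a quick list of packages to pin by hand if you prefer editing the original file over exporting a new one.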
For Additional Assistance
Conda User Guide (Creating Projects): https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/creating-projects.html