Inside conda environments: A brief walkthrough with a practical example
In the world of software development and data science, managing dependencies is one of the most challenging aspects of any project. Whether you’re building a machine learning model, developing a web application, or conducting scientific research, your project relies on specific libraries and tools. These dependencies often have their own requirements, which can conflict with other projects or even the operating system itself.
Conda environments provide a solution to this problem by creating isolated spaces where each project can have its own dedicated set of dependencies. This isolation ensures that projects don’t interfere with one another, allowing you to work seamlessly across multiple workflows without worrying about version conflicts or breaking system tools.
But why is this isolation so important? What happens if you don’t use Conda (or similar tools) to manage your environments? Let’s start by exploring some real-world scenarios where things can go wrong without proper dependency management.
The Perils of System Python: Why Isolation Is Critical
Most Linux distributions come with a built-in Python installation (e.g., /usr/bin/python3
) that is essential for running system utilities like yum
, dnf
, or apt
. If you modify this system Python installation—say, by installing or upgrading a library globally—you risk breaking critical system tools.
Real-World Example: Breaking SSH with Global Library Updates
Imagine you’re working on a project that requires the latest version of the cryptography
library. You decide to install it globally using pip
:
# Installing a newer cryptography library globally
$ pip install cryptography==42.0 --user
This seems harmless until you try to use ssh-keygen
, which depends on an older version of cryptography
:
$ ssh-keygen -t rsa
ImportError: /lib/x86_64-linux-gnu/libcrypto.so.1.1: version `OPENSSL_1_1_1' not found
By upgrading cryptography
globally, you’ve broken OpenSSL dependencies that are critical for SSH functionality. Now your system tools are unusable, and fixing this mess can be time-consuming and frustrating.
How Conda Saves the Day
With Conda environments, you can isolate your project’s dependencies from the rest of the system:
# Safe installation in an isolated environment
$ conda create -n safe_env python=3.10 cryptography=42.0
$ conda activate safe_env
# SSH remains functional outside this environment
This way, your project gets the specific version of cryptography
it needs without interfering with system tools or other projects.
Conda’s Storage Architecture: Where Files Live
To understand how Conda achieves this isolation, let’s take a closer look at its storage architecture.
Key Directories in Conda
When you install Conda (via Miniconda or Anaconda), it creates several key directories:
- Base Environment: The core Conda installation, typically located at
~/miniconda3/
or~/anaconda3/
. - Environments: Each environment is stored in its own directory under
~/miniconda3/envs/
. For example, an environment nameddata_science
would live at~/miniconda3/envs/data_science/
. - Package Cache: Downloaded package archives are stored in
~/.conda/pkgs/
or~/miniconda3/pkgs/
. This cache allows Conda to reuse packages across environments, saving both time and storage space. - Configuration: Settings for channels and storage paths are stored in the
.condarc
file located in your home directory (~/.condarc
).
Example: Creating a new environment for data analysis:
$ conda create -n data_analysis python=3.9 numpy pandas matplotlib
This command creates a directory at ~/miniconda3/envs/data_analysis/
, which contains:
- A Python 3.9 interpreter specific to this environment.
- Libraries like NumPy, Pandas, and Matplotlib installed in the environment’s
site-packages/
directory. - Environment-specific binaries and metadata files.
When you activate this environment using conda activate data_analysis
, Conda modifies your shell’s environment variables (particularly PATH) to prioritize the binaries and libraries in this directory.
A Practical Example: Building and Using a Custom Package (simple_math
)
Let’s take things one step further by creating our own custom package called simple_math
. This will help us understand how packages work within Conda environments and how they can be installed locally or published for others to use.
Step 1: Package Structure
Here’s what our package directory looks like:
simple_math/
├── setup.py
└── simple_math/
└── operations.py
operations.py:
def add(a, b):
return a + b
def factorial(n):
return 1 if n <= 1 else n * factorial(n-1)
setup.py:
from setuptools import setup
setup(
name="simple_math",
version="0.1",
packages=["simple_math"],
)
Installation Methods: Local vs Editable vs Published
1. Local Installation (Non-Editable)
Use Case: Installing a stable version of your package.
Method:
$ python setup.py sdist bdist_wheel
$ pip install dist/simple_math-0.1-py3-none-any.whl
How It Works:
- Copies the package to the environment’s
site-packages/
directory. - Changes to the source code won’t reflect until you reinstall.
2. Editable Installation (pip install -e
)
Use Case: Actively developing a package and testing changes live.
Method:
$ pip install -e .
How It Works:
- Creates a symlink to your source code directory in
site-packages/
. - Changes to the code immediately take effect without reinstalling.
Ideal For:
- GitHub repositories where you’re contributing to the codebase.
- Projects with frequent code updates.
Example Workflow:
# Clone repository from GitHub
$ git clone https://github.com/you/simple_math
# Create and activate Conda environment
$ conda create -n dev_env python=3.9
$ conda activate dev_env
# Install package in editable mode
$ pip install -e .
# Test changes instantly without reinstalling
>>> import simple_math
>>> simple_math.add(5, 3)
8
3. Private Publishing (Organization Use)
Use Case: Sharing packages internally without public exposure.
Options:
- Conda Private Channel: Host a channel on an internal server.
- Artifactory/Nexus: Use enterprise package managers.
Steps:
# Build a Conda package
$ conda build .
# Upload to private channel
$ anaconda upload --private /path/to/package.tar.bz2
4. Public Publishing (Conda-Forge/PyPI)
Use Case: Open-source distribution.
Conda-Forge Steps:
- Submit a recipe to staged-recipes.
- Maintainers review and merge it.
- Bots automate future updates.
PyPI Steps:
# Build and upload package
$ python setup.py sdist bdist_wheel
$ twine upload dist/*
Key Differences Between Installation Types
Feature | pip install . |
pip install -e . |
---|---|---|
Installation Type | Copies files to site-packages | Symlinks to source directory |
Code Updates | Requires reinstallation | Instant |
Disk Usage | Higher (duplicate files) | Lower (shared files) |
Use Case | Stable releases | Active development |
Optimizing Storage: What to Clean
Conda environments and cached packages can consume significant disk space over time.
Common Culprits:
- Unlinked Packages: Orphaned files in
~/.conda/pkgs/
. - Old Environments: Unused directories under
~/miniconda3/envs/
. - Temporary Files: Leftover build artifacts.
Pro Tips:
# Find largest environments
$ du -sh ~/miniconda3/envs/* | sort -hr
# Clean aggressively (saves ~5–15 GB)
$ conda clean --all --yes
Conclusion
Conda environments are indispensable for modern development workflows, offering isolation, reproducibility, and efficiency:
- Editable Installs (
pip install -e
) are perfect for active development workflows tied to GitHub repositories. - Wheel/PyPI Publishing is ideal for distributing stable Python packages publicly.
- Conda Private Channels allow secure sharing within organizations.
- Conda-Forge Publishing is best suited for complex dependencies requiring cross-platform compatibility.
By understanding Conda’s isolation mechanics and publishing options, you can optimize both your workflow and storage usage while ensuring reproducibility across projects.
Whether you’re debugging dependency conflicts or distributing tools globally, Conda provides the infrastructure needed to keep your projects lean and scalable!