Inside conda environments: A brief walkthrough with a practical example

In the world of software development and data science, managing dependencies is one of the most challenging aspects of any project. Whether you’re building a machine learning model, developing a web application, or conducting scientific research, your project relies on specific libraries and tools. These dependencies often have their own requirements, which can conflict with other projects or even the operating system itself.

Conda environments provide a solution to this problem by creating isolated spaces where each project can have its own dedicated set of dependencies. This isolation ensures that projects don’t interfere with one another, allowing you to work seamlessly across multiple workflows without worrying about version conflicts or breaking system tools.

But why is this isolation so important? What happens if you don’t use Conda (or similar tools) to manage your environments? Let’s start by exploring some real-world scenarios where things can go wrong without proper dependency management.

The Perils of System Python: Why Isolation Is Critical

Most Linux distributions come with a built-in Python installation (e.g., /usr/bin/python3) that is essential for running system utilities like yum, dnf, or apt. If you modify this system Python installation—say, by installing or upgrading a library globally—you risk breaking critical system tools.

Real-World Example: Breaking SSH with Global Library Updates

Imagine you’re working on a project that requires the latest version of the cryptography library. You decide to install it globally using pip:

# Installing a newer cryptography library globally  
$ pip install cryptography==42.0 --user  

This seems harmless until you try to use ssh-keygen, which depends on an older version of cryptography:

$ ssh-keygen -t rsa  
ImportError: /lib/x86_64-linux-gnu/libcrypto.so.1.1: version `OPENSSL_1_1_1' not found  

By upgrading cryptography globally, you’ve broken OpenSSL dependencies that are critical for SSH functionality. Now your system tools are unusable, and fixing this mess can be time-consuming and frustrating.

How Conda Saves the Day

With Conda environments, you can isolate your project’s dependencies from the rest of the system:

# Safe installation in an isolated environment  
$ conda create -n safe_env python=3.10 cryptography=42.0  
$ conda activate safe_env  
# SSH remains functional outside this environment  

This way, your project gets the specific version of cryptography it needs without interfering with system tools or other projects.

Conda’s Storage Architecture: Where Files Live

To understand how Conda achieves this isolation, let’s take a closer look at its storage architecture.

Key Directories in Conda

When you install Conda (via Miniconda or Anaconda), it creates several key directories:

Base Environment: The core Conda installation, typically located at ~/miniconda3/ or ~/anaconda3/.
Environments: Each environment is stored in its own directory under ~/miniconda3/envs/. For example, an environment named data_science would live at ~/miniconda3/envs/data_science/.
Package Cache: Downloaded package archives are stored in ~/.conda/pkgs/ or ~/miniconda3/pkgs/. This cache allows Conda to reuse packages across environments, saving both time and storage space.
Configuration: Settings for channels and storage paths are stored in the .condarc file located in your home directory (~/.condarc).

Example: Creating a new environment for data analysis:

$ conda create -n data_analysis python=3.9 numpy pandas matplotlib  

This command creates a directory at ~/miniconda3/envs/data_analysis/, which contains:

A Python 3.9 interpreter specific to this environment.
Libraries like NumPy, Pandas, and Matplotlib installed in the environment’s site-packages/ directory.
Environment-specific binaries and metadata files.

When you activate this environment using conda activate data_analysis, Conda modifies your shell’s environment variables (particularly PATH) to prioritize the binaries and libraries in this directory.

A Practical Example: Building and Using a Custom Package (`simple_math`)

Let’s take things one step further by creating our own custom package called simple_math. This will help us understand how packages work within Conda environments and how they can be installed locally or published for others to use.

Step 1: Package Structure

Here’s what our package directory looks like:

simple_math/  
├── setup.py  
└── simple_math/  
    └── operations.py  

operations.py:

def add(a, b):  
    return a + b  

def factorial(n):  
    return 1 if n &lt;= 1 else n * factorial(n-1)  

setup.py:

from setuptools import setup  

setup(  
    name="simple_math",  
    version="0.1",  
    packages=["simple_math"],  
)  

Installation Methods: Local vs Editable vs Published

1. Local Installation (Non-Editable)

Use Case: Installing a stable version of your package.

Method:

$ python setup.py sdist bdist_wheel
$ pip install dist/simple_math-0.1-py3-none-any.whl

How It Works:

Copies the package to the environment’s site-packages/ directory.
Changes to the source code won’t reflect until you reinstall.

2. Editable Installation (`pip install -e`)

Use Case: Actively developing a package and testing changes live.

Method:

$ pip install -e .

How It Works:

Creates a symlink to your source code directory in site-packages/.
Changes to the code immediately take effect without reinstalling.

Ideal For:

GitHub repositories where you’re contributing to the codebase.
Projects with frequent code updates.

Example Workflow:

# Clone repository from GitHub
$ git clone https://github.com/you/simple_math

# Create and activate Conda environment
$ conda create -n dev_env python=3.9
$ conda activate dev_env

# Install package in editable mode
$ pip install -e .

# Test changes instantly without reinstalling
&gt;&gt;&gt; import simple_math
&gt;&gt;&gt; simple_math.add(5, 3)
8

3. Private Publishing (Organization Use)

Use Case: Sharing packages internally without public exposure.

Options:

Conda Private Channel: Host a channel on an internal server.
Artifactory/Nexus: Use enterprise package managers.

Steps:

# Build a Conda package
$ conda build .

# Upload to private channel
$ anaconda upload --private /path/to/package.tar.bz2

4. Public Publishing (Conda-Forge/PyPI)

Use Case: Open-source distribution.

Conda-Forge Steps:

Submit a recipe to staged-recipes.
Maintainers review and merge it.
Bots automate future updates.

PyPI Steps:

# Build and upload package
$ python setup.py sdist bdist_wheel
$ twine upload dist/*

Key Differences Between Installation Types

Feature	`pip install .`	`pip install -e .`
Installation Type	Copies files to site-packages	Symlinks to source directory
Code Updates	Requires reinstallation	Instant
Disk Usage	Higher (duplicate files)	Lower (shared files)
Use Case	Stable releases	Active development

Optimizing Storage: What to Clean

Conda environments and cached packages can consume significant disk space over time.

Common Culprits:

Unlinked Packages: Orphaned files in ~/.conda/pkgs/.
Old Environments: Unused directories under ~/miniconda3/envs/.
Temporary Files: Leftover build artifacts.

Pro Tips:

# Find largest environments
$ du -sh ~/miniconda3/envs/* | sort -hr

# Clean aggressively (saves ~5–15 GB)
$ conda clean --all --yes

Conclusion

Conda environments are indispensable for modern development workflows, offering isolation, reproducibility, and efficiency:

Editable Installs (pip install -e) are perfect for active development workflows tied to GitHub repositories.
Wheel/PyPI Publishing is ideal for distributing stable Python packages publicly.
Conda Private Channels allow secure sharing within organizations.
Conda-Forge Publishing is best suited for complex dependencies requiring cross-platform compatibility.

By understanding Conda’s isolation mechanics and publishing options, you can optimize both your workflow and storage usage while ensuring reproducibility across projects.

Whether you’re debugging dependency conflicts or distributing tools globally, Conda provides the infrastructure needed to keep your projects lean and scalable!

The Perils of System Python: Why Isolation Is Critical

Real-World Example: Breaking SSH with Global Library Updates

How Conda Saves the Day

Conda’s Storage Architecture: Where Files Live

Key Directories in Conda

A Practical Example: Building and Using a Custom Package (simple_math)

Step 1: Package Structure

Installation Methods: Local vs Editable vs Published

1. Local Installation (Non-Editable)

Method:

How It Works:

2. Editable Installation (pip install -e)

Method:

How It Works:

Ideal For:

Example Workflow:

3. Private Publishing (Organization Use)

Options:

Steps:

4. Public Publishing (Conda-Forge/PyPI)

Conda-Forge Steps:

PyPI Steps:

Key Differences Between Installation Types

Optimizing Storage: What to Clean

Common Culprits:

Pro Tips:

Conclusion

A Practical Example: Building and Using a Custom Package (`simple_math`)

2. Editable Installation (`pip install -e`)