Anaconda Introduction

Creation date: 6/27/2023 1:38 PM    Updated: 9/12/2023 10:34 AM    Tags: anaconda, cluster, conda, elzar, hpc, software

Robert Petkus, 6/27/23

  • Introduction to Anaconda
  • Why Anaconda is useful
  • Using Anaconda
  • Installing Anaconda
  • Anaconda Basics
  • Tips and tricks
  • Exercise
  • Anaconda vs. Miniconda

1. Introduction to Anaconda

Anaconda is a popular Python and R distribution that simplifies package management and deployment. It is designed to make it easy to install and manage data science packages, tools, and environments. It comes with a powerful package and environment manager called Conda, which helps manage dependencies and create isolated environments for different projects.

2. Why Anaconda is useful

  • Simplifies package management and installation
  • Supports multiple programming languages (Python and R)
  • Provides a large number of pre-built packages for data science and machine learning
  • Allows for easy creation and management of isolated environments for different projects
  • Cross-platform compatibility (Windows, macOS, and Linux)

3. Using Anaconda

On a personal system (workstation, laptop), you will need to download and install Anaconda.
On Elzar, you can use the version of Anaconda provided by EasyBuild, or download and install a version in your home directory.
EasyBuild:


> module load EBModules
> module load Anaconda3


4. (optional) Installing Anaconda safely in your home directory

👉 You can skip to Section 5 if you are using Anaconda as a module.
To install Anaconda in your home directory, follow these steps:

  • Download the latest version of Anaconda from the official website.
  • Open a terminal and navigate to the directory where the downloaded file is located.
  • Run the following command to start the installation:


> bash Anaconda3-2021.05-Linux-x86_64.sh


Notes:
  • The current version of Anaconda may differ from the example above.
  • At the time of writing, the latest version of Anaconda requires Python 3.10.
  • On Elzar (bamdev1/2), load Python 3.10 or later before proceeding (e.g., module load Python/3.10.8-GCCcore-12.2.0).

Then finish the installation:
  • Follow the on-screen prompts to complete the installation. Make sure to choose a custom installation location within your home directory.
  • Once the installation is complete, close and reopen your terminal. Verify that Anaconda is installed by running:


> conda --version
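
If you prefer an unattended install, the installer also accepts `-b` (batch mode, which takes the license as accepted) and `-p` (install prefix). The sketch below assumes the installer file name from the example above and a hypothetical prefix of `$HOME/anaconda3`; adjust both to your situation. It checks the result through the full path, so it works even before your PATH has been updated.

```shell
installer="Anaconda3-2021.05-Linux-x86_64.sh"   # file name from the example above
prefix="$HOME/anaconda3"                        # hypothetical install location

# Unattended install: -b runs without prompts, -p sets the install prefix.
# Skipped when the installer is missing or the prefix already exists.
if [ -f "$installer" ] && [ ! -e "$prefix" ]; then
    bash "$installer" -b -p "$prefix"
fi

# Verify through the full path instead of relying on PATH.
if [ -x "$prefix/bin/conda" ]; then
    "$prefix/bin/conda" --version
    install_status="installed"
else
    install_status="not installed"
    echo "no conda found under $prefix" >&2
fi
```

Batch mode does not edit your shell startup files, so you would still run `conda init bash` afterwards (see Section 5).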

5. Anaconda Basics

Create a new environment:
> conda create --name myenv

Example: Create a new environment named "mytest" and install Python version 3.7:
> conda create --name mytest python=3.7

Activate an environment:

> conda activate mytest


👉 Before activating an environment for the first time, you will need to initialize conda, then log out and log back in:
> conda init bash


Deactivate an environment:

> conda deactivate

List all environments:
> conda env list

Remove an environment:
> conda env remove --name mytest

Update conda itself (the package manager) to the latest version:
> conda update -n base conda

Install packages (after activating environment):
> conda install package_name

Update packages:
> conda update package_name

Remove packages:
> conda remove package_name

List installed packages in a particular environment:
> conda list -n mytest
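
The commands above assume an interactive shell. Inside a script (for example a batch job), `conda activate` usually fails unless conda's shell functions are loaded first. One common approach is sketched below; the function name `conda_basics_demo` is hypothetical, and the environment name defaults to the "mytest" example.

```shell
# Hypothetical helper that runs the basic commands above from a script.
conda_basics_demo() {
    local env_name="${1:-mytest}"   # environment name; defaults to the example name

    # Load conda's shell functions so `conda activate` works in a
    # non-interactive shell (an alternative to `conda init` for scripts).
    eval "$(conda shell.bash hook)"

    conda create --yes --name "$env_name" python=3.7
    conda activate "$env_name"
    python --version                 # reports the Python of the active environment
    conda deactivate
    conda env remove --yes --name "$env_name"
}
```

Once conda is on your PATH, call it as `conda_basics_demo mytest`. Note that it creates and then deletes the named environment, so do not point it at one you want to keep.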


6. Tips and Tricks

Keep your base environment clean by always working in a dedicated environment for each project.


Channels

  • Conda channels are locations where conda packages are stored and downloaded from. A conda channel can be a URL or a name that maps to a URL.
  • Different channels can host different versions of the same package. When a package is available from several channels, conda chooses by channel priority (the order in which the channels are listed), then by version and build number.

Configure channels to use by default:

> conda config --add channels conda-forge

List currently configured channels:
> conda config --show channels
> conda config --show default_channels channels
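
For orientation, `conda config --add channels conda-forge` puts conda-forge at the top of the channel list in `~/.condarc`, which gives it the highest priority. The sketch below writes an example of what such a file might look like to a scratch file, so your real configuration is not touched. The `channel_priority: strict` line is an optional assumption, not something the commands above set.

```shell
# Write an example configuration to a scratch file rather than the real ~/.condarc.
example_condarc="$(mktemp)"
cat > "$example_condarc" <<'EOF'
channels:
  - conda-forge
  - defaults
channel_priority: strict
EOF

# Show the example; channels listed first take precedence.
cat "$example_condarc"
```

To apply the optional setting to your real configuration, `conda config --set channel_priority strict` should have the same effect as the last line of the example.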

Search

Use the Conda Forge channel to install packages that are not available in the default channel:
> conda install -c conda-forge package_name
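
Before installing, it can help to check which channels actually carry a package. A minimal sketch using `conda search` follows; the package name `numpy` and the `search_status` variable are placeholders, and the guard skips the searches where conda is not installed.

```shell
pkg="numpy"   # placeholder package name

if command -v conda >/dev/null 2>&1; then
    conda search "$pkg"                                         # search the configured channels
    conda search -c conda-forge "$pkg"                          # also search conda-forge
    conda search --override-channels -c conda-forge "$pkg>=1.20"  # conda-forge only, with a version constraint
    search_status="searched"
else
    search_status="skipped"
    echo "conda not found on PATH" >&2
fi
```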

Export and Sharing

Export an environment for sharing:
> conda env export > my_env.yml

Create an environment from an existing yml file:
> conda env create -f my_env.yml
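
A plain `conda env export` pins every package with its build string, which may not resolve on another operating system. Two hedged alternatives are sketched below: a hand-written file that lists only top-level requirements, and two export flags that produce more portable output. The file names and the package list are illustrative.

```shell
# Hand-written environment file listing only top-level requirements.
cat > my_env_minimal.yml <<'EOF'
name: mytest
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - numpy
EOF

# When conda is available, export the active environment in more portable forms.
if command -v conda >/dev/null 2>&1; then
    conda env export --from-history > my_env_history.yml   # only packages you explicitly requested
    conda env export --no-builds > my_env_nobuilds.yml     # exact versions, without build strings
fi
```

Any of these files can be passed to `conda env create -f` in the same way as `my_env.yml`.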


7. Exercise

Objective: gain hands-on experience creating, managing, and exporting Conda environments
  1. Create a new Conda environment named "exercise" and activate it.
  2. Install the following packages in the "exercise" environment: numpy, pandas, matplotlib, and scikit-learn.
  3. Export the environment to a file named "exercise_environment.yml".
  4. Deactivate and remove the "exercise" environment.
  5. Create a new environment named "exercise_recreated" using the "exercise_environment.yml" file and verify that the required packages are installed.
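
One possible walk-through of the exercise is sketched below, using only the commands introduced earlier plus `-n` on `conda env create` to override the name stored in the file. The function name `conda_exercise` is hypothetical, and the `eval` line is only needed when running from a script rather than an interactive shell. Try the exercise yourself first.

```shell
# One possible solution to the exercise, wrapped in a hypothetical function.
conda_exercise() {
    eval "$(conda shell.bash hook)"      # enable `conda activate` inside a script

    # Steps 1 and 2: create the environment, activate it, install the packages.
    conda create --yes --name exercise
    conda activate exercise
    conda install --yes numpy pandas matplotlib scikit-learn

    # Step 3: export the active environment.
    conda env export > exercise_environment.yml

    # Step 4: deactivate and remove the original environment.
    conda deactivate
    conda env remove --yes --name exercise

    # Step 5: recreate under a new name and check for the four packages.
    conda env create -f exercise_environment.yml -n exercise_recreated
    conda list -n exercise_recreated | grep -E '^(numpy|pandas|matplotlib|scikit-learn)[[:space:]]'
}
```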

8. Supplemental: Anaconda vs. Miniconda

Miniconda is a minimal version of Anaconda that includes only Conda, Python, and the packages they depend on. It does not come with the pre-installed data science packages that Anaconda bundles. Users install only the packages they require.


Advantages

  • Smaller in size and thus faster to download and install
  • Allows users to install only the packages they require, reducing potential conflicts and ensuring a clean environment
  • Suitable for experienced users who prefer to have full control over their package installations

Disadvantages

  • Requires manual installation of packages, which is time-consuming when a large number of libraries are needed
  • Less beginner-friendly due to the lack of pre-installed packages and the additional setup required

Anaconda is a better choice for users who want a comprehensive, ready-to-use solution. 
Miniconda is more suitable for users who prefer a minimal, customizable setup.