How to use Catalyst with Lightning-GPU on Qubernetes
Published on March 7, 2025.
Lightning-GPU is a high-performance CUDA-backed state-vector simulator for PennyLane. This device enables fast quantum circuit simulations on NVIDIA GPUs and has been recently integrated with Catalyst.
In this guide, you will find out how to efficiently execute the quantum routines described in the PennyLane blog post How to use Catalyst with Lightning-GPU, which introduced this capability.
You will learn how to start a Qubernetes project and adapt the PennyLane demo to execute the routines on a remote cluster with the necessary computational capabilities for Catalyst and Lightning-GPU. This allows you to perform the demo experiments without needing a local machine with compatible hardware or having to install the required software locally.
Setting Up the Environment
Before we start, you need the following prerequisites:
- Access to a Kubernetes cluster that has nodes with NVIDIA GPU devices with CUDA 12.8 support. The configuration file kubeconfig that allows you to connect to the remote cluster should be placed in the project root directory. You can check the CUDA compatibility of your GPU on the NVIDIA website.
- A Docker Hub account to store the images built for the project.
Next, install the q8s package. It provides the Qubernetes CLI tool q8sctl, which allows you to build and execute your projects on the cluster. You can install the package using the following command:
pip install q8s
Adapting the PennyLane Demo
We start by adding the demo script to the project. The following is an extract of the script that we will run on the remote cluster.
import pennylane as qml
import jax.numpy as jnp
import jax
# Set number of wires
num_wires = 28
# Set a random seed
key = jax.random.PRNGKey(0)
dev = qml.device("lightning.gpu", wires=num_wires)
@qml.qjit(autograph=True)
@qml.qnode(dev)
def circuit(params):
    # Apply layers of RZ and RY rotations
    for i in range(num_wires):
        qml.RZ(params[3*i], wires=[i])
        qml.RY(params[3*i+1], wires=[i])
        qml.RZ(params[3*i+2], wires=[i])
    return qml.expval(qml.PauliZ(0) + qml.PauliZ(num_wires-1))
# Initialize the weights
weights = jax.random.uniform(key, shape=(3 * num_wires,), dtype=jnp.float32)
circuit(weights)
>>> 1.7712995142661776
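Because this circuit applies only single-qubit rotations and no entangling gates, its result can be sanity-checked without a GPU. For a qubit that starts in |0>, the RZ rotations do not change the Z expectation value, so each measured wire contributes the cosine of its RY angle. The sketch below uses only the Python standard library; the helper name expected_value is hypothetical and is not part of PennyLane or the original demo.

```python
import math

def expected_value(params, num_wires):
    """Closed-form <Z_0> + <Z_{n-1}> for the RZ-RY-RZ layer acting on |0...0>.

    Hypothetical helper for illustration; not part of PennyLane or the demo.
    """
    # In the demo circuit, params[3*i + 1] is the RY angle applied to wire i.
    ry_first = params[1]
    ry_last = params[3 * (num_wires - 1) + 1]
    return math.cos(ry_first) + math.cos(ry_last)

# With all angles at zero the state stays |0...0>, so both terms equal 1.
zero_params = [0.0] * (3 * 28)
print(expected_value(zero_params, 28))  # 2.0
```

Passing the same weights to this helper and to circuit should give values that agree up to floating-point rounding, which is a quick way to confirm that the remote run behaved as expected.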
Next, we create the project configuration file Q8Sproject. The configuration file defines the project name, the Python dependencies common to all targets, and the Docker registry username. The targets section defines the dependencies specific to the CUDA environment as gpu and, optionally, those for the local environment as cpu.
name: catalyst-lightning-gpu-demo
python_env:
  dependencies:
    - pennylane
    - pennylane-catalyst
targets:
  cpu:
    python_env:
      dependencies:
        - jax
        - pennylane-lightning
  gpu:
    python_env:
      dependencies:
        - jax[cuda12]
        - pennylane-lightning-gpu
docker:
  username: vstirbu
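One way to picture how the common and target-specific sections relate is to combine them per target. The sketch below mirrors the configuration as a plain Python dictionary and uses a hypothetical helper, dependencies_for. It is an assumption about the intent of the file layout, not a description of how q8sctl resolves dependencies internally.

```python
# The Q8Sproject content, written as a plain dict for illustration only.
project = {
    "name": "catalyst-lightning-gpu-demo",
    "python_env": {"dependencies": ["pennylane", "pennylane-catalyst"]},
    "targets": {
        "cpu": {"python_env": {"dependencies": ["jax", "pennylane-lightning"]}},
        "gpu": {"python_env": {"dependencies": ["jax[cuda12]", "pennylane-lightning-gpu"]}},
    },
    "docker": {"username": "vstirbu"},
}

def dependencies_for(project, target):
    """Hypothetical helper: common dependencies followed by target-specific ones."""
    common = project["python_env"]["dependencies"]
    specific = project["targets"][target]["python_env"]["dependencies"]
    return common + specific

print(dependencies_for(project, "gpu"))
```

Under this reading, the gpu target would install the two common packages plus the CUDA build of JAX and the Lightning-GPU plugin, while the cpu target swaps the last two for their CPU counterparts.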
The resulting project structure is as follows:
- kubeconfig
- Q8Sproject
- demo.py
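Before building, it can help to confirm that the three files are where the tooling expects them. The following is a minimal sketch using only the standard library; the helper name missing_files is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the project structure in this guide.
REQUIRED_FILES = ["kubeconfig", "Q8Sproject", "demo.py"]

def missing_files(root="."):
    """Hypothetical helper: return the required files not found in root."""
    root_path = Path(root)
    return [name for name in REQUIRED_FILES if not (root_path / name).is_file()]

missing = missing_files(".")
if missing:
    print("Missing from project root:", ", ".join(missing))
else:
    print("Project layout looks complete.")
```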
Running the Demo
Now that the project is set up, we are ready to run the demo on the remote cluster. The first step is to build the image for the gpu target and push it to Docker Hub:
q8sctl build --target gpu
Once the image is built, we can run the demo by executing the following command:
q8sctl execute --target gpu --kubeconfig kubeconfig demo.py
The command starts the execution of the demo.py script on the GPU target.
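If you prefer to drive both steps from a script, the two commands can be assembled and run in sequence. The sketch below only uses the q8sctl invocations shown in this guide; the helper names q8s_commands and run_all are hypothetical, and actually running the commands assumes q8sctl is installed and the cluster is reachable.

```python
import subprocess

def q8s_commands(script, target="gpu", kubeconfig="kubeconfig"):
    """Return the build and execute commands from this guide as argument lists."""
    build = ["q8sctl", "build", "--target", target]
    execute = ["q8sctl", "execute", "--target", target,
               "--kubeconfig", kubeconfig, script]
    return [build, execute]

def run_all(commands):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

commands = q8s_commands("demo.py")
for cmd in commands:
    print(" ".join(cmd))

# To actually run them (requires q8sctl and cluster access):
# run_all(commands)
```

Keeping command construction separate from execution makes it easy to review exactly what will run before touching the cluster.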
Benefits of Using Qubernetes
Qubernetes offers scalability, allowing you to expand workloads across various computational configurations such as CPUs, GPUs, and QPUs. In this example, we used PennyLane with Catalyst and Lightning-GPU, but Qubernetes is flexible enough to support other quantum libraries as well.
Another advantage is portability. Because deployments are containerized, your projects run in the same software environment wherever they are executed. You can easily share the Q8Sproject configuration with your team members.
Efficiency is also a key benefit of Qubernetes. Its design optimizes resource management for GPU workloads by utilizing computational resources only during execution. Once a job is complete, the resources are released and can be used immediately for other tasks.
Challenges and Solutions
Running the quantum routines using Catalyst with Lightning-GPU requires CUDA 12.8 compatible hardware. If you have the necessary hardware, you can configure a self-hosted cluster using the k3s Kubernetes distribution, with the required updates for supporting CUDA workloads. Alternatively, you can use a cloud provider whose managed Kubernetes environments offer GPU instances with the required CUDA version.
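When choosing GPU hardware, available memory matters as well as CUDA compatibility. A state-vector simulator stores 2^n complex amplitudes for n qubits, so a rough lower bound on the required memory is easy to compute. The sketch below assumes 16 bytes per amplitude (double-precision complex) and ignores any extra workspace the simulator allocates; the helper name statevector_bytes is hypothetical.

```python
def statevector_bytes(num_wires, bytes_per_amplitude=16):
    """Hypothetical helper: memory needed to hold 2**num_wires amplitudes.

    Assumes 16 bytes per amplitude (complex128); pass 8 for complex64.
    """
    return (2 ** num_wires) * bytes_per_amplitude

# The demo uses 28 wires.
gib = statevector_bytes(28) / 2**30
print(f"{gib:.1f} GiB")  # 4.0 GiB
```

Each additional wire doubles this figure, so a 30-wire circuit would need about 16 GiB under the same assumption.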
Conclusion
In this guide, we have shown how to run quantum programs with the Catalyst compiler and the Lightning-GPU simulator on Qubernetes. You have learned how to set up the environment, adapt the PennyLane demo, and run it on a remote cluster. By following these steps, you can efficiently experiment with demanding quantum routines even if your local machine does not have the necessary hardware.
The code used in this guide is available in the q8s-examples repository. Feel free to explore the repository and adapt the examples to your needs.
Happy coding! 🚀