RamaLama greatly simplifies the system setup for running LLM models by abstracting away the management of libraries and dependencies. It also lets you take advantage of hardware acceleration when running the models, even on macOS systems with Apple Silicon (M-series).
As RamaLama uses Linux containers, it requires a Linux kernel, which is provided, under macOS (and Microsoft Windows), by a “hidden” virtual machine. This same workaround is used by Podman and Docker, and is done in a quasi-transparent way, as one still has to create and start the virtual machine that will run the actual containers.
With podman, you can use:
$ podman machine init
$ podman machine start
If you execute these commands, you’ll have a Linux virtual machine running, configured with 4 CPUs, 100GiB of storage and 2GiB (2048MiB) of memory. You can check the machine’s configuration with:
$ podman machine list
NAME                     VM TYPE  CREATED         LAST UP         CPUS  MEMORY  DISK SIZE
podman-machine-default*  applehv  24 minutes ago  18 minutes ago  4     2GiB    100GiB
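If the defaults don’t fit your workload, you can pass resource flags when creating the machine; the values below are just an example:
$ podman machine init --cpus 6 --memory 8192 --disk-size 120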
Test your podman configuration:
$ podman run --rm quay.io/podman/hello
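As an extra sanity check, podman info should be able to connect to the machine and report details about the Linux VM:
$ podman info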
Now you can use Homebrew to install RamaLama:
$ brew install ramalama
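You can confirm the installation with:
$ ramalama version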
Trying to run RamaLama with this machine will give you a warning, and ask if you want to continue:
$ ramalama list
Warning! Your VM podman-machine-default is
using applehv, which does not support GPU.
Only the provider libkrun has GPU support.
See `man ramalama-macos` for more information.
Do you want to proceed without GPU? (yes/no):
You can check that the virtual machine does not have direct access to the GPU with:
$ podman machine ssh
(...)
core@localhost:~$ ls /dev/dri
ls: cannot access '/dev/dri': No such file or directory
To set up a podman machine that can take advantage of the M-series GPU, you’ll need to create a custom machine that uses libkrun instead of applehv.
Before the machine can be created, krunkit needs to be available on the host, and I suggest you install it through Homebrew:
$ brew tap slp/krunkit
$ brew install krunkit
Now create a new podman machine, using the libkrun provider:
$ export CONTAINERS_MACHINE_PROVIDER="libkrun"
$ podman machine init ramalama
$ podman machine start ramalama
Note that you have to export the CONTAINERS_MACHINE_PROVIDER variable, as any other podman command will require it to select the proper machine. (Otherwise, it will use the default value, which is applehv, and will try to access podman-machine-default.)
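If you’d rather not export the variable in every shell session, the provider can also be set persistently; this is a minimal sketch, assuming your podman version reads the provider key from ~/.config/containers/containers.conf:
[machine]
provider = "libkrun"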
Check the machine type:
$ podman machine list
NAME      VM TYPE  CREATED         LAST UP            CPUS  MEMORY  DISK SIZE
ramalama  libkrun  22 minutes ago  Currently running  4     2GiB    100GiB
And check direct access to the GPU:
$ podman machine ssh ramalama ls /dev/dri
by-path
card0
renderD128
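To confirm that a container can actually use the device, you can pass it through with --device (the Fedora image here is just an example):
$ podman run --rm --device /dev/dri quay.io/fedora/fedora ls /dev/dri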
Now we can take advantage of the Apple Silicon GPU when running RamaLama.
With the environment properly configured, retrieve one of the available models. You can see a list of available models (accessible through their short names) with:
$ ramalama info
The first time you run ramalama with a model, it will download a container image. This can take a while, and no visual feedback is provided, so, for downloading the first model, I suggest using the --debug option:
$ ramalama --debug pull gemma3
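Once the pull completes, the model should show up in your local list:
$ ramalama list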
After downloading the model, you can start it and run a CLI prompt with:
$ ramalama run gemma3
Or you can start it as a server that can be accessed through any browser:
$ ramalama serve gemma3
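The server listens on a local port (8080 by default, if I recall correctly); you can set it explicitly with --port and then point your browser at http://localhost:8080:
$ ramalama serve --port 8080 gemma3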
With this configuration, any RamaLama tool can be used, such as the convenient support it provides for improving model responses with RAG.
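A rough sketch of that workflow, where the image name is just a placeholder (check the ramalama-rag man page for the exact arguments your version expects): first index your documents into an OCI image, then run the model with it.
$ ramalama rag ./docs quay.io/yourname/docs-rag
$ ramalama run --rag quay.io/yourname/docs-rag gemma3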
Running containers and tools that use containers on macOS has improved a lot in the last few years (2?).
As Linux containers are somewhat “alien software” for the operating system, some work may be required, especially when dedicated hardware support is needed, as is the case of making the GPU available for AI inference.
There’s still some (lots of!) room for improvement on the user-experience side of running software in containers on a Mac, but it looks like the gap on some basic or not-so-basic workloads is not that big anymore, even when compared to running containers natively on Linux.
These are some issues that you may hit:
If podman can’t run a container, check if the podman machine was created and started.
If RamaLama can’t start due to podman being unable to access the podman machine, check if the machine is running, and check that you don’t have another machine running (e.g. podman-machine-default). RamaLama does not seem to be smart enough to select the machine based on CONTAINERS_MACHINE_PROVIDER, and it may get the default machine. Try stopping podman-machine-default and running RamaLama again. As a last resort, remove podman-machine-default (it can be recreated at any time), as shown below.
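To stop and, if needed, remove the default machine:
$ podman machine stop podman-machine-default
$ podman machine rm podman-machine-default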
If things start to get too slow, without any visual feedback, stop the execution (CTRL+C) and restart with --debug. Most operations restart from where they stopped (especially those involving downloads), and the debug output may calm down your anxiety. (It helped with mine.)