Problem2

Motivation

I have long been a fan of Python and Django. Over years of use, I have found that Django really lives up to its tagline of being the web framework for perfectionists with deadlines. As a freelancer, I have often faced some really tight deadlines, and nothing else lets me ship a stable, working product nearly as fast as Django does.

And yet Django, like Python in general, has a significant downside: it can be painfully slow. This is where Rust comes in. It's a language that compiles down to machine code, so the code you write in it can run about as fast as the hardware allows.

Now, one could rewrite the whole codebase in Rust and have it blazing fast. But that is obviously a lot of work, and in programming, laziness is the mother of wisdom. While a full rewrite may be the right thing to do for a large team with proper funding, it is not really an option for a lone freelancer on a tight budget of both time and money. Additionally, one would lose most of what makes Django so magical: the ability to create generic views, handle forms, and write templates with very little code. Instead, a programmer with deadlines can decide to compromise, following Amdahl's law and rewriting only the slowest, most often used parts of the code. This means there are now (at least) two parts of the codebase, one written in Python with Django, and one in Rust. And it's here that the first real problems appear.

I personally went down this path twice already, and I can safely say that the second time was much easier. In this post, I collected some of my findings that will hopefully make your first time easier too.

Django ↔ Rust Communication

There must be some way of making function calls and passing data between the two languages. Fortunately, there are many to choose from, each with its own set of tradeoffs. You will probably notice that a lot of them suffer from Django's name mangling - adding various hashes to cache keys, channel names, etc.

A Python module written in Rust

The excellent PyO3 crate allows you to export Rust functions to Python effortlessly. If you ever had to use Swig or SIP to write Python modules in C or C++, then you will appreciate how great PyO3 is. It uses a small set of Rust macros to generate all the needed glue code. It can even integrate logging and async runtimes between the two languages.

However, while PyO3 is great when you're doing intensive number crunching or text manipulation in a single view, it does not give as much of a benefit in most web services. For one thing, it means the web request still comes through the Python layer, using Django's routing, views, and request objects. For another, such a Python module has limited ways of calling back to Python code if needed. Finally, a lot of the slowness in Django comes from database access, which brings us to the next topic.

Accessing the Django database from Rust

Fortunately, Rust can help us here as well. Django creates regular tables in the SQL database, which we can connect to from Rust. For example, using Diesel, we can easily import all the tables by connecting to the same database (say, with the same DATABASE_URL environment variable) and running diesel print-schema. A useful thing to note here is that Django handles all the database migrations, so you don't have to perform them in Rust (and you couldn't even if you wanted to).

There are some caveats here though. For one, Django prefers to do some things in Python code rather than in SQL. Two items that are relevant here are default values and ON DELETE behavior. Both have tickets that are many years old and are not likely to be resolved any time soon. This means that your Rust code will not be able to use the default values you set on model fields in Django, and you will not be able to automatically cascade-delete objects from Rust. Both have to be dealt with explicitly.

Update from 2024: Database-level default values are now supported through the db_default field option, as of Django 5.0. So things are even easier than they were when this post was written.

Diesel itself comes with its own limitations. As its documentation notes, it limits how many columns a table can have (32 by default in Diesel 2). Django has no such limit, so depending on your project, some of your tables may have too many columns. In this case, you still have multiple options for working around it:

Increase the table size limits by enabling the 64-column-tables or 128-column-tables feature of the diesel crate. This ought to be enough for everybody, but as with many things that ought to be, sometimes it is not. Enabling these features also significantly increases compilation times.
If your Rust code needs only a subset of columns, you can manually remove the ones you don't need from the table! macro invocation in the generated schema file.
Rewrite your models to split some fields off into a new table, for example with a OneToOneField to the original large model.
Use a different method of accessing the database, such as SQLx or postgres. With these, you give up the ORM features of Diesel, but you still benefit from Rust's speed and safety.

Other languages

Both PyO3 and Diesel are specific to Rust, but the approaches mentioned above are not. Python modules can be written in other languages too, just not as conveniently, and because of our deadlines they are not really practical for us. Similarly, most programming languages can access SQL databases. The main upside of Rust here is that Diesel can generate type-safe code from an existing database schema.

The following solutions are more generic. If you can connect to Redis or if you can make HTTP requests, you're good to go.

Redis

Redis is already a very common method of sharing data between multiple services and has client libraries in many languages. Python and Rust are no exception here. You can connect to the same Redis instance from both sides, and use it as a shared data store as well as a messaging channel with PubSub.

If you want to use Redis directly, there is nothing specific to Django or Rust to do here. However, Django can also optionally use Redis for some internal functionality, namely cache, channels, and user sessions. Some strategies for integrating with these are laid out in the following sections.

Cache {#redis-cache}

Django can use Redis as a backend for its cache mechanism. Because of this, Django and Rust code can relatively easily use the same cache. There are just two things to keep in mind: Django prepends version numbers to all cache keys, and it serializes its values with pickle.

Version numbers are easy to work around. Most likely your Django code doesn't even use version numbers when caching, in which case the version is always set to 1. On the Rust side, make sure to always prepend :1: to all your cache keys. For example, a Django cache key of my-key will become :1:my-key in Redis (the leading colon is there because Django also prepends its KEY_PREFIX setting, which is empty by default). This applies both to reading and writing cache values: if you forget the prefix, your Rust code will get a (nil) value, and Django will simply not see your writes. If you do use cache version numbers, you need some logic to figure out the latest entry, just like Django does.
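To make the key layout concrete, here is a minimal Python sketch of how the stored key is put together, assuming Django's default key function. The name redis_cache_key is made up for this illustration; the string formatting is what the Rust side has to reproduce.

```python
def redis_cache_key(key: str, version: int = 1, key_prefix: str = "") -> str:
    """Build the key under which Django stores a cache entry.

    Assumes Django's default key function, which joins the configured
    KEY_PREFIX, the version, and the key with colons. A custom
    KEY_FUNCTION setting would change this layout.
    """
    return f"{key_prefix}:{version}:{key}"


if __name__ == "__main__":
    print(redis_cache_key("my-key"))  # :1:my-key
    print(redis_cache_key("my-key", version=2, key_prefix="site"))  # site:2:my-key
```

If your project sets KEY_PREFIX or a custom key function, the Rust code needs to mirror that instead of hard-coding :1:.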

Pickled values are much more complicated to work with. Python's pickle format is very general: in addition to values, it can also serialize references to classes and functions, which cannot be interpreted by Rust code. Accessing Django's cache in Rust thus requires some discipline on both sides to only serialize simple types. On the Rust side, pickled values can be manipulated using the serde-pickle crate. Its documentation lists the supported types which may be used from the Python side.
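One way to enforce that discipline on the Python side is to check values before they are cached. The sketch below is only an illustration: the helper names and the allow-list of types are assumptions, so compare the list against what serde-pickle documents before relying on it.

```python
import pickle

# Hypothetical allow-list of "simple" types; adjust it to the types
# that the Rust side is actually prepared to read.
SIMPLE_SCALARS = (type(None), bool, int, float, str, bytes)


def is_simple(value) -> bool:
    """Return True if value consists only of plain built-in types."""
    # Exact type checks on purpose: a subclass of str or dict would be
    # pickled with a reference to its Python class.
    if type(value) in SIMPLE_SCALARS:
        return True
    if type(value) in (list, tuple):
        return all(is_simple(item) for item in value)
    if type(value) is dict:
        return all(is_simple(k) and is_simple(v) for k, v in value.items())
    return False


def dumps_simple(value, protocol: int = 2) -> bytes:
    """Pickle value, refusing anything outside the allow-list."""
    if not is_simple(value):
        raise TypeError(f"not safe to share with other languages: {value!r}")
    return pickle.dumps(value, protocol=protocol)


if __name__ == "__main__":
    payload = dumps_simple({"id": 7, "tags": ["a", "b"], "score": 0.5})
    print(pickle.loads(payload))
```

A check like this can sit in a thin wrapper around cache.set for the keys that Rust reads, so that a stray model instance or datetime is caught in Python instead of failing to deserialize in Rust.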

Websockets and Django Channels

Django has channels to support websockets and other kinds of message passing, using Redis as the backend. With proper configuration, you can listen for or send messages on these channels directly from Rust.

Note that if you're using channels just to send updates to the client, you probably don't want to do this. Instead, you can serve websockets directly from Rust code. Bypassing Python this way is much faster and less resource-demanding, but it likely requires user authentication in the Rust handler; see the Sessions & Authentication section for that. If this is the case for your application, you can safely skip the rest of this section. If, however, your websocket consumers in Python also do other work, such as processing data, sending notifications, or calling Celery tasks, read ahead.

First, make sure to use the new RedisPubSubChannelLayer from channels_redis. The older RedisChannelLayer in the same library would also work, but the pubsub-based implementation is both faster and easier to connect to. Set up your channel layer like so:

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.pubsub.RedisPubSubChannelLayer",
        "CONFIG": {
            "hosts": [env("CHANNEL_LAYER_URL")],
            "prefix": env("CHANNEL_LAYER_PREFIX"),
        },
    },
}

The prefix, if set, is important - all channel names in Redis will have this prefix, so you will have to use the same prefix from Rust code.

With django-channels, each channel corresponds to a single websocket connection, and channel names are opaque (looking like specific.2e2785bc50e7460fa34083f616e09f14), so you will not get far trying to send messages to a single channel. However, channels can be added to groups, in which case they receive all messages sent to their groups. Groups are also backed by Redis pubsub channels, but in this case they have predictable and readable names: if the prefix specified above is test, a group named notifications will create a pubsub channel named test__group__notifications.
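The naming rule is simple enough to capture in a few lines. This is just a sketch of the pattern described above, written in Python for illustration; group_pubsub_channel is a made-up helper name, and the Rust side would build the same string before subscribing or publishing.

```python
def group_pubsub_channel(prefix: str, group: str) -> str:
    """Name of the Redis pubsub channel backing a Django Channels group.

    Follows the prefix + "__group__" + group pattern described in the
    text for RedisPubSubChannelLayer.
    """
    return f"{prefix}__group__{group}"


if __name__ == "__main__":
    print(group_pubsub_channel("test", "notifications"))  # test__group__notifications
```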

The messages sent to these channels are serialized as well. Note that channels_redis serializes messages with MessagePack by default rather than with pickle, so verify the format used by your installed version before decoding them in Rust. The same discipline of sticking to simple types, described in the Cache section, applies here too. In addition, django-channels requires a certain structure to the messages: they must be dictionaries, and must have a type key. The value corresponding to the type key is a string containing the name of the method on the Consumer class that will be called. If you're using channels from Python code, you have to structure your messages like this already, so it's really just doing the same thing in Rust.
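As a small illustration of that structure, the following Python sketch checks a message and works out which consumer method it would reach. The helper handler_name is hypothetical; it mirrors my understanding that Channels replaces dots in the type with underscores and refuses names starting with an underscore.

```python
def handler_name(message: dict) -> str:
    """Return the consumer method name a group message would be routed to."""
    if not isinstance(message, dict) or "type" not in message:
        raise ValueError("a channel message must be a dict with a 'type' key")
    # "chat.message" and "chat_message" both end up at chat_message()
    name = message["type"].replace(".", "_")
    if name.startswith("_"):
        raise ValueError("message types must not start with an underscore")
    return name


if __name__ == "__main__":
    message = {"type": "message_received", "message": "hello from Rust"}
    print(handler_name(message))  # message_received
```

Whatever builds the message in Rust should produce the same shape: a map with a string type entry plus any extra fields the consumer method expects.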

Finally, there is another way to send messages between Rust and Django channels. Instead of publishing to the same pubsub channel as your websocket, you can create a proxy service that brokers messages between the two "universes". This introduces overhead and higher latency, but may be simpler to code since you don't have to deal with the channel layer's serialization format on the Rust side. As a Django management command, it looks like this:

from django.core.management.base import BaseCommand
from django.conf import settings
from channels.layers import get_channel_layer
import asyncio
import aioredis

async def monitor(channel, group):
    # Set up plain Redis connection
    # (this uses the aioredis 1.x API; create_redis_pool was removed in aioredis 2.0)
    redis = await aioredis.create_redis_pool(settings.REDIS_URL)
    [ch] = await redis.subscribe(channel)

    # Set up Django Channels connection
    layer = get_channel_layer()

    while await ch.wait_message():
        message = await ch.get()
        # Here, we could read contents to dispatch to multiple channels
        if message is not None:
            await layer.group_send(
                group, {"type": "message_received", "message": str(message)}
            )


class Command(BaseCommand):
    help = "Subscribe to a Redis channel and send data to Django Channels"

    def add_arguments(self, parser):
        parser.add_argument("channel", help="Source Redis channel")
        parser.add_argument("group", help="Destination Django Channels group")

    def handle(self, channel, group, *args, **options):
        asyncio.run(monitor(channel, group))

On the sending side (which can be in Rust or in any other language), you then publish messages to the same channel:

    let mut con = redis_pool.get().await?;
    let _ = con.publish(channel, message).await?;

This one handles updates from Rust code (using plainly named Redis channels) and pushes them to Django Channels. One could write a similar monitor command to pass updates the other way, from Python to Rust.

Calling HTTP endpoints

Finally, you can always just have one web service make HTTP calls to the other. This approach is very general and can do pretty much everything, including passing serialized data. A big upside is also that the services can run on different hosts and are easy to scale (e.g. with Kubernetes). The downside compared to the above methods is speed: even if the services are physically together, there is still some latency and the overhead of an HTTP connection. However, since most of our site is written in Python, we probably don't care about performance that much.
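For the Python-to-Rust direction, a call can be as small as the sketch below, which uses only the standard library. The host name rust-service, the port, and the /api/score/ path are placeholders invented for this example, and in a real project you may prefer an HTTP client library you already depend on.

```python
import json
import urllib.request


def build_json_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request for the other service."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_service(base_url: str, path: str, payload: dict, timeout: float = 2.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_json_request(base_url, path, payload)
    # A short timeout keeps a slow or unreachable service from stalling the view
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint; only the request is built here, nothing is sent
    request = build_json_request("http://rust-service:3000", "/api/score/", {"id": 1})
    print(request.get_method(), request.full_url)
```

Keeping the request construction separate from the network call makes the payload easy to test without a running service.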

Sessions & Authentication {#sessions-and-authentication}

Django comes with many batteries included, and user authentication is one of them. You really don't have to think about it. But if you do want to think about it, you have packages like allauth that let you authenticate with practically anything. As perfectionists with deadlines, we definitely do not want to roll our own authentication code, so we stick with Django for that.

Checking the current user

However, we often do want to make sure that a user is logged in, and optionally that they have the necessary permissions to see a certain page or perform a certain operation. This most often comes up when Rust is used to serve an endpoint that is reachable directly from the browser, not when it's called by server-side Python code. This may be a full user-visible page, but more commonly you're using Django and its templates for those, while Rust serves less visible components like APIs and websockets. In all these cases, though, you often want to authenticate a Django user from Rust code.

This is a case where an outside language comes close to being able to resolve user sessions, but does not quite get there. Django can use different session backends; in production, you probably use the database, the cache, or a combination of the two. Both the database and the cache can be accessed from Rust, as we saw in the previous sections. However, there is another catch: whether in the database or in the cache, the session data is stored serialized (with JSON or pickle, depending on the backend and settings), encoded, and cryptographically signed. The complete encoding scheme is not documented, and the format can change between versions, so loading the session data would be complicated, and would immediately throw out the "with deadlines" restriction. It would make a very useful project for someone without deadlines though.

In my case, I went with the much simpler solution of creating an HTTP endpoint in Python that does nothing but verify that the user is logged in. Then, in Rust, whenever I have to get the current user, I just call that endpoint, while making sure to forward the sessionid cookie from the user request.

Using a Rust web framework like Axum or Rocket, you probably want to abstract this into a separate extractor object. First, get the session ID from a cookie (the example uses Axum):

/// Extracts a Django user session ID from the `sessionid` cookie
pub struct DjangoSession(pub String);

#[async_trait]
impl<B> FromRequest<B> for DjangoSession
where
    B: Send,
{
    type Rejection = Error;

    async fn from_request(req: &mut RequestParts<B>) -> Result<Self, Self::Rejection> {
        let cookies = Cookies::from_request(req)
            .await
            .map_err(|(status_code, text)| {
                Error::MissingSession(format!("Could not read cookies: {} {}", status_code, text))
            })?;
        let cookie = cookies
            .get("sessionid")
            .ok_or_else(|| Error::MissingSession(String::from("No sessionid cookie provided")))?;
        Ok(DjangoSession(String::from(cookie.value())))
    }
}

Then, whenever needed, load the user data from the Django service. Since calling Python code is always going to be slow, it makes sense to set up a simple caching mechanism using Redis, as is done in the example below.

/// Holds user information
#[derive(Clone, Debug, Deserialize)]
pub struct UserInfo {
    pub id: i32,
    pub username: String,
    pub first_name: String,
    pub last_name: String,
    pub full_name: String,
    pub email: String,
    pub is_superuser: bool,
}

impl UserInfo {
    /// Verifies the session ID using a Redis cache or the Django service.
    /// If the session has expired or the user has logged out, this will return an error,
    /// although a cached entry can outlive the session until its cache TTL runs out.
    async fn from_session(session_id: &str, state: &State) -> Result<Self> {
        let cache_key = format!("session:{session_id}");
        let mut redis_conn = state.redis.get().await?;
        let value: Option<String> = redis_conn.get(&cache_key).await?;

        let json = if let Some(value) = value {
            // If the user data is in the cache, we can use it right away
            value
        } else {
            // Otherwise, we have to call the Django service to get the user
            let header = format!("sessionid={session_id}");
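            let json = state
                .client
                .get("http://daphne:8000/api/session/")
                .header(COOKIE, header)
                .send()
                .await?
                // Treat 4xx/5xx responses as errors so that a rejected session
                // is never cached (error_for_status assumes a reqwest client)
                .error_for_status()?
                .text()
                .await?;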
            // Then, cache the returned value for a short time to prevent frequent repeated calls
            let () = redis_conn
                .set_ex(&cache_key, &json, Duration::minutes(60).num_seconds() as usize)
                .await?;
            json
        };

        Ok(serde_json::from_str(&json)?)
    }
}

The Python view serving this API is simple. It only needs to serialize the data, since authentication is done by Django already.

from django.http import HttpResponseForbidden, JsonResponse


def validate_session(request):
    u = request.user
    if u.is_authenticated:
        return JsonResponse(
            {
                "id": u.id,
                "username": u.username,
                "first_name": u.first_name,
                "last_name": u.last_name,
                "full_name": u.get_full_name(),
                "email": u.email,
                "is_superuser": u.is_superuser,
            }
        )
    else:
        return HttpResponseForbidden()