Serving three virtual sites from a single project on one server

How to handle multiple sites in Django: the problem

Consider a setup where a single Django project has several apps that can be reached from multiple domains:

[Figure: Django apps and multiple domain names]

With the default setup, every request to www.example-a.dev, www.example-b.dev, or www.example-c.dev is free to reach the URL configuration of any installed app. This can harm SEO, especially for content-heavy Django applications.

What I mean is that if you have a blog app with any number of pages, the same pages will be served on www.example-a.dev, www.example-b.dev, and www.example-c.dev.

It's not unusual to have a single Django project serving requests for multiple domain names, and it's important to isolate the requests so the right domain is paired with the corresponding Django app.

In this tutorial you'll learn how to handle multiple sites in Django.

 

Dear Host header, we need to talk

Before getting into the details, remember that an HTTP header named "Host" is sent any time a browser makes a request to a website. To check it, open the browser's developer tools, visit any website, and look for the Host request header in the Network tab (Firefox in this example):

[Figure: The Host header is sent along with HTTP requests]

If you're on a command line you can use curl to send the appropriate header with each request. For example:

curl -v -H "Host:www.fake.com" https://www.djangoproject.com/

If you send "www.fake.com" as the Host header to djangoproject.com's server, it will refuse the request and respond with an error. If instead you provide the correct Host header, the server will be happy to fulfill your request:

curl -v -H "Host:www.djangoproject.com" https://www.djangoproject.com/

So the Host header tells the server "hey, I'm looking for this site!", and the server replies with the content of the corresponding virtual host.

Django can easily read the Host header, as we'll see in a moment. By default, though, it doesn't care whether you're looking for www.example-a.dev or www.example-b.dev: as long as the host is permitted by ALLOWED_HOSTS, it replies to any request on any path, regardless of the Host header.
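To make the idea of a virtual host concrete, here is a minimal sketch that uses only Python's standard library, not Django. A tiny local server picks its reply from the Host header alone. The SITES mapping, the handler class, and the helper names are all invented for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of virtual hosts to page bodies.
SITES = {
    "www.example-a.dev": b"blog",
    "www.example-b.dev": b"links",
}


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The Host header alone decides which "site" answers.
        body = SITES.get(self.headers.get("Host", ""))
        status = 200 if body is not None else 404
        body = body or b"unknown host"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, host):
    """Send GET / to the local server with an explicit Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/", headers={"Host": host})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def demo():
    """Start the server on a free local port and query it with two hosts."""
    server = HTTPServer(("127.0.0.1", 0), VirtualHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return [fetch(port, h) for h in ("www.example-a.dev", "www.fake.com")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Both requests go to the same address and port. Only the Host header differs, and that is enough for the server to answer one and reject the other.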

Let's see how to fix the issue.

 

How to handle multiple sites in Django: first, a test

Let's first define the desired outcome with a test. To follow along with this tutorial make sure to create a new Django project.

With the project in place create two Django apps, one named blog:

django-admin startapp blog

and another named links:

django-admin startapp links

Next up enable the apps in settings.py:

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Enable the new apps
    "blog.apps.BlogConfig",
    "links.apps.LinksConfig",
]

Still in settings.py, configure ALLOWED_HOSTS to allow any host (don't do that in production!):

ALLOWED_HOSTS = ["*"]
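For anything beyond local experiments, a stricter setting would list only the domains the project is meant to serve. A sketch, assuming the three example domains used in this tutorial:

```python
# settings.py (sketch): accept only the domains this project serves.
ALLOWED_HOSTS = [
    "www.example-a.dev",
    "www.example-b.dev",
    "www.example-c.dev",
]
```

Requests carrying any other Host header are then rejected by Django before they reach a view.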

Now open up blog/tests.py and let's start with a couple of tests. We want the blog application to respond only on www.example-a.dev, not on www.example-b.dev or www.example-c.dev.

We can test this with the Django test client, by passing the host header:

from django.test import TestCase, Client


class BlogTest(TestCase):
    def test_should_respond_only_for_example_a(self):
        client = Client(HTTP_HOST="www.example-a.dev")

To keep things clean we'll call the Django view through its name with reverse:

from django.test import TestCase, Client
from django.urls import reverse


class BlogTest(TestCase):
    def test_should_respond_only_for_example_a(self):
        client = Client(HTTP_HOST="www.example-a.dev")
        view = reverse("index")

And with the client in place we can test the actual response:

from django.test import TestCase, Client
from django.urls import reverse


class BlogTest(TestCase):
    def test_should_respond_only_for_example_a(self):
        client = Client(HTTP_HOST="www.example-a.dev")
        view = reverse("index")
        response = client.get(view)
        self.assertEqual(response.status_code, 200)

Now let's write two other tests for the domains that we don't want to respond to. In this case we can expect a 404 not found:

from django.test import TestCase, Client
from django.urls import reverse


class BlogTest(TestCase):
    def test_should_respond_only_for_example_a(self):
        client = Client(HTTP_HOST="www.example-a.dev")
        view = reverse("index")
        response = client.get(view)
        self.assertEqual(response.status_code, 200)

    def test_should_not_respond_for_example_b(self):
        client = Client(HTTP_HOST="www.example-b.dev")
        view = reverse("index")
        response = client.get(view)
        self.assertEqual(response.status_code, 404)

    def test_should_not_respond_for_example_c(self):
        client = Client(HTTP_HOST="www.example-c.dev")
        view = reverse("index")
        response = client.get(view)
        self.assertEqual(response.status_code, 404)

Now let's run the tests with:

python manage.py test

They should fail with the error django.urls.exceptions.NoReverseMatch: Reverse for 'index' not found. 'index' is not a valid view function or pattern name. Let's add views and urls to the mix!

 

Wiring up urls and views

Create a new file, blog/urls.py, and inside it configure a first named url. With our setup there's no need to namespace the app because Django will pick just one URL conf per request:

from django.urls import path
from .views import index

# not needed
# app_name = "blog"

urlpatterns = [path("blog/", index, name="index")]

Let's also create a simple view named index in blog/views.py (we'll skip the template for now):

from django.http import HttpResponse


def index(request):
    html = (
        "<html><body><h1>Imagine there's a list of blog posts here!</h1></body></html>"
    )
    return HttpResponse(html, charset="utf-8")

Now open up django_quick_start/urls.py and include the new app:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path("", include("blog.urls")),
    path("admin/", admin.site.urls),
]

Finally let's run the test again:

python manage.py test

You should see the first test passing, while the others are failing:

FAIL: test_should_not_respond_for_example_b (blog.tests.BlogTest)
    self.assertEqual(response.status_code, 404)
AssertionError: 200 != 404

FAIL: test_should_not_respond_for_example_c (blog.tests.BlogTest)
    self.assertEqual(response.status_code, 404)
AssertionError: 200 != 404

That's predictable. Nothing stops Django from serving the same view for two different domains.

As a dirty trick we could allow just www.example-a.dev in ALLOWED_HOSTS, but that's not what we want. Remember that www.example-b.dev and www.example-c.dev need to be handled by our project too, but from different apps.

With a test and a basic structure in place let's try to fix the problem.

 

How to handle multiple sites in Django: a first attempt

In every Django view we're free to access the request object. The request object holds all the details about the current request.

For example, request.META is a dictionary that carries all the request headers, Host included. To access the Host header on META we can do:

host = request.META["HTTP_HOST"]

So in our view:

from django.http import HttpResponse


def index(request):
    host = request.META["HTTP_HOST"]

    html = (
        "<html><body><h1>Imagine there's a list of blog posts here!</h1></body></html>"
    )
    return HttpResponse(html)

But even better we can use get_host():

from django.http import HttpResponse


def index(request):
    host = request.get_host()

    html = (
        "<html><body><h1>Imagine there's a list of blog posts here!</h1></body></html>"
    )
    return HttpResponse(html)

Given that we can check the host in the view, how about something like this?

from django.http import HttpResponse
from django.http import Http404


def index(request):
    if not request.get_host() == "www.example-a.dev":
        raise Http404

    html = (
        "<html><body><h1>Imagine there's a list of blog posts here!</h1></body></html>"
    )
    return HttpResponse(html)

We check the request host, and if it's not www.example-a.dev we raise a 404 error. But this approach is hacky, dirty, and not scalable. We could extract that snippet into a function, but we'd still need to copy-paste the call into each view.
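For illustration, the extraction could take the shape of a decorator. The sketch below is plain Python: Http404 and FakeRequest are stand-ins for Django's classes so the snippet runs on its own, and require_host is a hypothetical name, not a Django API.

```python
from functools import wraps


class Http404(Exception):
    """Stand-in for django.http.Http404 so the sketch runs on its own."""


def require_host(expected_host):
    """Hypothetical decorator: raise Http404 unless the host matches."""

    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            if request.get_host() != expected_host:
                raise Http404
            return view(request, *args, **kwargs)

        return wrapper

    return decorator


class FakeRequest:
    """Minimal stand-in for HttpRequest, exposing only get_host()."""

    def __init__(self, host):
        self._host = host

    def get_host(self):
        return self._host


@require_host("www.example-a.dev")
def index(request):
    return "Imagine there's a list of blog posts here!"
```

Even in this form the decorator has to be applied to every single view, and each new view is one more place to forget it.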

There should be a better way! But first let's see how Django processes our requests.

 

When Django receives a request ...

When Django receives a request it has to pick a root url configuration from which it extracts all the paths available for our users. By default the root urlconf is defined in ROOT_URLCONF. For example our simple project has the following root url configuration:

ROOT_URLCONF = "django_quick_start.urls"

If you remember, this url configuration includes our blog application as well:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path("", include("blog.urls")),
    path("admin/", admin.site.urls),
]

Now imagine having two or more apps in this file:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path("", include("blog.urls")),
    path("", include("links.urls")),
    path("admin/", admin.site.urls),
]

blog has a path named blog/, and links could have a bunch of other paths like links/, links/create/ and so on.

Whenever a new request comes in, be it for www.example-a.dev, www.example-b.dev, or www.example-c.dev, Django doesn't know how to route it by domain. It makes all the paths available to any domain name, unless we tell it to serve one and only one url configuration.

How? With a middleware!

 

How to handle multiple sites in Django: the middleware

I already talked about Django middleware in Building a Django middleware. A middleware is a piece of code that hooks into the request/response lifecycle.

Django already ships with its own set of middlewares, listed in the MIDDLEWARE setting in django_quick_start/settings.py:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]

Let's add a custom middleware to this list for handling virtual hosts. We'll call it VirtualHostMiddleware:

MIDDLEWARE = [
    # our custom middleware
    "django_quick_start.virtualhostmiddleware.VirtualHostMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]

(django_quick_start is the sample project that you should have created for this tutorial).

Now create a skeleton for the middleware in a new file named django_quick_start/virtualhostmiddleware.py:

class VirtualHostMiddleware:
    def __init__(self):
        pass

    def __call__(self):
        pass

The middleware can be a class, with its __init__ and __call__ methods. In the __init__ method we accept and prepare get_response:

class VirtualHostMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self):
        pass

In the __call__ method instead we accept and prepare the request, and we return the response as well:

class VirtualHostMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        return response

Now we're ready to switch the root url configuration depending on the host header. This is what the Django documentation says:

Django determines the root URLconf module to use. Ordinarily, this is the value of the ROOT_URLCONF setting, but if the incoming HttpRequest object has a urlconf attribute (set by middleware), its value will be used in place of the ROOT_URLCONF setting.

urlconf is the same kind of string passed to include. That means we can use a simple mapping between hosts and url strings (I suggest using named constants in a real project):

virtual_hosts = {
    "www.example-a.dev": "blog.urls",
    "www.example-b.dev": "links.urls",
    "www.example-c.dev": "links.urls",
}


class VirtualHostMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        return response

In this mapping we forward requests for www.example-a.dev to the blog app, while www.example-b.dev and www.example-c.dev go to the links app. Of course you can imagine more hosts and more apps in your dictionary. You can also place the mapping inside __init__.

Now in __call__ we check the mapping, and for each virtual host we assign the url configuration:

virtual_hosts = {
    "www.example-a.dev": "blog.urls",
    "www.example-b.dev": "links.urls",
    "www.example-c.dev": "links.urls",
}


class VirtualHostMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # let's configure the root urlconf
        host = request.get_host()
        request.urlconf = virtual_hosts.get(host)
        # order matters!
        response = self.get_response(request)
        return response

Pay attention to the order here. The request must be configured before it is passed to get_response. Now don't forget to remove the per-view check in blog/views.py:

from django.http import HttpResponse


def index(request):
    """
    this goes away now!
    if not request.get_host() == "www.example-a.dev":
        raise Http404
    """

    html = (
        "<html><body><h1>Imagine there's a list of blog posts here!</h1></body></html>"
    )
    return HttpResponse(html)
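One step the walkthrough does not show: the mapping points www.example-b.dev and www.example-c.dev at links.urls, so that module most likely has to exist before the tests can run. A minimal sketch, assuming an empty URL configuration is enough for now:

```python
# links/urls.py (assumed file, not shown in the tutorial)
# No paths yet: every request routed to this URL conf resolves to a 404.
urlpatterns = []
```

With this in place a request for /blog/ on www.example-b.dev finds no matching path, which is the 404 the tests expect.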

At this point you should be able to run the tests again:

python manage.py test

And they'll pass:

Ran 3 tests in 0.007s

OK

Great job!
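One caveat: the lookup matches the Host value exactly, and get_host() includes the port when it is not a standard one (for example www.example-a.dev:8000 during local development). Below is a hedged sketch of a more forgiving lookup, written in plain Python so it can be tried without a project. FakeRequest stands in for Django's HttpRequest, and urlconf_for is a hypothetical helper name.

```python
# Plain-Python sketch; all names below are for illustration only.
VIRTUAL_HOSTS = {
    "www.example-a.dev": "blog.urls",
    "www.example-b.dev": "links.urls",
    "www.example-c.dev": "links.urls",
}


def urlconf_for(host):
    """Return the URL conf for a Host value, or None for unknown hosts."""
    # Drop an optional ":port" suffix and normalise case before the lookup.
    # (IPv6 literals such as "[::1]:8000" are not handled by this sketch.)
    hostname = host.split(":", 1)[0].lower()
    return VIRTUAL_HOSTS.get(hostname)


class FakeRequest:
    """Minimal stand-in for HttpRequest, exposing only get_host()."""

    def __init__(self, host):
        self._host = host

    def get_host(self):
        return self._host


class VirtualHostMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        urlconf = urlconf_for(request.get_host())
        if urlconf is not None:
            # Override only for known hosts; for anything else the
            # attribute stays unset and ROOT_URLCONF remains in charge.
            request.urlconf = urlconf
        return self.get_response(request)


# Usage: a fake get_response that reports which URL conf would be used.
middleware = VirtualHostMiddleware(
    lambda request: getattr(request, "urlconf", "ROOT_URLCONF")
)
print(middleware(FakeRequest("www.example-a.dev:8000")))
```

Whether unknown hosts should fall back to ROOT_URLCONF or get a 404 is a design choice. The fallback keeps paths such as admin/ reachable, but it also means an unmapped domain sees every app again.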

 

A note on url namespaces

It is a good idea in Django to namespace each application's url:

from django.urls import path
from .views import index

app_name = "blog"

urlpatterns = [path("blog/", index, name="index")]

Here the namespace for this app is "blog", and the view's name is "index". Whenever we want to reference this view in our code we can use reverse:

from django.urls import reverse

# omit
view = reverse("blog:index")
# omit

Other places where you may want to reverse a view's name are generic views. Take this CreateView for example:

class BlogPostCreateView(CreateView):
    model = BlogPost
    form_class = BlogPostForm
    success_url = reverse_lazy("blog:published")

Here I override success_url to redirect users to a view named "blog:published". The problem is that our virtual host middleware effectively makes the namespace redundant.

When we set request.urlconf in the middleware to a single URL conf, that configuration is the only group of paths loaded for the request. The tests will continue to work with "blog:something", since they call reverse outside of a request, but any reference to this namespace made while handling a request will be invalid.

You'll see a lot of django.urls.exceptions.NoReverseMatch: 'blog' is not a registered namespace. The solution is to use just the view name, without the namespace prefix:

class BlogPostCreateView(CreateView):
    model = BlogPost
    form_class = BlogPostForm
    success_url = reverse_lazy("published")

You'll get the same error in a template with the url tag. So instead of:

<form action="{% url "blog:post_create" %}" method="post">
<!-- omit -->
</form>

do:

<form action="{% url "post_create" %}" method="post">
<!-- omit -->
</form>

 

Wrapping up

In this tutorial you learned how to create a Django middleware for handling multiple domain names in the same Django project. We created two applications, blog, and links.

Any call to www.example-a.dev is handled by blog, while calls to www.example-b.dev and www.example-c.dev go to links and are not allowed to reach the blog app.

This approach works fine for small to medium projects when you don't want to touch any non-Django server side configuration. The same outcome of course is attainable with Nginx and multiple instances of Django. Pick your own ride!

As an exercise, try to add a couple of paths to the links app, with some tests.

Thanks for reading and stay tuned on this blog!

Posted 2023-07-20 15:18 by 花生与酒