Running tasks with Celery on Heroku guide

An example project and a basic guide showing how to run Django/Celery on Heroku.

Basic requirements

First of all, let's set up a typical Django project. I use virtualenvwrapper for this; any other way of creating a virtual environment works just as well.

$ cd dev
$ mkvirtualenv dch
(dch) $ pip install django
(dch) $ django-admin startproject djheroku
(dch) $ cd djheroku
# Make sure it is working:
(dch) $ ./manage.py runserver

From now on I will assume we are working in a terminal with the (dch) environment active.

Heroku hosting setup

Our project needs to be set up for Heroku's Python support. The docs live HERE, as of the time of writing. Follow them to set up a basic Heroku project. I won't rewrite the official guide here, as it is good enough.

Installing celery

We'll continue from here assuming we have a basic Django dyno on Heroku.
Now let's install Celery and add it to our requirements list (since we have just started, let's simply overwrite requirements.txt):

$ pip install 'celery[redis]'
$ pip freeze > requirements.txt

Let's touch our settings.py adding the following snippet:
And include "djcelery" in the INSTALLED_APPS tuple.
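
As a rough sketch, the resulting INSTALLED_APPS might look like the tuple below. The django.contrib entries are an assumption based on what startproject usually generates; only "djcelery" comes from this guide.

```python
# settings.py (fragment): a sketch, not the project's actual file.
# The django.contrib entries are assumed defaults; "djcelery" is the addition.
INSTALLED_APPS = (
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "djcelery",  # the app this guide asks us to add
)
```

Keeping INSTALLED_APPS a tuple matters here, because the broker settings later extend it with `+=` and another tuple.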

Redis broker

Another option would be a Redis-based broker. AMQP is great, but three connections are barely enough; it's a really tight limitation. The RedisToGo addon allows for 10 connections, so we may consider using it instead. Both RabbitMQ and Redis brokers are considered stable and fully featured. Let's install a Redis addon (Redis Cloud in the command below) and the Python module for Redis:

$ heroku addons:add rediscloud
Adding rediscloud on happy-holliday-1467... done, v10 (free)
Use `heroku addons:docs rediscloud` to view documentation.

$ echo 'redis==2.10.3' >> requirements.txt
$ pip install redis==2.10.3

Now we need to add the following settings to the Django project's settings.py:

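import os

# Use the Redis Cloud URL from the environment; fall back to the Django
# database transport when the variable is not set (e.g. locally).
BROKER_URL = os.environ.get("REDISCLOUD_URL", "django://")
# Keep retrying the broker connection forever instead of giving up.
BROKER_CONNECTION_MAX_RETRIES = None

CELERY_TASK_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ["json", "msgpack"]
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'

if BROKER_URL == "django://":
    INSTALLED_APPS += ("kombu.transport.django",)

BROKER_TRANSPORT_OPTIONS = {
    "max_connections": 2,
}
# Disable the connection pool; connections are opened and closed per use.
BROKER_POOL_LIMIT = None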

After this is done, we need to make sure REDISCLOUD_URL is set in the Heroku app settings (in the hosting control panel).
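
To check which broker the settings above would pick up, a small helper like the following can be handy. This is only a sketch: describe_broker is a hypothetical name, not part of Celery or Django, and it simply repeats the same environment lookup and fallback as BROKER_URL.

```python
import os
from urllib.parse import urlparse


def describe_broker(env=None):
    """Return scheme, host and port of the broker URL the settings would use.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Same lookup and fallback as BROKER_URL in settings.py.
    url = env.get("REDISCLOUD_URL", "django://")
    parsed = urlparse(url)
    return {"scheme": parsed.scheme, "host": parsed.hostname, "port": parsed.port}
```

Calling describe_broker() in a `heroku run python` session should report a "redis" scheme once the variable is set, and "django" when it is missing. The password is deliberately left out of the result so it does not end up in logs.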

Continue with broker setup

Let's save our progress with a commit. Since the djcelery app has some models, we also need to apply migrations:

$ git add djheroku/settings.py requirements.txt
$ git commit -m 'Add Celery support'
[master 43afd41] Add Celery support
 2 files changed, 46 insertions(+)
$ git push heroku master
...
-----> Installing dependencies with pip
       Installing collected packages: amqp, anyjson, billiard, celery, django-celery, kombu, pytz
...
$ heroku run python manage.py migrate
...

Staying in a free tier with a single dyno

To save money at the start, we can avoid using a second dyno altogether. From the Procfile we'll start a process manager that runs multiple processes for us. This setup can't scale at all (any attempt to scale would give unpredictable results), but we can easily revise it later. The only issue is that, since this will be the web dyno, it will be killed ("sleeping" in Heroku terms) if no requests arrive within one hour. Since we have a scheduler, we could probably work around this limitation by sending an HTTP request to ourselves. Let's assume we've added a Celery worker to the Procfile using one of the above methods. In this tutorial I'll stick to a Python-only process manager, Honcho.

$ echo 'honcho==1.0.1' >> requirements.txt
$ pip install honcho==1.0.1

We'll need the workers declared in a Procfile. Then we'll swap that file for a "proxy" one:

$ git mv Procfile Procfile.real

And change Procfile.real to contain:

web: gunicorn djheroku.wsgi --log-file -
worker: python manage.py celery worker --loglevel=info
beat: python manage.py celery beat --loglevel=info
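
To make the Procfile format concrete, here is a minimal sketch of how a process manager could read such a file: one "name: command" pair per line. This is not Honcho's actual implementation; parse_procfile is a hypothetical helper written only for illustration, and the sample text repeats two of the entries above.

```python
def parse_procfile(text):
    """Map each process name to its command, one "name: command" pair per line."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so commands may contain colons.
        name, _, command = line.partition(":")
        processes[name.strip()] = command.strip()
    return processes


PROCFILE_REAL = """\
worker: python manage.py celery worker --loglevel=info
beat: python manage.py celery beat --loglevel=info
"""

for name, command in parse_procfile(PROCFILE_REAL).items():
    print(name, "->", command)
```

Each named entry becomes one child process, which is why a single web dyno can host the web server, the worker and beat at once.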

The original Procfile (the one Heroku executes) should now look like this:

web: env > .env; env PYTHONUNBUFFERED=true honcho start -f Procfile.real 2>&1

Now we should commit, push to Heroku, and connect to the Heroku logs to check that everything went well:

$ heroku logs -t | cut -c34-

Another downside of this hack is messy logging. But that's the price of a "free" compromise.
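
The idea of sending an HTTP request to ourselves, mentioned above, could be sketched as a plain function like this one. It is an assumption-laden illustration: ping is a hypothetical name, the URL would be your own app's address, and whether this reliably keeps the dyno awake is not verified here.

```python
import urllib.request


def ping(url, opener=urllib.request.urlopen, timeout=10):
    """Send a GET request to `url` and return the HTTP status code.

    `opener` is injectable so the function can be tried without network access.
    """
    with opener(url, timeout=timeout) as response:
        return response.status
```

Wrapped in a Celery task and run periodically by beat, a call such as ping("https://your-app.example.com/") (a placeholder URL) would generate the occasional request the web dyno needs.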

Celery essentials

Now that we're done with the setup, let's write some tasks and the code that manages them. First of all, let's create celery.py, which defines the Celery app for our project:

import os
from celery import Celery
from django.conf import settings


# Lets the celery command line program know where project settings are.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'djheroku.settings')

# Creates the instance of the Celery app.
app = Celery('djheroku')

# Read the configuration (BROKER_URL, CELERY_* and so on) from the Django settings.
app.config_from_object('django.conf:settings')

# Set up autodiscovery of tasks in the INSTALLED_APPS.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

if __name__ == '__main__':
    app.start()

Now we have a module that lets us run Celery on the Heroku dyno. The next step is to add a sample task in a file called tasks.py. It will be auto-discovered by Celery:

from celery import shared_task


@shared_task
def echoe():
    """
    A simple task that prints a Hello World! text to the Celery worker console.
    """
    print('Hello World!')
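
Since beat is part of this setup, a schedule entry for the task might look like the sketch below. It assumes the Celery 3.1-style setting names used in the settings earlier; the entry name and the task path "myapp.tasks.echoe" are hypothetical and depend on which app holds tasks.py.

```python
from datetime import timedelta

# settings.py (fragment): a sketch of a periodic task entry.
# "myapp.tasks.echoe" is a placeholder; use the real dotted path of your task.
CELERYBEAT_SCHEDULE = {
    "echo-every-30-minutes": {
        "task": "myapp.tasks.echoe",
        "schedule": timedelta(minutes=30),
    },
}
```

With the database scheduler configured earlier, periodic tasks can also be managed through the database, so treat this as one option rather than the required approach.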

Testing stuff

We can now test this locally by running our server in one terminal:

$ ./manage.py runserver 0.0.0.0:8000

And, in a second terminal, a Celery worker with a built-in beat process for debugging purposes:

$ celery worker --loglevel=info --beat

Together, these two terminals emulate the Heroku environment that we have just created.
Now we can trigger our sample script:

$ celery call echoe

This triggers the task and puts it into the Celery queue. After some time we can observe its execution in the Celery worker console.
That's basically it.
Time to commit our changes and push to heroku.

Here is a git repository: https://github.com/garmoncheg/djheroku
