Running tasks with Celery on Heroku guide

An example project and a basic guide showing how to run Django/Celery on Heroku.

Basic requirements

First of all, let's set up a typical Django project. We will use virtualenvwrapper for that. Any other method of creating a virtual environment works too; I just prefer this one.
$ cd dev
$ mkvirtualenv dch
(dch) $ pip install django
(dch) $ django-admin startproject djheroku
(dch) $ cd djheroku
# Make sure it is working:
(dch) $ ./manage.py runserver
From now on I will assume we are working in a terminal with this (dch) environment activated.

Heroku hosting setup

We need our project set up for Heroku's Python server. The docs live HERE, as of the time of writing this guide. Follow them to set up a basic Heroku project. I will not rewrite the official guide here, as it is good enough.

Installing celery

Assuming we have a basic Django dyno at Heroku, we will continue from here.
Now let's install Celery and add it to our requirements list (since we have just started, let's simply overwrite requirements.txt here):
$ pip install 'celery[redis]'
$ pip freeze > requirements.txt
Let's touch our settings.py adding the following snippet:
And include "djcelery" in the INSTALLED_APPS tuple.

Redis broker 

Another option would be a Redis-based broker. AMQP is great, but three connections are barely enough - it's a really tight limitation. The RedisToGo addon allows for 10 connections, so we may consider using it instead. Both RabbitMQ and Redis brokers are considered stable and fully featured. Let's install the addon and the Python module for Redis:
$ heroku addons:add rediscloud
Adding rediscloud on happy-holliday-1467... done, v10 (free)
Use `heroku addons:docs rediscloud` to view documentation.

$ echo 'redis==2.10.3' >> requirements.txt
$ pip install redis==2.10.3
Now we need to add the following settings to the Django project:
BROKER_URL = os.environ.get("REDISCLOUD_URL", "django://")
BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_MAX_RETRIES = None

CELERY_TASK_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ["json", "msgpack"]
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'

if BROKER_URL == "django://":
    INSTALLED_APPS += ("kombu.transport.django",)

BROKER_TRANSPORT_OPTIONS = {
    "max_connections": 2,
}
# Overrides the BROKER_POOL_LIMIT = 1 above: None disables the connection pool.
BROKER_POOL_LIMIT = None
After this is done, we need to set REDISCLOUD_URL in the Heroku app settings (in the hosting control panel).
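
The fallback in the settings can be tried out in isolation. The sketch below is only an illustration: resolve_broker_url and uses_django_transport are hypothetical helpers (they are not part of Celery or Django) that mirror the os.environ.get("REDISCLOUD_URL", "django://") lookup, and the Redis URL is made up.

```python
import os


def resolve_broker_url(environ=None):
    """Return the broker URL the settings snippet would pick.

    Hypothetical helper for illustration: it mirrors
    os.environ.get("REDISCLOUD_URL", "django://") from settings.py.
    """
    if environ is None:
        environ = os.environ
    return environ.get("REDISCLOUD_URL", "django://")


def uses_django_transport(broker_url):
    """True when the local fallback (Django database transport) is selected."""
    return broker_url == "django://"


# An environment where REDISCLOUD_URL is set (the value here is made up).
heroku_env = {"REDISCLOUD_URL": "redis://user:secret@example.com:6379"}
# A local environment where the variable is absent.
local_env = {}

heroku_url = resolve_broker_url(heroku_env)
local_url = resolve_broker_url(local_env)
```

With the variable set, the Redis URL wins; without it, the settings fall back to "django://" and the kombu.transport.django app gets added to INSTALLED_APPS.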

Continue with broker setup 

Let's save our progress by making a commit. And since the djcelery app has some models, let's also apply migrations:
$ git add djheroku/settings.py requirements.txt
$ git commit -m 'Add Celery support'
[master 43afd41] Add Celery support
 2 files changed, 46 insertions(+)
$ git push heroku master
...
-----> Installing dependencies with pip
       Installing collected packages: amqp, anyjson, billiard, celery, django-celery, kombu, pytz
...
$ heroku run python manage.py migrate
...

Staying in a free tier with a single dyno

To save money at the start, we can avoid using a second dyno at all. From the Procfile we'll start a process manager that runs multiple processes for us. This can't scale at all (any attempt to scale would give unpredictable results), but we can easily revise it later. The only issue is that, since this will be the web dyno, it will be killed ("sleeping" in Heroku terms) if no requests happen within one hour. Since we have a scheduler, we could probably work around this limitation by sending an HTTP request to ourselves, though. Let's assume we've added the Celery worker to the Procfile using one of the above methods. In this tutorial I'll stick to a Python-only process manager, Honcho.
$ echo 'honcho==1.0.1' >> requirements.txt
$ pip install honcho==1.0.1
We'll need the workers declared in a Procfile. Then we'll replace that file with a "proxy" one:
$ git mv Procfile Procfile.real
And change Procfile.real to:
web: gunicorn djheroku.wsgi --log-file -
worker: python manage.py celery worker --loglevel=info
beat: python manage.py celery beat --loglevel=info
The original Procfile (the one that Heroku executes) should look like this:
web: env > .env; env PYTHONUNBUFFERED=true honcho start -f Procfile.real 2>&1
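
To make the two-file arrangement a bit more concrete, here is a rough sketch of how "name: command" lines map to processes. parse_procfile is a hypothetical helper written for this post; it is not how Honcho or Heroku actually parse the file, and the wsgi module name assumes the djheroku project created earlier.

```python
def parse_procfile(text):
    """Map process names to commands for 'name: command' lines.

    Simplified illustration only: blank lines and lines without a
    colon are skipped.
    """
    processes = {}
    for line in text.splitlines():
        name, sep, command = line.partition(":")
        if not sep or not name.strip():
            continue
        processes[name.strip()] = command.strip()
    return processes


# The "proxy" Procfile that Heroku executes.
PROCFILE = (
    "web: env > .env; env PYTHONUNBUFFERED=true "
    "honcho start -f Procfile.real 2>&1\n"
)
# The real process list that Honcho starts inside the web dyno.
PROCFILE_REAL = (
    "web: gunicorn djheroku.wsgi --log-file -\n"  # assumed module name
    "worker: python manage.py celery worker --loglevel=info\n"
    "beat: python manage.py celery beat --loglevel=info\n"
)

heroku_sees = parse_procfile(PROCFILE)
honcho_sees = parse_procfile(PROCFILE_REAL)
```

Heroku only ever sees a single "web" entry, while Honcho, started from that entry, gets the three real processes.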
Now we should commit, push to Heroku and watch the Heroku logs to check that everything went well:
$ heroku logs -t | cut -c34-
Another downside of this hack is messy logging. But that's the price of a "free" compromise.
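
The "HTTP request to ourselves" workaround mentioned earlier in this section could look roughly like the sketch below. It is not code from the example project: fetch_status is a hypothetical helper, Python 3 is assumed, and the URL you would pass in (your own app's address) is up to you. The opener argument exists only so the function can be tried without touching the network.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(url, opener=urlopen, timeout=10):
    """Request `url` and return the HTTP status code, or None if unreachable.

    `opener` is injectable so the function can be tried without a network.
    """
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (URLError, OSError):
        # No HTTP answer at all (DNS failure, refused connection, timeout).
        return None
    try:
        return response.getcode()
    finally:
        response.close()
```

Wrapped in a Celery task and run by the beat process more often than once an hour, a call like this should keep requests arriving at the web dyno.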

Celery essentials

Now that we're done with the setup, let's actually write some tasks and their management code. First of all, let's create celery.py, which defines the Celery application for the project:
import os
from celery import Celery
from django.conf import settings


# Lets the celery command line program know where project settings are.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'djheroku.settings')

# Creates the instance of the Celery app.
app = Celery('djheroku')

# Read the configuration (BROKER_URL, CELERY_* and so on) from Django settings.
app.config_from_object('django.conf:settings')

# Set up autodiscovery of tasks in the INSTALLED_APPS.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

if __name__ == '__main__':
    app.start()
Now we have a module that lets the celery command run on the Heroku dyno. The next step is to add a sample task in a file called tasks.py. It will be auto-discovered by Celery:
from celery import task


@task()
def echoe():
    """
    A simple task that echoes a Hello World! text to celery console.
    """
    print('Hello World!')
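
Since the setup runs a beat process, the sample task could also be run on a schedule. The fragment below is only a sketch of a Celery 3.1-style CELERYBEAT_SCHEDULE entry for settings.py. The entry name, the 30-minute interval and the task path "myapp.tasks.echoe" are assumptions: the real path depends on which app holds tasks.py.

```python
from datetime import timedelta

# Hypothetical addition to settings.py (Celery 3.1-style setting name).
CELERYBEAT_SCHEDULE = {
    "echo-every-30-minutes": {
        # Assumed task path: depends on the app that holds tasks.py.
        "task": "myapp.tasks.echoe",
        # How often beat should send the task to the queue.
        "schedule": timedelta(minutes=30),
    },
}
```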

Testing stuff

We can now test this locally by running our server in one terminal instance:
$ ./manage.py runserver 0.0.0.0:8000
And, in another terminal, a sample Celery worker with a built-in beat process for debugging purposes:
$ celery worker --loglevel=info --beat
Together, these two terminal instances emulate the Heroku environment that we have just created.
Now we can trigger our sample task:
$ celery call echoe
This will trigger the task and put it into the Celery queue. After some time we can observe its execution in the Celery worker console.
That's basically it.
Time to commit our changes and push to heroku.

Here is a git repository: https://github.com/garmoncheg/djheroku

