Skip to main content

SQLAlchemy (Flask) count model instances by unique values

One comes to a task that has to do with counting items in a database. We will describe the right approach here. Despite being so obvious I did not find much of the docs for junior developers to watch and learn. Here is a sample task and solution:
Let's assume we have a model like so:
class Cycle(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    object_id = db.Column(db.String, nullable=False)
Sample date populated into it will be:
{
   id: 1,
   object_id: 'unique1'
}, {
   id: 2,
   object_id: 'unique1'
}, {
   id: 3,
   object_id: 'unique2'
}, {
   id: 4,
   object_id: 'unique2'
}, {
   id: 5,
   object_id: 'unique2'
}, {
   id: 6,
   object_id: 'unique3'
}
We need to count unique model instances with same object_id. To achieve this relatively simple task one would go straightforward. E.g. Fetch all the Cycle instances with a simple query and then iterate through them via for cycle. Looks like so:
objects = Cycle.query.all()
cycles_unique = {}
for object in objects:
    if object.object_id in cycles_unique.items():
        cycles_unique[object.object_id] += 1
    else:
        cycles_unique[object.object_id] = 1

print(cycles_unique)
The output would be like so:
{
    'unique1': 2, 
    'unique2': 3, 
    'unique3': 1
}
Job done. All ok. HOWEVER! Imagine this database column would grow to big sizes. E.g. Those Cycles are 1 000 000 records. It is not a rare situation in coding world. Called - BigData ;)

We have a problem with cycle like so. Flask or whatever gear i used behind those models would need to load all those 1 million records into memory and then process them one by one. Even with fast server it will be not an easy task in the scope of rendering a webpage. User might wait quite long for the answer.

Making it faster means doing this count on a DB side. SQL databases are built specially for tasks like so and will do much much better at this. Incomparable to code written above.

Proper implementation would be by using SQL syntax and handling on a DB side. Code will use a query that counts unique records with that one particular field. Looks like so:
from sqlalchemy import func
from sqlalchemy.sql import label

session.query(
        label('object_id', Crop.object_id),  func.count(Cycle.object_id)
).group_by(Cycle.object_id).all()
This will output the data similar to the above. However output is with numbers of Long type (Python Long).
[('unique1', 2L), ('unique2', 3L), ('unique3', 1L)]
This is basically how to solve this task properly with SQL Alchemy and Flask models.
Hope it helps someone to write more reliable and better code.

Comments

Popular posts from this blog

Pretty git Log

SO you dislike git log output in console like me and do not use it... Because it looks like so: How about this one? It's quite easy... Just type: git log - - graph - - pretty = format : '%Cred%h%Creset -%C ( yellow ) %d%Creset %s %Cgreen ( %cr) %C ( bold blue ) <%an>%Creset' - - abbrev - commit - - It may be hard to enter such an easy command every time. Let's make an alias instead... Copypaste this to your terminal: git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --" And use simple command to see this pretty log instead: git lg Now in case you want to see lines that changed use: git lg - p In order for this command to work remove  the -- from the end of the alias. May the code be with you! NOTE: this article is a rewritten copy of  http://coderwall.com/p/euwpig?i=3&p=1&t=git   and have b...

Django: Resetting Passwords (with internal tools)

I have had a task recently. It was about adding a forms/mechanism for resetting a password in our Django based project. We have had our own registration system ongoing... It's a corporate sector project. So you can not go and register yourself. Admins (probably via LDAP sync) will register your email/login in system. So you have to go there and only set yourself a password. For security reasons you can not register. One word. First I've tried to find standart decision. From reviewed by me were: django-registration and django password-reset . These are nice tools to install and give it a go. But I've needed a more complex decision. And the idea was that own bicycle is always better. So I've thought of django admin and that it has all the things you need to do this yourself in no time. (Actually it's django.contrib.auth part of django, but used out of the box in Admin UI) You can find views you need for this in there. they are: password_reset password_reset_...

Time Capsule for $25

The real article name might be something like:  Configuring Raspbery Pi to serve like a Time Capsule with Netatalk 3.0 for Mountain Lion.  But it's too long ;) Here I will describe the process of using Raspberry Pi like a Time Machine in my network. To be able to backup your MAC's remotely (Like it would be NAS of some kind). It assumes you have a Raspberry Pi and have installed a Raspbian there and have a ssh connection, or somehow having access to it's console. Refer to my previous article for details . Now that we have a Pi that is ready for action let's animate it. So to make it suit you as a Time Capsule (NAS) for your MAC's you need to do those basic steps: - connect and configure USB hard drive(s) - install support of HFS+ filesystem to be able to use MAC's native filesystem - make mount (auto-mount on boot) of your hard drive - install Avahi and Netatalk demons - configure Netatalk daemon to make it all serve as a Time Machine - configure ...