Follow

Why do developers make stupid database queries

@alvarezp
if (user not in foo.filter_bar(User.objects.all()))

@wolf480pl Usually because they have a mismatch between their mental model of how the database system works, and how it actually works.

@jankoekepan hm.. like thinking fetching all users from database is faster than fetching one user?

@wolf480pl

It can be faster, with some big ifs attached: what's the ratio of internal performance to external IO? Is the data stored in a format that rewards simple streaming of unfiltered datasets with minimal interference? And so on.

Also, it depends on the developer's other purposes: is it part of a function that might be called many times to fetch multiple users, thus motivating fetching lots and caching the data once versus fifty individual calls?

And sometimes they're just wrong.

@jankoekepan @wolf480pl

> sometimes they're just wrong

or it could be throw-away code that wasn't thrown away after all (bad code lives forever)

that example will become unwieldy fast when you have many accounts and you fetch the whole table into memory;

I've seen things, terrible things:
* "let's fetch the whole product list and pick 5 random products in code to show as `related`" (this was a magento site, ended up doing like 5000 queries per product page, they were asking me why is the database so slow - the database was taking it like a champ, incidentally, but it was like 1ms away on a different server)
* "let's store all objects we fetch from the db in memcache and fetch them one by one to generate some graphs because latency is zero" - they're still not able to have memcached on a different server, or to have 2 application servers 5 years later...
* many, many django bulk processing scripts contining "for thing in Thing.objects.all()" - Python database drivers tend to load the whole result set in memory; if your table is like 1-2 GB you may get away with it, when it's 100GB, not so much; postgres drivers are especially terrible at this
* note written in all caps in some php configuration file saying something along the lines of "do not delete this, it's a function written by X, we don't know why it only works here and it's 2 days to black fucking friday" - that commit was 3 years old

I've had my share of clever queries that had the db server grinding for 30 minutes, but I learned to be careful about these things pretty early on; some people never seem to learn, tho.

@m0n5t3r @jankoekepan

>or it could be throw-away

that code passed code review by 2 reviewers, including the project lead

@wolf480pl @jankoekepan heh, I've missed things that were in a large commit, or it might be a case of "we can live with this because the users table won't ever have more than 10 rows" - when numbers are small enough you can get away with lots of things :)

@jankoekepan
>is it part of a function that might be called many times to fetch multiple users

No, it's called only once per request.

>Is the data stored in a format that rewards simple streaming?

No, it's a good old relational database, and then they need to do other queries to filter out users who shouldn't be visible.

Then they check if the user specified in URL is in the set of visible users.

Sign in to participate in the conversation
Mastodon

The social network of the future: No ads, no corporate surveillance, ethical design, and decentralization! Own your data with Mastodon!