
Friday, 8 August 2025

ClickHouse Issue: "Max Query Size Exceeded" when updating large data

 

Hello readers 👋

I have been wrestling with a peculiar issue this week while using the ClickHouse database to manage image data stored as Base64 strings (I know storing images in a database is not recommended, but for a particular reason I have to).
I could happily INSERT multi-hundred KB images without a hitch using the clickhouse_driver in Python. 

It was working fine and I was happy 😀. 

Then came the need to update an existing image. Suddenly, for images smaller than what I could initially insert, I started getting the "Max query size exceeded" error. 🤔

Inserts were successful with images of around 500 KB, but updates with smaller images, say 250 KB, were failing with this error.

After banging my head for a while, I realized it’s a classic case of how ClickHouse processes different types of queries under the hood.


The Plot Twist: INSERT vs. ALTER TABLE UPDATE

The key difference lies in how ClickHouse processes INSERT and ALTER TABLE UPDATE statements.

When you INSERT data using the Python driver, the driver is pretty smart. It sends the relatively short INSERT command first, and then it efficiently streams the actual data to the server in a separate binary format. 

The server's max_query_size setting primarily applies to the SQL command text itself, not to this separate data stream. This is why those large inserts sailed through and my roughly 500 KB images were inserted successfully.
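To make this concrete, here is a minimal sketch of that insert path with clickhouse_driver (table and column names are hypothetical): the SQL text the server checks stays tiny, while the image payload travels as row data.

import base64

from clickhouse_driver import Client

client = Client(host='localhost')

# Read an image and Base64-encode it (roughly 500 KB of text in my case).
with open('photo.jpg', 'rb') as f:
    image_base64 = base64.b64encode(f.read()).decode('ascii')

# The query text measured against max_query_size is just this short INSERT
# statement; the row values are sent separately in the native protocol.
client.execute(
    'INSERT INTO your_images_table (image_id, image_base64) VALUES',
    [('some_unique_id', image_base64)],
)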

Now, let's come to ALTER TABLE UPDATE. In ClickHouse, an UPDATE is actually a mutation performed via an ALTER TABLE command. 

The SQL looks something like this:


ALTER TABLE your_images_table
UPDATE image_base64 = 'my_very_long_base64_string'
WHERE image_id = 'some_unique_id';


The important point here is that the entire Base64 encoded string of your new image gets embedded directly into the SQL query text itself!!! 

When your Python driver constructs this command, it creates one massive string. ClickHouse then checks the size of this entire string against max_query_size, which defaults to 256 KiB (262,144 bytes). Base64 encoding inflates data by roughly a third, so even a 250 KB image becomes about 333 KB of text and easily pushes the query string over the limit. So, boom! "Max query size exceeded." 💥

So it's not that the new image data itself is too big; it's the SQL query string containing that data that exceeds the limit.
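For comparison, here is a sketch of the update path that trips the limit (same hypothetical names as above). As far as I understand, clickhouse_driver substitutes parameters into the query text on the client side for non-INSERT statements, so even a parameterized ALTER ends up as one huge SQL string:

import base64

from clickhouse_driver import Client

client = Client(host='localhost')

with open('new_photo.jpg', 'rb') as f:
    new_image_base64 = base64.b64encode(f.read()).decode('ascii')

# There is no separate data stream here: the Base64 payload lands inside the
# SQL text, and that whole string (~333 KB for a 250 KB image) is what the
# server measures against max_query_size.
client.execute(
    "ALTER TABLE your_images_table "
    "UPDATE image_base64 = %(img)s "
    "WHERE image_id = 'some_unique_id'",
    {'img': new_image_base64},
)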


The Fix: 

We've got a couple of ways to tackle this, one more "ClickHouse way" than the other.

Option 1: Embrace ReplacingMergeTree (The Recommended, ClickHouse Way)

ClickHouse, being an analytical database, isn't really designed for frequent in-place updates. A more ClickHouse-native approach is to use its table engines, specifically ReplacingMergeTree.

The core idea is to treat updates as new insertions.

You insert a new version of the row with the updated image data. ReplacingMergeTree then intelligently deduplicates rows based on a specified versioning column during its background merge processes, keeping only the latest version.

Here's how it's done:

  1. Recreate your table using ReplacingMergeTree, including a versioning column (like a timestamp):


    CREATE TABLE images_replacing (
        image_id String,
        image_base64 String,
        updated_at DateTime
    ) ENGINE = ReplacingMergeTree(updated_at)
    ORDER BY image_id;


  2. To "update" an image, simply INSERT a new row with the same image_id and a newer updated_at timestamp. You can continue using your efficient INSERT mechanism (see the Python sketch after this list).


  3. When querying: to get the latest version of a row, use the FINAL modifier:


    SELECT * FROM images_replacing FINAL WHERE image_id = 'some_image_id';
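Putting steps 2 and 3 together with clickhouse_driver might look like the sketch below (client setup is the same as in the earlier insert example; names follow the table definition above):

import base64
from datetime import datetime

from clickhouse_driver import Client

client = Client(host='localhost')

with open('new_photo.jpg', 'rb') as f:
    new_image_base64 = base64.b64encode(f.read()).decode('ascii')

# Step 2: "update" by inserting a new version of the row with a newer timestamp.
client.execute(
    'INSERT INTO images_replacing (image_id, image_base64, updated_at) VALUES',
    [('some_image_id', new_image_base64, datetime.now())],
)

# Step 3: read back only the latest version; FINAL collapses older duplicates.
rows = client.execute(
    "SELECT image_base64 FROM images_replacing FINAL WHERE image_id = 'some_image_id'"
)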


This method aligns perfectly with ClickHouse's strengths and avoids the max_query_size limitation altogether without compromising speed.


Option 2: Increase max_query_size (Use with Caution)

As a quick (but generally less recommended) fix, you can increase the max_query_size setting on your ClickHouse server. While this might solve your immediate problem, it's more of a temporary solution. Allowing excessively large queries can potentially impact server performance down the line.
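If you do go this route, one option is to raise the limit in the user's profile in the server configuration, or, as sketched below, pass a larger max_query_size as a client-level setting with clickhouse_driver (the 512 KiB value is just an example; I'm assuming here that the setting is honoured when sent along with the query over the native protocol):

from clickhouse_driver import Client

# 524288 bytes = 512 KiB; pick a value that comfortably covers your largest
# Base64 payload plus the rest of the ALTER statement.
client = Client(host='localhost', settings={'max_query_size': 524288})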


Happy querying! 🚀