
Alastrantia
Scratcher
20 posts

Scratch Asset Server MD5 Hash Collision Issue

Hi everyone, I have some questions about the Scratch asset server (https://assets.scratch.mit.edu).

For those who don't know: MD5 is an old hash function with known flaws. You can read about it here: https://en.wikipedia.org/wiki/MD5

From what I've noticed, assets are not tied to any project. Every time you click on “Save”, all assets you created or modified are first hashed with MD5 client-side, and each is then sent in a separate POST request to https://assets.scratch.mit.edu/{md5-hash}.{fileformat png, svg or wav} with authentication. However, they then become accessible via that hash to anyone, without any authentication. This does make sense and reduces the amount of storage needed on the server, because the same file is never saved twice across all of Scratch.
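
As a rough illustration of the naming scheme described above, here is a minimal Python sketch that hashes some asset bytes with MD5 and builds the URL the asset would be stored under. The function name `asset_url` is made up for this example, and the sketch only computes the address; it does not send any request or handle authentication.

```python
import hashlib

ASSET_HOST = "https://assets.scratch.mit.edu"  # host mentioned in the post


def asset_url(data: bytes, file_format: str) -> str:
    """Return the content-addressed URL for an asset (hypothetical helper)."""
    digest = hashlib.md5(data).hexdigest()  # 32 hex characters
    return f"{ASSET_HOST}/{digest}.{file_format}"


# The same bytes always map to the same URL, whichever project they come from.
print(asset_url(b"hello", "svg"))
```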

However, the problem with relying only on MD5 hashes to identify individual assets is that MD5 is not collision-resistant: two different files can result in the same hash, and such collisions can be produced on purpose.

If someone tries to upload a file that has the same hash as a different, already stored file, the server does not compare the file contents to decide whether it is already stored; it only checks the MD5 hash. Since the hash is equal for both, the server thinks the file is already stored and changes nothing, so the old file remains available under that hash.
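
The behaviour described above can be modelled with a small first-write-wins store. This is only a toy model, not Scratch's actual server code: `AssetStore` and `weak_hash` are names invented here, and `weak_hash` (sum of bytes modulo 256) stands in for MD5 purely so that a collision is easy to produce.

```python
from typing import Callable, Dict


def weak_hash(data: bytes) -> str:
    """Deliberately weak stand-in for MD5: sum of bytes modulo 256."""
    return f"{sum(data) % 256:02x}"


class AssetStore:
    """Toy content-addressed store where the first upload for a key wins."""

    def __init__(self, hash_fn: Callable[[bytes], str]) -> None:
        self._hash_fn = hash_fn
        self._blobs: Dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        key = self._hash_fn(data)
        # Only the key is checked, never the contents, so a colliding
        # upload is silently treated as "already stored".
        self._blobs.setdefault(key, data)
        return key

    def download(self, key: str) -> bytes:
        return self._blobs[key]


store = AssetStore(weak_hash)
first_key = store.upload(b"ab")
second_key = store.upload(b"ba")   # different bytes, same weak hash
print(first_key == second_key)      # True
print(store.download(second_key))   # b'ab'
```

The second upload gets back a valid-looking key, but downloading that key returns the first file's bytes.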

This is both good and bad. It prevents malicious actors from creating a malformed Scratch Cat file with the same hash as the real Scratch Cat and replacing the default Scratch Cat everywhere. On the other hand, it is simply not possible to upload two different files with the same hash: the first one will remain forever.

I made a screen recording demonstrating this here: https://uploadnow.io/files/VWrFxZB (I don't know why saving the project took so long in the video, sorry).

Will the MD5 hashes ever be replaced with a more modern hash that is more resistant to collisions?
Have any of you noticed this? What do you think?

Sorry if anything is unclear (I'm reading through this again); just ask.

Last edited by Alastrantia (Sept. 8, 2025 14:31:24)

ajskateboarder
Scratcher
1000+ posts

I can't seem to access your video, but yeah, it's an issue. SHA-256 would be orders of magnitude less likely to produce hash collisions, and that function has also been around for a while
davidtheplatform
Scratcher
500+ posts

if there are 1 billion projects and each project has 1,000 unique assets (1 trillion assets in total), the chance of a hash collision happening randomly is roughly 1.5*10^-15, or about one in 7*10^14 [1]. Basically, a random hash collision isn’t going to happen.
I don’t see a problem with the current setup. Scratch handles collisions correctly, which it should do regardless of the algorithm.

[1] this is the birthday problem with d=2^128 (# of md5 outputs) and n=1 trillion
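
For anyone who wants to check the estimate, here is a small Python sketch of the usual birthday-problem approximation, p ≈ 1 - exp(-n(n-1)/(2d)). The function name is made up for this example, and the inputs are just the numbers assumed in this post.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one random collision among n hashes."""
    d = 2 ** bits  # number of possible hash outputs
    exponent = -n * (n - 1) / (2 * d)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)


# n = 1 trillion assets, d = 2^128 possible MD5 outputs
p = collision_probability(10**12, 128)
print(f"{p:.2e}")
```

Note that this only covers accidental collisions between unrelated files; it says nothing about collisions that someone constructs on purpose.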
Alastrantia
Scratcher
20 posts

ajskateboarder wrote:

I can't seem to access your video, but yeah, it's an issue. SHA-256 would be orders of magnitude less likely to produce hash collisions, and that function has also been around for a while
here, this one should work: https://www.veed.io/view/d8d8f780-6050-43df-8e8d-86186eaa92e2?panel=share
Alastrantia
Scratcher
20 posts

davidtheplatform wrote:

if there are 1 billion projects and each project has 1,000 unique assets (1 trillion assets in total), the chance of a hash collision happening randomly is roughly 1.5*10^-15, or about one in 7*10^14 [1]. Basically, a random hash collision isn’t going to happen.
I don’t see a problem with the current setup. Scratch handles collisions correctly, which it should do regardless of the algorithm.

[1] this is the birthday problem with d=2^128 (# of md5 outputs) and n=1 trillion
maybe not a random one, but someone could cause hash collisions on purpose to disable uploading another asset, like some sort of DoS.
nembence
Scratcher
500+ posts

Alastrantia wrote:

davidtheplatform wrote:

if there are 1 billion projects and each project has 1,000 unique assets (1 trillion assets in total), the chance of a hash collision happening randomly is roughly 1.5*10^-15, or about one in 7*10^14 [1]. Basically, a random hash collision isn’t going to happen.
I don’t see a problem with the current setup. Scratch handles collisions correctly, which it should do regardless of the algorithm.

[1] this is the birthday problem with d=2^128 (# of md5 outputs) and n=1 trillion
maybe not a random one, but someone could cause hash collisions on purpose to disable uploading another asset, like some sort of DoS.
People can always make small changes to it (hiding a small random text in an SVG, slightly changing the color of a few random pixels of a PNG, upscaling the image, etc.). It's impossible to upload collisions for all possible variants of an image.
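
To illustrate the point about small changes, here is a minimal Python sketch that appends a random XML comment to an SVG, which changes its MD5 hash without changing what is drawn. The helper name `with_random_comment` and the sample SVG are made up for this example.

```python
import hashlib
import secrets


def with_random_comment(svg: bytes) -> bytes:
    """Append an XML comment holding a random token (hypothetical helper)."""
    token = secrets.token_hex(8)
    return svg + f"<!-- {token} -->".encode("ascii")


original = b'<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'
variant = with_random_comment(original)

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(variant).hexdigest())  # differs from the line above
```

Because the token is random, an attacker cannot know in advance which variant will be uploaded, so there is nothing specific to collide with.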
Jeffalo
Scratcher
1000+ posts

i've experimented with this before, and a theoretical attack is possible, but what's the possible attack scenario? someone predicts you'll upload a file and uploads something bad with the same hash first? my understanding is that these kinds of collisions practically never will occur by accident.
novice27b
Scratcher
1000+ posts

I made a PoC project a while back exploiting this, if someone can find it… (I no longer remember where, but I posted about it in this forum)

Basically, the first-uploaded file takes precedence. So if you send someone an sb3 with colliding assets, they can view it offline, but the assets will change once they upload the project to the Scratch site (assuming you previously uploaded the other colliding assets).

Last edited by novice27b (Sept. 10, 2025 23:33:29)
