Discuss Scratch
- Discussion Forums
- » Advanced Topics
- » Scratch Asset Server MD5 Hash Collision Issue
- Alastrantia
-
Scratcher
20 posts
Scratch Asset Server MD5 Hash Collision Issue
Hi everyone, I have some questions about the Scratch asset server (https://assets.scratch.mit.edu).
For those who don't know: MD5 is an old hash function with some flaws. You can read about it here: https://en.wikipedia.org/wiki/MD5
From what I've noticed, assets are not tied to any project. Every time you click “Save”, each asset you created or modified is first hashed with MD5 client-side and then sent in a separate authenticated POST request to https://assets.scratch.mit.edu/{md5-hash}.{fileformat png, svg or wav}. Once uploaded, however, the assets are accessible via that hash to anyone, without any authentication. This makes sense and reduces the amount of storage needed on the server, because the same file is never saved twice anywhere on Scratch.
However, the problem with relying only on MD5 hashes to identify individual assets is that MD5's collision resistance is broken: two different files can produce the same hash, and such pairs can be constructed on purpose.
If someone tries to upload a file that has the same hash as another, different file, the server does not compare the file contents to decide whether it's already stored; it only checks the MD5 hash. Since the hash is equal for both, the server thinks the file is already stored and doesn't change anything, so the old file stays available under that hash.
This is both good and bad. It prevents malicious actors from creating a malformed scratchcat file with the same hash as the real scratchcat and replacing the default scratchcat everywhere. On the other hand, it is simply not possible to upload two different files with the same hash; the first one will remain forever.
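To make the behavior described above concrete, here is a minimal Python sketch of a content-addressed store where the first upload for a given hash wins. Everything in it is an assumption for illustration: `AssetStore`, `weak_hash`, and `find_collision` are hypothetical names, not Scratch's actual server code, and the deliberately weak hash (the first two hex digits of MD5) only stands in for a real crafted MD5 collision so that the demo runs instantly.

```python
import hashlib
from typing import Callable, Dict, Tuple


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, the kind of identifier the thread describes."""
    return hashlib.md5(data).hexdigest()


def weak_hash(data: bytes) -> str:
    """Deliberately weak stand-in hash (256 possible values).

    Used only so a collision can be found by brute force in this demo.
    """
    return md5_hex(data)[:2]


class AssetStore:
    """Toy content-addressed store: the first upload for a hash wins."""

    def __init__(self, hash_fn: Callable[[bytes], str] = md5_hex) -> None:
        self._hash_fn = hash_fn
        self._blobs: Dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        key = self._hash_fn(data)
        # Only the key is checked, never the contents: if the key is
        # already present, the stored bytes are left untouched.
        self._blobs.setdefault(key, data)
        return key

    def download(self, key: str) -> bytes:
        return self._blobs[key]


def find_collision(hash_fn: Callable[[bytes], str]) -> Tuple[bytes, bytes]:
    """Brute-force two distinct inputs with the same hash (weak hashes only)."""
    seen: Dict[str, bytes] = {}
    i = 0
    while True:
        candidate = f"asset-{i}".encode()
        h = hash_fn(candidate)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate
        i += 1


if __name__ == "__main__":
    store = AssetStore(weak_hash)
    first, second = find_collision(weak_hash)
    key = store.upload(first)
    store.upload(second)  # same key, different bytes: silently ignored
    print(store.download(key) == first)
```

With the default `md5_hex` the same logic applies; the only difference is that finding two colliding inputs requires a dedicated collision tool rather than a brute-force loop.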
I made a screen recording demonstrating this here: https://uploadnow.io/files/VWrFxZB (I don't know why saving the project took so long in the video, sorry).
Will the MD5 hashes ever be replaced with a more modern hash that is more secure against collisions?
Have any of you noticed this? What do you think?
Sorry if anything was unclear (reading through this again), just ask.
Last edited by Alastrantia (Sept. 8, 2025 14:31:24)
- ajskateboarder
-
Scratcher
1000+ posts
Scratch Asset Server MD5 Hash Collision Issue
I can't seem to access your video, but yeah, it's an issue. SHA-256 would be orders of magnitude less likely to produce hash collisions, and that function was also made a while ago.
- davidtheplatform
-
Scratcher
500+ posts
Scratch Asset Server MD5 Hash Collision Issue
if there are 1 billion projects and each project has 1,000 unique assets (= 1 trillion total assets), the chance of a hash collision happening randomly is roughly 1.5*10^-15, or about one in 7*10^14 [1]. Basically, a random hash collision isn’t going to happen.
I don’t see a problem with the current setup. Scratch handles collisions correctly, which it should do regardless of the algorithm.
[1] this is the birthday problem with d=2^128 (# of md5 outputs) and n=1 trillion
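As a rough check of the birthday-problem estimate in [1], the sketch below uses the standard approximation p ≈ 1 − exp(−n(n−1)/(2d)) with n = 1 trillion assets and d = 2^128 possible MD5 outputs. The function name `collision_probability` is hypothetical, and the model assumes uniformly random hashes, so it says nothing about deliberately crafted collisions. For MD5 it evaluates to approximately 1.5e-15; the SHA-256 figure is included only for comparison.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one random collision.

    Birthday-problem approximation: p ~ 1 - exp(-n(n-1) / (2d)),
    where d = 2**bits is the number of possible hash values.
    """
    d = 2 ** bits
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-(n * (n - 1)) / (2 * d))


if __name__ == "__main__":
    n_assets = 10 ** 12  # 1 billion projects x 1,000 unique assets each
    print("MD5     (128-bit):", collision_probability(n_assets, 128))
    print("SHA-256 (256-bit):", collision_probability(n_assets, 256))
```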
- Alastrantia
-
Scratcher
20 posts
Scratch Asset Server MD5 Hash Collision Issue
ajskateboarder wrote: I can't seem to access your video but yeah, it's an issue. sha256 would be orders of magnitude less likely to produce hash collisions, and that function was also made a while ago
Here, this one should work: https://www.veed.io/view/d8d8f780-6050-43df-8e8d-86186eaa92e2?panel=share
- Alastrantia
-
Scratcher
20 posts
Scratch Asset Server MD5 Hash Collision Issue
davidtheplatform wrote: Basically, a random hash collision isn’t going to happen.
Maybe not a random one, but someone could cause hash collisions on purpose to disable uploading another asset, like some sort of DoS.
- nembence
-
Scratcher
1000+ posts
Scratch Asset Server MD5 Hash Collision Issue
Alastrantia wrote: maybe not a random one, but someone could cause hash collisions on purpose to disable uploading another asset, like some sort of DoS.
People can always make small changes to it (hiding a small random text in an SVG, slightly changing the color of a few random pixels of a PNG, upscaling the image, etc.). It's impossible to upload collisions for all possible variants of an image.
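The point about small changes can be illustrated with a short Python sketch: adding a hidden comment to an SVG changes its bytes, so the MD5-based file name changes too, and a collision prepared for the original bytes no longer applies. The helper `asset_name` and the sample SVG are made up for illustration; only the {md5-hash}.{fileformat} naming pattern comes from the first post.

```python
import hashlib


def asset_name(data: bytes, ext: str) -> str:
    """Build an asset file name of the form {md5-hash}.{ext}."""
    return f"{hashlib.md5(data).hexdigest()}.{ext}"


# A tiny made-up SVG and a variant with a hidden comment added.
original = b'<svg xmlns="http://www.w3.org/2000/svg"><circle r="10"/></svg>'
tweaked = original.replace(b"</svg>", b"<!-- 4f1c --></svg>")

if __name__ == "__main__":
    print(asset_name(original, "svg"))
    print(asset_name(tweaked, "svg"))  # different name, same visible image
```

Both files render identically, but they are stored under different names, so an attacker would have to anticipate the exact bytes of every variant in advance.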
- Jeffalo
-
Scratcher
1000+ posts
Scratch Asset Server MD5 Hash Collision Issue
i've experimented with this before, and a theoretical attack is possible, but what's the possible attack scenario? someone predicts you'll upload a file and uploads something bad with the same hash first? my understanding is that these kinds of collisions practically never will occur by accident.
- novice27b
-
Scratcher
1000+ posts
Scratch Asset Server MD5 Hash Collision Issue
I made a PoC project a while back exploiting this, if someone can find it… (I no longer remember where, but I posted about it in this forum)
Basically, the first-uploaded file takes precedence. So if you send someone an sb3 with colliding assets, they can view it offline, but the assets will change once they upload the project to the Scratch site (assuming you previously uploaded the other colliding assets).
Last edited by novice27b (Sept. 10, 2025 23:33:29)