llvm2scratch - C to Scratch compiler

-Zn

Hi, I'm helping develop a program to convert LLVM IR into scratch code, meaning that code written in LLVM frontends like C, C++ and Rust can be translated into scratch blocks.

Currently the progress is:

Functions + Calling + Return values + Recursion
‘Temporaries’ which are variables localized with the function name
Stack + Allocation + Load + Store + Deallocation
Global ‘static’ vars
All integer/floating point/binary operations (add, subtract, multiply, unsigned division, signed division, unsigned remainder, signed remainder, unsigned remainder, shift left, logical shift right, arithmetic shift left, and, or, xor) up to 48 bits (scratch's floating point numbers can store 53 bits)
All comparison modes
Most casting instructions
Branching operations + Switch statements
Post-Optimiser - Analyses the resulting code and simplifies known values + removes unnecessary variable assignments
.sprite3 file output

Example:

#include <stdio.h>
 
static int a = 7;
 
int add_one(int a) {
  return a + 1;
}
 
int main(void) {
  puts("hello world");
  a += 2;
  a -= 4;
  a *= 2;
  a /= -3;
  a = -340;
  a %= -60;
 
  a = 31;
  a <<= a;
  a >>= 3;
 
  unsigned int b = 3204;
  b >>= 2;
  b ^= 113;
  b |= 1546;
  b &= 393;
 
  a = add_one(a);
 
  return 0;
}

Becomes this code: https://scratch.mit.edu/projects/1206058442/editor/

The source code is here: https://github.com/Classfied3D/llvm2scratch
It's written in python (with types). I would note though that one of the dependencies relies on LLVM which can be difficult to install because of the very large size of the source code, so I've written some info on how to add precompiled LLVM binaries on macOS, if anyone does this on a different OS, lmk the process so I can add it!

There two other projects written for the same purpose that I know of:

@bobbybee's llvm to scratch compiler compiles to scratch 2, and is incomplete
ELVM can compile to scratch 3, but can only compile C code and creates no optimizations which means it would be difficult to run anything

If anyone has any questions or wants to help, lmk!

Last edited by -Zn (Oct. 25, 2025 18:21:45)

MonkeyBean2

Very cool! I tried doing something like this a while ago but I was somewhat discouraged by the LLVM IR syntax, because I wanted to write the parser myself for some reason. It looks like you could do a bit more optimization though, as the example code sets variables that are only used once directly after (so you could remove that and collapse it into one expression). Also, consider checking out scratchattach's work-in-progress project editing framework, maybe you could help improve that with the experience you've gained through writing your own in sb3.py

-Zn

MoneyBean2 wrote:
Very cool! I tried doing something like this a while ago but I was somewhat discouraged by the LLVM IR syntax, because I wanted to write the parser myself for some reason.

Thanks! How far did you get and do you have any of the code from this?

MonkeyBean2 wrote:
It looks like you could do a bit more optimization though, as the example code sets variables that are only used once directly after (so you could remove that and collapse it into one expression).

Yes, I'm planning to do something like this. It would also interact well with the current optimizer because they can work together (e.g. with an add followed by another add).

MoneyBean2 wrote:
Also, consider checking out scratchattach's work-in-progress project editing framework, maybe you could help improve that with the experience you've gained through writing your own in sb3.py

I'll have a look. I wrote my own because I couldn't find anything that did this but might move to this in future if I need it.

MonkeyBean2

-Zn wrote:
MoneyBean2 wrote:
Very cool! I tried doing something like this a while ago but I was somewhat discouraged by the LLVM IR syntax, because I wanted to write the parser myself for some reason.
Thanks! How far did you get and do you have any of the code from this?

I ended up switching to regular assembly because it's simpler than LLVM IR (but less high level and therefor might produce less efficient scratch code), and I got it outputting some stuff but I think I was having some issues converting jumps to functions, and I didn't get it to an at all usable state.
However, recently I started a python to scratch transpiler which works great, and I've already written over 70KB of working python code that can get translated to scratch. I currently only have a couple simple unsafe/fast math optimizations, but I want to implement what I described with variables.

-Zn wrote:
MonkeyBean2 wrote:
It looks like you could do a bit more optimization though, as the example code sets variables that are only used once directly after (so you could remove that and collapse it into one expression).
Yes, I'm planning to do something like this. It would also interact well with the current optimizer because they can work together (e.g. with an add followed by another add).

Great! I suppose you should run the optimization pass a few times then (run it first to consolidate expressions, then run it again to optimize those consolidated expressions). Also, I think LLVM does a lot of optimization on the IR itself, maybe you can have LLVM do some optimization before it passes the IR to your program.

-Zn wrote:
MoneyBean2 wrote:
Also, consider checking out scratchattach's work-in-progress project editing framework, maybe you could help improve that with the experience you've gained through writing your own in sb3.py
I'll have a look. I wrote my own because I couldn't find anything that did this but might move to this in future if I need it.

I wrote my own as well for my transpiler. Scratchattach's implementation is based on some stuff from mine actually.

By the way, I'm pretty sure MacOS comes with clang preinstalled, which uses LLVM. You can also use clang to output LLVM IR with

clang -S -emit-llvm test.c

-Zn

MonkeyBean2 wrote:
I ended up switching to regular assembly because it's simpler than LLVM IR (but less high level and therefor might produce less efficient scratch code), and I got it outputting some stuff but I think I was having some issues converting jumps to functions, and I didn't get it to an at all usable state.
However, recently I started a python to scratch transpiler which works great, and I've already written over 70KB of working python code that can get translated to scratch. I currently only have a couple simple unsafe/fast math optimizations, but I want to implement what I described with variables.

Nice! One of the things I wanted to port with this is micropython.
I'm still working out how floating point operations would work (with scratch only having double precision floats), have you implemented these?

MonkeyBean2 wrote:
Great! I suppose you should run the optimization pass a few times then (run it first to consolidate expressions, then run it again to optimize those consolidated expressions). Also, I think LLVM does a lot of optimization on the IR itself, maybe you can have LLVM do some optimization before it passes the IR to your program.

Yeah, what I'm doing atm is to check if any optimizations where made in a specific area and repeat until no more were made (which could potentially lead to bugs if I don't add a max limit but I've had no issues so far).
LLVM optimizations are probably the main reason why I'm choosing to compile at this point as well as code size ones (e.g. -Os with clang).

MonkeyBean2 wrote:
By the way, I'm pretty sure MacOS comes with clang preinstalled, which uses LLVM. You can also use clang to output LLVM IR with
clang -S -emit-llvm test.c

This is what I'm doing atm, I think llvm is installed with the xcode cmd line tools installer.
I need LLVM itself at a specific version bc its a dependency of the parser I'm using.

MonkeyBean2

-Zn wrote:
MonkeyBean2 wrote:
I ended up switching to regular assembly because it's simpler than LLVM IR (but less high level and therefor might produce less efficient scratch code), and I got it outputting some stuff but I think I was having some issues converting jumps to functions, and I didn't get it to an at all usable state.
However, recently I started a python to scratch transpiler which works great, and I've already written over 70KB of working python code that can get translated to scratch. I currently only have a couple simple unsafe/fast math optimizations, but I want to implement what I described with variables.

Nice! One of the things I wanted to port with this is micropython.
I'm still working out how floating point operations would work (with scratch only having double precision floats), have you implemented these?

Ah, I imagine you'll be able to use libraries if you get it to work with micropython. Why would floats be an issue, can't you just use double precision for everything (including float32) and hope nothing breaks? 64+ bit ints would certainly be an issue though.

-Zn wrote:
MonkeyBean2 wrote:
Great! I suppose you should run the optimization pass a few times then (run it first to consolidate expressions, then run it again to optimize those consolidated expressions). Also, I think LLVM does a lot of optimization on the IR itself, maybe you can have LLVM do some optimization before it passes the IR to your program.
Yeah, what I'm doing atm is to check if any optimizations where made in a specific area and repeat until no more were made (which could potentially lead to bugs if I don't add a max limit but I've had no issues so far).
LLVM optimizations are probably the main reason why I'm choosing to compile at this point as well as code size ones (e.g. -Os with clang).

Does that mean you don't optimize the same area twice? I don't think that would be a problem normally

-Zn wrote:
MonkeyBean2 wrote:
By the way, I'm pretty sure MacOS comes with clang preinstalled, which uses LLVM. You can also use clang to output LLVM IR with
clang -S -emit-llvm test.c
This is what I'm doing atm, I think llvm is installed with the xcode cmd line tools installer.
I need LLVM itself at a specific version bc its a dependency of the parser I'm using.

Maybe you can find a parser written in python. There is also LLVM IR bitcode, which could be easier to read than strings.

-Zn

MonkeyBean2 wrote:
Ah, I imagine you'll be able to use libraries if you get it to work with micropython. Why would floats be an issue, can't you just use double precision for everything (including float32) and hope nothing breaks? 64+ bit ints would certainly be an issue though.

ig floating points probably wont be an issue lol. Although they might be annoying to store in a list due to the rounding to 10 scratch does when numbers start to lose precision, although I could keep track of a separate exponent maybe (using log2)
64 bit ints shouldn't be too hard, I'll just need to work out different instructions for them (I already do something like this for 32-bit multiplication where I calculate smaller multiplications and add them up according to modular arithmetic)
I can probably look into what 32 bit systems do

MonkeyBean2 wrote:
Does that mean you don't optimize the same area twice? I don't think that would be a problem normally

There might be a situation where something ends up in an infinite loop, I doubt it tho

MonkeyBean2 wrote:
Maybe you can find a parser written in python. There is also LLVM IR bitcode, which could be easier to read than strings.

Maybe, this lib does more than parsing it so may not require it for just that purpose idk

MonkeyBean2

-Zn wrote:
MonkeyBean2 wrote:
Ah, I imagine you'll be able to use libraries if you get it to work with micropython. Why would floats be an issue, can't you just use double precision for everything (including float32) and hope nothing breaks? 64+ bit ints would certainly be an issue though.
ig floating points probably wont be an issue lol. Although they might be annoying to store in a list due to the rounding to 10 scratch does when numbers start to lose precision, although I could keep track of a separate exponent maybe (using log2)
64 bit ints shouldn't be too hard, I'll just need to work out different instructions for them (I already do something like this for 32-bit multiplication where I calculate smaller multiplications and add them up according to modular arithmetic)
I can probably look into what 32 bit systems do

By the way, doubles support integers up to 2**53-1

-Zn wrote:
MonkeyBean2 wrote:
Does that mean you don't optimize the same area twice? I don't think that would be a problem normally
There might be a situation where something ends up in an infinite loop, I doubt it tho

I suppose that risk increases as you add more complicated optimizations. Though normally if it's oscilating between two states that would mean that it's getting de-optimized and then optimized again.

-Zn wrote:
MonkeyBean2 wrote:
Maybe you can find a parser written in python. There is also LLVM IR bitcode, which could be easier to read than strings.
Maybe, this lib does more than parsing it so may not require it for just that purpose idk

I see.

-Zn

MonkeyBean2 wrote:
By the way, doubles support integers up to 2**53-1

Yes, that's why I'm using 48 bits maximum

Since the precision halfs each time you double the number its actually really simple to bitshift an integer left since you dont have to worry about it

-Zn

Update: I got branching to work using broadcasts but found out the hard way they are VERY slow even in TW (haven't tried messing with TW settings tho), in turbo mode and with the say blocks removed. I might use custom blocks in future but the issue with them is that they gradually consume more memory which would eventually break infinite loops, but this is probably the only solution (besides trying to convert branches into scratch's repeat until/if else blocks which I'm not sure can cover every case).

Code:

#include <stdio.h>
 
int test_branch(int num) {
  int a = 3;
  if (num != 1) a = 50;
  return a + num;
}
 
int main(void) {
  for (unsigned char d = 65; d != 70; d++) {
    putchar(d);
  }
  int c = test_branch(2);
  return 0;
}

Becomes this code: https://scratch.mit.edu/projects/1206466346/editor/

_LDG099_

This is way too complicated

-Zn

_LDG099_ wrote:
This is way too complicated

lol. Fortunately LLVM/Clang does most of the hard work translating C into relatively simple instructions, they just don't all translate that well into scratch

_LDG099_

Interesting

-Zn

-Zn wrote:
I might use custom blocks in future but the issue with them is that they gradually consume more memory which would eventually break infinite loops, but this is probably the only solution

I found a solution to this:

define branch1
increment counter :: control
if <(counter :: control reporter) > [100000]> then 
set [id v] to [0]
create clone of [myself v]
delete this clone
end
...
when I start as a clone
clear counter :: control
if <(id) < [1]> then // binary search
branch1
else
... // code for branch 2
end

This uses the hacked counter block because incrementing to it is ~20x faster than adding to a regular variable
Alternatively to creating a clone and deleting the old one, a broadcast could be used and the receiver could stop all other scripts, however this is slower than creating a clone from testing
It is still quite faster (25% faster in one example where I increment a counter) to not include this extra check, though, so potentially I could just log all the branches which end up doing this and then only apply this there

I also tested removing the check:

5 million iterations uses up 2GB of memory due to scratch having to remember where to jump to after the code finishes
So in the majority of cases this should be fine

Another potential solution is this:

define function 1
set [branch id v] to [1]
forever
if <(branch id) < [1]> then // binary search
... // code for branch 1
else
... // code for branch 2
stop [this script v] // returns (jumps back to where the function was called)
end

This is faster with only two branches, but any more and it is slower than the other options which means it isn't the best option all the time (although two or less possible branches a loop is probably the most common outcome).

If anyone finds a faster solution to branching, lmk!

Last edited by -Zn (Aug. 15, 2025 16:17:33)

-Zn

MonkeyBean2 wrote:
I ended up switching to regular assembly because it's simpler than LLVM IR (but less high level and therefor might produce less efficient scratch code), and I got it outputting some stuff but I think I was having some issues converting jumps to functions, and I didn't get it to an at all usable state.
However, recently I started a python to scratch transpiler which works great, and I've already written over 70KB of working python code that can get translated to scratch. I currently only have a couple simple unsafe/fast math optimizations, but I want to implement what I described with variables.

Did you find a fast solution to branching in your compiler?

MonkeyBean2

-Zn wrote:
MonkeyBean2 wrote:
I ended up switching to regular assembly because it's simpler than LLVM IR (but less high level and therefor might produce less efficient scratch code), and I got it outputting some stuff but I think I was having some issues converting jumps to functions, and I didn't get it to an at all usable state.
However, recently I started a python to scratch transpiler which works great, and I've already written over 70KB of working python code that can get translated to scratch. I currently only have a couple simple unsafe/fast math optimizations, but I want to implement what I described with variables.

Did you find a fast solution to branching in your compiler?

ah I just remembered, I think I actually made some loops in the scratch AST, as in block a -> block b -> block a, I think it did work to some extend (although it does not get rendered). this basically functions as arbitrary jumps. to help with debugging, make it so you can switch between using broadcasts/procedures and this method, as when you are using this method it is invisible in the editor.

Geotale

There are faster solutions for this but they leave various things still completely impossible (e.g. jump tables). In general arbitrary jumps can be simplified down into simple control flow with extra conditions ^^ Although often not particularly easily, esp in Scratch which doesn't even have basic `break` or similar statements to begin with…

-Zn

Geotale wrote:
There are faster solutions for this but they leave various things still completely impossible (e.g. jump tables). In general arbitrary jumps can be simplified down into simple control flow with extra conditions ^^ Although often not particularly easily, esp in Scratch which doesn't even have basic `break` or similar statements to begin with…

I might try and recognize common structures (e.g. if/else: branch to a or b, both branch to c and if: branch to a or b, a branches to b) but I'll probably mostly use procedure calls (and maybe have some post inline step).

What I'm doing atm is seeing if any recursions could happen and for each cycle I add a counter check in at least one of them, and in other branches I just increment the counter. If no recursions could happen, I just rely on scratch to recurse backward to the caller (which can be slower than a binary search if we branch enough, so I don't do it for unknown branch lengths).

bobbybee

What's old is new again!

Your code is a lot cleaner than what I did as a kid, good stuff

For control flow, you shouldn't need broadcasts or clones or even custom blocks necessarily. Instead, the basic “most general” way to handle control flow is to emulate a jump table. Maintain a “program counter” variable with the current block/label we're executing, have a single script looping forever, within the loop check the program counter for each possible block/label and execute the corresponding block. Isn't that slow? A bit, but you can nest the if-else blocks and use <= comparisons, letting you bake a binary search for the block you want to execute. That requires only log2(N) comparisons for N blocks in the program, which means less than ~16 comparisons in practice. Not the speediest, but not too bad either.

The more advanced next step is attempting to reconstruct Scratch control flow, by pattern matching if-else blocks and such. This is an optimization, to be sure. However, it's something to do after nailing down the binary search above, since it will never be fully general: Scratch control flow can't represent arbitrary goto statements, early-exit breaks, and so on. Whereas the binary search handles those easily.

So first implement the binary search approach, it'll work great, and if you still crave more performance, you can add some heuristics on top.

Cheers!

-Zn

bobbybee wrote:
Your code is a lot cleaner than what I did as a kid, good stuff

Thanks!

bobbybee wrote:
For control flow, you shouldn't need broadcasts or clones or even custom blocks necessarily. Instead, the basic “most general” way to handle control flow is to emulate a jump table. Maintain a “program counter” variable with the current block/label we're executing, have a single script looping forever, within the loop check the program counter for each possible block/label and execute the corresponding block. Isn't that slow? A bit, but you can nest the if-else blocks and use <= comparisons, letting you bake a binary search for the block you want to execute. That requires only log2(N) comparisons for N blocks in the program, which means less than ~16 comparisons in practice. Not the speediest, but not too bad either.

I perf tested this solution before and it is faster but only up to 2 branches, so I might use it in that case (which is fairly common I imagine)
The current method I use is to work out all branch cycles and apply the counter check to the minimum(ish) amount of nodes needed to add a check in all of these, otherwise just increment the counter (the scratch runtimes' source code for the counter is just incrementing a variable so is very fast).

bobbybee wrote:
The more advanced next step is attempting to reconstruct Scratch control flow, by pattern matching if-else blocks and such. This is an optimization, to be sure. However, it's something to do after nailing down the binary search above, since it will never be fully general: Scratch control flow can't represent arbitrary goto statements, early-exit breaks, and so on. Whereas the binary search handles those easily!

This is something I plan to do later, although I might also try and inline custom blocks in post.

Discuss Scratch