Discuss Scratch
- -Zn
-
18 posts
llvm2scratch - C to Scratch compiler
I found another method for branching which doesn't require all downstream functions that call a function with a loop to return to an address and is slightly faster by unwinding the stack to a ‘jump table’ when resetting the call stack.
I probably won't implement this tho because I'm fine with the current solution (which uses a broadcast to reset the stack) and this method has some quirks that make it annoying to implement. Another issue is that it requires functions both upstream and downstream of a loop to have duplicate return to address code (because the counter can only be used once) and use the branch ID system, or use a slower branching method.
define caller
callee
repeat until <(Branch ID) = (0)>
if <(Branch ID) = (1)> then // binary search
branch 1
else
... // other branches
end
end
... // code after func call
define callee
increment counter :: control
branch 1
define branch 1
increment counter :: control
if <(counter :: control reporter) > [1000000]> then
clear counter :: control
set [Branch ID v] to [1]
stop [this script v]
end
branch 1
I probably won't implement this tho because I'm fine with the current solution (which uses a broadcast to reset the stack) and this method has some quirks that make it annoying to implement. Another issue is that it requires functions both upstream and downstream of a loop to have duplicate return to address code (because the counter can only be used once) and use the branch ID system, or use a slower branching method.
- -Zn
-
18 posts
llvm2scratch - C to Scratch compiler
It looks like you could do a bit more optimization though, as the example code sets variables that are only used once directly after (so you could remove that and collapse it into one expression).
Spent a few days implementing this and it works!
Code:
#include <stdio.h> static int a = 7; int add_one(int num) { return num + 1; } int test_branch(int num) { int a = 3; if (num != 1) a = 50; return a + num; } int main(void) { puts("hello world"); a += 2; a -= 4; a *= 2; a /= -3; a = -340; a %= -60; a = 31; a <<= a; a >>= 3; unsigned int b = 3204; b >>= 2; b ^= 113; b |= 1546; b &= 393; a = add_one(a); int c = test_branch(2); for (unsigned char d = 65; d != 70; d++) { putchar(d); } unsigned int e = 46; switch (e) { case 0: puts("0"); break; case 1: puts("1"); break; case 20: puts("20"); break; case 21: puts("21"); break; default: puts("default"); break; } return 0; }
Resultant scratch code (optimized and unoptimized versions):
https://scratch.mit.edu/projects/1208872099/
Essentially, I work out which variables are modified where and only elide them if a dependency is not changed (+ a LOT of edge cases). I also work out the computational cost to calculate the value and how many times it is used to work out if it's actually less expensive to do so (e.g. a complex value or used multiple times in a loop).
- bobbybee
-
1000+ posts
llvm2scratch - C to Scratch compiler
Super cool. I think you want to call Mem2Reg (?) on the input LLVM code to elide most of the stack garbage, that should speed up a lot too.
- -Zn
-
18 posts
llvm2scratch - C to Scratch compiler
UPDATE: recursion and stack deallocation are now supported
For stack deallocation, I try to work out if a function allocates a known amount overall by finding unavoidable branches (nodes). If it is known, I increase the stack size variable at the start of function by the same amount I decrease it on return. If the amount is unknown, I store the previous stack size and restore it on return. If it doesn't call any functions which allocate on the stack, then I don't increase the stack size and treat the offsets from the stack differently (the stack size is only really needed to indicate the offset to the next function). I also gave the stack size variable some special optimization properties because every function call will return the stack size variable back to it's original value (expect branches but I never run any code after a branch).
With recursion, I either use custom block parameters or a list (similar to the regular stack) to store local variables in functions to be accessed after the function call.
Example:
https://scratch.mit.edu/projects/1211169662/
Source Code:
For stack deallocation, I try to work out if a function allocates a known amount overall by finding unavoidable branches (nodes). If it is known, I increase the stack size variable at the start of function by the same amount I decrease it on return. If the amount is unknown, I store the previous stack size and restore it on return. If it doesn't call any functions which allocate on the stack, then I don't increase the stack size and treat the offsets from the stack differently (the stack size is only really needed to indicate the offset to the next function). I also gave the stack size variable some special optimization properties because every function call will return the stack size variable back to it's original value (expect branches but I never run any code after a branch).
With recursion, I either use custom block parameters or a list (similar to the regular stack) to store local variables in functions to be accessed after the function call.
Example:
https://scratch.mit.edu/projects/1211169662/
Source Code:
#include <stdio.h> int factorial_recurse(int n) { if (n == 1) return 1; return factorial_recurse(n - 1) * n; } int sum_to_one_digit(unsigned int n) { unsigned int sum = 0; while (n > 0) { sum += n % 10; n /= 10; } if (sum >= 10) return sum_to_one_digit(sum); return sum; } int main(void) { puts("hello world"); int f = factorial_recurse(10); int g = sum_to_one_digit(473); return 0; }
- -Zn
-
18 posts
llvm2scratch - C to Scratch compiler
UPDATE (been a while lol):
I rewrote the code to my own wrapper to a different library (because the old one couldn't provide enough information)
I added support for some instructions used in optimizations (phi, select, etc)
Most recently I now support the getelementptr instructions, allowing some array support.
Example:
https://scratch.mit.edu/projects/1226122280/
Source:
Resultant IR (also scratch supports llvm ir highlighting what):
I rewrote the code to my own wrapper to a different library (because the old one couldn't provide enough information)
I added support for some instructions used in optimizations (phi, select, etc)
Most recently I now support the getelementptr instructions, allowing some array support.
Example:
https://scratch.mit.edu/projects/1226122280/
Source:
#include <stdio.h> static char str[] = "hello world"; void numberize(char* str) { for (int i = 0; str[i] != '\0'; i++) { switch (str[i]) { case 'a': str[i] = '4'; break; case 'e': str[i] = '3'; break; case 'l': str[i] = '1'; break; case 'o': str[i] = '0'; break; } } } int main(void) { numberize(str); puts(str); return 0; }
Resultant IR (also scratch supports llvm ir highlighting what):
@str = internal global [12 x i8] c"hello world\00" define void @numberize(i8* %0) { br label %2 2: %3 = phi i32 [ 0, %1 ], [ %16, %15 ] %4 = getelementptr inbounds i8, i8* %0, i32 %3 %5 = load i8, i8* %4 %6 = icmp eq i8 %5, 0 br i1 %6, label %7, label %8 7: ret void 8: %9 = zext i8 %5 to i32 switch i32 %9, label %15 [ i32 97, label %13 i32 101, label %10 i32 108, label %11 i32 111, label %12 ] 10: br label %13 11: br label %13 12: br label %13 13: %14 = phi i8 [ 51, %10 ], [ 49, %11 ], [ 48, %12 ], [ 52, %8 ] store i8 %14, i8* %4 ; branch to %15 ; label %15 %16 = add i32 %3, 1 ; branch to %2 } define i32 @main() { call void @numberize(i8* getelementptr ([12 x i8], [12 x i8]* @str, i32 0, i32 0)) %1 = call i32 @puts(i8* getelementptr ([12 x i8], [12 x i8]* @str, i32 0, i32 0)) ret i32 0 }
Last edited by -Zn (Oct. 7, 2025 18:55:09)
- FurryR
-
14 posts
llvm2scratch - C to Scratch compiler
I noticed that variables in LLVM IR are immutable. In Scratch this would create A LOT OF variables. Please consider reuse variables to create a (at least) maintainable project.
- -Zn
-
18 posts
llvm2scratch - C to Scratch compiler
All the variables are local to the sprite so it shouldn't affect other sprites and writing/calling code inside the sprite containing the code is not preferable anyway because you'd have to modify the sprite each time you modified the source code. I noticed that variables in LLVM IR are immutable. In Scratch this would create A LOT OF variables. Please consider reuse variables to create a (at least) maintainable project.
If you did want to interact with the functions I'd probably add some kind of system where you call functions via a broadcast defined in some C code or similar (and a C api for scratch blocks).
That said, I am planning to find which functions can never be ran at the same time and re-use identifiers between them.