Discuss Scratch

bobbybee
Scratcher
1000+ posts

Some optimizations for cloud data

Cloud data over a port-level (simple) firewall:
Run the cloud server on port 80 or 443, and serve the Flash cross-domain policy on that same port. Possibly perform a very simple HTTP handshake to mimic HTTP and get past simple firewalls.
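Here is a minimal sketch of sharing one port. It uses Python's asyncio in place of Node's `net` module. Flash Player asks for a socket policy by sending `<policy-file-request/>` followed by a NUL byte, so the server can look at the first bytes of each connection and decide whether it is a policy request, an HTTP-looking handshake, or raw cloud traffic. The policy domain `*.example.org`, the fake `200 OK` reply and the function names are placeholders, not Scratch's real setup.

```python
import asyncio

# Flash Player sends this (followed by a NUL byte) when it wants a socket policy.
POLICY_REQUEST = b"<policy-file-request/>"

# Hypothetical policy: domain and ports are placeholders. The reply must end with a NUL byte.
POLICY_FILE = (
    b'<?xml version="1.0"?>'
    b"<cross-domain-policy>"
    b'<allow-access-from domain="*.example.org" to-ports="80,443"/>'
    b"</cross-domain-policy>\x00"
)

# Minimal fake HTTP reply so a simple firewall sees something that looks like HTTP.
FAKE_HTTP_OK = b"HTTP/1.1 200 OK\r\nConnection: keep-alive\r\n\r\n"


def classify_opening(first_bytes: bytes) -> str:
    """Guess what a new connection on the shared port wants from its first bytes."""
    if first_bytes.startswith(POLICY_REQUEST):
        return "policy"
    if first_bytes.split(b" ", 1)[0] in (b"GET", b"POST"):
        return "http"
    return "cloud"


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    first = await reader.read(1024)
    kind = classify_opening(first)
    if kind == "policy":
        # The policy request arrives on its own connection, so close it once the policy is sent.
        writer.write(POLICY_FILE)
        await writer.drain()
        writer.close()
        await writer.wait_closed()
        return
    if kind == "http":
        # Answer the fake handshake, then keep using the same stream for cloud data.
        writer.write(FAKE_HTTP_OK)
        await writer.drain()
    # Placeholder: a real server would now exchange cloud data on this stream.
    writer.close()
    await writer.wait_closed()


async def main() -> None:
    server = await asyncio.start_server(handle, host="0.0.0.0", port=80)
    async with server:
        await server.serve_forever()

# asyncio.run(main())  # binding to port 80 usually needs elevated privileges
```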

Cloud data over an application-level firewall:

The Flash cross-domain policy can be avoided altogether with same-domain hosting (e.g., host the 2.0 SWF on the same domain as the cloud data server).

Use a custom HTTP client for requests, with a custom server (not the Node http module or anything like it, but raw TCP, e.g., the net module). Host the server on port 80 and open two sockets. One sends a blank request and holds the connection open while cloud data is received. The other loops, sending headers (bogus headers, or values for real headers, perhaps sent multiple times or split across separate headers). Alternatively, a POST request may be used, with the data sent as actual form data. The server is configured never to hang up connections (or to time them out after perhaps 300 seconds with no data received) and, more importantly, to parse data as it arrives rather than waiting for the end of the headers (as I recall, simply a blank line).
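The "parse as it arrives" part could look roughly like the sketch below. It assumes, purely for illustration, that each update travels as a header line of the form `X-Cloud-Set: name=value`; the header name, the class and the print-based handler are hypothetical. Unlike a normal HTTP server, it acts on every complete line immediately and does not stop at the blank line, and it drops the connection after the 300-second idle period suggested above.

```python
from __future__ import annotations

import asyncio

IDLE_TIMEOUT = 300  # seconds without data before the server gives up


class CloudHeaderParser:
    """Splits a byte stream into CRLF-terminated lines as chunks arrive."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list[tuple[str, str]]:
        """Add a chunk and return any (name, value) updates completed by it."""
        self._buffer += chunk
        updates = []
        while b"\r\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\r\n", 1)
            if not line:
                continue  # a normal HTTP server would stop at the blank line; keep reading
            name, sep, value = line.partition(b":")
            if sep and name.strip().lower() == b"x-cloud-set":  # hypothetical header
                var, eq, val = value.strip().partition(b"=")
                if eq:
                    updates.append((var.decode("utf-8"), val.decode("utf-8")))
        return updates


async def read_updates(reader: asyncio.StreamReader, parser: CloudHeaderParser) -> None:
    while True:
        try:
            chunk = await asyncio.wait_for(reader.read(4096), timeout=IDLE_TIMEOUT)
        except asyncio.TimeoutError:
            return  # idle for too long
        if not chunk:
            return  # client closed the connection
        for name, value in parser.feed(chunk):
            print(f"{name} = {value}")
```

An update split across two TCP chunks is only reported once its line is complete, which is what lets the "hanging" request keep streaming indefinitely.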



Cloud Protocol itself

The cloud data, in its current state, consists of key-value pairs, which use relatively high bandwidth and carry very high overhead for a network-intensive cloud app (e.g., real-time multiplayer). I propose a binary protocol. It would be more complex to implement, but it could carry large amounts of data in a short time.

The simplest packet is the handshake, which could be just a few integers (project ID, unique integer token). A more complex packet, one that sets a variable, would have to be typed. A binary protocol would be strictly typed for compactness, whereas a string protocol like Scratch's can be loosely typed, and so requires proper coercion of data. A set packet could carry a bit or a nibble (4 bits, half a byte) that specifies the type (int or string, or potentially a size). Integers might be sent as signed 4-byte integers, and strings as null-terminated strings. Floating-point numbers might be stored as they currently are, as strings, to keep the protocol simple and avoid implementing a true decimal type.
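A minimal sketch of these two packets using Python's `struct` module. The layout is an assumption, not a spec: the high nibble of the first byte is a hypothetical opcode, the low nibble is the type, integers are big-endian, and the variable name is sent as a null-terminated string. Integers that do not fit in 4 signed bytes, and floats, fall back to the string type.

```python
import struct

OP_HANDSHAKE = 0x1  # hypothetical opcodes (high nibble of the first byte)
OP_SET = 0x2
TYPE_INT = 0x0      # hypothetical type codes (low nibble of the first byte)
TYPE_STRING = 0x1   # floats are sent as strings too


def encode_handshake(project_id: int, token: int) -> bytes:
    # 1 header byte + two unsigned 4-byte big-endian integers = 9 bytes
    return struct.pack(">BII", OP_HANDSHAKE << 4, project_id, token)


def _cstring(text: str) -> bytes:
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("null-terminated strings cannot contain NUL")
    return data + b"\x00"


def encode_set(name: str, value) -> bytes:
    is_small_int = isinstance(value, int) and not isinstance(value, bool)
    if is_small_int and -(2**31) <= value <= 2**31 - 1:
        return bytes([OP_SET << 4 | TYPE_INT]) + _cstring(name) + struct.pack(">i", value)
    return bytes([OP_SET << 4 | TYPE_STRING]) + _cstring(name) + _cstring(str(value))


def decode(packet: bytes):
    op, kind = packet[0] >> 4, packet[0] & 0x0F
    if op == OP_HANDSHAKE:
        _, project_id, token = struct.unpack(">BII", packet)
        return ("handshake", project_id, token)
    if op == OP_SET:
        end = packet.index(b"\x00", 1)
        name = packet[1:end].decode("utf-8")
        rest = packet[end + 1:]
        if kind == TYPE_INT:
            (value,) = struct.unpack(">i", rest)
        else:
            value = rest[:rest.index(b"\x00")].decode("utf-8")
        return ("set", name, value)
    raise ValueError(f"unknown opcode {op}")
```

With this layout, setting `score` to 42 costs 11 bytes (1 header byte, 6 bytes of name, 4 bytes of value), and a handshake is always 9 bytes.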

Lists, on the other hand, are not so simple. I propose a binary protocol for them as well, but a modification-based one. Effectively, send the whole list once as a binary array (an integer length followed by the consecutive items, with no delimiters), then send similar binary packets for each list block (add, delete, insert, replace) so that the server emulates the Scratch list blocks.
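A minimal sketch of the list side, under stated assumptions: items are null-terminated UTF-8 strings (so they can sit back to back without separate delimiters), the count is a big-endian 4-byte integer, and each modification packet is a hypothetical 1-byte opcode, a 4-byte index and an item. Indices are 1-based, as in Scratch's list blocks.

```python
from __future__ import annotations

import struct

# Hypothetical opcodes, one per Scratch list block
OP_ADD, OP_DELETE, OP_INSERT, OP_REPLACE = 1, 2, 3, 4


def _cstring(text: str) -> bytes:
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("null-terminated strings cannot contain NUL")
    return data + b"\x00"


def encode_full_list(items: list[str]) -> bytes:
    # 4-byte item count, then every item as a null-terminated string
    return struct.pack(">I", len(items)) + b"".join(_cstring(i) for i in items)


def decode_full_list(data: bytes) -> list[str]:
    (count,) = struct.unpack_from(">I", data, 0)
    items, pos = [], 4
    for _ in range(count):
        end = data.index(b"\x00", pos)
        items.append(data[pos:end].decode("utf-8"))
        pos = end + 1
    return items


def encode_op(op: int, index: int = 0, item: str = "") -> bytes:
    # 1-byte opcode, 4-byte 1-based index (ignored by "add"), then the item
    return struct.pack(">BI", op, index) + _cstring(item)


def apply_op(items: list[str], packet: bytes) -> list[str]:
    """Replay one list block on the server's copy of the list."""
    op, index = struct.unpack_from(">BI", packet, 0)
    item = packet[5:packet.index(b"\x00", 5)].decode("utf-8")
    if op == OP_ADD:
        items.append(item)
    elif op == OP_DELETE:
        del items[index - 1]
    elif op == OP_INSERT:
        items.insert(index - 1, item)
    elif op == OP_REPLACE:
        items[index - 1] = item
    else:
        raise ValueError(f"unknown list opcode {op}")
    return items
```

After the initial full list, each block costs only a few bytes plus the item, instead of resending the whole list.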

Last edited by bobbybee (May 19, 2013 17:46:47)


“Ooo, can I call you Señorita Bee?” ~Chibi-Matoran
sdg1
Scratcher
100+ posts

Thanks for the suggestions - they make a lot of sense.

We can't really do same-domain hosting. As you may have noticed, since launch we have started using content delivery networks (CDNs) to manage the load on our servers and to make things load faster for people who access the site from Europe, Asia, Africa, or Australia. The SWF is still being served from the scratch.mit.edu domain, but only because our old crossdomain.xml file had a very long cache expiry value, and people still have it cached in their browsers. Remember how all projects were showing up with a blank stage and the Scratch cat just after we launched? That was caused by the old cached crossdomain.xml. However, within the next few months we will be moving the SWF back to cdn.scratch.mit.edu (it is one of the largest files we serve from the website, so having it served via a more efficient system is fairly high priority for us).

As for binary protocols: yep! I am still using a somewhat verbose protocol because I need to debug things easily (there are some lingering, and puzzling, problems with the current system, such as people not getting the cloud-token at the correct time). Switching to a binary or more compact protocol would make debugging harder. However, once everything is done (strings, lists, and maybe cloud broadcasts), I would certainly consider moving to a more space-efficient protocol.

sdg1
MIT Scratch Team
HyperPixel
Scratcher
46 posts

Is it possible that there is a URL for the raw cloud variable data on a project since the new update today?

bobbybee
Scratcher
1000+ posts

HyperPixel wrote:

Is it possible that there is a URL for the raw cloud variable data on a project since the new update today?

I don't believe so, but I feel I need something interesting to work on, so I'm going to write a quick cloud data-to-JSON web-app.
