Discuss Scratch

CVCoderDojo
Scratcher
6 posts

Scraping usernames from a Studio page

I'm a mentor at a CoderDojo, and we use Scratch as our main language when kids join the dojo. We have set up a series of public studios, one per task, for the ninjas to post their projects to. I would like to write a script (in something like Python) that automatically scrapes the usernames of the kids who have submitted projects. When I view the studio page I can see all of the projects and usernames on the page, but when I view the HTML source that information is not there; it appears to be loaded by a script call of some kind.

Does anyone know if it is possible to get access to this information without harvesting it all manually?

THANKS
jTron
Scratcher
100+ posts


You might want to check out /site-api/projects/in/:id/:page. It looks like the query `.owner a` should give you a NodeList with the link to each project's owner; a simple `.innerText` on each link should give you the username inside. Seems like a job for Node.js to me, though I'm sure Python could easily handle it.
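For anyone who wants to try the same idea in Python, here is a minimal sketch using only the standard library. It assumes the endpoint returns an HTML fragment in which each project's owner link sits inside an element with the class `owner` (which is what the `.owner a` query suggests); the class and function names below are made up for illustration, and the endpoint may not behave this way anymore.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class OwnerLinkParser(HTMLParser):
    """Collect the text of every <a> nested inside an element with class "owner"."""

    def __init__(self):
        super().__init__()
        self.usernames = []
        self._owner_depth = 0   # > 0 while inside an element with class "owner"
        self._in_link = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._owner_depth:
            self._owner_depth += 1
            if tag == "a":
                self._in_link = True
                self._text = []
        elif "owner" in classes:
            self._owner_depth = 1

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._owner_depth:
            return
        if tag == "a" and self._in_link:
            self._in_link = False
            name = "".join(self._text).strip()
            if name:
                self.usernames.append(name)
        self._owner_depth -= 1


def extract_usernames(html):
    """Rough equivalent of querying `.owner a` and reading each link's text."""
    parser = OwnerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.usernames


def fetch_studio_page(studio_id, page=1):
    """Download one page of a studio's project list (URL pattern from this thread)."""
    url = f"http://scratch.mit.edu/site-api/projects/in/{studio_id}/{page}/"
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Calling `extract_usernames(fetch_studio_page(studio_id))` would then give a list of names for one page of a studio, provided the markup matches the assumption above.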

CVCoderDojo
Scratcher
6 posts


Brilliant.

About three minutes of Python code and it does what I need it to do. Thanks, jTron.

CVCoderDojo
Scratcher
6 posts


Hey jTron, is there documentation somewhere for /site-api/? I can't seem to find it.

I have found the regular API and its documentation in the wiki, but it appears somewhat limited. What you showed me above goes beyond the standard API, and I would appreciate knowing what else is possible through /site-api/.

THANKS.
jTron
Scratcher
100+ posts


CVCoderDojo wrote:

Hey jTron, is there documentation somewhere for /site-api/? I can't seem to find it.

I have found the regular API and its documentation in the wiki, but it appears somewhat limited. What you showed me above goes beyond the standard API, and I would appreciate knowing what else is possible through /site-api/.

THANKS.

As far as I can tell, nope. Scratch's APIs (of which, I imagine, there are many more than we realize) are very poorly documented.

It may be time (I've thought so for a while) for a community effort to document as much of this as possible; if one already exists, it's not easy to find. Basically everything I know about the site's APIs I discovered on my own. Here's what I did when I saw your question:

1. Headed over to the first studio on the front page to see what I could find. Currently, that's this.
2. Noticed that it took a moment for the projects to load after the rest of the page did (slow internet has its advantages).
3. Opened the dev tools (I'm using Chrome) and switched to the “Network” tab.
4. Reloaded the page. As the assets appeared, I enabled the XHR filter to sort out the noise.
5. Saw http://scratch.mit.edu/site-api/projects/in/278471/1/. After that, it took just a little experimenting to figure out the arguments (the studio id, then the page number).
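The same kind of experimenting can be scripted. Below is a rough Python sketch that builds the URL seen in step 5 from a studio id and a page number, then walks the pages in order. How the endpoint signals "no more pages" is not something I have confirmed, so stopping on an HTTP error or an empty body is an assumption, and `page_url`, `fetch_url`, and `iter_pages` are names invented for this example.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# URL pattern taken from the request observed in the Network tab.
BASE = "http://scratch.mit.edu/site-api/projects/in/{studio_id}/{page}/"


def page_url(studio_id, page):
    """Build the URL for one page of a studio's project list."""
    return BASE.format(studio_id=studio_id, page=page)


def fetch_url(url):
    """Return the response body as text, or None if the server answers with an HTTP error."""
    try:
        with urlopen(url) as resp:
            return resp.read().decode("utf-8")
    except HTTPError:
        return None


def iter_pages(studio_id, fetch=fetch_url, max_pages=50):
    """Yield page bodies starting at page 1.

    Assumption: a missing or blank body means we ran past the last page.
    max_pages is a safety cap so a surprise in the API cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        body = fetch(page_url(studio_id, page))
        if not body or not body.strip():
            break
        yield body
```

Passing the fetcher in as a parameter makes it easy to swap in a fake one while experimenting, so you are not hitting the live site on every run.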

The best way to find new, awesome Scratch APIs is to watch how the live site itself uses them. It's always fun to discover new ones.

If you need anything else related to the APIs, the Advanced Topics forum (the ATs) is the perfect place to ask. I guarantee that somewhere on here you can find someone who knows what they're doing.

CVCoderDojo
Scratcher
6 posts


Thanks, jTron.

Since my message to you I also stumbled onto:

scratch.mit.edu/site-api/galleries/owned_or_curated_by/:username

When combined with yours this really helps me do what I want to do. It means I don't have to update my own database of the galleries owned by the dojo. Instead, I can find out what galleries we own and then iterate over that list to scrape those galleries for student progress. While a dojo isn't a competition as to who can complete the most things, it is nice to be able to build a spreadsheet or webpage that shows where kids are at in our progression.
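In case it helps anyone else, here is a rough Python sketch of the "iterate and build a spreadsheet" part. The response format of the owned_or_curated_by call is not described in this thread, so the sketch takes the list of studios as a plain mapping of title to id, and takes a caller-supplied function for fetching usernames; both shapes, and the names `build_progress` and `progress_to_csv`, are assumptions made for illustration.

```python
import csv
import io


def build_progress(studios, usernames_in_studio):
    """Map each username to the set of studio titles they have posted to.

    studios: dict of studio title -> studio id (hypothetical shape)
    usernames_in_studio: callable taking a studio id, returning usernames
    """
    progress = {}
    for title, studio_id in studios.items():
        for user in usernames_in_studio(studio_id):
            progress.setdefault(user, set()).add(title)
    return progress


def progress_to_csv(progress, studio_titles):
    """Render one row per user with an 'x' under each studio they submitted to."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["username", *studio_titles])
    for user in sorted(progress):
        done = progress[user]
        writer.writerow([user, *("x" if t in done else "" for t in studio_titles)])
    return out.getvalue()
```

The resulting CSV text can be written to a file and opened in any spreadsheet program, with one column per task studio.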



CVCoderDojo
Scratcher
6 posts


jTron wrote:

1. Headed over to the first studio on the front page to see what I could find. Currently, that's this.
2. Noticed that it took a moment for the projects to load after the rest of the page did (slow internet has its advantages).
3. Opened the dev tools (I'm using Chrome) and switched to the “Network” tab.
4. Reloaded the page. As the assets appeared, I enabled the XHR filter to sort out the noise.
5. Saw http://scratch.mit.edu/site-api/projects/in/278471/1/. After that point, it was just a little bit of experimenting to figure out the various arguments.

jTron, I wanted to thank you again for your help with this the other week and for the response above. I was able to use your technique to solve a couple of other problems, which has helped out the dojo quite a bit.

CHEERS!
