Discuss Scratch
- Discussion Forums
- » Advanced Topics
- » VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
- i_eat_coffee
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
hello people of scratch
it is I, the infamous coffee eater, who has returned from my short break as i struggled to continue eating coffee, because it melted since it's so hot outside
anyway, i've been working on a scratch DB clone, and I've got about 0.000104858 Terabytes* worth of forum post data
no user data, just forum posts
also i forgot what scratchdb was, but i think it was a database of scratch info
so i did that, but with the forums
and i need you guys to help me by letting me know exactly which endpoints you need, because i have no idea what scratchdb had before
(i can't see the docs because they were replaced by a hiatus message on the website)
once i code them i'll publish the website
also you'll have to wait a bit because i want approval from the scratch team before sharing it but still
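For scale, the asterisked figure above works out to roughly 100 MiB, assuming it is read as decimal terabytes; a quick sanity check in Python:

# Back-of-envelope check of the figure above (assumption: decimal terabytes).
data_tb = 0.000104858            # the quoted amount of forum post data, in TB
data_bytes = data_tb * 10**12    # 1 TB = 10^12 bytes
data_mib = data_bytes / 2**20    # convert to mebibytes
print(f"{data_bytes:,.0f} bytes ~= {data_mib:.1f} MiB")  # about 104,858,000 bytes, i.e. ~100 MiB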
- davidtheplatform
Scratcher
500+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
I'm working on a similar thing, but I think I have more stuff done: I have most topics scraped and ~1m posts (out of ~6m total). Also, some of the endpoints work. Anyways, we could work together, possibly.
relevant emails:
Do you have the v3 docs for scratchdb (/v3/docs/)? Archive.org/archive.is couldn't save them unfortunately.

I still have the docker images somewhere so I’ll be able to keep an archive somewhere at some point, however currently they aren’t hosted anywhere. Once I get everything back online to try to do some recovery I’ll see what I can do about publishing a bit of an archive.
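For a rough sense of scale of the numbers above (~6m posts total), here is a hedged back-of-envelope estimate of how long one full, throttled pass over the forums might take; the posts-per-page figure and the request delay are assumptions, not numbers from this thread:

# Rough estimate only. Assumptions (not from this thread): ~20 posts rendered per
# forum page, and a conservative one request every 2 seconds.
total_posts = 6_000_000          # "~6m total" posts mentioned above
posts_per_page = 20              # assumed Scratch forum page size
seconds_per_request = 2          # assumed polite delay between requests

pages = total_posts / posts_per_page
days = pages * seconds_per_request / 86_400
print(f"{pages:,.0f} pages -> about {days:.1f} days of continuous, throttled scraping")
# ~300,000 pages -> about 6.9 days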
- Jeffalo
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
hey all! please try to consolidate your efforts and be courteous with your scraping!!! i want a scratchdb alternative as much as you do, but multiple people mass scraping the most expensive and fragile part of the website is really not helping anyone.
tiny reminder that scratch is a free service. these scrapers play a large part in disrupting the service for everyone else. let’s please be careful with this.
- dynamicsofscratch
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
My proposal to coffee for a ScratchDB clone is this:
Instead of creating multiple ScratchDB clones, which would hurt the Scratch website (resulting in high downtime, slow load times, etc.), create a project which everyone can join and contribute what they can to make a new ScratchDB, and then abandon it (ScratchDB). This would mean only one scraper is on the website, reducing clogging of the website.
Voyager, a project by @josueart, aims to replace ScratchDB; we are yet to make a prototype, as we don't have much experience with web-scraping. I am actively trying to learn it and make a working prototype which scrapes the entire forums and organizes it in a database. coffee is very much welcome to join our project, and as @Jeffalo said:
(#4)
hey all! please try to consolidate your efforts and be courteous with your scraping!!! i want a scratchdb alternative as much as you do, but multiple people mass scraping the most expensive and fragile part of the website is really not helping anyone.
tiny reminder that scratch is a free service. these scrapers play a large part in disrupting the service for everyone else. let’s please be careful with this.
- i_eat_coffee
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
hey all! please try to consolidate your efforts and be courteous with your scraping!!! i want a scratchdb alternative as much as you do, but multiple people mass scraping the most expensive and fragile part of the website is really not helping anyone.
tiny reminder that scratch is a free service. these scrapers play a large part in disrupting the service for everyone else. let’s please be careful with this.
^^ some time ago I spoke with one of the scratch website engineers, they said it's okay to send requests to the forums as long as it's really really slow to ensure that the website is unaffected
for the record, i started gathering data in february: https://scratch.mit.edu/discuss/post/7793231/
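A minimal sketch of what "really really slow" could look like in practice, assuming plain HTTP fetches of public topic pages with a long fixed delay; the delay value, topic ids, and contact string are illustrative assumptions, not anything the Scratch Team has endorsed:

import time
import requests  # third-party: pip install requests

DELAY_SECONDS = 10  # assumed delay; the point is to wait a long time between requests
HEADERS = {"User-Agent": "forum-archive-bot (contact: example@example.com)"}  # placeholder contact

def fetch_topic_page(topic_id: int, page: int) -> str | None:
    """Fetch one public forum topic page, or None if it is unavailable."""
    url = f"https://scratch.mit.edu/discuss/topic/{topic_id}/?page={page}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    if resp.status_code != 200:
        return None  # skip deleted/closed/hidden topics instead of retrying aggressively
    return resp.text

for topic_id in (100000, 100001):  # hypothetical topic ids
    html = fetch_topic_page(topic_id, page=1)
    # ... parse and store `html` here ...
    time.sleep(DELAY_SECONDS)  # the important part: always wait between requests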
- i_eat_coffee
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
how the DB works (at the moment):
it's a huge database with the following data format for each post (they are not categorised in any way, just stored as-is)
"POST ID": {
    POST ID,
    DATE POSTED,
    TOPIC ID,
    author: {
        USERNAME,
        PICTURE,
        ID
    },
    POST CONTENT (HTML)
}
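As a concrete illustration of that shape, one stored record might look roughly like the following; all values are invented and the key names are just one plausible reading of the placeholders above:

import json

# Hypothetical example record in the format sketched above (all values made up).
record = {
    "1234567": {                               # "POST ID" used as the key
        "id": 1234567,                         # POST ID
        "posted": "2024-02-01T12:34:56Z",      # DATE POSTED
        "topic_id": 654321,                    # TOPIC ID
        "author": {
            "username": "example_user",        # USERNAME
            "picture": "https://example.com/avatar.png",  # PICTURE (placeholder URL)
            "id": 98765,                       # ID
        },
        "content": "<p>hello world</p>",       # POST CONTENT (HTML)
    }
}
print(json.dumps(record, indent=2))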
- dynamicsofscratch
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
(#6)
some time ago I spoke with one of the scratch website engineers, they said it's okay to send requests to the forums as long as it's really really slow to ensure that the website is unaffected
for the record, i started gathering data in february: https://scratch.mit.edu/discuss/post/7793231/

any scraper right now affects the website, as the website is really fragile and needs maintenance (which they cancelled…)
- ajskateboarder
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
any scraper right now affects the website, as the website is really fragile and needs maintenance (which they cancelled…)

Surely a scraper where the last post scraped was made over half a year ago couldn't affect the website in comparison to existing user traffic, right? I do agree though that any scraping efforts should be throttled intensively, just as i_eat_coffee's scraper is doing.
- josueart
Scratcher
500+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
Instead of creating multiple ScratchDB clones, which would hurt the Scratch website (resulting in high downtime, slow load times, etc.), create a project which everyone can join and contribute what they can to make a new ScratchDB, and then abandon it (ScratchDB). This would mean only one scraper is on the website, reducing clogging of the website.

I love this idea, but how would we coordinate? Maybe this topic could help if OP redirected the theme.
- i_eat_coffee
Scratcher
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
I love this idea, but how would we coordinate? Maybe this topic could help if OP redirected the theme.

that's kind of… the point of the current website (link)
people contribute by manually scraping scratch posts via the website, reducing the load on scratch servers
- josueart
Scratcher
500+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
that's kind of… the point of the current website (link)
people contribute by manually scraping scratch posts via the website, reducing the load on scratch servers

This is probably not a good idea, though. I don't think people will be able to index every single topic on Scratch, and this process could easily be automated.
We could reduce the load by limiting the workers (Scratch's recommendation for the API is 10 req/s, but the forums probably wouldn't handle this, so 5 req/s?).
In the end, the fragility is caused by how old and outdated the forums are.
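A hedged sketch of one way to cap several workers at roughly 5 requests per second overall, using a shared limiter; the rate and the worker/page numbers are assumptions taken from the discussion above, not measured limits:

import threading
import time

class RateLimiter:
    """Hand workers evenly spaced request slots (~per_second requests/second overall)."""

    def __init__(self, per_second: float):
        self.interval = 1.0 / per_second
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self) -> None:
        with self.lock:
            now = time.monotonic()
            self.next_slot = max(self.next_slot, now) + self.interval
            delay = self.next_slot - self.interval - now
        if delay > 0:
            time.sleep(delay)

limiter = RateLimiter(per_second=5)  # the 5 req/s figure is the guess from the post above

def worker(pages: list[int]) -> None:
    for page in pages:
        limiter.wait()
        # ... fetch and store one forum page here ...
        print(f"fetched page {page}")

threads = [threading.Thread(target=worker, args=([i, i + 1],)) for i in (0, 2, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()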

- cosmosaura
Scratch Team
1000+ posts
VERY IMPORTANT: SCRATCHDB CLONE RESEARCH
Topic closed on request from OP. If you need it re-opened, though, you can report this and ask. 
