jvvg
Scratcher
1000+ posts

A script to archive the old Scratch Forums

A few other users and I are working to download the old Scratch Forums, because Lightnin said they need to be closed soon because of security issues. However, after I talked with him and explained how to fix the immediate problems, he agreed to keep them up long enough for an HTML-only version to be made.

Lightnin said he may also run it locally so that the ST can have their own HTML-only version to store and make public. Anyway, if you want the code, it's available on GitHub. The instructions are in the description. This is a PHP CLI script, so you need to have PHP installed on your computer: on Linux, get it using apt-get; on Mac OS, use MAMP (or PHP may already be built in); on Windows, get XAMPP and change your PATH settings as necessary.


Professional web developer and lead engineer on the Scratch Wiki
Maybe the Scratch Team isn't so bad
Why the April Fools' Day forum didn't work last year
scimonster
Scratcher
1000+ posts

Or if you're on a non-Ubuntu Linux system (such as Red Hat [Fedora]), use yum.

Retired Community Moderator
BTW, i run Google Chrome 41.0.2272.101 on a Linux system - Ubuntu 14.04. NEW: iPad 4th gen. w/retina.

418 I'm a teapot (original - to be read by bored computer geeks)
THE GAME (you just lost)
; THE SEMICOLON LIVES ON IN OUR SIGS
3DSfan12345
Scratcher
1000+ posts


In 2018, I asked the Scratch Team to remove all my forum posts to protect my privacy. That's why this post is blank. Besides, I've outgrown this website, and I don't want the dumb things I said in my late tween/early teen years to follow me around for the rest of my life. This post probably wasn't anything interesting or important, anyway.
scimonster
Scratcher
1000+ posts

3DSfan12345 wrote:

Is MAMP command line? I hate command line.
This is a command line application, like it or not.
Command line isn't so bad, as long as you know what you're doing, or only use scripts from sources that you trust.

jvvg
Scratcher
1000+ posts

3DSfan12345 wrote:

Is MAMP command line? I hate command line.
MAMP is mostly graphical. However, for this script, you do need to use the command line.


jvvg
Scratcher
1000+ posts

My archive is well under way. Although it may seem like a while, I think I will have it finished by next Thursday.


blob8108
Scratcher
1000+ posts

@jvvg A neat trick is to randomise the order a bit, and have the script only download the file if it doesn't already exist. That way, you can run several instances of the script in parallel, which makes it a lot faster.
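
A minimal Python sketch of that trick, not the actual PHP script. The `archive_topics` name, the `topic_<id>.html` file layout, and the injected `fetch` callable are all assumptions made for illustration; the real downloader would go where `fetch` is.

```python
import os
import random
from pathlib import Path
from typing import Callable, Iterable


def archive_topics(topic_ids: Iterable[int], out_dir: Path,
                   fetch: Callable[[int], str]) -> int:
    """Save every topic that is not on disk yet; return how many were fetched."""
    out_dir.mkdir(parents=True, exist_ok=True)
    ids = list(topic_ids)
    random.shuffle(ids)  # each running instance walks the topics in its own order
    fetched = 0
    for topic_id in ids:
        target = out_dir / f"topic_{topic_id}.html"  # hypothetical file layout
        if target.exists():
            continue  # an earlier run or another instance already saved it
        html = fetch(topic_id)
        # Write to a per-process temporary name, then rename, so other
        # instances never see a half-written file under the final name.
        tmp = target.with_name(f"{target.name}.{os.getpid()}.part")
        tmp.write_text(html, encoding="utf-8")
        tmp.replace(target)
        fetched += 1
    return fetched
```

Two instances can still occasionally fetch the same topic at the same moment, but the only cost is one duplicate request; the file on disk ends up complete either way.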

tosh · slowly becoming a grown-up adult and very confused about it
jvvg
Scratcher
1000+ posts

blob8108 wrote:

@jvvg A neat trick is to randomise the order a bit, and have the script only download the file if it doesn't already exist. That way, you can run several instances of the script in parallel, which makes it a lot faster.
However, I intentionally don't want it to go too fast, because otherwise my ISP may suspect a DDoS and disconnect me, or even worse, report me to the legal authorities.


blob8108
Scratcher
1000+ posts

@jvvg It seems unlikely. Maybe it's just my slow internet connection, but even with several running in parallel I still barely managed even 1MB/s.

jvvg
Scratcher
1000+ posts

blob8108 wrote:

@jvvg It seems unlikely. Maybe it's just my slow internet connection, but even with several running in parallel I still barely managed even 1MB/s.
It's not the bandwidth I'm concerned about. It's the sheer number of HTTP requests. If I make a request every second for 10 hours, it looks a lot like a DDoS. Many DDoS attacks use seemingly slow requests but have them coming from millions of computers at the same time.
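
For scale, one request per second for 10 hours is 10 × 3600 = 36,000 requests. A rough Python sketch of how a downloader could cap its own request rate follows; the `Throttle` class and its parameters are hypothetical and are not taken from the PHP script.

```python
import time


class Throttle:
    """Make sure consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # pause only for the time still owed
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each HTTP request keeps a single instance at or below one request per `min_interval` seconds, however quickly the server answers.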


DigiTechs
Scratcher
500+ posts

I could do the downloading and stuff; I have a good connection.

I do, in fact, have my own site; it's here.
I'm also working on a thing called Fetch. Look at it here!
@thisandagain pls explain. @thisandagain pls explain. @thisandagain pls explain. @thisandagain pls explain. @thisandagain pls explain.
jvvg
Scratcher
1000+ posts

DigiTechs wrote:

I could do the downloading and stuff; I have a good connection.
For this script, you can do it on a 1 Mb/s connection. It's more about ping, and I have about 33 ms ping. However, you also shouldn't take full advantage of your connection for this, because your provider may suspect you of a DDoS attack and cancel your service.


DigiTechs
Scratcher
500+ posts

jvvg wrote:

DigiTechs wrote:

I could do the downloading and stuff; I have a good connection.
For this script, you can do it on a 1 Mb/s connection. It's more about ping, and I have about 33 ms ping. However, you also shouldn't take full advantage of your connection for this, because your provider may suspect you of a DDoS attack and cancel your service.
meh, I actually have done a DoS attack once, to test how strong my internet was. (Yes, it's a denial-of-service attack, not a distributed one, if there's only one user. A distributed one needs MORE than one.)

jvvg
Scratcher
1000+ posts

Now I've only got about 20K topics left to archive. If I keep the program running constantly, it will take about 23 hours to completely finish.


jvvg
Scratcher
1000+ posts

A bunch of files didn't copy correctly, so I have about 15K topics left now. Once that is done, I will have a complete archive.


jvvg
Scratcher
1000+ posts

It is finally done! I now have a complete archive of the old forums (overall, it's about 3.2 GB). If the Scratch Team does close down the archive (which Lightnin said they probably will soon), I will be able to recover all of the old topics for everybody.


DigiTechs
Scratcher
500+ posts

Well then…

UPLOAD THEM!

jvvg
Scratcher
1000+ posts

DigiTechs wrote:

Well then…

UPLOAD THEM!
I'd like to, but I don't know of any file host that lets me upload 3.2 GB of HTML data.
For now, if you want your own copy of the archive, you still need to run the script yourself.

I also wrote a simple search tool that lets me search the topic subjects. Lightnin still plans on closing the official archive, but he said that they might use my script to convert everything into HTML pages, which are safer. If the archive gets closed down permanently and the HTML version is not used, I will still be able to locate everybody's topics quickly.
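
The search tool itself isn't shown in this thread, so the following is only a guess at the general idea in Python. It assumes each archived topic is a separate `.html` file whose `<title>` element holds the topic subject; `build_index` and `search` are hypothetical names.

```python
import html
import re
from pathlib import Path

# Assumption: the topic subject is the page's <title> text.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_index(archive_dir):
    """Map each archived file name to the subject found in its <title>."""
    index = {}
    for path in sorted(Path(archive_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        match = TITLE_RE.search(text)
        if match:
            # Collapse whitespace and decode entities such as &amp;
            index[path.name] = html.unescape(" ".join(match.group(1).split()))
    return index


def search(index, query):
    """Return file names whose subject contains the query, ignoring case."""
    needle = query.casefold()
    return sorted(name for name, subject in index.items()
                  if needle in subject.casefold())
```

Building the index once and searching it in memory avoids rereading every file in the archive for each query.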


danwoodski
Scratcher
100+ posts

Could you compress it and put it on Dropbox for people to download?
jvvg
Scratcher
1000+ posts

danwoodski wrote:

Could you compress it and put it on Dropbox for people to download?
Compressed, it's still 730 MB. I may do that, though. If I do, it won't be available for a little while.
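
Going from 3.2 GB to 730 MB means the compressed copy is a bit under a quarter of the original size. The thread doesn't say which tool was used; as one possible approach, a folder of HTML files can be packed into a gzip-compressed tarball with Python's standard library. The `compress_archive` name and the paths are made up for this sketch.

```python
import tarfile
from pathlib import Path


def compress_archive(archive_dir, output_path):
    """Pack archive_dir into a .tar.gz file and return the compressed size in bytes."""
    archive_dir = Path(archive_dir)
    output_path = Path(output_path)  # should live outside archive_dir
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname keeps only the folder name, not the full path on this machine
        tar.add(archive_dir, arcname=archive_dir.name)
    return output_path.stat().st_size
```

For example, `compress_archive("forums", "forums.tar.gz")` would produce a single file that is easier to upload than thousands of separate pages.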

