Discuss Scratch

NFlex23
Scratcher
1000+ posts

How to webscrape follower counts with Python?

My project, https://scratch.mit.edu/projects/563896966/, has been getting a lot of attention lately, and people have been complaining about the accurateness (is that even a word?) of ScratchDB. I would like to replace using ScratchDB with a webscraper so it's more accurate, but I'm not entirely sure where to start. It should preferably be able to detect non-existent users, so it could return an error.

Thank you in advance!

Last edited by NFlex23 (Oct. 15, 2021 20:31:47)

ScolderCreations
Scratcher
1000+ posts

Make the program load Scratch pages and look for links to profiles. It would then add each profile it sees to a list and repeat. After a while, you'll have a ton of usernames, and then you can just get the API data for each one and save it.
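
The "look for links to profiles" step could be sketched roughly like this. It is only a minimal illustration using the standard library; the `/users/<username>/` link shape, the `ProfileLinkCollector` class, and the `extract_usernames` helper are assumptions made for the example, not anything confirmed in this thread.

```python
from html.parser import HTMLParser
import re

# Assumed shape of a profile link: /users/<username>/ (optionally with the full domain).
PROFILE_LINK = re.compile(r"^(?:https?://scratch\.mit\.edu)?/users/([A-Za-z0-9_-]+)/?$")


class ProfileLinkCollector(HTMLParser):
    """Collects usernames from <a href="/users/NAME/"> links in a page."""

    def __init__(self):
        super().__init__()
        self.usernames = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        href = dict(attrs).get("href") or ""
        match = PROFILE_LINK.match(href)
        if match:
            self.usernames.add(match.group(1))


def extract_usernames(html):
    """Return the set of usernames linked from one page of HTML (hypothetical helper)."""
    collector = ProfileLinkCollector()
    collector.feed(html)
    return collector.usernames


sample = '<a href="/users/NFlex23/">NFlex23</a> <a href="/projects/563896966/">project</a>'
print(extract_usernames(sample))
```

Each username found this way could be added to a queue of pages still to visit, which gives the "repeat" part of the idea.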
NFlex23
Scratcher
1000+ posts

ScolderCreations wrote:

Make the program load scratch pages, and then look for links to profiles. Then, it would add each profile it sees to a list, and repeat. After a while, you'll have a ton of user account names, and then you can just get the API data and save it.
I'm not sure I understand. I want it to do something like this:
function get followers (username)
  set result to GET "https://scratch.mit.edu/users/username/followers"
  if user doesn't exist
    return "error"
  otherwise
    find the follower count in result and return it
Please excuse my pseudocode.

Last edited by NFlex23 (Oct. 15, 2021 20:49:33)

u7p
Scratcher
100+ posts

PseudoCode:
function getFollowers(username) {
  request = http.get('https://scratch.mit.edu/users/' + username)
  // A 404 status means the profile page was not found
  if (request.status == 404) {
    return 'Requested user does not exist'
  } else {
    parser = new DOMParser()
    document = parser.parseFromString(request.content, 'text/html')
    followers = document.getElementById('follower-count').textContent
    return followers
  }
}
NFlex23
Scratcher
1000+ posts

Is there a ‘DOMParser’-ish library for Python?
Maximouse
Scratcher
1000+ posts

NFlex23 wrote:

Is there a ‘DOMParser’-ish library for Python?
Yes.
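
For illustration, the standard library's `html.parser` can cover the `getElementById(...).textContent` part of the pseudocode. The sketch below is only one possible approach; the `TextById` class and `text_by_id` helper are made up for this example, and the `follower-count` id comes from the pseudocode above rather than from the real page. Third-party libraries such as Beautiful Soup offer a fuller DOM-like interface.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collects the text inside the first element that has a given id."""

    # Void elements never get a closing tag, so they must not change nesting depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the target element
        self.found = False   # True once the target element has been seen
        self.done = False    # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, element_id):
    """Return the text content of the element with this id, or None if absent."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


page = '<div><span id="follower-count">1,234 <b>followers</b></span><p>other</p></div>'
print(text_by_id(page, "follower-count"))
```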
Verixion
Scratcher
100+ posts

like this?
import re

import requests

def get_followers(username):
  # The scratchr2 followers page has more accurate follower counts because user pages haven't been migrated to React
  response = requests.get(f'https://scratch.mit.edu/users/{username}/followers', headers={
    # Browser-like User-Agent sent with the request
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0"
  }, timeout=10)  # timeout so a stalled connection doesn't hang forever
  # Capture the number from text such as "Followers (123)"
  match = re.search(r"Followers \((\d+)\)", response.text)
  if match is None:
    # No count found in the page, e.g. because the user doesn't exist
    return None
  return int(match[1])

print(get_followers("Verixion"))

The requests module must be installed for this to work. To install it, run:
pip install requests
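
Since the original question also asked about detecting non-existent users, here is a hedged sketch of one way to tell "user not found" apart from "count not found in the page". It assumes Scratch answers with HTTP 404 for a missing user, as the earlier pseudocode suggests; `UserNotFound` and `follower_count_from_page` are hypothetical names for this example.

```python
import re

# Same pattern as in the snippet above: text such as "Followers (123)".
FOLLOWERS_PATTERN = re.compile(r"Followers \((\d+)\)")


class UserNotFound(Exception):
    """Raised when the followers page answers with HTTP 404 (assumed behaviour)."""


def follower_count_from_page(status_code, html):
    """Turn a status code and page body into a follower count, or raise an error."""
    if status_code == 404:
        raise UserNotFound("user does not exist")
    match = FOLLOWERS_PATTERN.search(html)
    if match is None:
        # The page loaded but did not contain the expected text.
        raise ValueError("follower count not found in page")
    return int(match.group(1))


print(follower_count_from_page(200, "<h2>Followers (42)</h2>"))
```

With requests, this could be called as follower_count_from_page(response.status_code, response.text), so the caller can catch UserNotFound and report an error for that username.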

Last edited by Verixion (Oct. 16, 2021 11:08:42)

NFlex23
Scratcher
1000+ posts

Thank you! I didn't even think about using regexes, lol.
MagicCrayon9342
Scratcher
1000+ posts

Umm… I am quite confused here. Not about the code, but about the ‘Firefox 94.0’. Off-topic I know, just curious. Does Firefox have weekly/daily builds like Chrome does?

Last edited by MagicCrayon9342 (Oct. 16, 2021 17:41:15)

u7p
Scratcher
100+ posts

MagicCrayon9342 wrote:

umm… I am quite confused here. Not about the code, about the ‘Firefox 94.0’ Off-topic I know, just curious
What about it?
MagicCrayon9342
Scratcher
1000+ posts

u7p wrote:

MagicCrayon9342 wrote:

umm… I am quite confused here. Not about the code, about the ‘Firefox 94.0’ Off-topic I know, just curious
What about it?
I updated the post recently; more info added.
--VSCoder--
Scratcher
27 posts

Hey there, @NFlex23,

@PikachuB2005 is asking me to give you the link to this resource that he made: @PikachuB2005/fast follower count finder.

Hope this helps,
Thanks!

Last edited by --VSCoder-- (Oct. 16, 2021 18:51:27)

NFlex23
Scratcher
1000+ posts

--VSCoder-- wrote:

Hey there, @NFlex23,

@PikachuB2005 is asking me to give you the link to this resource that he made: @PikachuB2005/fast follower count finder.

Hope this helps,
Thanks!
Thanks.
cheddargirl
Scratch Team
1000+ posts

Closed by request.
