How to webscrape follower counts with Python?

NFlex23

My project, https://scratch.mit.edu/projects/563896966/, has been getting a lot of attention lately, and people have been complaining about the accurateness (is that even a word?) of ScratchDB. I would like to replace using ScratchDB with a webscraper so it's more accurate, but I'm not entirely sure where to start. It should preferably be able to detect non-existent users, so it could return an error.

Thank you in advance!

Last edited by NFlex23 (Oct. 15, 2021 20:31:47)

ScolderCreations

Make the program load scratch pages, and then look for links to profiles. Then, it would add each profile it sees to a list, and repeat. After a while, you'll have a ton of user account names, and then you can just get the API data and save it.
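
A rough sketch of that crawl loop, in case it helps to see it spelled out. It assumes profile links show up in the page HTML as /users/<name>/, and the `fetch` parameter is a hypothetical stand-in for whatever downloads a page, so the example runs against a made-up two-page site instead of the real one:

```python
import re
from collections import deque

# Assumption: profile links appear in the HTML as /users/<name>/
PROFILE_LINK = re.compile(r"/users/([A-Za-z0-9_-]+)/")


def collect_usernames(start_url, fetch, max_pages=50):
    """Breadth-first crawl: load a page, record profile links, queue them."""
    seen = set()
    queue = deque([start_url])
    pages = 0
    while queue and pages < max_pages:
        html = fetch(queue.popleft())
        pages += 1
        for name in PROFILE_LINK.findall(html):
            if name not in seen:
                seen.add(name)
                queue.append(f"https://scratch.mit.edu/users/{name}/")
    return seen


# Made-up pages used only for this demonstration.
fake_site = {
    "https://scratch.mit.edu/users/alice/": '<a href="/users/bob/">bob</a>',
    "https://scratch.mit.edu/users/bob/": (
        '<a href="/users/alice/">alice</a> <a href="/users/carol/">carol</a>'
    ),
}
found = collect_usernames(
    "https://scratch.mit.edu/users/alice/",
    lambda url: fake_site.get(url, ""),
)
print(sorted(found))  # ['alice', 'bob', 'carol']
```

The `max_pages` cap keeps the loop from running forever; a real crawler would also want a delay between requests.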

NFlex23

ScolderCreations wrote:
Make the program load scratch pages, and then look for links to profiles. Then, it would add each profile it sees to a list, and repeat. After a while, you'll have a ton of user account names, and then you can just get the API data and save it.

I'm not sure I understand. I want it to do something like this:

function get followers (username)
  set result to GET "https://scratch.mit.edu/users/username/followers"
  if user doesn't exist
    return "error"
  otherwise
    find the follower count in result and return it

Please excuse my pseudocode.

Last edited by NFlex23 (Oct. 15, 2021 20:49:33)

u7p

PseudoCode:

function getFollowers(username) {
  request = http.get('scratch.mit.edu/users/' + username)
  if(request.response == 404) {
    return 'Requested user does not exist'
  }
  else {
    parser = new DOMParser()
    document = parser.parseFromString(request.content)
    followers = document.getElementById('follower-count').textContent
    return followers
  }
}
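
In Python, the 404 branch of this pseudocode could look roughly like the sketch below. It uses only the standard library's urllib. The `opener` parameter and the `UserNotFound` exception are hypothetical names added here so the function can be tried without touching the network, and it assumes, as the pseudocode does, that a missing user produces an HTTP 404:

```python
import urllib.error
import urllib.request


class UserNotFound(Exception):
    """Raised when the profile page answers with HTTP 404 (hypothetical name)."""


def fetch_profile_html(username, opener=urllib.request.urlopen):
    """Return the profile page HTML, or raise UserNotFound on a 404."""
    url = f"https://scratch.mit.edu/users/{username}/"
    try:
        with opener(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise UserNotFound(username) from err
        raise  # any other HTTP error is not a "user missing" case
```

Raising an exception instead of returning an error string keeps "user does not exist" clearly separate from a real follower count.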

NFlex23

Is there a ‘DOMParser’-ish library for Python?

Maximouse

NFlex23 wrote:

Is there a ‘DOMParser’-ish library for Python?

Yes.
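
One option that ships with Python is html.parser.HTMLParser. It is event-based rather than a full DOM, so something like getElementById has to be emulated by hand. A minimal sketch, where `TextById` and `text_by_id` are hypothetical names and the `follower-count` id is only the placeholder from the pseudocode, not a confirmed id on Scratch's pages:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Rough equivalent of getElementById(target_id).textContent, or None."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


sample = '<div><span id="follower-count">42 <b>followers</b></span><p>other</p></div>'
print(text_by_id(sample, "follower-count"))  # 42 followers
print(text_by_id(sample, "missing"))         # None
```

Third-party libraries such as Beautiful Soup offer a friendlier interface for the same job, at the cost of an extra install.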

Verixion

like this?

import re

import requests


def get_followers(username):
    # scratchr2 has more accurate follower counts because user pages haven't been migrated to React
    response = requests.get(
        f"https://scratch.mit.edu/users/{username}/followers",
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0"
        },
    )
    # The page shows the count as "Followers (N)"; capture N.
    match = re.search(r"Followers \((\d+)\)", response.text)
    if match is None:
        # No count on the page, presumably because the user does not exist.
        return None
    return int(match[1])


print(get_followers("Verixion"))

The requests module must be installed for this to work. Install it with:

pip install requests
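
To see what the regex does without making a request, here is a small offline sketch. The sample HTML strings are made up for illustration and only assume the "Followers (N)" text that the code above searches for. `parse_follower_count` is a hypothetical helper name, and it raises an error rather than returning nothing, since the original question asked for an error on non-existent users:

```python
import re

# Same pattern as above: the count sits in parentheses after "Followers".
FOLLOWERS_PATTERN = re.compile(r"Followers \((\d+)\)")


def parse_follower_count(html):
    """Return the follower count found in html, or raise ValueError."""
    match = FOLLOWERS_PATTERN.search(html)
    if match is None:
        raise ValueError("no 'Followers (N)' text found")
    return int(match.group(1))


sample_ok = "<h2>Followers (123)</h2>"        # made-up markup
sample_missing = "<h1>Page not found</h1>"    # made-up markup

print(parse_follower_count(sample_ok))  # 123

try:
    parse_follower_count(sample_missing)
except ValueError as err:
    print("error:", err)
```

Keeping the parsing separate from the download also makes it easy to check the pattern against a saved copy of the page.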

Last edited by Verixion (Oct. 16, 2021 11:08:42)

NFlex23

Thank you! I didn't even think about using regexes, lol.

MagicCrayon9342

Umm… I am quite confused here. Not about the code, but about the ‘Firefox 94.0’. Off-topic, I know; I'm just curious. Does Firefox have weekly/daily builds like Chrome does?

Last edited by MagicCrayon9342 (Oct. 16, 2021 17:41:15)

u7p

MagicCrayon9342 wrote:
umm… I am quite confused here. Not about the code, about the ‘Firefox 94.0’ Off-topic I know, just curious

What about it?

MagicCrayon9342

u7p wrote:
MagicCrayon9342 wrote:
umm… I am quite confused here. Not about the code, about the ‘Firefox 94.0’ Off-topic I know, just curious
What about it?

I updated the post recently; more info has been added.

--VSCoder--

Hey there, @NFlex23,

@PikachuB2005 is asking me to give you the link to this resource that he made: @PikachuB2005/fast follower count finder.

Hope this helps,
Thanks!

Last edited by --VSCoder-- (Oct. 16, 2021 18:51:27)

NFlex23

--VSCoder-- wrote:
Hey there, @NFlex23,

@PikachuB2005 is asking me to give you the link to this resource that he made: @PikachuB2005/fast follower count finder.

Hope this helps,
Thanks!

Thanks.

cheddargirl

Closed by request.
