Discuss Scratch

4meansmuchmore
Scratcher
75 posts

Don't let people use weird Unicode characters that look like letters to spell words in comments

I saw a post on page 2 about typing quirks, and it made me think.

Let's say the word “pie” is a nasty word.

Someone could post “≋p≋i≋e≋” and the swear filter wouldn't catch it. Why can't we just add those workarounds to the filter? Because there are way too many.

Someone trying to evade punishment could post ​℘ꪱׁׅꫀׁׅܻ​, or PIΣ. These strings don't contain any real letters, so the filter wouldn't deduce them easily. It would be better to just disable these characters altogether, right? Let me know what you think, guys.

Last edited by 4meansmuchmore (Nov. 16, 2024 19:09:42)

SockAlternative
Scratcher
500+ posts

This would be ridiculously hard to pull off. There are roughly 150,000 unique Unicode characters. For Scratch to look through all of them and determine if they look like letters would be a waste of resources. If Scratch simply decides to block all Unicode characters, then go back and unban the common ones, that might be a problem for other languages. Reporting people evading the filter would be a better option.
4meansmuchmore
Scratcher
75 posts

SockAlternative wrote:

This would be ridiculously hard to pull off. There are roughly 150,000 unique Unicode characters. For Scratch to look through all of them and determine if they look like letters would be a waste of resources. If Scratch simply decides to block all Unicode characters, then go back and unban the common ones, that might be a problem for other languages. Reporting people evading the filter would be a better option.
I disagree. Unicode characters are grouped into categories, so they could simply disable the categories that are completely unnecessary (those that contain lookalike characters that aren't just letters from different alphabets).
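
For anyone wondering what "categories" means in practice, here is a minimal Python sketch using the standard `unicodedata` module. It is only an illustration, not anything Scratch's filter is known to do; the helper name `describe` is made up for this example. It prints the general category and official name of a plain letter, the ≋ decoration from the first post, and the Σ from "PIΣ".

```python
import unicodedata

def describe(ch: str) -> tuple[str, str]:
    """Return (general category, official name) for a single character."""
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")

# A plain Latin letter, the decoration from the first post, and the Greek sigma.
for ch in "p≋Σ":
    category, name = describe(ch)
    print(f"U+{ord(ch):04X} {category} {name}")
```

Categories starting with "L" are letters and "Sm" is a math symbol. So a category check can separate ≋ from real letters, but it still reports Σ as an ordinary uppercase letter, because in Greek it is one.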
SockAlternative
Scratcher
500+ posts

4meansmuchmore wrote:

I disagree. Unicode characters are grouped into categories, so they could simply disable the categories that are completely unnecessary (those that contain lookalike characters that aren't just letters from different alphabets).
Still, this doesn't fix the problem of there being roughly 150,000 Unicode symbols. Scratch would have to look at each category, determine if any of the symbols are used in different languages, decide if they look close enough to letters, then go block them. That seems like it would take forever, be expensive, and barely help.
4meansmuchmore
Scratcher
75 posts

I really can't agree that it would be that hard. The Scratch Team could just not review the categories that don't have letter-like symbols in them. It would be better to at least block some of these characters than do nothing at all.

Last edited by 4meansmuchmore (Nov. 16, 2024 20:07:22)

WindowsAdmin
Scratcher
1000+ posts

Kid named the report button:
Za-Chary
Scratcher
1000+ posts

Blocking all such characters isn't feasible. Note that Σ, used in one of your examples, is the Greek letter sigma*. Since this is just a “real letter” in another language, the Scratch Team wouldn't block it, or else this would make communicating on Scratch significantly more difficult for those who speak that language. Imagine if the Scratch Team just censored the letter S entirely!

Another common example is the “small caps” font that is commonly used in typing quirks, filter evasions, etc. But many of the “small caps” letters are actually letters in the Russian alphabet, I believe.

Obviously not all unicode symbols are letters (in English or otherwise), but regardless of how many get blocked, I'm sure people will continue to find workarounds for everything if they feel like trolling…

*Y'know, like sigma males.
4meansmuchmore
Scratcher
75 posts

Maybe instead of automatically blocking the symbols, we just stop allowing people to speak like this? That way, people won't be inspired to speak in code all the time, which would make moderation much easier. I don't think a severe penalty would make sense, but maybe a warning would work.
Za-Chary
Scratcher
1000+ posts

4meansmuchmore wrote:

Maybe instead of automatically blocking the symbols, we just stop allowing people to speak like this?
How would that be done? If someone uses these characters to say inappropriate things, you can already report that. If not, then that just leads to the sorts of “typing quirks” which are currently allowed — and disallowing them is rejected.
8to16
Scratcher
1000+ posts

Rejected

Paddle2See wrote:

(#116)

ajskateboarder wrote:

Look, I understand typing quirks are a way of expressing yourself, but there comes a point where they just get in the way. Many typing quirks can make comments incredibly hard for people to understand or read as-is/with a screenreader. And, after a certain point, they can make comments outright impossible to translate into other languages. These issues can also potentially hurt moderation in the case that a comment should be reported

With these problems in mind, I think Scratch could at least include a report option for comments being too hard to read
You can always report a comment if you think it violates the Community Guidelines in some way. But reporting for style issues doesn't seem like it would work. Some “quirks” may actually be caused by a brain or physical disorder. Or somebody trying to use a language with which they may not be familiar. Or somebody trying to bond with a group that uses a particular style.

If you are having difficulty understanding what someone is saying - politely tell them so. It's up to them to decide if it's something that can be improved - or not.
4meansmuchmore
Scratcher
75 posts

That's a different suggestion.
4meansmuchmore
Scratcher
75 posts

By that logic, we should allow people to make cloud chatrooms, and only report when something bad happens. Isn't it better to prevent problems?
ThisIsTemp1
Scratcher
1000+ posts

4meansmuchmore wrote:

By that logic, we should allow people to make cloud chatrooms, and only report when something bad happens. Isn't it better to prevent problems?
Only if we don’t create more problems, and the blocking is effective.
Za-Chary
Scratcher
1000+ posts

4meansmuchmore wrote:

By that logic, we should allow people to make cloud chatrooms, and only report when something bad happens. Isn't it better to prevent problems?
This is why I am asking how you propose the problem be prevented, if not by automatically blocking certain unicode symbols.
SockAlternative
Scratcher
500+ posts

4meansmuchmore wrote:

I really can't agree that it would be that hard. The Scratch Team could just not review the categories that don't have letter-like symbols in them. It would be better to at least block some of these characters than do nothing at all.
How would they know which categories don't contain letter-like symbols? Whether something looks like a letter is extremely subjective.
Skadoodly
Scratcher
1000+ posts

What seems more logical would be to have Scratch internally register those Unicode characters as the letters they look like and treat them as such. Given the sheer number of Unicode characters that exist, though, this would take forever. I've seen some discussion of integrating AI into Scratch, and that could come in handy for this.
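
As a rough sketch of the "register them as the letter they look like" idea: Unicode's NFKC normalization already folds some styled variants (fullwidth or mathematical bold letters, for example) back to plain letters, and anything it misses would need a hand-made lookalike table. The `LOOKALIKES` table and the `fold` function below are hypothetical and exist only for this example; nothing here reflects how Scratch's filter actually works.

```python
import unicodedata

# Hypothetical, hand-written lookalike table (an assumption for illustration).
# A real table would need far more entries and careful human review.
LOOKALIKES = str.maketrans({"Σ": "E"})

def fold(text: str) -> str:
    """Return a filter-only view of text; the displayed comment is unchanged."""
    text = unicodedata.normalize("NFKC", text)  # folds fullwidth, math bold, etc.
    text = text.translate(LOOKALIKES)           # apply the hand-made mappings
    return text.casefold()                      # ignore upper/lower case

print(fold("PIΣ"))
print(fold("ｐｉｅ"))
```

NFKC on its own leaves Σ alone, which is why the extra table is needed. That table is also where the cost and the subjectivity come in: mapping Σ to E for the filter means ordinary Greek text gets folded too, which could cause false positives.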
HermioneGranger471
Scratcher
43 posts

Maybe Scratch could make the filter ignore special Unicode characters, like this.
Pie is the nasty word here:
=P=i=e=
The filter would then see it like this:
Pie
I don’t know if this is a good idea or not.
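
A minimal sketch of that idea in Python, assuming the filter simply throws away every character Unicode does not classify as a letter before checking a word list. The names `BLOCKED_WORDS`, `letters_only` and `is_blocked` are invented for this example, and "pie" is the stand-in word from the thread.

```python
import unicodedata

BLOCKED_WORDS = {"pie"}  # stand-in word from the thread's example

def letters_only(text: str) -> str:
    """Drop every character whose Unicode category is not a letter (L*)."""
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("L"))

def is_blocked(text: str) -> bool:
    """Check each whitespace-separated chunk after stripping decoration."""
    return any(
        letters_only(chunk).casefold() in BLOCKED_WORDS
        for chunk in text.split()
    )

print(is_blocked("≋p≋i≋e≋"))
print(is_blocked("=P=i=e="))
print(is_blocked("I like pizza"))
```

This only handles decoration placed between real letters. It would not catch lookalike letters such as Σ, or a word spaced out as "p i e", so it is at best one piece of a filter.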
han614698
Scratcher
1000+ posts

HermioneGranger471 wrote:

(#17)
Maybe scratch could have it so it ignores Unicode like this
Pie is the nasty word here
=P=i=e=
Now the filter will see it like this
Pie
I don’t know if this is a good idea or not
I don't think = is a special Unicode character.

I think you would find that if you spelled out a bad word with =, it would be filtered.
HermioneGranger471
Scratcher
43 posts

han614698 wrote:

HermioneGranger471 wrote:

(#17)
Maybe scratch could have it so it ignores Unicode like this
Pie is the nasty word here
=P=i=e=
Now the filter will see it like this
Pie
I don’t know if this is a good idea or not
I don't think = is a special unicode character.

i think you would find if you were to spell out a bad word with = it would be filtered

Sorry, I’m bad with special Unicode, but replace that with a special Unicode character and you’ll get what I mean, right?
