Voice Recognition

Scratchtheguy1

Support, I would like the block to look like this:

() spoken?:: #076585 boolean

It has a built-in bad word filter.

It will also detect special programs that bypass filters.
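A built-in filter like this could run on the recognized text before any project sees it. The TypeScript below is a minimal sketch under stated assumptions, not how Scratch actually filters text: the word list is a placeholder and `maskBadWords` is a hypothetical name. It only masks exact whole-word matches, ignoring case.

```typescript
// Hypothetical placeholder blocklist; a real filter would use a maintained list.
const BLOCKED_WORDS: ReadonlySet<string> = new Set(["badword", "rudeword"]);

// Replaces each blocked word in a transcript with asterisks of the same length.
// Words are runs of ASCII letters, digits, and apostrophes; matching ignores case,
// so "BadWord" is masked but "badwords" (a different whole word) is not.
function maskBadWords(
  transcript: string,
  blocked: ReadonlySet<string> = BLOCKED_WORDS,
): string {
  return transcript.replace(/[A-Za-z0-9']+/g, (word) =>
    blocked.has(word.toLowerCase()) ? "*".repeat(word.length) : word,
  );
}

console.log(maskBadWords("That was a BadWord, honestly"));
// prints "That was a *******, honestly"
```

Exact matching like this would not catch a word that merely sounds similar, which is part of why a filter on speech is harder than it looks.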

medians

Bringing this topic up.

jmdzti_0-0

Scratchtheguy1 wrote:
Support, I would like the block to look like this:
() spoken?:: #076585 boolean
It has a built-in bad word filter.

It will also detect special programs that bypass filters.

I don’t think we need a bad word filter for something the user themself says.

medians

jmdzti_0-0 wrote:
Scratchtheguy1 wrote:
Support, I would like the block to look like this:
() spoken?:: #076585 boolean
It has a built-in bad word filter.

It will also detect special programs that bypass filters.
I don’t think we need a bad word filter for something the user themself says.

If the concern is that bad words could also be saved to the cloud, we could prevent what the user says from being saved to the cloud. Scratch can already do this, since they already do it for the Video Sensing extension and the planned Face Sensing extension.
Also, other extensions can already be used to block inappropriate words.

BigNate469

Scratchtheguy1 wrote:
Support, I would like the block to look like this:
() spoken?:: #076585 boolean
It has a built-in bad word filter.

First of all, if it's constantly listening, that's a privacy issue. You could have a project that cycles through a list of commonly used words, checks each one with this block, builds a transcript out of the words that were spoken, and sends that transcript to who knows where using cloud variables and an external server. Though, as @medians said, this can be prevented by disabling cloud variables in projects that use this block.

Second, it's incredibly inefficient. Because it would have to analyze the input audio constantly, it would either lag the webpage (if the processing is done locally) or likely cost the ST a decent sum of money (if it's done on someone else's servers, such as AWS).

Scratchtheguy1 wrote:
It will also detect special programs that bypass filters.

How?

I think a better solution would be a pair of blocks:

[start v] recording from microphone :: #076585
[stop v] recording from microphone :: #076585

(transcript of microphone recording :: #076585) //would output any text detected in the last few seconds of audio

The audio would only be processed on demand, and only the last few seconds of audio would ever be saved or processed. The speech in the recording can't be processed until the recording is stopped (until then, the transcript reporter returns an empty string). All of these blocks could also be made mandatory yield points, to prevent the same issues the previous block had (you would have to wait at least a frame between audio reads).
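The on-demand design above could look roughly like the TypeScript sketch below. It assumes audio arrives as chunks of `Float32Array` samples (the sample format the Web Audio API works with) and treats transcription as a hypothetical injected function, since no specific speech engine was proposed. The class and method names are made up to mirror the three blocks.

```typescript
// Hypothetical speech-to-text function; injected so the sketch stays self-contained.
type Transcriber = (samples: Float32Array) => string;

// Keeps only the most recent `seconds` of audio in a fixed-size ring buffer,
// so older audio is overwritten instead of stored.
class MicrophoneRecording {
  private readonly buffer: Float32Array;
  private writeIndex = 0; // next slot to overwrite
  private filled = 0; // how many slots hold real samples
  private recording = false;

  constructor(sampleRate: number, seconds: number, private readonly transcribe: Transcriber) {
    this.buffer = new Float32Array(Math.max(1, Math.floor(sampleRate * seconds)));
  }

  // Mirrors "start recording from microphone": clears old audio and begins capturing.
  start(): void {
    this.writeIndex = 0;
    this.filled = 0;
    this.recording = true;
  }

  // Mirrors "stop recording from microphone".
  stop(): void {
    this.recording = false;
  }

  // Called with each chunk of microphone samples; ignored when not recording.
  push(chunk: Float32Array): void {
    if (!this.recording) return;
    for (let i = 0; i < chunk.length; i++) {
      this.buffer[this.writeIndex] = chunk[i];
      this.writeIndex = (this.writeIndex + 1) % this.buffer.length;
      this.filled = Math.min(this.filled + 1, this.buffer.length);
    }
  }

  // Copies the retained samples out in chronological order (oldest first).
  private snapshot(): Float32Array {
    const size = this.buffer.length;
    const out = new Float32Array(this.filled);
    const oldest = (this.writeIndex - this.filled + size) % size;
    for (let i = 0; i < this.filled; i++) {
      out[i] = this.buffer[(oldest + i) % size];
    }
    return out;
  }

  // Mirrors "transcript of microphone recording": empty while still recording,
  // otherwise transcribes only the retained last few seconds, on demand.
  transcript(): string {
    if (this.recording || this.filled === 0) return "";
    return this.transcribe(this.snapshot());
  }
}
```

Because the buffer has a fixed size, older audio is overwritten rather than kept, and nothing is transcribed until the reporter is actually used after the recording stops.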

Additionally, a disclaimer similar to the ones shown on projects that use the username block or the face sensing blocks would appear before the green flag is clicked, and the browser's dialog asking whether to allow Scratch to use your microphone would appear (the JavaScript APIs that would make this work require it).

Last edited by BigNate469 (Aug. 8, 2025 15:36:17)

MousePotato1234

jmdzti_0-0 wrote:
Scratchtheguy1 wrote:
Support, I would like the block to look like this:
() spoken?:: #076585 boolean
It has a built-in bad word filter.

It will also detect special programs that bypass filters.
I don’t think we need a bad word filter for something the user themself says.

Well, what if a user says something that sounds a little like an expletive but isn't? I support a bad word filter.

jmdzti_0-0

BigNate469 wrote:
First of all, if it's constantly listening, that's a privacy issue. You could have a project that cycles through a list of commonly used words, checks each one with this block, builds a transcript out of the words that were spoken, and sends that transcript to who knows where using cloud variables and an external server. Though, as @medians said, this can be prevented by disabling cloud variables in projects that use this block.

No, it requires permission from the user, and everything happens in the browser before it’s sent. Have you read, like, literally the third post at all?

Plus, transcribed speech is not PII (personally identifiable information) at all. At most, it could be used to make chat rooms, which are already not allowed.

Last edited by jmdzti_0-0 (Aug. 8, 2025 20:49:25)

cubetube7

I don't understand how a speech-to-text extension would invade privacy in any way.

Firstly, the user literally has to press a big square “Allow” button to give Scratch any access to their microphone.
Also:
1. The only thing the code can access is TEXT. It CANNOT access ANY sound.
2. Scratch doesn't even receive the audio data. Processing can be set up to happen locally (and the ST will definitely find a way to process it locally to minimize server load).
3. Even if Scratch received the audio, they couldn't sell it in any way; otherwise they'd break the legal rules that come with their non-profit status, and they'd also be breaking child safety laws at the same time, so they'd be in serious legal trouble.
Not to mention that the Video Sensing extension can literally RECORD your face.

So this idea is not privacy-invasive.

I like the idea of being able to make a voice-controlled game; that sounds fun.
If this were paired with the Text to Speech extension in a game, it would be super OP.
IMO, the blocks for this extension should be:

when [apple] heard::sensing hat
start voice detection::sensing
end voice detection::sensing
detected speech::sensing reporter // would act like the sensing "answer" reporter/variable

when voice detected::sensing hat
detect [any v] voice::sensing // four different options: any/male/female/child
set detection to [English v]::sensing
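One way a `when [apple] heard` hat could decide when to fire is sketched below in TypeScript. This is a guess, not a spec: `heardPhrase` and `WhenHeardHat` are hypothetical names, matching is limited to ASCII letters and digits (so it assumes English, as in `set detection to [English v]`), and the edge-triggering assumes the hat would behave like Scratch's `when loudness > ()` hat, which only fires when its condition changes from false to true.

```typescript
// Splits text into lowercase words, dropping punctuation, so "Apple!" and "apple" match.
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// True when `phrase` appears in `transcript` as a run of whole words,
// so "apple" matches "an apple, please" but not "pineapple".
function heardPhrase(transcript: string, phrase: string): boolean {
  const said = words(transcript);
  const target = words(phrase);
  if (target.length === 0) return false;
  for (let i = 0; i + target.length <= said.length; i++) {
    let match = true;
    for (let j = 0; j < target.length; j++) {
      if (said[i + j] !== target[j]) {
        match = false;
        break;
      }
    }
    if (match) return true;
  }
  return false;
}

// Edge-triggered hat: fires once when the phrase first appears in the
// detected speech, not on every frame while it stays there.
class WhenHeardHat {
  private wasHeard = false;
  constructor(private readonly phrase: string) {}

  // Call once per frame with the latest "detected speech"; true means start the script.
  shouldFire(detectedSpeech: string): boolean {
    const heardNow = heardPhrase(detectedSpeech, this.phrase);
    const fire = heardNow && !this.wasHeard;
    this.wasHeard = heardNow;
    return fire;
  }
}
```

Matching whole words avoids scripts starting on accidental substrings, and edge-triggering keeps one spoken "apple" from restarting the script every frame.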

Last edited by cubetube7 (Nov. 1, 2025 19:40:50)
