Discuss Scratch

gtoal
Scratcher
1000+ posts

Natural language parser under development

I finally wrote the grammar-driven parser I've been threatening to write for some time… https://scratch.mit.edu/projects/91892113

The language it accepts is fairly simple and I don't yet return a parse-tree or any attributes you can use - it merely says whether or not the input is grammatical - but I will add more features over time.

This can be used for chatbots, for vetting conversations in a whitelisted chat, for dungeon/adventure games, and one I particularly want to try - for translating from one language to another. A multi-lingual interactive chat which whitelisted vocabulary and grammar could be pretty impressive!

Try it out and give me your feedback, especially if you feel up to defining some grammar rules of your own and extending the syntax checker's vocabulary.

Rules look like

RULENAME alt1 | alt2 | alt3 | …

and alt (alternative) is a sequence of terms, eg THIS IS A TEST, or THIS IS <DETERMINER> <NOUN> - each space-separated item is either a literal or a phrase name in angle brackets.

The first rule is the root of the grammar.

Graham
hiccup01
Scratcher
100+ posts

Natural language parser under development

Just had a play with it, maybe it could be integrated with @griffpatch's word processor for grammar checking.

I'm a veteran Scratcher who likes to challenge themself with Scratch's limited toolset.
website | github
Dylan5797
Scratcher
1000+ posts

Natural language parser under development

Needs adjectives. Pretty cool though!

gtoal
Scratcher
1000+ posts

Natural language parser under development

Parsing looks quite robust now. My question is what to do with the parse to make it useful - I could return a parse tree, which is what I'd normally do in a high level language, but I'm wondering if something simpler might be more useful for Scratch: I could have each different statement issue a broadcast when it is parsed successfully… so for example in an adventure game any one of “N”, “North”, “go north”, “exit north” might cause a “North” broadcast to be generated? This wouldn't supply as much detail as a parse tree but if it were sufficient for your application it would be a lot easier to use?

(Edit: Done - it works surprisingly well.)

I'm trying to avoid returning a parse tree because to be honest I haven't yet come up with an easy to access data structure or interface to the data which would be easy for people to use in applications. Defining the grammar is easy (for the user I mean), I'm just not sure how the user will want to access the results of the parse.

I'll document how to use the parser in detail soon.

G

Last edited by gtoal (Dec. 19, 2015 07:07:09)

gtoal
Scratcher
1000+ posts

Natural language parser under development

Natural Language Parsing

If your program needs to interact with the user conversationally, you've probably already tried several ways of extracting keywords from what they type, and possibly have found them to be unsatisfactory in various ways.

One method of getting typed input that you may not have tried is to use a natural language parser - because until now there hasn't been one on Scratch. (The few projects that say they ‘parse’ text really just separate it into words - those should really be called ‘lexers’)

Parsing a sentence properly breaks it down into its syntactical structure - eg subject, verb, object - at multiple levels, eg subject -> noun phrase -> determiner + noun…

A full parser that's typically used to implement a programming language will return this syntactical structure as a tree, with the user's input inserted at the leaves of the tree.

The parser I've written could return a structure like this, but it doesn't because it's not for implementing programming languages - it's for handling natural language and letting the users express simple commands. Instead of returning a data structure with the full parse tree, what I've done here is allow the programmer to insert scratch “broadcast” messages that are triggered by the parser recognising phrases.

So how do you use it in your code?

Well, first you write a ‘grammar’ describing the sentences you are willing to accept.

This is pretty simple. A grammar rule has a name and a definition, for example
Rule [SENTENCE] :== [<VERB> <OBJECT>]
SENTENCE is the name and <VERB> <OBJECT> is the definition…

The definition contains either literal strings or other grammar rules. <VERB> <OBJECT> was a definition that used two other grammar rules. But let's expand those:

VERB can be defined as TAKE and OBJECT might be defined as LAMP, so with this simple grammar you can implement the command “take lamp” from the classic Adventure game.

But this isn't much of a language, with one command, so we want to extend it by allowing alternatives, like this
Rule [VERB] :== [take | drop]
Rule [OBJECT] :== [lamp | wand]
The alteratives are separated by “|” characters.

Now we can say any of “take lamp”, “drop lamp”, “take wand”, “drop wand”…

But it's still a bit terse and unnatural, so let's improve the grammar to make it proper English…
Rule [SENTENCE] :== [<VERB> <OBJECT>]
Rule [VERB] :== [TAKE | DROP]
Rule [OBJECT] :== [THE <NOUN> | <NOUN>]
Rule [NOUN] :== [LAMP | WAND]
Now we can say “Take the lamp” as well as “Take lamp”.

The final thing I'll cover in this intro is how to interact with your program. The way that I've chosen to do this is to insert dummy phrase names into the definitions, and have those phrases trigger a broadcast once the sentence has been recognised. The phrase names can be anything as long as they start with a “$” symbol.

For example,
Rule [SENTENCE] :== [<DIRECTION> | GO <DIRECTION>]
Rule [DIRECTION] :== [<NORTH> <$GO NORTH> | <SOUTH> <$GO SOUTH>]
Rule [NORTH] :== [N | NORTH]
Rule [SOUTH] :== [S | SOUTH]
With this grammar in place, whenever you type “go north” the parser will issue a Scratch broadcast with the description “$GO NORTH”, and you can use a “when I receive” block to perform the appropriate action.
when I receive [$GO NORTH v]

By separating your application into other sprites, you can import the parser sprite verbatim and not need to understand its internals or change it in any way other than entering the rules at the top.

Here's the project… https://scratch.mit.edu/projects/91892113/

Post questions on how to use it here, I'll follow this thread and reply promptly.

Last edited by gtoal (Dec. 23, 2015 00:36:24)

gtoal
Scratcher
1000+ posts

Natural language parser under development

Please try https://scratch.mit.edu/projects/92272269/ - it's the parser for a generic dungeon style adventure. There's no game or semantic processing yet, for now I'm just trying to get a good covering of the vocabulary and grammar. Try a few dungeon-like commands and make suggestions in the comments for grammatical structures that could be added to make the language more natural.

(For now there's no punctuation so don't type “.”, “,”, or “?”… - I'll get round to that relatively soon)

G
GrannyCookies
Scratcher
100+ posts

Natural language parser under development

This seems cool. Now, if you'll excuse me, I'm off to write an adventure game.

gtoal
Scratcher
1000+ posts

Natural language parser under development

Here's what I have so far. The grammar has grown incrementally to a complexity where I need to take a step back and refactor it in order to make it work correctly. Although this style of parser does lookahead and backtracking, it is still important to order the rules correctly to avoid situations where a sequence of text is valid but there is also a longer sequence of text possible which is never tried because the shorter one matched first.

For example:
Rule [STATEMENT] :== [HELLO | HELLO WORLD]
will not match “Hello world” because “hello” returned first, then ‘statement’ failed because there was left-over input text after matching “Hello”.

This however would work:
Rule [STATEMENT] :== [HELLO WORLD | HELLO]

and that's fine, but it does parse “hello” twice. For what we do here that's not a problem. For a compiler however you would probably want to write the same thing more efficiently so that the parser doesn't need to backtrack and retry:

Rule [STATEMENT] :== [HELLO <WORLD?>]
Rule [WORLD?] :== [WORLD | ]

Here I've used an empty terminal option which will return a match without removing any text from the input stream. So “Hello” will be removed from the input stream by “<statement>” and “world” - if present - will be removed by “<world?>”.

With that explained, here's my rough draft of an adventure game grammar.


define Define rules
Rule [STATEMENT] :== [<INDIRECT> <AND-SOMETHING> | HELP ME | HELP | SAVE | RESTORE | <INVENTORY> | <QUIT>]
Rule [AND-SOMETHING] :== [AND <INDIRECT> <AND-SOMETHING> |]
Rule [INDIRECT] :== [<BASIC> | <ASK>]
Rule [BASIC] :== [<MOVE> | <TAKE> | <DROP> | <THROW> | <OPEN> | <KILL> | <LOOK> | <LIGHT> | <PUT> | <READ> | <GIVE>]
Rule [OPEN] :== [OPEN <THE-OBJECT> <WITH?>]
Rule [WITH] :== [WITH <THE-OBJECT> |]
Rule [MOVE] :== [<MOVE/GO> <DIRECTION> | <MOVE/GO> TO THE <DIRECTION> | <DIRECTION> | TAKE THE <DIRECTION> EXIT | EXIT <DIRECTION> | <LEAVE> THROUGH THE <DIRECTION> <DOOR> | EXIT | LEAVE | GET OUT OF HERE | GO INTO THE <DIRECTION> ROOM | LEAVE TO THE <DIRECTION> | <CLIMB>]
Rule [MOVE/GO] :== [MOVE | WALK | CRAWL | FLY | FLEE | RUN | ESCAPE | GO]
Rule [CLIMB] :== [CLIMB OUT OF <THE-DOOR> | CLIMB OUT <THE-DOOR> | CLIMB OUT | CLIMB DOWN | CLIMB UP | CLIMB]
Rule [LEAVE] :== [LEAVE | EXIT | GO OUT | GO]
Rule [TAKE] :== [<TAKE/STEAL> <THE-OBJECT> <FROM?> | <TAKE/STEAL> <FROM?> | PICK UP <THE-OBJECT> | PICK IT UP | PICK <ALL> UP]
Rule [TAKE/STEAL] :== [TAKE | STEAL | GRAB | FETCH | FIND]
Rule [FROM?] :== [FROM <THE-PLAYER> |]
Rule [DROP] :== [DROP <THE-OBJECT> | PUT DOWN <THE-OBJECT>]
Rule [ALL] :== [ALL | EVERYTHING]
Rule [THROW] :== [THROW <THE-OBJECT> <AT>]
Rule [AT] :== [AT <THE-PLAYER-ACCUSATIVE>]
Rule [LOOK] :== [LOOK <AT/AROUND> THE ROOM | LOOK AROUND | LOOK | <EXAMINE> | WHERE AM I | WHAT CAN YOU SEE | WHAT CAN I SEE | WHAT DO YOU SEE | WHAT'S HERE]
Rule [AT/AROUND] :== [AT | AROUND ]
Rule [EXAMINE] :== [EXAMINE <THE-OBJECT> <CLOSELY> | LOOK <CLOSELY> AT <THE-OBJECT> | LOOK <AT/AROUND> THE ROOM <CLOSELY>]
Rule [CLOSELY] :== [EVEN MORE CLOSELY | MORE CLOSELY | CLOSELY |]
Rule [INVENTORY] :== [INVENTORY | INVENT | INV | LOOK AT WHAT I AM CARRYING | WHAT AM I CARRYING]
Rule [DIRECTION] :== [<NORTH> | <SOUTH> | <EAST> | <WEST> | <UP> | <DOWN> | ONWARDS | ON | FORWARDS | FORWARD | BACKWARDS | BACKWARD | BACK]
Rule [NORTH] :== [NORTH | NORTHWARDS | NORTHWARD | N]
Rule [SOUTH] :== [SOUTH | SOUTHWARDS | SOUTHWARD | S]
Rule [EAST] :== [EAST | EASTWARDS | EASTWARD | E]
Rule [WEST] :== [WEST | WESTWARDS | WESTWARD | W]
Rule [UP] :== [UP | UPWARDS | UPWARD | U| CLIMB UP | CLIMB | ABOVE | ASCEND]
Rule [DOWN] :== [DOWN | DOWNWARDS | DOWNWARD | D | DIG DOWN | DIG | BELOW | DESCEND]
Rule [THE-OBJECT] :== [<THE> <OBJECT> | <OBJECT> | IT | ALL | EVERYTHING | SOMETHING |]
Rule [OBJECT] :== [<WAND> | <CHALICE> | <LAMP> | <BOX> | <READABLE-OBJECT> | AXE | <PESTLE> | KEY | <DOOR>]
Rule [THE-DOOR] :== [<THE> <DOOR>]
Rule [DOOR] :== [DOOR | WINDOW | TRAPDOOR | HOLE IN THE WALL | CELLAR DOOR | ATTIC DOOR | SKYLIGHT | HATCH]
Rule [BOX] :== [BOX | MAILBOX | MAIL BOX | LETTER BOX | LETTER-BOX | LETTERBOX]
Rule [READ] :== [READ <THE-READABLE-OBJECT>]
Rule [THE-READABLE-OBJECT] :== [<THE> <READABLE-OBJECT> | <READABLE-OBJECT> | ALL | EVERYTHING | IT |]
Rule [READABLE-OBJECT] :== [LETTER]
Rule [WAND] :== [WAND | ROD | RUSTY ROD | MAGIC WAND]
Rule [CHALICE] :== [CHALICE FROM THE PALACE | CHALICE | CUP]
Rule [PESTLE] :== [PESTLE FROM THE VESSEL | PESTLE]
Rule [THE-LAMP] :== [<THE> <LAMP> | <LAMP>]
Rule [LAMP] :== [LAMP | TORCH | LIGHT | FLASHLIGHT]
Rule [LIGHT] :== [<LAMP> <ON-OR-OFF> | LIGHT <THE-LAMP> | LIGHT IT UP | LIGHT IT | LIGHT | TURN <THE-LAMP> <ON-OR-OFF> | TURN <ON-OR-OFF> THE <LAMP> | EXTINGUISH <THE-LAMP> | TURN IT <ON-OR-OFF>]
Rule [THE-PLAYER] :== [<THE> <PLAYER> | <PLAYER>]
Rule [PLAYER] :== [PIRATE | BEAR | <SCRATCH> | GOBO | DWARF | TROLL]
Rule [THE-PLAYER-ACCUSATIVE] :== [<THE-PLAYER> | HIM | HER | IT | ME]
Rule [SCRATCH] :== [SCRATCH THE CAT | SCRATCHY | SCRATCH | CAT]
Rule [ON-OR-OFF] :== [ON | OFF]
Rule [KILL] :== [<ATTACK> <THE-PLAYER> <WITH?>]
Rule [WITH?] :== [WITH <THE-OBJECT> | ]
Rule [ATTACK] :== [KILL | HIT | ATTACK]
Rule [QUIT] :== [QUIT | BYE | LOGOUT | LOGOFF | LOG OUT | LOG OFF | GOODBYE | GOOD BYE | STOP]
Rule [PUT] :== [PUT <THE-OBJECT> IN <THE-OBJECT>]
Rule [GIVE] :== [GIVE <THE-OBJECT> TO <THE-PLAYER-ACCUSATIVE> | GIVE <THE-PLAYER-ACCUSATIVE> <THE-OBJECT> | GIVE <THE-OBJECT>]
Rule [ASK] :== [<REQUEST> <THE-PLAYER-ACCUSATIVE> TO <BASIC>]
Rule [THE] :== [THE | A]
Rule [REQUEST] :== [ASK | TELL | BEG]

This should accept sentences like “tell the troll to steal a lamp”…

In a game, we'd probably have a mechanism to allow us to add more rules that were only activated when the player was in a certain room or a certain event had happened; a rule might be created and activated just for one response, to allow a question to be answered. The mechanism is fairly powerful but until we hack up a working proof of concept, the parser will undoubtedly lack a few key features.

In any system other than Scratch, I would have already built a regression-test script that feeds example sentences in to the parser to confirm that existing sentences continue to work after changes to the grammar, which is definitely something that a language designer needs to worry about. Actually maybe I _can_ create a regression test in Scratch… I'll give it a try and see what I can come up with…

G
alexphan
Scratcher
1000+ posts

Natural language parser under development

Good tutorial!
Maybe try adding a parameter that lets the user type any string in the definition?
gtoal
Scratcher
1000+ posts

Natural language parser under development

alexphan wrote:

Good tutorial!
Maybe try adding a parameter that lets the user type any string in the definition?

Can you give me an example of what you mean? Any alternative in the first rule (“sentence”) will be accepted if it is typed by the player.

One thing I'm looking at adding to make grammar specification easier is to allow you to define a rule multiple times, and each definition will be added to the initial one as if it had been separated by a “|” character. This will help avoid unnecessary levels of dummy rules, while also avoiding ugly source code with extremely long lines.

G
Thepuzzlegame
Scratcher
1000+ posts

Natural language parser under development

Cool! It just so happens that I've been working on building a text adventure engine in JavaScript for a bit, your parser already seems to be a leg up from mine :p

hi!
gtoal
Scratcher
1000+ posts

Natural language parser under development

Thepuzzlegame wrote:

Cool! It just so happens that I've been working on building a text adventure engine in JavaScript for a bit, your parser already seems to be a leg up from mine :p

It's actually using possibly the oldest design of parser out there… table-driven top-down recursive descent with backtracking went out of favour in the 70's because there were faster algorithms available, but this is *extremely* simple to write and more powerful than many more complex parsers, and with the speed of modern computers nowadays it seemed ripe for a revival :-)
If you want to use this in Javascript, here's the design:

Each rule is a sequence of “OR” options, and each “OR” option is a sequence of “AND” terms.

The grammar is stored in a flat array. Element 1 is the root statement.

The rule is encoded as sequential elements in the array. First is a count of the number of “OR” options, then comes the first OR option.

An OR option is a count of the number of AND terms, followed by the terms themselves. Each term takes only 1 array element. It is either an index to the start of another rule, or it is a literal. In Scratch (and Javascript I think) you can overload these cells and store either an integer or a string in the same cell; in other languages you would store the literals as indexes to a string array, maybe using negative numbers to distinguish them from pointers.

Here's a worked example:

SENTENCE :== Hello world | Goodbye world | something else

The array for that would be:

1: 3
2: 2 “Hello” “World”
5: 2 “Goodbye” “World”
8: 2 “something” “else”

And so on recursively:

SENTENCE :== <Hello-or-goodbye> world | something else

1: 2
2: 2 8 “world” ( 8 is the index of <hello-or-goodbye> )
5: 2 “something” “else”

Hello-or-goodbye :== Hello | Goodbye
8: 2
9: 1 “Hello”
11: 1 “Goodbye”

So to recursively parse a user's data, you start at entry 1. There are two options so you repeat to handle them both.

Save the state of the parse before you handle an option.

To handle an option within the repeat loop, get the next integer in the grammar, which tells you how many terms are in that phrase.

If the term is a literal, does it match the input? If yes, match that term and move the input pointer on to the next word.
If the term was another rule, parse it recursively.

Repeat for each term. If all the terms matched, return success

otherwise backtrack to the saved state and go on to try the sequence of terms in the next alternative.

That's pretty much it. The parser is actually the easiest bit of the code to write - the hard part is converting the descriptive grammar into the array described above. Also in Scratch where there are no early loop exits ('break' in C), you have to take a little care to keep the recursion stack balanced when you return from a successful alternative sequence.

Last edited by gtoal (Dec. 23, 2015 02:57:12)

alexphan
Scratcher
1000+ posts

Natural language parser under development

gtoal wrote:

Can you give me an example of what you mean? Any alternative in the first rule (“sentence”) will be accepted if it is typed by the player.

G

For example, if you were to type a name, instead of making a very long list of names in a rule:

Rule [SENTENCE] :== [MY NAME IS <NAME>]
Rule [NAME] :== [AARON | ALICE | BOB | BEATRICE | CARL | CASEY etc.]

You could allow the user to type in any string to represent any name, so as to avoid having a bunch of names onto one block.

Rule [SENTENCE] :== [MY NAME IS <#STRING>]

(maybe add a symbol next to the grammar rule so the parser doesn't mistake it for an undefined rule?)

I'm sorry if this is confusing, I would be more than happy to clarify if it is
Thepuzzlegame
Scratcher
1000+ posts

Natural language parser under development

alexphan wrote:

gtoal wrote:

Can you give me an example of what you mean? Any alternative in the first rule (“sentence”) will be accepted if it is typed by the player.

G

You could allow the user to type in any string to represent any name, so as to avoid having a bunch of names onto one block.

Rule [SENTENCE] :== [MY NAME IS <#STRING>]

Are you referring to wildcards?

hi!
alexphan
Scratcher
1000+ posts

Natural language parser under development

Thepuzzlegame wrote:

Are you referring to wildcards?

Oh, wildcards! Wow I'm dumb
gtoal
Scratcher
1000+ posts

Natural language parser under development

alexphan wrote:

gtoal wrote:

Can you give me an example of what you mean? Any alternative in the first rule (“sentence”) will be accepted if it is typed by the player.

G

For example, if you were to type a name, instead of making a very long list of names in a rule:

Rule [SENTENCE] :== [MY NAME IS <NAME>]
Rule [NAME] :== [AARON | ALICE | BOB | BEATRICE | CARL | CASEY etc.]

You could allow the user to type in any string to represent any name, so as to avoid having a bunch of names onto one block.

Rule [SENTENCE] :== [MY NAME IS <#STRING>]

(maybe add a symbol next to the grammar rule so the parser doesn't mistake it for an undefined rule?)

I'm sorry if this is confusing, I would be more than happy to clarify if it is

There will be a way to handle this… it will need a few more parser features to be in place before I can add it though. First we need a data structure to extract values from the parse tree, and we need a phrase that will match the input from a previous match regardless of what it was, and finally we need an unchecked/unparsed input type that will match anything (like a wildcard as suggested above) so that the conversation can take arbitrary words that the user enters and use them in responses. eg “open the znarfle with the key” -> “I'm not sure there are any znarfles here, and even if there were I would have no idea how to open one!”

Tangentially related to this - I want to be able to construct rules from arrays, eg an array of adjectives or adverbs or nouns, so that even impossible requests can still be parsed, which will allow for better phrasing of the negative responses. It gets boring fast if everything you try is just met with “Sorry, can't”…

G
ChocolatePi
Scratcher
1000+ posts

Natural language parser under development

you could theoretically make a programming language parser out of this, if scratch had proper support for multiline strings!
Jonathan50
Scratcher
1000+ posts

Natural language parser under development

ChocolatePi wrote:

you could theoretically make a programming language parser out of this, if scratch had proper support for multiline strings!
it does (kind of)
you can edit the JSON of a file to put a newline in a block input or make a variable with a newline in it:
if <(answer) = (newline
say [You just entered a newline.
end

say (join [Hello,] (join (newline) [World!

Last edited by Jonathan50 (Dec. 22, 2015 23:01:26)


Not yet a Knight of the Mu Calculus.
drmcw
Scratcher
1000+ posts

Natural language parser under development

Thorin enters!


Excellent! Could use this and Griffpatch's floodfill to make https://www.youtube.com/watch?v=3f3PFfK-9Gk


10 !
ScratchVaders or Galaga?
Maybe Eliza can help you decide?
Thepuzzlegame
Scratcher
1000+ posts

Natural language parser under development

drmcw wrote:

Thorin enters!


Excellent! Could use this and Griffpatch's floodfill to make https://www.youtube.com/watch?v=3f3PFfK-9Gk

Thorin sits down and starts singing about gold.

Last edited by Thepuzzlegame (Dec. 23, 2015 00:35:52)


hi!

Powered by DjangoBB