These exercises develop some useful utility functions for working with the semantic triples representation of knowledge. Real world systems can contain millions, even trillions, of triples, and often work with RDF for maximum portability. These exercises use the much simpler triples code developed in class, as a basis for exploring what semantic queries can be like, and ways to optimize AI and Lisp code in general.
Files and packages:
Package: cs325-user
Files:
- Base code: graph-search.lisp
- Test cases: graph-search-tests.lisp
- Test data: movie-triples.txt
Exercises
:NOT queries
Function names: query-search
Test cases: (run-tests graph-search-not)
Change query-search
so that (:not query1 query2 ...)
patterns work. Such queries are needed to rule out results in a graph
search, such as this test case for graph-search-not
:
(graph-search '((?movie movie ?year) (:not (?movie actor tom_cruise))))
This should return all movies that do NOT have Tom Cruise as an actor, by first getting all movies, then eliminating those that have Tom Cruise as an actor.
The :not
can be a graph of multiple queries, that might include
new variables. The :not
applies if something matching all the queries can be found. For example,
(graph-search '((?movie actress scarlett_johansson) (:not (?movie director ?director) (?movie2 actor ?director))))
This asks for movies with Scarlett Johansson, but not those where the director was an actor in some movie.
This is equivalent to SparQL's FILTER NOT EXISTS form. It is not the same as SparQL's MINUS form.
:FILTER patterns
Function names: query-search
Tests: (run-tests filter-blists graph-search-filter)
Change query-search
so that (:filter filter1 filter2 ...)
patterns work. A filter is a Lisp-like expression of the form
(function arg1 arg2 ...)
Where function is the name of a Lisp function, built-in or user-defined,
and each arg is either a pattern variable, like ?name
or a constant value, like 1999
or tom_waits
.
A (:filter filter1 filter2 ...)
pattern is applied to every binding list gathered so far by graph-search
.
Each binding list is kept only if all the filter forms are true. A filter is
true if its function, when applied to the values of the arguments, returns true.
The value of each argument is:
- the argument itself, if it is not a variable, or is a variable not in the binding list
- the value the argument variable has in the binding list
Here are two examples from the test cases for graph-search
.
(graph-search '((?movie actor tom_cruise) (?movie movie ?year) (:filter (> ?year 1990))))
This collects movies with Tom Cruise and their years, then filters out movies not greater than 1990.
(graph-search '((?movie1 movie 1999) (?movie2 movie 1999) (:filter (not-equal ?movie1 ?movie2)) (?movie1 actress ?actress) (?movie2 actress ?actress)))
This finds any actress who was in two different movies in 1999. (Note that two answers are returned that are actually the same. Could that be eliminated with a different filter?)
Because filters are somewhat complicated, there is a set of test cases just for filtering. Your code must pass both test sets:
Do not use EVAL to implement filters. Page 161 of Graham explains why
eval
is to be avoided in general.
External Interaction
The exercises below are relevant for connecting an external client, like a browser, to a semantic web service in Lisp.
atomize
Function names: atomize
Tests: (run-tests atomize)
Externally, semantic web triples contain numbers, strings, or qualified names. Qualified names
look like Lisp symbols, e.g., foo:bar
.
For our movie application, there are just symbols and numbers. Strings like "American Beauty"
need to be converted to AMERICAN_BEAUTY
Define a function (atomize string)
that takes a string containing
zero or more "items", where an item can be anything you see in the SELECT queries below.
The function should return a list of atoms, either numbers or symbols. Items that look like symbols
in the input should be unchanged. String values like "Tom Cruise" should become symbols like
TOM_CRUISE.
Punctuation should map to symbols with Lisp character names.
{
and }
are both Lisp-readable symbol names. Parentheses and periods
are not, but you can create a symbol whose name contains those characters using vertical bars, i.e.,
|.|
, |(|
, and |)|
. If you want to make
a symbol for the Sparql OR operator, you would write ||||
.
(See Graham, page 134.)
The input string might include new line or tab characters. They should be treated like spaces.
Chapters 7 and 8 on reading and symbols cover most of what you need. Check the relevant glossary sections for more, including the glossary sections on strings and sequences. Strings are sequences.
Be sure to read the comments in the test cases, to see what's being tested by different test cases.
SELECT parser
Function names: select-queries
Test cases: (run-tests select-queries)
An external client for a semantic web server will send SparQL SELECT forms, such as the ones sent to DBpedia . SparQL statements are sent as simple strings, to be parsed by the server into an executable query.
Define select-queries
to take a string with simplified SparQL
and returns a list of queries that could be passed to
graph-search
. The simplifications to SparQL are
- No PREFIX expressions
- FILTERs have simple infix operators with string or numeric data, but no types, e.g., marking something as a date
Here are the queries above, expressed in simplified SparQL:
SELECT ?movie, ?year WHERE { ?movie movie ?year FILTER NOT EXISTS { ?movie actor "Tom Cruise" } }
This uses the FILTER NOT EXISTS SparQL feature.
SELECT ?movie, ?year WHERE { ?movie movie ?year . ?movie actor "Tom Cruise" FILTER () ?year > 1990 ) }
As far as I can tell from the grammar, a filter can have only one form, so if a query had several, e.g., a movie without Tom Cruise, after 1990, it might be like this:
SELECT ?movie, ?year WHERE { ?movie movie ?year . ?movie actor "Tom Cruise" FILTER NOT EXIST { ?movie actor "Tom Cruise" } FILTER ( ?year > 1990 ) }
None of the test cases explore multiple filters, and serious syntactic parsing is not the point of this exercise.
Note that W3 triples syntax used an isolated period to separate multiple triples.
The input string might include new line characters.
A number of operators are recognized in filter patterns.
SELECT ?movie1, ?movie2, ?actress WHERE { ?movie1 movie 1999 . ?movie2 movie 1999 . ?movie1 actress ?actress . ?movie2 actress ?actress FILTER ( ?movie1 != ?movie2 ) }
For this exercise, you can just handle simple comparisions, i.e., less than, greater than, and not equal. If you want to handle more complex expressions, do the infix parser exercise. That's a good challenge, but more than the test cases require. They only use simple comparisons.
Graph search server
Function names: N/A
Test cases: N/A
This applications uses code from the above exercises.
Function names: graph-search-handler, graph-search-service
There are no unit tests, but the code should return the correct JSON encoding (see below) for all of the graph search SELECT queries given above. Your code should allow strings with new line characters.
Implement a JSON web service to return graph-search
results
in an association list compatible with
the W3C JSON format
for SparQL results. This format is pretty simple:
{ "head": { "vars": [ "book" , "title" ] } , "results": { "bindings": [ { "book": { "type": "uri" , "value": "http://example.org/book/book6" } , "title": { "type": "literal" , "value": "Harry Potter and the Half-Blood Prince" } } , { "book": { "type": "uri" , "value": "http://example.org/book/book7" } , "title": { "type": "literal" , "value": "Harry Potter and the Deathly Hallows" } } } }
In the JSON object returned:
- head.vars has an array of the names of the requested variables, without no question marks.
- results.bindings has an array of binding lists. Each binding is an object, where each variable is a key, and the value is an object with type and value data. For our purposes, the type will always be "literal".
To keep the front-end simple (see last task), if there is bad input, return a JSON object with
Only the variables listed in the SELECT clause should be including in the binding lists encoded in JSON, to reduce data traffic. Parsing out the list of variables is much easier than creating the graph search queries.
- head.vars with the array
["error"]
- results.bindings with an array of one binding list,
that binds
"error"
to an error message.
Create this as a SimpleServer app.
Packaging Notes
Up to this point, we've put everything into the cs325-user
package for convenience. That's fine for prototyping, but once you start
making an application, it's time to be more systematic and modular.
At least three packages are needed here:
graph-search
for the graph search codegraph-search-server
for the SimpleServer app codegraph-search-tests
for the graph search unit tests
All of the code defined above should go into the graph-search
package. This package should only export
those symbols needed to make graph-search-server
code work.
The graph-search-server
package should use graph-search
,
simple-server
, and your JSON library, e.g., cl-json
.
The unit tests should go in the graph-search-tests
package.
Some of the unit tests, for exercise purposes, test functions that don't need
to be exported. To get access to those in the test package, do this:
(in-package :graph-search-tests) (import '(':graph-search::atomize graph-search::filter-blists ...))
Graph Search Server Notes
To simplify debugging, define two top-level functions in graph-search-server
:
graph-search-handler
should handle incoming HTTP requests. It should extract the required incoming data, callgraph-search-service
, and return whatever is returned as JSON.graph-search-service
should parse the string parameters, callgraph-search
, and restructure the results into a list easy to convert to JSON.
With this approach, you can test and trace most of your code without making actual HTTP requests.
To create lists suitable for JSON encoding, see this tip.
Graph search web page
Implement a simple form-based user interface for submitting SparQL queries to the graph search service, and displaying the results. A user should be able to enter a SparQL SELECT form, like the examples above, into a multiline textbox, click a button, and see the results, if any, in an HTML table below. Use the SnorQL browser as your model, but don't worry about PREFIX forms.
Keep this short and simple, and all in one HTML file, including any CSS and JavaScript you need. The emphasis should be on the JavaScript for fetching and displaying results.
Infix Parser
Function names: infix->prefix
Test cases: (run-tests infix->prefix)
Real filter expressions can have arithmetic, comparisons, conjunctions, disjunctions,
and parentheses, just like logical expressions in JavaScript, e.g., "?x - ?y < 100 || ?y > 12".
The best way to handle those is to implement
an operator-precedence parser.
Every infix operator is given a precedence level. This is so that an expression
such as "a + b * c" is interpreted as (+ a (* b c))
not
(* (+ a b) c))
. Here's a common set of predecences, from lowest
to highest:
Operators | Notes |
---|---|
) | right parenthesis |
|| | or |
&& | and |
= != < <= > >= | relational operators |
+ - | addition, subtraction |
* / % | multiplication, division |
( | left parenthesis |
The parsing algorithm reads a list of tokens, and manages two stacks: operators and operands. An operand is anything that is not an operator, e.g., a number or a variable. Here's the algorithm:
- Initialize the operands stack to empty.
- Initialize the operators stack to a left parenthesis.
- When an operand is seen, push it onto the operands stack.
- When an operator is seen:
- While the top of the operator stack is not a left parenthesis and
has higher precedence than the new operator,
- pop the top two operands, o1 and o2
- pop the top operator, top
- push
(top op2 op1)
onto the operand stack
- When the loop finishes, if the new operator was a right parenthesis, pop the top operator; it should be a left parenthesis
- Otherwise push the new operator onto the operator stack
- While the top of the operator stack is not a left parenthesis and
has higher precedence than the new operator,
- When the input is finished, do the above loop with a final right parenthesis.
- Return the top operand.
If we run this algorithm on ?x - ?y < 100 || ?y > 12
,
- We push a left parenthesis onto the operator stack.
?x
is not an operator, so we push it onto the operands.-
is an operator, there is only(
on the operator stack, so we push-
onto the operators.- We push
?y
onto the operands. <
is an operator. The top operator,-
, has higher precedence so- we pop
?y
and?x
from the operands, - we pop
-
from the operators, and - we push
(- ?x ?y)
onto the operands.
- we pop
- Because the top operator is now
(
, we push<
onto the operators stack. - We push
100
onto the operands. ||
is an operator. The top operator,<
, has higher precedence, so- we pop
100
and(- ?x ?y)
from the operands, - we pop
<
from the operators, and - we push
(< (- ?x ?y) 100)
onto the operands.
- we pop
- Because the top operator is now
(
, we push||
onto the operators stack. - We push
?y
onto the operands. >
is an operator. Because||
has lower precedence, we push>
onto the operators.- We push
12
onto the operands. - Because the input is finished, we run the above loop with a final right parenthesis:
- We pop
12
and?y
from the operands. - We pop
>
from the operators. - We push
(> ?y 12)
onto the operands. - We pop
(> ?y 12)
and(< (- ?x ?y) 100)
from the operands. - We pop
||
from the operators. - We push
(|| (< (- ?x ?y) 100) (> ?y 12))
onto the operands. - The top operator is
)
so we pop it. - No operators remain.
- We pop
- We return the top operand,
(|| (< (- ?x ?y) 100) (> ?y 12))
.
Note that the above does not handle unary operators, such as !
and unary minus, or
function call syntax. Variations of the algorithm to do those can be found online.
Be sure your atomize function can handle parentheses without spaces, e.g.,
"(a + b)" should atomize to (|(| a + b |)|)
.