CS325 Triples Exercises

These exercises develop some useful utility functions for working with the semantic triples representation of knowledge. Real world systems can contain millions, even trillions, of triples, and often work with RDF for maximum portability. These exercises use the much simpler triples code developed in class, as a basis for exploring what semantic queries can be like, and ways to optimize AI and Lisp code in general.

Files and packages:

Package: cs325-user

Files:

Base code: graph-search.lisp
Test cases: graph-search-tests.lisp
Test data: movie-triples.txt

Exercises

:NOT queries

Function names: query-search

Test cases: (run-tests graph-search-not)

Semantic web bundle

Change query-search so that (:not query1 query2 ...) patterns work. Such queries are needed to rule out results in a graph search, such as this test case for graph-search-not:

(graph-search '((?movie movie ?year) (:not (?movie actor tom_cruise))))

This should return all movies that do NOT have Tom Cruise as an actor, by first getting all movies, then eliminating those that have Tom Cruise as an actor.

The :not can be a graph of multiple queries, that might include new variables. The :not applies if something matching all the queries can be found. For example,

(graph-search '((?movie actress scarlett_johansson) 
                    (:not (?movie director ?director)
                          (?movie2 actor ?director))))

This asks for movies with Scarlett Johansson, but not those where the director was an actor in some movie.

This is equivalent to SparQL's FILTER NOT EXISTS form. It is not the same as SparQL's MINUS form.

:FILTER patterns

Function names: query-search

Tests: (run-tests filter-blists graph-search-filter)

Semantic web bundle

Change query-search so that (:filter filter1 filter2 ...) patterns work. A filter is a Lisp-like expression of the form

(function arg1 arg2 ...)

Where function is the name of a Lisp function, built-in or user-defined, and each arg is either a pattern variable, like ?name or a constant value, like 1999 or tom_waits.

A (:filter filter1 filter2 ...) pattern is applied to every binding list gathered so far by graph-search. Each binding list is kept only if all the filter forms are true. A filter is true if its function, when applied to the values of the arguments, returns true. The value of each argument is:

the argument itself, if it is not a variable, or is a variable not in the binding list
the value the argument variable has in the binding list

Here are two examples from the test cases for graph-search.

(graph-search '((?movie actor tom_cruise) (?movie movie ?year)
                    (:filter (> ?year 1990))))

This collects movies with Tom Cruise and their years, then filters out movies not greater than 1990.

(graph-search '((?movie1 movie 1999) (?movie2 movie 1999) 
                    (:filter (not-equal ?movie1 ?movie2))
                    (?movie1 actress ?actress) 
                    (?movie2 actress ?actress)))

This finds any actress who was in two different movies in 1999. (Note that two answers are returned that are actually the same. Could that be eliminated with a different filter?)

Because filters are somewhat complicated, there is a set of test cases just for filtering. Your code must pass both test sets:

Do not use EVAL to implement filters. Page 161 of Graham explains why eval is to be avoided in general.

External Interaction

The exercises below are relevant for connecting an external client, like a browser, to a semantic web service in Lisp.

atomize

Function names: atomize

Tests: (run-tests atomize)

Semantic web bundle

Externally, semantic web triples contain numbers, strings, or qualified names. Qualified names look like Lisp symbols, e.g., foo:bar.

For our movie application, there are just symbols and numbers. Strings like "American Beauty" need to be converted to AMERICAN_BEAUTY

Define a function (atomize string) that takes a string containing zero or more "items", where an item can be anything you see in the SELECT queries below. The function should return a list of atoms, either numbers or symbols. Items that look like symbols in the input should be unchanged. String values like "Tom Cruise" should become symbols like TOM_CRUISE.

Punctuation should map to symbols with Lisp character names. { and } are both Lisp-readable symbol names. Parentheses and periods are not, but you can create a symbol whose name contains those characters using vertical bars, i.e., |.|, |(|, and |)|. If you want to make a symbol for the Sparql OR operator, you would write ||||. (See Graham, page 134.)

The input string might include new line or tab characters. They should be treated like spaces.

Chapters 7 and 8 on reading and symbols cover most of what you need. Check the relevant glossary sections for more, including the glossary sections on strings and sequences. Strings are sequences.

Be sure to read the comments in the test cases, to see what's being tested by different test cases.

SELECT parser

Function names: select-queries

Test cases: (run-tests select-queries)

Semantic web bundle

An external client for a semantic web server will send SparQL SELECT forms, such as the ones sent to DBpedia . SparQL statements are sent as simple strings, to be parsed by the server into an executable query.

Define select-queries to take a string with simplified SparQL and returns a list of queries that could be passed to graph-search. The simplifications to SparQL are

No PREFIX expressions
FILTERs have simple infix operators with string or numeric data, but no types, e.g., marking something as a date

Here are the queries above, expressed in simplified SparQL:

SELECT ?movie, ?year 
    WHERE {
      ?movie movie ?year
      FILTER NOT EXISTS {
        ?movie actor "Tom Cruise" 
      }
    }

This uses the FILTER NOT EXISTS SparQL feature.

SELECT ?movie, ?year 
    WHERE {
      ?movie movie ?year .
      ?movie actor "Tom Cruise"
      FILTER ()
        ?year > 1990
      )
    }

As far as I can tell from the grammar, a filter can have only one form, so if a query had several, e.g., a movie without Tom Cruise, after 1990, it might be like this:

SELECT ?movie, ?year 
    WHERE {
      ?movie movie ?year .
      ?movie actor "Tom Cruise"
      FILTER NOT EXIST { 
        ?movie actor "Tom Cruise" 
      }
      FILTER (
        ?year > 1990
      )
    }

None of the test cases explore multiple filters, and serious syntactic parsing is not the point of this exercise.

Note that W3 triples syntax used an isolated period to separate multiple triples.

The input string might include new line characters.

A number of operators are recognized in filter patterns.

SELECT ?movie1, ?movie2, ?actress 
    WHERE {
      ?movie1 movie 1999 .
      ?movie2 movie 1999 .
      ?movie1 actress ?actress .
      ?movie2 actress ?actress
      FILTER (
        ?movie1 != ?movie2
      )
    }

For this exercise, you can just handle simple comparisions, i.e., less than, greater than, and not equal. If you want to handle more complex expressions, do the infix parser exercise. That's a good challenge, but more than the test cases require. They only use simple comparisons.

Graph search server

Function names: N/A

Test cases: N/A

This applications uses code from the above exercises.

Function names: graph-search-handler, graph-search-service

There are no unit tests, but the code should return the correct JSON encoding (see below) for all of the graph search SELECT queries given above. Your code should allow strings with new line characters.

Implement a JSON web service to return graph-search results in an association list compatible with the W3C JSON format for SparQL results. This format is pretty simple:

{
      "head": { "vars": [ "book" , "title" ]
      } ,
      "results": { 
        "bindings": [
          {
            "book": { "type": "uri" , "value": "http://example.org/book/book6" } ,
            "title": { "type": "literal" , "value": "Harry Potter and the Half-Blood Prince" }
          } ,
          {
            "book": { "type": "uri" , "value": "http://example.org/book/book7" } ,
            "title": { "type": "literal" , "value": "Harry Potter and the Deathly Hallows" }
          }
      }
    }

In the JSON object returned:

head.vars has an array of the names of the requested variables, without no question marks.
results.bindings has an array of binding lists. Each binding is an object, where each variable is a key, and the value is an object with type and value data. For our purposes, the type will always be "literal".

To keep the front-end simple (see last task), if there is bad input, return a JSON object with

Only the variables listed in the SELECT clause should be including in the binding lists encoded in JSON, to reduce data traffic. Parsing out the list of variables is much easier than creating the graph search queries.

head.vars with the array ["error"]
results.bindings with an array of one binding list, that binds "error" to an error message.

Create this as a SimpleServer app.

Packaging Notes

Up to this point, we've put everything into the cs325-user package for convenience. That's fine for prototyping, but once you start making an application, it's time to be more systematic and modular.

At least three packages are needed here:

graph-search for the graph search code
graph-search-server for the SimpleServer app code
graph-search-tests for the graph search unit tests

All of the code defined above should go into the graph-search package. This package should only export those symbols needed to make graph-search-server code work.

The graph-search-server package should use graph-search, simple-server, and your JSON library, e.g., cl-json.

The unit tests should go in the graph-search-tests package. Some of the unit tests, for exercise purposes, test functions that don't need to be exported. To get access to those in the test package, do this:

(in-package :graph-search-tests)
    (import '(':graph-search::atomize graph-search::filter-blists ...))

Graph Search Server Notes

To simplify debugging, define two top-level functions in graph-search-server:

graph-search-handler should handle incoming HTTP requests. It should extract the required incoming data, call graph-search-service, and return whatever is returned as JSON.
graph-search-service should parse the string parameters, call graph-search, and restructure the results into a list easy to convert to JSON.

With this approach, you can test and trace most of your code without making actual HTTP requests.

To create lists suitable for JSON encoding, see this tip.

Graph search web page

Implement a simple form-based user interface for submitting SparQL queries to the graph search service, and displaying the results. A user should be able to enter a SparQL SELECT form, like the examples above, into a multiline textbox, click a button, and see the results, if any, in an HTML table below. Use the SnorQL browser as your model, but don't worry about PREFIX forms.

Keep this short and simple, and all in one HTML file, including any CSS and JavaScript you need. The emphasis should be on the JavaScript for fetching and displaying results.

Infix Parser

Function names: infix->prefix

Test cases: (run-tests infix->prefix)

Semantic web bundle

Real filter expressions can have arithmetic, comparisons, conjunctions, disjunctions, and parentheses, just like logical expressions in JavaScript, e.g., "?x - ?y < 100 || ?y > 12". The best way to handle those is to implement an operator-precedence parser. Every infix operator is given a precedence level. This is so that an expression such as "a + b * c" is interpreted as (+ a (* b c)) not (* (+ a b) c)). Here's a common set of predecences, from lowest to highest:

Operators	Notes
)	right parenthesis
\|\|	or
&&	and
= != < <= > >=	relational operators
+ -	addition, subtraction
* / %	multiplication, division
(	left parenthesis

The parsing algorithm reads a list of tokens, and manages two stacks: operators and operands. An operand is anything that is not an operator, e.g., a number or a variable. Here's the algorithm:

Initialize the operands stack to empty.
Initialize the operators stack to a left parenthesis.
When an operand is seen, push it onto the operands stack.
When an operator is seen:
- While the top of the operator stack is not a left parenthesis and has higher precedence than the new operator,
  - pop the top two operands, o1 and o2
  - pop the top operator, top
  - push (top op2 op1) onto the operand stack
- When the loop finishes, if the new operator was a right parenthesis, pop the top operator; it should be a left parenthesis
- Otherwise push the new operator onto the operator stack
When the input is finished, do the above loop with a final right parenthesis.
Return the top operand.

If we run this algorithm on ?x - ?y < 100 || ?y > 12,

We push a left parenthesis onto the operator stack.
?x is not an operator, so we push it onto the operands.
- is an operator, there is only ( on the operator stack, so we push - onto the operators.
We push ?y onto the operands.
< is an operator. The top operator, -, has higher precedence so
- we pop ?y and ?x from the operands,
- we pop - from the operators, and
- we push (- ?x ?y) onto the operands.
Because the top operator is now (, we push < onto the operators stack.
We push 100 onto the operands.
|| is an operator. The top operator, <, has higher precedence, so
- we pop 100 and (- ?x ?y) from the operands,
- we pop < from the operators, and
- we push (< (- ?x ?y) 100) onto the operands.
Because the top operator is now (, we push || onto the operators stack.
We push ?y onto the operands.
> is an operator. Because || has lower precedence, we push > onto the operators.
We push 12 onto the operands.
Because the input is finished, we run the above loop with a final right parenthesis:
- We pop 12 and ?y from the operands.
- We pop > from the operators.
- We push (> ?y 12) onto the operands.
- We pop (> ?y 12) and (< (- ?x ?y) 100) from the operands.
- We pop || from the operators.
- We push (|| (< (- ?x ?y) 100) (> ?y 12)) onto the operands.
- The top operator is ) so we pop it.
- No operators remain.
We return the top operand, (|| (< (- ?x ?y) 100) (> ?y 12)).

Note that the above does not handle unary operators, such as ! and unary minus, or function call syntax. Variations of the algorithm to do those can be found online.

Be sure your atomize function can handle parentheses without spaces, e.g., "(a + b)" should atomize to (|(| a + b |)|).