CS 325: CLOCC XML Parser

As far as I can tell, there's no online documentation for the CLOCC XML Parser, which is available as part of the CLOCC package of Lisp utilities. The following is what I've gleaned from reading the code and documentation strings, and should be enough to get you started.

Install the CLOCC XML parser

The Common Lisp Open Code Collection contains a number of useful utilities for Common Lisp. This page is only concerned with the XML parser, but feel free to explore the rest of the collection.

To install this code, you need an archive utility that knows how to handle tar-gzipped files. These are like Zip files on Windows but are more common on Unix systems. There are many free and inexpensive utilities for Windows, such as WinZip, ZipGenius, TugZip, Stuffit, etc., that know how to handle tarred gzip files. (Warning: FilZip did not know how to open this archive.)

The CLOCC home page has a link to the current snapshot. That link currently doesn't work (directory access is not available), but this direct link should get you the 2007 snapshot.
The archive contains a directory called clocc. Extract this whole directory to your Lisp code directory.
Start your Lisp.
Fix clocc/clocc.lisp to be location-independent:
- Open clocc/clocc.lisp in your Lisp editor.
- Search for (defvar *clocc-root*.
- Replace the entire expression with this one:
```
    (defvar *clocc-root*
      (namestring
       (make-pathname :host (pathname-host *load-pathname*)
                      :device (pathname-device *load-pathname*)
                      :directory (pathname-directory *load-pathname*)))
      "*The root CLOCC directory.")
    
```
  This builds a pathname for the directory containing your clocc.lisp file and stores it as a string. Doing it this way is nice because it won't break if you move the CLOCC directory to some other location.
Lispworks only: Fix 2 reader bugs:
- Open clocc/src/cllib/fileio.lisp in your editor.
- Search for #1# (hash one hash).
- Preceding those characters on the same line, find and delete #. (hash period).
- Repeat the above steps with clocc/src/cllib/url.lisp.
Lispworks only: Fix outdated function calls in proc.lisp:
- Open clocc/src/port/proc.lisp in your editor.
- Replace mp:claim-lock with mp:process-lock.
- Replace mp:release-lock with mp:process-unlock.
Load clocc.lisp.
Load (don't compile yet!) clocc/src/cllib/xml.lisp. If you get an alert dialog about defining PRINT-OBJECT, just say OK.
After the file has loaded, compile and load it.
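
To see what the *clocc-root* replacement in the steps above computes, the same make-pathname call can be tried on any file name. This is only a sketch: the function name directory-of and the sample path are made up for illustration, and the exact string you get back depends on your Lisp and operating system.

```lisp
;; DIRECTORY-OF is a hypothetical helper, not part of CLOCC.
(defun directory-of (file)
  "Return the namestring of the directory containing FILE."
  (let ((path (pathname file)))
    (namestring
     ;; Keep host, device and directory; drop the file name and type.
     (make-pathname :host (pathname-host path)
                    :device (pathname-device path)
                    :directory (pathname-directory path)))))

;; Hypothetical location of clocc.lisp; substitute your own.
(directory-of "/home/me/lisp/clocc/clocc.lisp")
```

In clocc.lisp the file name comes from *load-pathname*, which Lisp binds to the file being loaded, so the value follows the file wherever you put it.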

How to use the CLOCC XML Parser

Assuming you've completed the installation steps above, all you need to do from now on is load clocc.lisp and then the compiled version of xml.lisp.
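
If you do this often, the two loads can be wrapped in a small function. The following is a sketch under assumptions: *clocc-dir* and load-clocc-xml are names invented here, the directory is a guess at where you extracted CLOCC, and the compiled file is assumed to sit where compile-file puts it by default.

```lisp
;; Hypothetical location of the extracted clocc directory; adjust to your setup.
(defparameter *clocc-dir* "c:/cs325/code/clocc/")

;; LOAD-CLOCC-XML is a made-up convenience function, not part of CLOCC.
(defun load-clocc-xml ()
  "Load clocc.lisp, then the compiled xml.lisp if it exists, else the source."
  (load (merge-pathnames "clocc.lisp" *clocc-dir*))
  (let* ((source (merge-pathnames "src/cllib/xml.lisp" *clocc-dir*))
         ;; Where COMPILE-FILE would write its output by default.
         (compiled (compile-file-pathname source)))
    (load (if (probe-file compiled) compiled source))))
```

Call (load-clocc-xml) once per Lisp session before using the parser.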

The parser is defined in the cllib: package.

How to parse XML

Find a short XML file on your machine. XML files are used for many purposes. Java and .NET use them to hold configuration information.

Alternatively, find a short XML file on the web and copy it to your machine. There are many there too. Blogs, for example, use XML files in the RSS format to list new items. The Wikipedia entry on RSS gives an example of such an XML file.

Once you have a file with XML, you can parse it into Lisp with:

    (cllib:xml-read-from-file pathname)
    
    Example: (cllib:xml-read-from-file "c:/cs325/code/test-bugs.xml")

The function returns a list of the top-level objects read from the XML file. See below for how to inspect these objects.
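
A quick way to see what came back is to print the type of each top-level object. This is a small sketch: list-xml-file is a name made up here, and it relies only on xml-read-from-file returning a list, as described above.

```lisp
;; LIST-XML-FILE is a hypothetical helper, not part of CLOCC.
(defun list-xml-file (pathname)
  "Parse PATHNAME and print the type of each top-level object read."
  (let ((objects (cllib:xml-read-from-file pathname)))
    (format t "~D top-level object~:P~%" (length objects))
    (dolist (obj objects)
      (format t "  ~A~%" (type-of obj)))
    objects))

;; Example, using the same file as above:
;; (list-xml-file "c:/cs325/code/test-bugs.xml")
```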

To parse XML directly from an input stream:

    (cllib:with-xml-input (variable stream)
      <code that calls (read variable) to extract XML forms>
      )

This puts an "XML wrapper" around the input stream, similar to the way Java uses wrappers around input streams. Each call to read will try to read one XML object, including nested objects, much the way read on a regular Lisp input stream will read a list.
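
For example, to collect everything a stream contains, you can call read in a loop until the input runs out. This is a sketch, not something taken from the CLOCC code: xml-read-all is a made-up name, and it assumes the wrapped stream honors the standard eof-error-p and eof-value arguments to read.

```lisp
;; XML-READ-ALL is a hypothetical helper, not part of CLOCC.
(defun xml-read-all (stream)
  "Collect every XML object that READ can extract from STREAM."
  (let ((eof (list :eof)))              ; fresh object used as end-of-file marker
    (cllib:with-xml-input (in stream)
      ;; Assumption: READ returns EOF instead of signaling an error at end of input.
      (loop for obj = (read in nil eof)
            until (eq obj eof)
            collect obj))))

;; Example with a file stream (path is the same sample file used earlier):
;; (with-open-file (s "c:/cs325/code/test-bugs.xml")
;;   (xml-read-all s))
```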

When testing XML forms, it's often handy to parse XML stored in a string. Here's a function that will read one XML object from a string:

    (defun xml-read-from-string (string)
      (cllib:with-xml-input (in (make-string-input-stream string))
        (read in)))

If you don't want to type cllib: in front of the XML parser function names, do the following in the package cs325-user:

    (use-package :cllib)

If you do a (use-package :cllib), you will get warnings about some name conflicts. Just select the default response, "Unintern the conflicting symbol". Or, just learn to type cllib: a lot.
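
A middle road is to import only the handful of names this page uses, with the standard import function. This is a sketch that assumes none of these names already exists in cs325-user; if one does, Lisp will still ask you how to resolve that one conflict.

```lisp
(in-package :cs325-user)

;; Import just the parser names used on this page,
;; rather than everything the cllib package exports.
(import '(cllib:xml-read-from-file
          cllib:with-xml-input
          cllib:xmlo-nm
          cllib:xmlo-args
          cllib:xmlo-data))
```

After this, (xmlo-nm x) and (cllib:xmlo-nm x) refer to the same function.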

How to process XML objects

Reading from a CLOCC XML stream returns objects of type xml-obj. CLOCC defines several functions for getting data out of the object. To illustrate these functions, assume we've done the following:

    (setq xmlo
      (xml-read-from-string
        "<book id=\"book-1\"><title>Alas!</title><author>Anon</author></book>"))

The following functions will get data out of xmlo:

(cllib:xmlo-nm xmlo): returns the name of the top element, as a string, i.e., "book".
(cllib:xmlo-args xmlo): returns the attribute pairs of the top element, in a list, i.e., (("id" "book-1")).
(cllib:xmlo-data xmlo): returns the children of the top element, in a list, i.e., (#<XML-OBJ title ...> #<XML-OBJ author ...>).

Using recursion, you can explore the entire XML tree with these functions.
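
Here is one way such a recursive walk might look. It is a sketch under assumptions: print-xml-tree is a made-up name, text between tags is assumed to show up as strings among the children, and the type name is written cllib::xml-obj with a double colon so it works whether or not the symbol is exported.

```lisp
;; PRINT-XML-TREE is a hypothetical helper, not part of CLOCC.
(defun print-xml-tree (node &optional (depth 0))
  "Print an indented outline of NODE and everything nested inside it."
  (let ((indent (make-string (* 2 depth) :initial-element #\Space)))
    (cond ((typep node 'cllib::xml-obj)
           ;; An element: show its name and attribute pairs, then recurse.
           (format t "~A~A ~S~%" indent
                   (cllib:xmlo-nm node)
                   (cllib:xmlo-args node))
           (dolist (child (cllib:xmlo-data node))
             (print-xml-tree child (1+ depth))))
          (t
           ;; Anything else (assumed to be text or other non-element data).
           (format t "~A~S~%" indent node)))))

;; Example, using the object defined above:
;; (print-xml-tree xmlo)
```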
