As far as I can tell, there's no online documentation for the CLOCC XML parser, which is available as part of the CLOCC package of Lisp utilities. The following is what I've gleaned from reading the code and its documentation strings. It should be enough to get you started.
Install the CLOCC XML parser
The Common Lisp Open Code Collection contains a number of useful utilities for Common Lisp. This page is only concerned with the XML parser, but feel free to explore the rest of the collection.
To install this code, you need an archive utility that knows how to handle tar-gzipped files. These are like Zip files on Windows but are more common on Unix systems. There are many free and inexpensive utilities for Windows, such as WinZip, ZipGenius, TugZip, Stuffit, etc., that know how to handle tarred gzip files. (Warning: FilZip did not know how to open this archive.)
- Download the archive. The CLOCC project home page has a link to the current snapshot. That link currently doesn't work (directory access is not available), but this direct link should get you the 2007 snapshot.
- The archive contains a directory called clocc. Extract this whole directory to your Lisp code directory.
- Start your Lisp.
- Fix clocc/clocc.lisp to be location-independent:
  - Open clocc/clocc.lisp in your Lisp editor.
  - Search for:

        (defvar *clocc-root*

  - Replace the entire expression with this one:

        (defvar *clocc-root*
          (namestring
           (make-pathname :host (pathname-host *load-pathname*)
                          :device (pathname-device *load-pathname*)
                          :directory (pathname-directory *load-pathname*)))
          "*The root CLOCC directory.")

    This computes the name of the directory that contains your clocc.lisp file. Doing it this way is nice because it won't break if you move the CLOCC directory to some other location.
- LispWorks only: fix two reader bugs:
  - Open clocc/src/cllib/fileio.lisp in your editor.
  - Search for #1# (hash one hash).
  - On the same line, before those characters, find and delete #. (hash period).
  - Repeat the above steps with clocc/src/cllib/url.lisp.
- LispWorks only: fix outdated function calls in proc.lisp:
  - Open clocc/src/port/proc.lisp in your editor.
  - Replace mp:claim-lock with mp:process-lock.
  - Replace mp:release-lock with mp:process-unlock.
- Load clocc.lisp.
- Load (don't compile yet!) clocc/src/cllib/xml.lisp. If you get an alert dialog about defining PRINT-OBJECT, just say OK.
- After the file has loaded, compile and load it.
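The *clocc-root* replacement in the install steps can be tried on its own. The sketch below is a hypothetical helper, clocc-root-from (not part of CLOCC), that runs the same make-pathname call on a pathname you pass in rather than on *load-pathname*. One caution: *load-pathname* is bound by LOAD while a file is being loaded and is normally NIL otherwise, so the defvar is only reliable when clocc.lisp is loaded with LOAD rather than evaluated form by form in an editor.

```lisp
;; Hypothetical helper, not part of CLOCC: the same computation as the
;; *clocc-root* replacement, but taking the file's pathname as an argument
;; instead of reading *load-pathname*.
(defun clocc-root-from (file)
  "Return the namestring of the directory that contains FILE."
  (let ((path (pathname file)))
    (namestring
     (make-pathname :host (pathname-host path)
                    :device (pathname-device path)
                    :directory (pathname-directory path)))))
```

On a Unix-style system, (clocc-root-from "/home/me/lisp/clocc/clocc.lisp") should return "/home/me/lisp/clocc/"; the path is made up for illustration.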
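The three loading steps at the end of the install list could be wrapped in a function like the one below. This is a minimal sketch: setup-clocc-xml is a hypothetical name, and CLOCC-DIR is whatever directory you extracted clocc into. It needs to end with a slash so that merge-pathnames treats it as a directory.

```lisp
;; Hypothetical one-time setup function following the three steps above.
;; CLOCC-DIR is the directory you extracted, e.g. #p"c:/lisp/clocc/"
;; (a made-up path); it must end with a slash.
(defun setup-clocc-xml (clocc-dir)
  "Load clocc.lisp, load xml.lisp as source, then compile xml.lisp and load the result."
  (let ((clocc-file (merge-pathnames "clocc.lisp" clocc-dir))
        (xml-source (merge-pathnames "src/cllib/xml.lisp" clocc-dir)))
    (load clocc-file)                          ; load clocc.lisp
    (load xml-source)                          ; load xml.lisp uncompiled first
    (let ((fasl (compile-file xml-source)))    ; then compile it ...
      (if fasl
          (load fasl)                          ; ... and load the compiled file
          (error "Could not compile ~A" xml-source)))))
```

COMPILE-FILE returns the name of the file it wrote, or NIL if it could not produce one, which is why the result is checked before loading it.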
How to use the CLOCC XML Parser
Once you've done the steps above, all you need to do is load clocc.lisp and then the compiled version of xml.lisp.
The parser is defined in the cllib package, so its function names are written with a cllib: prefix.
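In later sessions, something like the following sketch would do both loads. The function names are hypothetical. compile-file-pathname is standard Common Lisp and returns the name that COMPILE-FILE gives its output; the file type (fasl, fsl, and so on) depends on your Lisp.

```lisp
;; Hypothetical helpers for later sessions.  CLOCC-DIR must name a
;; directory (end it with a slash).
(defun xml-fasl-pathname (clocc-dir)
  "Pathname of the compiled file that COMPILE-FILE produces for xml.lisp."
  (compile-file-pathname (merge-pathnames "src/cllib/xml.lisp" clocc-dir)))

(defun load-clocc-xml (clocc-dir)
  "Load clocc.lisp, then the already-compiled xml file."
  (load (merge-pathnames "clocc.lisp" clocc-dir))
  (load (xml-fasl-pathname clocc-dir)))
```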
How to parse XML
Find a short XML file on your machine. XML files are used for many purposes. Java and .NET use them to hold configuration information.
Alternatively, find a short XML file on the web and copy it to your machine. There are many there, too. Blogs, for example, use XML files in the RSS format to list new items. The Wikipedia entry on RSS gives an example of such a file.
Once you have a file with XML, you can parse it into Lisp with:

    (cllib:xml-read-from-file pathname)

For example:

    (cllib:xml-read-from-file "c:/cs325/code/test-bugs.xml")

The function returns a list of the top-level objects read from the XML file. See below for how to inspect these objects.
To parse XML directly from an input stream:

    (cllib:with-xml-input (variable stream)
      <code that calls (read variable) to extract XML forms>)

This puts an "XML wrapper" around the input stream, similar to the way Java uses wrappers around input streams. Each call to read will try to read one XML object, including any nested objects, much the way read on a regular Lisp input stream reads a whole list.
When testing XML forms, it's often handy to parse XML stored in a string. Here's a function that will read one XML object from a string:

    (defun xml-read-from-string (string)
      (cllib:with-xml-input (in (make-string-input-stream string))
        (read in)))

If you don't want to type cllib: in front of the XML parser function names, do the following in the cs325-user package:

    (use-package :cllib)

When you evaluate (use-package :cllib), you will get warnings about some name conflicts. Just select the default response, "Unintern the conflicting symbol."

Or just learn to type cllib: a lot.
How to process XML objects
Reading from a CLOCC XML stream returns objects of type xml-obj. CLOCC defines several functions for getting data out of these objects. To illustrate them, assume we've done the following:
(setq xmlo (xml-read-from-string "<book id=\"book-1\"><title>Alas!</title><author>Anon</author></book>"))
The following functions get data out of xmlo:

- (cllib:xmlo-nm xmlo): returns the name of the top element, as a string, i.e., "book".
- (cllib:xmlo-args xmlo): returns the attribute pairs of the top element, in a list, i.e., (("id" "book-1")).
- (cllib:xmlo-data xmlo): returns the children of the top element, in a list, i.e., (#<XML-OBJ title ...> #<XML-OBJ author ...>).
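Two small lookups come up often enough to be worth writing down. These are hypothetical helpers, not CLOCC functions, and they assume the shapes shown above: attributes as a list of (name value) lists, and text children as plain strings. If your copy of CLOCC stores attribute pairs as dotted (name . value) conses instead, use cdr in place of second. The element-name accessor is passed in as an argument, so the helpers can also be tried on plain lists without CLOCC loaded.

```lisp
;; Hypothetical helpers, not part of CLOCC.
(defun xml-attribute (name args)
  "Return the value paired with attribute NAME in ARGS, or NIL if absent.
Assumes ARGS is a list of (name value) lists, as XMLO-ARGS returns above."
  (second (assoc name args :test #'string=)))

(defun xml-child-named (name children name-of)
  "Return the first element in CHILDREN whose name (via NAME-OF) is NAME.
Strings are skipped, on the assumption that they hold text content."
  (find-if (lambda (child)
             (and (not (stringp child))
                  (string= name (funcall name-of child))))
           children))

;; With CLOCC loaded and xmlo set as above:
;;   (xml-attribute "id" (cllib:xmlo-args xmlo))
;;   (xml-child-named "title" (cllib:xmlo-data xmlo) #'cllib:xmlo-nm)
```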
Using recursion, you can explore the entire XML tree with these functions.
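Here is one way to do that recursion: a sketch that prints an indented outline of the tree. print-xml-outline is a hypothetical name. It takes the name and children accessors as arguments, so with CLOCC you would pass #'cllib:xmlo-nm and #'cllib:xmlo-data. It assumes that text content shows up among the children as strings, which I haven't confirmed for every kind of node CLOCC can return.

```lisp
;; Hypothetical helper, not part of CLOCC.  NAME-OF and CHILDREN-OF are the
;; accessors to use; strings are treated as text content.
(defun print-xml-outline (node name-of children-of
                          &optional (depth 0) (stream *standard-output*))
  "Print NODE and everything beneath it, indenting two spaces per level."
  (let ((indent (make-string (* 2 depth) :initial-element #\Space)))
    (if (stringp node)
        (format stream "~A~S~%" indent node)        ; text: print it quoted
        (progn
          (format stream "~A<~A>~%" indent (funcall name-of node))
          (dolist (child (funcall children-of node))
            (print-xml-outline child name-of children-of (1+ depth) stream))))))

;; With CLOCC loaded and xmlo set as above:
;;   (print-xml-outline xmlo #'cllib:xmlo-nm #'cllib:xmlo-data)
```

Passing the accessors in keeps the recursion itself independent of CLOCC, so the same function works on a stand-in tree such as '("book" ("title" "Alas!")) with #'first and #'rest.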