Wednesday, July 1, 2009

The World Is a libferris Filesystem

http://www.linuxjournal.com/article/8901

The libferris virtual filesystem has always sought to push the boundaries of what a filesystem should do, both in what can be mounted and in what metadata is available for files. During the past five years, it has expanded from mounting more traditional things, such as tar.gz, SSH, digital cameras and IPC primitives, to mounting various Indexed Sequential Access Method (ISAM) files, including db4, tdb, edb, eet and gdbm; various relational databases, including ODBC, MySQL and PostgreSQL; various servers, such as HTTP, FTP, LDAP, Evolution and RDF graphs; as well as XML files and Sleepycat's dbXML.

Recently, libferris gained support for indexing filesystem data using any combination of Lucene, ODBC, TSearch2, Xapian, LDAP, PostgreSQL and Web search, along with the ability to query these back ends for matching files. Matches are presented, naturally, as a virtual filesystem. Details of using the index and search capabilities of libferris appeared in the February 2005 issue of Linux Journal in my article “Filesystem Indexing with libferris”. Anything you see mounted as a filesystem in this article can be indexed and searched as described in that earlier article.

You can access your libferris virtual filesystem either through native libferris clients or by exporting libferris through Samba.

The two primary abstractions in libferris are the Context and the Extended Attribute (EA). A Context can be thought of as a superclass of a file or directory. libferris draws less of a distinction between the two than a traditional filesystem does: a file behaves like a directory if it is treated like one. For example, if you try to read a tar.gz file as a directory, libferris automatically mounts the archive and lists its contents as a virtual filesystem.
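To make the file-as-directory idea concrete, here is a minimal Python sketch. It is not the libferris API: the Context class, its read() and children() methods and the make_demo_archive() helper are hypothetical names chosen for illustration, and only tar archives get the directory treatment here.

```python
import io
import os
import tarfile
import tempfile


class Context:
    """Hypothetical stand-in for a libferris Context (not the real API)."""

    def __init__(self, path):
        self.path = path

    def read(self):
        """Treat the context as a file: return its raw bytes."""
        with open(self.path, "rb") as f:
            return f.read()

    def children(self):
        """Treat the context as a directory: return the names it contains."""
        if os.path.isdir(self.path):
            return sorted(os.listdir(self.path))
        if os.path.isfile(self.path) and tarfile.is_tarfile(self.path):
            # A file read as a directory: expose the archive members.
            with tarfile.open(self.path) as archive:
                return sorted(archive.getnames())
        return []  # a plain file has no children


def make_demo_archive(directory=None):
    """Build a small tar.gz with two members so the sketch runs anywhere."""
    directory = directory or tempfile.mkdtemp()
    archive_path = os.path.join(directory, "demo.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name, data in (("a.txt", b"alpha"), ("b.txt", b"beta")):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return archive_path
```

Calling Context(make_demo_archive()).children() lists the archive members, while read() on the same object still returns the compressed bytes. The caller decides which view it wants; the object does not have to be declared as one or the other up front.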

The EA interface is similar in concept to the Linux kernel's EA interface. That is, arbitrary key-value data is attached to files and directories. This EA concept was extended early on in libferris to allow the value of an attribute to be derived from the content of a file. This means simple things like the width and height of an image or video file become first-class metadata citizens along with a file's size and modification time. The available metadata extends far beyond image dimensions to include XMP, EXIF, music ID tags, Annodex media, geospatial tags, RPM metadata, SELinux integration, partially ordered emblem categories and arbitrary personal RDF stores of metadata.
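The following Python sketch shows one way an attribute value can be derived from file content rather than stored alongside it. It is an illustration only: DerivedAttributes, png_dimensions() and make_demo_png() are hypothetical names, not libferris functions, and the sketch handles only PNG headers, whereas libferris supports many more formats.

```python
import os
import struct
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from the first 24 bytes of a PNG, else None."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # Width and height are two big-endian 32-bit integers in the IHDR chunk.
    return struct.unpack(">II", data[16:24])


class DerivedAttributes:
    """Hypothetical EA view: every value is computed from the file on demand."""

    def __init__(self, path):
        self.path = path

    def get(self, name):
        if name == "size":
            return os.path.getsize(self.path)
        if name in ("width", "height"):
            with open(self.path, "rb") as f:
                dims = png_dimensions(f.read(24))
            if dims is None:
                raise KeyError(name)
            return dims[0] if name == "width" else dims[1]
        raise KeyError(name)


def make_demo_png(width, height):
    """Write just enough of a PNG (signature plus IHDR start) for the demo."""
    fd, path = tempfile.mkstemp(suffix=".png")
    with os.fdopen(fd, "wb") as f:
        f.write(PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR")
        f.write(struct.pack(">II", width, height))
    return path
```

With this in place, DerivedAttributes(path).get("width") and get("size") go through the same call, which is the point of treating derived values as first-class attributes.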

Having all metadata available through a single interface allows libferris to provide filtering and sorting capabilities on any of that metadata. As such, you can sort a directory by any metadata just as easily as you would use ls -Sh to sort by file size. Sorting on multiple metadata values is also supported in libferris; you can sort your files easily by MIME type, then image width, then modification time—with all three pieces of metadata contributing to the final directory ordering. Any libferris virtual filesystem can have filtering and sorting applied to it to obtain a new libferris virtual filesystem.
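As a rough sketch of what multi-key sorting and filtering amount to, the Python below orders a directory listing by MIME type, then width, then modification time. The attribute names, the sample entries and the sort_by() and filter_by() helpers are made up for illustration; libferris has its own syntax for expressing sorts and filters, which is not shown here.

```python
def sort_by(entries, *names):
    """Order entries by several attributes; earlier names take priority."""
    return sorted(entries, key=lambda entry: tuple(entry[n] for n in names))


def filter_by(entries, predicate):
    """Keep only the entries for which the predicate is true."""
    return [entry for entry in entries if predicate(entry)]


# Hypothetical directory listing: each entry is a dict of attribute values.
listing = [
    {"name": "a.png", "mime": "image/png", "width": 800, "mtime": 300},
    {"name": "b.png", "mime": "image/png", "width": 640, "mtime": 200},
    {"name": "c.txt", "mime": "text/plain", "width": 0, "mtime": 100},
    {"name": "d.png", "mime": "image/png", "width": 640, "mtime": 100},
]

ordered = sort_by(listing, "mime", "width", "mtime")
images = filter_by(ordered, lambda entry: entry["mime"].startswith("image/"))
```

Because both helpers take a list of entries and return a list of entries, the result of one can feed the other, which mirrors the way a filtered or sorted libferris filesystem is itself a filesystem.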

EA values you write can be stored in a personal RDF store. For example, if you write an image width to an extended attribute, a subsequent read of the width returns the value you just wrote. This extends naturally to other situations, such as changing the x or y EA of a window, which should move the window.

Allowing EA to be stored in a personal RDF file lets you add metadata to any libferris object, even those for which you have only read access. For example, you can attach emblems or comments to the Linux Kongress Web site just as you would a normal file.
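The sketch below models a personal metadata store as an overlay on top of an object's own attributes. It is a simplification under stated assumptions: libferris uses a real RDF store, whereas this uses an in-memory set of (subject, attribute, value) triples, and PersonalStore, read_attribute() and the example.org URL are hypothetical names used only for illustration.

```python
class PersonalStore:
    """Hypothetical personal metadata store holding triples in memory."""

    def __init__(self):
        self.triples = set()

    def set(self, subject, attribute, value):
        """Record one value per (subject, attribute), replacing any old one."""
        self.triples = {
            t for t in self.triples if (t[0], t[1]) != (subject, attribute)
        }
        self.triples.add((subject, attribute, value))

    def get(self, subject, attribute, default=None):
        for s, a, value in self.triples:
            if (s, a) == (subject, attribute):
                return value
        return default


def read_attribute(store, subject, attribute, native):
    """Prefer the personal value, then fall back to the object's own data."""
    missing = object()
    value = store.get(subject, attribute, missing)
    if value is not missing:
        return value
    return native[attribute]


# The remote page is read-only, but the personal store is yours to write.
store = PersonalStore()
page = "http://example.org/"
native_attributes = {"mime": "text/html"}
store.set(page, "comment", "check the schedule")
```

Nothing is written to the remote object. The comment lives entirely in the personal store, yet read_attribute() returns it through the same lookup that returns the page's own MIME type.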

An interesting EA for all files is the content EA, which is equivalent to the file's byte contents. Exposing the file itself through the EA interface means that any information about a file can be obtained via the same interface.

libferris is written in C++ and provides a standard IOStream interface to both Contexts and EA. Many standard file utilities have been rewritten to take advantage of libferris features. These clients include ls, cp, mv, rm, mkdir, cat, find, touch, IO redirection and more.

[...]