Thursday, July 9, 2009

Tagsistant - A reasoning semantic filesystem for Linux and BSD

http://www.tagsistant.net/index.php

What is a semantic file system? Honestly, I don't know! Many people say it is something you can use to organize your content using ontologies based on labels or tags. Others say it is related to relationships between files or "things". Others claim it has something to do with circles in fields! ;-)

To me, a semantic filesystem is basically a tool that allows one to catalogue files and to extract subsets using logical queries.
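As a rough illustration of "catalogue files and extract subsets using logical queries", here is a minimal in-memory sketch. It is not how Tagsistant is implemented; the class name `TagIndex`, its methods and the example paths are all hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy catalogue: maps each tag to the set of file paths carrying it."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._all_files = set()

    def tag(self, path, *tags):
        """Attach one or more tags to a file path."""
        self._all_files.add(path)
        for t in tags:
            self._files_by_tag[t].add(path)

    def with_tag(self, t):
        """Files carrying tag t (empty set if the tag is unknown)."""
        return set(self._files_by_tag.get(t, set()))

    def all_of(self, *tags):
        """Logical AND: files carrying every listed tag."""
        result = set(self._all_files)
        for t in tags:
            result &= self.with_tag(t)
        return result

    def any_of(self, *tags):
        """Logical OR: files carrying at least one listed tag."""
        result = set()
        for t in tags:
            result |= self.with_tag(t)
        return result

    def none_of(self, *tags):
        """Logical NOT: files carrying none of the listed tags."""
        return self._all_files - self.any_of(*tags)


# Hypothetical example data.
index = TagIndex()
index.tag("/home/me/holiday.jpg", "photo", "2009", "italy")
index.tag("/home/me/report.pdf", "work", "2009")
index.tag("/home/me/rome.jpg", "photo", "italy")

photos_2009 = index.all_of("photo", "2009")
```

A filesystem front end would expose such queries as directory paths, so that listing a directory runs the query; the sketch only covers the set logic behind it.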

But why a filesystem and not a database or a desktop application? That's the most obvious but often overlooked consideration a programmer should make! Software should be as interface neutral as possible. Interface neutral means that you should be able to communicate with this software easily and efficiently.

A desktop tool can be comfortable for the user, and can be assistive and accessible. But other software gains no advantage when interfacing with that tool. Other tools may have to be modified to use a new API, requiring days of coding and possibly introducing new bugs! Universal, well-known and well-tested interfaces should be preferred over special, new, particular interfaces.

A filesystem is probably one of the most universal interfaces (if not the most universal) you can think of (especially on UNIX, and especially when talking about files!). Later, someone will write a colorful and intuitive desktop application (or a series of them) which will be appealing, but right now, with a three-day hack, everyone can tag their files by simply using the file manager interface they are used to.

URIplay - DNS for media

http://uriplay.org/about/overview/

URIplay aims to restore simplicity to the experience without reducing choice. We compile and publish metadata files listing the URIs at which content can be played. Each URI is described in terms of media data format, revenue model, and restrictions applied. Developers will be able to use this information and our open API to refer precisely to the content and select appropriate URIs for their users.

Thursday, July 2, 2009

Majestic-12 Peer-to-Peer www Search Engine

http://www.majestic12.co.uk/

Majestic-12 is working towards the creation of a World Wide Web search engine based on distributing the workload, in a fashion similar to that of successful projects such as SETI@home and distributed.net.

YaCy - Large-Scale Open-Source Search Engine

http://yacy.net/

YaCy is a scalable web search engine with an integrated web crawler and content analysis and management functions. One YaCy installation can store more than 20 million documents, but in a community of search peers YaCy can provide a search index of unlimited size.

Peer-to-Peer Index Sharing

We implemented an index sharing method that works similarly to peer-to-peer file sharing. YaCy shares web index information with other peers and is therefore resistant to censorship in the same way that P2P file sharing is resistant to deletion.

OpenSearch® Search More at Once

http://opensearch.a9.com/-/company/opensearch.jsp

A9.com invented the OpenSearch technology for search aggregation in 2004.

OpenSearch is a set of simple formats for the sharing of search results. Any website that has a search feature can make their results available in OpenSearch format. Other tools can then read those search results.
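To make "other tools can then read those search results" concrete, here is a small sketch that reads an OpenSearch-style RSS response. The feed below is made up for illustration (titles and example.com links are hypothetical); the namespace and the `totalResults`, `startIndex` and `itemsPerPage` elements are the ones defined by the OpenSearch 1.1 response format.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Made-up sample response: an RSS feed extended with OpenSearch elements.
SAMPLE_FEED = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example search: semantic filesystem</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item>
      <title>First hit</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second hit</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def parse_results(feed_xml):
    """Return paging information and (title, link) pairs from an RSS response."""
    channel = ET.fromstring(feed_xml).find("channel")

    def paging_value(name):
        # OpenSearch elements live in their own XML namespace.
        return int(channel.findtext(f"{{{OPENSEARCH_NS}}}{name}"))

    return {
        "total": paging_value("totalResults"),
        "start": paging_value("startIndex"),
        "per_page": paging_value("itemsPerPage"),
        "hits": [(item.findtext("title"), item.findtext("link"))
                 for item in channel.findall("item")],
    }


results = parse_results(SAMPLE_FEED)
```

An aggregator could run the same parser over the responses of several sites and merge the `hits` lists, which is the "search more at once" idea.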

Wednesday, July 1, 2009

The World Is a libferris Filesystem

http://www.linuxjournal.com/article/8901

The libferris virtual filesystem always has sought to push the boundaries of what a filesystem should do in terms of what can be mounted and what metadata is available for files. During the past five years, it has expanded its capabilities from mounting more traditional things, such as tar.gz, SSH, digital cameras and IPC primitives, to being able to mount various Indexed Sequential Access Mechanism (ISAM) files, including db4, tdb, edb, eet and gdbm; various relational databases, including odbc, MySQL and PostgreSQL; various servers, such as HTTP, FTP, LDAP, Evolution and RDF graphs; as well as XML files and Sleepycat's dbXML.

Recently, support for indexing filesystem data using any combination of Lucene, ODBC, TSearch2, xapian, LDAP, PostgreSQL and Web search has been added with the ability to query these back ends for matching files. Matches naturally are presented as a virtual filesystem. Details of using the index and search capabilities of libferris appeared in the February 2005 issue of Linux Journal in my article “Filesystem Indexing with libferris”. I should mention that anything you see mounted as a filesystem in this article can be indexed and searched for as described in that past article on searching.

You can access your libferris virtual filesystem either by native libferris clients or by exporting libferris through Samba.

The two primary abstractions in libferris are the Context and the Extended Attribute (EA). A Context can be thought of as a superclass of a file or directory. In libferris, there is less of a distinction between a file and a directory with the ability for a file to behave like a directory if it is treated like one. For example, if you try to read a tar.gz file as a directory, libferris automatically mounts the archive as a filesystem and lists the contents of the archive as a virtual filesystem.
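The "file that behaves like a directory" idea can be sketched in a few lines. This is only an illustration using Python's standard library, not the libferris API; the function name `list_children` is hypothetical.

```python
import os
import tarfile


def list_children(path):
    """List a path 'as a directory', whatever it is on disk.

    Real directories are listed normally; a tar archive (optionally
    compressed) is treated as a directory whose children are its members.
    """
    if os.path.isdir(path):
        return sorted(os.listdir(path))
    if os.path.isfile(path) and tarfile.is_tarfile(path):
        # The default read mode detects gzip/bzip2/xz compression transparently.
        with tarfile.open(path) as archive:
            return sorted(archive.getnames())
    raise NotADirectoryError(path)
```

The caller never has to ask which kind of object it is dealing with, which is the point of treating a Context as a superclass of both files and directories.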

The EA interface can be thought of as a similar concept to the Linux kernel's EA interface. That is, arbitrary key-value data is attached to files and directories. This EA concept was extended early on in libferris to allow the value for an attribute to be derived from the content of a file. This means simple things like width and height of an image or video file become first-class metadata citizens along with a file's size and modification time. The limits on what metadata is available extend far beyond image metadata to include XMP, EXIF, music ID tags, Annodex media, geospatial tags, RPM metadata, SELinux integration, partially ordered emblem categories and arbitrary personal RDF stores of metadata.

Having all metadata available through a single interface allows libferris to provide filtering and sorting capabilities on any of that metadata. As such, you can sort a directory by any metadata just as easily as you would use ls -Sh to sort by file size. Sorting on multiple metadata values is also supported in libferris; you can sort your files easily by MIME type, then image width, then modification time—with all three pieces of metadata contributing to the final directory ordering. Any libferris virtual filesystem can have filtering and sorting applied to it to obtain a new libferris virtual filesystem.
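Sorting on several metadata values at once amounts to sorting on a composite key. The sketch below shows the idea on plain dictionaries; the file entries and attribute names are invented for illustration and this is not the libferris sorting interface.

```python
from operator import itemgetter

# Hypothetical directory listing: each entry carries its metadata.
files = [
    {"name": "b.png", "mimetype": "image/png", "width": 800, "mtime": 1246400000},
    {"name": "a.png", "mimetype": "image/png", "width": 640, "mtime": 1246500000},
    {"name": "notes.txt", "mimetype": "text/plain", "width": 0, "mtime": 1246300000},
    {"name": "c.png", "mimetype": "image/png", "width": 640, "mtime": 1246450000},
]


def sort_by(entries, *keys):
    """Sort entries by the given metadata keys, earlier keys taking priority."""
    return sorted(entries, key=itemgetter(*keys))


# MIME type first, then image width, then modification time.
ordered = sort_by(files, "mimetype", "width", "mtime")
ordered_names = [entry["name"] for entry in ordered]
```

Because the result is just another list of entries, it can be filtered or sorted again, mirroring how a sorted libferris filesystem is itself a filesystem.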

You can store EA values into a personal RDF store—for example, when you write an image width to an extended attribute. When you subsequently read the image width, you get the value you just wrote to the EA. This extends naturally to other situations, such as when you change the x or y EA for a window, which should move the window.

Allowing EA to be stored in a personal RDF file lets you add metadata to any libferris object, even those for which you have only read access. For example, you can attach emblems or comments to the Linux Kongress Web site just as you would a normal file.

An interesting EA for all files is the content EA, which is equivalent to the file's byte contents. Exposing the file itself through the EA interface means that any information about a file can be obtained via the same interface.
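One way to picture the EA behaviour described above is a lookup that consults a personal overlay first and falls back to values derived from the object itself. The sketch below uses a plain dictionary where libferris uses a personal RDF store; `AttributeView` and the attribute names are assumptions made for the example.

```python
import os


class AttributeView:
    """Extended attributes for one path: derived values plus a personal overlay."""

    def __init__(self, path, overlay):
        self.path = path
        # The overlay belongs to the user, so it works even for objects
        # the user can only read (or that are not local files at all).
        self._overlay = overlay.setdefault(path, {})

    def get(self, name):
        if name in self._overlay:          # a value the user wrote earlier wins
            return self._overlay[name]
        if name == "size":                 # derived from the file itself
            return os.path.getsize(self.path)
        if name == "content":              # the file's bytes, exposed as an attribute
            with open(self.path, "rb") as f:
                return f.read()
        raise KeyError(name)

    def set(self, name, value):
        self._overlay[name] = value


# Annotating something we cannot modify: the URL is a placeholder.
personal_store = {}
site = AttributeView("http://example.org/", personal_store)
site.set("comment", "worth revisiting")
```

Reading `site.get("comment")` afterwards returns the stored value, while `get("size")` and `get("content")` only make sense for paths that exist on the local disk in this simplified version.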

libferris is written in C++ and provides a standard IOStream interface to both Contexts and EA. Many standard file utilities have been rewritten to take advantage of libferris features. These clients include ls, cp, mv, rm, mkdir, cat, find, touch, IO redirection and more.

[...]

FUSE Filesystem Suggestions

http://sourceforge.net/apps/mediawiki/fuse/index.php?title=Filesystem_Suggestions#MEDIAFS

MEDIAFS
File system that demuxes and exports frames for multimedia streams, for example animated .gif, .avi and .ogg files, etc. Somewhat like tar:/ and audiocd:/ under KDE. So for example mounting demo.avi on /mnt reveals /audio and /video, files describing the container and format, resolution, channels, chapters/indexes, and the title. /audio contains the tracks converted to trackN.wav, trackN.raw (unconverted stream), and files describing the format of the raw streams. /video contains trackN.YUV (stream of uncompressed frames), trackN.raw, files describing the raw format, and /frames. /video/frames contains timestamped .YUV and .PNM files. Maybe even .JPG could be thrown in: since the source is already FFT encoded, it would be a pretty lightweight operation.

Write support would turn standard shell tools into a non-linear multimedia editing suite. ;-) Obviously write support would be easier to implement on an HUFFYUV/WAV formatted .avi file, which is pretty standard for video editing anyhow.

I admit that all of this can be achieved by using mplayer and co. to extract the media file into a directory, work with the decoded files, and then re-encode the file, but I think this functionality is needed and reusable by a number of programs, from Konqueror (thumbnailing, previews) to biff, and pretty much any media player/editor.

It could easily replace a large number of the programs in use today. It would become an API for multimedia work, hopefully help join the forces working on multimedia APIs in general, make codec inclusion transparent to the apps, etc. Also while this sounds naive compared to mboxfs and other brilliant user-space re-interpretations of metadata and namespace views, media streams are generally simple linear storage containers, much larger in size than typical email, and therefore much more relatively efficient to look at through a mount.

SerhijStasyuk says: IMHO better to implement this via Hybrid objects.

libferris: what is it?

http://www.libferris.com/

A virtual filesystem with federated index and search. If you can mount it you can search it!

In non technical terms libferris makes the file system and other hierarchical storage systems easier to use. For the geeks out there, libferris is a virtual file system (VFS) that runs in the user address space. The FAQ contains entries related to installation, configuration and the usage of libferris.

As of July 2005 libferris can mount many interesting things ranging from a filesystem from your local Linux kernel through to LDAP, Evolution, PostgreSQL, dbXML, and RDF. To get an impression of the current capabilities of libferris mounting see the plugins/context directory of the latest release. New things to mount are always being added :)

Other than mounting things as a filesystem, the other core concept of libferris is extraction of interesting metadata from your libferris filesystems. This means that simple things like width and height of an image file become first class metadata citizens along with a file's size and modification time. The limits on what metadata is available extend far beyond image metadata to include XMP, EXIF, music ID tags, geospatial tags, rpm metadata, SELinux integration, partially ordered emblem categories and arbitrary personal RDF stores of metadata. Though some consider the last point of purely academic interest, the end result is that you can add metadata to *all* libferris objects, even those you only have read access to. For example, you can attach emblems to this website just as you would a normal file. The metadata interface gives all metadata, from file size to digital signature status information, equal standing. As such you can sort a directory by any metadata just as easily as you would use ls -Sh to sort by file size. Sorting on multiple metadata values is also supported in libferris: you can easily sort your files by mimetype, then image width, then modification time, with all three pieces of metadata contributing to the final directory ordering.

Late in 2004 extensive support for both fulltext and metadata indexing was added to libferris. This means you can supply queries against the contents or metadata of any libferris accessible object and have the results returned as a virtual filesystem. With the above mentioned metadata available for searching, finding your files can be done in many different ways instead of being forced to generate fixed directory trees using part of a file collection's semantics as directory names. The metadata and virtual filesystem play together here, allowing you to geospatially tag your digital pictures, trip plans, and relevant websites and recall these objects in a single virtual directory no matter what their path or URL may be.

There is also a Samba VFS module which allows you to expose a libferris filesystem as a Samba share. Kfsmd uses the inotify kernel interface to allow libferris to watch changes made to your kernel filesystem by non libferris applications and update its indexes appropriately. Ferriscreate provides a command line and GTK+2 application for creating "new files" with libferris. With this you can create a new db4 database, dbXML database or fulltext index just as easily as you can make a regular file.

The ego filemanager is a GTK+2 interface built on top of libferris. It provides GTK treeview, gevas/edje and gecko based interfaces and makes extensive use of libferris' clients to provide its functionality.

If you have a project you wish to use libferris with and want extensions made don't hesitate to contact one of the developers to arrange consulting.

At the moment libferris is a shared object that each application can dynamically link to in order to see the file system through a nicer abstraction.

New additions to the XML module allow for data to be converted from one format to another by the VFS for you. To copy data to an XML file:

fcreate --create-type=xml --rdn=2.xml root-element=fred /tmp
gfcp -av Makefile.am /tmp/2.xml/fred

To copy data to a db4 file:

fcreate --create-type=db4 --rdn=2.db /tmp
gfcp -av Makefile.am /tmp/2.db

Ferris presents a C++ interface that makes heavy use of the STL and IOStreams. Currently ferris has two main internal abstractions: Context and Attribute. A context is much like a traditional file or directory in a file system, the major differences being that a context can have both byte content (like a file) and subcontexts (like a directory). An attribute is a chunk of metadata about a context. Contexts can have many attributes. Some attributes may be large, for example a base 64 encoded version of the context's content (133% context size). On the other hand an attribute can be small, for example the file size is exposed as an attribute.
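The Context and Attribute pair described here can be modelled in a few lines. This is a simplified Python sketch, not the C++ interface of ferris; the class layout and the attribute keys `size` and `content-base64` are assumptions. It also shows where the "133%" figure comes from: base64 emits 4 output bytes for every 3 input bytes.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Context:
    """A node that can hold byte content (like a file) and children (like a directory)."""
    name: str
    content: bytes = b""
    children: dict = field(default_factory=dict)

    def add_child(self, child):
        self.children[child.name] = child
        return child

    def attribute(self, key):
        """Attributes are metadata computed from the context."""
        if key == "size":                      # a small attribute
            return len(self.content)
        if key == "content-base64":            # a large one: about 4/3 of the content
            return base64.b64encode(self.content)
        raise KeyError(key)


# A context with both content of its own and a subcontext.
doc = Context("report.xml", content=b"x" * 300)
doc.add_child(Context("chapter1", content=b"first chapter"))

ratio = len(doc.attribute("content-base64")) / doc.attribute("size")
```

For the 300-byte content above the encoded attribute is 400 bytes long, so `ratio` is 4/3, which is the roughly 133% mentioned in the text.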

Access to all contexts and attributes is performed by first requesting either an IStream or IOStream for that context or attribute. In this way the same context/attribute can be open many times at the same time, just like normal kernel based IO.

Ferris uses Loki from "Modern C++ Design" by Alexandrescu. Most objects use automatic garbage collection based on the SmartPtr<> template class from Loki. Where possible objects in ferris use a FerrisRefCounted policy to provide COM like intrusive reference counting. This style is used for Context, Attribute and special wrappers of IOStreams that are provided. IOStreams are wrapped to provide a more flexible API than could be offered using references to IOStreams. There are also new stream classes provided, for example NullStream and LimitingStream. Templates are provided to make SmartPtr<>s to standard IOStreams act just like the underlying stream would, for example, one can have SmartPtr<> ss; ss >> stringObj; and does not have to dereference the SmartPtr<> to use standard IOStreams extractors or inserters.

Ferris uses GModule from glib2 to dynamically load both context and attribute classes at run-time. This way resources are conserved until they are needed. The native file system context is statically linked to ferris at present. When loading either context or attribute classes ferris uses a double dispatch factory method. Put simply, this means that for each plugin there are two libraries: the main one, and a small one that tells ferris whether the main one really needs to be loaded. Using this scheme ferris can load all the meta factory classes at any time and use these very small meta factories to check if the main factory can create objects that are going to be useful. This scheme is of great use for attribute classes. Attribute classes take a context and can "generate" attributes from the context. An example of this sort of class would be an MD5 or Base64 attribute. Both can be generated from the base context. More interesting attributes are PCM audio and RGBA-32bpp image data. By using the double dispatch factory ferris can handle a great many attribute generators and load them on demand.
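The two-stage loading scheme can be sketched as follows. This is an illustration of the idea only: ferris does this with GModule shared libraries in C++, whereas here a plain function call stands in for loading the main library, and every name (`MetaFactory`, `AttributeRegistry`, the plugin names) is hypothetical.

```python
import hashlib


class MetaFactory:
    """Small, always-available stub that decides whether the real plugin is needed."""

    def __init__(self, name, can_handle, load_plugin):
        self.name = name
        self.can_handle = can_handle      # cheap check on the context name
        self._load_plugin = load_plugin   # expensive step, deferred until needed
        self._plugin = None

    def plugin(self):
        if self._plugin is None:          # load the main factory at most once
            self._plugin = self._load_plugin()
        return self._plugin


class AttributeRegistry:
    def __init__(self):
        self._meta_factories = []

    def register(self, meta_factory):
        self._meta_factories.append(meta_factory)

    def generate(self, context_name, content):
        """Generate attributes, loading only the plugins that apply."""
        attributes = {}
        for meta in self._meta_factories:
            if meta.can_handle(context_name):
                attributes[meta.name] = meta.plugin()(content)
        return attributes


load_log = []  # records which "main libraries" were actually loaded


def load_md5_plugin():
    load_log.append("md5")
    return lambda data: hashlib.md5(data).hexdigest()


def load_id3_plugin():
    load_log.append("id3")
    # Only checks for the "ID3" marker that starts an ID3v2 tag.
    return lambda data: data[:3] == b"ID3"


registry = AttributeRegistry()
registry.register(MetaFactory("md5", lambda name: True, load_md5_plugin))
registry.register(MetaFactory("has-id3v2-tag", lambda name: name.endswith(".mp3"), load_id3_plugin))

attrs = registry.generate("notes.txt", b"hello")
```

After this call `load_log` contains only "md5": the ID3 plugin's meta factory declined the context, so its main part was never loaded.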

Ferris currently can decode mp3, read id3 tags, decode many image formats and break some animation formats into frames. This makes ferris a solid starting point for multimedia applications.

Ferris will automatically mount sub file systems for you. Examples of a sub file system include a Berkeley database or XML file. For example it is possible to read a context such as /tmp/myxml.xml/mynode. Using this automatic mounting the differences between storage formats effectively disappear. To a ferris enabled application, loading data from a native disk file, a Berkeley database, an XML file, or an mbox file appears to be the same. This allows the user of the application to choose the correct storage for the data at hand.

It is planned to move to a microkernel architecture in Version 2.1 of ferris. I chose 2.1 so that ferris does not fall into version 2 syndrome :)

Richer File System Metadata Using Links and Attributes

Alexander Ames, Nikhil Bobb, Scott A. Brandt, Adam Hiatt. Richer File System Metadata Using Links and Attributes. 13th NASA Goddard Conference on Mass Storage Systems and Technologies 2005 (MSST’05).

Traditional file systems provide a weak and inadequate structure for representing meaningful interrelationships among files and the other metadata that gives them context.

Existing designs, which store additional file-oriented metadata either in a database, on disk, or both, are limited by the technologies on which they depend. Moreover, they do not provide for user-defined relationships among files. To address these problems, the Linking File System (LiFS) was created: a file system in which files can have both arbitrary user-specified and application-specified attributes, as well as attributed links between files. To ensure performance when accessing links and attributes, the system is designed to store metadata in non-volatile memory. This paper examines several use cases that can benefit from this approach and describes the user-space prototype that was developed to test these concepts.

Solving the problem of interrelationships among files has become increasingly urgent as users face a growing volume of personal data such as e-mail, chat communications, documents, multimedia files, and so on.

LiFS extends file system metadata to include not only arbitrary user-specifiable key-value pairs but also relationships between files in the form of attributed links. Today it is easier to find a document on the Web, among millions of documents, than to find one stored locally on our own machine. Moreover, documents on the Web are embedded in a rich structure of hyperlinks, which our own files typically are not.

LiFS makes it possible to assign attributes and establish links between files in a standardized way, through a powerful infrastructure able to support a variety of user applications and system operations.

File attributes directly support enhanced file system searches. Attributed links support a number of recent efforts to extend the hierarchical directory structure so that files can be organized in a customized, more user-friendly way.

Weighted links between files can be used to record access patterns that are useful for retrieval, hoarding, indexing, and ranking of search results. In effect, these links provide an abstract model of file interrelationships that was previously unavailable at the file system level. The key features of LiFS are therefore the links between files and the attributes on both.

•Links: each link has a source file, a target file, and a non-empty set of attributes made up of key-value pairs. LiFS links differ from POSIX links in that a LiFS link represents a relationship between files rather than a simple reference. The link's attributes express the nature of that relationship. Any type of file can potentially contain a link to any other file. Consequently, every file is also a directory, and the distinction between files and directories is eliminated. Besides simple directory containment, links can express a range of other useful relationships such as “included-in”, “refers-to”, “depends-on”, “created-by”, “opened-by”, and many others. Links also allow dynamically customizable views of the file system based on the type of link followed.
•Attributes: both files and links can carry a number of attributes limited only by the available memory. The size of each key and its corresponding value is unrestricted, although overly long keys and string values can affect the ability to deliver a high level of performance. Both the key and the value of an attribute can contain arbitrary data, including binary data. This lets applications keep richer metadata, such as thumbnails, video clip previews, and cached print spool files, without the overhead of a special encoding. The main advantage of attributes is that they allow users, applications, and the system itself to annotate files and links. This enables fast and effective file search, classification, partitioning, and manipulation, and provides the infrastructure for other features that the file system designer may not have considered. A special case of executable file attributes is the file trigger. A file trigger on a file specifies a pattern/action pair. The pattern specifies the file system operation (a read, a write, etc.) performed on the file to which the attribute is attached. File triggers are a powerful mechanism that simplifies the implementation of a wide range of file system services such as versioning, mirroring, and others.
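The two building blocks above, attributes on files and attributed links between files, can be modelled roughly as follows. This is an in-memory sketch for illustration, not the LiFS prototype; the class names, the `relation` key and the file names are hypothetical, and the non-empty attribute rule simply follows the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A relationship between two files, described by key-value attributes."""
    source: str
    target: str
    attributes: dict


@dataclass
class LinkedFileSystem:
    file_attributes: dict = field(default_factory=dict)  # path -> {key: value}
    links: list = field(default_factory=list)

    def set_attribute(self, path, key, value):
        # Keys and values may be arbitrary data, including bytes.
        self.file_attributes.setdefault(path, {})[key] = value

    def link(self, source, target, attributes):
        if not attributes:
            raise ValueError("a link needs at least one attribute")
        self.links.append(Link(source, target, dict(attributes)))

    def view(self, source, **wanted):
        """Targets reachable from source via links whose attributes match `wanted`."""
        return [
            link.target
            for link in self.links
            if link.source == source
            and all(link.attributes.get(k) == v for k, v in wanted.items())
        ]


fs = LinkedFileSystem()
fs.set_attribute("paper.tex", "author", "me")
fs.set_attribute("fig1.png", "thumbnail", b"\x89PNG")   # binary value, no special encoding
fs.link("paper.tex", "fig1.png", {"relation": "includes"})
fs.link("paper.tex", "refs.bib", {"relation": "depends-on"})

included = fs.view("paper.tex", relation="includes")
```

Calling `view` with a different `relation` yields a different "directory listing" for the same file, which is the customizable-view idea in miniature.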
Every file in a file system has a history, and that history begins with the file's provenance. As files are manipulated and moved, information about their history could be accumulated automatically. Provenance and history provide metadata that can be very useful for locating and organizing information. A file may be the result of a computation, an extract from a personal communication, created locally, or downloaded from the Web (with Web context information, one can determine whether a new version of a file is available or whether the corresponding Web resources have disappeared). Knowing the details of a file's history can be useful in many areas, including caching (a file may often be fetched repeatedly from the Web), volatility and permanence, and intrusion monitoring. E-mail and instant messaging applications can send attachments such as calendar entries, to-do lists, and other files to users. Storing the provenance of these files in automatically created attributes or links creates a record that helps in understanding the context in which the files were acquired. E-mails and chats themselves can be explicitly linked to the files that represent the correspondence partners.

Skytopia : Towards a single folder metadata (database) filesystem

This article was written after much irritation with the present filesystem used in almost all computer operating systems today. Excluding Windows stuff, I currently have around 80,000 files on my hard drive - over half of which are sound files of various kinds and formats. After organizing such a vast collection with so many different file formats, it's clear that the present way of finding and storing files is much less efficient than a meta-tagged single folder filesystem (or database file system) for various reasons I have discussed below.

Microsoft and Apple have shown signs that the upcoming Windows Longhorn OS and Mac OS X will add some emphasis to file metadata, but I have my doubts that they will fully realize its potential, and utilize a single 'folder' for practically all files.