Cyberborean Chronicles

SCAN Mail plugin

I’d like to announce that SCAN can now work with your email. The mail plugin released yesterday introduces a new type of location for the SCAN repository: mailbox locations.

The plugin’s purpose is to crawl the specified local email folders and aggregate the email messages as documents in the SCAN repository. It also introduces a new document type, “message/rfc822”, for emails and uses the message headers to set document metadata. Attached files are extracted from messages and processed as separate documents, with the appropriate parser chosen according to each file’s content type.
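To illustrate the idea of turning message headers into document metadata, here is a minimal sketch in plain Java. It is not the plugin's code (the plugin relies on JavaMail for parsing); the class name HeaderMetadata and the choice to lower-case header names and keep only the first occurrence of each are assumptions made for this example. It expects a single RFC 822 message as input, not a whole mbox file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
class HeaderMetadata {

    /** Reads the header section of one raw message into a name -> value map. */
    static Map<String, String> parse(String rawMessage) {
        Map<String, String> headers = new LinkedHashMap<>();
        String current = null; // header that continuation lines belong to
        try (BufferedReader reader = new BufferedReader(new StringReader(rawMessage))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    break; // a blank line ends the header section
                }
                char first = line.charAt(0);
                if (first == ' ' || first == '\t') {
                    // Folded header: append to the previous value, if we kept it.
                    if (current != null) {
                        headers.merge(current, line.trim(), (a, b) -> a + " " + b);
                    }
                    continue;
                }
                int colon = line.indexOf(':');
                if (colon <= 0) {
                    current = null; // not a header line, ignore it
                    continue;
                }
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                if (headers.containsKey(name)) {
                    current = null; // assumption: keep only the first occurrence
                } else {
                    headers.put(name, line.substring(colon + 1).trim());
                    current = name;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers;
    }
}
```

A crawler could then map entries such as “subject”, “from” and “date” to the corresponding document metadata fields. Encoded words (RFC 2047) are left undecoded in this sketch.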

As there is no common convention for identifying an individual message so that an external application can open it, the mail plugin implements its own message viewer UI and opens messages with it by default. Messages are identified with the “mid:” URI scheme, which is a standard (RFC 2392) but, unfortunately, does not seem to be supported by any known MUA so far. It is implemented anyway, in the hope that future MUAs will support this standard scheme for opening specific messages from the command line (something like “thunderbird mid:message-id”).
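As a rough illustration of the “mid:” scheme, the following sketch builds such a URI from a Message-ID header value and converts it back. The class name MidUri and its methods are hypothetical and are not taken from the plugin; the sketch relies only on java.net.URI and does not cover every escaping rule of RFC 2392.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
class MidUri {

    /** Builds a "mid:" URI from a raw Message-ID header value such as "<id@host>". */
    static URI fromMessageId(String messageIdHeader) {
        String id = messageIdHeader.trim();
        // The angle brackets are part of the header syntax, not of the URI.
        if (id.startsWith("<") && id.endsWith(">")) {
            id = id.substring(1, id.length() - 1);
        }
        try {
            // This constructor percent-encodes characters that are illegal in a URI.
            return new URI("mid", id, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable Message-ID: " + messageIdHeader, e);
        }
    }

    /** Restores the Message-ID header value from a "mid:" URI. */
    static String toMessageId(URI mid) {
        if (!"mid".equalsIgnoreCase(mid.getScheme())) {
            throw new IllegalArgumentException("Not a mid: URI: " + mid);
        }
        // getSchemeSpecificPart() returns the decoded part after "mid:".
        return "<" + mid.getSchemeSpecificPart() + ">";
    }
}
```

With a helper like this, a viewer could accept a “mid:” argument, recover the Message-ID and look the message up in its index.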

The mail plugin uses the GNU JavaMail implementation because it includes JavaMail providers for local mail stores: mboxes and maildirs. In addition, Outlook Express mail is supported via a separate JavaMail provider made by the jmbox project. Thus, the mail plugin supports a wide range of popular MUAs, including mbox-based (Mozilla family), maildir-based, mixed mbox/maildir (KMail and Evolution) and Outlook Express.

The plugin has been tested on Linux with Mozilla Thunderbird 1.5 and KMail 1.9.6 (in maildir mode), and on Windows XP with Mozilla Thunderbird 1.5 and MS Outlook Express 5.0.


There are known issues, though. They stem both from GNU JavaMail limitations and from the particular MUAs, which seem to interpret the mbox/maildir principles loosely for their own convenience. For instance, Thunderbird uses non-standard “.sbd” directories to keep mail subfolders, which the mbox provider is unaware of, so recursive crawling of Thunderbird mail folders does not work. The only solution for such problems is to develop MUA-specific JavaMail providers that know how to deal with a particular mbox or maildir implementation and all its peculiarities.
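To show what a Thunderbird-aware crawl might look like, here is a small sketch that walks a mail directory and descends into “.sbd” directories. It is not part of the plugin or of GNU JavaMail; the class name ThunderbirdFolders is hypothetical, and the rule “a file X is an mbox if a sibling X.msf summary file exists” is a heuristic assumed for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class ThunderbirdFolders {

    /** Returns the mbox files found under root, following ".sbd" subfolder directories. */
    static List<Path> findMboxFiles(Path root) {
        List<Path> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Path dir, List<Path> out) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry)) {
                    // "Inbox.sbd" holds the subfolders of the "Inbox" mbox.
                    if (name.endsWith(".sbd")) {
                        collect(entry, out);
                    }
                } else if (!name.endsWith(".msf")
                        && Files.isRegularFile(entry)
                        && Files.exists(dir.resolve(name + ".msf"))) {
                    // Assumed heuristic: an mbox file has a sibling ".msf" summary file.
                    out.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each path returned this way could then be handed to an mbox provider as a separate folder, which sidesteps the provider's lack of “.sbd” support without changing the provider itself.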

The Outlook Express provider, as an example of such an MUA-specific implementation, works rather well; however, we noticed a bug with localized MSOE versions, where translated folder names (“Inbox”, “Sent”, “Trash”, etc.) containing non-Latin letters were not processed.

Reports on testing the mail plugin with other MUAs and platforms are more than welcome.
