<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cyberborean Chronicles &#187; metadata</title>
	<atom:link href="http://blog.cyberborean.org/tag/metadata/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.cyberborean.org</link>
	<description>by Alex Alishevskikh</description>
	<lastBuildDate>Wed, 16 Dec 2009 04:33:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>More automation</title>
		<link>http://blog.cyberborean.org/2008/01/23/more-automation</link>
		<comments>http://blog.cyberborean.org/2008/01/23/more-automation#comments</comments>
		<pubDate>Wed, 23 Jan 2008 00:07:32 +0000</pubDate>
		<dc:creator>Alex Alishevskikh</dc:creator>
				<category><![CDATA[Essays]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[SCAN]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[tagging]]></category>

		<guid isPermaLink="false">http://cyberborean.wordpress.com/2008/01/23/more-automation/</guid>
		<description><![CDATA[I&#8217;m thinking about a new feature for SCAN: conditional actions to be executed individually or in a batch on selected documents. It would be useful for automating metadata setting or for defining custom autotagging rules.

The idea is borrowed from e-mail clients, where a similar feature has existed for decades in the form of user-defined filters for [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m thinking about a new feature for <a href="http://scan.sf.net">SCAN</a>: conditional actions to be executed individually or in a batch on selected documents. It would be useful for automating metadata setting or for defining custom autotagging rules.</p>
<p><span id="more-200"></span></p>
<p>The idea is borrowed from e-mail clients, where a similar feature has existed for decades in the form of user-defined <a href="http://en.wikipedia.org/wiki/E-mail_filtering">filters</a> for processing messages. This is how it looks in KMail:</p>
<p><img src="http://cyberborean.org/blog/wp-content/uploads/2008/01/kmail-filters.png" alt="kmail-filters.png" /></p>
<p>In general, a filter checks whether a document matches specific criteria (a rule) and, if it does, performs some action. For instance, if the condition &#8220;text contains &#8216;viagra&#8217; or &#8216;cialis&#8217;&#8221; is true, then some action (&#8220;move to spam&#8221; or &#8220;send assassins to the author&#8221;) is executed. What is especially good is that it&#8217;s an old, popular and intuitive user experience.</p>
<p>In a content aggregator like SCAN, this concept could let users define custom automation rules for setting document metadata properties. For instance,</p>
<p><code>IF (url starts with "http://cyberborean.wordpress.com") SET author = "me"</code></p>
<p>Another use I have in mind is to augment the &#8220;artificial intelligence&#8221; of autotagging with human intelligence, through user-defined tagging rules:</p>
<p><code>IF (text contains "latte") ADD TAG "coffee"</code></p>
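<p>Both rules above have the same shape as a mail filter: a condition paired with an action. Below is a minimal Java sketch of that shape, assuming a hypothetical document model; it is not SCAN code, and the <code>Doc</code>, <code>Condition</code>, <code>Action</code> and <code>Rule</code> types and their fields are illustrative names only.</p>

```java
import java.util.Arrays;

// Hypothetical document model used only for this sketch.
class Doc {
    String url;
    String text;
    String author = "";
    String[] tags = new String[0];

    Doc(String url, String text) {
        this.url = url;
        this.text = text;
    }

    // Adds a tag unless the document already carries it.
    void addTag(String tag) {
        if (Arrays.asList(tags).contains(tag)) {
            return;
        }
        tags = Arrays.copyOf(tags, tags.length + 1);
        tags[tags.length - 1] = tag;
    }
}

// The "IF" part of a rule.
interface Condition {
    boolean test(Doc d);
}

// The action part of a rule (SET a property, ADD TAG, ...).
interface Action {
    void apply(Doc d);
}

// A filter-style rule: when the condition holds, the action runs.
class Rule {
    final Condition condition;
    final Action action;

    Rule(Condition condition, Action action) {
        this.condition = condition;
        this.action = action;
    }

    // Batch mode: runs every rule against every selected document, in order.
    static void applyAll(Rule[] rules, Doc[] docs) {
        for (Doc d : docs) {
            for (Rule r : rules) {
                if (r.condition.test(d)) {
                    r.action.apply(d);
                }
            }
        }
    }
}
```

<p>With these types, the first rule could be written as <code>new Rule(d -> d.url.startsWith("http://cyberborean.wordpress.com"), d -> d.author = "me")</code> and the second as <code>new Rule(d -> d.text.contains("latte"), d -> d.addTag("coffee"))</code>. Keeping conditions and actions as separate interfaces lets new kinds of either be added independently.</p>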
<p>My only doubt is about terminology: &#8220;filter&#8221; might be confusing, as it is already used in the SCAN vocabulary (URI filters that include or exclude documents by their URI pattern). Any ideas?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cyberborean.org/2008/01/23/more-automation/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SCAN project announce</title>
		<link>http://blog.cyberborean.org/2007/09/14/scan-project-announce</link>
		<comments>http://blog.cyberborean.org/2007/09/14/scan-project-announce#comments</comments>
		<pubDate>Fri, 14 Sep 2007 12:36:54 +0000</pubDate>
		<dc:creator>Alex Alishevskikh</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[desktop]]></category>
		<category><![CDATA[IA]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[SCAN]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[tagging]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://cyberborean.wordpress.com/2007/09/14/scan-project-announce/</guid>
		<description><![CDATA[ViceVersa Technologies presents the first public release of the SCAN (Smart Content Aggregation and Navigation) platform. SCAN is a personal Information Retrieval framework, combining search, text analysis, tagging and metadata functions to provide a new user experience of desktop [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://scan.sourceforge.net"><img src='http://cyberborean.org/blog/wp-content/uploads/2007/09/scan100i.png' alt='SCAN' align='left' hspace='5'></a><em>ViceVersa Technologies presents the first public release of the <a href="http://scan.sourceforge.net">SCAN (Smart Content Aggregation and Navigation)</a> platform. SCAN is a personal Information Retrieval framework, combining search, text analysis, tagging and metadata functions to provide a new user experience of desktop navigation and document management.</em></p>
<p><span id="more-179"></span></p>
<h3>About SCAN</h3>
<blockquote><p>&#8220;&#8230; the abundance of information will be such that either you have reached such a level of maturity that you are able to be your own filter, or you will desperately need a filter&#8230; some professional filter.&#8221;<br />
<em>Umberto Eco: A Conversation on Information<br />
(<a href="http://carbon.cudenver.edu/~mryder/itc_data/eco/eco.html" class="broken_link" >an interview by Patrick Coppock</a>, February 1995)</em></p></blockquote>
<p style="margin-top:0;">SCAN aims to solve major problems of content organization and findability in the era of information overload.</p>
<p style="margin-top:0;"><a href='http://scan.sourceforge.net/uploads/images/browse.png' title='Browse documents'><img src='http://scan.sourceforge.net/uploads/images/browse_tmb.png' alt='Browse documents' align='left' hspace='5' vspace='5' /></a>SCAN aggregates content from different sources into a single document collection. This repository can keep records of thousands of documents, independently of their original locations and formats. Every document record contains a number of metadata properties (such as title, description, author, creation date, etc.), which can be either set automatically or edited manually.</p>
<p style="margin-top:0;">Adding documents to the repository is an automated operation. A user only needs to point SCAN to a location, and the application will find and add every document there. Added document locations are monitored for changes (new, modified or deleted documents) to keep the repository up to date.</p>
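<p>One simple way to detect such changes, assumed here for illustration rather than taken from SCAN, is to compare two snapshots of a location, each listing file paths with their last-modified times. The <code>FileStamp</code> and <code>SnapshotDiff</code> classes below are hypothetical; linear lookups keep the sketch short, whereas a real monitor would more likely use a hash-based index or the JDK's <code>java.nio.file.WatchService</code>.</p>

```java
import java.util.Arrays;

// Hypothetical snapshot entry: a file path and its last-modified time.
class FileStamp {
    final String path;
    final long modified;

    FileStamp(String path, long modified) {
        this.path = path;
        this.modified = modified;
    }
}

class SnapshotDiff {
    // Linear lookup by path; returns null when the path is not in the snapshot.
    static FileStamp find(FileStamp[] snapshot, String path) {
        for (FileStamp f : snapshot) {
            if (f.path.equals(path)) {
                return f;
            }
        }
        return null;
    }

    // Returns a copy of the array with one more element at the end.
    static String[] append(String[] list, String item) {
        String[] grown = Arrays.copyOf(list, list.length + 1);
        grown[list.length] = item;
        return grown;
    }

    // Lists the changes between two scans as "NEW", "MODIFIED" or "DELETED" lines.
    static String[] diff(FileStamp[] before, FileStamp[] after) {
        String[] changes = new String[0];
        for (FileStamp now : after) {
            FileStamp old = find(before, now.path);
            if (old == null) {
                changes = append(changes, "NEW " + now.path);
            } else if (old.modified != now.modified) {
                changes = append(changes, "MODIFIED " + now.path);
            }
        }
        for (FileStamp old : before) {
            if (find(after, old.path) == null) {
                changes = append(changes, "DELETED " + old.path);
            }
        }
        return changes;
    }
}
```

<p>Each reported line would then map to one repository operation: index the new document, re-index the modified one, or drop the deleted one's record.</p>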
<p style="margin-top:0;">Document content is indexed for search and text analysis. You can search the documents either with simple text queries or by using special forms to build complex queries on document text and properties. Queries can be saved for repeated use.</p>
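<p>A saved query of this kind can be thought of as a text term plus a set of required property values. Here is a minimal sketch under that assumption, using hypothetical <code>DocRecord</code> and <code>SavedQuery</code> classes (not SCAN's query API) and plain substring matching in place of a real full-text index:</p>

```java
import java.util.Properties;

// Hypothetical document record: full text plus metadata properties.
class DocRecord {
    final String text;
    final Properties props = new Properties();

    DocRecord(String text) {
        this.text = text;
    }
}

// Hypothetical saved query: an optional text term plus required property values.
class SavedQuery {
    final String name;
    final String textTerm; // null matches any text
    final Properties required = new Properties();

    SavedQuery(String name, String textTerm) {
        this.name = name;
        this.textTerm = textTerm;
    }

    // Adds a property constraint and returns this query so constraints can be chained.
    SavedQuery where(String key, String value) {
        required.setProperty(key, value);
        return this;
    }

    // Case-insensitive substring match on the text, exact match on each required property.
    boolean matches(DocRecord d) {
        if (textTerm != null && !d.text.toLowerCase().contains(textTerm.toLowerCase())) {
            return false;
        }
        for (String key : required.stringPropertyNames()) {
            if (!required.getProperty(key).equals(d.props.getProperty(key))) {
                return false;
            }
        }
        return true;
    }
}
```

<p>Because a query is just data (a name, a term and constraints), storing it for repeated use amounts to persisting those fields.</p>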
<p style="margin-top:0;"> <a href='http://scan.sourceforge.net/uploads/images/tags.png' title='Tags panel'><img src='http://scan.sourceforge.net/uploads/images/tags_tmb.png' alt='Tags panel' align='left' hspace='5' vspace='5' /></a>The document collection is structured with a system of tags, similar to services like <a href="http://del.icio.us/">del.icio.us</a> or <a href="http://flickr.com/">Flickr</a>. Tags are keywords or labels attached to items to identify them for quick navigation and finding. Together, all tags form a <em>taxonomy</em> representing the semantics of the document collection. The taxonomy can be viewed as a &#8220;tag cloud&#8221; for navigating through the document repository.</p>
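<p>A common way to draw a tag cloud, shown here as an assumption rather than SCAN's actual formula, is to scale each tag's font size linearly between the least and the most used tags. In this sketch, the hypothetical <code>TagCloud.sizeLevels</code> maps per-tag document counts to size levels from 1 to <code>levels</code>:</p>

```java
class TagCloud {
    // Maps each tag's document count to a size level in 1..levels by linear
    // scaling between the smallest and largest count (integer division rounds down).
    static int[] sizeLevels(int[] counts, int levels) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        int[] result = new int[counts.length];
        int i = 0;
        for (int c : counts) {
            // When every tag has the same count, all tags get the smallest size.
            result[i++] = (max == min) ? 1 : 1 + (c - min) * (levels - 1) / (max - min);
        }
        return result;
    }
}
```

<p>For example, tags used by 1, 5 and 9 documents with five size levels get levels 1, 3 and 5.</p>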
<p style="margin-top:0;">SCAN&#8217;s text analysis mechanism simplifies the process of tagging. It analyzes a document&#8217;s content and suggests the most relevant words as candidate tags, which makes manual tagging as simple as selecting tags from the proposed candidates. It can also take over the whole manual tagging process, either by automatically assigning tags to documents or by finding the documents relevant to a specific tag. Another application of text analysis is searching for documents similar to a specific one (search by pattern).</p>
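<p>SCAN&#8217;s actual relevance measure is not described here; as a rough illustration only, candidate tags could be ranked by raw term frequency after dropping short words and stop words. The <code>TagSuggester</code> class and its tiny stop-word list are hypothetical:</p>

```java
import java.util.Arrays;

// Hypothetical tag suggester: ranks the words of a text by raw frequency.
class TagSuggester {
    // A tiny illustrative stop-word list; a real analyzer would use a fuller, language-specific one.
    static final String[] STOP_WORDS = { "the", "and", "for", "with", "that", "this", "are", "from" };

    static boolean isStopWord(String word) {
        return Arrays.asList(STOP_WORDS).contains(word);
    }

    // Returns up to k candidate tags: the most frequent words of three or more
    // characters, ignoring stop words. Ties are broken alphabetically.
    static String[] suggest(String text, int k) {
        String[] words = text.toLowerCase().split("[^\\p{L}\\p{N}]+");
        Arrays.sort(words); // equal words become adjacent, so each term forms one run

        // Distinct terms and their counts, collected run by run.
        String[] terms = new String[words.length];
        int[] counts = new int[words.length];
        int n = 0;
        for (String w : words) {
            if (w.length() < 3 || isStopWord(w)) {
                continue;
            }
            if (n > 0 && terms[n - 1].equals(w)) {
                counts[n - 1]++;
            } else {
                terms[n] = w;
                counts[n] = 1;
                n++;
            }
        }

        // Pick the k highest counts; the alphabetically first term wins a tie.
        int take = Math.min(k, n);
        String[] result = new String[take];
        for (int r = 0; r < take; r++) {
            int best = -1;
            for (int j = 0; j < n; j++) {
                if (counts[j] > 0 && (best == -1 || counts[j] > counts[best])) {
                    best = j;
                }
            }
            result[r] = terms[best];
            counts[best] = 0; // exclude this term from later rounds
        }
        return result;
    }
}
```

<p>For the text &#8220;Coffee notes: latte, espresso and latte art. Latte is coffee.&#8221; and <code>k = 2</code>, this suggests <code>latte</code> and <code>coffee</code>. A real analyzer would more likely weight terms against the whole collection (for example with TF-IDF), so that words common to every document are not proposed as tags.</p>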
<p style="margin-top:0;">SCAN is component-based software that uses a number of plugins for specific features. The basic SCAN platform can easily be extended with plugins for new document formats, document locations (RSS feeds, web sites, e-mail, etc.) and language analyzers. Whole new areas of functionality can be added with user interface extensions. An example of such an extension is the plugin for browsing the repository with a calendar (grouping documents by their creation dates).</p>
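<p>The plugin idea can be illustrated with a small extension point for document formats. This is a sketch under stated assumptions, not SCAN&#8217;s plugin API: <code>FormatPlugin</code>, <code>PluginRegistry</code> and the two sample plugins are hypothetical.</p>

```java
import java.util.Arrays;

// Hypothetical extension point for document formats.
interface FormatPlugin {
    boolean accepts(String fileName);

    String extractText(String rawContent);
}

// Sample plugin: plain text needs no conversion.
class PlainTextPlugin implements FormatPlugin {
    public boolean accepts(String fileName) {
        return fileName.endsWith(".txt");
    }

    public String extractText(String rawContent) {
        return rawContent;
    }
}

// Sample plugin: treats CSV cell separators as word breaks so cells can be indexed.
class CsvPlugin implements FormatPlugin {
    public boolean accepts(String fileName) {
        return fileName.endsWith(".csv");
    }

    public String extractText(String rawContent) {
        return rawContent.replace(',', ' ');
    }
}

class PluginRegistry {
    private FormatPlugin[] plugins = new FormatPlugin[0];

    // Supporting a new format means registering one more plugin; the core is unchanged.
    void register(FormatPlugin plugin) {
        plugins = Arrays.copyOf(plugins, plugins.length + 1);
        plugins[plugins.length - 1] = plugin;
    }

    // Returns the first registered plugin that accepts the file, or null if none does.
    FormatPlugin find(String fileName) {
        for (FormatPlugin p : plugins) {
            if (p.accepts(fileName)) {
                return p;
            }
        }
        return null;
    }
}
```

<p>In a real Java application, implementations of such an interface could also be discovered on the classpath at runtime with the JDK&#8217;s <code>java.util.ServiceLoader</code>, instead of being registered by hand.</p>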
<p style="margin-top:0;">SCAN is a <a href="http://java.sun.com/">Java</a> application, so it works on any Java-enabled platform. SCAN is free, open source software, distributed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p>
<h3>See also:</h3>
<ul>
<li><a href="http://scan.sourceforge.net/?page_id=19">List of current features</a></li>
<li><a href="http://sourceforge.net/project/showfiles.php?group_id=189359">Download SCAN</a></li>
<li><a href="http://scan.sourceforge.net/?page_id=7">How to obtain SCAN sources from SVN repository</a></li>
<li><a href="http://scan.sourceforge.net/?page_id=4">User&#8217;s Manual</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.cyberborean.org/2007/09/14/scan-project-announce/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Survey: How do you find your documents?</title>
		<link>http://blog.cyberborean.org/2007/03/26/survey-how-do-you-find-your-documents</link>
		<comments>http://blog.cyberborean.org/2007/03/26/survey-how-do-you-find-your-documents#comments</comments>
		<pubDate>Mon, 26 Mar 2007 15:50:00 +0000</pubDate>
		<dc:creator>Alex Alishevskikh</dc:creator>
				<category><![CDATA[Essays]]></category>
		<category><![CDATA[desktop]]></category>
		<category><![CDATA[IA]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[survey]]></category>
		<category><![CDATA[tagging]]></category>
		<category><![CDATA[Usability]]></category>

		<guid isPermaLink="false">http://cyberborean.wordpress.com/2007/03/26/survey-how-do-you-find-your-documents/</guid>
		<description><![CDATA[For my ongoing research, it would be helpful to gather feedback from random people on their personal document management, navigation and information-seeking preferences. Please send your answers to alexeya (at) gmail (dot) com or just leave a comment below. Thanks in advance!


When you are looking for a document on your machine, the following [...]]]></description>
			<content:encoded><![CDATA[<p>For my ongoing research, it would be helpful to gather feedback from random people on their personal document management, navigation and information-seeking preferences. Please send your answers to alexeya (at) gmail (dot) com or just leave a comment below. Thanks in advance!</p>
<p><span id="more-153"></span></p>
<ol>
<li><strong>When you are looking for a document on your machine, which of the following navigation hints are important? (Specify an order, e.g. &#8220;a, d, c, b&#8221;.)</strong></li>
<p>a. File name<br />
b. File name and folder<br />
c. Date of creation/modification<br />
d. Document metadata properties<br />
e. Full-text search results</p>
<li><strong>Do you use document titles (or other semantic data) as file names?</strong></li>
<p>a. Yes, always<br />
b. For important documents only<br />
c. Don&#8217;t care about the file names</p>
<li><strong>Do you keep some kind of semantic folder structure for storing your documents?</strong></li>
<p>a. Yes, and I keep all my documents in a single folder hierarchy, organized semantically<br />
b. Yes, but I use the structure for important documents only<br />
c. Use random folder structures, depending on the context of my work<br />
d. Don&#8217;t care about where I&#8217;m saving my files</p>
<li><strong>Do you use document shortcuts (Windows) or symbolic links (*nix) to improve navigation?</strong></li>
<p>a. Yes, often<br />
b. Rarely<br />
c. No</p>
<li><strong>How often do you use the system full-text search to find a specific document?</strong></li>
<p>a. This is my everyday way of finding documents<br />
b. Only when I need to find the document quickly<br />
c. Only when I have given up finding the document by other means<br />
d. Only when I&#8217;m not sure a document on this topic exists<br />
e. Never use the search for this purpose</p>
<li><strong>Do you use &#8220;advanced search&#8221; capabilities?</strong></li>
<p>a. Yes, often<br />
b. Rarely<br />
c. No, basic search is enough</p>
<li><strong>Do you use the system &#8220;Recent documents&#8221; list?</strong></li>
<p>a. Yes, often<br />
b. Rarely<br />
c. No</p>
<li><strong>Do you enter metadata (title, author, subject, keywords, etc.) in the &#8220;Document properties&#8221; dialog box of your text editor?</strong></li>
<p>a. Yes, always<br />
b. For important documents only<br />
c. Don&#8217;t care about it</p>
<li><strong>How do you mark the importance of a document?</strong></li>
<p>a. Place it into a special folder<br />
b. Place it on the desktop<br />
c. Bookmark it (place it in &#8220;Favorites&#8221;)<br />
d. Do nothing</p>
<li><strong>How many documents (as a percentage of all documents on your machine) are actively used?</strong></li>
<li><strong>How do you handle outdated documents?</strong></li>
<p>a. Keep them in place<br />
b. Move to a special folder<br />
c. Move to backup media, then delete<br />
d. Delete them</p>
<li><strong>Which formats do you use for text documents (specify in order of importance, e.g. &#8220;a, d, c, b&#8221;)?</strong></li>
<p>a. MS Word<br />
b. OpenDocument<br />
c. PDF<br />
d. HTML<br />
e. XML or SGML (DocBook etc)<br />
f. Plain text<br />
g. Other</p>
<li><strong>Do you save online documents on your local hard drive?</strong></li>
<p>a. Yes, often<br />
b. Save important or very large documents only<br />
c. Never</p>
<li><strong>Do you keep some kind of personal electronic library?</strong></li>
<p>a. Yes<br />
b. No</p>
<li><strong>Do you use specialized software for photo albums or multimedia collections management?</strong></li>
<p>a. Yes<br />
b. No, standard system tools are enough</p>
<li><strong>How do you estimate the effort of maintaining your local document collections?</strong></li>
<p>a. It is a burden; it takes a lot of my time and harms my work<br />
b. It takes some time, but it is worth it<br />
c. It is not a problem with the help of modern desktops<br />
d. Do not see any problem</p>
<li><strong>Do you use <a href="http://del.icio.us">del.icio.us</a>?</strong></li>
<p>a. Yes, for every interesting thing I come across on the Web<br />
b. Yes, for important links only<br />
c. Yes, for links I want to share with somebody else<br />
d. No</p>
<li><strong>How many tags are in your del.icio.us profile?</strong></li>
<li><strong>How many del.icio.us tags do you usually assign to a single link (on average)?</strong></li>
<li><strong>When choosing del.icio.us tags, you prefer:</strong></li>
<p>a. Your own tags<br />
b. Other people&#8217;s tags, suggested by the service</p>
<li><strong>When choosing del.icio.us tags, you prefer to:</strong></li>
<p>a. Reuse the existing tags, as possible<br />
b. Create new tags</p>
<li><strong>Do you try to avoid tag synonymy?</strong></li>
<p>a. Yes<br />
b. No, I don&#8217;t care about synonyms</p>
<li><strong>Do you use tag bundles?</strong></li>
<p>a. Yes<br />
b. No</p>
<li><strong>Which factors are important for tag selection (specify in order of importance, e.g. &#8220;a, d, c, b&#8221;)?</strong></li>
<p>a. My own subjective associations<br />
b. My vision of the implicit topic semantics (tend to be objective)<br />
c. Explicit textual properties of the document (term frequency, etc.)<br />
d. Tags, assigned by other people</p>
<li><strong>The best synonym for &#8220;tag&#8221; is:</strong></li>
<p>a. Category<br />
b. Term<br />
c. Topic<br />
d. Keyword<br />
e. Label</p>
<li><strong>The purpose of the tags is:</strong></li>
<p>a. Distinction<br />
b. Unification</p>
<li><strong>The number of tags per document is a measure of:</strong></li>
<p>a. Document importance<br />
b. Information diversity<br />
c. Collection size<br />
d. Selection quality</p>
<li><strong>Would automatic tag generation be useful?</strong></li>
<p>a. Yes, and it could completely replace human brains in this area<br />
b. Yes, but only as an aid to human brains<br />
c. No, the tags should belong to humans</p>
</ol>
<p>Thank you!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cyberborean.org/2007/03/26/survey-how-do-you-find-your-documents/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
