New videos from Michael Wesch, presented during the opening keynote at IDEA:
This video explores the changes in the way we find, store, create, critique, and share information. This video was created as a conversation starter, and works especially well when brainstorming with people about the near future and the skills needed in order to harness, evaluate, and create information effectively.
I found an interesting post on the Drupal support list about finding the right method for using taxonomy in Drupal.
I am using the taxonomy module and have several vocabularies added , several look like this: 'Politics' which has Single Hierarchy and Multiple select, and has several Terms such as 'Countries', 'Groups', 'Ideas', and Item such as countries has 'sub' Items such as 'USA', 'England', 'Germany' etc. Now.. whenever I post a new article or blog entry and tag it, say.. USA, I still need to tag Countries and Politics as well or otherwise when clicking Countries it will show an empty list. Is this the right practice?
I think this is probably the biggest issue when approaching description problems with a content management system. In a large enterprise CMS, the problem might be handled by maintaining taxonomies with sophisticated tools, included with or added to the CMS, that find and group content based on rules. But people dealing with Open Source CMS have been left to the task of figuring out how to deal with these same problems with a less sophisticated set of tools and plugins.
Fortunately, Drupal has recently added the Views module to at least help admins deal with display issues more easily. Taxonomy management is an area that still needs more attention. But for most people, the basic question about what practices are best suited to taxonomy use in the more full-featured systems like Drupal persists. Do you use hierarchy in taxonomies? Do you just free tag everything? How do you set up taxonomies for the outcome you expect or desire?
Of course it depends on the type of description/categorization you intend to do. To pick the right method for creating taxonomies you need to focus on how you want to extract or display that information and on how much effort you want to expend on maintaining it. The Views module seems like it will give you the flexibility you need in most cases for creating display rules. But what you want to consider is whether or not the structure you are creating will require you to spend time maintaining the views over time. For instance, given the Drupal Support example above, say the admin starts with the following structure in his Politics taxonomy:
- Countries
  - Afghanistan
  - Algeria
  - Andorra
- Groups
  - Al-Qaeda
  - Amnesty International
Now, say that he wants to display all of the content that comes in under the Groups category, including all of its subcategories. He can create a Groups View and select each descendant term in the taxonomy filter. But I don't believe there is a way to simply say "give me every descendant of the Groups term and show that in my view". If there is a way to do that with Views, I'd love to see it. So instead, he has to select the terms in the filter's select field.
But, say he adds "Animal Rights Advocates" under Groups. Now he wants to update his View. What I think he would need to do to update the view is go back to the Views filter and now select "Animal Rights Advocates" as well. He would need to do that for every term added to Groups.
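To make the "give me every descendant" idea concrete, here is a minimal sketch in plain Python. It is not Drupal or Views code; the `PARENTS` mapping and the `descendants` function are hypothetical names, and the data is just the Politics example from above. The point is only that if the filter were computed from the hierarchy, adding a term under Groups would not require touching the view.

```python
from collections import defaultdict

# Hypothetical term -> parent mapping for the Politics vocabulary.
# None marks a top-level term. This is illustration data, not a Drupal structure.
PARENTS = {
    "Countries": None,
    "Afghanistan": "Countries",
    "Algeria": "Countries",
    "Andorra": "Countries",
    "Groups": None,
    "Al-Qaeda": "Groups",
    "Amnesty International": "Groups",
}


def descendants(parents, root):
    """Return every term below `root`, at any depth, sorted by name.

    Assumes a single hierarchy with no cycles (each term has one parent).
    """
    # Invert the term -> parent mapping into parent -> children.
    children = defaultdict(list)
    for term, parent in parents.items():
        if parent is not None:
            children[parent].append(term)

    # Walk down from the root, collecting every term we reach.
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children[current]:
            found.append(child)
            stack.append(child)
    return sorted(found)


print(descendants(PARENTS, "Groups"))

# Adding a term under Groups changes the result without any manual filter update.
PARENTS["Animal Rights Advocates"] = "Groups"
print(descendants(PARENTS, "Groups"))
```

With something like this behind the filter, the maintenance cost moves from the view to the hierarchy itself, which is where the terms have to be filed anyway.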
In the example above, Countries and Groups are facets of description. And while he could choose to create a taxonomy for each facet, he's chosen to create a single taxonomy for the Politics domain and nest the facets inside it.
In the past -- prior to the release of the Views module -- I've worked around the display maintenance issues by creating separate taxonomies for each facet. See the taxonomies in use on my blog for instance. Yikes, right? I've used that separation to break up the display of my metadata below each entry so I can show terms broken up by taxonomy, e.g. Subjects, People, File formats. The output looks like this:
That works for the display need under each blog entry. But you'll notice when you look at the taxonomies for each facet that I also added some complexity by using hierarchy within some of the taxonomies, e.g. Subjects and People. Taking the People example, if I wanted a view that only showed Groups of people rather than individuals, I'd run into the same maintenance problem you see in the Politics taxonomy of having to manually update the Views filter again.
To make matters more complex, I am starting to use and love the new free tagging capabilities of Taxonomy for selecting my terms. I started to do this because my Subjects taxonomy is rather large and choosing multiple options in a form Select can be problematic. The free tagging options allow me to select terms much more quickly. The problem is that I've also become accustomed to adding terms using this tool. This "tagging as you go" behavior conflicts with my previous habit of adding terms to the hierarchical taxonomies before I blog. As I now add free tags, they end up in the root of the Subjects taxonomy, so I still have to go back and file each new term under a parent for the hierarchy to continue to make sense. And because the Taxonomy module doesn't presently provide a way to filter terms in the admin view, e.g. to find the term I just added without paging through the list, this task has become cumbersome as an after-the-fact activity.
Here's what I see in the Categories section when I create entries in Drupal:
This is a tricky issue, because I'm at a point where I could decide to go two different ways: to dump the hierarchy within taxonomies because of the level of effort it requires to maintain; or to continue to maintain the hierarchy within taxonomies because it provides some value in terms of browsing. I'm sure the browsing of hierarchies is only valuable to me, but when I think about the amount of energy expended compared with value, my gut tells me that it's not worth the effort to maintain.
Doing this kind of hierarchical taxonomy management on a blog is probably unheard of, but in certain applications, e.g. enterprise content management, it can be absolutely warranted and/or required. As an information worker I tend to try out different methods for describing and organizing information to help me understand which practices work in different contexts. The last several years of using taxonomies on this blog and now introducing free tagging have helped me see that each method holds utility in different circumstances, but getting both to work together nicely is a bit tricky.
So, this may not be a very clear answer to the question about best practices for taxonomy. It does serve as a cautionary tale about how being very descriptive and maintaining relationships within one hierarchy can leave you with maintenance concerns as you scale up. In terms of ease of entry using Drupal, creating facets within one taxonomy might make it easier to select terms when you are creating new content and doing free tagging using one field in Drupal. But organizationally speaking, that approach seems to present you with some challenging display and maintenance issues, for now at least.
I like the meeting format described in this Business Week article on Marissa Mayer, VP of Search Products & User Experience at Google.
New features are digitally projected onto the right side of a conference room wall, big as a movie screen. Everything Mayer and others say is transcribed and projected on the left. Underneath both looms a giant mega-timer. Everyone gets an average deadline of 10 minutes. Mayer and her team add and subtract to the feature as time runs down. It is iteration at lightning speed.
While the formal quick pitch format is unnecessary in small groups, what makes sense is that it lets new features be proposed frequently, high up the chain of command in a large organization, with some amount of structure in terms of time constraints. I assume engineers working on the project can pitch directly to Mayer. The critique format also allows a good deal of iteration while exploring ideas so they can be worked on some more and revisited. This excerpt describes the process:
What's most fascinating to me is the projection of the demo on the screen and the immediate capture of the discussion, which I assume goes directly into that internal project management system they talk about. That's excellent. Capturing these transactions of verbal communication, although using brute force methods of manual transcription, is what knowledge management is about. The post-processing and information retrieval in their system is what glues it all together. That they're openly capturing everything in these meetings is what makes it effective KM work. It's not so hard to imagine all the pieces fitting together into a product or at least a process that could be sold as an idea for realistic KM at work:
- WebEx type presentation software
- Transcription (manual now, voice recognition later in the retrieval system?)
- Search mechanism with some simple hooks for metadata (parsing for fields based on minimal formatting conventions, e.g. "[field name]:")
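As a rough illustration of that last item, here is a minimal Python sketch of pulling metadata out of meeting notes using a "[field name]:" convention. Everything in it is an assumption for the sake of the example: the field names, the `parse_notes` function, and the sample notes are invented, and nothing here reflects how Google's internal system actually works.

```python
import re

# Assumed convention: a line of the form "Field name: value" is metadata,
# but only when the field name is one the system already knows about.
FIELD_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+)$")


def parse_notes(text, known_fields):
    """Split transcript text into a metadata dict and a list of body lines."""
    fields = {}
    body = []
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match and match.group(1).lower() in known_fields:
            # Recognized field: store it under a lowercased name.
            fields[match.group(1).lower()] = match.group(2).strip()
        elif line.strip():
            # Anything else (including "Speaker: ..." lines) stays in the body.
            body.append(line.strip())
    return fields, body


# Invented sample notes, for illustration only.
SAMPLE = """Project: Finance charts
Owner: hypothetical engineer
We walked through the resizable date range.
Decision: ship the drag handle
"""

fields, body = parse_notes(SAMPLE, {"project", "owner", "decision"})
print(fields)
print(body)
```

Restricting matches to known field names is a deliberate choice here: transcripts are full of colons, and a whitelist keeps ordinary speech from being mistaken for metadata.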
If enterprise meetings all went this way, mining of the types of tacit information usually floating around in conversations might begin to mean a bit more. Taken too far it could make a lot of information also mean less I suppose, but who cares as long as we have bigger hammers to tackle the signal to noise problem in retrieval. So I'm wondering if that type of process could be adopted as a model and be rolled into a solution. I want to see more under the hood at Google.
Baseline has a feature story exposing bits about How Google works and what we can learn from them. Most of the story focusses on the unique infrastructure Google has been building to support its expanding needs. But most interesting to me is the small bit that takes a look Inside Google's Enterprise. The article refers to Page and Brin's pronouncement in their IPO that the company is not conventional and doesn't intend to become so. And this appears to be true judging by the way they run the company internally. They won't follow the pattern of what's been done to run businesses in the past if they can find a better way themselves.
This is exactly the attitude that has slowly been building up a revolution inside the ranks at enterprises large and small. Knowledge workers, fed up with the way things are, have turned away from conventional software for managing the way they work in favor of better, simpler applications that get out of their way and let them get on with it.
The article talks about how Google uses a simple system that manages project information using relatively unstructured email as the interface. The system mails employees every week asking what they worked on the week prior and what they plan to work on during the current week. The response is parsed, indexed and fed into a searchable system that is open to the enterprise so that anyone can track the projects of other employees that interest them. They call it "living out loud".
What they're doing is creating an open system that matches an open knowledge sharing ecology. That openness allows for the "cross pollination" of ideas. Even better, it provides opportunity for the one thing that is driving nearly every aspect of the innovative web today -- open conversation. They're creating a system that better ensures sustainability because it works with an existing, accepted process -- communicating through email. This removes barriers to use because email is easy. The unstructured nature of the format also means that it can evolve with the needs of the system on the back end. The computer works harder so that the knowledge worker can just dash off a note and get on with their work.
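To show how little machinery the core idea needs, here is a minimal sketch of indexing free-text status replies so they can be searched. This is my own illustration in Python, not a description of Google's system: the `StatusIndex` class, its methods, and the sample replies are all hypothetical.

```python
import re
from collections import defaultdict


class StatusIndex:
    """A tiny inverted index over unstructured weekly status replies."""

    def __init__(self):
        self.replies = []                  # list of (author, text)
        self.postings = defaultdict(set)   # word -> ids of replies containing it

    def add_reply(self, author, text):
        """Store a reply and index each distinct word in it."""
        reply_id = len(self.replies)
        self.replies.append((author, text))
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(reply_id)
        return reply_id

    def search(self, query):
        """Return the replies that contain every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.replies[i] for i in sorted(ids)]


# Invented replies, for illustration only.
index = StatusIndex()
index.add_reply("alice", "Last week: tuned the crawler. This week: chart widgets.")
index.add_reply("bob", "Last week: chart widgets for finance. This week: more of the same.")

print(index.search("chart widgets"))
print(index.search("crawler"))
```

The person writing the reply does nothing special. All of the structure is imposed afterward by the indexer, which is the "computer works harder" half of the bargain.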
Wow, right? That's revolutionary thinking, and it's so simple on the user-facing end that you hardly have any excuse for not participating. And opportunists that exploit the system by mining and tracking with it will benefit from it immensely. This is the evolving face of knowledge management. The idea of telling the technology to get out of the way so we can do work is what's driving the enterprise blog and wiki revolution. We all need to publish, share and collaborate, but we want to do it as simply and effortlessly as possible. Google embodies that idea completely inside and out.
Google's new Finance site is really quite elegant. The site offers information on North American stocks, mutual funds and public and private companies along with charts, news and fundamental financial data. Different things to watch for here are interactive charts, and the blog and discussion group retrieval. Most of the other tear sheet type information, e.g. news, company profile (description), and finances you'll find on all of the other finance sites as well.
The line/spark line chart scrolling is cool. It automatically scrolls to the news for the period you are browsing in the chart. You can also change the range of dates in the chart by resizing the year widget -- mouse over the years at the top of the chart and a little resizing widget appears. When you drag and resize the date range, the main line graph shrinks or expands to show better detail on that range and the news box on the right refreshes to show only the items in that date range. Very nice, clean and simple use of AJAX.
Don Norman recently attempted a simplicity backlash after a few articles touted Google's simple UI as one of the reasons for its success. Most of these simplicity articles talk about the spareness of its search interface as opposed to Yahoo's, for instance. Finance people are also saying that Google is not presenting a clear enough strategy and that their tools are all over the place. I might agree with that. They have a lot of applications that never seem to make it out of Beta.
Norman says that the simplicity factor breaks down when you try to do anything outside of searching the web corpus. His argument is valid. If you view Google as a suite of tools for retrieving information, there is often a disconnection between the bodies of indexed data. The problem is rooted partly in poor information architecture and partly in poor interaction design. Norman is saying, I think, that the site doesn't yet allow the integration of the pieces into one UI, and rather segments it by application (and dare I say, by working group within Google?).
But when they do rich applications like Google Maps and this new Finance site, they DO do it rather simply and elegantly. (The Google News Reader on the other hand, ugh! That thing needs to take a lesson from these Beta apps.) With Maps and Finance, their focus on and execution of simple little interaction widgets, e.g. moving a Google map around with a cursor, changing a data set range with a scrolling widget, is what sets them apart. In the end, our discrete interaction with specific tools is what is simple, and it's why I continue to use them over other sites. I don't care if their products are siloed and perhaps require poking around in the labs or clicking tabs to find them. When I get there, there is very little menu cruft in the way and it lets me get the job done quickly and efficiently.
Dan Brown puts a finer point on the folksonomy buzz, which is already getting too loud for my ears. Dan makes the point of clarifying that the process of freetagging is not the same as creating folksonomies. The notion of a freetag, or a user-supplied index term as I know it to be historically called in IR, is not the same as building what Thomas Vander Wal calls a folksonomy. Folksonomies are like (analogous to) thesauri or taxonomies (without the important aspect of control, of course).
But folksonomies, unlike taxonomies, aren't built, they emerge organically through the accretion of freetags. It's probably a good point to make the analogy that freetag is like index term and folksonomy is like taxonomy (or controlled vocabulary) in order to help people understand what these terms mean. There's no doubt that there are information workers outside of the world of del.icio.us and flickr who have no idea what these terms mean and why they should matter. They will need to pay attention sooner or later.
The distinctions Dan and Thomas are making are probably minor to most people who use freetagging sites. I think Dan and Thomas are navel gazing at the minutiae because terms are being coined left and right in industry mags, on discussion boards and in blogs, and knowledgeable IAs and content people are trying to hone the terms so the meaning matches the use of the vocabulary. This is sort of important at this stage, because soon applications will be released that throw the terms around, and as the applications start getting recognition, the meaning of the terms will shift with use. This is sort of what happened with the term "taxonomy", which a lot of information workers hated because it wasn't quite correct. A few key business people start using a term one way and boom, it's accepted jargon.
This idea of freetagging isn't new by the way, but the bubbling up of tags into large shared lists in heavily used sites is. If only del.icio.us or flickr would think about applying synonym rings to make clustering more usable then we'd have something special. Even I'd be willing to work on that in order to use it, especially on image databases.
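For anyone who hasn't run into synonym rings, here is a minimal sketch of how one could pull variant tags into the same cluster. It's plain Python with invented tags and file names, and has nothing to do with how del.icio.us or flickr actually store their tags. A ring treats its members as equivalent and has no preferred term, so the label chosen in `build_lookup` is only a stable handle for display.

```python
from collections import defaultdict

# Hypothetical synonym rings: every tag in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"nyc", "newyork", "new york city"},
    {"cat", "cats", "kitten"},
]


def build_lookup(rings):
    """Map every tag in a ring to one shared label for that ring."""
    lookup = {}
    for ring in rings:
        label = min(ring)  # arbitrary but stable choice of handle
        for tag in ring:
            lookup[tag] = label
    return lookup


def cluster(tagged_items, rings):
    """Group items by tag, merging tags that belong to the same ring."""
    lookup = build_lookup(rings)
    clusters = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            key = tag.strip().lower()
            # Tags outside every ring simply form their own cluster.
            clusters[lookup.get(key, key)].add(item)
    return {label: sorted(items) for label, items in clusters.items()}


# Invented freetagged images, for illustration only.
PHOTOS = {
    "photo1.jpg": ["NYC", "skyline"],
    "photo2.jpg": ["newyork"],
    "photo3.jpg": ["kitten"],
}

print(cluster(PHOTOS, SYNONYM_RINGS))
```

The freetags themselves stay exactly as people typed them. The rings sit on top as a thin layer of control, which is about as much massaging as I'd be willing to do.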
Figuring out how to make images findable was the main reason I studied library and information science. I even proposed freetags (user-supplied keywords I called them) in a hypothetical visual resources database I wrote a spec for in 1997. After grad school I turned down job offers related to image indexing at StockObjects and TMS and interviews at Corbis because I wanted to be a web designer instead. I didn't like the idea of going into those places to actually work as an indexer massaging the thesauri. I don't even do that where I work now because we have someone who dedicates about 75% of his time to doing just that. I couldn't do that.
Anyway, all interesting stuff. I just hope the recent rash of arbitrary technology acronyms and neologisms ends soon.
The free Ecco outliner for Windows.
NYPL Digital Gallery provides access to over 275,000 images digitized from primary sources and printed rarities in the collections of The New York Public Library, including illuminated manuscripts, historical maps, vintage posters, rare prints and photographs, illustrated books, printed ephemera, and more.
Perl spider for creating text-based site maps/reports.
The NY Times interviews prominent information professionals to get their reaction to Google's digital library announcement. I believe the value outweighs the concerns, but the concerns are substantial.
Here are some excerpts from the article:
How will research be improved for students already struggling with, among other things, how to authenticate Internet information? What new roles will librarians play in helping people parse a vast amount of more easily obtainable information?"
...
"What I've learned is that libraries help people formulate questions as well as find answers," Ms. Wittenberg said. "Who will do that in a virtual world?"
...
Robert Darnton, a professor of history at Princeton who is writing a book about the history of books, noted that by looking at a book's binding and paper quality, a researcher can discern much about the period in which it was published, the publisher and the intended audience.
...
Some interviewed were concerned that Google could not fully reproduce material that was still under copyright protection, which means all books published in the United States after 1923. And in this day and age, Mr. Nasaw said, far too many students already read excerpts and seldom read the full texts.
...
Already, libraries buy fewer reference materials because such materials are online, she said. At the same time, the number of library visitors doubled in the last 10 years to 1.2 billion visits a year now, she added, with many visitors seeking help in managing vast amounts of information. As she put it: "People are saying, 'I went on Google and I got 40,000 hits. Now what?' "


