Push versus pull vocabulary development

On AIfIA-members, I threw in my thoughts about the current folksonomy/user-driven classification discussion that's been kicking around on iaslash, Interactionary, Headshift, Vanderwal's off the top, atomiq, PeterMe, alex wright, NoiseBetweenStations, rawbrick and sylloge. I can offer my perspective as a practicing librarian, but not as someone who claims to know the literature in this area.

One of the things that makes user-supplied terms meaningful in some of the examples we're talking about, e.g. Flickr and del.icio.us, is that these are systems where we're creating our own content. There's great incentive to participate in the classification/term tagging. These systems don't, however, allow you to add terms to modify the classification of content created by others.

I wonder how the idea of user-supplied terms would work in a system where people are mainly readers of content, rather than writers, e.g. in an intranet portal. From a design/development perspective, I wonder about the efficacy of pursuing this type of functionality in this type of environment. Seems perfectly suited to the examples we've seen. But, some types of information on a portal, for instance, are so ephemeral (e.g. news) that I wonder if people would bother to add their own terms.

Getting back to the concept of user supplied terms, I like the idea of the middle ground that Alex speaks about. This is the sort of approach my organization takes. There are different ways to add index terms to a system vocabulary. What we're talking about with the user tagging approach is a push model. Organizations can also use a pull model or a combined approach.

Push: Users push index terms from their vocabulary onto the system vocabulary. e.g. Flickr, del.icio.us

Pull: The system (referring to both the computer system and the people responsible for it) solicits index term additions from users. e.g. more typical large information systems

To give an example of a combined model, my org uses both methods to some degree:

Pull: our controlled vocabularies were started by doing interviews with business units, and the person who maintains them continually communicates with business unit subject matter experts to keep them current.

Push: we provide a web-based form for people to suggest terms (e.g. suggest companies, subjects, etc.).

We started with a big pull and keep getting terms pushed from our users. The vocabularies are maintained by someone who keeps relationships with people in the business units to continually add/modify terms (Pull). This person also authorizes the terms that are suggested (Pushed), adding them to vocabularies and indicating relationships (normalizing).
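The workflow described above can be sketched in a few lines of code. This is only an illustration of the idea, not our actual system: the class name, the method names and the sample terms are all made up for the example. Suggested (pushed) terms wait in a queue, terms gathered from subject matter experts (pulled) go straight in, and nothing reaches the vocabulary until the maintainer authorizes it.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Hypothetical controlled vocabulary with a review queue for suggestions."""

    # Authorized term -> set of related or variant terms.
    terms: dict = field(default_factory=dict)
    # (term, suggested_by) pairs waiting for the maintainer's decision.
    pending: list = field(default_factory=list)

    def add_from_interview(self, term, related=()):
        """Pull: the maintainer adds a term gathered from subject matter experts."""
        self.terms.setdefault(term, set()).update(related)

    def suggest(self, term, suggested_by):
        """Push: a user proposes a term. It is queued, not added."""
        self.pending.append((term.strip(), suggested_by))

    def authorize(self, term, preferred=None, related=()):
        """The maintainer accepts a pending term, optionally normalizing it
        to a preferred label and recording its relationships."""
        if not any(queued == term for queued, _ in self.pending):
            raise ValueError(f"no pending suggestion for {term!r}")
        self.pending = [p for p in self.pending if p[0] != term]
        label = preferred or term
        self.terms.setdefault(label, set()).update(related)
        return label


vocab = Vocabulary()
vocab.add_from_interview("Mergers and acquisitions")
vocab.suggest("M&A", suggested_by="analyst")

# The suggestion is not part of the vocabulary until someone authorizes it.
in_vocab_before_review = "M&A" in vocab.terms

label = vocab.authorize("M&A", preferred="Mergers and acquisitions", related={"M&A"})
```

In this sketch the suggested "M&A" ends up recorded as a variant of the existing preferred term rather than as a new entry, which is the normalizing step.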

We don't, however, allow people to add terms directly to the system. We act as a gateway for accepting or authorizing terms. In our company, which may be typical of many politicized corporate environments, a certain degree of control is enforced. It also helps deal with the issues of noise and imprecision that can be introduced with user-supplied terms. Is this how others deal with pushed terms in a corporate environment? Are there other ways to address user-supplied terms?

I would think that a more open push model might be great if the pushed terms were kept separate to some degree until a human did some work to normalize them. This is also what I understand James Spahr is doing with his Pratt Talent database. The approach seems ideal. I've personally also thought there was some promise in clustering, especially when dealing with very large sets of data, but I wonder about how this type of categorization is used in actual retrieval. We use clustering in search engine retrieval to some degree, but our statistics show that the clustered categories are not used that frequently compared to the number of searches executed. Perhaps our more successful use of clustering is to aid auto-classification before human indexers accept and add terms to information objects.
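To make the "kept separate until normalized" idea a little more concrete, here is a rough sketch. I don't know how the Pratt Talent database actually does it, so everything here (the class, the method names, the sample tags) is an assumption made for illustration. Raw user tags are stored in their own layer the moment they are added, and they only show up as index terms once a person has mapped them to an authorized term.

```python
from collections import defaultdict


class TagLayers:
    """Hypothetical store that keeps raw user tags apart from authorized terms."""

    def __init__(self):
        self.user_tags = defaultdict(set)    # doc_id -> raw tags, lowercased
        self.index_terms = defaultdict(set)  # doc_id -> authorized terms
        self.mapping = {}                    # raw tag -> authorized term

    def tag(self, doc_id, raw_tag):
        """Open push: anyone can tag, and the tag is stored right away."""
        key = raw_tag.strip().lower()
        self.user_tags[doc_id].add(key)
        # If a human has already normalized this tag, apply the mapping too.
        if key in self.mapping:
            self.index_terms[doc_id].add(self.mapping[key])

    def normalize(self, raw_tag, authorized_term):
        """A human maps a raw tag to an authorized term for every tagged item."""
        key = raw_tag.strip().lower()
        self.mapping[key] = authorized_term
        for doc_id, tags in self.user_tags.items():
            if key in tags:
                self.index_terms[doc_id].add(authorized_term)

    def unreviewed(self):
        """Raw tags that nobody has normalized yet, in alphabetical order."""
        all_tags = set().union(*self.user_tags.values())
        return sorted(all_tags - set(self.mapping))


layers = TagLayers()
layers.tag("doc1", "NYC")
layers.tag("doc2", "nyc ")
layers.tag("doc1", "skyline")

queue_before = layers.unreviewed()
layers.normalize("nyc", "New York City")
queue_after = layers.unreviewed()
```

The point of the two layers is that the noisy tags stay usable in their own space without touching the controlled vocabulary, and the unreviewed list gives the indexer a work queue.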

Don't know if I've added to the discussion here, but I thought the in-house librarian's perspective was missing and I was wondering how others were approaching bottom up and top down methods for classification.

I guess I should also mention that user tagging is an occasionally requested feature for Drupal, which has been included in the James Seng release and was recommended in our usability recommendations (item 9.2) for a future release. We'll see if anyone is up to the task and implements it.

[Update: If you're interested in Drupal user tagging functionalities, check out the links in Boris' comments to this entry.]

Comments

01 Boris Mann@bryg...
09/01/04 @ 13:05

We're going to implement this -- see our public wiki. We'd love to have your thoughts and participation in this. There is a taxonomy-on-the-fly module that can likely provide the basis for most of this.

I have to go back and revisit the Usability wiki and see what we can realistically implement.

02 jibbajabba
09/01/04 @ 13:10

Great. Thanks for the heads up, Boris. I haven't had time to participate in Drupal discussions that aren't pushed to me. Call on me when it comes time to talk about what you're designing.
