
Suprglu is super cool. You enter your account names for the web services you use (e.g. flickr, del.icio.us, digg) and the RSS feeds for your blogs, and Suprglu displays it all in one place. Nice and simple, and you get to modify the style sheet as much as you like. My Suprglu.

[via swissmiss]

Aarrr, matey. Here be a tale of a blogging practice that makes ye look like a bilge rat pirate

As a rule, you should periodically check your referrer logs. It's good practice because you find out who's linking to your work. But once in a while you'll also find a site that's either copying your content outright without permission or embedding links to your media (images, MP3s, etc.) in its pages, essentially pirating your bandwidth. This morning I found a site that was embedding links to my images in their page. Avast! The image on the right shows a bit of their page and how I'm replacing images (see the "I take me revenge" section below for how this works).
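Checking referrers for hotlinked media can be partly automated. The Python sketch below assumes Apache's "combined" log format. It counts, per referring host, requests for image and MP3 files whose Referer points at some other site. `OWN_HOSTS` is a hypothetical placeholder for your own host names.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)
MEDIA = re.compile(r"\.(jpe?g|gif|bmp|png|mp3)$", re.IGNORECASE)

# Hypothetical: the host names that count as "my site"
OWN_HOSTS = {"urlgreyhot.com", "www.urlgreyhot.com"}


def hotlinkers(lines, own_hosts=OWN_HOSTS):
    """Count media requests per foreign referring host."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a GET/HEAD line in combined format
        path = m.group("path").split("?", 1)[0]
        referer = m.group("referer")
        if not MEDIA.search(path) or referer in ("", "-"):
            continue  # not media, or no Referer sent
        host = (urlparse(referer).hostname or "").lower()
        if host and host not in own_hosts:
            counts[host] += 1
    return counts
```

For example, `hotlinkers(open("access.log")).most_common(10)` lists the ten busiest offenders. The log's actual location depends on your server setup.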

My site publishes complete entries in its RSS feed. Because of that, other people's web-based aggregators are able to republish my content in its entirety. In the best case, a blogger uses a web-based aggregator to watch feeds and post the ones they like, excerpting the entry. In the worst case, they republish the entire entry without attribution. I don't know what this site owner was doing, but I noticed that their blog was basically aggregating other people's posts; there doesn't seem to be any original content. In my case, they didn't excerpt; they copied my entire blog entry verbatim. What pisses me off is that it looks like they wrote the article.

I suppose it's partly my fault for putting full entries in my RSS feed rather than excerpts, but I do this so that people can read my blog in their aggregators without having to actually go to my site. This is the downside, I suppose: web-based aggregators will republish whatever they get.

I take me revenge

To play with them a little, I now replace images referenced from another site with a STOP image. I hate to have to do this, because it messes up the images for legitimate aggregators. I suppose you could be really malicious and post a hardcore porn picture in there instead to make things look even worse. I'm not that malicious.

You can do the same thing if you find that someone is pirating your media. Using altlab's examples for dealing with bandwidth theft, I modified the .htaccess file on my server to include these lines:

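# Turn on mod_rewrite for this directory
RewriteEngine On
# Only act when the Referer is not this site (case-insensitive)...
RewriteCond %{HTTP_REFERER} !^http://(www\.)?urlgreyhot\.com/ [NC]
# ...and is not empty (typed-in URLs and some proxies send no Referer)
RewriteCond %{HTTP_REFERER} !^$
# Serve the stop image in place of any JPEG, GIF, BMP or PNG
RewriteRule \.(jpe?g|gif|bmp|png)$ img/pirate.png [L]

To use this code on your site, replace urlgreyhot\.com in the first RewriteCond with your own domain (keep the backslash before each dot), and change img/pirate.png in the RewriteRule to the path of your stop image. Each directive must stay on a single line.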

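I thought about this for a few minutes. Because I don't want to do this to everyone, I can use the code below to block image requests from that one domain only:

RewriteEngine On
# Only act when the Referer is the offending site (case-insensitive)
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite\.net/ [NC]
# Serve the stop image in place of any JPEG, GIF, BMP or PNG
RewriteRule \.(jpe?g|gif|bmp|png)$ img/pirate.png [L]

To use this code on your site, replace badsite\.net in the RewriteCond with the domain of the bad site, and change img/pirate.png in the RewriteRule to the path of your stop image. The empty-referrer check from the first version isn't needed here, since a request whose Referer matches the bad site can't have an empty one. Take that, ye scurvy lubber!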
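To see which requests each rule set would rewrite, here is a small Python model of the two .htaccess configurations. It is only an illustration. It reuses the same regular expressions, but it does not reproduce the rest of mod_rewrite's behavior (per-directory path stripping, RewriteBase, internal redirects). `path` stands in for the part of the URL the RewriteRule matches, and badsite.net is the placeholder from the post.

```python
import re

IMAGE = re.compile(r"\.(jpe?g|gif|bmp|png)$")  # case-sensitive, like the RewriteRule (no [NC])
OWN_SITE = re.compile(r"^http://(www\.)?urlgreyhot\.com/", re.IGNORECASE)  # [NC]
BAD_SITE = re.compile(r"^http://(www\.)?badsite\.net/", re.IGNORECASE)     # [NC]
STOP_IMAGE = "img/pirate.png"


def allowlist_rule(path: str, referer: str) -> str:
    """First rule set: images linked from anywhere but this site get the stop image.

    Requests with an empty Referer pass through untouched.
    """
    if not OWN_SITE.search(referer) and referer != "" and IMAGE.search(path):
        return STOP_IMAGE
    return path


def blocklist_rule(path: str, referer: str) -> str:
    """Second rule set: only images linked from the bad site get the stop image."""
    if BAD_SITE.search(referer) and IMAGE.search(path):
        return STOP_IMAGE
    return path
```

The model makes the trade-off concrete: under the allowlist version, a legitimate aggregator on another domain also gets the stop image, while the blocklist version only touches the one offending site.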

Yo ho! Here be Mr. Krabs' bit of advice to ye
This is why you should always look at your referrers! Be smart about RSS aggregation and blogging. If you are going to use an RSS aggregator to feed your blog, be sure to excerpt and ALWAYS link to the original article and attribute the author.
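For anyone feeding a blog from an aggregator, that advice boils down to three steps: excerpt, link, and attribute. Here is a minimal Python sketch. It assumes the entry arrives as plain text plus a title, link, and author, and it leaves out tag stripping and real HTML sanitizing. The function name and the 50-word limit are arbitrary choices for illustration.

```python
from html import escape


def attributed_excerpt(title: str, link: str, author: str, text: str,
                       max_words: int = 50) -> str:
    """Return an HTML snippet: a short excerpt, a link back, and a credit line."""
    words = text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += " ..."  # signal that the entry was cut
    # escape() also escapes quotes, so the link is safe inside href="..."
    return (
        f"<blockquote><p>{escape(excerpt)}</p></blockquote>\n"
        f'<p>From <a href="{escape(link)}">{escape(title)}</a> by {escape(author)}</p>'
    )
```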

Update
Moments after doing this, they must have seen the replaced image, so they removed the copied entry from their site.

For the past several months in my group at Lucent we've been testing out a system developed to be a simple self-service publishing application. You might recognize the interface. It follows the model other social bookmarking services have made common.

Tag browser

Identifying the needs

The idea to take the concept of social bookmarking and turn it into more than just a bookmark-saving service came from several different types of requests we've gotten in the past. One type of request was for a way to clip or save articles found on our digital library site. We aggregate a wide variety of sources. The most relevant databases include vendor news (e.g. feeds from Factiva for newspapers and journals) and internal databases (e.g. internal news publications, a technical documents repository).

A second and more urgent request was for a way for users to save articles found on our site and publish them in portlets within the corporate portal. Portlets are small windows of HTML content that act like little building blocks or modules in a portal page.

Several things we had done in the past helped us to add on to or evolve our existing database system and develop a new and separate system that would handle these specific bookmarking needs. We had already RSS-ified our databases, providing very complete feeds of our data as XML and partial feeds (bibliographic data) of our data as RSS. Prior to that, the primary method for doing something with database results was to set up an email or web-based alert. But the new set of requirements dealt with two issues:

  • Tagging of individual records
  • Re-use of records off site

Social bookmarking to the rescue

So I began developing the concept for using the social bookmarking model we've been seeing on sites like del.icio.us and furl. The first requirement was to provide a means for flagging records. The second was to provide a way to re-use that data elsewhere.

Our first releases did pretty much everything that del.icio.us does. We provided a bookmarklet/favelet for saving, tagging and commenting on a web page. The default view showed all users' tagged bookmark entries, and you could navigate to view all bookmarks under a single tag, the bookmarks of one user, etc.
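The views described here (everyone's bookmarks, one tag, one user, or one user on one tag) can all be expressed as filters over the same records. Here is a rough in-memory Python sketch. The `Bookmark` fields and the `view` function are hypothetical; they are not the schema of the actual application, which lived in a database.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class Bookmark:
    user: str
    url: str
    title: str
    saved: datetime
    tags: Set[str] = field(default_factory=set)
    comment: str = ""


def view(bookmarks: List[Bookmark], user: Optional[str] = None,
         tag: Optional[str] = None) -> List[Bookmark]:
    """Return bookmarks for one user and/or one tag, newest first.

    With neither argument this is the default "all users" view.
    """
    hits = [
        b for b in bookmarks
        if (user is None or b.user == user) and (tag is None or tag in b.tags)
    ]
    return sorted(hits, key=lambda b: b.saved, reverse=True)
```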

The screenshots below show the bookmarks main page and the pop-up window for saving/modifying a bookmark.

Bookmarks home page

Editing window

The application was shaping up to be pretty decent, using all of the commonplace features of social bookmarking sites. We integrated the XML and RSS feed feature that we already used on our other databases. Feeds are available for any view the application can generate, e.g. Michael's bookmarks, Michael's bookmarks on tag "searchengines", all users' bookmarks, all users' bookmarks on "searchengines".
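Because every view is just a filtered list, a single serializer can produce a feed for any of them. Here is a minimal RSS 2.0 sketch using only Python's standard library. The function name, channel title and link are placeholders. Entries are passed as (title, url, saved) tuples so the example stands alone.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_for_view(title, link, entries):
    """Serialize (title, url, saved_datetime) tuples as an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item_title, url, saved in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = url
        # RSS 2.0 expects RFC 822-style dates, e.g. "Mon, 03 Jan 2005 00:00:00 +0000"
        ET.SubElement(item, "pubDate").text = format_datetime(saved)
    return ET.tostring(rss, encoding="unicode")
```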

Self service publishing

Now, the reason I thought we could use this model for self-service portal publishing is the free-tagging model. The idea was to allow individuals or groups to start bookmarking articles from our News databases, e.g. any of the Factiva sources such as newspaper and magazine articles. They could use a common tag such as Mobility-Portal-Hot-News. Then they could get an aggregation of all of the articles saved with that tag and somehow display them in a portlet. Of course, controlled vocabularies would have worked as well, but the free-tagging model allows users to define the use. The portlet idea is just one application; there are others we could think of, including ad hoc reporting.
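The shared-tag idea reduces to a query. Collect every bookmark carrying the agreed tag, regardless of who saved it. Drop duplicates when several people save the same article, then keep the newest few for the portlet. Here is a Python sketch. The flat tuple layout, the function name and the default limit of 5 are assumptions for illustration.

```python
from datetime import datetime
from typing import Iterable, List, Set, Tuple

# (user, url, title, saved, tags): a hypothetical flat record layout
Record = Tuple[str, str, str, datetime, Set[str]]


def portlet_items(records: Iterable[Record], tag: str,
                  limit: int = 5) -> List[Tuple[str, str]]:
    """Newest distinct (title, url) pairs saved by anyone under `tag`."""
    tagged = sorted((r for r in records if tag in r[4]),
                    key=lambda r: r[3], reverse=True)
    seen = set()
    items = []
    for _user, url, title, _saved, _tags in tagged:
        if url in seen:
            continue  # same article already saved by another user
        seen.add(url)
        items.append((title, url))
        if len(items) == limit:
            break
    return items
```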

Feeds and exporting

This was shaping up to be a pretty decent way to do self-service publishing, but the obstacle of knowing what to do with RSS stood in the way. The concept of a feed is still pretty foreign to most business users. Savvy users can install RSS readers, but re-using that content on web sites would be time consuming. The next step was to provide a means for doing this more simply.

We first provided an HTML output along with RSS, thinking that portlets could display this content as HTML, but that required using iframes. The second idea I came up with was to use JavaScript: put the bookmark entries in a JS feed, with the latest entries stored in an array. Portal owners could then insert a script in a portlet that referenced the JS feed, and the recent entries would be displayed on the site as HTML. If you're familiar with how Google AdSense ads work, you know how simple this is.
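Here is roughly what generating such a JS feed could look like. The sketch is in Python because the post doesn't say what the server side used. The variable name `bookmarkFeed`, the `js_feed` function and the markup it writes are invented for illustration. The generated script stores the entries in an array and writes them out as an HTML list with `document.write`.

```python
import json
from html import escape

# Hypothetical template: defines the array, then writes it out as an HTML list
TEMPLATE = """var bookmarkFeed = %s;
document.write('<ul class="bookmarks">');
for (var i = 0; i < bookmarkFeed.length; i++) {
  document.write('<li><a href="' + bookmarkFeed[i].url + '">' +
                 bookmarkFeed[i].title + '</a></li>');
}
document.write('</ul>');
"""


def js_feed(entries):
    """Return a JS feed for (title, url) pairs, newest first.

    Titles and URLs are HTML-escaped first; json.dumps then turns the list
    into a valid JavaScript array literal with properly escaped strings.
    """
    data = [{"title": escape(title), "url": escape(url)} for title, url in entries]
    return TEMPLATE % json.dumps(data)
```

A portal owner would then paste a single tag such as `<script type="text/javascript" src="http://example.com/bookmarks/js/Mobility-Portal-Hot-News"></script>` into the portlet. That URL is a made-up example. Because the data is escaped on the server, the portlet page needs no code of its own.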

The screenshots below show the process of preparing scripts for display on a portlet:

JS feed link

JavaScript generator for portlets

Feed published as HTML on portlet

As always with the type of evolutionary design we do where I work, these proofs of concept helped drive the design of other functionalities we could think of. One of the nice things about working in-house somewhere is that you can continue to improve applications over time.

A common request we've gotten in the past was for a way to create reports. We commonly output some data for Excel, for instance. For this tool, it made sense to provide a way to generate bibliographies of bookmarks. So I began creating a tool to transform the data into APA-style bibliographies first, with plans to also provide RTF export of bookmark lists.
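A bibliography generator is mostly string formatting. The Python sketch below produces a simplified, APA-like reference for a saved article: author, date, title, source, URL. It is an approximation. Real APA rules for author names, missing dates and italics are more involved, and the `Citation` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Citation:
    author: str      # assumed to be already in "Last, F." form
    published: date
    title: str
    source: str
    url: str


def apa_like(c: Citation) -> str:
    """Format one entry as: Author (Year, Month Day). Title. Source. URL"""
    when = f"{c.published.year}, {c.published.strftime('%B')} {c.published.day}"
    return f"{c.author} ({when}). {c.title}. {c.source}. {c.url}"


def bibliography(citations: List[Citation]) -> str:
    """One entry per line, sorted alphabetically by author as APA lists do."""
    ordered = sorted(citations, key=lambda c: c.author.lower())
    return "\n".join(apa_like(c) for c in ordered)
```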

Bibliography format selection

Bibliography display

Controlling the sprawl

Up to this point, we took each function and divided it into atoms, or pieces of functionality, that we added to our existing systems. I'm very interested in this organic approach to solving problems. The programmer I usually work with likes to work this way: I document the needs and the concept for the application, he makes the prototype, and we evolve it together. It's actually a pretty nice approach, and we have the freedom and flexibility to do things this way.

All of these features make the system serviceable, but as we conceived of different functionalities to add, it became clear that the system was becoming more and more complex from a user's perspective and could use some simplification. I liken this to getting control of a garden that has become overgrown. At some point all of those aggressive plants start dominating and stifle the smaller ones. What do we do so we can see the parts more clearly again?

At this point, I'm trying to get some traction behind removing all those little XML, RSS, HTML, and JS buttons and replacing them with one button for viewing "Export options". I'm presently trying to design the interaction and interface for this clean up.

It's been an interesting several months testing out this application. It's nice to work on such a small application that suits very narrowly defined needs. Smaller, well-defined scenarios are much easier to design for than broader scenarios and rules. In the end, these small scenarios fit into the larger business rules we've established for the site and, if done right, will feed back into the way we design other aspects of the site. In this instance, the self-service functionalities created for the bookmark application will be added to our other databases so that people can, for instance, create a search on a news database and generate a JavaScript to display the feed from that source on a portlet. The common example is to search Factiva News on a topic like 802.11 and automatically display links to the news items on your portal site.

This application still has a bit further to go. We're still talking about issues such as making some bookmarks private. That is possibly the last system feature we'll add. The remaining work is just refining the interface for exporting. I'm interested in seeing how other library systems are approaching the need to re-use data. Clearly enterprise information systems should be thinking about these types of issues. I'm constantly thinking of how aspects of our system can be made more useful to people throughout the company.

Nooked's directory of corporate RSS feeds.

Technorati's tag indexing has gotten me interested in having a full Atom feed for this site, including categories. So I downloaded the latest Atom module for Drupal 4.5 and then used Walkah's patch to modify the module to generate a full Atom feed. Then http://urlgreyhot.com/personal/atom/feed stopped working. Oddly enough, when I commented out the cache portion it worked again. So now I have an atom feed with categories here: http://urlgreyhot.com/personal/atom.xml

I'm recording the steps I took because this module is not documented (as is sometimes the case with Drupal modules). Good luck:

1) Downloaded the atom module.

2) Downloaded atom.diff and saved it to the modules/atom/ directory.

3) Ran this command in a Unix terminal window: patch < atom.diff

4) Checked the module's output at http://urlgreyhot.com/personal/atom/feed

5) Created an alias via the menu "Administer > Url aliases" from atom/feed to atom.xml.
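Step 4 amounts to checking that the feed parses and that entries carry their categories. Here is a small Python check that uses only the standard library. It expects an Atom 1.0 document (namespace http://www.w3.org/2005/Atom), where each category is a `<category term="..."/>` element. If your module emits an earlier Atom draft, the namespace and category markup may differ.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_categories(feed_xml):
    """Return (title, [category terms]) for each entry in an Atom 1.0 feed.

    feed_xml may be str or bytes. Raises ValueError if the root isn't atom:feed.
    """
    root = ET.fromstring(feed_xml)
    if root.tag != ATOM + "feed":
        raise ValueError(f"not an Atom 1.0 feed: root is {root.tag}")
    result = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        terms = [c.get("term") for c in entry.findall(ATOM + "category")]
        result.append((title, terms))
    return result
```

Against a live site you could pass it the bytes returned by `urllib.request.urlopen("http://urlgreyhot.com/personal/atom.xml").read()`.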

Thanks to Kika for providing the module and to Walkah for the patch. This is why I now send people who ask me if I freelance over to people like those Bryght guys.

Feed to JS is a PHP script that takes an RSS feed and converts it to JavaScript that can be used to display the items as HTML.

Server side RSS and Atom news feed aggregator that requires PHP/MySQL.

UPDATE: This is no longer true. Google now finally does RSS for their News service.

Hard to believe that there are search services that do some things better than Google, but there are. I've been getting involved with some projects that are looking at monitoring very specific topics on the web, e.g. doing reputation management. Monitoring blogs is fairly easy using PubSub, Feedster and DayPop, but the rest of the Web (e.g. news sites and other non-RSS-ified web sites) hasn't been as easy to monitor with RSS. What we're looking for is a method for monitoring topics in more than just blogs and news sources (which IBM's WebFountain was supposed to help with, but failed to work with Factiva). We're also looking for occurrences of a topic in the darker Web. The biggest problem, however, when someone wants to monitor this much is finding a way to make the results relevant and manageable.

MSN's Search Beta offers one solution: it provides Web search results as RSS. Execute any search and append "&format=rss" to the URL and you have an RSS feed. This is described in greater detail on the msnsearch blog.
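Turning a result-page URL into a feed URL is a one-parameter change. The Python sketch below adds format=rss to whatever query string a URL already has, replacing any existing `format` value. The MSN Search address in the test is only illustrative, since that beta's URLs have long since changed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def as_rss(search_url: str) -> str:
    """Return search_url with format=rss added (or replaced) in its query."""
    parts = urlsplit(search_url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "format"]
    query.append(("format", "rss"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```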

Yahoo! News offers the best solution for News to RSS. Execute any search and look for the orange XML button in the right navigation bar.

But Google hasn't done this for their Search and News services. The reason could be that they would be giving away their service without forcing people to use it on Google and see their advertising. But this lack of a complete service offering that includes RSS makes it difficult for people to use off-the-shelf software for tasks such as monitoring topics exhaustively. Some people do need to monitor topics exhaustively in many different domains, e.g. blogs, web sites, news sites, newsgroups.

I love the Google brand, but I wish they could get their heads around a way to offer us their services via RSS. At the very least, I'd like to see Google News offer this service publicly as Yahoo! News does so I don't have to go through Justin Pfister's Google News to RSS application. It's a great service, but because it's on someone's personal site it could be gone or unavailable in the future. Google should just give it to us and embed their ads in the RSS or something.

Of course, once you have all these results in RSS, the problem then becomes helping people make sense of them. Deduping and analyzing the results so that the relevant stuff ends up near the top is no small task, I'm sure.
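As a first pass at deduping and ranking, here is a deliberately naive Python sketch. It treats two items as duplicates when their links match after light normalization: lowercased scheme and host, no fragment, no trailing slash. It then ranks the remaining items by how often the topic terms appear in title and summary, and drops items with no hits. The item layout (dicts with `title`, `link`, `summary`) is an assumption. Real monitoring would need much more, such as near-duplicate text detection and source weighting.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and a trailing slash."""
    p = urlsplit(url.strip())
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path.rstrip("/"), p.query, ""))


def dedupe_and_rank(items, terms):
    """Keep the best-scoring copy of each link; return items with score > 0, best first."""
    terms = [t.lower() for t in terms]
    best = {}
    for item in items:
        key = normalize(item["link"])
        text = f"{item['title']} {item.get('summary', '')}".lower()
        score = sum(text.count(t) for t in terms)
        if key not in best or score > best[key][0]:
            best[key] = (score, item)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [item for score, item in ranked if score > 0]
```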

Cool RSS/RDF/Atom aggregator.

Blogdigger is a search engine for blogs. Blogdigger uses RSS and Atom to index blog content and make it available for search. Blogdigger also makes all search results available in RSS or Atom, so users can subscribe to keyword searches and automatically be notified, via the News Aggregator of their choice, of new content pertaining to their interests. Blogdigger searches thousands of RSS and Atom feeds, and is built into many popular News Aggregators, such as FeedDemon and NetNewsWire.