April 23, 2003

Bad RSS readers

How about we all start mining our server logs and start 'outing' the bad RSS readers? Ones that don't:

  • honor scheduling instructions
  • use gzip to save bandwidth
  • follow and learn from redirects
  • use ETags
  • understand HTTP headers
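
For the sake of illustration, here is a rough sketch of what most of that list might look like on the reader side. It is not taken from any real aggregator: the class name PoliteFeedReader, the FeedState holder and the method names are all made up for this example, and it leaves out scheduling and the actual network call. It only shows which headers to send and what to remember from the response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: headers and response handling for a well-behaved feed reader. */
public class PoliteFeedReader {

    /** What the reader remembers about one feed between polls (illustrative only). */
    public static final class FeedState {
        public String url;   // where to fetch next time
        public String etag;  // last ETag the server sent, or null if none yet

        public FeedState(String url, String etag) {
            this.url = url;
            this.etag = etag;
        }
    }

    /** Ask for gzip, and send the cached ETag back if we have one. */
    public static Map<String, String> requestHeaders(FeedState state) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept-Encoding", "gzip");
        if (state.etag != null) {
            headers.put("If-None-Match", state.etag);
        }
        return headers;
    }

    /**
     * Update the stored state from a response.
     * Returns true only when there is a new body to download and parse.
     */
    public static boolean handleResponse(FeedState state, int status,
                                         String etagHeader, String locationHeader) {
        switch (status) {
            case 200: // new content: remember the ETag for the next poll
                state.etag = etagHeader; // may be null if the server sends none
                return true;
            case 304: // Not Modified: nothing to fetch
                return false;
            case 301: // Moved Permanently: learn the new address
                if (locationHeader != null) {
                    state.url = locationHeader;
                }
                return false; // caller should re-request at state.url
            default:  // anything else: leave the state alone
                return false;
        }
    }
}
```

A temporary redirect (302 or 307) deliberately falls into the default branch here, since only a permanent redirect is a reason to change the stored address.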

The same logic is worth applying to the sites serving the feeds. Plenty of them don't offer compressed feeds, don't supply ETags, and don't include schedule information.
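
On the serving side, the decisions involved are small. The sketch below is one possible way to make them, not a description of how any particular server or blogging tool does it. The class name PoliteFeedServer and its methods are invented for this example, MD5 is just one convenient digest for building an ETag, and the header parsing is simplified: it ignores q-values in Accept-Encoding and weak (W/) ETags.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: the checks a feed server makes before sending a response. */
public class PoliteFeedServer {

    /** Build a quoted ETag from the feed bytes (hex MD5; any stable digest would do). */
    public static String etagFor(byte[] feedBytes) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder("\"");
            for (byte b : md5.digest(feedBytes)) {
                sb.append(String.format("%02x", b));
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide
            throw new IllegalStateException(e);
        }
    }

    /** 304 if the client's If-None-Match lists the current ETag (or "*"), else 200. */
    public static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return 200;
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String c = candidate.trim();
            if (c.equals("*") || c.equals(currentEtag)) {
                return 304;
            }
        }
        return 200;
    }

    /** True if the Accept-Encoding header names gzip (simplified: q-values are ignored). */
    public static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            int semi = token.indexOf(';');
            String name = (semi >= 0 ? token.substring(0, semi) : token).trim();
            if (name.equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }
}
```

With these in place, a handler would send a bodiless 304 when statusFor says so, and otherwise send the feed with an ETag header, gzipped only when clientAcceptsGzip returns true.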

Before we bother with extending all this, let's get the core stuff working first.

Posted in Geek at 07:33 AM

Comments (5)

For blogs - one of the biggest sources of RSS content - schedule is meaningless.

Posted by: James Robertson on April 24, 2003 08:35 AM

What's the spec on a gzip'ed RSS feed? When I go through pages and see the orange XML box, I expect that to serve up XML.

Is it something in the HTTP Accept headers, as in our fetch code should send an Accept of application/x-gzip to get the server to serve up gzipped data (if it knows how to), followed by text/xml, text/html, and text/plain (in case it doesn't)?

Posted by: Joe on April 24, 2003 11:00 AM

Scheduling is pointless in RSS. There's no way to tell from the RSS spec what timezone the hours are based on; thus, if I make sure I'm not reading from 1 a.m. to 6 a.m. my time... if we're not in the same time zone, it doesn't make any difference.

Posted by: Joseph B. Ottinger on April 24, 2003 11:05 AM

Joe - yes, gzip compression is requested within the HTTP headers. Here's a code snippet from nntp//rss to serve as an example.

// Set the Accept-Encoding header to gzip

httpCon.setRequestProperty("Accept-Encoding", "gzip");

// [snip]
// Connect to server and get response stream
// [snip]

// Check content encoding returned by web server
// If gzip, use GZIPInputStream wrapper to
// decompress

String contentEncoding = httpCon.getContentEncoding();

if (contentEncoding != null && contentEncoding.equals("gzip")) {
    responseStream = new GZIPInputStream(responseStream);
}

Posted by: Jason Brome on April 24, 2003 11:11 AM

I understand and somewhat sympathize with this idea of wanting to "out" the bad RSS readers. If you do, I can assure you that you can put mine up at the top of the list if you like :) I've certainly been frustrated by bad channels more times than I can count, because so many embed HTML or have bad characters in their XML or something.

Personally I believe the solution is like this:

1: Build a really big test suite. It contains a dozen or more examples of correct channels and incorrect channels for each major version of RSS (i.e. at least 0.90, 0.91, 0.92, 1.0, 2.0). Everybody fights it out and agrees that these represent something that can be used to test parsers and figure out which ones are accepting things they shouldn't (which encourages bad channels) and not accepting valid channels.

2: The sample channels help the RSS aggregators and their parser libraries get to compliance. To help the generators we need validators that we can all count on. Those have to be generated and I'll argue that they should be in software form and not just a website somewhere that you tell to validate your channel.

3: Encourage building of libraries in all major languages for both the reading and writing of RSS. It's crazy that you can sit down with most languages and write something that talks to a mail server via POP3 or IMAP or HTTP to a web server but RSS, which is now also a major transport protocol for the transfer of news on the Internet, doesn't have similar support. With good libraries you can get people to stop writing their own implementations of parsers and generators and cut the amount of non-compliance way way down.

Most people wouldn't go roll their own SOAP implementation to make some remote procedure calls, but the barrier to entry on parsing or generating RSS is too low. Everybody thinks they have to do it all over again.

Posted by: John Munsch on April 25, 2003 11:41 AM