April 23, 2003
Bad RSS readers
How about we all start mining our server logs and start 'outing' the bad RSS readers? Ones that don't:
- honor scheduling instructions
- use gzip to save bandwidth
- follow and learn from redirects
- use eTags
- understand HTTP headers
At the same time it looks like it might be worth applying the same logic to the sites serving the feeds. There are plenty that don't offer compressed feeds, don't supply eTags and don't have schedule information in them.
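To make the eTag item concrete, here is a minimal sketch of a conditional GET in Java. It assumes the reader saved the ETag header from its previous fetch of the feed. The class name ConditionalGet, its method names, and the example URL are made up for illustration and are not taken from any particular reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class ConditionalGet {
    // Hypothetical helper: builds (but does not send) a GET request that
    // carries the ETag remembered from the previous fetch, if there is one.
    static HttpURLConnection prepare(String feedUrl, String lastEtag) {
        try {
            HttpURLConnection con =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
            if (lastEtag != null) {
                // Ask the server to reply 304 if the feed still has this ETag
                con.setRequestProperty("If-None-Match", lastEtag);
            }
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when the server says the cached copy is still current (HTTP 304).
    static boolean isNotModified(int status) {
        return status == HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

After sending the request, a reader would look at con.getResponseCode(). On a 304 it keeps its cached copy and downloads nothing more; otherwise it parses the body and remembers con.getHeaderField("ETag") for the next poll.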
Before we bother with extending all this, let's get the core stuff working first.
For blogs - one of the biggest sources of RSS content - scheduling is meaningless.
Posted by: James Robertson on April 24, 2003 08:35 AM
What's the spec on a gzip'ed RSS feed? When I go through pages and see the orange XML box, I expect that to serve up XML.
Is it something in the HTTP Accept headers? As in, should our fetch code send an Accept of application/x-gzip to get the server to serve up gzipped data (if it knows how to), followed by text/xml, text/html, and text/plain (in case it doesn't)?
Scheduling is pointless in RSS. There's no way to tell from the RSS spec what timezone the hours are based on; thus, if I make sure I'm not reading from 1 a.m. to 6 a.m. my time... if we're not in the same time zone, it doesn't make any difference.
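If a timezone is fixed by convention, the check itself is small. A minimal sketch, assuming the skip hours are read as GMT (the RSS 2.0 text describes skipHours values that way; for other versions treat it as an assumption). The class SkipHours and the method shouldSkip are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Set;

class SkipHours {
    // Hypothetical helper: true when a fetch at 'now' would land in one of
    // the feed's skipped hours. Hours are 0-23 and interpreted as GMT/UTC,
    // which is an assumption of this sketch.
    static boolean shouldSkip(Set<Integer> skipHours, Instant now) {
        int hourUtc = now.atZone(ZoneOffset.UTC).getHour();
        return skipHours.contains(hourUtc);
    }
}
```

Because the comparison is done against the UTC hour of the current instant, the reader's own local timezone never enters into it.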
Posted by: Joseph B. Ottinger on April 24, 2003 11:05 AM
Joe - yes, gzip compression is requested within the HTTP headers. Here's a code snippet from nntp//rss to serve as an example.
// Set the Accept-Encoding header to gzip
httpCon.setRequestProperty("Accept-Encoding", "gzip");
// [snip]
// Connect to server and get response stream
// [snip]
// Check content encoding returned by web server
// If gzip, use GZIPInputStream wrapper to
// decompress
String contentEncoding = httpCon.getContentEncoding();
// Null-safe, case-insensitive match on the content coding
if ("gzip".equalsIgnoreCase(contentEncoding)) {
    responseStream = new GZIPInputStream(responseStream);
}
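For anyone who wants to try the decompression step without a server, here is a self-contained sketch of the same idea. It is not the nntp//rss code: the class GzipBody and its methods decode, gzip and readAll are hypothetical names, and the case-insensitive comparison is a choice made for this sketch. readAllBytes needs Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBody {
    // Wraps the raw response stream only when the server reports gzip
    // in its Content-Encoding header; otherwise returns it untouched.
    static InputStream decode(InputStream raw, String contentEncoding) {
        try {
            if ("gzip".equalsIgnoreCase(contentEncoding)) {
                return new GZIPInputStream(raw);
            }
            return raw;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: compresses text the way a gzip-capable server would.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole (already decoded) body as UTF-8 text.
    static String readAll(InputStream in) {
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real fetch, the stream passed to decode would be httpCon.getInputStream() and the encoding would be httpCon.getContentEncoding(), as in the snippet above. A server that ignores Accept-Encoding simply sends no gzip Content-Encoding, and the stream passes through unchanged.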
I understand and somewhat sympathize with this idea of wanting to "out" the bad RSS readers. If you do, I can assure you that you can put mine up at the top of the list if you like :) I've certainly been frustrated by bad channels more times than I can count because so many embed HTML or have bad characters in their XML or something.
Personally I believe the solution is like this:
1: Build a really big test suite. It contains a dozen or more examples of correct channels and incorrect channels for each major version of RSS (i.e. at least 0.90, 0.91, 0.92, 1.0, 2.0). Everybody fights it out and agrees that these represent something that can be used to test parsers and figure out which ones are accepting things they shouldn't (which encourages bad channels) and not accepting valid channels.
2: The sample channels help the RSS aggregators and their parser libraries get to compliance. To help the generators we need validators that we can all count on. Those have to be generated and I'll argue that they should be in software form and not just a website somewhere that you tell to validate your channel.
3: Encourage building of libraries in all major languages for both the reading and writing of RSS. It's crazy that you can sit down with most languages and write something that talks to a mail server via POP3 or IMAP or HTTP to a web server but RSS, which is now also a major transport protocol for the transfer of news on the Internet, doesn't have similar support. With good libraries you can get people to stop writing their own implementations of parsers and generators and cut the amount of non-compliance way way down.
Most people wouldn't go roll their own SOAP implementation to make some remote procedure calls, but the barrier to entry on parsing or generating RSS is too low. Everybody thinks they have to do it all over again.
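To give a feel for step 1, here is a rough sketch of what a harness around such a suite could look like. Everything in it is an assumption for illustration: the class ChannelSuite, the method names, and above all the stand-in parser, which only checks XML well-formedness and is nowhere near real RSS validation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChannelSuite {
    // Stand-in "parser under test": accepts a channel only if it is
    // well-formed XML. A real suite would plug in an actual RSS parser.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Runs the parser over samples labelled valid (true) or invalid (false)
    // and lists every sample where the parser disagrees with the label.
    static List<String> disagreements(Map<String, Boolean> samples,
                                      Predicate<String> parser) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Boolean> sample : samples.entrySet()) {
            boolean accepted = parser.test(sample.getKey());
            if (accepted != sample.getValue()) {
                out.add((accepted ? "wrongly accepted: " : "wrongly rejected: ")
                        + sample.getKey());
            }
        }
        return out;
    }
}
```

A parser that passes would return an empty list from disagreements(samples, ChannelSuite::isWellFormed). "Wrongly accepted" entries are the ones that encourage bad channels; "wrongly rejected" entries are valid channels the parser can't read.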
Posted by: John Munsch on April 25, 2003 11:41 AM






