Incident Report
X040901
Category: Incorrect Data - Degraded Functionality
Incident ID: X040901
Priority: 5 - Annoying
Status: Picking Up Loose Ends
Subject: Atom Site Feeds, Blogger 5.15
Repaired in: 2004-09-09 weekly build
Assigned To: Dennis Hamilton (analysis)
Reported By: Dennis Hamilton (2004-09-04)
Date Opened: 2004-09-04; 2004-09-07 on Blogger Support
Date Closed: 2004-09-10
1. Summary
2. Remedies
3. Actions
4. Analysis
5. Lessons Learned
see also:
- X040702: Blogger FTP Corruption
- B040801: Blog Slamdown Procedure
- "Your%20Message%20Here" article
- "Honey, Where'd You Put the Bloggo?"
- "A Feed Too Far"
Incident
Orcmid's Lair Site Feed Degradation (X040901)
2004-09-04 When my 09-03 solitary posting was created, Blogger generated a bogus site feed that provides incorrect links for the articles.
Having made a template change that went into effect at the same time, I am not sure what the contributory karma is. I do notice, however, that the same thing just happened on the Google Blog and that it happened with Nancy White's Blog on her last post Thursday evening. There is no lockdown, but I have declared an incident on the Atom feed at Orcmid's Lair and I am not posting anywhere else until this is resolved enough.
[dh:2004-09-05T08:16Z The change in template is eliminated as the cause. The Atom feed element that has changed is entirely controlled by Blogger.com. This change apparently occurred sometime on Thursday, September 2, and it should affect the feeds of all Blogger.com-intermediated blogs that have had new postings since then. There is no indication that this is a known problem.]
[dh:2004-09-06T18:00Z The degradation is still present and there is no indication that this is a known problem at the Blogger site. I have turned in a support message and pointed to the Google Blog and my latest notes on Incident X040901.]
[dh:2004-09-10T15:11Z The Atom feeds now seem to be operating as before, based on the results for this posting, last night. This incident was reported on the Blogger Support page on 2004-09-07 and I managed to miss it somehow. So the cool thing is that a systematic process seems to be in place and working (and Blogger is used in the support process).]
[dh:2004-09-11T05:55Z The feeds are being generated correctly and I have provoked recreation of as much of the feeds as I am able.]
Sometime on Thursday, September 2, Blogger.com changed the way that it links to articles from the Atom feed. Instead of providing the permalink to the post of the article, the site feed is now providing the URL of the archive page (not even the article within the archive page, which would be workable).
On Tuesday, September 7, the incident was recognized on the Blogger Support page and my first feed update on Thursday, September 9, seems to be working.
I am closing out this incident Friday, September 10, with only loose ends to be cleaned up in the documentation for possible future reference.
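A minimal sketch of how the degraded links described above could be spotted mechanically. It assumes the feed uses the Atom 0.3 namespace and that an archive-page URL can be recognized by "_archive" in its path with no #fragment selecting the article; that URL pattern, the example.com addresses, and the function names are illustrative assumptions, not taken from the actual feeds.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM03 = "{http://purl.org/atom/ns#}"  # Atom 0.3 namespace


def alternate_links(feed_xml):
    """Return (title, href) for each entry's rel="alternate" link."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for entry in root.findall(ATOM03 + "entry"):
        title = entry.findtext(ATOM03 + "title", default="")
        for link in entry.findall(ATOM03 + "link"):
            if link.get("rel") == "alternate":
                pairs.append((title, link.get("href", "")))
    return pairs


def suspicious_entries(feed_xml):
    """Flag entries whose link looks like a bare archive page.

    Assumed heuristic: the path contains "_archive" and there is no
    #fragment that would select the article within the page.
    """
    flagged = []
    for title, href in alternate_links(feed_xml):
        parts = urlparse(href)
        if "_archive" in parts.path and not parts.fragment:
            flagged.append((title, href))
    return flagged


# Hypothetical two-entry feed: one permalink, one bare archive-page link.
SAMPLE = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>Example Blog</title>
  <entry>
    <title>Good entry</title>
    <link rel="alternate" type="text/html"
          href="http://example.com/blog/2004/09/good-entry.html"/>
  </entry>
  <entry>
    <title>Degraded entry</title>
    <link rel="alternate" type="text/html"
          href="http://example.com/blog/2004_09_01_archive.html"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    for title, href in suspicious_entries(SAMPLE):
        print(title, "->", href)
```

Running the same check against saved copies of other sites' feeds would be one way to confirm that the change is not specific to a single blog.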
There is more provisional material here until the analysis and actions are formulated better.
1. Put a post on Orcmid's Lair to the effect that we are proceeding under degraded operation. Do this before any new posts on each feed, so that subscribers will see the notice in the feed and on the site. These will stay up until there is an all-clear on the situation, or it is clear that the situation will not be remedied.
2. Notify Google support of the incident and of where they can find out more, and document that the new behavior does not seem to be correct according to the Atom 0.3 specification that they identify their feed as conforming to. [I need to double-check all of that.]
3. Wait for the repair to be available. Repost my feed if appropriate, so subscribers will have proper permalinks in their aggregated materials. I could have manually corrected the feed at once, and I didn't think to do that.
4. Trigger regeneration of the feeds after Blogger deploys a corrected version of the feed generator.
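The manual correction mentioned in remedy 3 could look roughly like the sketch below. It assumes each entry carries an Atom 0.3 id element and that a table of the correct permalinks is available from somewhere else (here a plain dictionary); the ids, URLs, and the correct_links name are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM03_NS = "http://purl.org/atom/ns#"  # Atom 0.3 namespace
ATOM03 = "{" + ATOM03_NS + "}"


def correct_links(feed_xml, permalinks):
    """Return feed XML with alternate links replaced from `permalinks`.

    `permalinks` maps an entry <id> value to the permalink that should
    appear in that entry's rel="alternate" link. Entries that are not in
    the map are left untouched.
    """
    # Serialize Atom elements without a prefix, as in the original feed.
    ET.register_namespace("", ATOM03_NS)
    root = ET.fromstring(feed_xml)
    for entry in root.findall(ATOM03 + "entry"):
        entry_id = entry.findtext(ATOM03 + "id")
        if entry_id not in permalinks:
            continue
        for link in entry.findall(ATOM03 + "link"):
            if link.get("rel") == "alternate":
                link.set("href", permalinks[entry_id])
    return ET.tostring(root, encoding="unicode")


# Hypothetical degraded feed: both entries point at the archive page.
DEGRADED = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <entry>
    <id>tag:example.com,2004:post-1</id>
    <link rel="alternate" type="text/html"
          href="http://example.com/blog/2004_09_01_archive.html"/>
  </entry>
  <entry>
    <id>tag:example.com,2004:post-2</id>
    <link rel="alternate" type="text/html"
          href="http://example.com/blog/2004_09_01_archive.html"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    fixes = {"tag:example.com,2004:post-1":
             "http://example.com/blog/2004/09/post-1.html"}
    print(correct_links(DEGRADED, fixes))
```

A hand-corrected feed of this kind would only last until the next publish regenerates the feed, so it is a stopgap rather than a repair.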
2004-09-04-09:49 Back up the site onto the compagno Orcmid's Lair image.
2004-09-04-09:49 Swap the lair-atom.standby.xml message in place of the degraded lair-atom.xml site feed. This protects the degraded feed from being accessed until we know more and are willing to operate it again.
2004-09-04-10:07 Create a version of lair-atom.standby.xml that has a current time stamp, and place it at lair-atom.xml. This allows the standby notice to be picked up in subscriptions as new. Concurrently post a version of site status that reflects the existence of an incident.
2004-09-04-13:57 Capture the incident-related materials so that they are preserved independent of further changes to the feeds and pages from Blogger.
2004-09-04-23:57 Develop notes and make sure I have all of the materials for illustrating the notes.
2004-09-05-01:21 Post 0.00 provisional version of this X040901 report and update the status page with what we think we know and with a link to this incident report.
2004-09-06-10:59 Complete the notes and bring the analysis to an 0.10 condition, showing the exact details that apply to the feed. Report to Blogger Support.
2004-09-06-12:19 Post an article on Orcmid's Lair that asserts degraded feed operation (if we still have it) and do this on each blog before we post our next articles, until the situation is resolved.
Write Expanded Description with Lessons Learned. Include the observations about Google's site and Nancy White's site as confirmation. Comment on Nancy's Blogger recovery article.
Test the incident on Spanner Wingnut to keep probing for any changes. Declare a test and then post to see when things change. An easy alternative will be to watch for Google Blog to come back. [I didn't do that. I did see that repaired feeds began to appear on 2004-09-09.]
2004-09-06 to 2004-09-09 Await resolution. Don't post so much that the feed cannot be recovered adequately.
2004-09-10 Restore Correct Feed Entries. To have the next accesses to my Atom Feed provide corrected entries for all incorrectly-fed ones, I am going to force recreation of a feed.
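The 10:07 action in the log above (a copy of lair-atom.standby.xml with a current time stamp, placed at lair-atom.xml) could be sketched as follows. This assumes the standby notice carries its time stamps in Atom 0.3 modified elements and that a plain text substitution is good enough for a small hand-written file; the function names are hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def stamp_standby(standby_xml, now=None):
    """Return the standby feed text with every <modified> value set to `now`.

    `now` should be a timezone-aware datetime; it defaults to the current
    UTC time. Assumption: a regex substitution is adequate for a small
    hand-maintained notice, so no XML parsing is done here.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return re.sub(r"<modified>[^<]*</modified>",
                  "<modified>" + stamp + "</modified>",
                  standby_xml)


def publish_standby(standby_path, live_path, now=None):
    """Write a freshly stamped copy of the standby notice over the live feed."""
    text = Path(standby_path).read_text(encoding="utf-8")
    Path(live_path).write_text(stamp_standby(text, now), encoding="utf-8")


if __name__ == "__main__":
    notice = "<feed><modified>2004-01-01T00:00:00Z</modified></feed>"
    print(stamp_standby(notice))
    # On the site itself the call would be something like:
    # publish_standby("lair-atom.standby.xml", "lair-atom.xml")
```

Keeping the standby file itself unchanged and stamping only the published copy means the same notice can be reused for a later incident.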
[replace with more stuff]
Nancy's Blog is at http://www.fullcirc.com/weblog/onfacblog.htm
Google's Blog is at http://www.google.com/googleblog/
I also noticed the problem when there was a new post on LogBlob.
I have an initial summary of the situation here.
[notes to be developed further]
When the red lights go on [reference to Ross Anderson passage about that.]
Parts of the slam-down that worked, other parts that were pokey
Remembering to capture the data, including the evidence of differences that confirms the incident. Do this more quickly. Capture the feeds demonstrating that the incident applied to other sites.
Example of feed differences important to achieve. Archiving threat tree needed and it leaves the feed exposed unless I back it up more often than the site. Separate incident analysis?
Example of aggregator presentation differences
Identification of version changes in feeds so that trouble-shooting can be matched against change levels - lesson for Blogger, lesson for me
Identification of version of Atom is fine, and there needs to be identification of an implementation profile somehow, and access to that profile as well - lesson for me as a developer. [2004-09-09-09:04 On my end, there needs to be identification of the specific versions of software used. That's not so important this time but it matters in other cases. Look at what it takes to know configurations and maintain that information easily.]
Don't slip stream something that has potential to surprise people. Don't slip stream something that may impact the boss or embarrass the management and then go away for a long weekend! [I really do need to implement a manual feed separate from the Wingnut site feed for experimentation with the particular case of feed-variation sensitivity in various aggregators.]
There's a big lesson for me in being attentive to when I am operating in the solution space and have lost sight of the problem space and the impact of solutions on the problem space. Importance of change management review and some provisions for testing changes. I don't know what solitary practices can help here - it really requires a checklist and multiple eye-balls and thinking to be pulled back up to a solution perspective, I think. [Take some of this over to the note on interaction design and Alan Cooper's trade classifications.]
[dh:2004-09-06-10:11 There is also something to be said about interoperability, established expectations, and whatever any current specifications might or might not say on the matter.]
[dh:2004-09-10-09:39 I suppose it was all right to wait until the next build to put in the fix, although it makes more work for me to ensure that my feed entries out there are corrected (as much as I can handle that from this end). A lesson for me is that (1) there must be regression and (2) there always needs to be a way to roll back any change that is pushed out. This leads to (3) there being problems about changes tied to builds where a roll-back may also remove a critical fix as well as undo an introduced problem. I don't have an answer for that, though it looks like an architectural question as much as a system engineering one.]
[dh:2004-09-10-22:32 There is no good way to tell that a feed update is one that will drive out bad entries and replace them with improved ones. It is necessary to examine the feeds themselves, as XML, and also see what happens in practice. This is an area where different testing is needed, along with some kind of regression methodology.]
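One way to examine the feeds as XML, as the last note suggests, is to compare a saved snapshot of the degraded feed against the regenerated one, entry by entry. The sketch below is only an illustration under assumptions: entries are matched by their Atom 0.3 id element, and the make_feed helper and example.com values exist purely for the demonstration. It says nothing about what an aggregator will actually do with the update, which still has to be observed in practice.

```python
import xml.etree.ElementTree as ET

ATOM03 = "{http://purl.org/atom/ns#}"  # Atom 0.3 namespace


def links_by_id(feed_xml):
    """Map each entry's <id> to its rel="alternate" href."""
    result = {}
    for entry in ET.fromstring(feed_xml).findall(ATOM03 + "entry"):
        entry_id = entry.findtext(ATOM03 + "id")
        for link in entry.findall(ATOM03 + "link"):
            if link.get("rel") == "alternate":
                result[entry_id] = link.get("href")
    return result


def compare_snapshots(before_xml, after_xml):
    """Report which entries changed, stayed the same, dropped out, or appeared."""
    before, after = links_by_id(before_xml), links_by_id(after_xml)
    report = {"changed": {}, "unchanged": [], "dropped": [], "added": []}
    for entry_id, old_href in before.items():
        if entry_id not in after:
            report["dropped"].append(entry_id)
        elif after[entry_id] != old_href:
            report["changed"][entry_id] = (old_href, after[entry_id])
        else:
            report["unchanged"].append(entry_id)
    report["added"] = [i for i in after if i not in before]
    return report


def make_feed(entries):
    """Demo helper (hypothetical): build a tiny Atom 0.3 feed from (id, href) pairs."""
    body = "".join(
        '<entry><id>%s</id><link rel="alternate" type="text/html" href="%s"/></entry>'
        % pair for pair in entries)
    return '<feed version="0.3" xmlns="http://purl.org/atom/ns#">%s</feed>' % body


if __name__ == "__main__":
    archive = "http://example.com/blog/2004_09_01_archive.html"
    before = make_feed([("post-1", archive), ("post-2", archive)])
    after = make_feed([("post-2", "http://example.com/blog/2004/09/post-2.html"),
                       ("post-3", "http://example.com/blog/2004/09/post-3.html")])
    print(compare_snapshots(before, after))
```

Entries reported as dropped are the ones a regeneration can no longer correct from this end, because they have already scrolled out of the feed.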
created 2004-09-04-11:02 -0700 (pdt) by orcmid