BlunderLab
Notebook
B040801 |
- see also:
- X040702C: Incident Response Setup
Some incidents that occur with web logs on nfoCentrale may involve corruption of material on the site. The procedure here is for responding to such an incident as rapidly and simply as possible. The idea is to
Quickly capture everything needed for forensic analysis.
Rapidly remove any corrupted material from being accessible to readers of the blogs.
Post simple notices in place of all corrupted material, blocking off the accident scene and showing that incident response is underway.
That is as far as the slamdown procedure takes us. It secures the scene so that additional procedures can follow:
Initiate enough incident analysis to determine how to recover.
Clear away the damage, restore the blog, and resume operation.
Continue incident analysis enough to identify the root cause.
Review lessons learned.
Revise operating procedures to reflect lessons learned.
The first step is to prevent FTP access from Blogger.com to the blog files when
the incident involves damage or unexpected changes produced by Blogger.com, or
there is any other reason to suspect that Blogger.com has lost operational integrity.
1.1 Total Lock-Out
If there is any concern that continued FTP access of Blogger.com to the blog is unsafe, the simplest action is to revoke all FTP privileges to the account set up for Blogger.com usage.
So long as there is a total lock-out, no blog postings of any kind will occur:
comments will not be accepted
no pages can be updated on the blog
The only new postings to the blog will be restored files and new ones created as part of the incident response. These postings are performed by the web-site administrator, not Blogger.com.
1.2 Selective Lock-Out
The access control to different blog directories can be updated to selectively remove authorizations of the blogger account to perform FTP access. This allows a total lock-out to be rescinded with some portions of the blog sites still shut down while incident resolution continues.
- variation:
- Depending on the nature of the damage and opportunities for action, it may be desirable to perform steps 3 and 4 before taking this action.
Generally, a complete current backup is made. This captures any damaged material and also any material that may not have been backed up at this point.
Because nfoCentrale backup is done onto a site image that is also versioned in a Visual SourceSafe repository, we will have the last known good values of every page as well as the latest values of every page.
This is done quickly by
Checking out all of the VSS pages for the blog image into the image of the site
FTP transfer of all hosted-site blog pages to the blog image on the development system
Check in of the pages that were checked out (updating those that have changed)
Check in of new pages from the site that had not been transferred to the blog image before.
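The check-in decisions in the last two steps amount to comparing the pages fetched from the hosted site against the site image. The following is a minimal Python sketch of that comparison, not the tooling actually used here. The function names and the in-memory dictionaries are hypothetical stand-ins for the two directory trees, and the FTP transfer and VSS operations themselves are left out.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of one page's bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_checkin(image_pages: dict, site_pages: dict) -> dict:
    """Classify pages fetched from the hosted site against the site image.

    Both arguments map a relative page path to that page's bytes
    (a hypothetical stand-in for the two directory trees).
    """
    plan = {"changed": [], "new": [], "unchanged": []}
    for path in sorted(site_pages):
        if path not in image_pages:
            # Page was never transferred to the blog image before.
            plan["new"].append(path)
        elif digest(site_pages[path]) != digest(image_pages[path]):
            # Page differs from the last checked-in version.
            plan["changed"].append(path)
        else:
            plan["unchanged"].append(path)
    return plan
```

The "changed" list is also a first indication of where damage may have occurred, since those are the pages that differ from the last known good versions.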
If the Atom Site feed is itself corrupted or is suspected of providing references to corrupted material, the simple immediate action is to slam down an incident notice into the site feed:
Check out the atom feed page in the site image.
Copy a generic notice over the site image version, touching the date as needed to have the feed be more recent than the last-posted one.
FTP the copy to the hosted site so that no feed downloads will retrieve any damaged material.
Generally do not check in the feed with the incident notice. This is a temporary expedient until the feed can be restored from an earlier, undamaged version.
This is always done first to avoid syndication of damaged or incorrect material.
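As an illustration of what a generic feed notice might look like, here is a minimal Python sketch that builds a one-entry feed whose timestamp is later than the last-posted one. It assumes Atom 1.0 element names (RFC 4287); a feed in an earlier Atom format would use different names. The function name, parameters, and sample values are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace


def incident_feed(title, feed_id, last_posted, notice, author):
    """Build a one-entry notice feed dated later than last_posted.

    last_posted must be a timezone-aware datetime.
    """
    now = datetime.now(timezone.utc)
    # "Touch" the date: never earlier than one second after the last post.
    updated = max(now, last_posted + timedelta(seconds=1))
    stamp = updated.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    ET.register_namespace("", ATOM_NS)

    def add(parent, tag, text=None):
        element = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}")
        element.text = text
        return element

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    add(feed, "title", title)
    add(feed, "id", feed_id)
    add(feed, "updated", stamp)
    add(add(feed, "author"), "name", author)

    entry = add(feed, "entry")
    add(entry, "title", notice)
    add(entry, "id", f"{feed_id}:incident:{stamp}")
    add(entry, "updated", stamp)
    add(entry, "summary", notice)
    return ET.tostring(feed, encoding="unicode")
```

The result contains no links to any blog page, so a reader who downloads the feed during the incident retrieves only the notice.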
There are generic incident markers to be put down as follows:
The Web Log Status page is replaced with an incident marker so that anyone who checks that page at this time will see that there is an incident being addressed even though there is no other information. This is accomplished by copying the incident-response-in-progress version atop the current version as easily and quickly as possible. See B040801C: Web Log Status Page.
The default web log page and any damaged or failed post pages are overlain in the same way. Each page is checked out, if it already has material in the site image, and then a generic incident-scene marker is put down on top in its place.
The same is done for any archive pages that are corrupted or missing.
All of the incident markers are uploaded to the hosted site via FTP.
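Putting the markers down and uploading them can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and paths are hypothetical, any page already in the site image is assumed to have been checked out (and so is writable), and the remote directories are assumed to exist on the hosted site. The upload uses `storbinary` from the standard `ftplib` module.

```python
import shutil
from pathlib import Path


def put_down_markers(marker: Path, image_root: Path, damaged: list) -> list:
    """Copy the generic marker over each listed page in the site image.

    Returns the paths written, which is the list to upload.
    """
    written = []
    for rel in damaged:
        target = image_root / rel
        # A missing page may not have a directory in the image yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(marker, target)
        written.append(target)
    return written


def upload_markers(ftp, image_root: Path, written: list) -> None:
    """Send each marker to the hosted site; ftp is a connected ftplib.FTP."""
    for target in written:
        remote = target.relative_to(image_root).as_posix()
        with open(target, "rb") as handle:
            # Assumes the remote directory already exists.
            ftp.storbinary(f"STOR {remote}", handle)
```

Keeping the copy and the upload separate lets the administrator inspect the site image before anything is sent to the hosted site.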
At this point, there should be no damaged material in the hosted blog, with no broken links into or out of that blog. No one is exposed to damage any longer. Now we must clear the scene so that recovery and forensic analysis can both occur.
- variation:
- Make sure that backups of material required for forensic analysis have been made before this process begins. Also, depending on how hastily markers were placed in Web Log Status and the Site Feed, an early step will be to replace those emergency measures with more informative substitutes.
It is now necessary to arrange everything so that restoration of operation and resolution of the incident can proceed as independently as possible:
A new incident is opened, unless there is an existing incident which applies to the current situation. Enough initial information is placed in the incident report to provide some indication of what happened and what the next steps are.
The slam of the Web Log Status page is revoked and a new checkout initiated (on the development site). The Web Log Status is updated to reflect the interruption and tie the new incident report to the status summary and status history.
The new material is checked in, transferred to the site image, and posted via FTP to the hosted site. There is now initial information on the condition of the site and the incident being worked on.
The slam of the atom feed done in 3(1) is revoked. If a damaged atom feed was produced as part of the incident, it is branched from the backup and sequestered with the incident material. Then the original branch is rolled back to a usable version, an incident notice is inserted manually into the feed, and the result is used to restore the site feed.
In the same way, the checkout of each place where an incident marker is put down is rescinded, any damaged material is branched over to the incident material, and the production version rolled back. (If there is only the incident marker, that becomes the production version instead.) The restored earlier versions will incorporate incident information and indicators of their restored status as appropriate.
At the end of this process, all damaged materials have been captured as part of the incident materials and the production site has been restored to a combination of rolled-back-to earlier materials and incident markers.
Now any further incident analysis and recovery effort is undertaken.
created 2004-08-03-15:47 -0700 (pdt) by orcmid |