We need corrupt STDFs

One of the most powerful features of LinqToStdf is pluggable corruption detection and recovery. STDF file corruption has always been a big issue, for many reasons. Whether through equipment malfunction, unreliable networks, aborted transfers, or stupidity :), STDF files get corrupted. A frustration of mine is that most STDF tools do not handle corrupt files well, and most "repair" tools give you little or no indication of what was wrong or what was done to fix it.

LinqToStdf lets you plug in "seek algorithms" (StdfFile.AddSeekAlgorithm), which are delegates that consume a stream of bytes and look for recognizable patterns. If corruption is detected during record parsing (via a record filter, for example), you can call StdfFile.RewindAndSeek(). This rewinds the underlying file stream to the "last known good offset" (just after the last successful record conversion) and puts the parser in seek mode, where it advances byte by byte and passes each byte to the SeekAlgorithm chain.

If a SeekAlgorithm thinks it has figured out what happened and found a valid record, it can tell the StdfFile to back up and resume parsing there. Any bytes between the last known good offset and the new position are pushed through the record stream as a CorruptDataRecord, which can be dumped out to analyze what went wrong.
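The rewind-and-seek flow described above can be sketched in a few lines. This is a standalone Python illustration, not the LinqToStdf API: the names `SeekAlgorithm`, `make_pattern_seeker`, and `rewind_and_seek` are hypothetical, and the assumption that a seek algorithm reports "how many bytes to back up" is mine, chosen to keep the example small.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical shape of a seek algorithm: it is called once per byte and
# returns the number of bytes to back up when it recognizes the start of a
# record, or None if it has seen nothing convincing yet.
SeekAlgorithm = Callable[[int], Optional[int]]


def make_pattern_seeker(pattern: bytes) -> SeekAlgorithm:
    """Build a seeker that fires when the last bytes seen equal `pattern`."""
    window = bytearray()

    def feed(byte: int) -> Optional[int]:
        window.append(byte)
        if len(window) > len(pattern):
            del window[0]  # keep only the most recent len(pattern) bytes
        if bytes(window) == pattern:
            return len(pattern)  # back up to the first byte of the pattern
        return None

    return feed


def rewind_and_seek(
    data: bytes, last_good: int, algorithms: List[SeekAlgorithm]
) -> Tuple[bytes, Optional[int]]:
    """Scan forward from the last known good offset, one byte at a time.

    Returns (corrupt_bytes, resume_offset). resume_offset is None when no
    algorithm found a place to resume before the data ran out.
    """
    pos = last_good
    while pos < len(data):
        byte = data[pos]
        pos += 1
        for algorithm in algorithms:
            backup = algorithm(byte)
            if backup is not None:
                resume = pos - backup
                # Everything skipped is reported, not silently dropped.
                return data[last_good:resume], resume
    return data[last_good:], None
```

The point of returning the skipped bytes alongside the resume offset is the same as the CorruptDataRecord: the caller sees exactly what was thrown away and where parsing picked up again.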

We currently have one built-in SeekAlgorithm, which looks for PIRs. PIRs are fixed-length records with no optional fields, and as such have a constant 4-byte header (adjusted for endianness). This gives fairly high confidence that when you see it, it's actually a PIR and not something else. I'd love for people to increase the number of these built-in seek algorithms based on real-world scenarios, though.
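To make the "constant 4-byte header" concrete: an STDF record header is a 2-byte record length followed by a 1-byte record type and a 1-byte record subtype, and a PIR is type 5, subtype 10 with a 2-byte body (head number and site number). The Python sketch below builds that header for either byte order and searches a buffer for it. It is an illustration only, not the built-in LinqToStdf algorithm, and the function names `pir_header` and `find_pir` are made up for this example.

```python
import struct

PIR_REC_LEN = 2   # body is HEAD_NUM (1 byte) + SITE_NUM (1 byte)
PIR_REC_TYP = 5
PIR_REC_SUB = 10


def pir_header(little_endian: bool = True) -> bytes:
    """Return the 4 header bytes of a PIR for the given byte order.

    Only the 2-byte length field is affected by endianness.
    """
    order = "<" if little_endian else ">"
    return struct.pack(order + "HBB", PIR_REC_LEN, PIR_REC_TYP, PIR_REC_SUB)


def find_pir(data: bytes, start: int = 0, little_endian: bool = True) -> int:
    """Return the offset of the next PIR header at or after `start`, or -1."""
    return data.find(pir_header(little_endian), start)
```

Four fixed bytes is a reasonably strong signature, but it can still turn up by chance inside other data, so a real algorithm would probably want to sanity-check the records that follow before committing to the new position.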

This works great with "manually corrupted" files, but we're looking for real-world corruption to validate the approach. If you have any corrupt STDFs, let us know!

Last edited May 24, 2008 at 8:34 PM by marklio, version 1

