The basic idea (ignoring the corruption detection and recovery algorithms for now) is that the main parser simply reads the file and generates UnknownRecords, which are just the RecordHeader and the raw bytes of the record. These are passed off to a "ConverterFactory", which has converters that know how to turn unknown records into real ones based on the header. The converters are generated on the fly from attributes on the registered record types, or written directly in code for "odd" record types that can't be described efficiently with attributes. The default is to register the V4 spec, although you can register any records you like.

The resulting records are run through a series of "filters". Filters can do anything, but several are built in for things like enforcing the V4 record ordering, generating "missing" summary records, disallowing unknown records, etc.

The model for errors is flexible. The default is to simply throw an exception when an error occurs, but there are options ranging from pushing "error records" through the record stream to a pluggable system for detecting corruption and re-establishing the record stream.

In addition, there are a number of extension methods on the records that give "structure" to the record stream. For instance, you can get all the Prrs for a wafer, or all the Ptrs for a Prr, etc. All of these return IEnumerable<SomeRecordType>, so they are easily queried via LINQ syntax.

Caching allows you to "requery" the same StdfFile multiple times without re-reading the file. Caching is implemented as a filter that applies itself at the top of the filter chain when it is turned on (so any other registered filters are applied only the first time through the file).
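
To make the querying model concrete, here is a minimal sketch of opening a file and pulling records out with LINQ. It assumes an entry point along the lines of new StdfFile(path) with a GetRecords() method returning the record stream, and V4 record classes like Mir and Prr; the specific property names (LotId, PartId, HardBin) are illustrative assumptions rather than a definitive API reference.

```csharp
using System;
using System.Linq;
using LinqToStdf;
using LinqToStdf.Records.V4;

class QuerySketch
{
    static void Main(string[] args)
    {
        // Records are parsed lazily as the stream is enumerated,
        // passing through the converter factory and any registered filters.
        var stdf = new StdfFile(args[0]);

        // Grab the MIR to identify the lot.
        var mir = stdf.GetRecords().OfType<Mir>().First();
        Console.WriteLine("Lot: {0}", mir.LotId);

        // Ordinary LINQ syntax over the record stream: all failing parts.
        var failingParts = from prr in stdf.GetRecords().OfType<Prr>()
                           where prr.HardBin != 1
                           select prr;

        foreach (var prr in failingParts)
        {
            Console.WriteLine("Part {0} -> hard bin {1}", prr.PartId, prr.HardBin);
        }
    }
}
```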

The architecture leads to extremely fast and efficient processing of STDF files, and the library can be tuned to run efficiently in just about every scenario I could imagine. For example, caching can make complex or repeated queries more efficient, but if you're just aggregating data or passing it off to another system, you can turn caching off and process huge STDF files without your memory footprint even hiccuping.
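
As a rough illustration of the streaming case, this sketch counts failing parts in a single pass. The caching switch shown in the comment is a hypothetical name, not necessarily the library's actual option; the point is simply that with caching off, the query streams and the parsed records are never accumulated in memory.

```csharp
using System;
using System.Linq;
using LinqToStdf;
using LinqToStdf.Records.V4;

class YieldSketch
{
    static void Main(string[] args)
    {
        var stdf = new StdfFile(args[0]);
        // stdf.EnableCaching = false; // hypothetical name for the caching switch

        // Single streaming pass over the record stream: count failing parts
        // without ever holding the parsed records in memory.
        var failCount = stdf.GetRecords()
                            .OfType<Prr>()
                            .Count(prr => prr.HardBin != 1);

        Console.WriteLine("Failing parts: {0}", failCount);
    }
}
```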

In addition, there is an experimental feature for "pre-compiling" a query. The query is examined at runtime, and only the record types and fields needed to execute it are parsed. In other words, custom record converters are generated, and everything else turns into "skip" operations in the stream. This can result in amazingly fast parse times, especially if you're doing something like grabbing "summary" data. With a normal parsing library, you'd parse the entire file; with a precompiled query, you can hit the MIR, skip all the way to the end, and pick up the summary data. It's REALLY fast compared to any STDF processing software I've ever seen.
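
The precompilation API itself is experimental and not shown here, but the kind of query it pays off for is a summary-only pass like the sketch below, which touches nothing but the MIR and the part count record. The Pcr class and its GoodCount/PartCount properties are assumptions about the V4 record surface.

```csharp
using System;
using System.Linq;
using LinqToStdf;
using LinqToStdf.Records.V4;

class SummarySketch
{
    static void Main(string[] args)
    {
        var stdf = new StdfFile(args[0]);

        // Only two record types are touched here. A precompiled version of this
        // query could generate converters for Mir and Pcr alone and turn every
        // other record in the file into a cheap "skip".
        var mir = stdf.GetRecords().OfType<Mir>().First();
        var summary = stdf.GetRecords().OfType<Pcr>().Last();

        Console.WriteLine("Lot {0}: {1} good of {2} parts",
                          mir.LotId, summary.GoodCount, summary.PartCount);
    }
}
```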

The library also has support for generating STDF files, especially as the result of processing existing files. It can even consume a continuous stream of records and generate multiple files in a destination directory from it.
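
As a sketch of the generation side (assuming a writer type along the lines of StdfFileWriter with a WriteRecords method; treat those names as assumptions), this copies an existing file to a new one while stripping the datalog text records:

```csharp
using System.Linq;
using LinqToStdf;
using LinqToStdf.Records.V4;

class CopySketch
{
    static void Main(string[] args)
    {
        var input = new StdfFile(args[0]);

        // Writer type and member names are assumptions for illustration.
        using (var writer = new StdfFileWriter(args[1]))
        {
            // Copy everything except datalog text records (Dtr) to the output file.
            writer.WriteRecords(input.GetRecords().Where(r => !(r is Dtr)));
        }
    }
}
```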
