
Merging STDF files

Oct 24, 2008 at 1:29 PM
Hi,
I guess I will be the second one to start a discussion.
First off, excellent job on this project. In less than a day I've made an application that creates histograms of STDF data based on test numbers.
The speed was mostly down to how easy the library makes it to get at the data.

Now for the question: I have multiple stdf files that each have one run of information. I want to combine them so that I can do cumulative data analysis.
I am not that adept a LINQ user, but could you recommend a strategy or point me in the direction of some documentation?
Thanks for your help!
Eric
Oct 24, 2008 at 5:52 PM

I'm glad you're finding yourself so productive with the library!  That is definitely one of the goals.

As for combining the data, there are lots of options that might suit your scenario.  The simplest would be to make a List<StdfFile> (or other IEnumerable<StdfFile> implementation) containing all the files you want to combine, then do a compound query on it.  Something like the following for getting test result data for histograms:

var files = new List<StdfFile>();
// fill files with the StdfFiles you want to combine
var results = from f in files
              from ptr in f.GetRecords().OfExactType<Ptr>()
              where ptr.TestNumber == 1000
              select ptr.Result;

The downside of this approach is that you'd have all that data in memory, which might be a lot. There are some optimizations you could do, like running the queries "inside-out": get the data out of each file separately, then combine it afterwards with another query (see the sketch below).
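For example, something like this only materializes one file's results at a time as the combined sequence is enumerated (a sketch, reusing the files list above; what you keep per file depends on your analysis):

var perFile = files.Select(f =>
    (from ptr in f.GetRecords().OfExactType<Ptr>()
     where ptr.TestNumber == 1000
     select ptr.Result).ToArray());

// Flatten the per-file arrays into one sequence for analysis.
var combined = perFile.SelectMany(results => results);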

We're working on a new feature that will analyze queries, and optimize them for speed and memory usage.  I'd much rather be able to write a straightforward query like above and let the library do the optimization than have to play optimization games myself.

I hope you find this helpful.

Mark

Oct 24, 2008 at 5:55 PM
Thanks Mark, I'll give it a try and post the results.
Oct 25, 2008 at 5:11 PM
I do a similar thing to what Mark suggested.
It seems to work well and is quite fast.  This implementation doesn't keep all the data in memory.

I created a simple Histogram class that takes an array as a parameter and updates its buckets accordingly.
The Histogram is a Dictionary, with the key being the bucket and the value being the count.

I use a Linq query on the dictionary to order the counts at the end and then plot the results.
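For reference, a minimal sketch of such a Histogram class (the fixed bucket width and the floor-based bucketing are my choices, and I'm assuming Ptr.Result is a nullable float, so adjust to taste):

using System;
using System.Collections.Generic;

public class Histogram
{
    private readonly double _bucketWidth;
    private readonly Dictionary<double, int> _buckets = new Dictionary<double, int>();

    public Histogram(double bucketWidth)
    {
        _bucketWidth = bucketWidth;
    }

    // The bucket -> count mapping the plotting query runs over.
    public Dictionary<double, int> Buckets
    {
        get { return _buckets; }
    }

    // Takes an array of results and updates the buckets accordingly.
    public void Add(float?[] results)
    {
        foreach (var r in results)
        {
            if (r == null) continue; // skip PTRs with no result
            var bucket = Math.Floor(r.Value / _bucketWidth) * _bucketWidth;
            int count;
            _buckets.TryGetValue(bucket, out count);
            _buckets[bucket] = count + 1;
        }
    }
}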

Here is the basic code snippet without error trapping ....

foreach (string fn in stdfOpenDialogue.FileNames)
{
    stdf = new StdfFile(fn);
    stdf.EnableCaching = false;

    var results = (from ptr in stdf.GetRecords().OfExactType<Ptr>()
                   where ptr.TestNumber == tn
                   select ptr.Result).ToArray();

    histogram.Add(results);
}


var items = from k in histogram.Buckets.Keys
            orderby k ascending
            select k;

// Display distribution chart ...

Hope that helps a little.

Oct 28, 2008 at 2:03 AM
Thanks to Rob and Mark for all the ideas. I ended up closer to Rob's model, but both work very well.
My application first queries all the selected files for a list of test numbers; the user can then select a test number to see statistics and a histogram plot of the results.
I used NPlot for the graphing.
I am making a design verification tool that I will probably turn into a manufacturing quality gate tool, so development will continue.

I was thinking it might make sense to create a single stdf file with merged data from multiple stdf files.
It looks like the linq stdf library supports this, but I didn't see any examples or recommended approaches. Any ideas would be appreciated.

I am concerned about creating files too large to be used. Is there a max recommended file size for stdf files? 

Thanks for your help!
Eric
Oct 28, 2008 at 4:13 AM
Hi Eric,
  It seems we are working on very similar tools. 
I have been using ZedGraph for my plotting, but I am now looking at the Nevron .Net Chart controls - commercial, but they do give me the ability to create some very nice and very fast charts.

With regards to merging stdf files; how large are your files?
My (uncompressed) files vary from 10Meg to 300Meg each.

I prefer to work with individual files myself, but I can see the pros and cons of both individual and merged files.
If you have individual files, you could pull out the TSRs to give the user something to think about whilst processing the rest in the background.

The downside of the iteration over a group of files that I used in the code a couple of posts back is that it gets expensive in time if you want to make several queries on the same files. Mark's suggested approach, and the approach of a merged file, is much better with regards to time but expensive in memory. I guess it all depends upon what you are going to do with the data. For an interactive tool, Mark's proposal of a 'list of StdfFile' wins big time over mine.

Rob
Nov 12, 2008 at 4:21 PM
Hi Rob and Mark,
I've been using my tool on file sizes ranging from 1Meg to 950Meg. For the really large files it is too slow to be used interactively: each query takes on the order of 2 minutes to complete.

So far I have been making separate queries to obtain the test label information and the result data.
I was thinking an optimization might be to create a query that gets just one instance of a test number, so that I can get the description and the lower and upper limits, but I am not exactly sure how to do this. The query I am currently using to get the lower and upper test limits is the following:

                var testNumInfo = from ptr in stdf.GetRecords().OfExactType<Ptr>()
                                  where ptr.TestNumber.Equals(uiSelTestNum)
                                  let hilim = ptr.HighLimit
                                  let lowlim = ptr.LowLimit
                                  let descr = ptr.TestText
                                  select new { hilim, lowlim, descr };

With 100,000 PTR records for one test number, I think this is taking some time to complete. I really only need one instance.

Thanks in advance,
Eric

Nov 12, 2008 at 8:18 PM

Eric,

If you know your limits and test text don't change, then you can simply call .First() or .FirstOrDefault() on the query above. It will "stop" after the first match (FirstOrDefault() returns null if the query yields no records). Also, your query requires running through the file once for each test. You might instead think about getting the first passing part (because typically passing parts run all tests), getting its child records (GetChildRecords()), and building a lookup from which you can pull all this data out per test, as sketched below.
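A rough sketch of that idea (the PartFlag property and the bit-3 "part failed" check are assumptions on my part; check the Prr record for the library's actual pass/fail surface):

// Find the first passing part. In STDF, PART_FLG bit 3 set means the part
// failed, so bit 3 clear is taken to mean it passed (assumption).
var firstPassingPart = stdf.GetRecords().OfExactType<Prr>()
    .First(prr => (prr.PartFlag & 0x08) == 0);

// Index the part's child PTRs by test number.
var ptrsByTest = firstPassingPart.GetChildRecords()
    .OfExactType<Ptr>()
    .ToLookup(ptr => ptr.TestNumber);

// Limits and test text for any test are then a single lookup away:
// var info = ptrsByTest[uiSelTestNum].First();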

However, depending on why you're getting that data, the STDF spec has an interesting mitigation for limits and TestText (but not all tester platforms follow it).  From the notes on "Default Data" on the PTR from the v4 spec:

All data following the OPT_FLAG field has a special function in the STDF file. The first PTR for each test will have these fields filled in. These values will be the default for each subsequent PTR with the same test number: if a subsequent PTR has a value for one of these fields, it will be used instead of the default, for that one record only; if the field is blank, the default will be used. This method replaces use of the PDR in STDF V3. If the PTR is not associated with a test execution (that is, it contains only default information), bit 4 of the TEST_FLG field must be set, and the PARM_FLG field must be zero.

Unless the default is being overridden, the default data fields should be omitted in order to save space in the file.

Note that RES_SCAL, LLM_SCAL, HLM_SCAL, UNITS, C_RESFMT, C_LLMFMT, and C_HLMFMT are interdependent. If you are overriding the default value of one, make sure that you also make appropriate changes to the others in order to keep them consistent.

For character strings, you can override the default with a null value by setting the string length to 1 and the string itself to a single binary 0.

So, as a result, we have a built-in record filter (BuiltInFilters.PopulatePtrFieldsWithDefaults) you can register with the StdfFile that will "repopulate" every PTR record with the correct values. Since it re-uses the same string instances, there's no significant memory overhead (there is some CPU overhead, but I don't expect it to be significant, especially if it saves you from having to do the above).
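Wiring that up looks something like this (I'm assuming AddFilter is the registration method; check the StdfFile API for the exact name in your version):

var stdf = new StdfFile(fn);
// Register the filter before querying; every PTR the queries see will
// then carry the defaulted limits, units, and test text.
stdf.AddFilter(BuiltInFilters.PopulatePtrFieldsWithDefaults);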

Nov 20, 2008 at 8:54 PM
Hi Mark,
    I understand what you mean about using .First() or .FirstOrDefault(), but not how to implement it. How should the query change in syntax?

               var testNumInfo = from ptr in stdf.GetRecords().OfExactType<Ptr>() [I tried to add .First() here, but it wouldn't compile]
                                  where ptr.TestNumber.Equals(uiSelTestNum)
                                  let hilim = ptr.HighLimit
                                  let lowlim = ptr.LowLimit
                                  let descr = ptr.TestText
                                  select new { hilim, lowlim, descr };

Thanks again for your help!
Eric
Nov 20, 2008 at 11:31 PM
Take your original query:

                var testNumInfo = from ptr in stdf.GetRecords().OfExactType<Ptr>()
                                  where ptr.TestNumber.Equals(uiSelTestNum)
                                  let hilim = ptr.HighLimit
                                  let lowlim = ptr.LowLimit
                                  let descr = ptr.TestText
                                  select new { hilim, lowlim, descr };

And call First() on it:

     var info = testNumInfo.First();

Voilà!
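Note that the query on its own doesn't touch the file; nothing is read until you enumerate it, which is exactly what First() does. A small illustration:

var testNumInfo = from ptr in stdf.GetRecords().OfExactType<Ptr>()
                  where ptr.TestNumber.Equals(uiSelTestNum)
                  select new { ptr.HighLimit, ptr.LowLimit, ptr.TestText };
// Nothing has been read yet -- testNumInfo is just a query definition.

var info = testNumInfo.First();
// Only now is the file read, and only far enough to produce the first match.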
Nov 25, 2008 at 7:42 PM
Thanks Mark,
    I was unwittingly already doing this. I had thought the time would be spent executing the query statement, not when accessing the returned variable. That helps me understand how to structure my code. Once again, thanks for your time.
Eric