Parsing STDF is slow

Aug 2, 2010 at 6:07 PM
Dear Marklio,

First of all, very good job building this useful application and making my work easier.

One question, please: why does parsing an STDF file of around 50 MB take 5 to 6 minutes with the following query?

var results = from prr in stdf.GetRecords().OfExactType<Prr>()
              from ptr in prr.GetChildRecords().OfExactType<Ptr>()
              let result = ptr.Result
              where result != null
              group result.Value by ptr.TestNumber into g
              select new { TestNumber = g.Key, Results = g };

Similar commercial applications (QuickEdit or sedana) parse the same file within seconds. What am I doing wrong? Is there a more recent, optimized DLL?

Thank you again,
Sam
Coordinator
Aug 2, 2010 at 7:53 PM

Thanks for taking a look.  Since you are only grouping PTRs by test number, you can write a much simpler query:

var results = from ptr in stdf.GetRecords().OfExactType<Ptr>()
              where ptr.Result != null
              group ptr.Result.Value by ptr.TestNumber into g
              select new { TestNumber = g.Key, Results = g };

This should be much faster.  The goal of the first release was expressiveness of the query.  Optimization of queries is currently being looked at.

We have been investigating optimization of single-use queries using some advanced indexing and optimized parsing, but the work is in very early stages.  Wafer map queries that previously never finished now complete in less than a minute.

If you have a query that runs slower than in an alternative product, I'd like to see it so we can include it in our performance scenarios.

Aug 2, 2010 at 9:11 PM
Thank you for your quick answer. I suppose I gave the wrong example. In my application, the user would like to output all the raw data (this takes a few minutes for a 50 MB file). I use the following query:

var results = (from prr in stdf.GetRecords().OfExactType<Prr>()
               from ptr in prr.GetChildRecords().OfExactType<Ptr>()
               orderby prr.XCoordinate, prr.YCoordinate, prr.SiteNumber
               group ptr by new { prr.XCoordinate, prr.YCoordinate, prr.SoftBin, prr.HardBin, prr.SiteNumber } into g
               select new { XY = g.Key, Results = g });

foreach (var PtrVal in results)
{
    ..........
    foreach (var PtrVal1 in PtrVal.Results)
    {
        ........
    }
}

The output should look something like:

X, Y, Bin, Site, Param1, Param2, Param3, ...ParamN

Is there a way to improve the speed?

Thank you,
Samir
Coordinator
Aug 2, 2010 at 10:29 PM

Ah yes.  That's pretty much an ideal case for the optimizations we're investigating.  Currently, GetChildRecords() is a fairly costly operation, as each call has to run through the entire set of records to identify the records within the scope of the parent record.  In your example, it does so once for each part.  The optimization notices you're getting per-part data and pre-indexes the records for each part on the first pass.  It then rewrites the query to use an optimized version of GetChildRecords() that takes advantage of the index and makes each call constant time.  It will also skip the parsing of records not used in the query, which can save quite a bit of time.

This optimization work is in a very early prototype, and I haven't been able to devote much time to it, so it isn't widely available.  If you don't mind consuming a prototype, I could make it available for you to test with.

If speed is a concern today, you could do your indexing on the raw stream of records rather than use the query syntax.  There's nothing that forces you to use the "navigation methods" to get the data you want.  They just let you express the complex relationships in STDF records in a simple way.
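
As a rough illustration of indexing on the raw stream, here is a minimal sketch that walks the records once and buffers each part's PTRs until its PRR arrives. The record classes below (Record, Pir, Ptr, Prr) are simplified, hypothetical stand-ins so the example compiles on its own; they are not the library's types, and their field names are assumptions. The sketch also assumes the usual STDF layout in which a part's PTRs sit between its PIR and PRR and share the same head and site numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-ins for the parsed record types, reduced to the
// fields this sketch needs. They are NOT the real library classes.
public abstract class Record { public byte Head; public byte Site; }
public sealed class Pir : Record { }
public sealed class Ptr : Record { public uint TestNumber; public float? Result; }
public sealed class Prr : Record { public short X; public short Y; public ushort HardBin; public ushort SoftBin; }

public sealed class PartRow
{
    public Prr Part;
    public List<Ptr> Results;
}

public static class SinglePassGrouping
{
    // Walks the record stream exactly once. PTRs are buffered per
    // (head, site) until the PRR that closes that part arrives, so the
    // stream is never rescanned for each part.
    public static IEnumerable<PartRow> GroupByPart(IEnumerable<Record> records)
    {
        var open = new Dictionary<int, List<Ptr>>();
        foreach (var rec in records)
        {
            int key = (rec.Head << 8) | rec.Site;
            if (rec is Pir)
            {
                open[key] = new List<Ptr>();      // a new part starts on this site
            }
            else if (rec is Ptr)
            {
                List<Ptr> pending;
                if (open.TryGetValue(key, out pending)) pending.Add((Ptr)rec);
            }
            else if (rec is Prr)
            {
                List<Ptr> pending;
                if (!open.TryGetValue(key, out pending)) pending = new List<Ptr>();
                open.Remove(key);                  // the part on this site is finished
                yield return new PartRow { Part = (Prr)rec, Results = pending };
            }
        }
    }

    // Formats one part as "X, Y, Bin, Site, Param1, ..., ParamN".
    // Using HardBin for "Bin" is an arbitrary choice for this sketch.
    public static string ToCsv(PartRow row)
    {
        var fields = new List<string>
        {
            row.Part.X.ToString(CultureInfo.InvariantCulture),
            row.Part.Y.ToString(CultureInfo.InvariantCulture),
            row.Part.HardBin.ToString(CultureInfo.InvariantCulture),
            row.Part.Site.ToString(CultureInfo.InvariantCulture)
        };
        fields.AddRange(row.Results.Select(p =>
            p.Result.HasValue ? p.Result.Value.ToString(CultureInfo.InvariantCulture) : ""));
        return string.Join(", ", fields);
    }
}
```

With the real library you would feed the sequence returned by stdf.GetRecords() into a loop of this shape and map the fields to the actual record properties. The work is proportional to the number of records rather than records times parts, and rows come out in file order, so any sorting by X, Y, and site has to happen afterwards.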

Aug 2, 2010 at 11:08 PM
I will be very happy to try the new prototype. Could you please send it to me?

Thank you for your time,
Sam
Aug 10, 2010 at 10:01 PM
Marklio, one more question, please. I'm trying to return all Prr and Ptr records for one test number (for example, 101). I use the following query:

var query = (from prr in stdf.GetRecords().OfExactType<Prr>()
             from ptr in prr.GetChildRecords().OfExactType<Ptr>()
             where ptr.Result != null && int.Parse(ptr.TestNumber.ToString()) == 101 //Parameter
             orderby prr.XCoordinate, prr.YCoordinate, prr.SiteNumber
             group ptr by new { prr.XCoordinate, prr.YCoordinate, prr.SoftBin, prr.HardBin, prr.SiteNumber } into g
             select new { XY = g.Key, Results = g });

foreach (var val in query)
{
    ..........
}

This query is slow. How can I speed it up? (The output needs to be something like X, Y, SiteNumber, Bin, Param1, Param2, ..., ParamN.) On the other hand, if I filter by bin number, the query is very fast.

Thank you again for your help,
Sam
Sep 3, 2010 at 5:20 PM

Hi Marklio,

When do you think I will be able to test the new prototype? Your application is working perfectly. I just need a way to speed up the following query:

var Ptr_results = (from prr in stdf.GetRecords().OfExactType<Prr>()
                   from ptr in prr.GetChildRecords().OfExactType<Ptr>()
                   group ptr by new { prr.XCoordinate, prr.YCoordinate, prr.SoftBin, prr.HardBin, prr.SiteNumber } into g
                   select new { XY = g.Key, Results = g });

I tried the above query on a 200 MB file, but it took more than 25 minutes.

Thank you again for your support

Sam