
Implementing Progress Indicators

Oct 20, 2008 at 2:31 PM
It seems I have the honour of being the first to open a discussion.

As of change set 12027 we have access to the ExpectedLength property of StdfFile.

I would like to use this to implement a progress bar for feedback during the stdf file load.

I tried using the following:

    foreach (var rec in stdf.GetRecords())
    {
        // Update progress based on rec.Offset and stdf.ExpectedLength
    }

However, this doesn't work too well as the stream is read in its entirety before the first record is assigned to 'rec'.

Does anybody have any suggestions as to the correct way to track progress?


Oct 20, 2008 at 10:56 PM
Thanks for starting the thread, Rob.

Turning off caching will give you the semantics you want with the code above, but may not be the right solution for you.  I'll brainstorm a bit here, and hopefully Paul will chime in and give you some info on how he's done progress.

There are intrinsically two kinds of "progress" made by LinqToStdf:
  1. Turning the byte stream into record objects
  2. Executing a query

Your code above tracks progress against #1.  There is currently no mechanism for dealing with progress against #2.  Turning off caching will cause both #1 and #2 to occur at the same time, which will give you the semantics you desire, but it also might do really weird stuff for nested queries (those that loop through the records multiple times).
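For illustration, here is a minimal sketch of progress against #1, assuming records are produced lazily (as with caching turned off). `SketchRecord`, `ReadLazily`, and `LoadProgress` are hypothetical stand-ins rather than LinqToStdf types; only the idea of a per-record offset and an expected total length comes from the posts above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed record; not a LinqToStdf type.
public sealed class SketchRecord
{
    public long Offset { get; set; }   // byte position of the record in the stream
}

public static class LoadProgress
{
    // Hypothetical lazy reader: yields one record per recordSize bytes,
    // mimicking a parser that produces records only as they are enumerated.
    public static IEnumerable<SketchRecord> ReadLazily(long expectedLength, long recordSize)
    {
        for (long offset = 0; offset < expectedLength; offset += recordSize)
            yield return new SketchRecord { Offset = offset };
    }

    // Percentage of the stream consumed, clamped to 0..100.
    public static int Percent(long offset, long expectedLength)
    {
        if (expectedLength <= 0) return 0;
        long pct = offset * 100 / expectedLength;
        return (int)Math.Max(0, Math.Min(100, pct));
    }

    // Enumerates the records and reports only when the whole-number
    // percentage changes, so the UI is not updated for every record.
    public static int Load(IEnumerable<SketchRecord> records, long expectedLength, Action<int> report)
    {
        int count = 0, last = -1;
        foreach (var rec in records)
        {
            count++;
            int pct = Percent(rec.Offset, expectedLength);
            if (pct != last) { last = pct; report(pct); }
        }
        if (last != 100) report(100);   // signal completion once
        return count;
    }
}
```

A caller might pass something like `pct => progressBar.Value = pct` as the `report` callback, marshalled to the UI thread as needed.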

We should get a workitem to track support for #2.  In the meantime, depending on what operation is taking so long, you could try querying items into an intermediate storage (like List<Prr> or something).  Then you would have enough information to generate progress against.
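A rough sketch of the intermediate-storage idea, assuming the query results fit in memory. `QueryProgress` and `ProcessWithProgress` are hypothetical names made up for this example; the point is only that once the results sit in a list, the total count is known and progress can be reported against it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryProgress
{
    // Materializes the query once, then processes the items while reporting
    // progress as (items done, total items). The total is known only because
    // the results were first copied into a list.
    public static List<TResult> ProcessWithProgress<T, TResult>(
        IEnumerable<T> query, Func<T, TResult> work, Action<int, int> report)
    {
        List<T> items = query.ToList();            // intermediate storage
        var results = new List<TResult>(items.Count);
        for (int i = 0; i < items.Count; i++)
        {
            results.Add(work(items[i]));
            report(i + 1, items.Count);
        }
        return results;
    }
}
```

For example, `QueryProgress.ProcessWithProgress(Enumerable.Range(1, 4), x => x * x, (done, total) => Console.WriteLine("{0}/{1}", done, total))` reports four steps. The cost is the extra copy of the queried items.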

Like I said, I think Paul has done more thinking about the progress scenarios.  I'll get him to chime in here.

Oct 22, 2008 at 3:53 PM
Thanks for the suggestions Mark.

As you say, turning off caching works nicely for the load progress indicator but, timewise, it hurts subsequent queries.

My STDF files take between 10 and 20 seconds to read so it is nice to be able to give feedback to the user.
Making my own cache of the records I need is certainly an option, though not one I prefer: typically it would require me to cache pretty much all the records, duplicating what you have already implemented.
Having control over #1 would be nice; having a record callback at #1 would also be nice.

At the moment the benefits of using LinqToStdf to cache the records far outweigh the disadvantage of not having a progress bar.
For the time being, a marquee rather than a progress bar will do for me.

If I come up with a good solution I'll post it here.

Oct 22, 2008 at 3:59 PM
On a related note, having the ability to cancel the stream read would be quite nice.

Oct 24, 2008 at 9:59 PM
Rob, what's your stream cancelation scenario?  With caching turned off, if you stop enumerating, you stop reading the stream.  I had thought about making the caching happen on a separate thread, so that you could start reading the stream immediately, but you would block if you were waiting for records to be parsed.  I didn't implement it this way because parsing seemed to be generally "fast enough".
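To illustrate the "stop enumerating, stop reading" behaviour, here is a small sketch over a lazy sequence. `CancellableLoad`, `Produce`, and `Consume` are hypothetical names, and the use of `CancellationToken` is an assumption of this example (any flag checked inside the loop would do); nothing here is LinqToStdf API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CancellableLoad
{
    // Hypothetical lazy source: invokes onProduced each time an item is
    // actually generated, so the caller can see how far "reading" got.
    public static IEnumerable<int> Produce(int total, Action onProduced)
    {
        for (int i = 0; i < total; i++)
        {
            onProduced();
            yield return i;
        }
    }

    // Stops enumerating as soon as cancellation is requested. Because the
    // source is lazy, nothing further is produced after the break.
    public static int Consume(IEnumerable<int> source, CancellationToken token)
    {
        int consumed = 0;
        foreach (var item in source)
        {
            if (token.IsCancellationRequested) break;
            consumed++;
        }
        return consumed;
    }
}
```

In a real application the token would typically be cancelled from a Cancel button handler while the loop runs on a background thread.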

However, if we implemented this and you disposed the IEnumerable before caching ended, we could stop reading the stream, but that StdfFile would essentially become invalid without some trickery to resurrect the parsing if you decided to enumerate it again.  Perhaps the records could be read in chunks on demand.  That might be very useful for people writing apps that were only interested in the MIR (which is common).
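As a sketch of why on-demand reading helps the MIR-only case: over a lazy sequence, a LINQ query such as `OfType<T>().FirstOrDefault()` stops pulling records once it finds a match. `SketchRec`, `SketchMir`, and `MirOnly` below are hypothetical stand-ins, not LinqToStdf types, and the example assumes the MIR appears near the start of the sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; not LinqToStdf types.
public class SketchRec { }
public sealed class SketchMir : SketchRec
{
    public string LotId { get; set; }
}

public static class MirOnly
{
    // Hypothetical lazy source: a MIR first, followed by many other records.
    // onRead is invoked once for every record actually produced.
    public static IEnumerable<SketchRec> Records(int others, Action onRead)
    {
        onRead();
        yield return new SketchMir { LotId = "LOT-1" };
        for (int i = 0; i < others; i++)
        {
            onRead();
            yield return new SketchRec();
        }
    }

    // FirstOrDefault stops pulling from the source once a match is found,
    // so the remaining records are never produced.
    public static SketchMir FirstMir(IEnumerable<SketchRec> records)
    {
        return records.OfType<SketchMir>().FirstOrDefault();
    }
}
```

With a fully cached read, by contrast, every record would be parsed before the query could return.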

Fun to think about.

Oct 25, 2008 at 3:54 PM
Hi Mark,
The most common scenario for cancelation is simply that I have started to load the wrong stdf file. In this case I would not be interested in resurrecting the stream parse.  Some of the files I have are read in one or two seconds, so there is no problem waiting for those.  However, some of my stdf files take over 30 seconds to be read completely.

Granted, disabling caching does let me easily implement progress and cancelation, though with penalties later down the line.

For 90% of the analyses I am working on, I need to have read the stdf file in its entirety before being able to start analysis (wafermapping and correlation).
I can live with a load time hit at the start as we have all become accustomed to waiting for data to load. After that I am trying to make the application as responsive as possible, as it is an interactive app.

Chunking might be a good solution. As you mentioned, this would be good, for example, for reading the MIR only.  It would be really nice to rapidly scan through a set of files looking for MIRs that meet certain criteria and then load the whole of that file (cached) for further analysis.
Definitely fun to think about.