

Generate PLR converter/unconverter with CodeGen


Now that Paul has added string array support, EOS-tolerant array reading, and some other needed things, we should be able to close the gap on the code generation and generate PLR's converter/unconverter.
Closed Apr 15, 2012 at 4:00 AM by marklio
I think we agree this isn't a fruitful effort.


Selzhanik wrote Apr 11, 2012 at 5:21 PM

I had sort of generated a work item for this already at: I will close that, but copy the commentary below:

We currently have no PLR unconverter. I'd like to document some notes here in the tracker so I don't lose track of them...

If I assume I want to write the PLR backwards (fair assumption given the PLR is a set of shared length arrays) then this would be the pattern for each field (starting at the last one):
  • If the field is null, and no fields have yet been written, don't write anything
  • If the field is null, and fields have been written, write an array of missing values
  • If the field is not null:
      • Throw a SharedLengthViolation if the array length is not equal to that previously written
      • Otherwise, write the array
The problem with that pattern: what if the last array (the first one written, when working backwards) is shorter than the other arrays? For example, GroupRadixes has a length of 50, GroupModes is null, and GroupIndexes has a length of 100. The unconverter would then write a 50-length array of UInt16.MinValue for GroupModes, instead of the 100-length array it should write.
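To make the failure concrete, here is a rough sketch of the backwards pattern in Python (not the library's own language). Everything in it is hypothetical: `unconvert_backwards`, the `(name, values)` field pairs, and the single `MISSING_U2` placeholder are illustration only, and it assumes that only the first array written (the last one in the record) is allowed to be truncated.

```python
MISSING_U2 = 0  # assumed stand-in for UInt16.MinValue


class SharedLengthViolation(Exception):
    """Raised when two arrays that must share a length do not."""


def unconvert_backwards(fields):
    """fields: (name, list-or-None) pairs in record order.

    Returns the arrays that would be written, in record order.
    """
    written = []      # collected in reverse record order
    length = None     # length of the array written most recently
    full_arrays = 0   # non-null arrays seen after the first one written
    for name, values in reversed(fields):
        if values is None:
            if length is None:
                continue  # trailing missing field: write nothing
            written.append((name, [MISSING_U2] * length))
        else:
            if length is not None:
                # Assumption: only the first array written may be short.
                truncated_tail = full_arrays == 0 and len(values) > length
                if len(values) != length and not truncated_tail:
                    raise SharedLengthViolation(name)
                full_arrays += 1
            length = len(values)
            written.append((name, list(values)))
    written.reverse()
    return written


fields = [
    ("GroupIndexes", list(range(100))),
    ("GroupModes", None),
    ("GroupRadixes", [2] * 50),
]
out = unconvert_backwards(fields)
print(len(out[1][1]))  # 50, although the group count is 100
```

GroupModes is filled from the only length known at that point, which is the truncated one.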

I can see fixing this two ways:
  1. Upon encountering a null field that should be written as an array of missing values, yet only one field has been written and that one field was an array that allowed truncation, look ahead to the next non-null array with the same ArrayLengthFieldIndex to get the correct array length.
  2. Before writing anything, investigate the maximum array length to discover the group count, and use that for writing null arrays that need to be populated with missing values.
Option 2 seems much simpler at this point...
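A minimal sketch of what Option 2 could look like, again in Python and again with hypothetical names (`unconvert_with_group_count`, `MISSING_U2`, the `(name, values)` pairs). It assumes the group count is simply the longest non-null array, and that only the last non-null array may be shorter than that.

```python
MISSING_U2 = 0  # assumed stand-in for UInt16.MinValue


class SharedLengthViolation(Exception):
    """Raised when an array other than the last is shorter than the group count."""


def unconvert_with_group_count(fields):
    """fields: (name, list-or-None) pairs in record order."""
    present = [i for i, (_, v) in enumerate(fields) if v is not None]
    if not present:
        return []  # nothing to write at all
    # Pass 1: discover the group count before writing anything.
    group_count = max(len(fields[i][1]) for i in present)
    last = present[-1]  # trailing null fields after this one are omitted
    # Pass 2: write, filling null arrays to the full group count.
    written = []
    for i, (name, values) in enumerate(fields[: last + 1]):
        if values is None:
            written.append((name, [MISSING_U2] * group_count))
        elif len(values) != group_count and i != last:
            raise SharedLengthViolation(name)
        else:
            written.append((name, list(values)))
    return written


fields = [
    ("GroupIndexes", list(range(100))),
    ("GroupModes", None),
    ("GroupRadixes", [2] * 50),
]
out = unconvert_with_group_count(fields)
print(len(out[1][1]))  # 100
```

The cost is one extra pass over the fields, with no look-ahead logic inside the write loop.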

Selzhanik wrote Apr 11, 2012 at 6:11 PM

I've come to a fork in the road...

The EOS-tolerant array reading currently returns a 0-length array when it encounters end of stream prior to reading any values into the array, if throwOnEndOfStream is set to false. I'm fairly certain I want to change this to return null. A 0-length "truncated" array field is really the same physical bytes in the file as a missing field, and I'd rather interpret that as a missing field, as opposed to a prematurely truncated array field.

However, I want to continue to return a 0-length array if the expected array length is 0. This is what has always happened for all arrays, and not just PLR arrays. In PLR's case, a zero group count means a worthless PLR, but it is what it is.
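The two cases above could be sketched like this. This is Python for illustration, not the library's ReadArray; `read_u2_array` is a hypothetical helper, little-endian two-byte values are an assumption, and `None` stands in for null.

```python
import io
import struct


def read_u2_array(stream, expected_length, throw_on_end_of_stream=True):
    """Read up to expected_length unsigned 16-bit values from stream."""
    if expected_length == 0:
        return []  # a genuinely empty array stays a 0-length array
    values = []
    for _ in range(expected_length):
        raw = stream.read(2)
        if len(raw) < 2:  # end of stream (a partial value counts as EOS)
            if throw_on_end_of_stream:
                raise EOFError("end of stream inside array")
            break
        values.append(struct.unpack("<H", raw)[0])
    if not values:
        return None  # EOS before any value: treat as a missing field
    return values    # may be shorter than expected_length (truncated)


data = struct.pack("<3H", 1, 2, 3)
print(read_u2_array(io.BytesIO(data), 5, throw_on_end_of_stream=False))  # [1, 2, 3]
print(read_u2_array(io.BytesIO(b""), 5, throw_on_end_of_stream=False))   # None
print(read_u2_array(io.BytesIO(b""), 0, throw_on_end_of_stream=False))   # []
```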

For truncated arrays, the EOS-tolerant version currently resizes the array. Originally, my idea here was that the only EOS-tolerant record is PLR, and I wanted a round-trip read/re-write of the PLR to produce the same bytes. But thinking about more real-world use, I can see how a user of the library would expect that if an index is valid in the GroupIndexes array, then it would also be a valid index in subsequent non-null array fields, with the value there being the special "missing" value. That's my fork in the road: to resize the truncated array, or to fill the remainder of the array with the special "missing" value.

If I go down the path of filling the remainder with missing values, that means we must know that missing value in the generic ReadArray method, so we'd have to pass it as an argument, to be used only if throwOnEndOfStream is set to false. I don't know the CodeGen'ed stuff well enough yet to know if that poses a problem.
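For comparison, the fill-with-missing path might look roughly like the following. This is a Python sketch only: `read_u2_array_padded` and its `missing_value` parameter are hypothetical, and little-endian two-byte values are assumed.

```python
import io
import struct


def read_u2_array_padded(stream, expected_length,
                         throw_on_end_of_stream=True, missing_value=0):
    """Like a tolerant read, but pads a truncated array to full length."""
    values = []
    for _ in range(expected_length):
        raw = stream.read(2)
        if len(raw) < 2:  # end of stream
            if throw_on_end_of_stream:
                raise EOFError("end of stream inside array")
            break
        values.append(struct.unpack("<H", raw)[0])
    if expected_length > 0 and not values:
        return None  # nothing read at all: missing field
    # missing_value is only ever used on the tolerant path
    values.extend([missing_value] * (expected_length - len(values)))
    return values


data = struct.pack("<3H", 1, 2, 3)
print(read_u2_array_padded(io.BytesIO(data), 5, throw_on_end_of_stream=False))
# [1, 2, 3, 0, 0]
```

Note that a padded array no longer round-trips to the same bytes, which is the trade-off described above.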

For now, I'll leave the default value parameter out of the BinaryReader's ReadArray method, and code the explicit PLR converter so that it leaves the arrays in a resized state (but then makes the subsequent arrays null, instead of zero-length). And the unconverter / writer will write the arrays as given to it, throwing if a truncated array is found before a non-missing field, but allowing a truncated array if it is the last field. I expect that's how the CodeGen'ed PLR converter would eventually work, but I expect we'll want the CodeGen'ed PLR unconverter to fill out the truncated array with missing values, not resize them.

Selzhanik wrote Apr 12, 2012 at 8:57 PM

I don't think we should go through the trouble of making the PLR converters CodeGen'ed at this time.

The trouble is that if a "truncated" last array is written (backwards) to the record and we then encounter a missing field, there is no way (without looking ahead) to determine how many missing values to fill in for that field. We could throw on the missing field, forcing the user to deal with it, but that's not in the spirit of what we do with other records, and why should we sacrifice functionality just so we can CodeGen the unconverter?

I think we should leave the attributes in place, but have the converter and unconverter explicitly coded and wired up as such. I do want to finish the task of cleaning up the unconverter, though. We can come back to the idea of CodeGen'ed PLR converters some other day when we don't feel we have enough to do.