The Power Query formula language (M) contains a number of library functions that allow you to parse binary data. You can use these functions to build queries which parse custom binary file formats. This post contains two samples – a simple query which parses X & Y coordinate values from a binary field, and a more advanced query which reads PNG file headers.
Parsing binary data
The following query defines some binary data (Source), a record format (PointFormat), and a parsing format definition (FileFormat).
let
    Source = #binary({
        0x00, 0x00, 0x00, 0x02,
        0x00, 0x03, 0x00, 0x04,
        0x00, 0x05, 0x00, 0x06}),
    PointFormat = BinaryFormat.Record([
        x = BinaryFormat.SignedInteger16,
        y = BinaryFormat.SignedInteger16]),
    FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
        (count) => BinaryFormat.List(PointFormat, count))
in
    FileFormat(Source)
Let’s break this down.
The #binary function lets you pass in a list of byte values. Here we have specified a total of 12 bytes.
PointFormat defines a record format with two BinaryFormat.SignedInteger16 fields (2 bytes each), x and y, for a total of 4 bytes.
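The first argument to the BinaryFormat.Choice function indicates that we should read the first 4 bytes (the size of a BinaryFormat.UnsignedInteger32). Binary formats are big-endian by default, so the bytes 0x00 0x00 0x00 0x02 are read as the value 2. This value is passed in as the count parameter of the function given as the second argument, which returns the format used for the rest of the data: BinaryFormat.List(PointFormat, count). BinaryFormat.List ends up reading the remaining 8 bytes of the binary (count * sizeof(PointFormat) = 2 * 4), and outputs a list of records.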
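To make the two stages explicit, here is a minimal sketch that does by hand what BinaryFormat.Choice does in one step: it reads the count prefix first, then reads that many points from the remaining bytes. The step names (CountBytes, PointBytes, PointCount, PointList) are illustrative and not part of the original query, and the manual split with Binary.Range is only for explanation; the Choice version above is the more natural way to write it.

```powerquery
let
    Source = #binary({
        0x00, 0x00, 0x00, 0x02,
        0x00, 0x03, 0x00, 0x04,
        0x00, 0x05, 0x00, 0x06}),
    PointFormat = BinaryFormat.Record([
        x = BinaryFormat.SignedInteger16,
        y = BinaryFormat.SignedInteger16]),
    // Stage 1: read only the 4-byte count prefix (big-endian by default).
    CountBytes = Binary.Range(Source, 0, 4),
    Count = BinaryFormat.UnsignedInteger32(CountBytes),
    // Stage 2: read Count points from the bytes that follow the prefix.
    PointBytes = Binary.Range(Source, 4),
    Points = BinaryFormat.List(PointFormat, Count)(PointBytes)
in
    // PointCount is 2; PointList is {[x = 3, y = 4], [x = 5, y = 6]}
    [PointCount = Count, PointList = Points]
```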
If we put this M code into Power Query, convert the list of records to a table, and then expand the record column, we get a table with two rows: x = 3, y = 4 and x = 5, y = 6.
Query
let
    Query3 =
        let
            Source = #binary({
                0x00, 0x00, 0x00, 0x02,
                0x00, 0x03, 0x00, 0x04,
                0x00, 0x05, 0x00, 0x06}),
            PointFormat = BinaryFormat.Record([
                x = BinaryFormat.SignedInteger16,
                y = BinaryFormat.SignedInteger16]),
            FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
                (count) => BinaryFormat.List(PointFormat, count))
        in
            FileFormat(Source),
    FromList = Table.FromList(Query3, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expand = Table.ExpandRecordColumn(FromList, "Column1", {"x", "y"}, {"x", "y"})
in
    Expand
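The query above is the kind of code the Power Query editor generates when you convert and expand through the UI. If you are writing the M by hand, one possible shortcut is Table.FromRecords, which builds one column per record field in a single step. This is a sketch of an alternative, not the query from the original post; the step names Points and AsTable are illustrative.

```powerquery
let
    Source = #binary({
        0x00, 0x00, 0x00, 0x02,
        0x00, 0x03, 0x00, 0x04,
        0x00, 0x05, 0x00, 0x06}),
    PointFormat = BinaryFormat.Record([
        x = BinaryFormat.SignedInteger16,
        y = BinaryFormat.SignedInteger16]),
    FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
        (count) => BinaryFormat.List(PointFormat, count)),
    // The parsed result is a list of records, one per point.
    Points = FileFormat(Source),
    // Turn the list of records directly into a table with columns x and y.
    AsTable = Table.FromRecords(Points)
in
    AsTable
```

The columns come out untyped, so you may still want a Table.TransformColumnTypes step afterwards if the values feed into numeric calculations.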
Parsing PNG Headers
This sample is left as an exercise for the reader to work through. To test it out, replace the path value in the call to File.Contents with the path to a PNG file on your machine.
let
    file = File.Contents("C:\Temp\WP_20140508_18_37_15_Pro.png"),
    fileHeader = BinaryFormat.Text(8, TextEncoding.Ascii),
    fourCC = BinaryFormat.Text(4, TextEncoding.Ascii),
    zeroTerminatedString =
        let
            listOfBytes = BinaryFormat.List(BinaryFormat.Byte, (byte) => byte <> 0),
            transformedBytes = BinaryFormat.Transform(listOfBytes, (list) =>
                let
                    count = List.Count(list),
                    listWithoutZero = List.FirstN(list, count - 1),
                    listAsBinary = Binary.FromList(listWithoutZero),
                    binaryAsText = Text.FromBinary(listAsBinary, TextEncoding.Ascii)
                in
                    binaryAsText)
        in
            transformedBytes,
    headerChunk = BinaryFormat.Record([
        ChunkType = "IHDR",
        Width = BinaryFormat.SignedInteger32,
        Height = BinaryFormat.SignedInteger32,
        BitDepth = BinaryFormat.Byte,
        ColorType = BinaryFormat.Byte,
        Compression = BinaryFormat.Byte,
        Filter = BinaryFormat.Byte,
        Interlace = BinaryFormat.Byte
    ]),
    dataChunk = BinaryFormat.Record([
        ChunkType = "IDAT",
        Data = BinaryFormat.Binary()
    ]),
    endChunk = BinaryFormat.Record([
        ChunkType = "IEND"
    ]),
    physicalPixelDimensionsChunk = BinaryFormat.Record([
        ChunkType = "pHYs",
        PixelsPerUnitX = BinaryFormat.SignedInteger32,
        PixelsPerUnitY = BinaryFormat.SignedInteger32,
        UnitSpecifier = BinaryFormat.Byte
    ]),
    primaryChromaticitiesAndWhitePoint = BinaryFormat.Record([
        ChunkType = "cHRM",
        WhitePointX = BinaryFormat.SignedInteger32,
        WhitePointY = BinaryFormat.SignedInteger32,
        RedX = BinaryFormat.SignedInteger32,
        RedY = BinaryFormat.SignedInteger32,
        GreenX = BinaryFormat.SignedInteger32,
        GreenY = BinaryFormat.SignedInteger32,
        BlueX = BinaryFormat.SignedInteger32,
        BlueY = BinaryFormat.SignedInteger32
    ]),
    embeddedICCProfileChunk = BinaryFormat.Record([
        ChunkType = "iCCP",
        ProfileName = zeroTerminatedString,
        CompressionMethod = BinaryFormat.Byte,
        CompressedProfile = BinaryFormat.Binary()
    ]),
    knownChunks = [
        IHDR = headerChunk,
        IDAT = dataChunk,
        IEND = endChunk,
        pHYs = physicalPixelDimensionsChunk,
        cHRM = primaryChromaticitiesAndWhitePoint,
        iCCP = embeddedICCProfileChunk
    ],
    unknownChunk = (chunkType) => BinaryFormat.Record([
        ChunkType = chunkType,
        Data = BinaryFormat.Binary()
    ]),
    chunkHeader = BinaryFormat.Record([
        Length = BinaryFormat.SignedInteger32,
        ChunkType = fourCC
    ]),
    chunkFooter = BinaryFormat.Record([
        Crc = BinaryFormat.SignedInteger32
    ]),
    chunk = BinaryFormat.Choice(chunkHeader, (header) =>
        let
            chunkType = header[ChunkType],
            chunkData = Record.FieldOrDefault(knownChunks, chunkType, unknownChunk(chunkType)),
            chunkDataLimited = BinaryFormat.Length(chunkData, header[Length]),
            chunkWithFooter = BinaryFormat.Record([
                ChunkData = chunkDataLimited,
                ChunkFooter = chunkFooter
            ]),
            chunkTransformed = BinaryFormat.Transform(chunkWithFooter, each [ChunkData])
        in
            chunkTransformed),
    fileFormatBuffered = BinaryFormat.Record([
        Header = fileHeader,
        Chunks = BinaryFormat.List(chunk)
    ]),
    fileFormatStreaming = BinaryFormat.Choice(fileHeader, (header) => BinaryFormat.List(chunk), type list)
in
    fileFormatBuffered(file)
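If you don't have a PNG file handy, the following simplified sketch exercises the same building blocks against a small in-memory binary. The Sample bytes are a hand-written, hypothetical fragment (the 8-byte PNG signature followed by the length, type, and 13 data bytes of an IHDR chunk for a 1x1 image), not a complete or valid PNG file, and the chunk CRC is left out because nothing here reads it. Unlike the query above, this sketch reads the signature with BinaryFormat.Binary(8) and compares it as bytes rather than decoding it as ASCII text; the names Sample, PngSignature, PrefixFormat, and Parsed are illustrative.

```powerquery
let
    // Hypothetical sample: PNG signature plus the start of an IHDR chunk.
    Sample = #binary({
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,  // signature
        0x00, 0x00, 0x00, 0x0D,                          // chunk length = 13
        0x49, 0x48, 0x44, 0x52,                          // chunk type "IHDR"
        0x00, 0x00, 0x00, 0x01,                          // width = 1
        0x00, 0x00, 0x00, 0x01,                          // height = 1
        0x08, 0x06, 0x00, 0x00, 0x00}),                  // depth, color type, compression, filter, interlace
    PngSignature = #binary({0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}),
    // Fixed layout covering the signature, the chunk header, and the IHDR fields.
    PrefixFormat = BinaryFormat.Record([
        Signature = BinaryFormat.Binary(8),
        Length = BinaryFormat.SignedInteger32,
        ChunkType = BinaryFormat.Text(4, TextEncoding.Ascii),
        Width = BinaryFormat.SignedInteger32,
        Height = BinaryFormat.SignedInteger32,
        BitDepth = BinaryFormat.Byte,
        ColorType = BinaryFormat.Byte,
        Compression = BinaryFormat.Byte,
        Filter = BinaryFormat.Byte,
        Interlace = BinaryFormat.Byte
    ]),
    Parsed = PrefixFormat(Sample)
in
    // Header has Length = 13, ChunkType = "IHDR", Width = 1, Height = 1, BitDepth = 8, ColorType = 6
    [IsPng = (Parsed[Signature] = PngSignature), Header = Parsed]
```

This hard-codes IHDR as the first chunk, which is why it can use a single fixed record. The full query instead reads each chunk header first and uses BinaryFormat.Choice to pick the matching format, which is what lets it handle chunks in any order and skip unknown ones.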