Channel: Power Query – Matt Masson

Parsing Binary data using Power Query formulas

The Power Query formula language (M) contains a number of library functions that allow you to parse binary data. You can use these functions to build queries which parse custom binary file formats. This post contains two samples – a simple query which parses X & Y coordinate values from a binary field, and a more advanced query which reads PNG file headers.

Parsing binary data

The following query defines some binary data (Source), a record format (PointFormat), and a parsing format definition (FileFormat).

let
    Source = #binary(
      {0x00, 0x00, 0x00, 0x02,
       0x00, 0x03, 0x00, 0x04,
       0x00, 0x05, 0x00, 0x06}),

    PointFormat = BinaryFormat.Record([
      x = BinaryFormat.SignedInteger16,
      y = BinaryFormat.SignedInteger16]),

    FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
                (count) => BinaryFormat.List(PointFormat, count))
 in
    FileFormat(Source)

Let’s break this down.

The #binary function lets you pass in a list of byte values. Here we have specified a total of 12 bytes.
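
As a quick sanity check on the byte count, the sketch below builds the same binary value and inspects it with Binary.Length and Binary.ToList. The Check record and its field names are illustrative only and are not part of the original query.

```powerquery
let
    // The same 12 bytes as Source in the query above.
    Source = #binary(
      {0x00, 0x00, 0x00, 0x02,
       0x00, 0x03, 0x00, 0x04,
       0x00, 0x05, 0x00, 0x06}),

    // Binary.Length counts the bytes; Binary.ToList turns them back into a list of numbers.
    Check = [
        ByteCount = Binary.Length(Source),
        Bytes = Binary.ToList(Source)
    ]
in
    Check
```

ByteCount evaluates to 12, and Bytes is the list of the twelve values in the order they were written.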

PointFormat defines a record format with two BinaryFormat.SignedInteger16 fields (2 bytes each), x and y, for a total of 4 bytes.
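
To see PointFormat on its own, the following sketch applies it to exactly one point's worth of data. OnePoint is a made-up 4-byte value used only for illustration. A binary format value can be invoked like a function, with the binary data as its argument.

```powerquery
let
    // Same record format as in the query above: two signed 16-bit fields.
    PointFormat = BinaryFormat.Record([
      x = BinaryFormat.SignedInteger16,
      y = BinaryFormat.SignedInteger16]),

    // Illustrative data: one point, 2 bytes for x followed by 2 bytes for y.
    OnePoint = #binary({0x00, 0x03, 0x00, 0x04})
in
    // Integers are read big-endian by default, so this is the record [x = 3, y = 4].
    PointFormat(OnePoint)
```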

The first argument to the BinaryFormat.Choice function indicates that we should read the first 4 bytes (the size of a BinaryFormat.UnsignedInteger32). This value (2 in our data) is passed as the count parameter to the function given as the second argument, which returns a BinaryFormat.List format. BinaryFormat.List then reads the remaining 8 bytes of the binary (count * sizeof(PointFormat) = 2 * 4) and outputs a list of two records.
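
One way to picture what BinaryFormat.Choice does here is to perform the same steps by hand. This is only a sketch of the idea, not how the library implements it. The step names Count, Remaining, and Points are illustrative, and the sketch assumes that a format applied to a longer binary simply reads from the start and leaves the rest untouched.

```powerquery
let
    Source = #binary(
      {0x00, 0x00, 0x00, 0x02,
       0x00, 0x03, 0x00, 0x04,
       0x00, 0x05, 0x00, 0x06}),

    PointFormat = BinaryFormat.Record([
      x = BinaryFormat.SignedInteger16,
      y = BinaryFormat.SignedInteger16]),

    // Step 1: read the 4-byte count from the start of the data (2 for this input).
    Count = BinaryFormat.UnsignedInteger32(Source),

    // Step 2: skip the 4 bytes that held the count; 8 bytes remain.
    Remaining = Binary.Range(Source, 4),

    // Step 3: read Count records of 4 bytes each from what is left.
    Points = BinaryFormat.List(PointFormat, Count)(Remaining)
in
    Points
```

BinaryFormat.Choice packages these steps into a single format value, so the result can be nested inside other formats and the position in the data does not have to be tracked manually.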

If we put this M code into Power Query, convert the list of records to a table, and then expand the record column, we get a table with two columns, x and y, and two rows: (3, 4) and (5, 6). The full query looks like this:

let
    Query3 =
        let
            Source = #binary(
              {0x00, 0x00, 0x00, 0x02,
               0x00, 0x03, 0x00, 0x04,
               0x00, 0x05, 0x00, 0x06}),

            PointFormat = BinaryFormat.Record([
              x = BinaryFormat.SignedInteger16,
              y = BinaryFormat.SignedInteger16]),

            FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
                        (count) => BinaryFormat.List(PointFormat, count))
        in
            FileFormat(Source),
    FromList = Table.FromList(Query3, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expand = Table.ExpandRecordColumn(FromList, "Column1", {"x", "y"}, {"x", "y"})
in
    Expand
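
Because the parsed result is already a list of records, a shorter alternative is to hand it straight to Table.FromRecords. This is a sketch of an equivalent query, not the code the Power Query editor generates. The step name Points is illustrative.

```powerquery
let
    Source = #binary(
      {0x00, 0x00, 0x00, 0x02,
       0x00, 0x03, 0x00, 0x04,
       0x00, 0x05, 0x00, 0x06}),

    PointFormat = BinaryFormat.Record([
      x = BinaryFormat.SignedInteger16,
      y = BinaryFormat.SignedInteger16]),

    FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
                (count) => BinaryFormat.List(PointFormat, count)),

    // Each record becomes a row; the field names x and y become the column names.
    Points = Table.FromRecords(FileFormat(Source))
in
    Points
```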

Parsing PNG Headers

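This sample is left as an exercise for the reader to work through. It reads the 8-byte file header followed by a list of chunks, each made up of a length, a four-character chunk type, the chunk data, and a CRC. The query defines both a buffered format (fileFormatBuffered) and a streaming variant (fileFormatStreaming), but only the buffered one is evaluated. To test it out, replace the path value in the call to File.Contents with the path to a PNG file on your machine.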
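
One idiom the query below relies on is worth seeing in isolation first: a BinaryFormat.Record field whose value is not a binary format (such as ChunkType = "IHDR") consumes no bytes and is copied to the output as is. The sketch below uses a cut-down record and made-up sample bytes. TaggedFormat and Data are illustrative names.

```powerquery
let
    // ChunkType is a plain text value, so nothing is read for it;
    // Width and Height each read 4 bytes.
    TaggedFormat = BinaryFormat.Record([
        ChunkType = "IHDR",
        Width = BinaryFormat.SignedInteger32,
        Height = BinaryFormat.SignedInteger32
    ]),

    // Illustrative data: 0x00000100 = 256, then 0x00000080 = 128.
    Data = #binary({0x00, 0x00, 0x01, 0x00,
                    0x00, 0x00, 0x00, 0x80})
in
    // The record [ChunkType = "IHDR", Width = 256, Height = 128].
    TaggedFormat(Data)
```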

let
    file = File.Contents("C:\Temp\WP_20140508_18_37_15_Pro.png"),
    fileHeader = BinaryFormat.Text(8, TextEncoding.Ascii),
    fourCC = BinaryFormat.Text(4, TextEncoding.Ascii),
    zeroTerminatedString =
        let
            listOfBytes = BinaryFormat.List(BinaryFormat.Byte, (byte) => byte <> 0),
            transformedBytes = BinaryFormat.Transform(listOfBytes, (list) =>
                let
                    count = List.Count(list),
                    listWithoutZero = List.FirstN(list, count - 1),
                    listAsBinary = Binary.FromList(listWithoutZero),
                    binaryAsText = Text.FromBinary(listAsBinary, TextEncoding.Ascii)
                in
                    binaryAsText)
        in
            transformedBytes,
    headerChunk = BinaryFormat.Record([
        ChunkType = "IHDR",
        Width = BinaryFormat.SignedInteger32,
        Height = BinaryFormat.SignedInteger32,
        BitDepth = BinaryFormat.Byte,
        ColorType = BinaryFormat.Byte,
        Compression = BinaryFormat.Byte,
        Filter = BinaryFormat.Byte,
        Interlace = BinaryFormat.Byte
    ]),
    dataChunk = BinaryFormat.Record([
        ChunkType = "IDAT",
        Data = BinaryFormat.Binary()
    ]),
    endChunk = BinaryFormat.Record([
        ChunkType = "IEND"
    ]),
    physicalPixelDimensionsChunk = BinaryFormat.Record([
        ChunkType = "pHYs",
        PixelsPerUnitX = BinaryFormat.SignedInteger32,
        PixelsPerUnitY = BinaryFormat.SignedInteger32,
        UnitSpecifier = BinaryFormat.Byte
    ]),
    primaryChromaticitiesAndWhitePoint = BinaryFormat.Record([
        ChunkType = "cHRM",
        WhitePointX = BinaryFormat.SignedInteger32,
        WhitePointY = BinaryFormat.SignedInteger32,
        RedX = BinaryFormat.SignedInteger32,
        RedY = BinaryFormat.SignedInteger32,
        GreenX = BinaryFormat.SignedInteger32,
        GreenY = BinaryFormat.SignedInteger32,
        BlueX = BinaryFormat.SignedInteger32,
        BlueY = BinaryFormat.SignedInteger32
    ]),
    embeddedICCProfileChunk = BinaryFormat.Record([
        ChunkType = "iCCP",
        ProfileName = zeroTerminatedString,
        CompressionMethod = BinaryFormat.Byte,
        CompressedProfile = BinaryFormat.Binary()
    ]),
    knownChunks =
    [
        IHDR = headerChunk,
        IDAT = dataChunk,
        IEND = endChunk,
        pHYs = physicalPixelDimensionsChunk,
        cHRM = primaryChromaticitiesAndWhitePoint,
        iCCP = embeddedICCProfileChunk
    ],
    unknownChunk = (chunkType) => BinaryFormat.Record(
    [
        ChunkType = chunkType,
        Data = BinaryFormat.Binary()
    ]),
    chunkHeader = BinaryFormat.Record([
        Length = BinaryFormat.SignedInteger32,
        ChunkType = fourCC
    ]),
    chunkFooter = BinaryFormat.Record([
        Crc = BinaryFormat.SignedInteger32
    ]),
    chunk = BinaryFormat.Choice(chunkHeader, (header) =>
        let
            chunkType = header[ChunkType],
            chunkData = Record.FieldOrDefault(knownChunks, chunkType, unknownChunk(chunkType)),
            chunkDataLimited = BinaryFormat.Length(chunkData, header[Length]),
            chunkWithFooter = BinaryFormat.Record([
                ChunkData = chunkDataLimited,
                ChunkFooter = chunkFooter
            ]),
            chunkTransformed = BinaryFormat.Transform(chunkWithFooter, each [ChunkData])
        in
            chunkTransformed),
    fileFormatBuffered = BinaryFormat.Record([
        Header = fileHeader,
        Chunks = BinaryFormat.List(chunk)
    ]),
    fileFormatStreaming = BinaryFormat.Choice(fileHeader, (header) => BinaryFormat.List(chunk), type list)
in
    fileFormatBuffered(file)
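
The zeroTerminatedString helper can be tried without a PNG file. The sketch below condenses the helper from the query above into fewer steps and applies it to made-up bytes. Sample is illustrative data, not taken from a real file. When BinaryFormat.List is given a condition, the item that fails the condition is still included in the list, which is why the helper drops the last byte before decoding.

```powerquery
let
    // Condensed from the zeroTerminatedString step of the PNG query above.
    zeroTerminatedString =
        let
            // Read bytes up to and including the first zero byte.
            listOfBytes = BinaryFormat.List(BinaryFormat.Byte, (byte) => byte <> 0),
            transformedBytes = BinaryFormat.Transform(listOfBytes, (list) =>
                // Drop the trailing zero, then decode the rest as ASCII text.
                Text.FromBinary(
                    Binary.FromList(List.FirstN(list, List.Count(list) - 1)),
                    TextEncoding.Ascii))
        in
            transformedBytes,

    // Illustrative data: "sRGB", a zero terminator, then one unrelated byte.
    Sample = #binary({0x73, 0x52, 0x47, 0x42, 0x00, 0xFF})
in
    zeroTerminatedString(Sample)
```

With this input the query is expected to return the text sRGB. The byte after the terminator is not read.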

