Friday, September 1, 2023

Tradeoffs of PInvoke and Marshaling

Today I learned that PInvoke and marshaling carry more tradeoffs than I had naively assumed. From a performance-engineering standpoint, both can be costly. From Microsoft's documentation:

PInvoke has an overhead of between 10 and 30 x86 instructions per call. In addition to this fixed cost, marshaling creates additional overhead. There is no marshaling cost between blittable types that have the same representation in managed and unmanaged code. For example, there is no cost to translate between int and Int32. For better performance, have fewer PInvoke calls that marshal as much data as possible, instead of more calls that marshal less data per call.
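To make the blittable distinction in that quote a bit more concrete, here is a minimal sketch that compares the managed size of a struct with the size the marshaler uses for it. The struct names and helper methods are made up for illustration, and the sketch assumes .NET Core 3.0 or later, where Unsafe.SizeOf is available in-box. When the two sizes differ, the marshaler has to convert the data rather than pass it through as-is:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class BlittableDemo
{
    // Blittable: two 32-bit integers have the same representation
    // in managed and unmanaged memory.
    [StructLayout(LayoutKind.Sequential)]
    public struct Blittable
    {
        public int A;
        public int B;
    }

    // Not blittable: bool occupies 1 byte in managed memory but is
    // marshaled as a 4-byte Win32 BOOL by default.
    [StructLayout(LayoutKind.Sequential)]
    public struct NonBlittable
    {
        public bool Flag;
    }

    // Size of the struct as the runtime lays it out in managed memory.
    public static int ManagedSize<T>() where T : struct => Unsafe.SizeOf<T>();

    // Size of the struct as the marshaler lays it out in unmanaged memory.
    public static int NativeSize<T>() where T : struct => Marshal.SizeOf<T>();

    static void Main()
    {
        Console.WriteLine($"Blittable:    managed {ManagedSize<Blittable>()}, native {NativeSize<Blittable>()}");
        Console.WriteLine($"NonBlittable: managed {ManagedSize<NonBlittable>()}, native {NativeSize<NonBlittable>()}");
    }
}
```

The sizes matching is only a hint, not a full blittability test, but a mismatch is a reliable sign that each call pays for a conversion.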

But interop functionality can be quite useful. For example, say we'd like to call some functions in ntdll.dll or kernel32.dll and read information out of native Windows structures, which may also require marshaling unmanaged memory. The first thing we do is define the layout of the structure we intend to access. Once the declarations are in place, we call the native function from managed C# code, passing in the necessary parameters, and let P/Invoke marshal the unmanaged memory that holds the results.

In this way, PInvoke provides a bridge between managed and unmanaged code, enabling us to communicate with low-level components and read low-level system data.

Suppose we want to read the SYSTEMTIME structure. Microsoft's documentation shows that the GetSystemTime function fills one in for us. The function takes a single parameter, a pointer to a SYSTEMTIME structure that receives the current system date and time (in UTC):

void GetSystemTime(
  [out] LPSYSTEMTIME lpSystemTime
);

But we don't yet have a place to hold the fields of the SYSTEMTIME structure. We need a managed structure that the unmanaged data can be marshaled into. This is the SYSTEMTIME typedef from Microsoft, which we need to recreate in our own program:

typedef struct _SYSTEMTIME {
  WORD wYear;
  WORD wMonth;
  WORD wDayOfWeek;
  WORD wDay;
  WORD wHour;
  WORD wMinute;
  WORD wSecond;
  WORD wMilliseconds;
} SYSTEMTIME, *PSYSTEMTIME, *LPSYSTEMTIME;

First we use DllImport to import the GetSystemTime function from kernel32.dll. Since C# doesn't have a WORD type like the one in Microsoft's SYSTEMTIME definition, we use ushort instead, which is the same unsigned 16-bit size. Then we carve out a simple Main function to call GetSystemTime and print the year, month, and day with Console.WriteLine:

using System;
using System.Runtime.InteropServices;

class Program
{
    // GetSystemTime fills the caller-supplied SYSTEMTIME with the current UTC date and time.
    [DllImport("kernel32.dll")]
    public static extern void GetSystemTime(out SYSTEMTIME systemTime);

    [StructLayout(LayoutKind.Sequential)]
    public struct SYSTEMTIME
    {
        public ushort Year;
        public ushort Month;
        public ushort DayOfWeek;
        public ushort Day;
        public ushort Hour;
        public ushort Minute;
        public ushort Second;
        public ushort Milliseconds;
    }

    static void Main()
    {
        SYSTEMTIME systemTime;
        GetSystemTime(out systemTime);

        Console.WriteLine($"Year: {systemTime.Year}");
        Console.WriteLine($"Month: {systemTime.Month}");
        Console.WriteLine($"Day: {systemTime.Day}");
    }
}
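Since the managed struct has to mirror the native one field for field, it can be worth checking the layout before trusting the values. The following is a minimal sketch, not part of the original program: the ToDateTime helper and the hand-built sample value are my own illustration, and the sketch avoids calling GetSystemTime so it runs on any platform. It checks that the struct is 16 bytes (eight WORDs) and converts a SYSTEMTIME into a DateTime:

```csharp
using System;
using System.Runtime.InteropServices;

static class SystemTimeLayout
{
    // Same layout as the SYSTEMTIME struct in the program above.
    [StructLayout(LayoutKind.Sequential)]
    public struct SYSTEMTIME
    {
        public ushort Year;
        public ushort Month;
        public ushort DayOfWeek;
        public ushort Day;
        public ushort Hour;
        public ushort Minute;
        public ushort Second;
        public ushort Milliseconds;
    }

    // Hypothetical helper: GetSystemTime reports UTC, so the result is tagged as UTC.
    public static DateTime ToDateTime(SYSTEMTIME st) =>
        new DateTime(st.Year, st.Month, st.Day, st.Hour, st.Minute,
                     st.Second, st.Milliseconds, DateTimeKind.Utc);

    static void Main()
    {
        // Eight 16-bit fields should come to 16 bytes, with Day at byte offset 6.
        Console.WriteLine($"Size: {Marshal.SizeOf<SYSTEMTIME>()} bytes");
        Console.WriteLine($"Offset of Day: {Marshal.OffsetOf<SYSTEMTIME>(nameof(SYSTEMTIME.Day))}");

        // A hand-built sample value standing in for the result of GetSystemTime.
        var sample = new SYSTEMTIME { Year = 2023, Month = 9, DayOfWeek = 5, Day = 1 };
        Console.WriteLine(ToDateTime(sample).ToString("o"));
    }
}
```

If the size or an offset came out differently, the field types or their order would not match the native typedef, and the values read back from GetSystemTime would be scrambled.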

Seen from this other perspective, a lot of the functionality in C#/.NET is nice because it makes working with native Windows functionality somewhat easier. C#, being statically typed, lets us specify the types of variables, parameters, and so on. Being able to tailor data types makes the code more readable, and it also helps illuminate details that might otherwise be abstracted away.

In a more involved example, though, we might use the Marshal class directly and call PtrToStructure to copy unmanaged structures into managed ones.

For example, say we have a structure containing an integer and a double. We allocate unmanaged memory the size of the struct, copy the managed struct into it with StructureToPtr, and then use PtrToStructure to read the values back into a new managed structure, printing its fields to the console like so:

using System;
using System.Runtime.InteropServices;

class Program
{
    [StructLayout(LayoutKind.Sequential)]
    public struct TestStruct
    {
        public int Number;
        public double Value;
    }
    
    static void Main()
    {
        TestStruct myStruct = new TestStruct
        {
            Number = 3,
            Value = 1.93
        };

        // Allocate unmanaged memory large enough to hold one TestStruct.
        IntPtr structPtr = Marshal.AllocHGlobal(Marshal.SizeOf(myStruct));

        try
        {
            // Copy the managed struct into the unmanaged buffer, then read it back into a new managed copy.
            Marshal.StructureToPtr(myStruct, structPtr, false);
            TestStruct marshaledStruct = Marshal.PtrToStructure<TestStruct>(structPtr);

            Console.WriteLine($"Number: {marshaledStruct.Number}");
            Console.WriteLine($"Value: {marshaledStruct.Value}");
        }
        finally
        {
            Marshal.FreeHGlobal(structPtr);
        }
    }
}
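The same pattern extends to several structures laid out back to back in one unmanaged buffer, which is how many native APIs hand back arrays of results. This is a minimal sketch under that assumption. The RoundTrip helper and the sample values are hypothetical, and the buffer is filled from managed code here rather than by a native function:

```csharp
using System;
using System.Runtime.InteropServices;

static class StructArrayDemo
{
    [StructLayout(LayoutKind.Sequential)]
    public struct TestStruct
    {
        public int Number;
        public double Value;
    }

    // Copies the items into one unmanaged buffer, then walks the buffer
    // and reads each element back with PtrToStructure.
    public static TestStruct[] RoundTrip(TestStruct[] items)
    {
        int size = Marshal.SizeOf<TestStruct>();
        IntPtr buffer = Marshal.AllocHGlobal(size * items.Length);

        try
        {
            for (int i = 0; i < items.Length; i++)
            {
                // Element i starts i * size bytes past the start of the buffer.
                Marshal.StructureToPtr(items[i], IntPtr.Add(buffer, i * size), false);
            }

            var result = new TestStruct[items.Length];
            for (int i = 0; i < items.Length; i++)
            {
                result[i] = Marshal.PtrToStructure<TestStruct>(IntPtr.Add(buffer, i * size));
            }
            return result;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    static void Main()
    {
        var items = new[]
        {
            new TestStruct { Number = 1, Value = 0.5 },
            new TestStruct { Number = 2, Value = 1.5 },
            new TestStruct { Number = 3, Value = 1.93 }
        };

        foreach (TestStruct item in RoundTrip(items))
        {
            Console.WriteLine($"Number: {item.Number}, Value: {item.Value}");
        }
    }
}
```

Using Marshal.SizeOf for the stride, rather than adding up field sizes by hand, keeps the offsets correct when the marshaler inserts padding between the int and the double.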

Given that PInvoke is essentially a way of translating calls across the managed/unmanaged boundary, and that marshaling adds an extra set of declarations, copies, and conversions to a program, and therefore complexity, it's easy to see why both come with performance tradeoffs. But they can also make working with low-level Windows components somewhat easier.
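The advice in the documentation to make fewer calls that each move more data can be sketched without any native library at all. The example below is only an analogy, and the method names are hypothetical: it uses the Marshal helpers to fill an unmanaged buffer, once with one call per element and once with a single bulk copy. It does not cross a real P/Invoke boundary and makes no timing claims, but it shows the two shapes of code the advice is contrasting:

```csharp
using System;
using System.Runtime.InteropServices;

static class ChunkyCopyDemo
{
    // "Chatty" shape: one call per element.
    public static void CopyPerElement(int[] source, IntPtr destination)
    {
        for (int i = 0; i < source.Length; i++)
        {
            Marshal.WriteInt32(destination, i * sizeof(int), source[i]);
        }
    }

    // "Chunky" shape: a single call that moves the whole array.
    public static void CopyInOneCall(int[] source, IntPtr destination)
    {
        Marshal.Copy(source, 0, destination, source.Length);
    }

    // Reads the buffer back so both approaches can be compared.
    public static long SumUnmanaged(IntPtr buffer, int count)
    {
        long sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += Marshal.ReadInt32(buffer, i * sizeof(int));
        }
        return sum;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5 };
        IntPtr buffer = Marshal.AllocHGlobal(data.Length * sizeof(int));

        try
        {
            CopyPerElement(data, buffer);
            Console.WriteLine($"Per-element sum: {SumUnmanaged(buffer, data.Length)}");

            CopyInOneCall(data, buffer);
            Console.WriteLine($"Bulk-copy sum: {SumUnmanaged(buffer, data.Length)}");
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

With a real native function, the chunky version would correspond to a single import that accepts a pointer and a count, so the fixed per-call cost is paid once instead of once per element.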
