More Dtrace memory editing on the fly

Posted: September 28, 2011 in dtrace, solaris, tech

It didn’t take long for the fragile 15-year-old Solaris app to break again. Six years ago we moved it to Solaris 10 quite successfully thanks to library interposers, which let us change the hostId a process sees to match the one it expected from its nine-year-old hardware. It was much, much happier on new hardware.
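For readers who haven't written one: a library interposer is a shared object loaded ahead of libc (for example via LD_PRELOAD) that defines a function with the same name, so the application's calls resolve to it first. The sketch below is a minimal, hypothetical example, not the interposer used here. It assumes the application reads its hostid through gethostid(), and LEGACY_HOSTID is a made-up placeholder value. If an application instead read the serial through sysinfo(SI_HW_SERIAL), that call would need interposing as well.

```c
#include <unistd.h>

/* Hypothetical placeholder: the hostid the old license was issued for. */
#define LEGACY_HOSTID 0x80f00dL

/*
 * Same name and signature as the libc function. When this object is
 * preloaded (or linked into the program), callers get the legacy value
 * instead of the real machine's hostid.
 */
long gethostid(void)
{
    return LEGACY_HOSTID;
}
```

Built as a shared object (e.g. gcc -shared -fPIC -o libhostid.so hostid.c) and started with LD_PRELOAD=./libhostid.so, only the processes launched with that variable see the substitute value.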
A few months ago I wanted to move it into a virtual Solaris environment (a Zone) rather than waste an entire physical box on this application(s). Solaris had recently added the ability to set your own hostId for a virtual environment, so it looked like a slam dunk. The issue we ran into, which Dtrace solved, was that the inode numbers for the zone’s virtual root directory were off. A Dtrace memory edit (copyin() to read the user’s buffer, copyout() to write the modified copy back) fixed that.
All’s well that ends well, but this hasn’t ended yet.
It’s running on a test system and it is time to move it to a production system so we can have our test platform back.
So I loaded some new T3-1B systems with Solaris 10 release 10. And I shouldn’t have. Or maybe it’s good that I did: better to find the issue sooner rather than later.
The problem is that in this release the sysinfo hardware provider (SI_HW_PROVIDER) is now ‘Oracle Corporation’ instead of ‘Sun_Microsystems’. This unfortunately breaks flexlm, which checks that the license it is trying to use is running on the hardware it was issued for. 😦

I’ve been hacking and slashing my way to a unified solution and it’s mostly there. It starts up, but I’m getting an error related to creating/managing the lock file. I’ll pick that up tomorrow, but without further ado, here is my SI_HW_PROVIDER fix.

To replace “Oracle Corporation” with “Sun_Microsystems” (the second half of the script is the inode fix from the zone move):

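#!/usr/sbin/dtrace -Cs
#include <sys/systeminfo.h>
#include <dirent.h>

#pragma D option destructive

/* Part 1: make SI_HW_PROVIDER read "Sun_Microsystems" for the target binary. */
syscall::systeminfo:entry
/arg0 == SI_HW_PROVIDER && execname == "binaryname"/
{
    self->mach = arg1;      /* user buffer the kernel is about to fill */
}

syscall::systeminfo:return
/self->mach/
{
    /* 16 characters plus the terminating NUL */
    copyoutstr("Sun_Microsystems", self->mach, 17);
    self->mach = 0;
}

/* Part 2: inode fix for the zone's root directory. */
syscall::getdents*:entry
/execname == "binaryname"/
{
    self->buf = arg1;       /* user buffer for the directory entries */
}

syscall::getdents*:return
/self->buf && arg1 > 0/
{
    /* arg1 is the byte count returned; only the first entry is rewritten */
    this->dep = (struct dirent *)copyin(self->buf, sizeof (struct dirent));
    this->dep->d_ino = 4;
    copyout(this->dep, self->buf, sizeof (struct dirent));
}

syscall::getdents*:return
/self->buf/
{
    self->buf = 0;          /* clear on every return, including EOF and errors */
}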
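The getdents clause follows a copyin, modify, copyout pattern on the user's buffer. The C sketch below mimics that pattern in ordinary user space so the byte flow is easier to follow. struct fake_dirent is a hypothetical stand-in for the struct dirent header (inode number, offset, record length, name), not its exact Solaris layout, and patch_first_ino is an illustrative name. Like the D clause, it only touches the first record in the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the struct dirent header: inode number,
   offset of the next entry, record length, then the (short) name. */
struct fake_dirent {
    uint64_t d_ino;
    int64_t  d_off;
    uint16_t d_reclen;
    char     d_name[14];
};

/*
 * User-space mirror of the getdents*:return clause. nread plays the role
 * of arg1 (bytes returned by the call). The first record is copied into a
 * scratch struct (like copyin), its inode number is overwritten, and the
 * struct is copied back over the buffer (like copyout).
 * Returns 1 if a record was patched, 0 otherwise.
 */
static int patch_first_ino(unsigned char *ubuf, long nread, uint64_t new_ino)
{
    struct fake_dirent scratch;

    /* Unlike the D clause, skip buffers shorter than one whole record. */
    if (nread < (long)sizeof (scratch))
        return 0;
    memcpy(&scratch, ubuf, sizeof (scratch));
    scratch.d_ino = new_ino;
    memcpy(ubuf, &scratch, sizeof (scratch));
    return 1;
}
```

Working on a scratch copy rather than the caller's memory is the same reason the D clause copies in first: DTrace can only modify its own scratch space and must write the result back explicitly.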

Comments
  1. Spooner says:

    I bet it would run better on Android or IRIX 🙂

  2. Daniel babault says:

    I can’t find how to use this hack.

    As I got errors, I replaced, in line 2,
    #include by #include

    I launch dtrace sun_change.d but nothing seems to change.

  3. coffeeortech says:

    This hack was unfortunately less than optimal for my issues, and I moved on to an mdb kernel hack to fix them. If you want to continue with the dtrace hack, post more info and I might be able to help you. There could be a typo, funky characters from copying back and forth from the web, or something else.

    The real issue with the dtrace hack is that it has to run as the same user whose HW provider is being changed. The reason it didn’t work for me is that every user who was going to connect to the flexlm server would also have to have his HW provider changed. Ouch: that means each user would need dtrace privileges, and that is very NOT acceptable… So it was a good exercise, but I ultimately had to settle on a different hack/fix.

    The following kernel fix is global: it actually changes the running kernel. It is not permanent, though. The change lives only in kernel memory, so after a reboot (or patching) the kernel will have to be “poked” again.
    If you are running zones, it must be done in the global zone and will affect all non-global zones. This was provided to me by an inside man at Oracle, and there IS NO warranty on this…

    So, as root, run the command

    # mdb -kw

    then type:
    hw_provider/s

    And it will spit out:
    hw_provider: Oracle Corporation

    Verify that the HW provider string is indeed 18 characters long (“Oracle Corporation”). If not, STOP!

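    If you are good to go, continue with these commands in sequence. The first writes 8 bytes of ASCII, “Sun_Micr”; the second writes the next 8 bytes, “osystems”; and the last pads the end of the 18 characters with two NUL bytes. (mdb reads numbers as hex by default, so +10 is byte offset 16; /Z writes 8 bytes and /w writes 2. The byte order of the /Z values assumes a big-endian SPARC machine such as the T3-1B.)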

    hw_provider/Z 53756e5f4d696372
    hw_provider+8/Z 6f73797374656d73
    hw_provider+10/w 0
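    To double-check the hex before poking a live kernel, the three writes can be replayed against an ordinary buffer. This is a minimal sketch, assuming big-endian byte order (as on SPARC); on little-endian x86 the same /Z values would land byte-reversed. store_be64 and replay_mdb_patch are illustrative names, not mdb functions.

```c
#include <stdint.h>
#include <string.h>

/*
 * Store a 64-bit value most-significant byte first, the way an 8-byte
 * write (mdb's /Z) lands in memory on a big-endian SPARC kernel.
 */
static void store_be64(unsigned char *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)(v >> (56 - 8 * i));
}

/*
 * Replay the three mdb writes against a copy of the hw_provider string.
 * buf must hold at least 18 bytes (the length of "Oracle Corporation").
 */
static void replay_mdb_patch(unsigned char *buf)
{
    store_be64(buf + 0x0, 0x53756e5f4d696372ULL);  /* hw_provider/Z   -> "Sun_Micr" */
    store_be64(buf + 0x8, 0x6f73797374656d73ULL);  /* hw_provider+8/Z -> "osystems" */
    buf[0x10] = 0;                                  /* hw_provider+10/w writes 2 bytes */
    buf[0x11] = 0;                                  /* of zero at offset 0x10 (16)     */
}
```

    Starting from a buffer holding “Oracle Corporation”, the result reads back as “Sun_Microsystems” followed by NUL bytes, which is what hw_provider/s should show.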

    Now, before you exit, check your handiwork:
    hw_provider/s

    hw_provider: Sun_Microsystems

    If it doesn’t look like the line above, oops. Try again? Insert another quarter? Otherwise, if you are good, you can:
    ::quit

    Here’s a little C program to compile and check your handiwork:

    # ./a.out (A little C routine – see below)
    Hardware Provider: “Sun_Microsystems”

    A tiny C program to read the hardware provider from sysinfo (or, of course, you can use uname -a or other tools):
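    –BEGIN–
    #include <stdio.h>
    #include <sys/systeminfo.h>

    int main(void) {
        char buf[257];
        long retval;

        /* sysinfo() copies the provider string into buf; -1 means failure */
        retval = sysinfo(SI_HW_PROVIDER, buf, sizeof (buf));
        if (retval == -1) {
            perror("sysinfo");
            return 1;
        }
        printf("Hardware Provider: \"%s\"\n", buf);
        return 0;
    }
    –END–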

  4. danielbabault says:

    It works for me with a small typo fix (in your reply the second half was the same as the first).
    So I use the following script.

    #!/bin/sh
    #
    mdb -kw <<EOF
    hw_provider/s
    hw_provider/Z 53756e5f4d696372
    hw_provider+8/Z 6f73797374656d73
    hw_provider+10/w 0
    hw_provider/s
    ::quit
    EOF

    I will run it after each reboot.

  5. Elv says:

    Same problem with WPLMD, based on LMGRD. I am running WP as only one user, so dtrace is an option for me. But two questions:
    1. Which files are to be included? I think the “less than” and “greater than” signs are hiding the filenames in between.
    2. Where should the dtrace script be run?

    Thanks for answer!

    • Elv says:

      It seems to be “sys/systeminfo.h” for the first include and “dirent.h” for the second one.

      • coffeeortech says:

        The first section is to fix the SysInfo provider name. I don’t recommend this solution but suggest the mdb kernel poke instead.

        The second section is the inodes number fix if you are plagued by this particular problem.

        This dtrace script must be running before the flexlm process starts up, so put it in your start script with a two-second pause before launching lmgrd. It catches the flexlm license manager (in your case WPLMD, I think), fixes the little issues, and then terminates.

        I believe you will need a second dtrace script for the user. The user’s process tree may compare what he believes is the HW_SI to the license and fail if they don’t match. That’s why I went with the kernel poke– I didn’t want to give dtrace privilege to my users.
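        The start-script ordering described in this reply (tracer first, a short pause, then lmgrd) can be sketched in C. This is a hypothetical helper, not the author's start script: start_after_tracer is an illustrative name, and the caller supplies both argument vectors, for example a dtrace command running the script and the lmgrd command line for your installation. The sketch leaves the tracer running in the background and only waits for the license server.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Launch the tracer (e.g. the dtrace script) in the background, pause
 * settle_secs so its probes are enabled, then run the server and wait
 * for it. Returns the server's exit status, or -1 on failure.
 */
static int start_after_tracer(char *const tracer[], char *const server[],
                              unsigned int settle_secs)
{
    pid_t tpid = fork();
    if (tpid == -1)
        return -1;
    if (tpid == 0) {
        execv(tracer[0], tracer);
        _exit(127);                     /* only reached if exec failed */
    }

    sleep(settle_secs);                 /* the "two-second pause" */

    pid_t spid = fork();
    if (spid == -1)
        return -1;
    if (spid == 0) {
        execv(server[0], server);
        _exit(127);
    }

    int status;
    if (waitpid(spid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

        If the server detaches into the background on its own, waitpid() returns as soon as the original foreground process exits.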

      • Elv says:

        Thanks for quick and very useful answer!
