dtrace progress 20081026 | Sunday, 26 October 2008 |
Now that the kernel stands up fine to running user space probes, I need to figure out where they are, or what to do with them.
They don't show up in 'dtrace -l', but I suspect that's more of a misunderstanding on my part about what/how.
The kernel does have the PID provider code linked in (fasttrap.c and related files), but some stuff is still commented out.
Perusing the Solaris kernel shows I have a number of things missing, such as static probes on process creation/exit and in other subsystems. This will require some hacking to get working, but I *don't think* it is important just now. (Hacking means I am going to have to disassemble these functions and patch in static probes, unless I can find good hooks within the kernel which I can daisy chain onto.)
There's complexity in handling user programs, since the dtrace libraries rely on the procfs way of manipulating applications, and Linux has ptrace() along with the new fprobe() (?) interface.
I am tempted to create a procfs() driver for Linux to hide this complexity. I have never found the ptrace() syscall interface friendly for multithreaded apps, but then it's very easy to make mistakes and take a good while to debug them.
Am just poking around at the moment. This is the final lynchpin of dtrace -- if this can be made to work then the rest is just quality control.
I got my first quantize() graph out of dtrace today (it has always worked, I had just never gotten around to trying it out).
testing .. testing | Saturday, 25 October 2008 |
Debugging a driver in Linux | Friday, 24 October 2008 |
Any form of programming results in bugs - unexpected behavior. A driver is no different - but the bugs are harder to fathom because many of your favorite debugging aids cannot work in kernel space.
My solution is printk(). It's very low tech but it works...
I am a fan of gdb and have written my own x86 kernel debuggers (for x386 and x186 processors), but porting them is a pain. Now we have 64-bit chips, I have to decide if I want to port my ancient code. (The debugger is powerful but nothing grand).
In a driver you have to crawl - one atom at a time; enabling huge wads of code and expecting it to work, well, er, won't. Disabling lots of code, ensuring the structure is there, and then adding a sprinkling of print statements works well.
I try to use VMware whilst debugging, since bad pointers can corrupt filesystems, and losing hundreds of gigs of filesystem, then waiting to fsck or reformat/reinstall, is not nice.
Linux has a nice feature: GPFs (bad pointers) are caught and logs written to /var/log/messages. If you are lucky no reboot is needed.
Having a stripped down startup is essential - being able to reboot in about 10s is ideal - no waiting for GUI startups. (I use rlogin or telnet or ssh into the vm session).
Mutex debugging is a pain, but I have found that Linux has a drop dead timer: if the kernel is unresponsive after 10+s a message is printed on the console. Next is a reboot (don't resume a VM snapshot, since you will not have access to what went wrong).
After reboot, /var/log/messages will have your printk() statements to help track down where you got stuck.
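As a rough illustration of the printk() approach (a sketch, not code from the dtrace driver): DBG(), DBG_OUT() and demo_init() below are made-up names. Built inside the kernel the macro expands to printk(); built as an ordinary program it falls back to fprintf(), so the same "one stage at a time" pattern can be tried in user space first.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>
#define DBG_OUT(...) printk(KERN_INFO __VA_ARGS__)
#else
#include <stdio.h>
#define DBG_OUT(...) fprintf(stderr, __VA_ARGS__)
#endif

/* Hypothetical helper: prefix every message with function name and line.
 * The ##__VA_ARGS__ form is a GNU C extension (gcc and clang accept it). */
#define DBG(fmt, ...) \
    DBG_OUT("%s:%d: " fmt "\n", __func__, __LINE__, ##__VA_ARGS__)

/* Hypothetical driver init split into small stages. Raise max_stage by one
 * per test run; the last message in the log shows how far it got. */
static int demo_init(int max_stage)
{
    int done = 0;

    DBG("entry, max_stage=%d", max_stage);
    if (max_stage >= 1) {
        /* stage 1: e.g. allocate driver state */
        done++;
        DBG("stage %d ok", done);
    }
    if (max_stage >= 2) {
        /* stage 2: e.g. register the device node */
        done++;
        DBG("stage %d ok", done);
    }
    if (max_stage >= 3) {
        /* stage 3: e.g. hook up the ioctl handlers */
        done++;
        DBG("stage %d ok", done);
    }
    return done;    /* number of stages that actually ran */
}
```

Calling demo_init(1), then demo_init(2), and so on mirrors the "crawl" described above: if the machine hangs, the last "stage N ok" line in the log marks the code that was safe.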
Next step is to avoid making bugs in the first place....
DTrace Progress 20081023 | Thursday, 23 October 2008 |
I have been working on the /dev/dtrace_helper driver code, which is used by a user space app, to find where it's not working. This has led me into a corner of the Linux port which had been stubbed out or #define'd to compile, saving the work required to resolve it for a rainy day.
That day arrived: much of the Linux kernel driver code is just plain ol' C with a few bits of assembler. However, the dtrace helper ioctl() code needs to be able to find and store attributes of real processes. So, my process-shadowing veneer needs to be stronger than it is. This is the key code which negates the need to change the GPL Linux kernel source; by keeping the dtrace driver pure of GPL, it needs a way to find stuff which is not normally of concern to a driver.
This is not so much a GPL vs CDDL licensing issue, but more of an issue in that the Linux kernel changes, sometimes quite dramatically, from one release to another. If dtrace is not a part of the kernel, then it needs to be a good citizen and provide easy adaptation to new kernels, or kernels compiled with differing compile time options.
Applications like VMware have a similar issue - many times a new kernel will come out and VMware won't work on it, since the compiled code doesn't conform to the new headers or functions.
Of the many thousands of lines of code in dtrace, only a very few (10-20) care about this aspect of the kernel, e.g. convert a PID to a process structure, store/retrieve attributes affecting scheduling. But these few lines are the most difficult...or not...
Other tasks have been taking my time this week, so I shall be going back into the water in a few days....
USDT -- some details | Sunday, 19 October 2008 |
User-level Statically Defined Tracing - How it works
USDT is a mechanism for user land applications to embed their own probes into executables. For example, a Perl or Python interpreter might use it to gain access to stack traces of applications which are already started.
The goal of the DTrace team was near zero overhead when not invoked. This works well - even commercial applications can embed probes and not worry about performance or run time dependencies.
There are a number of steps to make this work.
Here's an example:
#include <stdio.h>
#include <unistd.h>
#include <sys/sdt.h>    /* DTRACE_PROBEx macros */
int main(int argc, char **argv)
{
    while (1) {         /* loop forever so the probes can be watched */
        printf("here on line %d\n", __LINE__);
        DTRACE_PROBE1(simple, saw__line, 0x1234);
        printf("here on line %d\n", __LINE__);
        DTRACE_PROBE1(simple, saw__word, 0x87654321);
        printf("here on line %d\n", __LINE__);
        DTRACE_PROBE1(simple, saw__word, 0xdeadbeef);
        printf("here on line %d\n", __LINE__);
        sleep(1);
    }
}
The DTRACE_PROBEx macros translate into a function call. To gain near-zero overhead, during linking, the function call is replaced by a series of NOP instructions. Here's an example disassembly of the above:
0000000000400e7c <main>:
  400e7c: 55                      push   %rbp
  400e7d: 48 89 e5                mov    %rsp,%rbp
  400e80: 48 83 ec 10             sub    $0x10,%rsp
  400e84: 89 7d fc                mov    %edi,0xfffffffffffffffc(%rbp)
  400e87: 48 89 75 f0             mov    %rsi,0xfffffffffffffff0(%rbp)
  400e8b: be 07 00 00 00          mov    $0x7,%esi
  400e90: bf df 11 40 00          mov    $0x4011df,%edi
  400e95: b8 00 00 00 00          mov    $0x0,%eax
  400e9a: e8 01 f9 ff ff          callq  4007a0 <printf@plt>
  400e9f: bf 34 12 00 00          mov    $0x1234,%edi
  400ea4: 90                      nop
  400ea5: 90                      nop
  400ea6: 90                      nop
  400ea7: 90                      nop
  400ea8: 90                      nop
  400ea9: be 09 00 00 00          mov    $0x9,%esi
  400eae: bf df 11 40 00          mov    $0x4011df,%edi
  400eb3: b8 00 00 00 00          mov    $0x0,%eax
  400eb8: e8 e3 f8 ff ff          callq  4007a0 <printf@plt>
  400ebd: bf 21 43 65 87          mov    $0x87654321,%edi
  400ec2: 90                      nop
  400ec3: 90                      nop
  400ec4: 90                      nop
  400ec5: 90                      nop
  400ec6: 90                      nop
  400ec7: be 0b 00 00 00          mov    $0xb,%esi
  ....
When an application is built, dtrace is run on the object files to rewrite the objects, stubbing out the calls for probes, and creating a table in the executable of the places where the stubs are located. (The code for this is located in libdtrace/dt_link.c).
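To make the "stub out the call and remember where it was" step concrete, here is a toy sketch. It is not the dt_link.c code: the real thing works on ELF object files, whereas this operates on a plain byte buffer, and struct probe_site and stub_out_call() are names invented for the illustration. It assumes the x86 5-byte near call (opcode 0xE8 plus a 32-bit offset), which matches the five NOPs seen in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_OPCODE 0xE8   /* x86 near call with a 32-bit relative target */
#define NOP_OPCODE  0x90
#define CALL_LEN    5      /* opcode byte + rel32 */

/* Hypothetical table entry: where a stubbed-out probe call lives. */
struct probe_site {
    const char *name;      /* probe name, e.g. "saw__line" */
    size_t      offset;    /* offset of the first NOP within the text buffer */
};

/*
 * Replace the 5-byte call at text[offset] with NOPs and record the site.
 * Returns 0 on success, -1 if the bytes are not a call, the call would run
 * past the end of the buffer, or the table is full.
 */
static int stub_out_call(uint8_t *text, size_t text_len, size_t offset,
                         const char *name,
                         struct probe_site *table, size_t table_cap,
                         size_t *table_len)
{
    if (offset > text_len || text_len - offset < CALL_LEN)
        return -1;
    if (text[offset] != CALL_OPCODE)
        return -1;
    if (*table_len >= table_cap)
        return -1;

    memset(text + offset, NOP_OPCODE, CALL_LEN);
    table[*table_len].name = name;
    table[*table_len].offset = offset;
    (*table_len)++;
    return 0;
}
```

The table is the important part: without it, nothing could later find the NOP runs again to turn a probe on.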
When the application is started up, a piece of code is executed (before main() is called). [Code located in libdtrace/drti.c]. This code looks at the current system, to see if dtrace is loaded into the kernel and communicates with the /dev/dtrace/helper driver to inform it that new probes are available in this process.
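A minimal sketch of the "runs before main()" idea, assuming GCC or Clang (the constructor attribute is a compiler extension; the Sun compiler has its own mechanism). This is not the drti.c source: probe_helper_device() and the two flags are invented names, the device path is the Linux port's /dev/dtrace_helper, and the ioctl() that would actually register the probes is left out because its details are not covered here.

```c
#include <fcntl.h>
#include <unistd.h>

/* Assumption: the Linux port's helper device name. */
#define HELPER_DEVICE "/dev/dtrace_helper"

static int helper_checked;   /* set once the start-up hook has run */
static int helper_present;   /* 1 if the helper device could be opened */

/* GCC/Clang extension: run this function before main(). */
__attribute__((constructor))
static void probe_helper_device(void)
{
    int fd = open(HELPER_DEVICE, O_RDWR);

    helper_checked = 1;
    if (fd < 0) {
        /* dtrace is not loaded: carry on, the NOP-ed probes cost nothing. */
        helper_present = 0;
        return;
    }
    /* The real code would now describe its probes to the driver with an
     * ioctl(); that step is deliberately omitted from this sketch. */
    helper_present = 1;
    close(fd);
}
```

The point of doing this in a constructor is that the application author writes no registration code at all; linking the object in is enough.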
Voila! We are done. Or nearly.
At this point, whilst the application is running, 'dtrace -l' should reveal your new probes.
The Kernel
When a user elects to monitor the probe, the patched (NOP-ed) code will be changed into a call back into the kernel, to notify it that the function/probe is being invoked.
Cancellation of the probe will undo the patched code and we are done.
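The patch-and-undo cycle can be sketched on a plain byte buffer. Everything here is illustrative: struct probe_patch, probe_enable() and probe_disable() are made-up names, and writing a single trap byte (x86 int3, 0xCC) is an assumption chosen to keep the example short; whatever instruction the driver really writes, the shape is the same: save the original bytes, overwrite, restore on cancel.

```c
#include <stdint.h>
#include <string.h>

#define SITE_LEN    5      /* the stub is five one-byte NOPs */
#define TRAP_OPCODE 0xCC   /* x86 int3; an assumed choice for this sketch */

/* Hypothetical bookkeeping for one probe site. */
struct probe_patch {
    uint8_t *site;              /* first byte of the NOP-ed stub */
    uint8_t  saved[SITE_LEN];   /* original bytes, for undoing the patch */
    int      enabled;
};

/* Turn the probe on: remember the stub, then overwrite its first byte.
 * Returns 0 on success, -1 if the probe is already enabled. */
static int probe_enable(struct probe_patch *p)
{
    if (p->enabled)
        return -1;
    memcpy(p->saved, p->site, SITE_LEN);
    p->site[0] = TRAP_OPCODE;
    p->enabled = 1;
    return 0;
}

/* Turn the probe off: put the saved bytes back.
 * Returns 0 on success, -1 if the probe was not enabled. */
static int probe_disable(struct probe_patch *p)
{
    if (!p->enabled)
        return -1;
    memcpy(p->site, p->saved, SITE_LEN);
    p->enabled = 0;
    return 0;
}
```

In a live process the write would also have to deal with page protections and with threads executing the site at that moment; none of that is modelled here.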
DTrace USDT progress | Sunday, 19 October 2008 |
Next step is ensuring the driver helper functions are enabled and to test it out with a simple example in the release.
It's been a slow week trying to debug just a few lines of code in dt_link.c, as GNU object files and Sun ones are subtly different in the way undefined symbols are represented.
Stay tuned...
DTrace Progress 20081012 | Sunday, 12 October 2008 |
It's not finished yet, but here's what I have found:
libdtrace/drti.c needs to be compiled to an object file, and linked in with target apps. (This now compiles).
"dtrace -G" is the magic used to convert a prototype file to the object file needed to link with the application to be probed.
We need the dtrace 'helper' device - something I had commented out early on, since I wasn't sure what it was. This is now enabled in /dev/dtrace_helper. (If I can work out how to create a /dev/dtrace/ dir, then I can more closely mimic Solaris; not a big deal whether this is done).
libdtrace/dt_link.c has needed a few minor mods to fit in with the new device name, plus some workarounds for assumptions about /usr/ccs/, which is not the compiler directory under Linux.
A small complication is the need to store state on a per proc/task basis, but the shadow mechanism (par_alloc) is used for this. (I need to intercept process/task death to do the garbage collection; subject for another day).
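The shadow idea (keeping driver-private state per process without touching the kernel's own process structure) can be sketched as a small lookup table keyed by PID. This is a user-space illustration only, not the par_alloc code, which may well be keyed and locked differently; struct shadow_proc, shadow_get() and shadow_free() are hypothetical names, and there is no locking.

```c
#include <stdlib.h>
#include <sys/types.h>

#define SHADOW_BUCKETS 64

/* Hypothetical shadow record: driver-private state for one process. */
struct shadow_proc {
    pid_t               pid;
    void               *helpers;   /* placeholder for per-process probe state */
    struct shadow_proc *next;      /* next record in the same hash bucket */
};

static struct shadow_proc *shadow_tab[SHADOW_BUCKETS];

static unsigned shadow_hash(pid_t pid)
{
    return (unsigned)pid % SHADOW_BUCKETS;
}

/* Find the shadow for pid, creating it on first use.
 * Returns NULL only if allocation fails. */
static struct shadow_proc *shadow_get(pid_t pid)
{
    unsigned h = shadow_hash(pid);
    struct shadow_proc *sp;

    for (sp = shadow_tab[h]; sp != NULL; sp = sp->next)
        if (sp->pid == pid)
            return sp;

    sp = calloc(1, sizeof(*sp));
    if (sp == NULL)
        return NULL;
    sp->pid = pid;
    sp->next = shadow_tab[h];
    shadow_tab[h] = sp;
    return sp;
}

/* Garbage-collect the shadow when the process dies.
 * Returns 1 if a record was freed, 0 if there was none. */
static int shadow_free(pid_t pid)
{
    struct shadow_proc **pp = &shadow_tab[shadow_hash(pid)];

    for (; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->pid == pid) {
            struct shadow_proc *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

shadow_free() is the piece that needs the process-death hook mentioned above: without a call to it, records for dead processes would pile up, and a recycled PID would inherit stale state.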
Getting this working will be a major milestone; people have been asking about USDT for Perl/Ruby, and I want to put a probe into CRiSP - just so I can understand the ins and outs.
USDT - Dtrace | Wednesday, 08 October 2008 |
http://blogs.sun.com/barts/entry/putting_user_defined_dtrace_probe
which lets me get started. So, in theory, once I can compile with a decent header file, and reasonable portability, I can go digging in the kernel driver to see if we can invoke/intercept it.
Keep fingers crossed !
fcterm progress | Monday, 06 October 2008 |
I had recently taken a short break from dtrace whilst enhancing fcterm to include the following features:
Infinite scrollback (sort of: it now spills to files and can page in from the files, but you don't really want infinite scroll).
Performance: now faster than all the competition.
Auto-restart: if the X server crashes, fcterm will carry all of your pty state over to the newly started X server, without missing a heartbeat. Yes, it waits for a new X server to come along and continues from where you left off; no more lost editing sessions, or shell sessions.
Fixed a typo in dtrace today...time to go back in the water... stay tuned....