Dwarf .. nearly working. Sunday, 26 July 2009  
  0   3004              sys_nanosleep:entry
              0x7f76eab2e104: libc-2.6.1.so`sleep+0x94
              0x7f76eb55a576: libperl.so.5.8.8`Perl_pp_sleep+0x56
              0x7f76eb51d1ee: libperl.so.5.8.8`Perl_runops_standard+0xe
              0x7f76eb4c7f4a: libperl.so.5.8.8`perl_run+0x30a

  0   2482           sys_rt_sigaction:entry
              0x7f76eab2e17a: libc-2.6.1.so`sleep+0x10a
              0x7f76eb55a576: libperl.so.5.8.8`Perl_pp_sleep+0x56
              0x7f76eb51d1ee: libperl.so.5.8.8`Perl_runops_standard+0xe
              0x7f76eb4c7f4a: libperl.so.5.8.8`perl_run+0x30a
...

The above is the stack trace of Perl, which has no decent frame pointers, yet the stack trace agrees with what gdb sees. (I had to cheat, since 'main()' is missing above).

It's nearly there, but I need to resolve some more issues, and then we should have a viable ustack() call even on omit-frame-pointer applications. (I still need to do the 32-bit equivalent of the above.)

Posted at 12:01:02 by Paul Fox | Permalink
  Say "goodbye" .. Say "hello" Monday, 20 July 2009  
I have removed the utils/eh.c file.

I have created driver/dwarf.c.

This file builds both as a userland binary (build/dwarf) and as the DWARF decoder subroutine for kernel code, to be called from dwarf_isa.c.

Next step is to modify the stack walker to invoke the subroutine and see if we get sensible results from within the dtrace driver.

Posted at 23:20:37 by Paul Fox | Permalink
  And so the gestation of a dwarf begins... Sunday, 19 July 2009  
utils/eh.c seems to be working, and I am now converting it from a userland DWARF dumper to a subroutine which can be called in the context of walking the stack.

I'll put out periodic releases if anyone is interested (utils/eh.c), which will become driver/dwarf.c when it's ready for compiling into the kernel (not far off).

The next step is to change the ustack() code to call this and see what happens...

Posted at 19:00:53 by Paul Fox | Permalink
  Gestation Period is up...I am pregnant with a Dwarf... Saturday, 18 July 2009  
Having spent the last week or so on understanding the DWARF .eh_frame and .eh_frame_hdr sections, I now have a simple utility to dump out these sections, according to the DWARF spec. This code is analogous to what the binutils/readelf tool can do, but is the first step to making this work inside the kernel to get stack traces from user space apps.

The code is in utils/eh.c (gcc -o eh eh.c -lelf). It's nothing special, and likely to have a few bugs/quirks in it, but the code can now be copied into a kernel module and invoked as a subroutine, with various changes to handle ELF32 + ELF64 (eh.c only handles ELF64 for now).

The following is the kind of output from the tool:

FDE length=00000024 ptr=0034 pc=00402110..00402199
  Augmentation Length: 0x00
0000: 4a          DW_CFA_advance_loc 10 to 0040211a
0001: 8f 02       DW_CFA_offset: r15 at cfa-16
0003: 86 06       DW_CFA_offset: r6 at cfa-48
0005: 66          DW_CFA_advance_loc 38 to 00402140
0006: 0e 40       DW_CFA_def_cfa_offset: 64
0008: 83 07       DW_CFA_offset: r3 at cfa-56
000a: 8e 03       DW_CFA_offset: r14 at cfa-24
000c: 8d 04       DW_CFA_offset: r13 at cfa-32
000e: 8c 05       DW_CFA_offset: r12 at cfa-40
0010: 00          DW_CFA_nop
0011: 00          DW_CFA_nop
0012: 00          DW_CFA_nop
0013: 00          DW_CFA_nop
0014: 00          DW_CFA_nop
0015: 00          DW_CFA_nop
0016: 00          DW_CFA_nop
It may not make sense without reading the specs or understanding what it is trying to do. (eh.c has various big cribbed comments taken from the DWARF spec). The above is like a virtual machine but is used to track what is in a register (eg the current frame pointer) rather than perform arithmetic or logical operations.
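
As a rough illustration of the "virtual machine" idea, here is a minimal sketch in Python (not the code in eh.c) that decodes the instruction bytes from the FDE dumped above. It only handles the four opcodes that appear in that dump. The data alignment factor of -8 and code alignment factor of 1 are assumptions about the CIE (they are typical for x86_64 and consistent with the offsets shown), and the names decode_cfa and read_uleb128 are made up for this example.

```python
# Minimal sketch of a DWARF call-frame-instruction decoder.
# Only the opcodes seen in the dump above are handled.

DATA_ALIGN = -8   # assumed data alignment factor from the CIE (typical x86_64)
CODE_ALIGN = 1    # assumed code alignment factor from the CIE


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return value, pos


def decode_cfa(buf, pc):
    """Return one text line per instruction in buf, starting at address pc."""
    out = []
    pos = 0
    while pos < len(buf):
        op = buf[pos]
        pos += 1
        high, low = op & 0xC0, op & 0x3F
        if high == 0x40:        # DW_CFA_advance_loc: delta is in the low 6 bits
            pc += low * CODE_ALIGN
            out.append("DW_CFA_advance_loc %d to %08x" % (low, pc))
        elif high == 0x80:      # DW_CFA_offset: register is in the low 6 bits
            off, pos = read_uleb128(buf, pos)
            out.append("DW_CFA_offset: r%d at cfa%+d" % (low, off * DATA_ALIGN))
        elif op == 0x0E:        # DW_CFA_def_cfa_offset
            off, pos = read_uleb128(buf, pos)
            out.append("DW_CFA_def_cfa_offset: %d" % off)
        elif op == 0x00:        # DW_CFA_nop (padding)
            out.append("DW_CFA_nop")
        else:
            raise ValueError("opcode 0x%02x not handled in this sketch" % op)
    return out


# Instruction bytes copied from the dump above (trailing nops trimmed to one).
fde = bytes.fromhex("4a 8f 02 86 06 66 0e 40 83 07 8e 03 8d 04 8c 05 00")

for line in decode_cfa(fde, 0x402110):
    print(line)
```

The starting address 0x402110 is the low end of the pc range in the FDE header line. A real unwinder would not print the instructions but would run them up to the target pc, keeping a table of where each register was saved relative to the CFA.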

There's still some way to go - taking a demo program and making it into a re-entrant subroutine (and I may have some concerns about performance after looking at the DWARF frames for a sizable executable, like CRiSP, but we will see what happens).

My initial target is /usr/bin/perl - since having a programming and deterministic environment to test and retest is useful.

Posted at 19:33:46 by Paul Fox | Permalink
  DWARF, and Sun Tuesday, 14 July 2009  
I have a person in Sun, actively fixing dtrace to help with their work, and this is proving useful - two or more sets of eyes to pick over some of my dirty work. Already he has fed back quite a few things for the 2.6.18 kernel, which are applicable to other kernels too. Hopefully more fixes will be forthcoming, whilst I fight the Elves and Dwarves.

DWARF - one of the most complex unix areas - but a beautiful piece of work, dating back to the early 1990s by AT&T/Sun.

DWARF is the way debug info is stored in executable ELF files. Not something one normally worries about, and the GNU binutils and gdb packages, along with GCC, know how to do this without blinking.

But, hiding in DWARF is the magic for handling stack unwinding. Because -fomit-frame-pointer became popular in the 1990s as GCC was enhanced to allow use of an extra register on the x86 architecture, a way was needed to walk the stack, when the %EBP register no longer helps find the return addresses.
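
To make the problem concrete, here is a minimal sketch in Python of the classic frame-pointer walk that omit-frame-pointer code breaks. The stack is simulated as a dictionary of address to value, and all addresses are invented for the example. It assumes the conventional layout where the word at the frame pointer holds the caller's saved frame pointer and the next word up holds the return address.

```python
def walk_frame_pointers(memory, fp, word=8, max_frames=64):
    """Follow the saved-frame-pointer chain, collecting return addresses.

    memory: dict of address -> value (a stand-in for reading user memory)
    fp:     current frame pointer (%rbp, or %ebp with word=4)
    """
    pcs = []
    while fp and len(pcs) < max_frames:
        ret = memory.get(fp + word)   # return address sits just above saved fp
        if ret is None:               # unreadable memory: give up
            break
        pcs.append(ret)
        fp = memory.get(fp, 0)        # step to the caller's frame
    return pcs


# Hypothetical three-deep call chain; the outermost frame has a saved fp of 0.
memory = {
    0x7ffc1000: 0x7ffc1040, 0x7ffc1008: 0x400b10,
    0x7ffc1040: 0x7ffc1080, 0x7ffc1048: 0x400a20,
    0x7ffc1080: 0,          0x7ffc1088: 0x400900,
}

print([hex(pc) for pc in walk_frame_pointers(memory, 0x7ffc1000)])
```

When the compiler uses the frame pointer register as a general-purpose register, the value in it is no longer a link in this chain, so the loop above reads garbage after the first step. That is the gap the unwind tables fill.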

If you look at an ELF executable, e.g.

$ objdump -h /usr/bin/perl
Idx Name          Size      VMA               LMA               File off  Algn
 15 .eh_frame_hdr 00000034  0000000000400eb4  0000000000400eb4  00000eb4  2**2
 16 .eh_frame     000000ac  0000000000400ee8  0000000000400ee8  00000ee8  2**3
you will see the above two sections. These are the sections for unwinding the stack, typically needed for C++ exceptions, but also for omit-frame-pointer (FPO) code. The DWARF spec, e.g. http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html will tell you more than you ever wanted to know about this.

The specification, like most specifications, is opaque in many areas, and I am busy writing a disassembler to more fully understand it. (Not useful to anyone else but me). I did find this:

$ readelf -wf /usr/bin/perl
will disassemble these sections, and I found this: http://www.hpl.hp.com/research/linux/libunwind/ and http://www.nongnu.org/libunwind/ which have code to help more fully understand the spec.

It's a shame these key libs aren't a standard part of the distributions, and that the kernel itself hasn't yet stumbled on to this, so I may as well try for them.

The problem being solved here is that ustack() is useless on apps compiled without frame pointers, and many distros do exactly that.

Anyway, .eh_frame_hdr is a mini table which maps a program counter to a block of instructions in .eh_frame which describes, amongst other things, what the stack looks like within a basic block of code. So, as the cpu pushes/pops things off the stack, it provides a map of where to find the return address of the function, and that is how gdb works nicely on x86_64 architectures (and many others).
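
The lookup half of that can be sketched in a few lines of Python. This assumes the .eh_frame_hdr search table has already been decoded into (function start address, FDE address) pairs sorted by start address, which is the order the table is stored in. The table contents and the name find_fde are made up for the example.

```python
import bisect

# Hypothetical, already-decoded search table: (function start PC, FDE address).
# Entries are sorted by start PC.
table = [
    (0x402000, 0x400f00),
    (0x402110, 0x400f28),
    (0x4021a0, 0x400f50),
]


def find_fde(table, pc):
    """Return the FDE address for the last entry starting at or before pc."""
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        return None          # pc is below the first function in the table
    return table[i][1]


print(hex(find_fde(table, 0x402150)))
```

The table only records where each function starts, so the caller still has to check the pc against the range stored in the FDE itself before trusting the match.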

Of course, those libraries are significantly complicated since they support many CPU architectures and scenarios, whereas I am only currently caring about x86 32 and 64 bit machines.

Posted at 21:18:44 by Paul Fox | Permalink
  Hiiiii! Hoooo! It's off to work we go. DWARF Sunday, 12 July 2009  
Whilst bumbling around in the ELF file format, and after a prompt from Nicolas at Sun, I found out how gdb does its stuff to find stack frames for omit-frame-pointer code.

When code is compiled with GCC, it creates a data structure used for exception handling. I thought this was only used for real C++ apps, but turns out this is there for non-C++ apps also, and is hiding in the ELF sections, loaded into memory:

$  objdump -h /usr/bin/perl
/usr/bin/perl:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
...
 15 .eh_frame_hdr 00000074  000000000040289c  000000000040289c  0000289c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame     0000020c  0000000000402910  0000000000402910  00002910  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
...

So, I need to find these sections in the address space of the running application to be able to walk the stack. Hopefully this gives us a workable solution for ustack().

I have some way to go, not only locating the memory regions for the current stack to find the ELF blocks, but also dealing with potential issues if user space pages are paged out whilst we are walking the proc's address space.

Probably at least a couple of weeks away from getting this working.

Posted at 19:53:53 by Paul Fox | Permalink
  Darnit...I must admit defeat and live my life... Saturday, 11 July 2009  
Mention omit-frame-pointer to people, and if they 'get it', they will seethe at code compiled this way.

That's me after about 2 weeks of trying to improve ustack(). On the Ubuntu releases I am playing with everything is either compiled without a frame pointer, or GCC has bastardised the stack like a drunk who has thrown up in the toilet.

I have tried various heuristics to get something to work, but I need to dig deeper. (gdb can do it, so I need to see how it's doing it).

Anyway, I got Centos 5 - 2.6.18 installed to fix some issues people had reported on the 2.6.18 kernel.

Someone in Sun has contacted me regarding getting dtrace to work on 2.6.18 for the Lustre project. I find it elating and funny that Sun have come to me for dtrace on Linux, since they want it to help debugging. There were three bugs which the person kindly reported on and he is in business, so that's a good mutual deed. (Thanks Nicolas)

I have some other contributions to fix issues with pid/tid, and I am looking at this to see what is wrong in dtrace and fix it. (Thanks Mauritz).

I need to do something in the ustack area - there's a few pent up fixes/cleanups in my internal code, but I will look at gdb for some hints and see if I can make some progress.

Posted at 16:26:02 by Paul Fox | Permalink
  Heat + Programming don't mix Friday, 03 July 2009  
We've been having a bit of a heat wave this week in the UK, and it's partially muddled my brain - it's beginning to cool off, so dtrace is looking more attractive.

I have spent the week playing with the symtab code so that ustack() can display the user stack traces. I found various issues with the hacks to get Linux process control to work without radically modifying the existing code - still more to do, but at least I can concentrate on the symtab.

I tripped over a bug in a couple of the ELF functions, where there is a Solaris v. Linux incompatibility in the error return values.

I keep finding code that tries to open /proc/pid/pstatus, which doesn't exist on Linux, and various issues in finding the DYNAMIC/PROCEDURE_LINKAGE_TABLE. At the moment, it's displaying the function names fine, but the module (library) names are garbage, probably because it's expecting to find the shlib name but I haven't stored it anywhere and it's pointing to freed memory.

I just ran valgrind on dtrace and that's helped track down a few uninitialised variables, but valgrind doesn't understand the dtrace ioctl()s, so any return from an ioctl() taints the output, unless/until I teach valgrind how to interpret these.

I spoke to Adam Leventhal about SDT probes to understand some more of the internals. An interesting point he mentioned to me was how SDT works in Solaris: as the kernel boots up, it scans itself for the SDT probes and readies the breakpoints to be inserted. So there is a mapping of probes, just like for a USDT application, which makes perfect sense.

I mentioned the trickiness of doing Linux SDT probes in the absence of source code changes to the kernel, and I know it can be done, but it may require case-by-case analysis to determine how best to patch the kernel to get the probe points. When I have finished/improved user space symbol and process handling then I can go back to that to play, or, I could just use dtrace to analyse more of the kernel itself.

More, when there's more to write about.

Posted at 21:07:50 by Paul Fox | Permalink