CRiSP + Motif (no dtrace) Saturday, 15 August 2009  
I am taking a short rest from dtrace - it's been doing my head in (ustack / dwarf; see previous postings).

Am on holiday from next weekend for a couple of weeks, and I want to do something more rewarding, so am switching back to CRiSP for a while to kick some tyres.

First up is finer control of file auditing - you can tell CRiSP to keep track of files you edit in an audit trail; useful for those times when you forget where you placed a file.

I've fixed some other customer reports.

I keep on staring at ribbon bars, and before I fully tackle this (there's some pre-alpha code in CRiSP to do this, but it's not ready for primetime), I am revisiting the Motif factor. CRiSP is built on Motif and, over the years, it has driven me insane. In recent weeks I have fixed some uninitialised memory refs in Motif which could cause core dumps, but I have always had a goal to remove it totally. Many of the widgets are native Xt widgets, and the few remaining just require a bit of debugging to get rid of Motif totally - thus making the code more supportable, and ready for other things (and freeing up a fair amount of memory).

CRiSP has some theming support, and getting rid of Motif will make it easier to complete that and finally let menu items have icons in them.

People have also asked for freetype font support (which exists in CRiSP in a semi-undocumented fashion). So, if the Motif removal goes well, freetype can be made available to most of the widgets.


Posted at 18:43:24 by Paul Fox | Permalink
  Painful dwarf Sunday, 09 August 2009  
Progress is slow, but positive. I've spent the last week or two trying to find the user stack and the PC. It's easy to get the user stack; the PC proved elusive, but I have a hack to find it.

Why?

Imagine the SYSCALL instruction fires. This is a special instruction in the AMD/x86 CPUs which moves from user mode to kernel mode, *without* pushing the return address on the stack. The Linux kernel, immediately after the transition (entry_64.S), puts the user space SP into the thread task area, but the PC is hiding. On entry to the kernel side of a syscall, it is in the RCX register, but by the time we hit a probe, e.g. sys_open(), we are miles away and the pt_regs structure isn't accurate. At the point of probe, we force a breakpoint trap (luckily, only our code executes at this point, so we don't have to consider nested interrupts and blowing the state areas in the thread stack).
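To make the "PC is hiding" point concrete, here is a toy model of what SYSCALL does to the registers in 64-bit mode. It is only an illustration, not kernel or dtrace code: the Cpu class, the do_syscall() name and the entry address are made up, and the segment loads and RFLAGS masking that the real instruction performs are left out.

```python
from dataclasses import dataclass

SYSCALL_LEN = 2  # SYSCALL is encoded as the two bytes 0F 05


@dataclass
class Cpu:
    rip: int                 # user-mode instruction pointer
    rsp: int                 # user-mode stack pointer
    rflags: int = 0x202
    rcx: int = 0
    r11: int = 0
    lstar: int = 0xFFFFFFFF81000000  # made-up kernel entry point (IA32_LSTAR)


def do_syscall(cpu: Cpu) -> Cpu:
    """Apply the register effects of SYSCALL (64-bit mode), simplified."""
    cpu.rcx = cpu.rip + SYSCALL_LEN   # return address goes to RCX, not the stack
    cpu.r11 = cpu.rflags              # saved flags go to R11
    cpu.rip = cpu.lstar               # jump to the kernel entry point
    # RSP is deliberately untouched: the kernel must switch stacks itself.
    return cpu
```

The only copy of the user return address is in RCX, so once the kernel has reused that register, the address has to be recovered from wherever the entry code saved it.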

What makes this tricky is getting everything to work at once - anything even slightly wrong just gives bogus results -- stack traces which are not accurate or totally missing.

Things are better now - I seem to get the first two stack frames, but the third one is elusive. I am either miscomputing the dwarf frame info or misapplying the result to find the next frame; it's frustrating, since by the third frame we have gone through the same looped code twice, so why the third is problematic is not clear.
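For readers unfamiliar with the loop in question, the following is a minimal sketch of a CFA-based stack walk. It is not the dtrace driver code. It assumes the common x86-64 rule set (CFA = SP + offset, return address stored at CFA - 8, caller's SP equal to the CFA). The cfa_offset_for_pc callback is a hypothetical stand-in for "find the FDE for this PC and run its instructions", and memory is a plain dict standing in for user memory.

```python
def unwind(pc, sp, memory, cfa_offset_for_pc, max_frames=16):
    """Walk the stack, returning the list of PCs found (innermost first)."""
    frames = []
    for _ in range(max_frames):
        frames.append(pc)
        offset = cfa_offset_for_pc(pc)   # hypothetical: result of the FDE lookup
        if offset is None:               # no frame info for this PC: stop
            break
        cfa = sp + offset                # canonical frame address of this frame
        ret = memory.get(cfa - 8)        # return address sits just below the CFA
        if ret is None:                  # unreadable memory: stop
            break
        pc, sp = ret, cfa                # caller's PC and SP for the next round
    return frames
```

Each iteration depends on the previous one being exactly right: a wrong offset for frame two still yields a plausible-looking return address, but leaves SP wrong for frame three.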

The code so far is fairly horrid, with lots of experiments in there, and no 32-bit version done yet. My biggest fear is that some of this is subtly dependent on kernel releases; I think it is not, which would be one weight off my chest.

(Kernel releases are subtly different in syscall/interrupt handling, and also in structure layout for the user/process/thread, but I don't think we care too much, yet.)


Posted at 00:16:25 by Paul Fox | Permalink
  slow dwarf Wednesday, 05 August 2009  
Been busy doing some CRiSP updates over the last few days, so backed off a little on dtrace, but trying to get back into the dwarf issues.

Alas, the current Windows CRiSP release has black arrows on the scrollbars... to be fixed this weekend. Nuts.

I am trying to get this to parse properly:

$ build/dwarf /lib/libpthread.so.0
....
CIE length=00000014
  Version:              01
  Augmentation:         "zRS"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address reg:    0x10
  Augmentation Length:   len=0x01 1b
R encoding 1b (kernel)

2c38 FDE len=7c cie=001c pc=e0ff..e109 tpc=ffffffffffffffff 0000: dwarf.c: unsupported DW entry 0xf 12

I am working through the various opcodes; I can parse them, but there is no guarantee the semantics are correct (that's the next phase).
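As an illustration of what parsing without semantics can look like, here is a small sketch of a call-frame instruction stepper. It is not taken from dwarf.c, and the step() helper is a made-up name. In the DWARF call frame opcode table, 0x0f is DW_CFA_def_cfa_expression, whose operand is a ULEB128 length followed by that many bytes of DWARF expression, so a parser can skip it without evaluating it. Whether the trailing 12 in the error message above is that length is a guess on my part.

```python
DW_CFA_nop = 0x00
DW_CFA_def_cfa_offset = 0x0E
DW_CFA_def_cfa_expression = 0x0F


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:          # high bit clear marks the last byte
            return result, pos


def step(buf, pos):
    """Decode one CFA instruction; return (description, next position)."""
    op = buf[pos]
    pos += 1
    if op == DW_CFA_nop:
        return "nop", pos
    if op == DW_CFA_def_cfa_offset:
        offset, pos = read_uleb128(buf, pos)
        return f"def_cfa_offset {offset}", pos
    if op == DW_CFA_def_cfa_expression:
        length, pos = read_uleb128(buf, pos)
        # Skip the expression block without evaluating it.
        return f"def_cfa_expression ({length} bytes)", pos + length
    raise ValueError(f"unsupported DW entry {op:#x}")
```

Only three opcodes are handled here; a real parser needs the full table, including the three primary opcodes packed into the top two bits of the byte.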

libpthread.so.0 is where the open64 syscall is located when I do my ustack() test against the perl interpreter.

In theory the parsing shouldn't matter, as in the kernel we skip over blocks of the dwarf instructions to find the matching block, but it helps me to relax a little and better understand this stuff so I can tackle why some SYSCALL instruction blocks aren't being handled properly.
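A rough sketch of "find the matching block", under stated assumptions: the CIE above reports an R encoding of 0x1b, which is DW_EH_PE_pcrel | DW_EH_PE_sdata4, so an FDE's start address is a signed 32-bit value relative to the address of the field itself. The function names, the dict layout and the addresses below are hypothetical, and the range check treats the end as exclusive; none of this is the actual kernel-side code.

```python
import struct


def decode_pcrel_sdata4(section, offset, section_vaddr):
    """Decode a DW_EH_PE_pcrel|DW_EH_PE_sdata4 pointer stored at `offset`."""
    (rel,) = struct.unpack_from("<i", section, offset)  # signed 32-bit, little-endian
    return section_vaddr + offset + rel                 # relative to the field's own address


def find_fde(fdes, pc):
    """Return the first FDE whose [pc_begin, pc_begin + pc_range) covers pc."""
    for fde in fdes:
        if fde["pc_begin"] <= pc < fde["pc_begin"] + fde["pc_range"]:
            return fde
    return None


# Made-up example data: two FDEs covering adjacent functions.
example_fdes = [
    {"pc_begin": 0x1000, "pc_range": 0x20},
    {"pc_begin": 0x1020, "pc_range": 0x10},
]
```

A linear scan is the simplest form; where a .eh_frame_hdr section is present, it provides a sorted table that allows a binary search instead.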

People are sending me bug reports on 2.6.30.* kernels (fixed an issue with 2.6.30.4, but now there's a 2.6.30.5 - I cannot keep up with these releases and the gratuitous kernel code changes on each release!). So, just trying to stay above water, but progress is slow.


Posted at 23:43:51 by Paul Fox | Permalink
  mail problems Tuesday, 04 August 2009  
For reasons I don't fully understand, some of my mail is not getting out. My mail macros and bits/pieces are breaking in some areas and I hadn't realised things were not getting out.

If you see no response from me, then this could be the issue - just remail me; if you see dup emails from me, it's me attempting to fix the issue.


Posted at 20:44:37 by Paul Fox | Permalink
  dtrace linux status - the dwarfs Saturday, 01 August 2009  
I've been slowly getting the DWARF stack dumper to work. It works for some system calls/probes but not for others. At issue appears to be accuracy in the dwarf.c code - looking at the gdb source for stack walking is interesting as it highlights a number of issues, including trampolines and exception stacks.

A particular issue I am having at present is the sys_open syscall. gdb can show a stack trace but my kernel code cannot find the appropriate dwarf frames mirroring where we came from. So I need to put in more effort to work through the use case scenarios.


Posted at 13:11:54 by Paul Fox | Permalink