dtrace update ... | Monday, 23 February 2015 |
I spent some time trying to get execve() syscall tracing to work - and am still working on that.
Along my journey, I noticed a few things. Firstly, dtrace4linux is too complicated - trying to support 32- and 64-bit kernels, along the entire path back to 2.6.18 or earlier, is painful. I cannot easily automate regression testing (not without a lot more hard-disk space, and it is not worthwhile whilst I am aware of obvious bugs to fix). I could simplify testing by picking a single distro release and just rebooting with different kernels - rather than keeping full ISO images of RedHat/CentOS/Ubuntu/Arch and so on.
I also noticed that the mechanism dtrace4linux uses to find addresses in the kernel is slightly overkill. It hooks into the kernel to find symbols which cannot be resolved at link time; the mechanism relies on a Perl script to locate the things it needs. I found a case where one of the items I need is not visible in user space at all - it exists solely in the kernel, as part of the syscall interrupt code (the per-cpu area). Despite what the latest kernels do, some older kernels *don't*, and catering for them is important. In one case I had to go searching the interrupt code to find this value, and ended up writing a C program to run in user space prior to the build. Really, it would have been better to generalise this, so that everything we need is defined in a table compiled into the code, rather than having the /dev/fbt code read it from the input stream. That would ensure that a build which compiles also works. Today, I sometimes debug issues with old kernels where a required symbol is missing and we end up dereferencing a null pointer (not a nice thing to do in the kernel).
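To illustrate the idea, here is a minimal sketch of what such a compiled-in table might look like. The names here (ksym_entry, dtrace_resolve_ksyms, the example symbols) are illustrative assumptions, not the actual dtrace4linux code, and it assumes kallsyms_lookup_name() is usable from the module:

    /* Illustrative sketch only - not the real dtrace4linux code.
     * Each entry names a kernel symbol the driver needs, plus a slot
     * to store its address. Resolving everything up front means a
     * missing symbol fails the module load cleanly, instead of
     * leaving a NULL pointer to be dereferenced later. */
    #include <linux/kernel.h>
    #include <linux/kallsyms.h>

    static void *ptr_sys_call_table;    /* example entries only */
    static void *ptr_per_cpu_offset;

    static struct ksym_entry {
            const char *name;       /* symbol to resolve          */
            void      **slot;       /* where to store its address */
            int        required;    /* refuse to load if missing  */
    } ksym_table[] = {
            { "sys_call_table",   &ptr_sys_call_table, 1 },
            { "__per_cpu_offset", &ptr_per_cpu_offset, 0 },
            { NULL, NULL, 0 }
    };

    /* Walk the table at load time, failing loudly on anything
     * marked required. */
    static int dtrace_resolve_ksyms(void)
    {
            struct ksym_entry *k;

            for (k = ksym_table; k->name; k++) {
                    *k->slot = (void *) kallsyms_lookup_name(k->name);
                    if (*k->slot == NULL && k->required) {
                            printk(KERN_ERR "dtrace: cannot resolve %s\n",
                                k->name);
                            return -1;
                    }
            }
            return 0;
    }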
One problem I had with the above was that gdb on the older distro releases cannot be used to read kernel memory, due to a kernel bug which precludes reading from /proc/kcore. Fortunately, I include a script in the release which emits a vmlinux.o, complete with symbol table, from the distribution vmlinuz file.
I haven't reverified the ARM port of dtrace, but that's something for a different rainy or snowy day.
new dtrace .. small update | Friday, 20 February 2015 |
Note that no new functionality is in here - the issues with libdwarf remain - I may try again to solve that issue, and "dtrace -p" is still a long way off from being functional.
Given that 3.20 is now the current kernel, I may need to see if that works, and pray that 3.17-3.20 didn't affect how dtrace works - or, if they did, that the work to make it compile is much less than the issues that 3.16 raised.
Why is gcc/gdb so bad? | Thursday, 19 February 2015 |
One of the powerful features of gcc was that "gcc -g" and "gcc -O" were not exclusive. And gdb came about as a free debugger, complementing gcc.
Over recent years, gdb has become closer to useless. It is a powerful, complex and featureful debugger, but I am fed up with single-stepping my code and watching the line of execution bounce around, because the compiler emits strange debug info which moves back and forth over lines of code and declarations.
Today, in debugging fcterm, my attempt to place a breakpoint on a line of code puts the breakpoint *miles* away from the place I am trying to intercept. This renders "gcc -g" close to useless, unless I turn off all optimisations and pray the compiler isn't inlining code.
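A trivial illustration of the effect (nothing to do with fcterm itself); with -O2 -g the compiler is free to fold and reorder these statements, whereas -Og (available since gcc 4.8) keeps the line mapping much closer to the source:

    /* Built with: gcc -O2 -g step.c
     * Stepping main() in gdb can visit these lines out of order,
     * because the optimiser folds 'a' away and schedules 'b' early.
     * gcc -Og -g gives a much saner single-stepping experience. */
    #include <stdio.h>

    int main(void)
    {
            int a = 1;              /* may be folded away entirely */
            int b = a * 2;          /* may be computed before 'a' exists */
            printf("%d\n", b);      /* often the only line gdb stops on */
            return 0;
    }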
Shame on gcc. Maybe I should switch to clang/llvm.
address: 0000f00000000000 | Saturday, 14 February 2015 |
Strange. I continue trying to find out why dtrace is not passing my tests, and have narrowed it down to a strange exception. If the user script accesses an invalid address, we either get a page fault or a GPF; DTrace handles this and stubs out the offending memory access. Here's a script:
build/dtrace -n '
	BEGIN {
		cnt = 0;
		tstart = timestamp;
	}
	syscall::: {
		this->pid = pid;
		this->ppid = ppid;
		this->execname = execname;
		this->arg0 = stringof(arg0);
		this->arg1 = stringof(arg1);
		this->arg2 = stringof(arg2);
		cnt++;
	}
	tick-1s {
		printf("count so far: %d", cnt);
	}
	tick-500s {
		exit(0);
	}
	'
This script examines all syscalls and tries to access the string for arg0/1/2 - and for most syscalls, there isn't one, so we end up dereferencing a bad pointer. But only some pointers cause me pain; most are handled properly. The address in the title is one such address. I *think* what we have is the difference between a page fault and a GPF. Despite a lot of hacking on the code, I cannot easily debug this, since once the exception happens the kernel doesn't recover. I have modified the script above to only do syscall::chdir:, which means I can test manually via a shell, doing a "cd" command. On my 3-cpu VM, I lose one of the CPUs and the machine behaves erratically. Now I need to figure out if we are getting a GPF or some other exception.
I tried memory addresses 0x00..00f, 0x00..0f0, 0x00..f00, ... in order to find this. I suspect there is no page table mapping here, or it's special in some other way. I may need to dig into the kernel GDT or page tables to see what is causing this.
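One architectural detail supports the GPF theory: on x86-64, an address is only canonical if bits 48-63 are a sign-extension of bit 47. The address in the title has bit 47 set but bits 48-63 clear, so it is non-canonical - and touching a non-canonical address raises #GP, never a page fault. A small user-space check (illustrative only):

    /* Check whether an x86-64 virtual address is canonical, i.e.
     * bits 48..63 are a sign-extension of bit 47. Non-canonical
     * accesses raise #GP rather than #PF. */
    #include <stdio.h>
    #include <stdint.h>

    static int is_canonical(uint64_t addr)
    {
            /* Sign-extend from bit 47 and compare with the original
             * (relies on gcc's arithmetic right shift of signed ints). */
            int64_t s = (int64_t)(addr << 16) >> 16;
            return (uint64_t)s == addr;
    }

    int main(void)
    {
            uint64_t a = 0x0000f00000000000ULL;
            printf("%#llx canonical=%d\n",
                (unsigned long long)a, is_canonical(a));    /* prints 0 */
            return 0;
    }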
UPDATE: 20150215
After a bunch of digging, I found that the GPF interrupt handler had been commented out. There was a bit more to it than that, because even when I re-enabled it, I was getting some other spurious issues. All in all, various bits of hack code and debugging had got in the way of a clear picture.
I have been updating the sources to merge back in the fixes for the 3.16 kernel, but have a regression in syscall tracing which can cause spurious panics. I need to fix that before I make the next release.
no dtrace updates | Monday, 09 February 2015 |
The issues I hit were all very low level - the cross-cpu calls, the worker interrupt thread, and the current issue, relating to invalid pointers accessed via a D script. I have a "hard" test which won't pass without crashing the kernel - crashing it really hard, requiring a VM reboot. This is nearly impossible to debug. The first thing I had to do was increase the console-mode terminal size: when the panic occurs, the system is totally unresponsive, and all I have is the console output to look at, with no ability to scroll. Having a bigger console helps - but it seems that the GPF or page-fault interrupt, when occurring inside the kernel, does not work the same way as it has on all prior Linux kernels. Looking closely at the interrupt routines shows some changes in the way this works - enough to potentially cause a panicking interrupt to take out the whole kernel, which makes life tough to debug.
If I am lucky, the area of concern relates to interrupts taken from kernel space. If I am unlucky, it is something else entirely. (I am hypothesising that the kernel stacks may be too small.)
I have been holding off putting out any updates, despite some pull requests from people, because I am not happy that the driver is in a consistent enough state to release. When I have finished this area of debugging, I can cross-check the other/older kernels and see if I have broken anything.
It is very painful dealing with hard-crashing kernels - almost nothing helps in terms of debugging, so I am having to try various tricks to isolate the instability. These instabilities, in theory, exist on other Linux releases too - but I will only know once I have gotten to the bottom of the issue.