another dtrace delay Sunday, 01 March 2015  
Everything was looking promising to release a new dtrace sometime last week. It was working on the 3.16 kernel, 3.8, 3.4 and then onto RH5.6 (2.6.18). I ran into a lot of issues on 2.6.18 - not surprising, given the code mutations. Much of the last 2 weeks was on the execve() system call. It would panic the kernel. Despite a lot of experiments and reading of the assembler and kernel code, I kept doing silly things. It really doesnt help that the 2.6.18 kernel will hard panic on a stray GPF - made it very difficult to figure out what was going on.

Eventually I got every line of assembler and issues with registers in C code to work.

Along the way I had an issue with the "old_rsp" symbol. This is not exposed in /proc/kallsyms, and not even in the /boot/ code. I had to write a tool to extract this from inside the kernel. But this ran into complications because /proc/kcore is broken on the RH/Centos kernels. I had to create a new device driver, which has to be loaded into the kernel prior to the build of dtrace ("/proc/dtrace_kmem"). Its a very simple driver only designed to handle the scenario of building dtrace.

Having got this work, then the next roadblock was the rt_sigreturn() syscall which paniced the kernel. Careful investigation showed a missing line of assembler (for the 2.6.18 kernel). Now that works.

Now everything is looking good on RH5/Centos5 but before going on the trawl of later kernels and proving I didnt break anything, I have an issue with x_call.c. Either I use the native smp_call_function() interface - which works great, until we panic the kernel, or I use my implementation, which doesnt seem to be broadcasting to the cpus - this means certain probes get "lost".

So, hopefully this week or next weekend - depending on the xcall issues.

Posted at 21:06:30 by fox | Permalink