Dtrace and printk() | Sunday, 30 October 2011 |
Sometimes a trace like:
$ dtrace -n fbt::[a-e]*:
would work, and sometimes not. Lots of variations and thought processes were applied, and nothing worked.
Then, after a little rest, I went back to basics. Let's assume one function blocks us, so we try a binary search to see which fbt function it is.
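The binary search can be sketched roughly as below. This is a minimal illustration in Python, not the procedure actually used: `hangs` is a hypothetical callback standing in for "enable these fbt functions and see if the box locks up" (in real life, one reboot per step), the probe names are made up, and the sketch assumes exactly one function is to blame.

```python
from typing import Callable, Sequence


def find_culprit(names: Sequence[str],
                 hangs: Callable[[Sequence[str]], bool]) -> str:
    """Return the single name in `names` that makes hangs() return True.

    Assumes exactly one culprit, and that hangs(subset) is True
    whenever the culprit is part of `subset`.
    """
    lo, hi = 0, len(names)           # the culprit lies in names[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs(names[lo:mid]):     # culprit is in the first half
            hi = mid
        else:                        # otherwise it must be in the second half
            lo = mid
    return names[lo]


# Hypothetical stand-in for probing the real kernel.
probes = ["a_func", "b_func", "c_func", "printk", "e_func"]
culprit = find_culprit(probes, lambda subset: "printk" in subset)
```

With N candidate functions this takes about log2(N) trials, which matters when every failed trial costs a reboot.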
Turns out that the new kernel has modified printk() - the kernel printing function - in some way. (I think it's to do with recursive prints, but I have not confirmed this yet).
What appears to be happening is if printk() is called at the wrong time, the kernel will lock up, waiting on a semaphore, to detect if the console is free for printing.
printk() is not normally called much during dtrace, but there seem to be enough places. If I map printk() to a do-nothing function, then sanity appears to be restored and I can run against fbt:::.
So I need to either avoid printk() in dtrace, or be judicious about where it's used. (Dtrace already has an internal dtrace_printf function to write to an internal circular buffer, but that's not visible if the kernel crashes; I may need to fix that).
So, if you are having trouble on Ubuntu 11.04 or 11.10, or other equivalent, using Linux 3.0.x, then stay tuned.
Thanks Nigel Smith, for pushing me to go hunt this down.
Recent kernels, and kernel debugging | Saturday, 29 October 2011 |
During the 2.6.2x and 2.6.3x series, I noticed a marked improvement in usability for writing drivers in the kernel. Even bad code would be caught, producing a panic and useful diagnostics to track down the issues.
The Ubuntu 11.10 release is turning into a horror show. When the kernel panics, quite often the panic is incomplete before the machine totally freezes.
I am wondering if the latest kernels have a bug in them, or, at least, a regression in the face of a buggy driver.
On the other hand, it could be my fault - after all, dtrace is unproven on the latest kernels.
Here's the deal (see last post): after maybe 1 million probes have fired, on a 4-core box, the kernel will hang or panic. Interestingly, the panics are nearly always in the same place (the e1000 ethernet driver). Very strange that it should nearly always panic at the same point. It's telling me something, but I don't know what it is.
I've been trying to put some extra debug into dtrace but I haven't worked out what I want - since the error is so rare, and I need a way to get the data out of the kernel into my eyeballs.
Now, I wanted to just mention the state of Linux debugging. I do think it's not as good as it could be. Firstly, under no circumstances do I want to sit for an hour or more compiling a kernel with debug symbols and eating disk like there is no tomorrow. Debug symbols in the kernel are a joke - the bloat in the image size is not nice. Even without debug symbols, I am bored of sitting for an hour or more waiting for the Video/USB and other drivers to be compiled - none of which I have. I am even more bored of "make menuconfig" and trying to turn them off - it takes longer to turn them off than it does to compile them. "make menuconfig" is too unstructured.
So, sure, I can use gdb over the network, but only if the kernel has the gdb hooks compiled in. This should really just be a loadable driver. I shouldn't need full debug - what's in /proc/kallsyms is more than sufficient for remote debugging.
It's a little frustrating. Years ago, I built debugging tools for a Z80 networking kernel and an 80186 terminal server that gave better debugging support than today's kernels have.
I don't mind compiling a kernel with debug - if that was it. But with so many VMs and versions of Linux, it's just not nice.
I think there is a need for a loadable debug module. I may start to write one - I don't need much, just the ability to print the stack, dump memory and a few other things. By honoring the GDB protocol, it could be done for remote debugging, without recompiling the kernel.
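To give a feel for how little "honoring the GDB protocol" needs at the framing level: a GDB remote serial protocol packet is `$`, the payload, `#`, and a two-hex-digit checksum (the sum of the payload bytes modulo 256). The sketch below is Python purely for illustration; a real debug module would do this in C inside the kernel. The function names are my own, and the sketch ignores escaping of `$`, `#` and `}` inside payloads as well as the `+`/`-` acknowledgements.

```python
def gdb_packet(payload: str) -> bytes:
    """Frame a payload as a GDB remote serial protocol packet."""
    data = payload.encode("ascii")
    checksum = sum(data) % 256                 # modulo-256 sum of payload bytes
    return b"$" + data + b"#" + f"{checksum:02x}".encode("ascii")


def parse_gdb_packet(packet: bytes) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    data = packet[1:-3]
    received = int(packet[-2:], 16)            # checksum as sent by the peer
    if sum(data) % 256 != received:
        raise ValueError("bad checksum")
    return data.decode("ascii")


read_registers = gdb_packet("g")               # 'g' asks the target for its registers
halt_reason = gdb_packet("?")                  # '?' asks why the target stopped
```

A stub that understands only a handful of such requests (stop reason, read registers, read memory) would already cover "print the stack, dump memory".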
Anyway, back to the problem at hand. This requires a lot of try-this/try-that in order to work out where the problem is. (I noticed today that I was using a "hack" interrupt vector, which would nicely explain the problem I am seeing, but I don't believe it's this problem).
Dtrace .. does it work. Yes. No. Yes. No. What?! | Saturday, 29 October 2011 |
$ dtrace -n vminfo:::'{printf("%s", execname);}'
would panic or hang the kernel - not straightaway, but within say 5mins, especially if doing a:
$ while true; do date; done
in another window.
I've been staring at the dtrace code and trying various things to see what causes it. It's an annoying panic because I lose control of the kernel and have no way of figuring out what happened immediately leading up to the issue. The stack trace on the panic doesn't help (I am seeing the same panic in the e1000 driver cleanup code, but no references to dtrace causing this).
I suspect dtrace is taking an interrupt, maybe not restoring a register and sometimes, that register happens to be important.
Especially strange, as, running on Ubuntu 11.04 (2.6.38 kernel), it works fine. I can really torture the system and it stays up.
I need to dive more into the entry_64.S code to examine what changes happened around the way an interrupt is handled. If I am lucky I may be able to localise this to a register issue (%GS is a high probability).
Linux is really missing a kernel debugger. There's kgdb and remote debugging available, but this is really painful when you suddenly have to compile a new kernel, waste more than 1GB of disk because of the symbol table, and then try and get it all "working".
What is needed is a better way to take control on a panic, and poke around, similar to kadb for older Sun machines.
I might have to start writing a crude debugger to help with these annoying "you died but I am not going to tell you why" issues.
Dtrace release 20111026 | Wednesday, 26 October 2011 |
This release is an interim release. It fixes the issue I wrote about earlier, but likely will not compile on kernels earlier than 2.6.39.
It is the first working example of the vminfo provider. Here's a small sample of the vminfo probes:
48 vminfo pgmajfault
49 vminfo unevictable_mlockfreed
50 vminfo pgfree
51 vminfo unevictable_mlockfreed
52 vminfo compactsuccess
53 vminfo compactfail
54 vminfo pgdeactivate
55 vminfo pgrotated
56 vminfo pgactivate
57 vminfo kswapd_low_wmark_hit_quickly
58 vminfo kswapd_high_wmark_hit_quickly
These correspond to the entries in /proc/vmstat and you can now intercept calls to them, e.g.
$ dtrace -n pgmajfault ...
I will attempt to update the release to fix the broken earlier kernels shortly.
(Getting the /proc/vmstat stuff involves examining an enum and isn't amenable to #ifdef coding practice, so I may need to autodetect which ones are available for the current kernel).
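One way the autodetection could look from user space is sketched below. This is only an illustration under my own assumptions, not how the driver does it: it treats each line of /proc/vmstat as "name value" and keeps the wanted probe names that the running kernel actually reports. The function names and the sample text are made up for the example.

```python
from typing import Dict, List


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:                   # skip anything unexpected
            counters[fields[0]] = int(fields[1])
    return counters


def detect_available(wanted: List[str], text: str) -> List[str]:
    """Return the wanted counter names that this kernel actually exposes."""
    present = parse_vmstat(text)
    return [name for name in wanted if name in present]


# Made-up sample; on a real box use open("/proc/vmstat").read() instead.
sample = "pgfree 100\npgmajfault 7\npgactivate 3\n"
available = detect_available(["pgmajfault", "compactsuccess", "pgfree"], sample)
```

Here `compactsuccess` is dropped because the sample kernel does not report it, which is the case #ifdef struggles with.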
Linux! How Dare you?! | Wednesday, 26 October 2011 |
Then I hit a strange problem. One of those "Duh!" moments.
So, to check out the vminfo provider, I need to run dtrace to intercept the probes. But the kernel kept panicking.
Here's the panic:
[ 458.807224] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Didn't make sense. It worked a moment ago. Or so I thought.
So next, let's downgrade our probe. We *know* syscalls work because I had tried them on getting the Ubuntu 11.10 release and validating on the 3.0 kernel. Let's try something different:
$ dtrace -n fbt::sys_chdir:
(Given 250,000 probes to choose from, I want one which I *know* exists without looking it up, and one which is executed rarely, preferably on demand). Yup. This dies too.
Ok, so revert the code to the last release. Repeat. Panic.
Strange.
Let's try a random.other.kernel (Ubuntu 10.04). No problem.
What?!
Ok, the Linux kernel guys are smart. Very smart. What did they do?
They changed the way a kernel is mapped into memory. Previously, all kernel pages were pretty much executable (especially .data/.bss). This was unnecessary and they appear to have fixed this. Dtrace does something slightly suboptimal - a char[] array is declared for executing the breakpoint trampolines, but this is a BSS symbol. And the page the structure resides in is no longer executable.
*That* explains why it suddenly broke on the latest kernels. The fix is easy: just mark the page as executable. I would like to use the proper API or GCC __attribute__ specifier, but the API calls are problematic - some are GPL-only exports; others don't expose the pgprot permissions etc. The "let's modify the page table directly" approach seems to work.
So, I'll release a new dtrace which fixes this problem (and hopefully a working vminfo provider too).
dtrace update - attempting the vminfo probe | Tuesday, 25 October 2011 |
There are some issues, such as slightly different code for 32 vs 64 bit kernels, but the approach seems sound.
In effect, this is a google map of the kernel binary - searching the kernel for the instructions which represent increments to the vmstat data counters, and enabling them for probes.
Will report back in a while if this looks good enough to use. Many of the other providers are similar in style, but there are some issues, such as not all counters in the kernel follow the vmstat pattern. There is also the issue of provider function names matching Solaris (some will, some will be extras). And also the callback arguments need to match Solaris spec or something useful (more troublesome).
iOS5 .. some thoughts | Friday, 21 October 2011 |
Wins: iPod Touch - they finally fixed the bug where switching away from video and back took 5+ seconds to figure out what was going on. Thanks. I appreciate that. Only took 2 years to fix.
Loss: Watching videos on the iPad is an exercise in frustration. I don't get why Movies and TV Series are different. With Movies, I can tell what the movie is (my ripper adds a front screen and title). But for TV Series, all I see is an array of images for all TV series. I cannot tell which is which. Please! Why can't a title be added? I have to guess which one I want and drill down to see what it really is.
Worse, they removed support for having playlists of TV Series sitting in the ipod playlists section. So, there is no text to navigate what to see.
Why are the iPod and iPad different?
Next: iTunes. I like iTunes - it's not bad. It's not good either. The most recent release disallows selecting a group of music tracks and editing the displayed image. Previously I could go to, for example, Amazon, and drag/drop the cover art into the Get-Info popup. This doesn't work when multiple tracks are selected.
It's always a gamble with iOS and iTunes whether the next release is retrograde or forward thinking.
Not being able to have a "guest" ipod/ipad so you can selectively copy, e.g. home videos to a relatives device, is another bad point.
On the plus side, if Apple had gotten this right, we wouldn't talk about them so much.
Oh, and the 64GB phone! At last. A phone with 64GB. Android cannot compete. I wish Google and the device manufacturers could find a place for people who want to load lots of video onto a device. Shame that Apple's phone price is so high.
Best to wait until next year to see if we get SDXC or 128GB device support in a phone.
CRiSP and FCTerm | Friday, 21 October 2011 |
What is CRiSP? It's an editor. It's been my hobby horse for so long now that it's old enough to be married, have children, and you can find it hanging out in bars, wondering why it didn't listen to its Daddy and get a decent job.
It's been relatively stagnant for a while - I had run out of things to implement and support.
CRiSP was a multiplatform editor before that was de rigueur. It runs on Windows/Mac/Linux, and is pure C code. It's small and tight in terms of code (by today's measuring sticks).
Fcterm is a color xterm-style terminal emulator. Started many many moons ago when Sun brought out SunOS 3.x, and color monitors were just starting to appear. In those days, "shelltool" and "cmdtool" and the whole XView desktop was just too "black and white". Color xterms didn't really exist (AIX had a nice one). So, fcterm was born. It was small and fast.
As CPUs have gotten faster and faster, it's still small and fast, but over recent years it has had new features added (infinite scroll, graphical drawing ability). (See prior post on "proc" using this to effect -- http://crtags.blogspot.com/2011/08/some-illustrations-of-proc.html).
There's a limit to how much you can add to an xterm. Or is there?
See below for a screen shot of the latest fcterm. This is character mode CRiSP running inside the window, providing graphical features (also available in the graphical version of CRiSP). I spend most of my time in an xterm - the Ctrl-Z/fg aspect of switching from editing to "doing" is convenient, and it's worth making the terminal emulator comfortable.
[fcterm has a butt-ugly popup window for setting attributes; on my todo list to revamp that one day].
Look carefully at the outlining margin and the gridlines for tab indenting.
80pixels | Thursday, 20 October 2011 |
Although Dell have a sophisticated web site, it is sub-par when it comes to reporting a fault. Following the various strands on the web never gets you to an online chat to figure out how to actually report the fault. A paid phone number will be expensive, and I will have to do that tomorrow.
The "Let's download a plugin and sod-you if you are not on Windows or are using Firefox" is very offensive. Customer care is really an afterthought.
At least Dell have a twitter feed (@DellCares) which is potentially good but probably a time waster - they want details, and the exchange sprawls out over hours of waiting for replies and using terse abbreviations (which is understandable given twitter's 140 char length).
It's really very comical - that a company the size of DELL with a sophisticated web site (which I hate and like at the same time) has to "breathe through a straw" to communicate with customers and not use the interactive IM mechanism they have on their site.
My banding on the screen went from 1-pixel high to about 60-80 pixels. Oh well. Why does this happen 6 months into the laptop and not after 2-3 years (which I so desperately wanted my old one to do, so I could justify this one!).
DTrace and GIT | Tuesday, 18 October 2011 |
People ask, can't I do "git"? Well...simple answer is "no". There is an unmaintained(?) dtrace github page, but it was set up by an enthusiastic supporter, not by me.
I can understand people either wanting to track changes or make contributions.
So, I am opening up the conversation to people: Just how badly can I damage GIT ?!
I have recently started using GIT and automating the commits at home, but I am lacking an understanding of git and how to cope with complexity.
Here's the deal. In theory I have two main machines - a server, rarely switched on, but the "master", and my laptop, where I do most of my work -- manually syncing changes (not just dtrace, but for CRiSP and other things) across the machines.
I set up git on my master and laptop ($HOME/git) and use symlinks in my source code dirs so that the git repository is in its own tree.
Previously, I would just create periodic tarballs as snapshots - which are mostly fine, but not necessarily synchronised to the sync points.
I rsync my laptop/master git repositories - probably a bad thing. Is it?
So, if an external facing git repository is available, what does it achieve?
Q1: I can sync to the external repository from my internal, and stop doing tarballs? (Or keep doing both).
Q2: Who can touch the git repo? Presumably whoever I permission, or, is it a free-for-all?
Q3: Assuming it's a trusted circle of people, then how do I sync from the repo back to my local git repo?
I really want to review what people do and likely not accept some contributions or recode them to fit in with my "style".
I don't want to be a Linus/Git-meister (but will if need be - if it helps the greater good).
So, educate me or be gentle with me.
(I am busy adding some new features to CRiSP and fcterm to show outline grids whilst editing, and when I finish this, I may go back to Dtrace and start to remember "What was I planning to do next").
Dtrace on Ubuntu 11.10 | Friday, 14 October 2011 |
So, that's a relief.
Dtrace fixed for Ubuntu 11.04 | Friday, 14 October 2011 |
At some point in the recent past, /proc/kallsyms was layered in security. Looking at the file as a non-root user means we don't have access to the symbol table (all values are zero). We fell over in a heap since we couldn't find the right places to patch in the kernel.
My silliness really - as either I should run "make load" (or tools/load.pl) as root, or be more careful when symbol lookups fail. The script and driver don't handle the null pointers too well.
Simple fix is to shroud the opening of /proc/kallsyms by a call to sudo.
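The "be more careful when symbol lookups fail" part might look something like the sketch below. It is Python for illustration only and is not the code in tools/load.pl; the function names and sample data are made up. It assumes each /proc/kallsyms line is "address type name [module]" and treats an all-zero table as a sign that the addresses are being hidden from a non-root reader.

```python
from typing import Dict


def parse_kallsyms(text: str) -> Dict[str, int]:
    """Parse /proc/kallsyms style text into {symbol name: address}."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue                           # ignore malformed lines
        # If a name appears more than once, the last entry wins.
        symbols[fields[2]] = int(fields[0], 16)
    return symbols


def addresses_hidden(symbols: Dict[str, int]) -> bool:
    """True if every address is zero, as seen when reading as non-root."""
    return bool(symbols) and all(addr == 0 for addr in symbols.values())


def lookup(symbols: Dict[str, int], name: str) -> int:
    """Return the address of `name`, refusing to hand back a null pointer."""
    addr = symbols.get(name, 0)
    if addr == 0:
        raise LookupError(f"no usable address for {name}; are you root?")
    return addr


# Made-up samples of what root and non-root readers might see.
as_root = "ffffffff81000000 T _text\nffffffff8103f2a0 T sys_chdir\n"
as_user = "0000000000000000 T _text\n0000000000000000 T sys_chdir\n"
```

Failing loudly at lookup time is the point: a zero address handed to the driver is what leads to the heap-shaped fall described above.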
Put up a new release to fix this.
Dtrace updates.. | Thursday, 13 October 2011 |
The recent news about Oracle doing Dtrace has generated a bit more interest in Dtrace, along with some support issues. I put out a couple of minor fixes for later kernels.
I just tried dtrace on my Ubuntu 11.04 release (the day that 11.10 has come out), and it panicked my kernel. Strange, because it did work a while ago, although I haven't done heavy bare metal usage (I do get bored watching Linux/KDE reboot :-) ).
So, am downloading the 11.04 and 11.10 ISOs to give them the VM treatment and see what gives.
DTrace update...sort of. | Friday, 07 October 2011 |
It's a few days after Oracle OpenWorld and the Dtrace announcement and people may bump into this blog because of Adam Leventhal's post.
So, just a few words on the Dtrace/Linux port which has been available for 2+ years now.
From what I understand Oracle are planning to port Dtrace to Linux, despite this release being available. That is a *good* thing, because it means a 4th (to my knowledge) port of Dtrace. First was FreeBSD, then there was MacOSX, then there was this Linux/Dtrace.
I have learnt a lot from reviewing and understanding the differing implementations. My work on Dtrace is "not-bad" if I am going to self-rate. It is mired in details to do with multiple kernel support and lack of source code modifications to a kernel: Dtrace/Linux is simply a loadable kernel module.
Oracle will pick up the latest code of Dtrace from the Solaris area, and will have some fiddliness to insert into Linux. They pay their employees to do this - and that's great. They can change the kernel source code, and provide a value added kernel (just like Google does with Android and all the other hardware/software vendors who leverage Linux).
This will presumably give them a competitive advantage: for those customers who want Dtrace, then Oracle potentially looks attractive. Many people in the Linux community will write-off Oracle as "not team players". That happened before, with Sun. This leads to healthy, sometimes silly but entertaining debates on the interweb.
That Oracle is not using this port of Dtrace as the basis of their work does not imply the death of this project. More likely, more people will stumble upon it, leading to more support questions or requests to finish or fix issues.
Oracle could add new providers to the kernel - and this release of Dtrace will not be able to match these, without patching kernel source. I don't know how this will evolve. Maybe Oracle will show us what to do; maybe there will be sufficient impetus to get dtrace into the master Linux kernel, but I doubt that will happen.
The GPL vs CDDL debate still rages. I've written numerous times on my opinion of the debate (namely, I don't have an opinion!).
So continue to try/play with Dtrace.
Dtrace on linux and Oracle? | Tuesday, 04 October 2011 |
Whilst Oracle are not seen as the great open-source giver-aways, this can only be good.
I have no inside knowledge on DTrace for Oracle, and twitter (http://twitter.com/#!/search/%23dtrace) has references to people rejoicing or denouncing it.
Is dtrace battle hardened for production use? No. One person isn't going to prove that DTrace works everywhere for everyone on every kernel version. I have tried; one gets bogged down in details, especially for legacy releases and the myriads of distros that have inconsistent packages and package names.
It works (mostly) for me; I know it crashes on a 16-core box occasionally. (Alas, I don't have a 16-core box. Maybe it's the 48GB of RAM which is the issue rather than the number of cores, which affects the page table layout of that system).
As anyone who manages software projects knows, a software project manager probably does little "coding" .. the closer a project gets to completion, the harder it is, and the more it matters, to reach 100% perfection. Ask Linus. I don't know how much real coding vs patching/merging/overseeing he does.
Even the big-guns out there, like RedHat and Oracle - much of the time and expense is tracking down bugs. There are a few people who add value. This is what software engineering is about. Finding your place, and managing those around you to optimise delivery.
I don't know what Oracle is doing; even if it's a closed-wall, there are benefits. When Apple introduced DTrace, it did a great job of allowing them to fix and optimise and understand their own systems. Sure, it wasn't perfect in the early days.
So, let's see if Oracle can influence.
From what I read, Adam Leventhal has been experimenting with adding kernel probes to the Linux source. Even if this is the only thing he has done, then it's a great news banner. I'm loath to walk that plank, knowing that maintaining such deltas is difficult when you are not a part of a key release. Maybe if Ubuntu or RedHat or someone would offer to allow such merges in, it would be fun to add them.
Solaris has hundreds/thousands of probe points, added over the last 5+ years. Each would have required consideration about what to measure, whether the correct probes were in the right places, and supporting/testing them. Probe-dropping is laborious and nobody will thank you for those probes. A lot of people will benefit when "it just works".
So, let's see what happens.
And why have I been quiet recently? Well, various other mini projects needed to be addressed. "proc" is one of them; I am not happy with it as yet - sometimes the results seem to be suspicious, but it does look good.
But my recent project is to update CRiSP a little. Now it is supporting grids/gridlines, so you can see the true structure of a file.
Keep watching.