dtrace pgfault handling | Tuesday, 29 March 2011 |
Here's the script:
$ dtrace -n syscall::open*:'{printf("%s", stringof(arg0));}'
It doesn't do anything useful - it intercepts all the open system calls and prints the name of the file to be opened. Because the last part of the probe description is left empty (and so acts as a wildcard), we match both the "entry" and "return" probes of the function.
On return from a function, arg0 and friends are mostly irrelevant, random and pointless.
So - by doing this, stringof() is being called on a bogus pointer, which should lead to a GPF interrupt. Recovery from this works well on the later kernels.
But on RedHat AS4, it panicked the kernel.
After a lot of investigation, it transpires that on AS4 we are taking a page fault, not a GPF. And my page fault handler was not handling the fact that on a page fault, the CPU pushes an extra word (the error code) on to the stack.
So dtrace is/was dangerously unstable when given rogue D scripts, like this one.
Very difficult to debug - because I would keep panicking the kernel as I tried all sorts of experiments to locate the area of the problem (the interrupt code and the C callback code). Having narrowed most of the problem down to the interrupt code, it took quite a few days to work out what was wrong (I was ignoring the extra word pushed when the fault is taken). But this was good - I had often stared at the Linux interrupt handlers to understand the very subtle relationship between how the traps are handled and the "struct pt_regs" layout. I was having problems with the pt_regs pointer being "garbage" in the various dtrace pieces of code, and it was because, even when I avoided panicking the kernel, pt_regs was out by one word.
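To make the "out by one word" effect concrete, here is a toy model in Python (not kernel code). It assumes a simplified fault frame of five words (ip, cs, flags, sp, ss) with an optional error code on top; the field names, register values and helper functions are all made up for illustration.

```python
# Toy model of the words left on the stack when a fault is taken.
# Index 0 is the top of the stack. This is a simplification, not the
# real struct pt_regs layout.
FRAME_FIELDS = ["ip", "cs", "flags", "sp", "ss"]


def build_stack(regs, error_code=None):
    """Return the stack words; an error code, if any, sits on top."""
    words = [regs[name] for name in FRAME_FIELDS]
    if error_code is not None:
        words.insert(0, error_code)
    return words


def decode_frame(stack, has_error_code):
    """Decode the frame, skipping the error code only if told to."""
    start = 1 if has_error_code else 0
    return dict(zip(FRAME_FIELDS, stack[start:start + len(FRAME_FIELDS)]))


# Hypothetical register values at the time of the fault.
regs = {"ip": 0xFFFFFFFF81001234, "cs": 0x10, "flags": 0x246,
        "sp": 0xFFFF880000001000, "ss": 0x18}
stack = build_stack(regs, error_code=0x2)

wrong = decode_frame(stack, has_error_code=False)  # handler ignores the extra word
right = decode_frame(stack, has_error_code=True)   # handler accounts for it

print("wrong ip = %#x" % wrong["ip"])
print("right ip = %#x" % right["ip"])
```

With the extra word ignored, every field is read one slot too early: the "ip" is really the error code and the "cs" is really the instruction pointer, which is the kind of garbage a shifted pt_regs pointer would produce.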
Having exercised (exorcised) the code very hard, I feel much more confident that a user cannot crash the kernel - just as Solaris had led us to believe.
[I note that in the Solaris kernel, special code is in place in the interrupt trap code (assembly) to determine if the CPU_DTRACE_NOFAULT flag is set. This flag is set within the dtrace code to tell the gpf/pgfault handler not to take the trap but to skip over the offending instruction (which is most likely a MOV instruction).]
So, now we have better handling of gpf + pgfault (although I still worry whether, during the handling of a GPF, we can take a pgfault). Not sure this matters, because if it's *our* pgfault, then we only skip over the offending instruction; we don't try to read other parts of memory.
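The NOFAULT idea can be sketched as follows. This is a Python model of the decision only, not the Solaris or Linux handler: the flag bit values are illustrative rather than the kernel's, the instruction length is assumed to be supplied by the caller, and recording the fault in a second flag (CPU_DTRACE_BADADDR here) is my assumption about how the caller learns that the load failed.

```python
# Illustrative bit values; the real kernel headers define their own.
CPU_DTRACE_NOFAULT = 0x1   # dtrace is doing a "may fault" access
CPU_DTRACE_BADADDR = 0x2   # assumed: set so dtrace knows the access failed


def handle_fault(cpu_flags, regs, insn_len):
    """Return (handled, new_flags).

    If the fault happened while dtrace had NOFAULT set, step over the
    offending instruction instead of letting the kernel take the trap.
    insn_len is a hypothetical parameter: the decoded instruction length.
    """
    if not cpu_flags & CPU_DTRACE_NOFAULT:
        return False, cpu_flags            # not ours: normal trap handling
    regs["ip"] += insn_len                 # skip the faulting load
    return True, cpu_flags | CPU_DTRACE_BADADDR


regs = {"ip": 0x1000}
handled, flags = handle_fault(CPU_DTRACE_NOFAULT, regs, insn_len=3)
print(handled, hex(regs["ip"]), hex(flags))
```

The point of the design is that the handler touches nothing except the saved instruction pointer and a per-CPU flag, which is why a nested fault inside it should be harmless.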
ctfconvert / libdwarf problems
Another fix I hope to have in this release is some improvements for building, in response to problems people are reporting to me which were caused by the changes in the last release (a stub dwarf.h was added to the ctfconvert utility). AS4 doesn't have a viable libdwarf.so library - so either I work out what it has and patch the ctfconvert code, or add in a libdwarf release (which would bloat the distro). The main problem here is we *need* ctfconvert if the files in etc/*.d are not to cause a run-time syntax error, as kernel structs are referred to in the translators.
I may have to patch the dtrace command to ignore such errors when auto-inclusion is enabled when parsing user scripts.
Testing
Now that I am getting more familiar with dtrace, I hope to include better tests to avoid problems where things break. I probably won't enable this in the next release, but definitely for the one after.
Why the ipad1 is better than the ipad2 | Sunday, 27 March 2011 |
Why?
Our aging cat, who has become slightly incontinent and is in her last days, decided to pee on my ipad1. Maybe she has a preference for Android, I don't know. But the moleskin cover did a 100% job of avoiding damage to or leakage onto the device. The (fake) moleskin cover, now washed and covered in washing powder, detergent, soap, really doesn't care what you do to it. It doesn't care what cats think about it. It just survives.
So there you have it. 1 out of 5 cats prefer ipad1 to ipad2.
And I shan't leave my ipad on the floor, charging, waiting to be used as a litter tray again.
Script to extract a kernel image from vmlinuz | Wednesday, 23 March 2011 |
This will be part of the next dtrace release - I am debugging bad GPF recovery on AS4 kernels, and need to examine the way the INT13 trap is handled. (It works on my other kernels). Since I don't have the kernel relocatable, and Google failed to find anyone who had solved this problem, I wrote my own.
This is in the public domain, so feel free to share:
utils/get-vmlinux.pl:
#! /usr/bin/perl
# $Header:$

# This script is used to extract a working ELF executable from vmlinuz.
# I want to see whats in the kernels in my VMs, so this helps when I
# am dealing with old and legacy distros.

# Author: Paul Fox
# March 2011

use strict;
use warnings;

no warnings 'portable';  # Support for 64-bit ints required
use File::Basename;
use FileHandle;
use Getopt::Long;
use IO::File;
use POSIX;

#######################################################################
#   Command line switches.                                            #
#######################################################################
my %opts;

sub main
{
    Getopt::Long::Configure('no_ignore_case');
    usage() unless GetOptions(\%opts,
        'help',
        'n',
        );

    usage() if ($opts{help});

    my $fname = shift @ARGV;
    my $system_map = shift @ARGV;

    if (!$fname) {
        $fname = "/boot/vmlinuz-" . `uname -r`;
        chomp($fname);
    }
    if (!$system_map) {
        $system_map = "/boot/System.map-" . `uname -r`;
        chomp($system_map);
    }
    my $fh;
    my $fh1;

    ###############################################
    #   Read system.map.                          #
    ###############################################
    $fh = new FileHandle($system_map);
    my %syms;
    my %addr;
    while (<$fh>) {
        chomp;
        my ($addr, $type, $name) = split(/ /);
        my $seen = $syms{$name};
        $syms{$name}{type} = $type;
        $syms{$name}{addr} = $addr;
        $addr{$addr} = $name if !$seen;
    }

    my $_text = $syms{_text}{addr};

    print "Reading $fname\n";
    $fh = new FileHandle($fname);
#   $fh->binmode();

    ###############################################
    #   Find the compressed payload, and write    #
    #   it out uncompressed.                      #
    ###############################################
    my $data;
    my $pos = 0;
    do {
        $data = "";
        $fh->seek($pos++, 0);
        $fh->read($data, 4);
    } until ($data eq "\x1f\x8b\x08\x00" || $fh->eof());
    printf "Kernel at offset 0x%x\n", $pos - 1;
    if ($fh->eof()) {
        print STDERR "Couldnt find gzip marker\n";
        exit(1);
    }
    $fh->seek(--$pos, 0);
    $data = "";

    while (!$fh->eof()) {
        my $buf;
        $fh->read($buf, 4096);
        $data .= $buf;
    }
    print "Generating /tmp/vmlinux.tmp\n";
    $fh = new FileHandle("| gunzip >/tmp/vmlinux.tmp");
    print $fh $data;
    $fh->close();

    ###############################################
    #   Now create an assembly file so we can     #
    #   create an ELF file.                       #
    ###############################################
    $fh1 = new FileHandle(">/tmp/vmlinux.s");
    print "Generating /tmp/vmlinux.s\n";
#   foreach my $s (sort(keys(%syms))) {
#       print $fh1 ".globl $s\n";
#       if ($syms{$s}{type} =~ /t/i) {
#           print $fh1 "\t.type $s, \@function\n";
#           print $fh1 "\t.size $s, 4\n";
#       } else {
#           print $fh1 "\t.set $s, 0x$syms{$s}{addr}\n";
#       }
#   }
    $fh = new FileHandle("/tmp/vmlinux.tmp");
    print $fh1 "\t.text\n";
    print $fh1 "linuxstart:\n";
    print $fh1 "_start:\n";
    print $fh1 "\t .globl _start\n";
    my $a = hex($_text);
    printf "_start=%x\n", $a;
    while (!$fh->eof()) {
        $data = "";
        $fh->read($data, 1);
        last if !defined($data);
        my $astr = sprintf("%x", $a);
        if (defined($addr{$astr})) {
            print $fh1 ".globl $addr{$astr}\n";
            print $fh1 "\t.type $addr{$astr}, \@function\n";
            print $fh1 "$addr{$astr}:\n";
        }
        printf $fh1 "\t.byte 0x%x\n", ord($data);
        $a += 1;
    }
    print $fh1 "\t.size linuxstart, .-linuxstart\n";

    ###############################################
    #   Create linker script.                     #
    ###############################################
    print "Generating /tmp/vmlinux.ld\n";
    $fh1 = new FileHandle(">/tmp/vmlinux.ld");
    print $fh1 "SECTIONS {\n";
    print $fh1 " .text 0x$_text : { *(.text) }\n";
    print $fh1 " .data 0 : { *(.data) }\n";
    print $fh1 " .bss : { *(.bss) *(COMMON) }\n";
    print $fh1 "}\n";
    $fh1->close();

    ###############################################
    #   Now build the new ELF file.               #
    ###############################################
    spawn("cd /tmp ; gcc -c vmlinux.s");
    spawn("cd /tmp ; ld vmlinux.ld -o v vmlinux.o");

    ###############################################
    #   This extracts the boot block code - we    #
    #   dont want/care about this.                #
    ###############################################
#   $fh = new FileHandle("/tmp/vmlinux.tmp");
#   my $fh1 = new FileHandle(">/tmp/vmlinux");
#   print "Generating /tmp/vmlinux\n";
#   while (1) {
#       $data = "";
#       $fh->read($data, 4);
#       if ($data eq "\x7fELF") {
#           print $fh1 $data;
#           while (!$fh->eof()) {
#               $data = "";
#               $fh->read($data, 4096);
#               print $fh1 $data;
#           }
#           last;
#       }
#   }
}

sub spawn
{
    my $cmd = shift;

    print $cmd, "\n";
    return if $opts{n};
    return system($cmd);
}
#######################################################################
#   Print out command line usage.                                     #
#######################################################################
sub usage
{
    print <<EOF;
get-vmlinux.pl -- tool to extract kernel from a vmlinuz file
Usage: get-vmlinux.pl [vmlinuz] [System.map]

This tool is used to extract the kernel image from a boot up vmlinuz
file. It helps to have the System.map file for the image, so that
the symbol table can be populated.

The goal of this exercise is to allow browsing a distro kernel
when the original kernel source/object files are not available.

Switches:

EOF

    exit(1);
}

main();
0;
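The core trick in the script (scan for the gzip marker, then decompress from there) can be shown in a few lines. The following is a Python sketch of that one step only, run against a made-up in-memory "image" rather than a real vmlinuz; the function name and the fake data are mine. Like the script, it trusts the first occurrence of the marker, which could in principle be a false match inside the boot code.

```python
import gzip
import zlib

# Same 4-byte marker the Perl script searches for:
# gzip magic (1f 8b), deflate method (08), no header flags (00).
GZIP_MAGIC = b"\x1f\x8b\x08\x00"


def extract_payload(image):
    """Return (offset, decompressed bytes) for the first gzip stream found."""
    offset = image.find(GZIP_MAGIC)
    if offset < 0:
        raise ValueError("could not find gzip marker")
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header; anything
    # after the end of the stream is left in inflater.unused_data.
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    return offset, inflater.decompress(image[offset:])


# Fake image: 80 bytes of pretend boot code, a compressed payload, then junk.
fake_image = b"\x00BOOT" * 16 + gzip.compress(b"\x7fELF fake kernel") + b"trailing"
offset, payload = extract_payload(fake_image)
print("Kernel at offset 0x%x, %d bytes uncompressed" % (offset, len(payload)))
```

Using a decompression object rather than a one-shot call matters here, because a real vmlinuz has more data after the compressed stream and a strict one-shot decompress may reject the trailing bytes.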
DTrace release for 20110320 | Sunday, 20 March 2011 |
FC14 has a broken libelf.so implementation in which the functions elf_getshdrstrndx() and elf_getshstrndx() are mapped to the same code. This is wrong, since the only difference between them is the return code for success. (I really detest these two functions - they are almost impossible to read/parse since the names are so similar and palindromic).
I have a couple of bug reports to note. On FC14, even when dtrace is not running, there are some kernel GPFs caused by the hrtimer_interrupt firing and doing a smp_call_function. It doesn't seem to harm the system. (Nicely, the desktop shows an alarm since the kernel problem is detected). Maybe FC14 has validation turned on for cross-cpu function calls.
I loaded up dtrace on 2.6.38 (outside of a VM) and it crashed the system - so be careful/wary. (It might crash on 2.6.37 as well, but I can't remember if I tested it).
I could do with modifying the build scripts to tell you which packages are missing depending on the distro you are using, but it's a fine balance whether to expend the effort to work out what's needed from a virgin install.
Android - issues that I have | Saturday, 19 March 2011 |
- What's with the home screen? Only 7 pages to store personal
shortcuts? Using the normal icon size limits you to 8 or 16 apps.
Oversized widgets are nice but decrease the utility of the homescreen.
Why 7? Why not allow an arbitrary number? Feels like the programmers
wanted to be too "flashy" with the double-home-click sequence where
all 7 pages are shown in a star fashion. That serves very little purpose.
Having 7 key pages would be great for interplatform consistency, but allow something more, such as folders, as even the iOS devices do.
Yes, I can download any number of home screen managers, but the supplier of the OS should set a high barrier for developers to exceed, not a low barrier which has you feeling a lack of proper thought was applied to your device.
- Cut and paste. Hm. It works. But it is like sticking pins in your
eyes. I cannot double click to select a word. I can click to
bring up the copy/paste/select toolbar, but selecting a word
with the green pins which pop up requires the dexterity
of an angel.
In addition, why can it not recognize phone numbers in web pages or text edit fields? Dialling from a number not in the phone book is painful. No, I will upgrade that comment: very very painful.
I downloaded a couple of free apps to allow dialing from the clipboard but they leave a lot to be desired - having to flip from app to app.
Absolutely no thought was given about this ability in the OS, despite the OS being used on smartphone devices.
- The Phone app is insanely annoying. If you so much as mistouch something, it dials the number - with no ability to get a confirmation. I tried to paste phone numbers into the phone app, but it's just not possible. Instead, it is so easy to accidentally dial the wrong person.
- GMail: you would think they could get this right. But no. I have lost count of the number
of times I launch gmail and it tells me that there is a message (number in brackets).
I have no idea what mail this is - but I believe this might be a mail
it has not sent, with no indication of where it is. GMail
is appallingly confusing, despite the use of labels. Things are
not shown in chronological order. The app works, but the experience is not
nice.
I use POP3 mail for my other accounts, and not using the GMail app for those is a good experience. The pop3 mailer and gmail use different notification methods (GMail puts a boring white envelope in the status bar, and the other uses an iPod-like number-in-green on the mail icon).
- The app/setup manager organises things alphabetically which is good but there are too many apps, making finding things difficult. Ideally there needs to be a way of organising things into categories, or having some stats on recently used apps or last-executed.
- The media player is still problematic - it is very jerky and there is no way to find out why. I presume other apps are stealing cpu or the SD card is not fast enough to support streaming. If so, the video app could copy the data to internal memory or do some form of decent buffering. Apple got this right, why can't Google?
- No root access: why not? It's my device. I don't want to jailbreak to do things on my device, but you have to in order to gain access to certain apps or manipulate certain /proc devices. There's a lot of useful stats in /proc for showing how the device is being used, or performing, but trying to put people into jail seems like it just begs for jailbreak. Full-on jailbreaking means you are potentially opening yourself up to damage from 3rd party apps which you may not comprehend. This is a difficult one, but some sort of super-user access should be offered as standard, and people who don't know what they are doing could keep clear of this, or rely on helper apps.
iPod Rant | Friday, 18 March 2011 |
We are on iOS 4.3. Since iOS 4.1 a detestable "feature" has been present. After docking/syncing, I can watch videos. Video (TV Show, Movies, etc) handling is still bad - I have to craft my programs as TV Shows to stop the ipod and ipad showing them all intermingled. The Video app on the iPad continues to show you a frame from each program with no clue as to the title/filename of the entry. This is painful when a 1 hour show is broken down into 12 5-minute fragments.
But, if I switch to music playing from video, and then switch back, the video app hangs and crashes - takes about 5s, and then you are back at the home screen.
So, next you relaunch the video app, and it takes 5+s to rescan the files and let you go select/continue from where it left off.
It really feels like Apple have given up testing their software or even using it, as they shoehorn more wasteful features into each release. (Adblock anyone? Why can't we adblock in Safari, and avoid the wasteful downloads of ads when wanting to quickly flip across news feeds?)
Engadget.com and gizmodo.com get a thumbs down. Engadget crams so many pictures and videos onto the home page that on the ipad and ipod it takes forever to load and scroll around.
Gizmodo.com seem to keep playing around, such that going to the homepage often results in a mobile view of the text (good!), but clicking on the entries doesn't take you anywhere.
Oh, and whilst we are at it, slashdot.org, in their attempts to improve things, have just produced a nasty experience on a mobile device.
Android...have I got it in for you !
CRiSP Fixes and Python | Friday, 18 March 2011 |
I have recently been pondering Python support in CRiSP. Yes, CRiSP supports it - many thanks to Pierre Rouleau who supplied much of the initial macro incantations, but there is more that needs to be done.
In pondering how to make Python support better and more in line with the other languages, I realised that the "ruler" feature is either misimplemented or could, with some minor changes, be much more useful. Since Python lacks begin/end or open/close braces to delimit blocks, it's very difficult to follow the indentation in large methods.
I hope to experiment with something more natural in this area.
Stay tuned.
New dtrace release 20110318 | Friday, 18 March 2011 |
This should also fix the build errors relating to etc/sched.d.
This should also allow me to make better progress, as I won't be debugging code which basically worked, but confused me due to my own brain cell deficit.
Next up is to look at the build errors on Fedora 15 or 14, which users have been reporting to me - thanks for that (even if I haven't been able to respond to the emails).
The road to insanity.... | Monday, 14 March 2011 |
/home/fox/src/dtrace@vmub10-64: build/dtrace -S -n io:::start'{printf("%x %x %x %s %s %p", args[1]->dev_major, args[1]->dev_minor, args[1]->dev_instance, args[2]->fi_pathname, args[1]->dev_pathname, arg2);exit(0);}'

DIFO 0xf3d0d0 returns D type (integer) (size 4)
OFF OPCODE      INSTRUCTION
00: 25000001    setx DT_INTEGER[0], %r1         ! 0x0
01: 28000101    ldga DT_VAR(0), %r1, %r1
02: 0e010002    mov  %r1, %r2
03: 25000103    setx DT_INTEGER[1], %r3         ! 0x48
04: 07020302    add  %r2, %r3, %r2
05: 1e020002    ldsw [%r2], %r2
06: 23000002    ret  %r2

DIFO 0xf3d330 returns D type (integer) (size 4)
OFF OPCODE      INSTRUCTION
00: 25000001    setx DT_INTEGER[0], %r1         ! 0x0
01: 28000101    ldga DT_VAR(0), %r1, %r1
02: 0e010002    mov  %r1, %r2
03: 25000103    setx DT_INTEGER[1], %r3         ! 0x48
04: 07020302    add  %r2, %r3, %r2
05: 25000203    setx DT_INTEGER[2], %r3         ! 0x4
06: 07020302    add  %r2, %r3, %r2
07: 1e020002    ldsw [%r2], %r2
08: 23000002    ret  %r2

DIFO 0xf3d420 returns D type (integer) (size 4)
OFF OPCODE      INSTRUCTION
00: 25000001    setx DT_INTEGER[0], %r1         ! 0x0
01: 28000101    ldga DT_VAR(0), %r1, %r1
02: 0e010002    mov  %r1, %r2
03: 25000103    setx DT_INTEGER[1], %r3         ! 0x48
04: 07020302    add  %r2, %r3, %r2
05: 25000203    setx DT_INTEGER[2], %r3         ! 0x8
06: 07020302    add  %r2, %r3, %r2
07: 1e020002    ldsw [%r2], %r2
08: 23000002    ret  %r2
...
Look at those 0x48's above -- corresponding to the first 3 computed args to the printf.
Now...the question is : why?
The CTF type code is miscomputing the offset into the structure for the dev_major, dev_minor and dev_instance members. ctfdump is showing the correct values:
STRUCT devinfo_t (40 bytes)
        dev_major type=4 off=0
        dev_minor type=4 off=32
        dev_instance type=4 off=64
        dev_name type=75 off=128
        dev_statname type=75 off=192
        dev_pathname type=75 off=256
I wonder what's going on - maybe I broke something....
Only time will tell.
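The mismatch can be checked by hand. The sketch below (Python, purely arithmetic) assumes the ctfdump "off=" values are bit offsets, which fits the 40-byte struct size, and takes the generated offsets straight from the three DIFO listings; the variable names are mine.

```python
# Member offsets as printed by ctfdump (assumed to be in bits).
ctf_bit_offsets = {
    "dev_major": 0,
    "dev_minor": 32,
    "dev_instance": 64,
    "dev_name": 128,
    "dev_statname": 192,
    "dev_pathname": 256,
}

# What the byte offsets should be if ctfdump is right.
expected = {name: bits // 8 for name, bits in ctf_bit_offsets.items()}

# Offsets actually added to the base pointer in the three DIFOs above.
dif_offsets = {
    "dev_major": 0x48,
    "dev_minor": 0x48 + 0x4,
    "dev_instance": 0x48 + 0x8,
}

skew = {name: dif_offsets[name] - expected[name] for name in dif_offsets}
for name in dif_offsets:
    print("%-12s expected %2d  generated %2d  skew %#x"
          % (name, expected[name], dif_offsets[name], skew[name]))
```

The member-to-member spacing (0, 4, 8) agrees with ctfdump; it is only the constant 0x48 added in front that does not belong, which suggests the error is in how the base of the structure is computed rather than in the individual member offsets.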
New dtrace release 20110307 | Monday, 07 March 2011 |
Alas, one up / one down: I broke CRiSP line wrapping whilst investigating something else.
Another day. Another bug.
Annoying CRiSP install bugs | Sunday, 06 March 2011 |
Firstly, during the install process, the progress messages are truncated since we violate the default 64k limit on the amount of text which can be displayed in the edit control. This is now fixed for the next release of CRiSP.
Secondly, CRiSP will create shortcuts on the desktop, but these can fail if you do not have admin rights (not yet fixed). These can be ignored, but they confuse users due to the first issue - everything appears to go silent, but in fact, CRiSP is compiling the macros and this takes a few seconds.
Lastly, and this is really annoying - CRiSP will periodically (about once every 30 days) check to see if a software update is available. Since a new version was pushed out recently, people will randomly see these updates, but some earlier releases of CRiSP would crash due to a problem in the way TCP connections are handled on Windows. If anyone has problems with this, please manually download the latest release from http://www.crisp.demon.co.uk, and do an install; this not only cures the GPF, but will shut CRiSP up (until the next software update is available).
Apologies for this - I know it's annoying!
dtrace modules: "kernel" and "linux" (CTF shadowing) | Wednesday, 02 March 2011 |
In the original port of dtrace, I modified libdtrace/dt_module.c to parse out /proc/kallsyms to emulate the /system/object filesystem under Solaris. On Solaris, /system/object is the way to read the kernel symbol table. /proc/kallsyms does that on Linux, but there is a fundamental difference.
CTF
Under Solaris, the kernel is built with CTF (Compact C Type Format) data. For every symbol in the kernel, we know the address of the symbol, and the .SUNW_ctf ELF section contains the struct/typedef definitions. Sun can do this because they build the kernel.
Over in Linux land, we aren't building the kernel, so we cannot assume the typedef info is available. (Many modern kernels are compiled with -g and a full debug symbol table for tools like systemtap and the kernel debugger, but we cannot mandate this is true for users of dtrace).
So, my original code ended up with two "modules" in the dtrace data structures - one is "kernel" which was pretty much useless, since we had nothing, and one called "linux" (or maybe I had them the other way around!). The linux module had all the symbols in /proc/kallsyms.
Consider this:
$ dtrace -n 'syscall::open*:{printf("%p", cur_thread);}'
Previously this would fail. (It would fail because cur_thread isn't a valid Linux data symbol, but ignore that for now!). It failed because although we can find the value/address in /proc/kallsyms, we didn't have any type info for it. We could do a typecast to get the right effect, but this rapidly gets annoying and messy when dealing with the hundreds of interesting symbols in the kernel. Worse, we need some of these for correct emulation of key data structures (like "curcpu").
So, what I am doing at the moment is handling this with a "shadow" module, by having two modules in the kernel: "kernel" and "linux". "kernel" contains a copy of /proc/kallsyms - i.e. the values, but "linux" contains the CTF datatypes loaded from the build/linux-$version.ctf file (which is simply an ELF file containing the .SUNW_ctf section).
This mapping is transparent to end user D scripts, and lets me concentrate on fixing up "sched.d" to allow access to key info about CPUs, and move on to other required data structures.
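As a rough picture of the shadowing, here is a Python sketch (not the libdtrace code): one table plays the "kernel" module and holds addresses parsed from /proc/kallsyms-style text, the other plays the "linux" module and holds types. The sample addresses, the type table and the function names are all invented for the illustration; in the real thing the types come from the .ctf file.

```python
# Hypothetical sample in /proc/kallsyms format: address, type letter, name.
KALLSYMS_SAMPLE = """\
ffffffff81000000 T _text
ffffffff81a0d000 D jiffies
"""

# Stand-in for the "linux" module: type info that would come from CTF.
CTF_TYPES = {"jiffies": "unsigned long"}


def parse_kallsyms(text):
    """Build the "kernel" module: symbol name -> address."""
    values = {}
    for line in text.splitlines():
        addr, _kind, name = line.split()[:3]
        values.setdefault(name, int(addr, 16))  # keep the first definition
    return values


def resolve(name, values, types):
    """Look a symbol up in both modules; the type may be missing (None)."""
    if name not in values:
        raise KeyError("unknown symbol: " + name)
    return values[name], types.get(name)


values = parse_kallsyms(KALLSYMS_SAMPLE)
print(resolve("jiffies", values, CTF_TYPES))
print(resolve("_text", values, CTF_TYPES))
```

A D script only ever sees the merged answer, so a symbol with an address but no CTF entry behaves as it did before (a cast is needed), while one present in both tables can be used directly.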
Hope to have a new release in a few days which fixes this and gives me a head start in allowing access to the full proc structure (task_struct under Linux).