Metrics, Statistics and Code Quality | Saturday, 31 May 2014 |
The discussion below is about C/C++ - the results may not be applicable for Java/.NET, Python/Perl.
Consider the following question: For your favorite application, how many functions are called to start the application?
You can guess if you don't know the answer.
Follow up question: How many function *returns* do you estimate happen during the program startup?
You are wrong! The number of function returns may not equal (to a close approximation) the number of function calls.
I was surprised by this observation. I tested the startup of 'cr', the console mode CRiSP. It takes about 550,000 function calls to start up but, surprisingly, only about 447,000 function returns.
The reason is: inlining. An inlined function may show up as a function entry, but may not show as a function returning.
Once we have a trace of the function executions, the possibilities open up to all sorts of metrics generation.
We can look at the frequency - which functions are called the most. Or we can look at duration. Duration can be problematic for those functions which are called once, but never return (e.g. "main" will get called once, but won't return until the application exits - or, maybe never, because exit() may be called).
The tracing tool I have logs every function call, line executed, and function return, for offline analysis. Execution of CRiSP generates a 500MB log file - quite hefty for a 'small' application.
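A minimal sketch of the first metric, counting calls against returns from such a log. The log format is not described here, so the record layout below (one event per line, tagged with a leading letter) and the names tally_event and tally_stream are purely hypothetical:

```c
#include <stdio.h>

/* Hypothetical record layout, one event per line:
 *   "C <function>"    function call
 *   "R <function>"    function return
 *   "L <file>:<line>" line executed
 * The real log layout is not shown in this post. */
struct trace_totals {
    unsigned long calls;
    unsigned long returns;
    unsigned long lines;
};

/* Classify one record by its leading tag and bump the matching counter. */
static void tally_event(struct trace_totals *t, const char *record)
{
    switch (record[0]) {
    case 'C': t->calls++;   break;
    case 'R': t->returns++; break;
    case 'L': t->lines++;   break;
    default:  break;        /* unknown or empty record: ignore */
    }
}

/* Tally a whole log. Assumes each record fits in the buffer; a longer
 * record would be split by fgets() and its tail could be miscounted. */
static void tally_stream(struct trace_totals *t, FILE *fp)
{
    char buf[1024];

    while (fgets(buf, sizeof buf, fp) != NULL)
        tally_event(t, buf);
}
```

With totals like these, a gap between calls and returns (as in the 550,000 vs 447,000 figures above) shows up directly. Per-function frequency would need a table keyed on the function name instead of three counters.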
There are some other things that can easily be done here - such as instrumenting certain instructions or functions to gain an insight into other things. For example, it would be possible to trace all mutex locks, or file I/O, or log the stack of specific scenarios.
Much of this may sound familiar, because gcov, strace and dtrace can do variants of these. The point being that each tool excels at a specific domain of monitoring, but almost none give you programmatic access to the detailed workings of an app. dtrace comes close, with the D scripting language, but it's not really very good for user space introspection (other than trapping function calls and stack traces).
If there's interest, I will publish the tool - it's a simple Perl script for the annotation recording, and a small C library. The tool modifies the assembler code of your application, so you really need a special area to build in. You don't want to test or distribute these binaries, since the instrumentation carries a size and speed penalty. I haven't finished optimising: the execution penalty when the tracing is disabled is small, but not small enough.
GMail cookie problem - SOLVED | Saturday, 31 May 2014 |
The problem shows itself when trying to get to gmail, and being thrown a "Your browser has cookies turned off" dialog. It's very infuriating, since cookies are not off, and the dialog is not informative as to the true problem - hence taking months to resolve (for me).
A partial solution was to delete all google cookies. It seems like the issue is related to google-chat - something I never use, but gmail insists on putting up the gchat login underneath the mailboxes.
I decided to turn that off, and the problem goes away! If I restart firefox, I can get to my two tabs for my different mail accounts, and the inbox shows fine, and I can browse/send mail.
The only strangeness is that Gmail displays this message:
Gmail is having authentication problems. Some features may not work. Try logging in to fix the problem.
Damned if I am going to do that. Maybe if the error message was clear about what the issue is, then I might do the login trick.
But I am very mistrusting of the quality of gmail. Maybe, by disabling gchat, my firefox won't bloat so much.
Please leave my source alone! bug in gcc - yes - a real one! | Saturday, 31 May 2014 |
I added hyperlink detection the other day, because I was bored. (I need to implement hyperlink clicking, next).
I was having problems with my Ubuntu 14.04 - the cpus had been spinning for many days due to issues with the khubd kernel process. After the reboot, I was running the new fcterm. But ALT key processing was broken. Weird.
After investigation, it turns out that gcc wants to convert strcpy() calls into stpcpy() calls. Ok, that may be fair.
Here's the source code:
strcpy(rp, keymapping[k].k_value);
Here's the assembler:
    .loc 1 1918 0
    movq %r15, %rsi
    movq %r12, %rdi
    call stpcpy
Why is it converting strcpy to stpcpy? Because it's trying to be clever. stpcpy is like strcpy, but returns the end of the copied string, not the start.
Unfortunately, this version of glibc seems to be broken. strcpy and stpcpy - at best - will do a single strlen() over the src argument. Here's the disassembly.
disass stpcpy
Dump of assembler code for function stpcpy:
   0x0000000000427150 <+0>:     push   %r12
   0x0000000000427152 <+2>:     mov    %rsi,%r12
   0x0000000000427155 <+5>:     push   %rbp
   0x0000000000427156 <+6>:     mov    %rdi,%rbp
   0x0000000000427159 <+9>:     mov    %rsi,%rdi
   0x000000000042715c <+12>:    push   %rbx
   0x000000000042715d <+13>:    callq  0x404ff0 <strlen@plt>
   0x0000000000427162 <+18>:    mov    %rbp,%rdi
   0x0000000000427165 <+21>:    mov    %rax,%rbx
   0x0000000000427168 <+24>:    callq  0x404ff0 <strlen@plt>
   0x000000000042716d <+29>:    lea    0x1(%rbx),%edx
   0x0000000000427170 <+32>:    add    %rax,%rbp
   0x0000000000427173 <+35>:    mov    %r12,%rsi
   0x0000000000427176 <+38>:    mov    %rbp,%rdi
   0x0000000000427179 <+41>:    movslq %ebx,%rbx
   0x000000000042717c <+44>:    movslq %edx,%rdx
   0x000000000042717f <+47>:    callq  0x405440 <memcpy@plt>
   0x0000000000427184 <+52>:    lea    0x0(%rbp,%rbx,1),%rax
   0x0000000000427189 <+57>:    pop    %rbx
   0x000000000042718a <+58>:    pop    %rbp
   0x000000000042718b <+59>:    pop    %r12
   0x000000000042718d <+61>:    retq
End of assembler dump.
See the two calls to strlen? I believe that to be the problem. Now I need to stop strcpy being converted to stpcpy (which is not easy - there is a gcc compiler option to do this, but now I would have to sprinkle this everywhere - for all build tools). Worse, I now know there is a rogue stpcpy on other people's systems, so CRiSP could be affected if they download my binaries.
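Reading the disassembly back into C gives roughly the following. This is my reading of the listing, not the library source, and the names observed_stpcpy and expected_stpcpy are made up for the comparison (the listing also truncates the lengths to 32 bits, which the sketch ignores):

```c
#include <stddef.h>
#include <string.h>

/* What the disassembled routine appears to do: the second strlen() is
 * taken over the destination, and the copy starts at the end of
 * whatever the destination already holds. */
static char *observed_stpcpy(char *dst, const char *src)
{
    size_t slen = strlen(src);      /* first strlen call: source       */
    size_t dlen = strlen(dst);      /* second strlen call: destination */

    memcpy(dst + dlen, src, slen + 1);
    return dst + dlen + slen;
}

/* What stpcpy is documented to do: copy over the start of the
 * destination and return a pointer to the terminating NUL. */
static char *expected_stpcpy(char *dst, const char *src)
{
    size_t slen = strlen(src);

    memcpy(dst, src, slen + 1);
    return dst + slen;
}
```

If that reading is right, the two only agree when the destination starts out as an empty string. Otherwise the observed routine appends, which would explain a key mapping buffer ending up with the wrong contents.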
I haven't suffered a compiler bug in nearly 20y - exactly because my coding style and idioms are trained to avoid complex stuff, and the compilers have become more reliable. But now, that seems to be at an end.
Retracting the gcc security hole | Friday, 30 May 2014 |
http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
And I thank my colleague, Luke, for finding this (he actually pointed to the wikipedia page). The low down is this. If you have a leaf function - one that calls no other function, then GCC will emit an assembly optimisation where instead of decreasing the RSP register by the required space, it will reduce it by the amount needed, minus 128 bytes. This can allow short positive and negative addressing offsets from the RSP register.
int func(void)
{
    char buf[4096] = {0};
    int i, s = 0;

    for (i = 0; i < 4096; i++)
        s += buf[i];
    return s;
}
A function like the above will reveal the strangeness, if you compile to assembler.
My assertion was that whilst RSP is not at the bottom of the stack, something like a signal or some other callback could corrupt the stack and lead to indeterminate results. I tried various things to trigger this, and haven't succeeded. The link above demonstrates that the 128-byte "red zone" is what saves the day.
So, until I can uncover someone not honoring the ABI protocol, we can all sleep more soundly at night.
Now I can go back to the line profiling/tracing tool...
Security vulnerability in gcc optimised code? | Tuesday, 27 May 2014 |
The first implementation worked - about 30 lines of Perl, and the same for the runtime library. This lets you trace every function call/return and line executed - stored to a log file for later analysis.
I got stuck, because a piece of my code would run in an infinite loop. Examination of the C code and assembler kept telling me to "save the registers" - any attempt to inject a CALL instruction into the body of the compiled output would be sensitive to any register corruptions or flags register issues. However, despite tediously poring over my code and looking for insights, I couldn't get the bug to go away.
After a lot of investigation, and a definite "the bug is exactly NOT where you think it is", I found the culprit. It surprised me, but should not have.
Given any form of assembler, like this:
    instr1
    // inject code here
    call __cdebug
    // end of injection
    instr2
(Above is simplified), the call to __cdebug, should not affect the execution state. But it did. In fact, the above code looks more like this:
    instr1
    // inject code here
    pushf
    push %rax
    push %rbx
    ...
    call __cdebug
    ...
    pop %rbx
    pop %rax
    popf
    // end of injection
    instr2
The above would break the application. But how?
Well, the compiler can do something like this:
    ....
    subq $nnn,%rsp
    // code to use the buffer referenced by %rsp
    ....
    // restore the stack pointer
    addq $nnn,%rsp
By moving the stack pointer and reserving space, something strange happens. The above code can be generated by this:
void func(..args...)
{
    char buf[nnn];
    ....
}
The buffer - which is uninitialised, but *will* be used, can be affected by gregarious push/pops writing to the stack. I think this is interesting because it suggests that something like a signal, could do the same thing - writing random results to a part of the stack which is needed, yet not carefully guarded against. This could be a security hole in any application where timers or other signals arise. That would be near impossible to find.
The solution for me is simple: before doing anything, move the %rsp register out of harm's way:
    subq $bignum,%rsp
    ....
    addq $bignum,%rsp
and that fixed my problem. Now the question is, what is "bignum"? Here's a simple way to find the potential worst case scenario in your code:
/tmp@dixxy: objdump -d ~/crisp/bin/cr| grep sub.*0x....,%rsp
  42bd0a: 48 81 ec 18 20 00 00    sub $0x2018,%rsp
  42d07a: 48 81 ec 00 20 00 00    sub $0x2000,%rsp
  43b200: 48 81 ec 28 20 00 00    sub $0x2028,%rsp
  43b645: 48 81 ec 08 20 00 00    sub $0x2008,%rsp
  43c5a2: 48 81 ec 18 40 00 00    sub $0x4018,%rsp
  43da8b: 48 81 ec 10 20 00 00    sub $0x2010,%rsp
  ...
Those 0xNNN numbers represent likely similar scenarios to having buffers on the stack and potentially dangerous signal holes in an application. You either want the max() of these numbers, or something which knows the max stack area for a call frame.
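Taking the max() of those numbers can be automated. Below is a minimal sketch that reads objdump -d output and keeps the largest "sub $imm,%rsp" immediate it sees. The function names are hypothetical, and it assumes the AT&T syntax shown above with the immediate printed in hex:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the immediate from a line such as
 *   "42bd0a: 48 81 ec 18 20 00 00 sub $0x2018,%rsp"
 * or 0 if the line is not a "sub $imm,%rsp" instruction. */
static unsigned long sub_rsp_immediate(const char *line)
{
    const char *op = strstr(line, "sub");
    const char *imm;
    char *end;
    unsigned long value;

    if (op == NULL)
        return 0;
    imm = strchr(op, '$');
    if (imm == NULL)
        return 0;
    value = strtoul(imm + 1, &end, 16);     /* base 16 accepts the 0x prefix */
    if (end == imm + 1 || strncmp(end, ",%rsp", 5) != 0)
        return 0;
    return value;
}

/* Scan a whole disassembly and return the largest stack reservation. */
static unsigned long max_stack_reservation(FILE *fp)
{
    char buf[512];
    unsigned long best = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        unsigned long v = sub_rsp_immediate(buf);
        if (v > best)
            best = v;
    }
    return best;
}
```

Calling max_stack_reservation(stdin) with the objdump output piped in gives a candidate for "bignum". It only sees frames reserved with a constant, so alloca() or variable length arrays would not be counted.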
Interesting. I don't recall anyone playing with this area before.
Firefox and /usr/share/icons/oxygen/icon-theme.cache | Monday, 19 May 2014 |
Another issue I have with firefox is the sheer size of the process. 3GB for a browser before I have done anything is really amazing. Especially given that on a mobile phone, I can run happily in a few hundred KB (using Dolphin browser - my current favorite, although Opera is better on mobile). Again, avoid Chrome, because it has so few of the config options which the other browsers offer to optimise mobile bandwidth.
Anyway, I was trying to figure out why the browser is so huge. I tried doing a "pmap" on the firefox process, and I found the file named in the title responsible for *144MB* of memory use. Sure, the file is likely mmapped, but that's huge. What the heck is in that file? Obviously icons. But why does firefox need that file?
So I chmod 000 the file and that shaves a significant amount of memory from firefox. But if you look at the memory usage, it looks like firefox is using hugepages to mmap all the shlibs. It really is depressing that firefox wants a large memory footprint and I am looking at what else can be pruned. Actually, on closer examination, it appears there is a 2MB unreadable guard page after the text section of each shared library is loaded.
If I sort the memory sections into order, here are the top N segments:
 26624K rw---   [ anon ]
 26624K rw---   [ anon ]
 57992K r-x--   libxul.so
 65540K rw-s-   pulse-shm-839350638
 66484K r----   icon-theme.cache
 66484K r----   icon-theme.cache
 83996K r----   icon-theme.cache
 83996K r----   icon-theme.cache
162816K rw---   [ anon ]
Those icon caches seem to correspond to these two files:
83996 -rw-r--r-- 1 root fox 86010324 Apr 20 14:37 /usr/share/icons/gnome/icon-theme.cache
66484 -rw-r--r-- 1 root fox 68076920 May 10 17:52 /usr/share/icons/hicolor/icon-theme.ca
That's another 300MB of memory use.
At the moment, my firefox is 1.37GB in size - 361MB resident in memory, which is not bad really.
http://www.protopage.com/ | Wednesday, 14 May 2014 |
It's a nice site, and demonstrates that innovation and art exist everywhere. I don't know how useful the site is - but the default selection, and high level of functionality, is impressive.
Of course there are huge amounts of impressive sites on the web, but I do like the "MDI approach" to laying out windows, and it has a lot of depth, and equally, a lot to learn from.
rss.pl site | Friday, 09 May 2014 |
In 2 months, I have downloaded a total of *32MB* of traffic to my mobile. The experiment, first started whilst Ovivomobile was alive, was to avoid the pain/delay of quick web browsing. The goal was 2MB/day, but my diet has done so much better. I have mentioned my site on the blogs before, but with the recent switch away from dyndns, my traffic dropped to zero. Not even googlebot knows of my existence. Which is strange, because I didn't understand why it was searching me in the first place. (Probably because of my links on this blog).
So I am putting in a proper clickable link to see what happens
http://crisp.publicvm.com:3000
Feel free to use, or ignore. I certainly saw some external traffic.
I sit here waiting for Windows 8.1 to install - about 4d so far. By my reckoning, in the future, OS updates will arrive faster than the installer can install them. And then the earth will suck us into the aether.
CRiSP FTP Site Change | Thursday, 08 May 2014 |
I have uploaded the www.crisp.demon.co.uk site to map to
crisp.publicvm.com
So that downloads/updates are restored and I will now remove the banner from the rss.pl site (http://crisp.publicvm.com:3000).
One of the unfortunate things about being one of the first people in the UK to have home internet access via Demon (who got sold/taken over by Thus, and now I think Vodafone) is how poor value for money that is, and being stuck with it. At the time, the price was brilliant - there was no competition and it was truly affordable. The price has never changed in 20+y - which is cool, but I don't use their internet service. I only use the email address - which is valuable to ensure customer continuity. Because of that, and despite the overpricing for the email address, I daren't shop around like most companies do (and few are as old as Foxtrot systems).
clang and gif | Monday, 05 May 2014 |
This one is strange. It highlights a bug in GIF code. As far as I can tell - most GIF implementations are based on the same example code from the X11 consortium. The bug appears to be benign - but it is naughty.
static int
GetCode(file_info_t *fip, int code_size, int flag, int *errp)
{
    static unsigned char buf[280];
    static int curbit, lastbit, done, last_byte;
    int i, j, ret;
    unsigned char count;

    *errp = FALSE;

    if (flag) {
        curbit = 0;
        lastbit = 0;
        done = FALSE;
        return 0;
    }

    if ( (curbit+code_size) >= lastbit) {
        if (done) {
            /* if (curbit >= lastbit) then ran off the end of bits */
            return -1;
        }

        /* BUG IS HERE - Original code. last_byte will be zero */
        /* so we read two bytes before the static buffer, above. */
        // buf[0] = buf[last_byte-2];
        // buf[1] = buf[last_byte-1];

        buf[0] = last_byte >= 2 ? buf[last_byte-2] : 0;
        buf[1] = last_byte >= 1 ? buf[last_byte-1] : 0;
Here is the error - I love the use of color in clang:
=================================================================
==16171==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000176c83e at pc 0x8ec684 bp 0x7fffffff2a70 sp 0x7fffffff2a68
READ of size 1 at 0x00000176c83e thread T0
....stack dump...
0x00000176c83e is located 2 bytes to the left of global variable 'GetCode.buf' from 'gif.c' (0x176c840) of size 280
0x00000176c83e is located 54 bytes to the right of global variable 'LWZReadByte.sp' from 'gif.c' (0x176c800) of size 8
Shadow bytes around the buggy address:
  0x0000802e58b0: 00 00 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
  0x0000802e58c0: 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
  0x0000802e58d0: 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
  0x0000802e58e0: 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
  0x0000802e58f0: 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
=>0x0000802e5900: 00 f9 f9 f9 f9 f9 f9[f9]00 00 00 00 00 00 00 00
  0x0000802e5910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000802e5920: 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
  0x0000802e5930: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x0000802e5940: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x0000802e5950: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap right redzone:    fb
  Freed heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==16171==ABORTING
This one testing feature (of clang) is really brilliant. Kudos to the developers for helping improve software quality. I do hope everyone else is now using it!
Heartbleed .. and CRiSP | Sunday, 04 May 2014 |
http://www.dwheeler.com/essays/heartbleed.html
A quick recap on CRiSP. In the early days of CRiSP, it was unreliable - bogus pointer derefs, use-after-free, and many things typical of C code. I tried out Purify in the early days, along with other malloc() detection tools, and they were helpful. Purify was an eye opener - it detected many issues - on Windows and Linux, and helped me to get my act together. I was really impressed with Purify.
I wrote my own Purify for the i386 - it was very slow (back when a 200MHz Pentium Pro was king). It got close to being operationally useful, but debugging it was painful - hitting a bogus instruction emulation and figuring out where the problem was killed it. By then CPU speeds were advancing, and although my implementation was 20x slower than native speed, it was educational - I found some of the weaknesses of Purify.
I built my own malloc checker - it's the union of all malloc checkers out there - it can use mmap and guard pages for pre or post memory reference checking, and it's enabled via environment variables. I run with it all the time - I get to find the bugs first. It can do memory leak detection.
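For readers who have not seen the guard page trick, here is a minimal sketch of the "post" variant: the block is placed so that it ends exactly where an inaccessible page begins, so reading or writing one byte past the end faults immediately. This is not the CRiSP checker, just an illustration of the idea. The names guarded_alloc and guarded_free are hypothetical, and it assumes a POSIX system with anonymous mmap:

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate `size` bytes so that the byte after the block lies in a
 * PROT_NONE page. Returns NULL on failure. The returned pointer is not
 * aligned for wider types unless `size` happens to be a suitable multiple. */
static void *guarded_alloc(size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t data_pages, total;
    unsigned char *base;

    if (size == 0 || size > SIZE_MAX / 2)
        return NULL;
    data_pages = (size + page - 1) / page;
    total = (data_pages + 1) * page;        /* data pages plus one guard page */

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + data_pages * page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + data_pages * page - size; /* block ends at the guard page */
}

/* Release a block from guarded_alloc(); the caller supplies the same size. */
static void guarded_free(void *ptr, size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    unsigned char *base = (unsigned char *) ptr + size - data_pages * page;

    munmap(base, (data_pages + 1) * page);
}
```

The "pre" variant is the mirror image: put the guard page before the block and start the block at the page boundary, which catches underruns instead. The cost is at least two pages per allocation, which is why this sort of checking is normally switched on only when hunting bugs.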
Purify was problematic due to the patents; eventually Valgrind came along - a brilliant replacement for Purify and open source. It bypassed the patent. (Most of Purify's patents have now expired, but the world has moved on).
GCC and glibc have improved with error detection.
The paper quoted above talks about clang's address-sanitizer. It's not something you want in production code, but it is very desirable for testing. I built CRiSP this morning with this turned on, and immediately hit three bugs (1 fixed, 2 on my todo list). These are bugs that have never shown up with any of the other tools (uninitialised stack reads), so it's excellent that the quality of tools is improving.
I have been working on my "ptrace" (strace replacement) to add crude code-coverage reporting (based on all the functions in all the loaded shared libs). This is getting close to working, so I will release that when done. Code coverage is really just the start of software testing - it tells you where you have gaps.
It's interesting that heartbleed happened - the focus by everyone in the software community to do "better" has to be blessed. We cannot afford another such bug - although undoubtedly they will happen - but now clever minds are targeting gcc, clang, glibc and many other tools to make software development more enjoyable.
The best thing about all of this, and the tools, is that the technical scenarios behind the tools and problematic code are getting serious review. For me, this means higher quality tools and utilities I use daily. For you (my crisp customers), it means the same.
http://news.ycombinator.com (Hacker News) | Saturday, 03 May 2014 |
As the Internet grew, a number of things happened. $$$ sites spread with potential walled gardens (such as AOL and Yahoo), but I never liked them. They took the pop-fashion approach to presentation and content.
Deja-Vu was a service which sold Usenet on CD-ROM. I subscribed for a year or two - a brilliant service - back in the pre-internet days (in the UK), but the volume of data was growing - exponentially, which hurt them.
Then along came Google - which did search, and bought up Deja-Vu and provided a web based interface to groups.
Sites like slashdot.org, www.theregister.co.uk and quite a few others became the "read" of the day.
We had RSS feeds - but keeping them up to date was a nuisance. Google seemed to have real problems with spam on their newsfeeds. The google groups turned into google+; despite google's cleverness in search algorithms, the groups were a mess. With only web access and a UI that they have designed, "groups" became useless. Even google seemed to rarely index or search them. (And google has never had any sense of a good UI - other than the google home page; Android and every Google app is an appalling mess of blandness and zero functionality).
Slashdot went from a great tech read to becoming blander and blander. Somehow, the technical stories have been diluted in favor of political, economic and social stories. It is a shame. Although I try to keep reading it, I feel a sense of "loss".
I bumped into news.ycombinator.com a while back - and it is everything I wanted from the 'net - links to interesting articles - articles I know something about and articles I know nothing about. So I can feel my education level rising. And a simple and high level of relevant comments. It's a definite must read - with a good trickle of new stories to the front page. Additionally, it has an RSS feed.
I added Hacker News to my RSS feed (rss.pl) [note the URL change - the old one will disappear shortly]. Additionally, the bytes-per-information level is incredibly dense - a single page is no more than a couple hundred KB - and great when reading on mobile devices.
If you read my rss collection of sites, you may think they are odd, or they may make sense. Most of the sites I am picking feed off each other, so you get the same stories from multiple sites, but HN or HackerNews doesn't suffer from the 'lets repeat what everyone is saying'.
I hope it continues, in its current form, for as long as possible, and at least until I can find something to augment/complement it.
X11 Xinput XMODIFIERS in CRiSP | Saturday, 03 May 2014 |
I know this stuff is developed by people where English is not their native language, but I couldn't believe how awful the XIM mechanism has gotten on Ubuntu 14.04.
Firstly, /etc/X11/xinit/xinputrc has a language spelling error. I know English is not the author's first language, but that led me on to the next issue.
$ env | grep -i bus
JOB=dbus
XMODIFIERS=@im=ibus
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-wAGMmImK14
QT4_IM_MODULE=ibus
GTK_IM_MODULE=ibus
Somebody is determined to turn on IM mode for X apps; but it's broken. I can turn this off - I was manually going to edit the xinputrc, but Ubuntu has sprayed this all over the place. And although I can easily turn it off, my users won't know what is at issue, and I will likely forget.
If I do
$ ps -lp 3594
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
1 S   200  3594  3472  0  80   0 - 90021 poll_s ?        00:09:11 ibus-daemon
$ psg xim
fox       3594  3472  0 Apr20 ?        00:09:11 /usr/bin/ibus-daemon --daemonize --xim
Yet another space hog - for something I don't want.
So, I go to the KDE desktop to configure things. First I visit the dialog to change the language (United Kingdom English, not American English). Ye gods! About 1 minute of activity just to change the values in the dialog - I have no idea what the dialog was doing, but I think it popped out for a McDonald's lunch - burger and chips - it was so slow. But that didn't help.
Then I found the dialog in system settings to set the XInput mode - much better - there I have an array of choices - all confusing, with really bad English attempting to tell me what to do.
Now I have to implement a workaround for the bug in Ubuntu/X11/XIM, and I don't know whether, by working around this, I will hurt something that has worked(?) for 20y.