cpu visualisation Friday, 22 July 2011  
It's quite interesting to contemplate different ways of looking at things.

I have an Intel i7 machine. It's fast (it's a laptop, so a desktop CPU would be faster still).

Linux provides a lot of raw data, but "top" lacks more detailed per-CPU information, such as each CPU's current clock speed. There are display widgets for KDE and GNOME that help you visualise CPU load, but this display shows something interesting:

last pid: 4792 in: 4448 load avg: 1.28 0.71 0.43                      23:21:45
CPU: 8(HT)  @ 2.00GHz, proc:231, thr:464, zombies: 1, stopped: 5, running: 3 [t
dixxy:  7.3% usr, 0.1% nice, 1.5% sys, 84.6% idle, 6.4% iow, 0.1% sirq
RAM:7918M RSS:0K Free:303M Cached:1913M Dirty: 664K Swap:225M Free:7878M
cpu
Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
          usr   nice    sys   idle    iow    irq   sirq  steal  guest  gnice
CPU0     8.4%   0.0%   2.6%  73.6%  15.2%   0.0%   0.8%   0.0%   0.0%   0.0%
CPU1    63.6%   0.0%   1.8%  21.0%  14.2%   0.0%   0.0%   0.0%   0.0%   0.0%
CPU2     0.2%   0.0%   1.0%  99.2%   0.0%   0.0%   0.2%   0.0%   0.0%   0.0%
CPU3     2.4%   0.0%   1.0%  97.0%   0.4%   0.0%   0.0%   0.0%   0.0%   0.0%
CPU4     0.0%   0.2%   0.2% 101.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%
CPU5     0.0%   0.0%   0.2% 100.2%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%
CPU6     0.2%   0.0%   0.8%  99.4%   0.2%   0.0%   0.0%   0.0%   0.0%   0.0%
CPU7     0.0%   0.2%   0.6%  98.6%   1.2%   0.0%   0.0%   0.0%   0.0%   0.0%

        MHz       Cache     Bogomips
CPU0    2001.000  6144 KB   3990.88
CPU1    1400.000  6144 KB   3990.92
CPU2     800.000  6144 KB   3990.97
CPU3     800.000  6144 KB   3990.93
CPU4     800.000  6144 KB   3990.96
CPU5     800.000  6144 KB   3990.96
CPU6     800.000  6144 KB   3990.94
CPU7     800.000  6144 KB   3990.98

The MHz, cache and bogomips figures are taken from /proc/cpuinfo. The display comes from my "proc" utility (available at my website); run it and type 'cpu' at its command line to see this display.

Note that CPU0 is running at 2GHz. That is to be expected, although slightly strange: CPU0 is the CPU that the proc command happens to be running on at that instant. proc doesn't use much CPU, yet the CPU has raised its clock to full speed for it. (As an i7, this CPU should be able to ramp up to 2.9GHz with Turbo Boost, but I haven't seen evidence in /proc/cpuinfo that this occurs.)

Note also that CPUs 2-7 are idle (800MHz is the lowest speed short of actually sleeping).

CPU1 is running at 1.4GHz; I have a backup job running in another window. The question is: *what is cpu1?* I presume it's the hyperthreaded sibling of cpu0, and therefore should run slower than cpu0. Ideally, jobs should be placed on cpu0, cpu2, cpu4, cpu6, cpu1, cpu3, cpu5, cpu7, in that order.

The question in my mind is: what is hyperthreading? Is it a fixed attribute of a given CPU, or does it meander from one CPU to another? If the hyperthreaded sibling is purely virtual, then on this system (8 logical CPUs, as the "CPU: 8(HT)" line shows) we should get unequal performance once a 5th CPU is made to do work.

I just did a test (counting how many loop iterations we can do per second) and ran 5 copies in parallel. Certainly, one of them was not as busy as the other 4. [This was not a good test: the counter loop doesn't exercise cache misses or the hyperthreading hardware, so it mostly measures how the Linux scheduler places the processes.]

This definitely requires more investigation to understand the effects.


Posted at 23:21:37 by fox
  warning! warning! warning! In the beginning was more. Then there was less. Tuesday, 19 July 2011  
In the very old days of computing, you could sit in front of a screen or a teletype and watch the output, a character at a time. 110 baud or 300 baud was eminently readable.

As output devices progressed to 9600 baud serial lines, one could fill an 80x24 screen in about two seconds. Plain "cat" or "make" was no longer good enough: the text scrolled off the screen faster than it could be read.
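A quick check of the arithmetic, assuming the common 8N1 serial framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per character):

```latex
\[
\frac{9600\ \text{bit/s}}{10\ \text{bit/char}} = 960\ \text{char/s},
\qquad
\frac{80 \times 24\ \text{char}}{960\ \text{char/s}} = \frac{1920}{960}\ \text{s} = 2\ \text{s}
\]
```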

Zoom forward a few years: with today's multi-GHz CPUs and fast screens, one can 'cat' a 10MB file to the screen in a few seconds.

Did you see the error on line 12,723,104? No? Didn't think so.

Tools like "more" and "less" are great for paging slowly through a file and allow searching and backwards motion.

Or, one can use an editor, such as vim/emacs/CRiSP.

These are great.

When building software, e.g. with gcc/g++, and as projects have grown bigger, it can be difficult to spot an error in the middle of a huge amount of benign output. Worse, gcc has a tendency to overdo the warnings. Scrolling back in an xterm to hunt for the one magic "error" amid the warnings and other output is frustrating.

There are many solutions, such as viewing the output in "more" or "less" and relying on highlighting to find the item you are after. "less" can highlight, but "more" cannot. CRiSP can do highlighting too.

fcterm (my own personal terminal emulator) can do this too, but you have to tell it what to search for. (I must modify it to have a default set of words - having a single search pattern is not good enough).

I wrote a simple tool called "warn". You use it like this:

$ warn make
...

and all error output lines are shown in red, with warnings in yellow. (My default console is green on black).

Very useful for seeing the wood for the trees.

I haven't released it as a standalone tool (it has bare minimum requirements; it's plain C code). If people are interested, I will put it out.

Next up is to fix fcterm...


Posted at 22:13:48 by fox
  What does '1' mean? Sunday, 17 July 2011  
In the context of load average, a load average of 1 is meaningful if you are on a single-CPU system: it means the CPU is continuously busy.

Now consider multicore/multi-CPU machines. There, a load average of 1 is not quite so meaningful. On Linux, the load average is an exponentially damped moving average of the number of processes that are runnable or blocked in uninterruptible sleep (typically waiting on disk I/O). It ramps up and down slowly.

Doing heavy duty work (like parallel compilation) means that "gmake -j", when asked to throttle on load with "-l", doesn't have timely enough information to determine whether the system is busy.

In the old days, when a source file compilation could take many seconds or minutes, the load average told us what the system was doing.

On an 8-CPU (4-core, hyperthreaded Intel i7) machine, running 'gmake -j' can launch tens of parallel compilations, yet the load average shown by 'top' can still suggest an idle system, because the load average takes a while to ramp up.

On an 8-CPU system with one CPU busy, should we say 'the system is busy' (usage == 100%), or that it is mostly idle (usage == 12.5%, i.e. 1/8)?

The answer depends on what you are measuring and what you want to do about it. If 1 out of 8 CPUs is busy (perhaps an application is broken and stuck, eating CPU continuously), then that is important. The system as a whole may or may not count as busy, but noticing that rogue application is useful. Waiting until all 8 CPUs are busy before reacting means it may never be noticed.

An additional complication is that on an otherwise idle system, a single busy CPU can ramp up its clock speed; but if that CPU is not doing useful work, a second CPU may not be able to ramp up as high, and so gets worse performance.

In the end, what is useful is to notice one or more processes 'behaving badly', e.g. consuming too much cpu, or too many failed syscalls, or too much I/O.

Today top (or my application, 'proc') does not readily show that, but that needs to change.


Posted at 12:59:11 by fox
  dtrace gripe Wednesday, 13 July 2011  
I really dislike some aspects of dtrace. It's a great tool, but its "let's pretend we are C" approach, when D isn't C, is a nuisance. Macro languages should be designed to be expressive, but dtrace's D language is annoying.

Firstly, the lack of if-then-else is a problem. It leads to convoluted use of ?: (which cannot handle multiple statements). I really don't understand why if-then-else isn't there: it wouldn't break the "Thou shalt not have loops" rule, which exists because a loop in a probe could lock up the kernel.

What's annoying is how faithfully D copies C's strict type rules, to an extent that is... well, annoying!

Consider this: I want a probe that exits after 5s of execution time. Here's the naive implementation:

BEGIN {
	t = timestamp;
	}
tick-1ms {
	timestamp - t > 5*1000*1000*1000 ? exit(0) : 1;
}

This isn't possible, because exit(0) is a void function and so cannot be an operand of ?:.

BEGIN {
	t = timestamp;
	}
tick-1ms {
	timestamp - t > 5*1000*1000*1000 ? (int) exit(0) : 1;
}

But, oh no! You cannot cast a "void" to an "int". In C, I can (almost) understand that, but it leads to painful workarounds. In D there is even less reason: if a (void) could be cast to "(int) 0", the above would work. It's still ugly, but functional.

The actual solution is:

BEGIN {
	t = timestamp;
	}
tick-1ms / timestamp - t > 5*1000*1000*1000 / {
	exit(0);
}

That works, although I haven't determined whether the predicate is cheaper or more expensive than doing the test in the clause body. What is annoying is that the predicate is a "different part of the language". What if I wanted to do this:

tick-1ms {
	do-some-stuff;
	if (var > somevalue) { printf("hello"); exit(0);}
	do-some-more-stuff;
	if (var > someOthervalue) {printf("world"); }
	...
}

This can be translated into predicate form, but the transformation can get ugly, especially if the do-stuff lines are complex in themselves.

It's time to start addressing these deficiencies in DTrace, even at the risk of creating non-standard extensions to the real thing.


Posted at 22:30:11 by fox
  CRiSP website updated Tuesday, 12 July 2011  
After many years of staring at abject ugliness, http://www.crisp.demon.co.uk has been given a lick of paint, bringing it more in tune with the blog site's look, feel and stylesheet.

I have updated some of the very dated material, and hope to update more of it.

Obligatory plug: you can now purchase CRiSP (via paypal) if you so choose.


Posted at 21:47:42 by fox