09.21.06

Technical books

Posted in soc at 18:50 +0200 CEST by Unleashed

This summer’s SoC has provided me with more than enough money to buy some books I’ve had on my wish list for quite some time. I’ve ordered several books which should keep me pretty busy this year, mostly on kernel internals, because I want to get hands-on with those subjects. FYI, I got The Design and Implementation of the FreeBSD Operating System, Solaris Internals (covering OpenSolaris) and Understanding the Linux Kernel (3rd edition). I also ordered a copy of Unix Systems for Modern Architectures, some chapters of which I had read and liked at university, along with another book dealing with the Linux network stack and yet another dealing with filesystem forensics. I’m still missing a batch including these books, which should arrive soon.

I’ve been reading the sections on synchronization primitives in the three books I received, and all I can say is this money has been well spent. I try to fill my library with quality books, and these will add to it nicely. As you can see I have plenty of reading time ahead of me.

On a side note, I’ve been contacted to do full-time paid work on a wxWidgets application, partly due to having been selected for SoC, partly due to other factors, so I’d like to thank both Gentoo and Google for it. However, I expect to continue helping Gentoo in my spare time, especially the BSD faction, which is in real need of manpower. Thank you!

Update: Konstantin Belousov assured me the patch fixing the kernel’s deadlock will appear in 6.2 (heavily reworked, though), so we won’t need to continue patching freebsd-sources. :D

09.04.06

Deadlock dies

Posted in sandbox, soc, gentoo at 13:20 +0200 CEST by Unleashed

Finally! The deadlock in the FreeBSD kernel has been tackled and we can hopefully have it fixed along with other devfs issues in 6.2.

Tests show the deadlock is gone now, so we can at last have Gentoo’s sandbox working without problems on FreeBSD (although you need to patch your sources: apply the patch in the sys/fs/devfs directory using gpatch -p1 < patch and recompile the kernel). This patch, or something better once it gets committed to -CURRENT, should probably go into freebsd-sources before 6.2 gets released upstream so that G/FBSD users can enable sandbox safely.

Anyway, today I discovered a new sandbox violator: the upgrade to the new version of GNU m4 apparently hit the mkdir bug (aka libtool bug). This is the second package on the list now, and if the bug does not get corrected ASAP the list is likely to grow much bigger.

Now back to my exams.

08.26.06

For commodity

Posted in sandbox, soc, gentoo at 4:51 +0200 CEST by Unleashed

Just wanted to drop a couple of lines about the new ViewVC interface for the FreeBSD sandbox support repository hosted on my server. I’ve noticed that there’s no ebuild for it in the portage tree. For evaluation purposes it points to revision 80, which is the last one before the code got merged upstream around mid-July (and it’s still the latest at this moment). This link, plus the known subversion repository link and the one pointing to my releases, has been added to the relevant links category in the blog!

08.21.06

Wrapping up

Posted in sandbox, soc, gentoo at 10:25 +0200 CEST by Unleashed

Ok, it was about time to update my SoC blog, especially today, with the official Google deadline a few hours away. It’s not that I have much spare time available with exams starting next week, so I have to take the opportunity to make some kind of quick status report. I’ll leave an evaluation of the whole summer and the, hmmm, ‘interesting’ experiences I’ve gathered for some time later. Just one note: bug-squashing can be a hell of a nightmare.

Well, as already explained here, sandbox patches to work under FreeBSD were accepted by sandbox’s maintainer, including some nice extras, to appear in sandbox 1.2.20. There were problems with 3 well-known packages, OpenSSL, FFMpeg and Libtool. OpenSSL is being tracked in bug #138344, FFMpeg is to be fixed by its maintainer (ld-elf.so.hints… hints!), and Libtool was the one still waiting for a fix. Well, it turned out to be something important that can affect more packages, see bug #144594 for details (the alpha sandbox version debug tools helped quite a bit here). They should all be easily fixed.

So if life were simple this should’ve been all. But no, it’s not that way. You already know from previous entries that sandbox was having problems with what seemed to be a FreeBSD kernel bug. Well, it’s taken me a lot of time to figure out the details due to the nature of the problem and my unfamiliarity with the kernel, including being unable to debug it at all. Eventually I managed to identify the problem by patching the sources, you know, the good old printf method :D . Some BSD internals knowledge from university was also really helpful (I later fixed the problem that prevented me from debugging it properly, though). I identified what was happening and informed the FreeBSD folks (although I’m not sure they feel comfortable when someone mentions Gentoo). You can get more details about the whole thing by reading the mailing list message or the problem report.

So to make it short: sandbox works, but it likes to trigger a FreeBSD kernel deadlock in a quite vietnamized piece of code, with strong ties to VFS code. Somebody is supposed to be working on this, so let’s hope this gets fixed sooner rather than later, i.e. in 6.2 (just let me be a bit sceptical though).

And that’s it for now. I hope to continue contributing in the future, at the very least I’m pretty sure sandbox will be receiving some much needed help.

07.15.06

Deadlock driving me mad

Posted in sandbox, gentoo at 23:30 +0200 CEST by Unleashed

So here’s the problem: sandbox does work under FreeBSD apparently in a normal and well-behaved fashion. At some point, some sandbox’ed processes freeze and sandbox keeps waiting for them. Further actions are constrained to anything not involving the disk: launching ps is guaranteed to insta-freeze, for example. And trying to kill those processes won’t work. “top” shows (in this example) processes cc1 and as stalled in WCHAN “devfs” and “devfsm”. And guess what, documentation regarding those waitchannels is nil.

The_Paya kindly helped in diagnosing this problem, stating that in some of his setups a FreeBSD kernel with ACPI disabled didn’t show this problem. However that didn’t seem to work here, but I’ll give it another go just in case.

So I tried to use some of the basic tools FreeBSD provides to try to get some more information on what’s going on. ktrace is useless here. It generates lots of output. Like 1 GB of output just for gcc unpacking and patching. Even after disabling the logging of I/O operations it’s a no-no. And it doesn’t support writing to stderr or whatever I’d want, it must write to a regular file, so that storage for it is needed, although I don’t care about 99.999% of the info.

And “truss” refused to work because it didn’t find /proc/curproc/mem, which doesn’t exist in Gentoo/FreeBSD because it uses linprocfs for /proc by default. I had to mount the normal procfs (not linprocfs) for that to start working (I wonder how Gentoo/FreeBSD is supposed to work by default with this). Once I did that, truss also proved kind of useless: I was unable to trigger the freeze before truss would leave behind an enormous trail of defunct unkillable processes sitting there (not even using procctl(8) solved that), eventually stopping the trace.

So I’m mostly forced to start digging into the FreeBSD kernel for some hints. Looks like something is triggering a deadlock in kernel land, probably in devfs code, preventing any process in the system from ever accessing the disk. I’ve enabled every single debug option in the kernel configuration, started reading some kernel debugging documentation, and will try to gather as much information as possible before sending any kind of help request to the fbsd mailing lists (I’m even testing sandbox in vanilla FreeBSD for them to test if needed). Ah, isn’t it wonderful when you find out that someone else’s code is bringing your whole project down due to it being an OS bug?

While at that, I’ll be working next week on a new patch to sandbox providing more useful information for SANDBOX_ABORT support, playing with signals, e.g. a trace of processes (including command lines) leading to the one violating the sandbox. If I find it useful I’ll submit that one for 1.2.21 so as not to interfere with 1.2.20 stabilization.
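To make the idea of a process trace more concrete, here is a minimal Python sketch of walking from a violating process up through its ancestors and printing each command line. It is only an illustration of the idea, not sandbox code: the process table, the PIDs and the command lines are all made up, and a real implementation would have to get that data from the running system.

```python
# Hypothetical process table: pid -> (parent pid, command line).
# All entries are invented for illustration.
PROCESS_TABLE = {
    1: (0, "init"),
    410: (1, "emerge libtool"),
    422: (410, "sandbox ebuild.sh compile"),
    431: (422, "make"),
    440: (431, "mkdir -p -- /"),
}


def trace_to_root(pid, table):
    """Return [(pid, cmdline), ...] from the oldest known ancestor down to pid."""
    chain = []
    seen = set()  # guards against a malformed table containing a cycle
    while pid in table and pid not in seen:
        seen.add(pid)
        ppid, cmdline = table[pid]
        chain.append((pid, cmdline))
        pid = ppid
    chain.reverse()
    return chain


def format_trace(chain):
    """Indent each process one level deeper than its parent."""
    return "\n".join(
        "  " * depth + f"{pid}: {cmd}" for depth, (pid, cmd) in enumerate(chain)
    )


if __name__ == "__main__":
    print(format_trace(trace_to_root(440, PROCESS_TABLE)))
```

The point of printing the whole chain is that the violating command on its own rarely says which build step spawned it.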

I’ll have limited internet connection during next week, so I’ll be reachable only by email, if anyone wants to contact me.

07.10.06

>=sys-apps/sandbox-1.2.20

Posted in sandbox, gentoo at 17:17 +0200 CEST by Unleashed

So things have changed quite a bit since last week. I was preparing a patch to replace the first one blacklisting __getcwd in sandbox, by writing a wrapper for it that avoids the infinite recursion. Turns out the sandbox maintainer, Martin Schlemmer (azarah), started working again on sandbox and closed a couple of bugs asking for debug features I was also working on to debug libtool. As the sandbox svn repo isn’t available to the public, I asked him for his patches and had an enlightening conversation.

I talked to him about the FreeBSD support and got some answers. First off, wrapping getcwd() doesn’t make sense in sandbox. Indeed. It’s there because of BSD libc issues in the first place. So the idea of writing a new patch wrapping up __getcwd() became kind of pointless. There’s no need to wrap anything else, so the original patch blacklisting __getcwd should be fine, and he was ok with it. That saved me from suffering a lot of headaches.

Soon enough Martin released a new testing version of sandbox (like, next day). I then adapted most of my previous patches to this new version (mostly little fixes, cleanups and debug features) and submitted the one providing FreeBSD support for inclusion. Good news is that sandbox 1.2.20 will support Gentoo/FreeBSD. Bad news is that it will only add support for FreeBSD, as I can’t test any other Gentoo/*BSD platform. So those will have to wait or their maintainers will have to contact me and offer help.

Well, 1.2.20 will also include a lot of new debugging aids. Martin was working on them and I added some reliability support and fixed some problems (and there is still a lot of code to adapt to the new, more reliable functions). I’ll probably be submitting more patches for sandbox over the next few weeks, as the code needs quite a bit of work to bring it to an acceptable shape, but the main goal has been accomplished.

So using those new features, which include logging the full command line of the process violating the sandbox, I tracked down yesterday the “mkdir” invocation that violates the sandbox in libtool.

All in all, the important thing to note is that you will need >=sys-apps/sandbox-1.2.20_alpha1-r2 (be sure to unmask it) to test Gentoo/FreeBSD support (or use my latest overlay, which contains some more fixes already submitted and some others being tested before submission, and doesn’t break in OpenSSL for the time being).

So testing is needed (but don’t bother to point out that libtool/ffmpeg/openssl violate the sandbox, they’re known issues).

07.09.06

Debugging libtool build

Posted in sandbox, gentoo at 16:32 +0200 CEST by Unleashed

It was about time to update the blog. There has been quite a lot of activity in this project during these days, but I’ll leave that for tomorrow’s post.

I was adding some debug features to sandbox to help me debug the libtool ebuild, which failed trying to “chmod /”. I started using sandboxshell to debug libtool. Surprisingly, libtool wouldn’t build under sandboxshell. In fact, it wouldn’t build whenever SANDBOX_DEBUG was enabled and SANDBOX_VERBOSE wasn’t being explicitly disabled (and I needed it to locate the place in the build process the violations were occurring). So things started to get complicated: I needed to figure out how to debug the sandbox violations without sandbox debug information.

After some time thinking about it and reviewing the code for debug support (which was apparently harmless), I got it. It was in front of me all the time: the output generated by the debug information was interfering with the build system, because every process running under the sandbox was being forced to write debug information out to the console, pipes, etc. So, for example, when a test was run and some other code expected predetermined output from something it spawned, the code checking the test results read the sandbox debug information instead and concluded the test had failed.

When I figured this out I wrote a patch for sandbox to limit the verbose debug information, disabling it after a preset number of execve() invocations (in depth, not width). Limiting it to some sane number let libtool emerge. Unfortunately it didn’t produce enough output to determine what was going on. Then I implemented abort-on-1st-violation support, but it didn’t help much in this case.
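The depth limit can be pictured with a small Python sketch. This is not the actual sandbox patch (which is C); it only shows one way to count execve() depth, by passing a counter down through the environment. The variable name SBX_EXEC_DEPTH and the limit of 3 are invented for this example and are not real sandbox settings.

```python
# Hypothetical names: neither the variable nor the limit exist in sandbox.
DEPTH_VAR = "SBX_EXEC_DEPTH"
MAX_VERBOSE_DEPTH = 3


def current_depth(env):
    """Read how many exec levels deep we are; missing or garbage counts as 0."""
    try:
        return int(env.get(DEPTH_VAR, "0"))
    except ValueError:
        return 0


def verbose_allowed(env, limit=MAX_VERBOSE_DEPTH):
    """Verbose debug output is only emitted above the depth cut-off."""
    return current_depth(env) < limit


def child_env(env):
    """Environment to hand to the next exec'd process: one level deeper."""
    new_env = dict(env)
    new_env[DEPTH_VAR] = str(current_depth(env) + 1)
    return new_env


if __name__ == "__main__":
    env = {}
    for level in range(5):
        print(level, "verbose" if verbose_allowed(env) else "quiet")
        env = child_env(env)
```

Because the counter travels with each child, it measures depth along one chain of execs; sibling processes at the same level all see the same value, which is the "in depth, not width" behaviour.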

After all that, my plans started to deviate from what I had in mind (which will be the subject for tomorrow’s post). Anyway, I now know that those sandbox violations are due to a command that the libtool build process is issuing under Gentoo/FreeBSD: “mkdir -m u=rwx,g=rx,o=rx,u+wx -p -- /”. I have to find out why / is being used and where exactly this happens (any help appreciated), but that will have to wait a little longer, because it’s not top priority currently. Stay tuned.

Update: Released a new overlay today. You’ll need to touch package.mask to work with the latest version though.

06.29.06

Resuming: OpenSSL working, libtool next

Posted in sandbox, gentoo at 18:20 +0200 CEST by Unleashed

I came back from Corunna on Monday and have been poking around with sandbox, sandboxshell and OpenSSL and libtool ebuilds these three days. OpenSSL is now fixed for sandbox under BSDs in my new overlay, I submitted a bug report and a patch to solve it, but Flameeyes told me to patch sandbox instead. Done and working. In the meantime I happened to find two little bugs in sandbox code.

My current problem is libtool, which doesn’t want to emerge under sandboxshell and will force me to look much closer at what’s happening with it. Once I have this sorted out I’ll go for another (bigger) patch to sandbox.

Oh, by the way, don’t stand next to a kompiling computer these days if you live in a hot area (like me). I’m sweating all day long and it doesn’t get any better at night. Cold water is a utopia unless you fill the fridge with plastic bottles *sigh*. I’ll pay a visit to the swimming pool tomorrow to try to cool off.

06.08.06

Testing system

Posted in sandbox, soc, gentoo at 13:16 +0200 CEST by Unleashed

This is going to be my last post for a while (at least until the 26th), since I have to fly to Corunna today: my university exams begin on Monday and I won’t have much internet access there during these days.

Testing my initial sandbox patch proved a bit tricky since emerge -e system was broken on my install. I did end up with a fairly complete list of 95 Gentoo/FreeBSD core system packages (with a couple of basic additions, irc and www client) and reemerged it all with sandbox enabled, rebuilding the system thrice.

There were two unexpected access violations, which I didn’t have the time to look deeply into. First, libtool would try to “chmod /” several times, and then openssl wanted to have read/write access to /dev/crypto. Curiously openssl’s ebuild used add_predict /dev/crypto in test stage, but somehow it didn’t work and I’m no expert on ebuilds, so I left it as is. OpenSSL tries to access it on FreeBSD while generating testing certificates and then on some tests.

With those two hopefully easily fixable access violations, it seems one should be able to build the system without problems using sandbox. This was expected to be the most problematic area, and it turned out to show very little resistance. :)

So with this in mind, and for the time being until I come up with my next planned patch (more intrusive but should make sandbox more complete), I’d like it if people could test it on Gentoo/FreeBSD patching the sources with this patch (against 1.2.18.1) or using the overlay, as it should provide good information on broken packages.

Btw, something weird that had me two days scratching my head thinking about hardware issues: if you ever see some processes stalled on waitchannel “devfs” or “devfsm”, check available disk space. It was a real pita until I realized what the heck was going on.
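For what it’s worth, the disk space check is easy to script. Below is a minimal Python sketch using the standard shutil.disk_usage() call; the 5% threshold and the function names are arbitrary choices for this example, not anything FreeBSD or sandbox defines.

```python
import shutil


def free_fraction(path="/"):
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # pseudo-filesystems may report no capacity at all
        return 0.0
    return usage.free / usage.total


def nearly_full(path="/", min_free=0.05):
    """True when less than `min_free` of the filesystem is left (arbitrary 5% default)."""
    return free_fraction(path) < min_free


if __name__ == "__main__":
    for mount in ("/", "/var/tmp"):
        try:
            print(mount, f"{free_fraction(mount):.1%} free")
        except OSError as err:
            print(mount, "could not be checked:", err)
```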

See you later this month!

06.04.06

Subtle semantics

Posted in sandbox at 12:09 +0200 CEST by Unleashed

Installed a full-fledged FreeBSD 6.1-RELEASE system, threw a lot of packages into it (over 500), both desktop and server-oriented, and looked for __getcwd imports with some find, readelf and nm magic. Nothing importing that.
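The search boils down to asking, for every binary, whether its symbol table lists __getcwd as an undefined (imported) symbol. Here is a rough Python sketch of that check, working on text in the style of readelf -s output. It is not the exact command line I used; the parsing is an assumption based on the usual column order (Num, Value, Size, Type, Bind, Vis, Ndx, Name), and the function names are made up for this example.

```python
def parse_symbol_line(line):
    """Parse one symbol-table line; return None for anything that doesn't look like one."""
    fields = line.split()
    if len(fields) < 8:
        return None
    # Take the last eight columns so that extra leading columns are ignored.
    num, _value, _size, sym_type, bind, _vis, ndx, name = fields[-8:]
    if not num.endswith(":"):
        return None
    # Drop a symbol version suffix such as "@GLIBC_2.0" if present.
    return {"type": sym_type, "bind": bind, "ndx": ndx, "name": name.split("@")[0]}


def imports_symbol(lines, symbol):
    """True if `symbol` appears as undefined (Ndx == UND), i.e. it is imported."""
    for line in lines:
        sym = parse_symbol_line(line)
        if sym and sym["name"] == symbol and sym["ndx"] == "UND":
            return True
    return False


if __name__ == "__main__":
    sample = [
        "7: 00000000 1295 FUNC GLOBAL DEFAULT 1 getcwd",
        "8: 00000000 0 NOTYPE GLOBAL DEFAULT UND __getcwd",
    ]
    print(imports_symbol(sample, "__getcwd"))
```

In practice the lines would come from running readelf over each file that find turns up; that part is left out here to keep the sketch self-contained.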

According to some people and manuals, __getcwd “is not meant to be called except by the C library; application programmers should use getcwd instead”. So nothing importing __getcwd() was expected, and you could consider code calling __getcwd() to be buggy.

There are differences regarding __getcwd() between FreeBSD’s libc and GNU’s. Looks like GNU did take care of not exposing the symbol, while FreeBSD left it there (go figure why).

On FreeBSD libc:

File: /usr/lib/libc.a(getcwd.o)

7: 00000000 1295 FUNC GLOBAL DEFAULT 1 getcwd
8: 00000000 0 NOTYPE GLOBAL DEFAULT UND __getcwd

File: /usr/lib/libc.a(__getcwd.o)

17: 00000008 0 FUNC GLOBAL DEFAULT 1 __sys___getcwd
18: 00000008 0 FUNC WEAK DEFAULT 1 __getcwd
19: 00000008 0 FUNC WEAK DEFAULT 1 ___getcwd

On GNU libc:

File: /usr/lib/libc.a(getcwd.o)

10: 00000000 303 FUNC WEAK DEFAULT 1 getcwd
11: 00000000 303 FUNC GLOBAL DEFAULT 1 __getcwd

Exported symbols from /lib/libc.so.6 on FreeBSD libc:

1998 920: 00053c4c 1395 FUNC GLOBAL DEFAULT 8 getcwd
423 1252: 00055b34 0 FUNC WEAK DEFAULT 8 __getcwd
1325 1962: 00055b34 0 FUNC GLOBAL DEFAULT 8 __sys___getcwd
1649 2000: 00055b34 0 FUNC WEAK DEFAULT 8 ___getcwd

Exported symbols from /lib/libc.so.6 on GNU libc:

1522 620: 000be890 316 FUNC WEAK DEFAULT 11 getcwd

Curiously, you can call __getcwd() on a GNU/Linux system if you link your program statically, but you’ll get linker errors if you try to link it dynamically, because libc.so doesn’t export it. In FreeBSD you can do both, since its libc exports it (and ___getcwd and __sys___getcwd, all of them the same function).

Also, while on FreeBSD __getcwd() performs the actual system call, on Linux it performs what getcwd() is supposed to do. So __getcwd() isn’t even portable and doesn’t have the same semantics when executed on different systems. That makes one more point for the “don’t export it” list and opens an interesting new alternative: implementing __getcwd() for real.

If you’ve read my past articles on this, you know the problem was an infinite recursion loop starting at getcwd()@sandbox, followed by getcwd()@libc, then __getcwd()@sandbox (it should’ve been at libc), and finally getcwd()@libc again, with the loop completed. The new alternative path, somewhat more elegant and secure than just blacklisting functions, would be to implement __getcwd() with the proper semantics depending on the system we’re running on, so that Linux’s __getcwd() would call getcwd() and FreeBSD’s would call __getcwd().
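The loop and the proposed way out can be reproduced with a toy model. The Python sketch below is only a simulation under simplifying assumptions: dictionaries stand in for the symbol tables of sandbox and libc, lookups try sandbox first (as with a preloaded library), and the path "/usr/src" is a made-up return value standing in for the real system call. None of this is sandbox code.

```python
def make_process(sandbox_inner_getcwd):
    """Build a toy 'process' and return its symbol resolver.

    `sandbox_inner_getcwd(libc)` is what sandbox's __getcwd wrapper does.
    """
    libc = {}
    sandbox = {}

    def resolve(name):
        # The preloaded library is searched first, then libc.
        return sandbox.get(name) or libc[name]

    # FreeBSD-style libc: __getcwd is the system call, getcwd() calls it
    # through normal symbol lookup (so it can be interposed).
    libc["__getcwd"] = lambda: "/usr/src"
    libc["getcwd"] = lambda: resolve("__getcwd")()

    # sandbox wraps both symbols.
    sandbox["getcwd"] = lambda: libc["getcwd"]()
    sandbox["__getcwd"] = lambda: sandbox_inner_getcwd(libc)
    return resolve


def loops(resolve):
    """True if calling getcwd() in this toy process never terminates."""
    try:
        resolve("getcwd")()
        return False
    except RecursionError:
        return True


# Linux semantics applied on FreeBSD: __getcwd() behaves like getcwd().
broken = make_process(lambda libc: libc["getcwd"]())
# FreeBSD semantics: __getcwd() goes straight to libc's system call stub.
fixed = make_process(lambda libc: libc["__getcwd"]())

if __name__ == "__main__":
    print("broken loops:", loops(broken))
    print("fixed returns:", fixed("getcwd")())
```

The broken variant follows exactly the chain described above: getcwd()@sandbox, getcwd()@libc, __getcwd()@sandbox, and back to getcwd()@libc.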

Man, that was tricky to discover; thankfully I was checking semantics in my test program. Now let’s see if I get some replies from the FreeBSD folks as to why __getcwd() keeps being exported.
