I burned and verified my image twice (more if you count the burns that were bad):
paul@paul-P5GZ-MX:~$ md5sum nixos-graphical-0.1pre29826-i686-linux.iso
164774618b05c3633124bf6ef793d5e8 nixos-graphical-0.1pre29826-i686-linux.iso

When I boot, it gets as far as the login prompt.
But just as I am about to enter 'root', about 3 cascading oopses happen.
They are linked to aufs and squashfs.
Sometimes a message says that the kernel is fixing a recursive fault (oops)
but a reboot is needed.

I can still switch consoles with Alt-F[1-6], but after entering root, I
get some more oopses, and nothing happens.

I checked that my CD was OK with:
paul@paul-P5GZ-MX:~$ dd if=/dev/cdrom | md5sum
164774618b05c3633124bf6ef793d5e8 -
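
A note on this check: on some drives, reading the raw device returns extra padding after the end of the image, which changes the checksum even when the burn is good. Here the sums matched, so that did not happen. A minimal sketch of a comparison that hashes only the first ISO-sized bytes of the disc follows; the function names and the example paths are illustrative, not part of any tool mentioned in this report.

```python
import hashlib
import os


def md5_of_prefix(path, length, chunk_size=1 << 20):
    """Return the hex MD5 of the first `length` bytes of `path`."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # Source is shorter than expected; hash what was read.
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def disc_matches_iso(iso_path, device_path):
    """Compare the ISO with the same number of bytes read from the disc."""
    size = os.path.getsize(iso_path)
    return md5_of_prefix(iso_path, size) == md5_of_prefix(device_path, size)
```

Usage would be something like disc_matches_iso("nixos-graphical-0.1pre29826-i686-linux.iso", "/dev/cdrom"), run with read access to the device.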

The CD works fine on another computer.

I rebooted with oops=panic.
I noted the trace by hand (well, the lines that fit on the screen):
[I reversed the order of the lines; this makes more sense to me]
sysenter_do_call
ptregs_execve
sys_execve
check_unsafe_exec
open_exec
security_prepare_creds
do_filp_open
do_last
do_lookup
d_alloc
d_alloc
aufs_lookup
generic_permission
au_lkup_dentry
au_wh_name_alloc
aufs_read_lock
di_write_lock
do_ii_write_lock
down_write
update_curr
au_lkup_one
get_page_from_freelist
vfsub_lookup_one_len

Submitted by Paul Dufresne on 15 October 2011 at 11:09
CDGD5440_not_better_with_nohz_off.log (20 October 2011 at 23:51)
paging_requst_oops_no2.log (20 October 2011 at 22:09)
CLGD5440_NOPANIC_NO3.log (20 October 2011 at 21:54)
squashfs_read_data__bad_area_nosemaphore.txt (17 October 2011 at 07:45)
lspci-vvnn.log (15 October 2011 at 11:17)

On 17 October 2011 at 07:49 Paul Dufresne commented:

In my own words, I believe that:
squashfs_cache_get calls __mutex_unlock_slowpath too soon, resulting in a bad_area_nosemaphore error while trying to wake up a process that has been swapped out to disk.

Now the problem seems to happen during write_cd_rules.


On 17 October 2011 at 22:16 Paul Dufresne commented:

Reported on squashfs project at:
http://sourceforge.net/tracker/?func=detail&aid=3424900&group_id=63835&atid=505341

Also, I tried with nixos-graphical-0.1pre28002-i686-linux.iso, with the same results: oops on my P5GZ motherboard computer, and just the first icon of KDE on my 128 MB P3.


On 18 October 2011 at 23:46 Eelco Dolstra commented:

FWIW I can’t reproduce the squashfs problem (with the same ISO, in QEMU).

However, with 128 MB RAM in QEMU, the KDE login does hang just as you described. 128 MB is definitely not enough for KDE, unless you enable swap first (and it will probably be painfully slow).


On 20 October 2011 at 22:01 Paul Dufresne commented:

OK, I suspected it could be because I had put an old Radeon 7200 card (ATI R100) in my box, since the internal VGA connector is badly soldered [I had turned my box upside down so that gravity would push the wire in the right direction, but even that is no longer enough].

So I put in an even older card, a Cirrus Logic GD5440 PCI card, messed with my BIOS configuration (mainly disabling unused stuff), and made a new log, which is attached to this bug.

There is a lot of stuff in that.
I suspect the TCO ICH7 watchdog (whatever it is) is using IRQ 17, which has been disabled by Linux because nobody cared.

But what would interest you more is the invalid opcode in:
[ 100.695668] ------------[ cut here ]------------
[ 100.696010] kernel BUG at /tmp/nix-build-xppiw80q18ir8r0g57am3hkl8221cg13-aufs2-20100522.drv-0/git-export/fs/aufs/super.h:318!
[ 100.696010] invalid opcode: 0000 [#2] SMP
[ 100.696010] last sysfs file: /sys/devices/platform/iTCO_wdt/uevent
[ 100.696010] Modules linked in: intel_agp rtc_cmos iTCO_wdt processor thermal rtc_core agpgart led_class thermal_sys asus_atk0110 floppy button i2c_i801 cirrusfb(+) rtc_lib hwmon mac_hid rng_core i2c_core evdev pcspkr sg ipv6 aufs squashfs isofs sd_mod crc_t10dif sr_mod cdrom ehci_hcd uhci_hcd ata_piix libata scsi_mod usbcore nls_base cfq_iosched blk_cgroup dm_mod loop scsi_wait_scan unix
[ 100.696010]
[ 100.696010] Pid: 1460, comm: dbus-daemon Tainted: G D 2.6.35.14 #1 P5GZ-MX/P5GZ-MX
[ 100.696010] EIP: 0060:[] EFLAGS: 00010286 CPU: 0
[ 100.696010] EIP is at au_do_flush+0x1d4/0x1e0 [aufs]
[ 100.696010] EAX: 000005b3 EBX: f74f6400 ECX: ffffffff EDX: f769e000
[ 100.696010] ESI: f74f6a00 EDI: f73d1b00 EBP: f6399940 ESP: f6399920
[ 100.696010] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 100.696010] Process dbus-daemon (pid: 1460, ti=f6398000 task=f70ce180 task.ti=f6398000)
[ 100.696010] Stack:

To me, this looks a bit like the aufs driver was compiled with an opcode that is invalid for my CPU!?
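
A note on reading this log: on x86, the kernel's BUG() macro is implemented with the deliberately invalid ud2 instruction, so an "invalid opcode" line right after "kernel BUG at" normally points to a failed assertion at the named source line rather than to a miscompiled module. A minimal sketch for pulling those file and line locations out of a saved log follows; the function name and the regular expression are illustrative and only assume the "kernel BUG at <path>:<line>!" format seen above.

```python
import re

# Matches lines such as: kernel BUG at /some/path/file.h:318!
BUG_LINE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")


def find_bug_sites(log_text):
    """Return a list of (source_path, line_number) for each BUG report."""
    return [
        (m.group("path"), int(m.group("line")))
        for m in BUG_LINE.finditer(log_text)
    ]
```

Running it over one of the attached logs would list every assertion site that fired, which makes it easier to see whether the same check fails each time.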


On 20 October 2011 at 22:04 Paul Dufresne commented:

I guess the next step will be to boot with irqpoll.
I have also seen some nohz or nohztick parameter somewhere.
I need to find the correct syntax again and try it.


On 20 October 2011 at 22:09 Paul Dufresne commented:

Just a note about the 2 bugs that I opened at Squashfs:
http://sourceforge.net/tracker/?func=detail&aid=3424900&group_id=63835&atid=505341 which was rejected, but with a nice comment

http://sourceforge.net/tracker/?func=detail&aid=3426135&group_id=63835&atid=505341 which has a log that is not in this report … yet.


On 20 October 2011 at 22:40 Paul Dufresne commented:

Found what I was searching for: nohz=off

http://fedoraproject.org/wiki/Common_kernel_problems says:
Given it’s new and still seeing quite a few changes, nohz=off and/or highres=off may be worth testing. (Though this is kernel 2.6.21 and above only)

Also explained at:
http://lwn.net/Articles/455044/

I have not tested it yet.
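
When testing parameters such as nohz=off, highres=off, irqpoll or oops=panic, it helps to confirm that the running kernel actually received them. On Linux the boot command line is exposed in /proc/cmdline. The following is a minimal sketch of parsing it; the function name is illustrative, and it assumes simple space-separated parameters without quoted values.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dictionary.

    Flags without '=' (for example 'irqpoll') map to None.
    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params
```

After booting, parse_cmdline(open("/proc/cmdline").read()).get("nohz") should return "off" if the parameter was passed as intended.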
