Showing posts with label kernel. Show all posts

16 May 2013

417. Briefly: Patching kernel 3.9 with the CK patchset: 3.9-ck-1

Nothing strange here -- basically the same as http://verahill.blogspot.com.au/2013/04/395-ck-kernel-on-debian-and-patching.html but with updated links.

I haven't found a good, succinct description of what the -ck patch set does that I could link to here, but here's what the Arch -ck page says:
"..many Archers elect to use this package for the BFS' excellent desktop interactivity and responsiveness under any load situation. Additionally, the bfs imparts performance gains beyond interactivity"

I don't know if there are objective benchmarks that one can use to demonstrate an improvement in 'responsiveness and interactivity'. Subjectively, however, I feel that there's a slight improvement. You decide for yourself.

Begin here

sudo apt-get install kernel-package fakeroot build-essential ncurses-dev
mkdir ~/tmp
cd ~/tmp
wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.9.2.tar.xz
tar xvf linux-3.9.2.tar.xz
cd linux-3.9.2/
wget http://ck.kolivas.org/patches/3.0/3.9/3.9-ck1/patch-3.9-ck1.bz2
bunzip2 patch-3.9-ck1.bz2
patch -p1 < patch-3.9-ck1
patching file arch/powerpc/platforms/cell/spufs/sched.c
patching file Documentation/scheduler/sched-BFS.txt
patching file Documentation/sysctl/kernel.txt
patching file fs/proc/base.c
patching file include/linux/init_task.h
patching file include/linux/ioprio.h
patching file include/linux/sched.h
Hunk #6 succeeded at 2738 (offset -10 lines).
patching file init/Kconfig
patching file init/main.c
patching file kernel/delayacct.c
patching file kernel/exit.c
patching file kernel/posix-cpu-timers.c
patching file kernel/sysctl.c
patching file lib/Kconfig.debug
patching file include/linux/jiffies.h
patching file drivers/cpufreq/cpufreq.c
patching file drivers/cpufreq/cpufreq_ondemand.c
patching file drivers/cpufreq/cpufreq_conservative.c
patching file kernel/sched/bfs.c
patching file kernel/sched/Makefile
patching file include/uapi/linux/sched.h
patching file include/linux/sched/rt.h
patching file kernel/stop_machine.c
patching file include/linux/swap.h
patching file mm/memory.c
patching file mm/swapfile.c
patching file mm/vmscan.c
patching file arch/x86/Kconfig
patching file kernel/Kconfig.hz
patching file kernel/Kconfig.preempt
patching file Makefile
make-kpkg clean
cat /boot/config-`uname -r`>.config
make oldconfig

You might now be asked a long series of questions about how the kernel should be configured (or you might not be -- depending on what kernel version you're currently running). In MOST cases you can select the default option (i.e. hit enter) but you should still read each question and consider it. Making a mistake won't break your computer, so don't be scared.
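
Once oldconfig has finished, it can be worth confirming that the BFS scheduler actually ended up enabled. The following is only a minimal sketch: it assumes the -ck patch exposes the option as CONFIG_SCHED_BFS (check your own .config to be sure), and bfs_enabled is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: succeed if the given kernel config enables BFS.
# Assumes the option is called CONFIG_SCHED_BFS -- verify in your own .config.
bfs_enabled() {
    grep -q '^CONFIG_SCHED_BFS=y$' "${1:-.config}"
}

# Usage, from the top of the patched source tree:
#   bfs_enabled .config && echo "BFS is enabled"
```

If the check fails, running make menuconfig and looking for the scheduler option is probably the quickest way to see what happened.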

Next, start the compilation (will take a while):

time fakeroot make-kpkg -j4 --initrd kernel_image kernel_headers
sudo dpkg -i ../linux-*3.9*-ck*.deb

where 4 is the number of cores on your machine (note: it only has to do with compiling -- you can use the compiled binaries on any number of cores).
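
If you'd rather not hard-code the number of cores, something like the following might do. It's a sketch, assuming nproc (coreutils) or getconf is available; build_jobs is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of online cores for use with -j.
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Usage:
#   time fakeroot make-kpkg -j"$(build_jobs)" --initrd kernel_image kernel_headers
```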

Anyway, that's all -- you've now patched, compiled and installed a new kernel. And it didn't even hurt.

29 April 2013

401. AMD FX 8150:issues building kernel -- random failures.

Update 4:  I found the receipts for one pair of sticks and took it to MSY in Melbourne -- they were replaced on the spot without any questions asked. Very happy.

Update 3: The errors were all due to 3 bad ram sticks. Using the only good stick everything works fine. That's 24 Gb of bad ram...this won't be cheap if I can't find the receipts...

Update 2:
Running memtest86 I caught lots of errors (51 in 50 minutes) before I killed the test. I'm currently testing each stick one by one. I'm hoping that what seem to be RAM errors are actually caused by inappropriate BIOS settings, because 32 Gb of RAM is not cheap to replace...

While I'm swapping RAM sticks I'm also testing a separate set of sticks on a different box. If they are error free it will be interesting to see if they trigger errors on the troublesome node. I'm still hoping that the BIOS is the culprit...

So far three out of four tested sticks have shown errors -- they all happen during test #6. The fourth stick has passed all tests seven times.

Update 1: dmesg also shows the same message as the OP here sees: https://bugzilla.redhat.com/show_bug.cgi?id=909702

The OP puts it down to a misconfigured bios, so the quest continues.
Searching for 990FX and FX8150 I get a number of hits:

Here's a newegg review for 990FX:

 I purchased This MB to run with the AMD FX 8150. I have built computers from high end to low end and know the ones in the middle last the longest and are the most stable.
[..]
At this point the fun of the build is gone, and I have too many hours dealing with problems. 
And that's not the only negative FX8?50 + 990FX review.

The worst part of it is that I've been thinking about building another, identical node (good value for money) as well as recommending my build to a student who is about to do calcs.

Mind you, I've only ever had issues when it comes to compiling the kernel -- it's been solid when it comes to running calculations.

Original post:


NOTE: this is NOT a solution. Just observations.

My AMD FX 8150 is a great CPU -- it makes up the heart of the fastest of my computational nodes, and is eminently affordable. It does, however, cause me grief in one respect -- I can't compile the linux kernel.

The system
The box that's causing me trouble has
* AMD FX 8150 cpu
* gigabyte 990FXA-D3 motherboard
* nvidia GeForce 210 video card
* Corsair GS 800 PSU
* 4x8 Gb patriot viper PV316G186C0K RAM
While not top of the range, the components should be of reasonable quality.

In terms of software and OS, it's an up-to-date wheezy install (gcc 4.7), running kernel 3.7.2 (compiled on a different machine).

Compiling the kernel
I'm compiling the kernel as shown here: http://verahill.blogspot.com.au/2013/02/342-compiling-kernel-38-on-debian.html

The errors are shown at the end of the post

The fact that the errors keep changing might also point towards a hardware fault with my particular CPU, rather than with the FX 8150 in general.

3.8 built fine twice, and crashed the third time. 3.8.10 crashed twice, then built fine the third time.
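
To put a number on "sometimes it builds, sometimes it doesn't", one could run the same command several times and count the failures. This is just a sketch; count_failures is a made-up helper, and the build command you hand it is up to you.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# $1 = number of runs, remaining arguments = the command to run.
count_failures() {
    runs=$1; shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Usage (each run is a full kernel build, so this takes hours):
#   count_failures 3 sh -c 'make-kpkg clean && fakeroot make-kpkg -j8 --initrd kernel_image'
```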

It all sounds like I'm having hardware issues...but they only seem to be triggered during kernel builds. During 'normal' use (i.e. using 100% cpu for weeks at a time) it is perfectly stable. Compiling e.g. nwchem (another pretty heavy compile) also goes absolutely fine.

Troubleshooting something like this also wouldn't be easy. See the end of the post for a list of the various errors that I was getting during compilation of different kernel versions.
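
When comparing several failed builds, it helps to pull out just the object that broke each time. A minimal sketch, assuming you saved the build output to a log file (e.g. with tee); first_failed_target is a hypothetical helper name.

```shell
# Hypothetical helper: print the first target that a sub-make reported as failed.
# Matches lines such as: make[3]: *** [drivers/base/dd.o] Error 1
first_failed_target() {
    sed -n 's/^make\[[0-9]*\]: \*\*\* \[\(.*\)\] Error [0-9]*$/\1/p' "$1" | head -n 1
}

# Usage:
#   fakeroot make-kpkg -j8 --initrd kernel_image 2>&1 | tee build.log
#   first_failed_target build.log
```

If the failing file differs from run to run, as it did for me, that points away from the source code and towards the hardware.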

Anyway, I hit google...



BIOS
That Windows has issues with 8150 might seem unrelated, but it appeared that my errors could be solved by a bios update to my 990 fxa-d3 mobo:
 http://scalibq.wordpress.com/2011/10/19/amd-bulldozer-can-it-get-even-worse/
"The actual reported error is quite random, it just depends on where the CPU fails first. So you generally get a different error code with every BSOD."
and
"AMD’s KB article focuses solely on some boards with the 990FX chipset."
Well, I do have a 990FXA-D3 gigabyte motherboard.

My bios is shown by lshw as
*-firmware
          description: BIOS
          vendor: Award Software International, Inc.
          physical id: 0
          version: F7
          date: 05/30/2012
          size: 128KiB
          capacity: 4032KiB
So the obvious solution was to flash the bios.
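
If you want to check the BIOS version before and after flashing without reading through the whole lshw output, a small filter like this may help. It's a sketch that assumes the output has the same layout as the excerpt above; bios_version is a made-up helper name.

```shell
# Hypothetical helper: read lshw output on stdin and print the
# version field of the *-firmware section.
bios_version() {
    awk '/\*-firmware/ { in_fw = 1; next }
         /\*-/         { in_fw = 0 }
         in_fw && $1 == "version:" { print $2; exit }'
}

# Usage:
#   sudo lshw | bios_version
```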

Turns out, flashing the BIOS is a headache on Gigabyte motherboards (not buying anything from them again). What happened to simply burning a CD and booting with it in the drive?

Flashing the bios

I downloaded the bios (version F8): http://download.gigabyte.eu/FileList/BIOS/mb_bios_ga-990fxa-d3_f8.exe.

I unzipped it with 7z, giving me 990FXAD3.F8 -- I then put that file in the root of a USB stick.

I've tried with a number of USB sticks, including a blank stick formatted with W95 Fat32 and keeping the stick plugged in before rebooting.

In Q-flash, I always ended up with a prompt saying Floppy A <Drive>, and when I hit enter it says '..    <dir>'. 0 Files found. Yet it also said Total size 7.48G, Free Size: 7.44 G, which matched the size of the USB stick.

Finally I managed to get it to work:
*  in fdisk I only created a 1 gb partition on the USB stick, set type (t) to 6 (Fat16), made it bootable, and wrote changes to disk.
* I then ran mkdosfs -F 16 /dev/sdb1 (my usb stick was /dev/sdb).
*  I then copied the 990FXAD3.F8 file to the usb stick root (after mounting it of course) and THAT worked.
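
The steps after the fdisk part can be sketched as below. To be safe the helper only prints the commands instead of running them, since mkdosfs will happily wipe the wrong disk. The device name and mount point are assumptions (my stick was /dev/sdb; yours may not be), and qflash_stick_commands is a made-up name.

```shell
# Hypothetical helper: print (not run) the commands for preparing the stick.
# $1 = partition (e.g. /dev/sdb1), $2 = BIOS file, $3 = mount point.
# The partition itself still has to be created by hand in fdisk
# (1 GB, type 6, bootable).
qflash_stick_commands() {
    part=$1; bios=$2; mnt=${3:-/mnt}
    echo "mkdosfs -F 16 $part"
    echo "mount $part $mnt"
    echo "cp $bios $mnt/"
    echo "umount $mnt"
}

# Review the output before running any of it as root:
#   qflash_stick_commands /dev/sdb1 990FXAD3.F8
```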

Memtest86
Because RAM has traditionally been a major culprit behind hardware errors (especially the random, difficult-to-diagnose type) it's always a good idea to run a memtest. To do that, install memtest86+ (sudo apt-get install memtest86+) and reboot. There should be a new menu item (scroll down) in grub. Memtest takes quite a while, especially if you have a lot of RAM (32 Gb...).

Lo and behold, there are errors:
Tst  Pass  Failing Address              Good        Bad        Err-Bits  Count Chan
------------------------------------------------------------------------------------
6     0     0007383b4f4  -  1848.2MB   fffffbff     ffffffff   00000400    1
6     0     00039c1f294  -   924.1MB   fffffbff     ffffffff   00000004    2
6     0     00120203034  -  4610.0MB   00000004     00000000   00000004    3
6     0     001ca16c464  -  7329.4MB   00020004     00000000   00020000    4
[..]

I counted 51 errors before killing the test (time to identify the bad stick). Many of these occurred in a more limited address space than those shown above. Sigh...the RAM was the most expensive part of this build...

According to this there's a slight chance that the RAM might be ok, but it's still not a good sign.

I've tested each stick by itself -- so far 3 out of 4 sticks have yielded errors during test 6. I did seven passes on the fourth stick and no errors.

The outcome
However, even with the new bios the kernel compiles still fail -- it takes longer for it to fail, but it fails.

I do see the odd thing in dmesg though:
[ 4260.342268] as[29370]: segfault at 4541b5e ip 0000000000410306 sp 00007fff40ec4420 error 4 in as[400000+51000]
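
To see which programs are segfaulting and how often, the dmesg output can be summarised. This is a sketch that assumes the lines look like the one above (timestamp, program name, pid, "segfault at"); segfaults_by_program is a hypothetical helper name.

```shell
# Hypothetical helper: read dmesg output on stdin and print
# "program count" for every program that logged a segfault.
segfaults_by_program() {
    sed -n 's/^\[[^]]*\] *\([^[ ]*\)\[[0-9]*\]: segfault at .*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   dmesg | segfaults_by_program
```

Segfaults spread across unrelated programs (as, cc1, genksyms) are another hint that the hardware rather than any single program is at fault.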

So either FX 8150 is still not properly supported by the BIOS, or I've bought a lemon.

The question remains: why do I only see failure during kernel compiles and no other conditions?

After bios flash:

Kernel 3.9-rc8
  CC      drivers/base/dd.o
In file included from /home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/processor.h:23:0,
                 from /home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/atomic.h:6,
                 from include/linux/atomic.h:4,
                 from include/linux/sysfs.h:20,
                 from include/linux/kobject.h:21,
                 from include/linux/device.h:17,
                 from drivers/base/dd.c:20:
/home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/special_insns.h: In function 'native_read_cr0':
/home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/special_insns.h:24:2: internal compiler error: in build_int_cst_wide, at tree.c:1238
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [drivers/base/dd.o] Error 1
make[2]: *** [drivers/base] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.9-rc8'
make: *** [debian/stamp/build/kernel] Error 2
Kernel 3.8.10
  UPD     include/generated/compile.h
  CC      init/version.o
  LD      init/built-in.o
ipc/built-in.o:(.debug_info+0x1ed81): undefined reference to `.LASF108'
make[1]: *** [vmlinux] Error 1
make[1]: Leaving directory `/home/me/tmp/linux-3.8.10'
make: *** [debian/stamp/build/kernel] Error 2
Kernel 3.7.6
  CC [M]  fs/gfs2/super.o
  CC [M]  fs/gfs2/sys.o
In file included from /home/me/tmp/linux-3.7.6/arch/x86/include/asm/smp.h:13:0,
                 from include/linux/smp.h:38,
                 from include/linux/sched.h:30,
                 from fs/gfs2/sys.c:10:
/home/me/tmp/linux-3.7.6/arch/x86/include/asm/apic.h:394:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/gfs2/sys.o] Error 1
make[2]: *** [fs/gfs2] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.6'
make: *** [debian/stamp/build/kernel] Error 2
Kernel 3.5
  CC [M]  drivers/scsi/lpfc/lpfc_els.o
  CC [M]  drivers/scsi/lpfc/lpfc_hbadisc.o
  CC [M]  drivers/scsi/lpfc/lpfc_init.o
In file included from /home/me/tmp/linux-3.5/arch/x86/include/asm/msr.h:139:0,
                 from /home/me/tmp/linux-3.5/arch/x86/include/asm/processor.h:20,
                 from /home/me/tmp/linux-3.5/arch/x86/include/asm/thread_info.h:22,
                 from include/linux/thread_info.h:54,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:8,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:57,
                 from include/linux/blkdev.h:4,
                 from drivers/scsi/lpfc/lpfc_init.c:22:
/home/me/tmp/linux-3.5/arch/x86/include/asm/paravirt.h: In function 'store_gdt':
/home/me/tmp/linux-3.5/arch/x86/include/asm/paravirt.h:304:2: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[4]: *** [drivers/scsi/lpfc/lpfc_init.o] Error 1
make[3]: *** [drivers/scsi/lpfc] Error 2
make[2]: *** [drivers/scsi] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.5'
make: *** [debian/stamp/build/kernel] Error 2

Kernel 3.4.42
Second crash:
  CC [M]  fs/coda/psdev.o
  CC [M]  fs/coda/cache.o
In file included from include/linux/mm.h:256:0,
                 from fs/coda/coda_linux.h:17,
                 from fs/coda/cache.c:24:
include/linux/page-flags.h:232:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/coda/cache.o] Error 1
make[2]: *** [fs/coda] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.4.42'
make: *** [debian/stamp/build/kernel] Error 2
First crash:
  CC      kernel/signal.o
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[2]: *** [kernel/signal.o] Error 4
make[1]: *** [kernel] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.4.42'
make: *** [debian/stamp/build/kernel] Error 2


Kernels that won't build and the errors -- before bios flash:
3.9-rc8
  CC [M]  fs/nfs/nfs4client.o
  CC [M]  fs/nfs/nfs4sysctl.o
  CC [M]  fs/nfs/nfs4session.o
  CC [M]  fs/nfs/pnfs.o
fs/nfs/pnfs.c: In function 'read_seqcount_retry':
fs/nfs/pnfs.c:1951:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/nfs/pnfs.o] Error 1
make[2]: *** [fs/nfs] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.9-rc8'
make: *** [debian/stamp/build/kernel] Error 2
3.8.10
  CC [M]  fs/nfs/inode.o
In file included from include/net/scm.h:6:0,
                 from include/linux/netlink.h:8,
                 from /home/me/tmp/linux-3.8.10/include/uapi/linux/neighbour.h:5,
                 from include/linux/netdevice.h:51,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:59,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:26,
                 from fs/nfs/inode.c:26:
include/linux/security.h:2581:1: internal compiler error: Segmentation fault
Please submit a full bug report,
3.8.6
  CC [M]  drivers/hid/hid-lg.o
  CC [M]  drivers/hid/hid-lgff.o
  CC [M]  drivers/hid/hid-lg2ff.o
  CC [M]  drivers/hid/hid-lg3ff.o
  CC [M]  drivers/hid/hid-lg4ff.o
  CC [M]  drivers/hid/hid-picolcd_core.o
  CC [M]  drivers/hid/hid-picolcd_fb.o
  CC [M]  drivers/hid/hid-picolcd_backlight.o
drivers/hid/hid-picolcd_backlight.c:120:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [drivers/hid/hid-picolcd_backlight.o] Error 1
make[2]: *** [drivers/hid] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.8.6'
make: *** [debian/stamp/build/kernel] Error 2
3.8
  CC      mm/dmapool.o
  CC      mm/hugetlb.o
/bin/sh: line 1: 25153 Done(2)                 gcc -E -D__GENKSYMS__ -Wp,-MD,mm/.hugetlb.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.7/include -I/home/me/tmp/linux-3.8/arch/x86/include -Iarch/x86/include/generated -Iinclude -I/home/me/tmp/linux-3.8/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/me/tmp/linux-3.8/include/uapi -Iinclude/generated/uapi -include /home/me/tmp/linux-3.8/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(hugetlb)" -D"KBUILD_MODNAME=KBUILD_STR(hugetlb)" mm/hugetlb.c
     25154 Segmentation fault      | scripts/genksyms/genksyms -a x86_64 -r /dev/null > mm/.tmp_hugetlb.ver
make[2]: *** [mm/hugetlb.o] Error 139
make[1]: *** [mm] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.8'
make: *** [debian/stamp/build/kernel] Error 2
3.7.6

The errors differ each time:

Second run:
  CC [M]  fs/ext2/namei.o
  CC [M]  fs/ext2/super.o
fs/ext2/super.c: In function 'ext2_fill_super':
fs/ext2/super.c:762:12: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/ext2/super.o] Error 1
make[2]: *** [fs/ext2] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.6'
make: *** [debian/stamp/build/kernel] Error 2
First run:
  CC      drivers/base/power/main.o
/bin/sh: line 1: 12317 Done                    gcc -E -D__GENKSYMS__ -Wp,-MD,drivers/base/power/.main.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.7/include -I/home/me/tmp/linux-3.7.6/arch/x86/include -Iarch/x86/include/generated -Iinclude -I/home/me/tmp/linux-3.7.6/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/me/tmp/linux-3.7.6/include/uapi -Iinclude/generated/uapi -include /home/me/tmp/linux-3.7.6/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(main)" -D"KBUILD_MODNAME=KBUILD_STR(main)" drivers/base/power/main.c
     12318 Segmentation fault      | scripts/genksyms/genksyms -a x86_64 -r /dev/null > drivers/base/power/.tmp_main.ver
make[4]: *** [drivers/base/power/main.o] Error 139
make[3]: *** [drivers/base/power] Error 2
make[2]: *** [drivers/base] Error 2
make[1]: *** [drivers] Error 2
3.7.2
The errors differ every time:

Second run:
  CC [M]  drivers/hwmon/tmp102.o
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x2b2bb07f6d76]
/lib/x86_64-linux-gnu/libc.so.6(+0x7a658)[0x2b2bb07fa658]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x70)[0x2b2bb07fbb90]
scripts/genksyms/genksyms[0x4075fa]
scripts/genksyms/genksyms[0x4037c0]
scripts/genksyms/genksyms[0x402de6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x2b2bb079eead]
scripts/genksyms/genksyms[0x400f59]
======= Memory map: ========
00400000-0040e000 r-xp 00000000 08:05 36440720 /home/me/tmp/linux-3.7.2/scripts/genksyms/genksyms
0060d000-0060e000 rw-p 0000d000 08:05 36440720 /home/me/tmp/linux-3.7.2/scripts/genksyms/genksyms
0060e000-00616000 rw-p 00000000 00:00 0
0089b000-00c58000 rw-p 00000000 00:00 0 [heap]
2b2bb0330000-2b2bb0350000 r-xp 00000000 08:01 11802457 /lib/x86_64-linux-gnu/ld-2.13.so
2b2bb0350000-2b2bb0352000 rw-p 00000000 00:00 0
2b2bb054f000-2b2bb0550000 r--p 0001f000 08:01 11802457 /lib/x86_64-linux-gnu/ld-2.13.so
2b2bb0550000-2b2bb0551000 rw-p 00020000 08:01 11802457 /lib/x86_64-linux-gnu/ld-2.13.so
2b2bb0551000-2b2bb0552000 rw-p 00000000 00:00 0
2b2bb0558000-2b2bb0561000 r-xp 00000000 08:01 2233201 /usr/lib/x86_64-linux-gnu/libfakeroot/libfakeroot-sysv.so
2b2bb0561000-2b2bb0761000 ---p 00009000 08:01 2233201 /usr/lib/x86_64-linux-gnu/libfakeroot/libfakeroot-sysv.so
2b2bb0761000-2b2bb0762000 rw-p 00009000 08:01 2233201 /usr/lib/x86_64-linux-gnu/libfakeroot/libfakeroot-sysv.so
2b2bb0762000-2b2bb0763000 rw-p 00000000 00:00 0
2b2bb077c000-2b2bb077d000 rw-p 00000000 00:00 0
2b2bb0780000-2b2bb0900000 r-xp 00000000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0900000-2b2bb0b00000 ---p 00180000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0b00000-2b2bb0b04000 r--p 00180000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0b04000-2b2bb0b05000 rw-p 00184000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0b05000-2b2bb0b0a000 rw-p 00000000 00:00 0
2b2bb0b10000-2b2bb0b12000 r-xp 00000000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0b12000-2b2bb0d12000 ---p 00002000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0d12000-2b2bb0d13000 r--p 00002000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0d13000-2b2bb0d14000 rw-p 00003000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0d14000-2b2bb0d16000 rw-p 00000000 00:00 0
2b2bb0d30000-2b2bb0d45000 r-xp 00000000 08:01 11796731 /lib/x86_64-linux-gnu/libgcc_s.so.1
2b2bb0d45000-2b2bb0f45000 ---p 00015000 08:01 11796731 /lib/x86_64-linux-gnu/libgcc_s.so.1
2b2bb0f45000-2b2bb0f46000 rw-p 00015000 08:01 11796731 /lib/x86_64-linux-gnu/libgcc_s.so.1
2b2bb4000000-2b2bb4021000 rw-p 00000000 00:00 0
2b2bb4021000-2b2bb8000000 ---p 00000000 00:00 0
7fff5cd00000-7fff5cd23000 rw-p 00000000 00:00 0 [stack]
7fff5cdd8000-7fff5cdd9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
/bin/sh: line 1: 18863 Done(2)                 gcc -E -D__GENKSYMS__ -Wp,-MD,drivers/acpi/.video.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.7/include -I/home/me/tmp/linux-3.7.2/arch/x86/include -Iarch/x86/include/generated -Iinclude -I/home/me/tmp/linux-3.7.2/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/me/tmp/linux-3.7.2/include/uapi -Iinclude/generated/uapi -include /home/me/tmp/linux-3.7.2/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -Os -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(video)" -D"KBUILD_MODNAME=KBUILD_STR(video)" drivers/acpi/video.c
     18864 Aborted                 | scripts/genksyms/genksyms -a x86_64 -r /dev/null > drivers/acpi/.tmp_video.ver
make[3]: *** [drivers/acpi/video.o] Error 134
make[2]: *** [drivers/acpi] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.2'
make: *** [debian/stamp/build/kernel] Error 2

First run:
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CALL    scripts/checksyscalls.sh
  Building modules, stage 2.
  MODPOST 2369 modules
ERROR: "ieee80211_get_hdrlen" [drivers/staging/rtl8192u/r8192u_usb.ko] undefined!
ERROR: "ieee80211_is_empty_essid" [drivers/staging/rtl8192u/r8192u_usb.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.2'
make: *** [debian/stamp/build/kernel] Error 2
3.6.3
  LD [M]  drivers/input/misc/pcf50633-input.ko
  CC      drivers/input/misc/pcspkr.mod.o
In file included from drivers/input/misc/pcspkr.mod.c:1:0:
include/linux/module.h:299:9: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [drivers/input/misc/pcspkr.mod.o] Error 1
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.6.3'
make: *** [debian/stamp/build/kernel] Error 2
3.5.0
The errors keep changing.

Second run:
  CC [M]  drivers/gpu/drm/via/via_map.o
  CC [M]  drivers/gpu/drm/via/via_mm.o
  CC [M]  drivers/gpu/drm/via/via_dma.o
drivers/gpu/drm/via/via_dma.c:741:21: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[5]: *** [drivers/gpu/drm/via/via_dma.o] Error 1
make[4]: *** [drivers/gpu/drm/via] Error 2
make[3]: *** [drivers/gpu/drm] Error 2
make[2]: *** [drivers/gpu] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.5'
make: *** [debian/stamp/build/kernel] Error 2
First run:
  CC      drivers/hid/hid-sony.mod.o
drivers/hid/hid-sony.mod.c:46:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [drivers/hid/hid-sony.mod.o] Error 1
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.5'
make: *** [debian/stamp/build/kernel] Error 2
3.4.42
  CC [M]  fs/quota/quota_tree.o
  CC [M]  fs/reiserfs/bitmap.o
fs/reiserfs/bitmap.c: In function 'scan_bitmap_block.constprop.9':
fs/reiserfs/bitmap.c:236:9: warning: 'next' may be used uninitialized in this function [-Wmaybe-uninitialized]
  CC [M]  fs/reiserfs/do_balan.o
  CC [M]  fs/reiserfs/namei.o
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[3]: *** [fs/reiserfs/namei.o] Error 4
make[2]: *** [fs/reiserfs] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.4.42'
make: *** [debian/stamp/build/kernel] Error 2


Another dmesg error:

04 April 2013

375. Bisecting the kernel -- looking for the commit behind the kworker/i915 issue.

This is an attempt to trace this issue: http://verahill.blogspot.com.au/2013/03/368-slow-mouse-and-keyboard-triggered.html.

See also
http://forums.gentoo.org/viewtopic-p-7278760.html
https://bbs.archlinux.org/viewtopic.php?pid=1248190

Update: I'm revising this post. Not quite done yet.

Another Update: Someone else has just bisected and ended up with a different commit:
https://bbs.archlinux.org/viewtopic.php?pid=1254285#p1254285
The patch they found makes a lot more sense than any of the ones I ended up with.

However, I do still see issues with the kernel I end up with, so I'm either starting from the wrong 'good', or there are several bad commits.

A third update: I just experienced slow-down with 1k interrupts per second on 3.7.10, which may indicate that I picked a bad starting point for bisecting. I'll leave the post up since it can serve as a guide for how to bisect in general.

A fourth update:
A fix is on the way to kernel 3.9 or 3.10.
https://patchwork.kernel.org/patch/2400621/
https://patchwork.kernel.org/patch/2402211/


Original post:

I didn't really want to do this, for several reasons:

* the bug only manifests itself on one of my computers, with intel graphics -- and that's a laptop (I'm not a huge fan of laptops, especially not with budding carpal tunnel syndrome, so testing the bisected version is a bit of a pain -- literally)

* the bug isn't consistently triggered, so you need to test the new kernel for an hour or more, and only by the absence of a specific behaviour can you see whether all is good i.e. testing is a bit of hit and miss.

* compiling the kernel takes a long time, and I don't currently have a suitable computer for it at home where my laptop (the one with the i915) is, so I'll build the kernel on a work computer, then install it at home, then repeat each day.

On the other hand, it's a learning experience, so here we go.

I'm looking at https://www.kernel.org/pub/software/scm/git/docs/git-bisect.html

I'm compiling the kernel debian style since it generates .deb files which are easy to install on other machines. Again, there are posts on how to compile the kernel using a more generic approach, or specifically for Arch, elsewhere on this blog.

NOTE: there's no sure-fire way of triggering this issue, but it seems to occur more frequently when the fan is on but approaching the thermal cut-off point where it will turn off. This makes troubleshooting somewhat more difficult.


1. Download/checkout
First get the sources -- it pulls everything from 2.6.12, so it's bigger than a normal kernel source download. I'll skip explicitly telling you what packages you need to have in order to compile the kernel or use git, since if you don't know, you're probably not really ready for this anyway. Besides, that information is available in other posts on this blog.

mkdir -p ~/tmp/kernel_bisect
cd ~/tmp/kernel_bisect/
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git
Cloning into 'linux-git'...
remote: Counting objects: 2942099, done.
remote: Compressing objects: 100% (443946/443946), done.
remote: Total 2942099 (delta 2471580), reused 2940624 (delta 2470325)
Receiving objects: 100% (2942099/2942099), 608.66 MiB | 4.58 MiB/s, done.
Resolving deltas: 100% (2471580/2471580), done.
Checking out files: 100% (42425/42425), done.
cd linux-git/


2A. Bisect
I know that 3.8.5 is bad and that 3.7.10 may be good (seem to remember having issues with the Arch version of 3.7.10, but no issues with my own build on debian. Not sure I can trust my memory).

I know for a fact that 3.2 is good. I don't want to bisect everything from 3.8.5 to 3.2 though, so I'll take a leap of faith and presume that 3.7.10 is good (which I'll regret if that's not true) and 3.8.0 is bad.

I know that the issue is present on amd64.

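I think it's an issue to do with i915, so I'm restricting the bisect to commits that touch that driver. (Note that the kernel tree has no arch/amd64 directory -- 64-bit x86 code lives under arch/x86 -- so the arch/amd64 path in the command below matches nothing, and only the i915 path actually narrows the search.)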

git bisect start -- arch/amd64 drivers/gpu/drm/i915
git bisect bad v3.8
git bisect good v3.7
Bisecting: 160 revisions left to test after this (roughly 7 steps)
[28d491df4c6b00f9148a9885dba1f36a078535dc] drm/i915: Bad pixel formats can't reach the sprite code

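As a rough sanity check on the "roughly 7 steps" estimate: every round discards about half of the remaining candidates, so the number of rounds grows with the base-2 logarithm of the revision count. The sketch below only counts halvings; it is an approximation and not git's own formula, and the name halving_steps is made up for illustration.

```shell
# halving_steps N: count how many times N can be halved (integer division)
# before a single candidate remains. Hypothetical helper, not part of git.
halving_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$((n / 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

halving_steps 160   # the revision count reported by git bisect above
```

With 160 revisions this prints 7, which at roughly 43 minutes per build is a good part of a working day spent compiling.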
2B. Compile
I'm being a bit lazy here as well, possibly at the cost of not being able to identify my issue: I'll use oldconfig and simply accept the default for every new option.
make oldconfig
time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers

where 3 in -j3 is the number of cores -- I have a triple core AMD (other posts on this blog discuss the proper value for -j in more detail). It took around 43 minutes to build.
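If you'd rather not hard-code the job count, something like the following could derive it from the machine. This is a minimal sketch: kpkg_cmd is a made-up helper name, and it assumes nproc (from coreutils) is installed. It only prints the command, so nothing gets built by accident.

```shell
# kpkg_cmd [JOBS]: print the make-kpkg command line for a given job count.
# JOBS defaults to the number of online cores as reported by nproc.
kpkg_cmd() {
    njobs=${1:-$(nproc)}
    echo "fakeroot make-kpkg -j${njobs} --initrd kernel_image kernel_headers"
}

kpkg_cmd 3   # explicit count, matching the triple-core example
```

Calling kpkg_cmd without an argument uses the core count of whatever machine it runs on.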

This will generate two .deb files in the parent folder -- one kernel image and one with headers.
Install these on your testing system.


sudo dpkg -i ../linux-image-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb ../linux-headers-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb

Since you installed .deb packages it's pretty easy to roll back any changes later.
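Rolling back means removing the package by name rather than by file name. A .deb file name follows the pattern package_version_arch.deb, so the package name is everything before the first underscore. A small sketch (deb_pkg_name is a hypothetical helper; the removal command is printed, not executed):

```shell
# deb_pkg_name FILE: derive the package name from a .deb file name, which
# follows the pattern <package>_<version>_<arch>.deb.
deb_pkg_name() {
    base=${1##*/}       # strip any leading directory
    echo "${base%%_*}"  # keep everything before the first underscore
}

# Print (rather than run) the removal command for the image installed above.
echo "sudo dpkg -r $(deb_pkg_name ../linux-image-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb)"
```

Make sure you've booted into a different kernel before removing the one you're testing.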

3. Testing
What happens now depends on whether the kernel is plagued by the issue you're trying to bisect for:

3A. The problem persists
You've now narrowed it down to half as many commits.
git bisect bad
make-kpkg clean
make oldconfig
time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers
sudo dpkg -i ../linux-image-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb ../linux-headers-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb


3B. The problem is not present
You've now narrowed it down to half as many commits.
git bisect good
make-kpkg clean
make oldconfig
time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers
sudo dpkg -i ../linux-image-3.7.0-rc4+_3.7.0-rc4+-10.00.Custom_amd64.deb ../linux-headers-3.7.0-rc4+_3.7.0-rc4+-10.00.Custom_amd64.deb

Repeat step 3 until you've isolated the commit that caused the issue.
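Steps 3A and 3B differ only in the verdict passed to git bisect, so the round can be wrapped in one function. This is a sketch under the assumptions of this post (make-kpkg, -j3, default answers in oldconfig); the names run, bisect_round and DRY_RUN are made up for illustration. With DRY_RUN set, the commands are only printed.

```shell
# run CMD...: execute a command, or just print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bisect_round good|bad: record the verdict for the kernel just tested,
# then rebuild the commit that git bisect checks out next.
bisect_round() {
    case "$1" in
        good|bad) ;;
        *) echo "usage: bisect_round good|bad" >&2; return 2 ;;
    esac
    run git bisect "$1" &&
    run make-kpkg clean &&
    run make oldconfig &&
    run fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers
}

# Dry run: only show what would be executed.
DRY_RUN=1
bisect_round bad
```

Unset DRY_RUN to run the round for real from inside the linux-git directory. The names of the resulting .deb files change with every round, so the dpkg -i step is left out here. Once the first bad commit has been reported, git bisect reset returns the tree to where you started.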

For me
* the first bisect didn't cause kworker slow downs (bisect good)
* the second bisect immediately led to slowdown (bisect bad)
* the third one was fine (bisect good)
* the fourth one had no issues (bisect good)
* the fifth one was good (bisect good)
* the sixth one was definitely bad (bisect bad)

powertop gives the following when it's slowing down
 0 mW    423.5 ms/s       4.5        kWork          i915_hotplug_work_func

and watch cat /proc/interrupts:
The number of interrupts increases very, very rapidly for '49:  PCI-MSI-edge i915' during mouse slowdown. Normal rate is around 70 per two seconds.
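To put a number on "very, very rapidly", the counters can be sampled twice and subtracted. A minimal sketch, assuming the device name (here i915) is the last field on its line in /proc/interrupts; irq_total and irq_rate are made-up helper names.

```shell
# irq_total NAME [FILE]: sum the per-CPU counters of every line in FILE
# (default /proc/interrupts) whose last field is NAME.
irq_total() {
    awk -v name="$1" '
        $NF == name {
            for (i = 2; i < NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
        }
        END { printf "%.0f\n", total }
    ' "${2:-/proc/interrupts}"
}

# irq_rate NAME [SECONDS]: interrupts raised for NAME during the interval
# (default two seconds, the same interval watch uses by default).
irq_rate() {
    before=$(irq_total "$1")
    sleep "${2:-2}"
    after=$(irq_total "$1")
    echo $((after - before))
}
```

Running irq_rate i915 on a quiet system should give something close to the normal rate mentioned above; a much larger number during a slowdown would support the i915 theory.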
This is where it gets less clear -- from what I can see:
* the seventh one was a little bit bad (bisect bad).
* the eighth one had minor issues when the fan went off (bisect bad).

But the first time I did the bisect I came to the exact opposite conclusion for the last two bisects. I'm suspecting that it's really down to two or more commits that together cause bad behavior, but on their own are merely annoying.

Anyway, I've ended up with this commit as the current culprit (again, the first time around I ended up with a different commit):
607a6f7a6621f65706ff536b2615ee65b5c2f575 is the first bad commit
commit 607a6f7a6621f65706ff536b2615ee65b5c2f575
Author: Daniel Vetter 
Date:   Wed Nov 14 17:47:39 2012 +0100

    drm/i915: drop buggy write to FDI_RX_CHICKEN register
    
    Jani Nikula noticed that the parentheses are wrong and we & the bit
    with the register address instead of the read-back value. He sent a
    patch to correct that.
    
    On second look, we write the same register in the previous line, and
    the w/a seems to be to set FDI_RX_PHASE_SYNC_POINTER_OVR to enable the
    logic, then keep always set FDI_RX_PHASE_SYNC_POINTER_OVR and toggle
    FDI_RX_PHASE_SYNC_POINTER_EN before/after enabling the pc transcoder.
    
    So the right things seems to be to simply kill the 2nd write.
    
    Cc: Jani Nikula 
    Reviewed-by: Chris Wilson 
    [danvet: Dropped a bogus ~ from the commit message that somehow crept
    in.]
    Signed-off-by: Daniel Vetter 

:040000 040000 f789c6c199c9db5c9d0d7961760574b5f0b1ede9 9e0cd2a09cab610b437164b1a74f620e43686ef1 M      drivers

It's just really difficult to reproduce the issue consistently with the last couple of kernels. I am in no way confident that the above commit is what's causing all this.

The last confirmed troublesome bisect (#6):

Here's the log:
git bisect start '--' 'arch/amd64' 'drivers/gpu/drm/i915'
# bad: [19f949f52599ba7c3f67a5897ac6be14bfcb1200] Linux 3.8
git bisect bad 19f949f52599ba7c3f67a5897ac6be14bfcb1200
# good: [29594404d7fe73cd80eaa4ee8c43dcc53970c60e] Linux 3.7
git bisect good 29594404d7fe73cd80eaa4ee8c43dcc53970c60e
# good: [28d491df4c6b00f9148a9885dba1f36a078535dc] drm/i915: Bad pixel formats can't reach the sprite code
git bisect good 28d491df4c6b00f9148a9885dba1f36a078535dc
# bad: [b4a98e57fc27854b5938fc8b08b68e5e68b91e1f] drm/i915: Flush outstanding unpin tasks before pageflipping
git bisect bad b4a98e57fc27854b5938fc8b08b68e5e68b91e1f
# good: [12f3382bc0262e981a2e58aca900cbbdbbe66825] drm/i915: implement WaDisablePSDDualDispatchEnable on IVB & VLV
git bisect good 12f3382bc0262e981a2e58aca900cbbdbbe66825
# good: [b9e0bda3cd325b55f336efb751736163f62abded] drm/i915: Always calculate 8xx WM values based on a 32-bpp framebuffer
git bisect good b9e0bda3cd325b55f336efb751736163f62abded
# good: [1c8b46fc8c865189f562c9ab163d63863759712f] drm/i915: Use LRI to update the semaphore registers
git bisect good 1c8b46fc8c865189f562c9ab163d63863759712f
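A log in this format comes from git bisect log, and saving it to a file makes it possible to redo the same decisions later with git bisect replay followed by the file name. Since my two attempts disagreed, a condensed view of the verdicts is handy for comparing them. The helper below is a sketch (bisect_verdicts is a made-up name) that prints one line per verdict from a saved log.

```shell
# bisect_verdicts [FILE]: list "verdict abbreviated-hash" for every
# "git bisect good|bad <hash>" line in a saved bisect log (default: stdin).
bisect_verdicts() {
    awk '$1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") {
        print $3, substr($4, 1, 12)
    }' "$@"
}
```

For example, git bisect log | bisect_verdicts gives a compact list that can be diffed against the list from a second attempt.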

and here are the remaining commits:


commit b4a98e57fc27854b5938fc8b08b68e5e68b91e1f
Author: Chris Wilson 
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping
    
    If we accumulate unpin tasks because we are pageflipping faster than the
    system can schedule its workers, we can effectively create a
    pin-leak. The solution taken here is to limit the number of unpin tasks
    we have per-crtc and to flush those outstanding tasks if we accumulate
    too many. This should prevent any jitter in the normal case, and also
    prevent the hang if we should run too fast.
    
    Note: It is important that we switch from the system workqueue to our
    own dev_priv->wq since all work items on that queue are guaranteed to
    only need the dev->struct_mutex and not any modeset resources. For
    otherwise if we have a work item ahead in the queue which needs the
    modeset lock (like the output detect work used by both polling or
    hpd), this work and so the unpin work will never execute since the
    pageflip code already holds that lock. Unfortunately there's no
    lockdep support for this scenario in the workqueue code.
    
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=46991
    Reported-and-tested-by: Tvrtko Ursulin 
    Signed-off-by: Chris Wilson 
    [danvet: Added note about workqueu deadlock.]
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56337
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_display.c |   22 ++++++++++++++++------
 drivers/gpu/drm/i915/intel_drv.h     |    4 +++-
 2 files changed, 19 insertions(+), 7 deletions(-)

commit a726915cef1daab57aad4c5b5e4773822f0a4bf8
Author: Daniel Vetter 
Date:   Tue Nov 20 14:50:08 2012 +0100

    drm/i915: resurrect panel lid handling
    
    But disabled by default. This essentially reverts
    
    commit bcd5023c961a44c7149936553b6929b2b233dd27
    Author: Dave Airlie 
    Date:   Mon Mar 14 14:17:55 2011 +1000
    
        drm/i915: disable opregion lid detection for now
    
    but leaves the autodetect mode disabled. There's also the explicit lid
    status option added in
    
    commit fca874092597ef946b8f07031d8c31c58b212144
    Author: Chris Wilson 
    Date:   Thu Feb 17 13:44:48 2011 +0000
    
        drm/i915: Add a module parameter to ignore lid status
    
    Which overloaded the meaning for the panel_ignore_lid parameter even
    more. To fix up this mess, give the non-negative numbers 0,1 the
    original meaning back and use negative numbers to force a given state.
    So now we have
    
    1  - disable autodetect, return unknown
    0  - enable autodetect
    -1 - force to disconnected/lid closed
    -2 - force to connected/lid open
    
    v2: My C programmer license has been revoked ...
    
    v3: Beautify the code a bit, as suggested by Chris Wilson.
    
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=27622
    Tested-by: Andreas Sturmlechner 
    Reviewed-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_drv.c    |    6 +++---
 drivers/gpu/drm/i915/intel_panel.c |   25 +++++++++++--------------
 2 files changed, 14 insertions(+), 17 deletions(-)

commit 8fed6193736bf22e0e44c03ee783761e9cc37238
Author: Takashi Iwai 
Date:   Mon Nov 19 18:06:51 2012 +0100

    drm/i915: Enable DP audio for Haswell
    
    This patch adds the missing code to send ELD for Haswell DisplayPort,
    based on Xingchao's original patch.
    
    A test was performed with HSW-D machine and NEC EA232Wmi DP monitor.
    
    Cc: Xingchao Wang 
    Signed-off-by: Takashi Iwai 
    Reviewed-by: Paulo Zanoni 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_ddi.c |    9 +++++++++
 1 file changed, 9 insertions(+)

commit c9839303d186d6270f570ff3c5f56c2327958086
Author: Chris Wilson 
Date:   Tue Nov 20 10:45:17 2012 +0000

    drm/i915: Pin the object whilst faulting it in
    
    In order to prevent reaping of the object whilst setting it up to
    handle the pagefault, we need to mark it as pinned. This has the nice
    side-effect of eliminating some special cases from the pagefault handler
    as well!
    
    Signed-off-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_gem.c |   29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

commit fbdda6fb5ee5da401af42226878880069a6b8615
Author: Chris Wilson 
Date:   Tue Nov 20 10:45:16 2012 +0000

    drm/i915: Guard pages being reaped by OOM whilst binding-to-GTT
    
    In the circumstances that the shrinker is allowed to steal the mutex
    in order to reap pages, we need to be careful to prevent it operating on
    the current object and shooting ourselves in the foot.
    
    Signed-off-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_gem.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

commit be7cb6347e0c3aa1956748a860a2465a7ea128c4
Author: Chris Wilson 
Date:   Mon Nov 19 15:30:42 2012 +0000

    drm/i915: Remove bogus test for a present execbuffer
    
    The intention of checking obj->gtt_offset!=0 is to verify that the
    target object was listed in the execbuffer and had been bound into the
    GTT. This is guarranteed by the earlier rearrangement to split the
    execbuffer operation into reserve and relocation phases and then
    verified by the check that the target handle had been processed during
    the reservation phase.
    
    However, the actual checking of obj->gtt_offset==0 is bogus as we can
    indeed reference an object at offset 0. For instance, the framebuffer
    installed by the BIOS often resides at offset 0 - causing EINVAL as we
    legimately try to render using the stolen fb.
    
    Signed-off-by: Chris Wilson 
    Reviewed-by: Eric Anholt 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    9 ---------
 1 file changed, 9 deletions(-)

commit b92fa839015f27ba0f5c7ef9812eba9ecff538c2
Author: Chris Wilson 
Date:   Fri Nov 16 11:43:21 2012 +0000

    drm/i915: Remove save/restore of physical HWS_PGA register
    
    Now that we always restore the HWS registers (both physical and GTT
    virtual addresses) when re-initialising the rings, we can eliminate the
    superfluous save/restore of the register across suspend and resume.
    
    Signed-off-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_drv.h     |    1 -
 drivers/gpu/drm/i915/i915_suspend.c |    8 --------
 2 files changed, 9 deletions(-)

commit d09105c66eb813ab3f57ba5e738f477f6ff92dec
Author: Ben Widawsky 
Date:   Thu Nov 15 12:06:09 2012 -0800

    drm/i915: Fix warning in i915_gem_chipset_flush
    
    drivers/gpu/drm/i915/i915_drv.h:1545:2: warning: '______f' is static but
    declared in inline function 'i915_gem_chipset_flush' which is not static
    
    Reported-by: kbuild test robot 
    dri-devel-Reference: <50a4d41c.586vhmwghpukzbkb%fengguang.wu@intel.com>
    Signed-off-by: Ben Widawsky 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_drv.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

commit 42d42e7e4220753bab3eb7b857721f203a4cd821
Author: Damien Lespiau 
Date:   Wed Oct 31 19:23:16 2012 +0000

    drm/i915: Only check for valid PP_{ON, OFF}_DELAYS on pre ILK hardware
    
    ILK+ have this register on the PCH. This check was triggering unclaimed
    writes.
    
    Signed-off-by: Damien Lespiau 
    Reviewed-by: Paulo Zanoni 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_bios.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

commit 607a6f7a6621f65706ff536b2615ee65b5c2f575
Author: Daniel Vetter 
Date:   Wed Nov 14 17:47:39 2012 +0100

    drm/i915: drop buggy write to FDI_RX_CHICKEN register
    
    Jani Nikula noticed that the parentheses are wrong and we & the bit
    with the register address instead of the read-back value. He sent a
    patch to correct that.
    
    On second look, we write the same register in the previous line, and
    the w/a seems to be to set FDI_RX_PHASE_SYNC_POINTER_OVR to enable the
    logic, then keep always set FDI_RX_PHASE_SYNC_POINTER_OVR and toggle
    FDI_RX_PHASE_SYNC_POINTER_EN before/after enabling the pc transcoder.
    
    So the right things seems to be to simply kill the 2nd write.
    
    Cc: Jani Nikula 
    Reviewed-by: Chris Wilson 
    [danvet: Dropped a bogus ~ from the commit message that somehow crept
    in.]
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_display.c |    3 ---
 1 file changed, 3 deletions(-)

16 January 2013

321. Compiling Kernel 3.7.3 (and 3.7.2) on Debian Testing/Wheezy. More data on make -jN.

Updated for 3.7.3

Since post '319. Collection of errors when compiling kernel 3.7.x on AMD FX 8150' is getting traffic from people wanting to compile kernel 3.7.2, and because I didn't know whether the azx_runtime_suspend bug had been fixed, I had to try it out. So here's how to compile kernel 3.7.2 and 3.7.3 -- for 3.7.2 simply replace all instances of 3.7.3 with 3.7.2.

Looking at the code changes here: http://lists-archives.com/linux-kernel/27763782-alsa-hda-move-runtime-pm-check-to-runtime_idle-callback.html and comparing with what I'm actually seeing in sound/pci/hda/hda_intel.c it seems that 3.7.2 and 3.7.3 have been fixed and no patches need to be applied.

Testing the kernel bears that out.


Compiling the kernel
sudo apt-get install kernel-package fakeroot build-essential ncurses-bin ncurses-dev
mkdir ~/tmp
cd ~/tmp
wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.7.3.tar.bz2
tar xvf linux-3.7.3.tar.bz2
cd linux-3.7.3/
cat /boot/config-`uname -r`>.config
make oldconfig
make-kpkg clean

If you want to add specific drivers etc to the kernel, run
make menuconfig

Note that if you're transitioning from kernel 3.5 to 3.7 you will need to specifically and explicitly include a lot of the graphics drivers (pci tv cards, usb web cams) that used to be automatically included before. Then continue:

time fakeroot make-kpkg -j6 --initrd kernel_image kernel_headers
sudo dpkg -i ../linux-image-3.7.3_3.7.3-10.00.Custom_amd64.deb ../linux-headers-3.7.3_3.7.3-10.00.Custom_amd64.deb
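The .deb file names are predictable when make-kpkg is run without --revision or --append-to-version. Judging by the names in this post, the pattern is linux-KIND-VERSION_VERSION-10.00.Custom_ARCH.deb. The sketch below just reproduces that observed pattern (kpkg_deb_names is a made-up helper, and the default of amd64 is an assumption), which saves some typing when scripting the install step.

```shell
# kpkg_deb_names VERSION [ARCH]: print the image and headers file names that
# make-kpkg produced in this post when no --revision was given.
kpkg_deb_names() {
    version=$1
    arch=${2:-amd64}
    for kind in image headers; do
        echo "linux-${kind}-${version}_${version}-10.00.Custom_${arch}.deb"
    done
}

kpkg_deb_names 3.7.3
```

For 3.7.2 the same call with 3.7.2 as the argument gives the names to pass to dpkg -i.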

And you're done. Keep reading to learn more about -j6.


Optimal -jN
See here for another post on -jN: http://verahill.blogspot.com.au/2013/01/305-make-jn-should-n-equal-number-of.html. In short, it's not always clear whether N should equal the number of cores, or be larger than the number of cores. In that post, N+1 was the optimal configuration, but that was a very short compilation where i/o likely played a large role.

More data is needed, so here it is. Seems like N=number of cores is the best option for long builds (as was pointed out to me in a comment). This was done with kernel 3.7.2.

On a four-core Intel i5-2400 with 16 GB memory
N    Time
-------------
2    30m 58s
3    22m 36s
4    19m 49s
5    22m 2s
6    23m 13s

Acquired using sar/sysstat
Here's what's happening with -j4:
Basically, for the first 15 minutes things are running in parallel, with i/o slowing things down during the last 5 minutes.

On a six-core AMD Phenom II 1055T with 8 GB memory
N    Time
-------------
4    34m 16s
5    27m 19s
6    24m 60s
7    30m 18s
8    31m 47s
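To compare the timings without doing the arithmetic by hand, the tables can be converted to seconds and the minimum picked out. A minimal sketch, assuming one "N MINUTESm SECONDSs" row per line as in the tables above; fastest_n is a made-up name.

```shell
# fastest_n: read rows of "N <min>m <sec>s" on stdin and print the N with
# the shortest build time, followed by that time in seconds.
fastest_n() {
    awk '
        NF == 3 && $1 ~ /^[0-9]+$/ {
            m = $2; s = $3
            sub(/m$/, "", m); sub(/s$/, "", s)
            t = m * 60 + s
            if (best == "" || t < best_t) { best = $1; best_t = t }
        }
        END { print best, best_t }
    '
}

# Timings from the four-core Intel machine.
fastest_n <<'EOF'
2 30m 58s
3 22m 36s
4 19m 49s
5 22m 2s
6 23m 13s
EOF
```

For the Intel data this prints 4 1189, i.e. -j4 at 1189 seconds. The AMD rows give 6 1500, so in both cases N equal to the number of cores comes out fastest.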


Hardware profiles:
Intel machine:
00:00.0 0600: 8086:0100 (rev 09)
00:02.0 0300: 8086:0102 (rev 09)
00:16.0 0780: 8086:1c3a (rev 04)
00:16.3 0700: 8086:1c3d (rev 04)
00:19.0 0200: 8086:1502 (rev 04)
00:1a.0 0c03: 8086:1c2d (rev 04)
00:1b.0 0403: 8086:1c20 (rev 04)
00:1c.0 0604: 8086:1c10 (rev b4)
00:1c.2 0604: 8086:1c14 (rev b4)
00:1d.0 0c03: 8086:1c26 (rev 04)
00:1e.0 0604: 8086:244e (rev a4)
00:1f.0 0601: 8086:1c4e (rev 04)
00:1f.2 0104: 8086:2822 (rev 04)
00:1f.3 0c05: 8086:1c22 (rev 04)

AMD machine:
00:00.0 0600: 1022:9601
00:01.0 0604: 1022:9602
00:07.0 0604: 1022:9607
00:11.0 0106: 1002:4390
00:12.0 0c03: 1002:4397
00:12.1 0c03: 1002:4398
00:12.2 0c03: 1002:4396
00:13.0 0c03: 1002:4397
00:13.1 0c03: 1002:4398
00:13.2 0c03: 1002:4396
00:14.0 0c05: 1002:4385 (rev 3c)
00:14.1 0101: 1002:439c
00:14.2 0403: 1002:4383
00:14.3 0601: 1002:439d
00:14.4 0604: 1002:4384
00:14.5 0c03: 1002:4399
00:18.0 0600: 1022:1200
00:18.1 0600: 1022:1201
00:18.2 0600: 1022:1202
00:18.3 0600: 1022:1203
00:18.4 0600: 1022:1204
01:05.0 0300: 1002:9715
01:05.1 0403: 1002:970f
02:00.0 0200: 10ec:8168 (rev 03)
03:05.0 0200: 10ec:8169 (rev 10)

22 May 2012

160. Compiling kernel 3.4 on debian

The steps are the usual ones. At this point compiling your kernel is perhaps more of a hobby than a necessity to most people, unless you happen to have some fancy piece of hardware that's about to become supported.

It's not difficult, so there's no reason not to give it a spin.

UPDATE 9/7: Works fine with 3.4.4 as well (as it should). Compile time with -j7 on AMD X6 1055 is

real 33m27.472s
user 84m7.295s
sys 15m58.668s

which is underwhelming.


-- Start Here --
sudo apt-get install kernel-package fakeroot build-essential

mkdir ~/tmp
cd ~/tmp
wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.4.tar.bz2
tar xvf linux-3.4.tar.bz2
cd linux-3.4/
cat /boot/config-`uname -r`>.config
make oldconfig

If your current kernel is 3.3.5 the questions that await are given at the bottom of this post with links to descriptions of the different options. As usual, if in doubt, just hit enter.
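If you'd like to see which options are new without answering them one by one, the symbols in two config files can be compared. A minimal sketch, assuming the usual .config syntax (CONFIG_FOO=value, or "# CONFIG_FOO is not set"); config_symbols and new_symbols are made-up helper names.

```shell
# config_symbols FILE: print every CONFIG_ symbol mentioned in FILE, sorted.
# Matches both "CONFIG_FOO=y" and "# CONFIG_FOO is not set".
config_symbols() {
    sed -n -e 's/^\(CONFIG_[A-Za-z0-9_]*\)=.*/\1/p' \
           -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1/p' "$1" | sort -u
}

# new_symbols OLD NEW: print the symbols that appear in NEW but not in OLD.
new_symbols() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    config_symbols "$1" > "$old_list"
    config_symbols "$2" > "$new_list"
    comm -13 "$old_list" "$new_list"   # lines unique to the second file
    rm -f "$old_list" "$new_list"
}
```

After make oldconfig has finished, something like new_symbols /boot/config-`uname -r` .config lists what was added. The kernel build system also has a listnewconfig target that prints the new options up front.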

make-kpkg clean

Building takes ages (depending on number of cores committed), so don't launch it at 4 pm on a Friday if you need to shut down your computer before going home... As usual, use the -jX switch for parallel builds, where X is the number of cores+1 (e.g. 4 cores => -j5)

The following command goes on a single line
time fakeroot make-kpkg -j5 --initrd --revision=3.4.0 --append-to-version=-amd64 kernel_image kernel_headers

Once the build is done, move the .deb files out of the way and to your linux-3.4 directory for safe-keeping
mv ../*3.4.0*.deb .
sudo dpkg -i *.deb

Done.

The image weighs in at about 33 MB and the headers at 7.6 MB.
And compile time with 4 out of 6 cores?  Well, not too bad:

real    34m51.027s
user    73m35.644s
sys     15m9.169s
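One way to read these three numbers: user plus sys is the total CPU time spent, real is the wall-clock time, so their ratio says how many cores were kept busy on average. A rough sketch (to_seconds and parallelism are made-up helper names; the input format is the 34m51.027s style printed by time):

```shell
# to_seconds 34m51.027s: convert the "<min>m<sec>s" format to seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk '{
        split($1, p, "m"); sub(/s$/, "", p[2])
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# parallelism REAL USER SYS: average number of busy cores, (user + sys) / real.
parallelism() {
    LC_ALL=C awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

parallelism 34m51.027s 73m35.644s 15m9.169s
```

For the timings above this prints 2.55, so on average only about two and a half cores were busy even though -j5 was used. The -j7 run from the update near the top of this post comes out at 2.99.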



Questions:
Boottime Graphics Resource Table support (ACPI_BGRT) [N/m/y/?] (NEW)
      Default ASPM policy
      > 1. BIOS default (PCIEASPM_DEFAULT) (NEW)
        2. Powersave (PCIEASPM_POWERSAVE) (NEW)
        3. Performance (PCIEASPM_PERFORMANCE) (NEW)
      choice[1-3]: 1
Enable PCI resource re-allocation detection (PCI_REALLOC_ENABLE_AUTO) [N/y/?] (NEW)
x32 ABI for 64-bit mode (EXPERIMENTAL) (X86_X32) [N/y/?] (NEW) See also cateee
Connection tracking timeout (NF_CONNTRACK_TIMEOUT) [N/y/?] (NEW)
Connection tracking timeout tuning via Netlink (NF_CT_NETLINK_TIMEOUT) [N/m/?] (NEW)
LOG target support (NETFILTER_XT_TARGET_LOG) [N/m/?] (NEW) M
Plug network traffic until release (PLUG) (NET_SCH_PLUG) [N/m/y/?] (NEW)
PEAK PCAN-PC Card (CAN_PEAK_PCMCIA) [N/m/?] (NEW)
 PEAK PCAN-ExpressCard Cards (CAN_PEAK_PCIEC) [Y/n/?] (NEW)
PEAK PCAN-USB/USB Pro interfaces (CAN_PEAK_USB) [N/m/?] (NEW)
 Support for DiskOnChip G4 (EXPERIMENTAL) (MTD_NAND_DOCG4) [N/m/?] (NEW)
Universal Flash Storage host controller driver (SCSI_UFSHCD) [N/m/?] (NEW)
virtio-scsi support (EXPERIMENTAL) (SCSI_VIRTIO) [N/m/?] (NEW)
 Verity target support (EXPERIMENTAL) (DM_VERITY) [N/m/?] (NEW)
Solarflare SFC9000-family hwmon support (SFC_MCDI_MON) [Y/n/?] (NEW)
Solarflare SFC9000-family SR-IOV support (SFC_SRIOV) [Y/n/?] (NEW)
 Drivers for the AMD PHYs (AMD_PHY) [N/m/?] (NEW)
QMI WWAN driver for Qualcomm MSM based 3G and LTE modems (USB_NET_QMI_WWAN) [N/m/?] (NEW)
support MFP (802.11w) even if uCode doesn't advertise (IWLWIFI_EXPERIMENTAL_MFP) [N/y/?] (NEW)
Additional debugging output (RTLWIFI_DEBUG) [Y/n] (NEW)
TI OMAP4 keypad support (KEYBOARD_OMAP4) [N/m/y/?] (NEW)
Synaptics USB device support (MOUSE_SYNAPTICS_USB) [N/m/y/?] (NEW)
Cypress TTSP touchscreen (TOUCHSCREEN_CYTTSP_CORE) [N/m/y/?] (NEW) 
Ilitek ILI210X based touchscreen (TOUCHSCREEN_ILI210X) [N/m/?] (NEW)
Xen Hypervisor Multiple Consoles support (HVC_XEN_FRONTEND) [Y/n/?] (NEW)
HSI support (HSI) [N/m/y/?] (NEW)
Intel PCH EG20T as PTP clock (PTP_1588_CLOCK_PCH) [N/m/?] (NEW) 
Dallas 2781 battery monitor chip (W1_SLAVE_DS2781) [N/m/?] (NEW) 
 2781 battery driver (BATTERY_DS2781) [N/m/?] (NEW)
Summit Microelectronics SMB347 Battery Charger (CHARGER_SMB347) [N/m/?] (NEW) 
Microchip MCP3021 (SENSORS_MCP3021) [N/m/?] (NEW) 
TPS65217 Power Management / White LED chips (MFD_TPS65217) [N/m/?] (NEW)
  TI TPS62360 Power Regulator (REGULATOR_TPS62360) [N/m/?] (NEW) 
  GPIO IR remote control (IR_GPIO_CIR) [N/m/?] (NEW) 
 Keene FM Transmitter USB support (USB_KEENE) [N/m/?] (NEW)
AzureWave 6007 and clones DVB-T/C USB2.0 support (DVB_USB_AZ6007) [N/m/?] (NEW) 
Realtek RTL28xxU DVB USB support (DVB_USB_RTL28XXU) [N/m/?] (NEW)
Allow to specify an EDID data set instead of probing for it (DRM_LOAD_EDID_FIRMWARE) [N/y/?] (NEW)
  DisplayLink (DRM_UDL) [N/m/?] (NEW)
Intel740 support (EXPERIMENTAL) (FB_I740) [N/m/y/?] (NEW) 
Exynos Video driver support (EXYNOS_VIDEO) [N/y/?] (NEW)
Backlight driver for TI LP855X (BACKLIGHT_LP855X) [N/m/?] (NEW)
Saitek non-fully HID-compliant devices (HID_SAITEK) [N/m/?] (NEW)
TiVo Slide Bluetooth remote control support (HID_TIVO) [N/m/?] (NEW)
 Generic OHCI driver for a platform device (USB_OHCI_HCD_PLATFORM) [N/y/?] (NEW) 
Generic EHCI driver for a platform device (USB_EHCI_HCD_PLATFORM) [N/y/?] (NEW)
USB Fintek F81232 Single Port Serial Driver (USB_SERIAL_F81232) [N/m/?] (NEW) 
USB Metrologic Instruments USB-POS Barcode Scanner Driver (USB_SERIAL_METRO) [N/m/?] (NEW)
 LED support for PCA9633 I2C chip (LEDS_PCA9633) [N/m/?] (NEW)
Xen ACPI processor (XEN_ACPI_PROCESSOR) [M/n/?] (NEW) 
Memory allocator for compressed pages (ZSMALLOC) [M/y/?] (NEW) 
 Intel Management Engine Interface (Intel MEI) (INTEL_MEI) [N/m/y/?] (NEW) 
USB over WiFi Host Controller (USB_WPAN_HCD) [N/m/?] (NEW) 
Apple Gmux Driver (APPLE_GMUX) [N/m/y/?] (NEW) 
QNX6 file system support (read only) (QNX6FS_FS) [N/m/y/?] (NEW) 
 NFSv4.1 Implementation ID Domain (NFS_V4_1_IMPLEMENTATION_ID_DOMAIN) [kernel.org] (NEW) 
RPC: Enable dprintk debugging (SUNRPC_DEBUG) [N/y/?] (NEW) 
Print additional diagnostics on RCU CPU stall (RCU_CPU_STALL_INFO) [N/y/?] (NEW)
Yama support (SECURITY_YAMA) [N/y/?] (NEW)
Camellia cipher algorithm (x86_64) (CRYPTO_CAMELLIA_X86_64) [N/m/y/?] (NEW)
CRC32 perform self test on init (CRC32_SELFTEST) [N/y/?] (NEW) 
 CRC32 implementation
  > 1. Slice by 8 bytes (CRC32_SLICEBY8) (NEW)
    2. Slice by 4 bytes (CRC32_SLICEBY4) (NEW)
    3. Sarwate's Algorithm (one byte at a time) (CRC32_SARWATE) (NEW)
    4. Classic Algorithm (one bit at a time) (CRC32_BIT) (NEW)
  choice[1-4?]: 




Links to this post:
http://askubuntu.com/questions/147725/ubuntu-12-04-fail-to-upgrade-to-kernel-3-4
http://www.deltageek.fr/installer-un-nouveau-noyau-linux/
http://thinkpad-forum.de/threads/141365-Linux-Probleme-mit-neuen-Modellen-(W-L-X-Tx30)/page2
http://crunchbang.org/forums/viewtopic.php?id=24814