Showing posts with label git. Show all posts
Showing posts with label git. Show all posts

04 April 2013

375. Bisecting the kernel -- looking for the commit behind the kworker/i915 issue.

This is an attempt to trace this issue:

See also

Update: I'm revising this post. Not quite done yet.

Another Update: Someone else has just bisected and ended up with a different commit:
The patch they found makes a lot more sense than any of the ones I ended up with.

However, I do still see issues with the kernel I end up with, so I'm either starting from the wrong 'good', or there are several bad commits.

A third update: I just experienced slow-down with 1k interrupts per second on 3.7.10, which may indicate that I picked a bad starting point for bisecting. I'll leave the post up since it can serve as a guide for how to bisect in general.

A fourth update:
A fix is on the way to kernel 3.9 or 3.10.

Original post:

I didn't really want to do this, for several reasons:

* the bug only manifests itself on one of my computers, with intel graphics -- and that's a laptop (I'm not a huge fan of laptops, especially not with budding carpal tunnel syndrome, so testing the bisected version is a bit of a pain -- literally)

* the bug isn't consistently triggered, so you need to test the new kernel for an hour or more, and only by the absence of a specific behaviour can you see whether all is good i.e. testing is a bit of hit and miss.

* compiling the kernel takes a long time, and I don't currently have a suitable computer for it at home where my laptop (the one with the i915) is, so I'll build the kernel on a work computer, then install it at home, then repeat each day.

On the other hand, it's a learning experience, so here we go.

I'm looking at

I'm compiling the kernel debian style since it generates .deb files which are easy to install on other machines. Again, there are posts on how to compile the kernel using a more generic approach, or specifically for Arch, elsewhere on this blog.

NOTE: there's no sure-fire way of triggering this issue, but it seems to occur more frequently when the fan is on but approaching the thermal cut-off point where it will turn off. This makes troubleshooting somewhat more difficult.

1. Download/checkout
First get the sources -- it pulls everything from 2.6.12, so it's bigger than a normal kernel source download. I'll skip explicitly telling you what packages you need to have in order to compile the kernel or use git, since if you don't know, you're probably not really ready for this anyway. Besides, that information is available in other posts on this blog.

mkdir -p ~/tmp/kernel_bisect
cd ~/tmp/kernel_bisect/
git clone git:// linux-git
Cloning into 'linux-git'... remote: Counting objects: 2942099, done. remote: Compressing objects: 100% (443946/443946), done. remote: Total 2942099 (delta 2471580), reused 2940624 (delta 2470325) Receiving objects: 100% (2942099/2942099), 608.66 MiB | 4.58 MiB/s, done. Resolving deltas: 100% (2471580/2471580), done. Checking out files: 100% (42425/42425), done.
cd linux-git/

2A. Bisect
I know that 3.8.5 is bad and that 3.7.10 may be good (seem to remember having issues with the Arch version of 3.7.10, but no issues with my own build on debian. Not sure I can trust my memory).

I know for a fact that 3.2 is good. I don't want to bisect everything from 3.8.5 to 3.2 though, so I'll take a leap of faith and presume that 3.7.10 is good (which I'll regret if that's not true) and 3.8.0 is bad.

I know that the issue is present on amd64.

I think it's an issue to do with i915.

git bisect start -- arch/amd64 drivers/gpu/drm/i915
git bisect bad v3.8
git bisect good v3.7
Bisecting: 160 revisions left to test after this (roughly 7 steps) [28d491df4c6b00f9148a9885dba1f36a078535dc] drm/i915: Bad pixel formats can't reach the sprite code
2B. Compile
I'm being a bit lazy here as well, possibly at the cost of not being able to identify my issue: I'll use silentoldconfig to accept all the default settings.
make oldconfig
time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers

where 3 in -j3 is the number of cores (see here and here and here for more about the proper value for -j) -- I have a triple core AMD. It took around 43 minutes to build.

This will generate two .deb files in the parent folder -- one kernel image and one with headers.
Install these on your testing system.

sudo dpkg -i ../linux-image-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb ../linux-headers-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb

Since you installed .deb packages it's pretty easy to roll back any changes later.

3. Testing
What happens now depends on whether the kernel is plagued by the issue you're trying to bisect for:

3A. The problem persists
You've now narrowed it down to half as many commits.
git bisect bad
make-kpkg clean
make oldconfig
time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers
sudo dpkg -i ../linux-image-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb ../linux-headers-3.7.0-rc2+_3.7.0-rc2+-10.00.Custom_amd64.deb

3B. The problem is not present
You've now narrowed it down to half as many commits.
git bisect good
make-kpkg clean
make oldconfig
time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers
sudo dpkg -i ../linux-image-3.7.0-rc4+_3.7.0-rc4+-10.00.Custom_amd64.deb ../linux-headers-3.7.0-rc4+_3.7.0-rc4+-10.00.Custom_amd64.deb

Repeat step 3 until you've isolated the commit that caused the issue.

For me
* the first bisect didn't cause kworker slow downs (bisect good)
* the second bisect immediately led to slowdown (bisect bad)
* the third one was fine (bisect good)
* the fourth one had no issues (bisect good)
* the fifth one was good (bisect good)
* the sixth one was definitely bad (bisect bad)

powertop gives the following when it's slowing down
 0 mW    423.5 ms/s       4.5        kWork          i915_hotplug_work_func

and watch cat /proc/interrupts:
The number of interrupts increases very, very rapidly for '49:  PCI-MSI-edge i915' during mouse slowdown. Normal rate is around 70 per two seconds.
This is where it gets less clear -- from what I can see:
* the seventh one was a little bit bad (bisect bad).
* the eight one had minor issues when the fan went off (bisect bad).

But the first time I did the bisect I came to the exact opposite conclusion for the last two bisects. I'm suspecting that it's really down to two or more commits that together cause bad behavior, but on their own are merely annoying.

Anyway, I've ended up with this commit as the current culprit (again, the first time around I ended up with a different commit):
607a6f7a6621f65706ff536b2615ee65b5c2f575 is the first bad commit
commit 607a6f7a6621f65706ff536b2615ee65b5c2f575
Author: Daniel Vetter 
Date:   Wed Nov 14 17:47:39 2012 +0100

    drm/i915: drop buggy write to FDI_RX_CHICKEN register
    Jani Nikula noticed that the parentheses are wrong and we & the bit
    with the register address instead of the read-back value. He sent a
    patch to correct that.
    On second look, we write the same register in the previous line, and
    the w/a seems to be to set FDI_RX_PHASE_SYNC_POINTER_OVR to enable the
    logic, then keep always set FDI_RX_PHASE_SYNC_POINTER_OVR and toggle
    FDI_RX_PHASE_SYNC_POINTER_EN before/after enabling the pc transcoder.
    So the right things seems to be to simply kill the 2nd write.
    Cc: Jani Nikula 
    Reviewed-by: Chris Wilson 
    [danvet: Dropped a bogus ~ from the commit message that somehow crept
    Signed-off-by: Daniel Vetter 

:040000 040000 f789c6c199c9db5c9d0d7961760574b5f0b1ede9 9e0cd2a09cab610b437164b1a74f620e43686ef1 M      drivers

It's just really difficult to reproduce the issue consistently with the last couple of kernels. I am in no way confident that the above commit is what's causing all this.

The last confirmed troublesome bisect (#6):

Here's the log:
git bisect start '--' 'arch/amd64' 'drivers/gpu/drm/i915'
# bad: [19f949f52599ba7c3f67a5897ac6be14bfcb1200] Linux 3.8
git bisect bad 19f949f52599ba7c3f67a5897ac6be14bfcb1200
# good: [29594404d7fe73cd80eaa4ee8c43dcc53970c60e] Linux 3.7
git bisect good 29594404d7fe73cd80eaa4ee8c43dcc53970c60e
# good: [28d491df4c6b00f9148a9885dba1f36a078535dc] drm/i915: Bad pixel formats can't reach the sprite code
git bisect good 28d491df4c6b00f9148a9885dba1f36a078535dc
# bad: [b4a98e57fc27854b5938fc8b08b68e5e68b91e1f] drm/i915: Flush outstanding unpin tasks before pageflipping
git bisect bad b4a98e57fc27854b5938fc8b08b68e5e68b91e1f
# good: [12f3382bc0262e981a2e58aca900cbbdbbe66825] drm/i915: implement WaDisablePSDDualDispatchEnable on IVB & VLV
git bisect good 12f3382bc0262e981a2e58aca900cbbdbbe66825
# good: [b9e0bda3cd325b55f336efb751736163f62abded] drm/i915: Always calculate 8xx WM values based on a 32-bpp framebuffer
git bisect good b9e0bda3cd325b55f336efb751736163f62abded
# good: [1c8b46fc8c865189f562c9ab163d63863759712f] drm/i915: Use LRI to update the semaphore registers
git bisect good 1c8b46fc8c865189f562c9ab163d63863759712f

and here are the remaining commits:

commit b4a98e57fc27854b5938fc8b08b68e5e68b91e1f
Author: Chris Wilson 
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping
    If we accumulate unpin tasks because we are pageflipping faster than the
    system can schedule its workers, we can effectively create a
    pin-leak. The solution taken here is to limit the number of unpin tasks
    we have per-crtc and to flush those outstanding tasks if we accumulate
    too many. This should prevent any jitter in the normal case, and also
    prevent the hang if we should run too fast.
    Note: It is important that we switch from the system workqueue to our
    own dev_priv->wq since all work items on that queue are guaranteed to
    only need the dev->struct_mutex and not any modeset resources. For
    otherwise if we have a work item ahead in the queue which needs the
    modeset lock (like the output detect work used by both polling or
    hpd), this work and so the unpin work will never execute since the
    pageflip code already holds that lock. Unfortunately there's no
    lockdep support for this scenario in the workqueue code.
    Reported-and-tested-by: Tvrtko Ursulin 
    Signed-off-by: Chris Wilson 
    [danvet: Added note about workqueu deadlock.]
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_display.c |   22 ++++++++++++++++------
 drivers/gpu/drm/i915/intel_drv.h     |    4 +++-
 2 files changed, 19 insertions(+), 7 deletions(-)

commit a726915cef1daab57aad4c5b5e4773822f0a4bf8
Author: Daniel Vetter 
Date:   Tue Nov 20 14:50:08 2012 +0100

    drm/i915: resurrect panel lid handling
    But disabled by default. This essentially reverts
    commit bcd5023c961a44c7149936553b6929b2b233dd27
    Author: Dave Airlie 
    Date:   Mon Mar 14 14:17:55 2011 +1000
        drm/i915: disable opregion lid detection for now
    but leaves the autodetect mode disabled. There's also the explicit lid
    status option added in
    commit fca874092597ef946b8f07031d8c31c58b212144
    Author: Chris Wilson 
    Date:   Thu Feb 17 13:44:48 2011 +0000
        drm/i915: Add a module parameter to ignore lid status
    Which overloaded the meaning for the panel_ignore_lid parameter even
    more. To fix up this mess, give the non-negative numbers 0,1 the
    original meaning back and use negative numbers to force a given state.
    So now we have
    1  - disable autodetect, return unknown
    0  - enable autodetect
    -1 - force to disconnected/lid closed
    -2 - force to connected/lid open
    v2: My C programmer license has been revoked ...
    v3: Beautify the code a bit, as suggested by Chris Wilson.
    Tested-by: Andreas Sturmlechner 
    Reviewed-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_drv.c    |    6 +++---
 drivers/gpu/drm/i915/intel_panel.c |   25 +++++++++++--------------
 2 files changed, 14 insertions(+), 17 deletions(-)

commit 8fed6193736bf22e0e44c03ee783761e9cc37238
Author: Takashi Iwai 
Date:   Mon Nov 19 18:06:51 2012 +0100

    drm/i915: Enable DP audio for Haswell
    This patch adds the missing code to send ELD for Haswell DisplayPort,
    based on Xingchao's original patch.
    A test was performed with HSW-D machine and NEC EA232Wmi DP monitor.
    Cc: Xingchao Wang 
    Signed-off-by: Takashi Iwai 
    Reviewed-by: Paulo Zanoni 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_ddi.c |    9 +++++++++
 1 file changed, 9 insertions(+)

commit c9839303d186d6270f570ff3c5f56c2327958086
Author: Chris Wilson 
Date:   Tue Nov 20 10:45:17 2012 +0000

    drm/i915: Pin the object whilst faulting it in
    In order to prevent reaping of the object whilst setting it up to
    handle the pagefault, we need to mark it as pinned. This has the nice
    side-effect of eliminating some special cases from the pagefault handler
    as well!
    Signed-off-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_gem.c |   29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

commit fbdda6fb5ee5da401af42226878880069a6b8615
Author: Chris Wilson 
Date:   Tue Nov 20 10:45:16 2012 +0000

    drm/i915: Guard pages being reaped by OOM whilst binding-to-GTT
    In the circumstances that the shrinker is allowed to steal the mutex
    in order to reap pages, we need to be careful to prevent it operating on
    the current object and shooting ourselves in the foot.
    Signed-off-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_gem.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

commit be7cb6347e0c3aa1956748a860a2465a7ea128c4
Author: Chris Wilson 
Date:   Mon Nov 19 15:30:42 2012 +0000

    drm/i915: Remove bogus test for a present execbuffer
    The intention of checking obj->gtt_offset!=0 is to verify that the
    target object was listed in the execbuffer and had been bound into the
    GTT. This is guarranteed by the earlier rearrangement to split the
    execbuffer operation into reserve and relocation phases and then
    verified by the check that the target handle had been processed during
    the reservation phase.
    However, the actual checking of obj->gtt_offset==0 is bogus as we can
    indeed reference an object at offset 0. For instance, the framebuffer
    installed by the BIOS often resides at offset 0 - causing EINVAL as we
    legimately try to render using the stolen fb.
    Signed-off-by: Chris Wilson 
    Reviewed-by: Eric Anholt 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    9 ---------
 1 file changed, 9 deletions(-)

commit b92fa839015f27ba0f5c7ef9812eba9ecff538c2
Author: Chris Wilson 
Date:   Fri Nov 16 11:43:21 2012 +0000

    drm/i915: Remove save/restore of physical HWS_PGA register
    Now that we always restore the HWS registers (both physical and GTT
    virtual addresses) when re-initialising the rings, we can eliminate the
    superfluous save/restore of the register across suspend and resume.
    Signed-off-by: Chris Wilson 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_drv.h     |    1 -
 drivers/gpu/drm/i915/i915_suspend.c |    8 --------
 2 files changed, 9 deletions(-)

commit d09105c66eb813ab3f57ba5e738f477f6ff92dec
Author: Ben Widawsky 
Date:   Thu Nov 15 12:06:09 2012 -0800

    drm/i915: Fix warning in i915_gem_chipset_flush
    drivers/gpu/drm/i915/i915_drv.h:1545:2: warning: '______f' is static but
    declared in inline function 'i915_gem_chipset_flush' which is not static
    Reported-by: kbuild test robot 
    dri-devel-Reference: <50a4d41c data-blogger-escaped-.586vhmwghpukzbkb="" data-blogger-escaped-fengguang.wu="""">
    Signed-off-by: Ben Widawsky 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/i915_drv.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

commit 42d42e7e4220753bab3eb7b857721f203a4cd821
Author: Damien Lespiau 
Date:   Wed Oct 31 19:23:16 2012 +0000

    drm/i915: Only check for valid PP_{ON, OFF}_DELAYS on pre ILK hardware
    ILK+ have this register on the PCH. This check was triggering unclaimed
    Signed-off-by: Damien Lespiau 
    Reviewed-by: Paulo Zanoni 
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_bios.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

commit 607a6f7a6621f65706ff536b2615ee65b5c2f575
Author: Daniel Vetter 
Date:   Wed Nov 14 17:47:39 2012 +0100

    drm/i915: drop buggy write to FDI_RX_CHICKEN register
    Jani Nikula noticed that the parentheses are wrong and we & the bit
    with the register address instead of the read-back value. He sent a
    patch to correct that.
    On second look, we write the same register in the previous line, and
    the w/a seems to be to set FDI_RX_PHASE_SYNC_POINTER_OVR to enable the
    logic, then keep always set FDI_RX_PHASE_SYNC_POINTER_OVR and toggle
    FDI_RX_PHASE_SYNC_POINTER_EN before/after enabling the pc transcoder.
    So the right things seems to be to simply kill the 2nd write.
    Cc: Jani Nikula 
    Reviewed-by: Chris Wilson 
    [danvet: Dropped a bogus ~ from the commit message that somehow crept
    Signed-off-by: Daniel Vetter 

 drivers/gpu/drm/i915/intel_display.c |    3 ---
 1 file changed, 3 deletions(-)

23 February 2013

343. 'track changes' in LaTeX: collaborative writing

While I find latex superior to word processors in almost every single aspect, there is one thing I have to admit that MS/Libre/Open Office have over LaTeX: track changes. Since LaTeX is nothing but code you can obviously use something like git or svn to manage versions of .tex code. There are some differences though -- e.g. git would show you the diff output, but not necessarily the context of the changes, and I'm having a difficult enough time convincing people to use LaTeX without having to worry about setting up a git server as well (I can obviously cheat around it by running a local git server, downloading their file and push it to my git server etc, but then I would be the only one who'd be able to track the changes. Also, it's not elegant).

I have a student who is currently preparing a thesis, and while I've managed to give most of my feedback via the pdf annotation function in mendeley, it doesn't feel natural when it should be a simple matter of editing the .tex directly (yet in a way that leaves the student in control over whether to accept or reject the changes).

So here's an exploration of three different ways of passing comments and corrections back and forth when using LaTeX.

I'm also not the first one to ask this question. See e.g.

1. Make a pdf and annotate it
 This is a simple and straightforward method for passing comments back and forth. It is not a very convenient method for making any substantial changes though. On top of that, linux does not currently have any particularly useful program for annotating pdf files. See here: for more information.

Quite apart from the lack of tool, I really want to emphasize that making small annotations is not a replacement for a tool that allows you to really stir up the text.

2. Using git
You can easily set up your own git server and use that for versioning.

You can see the version differences using a couple of methods, but gitk might be the easiest one. The downside with this is that you lose some of the context of the changes. Also, it shows you the code that was changed, the changes in the final document (e.g. formatting, figures etc.).

Still it's a pretty intuitive and easy way of tracking changes -- and in terms of versioning it is certainly the best solution. You can fork, roll back and generally have a perfect account of how the document has evolved.
git in particular makes it easy to see added code

In terms of small changes like fixing typos it is a lot more subtle though

3. latexdiff
latexdiff is an easy to use tool that's in the debian repos. Simply do
latexdiff version1.tex version2.tex > diff.tex

and then open diff.tex and compile it to see the changes.

Pleasantly suprised

4. Other solutions
there are plenty of solutions for introducing mark-up directly into the tex document, like trackchanges, changes etc. I've used them in the past but it lack elegance when doing heavier collaborative writing.


I can't help but think that the best solution is to combine git with latexdiff -- or even a special fork of git that uses latexdiff.

Googling shows that apparently I"m not the only one who thinks so: as well as
I haven't actually tested any of those methods yet, and it still more or less requires a common git server.

22 January 2013

324. Setting up a private git server

Update 11 Feb 2013: fixed a typo that ruined everything.

I've recently moved one of my more mature projects to, which allowed me to re-discover the joys of using a version management system, in this case git. While I've spent some time playing with SVN in the past, git feels more natural to me. That's it's used widely in the FOSS community certainly doesn't hurt either.

So that got me interested in setting up a local git server for projects that may or may not result in anything tangible i.e. projects that I don't want to put online, but are under so much flux that they may break anytime (and I thus want to be able to roll back).

[A pointer to my students: you can obviously use git to maintain versions of your .tex documents as well...]

In this particular case the development machine and the git host are the same. The project is called shellnmr (an attempt to make a bash-like shell in python for NMR processing)

This is very much inspired by/stolen from:

Getting set up:

On the git host (which has not yet got a copy of the source code)
sudo apt-get install git
mkdir ~/work/programming/var -p
cd ~/work/programming/var
mkdir shellnmr
cd shellnmr/
git init --bare
Initialized empty Git repository in /home/me/work/programming/var/shellnmr
First time:
I''ll use localhost below because in this particular example the git host and the development machine is the same.

On the/a development machine, where we keep the already existing project in ~/work/shellnmr

cd ~/work/shellnmr
git init
Initialized empty Git repository in /home/me/work/shellnmr/.git/
git add . git commit -m 'initial commit'
[master (root-commit) 5214860] initial commit 2 files changed, 178967 insertions(+) create mode 100644 create mode 100755
git remote rm origin git remote add origin me@localhost:/home/me/work/programming/var/shellnmr git push origin master
Counting objects: 2, done. Delta compression using up to 3 threads. Compressing objects: 100% (18/18), done. Writing objects: 100% (2/2), 339.14 KiB, done. Total 2 (delta 5), reused 0 (delta 0) To me@localhost:/home/me/work/programming/var/shellnmr * [new branch] master -> master

Everyday usage:

Say you've edited the code on your development machine and want to commit the changes:
git commit -a
git push


Let's say you go home and want to continue editing the code there and that this is the first time you're doing so:
git clone user@githost:/home/me/work/programming/var/nmrshell nmrshell-code

When you're done editing, just commit as normal;
git commit -a
git push

Let's say you've edited the code on another computer, committed the code, and then want to get the latest updates on your regular development machine again:
git pull origin master

And that's about it.

GUI - gitk:
The debian repos have a git GUI as well. To install do
sudo apt-get install git-gui gitk

Start it by launching gitk from the directory where you keep your files (on the development machine)