Friday, February 5, 2010

Ubuntu fails to load nvidia kernel module

As much a note to myself as a warning to others…

After a kernel upgrade, my Ubuntu Karmic (9.10) system started the X server in low-resolution mode. My Xorg.0.log said:

(II) LoadModule: "nvidia"
(II) Loading /usr/lib/xorg/modules/drivers//nvidia_drv.so
(II) Module nvidia: vendor="NVIDIA Corporation"
        compiled for 4.0.2, module version = 1.0.0
        Module class: X.Org Video Driver
(EE) NVIDIA: Failed to load the NVIDIA kernel module. Please check your
(EE) NVIDIA:     system's kernel log for additional error messages.
(II) UnloadModule: "nvidia"
(II) Unloading /usr/lib/xorg/modules/drivers//nvidia_drv.so
(EE) Failed to load module "nvidia" (module-specific error, 0)
(EE) No drivers available.

I tried modprobe nvidia and such, but it seemed that the module simply did not exist for the new kernel. It should come from the package nvidia-185-kernel-source, which was installed on my system. However, that package only ships the driver source: the actual kernel module is compiled on the fly by DKMS, the Dynamic Kernel Module Support, which Ubuntu's driver tool jockey relies on.

It is possible to force a recompile using dpkg-reconfigure:

$ sudo dpkg-reconfigure nvidia-185-kernel-source
Removing all DKMS Modules
Done.
Loading new nvidia-185.18.36 DKMS files...
Building for architecture x86_64
Module build for the currently running kernel was skipped since the
kernel source for this kernel does not seem to be installed.
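In hindsight, a quick check for a kernel build tree would have pointed at the real problem sooner. A minimal sketch, assuming the usual Ubuntu layout where the headers package makes /lib/modules/<release>/build available (the default place DKMS looks); have_kernel_headers is a made-up helper name, not part of DKMS:

```shell
#!/bin/sh
# have_kernel_headers: hypothetical helper, not part of DKMS.
# Succeeds if a kernel build tree (headers) exists for the given release.
# Assumption: headers are reachable via <modroot>/<release>/build,
# which is where DKMS looks by default on Ubuntu.
have_kernel_headers() {
    release="${1:-$(uname -r)}"    # default: the running kernel
    modroot="${2:-/lib/modules}"   # overridable, e.g. for testing
    [ -d "$modroot/$release/build" ]   # -d follows the build symlink
}
```

Something like `have_kernel_headers || echo "no headers for $(uname -r)"` says directly what is missing, instead of talking about kernel source.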

I need the kernel source, eh? Why the hell is that not a dependency, if the driver package is useless without it? Anyway, let's install the kernel source then:

$ uname -r
2.6.31-19-generic
$ sudo apt-get install linux-source-2.6.31

It installs fine, but makes no difference. Turns out that dpkg-reconfigure was lying: I just needed the headers. Here we go:

$ sudo apt-get install linux-headers-2.6.31-19-generic
...
$ sudo dpkg-reconfigure nvidia-185-kernel-source
Removing all DKMS Modules
Done.
Loading new nvidia-185.18.36 DKMS files...
Building for architecture x86_64
Building initial module for 2.6.31-19-generic
Done.

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/2.6.31-19-generic/updates/dkms/

depmod......

DKMS: install Completed.
$ modprobe nvidia
$
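The same fix can be written so the version numbers don't have to be typed by hand. This is a sketch under a few assumptions: Ubuntu's linux-headers-<release> package naming, and the package and module names from this post (nvidia-185-kernel-source and nvidia), which later releases replaced with nvidia-current. headers_pkg and rebuild_nvidia are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers; package and module names are assumptions taken
# from this post (Karmic, nvidia-185). Adjust them for your release.

# Print the headers package matching a kernel release (default: running).
headers_pkg() {
    echo "linux-headers-${1:-$(uname -r)}"
}

# Install the matching headers, rebuild the DKMS module, then load it.
# Not invoked automatically; call it yourself, e.g.
#   rebuild_nvidia nvidia-current nvidia-current
rebuild_nvidia() {
    driver_pkg="${1:-nvidia-185-kernel-source}"
    module="${2:-nvidia}"
    sudo apt-get install "$(headers_pkg)" &&
        sudo dpkg-reconfigure "$driver_pkg" &&
        sudo modprobe "$module"
}
```

Deriving the package name from uname -r also picks up the kernel flavour (generic, generic-pae and so on), so the headers installed always match the kernel that is actually running.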

That's better.

Several bug reports describe similar problems, but the way this is currently handled is terribly inadequate. The driver package should pull in the kernel headers if it needs them. There was no warning about this when the kernel was upgraded. There was no warning when the module failed to compile on boot. A fix for a problem with the same symptoms was released back in December; another one is in the upcoming Lucid release.

Oh yeah, I ended up rebooting my system. Whatever happened to Ctrl+Alt+Backspace? (Answer.)

Update, 2010-08-04: After another kernel upgrade, my display driver was hosed again. After hours of tinkering, I typed sudo dpkg-reconfigure nvidia-current and was greeted with the message gzip: stdout: No space left on device. Apparently, my /boot partition was full (of abandoned kernels). Something to check, for whoever runs into similar problems! Also, the kernel module appears to have been renamed from nvidia to nvidia-current.
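Since a full /boot fails in such an unhelpful way, a quick check is worth having. A minimal sketch, assuming a POSIX df (the -P flag keeps each filesystem on one line, with the usage percentage in the fifth column) and dpkg for listing installed kernel images; the function names and the 90% threshold are my own arbitrary choices:

```shell
#!/bin/sh
# Print how full a filesystem is, as a bare percentage (default: /boot).
fs_usage() {
    df -P "${1:-/boot}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if usage is at or above the threshold (default 90, arbitrary).
nearly_full() {
    [ "$(fs_usage "$1")" -ge "${2:-90}" ]
}

# List installed kernel image packages (excludes the linux-image-generic
# metapackage). Compare against uname -r before removing any of them.
installed_kernels() {
    dpkg -l 'linux-image-[0-9]*' | awk '/^ii/ { print $2 }'
}
```

If nearly_full succeeds, installed_kernels shows the candidates; removing an old image with sudo apt-get remove frees the space, as long as it is not the kernel uname -r reports.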

23 comments:

zippeurfou said...

Thank you so much!
You're the only one who helped me! :)
You should be the first Google result :)

Georg Muntingh said...

From the link: In Ubuntu, "AltGr" + "SysRq" + K is the new Ctrl + Alt + Backspace; something I was dying to know, but too lazy to look up. :)

wwwhizz said...

You can also add the following to /etc/X11/xorg.conf to restore the desired Ctrl+Alt+Backspace behaviour:

Section "ServerFlags"
Option "DontZap" "false"
EndSection

Thomas ten Cate said...

Yeah, but it'll be gone at the next upgrade. Or your config file won't be upgraded.

Don Doerner said...

I am at 2.6.31-21 and had the same problem. Unfortunately, the same solution (with '19' replaced by '21' of course) doesn't work.

Any idea what to try next?

Thomas ten Cate said...

Not without being at the keyboard of your system, sorry.

Anonymous said...

Many days after you posted this, it's still helpful. These steps helped me fix 2.6.32-26 to work again with the nvidia driver.

Anonymous said...

Thanks - a year later this helped me solve my problems with Kubuntu 10.10

Anonymous said...

THANK YOU.

Yannick said...

Helped me fix my problem with Kubuntu 10.04, kernel 2.6.32-28.

Thanks a lot, this is the only post I found that gives the right commands to fix this problem.

McPond Software said...

This can also happen if your kernel is held back.
The following packages have been kept back:
linux-generic linux-image-generic

So
sudo apt-get install linux-generic
will force it to upgrade, then the nvidia stuff will find the kernel sources and configure correctly.
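To spot a held-back kernel without upgrading anything, the output of a simulated upgrade (apt-get -s upgrade) can be scanned for the kept-back list. A rough sketch; it assumes apt-get's English output in the format quoted above, and kept_back is a hypothetical helper name:

```shell
#!/bin/sh
# kept_back: read apt-get upgrade output on stdin and print each
# package listed under "have been kept back:", one per line.
# Assumes English, un-localised apt-get output (the format may change).
kept_back() {
    awk '/have been kept back:/ { grab = 1; next }
         grab && /^ / { for (i = 1; i <= NF; i++) print $i; next }
         grab { exit }'
}
```

For example, LC_ALL=C apt-get -s upgrade | kept_back; if linux-generic shows up, installing it explicitly as described above brings the kernel up to date.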

Alexander Gorban said...

Thanks a lot! It helped me.

RezaRob said...

When you have booted into your new kernel, try:
./NVIDIA-Linux-x86-260.19.44.run -K
To generate a new kernel module.
--uninstall followed by a reinstall should also work, I guess.

RezaRob said...

And after that, don't forget to run nvidia-xconfig.

Billy said...

dood!!!!! thanks a million

saved me some troubleshooting

Unknown said...

This helped clue me in. The 'missing kernel sources' message is very misleading. In my case the generic headers were auto selected, but the upgrade had given me a -pae kernel and the -pae headers were not installed.

Reaver said...

Thanks! After the 11.10 upgrade this was the only thing that worked. It's bad that this issue still hasn't been addressed.

Anonymous said...

Thanks a lot for your information.

It helped me get my dual-display system up again after upgrading to Ubuntu 11.10.

I just had to replace your header version with mine for kernel 3.0.0.13 and I used nvidia-current.

felix said...

Thank you very much!

another felix said...

Cool man! After one restart the problem happened to me as well. I actually had no idea what I was doing, but I copied "sudo dpkg-reconfigure nvidia-current" into the terminal, the system started to compute, and after a restart I was able to start my Ubuntu 10.04 with the newest kernel! Nice!

Anonymous said...

Thank you for the great article. After spending a day on this problem, your article had it solved in 10 minutes. Thanks again!