[Bug 481794] New: CUPS print job stalled after suspend/resume
https://bugzilla.novell.com/show_bug.cgi?id=481794

Summary: CUPS print job stalled after suspend/resume
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: i686
OS/Version: openSUSE 11.1
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Printing
AssignedTo: jsmeix@novell.com
ReportedBy: Ulrich.Windl@rz.uni-regensburg.de
QAContact: jsmeix@novell.com
Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20081213 SUSE/1.1.14-1.1 SeaMonkey/1.1.14

The computer was suspended to disk while a long print job to a remote network printer was in progress. After resuming, the print job is still displayed as active, but no pages are actually printed. No CUPS errors are reported, and no TCP network connections are visible. However, a "socket:" process consumes a lot of CPU for a long time. This is what that process does:

[...]
read(5, "", 1024) = 0
select(6, [5], [5], NULL, NULL) = 1 (in [5])
read(5, "", 1024) = 0
select(6, [5], [5], NULL, NULL) = 1 (in [5])
read(5, "", 1024) = 0
[...]

Reproducible: Didn't try

Steps to Reproduce:
1. Print a lengthy job.
2. Suspend the computer to disk (for several hours) while the remote printer is still on.
3. Resume from suspend.

Actual Results:
The print job is displayed as active, but doesn't actually make progress. No errors are reported by CUPS.

The CUPS page log indicates some progress, but the pages aren't actually printed:

LJ4250 wiu09524 249 [03/Mar/2009:12:56:59 +0100] 62 1 - localhost
LJ4250 wiu09524 249 [03/Mar/2009:12:56:59 +0100] 63 1 - localhost
LJ4250 wiu09524 249 [04/Mar/2009:07:57:43 +0100] 64 1 - localhost
LJ4250 wiu09524 249 [04/Mar/2009:07:57:43 +0100] 65 1 - localhost

Expected Results:
In order of preference, one of:
1) The print job is resumed with the next page
2) The whole print job is restarted
3) The printer is disabled and an error is reported
4) An error is reported

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
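To see how far a job got according to the page log, one can keep the last page number logged for each job. The following is a minimal sketch, assuming the field layout visible in the excerpt above (printer, user, job ID, a bracketed timestamp that splits into two whitespace-separated fields, page number, copies, billing, host) and the usual log location /var/log/cups/page_log; the helper name last_page_per_job is hypothetical. As this report shows, these entries reflect pages the filters passed on, not pages that actually came out of the printer.

```shell
#!/bin/sh
# Hypothetical helper: read page_log lines on stdin and print
# "JOB-ID LAST-PAGE" for each job, in order of first appearance.
# Field positions follow the excerpt in this report (an assumption):
#   $1 printer  $2 user  $3 job-id  $4-$5 [timestamp]  $6 page  $7 copies ...
last_page_per_job() {
    awk '
        !($3 in last) { order[++n] = $3 }   # remember first-seen order of job IDs
        { last[$3] = $6 }                   # keep the most recent page number
        END { for (i = 1; i <= n; i++) print order[i], last[order[i]] }
    '
}

# Typical use: last_page_per_job < /var/log/cups/page_log
```

For the excerpt above this prints "249 65"; after a restart of the job, the page numbers start again at 1, so the reported value drops accordingly.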
https://bugzilla.novell.com/show_bug.cgi?id=481794

User jsmeix@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c1

Johannes Meixner <jsmeix@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Johannes Meixner <jsmeix@novell.com> 2009-03-04 03:03:41 MST ---
Disrupting an active print job this way can cause arbitrary results, depending on the particular CUPS backend (usb, parallel, socket, lpd, ipp, ...) and on the particular state the backend and the filters are in when the suspend hits them.

A disrupted print job usually cannot be resumed at all, because it is usually not possible to reset the printer to the exact state it was in when the disruption happened.

In this particular case a resume does not make sense, because it would mean blocking the printer until the computer is resumed (i.e. one suspended client system would block the whole printer until that particular client system is resumed).

The default ErrorPolicy in cupsd.conf is "stop-printer", which results in the queue being disabled; see http://localhost:631/help/ref-cupsd-conf.html
You should try out whether another ErrorPolicy works better. For your particular case I would try "abort-job".

I don't think that disrupting an active print job this way is supported at all by the current CUPS version, so I am closing the report as invalid for the current CUPS version.

I assume that currently all you can do is wait until the print job has been sent to a remote queue, or until locally active print jobs are finished.

I will ask on cups@easysw.com how suspend and resume is supposed to work with CUPS.
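Trying the suggested "abort-job" policy means changing the ErrorPolicy directive in cupsd.conf. Below is a minimal sketch of a helper that does this, assuming the file is /etc/cups/cupsd.conf (the usual location) and that the accepted values are "abort-job", "retry-job" and "stop-printer"; the helper name set_error_policy is hypothetical. It rewrites an active ErrorPolicy line, or appends one if none exists.

```shell
#!/bin/sh
# Hypothetical helper: set the ErrorPolicy directive in a cupsd.conf-style file.
# Usage: set_error_policy FILE POLICY
set_error_policy() {
    local conf="$1" policy="$2"
    # Accept only the policy names assumed to be valid for this CUPS generation.
    case "$policy" in
        abort-job|retry-job|stop-printer) ;;
        *) echo "unsupported ErrorPolicy: $policy" >&2; return 1 ;;
    esac
    if grep -Eq '^[[:space:]]*ErrorPolicy[[:space:]]' "$conf"; then
        # Replace the active (uncommented) ErrorPolicy line in place (GNU sed -i).
        sed -i -E "s/^[[:space:]]*ErrorPolicy[[:space:]].*/ErrorPolicy $policy/" "$conf"
    else
        # No active directive yet: append one at the end of the file.
        printf 'ErrorPolicy %s\n' "$policy" >> "$conf"
    fi
}
```

Usage: set_error_policy /etc/cups/cupsd.conf abort-job, then restart cupsd (e.g. rccups restart on openSUSE). The same policy can also be set for a single queue with lpadmin -p LJ4250 -o printer-error-policy=abort-job. Keep in mind that the policy only comes into play when the backend reports a failure; a backend that keeps looping, as in this report, would never trigger it.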
https://bugzilla.novell.com/show_bug.cgi?id=481794

User jsmeix@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c2

--- Comment #2 from Johannes Meixner <jsmeix@novell.com> 2009-03-04 04:22:56 MST ---
I asked on cups@easysw.com, see
http://www.cups.org/newsgroups.php?gcups.general+T+Q"Suspend+and+resume+for+active+print+jobs"
https://bugzilla.novell.com/show_bug.cgi?id=481794

User jsmeix@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c3

--- Comment #3 from Johannes Meixner <jsmeix@novell.com> 2009-03-04 04:54:05 MST ---
I was told by our "suspend and resume experts" that

- suspend and resume should be totally transparent for running processes, so the filters and the backend process should not notice at all that there was a suspend and resume (the kernel "freezes" the processes and continues them exactly where they were stopped)

- suspend and resume for network connections is basically as if the network cable was unplugged and re-plugged

Therefore, according to the above, the only point of failure seems to be that the socket backend may not handle it gracefully when the network cable is unplugged and re-plugged.

Ulrich, could you please test the following:

1. How does suspend and resume during an active print job work when the printer is not connected via network but e.g. via USB?

2. What happens during an active print job when the printer is connected via network (and you use the socket backend) if you unplug the network cable from the printer, wait a few minutes, and re-plug it?
https://bugzilla.novell.com/show_bug.cgi?id=481794

User Ulrich.Windl@rz.uni-regensburg.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c4

Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #4 from Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> 2009-03-04 07:32:21 MST ---
Regarding comment #1:
I'd expect that after a long suspend any TCP connection will either time out (if keepalive is active) or be resumed. In the first case I'd expect the program that tries to send over the TCP connection to detect that the connection is gone. I think we are talking about the first case here.

Also, in PostScript, especially with EPS, you can very well resume a job at the page where the job had a problem. For other printer technologies things may be more difficult.

On whether this is "supported": I happened to start the suspend at the same moment the print job arrived. Bad things just happen; they don't ask for permission ;-)

Regarding comment #3:
Depending on the length of the "network interruption", the connection may or may not still be valid after resume. I wouldn't rely on a valid connection after resume, but I would rely on errors being detected, as those could happen even without a suspend (as you stated).

I don't have a USB printer to test with, and I think that case is rather unrelated to the issue. Also, unplugging the network cable locally makes the NIC raise a "link is down" interrupt, so that's a different case as well.

To get back on topic: my print job is still displayed as being printed, while the process

socket://lpdvm003.klinik.uni-regensburg.de:9100 249 wiu09524 uguide 1 InputSlot=Tray_2 PageSize=Letter job-uuid=urn:uuid:f622b550-2492-3129-5e24-6b982ed5d96a

is still consuming CPU. Not surprisingly, that process is still doing:

[...]
read(5, "", 1024) = 0
select(6, [5], [5], NULL, NULL) = 1 (in [5])
read(5, "", 1024) = 0
select(6, [5], [5], NULL, NULL) = 1 (in [5])
read(5, "", 1024) = 0
select(6, [5], [5], NULL, NULL) = 1 (in [5])
read(5, "", 1024) = 0
[...]

Unfortunately, that process also ignores SIGTERM, just like the process

LJ4250 249 wiu09524 uguide 1 InputSlot=Tray_2 PageSize=Letter job-uuid=urn:uuid:f622b550-2492-3129-5e24-6b982ed5d96a /var/spool/cups/d00249-001

However, an "rccups stop" followed by "rccups start" cleaned up the mess (restarting the job from the beginning):

[...]
LJ4250 wiu09524 249 [04/Mar/2009:07:57:43 +0100] 65 1 - localhost
LJ4250 wiu09524 249 [04/Mar/2009:07:57:43 +0100] 66 1 - localhost
LJ4250 wiu09524 249 [04/Mar/2009:15:29:11 +0100] 1 1 - localhost
LJ4250 wiu09524 249 [04/Mar/2009:15:29:11 +0100] 2 1 - localhost
LJ4250 wiu09524 249 [04/Mar/2009:15:29:11 +0100] 3 1 - localhost
[...]

The job has completed now.
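CUPS starts each backend with the device URI as argv[0], followed by the job ID, user, title, copies and options, which is why the stuck process above shows up as "socket://lpdvm003...:9100 249 ...". Assuming that layout, here is a minimal sketch for spotting such socket backends and their CPU usage from ps output; the helper name list_socket_backends is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -eo pid=,pcpu=,args=` output on stdin and
# print "PID %CPU JOB-ID" for every process whose argv[0] is a socket:// URI.
# Fields: $1 pid, $2 %cpu, $3 argv[0] (device URI), $4 argv[1] (job ID).
list_socket_backends() {
    awk '$3 ~ /^socket:\/\// { print $1, $2, $4 }'
}

# Typical use: ps -eo pid=,pcpu=,args= | list_socket_backends
```

A PID found this way can then be inspected with strace -p PID, which shows a read()/select() loop like the one quoted above when the backend is spinning.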
https://bugzilla.novell.com/show_bug.cgi?id=481794

User jsmeix@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c5

Johannes Meixner <jsmeix@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |UPSTREAM

--- Comment #5 from Johannes Meixner <jsmeix@novell.com> 2009-03-25 00:48:12 MST ---
See Michael Sweet's answer on cups@easysw.com at
http://www.cups.org/newsgroups.php?gcups.general+T+Q"Suspend+and+resume+for+active+print+jobs"
----------------------------------------------------------------------------
Is there any experience how suspend and resume should work for active print jobs?
On Mac OS X, yes - we actually watch for power/suspend events and act accordingly. There is supposed to be some DBUS stuff for this, but as of yet we've not received documentation or patches to use it.

Basically, we need to know the system is about to suspect so we can stop any active jobs - if the interface allows it, we may even delay suspend to allow an active job to finish...
----------------------------------------------------------------------------
(There is a typo: "suspect" should read "suspend".)

This shows that it cannot be fixed by us (i.e. Novell/SUSE) alone, but only upstream (CUPS together with DBUS and probably the kernel), so I am closing this bug report accordingly.
https://bugzilla.novell.com/show_bug.cgi?id=481794

User jsmeix@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c6

Johannes Meixner <jsmeix@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seife@novell.com

--- Comment #6 from Johannes Meixner <jsmeix@novell.com> 2009-03-25 04:33:20 MST ---
Meanwhile I learned that we already have a way to notify CUPS of power/suspend events.

This is provided by the pm-utils package, in particular by the scripts in /usr/lib/pm-utils/sleep.d/. They are called in alphabetical order during suspend with the parameter "hibernate" or "suspend", and in reverse alphabetical order during resume with the parameter "thaw" or "resume". See /usr/lib/pm-utils/sleep.d/06autofs for a simple example, and have a look at the documentation in /usr/share/doc/packages/pm-utils/README

Ulrich, please test whether the following works:

1. cp -p /usr/lib/pm-utils/sleep.d/06autofs /usr/lib/pm-utils/sleep.d/07cupsd

2. Modify /usr/lib/pm-utils/sleep.d/07cupsd so that its content is
-----------------------------------------------------------------------
#!/bin/bash

# Load the pm-utils helper functions (stopservice, restartservice, ...)
. /usr/lib/pm-utils/functions

case "$1" in
    hibernate|suspend)
        stopservice cups
        sleep 3
        ;;
    thaw|resume)
        restartservice cups
        ;;
    *)
        ;;
esac

exit $?
-----------------------------------------------------------------------
Note the simple unconditional "sleep 3", which gives cupsd at least three seconds to terminate active print jobs. Of course such an unconditional sleep delays the suspend in every case, but here it is only for a simple first test to find out whether the general idea works at all.

3. Try out what happens to an active print job when you suspend and resume.

Background information:

In general it is not possible to correctly resume an active print job for a network printer, because during the suspend the established TCP connection to the network printer is lost in some way (it is not actively and properly terminated). After some time the network printer will then do whatever its implementation does when communication over the TCP connection suddenly stops. Usually the printer will eventually abort the printout (typically somewhere in the middle of a page) and fall back to its initial "reset/ready" state, in which it waits for a new print job to arrive.

Therefore the only generally working solution is to also stop the active print job during suspend, which is done above simply via a full "stopservice cups". Cancelling only the currently active job would not work in general, because a subsequent job could become active immediately after the first one was cancelled.

Accordingly, a full "restartservice cups" is needed during resume. It re-prints all jobs that were stopped by "stopservice cups" from the beginning. This is the only generally working solution, because by then the printers will usually be in a "reset/ready" state in which only jobs sent from the beginning print out correctly.
https://bugzilla.novell.com/show_bug.cgi?id=481794

User Ulrich.Windl@rz.uni-regensburg.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c7

--- Comment #7 from Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> 2009-03-25 07:49:40 MST ---
(In reply to comment #6)

> /usr/lib/pm-utils/sleep.d/

Shouldn't that rather be located in /var or /etc, because /usr might be read-only? The files in the pm-utils package are not flagged as "configuration files", BTW...

While I'm trying your proposal, let me note:

> case "$1" in
>     hibernate|suspend)
>         stopservice cups
>         sleep 3

As long as we lack an equivalent of "[cups]disable -a" (disable all printers) and "[cups]enable -a" (enable all printers, or better, only those that were disabled before) here, this solution might be preferable to "sleep 3":

while /usr/bin/lpstat >/dev/null 2>&1
do
    #echo "Server is still running -- waiting"
    sleep 1
done

(When there are no jobs queued or printing, the command prints nothing; if the server is down, it returns an error ("lpstat: Verbindung zum Server nicht möglich", i.e. "lpstat: unable to connect to server", in a German locale); otherwise it prints a list of print jobs.)

Your "sleep 3" will usually return a success code even if the stop failed, BTW.

>         ;;
>     thaw|resume)
>         restartservice cups
>         ;;
>     *)
>         ;;
> esac
> exit $?
https://bugzilla.novell.com/show_bug.cgi?id=481794

User jsmeix@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c8

--- Comment #8 from Johannes Meixner <jsmeix@novell.com> 2009-03-25 08:18:48 MST ---
Use

for i in $( seq 10 ) ; do lpstat -r || break ; sleep 1 ; done

to avoid an endless wait loop.

cupsdisable may not work in every case, because it may let the currently active print job finish (a quick test here indicates this, but it may be that the whole print job data had already arrived at the printer).
https://bugzilla.novell.com/show_bug.cgi?id=481794

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=481794#c9

--- Comment #9 from Stefan Seyfried <seife@novell.com> 2009-03-25 08:40:35 MST ---
The return code of the hook does not matter; it's ignored anyway :-)

And yes, user-supplied hooks belong in /etc/pm/sleep.d/
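Combining comments #6 to #9, a user-supplied hook might look like the sketch below, saved under /etc/pm/sleep.d/ (the file name 07cupsd, the 10-second limit and the function names are assumptions for illustration; this is not a recipe tested in this report). It replaces the fixed "sleep 3" with a bounded wait. Unlike comment #8, it polls for the cupsd process with pgrep -x cupsd instead of lpstat -r, so it depends neither on lpstat's exit status nor on its localized output; that probe is a swapped-in technique, not one used in this thread.

```shell
#!/bin/bash
# Hypothetical user hook, e.g. /etc/pm/sleep.d/07cupsd (file name is an assumption).
# pm-utils passes "hibernate"/"suspend" before sleeping and "thaw"/"resume" after.

# stopservice/restartservice come from the pm-utils helper library (see comment #6).
if [ -r /usr/lib/pm-utils/functions ]; then
    . /usr/lib/pm-utils/functions
fi

# Wait until no cupsd process is left, polling once per second for at most
# $1 seconds (default 10). Returns 0 once cupsd is gone, 1 on timeout.
wait_for_cupsd_exit() {
    local tries="${1:-10}" i=0
    while [ "$i" -lt "$tries" ]; do
        pgrep -x cupsd >/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

cups_sleep_hook() {
    case "$1" in
        hibernate|suspend)
            stopservice cups
            wait_for_cupsd_exit 10 || echo "cupsd still running after 10 seconds" >&2
            ;;
        thaw|resume)
            restartservice cups
            ;;
        *)
            ;;
    esac
}

# Per comment #9, pm-utils ignores the hook's exit status, so none is set explicitly.
cups_sleep_hook "${1:-}"
```

The bound keeps a cupsd that refuses to stop from delaying the suspend indefinitely, which was the concern behind comment #8; on resume, the full restart re-prints stopped jobs from the beginning, as explained in comment #6.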