[opensuse] Long standing serial port issue in openSuse
I have a C application that accesses a GPS receiver over a serial port. Over the years, using Linux (SUSE variants back a long way), I have experienced a problem that has never gone away. Although it appears on the surface to be a programming question, I think it is really a kernel / serial port issue. As I see it in openSUSE, I am starting my quest for a solution here.

The problem is that, after a variable amount of time, reads of the serial port stop blocking (as the port has been configured to do) when there is no data, and start returning immediately with errno set to EAGAIN. This can happen after 5 minutes of running the app, or after 5 hours. There is no discernible pattern.

The code in the application is childishly simple:

1. Open the serial port in the C application, controlling it via termio().
2. Do a blocking read on the port.
3. When data arrives, write it to a disk file.
4. Go to step 2.

We have done variations on this theme. For example, setting the serial port to send SIGIO when there is data, and not doing a blocking read. It works great for a while. But suddenly the kernel starts sending SIGIO over and over, even when there is no data. This implies that at some lower level, the kernel thinks there is data to provide, even though the subsequent read will disagree.

I would think that, since PPP is using the serial port, Linux's serial port code would be bulletproof. But I also think that such stability is perhaps present only when the serial port is used as PPP uses it. For example, I would think that PPP data is of known sizes (headers of known size that report data sizes to be read), and may not do reads of arbitrary size. Also, PPP writes as well as reads. Our app never writes to the port.

I can provide code examples. I do not think they will indicate anything unusual.

Anyone else using the serial port to record data over long periods of time?

If it is not a kernel issue, is it possible that there is some application that runs (X? something else?) that wakes up and interferes with the serial port? No serial port specific things seem to be changed. Baud rate and flow control remain unchanged. But maybe just opening it resets the blocking mode for all users?

We do not use PPP and, AFAIK, no applications that use the serial port are enabled. Could there be any apps enabled in a rather standard SUSE install that may occasionally interfere with the serial port?

--
Roger Oberholtzer
OPQ Systems / Ramböll RST
Ramböll Sverige AB
Kapellgränd 7
P.O. Box 4205
SE-102 65 Stockholm, Sweden
Office: Int +46 8-615 60 20
Mobile: Int +46 70-815 1696

And remember: It is RSofT and there is always something under construction. It is like talking about large city with all constructions finished. Not impossible, but very unlikely.

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
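The four steps listed in the message above can be sketched in C roughly as follows. This is only an illustration, not Roger's actual code: it assumes the POSIX termios interface, 8N1 framing, and a caller-supplied device path and baud rate, and the function names `open_gps_port` and `log_lines` are made up for the example. The port is opened with O_NONBLOCK (so that open() does not wait for carrier) and the flag is cleared again straight away so that reads block.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Open a serial port for blocking, line-at-a-time (canonical) reads.
 * Path, baud rate and 8N1 framing are assumptions for illustration.
 * Returns the file descriptor, or -1 on error. */
int open_gps_port(const char *path, speed_t baud)
{
    int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Clear O_NONBLOCK again: from here on read() should block. */
    int flags = fcntl(fd, F_GETFL);
    struct termios tio;
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ||
        tcgetattr(fd, &tio) < 0) {
        close(fd);
        return -1;
    }

    tio.c_lflag |= ICANON;                      /* return whole lines      */
    tio.c_lflag &= ~(ECHO | ECHOE | ISIG);      /* no echo, no signals     */
    tio.c_iflag &= ~(IXON | IXOFF | IXANY);     /* no software flow ctrl   */
    tio.c_iflag &= ~(ICRNL | INLCR | IGNCR);    /* pass CR/LF untouched    */
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);  /* 8 data, no parity, 1 stop */
    tio.c_cflag |= CS8 | CLOCAL | CREAD;        /* ignore modem lines      */
    cfsetispeed(&tio, baud);
    cfsetospeed(&tio, baud);

    if (tcsetattr(fd, TCSANOW, &tio) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Steps 2-4: block in read(), append each chunk to out_fd, repeat.
 * Returns the number of bytes logged at end of input, or -1 on error. */
long log_lines(int port_fd, int out_fd)
{
    char line[256];
    long total = 0;

    for (;;) {
        ssize_t n = read(port_fd, line, sizeof line);
        if (n > 0) {
            if (write(out_fd, line, (size_t)n) != n)
                return -1;
            total += n;
        } else if (n == 0) {
            return total;           /* end of file or hangup */
        } else if (errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else {
            perror("read");         /* EAGAIN here is the reported symptom */
            return -1;
        }
    }
}
```

A caller would do something like `int fd = open_gps_port("/dev/ttyS0", B9600);` followed by `log_lines(fd, log_fd);`, where the device name and speed are again placeholders.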
Roger Oberholtzer wrote:
> I have a C application that accesses a GPS receiver over a serial port. Over the years, using Linux (SUSE variants back a long way), I have experienced a problem that has never gone away. Although it appears on the surface to be a programming question, I think it is really a kernel / serial port issue. As I see it in openSUSE, I am starting my quest for a solution here.
> The problem is that, after a variable amount of time, reads of the serial port stop blocking (as the port has been configured to do) when there is no data, and start returning immediately with errno set to EAGAIN. This can happen after 5 minutes of running the app, or after 5 hours. There is no discernible pattern.
> The code in the application is childishly simple:
> 1. Open the serial port in the C application, controlling it via termio().
> 2. Do a blocking read on the port.
> 3. When data arrives, write it to a disk file.
> 4. Go to step 2.
> Anyone else using the serial port to record data over long periods of time?

Hi Roger

we have two applications that both read/write to/from a serial port. One application reads temperature measurements and writes commands, the other is a standard NTP reading a DCF77 signal from the serial port. The first application is on SUSE 9.0, the second is on openSUSE 10.2. They've both been running for a couple of years, and neither has ever shown any problems.

Your pseudo-code looks pretty much like mine (for the temp controller), except I write a command to the serial port before I read the temperature.

/Per Jessen, Zürich
On Wed, 2008-08-27 at 12:14 +0200, Per Jessen wrote:
> Roger Oberholtzer wrote:
> > I have a C application that accesses a GPS receiver over a serial port. Over the years, using Linux (SUSE variants back a long way), I have experienced a problem that has never gone away. Although it appears on the surface to be a programming question, I think it is really a kernel / serial port issue. As I see it in openSUSE, I am starting my quest for a solution here.
> > The problem is that, after a variable amount of time, reads of the serial port stop blocking (as the port has been configured to do) when there is no data, and start returning immediately with errno set to EAGAIN. This can happen after 5 minutes of running the app, or after 5 hours. There is no discernible pattern.
> > The code in the application is childishly simple:
> > 1. Open the serial port in the C application, controlling it via termio().
> > 2. Do a blocking read on the port.
> > 3. When data arrives, write it to a disk file.
> > 4. Go to step 2.
> > Anyone else using the serial port to record data over long periods of time?
> Hi Roger
> we have two applications that both read/write to/from a serial port. One application reads temperature measurements and writes commands, the other is a standard NTP reading a DCF77 signal from the serial port. The first application is on SUSE 9.0, the second is on openSUSE 10.2. They've both been running for a couple of years, and neither has ever shown any problems.

OOC, do you run X on these systems? Sometimes YAST/SAX seems to put lines like

    InputDevices "/dev/ttyS0"

in the X config file. It does this for 10 or so serial devices. Or it puts in

    InputDevices "/dev/input/mice"

Where are the potential devices for /dev/input/mice defined?

--
Roger Oberholtzer
Roger Oberholtzer wrote:
> > we have two applications that both read/write to/from a serial port. One application reads temperature measurements and writes commands, the other is a standard NTP reading a DCF77 signal from the serial port. The first application is on SUSE 9.0, the second is on openSUSE 10.2. They've both been running for a couple of years, and neither has ever shown any problems.
> OOC, do you run X on these systems?

Nope, no X.

/Per Jessen, Zürich
On Wed, 2008-08-27 at 14:00 +0200, Per Jessen wrote:
> Roger Oberholtzer wrote:
> > > we have two applications that both read/write to/from a serial port. One application reads temperature measurements and writes commands, the other is a standard NTP reading a DCF77 signal from the serial port. The first application is on SUSE 9.0, the second is on openSUSE 10.2. They've both been running for a couple of years, and neither has ever shown any problems.
> > OOC, do you run X on these systems?
> Nope, no X.

We run X. The GPS is recorded as part of a measurement in a moving vehicle, and it can steer location info during data collection.

Maybe the X server is part of my problem. (Of course, I do not want to rule out anything else.) I remember back when you had to restart X if you added an input device. If it was not found at X's start, it was not available. More recently, X finds these devices as they become available. Which is really nice. Does X ever re-check the serial ports after it starts? In my use, X is never restarted when this problem occurs.

Or maybe it is something in how /dev/input/mice is implemented. What does it consider to be a potential mouse?

--
Roger Oberholtzer
Roger Oberholtzer wrote:
> On Wed, 2008-08-27 at 14:00 +0200, Per Jessen wrote:
> > Roger Oberholtzer wrote:
> > > > we have two applications that both read/write to/from a serial port. One application reads temperature measurements and writes commands, the other is a standard NTP reading a DCF77 signal from the serial port. The first application is on SUSE 9.0, the second is on openSUSE 10.2. They've both been running for a couple of years, and neither has ever shown any problems.
> > > OOC, do you run X on these systems?
> > Nope, no X.
> We run X. The GPS is recorded as part of a measurement in a moving vehicle, and it can steer location info during data collection.
> Maybe the X server is part of my problem. (Of course, I do not want to rule out anything else.) I remember back when you had to restart X if you added an input device. If it was not found at X's start, it was not available. More recently, X finds these devices as they become available. Which is really nice. Does X ever re-check the serial ports after it starts? In my use, X is never restarted when this problem occurs.
> Or maybe it is something in how /dev/input/mice is implemented. What does it consider to be a potential mouse?

Sorry, I can't answer that but I'd be interested to know too :)

I have a couple of thoughts though. You seem quite focussed on the possibility that the problem is related to X - is there a particular reason for that?

Can you set up a test system without X - or with X remoted to a different physical machine - to see if the problem goes away?

One other difference Per mentioned is that his applications write to the serial port. Is there any way to try that on your system in case there's some hardware or driver glitch?

When the problem occurs, are you able to check the status of the serial port? Either from within your application or by looking in /proc etc. Has the port somehow been set to non-blocking?

Cheers, Dave
On Wed, 2008-08-27 at 14:29 +0100, Dave Howorth wrote:
> Sorry, I can't answer that but I'd be interested to know too :)
> I have a couple of thoughts though. You seem quite focussed on the possibility that the problem is related to X - is there a particular reason for that?

Not really. I am just thinking of any apps that may fiddle with the serial port while my app is running. I don't want to rule that out in favor of some kernel thing.

> Can you set up a test system without X - or with X remoted to a different physical machine - to see if the problem goes away?

I am setting up something like that now. Not there yet. I am running my app remotely, but I need to remove the X running on the actual machine with the serial port in question.

> One other difference Per mentioned is that his applications write to the serial port. Is there any way to try that on your system in case there's some hardware or driver glitch?

This happens on different hardware. So I feel rather sure it is not a hardware glitch. Leaving, possibly, a driver glitch.

> When the problem occurs, are you able to check the status of the serial port? Either from within your application or by looking in /proc etc. Has the port somehow been set to non-blocking?

I would love to know how. The serial settings (baud rate and all) can, I think, be determined. But blocking mode is a generic file descriptor setting that I do not think is reported in /proc. I am adding a trace to see if the flag is getting changed when the problem occurs. No results yet.

One more detail: I run the serial port in canonical mode. So the read calls return only when there is a complete line of data. Which is how GPS NMEA records are reported. I am guessing that well tested apps like PPP do not run the serial port in this mode.

--
Roger Oberholtzer
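One way to check from within the application whether the serial settings were changed behind its back is to keep a snapshot of the termios state taken at start-up and compare it when the problem appears. Terminal attributes belong to the device rather than to one file descriptor, so a change made by any other process that opens the port would show up here. The following is a sketch under that assumption; the names `termios_diff`, `port_settings_changed` and the `DIFF_*` bits are invented for the example.

```c
#include <string.h>
#include <termios.h>

/* Bits reported by termios_diff(): which part of the settings differs. */
enum {
    DIFF_IFLAG = 1,   /* input modes (IXON, ICRNL, ...)     */
    DIFF_OFLAG = 2,   /* output modes                       */
    DIFF_CFLAG = 4,   /* control modes (CS8, CLOCAL, ...)   */
    DIFF_LFLAG = 8,   /* local modes (ICANON, ECHO, ...)    */
    DIFF_CC    = 16,  /* control characters (VMIN, VEOL...) */
    DIFF_SPEED = 32   /* input or output baud rate          */
};

/* Compare two termios structures field by field.
 * Returns 0 if they match, otherwise an OR of DIFF_* bits. */
int termios_diff(const struct termios *a, const struct termios *b)
{
    int d = 0;

    if (a->c_iflag != b->c_iflag) d |= DIFF_IFLAG;
    if (a->c_oflag != b->c_oflag) d |= DIFF_OFLAG;
    if (a->c_cflag != b->c_cflag) d |= DIFF_CFLAG;
    if (a->c_lflag != b->c_lflag) d |= DIFF_LFLAG;
    if (memcmp(a->c_cc, b->c_cc, sizeof a->c_cc) != 0) d |= DIFF_CC;
    if (cfgetispeed(a) != cfgetispeed(b) ||
        cfgetospeed(a) != cfgetospeed(b))
        d |= DIFF_SPEED;
    return d;
}

/* Compare the port's current settings with a snapshot taken earlier
 * with tcgetattr(). Returns the DIFF_* mask, or -1 if tcgetattr() fails. */
int port_settings_changed(int fd, const struct termios *snapshot)
{
    struct termios now;

    if (tcgetattr(fd, &now) < 0)
        return -1;
    return termios_diff(snapshot, &now);
}
```

Calling `port_settings_changed()` at the moment read() first returns EAGAIN, and logging a non-zero result, would show whether for example ICANON had been cleared (DIFF_LFLAG) by something else on the system.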
Roger Oberholtzer wrote:
> I would love to know how. The serial settings (baud rate and all) can, I think, be determined. But blocking mode is a generic file descriptor setting that I do not think is reported in /proc.

You can get/set it with fcntl().

> One more detail: I run the serial port in canonical mode. So the read calls return only when there is a complete line of data. Which is how GPS NMEA records are reported. I am guessing that well tested apps like PPP do not run the serial port in this mode.

I think I would take a look at how ntp does it - ntp has interface code for GPS receivers too, maybe they've worked out the right way to do it?

/Per Jessen, Zürich
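The fcntl() check suggested above could look something like this. It is a minimal sketch; the helper names `is_nonblocking`, `force_blocking` and `trace_blocking_state` are hypothetical and not taken from anyone's code in this thread.

```c
#include <fcntl.h>
#include <stdio.h>

/* Returns 1 if O_NONBLOCK is set on fd, 0 if it is clear,
 * and -1 if fcntl() itself fails (for example on a bad descriptor). */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Clears O_NONBLOCK so that read() blocks again. Returns 0 on success. */
int force_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ? -1 : 0;
}

/* Trace hook: meant to be called when read() fails with EAGAIN,
 * to record whether the descriptor really has become non-blocking. */
void trace_blocking_state(int fd)
{
    int state = is_nonblocking(fd);

    fprintf(stderr, "fd %d: O_NONBLOCK is %s\n", fd,
            state == 1 ? "set" : state == 0 ? "clear" : "unknown");
}
```

Two points are worth keeping in mind when reading the result. O_NONBLOCK is a file status flag that lives on the open file description, so it is shared with descriptors obtained through dup() or fork(), but not with a separate open() of the same device by another program. And on kernels that provide /proc/<pid>/fdinfo/<fd>, the `flags:` line there shows the same status flags in octal, which allows checking from outside the process.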
On Wed, 2008-08-27 at 16:17 +0200, Per Jessen wrote:
> Roger Oberholtzer wrote:
> > I would love to know how. The serial settings (baud rate and all) can, I think, be determined. But blocking mode is a generic file descriptor setting that I do not think is reported in /proc.
> You can get/set it with fcntl().

As I do. But I would like to see it in /proc as well so I can check it at will.

> > One more detail: I run the serial port in canonical mode. So the read calls return only when there is a complete line of data. Which is how GPS NMEA records are reported. I am guessing that well tested apps like PPP do not run the serial port in this mode.
> I think I would take a look at how ntp does it - ntp has interface code for GPS receivers too, maybe they've worked out the right way to do it?

Been there. Their code only reads from the serial port when appropriate, that is, whenever it wants to sync times. Otherwise, the serial port is not accessed and data is dropped. As such, any serial port oddity like this would go unnoticed. ntp is seldom in a read state. I, however, read all the time as I need all data all the time.

I have also considered whether a spurious error in the serial port data that happens to match a flow control character (like XON) could mess up the flow control, but I can't see how. As I save every character that arrives, I can see that no such character arrives. All data is nice ASCII 7-bit printable characters.

--
Roger Oberholtzer
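The flow control theory can be checked from two sides, sketched below with made-up helper names. One caveat on the reasoning above: per the POSIX terminal interface, when IXON is set the START and STOP characters are consumed by the line discipline and are not passed to read(), so their absence from the saved data only rules them out if IXON is clear on the port.

```c
#include <stddef.h>
#include <termios.h>

#define ASCII_XON  0x11   /* DC1, the usual START character */
#define ASCII_XOFF 0x13   /* DC3, the usual STOP character  */

/* Count XON/XOFF bytes in a block of logged data.
 * Only meaningful if the port was read with IXON disabled;
 * otherwise the tty layer swallows these bytes before read() sees them. */
size_t count_flow_control_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] == ASCII_XON || buf[i] == ASCII_XOFF)
            count++;
    return count;
}

/* Returns 1 if software flow control (IXON or IXOFF) is enabled on the
 * terminal behind fd, 0 if not, and -1 if tcgetattr() fails. */
int software_flow_control_enabled(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    return (tio.c_iflag & (IXON | IXOFF)) ? 1 : 0;
}
```

If `software_flow_control_enabled()` reports 1, clearing IXON, IXOFF and IXANY in `c_iflag` with tcsetattr() would take software flow control out of the picture for a test run.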
Roger Oberholtzer wrote:
> On Wed, 2008-08-27 at 16:17 +0200, Per Jessen wrote:
> > Roger Oberholtzer wrote:
> > > I would love to know how. The serial settings (baud rate and all) can, I think, be determined. But blocking mode is a generic file descriptor setting that I do not think is reported in /proc.
> > You can get/set it with fcntl().
> As I do. But I would like to see it in /proc as well so I can check it at will.

Ah, I see.

> > I think I would take a look at how ntp does it - ntp has interface code for GPS receivers too, maybe they've worked out the right way to do it?
> Been there. Their code only reads from the serial port when appropriate, that is, whenever it wants to sync times. Otherwise, the serial port is not accessed and data is dropped. As such, any serial port oddity like this would go unnoticed. ntp is seldom in a read state. I, however, read all the time as I need all data all the time.

That seems to be the main difference - my temp controller also only reads once I've issued the command. How about code from one of those utilities that convert serial to TCP? I can't remember any name right now, but such a utility would also have to read all the time.

/Per Jessen, Zürich
On Aug 27, 2008, at 6:21 PM, Per Jessen wrote:
> That seems to be the main difference - my temp controller also only reads once I've issued the command. How about code from one of those utilities that convert serial to TCP? I can't remember any name right now, but such a utility would also have to read all the time.

We make use of the GPS' pulse-per-second signal to stay tightly coupled with the GPS time. We are required to locate things with sub-meter accuracy in a vehicle traveling at 90 km/h. So part of our system is an inertial navigation system. The PPS signal is on one of the GPS control lines. We had considered a serial->USB converter. They do work fine with the GPS - but no PPS signal. It is always something, no?

BTW, Linux lets you monitor these line changes in a pleasant manner.

Roger Oberholtzer
Roger Oberholtzer wrote:
> inertial navigation system. The PPS signal is on one of the GPS control lines. We had considered a serial->USB converter. They do work fine with the GPS - but no PPS signal.

How do you get an accurate PPS off a serial line if you have to keep polling it? Can't you let it pull an interrupt line or something? (sorry, I didn't mean to start a redesign of your system).

/Per Jessen, Zürich
On Wed, 2008-08-27 at 20:46 +0200, Per Jessen wrote:
> Roger Oberholtzer wrote:
> > inertial navigation system. The PPS signal is on one of the GPS control lines. We had considered a serial->USB converter. They do work fine with the GPS - but no PPS signal.
> How do you get an accurate PPS off a serial line if you have to keep polling it? Can't you let it pull an interrupt line or something? (sorry, I didn't mean to start a redesign of your system).

The information is available via two different interfaces. The location data (NMEA strings with time) is received via read() and the PPS is via an ioctl() that waits for a change in the state of specified serial port lines before returning:

    ioctl(nmeaPort, TIOCMIWAIT, TIOCM_RI);

When this returns, the line in question (ring indicator, which is where Trimble receivers put the PPS signal) has just had a transition. Of course, there are process scheduling issues with this that still introduce error, but we are happy with how the Linux kernel is letting us know of the line transition.

The read() call lives in a thread (where my problem occurs) that does nothing but read the GPS and write it to a file. It also parses the string, saving the most recent ZDA record and seeing what the most recent PPS PC time was.

To put all this together, I read the NMEA strings as fast as they arrive (my problem code). We use 10 Hz receivers, and we only take the required NMEA strings, so there is no trouble with staying rather up to date reading. But, of course, there are numerous delays imposed on this NMEA data along the way. Some are in the GPS receiver, and so ultimately, we have no control over them.

One NMEA record (called ZDA) is sent at 1 Hz. The PPS signal tells exactly when the time in the ZDA record was determined in the GPS receiver. I need to record the PC time when this ioctl() returns. Combining this time with the time in the very next ZDA record lets me sync my PC clock with the GPS clock.

In our app, all times recorded are from the PC clock. Using the above method, we can tell what the corresponding GPS time is. This allows us to tell where we were at any given time, and thus locate all data in space. In fact, the GPS is combined with a high resolution distance measurement device and a couple of gyroscopes and inclinometers to improve location accuracy.

Our system gets a bit more complicated in that we have DSPs that are also recording low level transducer data, and their time is only synchronized with the PC time at less frequent points.

Geez, I guess I said more than you probably wanted to know...

--
Roger Oberholtzer
On Thu, 28 Aug 2008 16:26:59 Roger Oberholtzer wrote:
> On Wed, 2008-08-27 at 20:46 +0200, Per Jessen wrote:
> > Roger Oberholtzer wrote:
> > > inertial navigation system. The PPS signal is on one of the GPS control lines. We had considered a serial->USB converter. They do work fine with the GPS - but no PPS signal.
> > How do you get an accurate PPS off a serial line if you have to keep polling it? Can't you let it pull an interrupt line or something? (sorry, I didn't mean to start a redesign of your system).
> The information is available via two different interfaces. The location data (NMEA strings with time) is received via read() and the PPS is via an ioctl() that waits for a change in the state of specified serial port lines before returning:
>     ioctl(nmeaPort, TIOCMIWAIT, TIOCM_RI);
[...snip...]

Roger,

Can I suggest that you go to www.xastir.org and maybe correspond with some of the developers of the Xastir project? Xastir is a ham-radio mapping program that interfaces with GPS for the purposes of mobile tracking - it too uses the serial port for continuous monitoring of GPS NMEA strings. It also writes to the port to send positional beacons via a radio, but nevertheless they do have routines for handling the GPS that may be useful to you, if nothing else in giving you ideas on where to look when debugging your code.

I'm sure they wouldn't mind - after all, Xastir is open source, so why reinvent the wheel?

Regards,
--
===================================================
Rodney Baker VK5ZTV
rodney.baker@iinet.net.au
===================================================
Where there's a will, there's an Inheritance Tax.
Roger Oberholtzer wrote:
> Geez, I guess I said more than you probably wanted to know...

Yes, but it was an interesting story, thanks. I can't help much with your problem, although I still think one of those serial-to-tcp utilities must be doing something very similar, so it might be worth having a look at.

/Per Jessen, Zürich
participants (4)
- Dave Howorth
- Per Jessen
- Rodney Baker
- Roger Oberholtzer