Subject: (2.14) Solaris 2.x special needs |
---|
SOLARIS 2.5: Sun states that Solaris 2.5 no longer has the socket bug
(see Solaris Fix #7 below), but Dave Zavatson <dhzavatson@ucdavis.edu>
writes that the bug still exists. So if you see "Resource temporarily
unavailable" errors, you have to apply that fix.

Joe St Sauver <JOE@OREGON.UOREGON.EDU> submitted the following:

| Symptom: One of the topologically distant sites notices far lower than normal
| article throughput. Further investigation by the remote site (using netstat)
| identifies a large number of "completely duplicated packets" originating
| with the Solaris feed host.
| Resolution: The local Solaris 2.5 host had not applied Sun patches 103169-05
| ("ip driver and ifconfig fixes") and 103447-03 ("tcp patch") as can be
| obtained from ftp://sunsolve1.Sun.COM/pub/patches/patches.html
| (Solaris 2.5.1 users, see 103582-01 and 103630-01).
| Without these patches, when working with hosts that are topologically
| remote, TCP/IP throughput reportedly can drop to as little as 5% of
| what it should be.
| For further information, see: <199607140422.VAA04495@yorick.cygnus.com>
| quoting a 7 June 1996 article posted to comp.unix.solaris by Cathe A. Ray
| (Manager of Internet Engineering for Sun).
| Thanks to Howard Goldstein <hgoldste@bbs.mpcs.com> for the detective work in
| isolating and resolving this problem!

SOLARIS 2.4: Install the Recommended cluster patch from Sun.

The Recommended cluster patch is:
    ftp://sunsolve1.sun.com/pub/patches/2.4_Recommended.tar.Z
The README is:
    ftp://sunsolve1.sun.com/pub/patches/2.4_Recommended.README

Then follow the directions in
ftp://ftp.isc.org/isc/inn/unoff-patches/OLD/solaris-2.4.patch.
The patch needs to be applied BY HAND; it is not in the correct format to
work with Larry Wall's patch program. Also, do *not* link with the
/usr/ucblib stuff, and HAVE_WAITPID should be set to "DO".

On 3/25/95 Sun introduced patch 101945-23 which fixes bug #1178506 titled
"INN wounded after upgrade to SunOS 5.4".
This fixes the "cant read Resource temporarily unavailable" bug that some
have reported. Although the Sun patch mentions "1186224 socket select hangs
in NON-BLOCKED mode", this does not seem to be completely fixed.
Ian Dickinson <idickins@fore.com> doesn't notice it on his lightly loaded
server, but on heavily loaded machines it occurs occasionally (<5 times a
day). See below for a patch (Solaris Fix #7).

It seems that the latest version of the kernel patch for Sparc is 101945-36;
101945-29 is known to work. For x86 the latest version is 101946-29, which
has problems with Unix domain sockets, so 101946-12 seems to be the last
usable one here ...

Include /opt/SUNWspro/bin and /usr/bin in your path before /usr/ucb, as
/usr/ucb/sed does not work well.

SOLARIS 2.3: If you install the "Recommended cluster patch" I *think* you
will only need to pay attention to Fix #5 listed below. It would be helpful
if people sent an update about this.

The Recommended cluster patch is:
    ftp://sunsolve1.sun.com/pub/patches/2.3_Recommended.tar.Z
The README is:
    ftp://sunsolve1.sun.com/pub/patches/2.3_Recommended.README

(Note: If you trust other people to compile programs for you [especially
ones that run as root] you can get inn1.4sec pre-compiled w/gcc at
ccnews.ke.sanet.sk:/pub/solaris/inn1.4sec-src+bin.tar.gz)

INN works with Solaris 2.[0123]. It's not easy, but it will work. The
problem is that depending on which Solaris patches you have installed, you
have to install various INN patches. There are too many combinations of Sun
patches and INN patches to be able to say what is required and what isn't.
(See the "SOLARIS 2.3" tip above for one tried and tested configuration.)

Here is the general guide:

Step 1: Use the info for config.data for Solaris 2.x that is included in
        Install.ms.
Step 2: As you go, if you get any of the problems listed below, try the
        fix listed. Eventually you will be up and running with only the
        fixes you need.
If you try to install ALL the fixes at once, things will definitely not work.

COMPILER TIPS: Use gcc or /opt/SUNWspro/bin/cc. Do *not* use /usr/ucb/cc.
In fact, remove /usr/ucb from your path when you compile.

Directory structure: be careful about /var/news, as the news(1) tool also
writes in this area and might damage your files. (Need more input on this.)

The patch program supplied with Solaris 2.5 appears not to understand the
"new-style" context diffs which virtually everyone uses these days, so you
have to fetch GNU patch as described in part8 of this FAQ. Also, it doesn't
know the -p0 option; it wants "-p 0", and the file to patch has to be
writable.

----------
Solaris Fix #1

Under Solaris 2.[012] (SunOS 5.0, 5.1, 5.2) you must add the following at
the beginning of each file using gethostbyname():

    #define gethostbyname __switch_gethostbyname

Under Solaris 2.3 gethostbyname() might work without changes depending on
your configuration. We haven't figured out when it works and when it
doesn't. If you run into problems, try to change "gethostbyname()" to
"solaris_gethostbyname()" and then use the gethostbyname() listed in the
Solaris Porting FAQ.

This isn't a perfect solution, because you now need a different binary for
Solaris 2.[012] systems.

It also seems to be a good idea to put dns in front of nis in
/etc/nsswitch.conf:

    hosts: dns nis files

It would be great if someone were to submit a solaris_gethostbyname()
function whose binary works under all Solaris revs and gives all the
semantics of BSD gethostbyname(). In particular, one that doesn't have the
problems discussed in Sun bugid #1126573 or #1135988. It would be amazing if
this were submitted by one of the many Sun employees that flame the INN FAQ
maintainer in comp.sys.sun.admin every time he bitches about how much he
hates Solaris 2.x. :-)

----------
Solaris Fix #2

Under all Solaris 2.* versions there is a problem with innwatch.ctl.
It expects to use "df -i" to find out how many inodes are free on your
disk. /usr/{sbin,5bin,bin}/df doesn't support the "-i" option; it has a
"-e" option that outputs the info you want, but in a different format.
You should use "/usr/ucb/df -i" instead, since this version of df includes
the "-i" option.

If you have too much space left on your disks (;-)) you will see the
following:

    Filesystem             iused   ifree  %iused  Mounted on
    /dev/md/dsk/d10      103495213433720     7%   /var/spool/news

Because the iused and ifree columns run together, awk will print 7% as the
number of free inodes ...

Ian Dickinson <idickins@fore.com> wrote an inndf which can be found at the
usual place. This inndf compiled with gcc and -DHAVE_STATVFS seems to work
though (according to Nash E. Foster <nef10958@usln1b.glaxo.com>).
A newer version, which works with large filesystems, is available from
ftp://ftp.csv.warwick.ac.uk/pub/usenet/inn/inndf.tar.gz

If you have your news spool NFS mounted from another box, which is
absolutely not recommended (see INN FAQ #5.15, "ME cant nonblock"), then
the following might help:

    rsh other_box /usr/ucb/df -u /var/spool/news

/usr/ucb/df is part of the BSD Compatibility stuff. If you loaded Solaris
2.x without that, you can replace innwatch.ctl's disk checks with these
lines:

    ## If load is OK, check space (and inodes) on various filesystems
    ## =()<!!! /usr/bin/df -k . | awk 'NR == 2 { print $4 }' ! lt ! @<INNWATCH_SPOOLSPACE>@ ! throttle ! No space (spool)>()=
    !!! /usr/bin/df -k . | awk 'NR == 2 { print $4 }' ! lt ! 8000 ! throttle ! No space (spool)
    ## =()<!!! /usr/bin/df -k @<_PATH_BATCHDIR>@ | awk 'NR == 2 { print $4 }' ! lt ! @<INNWATCH_BATCHSPACE>@ ! throttle ! No space (newsq)>()=
    !!! /usr/bin/df -k /news2/spool/out.going | awk 'NR == 2 { print $4 }' ! lt ! 800 ! throttle ! No space (newsq)
    ## =()<!!! /usr/bin/df -k @<_PATH_NEWSLIB>@ | awk 'NR == 2 { print $4 }' ! lt ! @<INNWATCH_LIBSPACE>@ ! throttle ! No space (newslib)>()=
    !!! /usr/bin/df -k /news2/privcontrol | awk 'NR == 2 { print $4 }' ! lt ! 40000 ! throttle ! No space (newslib)
    ## =()<!!! /usr/bin/df -k @<_PATH_OVERVIEWDIR>@ | awk 'NR == 2 { print $4 }' ! lt ! @<INNWATCH_OVERVIEWSPACE>@ ! throttle ! No space (overview)>()=
    !!! /usr/bin/df -k /news3/overview | awk 'NR == 2 { print $4 }' ! lt ! 6000 ! throttle ! No space (overview)
    ## =()<!!! /usr/bin/df -e . | awk 'NR == 2 { print $2 }' ! lt ! @<INNWATCH_SPOOLNODES>@ ! throttle ! No space (spool inodes)>()=
    !!! /usr/bin/df -e . | awk 'NR == 2 { print $2 }' ! lt ! 200 ! throttle ! No space (spool inodes)

----------
Solaris Fix #3

Don't run the "lint" step if you use Solaris. In fact, nobody needs to
execute this step except Rich, when he's writing new code. If you have a
Solaris machine without "lint", just make "lint" a symlink to "/bin/echo".

----------
Solaris Fix #4

People running Solaris 2.3 have built INN with HAVE_UNIX_DOMAIN set to TRUE
and everything seems to be OK. I guess Sun has fixed enough bugs in 2.3 to
make it usable. I recommend the latest "recommended patches" if you run any
version of Solaris 2.x.

To install all of the "Recommended Patches" in one command, refer to:
    ftp://sunsolve1.sun.com/pub/patches/patches.html

----------
Solaris Fix #5

If "inews" outputs "Bad Message-ID" when posting under Solaris 2.x (where
x = 0, 1, 2 or 3) you need to change the file "getfqdn.c". Find the lines
that read:

    if (strchr(hp->h_name, '.') == NULL) {
        /* Try to force DNS lookup if NIS/whatever gets in the way. */
        (void)strncpy(temp, buff, sizeof buff);
        (void)strcat(temp, ".");
        hp = gethostbyname(temp);
    }

and delete them.

----------
Solaris Fix #6

If posting gets you "441 Can't generate Message-ID, Error 0" and you are
running with DNS, then the problem is with Solaris 2.3's gethostbyname()
when it uses DNS. If you ask for a host with "hostname." it returns
"hostname." instead of "hostname.yourdomain.com" as expected by nn.
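The failure mode described above can be illustrated with a small helper.
This is only a sketch and not code from INN: the function names, the buffer
handling and the use of C99 snprintf() are assumptions made for the example.
A name counts as fully qualified only if it contains a dot that is neither
its first nor its last character; otherwise a qualified name is built from a
separately configured domain, which is roughly the role a "domain" setting
plays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of INN): return 1 if `name` contains a dot
 * that is neither its first nor its last character, 0 otherwise.
 * "hostname." therefore does NOT count as qualified. */
int name_is_qualified(const char *name)
{
    const char *dot = strchr(name, '.');

    return dot != NULL && dot != name && dot[1] != '\0';
}

/* Hypothetical helper (not part of INN): copy a usable fully qualified
 * name into `out`. If `name` is already qualified it is copied as-is;
 * otherwise a single trailing dot is dropped and "." plus `domain` is
 * appended. Returns `out`, or NULL if no domain is available or the
 * result does not fit into `outsz` bytes. */
char *qualify_name(const char *name, const char *domain,
                   char *out, size_t outsz)
{
    size_t len;
    int n;

    assert(name != NULL && out != NULL && outsz > 0);

    if (name_is_qualified(name)) {
        if (strlen(name) >= outsz)
            return NULL;
        return strcpy(out, name);
    }

    /* Unqualified name: a configured domain is required. */
    if (domain == NULL || *domain == '\0')
        return NULL;

    len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        len--;                  /* drop the trailing dot of "hostname." */
    if (len == 0)
        return NULL;

    n = snprintf(out, outsz, "%.*s.%s", (int)len, name, domain);
    if (n < 0 || (size_t)n >= outsz)
        return NULL;            /* output error or truncated result */
    return out;
}
```

With a placeholder domain such as "example.com", both "news" and "news."
would be turned into "news.example.com", while a call without a domain
returns NULL, which corresponds to the situation in which no Message-ID can
be generated.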
The workaround is to define "domain" in your inn.conf and apply the
following patch to getfqdn.c:

    *** getfqdn.c.~1~   Sun Sep  4 09:02:37 1994
    --- getfqdn.c       Sun Sep  4 09:53:11 1994
    ***************
    *** 35,45 ****
          if ((hp = gethostbyname(buff)) == NULL)
              return NULL;
    !     if (strchr(hp->h_name, '.') == NULL) {
    !         /* Try to force DNS lookup if NIS/whatever gets in the way. */
    !         (void)strncpy(temp, buff, sizeof buff);
    !         (void)strcat(temp, ".");
    !         hp = gethostbyname(temp);
    !     }
    !     if (hp != NULL && strchr(hp->h_name, '.') != NULL) {
              if (strlen(hp->h_name) < sizeof buff - 1)
                  return strcpy(buff, hp->h_name);
    --- 35,39 ----
          if ((hp = gethostbyname(buff)) == NULL)
              return NULL;
    !     if (strchr(hp->h_name, '.') != NULL) {
              if (strlen(hp->h_name) < sizeof buff - 1)
                  return strcpy(buff, hp->h_name);

----------
Solaris Fix #7

From Ian Dickinson <ian@fore.com>:

Sun appears to have reduced the frequency of the problem, but not fixed the
bug itself. I still need this under SunOS 5.4 with 101945-29. You should
already have -DSUNOS5 in your DEFS setting in config.data anyway.

(Note that in 1.5.x this workaround is already in the source. You can
enable it by specifying -DPOLL_BUG in the DEFS settings in config.data.
Thanks to rhaskins@shiva.com who pointed that out.)

This should apply, maybe with a bit of fuzz:

    *** innd/chan.c.ORIG        Wed Dec 14 11:03:16 1994
    --- innd/chan.c     Thu Dec 15 17:00:54 1994
    ***************
    *** 497,502 ****
    --- 497,508 ----
          bp->Left = bp->Size - bp->Used;
          i = read(cp->fd, &bp->Data[bp->Used], bp->Left - 1);
          if (i < 0) {
    + #ifdef SUNOS5
    +         /* return of -2 indicates EAGAIN, for SUNOS5.4 poll() bug workaround */
    +         if (errno == EAGAIN) {
    +             return -2;
    +         }
    + #endif
              syslog(L_ERROR, "%s cant read %m", p);
              return -1;
          }
    *** innd/nc.c.ORIG  Thu Mar 18 21:04:28 1993
    --- innd/nc.c       Thu Dec 15 17:00:41 1994
    ***************
    *** 783,788 ****
    --- 783,794 ----
          /* Read any data that's there; ignore errors (retry next time it's our
           * turn) and if we got nothing, then it's EOF so mark it closed. */
          if ((i = CHANreadtext(cp)) < 0) {
    + #ifdef SUNOS5
    +         /* return of -2 indicates EAGAIN, for SUNOS5.4 poll() bug workaround */
    +         if (i == -2) {
    +             return;
    +         }
    + #endif
              if (cp->BadReads++ >= BAD_IO_COUNT) {
                  if (NCcount > 0)
                      NCcount--;

----------
Solaris Fix #8

From: Joe St Sauver <joe@decoy.uoregon.edu>

We recently upgraded some machines in our news farm to fast ethernet, and
after doing so we noticed poor performance (ping times of 30 msec between
two machines each connected to dedicated switch ports on the same
switch...).

Poking around a little, we noticed that under Solaris 2.5,
tcp_conn_req_max is set to 32 by default, which is a little low if you are
working with a fair number of peers or have a lot of readers.

We bumped that value to 1000 or so (1024 max under Solaris 2.5), using:

    # ndd -set /dev/tcp tcp_conn_req_max 1000

and now ping times are back in the 0 or 1 msec reported range you'd hope
to see from that sort of topology. :-)

------------------------------

[Last Changed: $Date: 1997/09/23 01:25:52 $ $Revision: 2.34 $]
[Copyright: 1997 Heiko Rupp, portions by Tom Limoncelli, Rich Salz, et al.] |