Objectives:
This chapter discusses troubleshooting and resolving server problems. The objectives important to this chapter are:
Concepts:
Identify Server Hardware and Operating System Components

Server problems can be divided into two types: hardware problems and software problems. To deal with either, start with a basic idea of what hardware and software you have.

Hardware

Bus types - The types of bus found on a server's motherboard will affect its performance. (As a point of reference, Pentium 1 through 4 processors have a 64-bit data bus.) Four major motherboard bus architectures are listed:
Hard drives and disk channels - A disk channel is the data path used to access hard drives. Three types are listed:
Processors - NetWare 6 supports up to 32 processors on a server. Each must be at least a Pentium II or an AMD K7. To determine whether a processor will work in your system, check that it is supported by a Platform Support Module (PSM), either one included with NetWare 6 or one downloaded from the manufacturer. Novell provides a PSM with NetWare 6, MPS14.PSM, that supports processors compliant with Intel Multiprocessor Specification 1.1 and 1.4.

Memory - Many problems with computers, whether workstations or servers, can be relieved by adding more working memory (RAM). A NetWare 6 server requires at least 256 MB of RAM. Although a NetWare 6 server can have as much as 64 GB of RAM, only the first 4 GB can be directly addressed as cache memory. RAM above 4 GB is allocated to virtual memory.

Ideally, a server should be scalable: you should be able to add more processors, memory, and/or disks to a server. Scalability also includes the idea of combining several servers into a cluster to provide redundant access to services.

Troubleshooting may be easier if you understand the NetWare operating system. You should be aware from other classes that the file that loads when a server is started is server.exe. This file actually contains several programs. The first to run is loader.exe, which loads server.nlm and other nlm files. These programs load in a series of stages called loadstages, numbered 0 through 5. The startup.ncf file is loaded between loadstages 0 and 1. The autoexec.ncf file is loaded between loadstages 4 and 5.

As a server boots up, many nlm files are typically loaded. You can view the list of loaded nlm files at the server console by using the modules command. In this list, the color of an entry tells you something about the kind of nlm it is:
The heart of the NetWare operating system is the kernel. In previous versions of NetWare, there were two kernels: one that supported a single processor and one that supported multiple processors. NetWare 6 has a single kernel that works either way.

Even with a single processor, a NetWare server manages multiple processes simultaneously. The processes are referred to as threads. Threads cannot actually run simultaneously on only one processor, so each thread is given access to processor cycles on a rotating basis. In this manner, the server can be said to be multitasking. Some threads can be preempted by other threads, but the programmer must build this feature in, unless the program is written in Java, which supports preemption for all programs.

Troubleshoot and Resolve NetWare Server Issues

If a server is not functioning normally, check for solutions on Novell's web site. Both the Knowledgebase and the Cool Solutions web tools should be used.

If you have a problem but the server is not actually locked up, try inspecting the various console screens with Ctrl-Esc or Alt-Esc. If possible, you can try to access the NetWare internal debugger on the server by pressing Shift-Alt on the right side of the keyboard while pressing Shift-Esc on the left side. If you get into the debugger, you can exit it by pressing the letter G.

If the server is locked and you cannot do any of these things, try pressing Ctrl-Alt-Esc. You should see a menu that will allow you to take the server down. If the server obeys this command, it will avoid corrupting data on the NetWare volumes.

Some general advice is offered about hard drive problems. Make sure power cables and data cables are attached correctly, interrupt conflicts are resolved, and devices on a SCSI bus are assigned unique numbers on that bus.

Server memory errors are discussed. Remember that a NetWare server still starts as a DOS machine.
Problems can occur if your DOS boot partition contains files with commands that conflict with NetWare. Some versions of DOS will automatically add DOS=HIGH as a command in the config.sys file. It should be removed, along with any reference to memory managers.

Nlm files that do not give back memory when they end are said to cause the server to "leak memory", a problem also commonly seen on Windows workstations. Novell recommends unloading and reloading nlm files to determine which ones are doing this. Updated versions or patches may resolve the issue.

Sometimes you may need to free some memory on a server right away. Try unloading any nlm files you do not currently need, dismounting volumes not in use, and unloading name spaces not in use. If the number of available cache buffers is less than 20% of the total cache buffers, you may have no choice but to add RAM to the server.

Troubleshoot and Resolve Critical Server Abends

The word abend is a shortened form of the phrase abnormal end. It refers to a server process that stops running unexpectedly. Server abends can result in loss of data, locked-up servers, and loss of services on the network.

Abends come in two basic types: those detected by the processor and those detected by the operating system. Processor-detected abends could be called hardware abends or processor exceptions. Operating-system-detected abends could be called software abends. A program could abend in many ways:
It is possible to have abend messages saved in a log, to automatically restart a server after an abend, and to control the time to wait until an automatic restart. Abend log files are created in the DOS partition, but are moved to the SYS:SYSTEM folder when the server reboots. The information to be found in an abend log includes:
Sometimes, Novell recommends that you consider a core dump. This means dumping the contents of the server's RAM to a file. The file will be the same size as the server's RAM, so you will need that much free room in your DOS partition. A feature implied but not previously mentioned in older texts is that the core dump will include the images on each console screen at the time of the dump. Novell servers can be configured for automatic response to abends. There are four potential settings for the Auto Restart After Abend parameter:
The specified time mentioned above (Auto Restart After Abend Delay Time) can range from 0 to 60 minutes. If the server is set to option 0 above, an abend should take the server to a command line where the administrator can type single-letter commands to choose how to resolve the problem. This can be confusing, since the interface will interpret the same letter differently depending on the type of abend that has occurred. Some commands to be aware of:
In the text, we learn that a core dump is also called a memory image. Some abend conditions will offer the option of creating the core dump file. If your condition does not, Novell offers an option to force a core dump. Note that you can now create either a full core dump or a cacheless core dump, one that does not record what was in cache memory. Two ways to create a core dump file:
As the core dump/memory image is being created, you will be prompted for a path and filename. The default is c:\coredump.img. The file may be stored several different ways.
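The sizing rule for a full core dump (the file is the same size as the server's RAM, so the destination needs at least that much free space) can be checked with simple arithmetic before a dump is attempted. The following Python sketch is only an illustration for planning on an administrator's workstation; the function names and the example figures are hypothetical and are not part of NetWare or any Novell tool.

```python
MB = 1024 * 1024  # bytes per megabyte


def full_core_dump_size(ram_bytes):
    """Per the notes, a full core dump is the same size as installed RAM."""
    return ram_bytes


def core_dump_fits(ram_bytes, free_bytes):
    """Return True if the free space can hold a full core dump."""
    return free_bytes >= full_core_dump_size(ram_bytes)


def shortfall(ram_bytes, free_bytes):
    """Return how many more bytes of free space are needed (0 if none)."""
    return max(0, full_core_dump_size(ram_bytes) - free_bytes)


if __name__ == "__main__":
    # Hypothetical server: 512 MB of RAM, 300 MB free on the DOS partition.
    ram = 512 * MB
    free = 300 * MB
    print(core_dump_fits(ram, free))   # False
    print(shortfall(ram, free) // MB)  # 212
```

A cacheless core dump would be smaller than this estimate, but by an amount that depends on how much memory is in use as cache, so the sketch does not try to guess it.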
Before sending an image file to Novell for analysis, call their support line and get a support incident number. They will authorize you to send the image file, which should be renamed with the first eight digits of your incident number. Note that Novell charges for this service unless the cause of the abend is determined to be a fault in NetWare. If you send the file by FTP, Novell asks you to compress it into a ZIP file first.

Troubleshoot and Resolve Server Communication Issues

If servers cannot communicate with each other, you will have to find out why and correct the problems. One method of detecting such a problem is to run DSTRACE and watch for errors coded "-625". These errors indicate that eDirectory communication is not taking place. To troubleshoot server-to-server communication problems:
To troubleshoot workstation-to-server communication problems:
To prevent problems, collect a baseline of information about your network before problems occur. Running a utility like LANalyzer on your network will collect data over time. You can refer to this data to determine what is normal and what is not.

The text recommends finding bottlenecks in your network and upgrading them. In other words, find the NICs, hubs, switches, and other devices in your network that run at lower data rates than the rest of the network, and upgrade them to the network data rate.

Documentation is the part of the job that seems to be done the least often. When something goes wrong, you will want good documentation about your network. Contribute to the solution by making log entries, mapping the site, and keeping documentation about hardware and software, along with anything else you may suddenly need, in a place where you can find it.

Finally, the chapter recommends being proactive to protect your network. Add RAM when possible. Replace worn hardware before it breaks. Practice proper grounding procedures when handling computer components. A static electricity discharge of 20 to 30 volts is all it takes to harm electronic components, but a human being cannot feel a discharge of less than about 3,000 volts.
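Returning to the bottleneck advice in this section, the idea of finding devices that run below the network data rate can be sketched in a few lines of Python, assuming you keep a simple inventory of devices and their link speeds as part of your documentation. The device names, the speeds, and the function name here are made up for illustration; they do not come from the text or from any Novell utility.

```python
def find_bottlenecks(devices, network_rate_mbps):
    """Return the names of devices slower than the network data rate.

    devices maps a device name to its link speed in Mbps.
    """
    return sorted(
        name for name, rate in devices.items() if rate < network_rate_mbps
    )


if __name__ == "__main__":
    # Hypothetical inventory for a network intended to run at 100 Mbps.
    inventory = {
        "server1-nic": 100,
        "lab-hub": 10,
        "core-switch": 100,
        "frontdesk-nic": 10,
    }
    for name in find_bottlenecks(inventory, 100):
        print(name, "is below the network data rate; consider upgrading it")
```

Keeping the inventory in a file alongside your other network documentation means the same list can serve both the baseline and the upgrade plan.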