ABSTRACT
  ABSTRACT

       This program was primarily  developed  for  use  in  those
  cases  where  a  customer has an operating system that does not
  have the DSA Bad Block Replacement capability  already  in  it.
  Several  operating  systems  already have a method of doing DSA
  Bad Block Replacement (BBR) and  would  only  rarely  need  the
  services of this program (VMS is one of many).  However, there
  are cases where, even with an operating system that has the BBR
  capability, replacing a specific LBN, or allowing this program
  to search for bad blocks, may be appropriate.

       In those cases where a customer is  running  an  operating
  system  that  does  not have the BBR capability, bad blocks may
  develop and show up as ECC errors  in  the  system  error  log.
  When this occurs, this program can be used to replace those bad
  blocks and  eliminate  the  logging  of ECC  errors related  to
  specific  bad  block(s).  It's a known fact that bad blocks can
  become worse with time.  What may start out as a 6  symbol  ECC
  error  may  eventually end up as an uncorrectable ECC error, if
  action is not taken to replace that block.

 OPERATIONAL_CONCEPT
  OPERATIONAL CONCEPT

       In order to verify that an LBN indeed  needs  replacement,
  this   program  (And  all  other  Operating  Systems  with  BBR
  capability) tries to "verify" that the bad block report is  not
  a transient error and in fact relates to a defect in the media.
  If  all  reports  of  Bad  Blocks  (Like  ECC  errors)   caused
  replacement,  without  question, then a substantial possibility
  would exist that blocks with no defect would be replaced.  As
  you know, errors are often caused by the environment (like
  static, electrical problems, etc.).  Blocks whose reported
  errors are not related to a media defect should not be
  replaced.  Thus, an attempt is made to verify that a bad block
  report can be duplicated, which makes a media defect the more
  probable cause.  This is the way it happens in the most usual
  case:

  1.  This program uses the MSCP ACCESS command to do large  Byte
      count  transfers  (Read)  from  the drive being tested.  It
      starts with LBN 0 and progresses to the highest user LBN on 
      the drive. The ACCESS command moves data from the drive to 
      the controller, where the data is checked for any errors. 
      The data IS NOT actually moved to host memory. This way we can
      read data at the fastest possible rate and thereby make  it
      possible for more passes to be made across the media in the
      shortest possible amount of time.

  2.  If an error (let's say an 8 symbol ECC error) is detected by
      the controller, a flag called the Bad Block Reported flag
      gets set.  This flag is reported to this program by what's
      called an "end packet".  This packet describes the success
      or failure of the ACCESS command that we gave the
      controller.  The end packet also provides the LBN address
      that caused the error.

  3.  When this program sees the Bad Block Reported flag, it
      takes the LBN address that caused the flag to set and
      issues a series of READ commands to that LBN.  A series of
      32 reads is attempted.  If a second Bad Block report
      occurs, the program terminates the testing and replaces
      the block.

  4.  If 32 re-READs are accomplished without error or bad block
      report, then the error is considered transient and
      replacement is not attempted.

  5.  HOWEVER, if it is considered a transient error, the LBN
      address is logged in a memory table called the transient
      error table.  If, on a subsequent pass across the media,
      this same LBN address reports an error or Bad Block, the
      LBN is replaced without argument and without any further
      testing.  In other words, if we have a transient error and
      then later this same LBN reports another error, it is
      replaced.


       THAT IS WHY RUNNING THIS PROGRAM FOR MULTIPLE PASSES IS
  BENEFICIAL.  The transient error table is kept in memory and is
  kept updated as long as the program is running.  If the program
  is terminated, the table and any entries in it are lost.  The
  benefit of this table is that marginal bad blocks, which do not
  show an error each time they are read, can still be replaced.
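
       The decision logic described above can be sketched in Python
  as follows.  This is only an illustration, not the program's
  actual code.  The names handle_bad_block_report and
  read_reports_bad_block are hypothetical; the callable stands in
  for one MSCP READ of the suspect LBN, and a Python set stands in
  for the transient error table.

```python
REREAD_LIMIT = 32  # number of re-READs described in the text


def handle_bad_block_report(lbn, read_reports_bad_block, transient_table):
    """Decide what to do with an LBN flagged during the ACCESS scan.

    read_reports_bad_block is a hypothetical callable standing in for one
    MSCP READ of the LBN; it returns True if that read reports a bad block.
    transient_table is a set of LBNs already logged as transient.
    Returns "replace" or "transient".
    """
    # An LBN already logged as transient is replaced without further testing.
    if lbn in transient_table:
        transient_table.discard(lbn)
        return "replace"

    # Re-read up to 32 times; a second bad block report ends the testing.
    for _ in range(REREAD_LIMIT):
        if read_reports_bad_block(lbn):
            return "replace"

    # All re-reads were clean: remember the LBN for later passes.
    transient_table.add(lbn)
    return "transient"
```

       Because the table lives only in the set passed in, a new
  run starts with an empty table, which mirrors the loss of the
  table when the program is terminated.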

 ATTACH

					! For VAX 11/780
  ATT DW780 HUB DWn 3 4			! attaches the DW780

					! For VAX 11/750
  ATT DW750 HUB DWn			! attaches the DW750

					! For VAX 11/730
  ATT DW730 HUB DWn			! attaches the DW730

					! For VAX 8200
  ATT DWBUA HUB DWn 0 4			! attaches the DWBUA

					! For VAX 8800
  ATT NBIA HUB NBIAm n			! attaches the BI
  ATT NBIB NBIAn NBIBm N M		! attaches the BI adapter
  ATT DWBUA NBIBm DWn 0 4		! attaches the DWBUA
  
  ATT KDB50 HUB DUa			! attaches the KDB50 for 8200
  ATT KDB50 DBIBm DUa 5			! attaches the KDB50 for 8800

  ATT UDA50 DWn DUa 772150 154 5 2	! attaches the UDA50

  ATT RA80 DUa DUan			! attaches the RA80
  ATT RA81 DUa DUan			! attaches the RA81
  ATT RA60 DUa DJan			! attaches the RA60
  ATT RA82 DUa DUan			! attaches the RA82

 DEVICES
  Devices Supported by this program

  EVRLK supports the UDA50 and KDB50 controllers and the RA60, RA80,
  RA81, and RA82 disk drives.  The UDA50 may be attached to the VAX-11
  UNIBUS adapter (DW780, DW750, DW730), or the VAX-BI UNIBUS adapter
  (DWBUA).  The KDB50 must be attached to the HUB (for VAX 8200), or
  the NBIB (for VAX 8800).

 KDB50
  Device:  KDB50
  Link:  HUB or DBIBn
  Generic name:  DUa
  Additional information:
    BI Node Number (HEX) [hex 0-f] ? <n>
    BR [4-7] ? <5>

 RA60
  Device:  RA60
  Link:  DUa
  Generic name:  DJan

 RA80
  Device:  RA80
  Link:  DUa
  Generic name:  DUan

 RA81
  Device:  RA81
  Link:  DUa
  Generic name:  DUan

 RA82
  Device:  RA82
  Link:  DUa
  Generic name:  DUan

 UDA50
  Device:  UDA50
  Link:  DWn
  Generic name:  DUa
  Additional information:
    UNIBUS address (UDAIP): [octal 760000-777774] <772150>
    UNIBUS vector: [octal 4-774] <154>
    UNIBUS BR level: [decimal 4-7] <5>
    UNIBUS bandwidth (Burst Rate) [decimal 0-63] <0>
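
  The ranges listed above can be checked before attaching the
  controller.  The Python sketch below is only an illustration; the
  function name check_uda50_parameters is hypothetical, and the
  limits are simply the ones shown in the prompts above.

```python
def check_uda50_parameters(csr, vector, br_level, burst_rate):
    """Return a list of problems found in UDA50 attach parameters.

    csr and vector are integers; write them as octal literals
    (for example 0o772150) to match the prompts.
    """
    problems = []
    if not 0o760000 <= csr <= 0o777774:
        problems.append("UNIBUS address outside octal 760000-777774")
    if not 0o4 <= vector <= 0o774:
        problems.append("UNIBUS vector outside octal 4-774")
    if not 4 <= br_level <= 7:
        problems.append("UNIBUS BR level outside decimal 4-7")
    if not 0 <= burst_rate <= 63:
        problems.append("UNIBUS burst rate outside decimal 0-63")
    return problems
```

  An empty list means every value falls inside its listed range.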
 END_OF_PASS_SUMMARY
  RUN SUMMARY DISPLAYED AT THE END OF PASS  
       The operator is notified of End of  Pass,  for  the  drive
  being tested, by the following:

           1.  EOM reached.   Elapsed runtime hh:mm:ss
               The End of Media is reached  in  Automatic  
               replacement mode when the highest LBN on the media 
               has been tested.

           2.  EOF detected.
               The End of File is reached in Manual replacement  
               mode when the operator enters a null LBN for 
               replacement (CARRIAGE RETURN).

       When End Of Pass (EOP) is reached in either  Automatic  or
  Manual  mode,  the  following pass summary is displayed for the
  drive being tested.  This summary will also be displayed by the
  operator "PRINT" command at the diagnostic supervisor prompt.

          REPLACEMENT INFORMATION FOR THIS PASS

            BAD BLOCKS reported:     ddd.
            ECC error detected:      ddd.
            FORCED ERRORs detected:  ddd.
            FORCED ERRORs written:   ddd.
            PRIMARY replacement:     ddd.
            SECONDARY replacements:  ddd.
            RBNs marked unusable:    ddd.
            Total replacements made: ddd.

       The entries should be self-explanatory, relative to other
  help available.  The bottom line indicates the total number of
  replacements that were made, which should not exceed a few
  hundred unless something more is wrong with the drive.  The
  number of Forced Errors written indicates the number of
  Uncorrectable ECC (ECH) errors that were encountered.  If any
  forced errors were written or detected, be sure to restore the
  customer data from the back-up that was made before you
  started execution of this program.  If excessive replacements
  were made, read RA81-TT-12 for a possible explanation and
  recovery procedure.
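
       The follow-up rules in the paragraph above can be expressed
  as a small check.  The Python sketch below is an illustration
  only.  The function name review_pass_summary is hypothetical, and
  the threshold of 300 is an assumed stand-in for "a few hundred";
  the text does not give an exact figure.

```python
EXCESSIVE_REPLACEMENTS = 300  # assumption: stands in for "a few hundred"


def review_pass_summary(forced_errors_detected, forced_errors_written,
                        total_replacements):
    """Return the follow-up actions suggested for one pass summary."""
    actions = []
    # Any forced error means customer data may have been lost.
    if forced_errors_detected > 0 or forced_errors_written > 0:
        actions.append("restore customer data from back-up")
    # Too many replacements suggests something else is wrong with the drive.
    if total_replacements > EXCESSIVE_REPLACEMENTS:
        actions.append("read RA81-TT-12")
    return actions
```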

 ERRORS
  Error reporting:

       This program will report all fatal System, controller,
  MSCP, BBR and Error log errors.  SA register contents, MSCP Status
  and current LBN are displayed.  After displaying the error
  information, it will drop the drive or terminate.

       If you see any  of  the  indicated  errors  reported,  you
  should  discontinue  the  use of this program and use the other
  diagnostics, or system error  logger,  to  better  isolate  any
  problems.  Some of the status codes (HELP EVRLK STATUS_CODES)
  could be taken for errors, so be sure you know what the program
  is  trying  to  tell  you.   

  There is always a possibility of a drive/controller generating 
  a hard error that will cause an incomplete replacement. If you 
  get a hard error, you should run at least one pass of this 
  program, in automatic mode, to take care of any incomplete 
  replacements and any other bad blocks.

       The categories of hard errors are listed below.  You can
  request help for the category you're interested in and get a
  list of the errors within that category.  Also available are
  decoding charts that provide the meanings of many of the entries
  in an error report (like a chart of the Status/Event Codes ---
  "HELP EVRLK ERRORS STATUS_EVENT").

 MSCP_ERRORS
  MSCP command/response errors:

       The following sections list the possible  errors  reported
  that  indicate  the  failure  of  an MSCP command issued by the
  host.  Remember that a period after a number indicates that the
  number is in Decimal.  Many of these errors will display an
  item called "Endmsg status:".  This stands for End Message
  Status and provides detail on the error that occurred.  The
  meanings of the Endmsg status can be found by looking it up  in
  the list of Status/Event codes provided by typing:  "HELP EVRLK
  ERRORS STATUS_EVENT"

       Also, an item called "Endmsg Flag" will be  displayed  for
  many  of the errors.  The Endmsg Flags report additional status
  to the host.

 SET_CHAR
  Failed SET CONTROLLER CHARACTERISTICS:

       "SET CONTROLLER CHARACTERISTICS" is an MSCP  command  that
  is  used  by the host to set certain controller characteristics
  (Like timeouts and to enable various kinds of messages).   This
  error  occurs  when  command  completion status (Endmsg status)
  indicates a problem executing this command.  This type of error
  will be reported in the following manner:

                  FAILED SET CONTROLLER CHARACTERISTICS;
                    Host access timeout not disabled.
                    Default timeout is 60 seconds.
                    Error logging is not enabled.
                    Endmsg status:  xxx

       The default host access timeout  is  retained,  and  error
  logging  will not be enabled.  Errors are reported via the "End
  Message" Status field.  The meanings of the Endmsg status can be 
  found by looking it up  in the list of Status/Event codes 
  provided by typing:  "HELP EVRLK ERRORS STATUS_EVENT"


       Processing of the specified unit  (drive)  will  continue.
  However,  if  this  error  message  is  seen,  the  user should
  terminate the test and try running it over.  If the problem  is
  again  reported, the problem should be investigated using other
  diagnostics.
 READ
  Failed READ:

       This error occurs when an MSCP "READ" command  fails  with
  an  end  message  status (Endmsg status:) other than Success or
  Data Error.  The error is reported in the following manner:

                  Failed READ; LBN:  dddddd.
                    Endmsg flags:       xxx (X)
                    Endmsg status:      xxx (X)

       This error can be the result of many different conditions.
  The  host  issued a "READ" command to the controller being used
  for testing and then received an End Packet indicating that  an
  error  was  detected while trying to execute that command.  The
  LBN is the one the "READ" was attempting to access, at the time
  the  error  occurred,  as  reported  by  the  End  Packet.  The
  meanings of the Endmsg status can be found by looking it up  in
  the list of Status/Event codes provided by typing:  "HELP EVRLK
  ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"
 WRITE
  Failed WRITE:

       This error occurs when an MSCP "WRITE" command fails  with
  an  end  message  status (Endmsg status:) other than Success or
  Data Error.  The error is reported in the following manner:

                Failed WRITE;  LBN:  dddddd.
                  Endmsg flags:         xxx (X)
                  Endmsg status:        xxx (X)

       This error can be the result of many different conditions.
  The  host issued a "WRITE" command to the controller being used
  for testing and then received an End Packet indicating that  an
  error  was  detected while trying to execute that command.  The
  LBN is the one the "WRITE" was attempting to write at the  time
  the  error  occurred,  as  reported  by  the  End  Packet.  The
  meanings of the Endmsg status can be found by looking it up  in
  the list of Status/Event codes provided by typing:  "HELP EVRLK
  ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"
 ACCESS
  Failed ACCESS:

       This error occurs when an MSCP "ACCESS" command fails with
  an  end  message  status (Endmsg status:) other than Success or
  Data Error.  PLEASE READ THIS COMPLETE ERROR DESCRIPTION.   The
  error is reported in the following manner:

                  Failed ACCESS;  LBN:  dddddd.
                    Endmsg flags:          xxx (X)
                    Endmsg status:         xxx (X)

       This error can be the result of many different conditions.
  The  host  issued  an  "ACCESS" command to the controller being
  used for testing and then received  an  End  Packet  indicating
  that  an  error  was  detected  while  trying  to  execute that
  command.  The LBN is the one the  "ACCESS"  was  attempting  to
  read  at  the  time  the error occurred, as reported by the End
  Packet.  An ACCESS command is the MSCP command used by this
  program to read all the LBNs on the media.  The ACCESS command
  causes a read type operation, checks for any error conditions
  (Bad Block), but does not transfer any data to the host memory.
  The purpose of this command is to verify that data can be
  read without error.  The meanings of the Endmsg status can be
  found by looking it  up  in  the  list  of  Status/Event  codes
  provided by typing:  "HELP EVRLK ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"
 REPLACE
  Failed REPLACE:

       This error occurs when an  MSCP  "REPLACE"  command  fails
  with  an end message status (Endmsg status:) other than Success
  or Data Error.  The  error  is  reported in the following manner:

                  Failed REPLACE;  LBN:  dddddd.
                    Endmsg flags:           xxx (X)
                    Endmsg status:          xxx (X)

       This error can be the result of many different conditions.
  The  host  issued  a  "REPLACE" command to the controller being
  used for testing and then received  an  End  Packet  indicating
  that  an  error  was  detected  while  trying  to  execute that
  command.  The LBN is the one the "REPLACE"  was  attempting  to
  use  at  the  time  the  error occurred, as reported by the End
  Packet.  A REPLACE command is the MSCP command used by programs
  that  do bad block replacement to actually cause the bad LBN to
  be logically replaced.  The execution of the REPLACE command is
  very  involved and leaves little room for error.  The header of
  the bad LBN is written with special codes and  the  replacement
  RBN  is  allocated  to  the bad LBN.

  If this error is seen, the  media  should be reformatted  using  
  the  field formatter  program.  The failure of this command could
  be due to a defect in the drive, other than media  bad  blocks.
  Be sure that any other causes for a failure like this are
  investigated and any appropriate repairs made, before
  attempting to reformat the media.  A failure like this one is
  one of the good reasons for using the formatter.  However, be
  sure not to use "RECONSTRUCT" mode of the formatter (See
  RA81-TT-19).  The meanings of the Endmsg status can be found by
  looking  it  up  in  the list of Status/Event codes provided by
  typing:  "HELP EVRLK ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"
 CMD_REF
  Command/Response reference number mismatch:

       In Mass Storage Control Protocol (MSCP), each command that
  is issued to the controller is given a "unique" number.  In this
  way, the host can distinguish this command, and any  responses,
  from any other commands that may be issued.  When a response to
  a command is received,  the  host  attempts  to  associate  the
  "unique"  reference  number  to  a  command  that  has  not yet
  received a response.  If a response reference number can not be
  matched  to  a command with the same number (from commands that
  have not had responses yet) then this error occurs.  This error
  is reported in the following manner:

                  Command/response reference number mismatch;
                    Command ref:    xxxx
                    Endmsg ref:     xxxx
                    Endmsg flags:    xxx   
                    Endmsg status:   xxx

       The user should find that the Command Ref and Endmsg ref
  do not match.  Why this would happen is hard to say.  The
  source of this kind of trouble is not usually the drive being
  tested.  Rather, it is more likely that a controller or "system"
  problem exists.  The meanings of the Endmsg status can be found
  by  looking it up in the list of Status/Event codes provided by
  typing:  "HELP EVRLK ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"

       For this kind of error, you may find that the Endmsg flags
  and Endmsg status do not indicate an error.  In this case, a
  "system" type error is the more likely cause.
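
       The matching of responses to outstanding commands can be
  sketched in Python as follows.  This is an illustration of the
  idea only, not the program's code.  The class name
  OutstandingCommands and its methods are hypothetical, and the
  simple counter used to hand out reference numbers is an
  assumption.

```python
class OutstandingCommands:
    """Track commands that have not yet received a response."""

    def __init__(self):
        self._pending = {}   # command reference number -> command name
        self._next_ref = 1   # assumption: a simple counter supplies the numbers

    def issue(self, name):
        """Record a new command and return its reference number."""
        ref = self._next_ref
        self._next_ref += 1
        self._pending[ref] = name
        return ref

    def complete(self, endmsg_ref):
        """Match an end message to its command; raise on a mismatch."""
        if endmsg_ref not in self._pending:
            raise LookupError(
                "Command/response reference number mismatch: %04X" % endmsg_ref)
        return self._pending.pop(endmsg_ref)
```

       A response whose reference number is not in the pending
  table, or has already been answered, produces the mismatch.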
 ENDCODE
  Fatal Endcode detected:

       An "end message" is the  means  by  which  the  controller
  tells  the host about how the command was processed and whether
  any errors occurred while executing a command.  This  error  is
  indicating  that a fatal status was reported by an end message.
  The error is reported in the following manner:

                  Fatal endcode detected;
                    Endmsg endcode: xxxx

       Since the error is reporting Fatal end message status, you
  should find an "Endmsg status" code displayed.  The
  meanings of the Endmsg status can be found by looking it up  in
  the list of Status/Event codes provided by typing:  "HELP EVRLK
  ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"
 VERIFY
  Failed Replacement Verification:

       After a "REPLACE" command has been  issued  and  completed
  with good results, the customer's data must be written to the
  replacement block (RBN).  At this point in time,  the  original
  bad  LBN  has  been  written  with  a  code that will cause all
  references to  the  LBN  address  to  be  "revectored"  to  the
  replacement  block.   The host issues a "WRITE" command,
  with the compare modifier, and the customer data will end up in
  the replacement block.  If this process reports a problem, this
  error message will result:

              Failed Replacement verification;  LBN:  dddddd.
                Endmsg flags:     xxx (X)
                Endmsg status:    xxx (X)

       The program assumes that the cause for the error  is  that
  the replacement block is bad.  If this is the case, a retry
  of the replacement process  takes  place.   This  time  another
  replacement  block  is  used  and  the  original  one is marked
  unusable.  The meanings of the Endmsg status can  be  found  by
  looking  it  up  in  the list of Status/Event codes provided by
  typing:  "HELP EVRLK ERRORS STATUS_EVENT"

       The Endmsg Flags report additional  status  to  the  host.
  The meanings of the Endmsg Flags can be found by typing:  "HELP
  EVRLK ERRORS COMMAND_RESPONSE_ERRORS ENDMSG_FLAGS"
 ENDMSG_FLAGS
  End message flags:

         Bit flags, collectively called end flags, are used to
    report various conditions detected due to this command but not
    directly related to success or failure.  The following flags
    are defined:

    Bad Block Reported:  (200 OCTAL -- Bit 7) (80 Hex)

            Set if one or more bad blocks were detected  and  the
       host   is  expected  to  perform  bad  block  replacement.
       Indicates that the  host  should  replace  the  bad  block
       identified by the LBN provided.


    Bad Blocks Unreported:  (100 OCTAL -- Bit 6) (40 Hex)  

            Set if one or more bad blocks were detected  and  not
       reported  in the "first bad block" field of an End Message
       Packet.

            If the "Bad Block Reported" flag is  also  set,  this
       indicates  that  two or more bad blocks were detected, and
       the  host  (This  program)  should   perform   bad   block
       replacement.   In  this  case the "first bad block" (LBN:)
       field only reports the first bad block  in  the  transfer.
       If  this  happens,  the program should be run for multiple
       passes.

    Error Log Generated:  (40 OCTAL -- Bit 5)  (20 Hex)

            Set if one or more error log messages were  generated
       that  refer  to  this  command  -- i.e., that contain this
       command's command reference number.  This flag allows  the
       host  to  save  any  outstanding  command  context that it
       wishes to include in the error report.   The  MSCP  server
       must  send the error log messages either before or shortly
       after it sends  the  end  message  containing  this  flag.
       Depending  upon the type of error log report referenced by
       this flag, this program will determine whether to  display
       the error log report or not.  In most cases, the Disk
       Transfer Error Log report is used to report blocks that
       need replacement.  Displaying this error log report is
       somewhat meaningless, since the program will make a
       replacement due to having received this error log type.
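
         A displayed Endmsg flags value can be decoded with the bit
    masks listed above.  The Python sketch below is an illustration
    only; the function name decode_endmsg_flags is hypothetical.

```python
# Bit masks taken from the flag definitions above.
END_FLAGS = {
    0x80: "Bad Block Reported",      # 200 octal, bit 7
    0x40: "Bad Blocks Unreported",   # 100 octal, bit 6
    0x20: "Error Log Generated",     # 40 octal, bit 5
}


def decode_endmsg_flags(flags):
    """Return the names of the end flags set in a flags value."""
    return [name for mask, name in END_FLAGS.items() if flags & mask]
```

         For example, a flags value of C0 (hex) has both the Bad
    Block Reported and Bad Blocks Unreported flags set, which is
    the case where multiple passes are recommended.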
 CTRLR_INIT
  Controller initialization errors:

       The following errors can  be  reported  during  controller
  initialization.   These are basically "HARD" serious errors and
  usually indicate a very sick controller.  This program, as with
  any  operating system, must first initialize the controller, to
  prepare it for use.  Any unexpected or error values read from the 
  controller register will be displayed.

 ADDRESS
  Controller address error occurred:

       This error occurs when the program attempts to read a
  controller register and a memory access error occurs.  This
  can  be  caused  by  the  operator  specifying   an   incorrect
  controller  address.   The  specified Unit will be dropped from
  testing.  This type of error will be reported in the  following
  manner:

    ADDRESS ERROR OCCURRED WHILE ACCESSING CONTROLLER REGISTER;
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)
      UDASA expected:    xxxx (X)
                 XOR:    xxxx (X)
      Check controller address.

       The  SA  address  displays  the  UNIBUS  address  of   the
  controller  register that was being accessed at the time of the
  error.  This is most likely caused by the  operator  specifying
  an incorrect controller address.  Unit is dropped from testing.
 CONTROLLER_DIAG
  Controller resident diagnostics detected failure:

       This  error  indicates  that   the   controller   resident
  diagnostics reported an error during initialization.  This type
  of error will be reported in the following manner:

       CONTROLLER RESIDENT DIAGNOSTICS DETECTED FAILURE;
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)
      UDASA expected:    xxxx (X)
                 XOR:    xxxx (X)

       The SA register contains the  detected  error  code.   The
  specified   Unit   will  be  dropped  from  testing.   Decoding
  information for the SA register contents (error  code)  can  be
  obtained by typing:  "HELP EVRLK ERRORS SA_CODES"
 STEP_BIT
  Step bit error in SA register during initialization:

       This error indicates that an error was detected during the
  4 step (phase) initialization of the controller.  This
  type of error will be reported in the following manner:

       STEP BIT ERROR IN SA REGISTER DURING INITIALIZATION;
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)
      UDASA expected:    xxxx (X)
                 XOR:    xxxx (X)

       The SA address is the UNIBUS  address  of  the  controller
  register   that   was   being   accessed,  during  the  4  step
  initialization of the controller, when the error was  detected.
  The  SA contents is what the controller register contained when
  the error was detected and may be an error code.   SA  expected
  is  what this program expected, for correct results during that
  step in the initialization.  Decoding information  for  the  SA
  register  contents  (error  code)  can  be  obtained by typing:
  "HELP EVRLK ERRORS SA_CODES"
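
       When comparing SA contents with SA expected, it can help to
  list the bits that differ.  The Python sketch below assumes that
  the XOR line of the report is the bitwise exclusive-OR of the
  contents and expected values, and that the register is 16 bits
  wide; the function name differing_bits is hypothetical.

```python
def differing_bits(contents, expected):
    """Return (xor, bit numbers) for two 16-bit SA register values."""
    xor = (contents ^ expected) & 0xFFFF
    # Collect the bit positions where the two values disagree.
    bits = [bit for bit in range(16) if xor & (1 << bit)]
    return xor, bits
```

       A result with bit 15 in the list means the two values
  disagree in the error bit position.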
 STEP_3
  SA register did not zero after step 3 write:

       This error occurs when  the  SA  register  does  not  zero
  during Step 3 of controller initialization.  This type of error
  will be reported in the following manner:

       SA register did not zero after step 3 write:
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)
      UDASA expected:    xxxx (X)
                 XOR:    xxxx (X)

       The SA address is the UNIBUS  address  of  the  controller
  register   that   was   being   accessed,  during  the  4  step
  initialization of the controller, when the error was  detected.
  The  SA contents is what the controller register contained when
  the error was detected and may be an error code.   SA  expected
  is  what this program expected, for correct results during that
  step in the initialization.  Decoding information  for  the  SA
  register  contents  (error  code)  can  be  obtained by typing:
  "HELP EVRLK ERRORS SA_CODES"
 SA_INIT
  SA register error during initialization:

       This is the most common of the Controller Initialization
  Errors that can be reported.  The SA register Error bit (Bit 15)
  should be set.  This type of error will be reported in the
  following manner:

       SA REGISTER ERROR DURING INITIALIZATION;
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)
      UDASA expected:    xxxx (X)
                 XOR:    xxxx (X)

       The SA address is the UNIBUS  address  of  the  controller
  register   that   was   being   accessed,  during  the  4  step
  initialization of the controller, when the error was  detected.
  The  SA contents is what the controller register contained when
  the error was detected and may be an error code.   SA  expected
  is  what this program expected, for correct results during that
  step in the initialization.  Decoding information  for  the  SA
  register  contents  (error  code)  can  be  obtained by typing:
  "HELP EVRLK ERRORS SA_CODES"
 CLEAR
  Controller did not clear ring structure in host memory:

       An  error  occurred  during  initialization  of  the  Host
  communication  area.  At a certain point during host/controller
  initialization, the controller is responsible for clearing  the
  host  communication  area (Rings-In Host Memory) and this error
  results from a  failure  of  the  controller  to  perform  this
  function (As DETECTED BY THE HOST).

       CONTROLLER DID NOT CLEAR RING STRUCTURE IN HOST MEMORY;
         Host address:  xxxxxx (X)

       The Host address is a host memory  location  in  the  host
  communication  area  that  the  controller was to have cleared.
  The fault could be in the controller, or possibly the
  controller had problems accessing the host's memory
  locations of the communications area.
 CONTROLLER
  Host/controller communication errors:

       The following errors can be reported during normal program
  execution.   These  are  basically  "HARD"  serious  errors and
  usually indicate a very serious problem.  Errors  are  reported
  via the "Status/Address" (SA) register.

 NO_INTERRUPT
  No interrupt received for 30 seconds:

       This error occurs when an interrupt timeout occurs  during
  a wait for an MSCP end message. The response ring buffer is only 
  one slot and an interrupt is expected for every END PACKET that  
  the controller sends to the host.
  

    NO INTERRUPT RECEIVED FOR 30 SECONDS;
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)

       The Host got tired of waiting for  an  expected  interrupt
  from the controller.  This does not necessarily mean
  that the controller was at fault.  The SA address is the UNIBUS
  address  of  the  controller  Status/Address  register that was
  being used at the time the problem was detected  by  the  host.
  If the controller detected a problem that accounted for no
  interrupt being delivered to the host, the Status/Address (SA)
  register contents should show an error code and the SA register
  error bit should be set (Bit 15).  The specified Unit  will  be
  dropped from testing.
 SA_REGISTER
  Fatal error reported in SA register:

       This error can occur when the Error Bit is set in  the  SA
  register, during normal on-line use of the controller.  This is
  a common error reporting mechanism for the controller and  this
  type of error may occur more often than most others.  The error
  bit (Bit 15)  setting  causes  the  controller  to  discontinue
  fetching command packets from the host.  The host will read the
  SA register, detecting  that  an  error  condition  exists  and
  display contents of the SA register in this message.

    FATAL ERROR REPORTED IN SA REGISTER;
      UDASA address:   oooooo (O)
      UDASA contents:    xxxx (X)

       The SA register contains the  detected  error  code.   The
  specified   Unit   will  be  dropped  from  testing.   Decoding
  information for the SA register contents (error  code)  can  be
  obtained by typing:  "HELP EVRLK ERRORS SA_CODES"
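
       The error bit test described above amounts to checking bit
  15 of the displayed SA contents.  The Python sketch below is an
  illustration only; the function name sa_reports_fatal_error is
  hypothetical.

```python
SA_ERROR_BIT = 1 << 15  # bit 15 of the SA register


def sa_reports_fatal_error(sa_contents):
    """Return True if the error bit (bit 15) is set in the SA contents."""
    return bool(sa_contents & SA_ERROR_BIT)
```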
 ERRORLOG
  Errorlog packets:

       There are several sources of error information in an RA
  subsystem.  One of the most common is error log packets that
  the controller assembles and sends to the host.  There are 5
  types of error log packets.  Three of these types can be
  reported by this program.

  1.  SDI Error Log Packet (Drive detected errors).

  2.  Controller Error Log Packet.

  3.  Host Memory Access Error Log Packet.

 CONTROLLER
  Controller error log packet:

       This error occurs when an Error  log  packet  is  received
  specifying  that  a Controller detected an error within itself.
  The Unit being tested will be dropped.  This type of error will
  be reported in the following manner:

    Controller Error Reported:
      Status/event: xxx


       The Status/Event code will give you an indication of  what
  the trouble may be.  A list of the Status/Event codes and their
  meanings  can  be  obtained  by  typing:
                 "HELP EVRLK ERRORS STATUS_EVENT"
 MEMORY
  Host memory access error log packet:

       This error occurs when an Error  log  packet  is  received
  specifying  that a Host Memory Access Error was detected.  This
  type of error  packet  reports  problems  that  the  controller
  experienced  "dealing"  with host memory (Like while doing data
  transfers etc).  This type of error will  be  reported  in  the
  following manner:

    Host Memory Access Error reported;
      Status/event: xxx
      Host address: xxxxxx

       The host bus (Like UNIBUS) address displayed is the one
  that was being used at the time of the error.  The Status/Event
  code will give you an indication of what the problem was.  A
  list of the Status/Event codes and their meanings can be
  obtained by typing:  "HELP EVRLK ERRORS STATUS_EVENT"
 SDI
  SDI error log packet:

       This error occurs when an Error  log  packet  is  received
  specifying  that  an  SDI  Error  or  Drive  Detected Error was
  detected.  If a Status/Event code of EB (S/E Codes are in  HEX)
  is  reported,  a  valid  Drive error code may be reported.  The
  Drive Error Code is displayed along with  the  Drive  dependent
  information  in the packet, IN HEXADECIMAL.  This type of error
  will be reported in the following manner.

    SDI Error reported;  LBN:  dddddd.
      Status/event:      xxx
      Drive error code:  xxxxxx
      Drive dependent information (X)

      Byte:      15  14  13  12  11  10  09  08  07  06  05  04  
      Contents:  xx  xx  xx  xx  xx  xx  xx  xx  xx  xx  xx  xx

       The LBN is the Logical Block Number that was being
  accessed at the time the error was detected.  In many cases the
  reported LBN does not relate directly to the occurrence of the
  error, and the LBN address may be zero when you do not expect
  it to be.  The LBN is always reported by this program in decimal.
  The  "Drive  Dependent  Information", and Drive Error Code, are
  typed in  HEX  because  that's  the  way  this  information  is
  provided in most of the drive service manuals.  You can look-up
  the drive error code  in  the  drive  service  manual  for  the
  meaning  of  the  reported  problem.  Most of the drive service
  manuals (Like RA80 and RA81) provide some  information  on  the
  meanings  of  the  bytes  displayed  for  the  Drive  Dependent
  information.  This information is sometimes known as the "drive
  specific  status  bytes"  of the "extended status".  
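
       The layout of the "Drive dependent information" display
  can be sketched as follows.  This is only an illustration of
  the report format shown above.  The function name and the
  packet_bytes argument (a sequence indexed by byte offset within
  the packet) are assumptions, not part of the program.

```python
def format_drive_dependent(packet_bytes):
    """Render bytes 15..04 in hex, highest byte offset first.

    packet_bytes is a hypothetical sequence indexed by byte offset;
    only offsets 4 through 15 are displayed, as in the SDI report.
    """
    offsets = range(15, 3, -1)  # 15, 14, ..., 05, 04
    header = "  ".join(f"{n:02d}" for n in offsets)
    values = "  ".join(f"{packet_bytes[n]:02X}" for n in offsets)
    return f"Byte:      {header}\nContents:  {values}"


if __name__ == "__main__":
    # Dummy packet whose byte at offset n simply holds the value n.
    print(format_drive_dependent(bytes(range(16))))
```

       Note that the bytes are listed from offset 15 down to
  offset 04, so the left-most value is the highest-numbered byte.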

 INITIALIZATION
  Initialization errors:

       The following sections list the possible  errors  reported
  during  initialization  of  program  tables, etc.  These errors
  could indicate  a  possible  "system"  problem  or  programming
  error.  Only one type of error exists in this category.

       This error occurs when the memory table allocation  fails.
  Possibly  insufficient  memory  exists on the system to support
  the testing of the specified number of drives.   The  error  is
  reported in the following manner:

           FAILED DYNAMIC MEMORY ALLOCATION FOR SELECTED UNITS;
             RESTART PROGRAM AND SELECT FEWER UNITS.
 RCT
  RCT read/write errors:

       The following sections list the possible  errors  reported
  when  a  READ  or  WRITE to the Replacement Control Table (RCT)
  occurs.  The RCT does not require special MSCP  commands,  just
  normal  READ  and WRITE commands with the LBN address such that
  the commands are going to reference  blocks  in  the  RCT.   An
  error  is  reported  only  once  for  a  particular  RCT block.
  Depending upon the severity of the error, program execution may
  or may not continue.  There must be at least one good copy of
  an RCT block for execution to continue.  There are four copies
  of the RCT and at least one good copy must be found for
  replacements to occur.  If a good copy cannot be obtained, then
  reformatting  the media must be accomplished.  Reformatting for
  this kind of error is a good use of the formatter.  However, be
  sure  that a good copy of the Format Control Table (FCT) exists
  (See RA81-TT-19).

       Replacements of bad blocks within the RCT are not possible
  (That's why there are four copies).  You may see RCT errors
  frequently; however, they are not always fatal.

    Example of "soft" errors in RCT blocks:

        SCANNING RCT...

          RCT copy 1, block 352 (   891424.)  Status/event: 000110
          RCT copy 2, block 352 (   892189.)  Status/event: 000710
          RCT copy 1, block 552 (   891624.)  Status/event: 000110

       Meaning of the Status/Event codes  (Error  codes)  can  be
  obtained by typing "HELP EVRLK ERRORS STATUS_EVENT".  "Badness"
  would be when all 4 copies of the same block number would  have
  an error.  That would mean that a good copy of an RCT could not
  be assembled.
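
       The "at least one good copy" rule can be sketched as
  follows.  This is a minimal illustration, not the program's
  actual logic.  The function name and the (copy, block) input
  format are assumptions chosen to match the SCANNING RCT example
  above.

```python
from collections import defaultdict

RCT_COPIES = 4  # the text above states that there are four copies


def unrecoverable_blocks(failures):
    """Return the RCT block numbers that failed in every copy.

    failures is an iterable of (copy, block) pairs, one pair per
    reported RCT error, with copies numbered 1 through 4.
    """
    bad_copies = defaultdict(set)
    for copy, block in failures:
        bad_copies[block].add(copy)
    return sorted(
        block for block, copies in bad_copies.items()
        if len(copies) == RCT_COPIES
    )


if __name__ == "__main__":
    # The "soft" example above: no block is bad in all four copies.
    soft = [(1, 352), (2, 352), (1, 552)]
    print(unrecoverable_blocks(soft))
```

       An empty result means a good RCT can still be assembled.
  Any block number returned is bad in all four copies.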

 READ
  Failed RCT READ:

       Simply stated, a READ command was issued to a block in one
  of the copies of the RCT and an error was reported back for
  that command.  This is not necessarily fatal, as there are four
  copies of the RCT and only one copy needs to hold a good
  version of the block in question.  The error is reported in the
  following manner:

                   Failed RCT READ;
                     RCT copy: ddd
                     Block no: ddd
                     Endmsg status: xxx

       The RCT copy identifies which of the four copies was being
  referenced when the error was detected.  Block no is the Block
  number (LBN) that was being referenced within the copy when
  the error was detected.  The Endmsg status should provide an
  indication of what the error condition was and can be
  interpreted from the list of Status/Event codes by typing "HELP
  EVRLK ERRORS STATUS_EVENT".  Program execution should continue.
 WRITE
  Failed RCT WRITE:

       Simply stated, a WRITE command was issued to a block in
  one of the copies of the RCT and an error was reported back for
  that command.  This is not necessarily fatal, as there are four
  copies of the RCT and only one copy needs to hold a good
  version of the block in question.  The error is reported in the
  following manner:

                   Failed RCT WRITE;
                     RCT copy: ddd
                     Block no: ddd
                     Endmsg status: xxx

       The RCT copy identifies which of the four copies was being
  referenced when the error was detected.  Block no is the Block
  number (LBN) that was being referenced within the copy when
  the error was detected.  The Endmsg status should provide an
  indication of what the error condition was and can be
  interpreted from the list of Status/Event codes by typing "HELP
  EVRLK ERRORS STATUS_EVENT".  Program execution should continue.
 READ_ALL
  Failed READ of all copies of RCT:

       This is serious.   This  error  occurs  when  the  program
  detects  a  failure during a READ of all copies of a particular
  RCT block.  Replacements can not be done unless enough good RCT
  blocks exist to make-up a good RCT copy.  This program will not
  do replacements, under this  condition,  and  neither  can  any
  operating  system that has the capability of doing replacements
  (Dynamic BBR).  The error is reported in the following manner:

                   Failed READ of all copies of RCT;
                     Block no: ddd
                     Endmsg status: xxx

       The Block no is the block (LBN) that was being referenced.
  This  same  block  exists  in  each  of  the RCT copies.  
  References to this block in each of the copies resulted  in  an
  error.   The Endmsg status should provide an indication of what
  the error condition was and can be interpreted from the list of
  Status/Event codes by typing "HELP EVRLK ERRORS STATUS_EVENT".
  The Unit (drive) is dropped from testing.

       The appropriate recovery for this kind of error is to
  first make sure that the drive does not have a hardware
  problem causing data errors that are not related to the media.
  If you're certain that the basic drive is functioning properly,
  the recovery would be to try reformatting the media.  The
  Format Control Table (FCT) must be usable during the format
  (Read RA81-TT-19).  Once this has been accomplished, you can
  re-run this program to assure yourself that the drive is now
  operating properly.  If the problem persists after
  reformatting, the HDA/Pack may need replacement.
 WRITE_ALL
  Failed WRITE of all copies of RCT:

       This is serious.   This  error  occurs  when  the  program
  detects  a failure during a WRITE of all copies of a particular
  RCT block.  Replacements can not be done unless enough good RCT
  blocks exist to make-up a good RCT copy.  This program will not
  do replacements, under this  condition,  and  neither  can  any
  operating  system that has the capability of doing replacements
  (Dynamic BBR).  The error is reported in the following manner:

                   Failed WRITE of all copies of RCT;
                     Block no: ddd
                     Endmsg status: xxx

       The Block no is the block (LBN) that was being referenced.
  This  same  block  exists  in  each  of  the RCT copies.  
  References to this block in each of the copies resulted  in  an
  error.   The Endmsg status should provide an indication of what
  the error condition was and can be interpreted from the list of
  Status/Event codes by typing: "HELP EVRLK ERRORS STATUS_EVENT".
  The  Unit  (drive)  is  dropped  from  further testing.

       The appropriate recovery for this kind of error is to
  first make sure that the drive does not have a hardware
  problem causing data errors that are not related to the media.
  If you're certain that the basic drive is functioning properly,
  the recovery would be to try reformatting the media.  The
  Format Control Table (FCT) must be usable during the format
  (Read RA81-TT-19).  Once this has been accomplished, you can
  re-run this program to assure yourself that the drive is now
  operating properly.  If the problem persists after
  reformatting, the HDA/Pack may need replacement.
 NO_NULL
  No Null descriptor entry found in RCT:

       This is serious.  In the case of the RA81, the Replacement
  Control Table (RCT) contains descriptors for 17,000 plus
  replacement blocks.  If they are all used (No Null Descriptor)
  then something BIG is wrong.  The descriptors could in fact all
  be used, or possibly the RCT has been written with garbage,
  which makes it appear that all replacement blocks have been
  used.  Nevertheless, if this error occurs, you should read
  RA81-TT-12.  The recovery will be to reformat the media.  If
  the drive is operating normally, this should fix the RCT, as
  long as the Format Control Table (FCT) is good (See
  RA81-TT-19).  You can re-run this program after having
  reformatted, just to be sure the problem has been corrected.
  The drive will be dropped from further testing.  The error is
  reported in the following manner:

                   No Null descriptor entry found in RCT;
                     RCT is probably corrupt.
                     Reformat media.
 CRASH
  Crash Occurred during previous replacement Phase (1 or 2)

       The following is not necessarily an ERROR.   It  may  have
  been  the  result of an error but when detected is not an error
  condition.  The condition described  is  the  detection  of  an
  incomplete  replacement  (i.e.   a  replacement  that  for some
  reason was not finished).  This condition is detected  by  this
  program when scanning the RCT.  Action is taken to complete the
  replacement and the following output gives the results of  this
  operation.

      Crash occurred during previous replacement Phase 1.
       
      Attempting Recovery...

      LBN:   679864.   Status:  RBN:    13331.   Operation: SEC

       This example indicates that a crash occurred, sometime  in
  the past, during a phase 1 replacement.  Replacement at phase 1
  will assign a replacement RBN, and attempt recovery.   If  this
  program  shows  a  crash recovery, it would be wise to allow at
  least one pass of this program,  in  Automatic  Mode,  to  take
  place.   No  "Status"  will be listed in this situation, as the
  recovery process does not know  the  original  reason  (status)
  replacement was requested.

 SA_CODES
  Controller error codes (reported by SA register):

       The SA (Status Address) register can display an error code
  during  the initialization of the controller as well as when it
  is "on-line" to the  host.   However,  the  format  of  the  SA
  register is different for reporting initialization error codes,
  as opposed to when it is on-line.


    SA REGISTER (CONTROLLER) ERROR CODES DURING INITIALIZATION
    ----------------------------------------------------------
    15      11 10                 0   
    +-+-+-+-+-+-------------------+     SA REGISTER     
    |E|S|S|S|S|   INITIALIZATION  | Formatted as would be 
    |R|4|3|2|1|     ERROR CODE    | represented when an error 
    +-+-+-+-+-+-------------------+ occurs during initialization.
     ^|-------|
     |     \____ STEP BITS.  Initialization is a four step 
     |         process. These bits describe which initialization 
     |         step the error occurred in.
     |
     |__________ ERROR BIT.  Qualifies the other bits of the 
                       register as containing valid information 
                       relating to an error that occurred during 
                       initialization.


                         UDA50 initialization
                        SA register error codes
                      ---------------------------              
                                                        Possible
    Octal   Hex   Error Description                     FRU
    ------  ----  ----------------------------------    -------
    104000  8800  Fatal Sequence Error                    7485
    104040  8820  D processor ALU                         7485
    104041  8821  D processor control rom parity error    7485
    105102  8A42  D proc with no board #2 or RAM Parity 
                  error                                   7486
    105105  8A45  D proc RAM Buffer Error                 7486
    105152  8A6A  D proc SDI Error                        7486
    105153  8A6B  D proc Write Mode Wrap SERDES Error     7486
    105154  8A6C  D proc Read SERDES,RSGEN & ECC Error    7486
    106040  8C20  U proc ALU Error                        7485
    106041  8C21  U proc Control Register Error           7485
    106042  8C22  U proc DFAIL/Control Rom Parity/BD #1   7485
    106047  8C27  U proc Constant PROM error with D proc 
                  running SDI test                        7485
    106055  8C2D  Unexpected trap found,Abort Diagnostic  7485
    106071  8C39  U proc constant PROM Error              7485
    106072  8C3A  U proc Control ROM Parity Error         7485
    106200  8C80  Step 1 Data Error (MSB not set)         7485
    107103  8E43  U proc RAM Parity Error                 7486
    107107  8E47  U proc RAM Buffer Error                 7486
    107115  8E4D  Test Count was wrong (BD#2)             7486
    112300  94C0  Step 2 Error                            7485
    122240  A4A0  NPR Error                               7485
    122300  A4C0  Step 3 Error                            7485
    142300  C4C0  Step 4 Error                            7485
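
       The bit layout in the diagram above can be applied to a
  reported value as sketched below.  This is a minimal
  illustration that assumes the layout exactly as drawn (error
  bit 15, step bits S4 to S1 in bits 14 to 11, error code in bits
  10 to 0).  The function and constant names are hypothetical.

```python
ERROR_BIT = 1 << 15
# Step bits as drawn in the diagram: S1 is bit 11, S4 is bit 14.
STEP_BITS = {1: 1 << 11, 2: 1 << 12, 3: 1 << 13, 4: 1 << 14}
CODE_MASK = 0x07FF  # bits 0 through 10


def decode_init_sa(sa):
    """Split an initialization-time SA register value into fields."""
    return {
        "error": bool(sa & ERROR_BIT),
        "steps": [n for n, bit in STEP_BITS.items() if sa & bit],
        "code": sa & CODE_MASK,
    }


if __name__ == "__main__":
    # Values may be entered in octal (0o...) or hex (0x...).
    for value in (0o104000, 0o112300, 0o142300):
        fields = decode_init_sa(value)
        print(f"{value:06o} ({value:04X}): {fields}")
```

       For example, 112300 octal (94C0 hex) decodes with the
  error bit set and step bit S2 set, which agrees with the
  "Step 2 Error" entry in the table.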


    Controller error codes while on-line to host (SA register)
    ----------------------------------------------------------

       This section provides a list  of  the  various  controller
  error  codes  that  can  reside  in  the  SA register, when the
  controller detects a hard  error  during  on-line  use  of  the
  controller.   Do  not  get  these confused with the SA register
  initialization error codes ABOVE.  They are different.

  15         10              0                
  +-+-+-+-+-+----------------+             SA REGISTER     
  |E|       |    ONLINE      | Formatted as would be represented 
  |R|       |   Error Code   | when the UDA is "ONLINE" to the 
  +-+-+-+-+-+----------------+ Host.                    
   ^
   |___ ERROR BIT (Error Code will be displayed in bits 0 - 10) 
          IF bit 15 is NOT set, the contents of bits 0 - 10 are 
          undefined.
                                                                

                      UDA/KDA ONLINE
                  SA REGISTER ERROR CODES
                  -------------------------
                                                        POSSIBLE
    Octal    Hex    Error Description                     FRU
    ------   ---    ---------------------------          -----
    100001   8001   UNIBUS Packet Read Error              7485*
    100002   8002   UNIBUS Packet Write Error             7485*
    100003   8003   UDA ROM & RAM Parity Error       7485/7486
    100004   8004   UDA RAM Parity Error                  7486
    100005   8005   UDA ROM Parity error                  7485
    100006   8006   UNIBUS Ring Read Error                7485*
    100007   8007   UNIBUS Ring Write Error               7485*
    100010   8008   UNIBUS Interrupt Master Failure       7485
    100011   8009   Host Access Timeout Error             7485*
    100012   800A   Host Exceeded Command Limit           7485*
    100013   800B   UNIBUS Bus Master Failure             7486
    100014   800C   DM XFC Fatal Error                    7486
    100015   800D   Hardware Timeout of Instruction Loop  7485*
    100016   800E   Invalid Virtual Ckt Identifier        7485*
    100017   800F   Interrupt Write Error on UNIBUS       7485*

        * - denotes possible host CPU error.......
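
       A lookup of the on-line codes can be sketched as follows.
  The descriptions are copied from the table above.  The function
  name and return format are assumptions made for illustration
  only.

```python
# On-line error codes (bits 0-10), keyed by value, from the table above.
ONLINE_SA_CODES = {
    0x01: "UNIBUS Packet Read Error",
    0x02: "UNIBUS Packet Write Error",
    0x03: "UDA ROM & RAM Parity Error",
    0x04: "UDA RAM Parity Error",
    0x05: "UDA ROM Parity error",
    0x06: "UNIBUS Ring Read Error",
    0x07: "UNIBUS Ring Write Error",
    0x08: "UNIBUS Interrupt Master Failure",
    0x09: "Host Access Timeout Error",
    0x0A: "Host Exceeded Command Limit",
    0x0B: "UNIBUS Bus Master Failure",
    0x0C: "DM XFC Fatal Error",
    0x0D: "Hardware Timeout of Instruction Loop",
    0x0E: "Invalid Virtual Ckt Identifier",
    0x0F: "Interrupt Write Error on UNIBUS",
}


def decode_online_sa(sa):
    """Return (code, description), or None if bit 15 is not set.

    When bit 15 is clear, bits 0-10 are undefined and are not decoded.
    """
    if not sa & 0x8000:
        return None
    code = sa & 0x07FF
    return code, ONLINE_SA_CODES.get(code, "code not in the table above")


if __name__ == "__main__":
    print(decode_online_sa(0o100011))
```

       Remember that this lookup applies only to the on-line
  format.  Initialization codes use the step-bit format described
  earlier in this topic.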
 STATUS_EVENT
  Status/event codes:

       The following list shows the  various  Status/Event  codes
  used  in  many  of  the  messages  displayed  by  this program.
  Certain deviations may exist for specific controllers.  This is
  the   "generic"   list.    Remember,   this   program   reports
  Status/Event codes in HEX.

                                DESCRIPTION
    Octal Hex    -----------------------------------------------
      0    0     Successful completion
      1    1     Invalid command (high byte = byte offset of bad 
                 command field)
      2    2     Command aborted
      3    3     Drive off line (unknown unit or Online to 
                 another controller)
      4    4     Drive available
      5    5     Media Format Error
      6    6     Unit Write Protected
      7    7     Compare error (on compare command or compare 
                 modifier)
     10    8     Data Error (Or may have been written with 
                 "Forced Error Flag")
     11    9     Host Buffer Access Error
     12    A     Controller Error (Command Time Out/Retry 
                 Exceeded)
     13    B     Drive Error 
     40   20     Spin down ignored (multi-unit drives only)
     43   23     No volume mounted or drive run/stop switch out
     45   25     Format Control Table unreadable - EDC Error
     51   29     Odd transfer start address in MSCP packet
     52   2A     SERDES  overrun error (controller probably 
                 broken)
   * 53   2B     SDI level two response timeout (Or maybe seek 
                 incomplete) (53/2B also shows up with a drive 
                 error as a result NOT CAUSE) 
    100   40     Still Connected (multi-unit drives only)
    103   43     Drive inoperative (UDA cannot communicate with 
                 drive)

    105   45     Format Control Table unreadable - Invalid 
                 Sector Header
    110   48     Header compare error (header not found and no 
                 revector)
    111   49     Odd byte count in MSCP packet
    112   4A     EDC error (SERDES broken or EDC written bad -- 
                 Controller)
   *113   4B     Invalid SDI level two response (unsuccessful or 
                 trash)
    145   65     Format Control Table unreadable - Data Sync 
                 Timeout
    150   68     Data Sync timeout (Data Sync field in sector)
    151   69     UNIBUS nonexistent memory error
    152   6A     Inconsistent controller state
    153   6B     Positioner Error (headers consistent but not on 
                 cyl)
    200   80     Duplicate Unit Numbers
    203   83     Drive offline and duplicate unit numbers
    211   89     UNIBUS parity error (UNIBUS read)(Host Memory 
                 Parity?)
   *213   8B     Lost read/write ready during or between 
                 transfers ('213 also shows up with drive 
                 errors as a result NOT CAUSE)
    245   A5     Drive not 512 Byte format (16 bit format)
    253   AB     Lost Drive clock during data/SDI transfer
    305   C5     Drive not formatted or Format Control Table 
                 Corrupted
   *313   CB     Lost drive receiver ready during transfer (see 
                 213 comment) 
    345   E5     ECC error on FCT read (media format - FCT 
                 unreadable)
    350   E8     Uncorrectable ECC error (This uncorrectable 
                 data may be re-written to an RBN with the 
                 Forced Error Flag)
    353   EB     Drive detected error (DRIVE HAD ERROR! -- Find 
                 drive error code)
    400   100    Already Online  (in response to ONLINE)   
    403   103    Drive offline.  By field service or internal 
                 Diagnostics
    410   108    One symbol ECC error 
    413   10B    RTDS pulse/parity error (IS UDA M7486 UP TO 
                 REV ?)
    450   128    Two symbol ECC error           
    510   148    Three symbol ECC error 
    550   168    Four symbol ECC error  
    610   188    Five symbol ECC error  
    650   1A8    Six symbol ECC error   
    710   1C8    Seven symbol ECC error 
    750   1E8    Eight symbol ECC error 
    10006 1006   Drive software write protected  
    20006 2006   Drive hardware write protected 
        
    KEY: EDC = Error Detection Code (Written in each sector)
         SDI = Standard Disk Interface (Drive Bus)
         SERDES = SERializer/DESerializer (In Controller)
         FCT = Format Control Table (Written on media surface)
         ECC = Error Correction Code (Written in each sector)   

                               NOTE
          Codes 53,113,213 and 313 occur  most  often  as
          the  RESULT  of  a  "Drive  Error" (S/E Code EB
          HEX).  You should look for  a  problem  in  the
          drive  before believing that these codes really
          represent the description of your problem.
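
       The hex values in the list above group into families that
  share their low five bits (for example 2B, 4B, 8B, CB and EB
  all end in the "Drive Error" value 0B).  The sketch below splits
  a code on that basis.  Treating bits 0-4 as a major code and
  the remaining bits as a subcode is an assumption drawn from the
  pattern in the list, and the function names are hypothetical.

```python
def split_status_event(code):
    """Split a Status/Event code into (major code, subcode).

    Assumes the major code is held in the low five bits.
    """
    return code & 0x1F, code >> 5


def ecc_symbols(code):
    """Return the symbol count for the N-symbol ECC codes, else None.

    In the list above, 108 hex is "One symbol" and 1E8 hex is
    "Eight symbol", i.e. Data Error (08) with subcodes 8 to 15.
    """
    major, sub = split_status_event(code)
    if major == 0x08 and 8 <= sub <= 15:
        return sub - 7
    return None


if __name__ == "__main__":
    for code in (0x2B, 0xE8, 0x1A8):
        major, sub = split_status_event(code)
        print(f"{code:X}: major {major:X}, subcode {sub}, "
              f"ECC symbols {ecc_symbols(code)}")
```

       With this split, 1A8 hex gives major code 08 (Data Error)
  and a symbol count of 6, matching the "Six symbol ECC error"
  entry.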
 TRANSIENT
  Transient error table overflow:

       The Transient Error Table, that is kept in memory as  long
  as  this  program  is running, has the ability to keep track of
  512 Transient errors for each drive being tested.   This  table
  should  be  of  sufficient  size for all cases where bad blocks
  that are only marginally bad need to be tracked.  If this table
  overflows,  a  message is typed and the testing terminates.  IF
  THIS HAPPENS, YOU HAVE MORE TROUBLE THAN WHAT THIS PROGRAM  CAN
  HELP WITH.  This message is reported in the following manner:

    Transient error table overflow;
      ADDITIONAL MAINTENANCE REQUIRED.

       If you log more than 512 transients, then the  table  will
  in fact overflow and it may be more than bad blocks causing the
  errors you see.  Most likely  you  have  a  data  path  problem
  giving  false  indications  of  ECC  errors or something like a
  worn-out spindle ground brush.
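
       The overflow condition can be pictured with the sketch
  below.  It is an illustration only.  The class names are
  hypothetical and the program's real table layout is not
  described here.  The only fact taken from the text is the limit
  of 512 entries per drive.

```python
TRANSIENT_TABLE_SIZE = 512  # per drive, as stated above


class TransientTableOverflow(Exception):
    """Raised when a drive logs more transients than the table holds."""


class TransientTable:
    """Hypothetical per-drive table of LBNs that had transient errors."""

    def __init__(self, capacity=TRANSIENT_TABLE_SIZE):
        self.capacity = capacity
        self.entries = []

    def record(self, lbn):
        # The table is full: this is the point where testing terminates.
        if len(self.entries) >= self.capacity:
            raise TransientTableOverflow("ADDITIONAL MAINTENANCE REQUIRED.")
        self.entries.append(lbn)


if __name__ == "__main__":
    table = TransientTable()
    try:
        for lbn in range(600):  # more transients than the table can hold
            table.record(lbn)
    except TransientTableOverflow as err:
        print(f"Transient error table overflow; {err}")
```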

       The recovery from this condition would be to first fix
  whatever is causing the high transient error rate, then reformat
  the media.  Most likely there are many replacements that were
  made on blocks that were in fact not bad.  Reformatting will
  put the media back to a known state (Please read RA81-TT-12).
  You  may  want  to  run  several  passes of this program, after
  having reformatted, just to be sure that  the  formatter  found
  all the bad spots.

      For additional information type: 
            "HELP EVRLK BACKGROUND OPERATIONAL_CONCEPT"
 UNIT
  Unit initialization errors:

       The following errors are reported  when  an  error  occurs
  during  the  initialization  of  a  unit  (Drive).   These  are
  basically "HARD" serious errors and usually indicate a problem
  either in a drive or in communicating with a drive.  Errors are
  reported via the "End Message" Status field (Endmsg status:).
  If the problem is severe, the specified unit (drive) will not
  continue being tested.

                               NOTE
          The meanings of the Endmsg status can be  found
          by  looking  it  up in the list of Status/Event
          codes provided by  typing: "HELP EVRLK ERRORS
          STATUS_EVENT"

 GET_UNIT_STATUS
  Failed GET UNIT STATUS:

       "Get Unit Status" is an MSCP command that is used  by  the
  host  to  get certain information about a drive connected to an
  RA drive controller.  This error occurs when an attempt to "GET
  UNIT  STATUS"  fails  during Unit initialization.  This type of
  error will be reported in the following manner:

                  FAILED GET UNIT STATUS;
                    Drive ddd is not accessible.
                    Endmsg status:  xxx

       Drive ddd is the decimal drive logical unit address that
  the "Get Unit Status" was issued for.  Possibly the drive port
  switches are not pushed in or an incorrect drive address was
  given to the program.  Errors are reported via the "End
  Message" Status field (Endmsg status:).  The meanings of the
  Endmsg status can be found by looking it up in the list of
  Status/Event codes provided by typing: "HELP EVRLK ERRORS
  STATUS_EVENT"
 ONLINE
  Failed ONLINE:

       "ONLINE" is an MSCP command that is used by  the  host  to
  bring a unit (drive) Online to the controller.  It also makes
  certain drive specific information available to the host and
  allows the drive to become usable.  This error occurs when the
  command completion status indicates a problem executing this
  command.  This type of error will be reported in the following
  manner:

                   Failed ONLINE;
                     Endmsg status:  xxx

       Errors are reported via the "End Message" Status field
  (Endmsg status:).  The meanings of the Endmsg status can be
  found by looking it up in the list of Status/Event codes
  provided by typing:  "HELP EVRLK ERRORS STATUS_EVENT"
 HELP
  VDS Bad Block Replacement Utility:

       This program is a highly specialized utility  for  use  in
  those  cases  where  an RA drive has been diagnosed to have bad
  blocks on the media.  These  bad  blocks  may  be  causing  ECC
  errors  that  can  be  isolated  to  a  specific  Logical Block
  Number(s) (LBN).  This program will replace those  blocks  that
  it  either  defines as bad (AUTOMATIC Replacement Mode) or will
  take an LBN address that the user  supplies  and  replace  only
  that block (MANUAL Mode).  There is also a VERIFY mode that
  allows the user to quickly determine if any bad blocks exist.

  In order to properly execute this program in AUTOMATIC or MANUAL
  mode, the following steps must be followed without exception.
  Unpredictable results could occur if they are not followed.

  1. Customer must backup the data from the drive(s) to be used 
     and verify its correctness.

  2. Execute this program on the drive(s).  When performing
     AUTOMATIC replacement, multiple passes on each drive are
     recommended (DS>Start/pass:nnn).

  3. Customer must restore the backed up data to those drives
     from which it was backed up and again verify its correctness.

 INIT_INFO
  Initialization information:

       Each Unit selected for test is  brought  Online,  and  the
  following  information  is  displayed  for  that  unit (Drive).
  Remember a unit is not necessarily the logical unit address  of
  a drive.

              INITIALIZATION INFORMATION FOR _DUan
                Controller:  UDA50
                Address:     172150 
                Drive no:    d      
                Volume SN:   dddd.
                Volume size: dddddd.


   o  The CONTROLLER will be a UDA50.

   o  DRIVE NO:  This is the logical address of the drive being
      initialized.

   o  VOLUME SN:  This is the Serial Number of the HDA  or  Pack.
      This information comes from the Format Control Table (FCT),
      which is read during the execution  of  the  MSCP  "ONLINE"
      command.

   o  VOLUME SIZE:  This is the  number  of  user  LBN's  on  the
      media.   For example, the RA81 formatted for a VAX (16 bit)
      has 891072 (decimal) user LBN's.  Actually,  starting  with
      LBN zero, the last user LBN would be 891071.
 LBN_STATUS_CODES
  LBN status/ Replacement operations:

       This program was designed to not only replace  bad  blocks
  but  also  to  provide  understandable  information  about  the
  "integrity"  of  the  media  and  the  replacements  that   are
  accomplished.   As  such,  many different types of messages are
  provided.  Errors  are  listed  separately  under  "HELP  EVRLK
  ERRORS".   Anything  that  is not an error will be addressed in
  this section.

       The  most  common  types  of  messages   relate   to   the
  replacement  of a block or what the program finds when a "read"
  of the block occurs.  The following decode chart  is  displayed
  on  the  user  terminal, when this program is first run to give
  the meanings of what is found or what  action  is  taken  on  a
  block.

            LBN Status codes:                     
            BBR - Bad Block reported.
            TRA - Transient Error table entry.
            ECH - Hard ECC Error encountered.
            HCE - Header Compare Error encountered.
            DST - Data Sync Timeout encountered.
            FER - Forced Error encountered. 
            WFE - Block written with FE.

            RBN Operation types:
            PRIMARY - Primary replacement for LBN.
            SECONDARY - Secondary replacement for LBN.
            UNUSABLE - RBN marked unusable.
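
       If the report lines ("LBN: ... Status: ... RBN: ...
  Operation: ...") are captured from the terminal, they can be
  picked apart as sketched below.  This is an illustration only.
  The regular expression and function name are assumptions based
  on the sample lines in this help text, and the Status field is
  treated as optional because the crash recovery report leaves it
  blank.

```python
import re

# Pattern inferred from the sample report lines in this help text.
LINE_RE = re.compile(
    r"LBN:\s*(?P<lbn>\d+)\.\s+Status:\s*(?P<status>[A-Z]{3})?\s+"
    r"RBN:\s*(?P<rbn>\d+)\.\s+Operation:\s*(?P<operation>\w+)"
)


def parse_report_line(line):
    """Return the fields of one replacement report line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "lbn": int(match.group("lbn")),        # LBNs are shown in decimal
        "status": match.group("status"),       # None when left blank
        "rbn": int(match.group("rbn")),
        "operation": match.group("operation"),
    }


if __name__ == "__main__":
    sample = "LBN: 23271. Status: BBR    RBN:   456. Operation: PRIMARY"
    print(parse_report_line(sample))
```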

 PRIMARY
  Primary replacement:

       The LBN was tested, and the Primary  replacement  RBN  was
  used to store the LBN's data.

       LBN: 23271. Status: BBR    RBN:   456. Operation: PRIMARY
 SECONDARY
  Secondary replacement:

       The LBN was tested, and the BBR flag was  set  repeatedly.
  The  LBN's Primary replacement RBN is currently used by another
  LBN.  A Secondary replacement RBN was used.

       LBN: 23271. Status: BBR    RBN:   455. Operation: SECONDARY
 UNUSABLE
  Replacement block marked unusable:

       Upon reading data from an RBN, an  error  resulted.   This
  indicates  that  the RBN is bad and must be marked as unusable.
  Also, the customer data residing in the RBN must be moved to  a
  new  RBN.   A  Secondary  replacement  always results from this
  condition.

       LBN: 23271. Status: BBR    RBN:   456. Operation: UNUSABLE
       LBN: 23271. Status: BBR    RBN:   455. Operation: SECONDARY
 ECH/WFE
  Uncorrectable ECC error/Write with Forced Error:

       An LBN is read repeatedly with an uncorrectable ECC  error
  (ECH).   The  indicated replacement operation is performed.  If
  enabled, the LBN data is written with  the  Forced  Error  (FE)
  flag set, and the following information appears.

       LBN: 23271. Status: WFE    RBN:   456. Operation: SECONDARY

       If the question about  writing  uncorrectable  ECC  errors
  WITHOUT  the  Forced  Error  Flag  is  answered  with a NO, the
  corrupted data is written to the RBN as is, and  the  following
  information appears:

       LBN: 23271. Status: ECH    RBN:   456. Operation: SECONDARY

 FER
  Forced Error encountered:

       When an uncorrectable ECC error  is  encountered,  several
  attempts are made to read the data correctly (Using every means
  available).  If these attempts fail, the block  generating  the
  Uncorrectable  ECC  error  is  assumed to be bad and in need of
  replacement.  If you have a system that does "Dynamic Bad Block
  Replacement" (Like VMS, RSTS, RSX, IAS), or you are running
  this  program,  the   replacement   process   will   take   the
  uncorrectable  (Corrupt) data and move it to a good replacement
  block (RBN).  Now, we have a condition where  we  have  corrupt
  data  in  a  good  block  and if left this way the corrupt data
  would be read with no "indication" that the data is  "corrupt".
  Therefore,  in  order  to  tell a user that the data was at one
  time uncorrectable and exists now as "corrupted" data (In a
  good RBN), the "Forced Error Flag" is attached to the block
  of "corrupted" data.  When this block is read by a  user,  this
  flag  (Inverted  EDC character) is also read and is intended to
  inform the user that the requested data is "not reliable".  THE
  FORCED  ERROR FLAG IS NOT (REPEAT NOT) AN ERROR !!.  WHEN READ,
  IT WILL NOT MAKE AN ENTRY IN  THE  SYSTEM  ERROR  LOG.   IT  IS
  REPORTED  AS  A  "STATUS" CODE 10 (OCTAL) IN A TRANSFER REQUEST
  "END PACKET" (This program calls the end  packet  status  field
  the  "Endmsg  status",  in  several  message  types that can be
  displayed).

       LBN: 23271. Status: FER

 TRA
  Transient error table entry:

       During the "scan" for bad blocks, the BBR flag was set for
  a  particular  LBN.  Once this condition was noted, the program
  tries to verify the report of a bad block.  If  the  bad  block
  report cannot be verified, the operator is notified of the LBN
  and NO replacement is attempted.  If you do not feel good about
  an ECC error being reported and then the program not being able
  to verify the error, you can use Manual  Mode  to  replace  the
  reported  LBN.   Notice in the example below that no RBN or
  Operation is shown.  The TRA status indicates that a transient
  ECC error occurred on the indicated LBN.

       LBN: 23271. Status: TRA

                               NOTE
          Many times the occurrence of an ECC error  that
          is  not  repeatable  can  indicate  a Data Path
          problem  or  other  conditions  that  are   not
          related to bad spots.  If this condition occurs
          many times during a pass  of  this  program,  I
          would  start to look for a data path problem or
          some other problem causing transient errors.
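
       The verify-then-decide idea described above can be sketched
  as follows.  This is Python for illustration only and is not the
  algorithm EVRLK actually uses.  The callable reread_fails, the
  retry count and the threshold are all assumptions made for the
  sake of the example.

```python
def classify_report(reread_fails, attempts=4):
    """Classify a reported LBN as 'BBR' (verified) or 'TRA' (transient).

    reread_fails: hypothetical callable that re-reads the block and
                  returns True when the error shows up again.
    attempts:     assumed number of re-reads, not taken from EVRLK.
    """
    for _ in range(attempts):
        if reread_fails():
            return "BBR"   # error reproduced: candidate for replacement
    return "TRA"           # never reproduced: notify operator, no replacement


def suspect_data_path(statuses, threshold=10):
    """Return True when a pass logged many transient entries.

    threshold is an arbitrary assumption; the help text only says
    'many times during a pass'.
    """
    return sum(1 for s in statuses if s == "TRA") >= threshold
```

       The second function mirrors the NOTE above: a pass that
  produces a large number of TRA entries points toward a data path
  problem rather than bad spots on the media.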
 HCE
  HEADER COMPARE ERROR

       An LBN is read and a Header  Compare  Error  is  detected.
  LBN's  can be replaced for header errors as well as ECC errors.
  If this occurs, the LBN will be replaced in a manner similar to
  the following example:

   LBN: 23271. Status: HCE   RBN:   456. Operation: SECONDARY
 DST
  DATA SYNC TIMEOUT

       An LBN is read and a Data Sync Timeout is detected.  LBN's
  can be replaced for this error as well as ECC errors.  The Data
  Sync is a field that is between the header and data field.   It
  is used to "sync-up" the drive logic just before the data field
  is read.  If this occurs, the LBN will be replaced in a  manner
  similar to the following example:

   LBN: 23271. Status: DST   RBN:   456. Operation: SECONDARY
 BBR
  BAD BLOCK REPORTED

       Status Code BBR is used to indicate that  when  the  block
  was  read  an  ECC  error  occurred.   This  ECC  error  was NOT
  uncorrectable but the block needed replacement anyway. A request
  for  Bad  Block  Replacement  is  made  and  the  block  is
  replaced with either a primary or secondary replacement  block.
  When  this  occurs,  a  message  similar  to  the  following is
  displayed on the console terminal.

   LBN: 23271. Status: BBR   RBN:   456. Operation: SECONDARY
 ELAPSED_RUNTIME
  Elapsed runtime:

       In  Automatic  mode,  the  following  runtime  message  is
  periodically   displayed  to  indicate  that  execution  is  in
  progress:

             LBN dddddd.   Elapsed runtime is hh:mm:ss

       The LBN listed indicates where on the media the program is
  currently testing; the elapsed runtime is zeroed when each Unit
  is selected for test.
 SCANNING_RCT         
  Scanning RCT...  (CAN A GOOD COPY BE ASSEMBLED)

       Just after typing the  Initialization  information  for  a
  drive  that  is  going  to  be  tested, the statement
  "Scanning the RCT..." is typed.  This portion of the program
  attempts  to  find enough good RCT blocks to account for a good
  complete  copy.   Any  error  that  it  finds  doing  this   is
  displayed.   These  messages  (Shown  Below)  are  only  status
  messages  and  DO  NOT  necessarily  mean  that  the  drive  is
  defective.  The status is shown in the following manner:

     RCT copy 1, block 184 (  891256.)  Status/event:  000110
     RCT copy 1, block 185 (  891257.)  Status/event:  000153
     RCT copy 2, block 184 (  892021.)  Status/event:  000350

       These messages do not become critical until the same block
  number  shows  an  error  for  each copy.  In the above example
  Block 184 is bad in two copies.  This is not fatal, since there
  are  more  than  2  copies in our example.  You will get a hard
  error  report,  if  all  copies  of  a  block  are  bad.    The
  Status/Event  codes  can be interpreted by typing:  "HELP EVRLK
  ERRORS STATUS_EVENT".
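
       The rule above (a block is only critical when it is bad in
  every RCT copy) can be checked from a captured listing.  The
  Python sketch below is illustrative only.  The line pattern is
  inferred from the three sample lines, and the number of RCT
  copies on the drive is something the caller must supply; it is
  not stated in the messages themselves.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the sample "RCT copy ..." lines.
RCT_LINE = re.compile(
    r"RCT copy\s+(\d+),\s+block\s+(\d+)\s+\(\s*(\d+)\.\)"
    r"\s+Status/event:\s+([0-7]+)"
)


def blocks_bad_in_all_copies(lines, total_copies):
    """Return the sorted RCT block numbers reported bad in every copy.

    total_copies is supplied by the caller (an assumption here).
    """
    bad_copies = defaultdict(set)          # block number -> copies in error
    for line in lines:
        m = RCT_LINE.search(line)
        if m:
            copy, block = int(m.group(1)), int(m.group(2))
            bad_copies[block].add(copy)
    return sorted(b for b, c in bad_copies.items() if len(c) >= total_copies)
```

       With the three sample lines and a drive assumed to have four
  RCT copies, the result is an empty list: block 184 is bad in only
  two copies, so a good copy can still be assembled.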
 ZERO_RCT_SIZE
  ZERO RCT SIZE -- CAN'T DO REPLACEMENTS ON THIS PRODUCT

       This message is telling you that you are running this program
  on  a  drive  that does not support Bad Block replacement.  The
  zero RCT size means that the drive has  no  "real"  Replacement
  Control Table and replacements are not possible.  If you think
  that you have bad blocks on a drive like this, you should consult
  the  appropriate  service  manual  for action you need to take.
  The message is reported in the following manner:

                  Zero RCT size detected:
                    BBR not supported by this drive.
                    Endmsg status:  xxx

       The Endmsg status will  most  likely  not  tell  you  much
  (Unless  something  really  out  of the ordinary is happening).
  The  Endmsg  status  can  be  interpreted  from  the  list   of
  Status/Event codes by typing: "HELP EVRLK ERRORS STATUS_EVENT".
  The Unit (drive) is dropped from testing.
 MODES_OF_OPERATION 
  MODES OF OPERATION (VERIFY, AUTOMATIC, MANUAL)

            This program will run in one of three modes
            -------------------------------------------

  VERIFY MODE
      This allows the  user  to  "scan"  the  media  without  the
      program  taking  any action on what it finds.  In this way,
      you can get a "picture" of any problem found,  without  the
      program  writing  in  any way on the media.  For additional
      information, and  instructions  on  invoking  VERIFY  mode,
      type:  
               "HELP EVRLK MODES_OF_OPERATION VERIFY"

  AUTOMATIC MODE
      This is the mode used when you do not  know  what  the  bad
      block  LBN(s)  are,  or  do  not  have  the  bad  block LBN
      address(es) available in decimal.  Or,  possibly  you  just
      want  the program to run unattended, and take action on any
      blocks it finds bad.  Normally you should use  Manual  Mode
      and  enter  the  bad  blocks, given that you know (from the
      error logger or other sources of information)  what  blocks
      are   bad.    Large  pass  counts  in  Automatic  Mode  are
      recommended  (Run  EVRLK/Pass:"number  of  passes").    For
      additional   information,   and  instructions  on  invoking
      AUTOMATIC  mode,  type:  
                "HELP EVRLK MODES_OF_OPERATION AUTOMATIC"

  MANUAL MODE
      This is the preferred method for using the program and  the
      most efficient.  If you can obtain a decimal address of the
      bad block(s) you want replaced, this  mode  will  take  the
      information  (LBN  address)  and without hesitation replace
      it.  Operating System error loggers should provide  the  LBN
      address  of  those  blocks  that  generate  ECC  and Header
      errors.  For additional information,  and  instructions  on
      invoking MANUAL mode, type:  
                "HELP EVRLK MODES_OF_OPERATION MANUAL"
 AUTOMATIC
  AUTOMATIC Replacement mode:

       This is the mode used when you do not know  what  the  bad
  block  LBN(s) are, or do not have the bad block LBN address(es)
  available in decimal.  Or, possibly you just want  the  program
  to  run unattended, and take action on any blocks it finds bad.
  Normally you should use Manual Mode and enter the  bad  blocks,
  given  that you know (from the error logger or other sources of
  information)  what  blocks  are  bad.   Large  pass  counts  in
  Automatic  Mode  are  recommended  (Run EVRLK/Pass:  "number of
  passes").

  Execution in this mode is entered by:

       1.  Ensure the Customer has  backed-up  and  verified  the
           data from the subject drive.

       2.  Specify AUTOMATIC replacements.

       3.  Answer "Enable Replacements" with a "Yes".

       4.  After the completion of the program, the customer  can
           restore and verify the backed-up data.

    +---------------------------------------------------------+
    | If the system crashes (CPU failure, power fail etc)     |
    | DURING THE EXECUTION OF THIS PROGRAM (Either Manual or  | 
    | Automatic Mode) IT IS REQUIRED that you run at least one| 
    | pass of this program after recovering from the crash    |
    | condition.  Abrupt termination of this program could    |
    | leave an incomplete replacement. One quick pass of this | 
    | program would allow for the completion of any incomplete| 
    | replacements.  Using the control "C" to terminate this  |
    | program is not considered an abrupt termination and     |
    | should not leave any incomplete replacements, although  | 
    | use of control "C" to terminate this program is not     |
    | recommended.                                            |
    +---------------------------------------------------------+

                               CAUTION
      It is imperative that every effort be used to establish 
      the fact that errors showing up in the error log are 
      indeed related to a bad block, before attempting to use 
      this program.  If ECC errors showing up in an error log 
      are the result of a hardware problem, this program (When 
      used in Automatic Mode) could replace a considerable 
      number of blocks that are indeed good.  If this happens, 
      and later you fix the problem causing the ECC errors, you 
      should reformat the HDA/Pack.  A description of this 
      situation is given in RA81-TT-12.  The information in 
      this Tech Tip applies to all RA series drives and would 
      be helpful in this situation.
 RUN_TIMES
  RUN TIMES IN AUTOMATIC MODE
                Sample Run times in Automatic Mode
                ----------------------------------
                      RA82 -- About 17 Minutes
                      RA81 -- About 9 Minutes
                      RA60 -- About 4.5 Minutes
                      RA80 -- About 3 Minutes


                               NOTE
          This time is extended, when replacements occur.
          If you have many bad blocks,  the program will 
          take considerably longer.
 MANUAL
  MANUAL Replacement mode:

       This is the preferred method for using the program and the
  most efficient.  If you can obtain a decimal address of the bad
  block(s) you want replaced, this mode will take the information
  (LBN  address)  and  without  hesitation replace it.  Operating
  System error loggers should provide the LBN address  of  those
  blocks  that generate ECC and Header errors.  Sometimes you may
  have to "convert" the LBN address from Hexadecimal or Octal, as
  provided  in the error log report, to decimal so it can be used
  by this program.  Be careful to provide the correct decimal
  address.  The  program  will  replace any LBN you specify, and
  if you specify the wrong one, a good block is replaced and you
  will have one additional replaced block on the media.

       If your operating system has Dynamic Bad Block Replacement
  (BBR) -- Like VMS -- you will want to be sure that the block in
  question has not already been replaced, before you  replace  it
  in  MANUAL  mode.   To  do this run this program in VERIFY mode
  (Type: "HELP EVRLK MODES_OF_OPERATION VERIFY") and  answer  yes
  to  the  question  "Display RCT Replacement Descriptors".  This
  will dump out the current set of replaced blocks  and  you  can
  look for the one in question.

       If you have no idea what the  LBN  addresses  of  the  bad
  block(s)  are, you can possibly run this program in Verify Mode
  and if Verify finds any bad blocks, it will report them to  you
  in  decimal.   Taking  these addresses and using them in Manual
  Mode, will result in the quickest method  for  replacing  known
  bad  blocks.   However, the customer's operating system has much
  more time to find bad blocks than does Verify Mode.  You should
  use the error log for bad block determination whenever possible.

       Execution in this mode is entered by:

       1.  Ensure the Customer has  backed-up  and  verified  the
           data from the subject drive.

       2.  Specify MANUAL replacements.

       3.  Answer "Enable Replacements" with a "Yes".

       4.  Provide the LBN address of the bad block  in  decimal,
           when  asked  for.   A Carriage Return at this question
           will exit MANUAL mode.

       5.  After the completion of the program, the customer  can
           restore the backed-up data and verify it.
 VERIFY
  VERIFY MODE

       This allows the user  to  "scan"  the  media  without  the
  program  taking  any action on what it finds.  In this way, you
  can get a "picture" of any problem found, without  the  program
  writing  in  any way on the media.  This is extremely valuable,
  since the drive can be write protected  and  thus  the  lengthy
  Backup and Restore operations recommended need not be done.  By
  getting a "look" at the condition of the media,  you  can  then
  decide whether to commit to either Automatic or Manual mode and
  do the backup and restore of the customer data.  This mode  can
  also  be  used to get the replacement descriptors displayed (By
  answering the question "Display Replacement Descriptors" with a
  yes)  without  the Backup and Restore operations.  This way you
  can get a "look" at the total number of replacements quickly.

                               NOTE
          The best way to  identify  bad  blocks  is  the
          Operating  System  Error  Logger.   Verify only
          runs as long as allowed to, whereas, the  error
          logger will log possible bad blocks as they
          occur.  It  is  possible  for  a  block  to  be
          "pattern   sensitive"   and  the  current
          pattern in a block will not exhibit  as  a  bad
          block while VERIFY is running.

  Execution in this mode is entered by:

  1.  Write protect the subject drive ---  after  the  drive  has
      become ready.  CAUTION:  spinning up an RA80 (only) with it
      write protected will result in a fault, because of a  write
      fail during the spin-up write tests.

  2.  Start the program and specify AUTOMATIC mode.

  3.  Answer the question "Enable replacements" with a "No".

       Verify Mode can also be entered by specifying Manual  Mode
  with replacements disabled.  In this way, you can "look" at the
  "status" of a particular LBN.

 PROMPTS 
  Operator Prompts:

       The program  will  require  the  user  to  answer  several
  questions,  depending  upon  how  the  program is run.  Several
  questions also apply only when MANUAL or only when AUTOMATIC
  mode  is  selected  (which  is  the first question asked).  You
  should be aware of how  answering  the  questions  affects  the
  operation  of  the program.  For information about each kind of
  question, see the list below.

 BACKUP
  Have you backed-up customer data on all drives? [(No), Yes] Yes<cr>

       Before execution of the utility, each drive must be backed
  up. The operator must answer this question with a "Y" to continue. 
  Default is "No".

 AUTO_MODE
  Automatic or manual replacement? [(AUTOMATIC), MANUAL] <cr>

       The default response selects Automatic  replacement mode; a
  "MANUAL" selects Manual replacement mode.  Default is "AUTOMATIC".

       If AUTOMATIC REPLACEMENT MODE is selected, the entire user
  LBN  area of each Unit will be processed, and this program will
  replace all blocks found to be bad.  The selected  drives  will
  be run "one at a time" in a serial fashion.

       If MANUAL REPLACEMENT MODE is selected, the operator  must
  specify the desired block(s) to be replaced.  In this mode, the
  LBN address supplied to the program must be in  decimal  (Not
  Octal,  Hexadecimal  etc).  Only a single Unit can be processed
  during each program execution, in Manual Mode.

       For information on VERIFY MODE and additional  information
  on this subject type:  "HELP EVRLK MODES_OF_OPERATION"

 ENTER_LBN
  Enter LBN to be replaced (decimal) or <cr> to exit [(-1), 0, nnnnnn.]

       This is asking for you to specify which LBN  you  wish  to
  have  replaced.   Be  sure  you are prepared to provide the LBN
  address in Decimal.  Different Operating System error  loggers
  display  the  LBN address in various radices (Like Hex, Octal,
  Decimal etc.).  Be sure you know what you are dealing with and
  make the proper conversions, if necessary.

       The LBN entered is checked against the maximum  number  of
  user  LBN's  available.  If the limit check fails, the following
  message appears:

       Out of range or overflow

       HI would be the maximum LBN address that can  be  replaced
  on  that  drive.   If  the  LBN  passes  the  limit  check, the
  replacement is made and the operation is displayed.
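
       If the error log reports the LBN in Hexadecimal or Octal,
  the conversion to decimal can be done before the program is run.
  The Python sketch below is an illustration only; the function
  name lbn_to_decimal and the optional max_lbn argument are
  hypothetical, and the range check merely imitates the limit
  check described above.

```python
def lbn_to_decimal(text, radix, max_lbn=None):
    """Convert an LBN string in the given radix (8, 10 or 16) to an int.

    max_lbn is an optional, caller-supplied upper limit (hypothetical);
    when given, values above it are rejected.
    """
    value = int(text.strip(), radix)
    if value < 0 or (max_lbn is not None and value > max_lbn):
        raise ValueError("Out of range or overflow")
    return value


# The sample LBN used throughout this help text, in three radices.
print(lbn_to_decimal("5AE7", 16))    # hexadecimal input
print(lbn_to_decimal("55347", 8))    # octal input
print(lbn_to_decimal("23271", 10))   # already decimal
```

       All three calls print 23271, which is the value to type at
  the "Enter LBN" question (the trailing period shown in the
  program's displays only marks the number as decimal).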

       Once  you  have  provided  the   LBN   address   for   the
  replacement,  and  the  replacement  has  occurred,  this  same
  question will  be  asked.   If  you  have  no  more  LBN's  for
  replacement, just hit Carriage Return to exit.

 DISPLAY_RCT
  Display RCT replacement descriptors? [(no), Yes] <cr>

       This  parameter  affects   program   execution   in   both
  replacement modes.  The Replacement Control Table (RCT) keeps a
  "log" of all LBN's that get replaced and the Replacement  Block
  Number  (RBN)  that  is  used.   In Automatic replacement mode,
  display selection will cause the  updated  Replacement  Control
  Table  (RCT) descriptors to be printed at the end of each pass,
  for each Unit processed.   In  Manual  replacement  mode,  this
  question  is asked twice:  When the Unit is brought Online, the
  operator may display the existing RCT replacement  descriptors.
  After  all  desired  LBN's have been replaced, the operator may
  display the updated RCT replacement descriptors.  Default is "No".

       If the display  of  the  RCT  replacement  descriptors  is
  selected,  the  following  description  of  the RCT contents is
  provided on the console terminal.   Remember  that  the  period
  used  after  a number indicates that the number is displayed in
  decimal.

       RCT Descriptor information for _DUan

         RBN:  455. is SECONDARY  replacement for LBN:  23271.
         RBN:  456. is PRIMARY    replacement for LBN:  23272.
         RBN:  457. is UNUSABLE


       After having described the "status"  of  all  replacements
  logged  in the Replacement Control Table (RCT), the following
  summary is provided:

             RCT Descriptor summary for _DUan
               Primary   replacement blocks: ddd.
               Secondary replacement blocks: ddd.
               Unusable  replacement blocks: ddd.
               dddd. RBNs used out of ddddd. RBNs on media.
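
       When a descriptor listing has been captured to a file, the
  summary counts can be cross-checked by tallying the descriptor
  lines.  The Python sketch below is illustrative only; the
  pattern is inferred from the three sample descriptor lines above
  and the name tally_descriptors is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample descriptor lines.
DESCRIPTOR = re.compile(
    r"RBN:\s*(\d+)\.\s+is\s+(PRIMARY|SECONDARY|UNUSABLE)"
    r"(?:\s+replacement for LBN:\s*(\d+)\.)?"
)


def tally_descriptors(lines):
    """Count PRIMARY, SECONDARY and UNUSABLE descriptors in a listing."""
    counts = Counter()
    for line in lines:
        m = DESCRIPTOR.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts
```

       A secondary count that is large compared with the primary
  count is the condition discussed in the paragraph that follows.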

       Displaying this information will provide an indication  of
  the  TOTAL number of replacements that are "logged" in the RCT.
  These replacements may have been made by the  Operating  System
  (If the O/S has the capability of doing Bad Block Replacement),
  the formatter or possibly this program.  Although there  is  no
  "specification"  for  how  many  replacements are too many, you
  should be aware of certain conditions  that  would  indicate  a
  problem.   An  RA81,  for example, can accommodate 17,472 total
  replacements.  A good working HDA can have a thousand  or  more
  primary   replacements,   however,   the  number  of  secondary
  replacements should be small.  If you displayed something  more
  than this, it could indicate that the drive has/had a data path
  problem.  ECC errors being generated by a Data Path problem can
  cause a significant number of good blocks to be replaced before
  the data path problem can be repaired.  If  you  see  what  may
  appear  to  be this condition, you should read RA81-TT-12.  The
  Tech Tip applies to all RA drives, even though it is written as
  an RA81 Tech Tip.  If this is the condition you sense, the tech
  tip will recommend that you reformat the HDA.   Now  that  this
  program  exists,  it  would  be  wise  to not only reformat, to
  recover from this condition, but to run as many passes of  this
  program  as possible, after having reformatted.  Any bad blocks
  that the formatter may not have found, may  be  found  by  this
  program and replaced.

 ENABLE_REPLACE
  Enable replacements? [(Yes), No] <cr>

       This is a  unique  feature  of  this  program.   Disabling
  replacements  (Answering with a NO) makes this program run in a
  similar fashion to the HSC50 "Verify" program.  In other words,
  this program will "Scan" the media and report anything it finds
  but not do any replacements.  Using it  in  this  mode  could  
  possibly  help  you  determine  the "condition" of the media,
  without having any replacements occur.  Default is "Yes".
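
       Verify Mode is not a separate answer to the mode question;
  as described under VERIFY, it is either replacement mode run
  with replacements disabled.  The Python sketch below only
  restates that relationship as a small table; the function name
  effective_mode and the wording of the returned descriptions are
  hypothetical.

```python
def effective_mode(replacement_mode, enable_replacements):
    """Map the two operator answers to the mode the program runs in.

    replacement_mode:    "AUTOMATIC" or "MANUAL" (answer to the mode question)
    enable_replacements: True for "Yes", False for "No"
    """
    mode = replacement_mode.strip().upper()
    if mode not in ("AUTOMATIC", "MANUAL"):
        raise ValueError("mode must be AUTOMATIC or MANUAL")
    if not enable_replacements:
        # No writes occur: the media is only scanned or inspected.
        if mode == "AUTOMATIC":
            return ("VERIFY", "scan of the media, nothing replaced")
        return ("VERIFY", "status of a particular LBN, nothing replaced")
    return (mode, "replacements enabled")
```

       In both VERIFY cases the program does not write to the
  media, which is why the drive can be write protected.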

  NOTE - This program is not intended for use as a diagnostic and 
  should not be run on drives/controllers that may have hardware 
  problems.

       Using this program in this mode (Answering  with  a  "No")
  will  disable  all  writes and therefore you can run it without
  the normal requirement for a back-up and restore.  However,  we
  can  not  be  held  responsible  for  any  errors  or subsystem
  problems that may create situations that cause this program (or
  subsystem) to do unexpected things.

  What you can do is run with replacements disabled just  to  see
  if  it's  worth your time to do the extensive Backup and Restore
  operations that are required  when  replacements  are  enabled.
  Remember,  you  cannot  run  this  program  with  replacements
  enabled and the drives write protected.  One pass may not  find
  all  the  "marginal"  bad spots.  It is highly recommended that
  when running in the "disable replacement" mode (Answer=NO)
  you allow the program to run several passes.

       This mode can also be used to get a quick  "look"  at  the
  number  and  types  of replaced blocks that are "logged" in the
  Replacement Control table (RCT).  By going into this mode,  and
  answering the question "Display Replacement Descriptors" with a
  yes, you can get these typed without the  Backup  and  Restore.
  When analyzing the descriptors that are typed, be sure you read
  and  understand  the  discussion  for  the  question   "Display
  Replacement Descriptors" (above).

 ENABLE_FE
  Enable write with Forced Error flag? [(Yes), No] <cr>

       When an uncorrectable ECC error  is  encountered,  several
  attempts are made to read the data correctly (Using every means
  available).  If these attempts fail, the block  generating  the
  Uncorrectable  ECC  error  is  assumed to be bad and in need of
  replacement.  Thus, if you have a system that does "Dynamic Bad
  Block  Replacement" (Like VMS, RSTS, RSX, IAS), or you are running
  this  program,  the   replacement   process   will   take   the
  uncorrectable  (Corrupt) data and move it to a good replacement
  block (RBN).  Now, we have a condition where  we  have  corrupt
  data  in  a  good  block  and if left this way the corrupt data
  would be read with no "indication" that the data is  "corrupt".
  Therefore,  in  order  to  tell a user that the data was at one
  time uncorrectable and exists now as "corrupted" data (In a
  good RBN), the "Forced Error Flag" is attached to the block
  of "corrupted" data.  When this block is read by a  user,  this
  flag  (Inverted  EDC character) is also read and is intended to
  inform the user that the requested data is "not reliable".  The
  default is "Yes".

       The FORCED ERROR flag is not an error.  When read, it will
  not make an entry in the system error log.  It is reported as a
  "status" code 10 (octal) in a transfer request "end packet" (This
  program calls the end packet status field the "Endmsg status", in
  several message types that can be displayed).

  Some Operating Systems have trouble dealing with, or reporting,
  the "Forced Error Flag".  Also, some  others  do  not  have  an
  intelligent  way  of  reporting  the flag to the user (Makes it
  appear as a "hardware error").  Currently (March 1985) UNIX type
  systems  (ULTRIX, Berkeley UNIX, ATT UNIX etc) can "give up" (I
  assume that means crash) in certain situations when  a  "Forced
  Error  Flag"  is  read.   This being what we are told, we added
  this question to allow the user to disable  the  function  that
  writes  the  forced error flag to an RBN with the uncorrectable
  data.

       If you answer the question "Enable write with Forced Error
  Flag"  with  a  NO, you will put uncorrectable data into a good
  RBN (Just normal replacement).  The "corrupted" data will  read
  good,  with  NO  indication  of an error or the "corrupted data
  flag" (Forced  error  Flag).   Therefore, the possibility exists
  of reading corrupted data with NO indication that it is corrupted.

  If you  follow  the  recommended  procedure  for  running  this
  program  and do the backup, and then restore the backed-up data
  after running this  program,  you  will  not  have  a  problem.
  Restoring the backed-up data, to a drive that had this question
  answered with a NO, should eliminate any problem condition  for
  these  systems.   For  those  systems  that  do not know how to
  handle the "Forced Error  Flag",  following  the  procedure  of
  answering  the  question  with  a no and then doing a backed-up
  data restore is a requirement WITHOUT EXCEPTION.