The general structure of the Slow Control (SC) system is shown in figure 1. Every subsystem should have its own SC system. A subsystem is the system responsible for a sub-detector or any other part of HERA-B (magnet, target, calorimeter, tracker, etc.; see "Proposal for HeraB Message Handler Interface", p.5). In the text below, "subsystem" and "local SC" are used interchangeably.
The concept of the system is uniform handling of all commands, events, errors, etc. for all nodes and subsystems. Each subsystem contains one or more independent processes. They may exchange messages (requests, responses, traps [events], data, etc.) with each other and with the main RUN-code. The message exchange in any node is controlled by the Local Message Dispatcher (LMD), which sorts, translates (if necessary) and forwards messages to other processes or nodes. Some messages inform a client of an event (trap); others request or send data. If a routine needs some information, it has to send a request to the LMD. All decisions are based on the data received via messages. The messages are sent as UDP datagrams (the port numbers are assigned dynamically, see also STD-02, RFC-1700).
On the third level of figure 1 and below, the entries need not be real sub-sub-systems; they may be sub-processes. Every subsystem has its own Configuration D-base (CDB) and Status D-base (SDB). The CDB defines the subsystem configuration for all types of runs (normal data-taking HERA-B run, calibration run and all test runs - V1...VN); it defines the "unchangeable" parameters of the subsystem (thresholds, HV-values, gas mixtures, magnetic field, target positioning, addresses,...). The CDB is used when the subsystem is set up and also contains a Name/Address server section. The structure of these D-bases is defined in the description of the Table Based Data Base (herab/DAQ/soft.html docu.dvi, Thomas Kihm, 1995). Some details will be fixed later.
The SDB contains the currently measured subsystem parameters (resources such as available disk space and the number of free buffers are in this parameter list) and, if necessary, some prehistory of their changes, as well as LogBook and Error Logger sections. The D-base is updated by means of messages under the control of the Local Message Dispatcher and the SDB-server. These sections differ in how often they change: the LogBook and Error Logger (?) store data “for ever”, whereas the Status data are updated periodically.
To change CDB data one must have administrative privileges. All changes must be recorded in the local and central LogBook D-base sections. In reality the CDB and SDB may be sections of one common D-base (Management Information Base - MIB, as in SNMP). The structures of the MAIN SC and of any subsystem (local SC) are similar (Fig. 1). The MAIN SC is the code in the main computer of the Slow Control system. Any subsystem may in turn include several subsystems (or processes), e.g. 1.1 - 1.J.
If the RUN-code, for example, needs some particular information, it sends a request to the Local Message Dispatcher (LMD, a resident routine). The LMD either returns the requested datum to the RUN-process immediately in a response-message, or in turn requests the data from another subsystem's Message Dispatcher (if the source of the data is unknown, the dispatcher may use a multicast request). If the request is resolved successfully, the address of the data source may be recorded in the address section of the D-base.
The local CDB should hold all the information needed to control the subsystem in all regimes. The LMD can request any data from the central LMD and from any other subsystem LMD, but it cannot change the data there. Every CDB and SDB has a primary and a secondary version (the latter must be installed on a different computer from the primary). Switching from the primary to the secondary D-base in case of failure must be invisible to the application but must be registered in the electronic LogBook. This increases the reliability of the whole system. The secondary D-base is a mirror of the primary one and must be updated after any change of the primary. The contents of the two D-bases must be compared periodically. The interaction between hardware parameter control, message dispatcher, SC and D-bases is shown in fig. 2.
Data are written from the Message Dispatcher to the SDB via the SDB-server by means of DATA-messages, whose body contains, besides the data themselves, the data type and address (see DATA in Table 6).
The RUN-code starts by sending a <START-message> to all subsystems (Fig. 3). The RUN-process changes its state and waits for READY-messages from all local SCs. The subsystems check their status (interlocks, status switches, power supplies, temperatures, etc.), and if something is wrong, an ALARM-message is sent to the RUN-process. Every subsystem must send a READY(1)- or NOT_READY-message in any case (this shows that it is alive). All messages from subsystems to the RUN-code and vice versa are transmitted via the Message Dispatcher. If no READY(1)-message arrives within a predetermined time (time-out), the RUN-process sends a TIME-OUT message to the Error Logger. The <START-message> is then repeated for the corresponding subsystem and the timer is restarted. After several (how many?) attempts the subsystem is declared <dead>. The ALARM is displayed at the main shift terminal (rpm_dump) and a STOP-RUN message is sent to the TRIGGER-system. Such messages are displayed by sending a corresponding message to the process controlling the SC-terminal. If necessary, some message types may be masked with a MASK_ALARM-message.
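The retry logic described above can be sketched as follows. This is only an illustration under stated assumptions: the number of attempts is left open in the text, so `MAX_ATTEMPTS = 3` is a placeholder, and `send_start`, `wait_reply` and `log_error` are hypothetical stand-ins for the real transport through the Message Dispatcher.

```python
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # assumption: the text leaves the number of attempts open


def start_subsystem(name: str,
                    send_start: Callable[[str], None],
                    wait_reply: Callable[[str, float], Optional[str]],
                    log_error: Callable[[str], None],
                    timeout_s: float = 1.0,
                    max_attempts: int = MAX_ATTEMPTS) -> str:
    """Return 'READY', 'NOT_READY' or 'DEAD' for one subsystem."""
    for _ in range(max_attempts):
        send_start(name)                     # <START-message>
        reply = wait_reply(name, timeout_s)  # None means the timer expired
        if reply in ("READY", "NOT_READY"):
            return reply                     # the subsystem is alive
        log_error(f"TIME-OUT {name}")        # TIME-OUT goes to the Error Logger
    return "DEAD"                            # no answer after all attempts


# Usage with stand-in transport functions (no network involved).
replies = {"ECAL": iter([None, "READY"]), "TARG": iter([None, None, None])}
errors: List[str] = []
states = {
    name: start_subsystem(name, lambda n: None,
                          lambda n, t: next(replies[n]), errors.append)
    for name in ("ECAL", "TARG")
}
print(states, errors)
```

In this sketch a subsystem that never answers ends up in the DEAD state, at which point the RUN-process would raise the ALARM and send STOP-RUN as described above.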
The <START-message> initiates the run preparations in the subsystems (controlling the gas system, starting the HV switch-on procedures if HV is not already ON, shifting the voltages on the Si-detectors, etc.).
If all subsystem parameters are O.K., the subsystem sends a <READY(2)-message>. If no such message arrives within the time-out, the RUN-process sends a corresponding message to this particular subsystem, which must respond with a NOT_READY-message explaining the reason (it sends a failure code). An alternative solution is for the RUN-process to send STILL_ALIVE-messages to the subsystem periodically and to receive ALIVE-messages until the subsystem sends a READY-message. Subsystems must not block running directly under any circumstances (only via messages). The message structure is defined in the file messdoc.dvi.
Until the subsystem provides a <READY-message>, any further actions are blocked. (In reality most of the subsystems are switched on in advance and as a rule are ready to run.) The RUN-process must wait, but the operator on shift may mask any ALARM-messages and start the RUN without some READY signals (this will be recorded in the LogBook D-base section).
The same happens at HERA-B run-time when an ALARM-message arrives. The RUN may be resumed only after the corresponding <READY-message> or after the ALARM has been masked by the operator on shift. From time to time the process code can send a STILL_ALIVE-message to any process or subsystem to test its availability. At the same moment a timer is started. The absence of a response within a predetermined time causes a TIME-OUT message to the process which sent the request. That process must also retransmit it to the higher-level process (or to the MAIN SC) and to the Error Logger.
What is required is not a coincidence of messages but of states, which remain unchanged from the arrival of the READY-message until the end of the RUN or the arrival of a NOT_READY-message or an ALARM.
The time-out values may be calculated from round trip time (RTT) measurements or fixed in the system configuration. The round trip time is the time that passes between sending a request message and receiving the response. Every time a process sends a request it starts a timer to measure the RTT. The RTT must be averaged for every type of request (RTTm). The resulting Tout,i must be stored in the corresponding SDB; it may serve as one more useful diagnostic parameter. Any sharp variation of Tout,i (more than twice the dispersion) will be reported. The value of Tout,i depends on time: at the very beginning ("cold state") it is higher than when the system is in the "warmed" state (voltages are ON, everything is already tested, etc.).
Tout,i = a * RTTm,i (a ~ 2; i is the code of the request message type).
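A minimal sketch of this calculation is given below. It assumes that the "dispersion" is taken as the standard deviation of the stored history; the function names are illustrative and not part of any HERA-B library.

```python
from statistics import mean, pstdev

A = 2.0  # safety factor "a" from the formula above (a ~ 2)


def timeout_from_rtt(rtt_samples, a=A):
    """Tout,i = a * RTTm,i for one request type i (RTTm,i is the mean RTT)."""
    return a * mean(rtt_samples)


def is_sharp_variation(history, new_value):
    """True if new_value differs from the mean of history by more than
    twice the dispersion (assumed here to be the standard deviation)."""
    return abs(new_value - mean(history)) > 2 * pstdev(history)


rtt_ms = [10.0, 20.0, 30.0]        # measured round trip times, milliseconds
print(timeout_from_rtt(rtt_ms))    # time-out for this request type
print(is_sharp_variation([1.0, 1.1, 0.9, 1.0], 2.0))
```

Keeping one history per request type i also makes it easy to store Tout,i in the SDB as a diagnostic parameter.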
In principle we may forget about individual RTT values and take one time-out value for all subsystems (the largest RTT). To check the availability of subsystems during this long wait, the RUN-process will send STILL_ALIVE-messages periodically. An impatient operator may send a special message (STATUS-request) to a "too slow" subsystem, requesting the list of non-ready processes. Such an option should be provided by the UI of the MAIN SC.
The User Interface (UI) shows on the control terminal a row of bars, one for each subsystem. The arrival of a <READY-message> changes the bar color from red to green (Fig. 4; for the list of subsystems see the file messdoc.dvi). When all bars are green, the row is removed from the screen; it may be restored in case of an ALARM. If a subsystem does not respond to a STILL_ALIVE-request, the corresponding bar starts blinking. The blinking stops when an answer arrives. STILL_ALIVE messages must also be sent during running; this helps to check the accessibility of all components (a dead subsystem will never send an alarm message).
We need a standard for color codes. The experience of the H1 experiment shows that subsystems use quite different color signaling. As not all subsystem experts are available on shift, this causes problems.
If all subsystems are ready, the RUN-process sends a <RUN-message> to the TRIGGER-system (RUN = READY1 * READY2 * ... * READYn).
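The RUN condition is a logical AND over the READY states of all subsystems. The sketch below also takes into account the masking by the operator mentioned earlier; the function name and the `masked` argument are illustrative assumptions.

```python
from typing import AbstractSet, Mapping


def run_allowed(ready: Mapping[str, bool],
                masked: AbstractSet[str] = frozenset()) -> bool:
    """RUN = READY1 * READY2 * ... * READYn, with masked subsystems skipped."""
    return all(state or name in masked for name, state in ready.items())


states = {"ECAL": True, "OTR": True, "RICH": False}
print(run_allowed(states))                   # RICH is not ready
print(run_allowed(states, masked={"RICH"}))  # operator has masked RICH
```

Note that the condition is evaluated on the stored states, not on the coincidence of messages, so it has to be re-evaluated whenever a NOT_READY or ALARM changes a state.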
The RUN-name (from the menu) determines the whole system configuration (see Table 5). The procedure starts with the SC-process sending a CONFIGURATION-message to the local Message Dispatcher. The dispatcher takes the configuration code from the body of the CONFIGURATION-message and forms a CONF-request to the SDB-server (see fig. 2). The latter reads the corresponding data from the CDB (Data(c)) and sends this information to the Message Dispatcher in the form of a C-DATA-message (Data(b)). The dispatcher forwards the message to the hardware parameter control and the data are put into the Buffer (Data(a)). When the procedure is finished, the LMD sends a READY message to the SC-process.
The CONFIGURATION-message may be sent even when the subsystem is in the READY-state, if the working regime needs to be changed. Before doing so, however, the process must inform the MAIN SC by sending an ALARM-message with a reason code in the message body. Several configurations may exist for different RUNs. Every configuration has its own code, which is to be put into the body of the CONFIGURATION-message. The configuration process has several stages:
For example, the HERA-B ECAL has the following sequence of startup:
After some delay (PM warming) the system can send a READY-message. For other subsystems the set-up procedures are different, but all of them may use the same interaction with the CDB.
The correctness of the controlled parameters is checked locally in the subsystems (SC). ALARM-messages are sent by the subsystem Message Dispatcher only in case of a real emergency. The subsystem Message Dispatcher must respond to Request-messages from the MAIN-SC Message Dispatcher and send the requested information (e.g. at the request of the person on shift). There is one central SC system, which will control some general parameters (atmospheric pressure, humidity, some low voltages for power supplies etc.). There are three levels of ALARM:
1. No ALARM at all (displayed data is in green color).
2. The parameter value is in dangerous range (displayed message is red).
3. The parameter value is absolutely intolerable (blinking red message) or some sub-sub-system is dead.
Figure 5 shows the meaning of the MAX/MIN "dangerous" and "intolerable" limits. A dangerous parameter value is not good and the operator is warned about it, but running may continue.
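The three levels can be derived from four thresholds per parameter. The following sketch assumes that a value exactly on a limit is still acceptable; the class and field names are illustrative, and the numeric limits in the usage lines are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """MIN/MAX thresholds of one parameter (names are illustrative)."""
    min_intolerable: float
    min_dangerous: float
    max_dangerous: float
    max_intolerable: float


def alarm_level(value: float, lim: Limits) -> int:
    """Return 1 (no alarm), 2 (dangerous range) or 3 (intolerable)."""
    if value < lim.min_intolerable or value > lim.max_intolerable:
        return 3
    if value < lim.min_dangerous or value > lim.max_dangerous:
        return 2
    return 1


hv = Limits(1400.0, 1450.0, 1550.0, 1600.0)  # example values only, in volts
for v in (1500.0, 1570.0, 1700.0):
    print(v, alarm_level(v, hv))
```

Level 3 would also be assigned when a sub-sub-system is dead, which is decided by the STILL_ALIVE test rather than by a threshold.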
All ALARM-messages are displayed locally and retransmitted to the MAIN-SC and to the Error Logger. All ALARM-messages must be acknowledged by the Message Dispatcher, as losing such a message is dangerous. The sender starts a timer immediately after sending an alarm and retransmits the ALARM in case of a time-out. The ALARM generation is stopped by an acknowledge-message.
Besides the current parameter value, the ALARM-message must contain the corresponding alarm-level thresholds (see Fig. 5). An ALARM-message may also be sent by a subsystem when it needs a calibration run. Such a message forces the MAIN SC to stop the main trigger system and to change the run type by sending a CONFIGURE-message. After the subsystem has sent the READY-message, the calibration run may be started. When the calibration is over, one more ALARM is sent, the primary configuration is restored by sending a CONFIGURE-message, and after the subsystem's READY the data-taking RUN is resumed.
Let us consider again, as an example, the HERA-B ECAL, which has current control for PM-base groups. In case of overcurrent the group is switched off. The PM group is then without HV, and this cannot be cured immediately. All these events are written into the Error Logger. What shall we do? Look at the detector scheme, where the dead detector section is shown, and make the hard decision whether to resume running (any such decision is the responsibility of the operating personnel, not of a computer code). This decision will be recorded automatically in the LogBook data-base.
There are two types of parameters: single and table. A single parameter is characterized by:
Table 1. List of subsystem parameters for CDB
|Parameter name||String with length byte first|
|Parameter type||(see docu.dvi p.2) (case 2 of more dimension tables ???)|
|Measurement period in seconds||Usually 10 - 3600 sec.|
The rows 1-3 comprise a record header. In a message one should present a real value and values from rows 4 - 7. Values in the cells marked with a * are parameter dependent.
The header for table-type parameters is shown in table 1a.
Table 1a. Table-type parameter header in CDB-record
|Measurement period in seconds|
|Number of records|
A list of records then follows, e.g. as rows 4-7 in table 1. For the SDB there are three types of parameters: single (magnetic field, atmospheric pressure etc.), table-parameters (HV, LV, and so on) and prehistory table-parameters (gas amplification, calibration drifts etc.).
For a single parameter the record has the format: parameter name [A(32)]; parameter type [B]; parameter value. For a table-type parameter the record format is:
Table name A(32)
Table type UB
Measurement period in seconds UI
Parameter type UB
Number of records US
The parameter length in bytes is fixed by parameter type code
For prehistory table-parameters the record format is:
Table name A(32)
Table type UB
Measurement period in seconds UI
Parameter type UB
Depth to the past (M) US
Number of records (N) US
Pointer to the next record US
V1,1, V1,2, ........, V1,M; V2,1, V2,2, ....., V2,M; ....., VN,1, VN,2, ....., VN,M.
Here Vi,j is the value of parameter i at measurement j. For the index j (prehistory index) the buffer is cyclic. The pointer is an index for j; it shows where the result of the next measurement should be written. The type (length in bytes) of Vi,j is defined by the parameter type code.
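The cyclic prehistory buffer can be illustrated with a short sketch. It keeps N rows of depth M and a single pointer for the index j, as in the record format above; the class and method names are assumptions made for the example, and the values are held as Python floats rather than in the packed form fixed by the parameter type code.

```python
class PrehistoryTable:
    """N parameters, each with a cyclic history of depth M."""

    def __init__(self, n_params: int, depth: int):
        self.depth = depth                                      # M
        self.values = [[0.0] * depth for _ in range(n_params)]  # V[i][j]
        self.pointer = 0       # index j where the next measurement is written

    def store(self, measurement):
        """Write one measurement (one value per parameter) at the pointer."""
        if len(measurement) != len(self.values):
            raise ValueError("one value per parameter is expected")
        for i, value in enumerate(measurement):
            self.values[i][self.pointer] = value
        self.pointer = (self.pointer + 1) % self.depth  # wrap around

    def flat(self):
        """Values in record order: V1,1 ... V1,M; V2,1 ... V2,M; ..."""
        return [v for row in self.values for v in row]


table = PrehistoryTable(n_params=2, depth=3)
for m in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]):
    table.store(m)
print(table.pointer, table.flat())
```

After the fourth measurement the oldest entry of each row has been overwritten and the pointer has advanced to the second slot.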
The list of ALARMs which interrupt the RUN automatically will be fixed later. What should the operator on shift do if the run was interrupted? It depends on the failure type. The computer can provide some hints, but the final decision is the operator's responsibility.
Any change of any parameter (even a temporary one) will be recorded automatically by the LogBook routine (electronic LogBook D-base). The logbook structure requirements correspond to the Table Based Data Base (herab/DAQ/soft.html docu.dvi).
Table 2. Structure of the LogBook record (the record length is constant)
|Date of changing|
|Time of changing|
|Date of run start|
|Time of run start|
|Type of run (see table 5)|
|BX-number (to fix correspondence for the future off-line data development)|
|Name of person on shift, who made the change of parameter|
|Value type (according to the “MIZZI Computer Software”, p.2)|
|Old parameter value|
|New parameter value|
|(Some run parameters and what ever else?)|
A double type for the “old” and “new” parameter values will suit any parameter type and make all LogBook records equal in length.
|Error type code|
|Error source name|
The lengths of fields 8-10 are defined by the parameter type field. The record format varies according to the error type code. Special precautions should be taken against mass error generation, which may overflow the D-base.
|ALARM||Level, Source name/Address|
|Sub-system is DEAD (no ALIVE-response)||Subsystem name|
|Packet delivery error||Source address, message type|
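One possible precaution against mass error generation is to keep a counter per error type and source and to store only the first few full records. This is a sketch of that idea only, not the method chosen for the Error Logger; the limit of 10 records and all names are assumptions.

```python
from collections import Counter


class ErrorLog:
    """Stores at most `max_records` full records per (code, source) pair
    and counts the rest, so that a burst of errors cannot fill the D-base."""

    def __init__(self, max_records: int = 10):  # limit is an assumption
        self.max_records = max_records
        self.counts = Counter()   # total number of errors per key
        self.records = []         # full records actually stored

    def add(self, error_code: str, source: str, detail: str = "") -> bool:
        """Register an error; return True if a full record was stored."""
        key = (error_code, source)
        self.counts[key] += 1
        if self.counts[key] <= self.max_records:
            self.records.append((error_code, source, detail))
            return True
        return False


log = ErrorLog(max_records=2)
for _ in range(5):
    log.add("Packet delivery error", "0x0700", "START_RUN")
print(len(log.records), log.counts[("Packet delivery error", "0x0700")])
```

The total count is still available for every error type, which matches the requirement that the logger keeps the type and total number of errors.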
Message dispatcher functions
The Message Dispatcher accepts messages from local codes (processes) and from the network. The network messages are tested for correctness and, in case of errors, the corresponding record (sender address and message type) is sent to the Error Logger. A special routine analyses these data periodically to find “noisy” addresses. The LMD then analyses the type of the message, checks the access privilege if necessary (for this purpose one may use a special field, like the "community" field in SNMP packets), and determines the address of the receiver. At this point the LMD can refer to the local address D-base. The name/address D-base (Name-server) contains the object name, the physical address of its interfaces, its IP-address, the processor number (for a process) and, for hardware objects, the "geographical" address (rack, crate, module etc.). Depending on the context, an object may be a process, an electronic module or a control processor. The Name-server can convert a name to an address and vice versa. "Local" in this context means that the object is in the same subsystem as the control code.
Table 5. RUN type is determined by its name (from menu ?)
Table 6. The list of SC-messages (CLASS=HBCOM_CLASS)
|START_RUN||Initializes test and preparation procedures in subsystem (HV, thresholds, interlocks, etc.), from RUN-process to subsystem; rpm_send.|
|START_SC||Starts subsystem clients (from MAIN SC; rpm_send). This message may be used before START_RUN, must be answered with READY (or NOT_READY).|
|READY||Response to START_RUN, START_SC and CONFIGURE. Message from subsystem to RUN-process (rpm_reply)|
|RUN||Starts a run, sent from RUN-process to TRIGGER-system; rpm_send|
|STOP_RUN||Stops run. Message from RUN-process to TRIGGER-system.|
|NOT_READY||From subsystem to RUN-process (carries the reason code).|
|CONFIGURE||Starts a configuration of a subsystem, sent by MAIN or local SC; (rpm_send). Subsystems must respond with READY or NOT_READY.|
|ALARM||Subsystem sends to MAIN or local SC (name of parameter, its value, MIN/MAX-limits etc.); rpm_send.|
|GET-parameter||Parameter value request (to CDB or SDB; rpm_stat)|
|PARAMETER||An answer for GET-parameter request, returns the requested parameter values.|
|SET-parameter||Request to change the parameter value (to CDB, needs access control and starts LogBook updating; rpm_ctl), must be answered with READY.|
|TIME-OUT||Message from a control process (analog of the NOT_READY message, sent by sub-process to the MAIN SC and Error logger)|
|STATUS-request||Request from MAIN SC to subsystem to send a standard report taken from the data of SDB; rpm_request. A request code and parameters are written in the message body. Possible parameters: subsystem name, parameter name etc.|
|STATUS-report||An answer to a STATUS-request, is sent by D-base manager (SDB-server).|
|REPORT-request||Request for some particular information from CDB, SDB, LogBook or Error-Logger to the SDB-server; rpm_flood.|
|REPORT-reply||An answer to a REPORT-request, sent by D-base manager (SDB-server).|
|MASK_ALARM||Masking some type of ALARM-messages for debugging purposes (needs access control and must be fixed in LogBook), sent by RUN-process to LMD; (rpm_send). May be used by the operator to continue RUN with dead sections of the detector.|
|STILL_ALIVE||Request to test availability of process, subsystem etc.; must be acknowledged with ALIVE-message.|
|ALIVE||Reply to the request STILL_ALIVE|
|DATA||Carries data from Message dispatcher to SDB-server. The data type and address are in the body of the message. Used for writing the data into the D-bases.|
|CONF||Request sent by Message dispatcher to the SDB-server. It must be answered with C_DATA-message.|
|C_DATA||An answer of the SDB-server to the CONF-request of LMD. Returns requested data from CDB.|
The column code contains message-IDs. Many messages have a very simple structure: they consist of a message header only, with no body, e.g. READY, RUN, STOP_RUN, TIME-OUT, ALIVE. A READY must carry a message identifier which helps to match requests and replies (both must have the same identifier). The identifiers are generated arbitrarily, but they must not repeat (e.g. the next may be equal to the previous + 1). This is useful when a process has sent e.g. START_RUN, START_SC and CONFIGURATION messages in sequence and needs to tell the READY-responses apart (e.g. READY(1) and READY(2) in fig. 3).
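Matching replies to requests by identifier can be sketched as follows. The counter-based generator follows the "previous + 1" suggestion above; the class and method names are illustrative assumptions.

```python
import itertools
from typing import Dict, Optional


class RequestTracker:
    """Remembers which request each identifier belongs to."""

    def __init__(self):
        self._next_id = itertools.count(1)   # next = previous + 1
        self._pending: Dict[int, str] = {}

    def sent(self, message_type: str) -> int:
        """Register an outgoing request and return its identifier."""
        ident = next(self._next_id)
        self._pending[ident] = message_type
        return ident

    def reply(self, ident: int) -> Optional[str]:
        """Return the request type answered by a READY with this identifier,
        or None if the identifier is unknown or already answered."""
        return self._pending.pop(ident, None)


tracker = RequestTracker()
ids = [tracker.sent(m) for m in ("START_RUN", "START_SC", "CONFIGURE")]
print(tracker.reply(ids[1]))   # the READY that answers START_SC
```

A reply with an unknown identifier returns `None`, which a real dispatcher would probably report to the Error Logger.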
There is a difference between NOT_READY and ALARM-messages. NOT_READY comes as an answer to START_RUN, START_SC, etc., whereas ALARM appears as a result of local scanning tests. UDP cannot guarantee data delivery, so a SET-parameter procedure must be checked with a GET-parameter request.
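The read-back check can be sketched as a small loop. `send_set` and `send_get` are hypothetical stand-ins for the SET-parameter and GET-parameter exchanges, the number of attempts is an assumption, and `LossyStore` only simulates a lost datagram for the example.

```python
from typing import Any, Callable


def set_and_verify(send_set: Callable[[str, Any], None],
                   send_get: Callable[[str], Any],
                   name: str, value: Any, max_attempts: int = 3) -> bool:
    """Send SET-parameter and confirm it with GET-parameter."""
    for _ in range(max_attempts):
        send_set(name, value)
        if send_get(name) == value:   # read back and compare
            return True
    return False


class LossyStore:
    """Stand-in for a CDB reached over UDP: drops the first SET it is sent."""

    def __init__(self):
        self.data = {}
        self.dropped = False

    def set(self, name, value):
        if not self.dropped:
            self.dropped = True       # simulate a lost datagram
            return
        self.data[name] = value

    def get(self, name):
        return self.data.get(name)


store = LossyStore()
print(set_and_verify(store.set, store.get, "HV_CH12", 1500.0))
```

For floating-point parameters a real implementation would probably compare within a tolerance rather than for exact equality.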
There are two types of possible failures:
In both cases, if the personnel on duty are not sure of something, they may start the corresponding test routine. The main results of the failure, any configuration changes and the time at which the test routine was run will be recorded in the standard LogBook record.
Table 7. The structure of the Name/Address D-base record
|Name of an object|
|Type of the object|
|Physical address of interface|
* marks the cells for which a corresponding record field is available
Some messages carry only information on some event, the others are commands to fulfill some task or operation.
|Information type name||Software
|Gas flow rate|
The format of STATUS-request message body looks like:
Subsystem_number, information_type_name. For example, 7,3 in the message body means a request to send a temperature report for the electromagnetic calorimeter.
The most significant byte is stored at the smallest address and the least significant byte at the largest address (big-endian order).
Messages to the Message Dispatcher (MD) may come from any process in the HERA-B installation. The MD accepts any message addressed to it and any broadcast (or multicast) message. The actual dispatcher response depends on the message code. The Message Dispatcher can also send messages itself. This may happen for an internal reason, because of a time-out, or as a consequence of an external message. The MD can address any process in HERA-B or any hardware object (e.g., START_RUNNING). It can send a multicast message (e.g., RESET, START etc.). Any interaction within the HERA-B Slow Control system is possible only via the MD.
Access to all D-base sections should follow the route <message to the DB-Server interface> <--> <DB-Server interface> <--> <data bank>. Information in the opposite direction is sent along the same route. The way of responding to external events is defined by the data structure stored in the dispatcher routine.
The purpose of this routine is to translate an object name (process, hardware system etc.) into a message containing the requested address information. The volume and structure of the data depend on the request parameter set. The data bank is stored on disk, but the data fetched during the last N requests are kept in a cache.
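A cache of the last N requests could look like the sketch below. It uses a least-recently-used policy, which is one reasonable reading of "the last N requests" and not necessarily the intended one; `read_from_disk` is a hypothetical function that fetches a record from the data bank.

```python
from collections import OrderedDict
from typing import Any, Callable


class CachedNameLookup:
    """Name lookup with a cache of the N most recently requested records."""

    def __init__(self, read_from_disk: Callable[[str], Any], n: int = 64):
        self._read = read_from_disk
        self._n = n
        self._cache: "OrderedDict[str, Any]" = OrderedDict()

    def lookup(self, name: str) -> Any:
        if name in self._cache:
            self._cache.move_to_end(name)    # mark as recently used
            return self._cache[name]
        record = self._read(name)            # slow path: the data bank on disk
        self._cache[name] = record
        if len(self._cache) > self._n:
            self._cache.popitem(last=False)  # drop the oldest entry
        return record


disk_reads = []


def fake_disk(name):
    """Stand-in for the data bank; records every disk access."""
    disk_reads.append(name)
    return {"name": name, "ip": "0.0.0.0"}   # placeholder record


ns = CachedNameLookup(fake_disk, n=2)
for obj in ("a", "b", "a", "c", "b"):
    ns.lookup(obj)
print(disk_reads)
```

Records that change in the data bank would have to be invalidated in the cache as well; that step is omitted here.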
The routine records any errors in the system (their type and total number; for some predetermined types of errors the time of occurrence must also be recorded). An error record is put into the D-base when an error message is received. The data bank is stored on disk with a backup copy on another physical disk. The backup should be made regularly (by timer) if there are any changes. The routine should provide any data from the ErrorLogger D-base (for the last hour, day, month, from-to, for an error code list, etc.).
The routine records any changes introduced by an authorized operator into the HERA-B Configuration Data Base. The data bank is stored on disk with a backup copy on another physical disk. It fetches data on request (according to a parameter list). It should provide convenient access for the off-line data development routines (this should be fixed later).
The code receives the messages addressed to it and carries out the requested procedures (DB update, reporting, sending answer-messages) on one of the HERA-B data bases (CDB, SDB, NDB (Name-server), LogBook, ErrorLogger). The type of procedure is defined by the request code and its parameters.
Routines using dynamic resources (e.g., memory buffers or disk space) must be able to respond to the corresponding request and send a message containing status information. In case of an overflow they have to send an ALARM-message to the Message Dispatcher.
List of Hardware SC-systems
The list of subsystems with their ID (component.h + messdoc.dvi
+ "Protocols & Data Formats in HERA-B")
|Subsystem number||Subsystem name||Sub-detector||ID code range|
|DAQ||Data Acquisition||0x0000 - 0x00FF|
|SVD||Silicon strips||0x0100 - 0x01FF|
|ITR||Inner tracker||0x0200 - 0x02FF|
|HIPT||High-Pt chambers||0x0300 - 0x03FF|
|OTR||Outer tracker||0x0400 - 0x04FF|
|RICH||RICH||0x0500 - 0x05FF|
|TRD||TRD||0x0600 - 0x06FF|
|ECAL||EM-calorimeter||0x0700 - 0x07FF|
|MUON||Muon tubes||0x0800 - 0x08FF|
|MPIX||Muon pixels||0x0900 - 0x09FF|
|MPAD||Muon pads||0x0A00 - 0x0AFF|
|FLT||1st level trigger||0x0B00 - 0x0BFF|
|SLT||2nd level trigger||0x0C00 - 0x0CFF|
|FARM||3rd level trigger (Farm)||0x0D00 - 0x0DFF|
|TARG||Target||0x0E00 - 0x0EFF|
|FAST_CONTROL||0x0F00 - 0x0FFF|
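Since every subsystem owns a block of 256 ID codes, the subsystem can be found from the high-order byte of an ID code. The sketch below copies the names from the table above; the function name is an assumption.

```python
# Subsystem names in the order of their ID code ranges (0x0000, 0x0100, ...).
SUBSYSTEMS = [
    "DAQ", "SVD", "ITR", "HIPT", "OTR", "RICH", "TRD", "ECAL",
    "MUON", "MPIX", "MPAD", "FLT", "SLT", "FARM", "TARG", "FAST_CONTROL",
]


def subsystem_of(id_code: int) -> str:
    """Return the subsystem that owns the given ID code."""
    if not 0x0000 <= id_code <= 0x0FFF:
        raise ValueError(f"ID code 0x{id_code:04X} is outside the table")
    return SUBSYSTEMS[id_code >> 8]   # each range spans 0x100 codes


print(subsystem_of(0x0712))
```

With this numbering the ECAL range starts at 0x0700, which agrees with the subsystem number 7 used for the electromagnetic calorimeter in the STATUS-request example.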
The message header contains the fields: message length (excluding header), message type, destination address, source address, identifier and block identifier. The message type field has a message class sub-field (high-order two bytes) and an ID sub-field (for SC, CLASS = HBCOM_CLASS). The message IDs can be found in Table 6. The class/ID combination must be unique throughout HERA-B. The XDR protocol is used for data representation.
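The header layout can be illustrated with Python's `struct` module. The sketch assumes that all six header fields are 32-bit unsigned integers in big-endian order (as XDR uses); only the split of the message type into class and ID is taken from the text. The value of `HBCOM_CLASS` and the message ID used below are placeholders, and the body is appended as raw bytes without XDR padding.

```python
import struct

HEADER = struct.Struct(">6I")  # assumption: six 32-bit big-endian fields
HBCOM_CLASS = 0x0001           # placeholder, not the real class code


def make_type(msg_class: int, msg_id: int) -> int:
    """Class in the high-order two bytes, ID in the low-order two bytes."""
    return ((msg_class & 0xFFFF) << 16) | (msg_id & 0xFFFF)


def split_type(msg_type: int):
    """Inverse of make_type: return (class, ID)."""
    return msg_type >> 16, msg_type & 0xFFFF


def pack_message(msg_type, dest, src, ident, block_id=0, body=b""):
    """Header (length excludes the header itself) followed by the body."""
    return HEADER.pack(len(body), msg_type, dest, src, ident, block_id) + body


def unpack_message(data: bytes):
    length, msg_type, dest, src, ident, block_id = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + length]
    return msg_type, dest, src, ident, block_id, body


# ID 5 is an arbitrary example; the addresses reuse ID codes for illustration.
raw = pack_message(make_type(HBCOM_CLASS, 5), 0x0700, 0x0000, 42, body=b"ab")
print(raw.hex())
print(split_type(unpack_message(raw)[0]))
```

Using one `struct.Struct` object for both packing and unpacking keeps the sender and the receiver consistent if the field widths turn out to be different from those assumed here.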
|Class name||Class code|
The data are taken from /hb/daq/Online_Library/pro/include.h. USER_CLASS serves for developing interfaces that haven't been assigned a value yet.
Address of ITEP Slow Control server
ARP Address Resolution Protocol (STD-037, RFC-826)
BX Bunch crossing
CDB Configuration Data Base
CHAOT CHamber Analysis and On-line Tool
CICERO Control Information System Concept based on Encapsulated Real-time Object (RD-38, CERN)
CORBA Common Object Request Broker Architecture
CRC Cyclic Redundancy Check
DAQ Data Acquisition
DSP Digital Signal Processor
DTM Data Transfer Message
ECM Event Control Message
EDM Event Data Message system
EVB EVent Builder
FCS Fast Control System
FED Front End Driver
FLT First Level Trigger
GDS Gas Distribution System
GRR General eRRor
HLT High Level Trigger
HVD High Voltage Distribution control
IDL Interface Definition Language
IP Internet Protocol (STD-05, RFC-791, -950, -919, -922, -1112)
LMD Local Message Dispatcher
MIB Management Information Base (STD-16,17, RFC-1155, -1212, -1213)
MDC Muon Detector Control system
MFM Means-End Model
MPI Message Passing Interface
OMA Object Management Architecture
PLC Programmable Logic Controller
PRC Process Control
PVM Parallel Virtual Machine
RPB Remote Procedure Buffer
RPI Remote Procedure Interface
RPM Remote Procedure Message
RTT Round Trip Time
SC Slow Control
SCM System Control Message
SDB Sorted Data Base; Status Data Base
SLB Second Level Buffer
SLT Second Level Trigger
SNMP Simple Network Management Protocol (STD-015, RFC-1157)
STCP Simple TCP
TCL Tool Command Language
TCP Transmission Control Protocol (STD-07, RFC-793)
TLT Third Level Trigger
TRD Transition Radiation Detector
UDP User Datagram Protocol (STD-06, RFC-768)
UI User Interface
XDR eXternal Data Representation (RFC-1832)
Zeus Message Passing