Introduction
Networked VR applications require different types of logging, such as:
- Debug Logs
- Experiment Logs
- Network Traces
- Refers to logging expected and exceptional events that occur during a regular session. The purpose is post-hoc debugging of high-level application code.
- Refers to logging application-specific data, such as measurements or questionnaire responses for an experiment.
- Refers to captures of network traffic to investigate reproducible low-level netcode bugs.
(1) & (2) are handled by Ubiq's Event Logging system. (3) has distinct performance implications, so is handled seperately.
Use Case
The Event Logging System is for collecting low or medium frequency events from multiple peers. The Event Logging system can log both Ubiq and third-party events, which can then be extracted and analysed.
Events are discrete, but otherwise have very few restrictions. It is up to the user to ensure that event logging in their application doesn't negatively affect performance.
Overview
Events are generated by LogEmitter
instances placed throughout the application. Events generated by these components are passed to a local LogCollector
instance. The LogCollector
then writes them to disk (or database, or other endpoint), or forwards them to another LogCollector
that will.
LogEmitter
LogEmitter
instances are lightweight objects that the application uses to log events.
Calls to a LogEmitter
are expected to be placed throughout the system persistently, rather than gated with pre-processor defines.
The most common types of event logger are the ContextEventLogger
, which is designed to work with Components that have a NetworkContext
, and the ExperimentEventLogger
, designed for logging measurements in experiments.
public class VoipPeerConnectionManager : MonoBehaviour
{
private ContextLogEmitter debug;
private void Start()
{
context = NetworkScene.Register(this);
debug = new ContextLogEmitter(context);
}
public void ProcessMessage(ReferenceCountedSceneGraphMessage message)
{
var msg = JsonUtility.FromJson<Message>(message.ToString());
switch (msg.type)
{
case "RequestPeerConnection":
debug.Log("CreatePeerConnectionForRequest", msg.objectid);
break;
}
}
}
The snippet above demonstrates the creation and use of a ContextEventLogger
. The VoipPeerConnectionManager
declares the ContextLogEmitter
debug
and initialises it with a ContextEventLogger
once a context has been created. The Log
method can then be called to log the receipt of a specific message.
Events are recorded as Json objects, for example:
{
"ticks":637794978937526996,
"peer":"aa9a2c8c-0c6f11c7",
"type":"Ubiq.Voip.VoipPeerConnectionManager",
"objectid":"aa9a2c8c-0c6f11c7",
"componentid":50,
"event":"CreatePeerConnectionForRequest",
"arg1":"9020d814-45c41c19"
}
LogEmitter
instances attach to a single LogCollector
. The emitter constructors find the closest LogCollector
automatically.
LogEmitter
methods can be safely called from outside the Unity main thread. They should not be called from outside CLR threads however.
LogEmitter
instances are designed to have zero overhead when logs are not actually written. The Log
method has many overloads to avoid boxing, and serialisation only runs when logging is on. Logs are only written when there is a listening LogCollector
in the scene.
If the LogCollector
is disabled, non-existent, or the Collector's EventFilter ignores the emitter's event type, the emitter will not do anything.
It is encouraged to make as many LogEmitter
instances as needed. Individual emitters are simple, with few options. Use multiple LogEmitter
instances within a class to get fine-grained control over logging, for example different log levels.
Events and Filters
A LogEmitter
can tag events with a flag. A number of flags are pre-defined in Ubiq.Logging.EventType
enum. The underlying type is an sbyte
, so additional values can be used in an unchecked context.
[Flags]
enum EventType
{
Application = 1,
Experiment = 2,
Debug = 4,
Info = 8
}
A LogCollector
can be set to ignore one or more of these types with its EventsFilter
member. If a flag is set the event is logged. If it is unset the event is not logged. Filtering applies to local events. LogCollectors
will forward and write all external events regardless of the filter. Filtered events are not cached, but lost permanently.
LogCollector
writes events of different types to different streams/files. Prefixes are defined for the entries in Ubiq.Logging.EventType
. Other tags will display as their numerical equivalent.
The LogCollector
EventsFilter
member and the Tag
member of any LogEmitter
can be changed at any time.
ComponentEventLogger
and ContextEventLogger
have their type set to Debug
by default. Convenience classes are defined for ExperimentLogEmitter
and InfoLogEmitter
.
LogCollector
LogCollector
instances receive events from local LogEmitter
instances and remote LogCollector
instances. They do one of three things with these events:
- Cache them (Buffering Mode)
- Write them to a local file (Writing Mode)
- Forward them to another LogCollector (Forwarding Mode)
The behaviour depends on the value of the destination
member, which determines where the events should go.
LogCollector
instances can be organised this way into complex tree arrangements.
However, the expected use case is that Peers in a Group forward events directly to one Collector, that also writes them to disk.
LogCollector
writes events to files in the Persistent Data folder of whatever platform it is running on. It does this through a set of Ubiq.Logging.LogCollector.IOutputStream
instances (one for each event type, created on demand).
The filenames include the event type and a timestamp.
Buffering
When a LogCollector
is not Forwarding or Writing, it will cache or Buffer all events received (whether local or remote). If a LogCollector
was previously Writing or Forwarding and stops, it will resume Buffering. When a LogCollector
begins Writing or Forwarding, it will start with all the events in its buffer.
Events only leave the buffer when being sent to a safe destination (another LogCollector
, or a file). In this way logging is lossless, regardless of the initialisation order and even when changing the Active LogCollector
.
However, events that leave the buffer are gone permanently. If a LogCollector
writes events, then another LogCollector
becomes active at a later time, the second will only output events received after it became active. The file from the first LogCollector
must be retrieved out-of-band to have the complete record of events.
This can be avoided by ensuring only one LogCollector
becomes active during the lifetime of the Peer Group.
Encodings
Events are written as Json (Utf8 strings).
LogCollector
attempts to make the output files Json compliant, by placing events in a top-level Json Array, with comma seperated entries. If the file is being read live however, or the process terminates unexpectedly, the file will not have the closing bracket ( ] ).
In this case the application reading the files should either add the bracket to the end of the stream or perform its own tokenisation.
Changing the LogCollector
Only one LogCollector
in a Peer Group may be in Writing Mode at any time.
The Active Collector can be changed by calling StartCollection
on the new Collector. This can be done at any time.
When the Active Collector changes, the Peers must agree on the new Collector. This is a Distributed Agreement problem. LogCollector
solves this using the Global Snapshot method [1].
The new Collector broadcasts a message with a logical clock value and atomically updates its cut state. When Peers receive the message, they compare the clock against their own. If it is greater, they update their cut state, otherwise the message is ignored. If a Collector is attempting to become Active, and receives a message with the same clock value as its own, a collision has occured. In this case, the Collector re-initialises its clock to a random value, and re-transmits its state. If a message is received with a greater clock value, it updates its own State as if it were any other Peer.
This algorithm assumes that Peers are fully connected, and message passing is reliable, which is true in Ubiq. The logical clock ensures that all Peers converge on the same State, regardless of the order the messages were recieved in. The Peers have converged when all messages have been delivered.
When the Active Collector leaves the Peer Group, the Clock is reset.
Limitations
As long as Peers remain connected, the system is lossless, even during convergence.
However, during convergence different events may flow to the old and new Active Collectors at the same time, resulting in events being spread between the sets of log files. This will occur until convergence. There is no upper bound on this time, and there is no mechanism to check if convergence has been achieved, so StartCollection
should be called with care.
To surrender position as the Active Collector, a Collector may call StopCollection
, however the time between this and the Peers converging to null is again indeterminate.
When the Active Collector changes, the delay between the previous Collector receiving the new state, and other Peers in the Group receiving the new state, may cause it to behave as a relay for a time.
Process Failure
If the Active Collector process fails, log events will be lost until the system converges on a new Collector or null.
If the Collector was part of a Room, the Rooms system can detect disconnection in some cases, and Collectors will update their cut state independently. The Collectors in this case will cache until a new Collector volunteers itself.
If a process that was not the Active Collector fails, then that process will simply not emit events. If that process was previously an Active Collector, it is still possible to lose events if the Collector was acting as a relay when the process failed.
For collecting data such as experimental logs, it is recommended to only ever have one collector on a process that can be monitored. For example, the logcollectorservice
.
Verification
The LogCollector
method GetBufferedEventCount
returns the number of Events currently buffered. As LogCollector
is multi-threaded, this count may change even while it is being returned. However, if it is known, for example, that an application will not generate any new Events of a particular type, it can be used to check whether all of those events have finished writing.
The LogCollector
method Ping
will ping the Active Collector.
All Log messages are delivered in order. Therefore these mechanisms can be used together to verify that all log messages for a particular application have been delivered successfully before it exits.
To do this, an application could:
- Write the last Event, e.g. a Questionnaire result.
- In the same thread, wait until the number of buffered events of that type is zero.
- Ping the Active Collector
When a response is receieved, the Event, and all preceeding it, will have been successfully delivered and the application can safely exit.
This protocol is implemented in the WaitForTransmitComplete
method.
This does not protect against process failure of an intermediary LogCollector
however, a condition which is irrecoverable.
Reliability
In order for WaitForTransmitComplete
to confirm that an event has been successfully delivered, the integrity of the LogCollector processes must be fully visible. One way to achieve this is to ensure only one LogCollector is ever active, and that that LogCollector never forwards. In this case, if the Ping is received, then the logs must also be successfully delivered as they cannot have taken any other path to the collector.
LogCollector
will keep track of this by default, and warn any caller of WaitForTransmitComplete
if this is the case.
Analysis
A LogCollector
outputs a stream of structured logs in compliant Json. These logs can be fed to a stack like the ELK, processed with third-party tools like Matlab or Excel, or processed programmatically on platforms such as Python.
See the Analysis section for examples of how to process the logs.
[1] Kshemkalyani, A. D., & Singhal, M. (2008). Distributed Computing: Principles, Algorithms and Practice. Cambridge University Press.