We’ve had reports that in Kolab 16, when the system restarts, the Manticore service may be started before the MongoDB service has become functionally available. This would render Manticore unavailable on every such system restart.
This article is about how we are going to work to resolve this properly.
Make Manticore Depend on MongoDB
This would instruct systemd to hold off starting Manticore until after the MongoDB service has been started. This is a valid resolution, but it locks Manticore and MongoDB into running on the same node, which we would like to avoid where we can. The corresponding snippet of the systemd unit file would look as follows:
[Unit]
Description=Collaborative Editing for ODF Documents
# Requires= only pulls in mongod.service; After= is what orders the start-up
After=network.target mongod.service
Requires=mongod.service
Make Manticore Not Fail
Currently, Manticore fails (fatally) when MongoDB is not available during its startup. This seems improper, and leaves us to wonder what happens when the connection to MongoDB is broken while Manticore is running (and perhaps restored a few moments thereafter).
Avoiding the fatal error is a development effort encapsulated in T981.
Delay the Startup of Manticore
In order to give MongoDB a chance to become functionally available (as opposed to merely having been launched), we could choose to delay the start-up of Manticore by some number of seconds:
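[Service]
# $number is a placeholder for the delay in seconds; older systemd versions require an absolute path
ExecStartPre=/bin/sleep $number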
This is definitely an ugly workaround and literally just guesses whether or not MongoDB has had a chance to start up yet.
Delay the Restart of Manticore After a Failure
The systemd unit file can use a setting that delays the restart of Manticore, such that it is still likely to be restarted (within the threshold for the maximum number of restarts) after the MongoDB service has finally become available.
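As a rough sketch of what such a setting could look like: `Restart=` and `RestartSec=` are the standard systemd directives for restarting a failed service after a delay. The values below are assumptions chosen for illustration, not taken from the actual Kolab packaging.

```ini
[Service]
# Restart Manticore whenever it exits with an error
Restart=on-failure
# Assumed value: wait 60 seconds before each restart attempt,
# giving MongoDB time to become available
RestartSec=60
```

The longer the delay, the fewer restart attempts are used up while MongoDB is still starting, at the cost of a longer outage after any failure.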
This would imply, however, that should the Manticore service fail at any point, it’ll automatically be unavailable for up to a minute.
Add a Timeout to the Start-up
Should the aforementioned T981 be implemented and resolved successfully, we will likely add a time-out to the start-up routine, which polls for MongoDB to become available, so that the appropriate error messages are logged should Manticore fail to start up.
This would ensure systems give administrators the necessary verbosity about what’s going on:
Should Manticore fail to start up completely within a time window of 60 seconds, errors would be logged somewhere and the restart timer would kick in to try again.
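For illustration only, one way systemd itself could enforce such a 60-second window is sketched below. It assumes Manticore reports readiness to systemd via sd_notify (`Type=notify`), which nothing above confirms; T981 may well implement the time-out inside Manticore instead. The `RestartSec=` value is likewise an assumption.

```ini
[Service]
# Assumption: Manticore sends READY=1 via sd_notify once MongoDB is reachable
Type=notify
# Treat the start-up as failed if readiness is not reported within 60 seconds
TimeoutStartSec=60
# A start-up time-out counts as a failure, so the restart timer applies
Restart=on-failure
# Assumed delay before the next attempt
RestartSec=10
```

With this arrangement the time-out and the resulting restart would show up in the journal for the unit, which is where an administrator would look first.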