This issue is hard to debug, as there is no known way to reproduce it. I suspect a timing issue between the front end reading the job file and the backend writing to it.

Another, possibly related, issue reported by Andrew:

The server’s results seem seriously unstable. Over the past few
minutes, repeated clicking on “Your Test” and “Spec-test” produced the following
outcomes in seemingly nondeterministic fashion:

Status: Done
Hello cruel world
Test score: 1/1

Status: Done
Hello cruel world
Test score: 0/-1

Status: Running
Hello cruel world
[long wait]
Status: Killed

Status: Running
[long wait]
Status: Killed
Test score 0/-1

(Many repeats of each output.)
In this state, I’d say the system is unusable.

But now, things seem stable again giving the first, correct, output.

Smells like a synchronization bug maybe dependent on server load?

Submitted by Elmer van Chastelet on 29 September 2015 at 08:04

On 29 September 2015 at 08:05 Elmer van Chastelet tagged 0.46.1

On 29 September 2015 at 08:05 Elmer van Chastelet tagged reliability

On 20 November 2015 at 09:34 Elmer van Chastelet removed tag 0.46.1

On 20 November 2015 at 09:34 Elmer van Chastelet tagged 0.46.2

On 17 December 2015 at 11:43 Elmer van Chastelet removed tag 0.46.2

On 17 December 2015 at 11:43 Elmer van Chastelet tagged 1.85.0

On 2 August 2016 at 10:11 Elmer van Chastelet removed tag 1.85.0

On 2 August 2016 at 10:11 Elmer van Chastelet tagged 1.90.0

On 15 November 2016 at 08:53 Elmer van Chastelet tagged 0.47.1

On 15 November 2016 at 08:53 Elmer van Chastelet removed tag 1.90.0

On 15 November 2016 at 09:02 Elmer van Chastelet commented:

Backend issue: the status (e.g. RunningFailure or Done) is written before the other details (error/output streams). WebLab may check the status and mark the job as finished while the other details still need to be written. These details are then never read by WebLab, because it already considers the job finished.
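
One way to avoid this ordering problem, sketched below in Python, is to write the details first and publish the status last, using a write-to-temp-then-rename step so that a reader never sees a partially written file. This is only an illustration under assumptions: the thread does not describe the real job-file layout or the actual fix, so the file names (`stdout.txt`, `stderr.txt`, `status.json`) and the functions `atomic_write`, `write_job_result` and `read_job_result` are hypothetical.

```python
import json
import os
import tempfile


def atomic_write(path, text):
    """Write text to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in one step, so readers see old or new, never half.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def write_job_result(job_dir, status, stdout, stderr):
    """Backend side (hypothetical): details first, status last."""
    atomic_write(os.path.join(job_dir, "stdout.txt"), stdout)
    atomic_write(os.path.join(job_dir, "stderr.txt"), stderr)
    # The status file acts as the "finished" marker, so it is written last.
    atomic_write(os.path.join(job_dir, "status.json"), json.dumps({"status": status}))


def read_job_result(job_dir):
    """Front-end side (hypothetical): return None until the status has been published."""
    try:
        with open(os.path.join(job_dir, "status.json"), encoding="utf-8") as f:
            status = json.load(f)["status"]
    except FileNotFoundError:
        return None  # job not finished yet; poll again later
    # The status exists, so the details were already written completely.
    with open(os.path.join(job_dir, "stdout.txt"), encoding="utf-8") as f:
        stdout = f.read()
    with open(os.path.join(job_dir, "stderr.txt"), encoding="utf-8") as f:
        stderr = f.read()
    return {"status": status, "stdout": stdout, "stderr": stderr}
```

With this ordering, a poller that sees a final status can rely on the output and error streams being complete, so marking the job as finished at that point no longer loses data.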


On 21 November 2016 at 09:38 Elmer van Chastelet closed this issue.
