Automatic update of statistics needs refactoring
When automatic statistics updates (read: recalculation of grades) is turned on for an assignment, the assignment gets regraded automatically until that flag is turned off again. In practice, turning automatic statistics updates off is likely to never happen.
Currently, grades are recalculated every 5 minutes, taking at most 2 assignments each time for grading. I can imagine that this queue may become very large (or maybe it already is), causing grades to be outdated for a long time.
This should be changed to be more dynamic, in the sense that it only recalculates the assignments for which something has changed. Then, recalculation can be done instantly on all assignments, or in larger batches.
Submitted by Elmer van Chastelet on 22 September 2014 at 16:01
Issue Log
I have a new implementation which I’m currently testing.
In the old situation, any assignment in the tree of assignments within a course edition could be flagged for auto statistics update.
This leads to outdated statistics. Take for example this tree:
level1        level2        level3
AssignmentA
|
|---AssignmentB
|   |
|   |---AssignmentC
- When assignment B is flagged for autoupdate, it uses outdated grades from assignment C.
- When assignment C (leaf) is flagged for autoupdate, it calculates the grades and statistics of the assignments, but the statistics of higher level assignments B and A are not updated.
Therefore, updating a single assignment leads to incorrect statistics/grades for other assignments in the tree.
My solution is to calculate assignment statistics and grade submissions in one order, and only for the whole tree under a course edition:
For a course edition Edt:
- Start at the leaf assignment C
- For each submission in C, calculate the grade
- Update the statistics for assignment C
- Continue level up, assignment B
- For each submission in B, calculate the grade (thus based on up to date grades from children)
- Update the statistics for assignment B
etc
This way, grades and statistics are calculated and updated properly.
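The bottom-up order described above amounts to a post-order walk of the assignment tree. Below is a minimal sketch in Python, not the actual WebLab code: the `Assignment` class, its fields, and the rule that a collection's grade is the plain mean of its children's grades are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Assignment:
    # Hypothetical stand-in for an assignment node; not the real WebLab model.
    name: str
    children: List["Assignment"] = field(default_factory=list)
    # Grade per student. For a collection these are derived from the children.
    grades: Dict[str, float] = field(default_factory=dict)
    average: Optional[float] = None  # the only "statistic" in this sketch


def update_tree(node: Assignment, order: List[str]) -> None:
    """Regrade and update statistics bottom-up (children before parents)."""
    for child in node.children:
        update_tree(child, order)
    if node.children:
        # Assumption: a collection's grade is the mean of the child grades,
        # counting a missing child submission as 0.
        students = set().union(*(c.grades.keys() for c in node.children))
        node.grades = {
            s: mean(c.grades.get(s, 0.0) for c in node.children)
            for s in students
        }
    # Statistics are computed only after this node's grades are up to date.
    node.average = mean(node.grades.values()) if node.grades else None
    order.append(node.name)


c = Assignment("C", grades={"alice": 8.0, "bob": 6.0})
b = Assignment("B", children=[c])
a = Assignment("A", children=[b])
visited: List[str] = []
update_tree(a, visited)
print(visited)  # ['C', 'B', 'A']
```

Because every parent is handled after all of its children, B and A never read stale grades from C, which is the failure mode of the old per-assignment flag.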
Next issue is when to recalculate all grades/statistics:
- on change submission
- on grading by grader
- on change answer
- on submit
- on change number of enrolled students, for grade
- ….
I implemented statistics gathering on a per assignment basis because my implementation was very expensive. Can you implement it cheaply (by using database queries instead of weblab functions)? How much time and CPU cycles does it take to compute the statistics for a full course (say my concepts course)? That will inform the frequency. Of course, it would be useful to have a dirty bit on a course; when there have been no changes, it is not necessary to compute statistics. This would make it sensible to re-compute statistics frequently (every 5 minutes would be nice) for courses that are active. I really would like to have a (near) real-time view of the progress of a course.
During the grading of an assignment’s submissions, a hasValidGrade() check is added, which is false whenever the submission or assignment was changed after the previous call to computeGrade(). This makes grading more efficient. However, most time is spent in calculating the grades of an assignment collection’s submissions. Here, the grade is computed by retrieving the children submissions one by one, each by a single HQL query. This may take 10 to 200ms for each submission, averaging about 60ms on my local machine. This may add up to 50 seconds for the whole set of submissions within that assignment collection. The data model needs an upgrade to handle this more efficiently.
Anyway, with the improved implementation (not online yet), a full course grade/statistics update takes about 3-5 minutes (all grades flagged dirty), but less than a second when, say, 5 submissions are updated.
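The effect of the hasValidGrade() check can be illustrated with a small sketch. Only the names hasValidGrade and computeGrade come from this thread; the `Submission` class, the version counter used to detect changes, and the `regrade` helper are hypothetical, and the sketch tracks changes to the submission only, not to the assignment.

```python
class Submission:
    """Hypothetical submission with a cached grade."""

    def __init__(self, answer_score):
        self.answer_score = answer_score
        self.version = 0          # bumped on every change to the submission
        self.graded_version = -1  # version seen by the last compute_grade()
        self.grade = None

    def change_answer(self, new_score):
        self.answer_score = new_score
        self.version += 1  # invalidates the cached grade

    def has_valid_grade(self):
        # False whenever something changed after the last compute_grade().
        return self.graded_version == self.version

    def compute_grade(self):
        # Placeholder for the expensive grade computation.
        self.grade = self.answer_score
        self.graded_version = self.version
        return self.grade


def regrade(submissions):
    """Recompute only stale grades and return how many were recomputed."""
    recomputed = 0
    for submission in submissions:
        if not submission.has_valid_grade():
            submission.compute_grade()
            recomputed += 1
    return recomputed


subs = [Submission(5.0), Submission(7.0)]
print(regrade(subs))  # 2, nothing was graded before
print(regrade(subs))  # 0, all grades are still valid
subs[0].change_answer(9.0)
print(regrade(subs))  # 1, only the changed submission
```

With such a check, the cost of an update run scales with the number of changed submissions rather than with the size of the course, which matches the difference reported above between a full update and one touching a handful of submissions.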
Furthermore, I have changed the recurring task to update statistics every 3 seconds using a queue.
- When the queue is empty, it builds the queue in the right order, i.e. to calculate grades/statistics for leaf assignments first, up to the root assignment collection.
- It picks at least one assignment each invocation, but continues to pick another one in the same invocation as long as that invocation started less than 2 seconds ago. This way, there is no 3 seconds penalty for each assignment that has small or no grade updates.
- Small jobs each invocation means short transaction times, making it less likely to have actions/updates in conflict (StaleObjectstateException)
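The queue behaviour in the list above could look roughly like the following sketch. The class name, the callbacks, and the injectable clock are assumptions for illustration; the real task runs inside WebLab and is not shown in this thread.

```python
import time
from collections import deque


class StatisticsUpdater:
    """Hypothetical recurring task, invoked by a scheduler every few seconds."""

    def __init__(self, build_queue, process, budget_seconds=2.0,
                 clock=time.monotonic):
        self.build_queue = build_queue  # returns assignments, leaves first
        self.process = process          # regrades one assignment
        self.budget_seconds = budget_seconds
        self.clock = clock
        self.queue = deque()

    def invoke(self):
        """Run one invocation and return the assignments it handled."""
        started = self.clock()
        if not self.queue:
            # Empty queue: rebuild it in leaf-to-root order.
            self.queue.extend(self.build_queue())
        handled = []
        while self.queue:
            item = self.queue.popleft()
            self.process(item)  # at least one assignment per invocation
            handled.append(item)
            # Keep going only while this invocation is younger than the budget.
            if self.clock() - started >= self.budget_seconds:
                break
        return handled


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
costs = {"C": 0.5, "B": 2.0, "A": 0.1}  # made-up processing times in seconds


def fake_process(name):
    now[0] += costs[name]


updater = StatisticsUpdater(lambda: ["C", "B", "A"], fake_process,
                            clock=lambda: now[0])
print(updater.invoke())  # ['C', 'B'], B pushes the invocation past 2 seconds
print(updater.invoke())  # ['A']
```

Cheap assignments are drained back to back within one invocation, while an expensive one ends the invocation early. If each invocation runs in its own transaction, this also keeps transactions short.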