Performance issues when using bigger datasets
The current implementation handles test data inefficiently and does not cache it, so each execution reads/copies the test data over and over again. Today we saw an assignment with large test data (20MB+) cause the backend’s ramdisk to run out of space. The implementation should change to keep the test data in a single shared place, so it won’t need to read/copy the data multiple times, and/or cache the data in the backend’s memory (with some management).
My current idea is to check whether the test data is already present in the shared space, and if not, copy it over before requesting execution of the job. Test data in the shared space would be deleted when it has not been used in the last 10 minutes. A hash of the test data would be used to uniquely identify it.
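A minimal sketch of what this could look like, assuming a Python-based helper on the backend; the shared directory path, the function names, and the use of the file’s mtime as a “last used” marker are all assumptions for illustration, not the actual implementation:

```python
# Hypothetical sketch of the proposed shared test-data cache.
import hashlib
import os
import shutil
import time

SHARED_DIR = "/shared/testdata"   # assumed shared location on the backend
MAX_IDLE_SECONDS = 10 * 60        # evict entries unused for 10 minutes


def testdata_hash(path):
    """Hash the test data to get a unique cache key."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def ensure_in_shared_space(local_path):
    """Copy the test data to the shared space only if it is not already there.

    Returns the shared path to pass along with the job request.
    """
    key = testdata_hash(local_path)
    shared_path = os.path.join(SHARED_DIR, key)
    if not os.path.exists(shared_path):
        os.makedirs(SHARED_DIR, exist_ok=True)
        shutil.copyfile(local_path, shared_path)
    # Touch the entry so the eviction pass sees it as recently used.
    os.utime(shared_path, None)
    return shared_path


def evict_stale_entries():
    """Delete shared test data not used in the last 10 minutes."""
    now = time.time()
    for name in os.listdir(SHARED_DIR):
        path = os.path.join(SHARED_DIR, name)
        if now - os.path.getmtime(path) > MAX_IDLE_SECONDS:
            os.remove(path)
```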
Submitted by Elmer van Chastelet on 16 November 2016 at 19:54