When parsing many programs per second via the Java API, we run into the following error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:628)
at java.util.HashMap.put(HashMap.java:611)
at com.google.common.collect.AbstractMapBasedMultimap$WrappedCollection.addToMap(AbstractMapBasedMultimap.java:417)
at com.google.common.collect.AbstractMapBasedMultimap$WrappedCollection.addAll(AbstractMapBasedMultimap.java:538)
at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:80)
at com.google.common.collect.ArrayListMultimap.putAll(ArrayListMultimap.java:66)
at org.metaborg.runtime.task.TaskInsertion.createResultMapping(TaskInsertion.java:205)
at org.metaborg.runtime.task.TaskInsertion.getResultsOf(TaskInsertion.java:235)
at org.metaborg.runtime.task.TaskInsertion.createResultMapping(TaskInsertion.java:198)
at org.metaborg.runtime.task.TaskInsertion.insertResultCombinations(TaskInsertion.java:175)
at org.metaborg.runtime.task.TaskInsertion.taskCombinations(TaskInsertion.java:127)
at org.metaborg.runtime.task.evaluation.BaseTaskEvaluator.evaluate(BaseTaskEvaluator.java:40)
at org.metaborg.runtime.task.evaluation.TaskEvaluationQueue.evaluateQueuedTasks(TaskEvaluationQueue.java:318)
at org.metaborg.runtime.task.evaluation.TaskEvaluationQueue.evaluate(TaskEvaluationQueue.java:182)
at org.metaborg.runtime.task.engine.TaskEngine.evaluateScheduled(TaskEngine.java:279)
at org.metaborg.runtime.task.primitives.task_api_evaluate_scheduled_3_0.call(task_api_evaluate_scheduled_3_0.java:21)
at org.metaborg.runtime.task.primitives.TaskEnginePrimitive.call(TaskEnginePrimitive.java:27)
at org.strategoxt.lang.Context.invokePrimitive(Context.java:227)
at org.strategoxt.lang.Context.invokePrimitive(Context.java:216)
at pgqllang.trans.task_api_evaluate_scheduled_3_0.invoke(task_api_evaluate_scheduled_3_0.java:28)
at pgqllang.trans.task_evaluate_scheduled_0_0.invoke(task_evaluate_scheduled_0_0.java:29)
at pgqllang.trans.lifted266.invoke(lifted266.java:32)
at pgqllang.trans.measure_time_2_0.invoke(measure_time_2_0.java:39)
at pgqllang.trans.analyze_all_no_builtins_4_1.invoke(analyze_all_no_builtins_4_1.java:143)
at pgqllang.trans.analyze_all_4_1.invoke(analyze_all_4_1.java:34)
at pgqllang.trans.analyze_all_3_1.invoke(analyze_all_3_1.java:29)
at pgqllang.trans.editor_analyze_0_0.invoke(editor_analyze_0_0.java:34)
at org.strategoxt.lang.Strategy.invokeDynamic(Strategy.java:30)
at org.strategoxt.lang.InteropSDefT.evaluate(InteropSDefT.java:192)
at org.strategoxt.lang.InteropSDefT.evaluate(InteropSDefT.java:183)
at org.strategoxt.lang.InteropSDefT$StrategyBody.evaluate(InteropSDefT.java:245)

Is there any way we can disable this cache or clean it up after parsing a program?

This is how I’m using the API:

Submitted by Oskar van Rest on 3 October 2016 at 17:49

On 3 October 2016 at 18:01 Oskar van Rest commented:

Possibly this indicates a memory leak. I’m not sure if the cache ever gets cleaned.

On 3 October 2016 at 18:10 Gabriël Konat commented:

How much is many programs per second?

You’re not just parsing, right? You’re also analyzing; otherwise the task engine would not even be used.

What cache do you want to disable? Do you mean the data in the task engine?
If so, you can try creating a temporary context instead (see IContextService), using it to analyze a parsed file, and then closing it (by calling the close method on the temporary context). A temporary context should not persist any data, so memory can be freed immediately.
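
A minimal sketch of the suggested pattern, using a hypothetical stand-in for the temporary context (the real Spoofax interfaces from IContextService may differ); the point is that the context is closed after each analysis, so its data never accumulates:

```java
import java.util.HashMap;
import java.util.Map;

public class TemporaryContextSketch {
    // Hypothetical stand-in for a temporary context: it holds per-analysis
    // data and frees it when close() is called.
    static class TemporaryContext implements AutoCloseable {
        private final Map<String, Object> taskData = new HashMap<>();

        void analyze(String program) {
            // A real analysis would populate task-engine data here.
            taskData.put(program, new Object());
        }

        int size() {
            return taskData.size();
        }

        @Override
        public void close() {
            taskData.clear(); // nothing persists after the context is closed
        }
    }

    public static void main(String[] args) {
        // One short-lived context per program, closed immediately after use:
        for (int i = 0; i < 1000; i++) {
            try (TemporaryContext ctx = new TemporaryContext()) {
                ctx.analyze("query-" + i);
            } // close() runs here, so no data survives across iterations
        }
        System.out.println("done");
    }
}
```

Using try-with-resources guarantees the context is closed even if analysis throws.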

On 3 October 2016 at 21:05 Oskar van Rest commented:

Indeed, this is parsing + name analysis + type analysis, but no code generation.

This was reported by one of our users, and I think they were stress-testing the system to see if there are any memory leaks by executing queries inside a while(true) loop.

Our queries are always Strings, which I write to an in-memory VFS file system.
The file names for the queries are randomly generated, which seems to be where the problem lies. After some initial warmup, parsing+analyzing a query takes 5ms, but after repeatedly parsing+analyzing the same query, this goes up to 20ms after a minute or so. From there on, it just keeps increasing.

I found out that this slowdown does not happen when I reuse the same file names for queries rather than using random file names. Given that insight, as well as the exception message, this indicates that Spoofax keeps objects around for each of these randomly generated files. The HashMap (this is what I meant by “cache” / task engine data) seems to grow so large that most of the time is spent inserting new objects into it. At some point there are so many objects that the JVM takes too much time to perform GC and throws the exception.
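
The failure mode can be reproduced in isolation with a plain HashMap standing in (hypothetically) for the per-file task-engine data: keyed by a fresh random name per query it grows without bound, while keyed by a reused name it stays at size one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class PerFileGrowthDemo {
    public static void main(String[] args) {
        // Stand-in for the task engine's per-file data.
        Map<String, Object> perFileData = new HashMap<>();

        // Random file name per query: every iteration retains a new entry,
        // so the map (and GC pressure) grows without bound.
        for (int i = 0; i < 10_000; i++) {
            perFileData.put(UUID.randomUUID() + ".pgql", new Object());
        }
        System.out.println(perFileData.size()); // 10000

        // Reusing a single file name: each analysis overwrites the previous
        // entry, so memory use stays constant.
        Map<String, Object> reused = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            reused.put("query.pgql", new Object());
        }
        System.out.println(reused.size()); // 1
    }
}
```

This is exactly the difference between the random-name and fixed-name runs described above.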

For PGQL, I’ve now fixed it by generating only a single random file name per Spoofax instance rather than per query: https://github.com/oracle/pgql-lang/commit/6b8f174d067d1d061a7c19790557eb4fd048febc

I’m not sure whether Spoofax should provide a mechanism for cleaning up resources on a per-file basis to avoid such leakage. But since the temporary context probably already does the trick, I’ll just close the issue.

On 3 October 2016 at 21:05 Oskar van Rest closed this issue.

On 6 October 2016 at 09:45 Gabriël Konat commented:

The task engine will store data about all files (both in memory, and persisted to disk), so if you keep feeding it random file names it will indeed leak memory. In practice, when files are deleted we do clean up the task engine by sending it an empty tuple as AST for that file (kind of a hack right now), but since you keep generating new files, this does not happen.
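
The empty-tuple cleanup hack can be pictured with the same kind of per-file map (a hypothetical stand-in for the task engine, with a made-up update method): analyzing a file with an empty AST removes its entries instead of adding new ones:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmptyTupleCleanupSketch {
    // Stand-in for the task engine's per-file data.
    static final Map<String, List<String>> tasksPerFile = new HashMap<>();

    // Hypothetical update method: an empty AST signals deletion, so the
    // file's data is removed rather than updated (the "empty tuple" hack).
    static void update(String file, List<String> ast) {
        if (ast.isEmpty()) {
            tasksPerFile.remove(file);
        } else {
            tasksPerFile.put(file, ast);
        }
    }

    public static void main(String[] args) {
        update("a.pgql", List.of("task1", "task2"));
        update("b.pgql", List.of("task3"));
        update("a.pgql", List.of()); // file deleted: empty "tuple" clears it
        System.out.println(tasksPerFile.keySet()); // [b.pgql]
    }
}
```

With randomly generated file names this deletion signal never fires, so entries only accumulate.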

For now, you can either use the same file name (as you just implemented) or use a temporary context. The temporary context is preferred because it is more efficient and elegant: if you reuse the same file name with a regular context, the task engine incrementally updates its results on every analysis, which has some overhead.

In the future, we should add an API to the analyzer for removing files, instead of the empty-tuple hack we have now.

On 6 October 2016 at 10:52 Guido Wachsmuth commented:

Oskar, in case you implement a solution using temporary contexts, can you add an example of how to process strings with temporary contexts to https://github.com/MetaBorgCube/metaborg-api-usage? This would help document a working solution.

On 6 October 2016 at 15:43 Oskar van Rest commented:

The temporary context works: https://github.com/oracle/pgql-lang/commit/829f8cd4d5f5dbc88296317b05e35d97608b63f9

@Guido: I wanted to add the temporary context to org.metaborg.examples.api.analysis, but this requires a dummy project. I’m not sure if this is the right way to do it, but I can add it if you want:

  dummyProject = new Project(dummyProjectDir, new IProjectConfig() {

    public Collection<LanguageIdentifier> sourceDeps() {
      return new HashSet<>();
    }

    public Collection<LanguageIdentifier> javaDeps() {
      return new HashSet<>();
    }

    public Collection<LanguageIdentifier> compileDeps() {
      return new HashSet<>();
    }

    public String metaborgVersion() {
      return null;
    }

    public boolean typesmart() {
      return false;
    }
  });
Also, I believe the following repository should be added to org.metaborg.examples.api.analysis/pom.xml; otherwise, users are required to add the repo to ~/.m2/settings.xml, which shouldn’t be the preferred way.

