Optimize reading terms from files in Stratego/J (2)
There are a number of issues that may degrade the performance of reading terms from files:
- The SSL_read_term_from_stream class doesn’t use Java NIO channels yet.
- The PushbackInputStream.read() method used in BasicTermFactory.parseFromStream() may be inefficient.
- The Java API can currently only write textual ATerms to files. The binary ATerm format (BAF) is known to be very inefficient in terms of performance, though, and I haven’t seen much difference with the new streaming ATerm format (SAF).
The first two issues could be addressed by writing our own PushbackInputStream-like class that uses channels.
Update: apparently, memory-mapped I/O should be considered harmful; see Spoofax/106.
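A minimal sketch of such a class, assuming a blocking ReadableByteChannel. ChannelPushbackReader and its constructor parameters are hypothetical names, not part of Stratego/J or the JDK. It fills an ordinary heap ByteBuffer from the channel in bulk (no memory mapping), so each read() is usually just a buffer access. By contrast, java.io.PushbackInputStream delegates every byte that was not pushed back to the wrapped stream's read(), which costs a system call per byte if that stream happens to be an unbuffered FileInputStream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/**
 * Hypothetical PushbackInputStream-like reader on top of an NIO channel.
 * Bytes are fetched from the channel in bulk into a heap ByteBuffer (no
 * memory mapping), so each read() is usually just a buffer access.
 */
final class ChannelPushbackReader {
    private final ReadableByteChannel channel;
    private final ByteBuffer buffer;
    private final int[] pushback;   // stack of bytes that were unread
    private int pushbackCount = 0;

    ChannelPushbackReader(ReadableByteChannel channel, int bufferSize, int pushbackSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.channel = channel;
        this.buffer = ByteBuffer.allocate(bufferSize);
        this.buffer.flip();         // start empty: position == limit == 0
        this.pushback = new int[pushbackSize];
    }

    /** Returns the next byte as 0-255, or -1 at end of stream. */
    int read() throws IOException {
        if (pushbackCount > 0) {
            return pushback[--pushbackCount];
        }
        if (!buffer.hasRemaining()) {
            buffer.clear();
            int n;
            do {
                // A blocking channel returns at least one byte or -1;
                // the loop only guards against a 0-byte read.
                n = channel.read(buffer);
            } while (n == 0);
            buffer.flip();
            if (n < 0) {
                return -1;
            }
        }
        return buffer.get() & 0xFF;
    }

    /** Pushes a byte back so that the next read() returns it again. */
    void unread(int b) throws IOException {
        if (pushbackCount == pushback.length) {
            throw new IOException("Pushback buffer is full");
        }
        pushback[pushbackCount++] = b;
    }
}
```

For a file, the channel could be obtained with FileChannel.open(path, StandardOpenOption.READ); how this would be wired into BasicTermFactory.parseFromStream() is left open here.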
Submitted by Lennart Kats on 20 March 2010 at 11:49
Issue Log
Added caching of ReadFromFile in r20723, which should help with this issue.
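The log does not say how ReadFromFile is cached in r20723. One plausible approach, sketched here purely as an assumption, is to key the cache on the file path and invalidate an entry when the file's last-modified time or size changes. TermFileCache and its Parser interface are hypothetical names; T stands in for the term type. Handing out the same object for repeated reads is only safe because terms are immutable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache for terms read from files: a parsed term is reused as
 * long as the file's last-modified time and size are unchanged.
 * T stands in for the term type; parsing is delegated to the caller.
 */
final class TermFileCache<T> {
    /** Parses the file at the given path into a term. */
    interface Parser<R> {
        R parse(Path file) throws IOException;
    }

    private static final class Entry<V> {
        final FileTime modified;
        final long size;
        final V term;

        Entry(FileTime modified, long size, V term) {
            this.modified = modified;
            this.size = size;
            this.term = term;
        }
    }

    private final Map<Path, Entry<T>> entries = new HashMap<>();
    private final Parser<T> parser;

    TermFileCache(Parser<T> parser) {
        this.parser = parser;
    }

    /** Returns the cached term if the file looks unchanged, otherwise re-parses it. */
    synchronized T read(Path file) throws IOException {
        Path key = file.toAbsolutePath().normalize();
        FileTime modified = Files.getLastModifiedTime(key);
        long size = Files.size(key);
        Entry<T> cached = entries.get(key);
        if (cached != null && cached.modified.equals(modified) && cached.size == size) {
            return cached.term;   // unchanged on disk: skip parsing entirely
        }
        T term = parser.parse(key);
        entries.put(key, new Entry<>(modified, size, term));
        return term;
    }
}
```

Modification times have limited resolution on some file systems, so a rewrite that keeps the same size within one timestamp tick would go unnoticed; a content hash would be more robust but requires reading the file each time.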
Writing textual ATerm files was very memory-inefficient because the whole text was first constructed as an in-memory string. This was fixed in revision 21582 by streaming the terms to an OutputStream.
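The difference between building a string and streaming can be illustrated with a hypothetical, simplified term type (AppTerm, not the Stratego/J term API) that only supports constructor applications; strings, integers, lists, and annotations are left out. writeTo appends each piece directly to an Appendable, and writeToStream wraps an OutputStream in a buffered UTF-8 Writer, so the memory needed for the output is bounded by the writer's buffer rather than by the size of the full text.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical minimal term: a constructor applied to zero or more arguments. */
final class AppTerm {
    final String constructor;
    final List<AppTerm> args;

    AppTerm(String constructor, AppTerm... args) {
        this.constructor = constructor;
        this.args = List.of(args);
    }

    /**
     * Appends the textual ATerm form piece by piece, so the complete text is
     * never held in memory as one String (unless the Appendable itself is a
     * StringBuilder, as in a test).
     */
    void writeTo(Appendable out) throws IOException {
        out.append(constructor);
        if (!args.isEmpty()) {
            out.append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) {
                    out.append(',');
                }
                args.get(i).writeTo(out);
            }
            out.append(')');
        }
    }

    /** Streams the term to an OutputStream through a buffered UTF-8 writer. */
    void writeToStream(OutputStream stream) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(stream, StandardCharsets.UTF_8));
        writeTo(writer);
        writer.flush();   // flush buffered text; closing the stream is left to the caller
    }
}
```

The recursion follows the nesting depth of the term, so extremely deep terms could still overflow the call stack; an explicit work stack would avoid that at the cost of some clarity.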
The implementation of streaming ATerm reading and writing was completed (I hope) in r21640.
The writer, however, calculates sharing on its own. Couldn’t it use the existing term sharing for terms with storage type MAXIMALLY_SHARED? Just a thought.
Writing binary ATerms remains unimplemented, but shouldn’t be too hard to implement either, reusing parts of the binary ATerm reader and the streaming ATerm writer.
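The suggestion about MAXIMALLY_SHARED terms rests on one property: when terms are maximally shared (hash-consed), two subterms are structurally equal exactly when they are the same object, so repeated subterms can be found with an IdentityHashMap instead of structural hashing and deep comparison. A minimal sketch of that idea, assuming Java 16+ (for the record) and a hypothetical SharedTerm class standing in for maximally shared terms; it is an illustration of the suggestion, not how the existing SAF writer works.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical maximally shared term: instances are only created through
 * make(), which hash-conses them, so structurally equal terms are the same object.
 */
final class SharedTerm {
    final String constructor;
    final SharedTerm[] args;

    private SharedTerm(String constructor, SharedTerm[] args) {
        this.constructor = constructor;
        this.args = args;
    }

    // Hash-consing key. SharedTerm does not override equals/hashCode, so the
    // argument list is compared by identity, which suffices because the
    // arguments are themselves maximally shared.
    private record Key(String constructor, List<SharedTerm> args) {}

    // Global, non-thread-safe table: acceptable for a single-threaded sketch only.
    private static final Map<Key, SharedTerm> TABLE = new HashMap<>();

    static SharedTerm make(String constructor, SharedTerm... args) {
        return TABLE.computeIfAbsent(new Key(constructor, List.of(args)),
                k -> new SharedTerm(constructor, args.clone()));
    }

    /**
     * Numbers each distinct subterm the first time it is reached (pre-order).
     * Identity equality coincides with structural equality here, so an
     * IdentityHashMap detects repeated subterms without structural hashing;
     * a writer could emit a back-reference whenever a subterm repeats.
     */
    static Map<SharedTerm, Integer> indexSubterms(SharedTerm root) {
        Map<SharedTerm, Integer> index = new IdentityHashMap<>();
        Deque<SharedTerm> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            SharedTerm term = stack.pop();
            if (index.containsKey(term)) {
                continue;   // already numbered: a shared subterm
            }
            index.put(term, index.size());
            // Push arguments right to left so the leftmost is visited first.
            for (int i = term.args.length - 1; i >= 0; i--) {
                stack.push(term.args[i]);
            }
        }
        return index;
    }
}
```

For terms that are not maximally shared, identity-based detection would undercount sharing (structurally equal but distinct objects get separate numbers), so the writer's own sharing computation would still be needed in that case.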
Nathan: great about the SAF support. Note that in the new-terms branch, terms also gained an efficient writeAsString(Appendable, int) method. I also changed the description of this issue a bit to reflect that memory-mapped IO is evil.