login becomes slow on production server after some time
Today, before restarting tomcat, login on yellowgrass/webdsl/researchr took about 20 seconds
New threaddump on YG (with jasypt 1.9.0):
“TP-Processor9” - Thread t@54
- locked <41e776a> (a java.security.MessageDigest$Delegate)
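To check from inside the running JVM whether several request threads are queuing on the same digest lock, a small scan along the following lines might help. This is only a sketch: the class name `DigestLockScan` and the prefix match on `java.security.MessageDigest` are assumptions made for illustration, and the code has to run inside the Tomcat JVM (for example from a diagnostic page) to see its threads. It relies only on the standard `java.lang.management` API.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of WebDSL or jasypt.
class DigestLockScan {

    // True for monitor classes such as java.security.MessageDigest$Delegate.
    static boolean isDigestLock(String className) {
        return className != null && className.startsWith("java.security.MessageDigest");
    }

    // Lists threads that hold, or are blocked on, a MessageDigest monitor.
    static List<String> scan() {
        List<String> hits = new ArrayList<String>();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isObjectMonitorUsageSupported()) {
            return hits; // this JVM cannot report locked monitors
        }
        for (ThreadInfo ti : mx.dumpAllThreads(true, false)) {
            for (MonitorInfo held : ti.getLockedMonitors()) {
                if (isDigestLock(held.getClassName())) {
                    hits.add(ti.getThreadName() + " holds " + held.getClassName());
                }
            }
            LockInfo waitingOn = ti.getLockInfo(); // null if the thread is not waiting on a lock
            if (waitingOn != null && isDigestLock(waitingOn.getClassName())) {
                hits.add(ti.getThreadName() + " waits on " + waitingOn.getClassName()
                        + " (state " + ti.getThreadState() + ")");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String line : scan()) {
            System.out.println(line);
        }
    }
}
```

A single thread holding such a monitor, as in the dump above, is not a problem by itself; many threads waiting on one would point at contention.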
Might be related to:
- switch to Oracle JDK 7u21 (using tomcat 6.0.32)
- 1 tomcat instance running all apps = 1 jvm with 20GB heap space
Updates (oldest first)
- The problem is JVM-wide, a redeploy of a single web application does not solve the large login times for that app
- The problem might kick in at some time (<40h after tomcat restart) or gets triggered by something, but doesn’t seem to worsen over time
- May 23: Switched java version from Oracle 7 to OpenJDK 7
- May 24:
- http://webdsl.org homepage is slow (which it wasn’t yesterday after the switch to OpenJDK). CPU sampling reveals most time is spent on String replaces as part of the (old) markdown processor… So we still have mysterious slowdowns, but now in different parts of our applications. Same cause? No idea :(
- webdsl.org app is now rebuilt against newest webdsl and redeployed in the same tomcat instance (not restarted yet) -> still slow
- Tomcat is being restarted, everything feels snappy again (as usual) and will probably get slow over time…
- May 26:
- Slow login is back -> switch to OpenJDK did not fix the issue.
- May 27:
- May 28:
- +/- 12.20h: Webdsl.org pages load really slowly, as can be observed from the monitoring status pages (timeout triggers -> monitor.us reports it as down)
- First measured at 12:17:30 (pingdom time) by pingdom.
- evaluaties.war got deployed at 10:13 server time (=12.13 GMT+2h)
- permgen: Size: 1,270,480,896 B, Used: 717,463,056 B, Max: 2,147,483,648 B (measured at 15.15h, probably also at this level at 12.20h)
- However, login is still fast.
- (An increase in outgoing traffic started at (about) the same moment when slower page loads were detected. See zabbix graph at 28 May 12.25. Coincidence?) –> Yup, as outgoing traffic decreased later
- May 30:
- We redeployed reposearch to increase the permgen space usage and see if page load times increase (or login becomes slow)
- Looking at the pingdom stats, we see an increase in page load times:
- Around 06.30h response time +/- 15s (pingdom EU servers)
- Probably around 9.45 (but guaranteed between Thu May 30 09:26 - 09:49 (00:23)): nixos-rebuild switch by rob (for backup)
- Around 10.00h response time +/-16 seconds (pingdom EU servers)
- Redeploy at 10:55h
- Around 10.00h response time 16.5-17.5 seconds (pingdom EU servers)
- The higher page load times ^ could be the result of reposearch starting its recurring tasks like search index optimization and autocompletion index renewal. But I noticed the search index optimization took about 30 minutes! That’s very slow, as previous optimizations took at most 1 minute (checked in tomcat log).
- Instead of 1 tomcat instance with all apps, we now isolate the apps in separate tomcat instances
- done: researchr.org , webdsl.org, yellowgrass.org, codefinder.org
- rest of the apps still run in big tomcat
- June 5:
- Switch back to Oracle jdk 7
Conclusion so far
Slowdown seems to occur if permanent generation space usage increases above a certain value.
Submitted by Danny Groenewegen on 17 May 2013 at 13:41
Current solution: isolate apps to have their own tomcat instance. Furthermore, from r5713 onwards we handle cleanup to make permgen space taken by apps GCable. See https://yellowgrass.org/issue/WebDSL/317
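To keep an eye on permgen usage per tomcat instance, the JVM's memory pools can be read through the standard `java.lang.management` API. The sketch below is not what WebDSL does in r5713; the class name `PermGenCheck`, the pool-name matching, and the `WARN_FRACTION` threshold are assumptions for illustration (the actual usage level at which the slowdown starts is not known).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for logging permgen usage from inside the JVM.
class PermGenCheck {

    // Assumed threshold for illustration only; the real trigger value is unknown.
    static final double WARN_FRACTION = 0.30;

    // Fraction of the pool in use, or -1 if the pool reports no maximum.
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return (double) usedBytes / (double) maxBytes;
    }

    // HotSpot 7 pool names include "PS Perm Gen" and "CMS Perm Gen";
    // Java 8+ has no permgen and reports "Metaspace" instead.
    static boolean isPermGenLike(String poolName) {
        String n = poolName.toLowerCase();
        return n.contains("perm gen") || n.contains("metaspace");
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isPermGenLike(pool.getName())) {
                continue;
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // pool is no longer valid
            }
            double f = usedFraction(u.getUsed(), u.getMax());
            String flag = (f >= WARN_FRACTION) ? "  <-- above assumed threshold" : "";
            System.out.printf("%s: used=%d B, committed=%d B, max=%d B, fraction=%.3f%s%n",
                    pool.getName(), u.getUsed(), u.getCommitted(), u.getMax(), f, flag);
        }
    }
}
```

With the May 28 figures from above (717,463,056 B used of 2,147,483,648 B max), `usedFraction` gives roughly 0.33. Logging this value periodically next to the pingdom response times would show whether the two really move together.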
I’m not sure if this really relates to the JDK update…
Before this update, we already had slow page loads sometimes, especially on pages with markdown on them (which triggered me to update the markdown processor).
Once tomcat got restarted, those pages loaded quickly (nothing else changed).
Slow login/slower markdown processing/… is back again.
Login on researchr, waiting server response: 32.30s.
Loading webdsl.org homepage, waiting initial server response: 444ms
And slow login is back :( If time permits I’ll try to debug some more this afternoon.
No observable slowdowns anymore -> closing this issue.