#715 login becomes slow on production server after some time (project WebDSL on YellowGrass.org)

Today, before restarting tomcat, login on yellowgrass/webdsl/researchr took about 20 seconds

New threaddump on YG (with jasypt 1.9.0)
"TP-Processor9" - Thread t@54
java.lang.Thread.State: RUNNABLE
at sun.security.provider.SHA2.lf_sigma1(SHA2.java:180)
at sun.security.provider.SHA2.implCompress(SHA2.java:224)
at sun.security.provider.SHA2.implDigest(SHA2.java:118)
at sun.security.provider.DigestBase.engineDigest(DigestBase.java:186)
at sun.security.provider.DigestBase.engineDigest(DigestBase.java:165)
at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:576)
at java.security.MessageDigest.digest(MessageDigest.java:353)
at java.security.MessageDigest.digest(MessageDigest.java:399)
at org.jasypt.digest.StandardByteDigester.digest(StandardByteDigester.java:979)
- locked <41e776a> (a java.security.MessageDigest$Delegate)
at org.jasypt.digest.StandardByteDigester.matches(StandardByteDigester.java:1099)
at org.jasypt.digest.StandardStringDigester.matches(StandardStringDigester.java:1052)
at org.jasypt.util.password.StrongPasswordEncryptor.checkPassword(StrongPasswordEncryptor.java:99)
at org.webdsl.tools.Utils.secretCheck(Utils.java:135)
Might be related to:

switch to Oracle JDK 7u21 (using tomcat 6.0.32)

1 tomcat instance running all apps = 1 jvm with 20GB heap space
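The thread dump above shows the login request busy in iterated SHA-256 hashing inside jasypt's StrongPasswordEncryptor (which, per the jasypt documentation, uses SHA-256 with a 16-byte salt and 100,000 iterations). To check whether raw digest throughput degrades in a long-running JVM, a comparable workload can be timed with the plain JDK. This is a minimal sketch: the class `DigestTiming` and its methods are hypothetical helpers, not part of WebDSL or jasypt, and the byte layout does not reproduce jasypt's exact scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for timing iterated SHA-256; not part of WebDSL or jasypt.
class DigestTiming {

    // Hash salt + password once, then re-hash the result (iterations - 1) more times.
    static byte[] iteratedDigest(byte[] salt, byte[] password, int iterations) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            md.update(password);
            byte[] digest = md.digest();
            for (int i = 1; i < iterations; i++) {
                digest = md.digest(digest); // digest(byte[]) updates, finalizes and resets
            }
            return digest;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every compliant JDK, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Wall-clock time in milliseconds for one iterated digest.
    static long timeMillis(int iterations) {
        byte[] salt = new byte[16]; // all-zero salt: fine for timing, not for real use
        byte[] password = "example-password".getBytes(StandardCharsets.UTF_8);
        long start = System.nanoTime();
        iteratedDigest(salt, password, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 100_000; // assumed to match StrongPasswordEncryptor's default
        for (int run = 1; run <= 5; run++) {
            System.out.println("run " + run + ": " + timeMillis(iterations) + " ms");
        }
    }
}
```

Running the same measurement right after a tomcat restart and again once login is slow would show whether the digest itself is slower or whether the time is lost elsewhere.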

Updates (oldest first)

The problem is JVM-wide: redeploying a single web application does not fix the long login times for that app

The problem might kick in at some point (<40h after tomcat restart) or get triggered by something, but doesn’t seem to worsen over time

May 23: Switched java version from Oracle 7 to OpenJDK 7

May 24:

http://webdsl.org homepage is slow (which it wasn’t yesterday after the switch to OpenJDK). CPU sampling reveals most time is spent on String replaces as part of the (old) markdown processor… So we still have mysterious slowdowns, but now in different parts of our applications. Same cause? No idea :(

webdsl.org app is now rebuilt against the newest webdsl and redeployed in the same tomcat instance (not restarted yet) -> still slow

Tomcat is being restarted, everything feels snappy again (as usual) and will probably get slow over time…

May 26:

Slow login is back -> switch to OpenJDK did not fix the issue.

May 27:

Tomcat restarted for now

Added external web page monitoring, checks every 30min, might reveal when slowness kicks in: pingdom status page or monitor.us status page

Reduced heap space to 10GB, tomcat restarted

May 28:

+/- 12.20h: Webdsl.org pages load really slowly, as can be observed from the monitoring status pages (timeout triggers -> monitor.us reports it as down)

First measured at 12:17:30 (pingdom time) by pingdom.

evaluaties.war got deployed at 10:13 server time (=12.13 GMT+2h)

permgen: Size: 1,270,480,896 B, Used: 717,463,056 B, Max: 2,147,483,648 B (measured at 15.15h, probably also at this level at 12.20h)

However, login is still fast.

(An increase in outgoing traffic started at about the same moment the slower page loads were detected. See zabbix graph at 28 May 12.25. Coincidence?) –> Yup, as outgoing traffic decreased later

May 30:

We redeployed reposearch to increase the permgen space usage and see if page load times increase (or login becomes slow)

Looking at the pingdom stats, we see an increase in page load times:

Around 06.30h response time +/- 15s (pingdom EU servers)

Probably around 9.45 (but guaranteed between Thu May 30 09:26 - 09:49 (00:23)): nixos-rebuild switch by rob (for backup)

Around 10.00h response time +/-16 seconds (pingdom EU servers)

Redeploy at 10:55h

Around 10.00h response time 16.5-17.5 seconds (pingdom EU servers)

The higher page load times ^ could be the result of reposearch starting its recurring tasks like search index optimization and autocompletion index renewal. But I noticed the search index optimization took about 30 minutes! That’s very slow, as previous optimizations took at most 1 minute (checked in tomcat log).

Instead of 1 tomcat instance with all apps, we now isolate the apps in separate tomcat instances

done: researchr.org , webdsl.org, yellowgrass.org, codefinder.org

rest of the apps still run in big tomcat

June 5:

Switch back to Oracle jdk 7

Conclusion so far

Slowdown seems to occur if permanent generation space usage increases above a certain value.
Current solution: isolate apps to have their own tomcat instance. Furthermore, from r5713 onwards we handle cleanup to make permgen space taken by apps GCable. See https://yellowgrass.org/issue/WebDSL/317
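Since the slowdown appears to correlate with permanent generation usage, it may help to log that usage from inside the JVM alongside the external page monitoring. Below is a minimal sketch using the standard `java.lang.management` API. The class `PermGenReport` is a hypothetical helper, and the pool name matching is an assumption: HotSpot on Java 7 exposes pools such as "PS Perm Gen" or "CMS Perm Gen", while Java 8 and later expose "Metaspace" instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports class-metadata memory pools of the running JVM.
class PermGenReport {

    // Returns one line per pool whose name looks like PermGen or Metaspace.
    static List<String> classMetadataPools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumed naming: "... Perm Gen" on Java 7 HotSpot, "Metaspace" on Java 8+.
            if (!name.contains("Perm Gen") && !name.contains("Metaspace")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            lines.add(name + ": used=" + usage.getUsed() + " B, committed="
                    + usage.getCommitted() + " B, max=" + usage.getMax() + " B");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : classMetadataPools()) {
            System.out.println(line);
        }
    }
}
```

Logging these values periodically, for instance from a recurring task in one of the deployed apps, would make it possible to line up the onset of slow page loads with a specific usage level.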
Submitted by Danny Groenewegen on 17 May 2013 at 13:41

error, !elmer

slowdownWebdslOrg.png	29 May 2013 at 08:18

On 17 May 2013 at 13:43 Elmer van Chastelet tagged !elmer

On 17 May 2013 at 15:49 Elmer van Chastelet commented:

I’m not sure if this really relates to the JDK update…

Before this update, we already had slow page loads sometimes, especially on pages with markdown on them (which prompted me to update the markdown processor).
Once tomcat got restarted, those pages loaded quickly (nothing else changed).

On 19 May 2013 at 12:34 Elmer van Chastelet commented:

Slow login/slower markdown processing/… is back again.

Login on researchr, waiting server response: 32.30s.
Loading webdsl.org homepage, waiting initial server response: 444ms

On 26 May 2013 at 11:06 Elmer van Chastelet commented:

And slow login is back :( If time permits I’ll try to debug some more this afternoon.

On 13 June 2013 at 14:05 Elmer van Chastelet commented:

No observable slowdowns anymore -> closing this issue.

On 13 June 2013 at 14:05 Elmer van Chastelet closed this issue.

On 24 June 2013 at 13:40 Elmer van Chastelet closed this issue.
