handling a few thousand simultaneous connections in TIBCO BusinessWorks

Scaling TIBCO BusinessWorks can be a bit tricky. Recently I began testing how far a web service can be scaled in terms of simultaneous connections. My first source of information was, of course, the official documentation and its proposed best-practice values for such an engine.
To start small, I tried an HTTP Receiver on a 32-bit JVM. I set the heap to the maximum possible (about 1.7 GB) and checked how many connections I could handle with that. After a few hundred (300-400) the engine always ran into an out-of-memory error. From that point on the engine was often not recoverable and had to be killed.
After that I tried my luck with a 64-bit JVM. In theory, more RAM should allow more connections, so let's go for it.
I increased the heap size to about 4 GB. With that value, the engine actually consumed about 6 GB of memory (I only had 8 GB on my test machine). Running the same test as before, memory consumption grew linearly with the connection count. That was something I didn't expect. First of all, I expected the memory needed per connection to drop as the number of connections grew (because more objects can be reused), and I expected the CPU load to rise because handling a larger heap gets more and more expensive for the JVM.
Despite that, I came close to handling about one thousand connections. This seems pretty good but was not enough for what I had in mind. The only option I saw at that point was to keep increasing the memory to get more connections running. A second concern that came to mind was thread handling. In a default Tomcat installation every connection gets its own thread. This not only consumes a lot of memory but also increases the thread count of the server engine dramatically.
With that in mind I remembered something I had read about Tomcat 6 a while ago. To handle a lot of simultaneous, long-lived JavaScript requests, Tomcat introduced a new kind of connector engine which uses the Java NIO framework. To explain this a little: Sun introduced this framework in Java 1.4 to handle non-blocking I/O for many connections from a single thread, relying mostly on OS-provided functions for memory allocation and interaction. So in theory this could be the ideal framework for handling a lot of network I/O. Tomcat introduced this feature with the c10k problem specifically in mind. So I began searching for a way to get similar behavior out of BusinessWorks.
What I didn't know at that point was that TIBCO had already introduced this feature into BusinessWorks with version 5.7. You can switch the HTTP connector engine with a parameter in the HTTP Connection resource.
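To make the single-thread idea a bit more concrete, here is a minimal sketch of the same readiness-based pattern. It is written in Python with the standard `selectors` module rather than Java NIO, and it is not how Tomcat or BusinessWorks implement their connectors. The names `run_event_loop` and `demo` are made up for this illustration.

```python
import selectors
import socket
import threading


def run_event_loop(listener, connections, stop):
    """Accept and serve every client from ONE thread using readiness events."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    while not stop.is_set():
        for key, _events in sel.select(timeout=0.05):
            sock = key.fileobj
            if sock is listener:
                try:
                    conn, _addr = listener.accept()
                except BlockingIOError:
                    continue
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                connections.add(conn)
                continue
            try:
                data = sock.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup, nothing to read yet
            except OSError:
                data = b""  # treat a reset like a closed connection
            if data:
                # Echo back; a real server would buffer partial writes.
                sock.send(data)
            else:
                sel.unregister(sock)
                connections.discard(sock)
                sock.close()
    # Shutdown: release every connection that is still open.
    for conn in list(connections):
        sel.unregister(conn)
        conn.close()
    sel.unregister(listener)
    sel.close()


def demo(n_clients=50):
    """Open n_clients connections and report how many the single thread held."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    connections, stop = set(), threading.Event()
    worker = threading.Thread(
        target=run_event_loop, args=(listener, connections, stop)
    )
    worker.start()
    address = listener.getsockname()
    clients = [socket.create_connection(address, timeout=5) for _ in range(n_clients)]
    for client in clients:
        client.sendall(b"ping")
    replies = [client.recv(4) for client in clients]
    held = len(connections)  # all clients are still connected at this point
    stop.set()
    worker.join()
    for client in clients:
        client.close()
    listener.close()
    return held, replies
```

Calling `demo(50)` opens fifty client sockets against one worker thread. The point of the design is that an idle connection costs a registered socket instead of a parked thread with its own stack, which is why the connection count is no longer tied to the thread count.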

HTTP Connection - engine selector


So I changed the connector engine and restarted the test. This time I started small: I set the heap to 512 MB and limited maxProcessors to 500. What I saw then was unexpected. The engine filled right up to the ten thousand requests I sent. No out-of-memory error occurred at all. Also interesting was that the engine held 10k connections even though maxProcessors was set to 500.
So to sum up, the new connector is quite impressive when you need to handle a lot of simultaneous connections and do not have a lot of memory to spare. On the other hand, when you use it, you lose some of the features TIBCO integrates to limit your load. In addition, the TIBCO documentation states that the single-thread architecture increases latency. So, as always, there is a tradeoff.

One final side note: I had some issues with the BusinessWorks 5.7.1 engine, so I upgraded it to 5.7.2. Then it ran without a glitch.


TIBCO Designer Panel too small

Recently I ran into a rather trivial problem which isn't really addressed by the TIBCO Designer. I had a process which wouldn't fit into the design panel. There was just not enough space on the canvas to fit in the actual flow.
After asking around I came to the conclusion that every Designer installation (among the different colleagues I work with) had a different resolution for the design canvas. Nobody knew of any property where you can set this resolution, so I began searching around.
The Designer basically uses two folders for its configuration. One is the installation folder with the designer.tra (I had already explored this one). The other one is the .TIBCO folder in your user home directory. In that folder there is a file named "Designer5.prefs".
This file consists mainly of position data for the various dialogs. In addition, it includes all the values which you can set through the Designer preferences window. Back to the actual topic, I found the following two values:

graph.height.pref=712
graph.width.pref=1515

These values represent the default size which the Designer allocates for its canvas. As you can see, the default is pretty small here. I also found installations where both values were ten times larger than this. So far I have found no performance penalty from that.
The only pattern I found is that newer installations had smaller default values. Where this value comes from and how it is determined remains unclear to me.
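If you want to change the two values without hand-editing, a small helper can rewrite them. This is only a sketch: the function name `set_canvas_size` is made up, and it assumes that the Designer reads these keys at startup and accepts larger values written from outside, which matches what I saw on other installations but is not documented anywhere I know of.

```python
def set_canvas_size(prefs_text, width, height):
    """Return prefs_text with the two canvas keys set to the given values."""
    wanted = {"graph.width.pref": width, "graph.height.pref": height}
    seen = set()
    lines = []
    for line in prefs_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in wanted:
            # Replace the existing entry, keep its position in the file.
            line = f"{key}={wanted[key]}"
            seen.add(key)
        lines.append(line)
    # Append any key that was missing from the file.
    for key, value in wanted.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

To use it, close the Designer, keep a backup copy of Designer5.prefs from the .TIBCO folder, read the file into a string, pass it through `set_canvas_size(text, 15150, 7120)` (ten times the values shown above) and write the result back.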

One other thing I found during my research: if the panel is too small for the current process, you can drag one activity right next to the border and then move it further with the keyboard ('Shift + Cursor'). By doing so you move the icon out of the canvas, but the position is still updated internally. After that you just need to refresh the process view and voilà, you have expanded the canvas size. But this is just a quick and dirty workaround.


searching for hash strings in postgres

For one of my projects I have a database with a rather large table consisting of just a URL and a corresponding id. For performance reasons I added an md5 column which holds the hash of the URL. With this column it should be a lot faster to look up a URL.

CREATE TABLE pages
(
  id bigint NOT NULL,
  url character varying(255),
  md5 character(32),
  CONSTRAINT pages_pkey PRIMARY KEY (id)
)

The faster lookup should mainly be possible through the shorter column length (and therefore the smaller index). Actually I don't know if the fixed width is good or bad here, but hashes usually don't vary in length. After creating this table I added a unique B-tree index on the md5 column to enable fast lookups.
After a while I noticed a rather high CPU load on lookups against this table, so I tried to analyze the problem. First I tried the obvious through psql.

cloud=# explain analyze select * from pages where md5 ='abc';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using i_pages_md5 on pages  (cost=0.00..8.50 rows=1 width=166) (actual time=0.046..0.046 rows=0 loops=1)
   Index Cond: (md5 = 'abc'::bpchar)
 Total runtime: 0.157 ms
(3 rows)

As the explain output shows, everything works perfectly fine and lookups shouldn't be a problem. So something else had to be going on.

After that I tried the same select with an actual md5.

cloud=# explain analyze select * from pages where md5 = md5('abc');
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on pages  (cost=0.00..32017.63 rows=3994 width=166) (actual time=1203.699..1203.699 rows=0 loops=1)
   Filter: ((md5)::text = '900150983cd24fb0d6963f7d28e17f72'::text)
 Total runtime: 1203.769 ms
(3 rows)

Now you can see that the plan changes quite a lot. I get a full table scan instead of an index scan. You can also see that the query time increases by a factor of roughly 7,700 (from 0.157 ms to about 1,204 ms).

The reason for this dramatic change is a simple type mismatch. The md5 function returns a string of type text. To compare it with the md5 column, every column value has to be cast to text, which is what the filter line `(md5)::text` shows. The side effect is that the index can no longer be used, because it was built on the bpchar values and not on their text form.

To solve this I just had to cast the result of the md5 function back to something that is compatible with the index type. In my case the column is a fixed-width character field, which is represented in the database as bpchar (blank-padded character). So after modifying the query as follows I was back to index usage.

cloud=# explain analyze select * from pages where md5 = md5('abc')::bpchar;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using i_pages_md5 on pages  (cost=0.00..8.50 rows=1 width=166) (actual time=0.141..0.141 rows=0 loops=1)
   Index Cond: (md5 = '900150983cd24fb0d6963f7d28e17f72'::bpchar)
 Total runtime: 0.199 ms
(3 rows)
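The same trap can show up in application code, so here is a small sketch of how a client could build the lookup with the cast always in place. It assumes a DB-API driver that uses `%s` placeholders (psycopg2 style) and a UTF-8 database, so that the hash computed on the client matches what md5() would return on the server. The helper name `url_lookup_query` is made up.

```python
import hashlib

# The explicit ::bpchar cast keeps the comparison on the column's own type,
# so the planner can use the index on the md5 column.
LOOKUP_SQL = "SELECT id, url FROM pages WHERE md5 = %s::bpchar"


def url_lookup_query(url):
    """Return (sql, params) for an index-friendly lookup of a URL by its hash."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return LOOKUP_SQL, (digest,)
```

With such a driver the pair can be passed on as `cursor.execute(*url_lookup_query(some_url))`. Running the resulting statement through explain analyze once is still a good idea to confirm that the index scan is really chosen.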
