Gangmax Blog

layout: post
title: Shenzhen POC Summary
date: 2012-08-14 13:22
comments: true
tags:

work
published: false

Background

Shenzhen Mobile’s application is a pure C++ program. It takes queue messages from the message queue server and text files from the SSH server as input, performs some population and verification, then writes the results to the Oracle database and puts queue messages back on the message queue server.

At the customer’s request, the customer provides a dummy program for this POC. It reads input queue messages from the message queue server and text files from the given SSH server, then writes output queue messages to the message queue server as the result. We don’t need to consider the database part in this POC.

The customer wants a customized scaling policy: IWD must monitor the input queue length and scale in or out according to the predefined scaling policy metrics.

The message queue server which the customer is using is “WebSphere MQ Version 7.0”.

The customer’s application runs on Linux. We don’t know the exact version, but the customer builds the dummy application for RHEL 6.0 x86-64, which is the version IWD will use in the POC.

Our Work

Environment setup
1. Prepare the message queue server: create a VM, install and configure the message queue server, and create the required queues;
2. Prepare an SSH server: create a VM with SSH access and create text files for testing.
Plug-in development
1. Create the new pattern type and plug-in projects, metadata, transformer and other configuration files;
2. In the new plug-in, write the code to install the message queue client, which the customer’s application needs;
3. Write the code to install the customer’s application and start it;
4. Scaling policy: write a program that queries the input queue length from the message queue server, and register it with the IWD monitoring framework, which then gathers the queue length data periodically and scales in or out as defined.
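
The decision step of the scaling policy in item 4 can be sketched in shell. This is only an illustration under stated assumptions: the function names, the thresholds, and the queue and queue manager names are hypothetical, and it does not show how a collector is registered with the IWD monitoring framework. It assumes the queue depth is read from the `CURDEPTH` attribute that the MQSC `DISPLAY QLOCAL` command reports.

```shell
#!/usr/bin/env bash
# Sketch only: function names and thresholds are hypothetical, not IWD APIs.

# Extract the number from an MQSC "CURDEPTH(n)" attribute read on stdin.
parse_curdepth() {
    sed -n 's/.*CURDEPTH(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Print the action for a queue depth, given low and high thresholds.
scale_decision() {
    local depth=$1 low=$2 high=$3
    if [ "$depth" -gt "$high" ]; then
        echo "scale_out"
    elif [ "$depth" -lt "$low" ]; then
        echo "scale_in"
    else
        echo "none"
    fi
}

# Example wiring, run where runmqsc is available (INPUT.Q and QM1 are placeholders):
#   depth=$(echo "DISPLAY QLOCAL(INPUT.Q) CURDEPTH" | runmqsc QM1 | parse_curdepth)
#   scale_decision "$depth" 10 100
```

Keeping the parsing and the decision in separate functions means the thresholds can be tuned without touching the code that talks to the queue server.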

Challenges

Runtime library issue

The customer’s application needs some libraries provided by the message queue client at runtime. The first time we started the customer’s application after installing the message queue client on the VM, it reported the error “error while loading shared libraries: libimqc23gl.so: cannot open shared object file: No such file or directory”. “libimqc23gl.so” is a library file provided by the queue client, and it does exist on the VM.

It took us a lot of time to figure out that the customer’s application attempts to load 2 library files from a location other than where the files actually are. Once we knew that, the fix was simple: create symbolic links at the location the application expects, so that it can load the libraries correctly.
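
A minimal sketch of that fix follows. The helper name `link_library` is hypothetical, and the directories are left as arguments because the real install path and the path the application searches are not recorded here.

```shell
#!/usr/bin/env bash
# Sketch only: link_library is a hypothetical helper; pass the real directories in.

# Create dest_dir/lib_name as a symbolic link to src_dir/lib_name.
link_library() {
    local src_dir=$1 dest_dir=$2 lib_name=$3
    if [ ! -e "$src_dir/$lib_name" ]; then
        echo "missing: $src_dir/$lib_name" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    ln -sf "$src_dir/$lib_name" "$dest_dir/$lib_name"
}

# To list the libraries the loader cannot resolve for a binary:
#   ldd ./ValsrvDemo | grep "not found"
```

Running `ldd` on the binary first is usually the quickest way to see which library names the loader fails to resolve.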
Queue client access issue

When the customer’s application accessed the queue server, it got a permission error. After consulting our MQ engineer, we learned that some account configuration is needed on the queue server side.
Agent process hung after starting the customer’s application as a background process

In the role life cycle scripts, the plug-in calls a bash script to install the customer’s application and start it as a background process. But we found that the agent hung after executing the line that calls the bash script (at “com.ibm.maestro.common.utils.CommandInvoker.java”, line 156, found with a “kill -3” thread dump).

Finally I found the proper way to resolve this hang. All we need to do is change the line that starts the background process in the shell script, like this:

./ValsrvDemo >/dev/null 2>&1 &

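The redirection matters because the agent appears to read the script’s output in a loop like the one below. A background process that inherits the script’s output stream keeps it open, so read() never returns -1 and the loop never ends:

while ((len = iStream.read(buffer)) != -1) {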
    if (os != null) {
        os.write(buffer, 0, len);
    }
}
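
The same effect can be reproduced in plain shell, where command substitution also reads until the output stream is closed. This is a sketch under that assumption; `start_detached` is a hypothetical helper, not part of the plug-in.

```shell
#!/usr/bin/env bash
# Sketch only: start_detached is a hypothetical helper, not part of the plug-in.

# Start a command in the background with stdout and stderr detached, so a
# caller that reads this script's output until end-of-stream is not kept waiting.
start_detached() {
    "$@" >/dev/null 2>&1 &
}

# Blocks for about 5 seconds: the background sleep inherits the pipe.
#   out=$(sleep 5 &)
# Returns at once: the background sleep no longer holds the pipe.
#   out=$(start_detached sleep 5)
```

Redirecting to /dev/null discards the application’s output; redirecting to a log file instead would detach it from the caller in the same way.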

After this new plug-in was deployed to IWD 3.1.0.2, it did not work properly. This was a defect, and it has now been fixed.

Result

On Aug 8, we set up the demo environment at the customer’s building. On Aug 9, we presented a successful demo to the customer’s team.

During the demo, the customer mentioned that they want a third-party system to be able to call our API to create a “Cloud Group” and an “IP Group”, which IWD supports.

While setting up the demo environment, we hit 2 issues, listed here with their fixes:

The queue client couldn’t access the queue server.

This happened because the “iptables” service had not been stopped on the VM hosting the queue server.
The agent on the VM didn’t execute.

This happened because we hadn’t disabled the invalid IP address of IWD itself in the system configuration. When the agent downloaded artifacts from SH (Store House), it used the invalid URL instead of the right one. Removing the invalid IP address of IWD itself resolved this problem.
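
For the first issue, a quick reachability check run from the client VM can tell a blocked port apart from an application-level error. This is a sketch only: `port_open` is a hypothetical helper, the host name is a placeholder, and 1414 is the default WebSphere MQ listener port, which may differ from the one used in the POC. It assumes bash (for `/dev/tcp`) and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Sketch only: port_open is a hypothetical helper; host and port are placeholders.

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example (1414 is the default WebSphere MQ listener port):
#   port_open mq-server.example 1414 || echo "listener unreachable, check the firewall on the server"
```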

Thoughts

About the Plug-in Development

Some conventions are not obvious to a beginner. For example, in the template-based VM template JSON configuration file, the “type” attribute must match a role name. I should have read more documentation before I started, but implicit conventions like this make it harder for a beginner to get going.
For a beginner, debugging may take a lot of time. Some lessons learned:
1. Use an IWD instance in the local network. It saves a lot of time when deploying a plug-in larger than 100MB. I deployed a 126MB plug-in to an IWD instance in the remote “172.16.*.*” network, and the upload took more than 20 minutes.
2. In a normal debugging cycle, you build the plug-in, deploy it, create a corresponding pattern type, start an instance, wait for the VM to start and the scripts to execute, and then you may find something wrong. You change some code, undeploy the existing plug-in from IWD, and go through the whole process again. One cycle takes at least 15-20 minutes before you see the result of your modification. However, we can save some time with another approach, which I didn’t know about at the very beginning: “Green iterative plugin development without redeploy through reuse”. This approach saves the time the hypervisor needs to start a new VM.

About the PDK Eclipse Tool

The PDK Eclipse tool is very helpful for creating new “pattern type/plug-in” project(s), and very efficient for building the plug-in package, deploying it to IWD and undeploying it from IWD:
1. New project wizard;
2. Topology provider wizard;
3. Build/Deploy/Remove functions, which make building and deployment easy and save the developer a lot of time.
The JSON editor provided by the PDK Eclipse tool is helpful when editing JSON files; it makes the JSON clearer and reduces defects.
Some features could be added to the PDK Eclipse tool to speed up development further:
1. Control of the VM debugging process, such as re-running the role life cycle script;
2. Downloading and viewing the log files from Eclipse instead of logging in to the VM to read them;
3. An integrated SSH client in Eclipse.
