Grabbing String From Dynamic Content

A few days ago I was asked to debug someone’s code. The problem lay in a small function of around 5 lines. Its purpose is to grab a URL from the src attribute of an img tag within an RSS feed. This is a PHP project. The function used a mixture of substr and str_replace. First, the tag name and part of the attribute are replaced with an empty string, e.g. str_replace('<img src="', '', $input). Finally, substr is used to extract the URL.

When I looked into why the application was not working, I found that there were some extra white spaces in front of the extracted URL string. In the end, we fixed it by applying a trim function. I dislike this solution.

In this particular situation, the task is to extract an image URL from an RSS feed. The best solution is to use an XML parser, such as SimpleXMLElement, to locate and extract the attribute value. Using simple string searching functions is fragile, because a slight change in the input can easily cause a bug. An XML parser can extract content accurately even if there is irregular spacing or a minor change in how the tags are arranged.
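As a rough illustration of the parser approach, here is a minimal sketch using SimpleXMLElement and an XPath query. The function name extractImageUrl, the sample feed item, and the example.com URL are made up for this example. It also assumes the img tag appears as well-formed XML inside the item; feeds that escape their HTML inside the description would need a second parsing step.

```php
<?php
// Hypothetical helper: returns the src of the first <img> element found
// anywhere in a well-formed XML string, or null if there is none.
function extractImageUrl(string $xml): ?string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        $doc = new SimpleXMLElement($xml);
    } catch (Exception $e) {
        // The input could not be parsed as XML.
        return null;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }

    // Select every <img> element that carries a src attribute.
    $images = $doc->xpath('//img[@src]');
    if (!$images) {
        return null;
    }

    return (string) $images[0]['src'];
}

// Made-up sample with irregular spacing and an extra attribute before src.
$sample = <<<XML
<item>
  <title>Example</title>
  <description>
    <img   alt="logo"
           src="http://example.com/logo.png" />
  </description>
</item>
XML;

echo extractImageUrl($sample), PHP_EOL;
```

The spacing and the order of the attributes do not matter here, because the parser works on the structure of the document rather than on the exact characters.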

For unstructured text, regular expressions are a good alternative. The sad thing is, many programmers do not know regular expressions. They may be hard to learn, but they are an extremely powerful tool for string searching! I am not going to talk about why regular expressions are so powerful; there are many articles on this topic already.
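For comparison, a regular expression version could look something like the sketch below, using preg_match. The function name findImageSrc and the sample input are made up, and the pattern is only one of many possible ways to write it. It will not cope with every kind of markup, for example unquoted attribute values.

```php
<?php
// Hypothetical helper: returns the src value of the first <img> tag in a
// chunk of text, or null when nothing matches.
function findImageSrc(string $text): ?string
{
    // "<img", then any other attributes, then src = "value" or 'value'.
    // The \s* parts tolerate irregular spacing, and the i flag ignores case.
    // \1 requires the closing quote to be the same as the opening one.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])\s*(.*?)\s*\1/is';

    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[2];
    }

    return null;
}

// Made-up input with an uppercase tag name, extra spaces, and a leading
// space inside the attribute value.
echo findImageSrc('<p>Hi</p><IMG  alt="x"   src = " http://example.com/a.png" >'), PHP_EOL;
```

The pattern absorbs the extra white spaces itself, so there is no need to patch the result with trim afterwards.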

NOTE: I am not implying that regular expressions and XML parsers are the best solutions to all string searching problems. It depends on the requirements. Although PHP’s native string searching functions are less flexible, they are generally much faster than regular expressions. When making a decision, I consider the performance and how structured the input is.

Posted in Tao Of Programming at April 12th, 2010.

[Linux] Aventail VPN Client 8.90.263

In order to connect to my company’s intranet, I need to install an Aventail client. I downloaded the package, unzipped it, and executed install.sh. The installation process is as easy as ABC. The application is installed in the /usr/local/Aventail/ folder. You can run the client by executing either startct or startctui. The startctui is a Java GUI client.

When I tried to establish a connection, it just couldn’t connect to my company’s VPN gateway. The /var/log/AvConnect.log log file told me that the handshake failed. After some googling, I found the solution at http://just-another.net/2008/11/20/ubuntu-intrepid-and-aventail-ssl-client/.

The problem lies in libssl and libcrypto from the OpenSSL package. I am running CrunchBang (an Ubuntu-based distro). The OpenSSL offered in the repository is version 0.9.8g. Running “ldd /usr/local/Aventail/AvConnect” shows that AvConnect depends on /usr/lib/libssl.so.0.9.7, which is a symbolic link pointing to libssl.so.0.9.8. The same goes for libcrypto.

Solution:

  1. Download OpenSSL 0.9.8i.
  2. Compile it; installation is not necessary.
    Compilation generates libssl.so and libcrypto.so.
  3. Unlink the original /usr/lib/libssl.so.0.9.7 and /usr/lib/libcrypto.so.0.9.7.
  4. Create symbolic links to the newly created libssl.so and libcrypto.so.

This is just a summary. The detailed steps are documented at http://just-another.net/2008/11/20/ubuntu-intrepid-and-aventail-ssl-client/. When unlinking and creating links in /usr/lib, make sure you are doing it as root.

At the time of writing, the latest stable version of OpenSSL is 0.9.8l. It does not work, so you should stick to 0.9.8i! Version i works, but g and l do not. I guess something must be broken in the later build. I hope the OpenSSL team will rectify the problem soon.

Posted in Blog at December 4th, 2009.