iOS annoyances

So I’ve just had an epic fail of my HTC Desire. Honestly though, it’s been my fault: I broke the USB connection at the bottom of the phone. As a result, I’m left having to make do with the collection of iOS devices I have kicking around. First need: a night stand clock with alarm, like the one that comes as standard on Android. Is it built into iOS? No.

A quick hunt round the market and a download of a few of the top rated candidates. They all have one major failing: they don’t seem to be able to dim the backlight on either the iPod or the iPad, and are therefore totally useless when sitting next to your bed. I guess I’m not surprised, as Kindle on iOS doesn’t seem to be able to dim the backlight either.

Solution? Go to iOS settings, brightness, disable auto brightness, manually set brightness to minimum, then launch the night stand app. Wow. Such a pleasant user experience. Got to remember to adjust the brightness back in the morning.

I’ll make these kinds of annoyances and limitations imposed on users by iOS the subject of a later blog post. A friend commented that I should get an iPhone to replace the HTC. While it’s a nice phone, it’s just that, a phone. I need a computer with me, something that does what I want, not what some guys in Cupertino want.

Estimates != Reality

All too often in software development, we are asked to provide estimates for upcoming pieces of work. Invariably, these estimates are then incorporated into a plan, which is then taken as gospel to be the timeline for when the feature will be delivered. Should the feature not be delivered on said date, the question asked is “Why is the feature late?”, but that’s asking the wrong question. The right question is “Why were the estimates taken as firm dates?”

There are many factors that go into providing an estimate for the duration of a task, but ultimately, unless the task is well defined and there is a record of the time taken to implement similar features, the estimate will have a high margin of error. Unfortunately, Gantt charts can’t cope with margins of error, so the times fed into them are seen as absolute. Everyone is happy because there’s now a nice little graphical representation of how the project will progress, it can be tracked, and people can tick boxes and sign off on it.

Everyone is happy… except the developer. His stated caveats, concerns around the lack of prior examples, and concerns about the research and exploratory nature of the tasks are all ignored. The result is that the developer will either have too much time (leading to questions), or too little time (leading to a lot of late nights), and will end up dying a little more inside having to deliver the feature to an enforced timeline.

Inaccuracy in estimates can be caused by many things, though some common examples include:

  • Not estimated by the developer doing the work
  • Estimated before enough detail is known
  • Task is not of fine enough grain to be visualised and estimated
  • Lack of prior art and estimates vs. reality for similar features
  • No allowance for unforeseen circumstances
  • Lack of understanding of just what the feature is supposed to do or how it would be implemented

etc…

Estimating the complexity level of a particular feature is the best that can be achieved at the point where software features are typically estimated. Estimating duration/effort is usually going to result in the output of a random number generator.

The only way that I have found to get some semblance of a reasonably accurate estimate is to get the developer who’ll be implementing the task to estimate how long it would take just before they commence on the task (i.e. when all the requirements are in place, etc). It doesn’t fit well with charts and long term plans, but it does have at least a level of accuracy.

It’ll be done when it’s done.

Spring, Portlets and Annotation Configuration

There’s one thing that is easy to miss when setting up a Portlet using Spring MVC and Annotation configuration, and that is the correct placement of the <context:annotation-config /> element. In the examples that I found, the element is only located in the applicationContext.xml file. Unfortunately this doesn’t always work as you might expect.

In Spring MVC Portlets, the applicationContext file is used to define beans that should be available to all the portlets in that portlet application (or WAR). Similarly it also makes these beans available to any servlets that are defined in the WAR. So far so good, this is working correctly, and is exactly how you would expect it to work.

The problem arises when using Annotation Configuration, and it can be difficult to trace back to exactly what might be causing it. If you define <context:annotation-config /> in just the applicationContext.xml file, then the annotation configuration will appear to work (portlet initialization will work, and the portlets should start up appearing to be fully wired). However when the portlets are used, they will start to throw NullPointerExceptions due to beans not being wired. It appears that @Autowired and @Resource are not being interrogated when the portlet is running, but worked fine when it was being initialized.

Attempting to diagnose this issue will have you looking at classpaths, ensuring that there are no duplicated JARs loaded by parent and child classloaders, tracing the Spring logs for wiring errors, validating the presence of spring-aspects.jar, etc, etc, etc. While these are all valid approaches, and can be a good way to clean up an application, in all likelihood what is wrong is that the <context:annotation-config /> element is missing from the individual portlet’s configuration file. Leaving this element out means that the Spring autowiring won’t be enabled when the portlet is using the portal server’s classloader, but was enabled when the portlet was initially loaded by the application server’s classloader. At least, this appears to be what’s happening in JBoss Portal Server; your mileage may vary with other servers.
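One way to make this failure easier to spot is to replace the bare NullPointerException with a message that names the missing bean. The sketch below is plain Java with no Spring on the classpath; AccountPortlet, AccountService and requireWired are hypothetical names, and in a real portlet the field would carry @Autowired. The check is made at the point of use rather than in an annotation-driven init callback, on the assumption that such callbacks rely on the same configuration element.

```java
/** Hypothetical collaborator that Spring would normally inject. */
interface AccountService {
    String displayName(String userId);
}

/** Hypothetical portlet handler; in a real portlet the field would be @Autowired. */
final class AccountPortlet {
    private AccountService accountService; // stays null if autowiring never ran

    void setAccountService(AccountService accountService) {
        this.accountService = accountService;
    }

    /** Fails with a message that names the missing bean and the likely cause. */
    private static <T> T requireWired(T bean, String name) {
        if (bean == null) {
            throw new IllegalStateException(name
                + " was not injected; check that <context:annotation-config /> "
                + "is present in this portlet's own context file");
        }
        return bean;
    }

    /** Stand-in for a render handler that uses the injected bean. */
    String render(String userId) {
        return "Hello " + requireWired(accountService, "accountService").displayName(userId);
    }
}
```

With this in place, an unwired portlet fails with a message pointing at the configuration rather than a stack trace ending in a null dereference.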

How do you ensure that your annotated portlet will be wired correctly? Simple, add these to both your applicationContext.xml and xxx-Portlet.xml files:

<context:annotation-config />
<context:spring-configured />
<aop:aspectj-autoproxy/>

Note: In the case of the xxx-Portlet.xml files, it’s a good idea to create a common.xml file, then import that in each of the individual portlet files. This allows for commonality between all the portlets, even though they will get their own instance of any beans defined in it.
<import resource="classpath:/com/foo/bar/context/common.xml"/>

Log at the correct level

We all know the importance of having good, clear, and useful log messages in our applications (if not then I suggest you read "Release It" from the Pragmatic bookshelf). However it is not always apparent that log messages are written at the correct level.

For example, Log4J has a number of log levels: trace, debug, info, warn, error and fatal. While it looks like it’s clear what you should log at each level, some major players get it wrong.

I’ve spent the last day looking into issues in JBoss Portal where my custom login module wasn’t working correctly. There was nothing unusual in the log, nor were any errors written to the screen, no indication as to what the problem might have been. After much staring at the config files, it turned out that the package name was wrong, resulting in a ClassNotFoundException. 2 seconds to fix and a few minutes to start the portal and we’re elected.

However, all of this could have been avoided if JBoss was logging something as important as a ClassNotFoundException at a level other than TRACE (see JIRA, TWITTER). At least ERROR, and probably even FATAL (given the severity of not having a login module) would have been appropriate.

The same issue can be seen with exceptions from the portlets themselves. JBoss logs these at INFO level, which is inappropriate, as an exception emanating from the portlet’s execution stack will have resulted in the user experiencing an error.

So what are appropriate uses of the log levels?

TRACE – method entry/exit points, logging of method parameters, and any other information that is only useful when tracing why something has gone wrong. This level will generate orders of magnitude more data than the other levels, so it should only be enabled selectively on classes when diagnosing issues. Of course, that is not to say that it should only be written selectively: it should be everywhere (introduced using Aspects for the majority of cases), and for performance reasons it must be wrapped in if (logger.isTraceEnabled()) when the message is built using string concatenation.

DEBUG – information relevant for the majority of debug cases, where we aim to get to the bottom of the problem without resorting to TRACE logging. E.g. database access information, generated SQL statements, etc. As with TRACE, these messages must be wrapped in if (logger.isDebugEnabled()).

INFO – informative messages from the application. e.g. Database insert took 500 ms, or HTTP POST received containing 600 bytes. These messages should be wrapped in if (logger.isInfoEnabled()).

WARN – something MAY be going wrong, but we’ve managed to recover from it for now. e.g. HTTP connection dropped, retrying.

ERROR – something HAS gone wrong, but its limited to the scope of a single user, or single operation. e.g. Database constraint violation, key already exists.

FATAL – something HAS SERIOUSLY gone wrong. These errors affect multiple users, or multiple operations on the system. They usually need immediate action as the system may no longer be in operation. e.g. Database error: Database has gone away.
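The guard described for TRACE, DEBUG and INFO can be sketched as follows. To keep the example runnable without Log4J on the classpath it uses java.util.logging, with FINEST standing in for TRACE; that mapping, and the GuardedLogging class with its counter, are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    /** Counts how often the (potentially expensive) message is actually built. */
    static int messagesBuilt = 0;

    /** Builds the trace message using string concatenation. */
    static String describe(Object[] params) {
        messagesBuilt++;
        return "enter process(" + Arrays.toString(params) + ")";
    }

    static void process(Object... params) {
        // FINEST stands in for TRACE here (an assumption of this sketch).
        // The guard means the message is only built when it will be logged.
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest(describe(params));
        }
        // ... the real work of the method would go here ...
    }

    static void setLevel(Level level) {
        LOG.setLevel(level);
    }
}
```

With the logger at INFO, calling process never builds the message at all, which is the point of the guard.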

Using these simple rules, it’s not difficult to choose the correct level to log at. By doing so you’ll make life easier for the next guy.
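For the three problem levels, the rules boil down to two questions: did we recover, and how far does the damage reach? A minimal sketch, where LevelChooser and its two flags are hypothetical names and the decision order is my reading of the rules above:

```java
final class LevelChooser {
    enum LogLevel { WARN, ERROR, FATAL }

    /**
     * @param recovered            true if the application recovered (e.g. a retry is under way)
     * @param affectsMultipleUsers true if more than one user or operation is affected
     */
    static LogLevel forProblem(boolean recovered, boolean affectsMultipleUsers) {
        if (recovered) {
            return LogLevel.WARN;   // something MAY be going wrong
        }
        // Not recovered: the scope of the failure decides between ERROR and FATAL.
        return affectsMultipleUsers ? LogLevel.FATAL : LogLevel.ERROR;
    }
}
```

By this reading, a missing login module is unrecovered and affects every user, so it lands at FATAL.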

What the Spring template should really do

Spring provides a wonderful mechanism to standardise the way that connections to an underlying database are managed. The JDBC Template abstracts the developer from having to understand any particular vendor, and provides a neat way to translate vendor specific SQL errors into a standardised format.

It doesn’t go far enough. SQL can no longer be treated as a standard, but instead should be viewed as a recommendation. The differences in SQL dialects between vendors are unbelievable, and inexcusable. However there isn’t a standardised translation service available. In an earlier example I outlined the different SQL statements required to allow MySQL and Oracle to insert the same row into the database:

MySQL

INSERT INTO TABLE1(date, username)
VALUES ('2009-01-01', 'fred');

Oracle

INSERT INTO TABLE1(date, username)
VALUES (TO_DATE('01-JAN-2009'), 'fred')

Clearly it is undesirable to have to write these different SQL statements in an application; there is too much scope for error, and for poor testing to let a faulty statement through. We could use Hibernate, represent the whole thing in objects, mapping files and HQL, but that’s the programming equivalent of taking a sledgehammer to a nut. It would work (usually) but it will certainly not be pretty, nor lightweight.

What is really needed is a translator, similar to the concept of Hibernate’s dialects. The translator could be part of the template, either determined automatically from the database, or wired in by configuration. Developers would then simply execute statements on the template as normal, with it handling the conversion to the relevant dialect for the database:

template.update(
    "INSERT INTO TABLE1(date, username) "
    + "VALUES ('2009-01-01', 'fred');");

This would make the JDBC template a universal fit for accessing databases.
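To make the idea concrete, here is a rough sketch of what such a translator hook might look like. None of this exists in Spring: SqlDialect, MySqlDialect, OracleDialect, DialectTranslator and the :date placeholder are all hypothetical, and the Oracle rendering simply mirrors the TO_DATE form used in the earlier example, which depends on the session’s date format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical dialect hook; not part of Spring's JdbcTemplate. */
interface SqlDialect {
    /** Renders a date as a SQL expression the target database accepts. */
    String dateLiteral(LocalDate date);

    /** Whether a trailing ';' may be sent with the statement. */
    boolean acceptsTrailingSemicolon();
}

final class MySqlDialect implements SqlDialect {
    public String dateLiteral(LocalDate date) {
        return "'" + date + "'"; // LocalDate prints as ISO 8601, e.g. 2009-01-01
    }

    public boolean acceptsTrailingSemicolon() {
        return true;
    }
}

final class OracleDialect implements SqlDialect {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH);

    public String dateLiteral(LocalDate date) {
        // Mirrors the form from the earlier example; relies on the session date format.
        return "TO_DATE('" + FMT.format(date).toUpperCase(Locale.ROOT) + "')";
    }

    public boolean acceptsTrailingSemicolon() {
        return false;
    }
}

final class DialectTranslator {
    /** Replaces the hypothetical :date placeholder and trims the ';' where needed. */
    static String translate(String sql, LocalDate date, SqlDialect dialect) {
        String out = sql.replace(":date", dialect.dateLiteral(date)).trim();
        if (!dialect.acceptsTrailingSemicolon() && out.endsWith(";")) {
            out = out.substring(0, out.length() - 1).trim();
        }
        return out;
    }
}
```

A template wired with the Oracle dialect would turn the one statement written by the developer into the TO_DATE form without a trailing semicolon, while the MySQL dialect would leave it essentially untouched.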

SQL is English

As we all know, the ANSI SQL standard is now the computer equivalent of English. It’s claimed to be universal, and major databases claim to speak it; but alas, much like English has now splintered into several derivative languages, SQL has many dialects.

I recently hit this issue when I tried to migrate a simple application from using MySQL to Oracle. Yes, I know it’s going backwards; the reasons for the move were beyond my control. This application made the simplest of simple calls to the database, e.g.

INSERT INTO TABLE1(date, username)
VALUES ('2009-01-01', 'fred');

Yes, it’s that basic. One would assume that this would work on all major databases that claim to speak SQL, but you would be wrong. Running this on Oracle (using the thin client) produced an error. Apparently the ; at the end of the statement is not recognised. Hmm… Ever seen the ANSI SQL-92 standard, Oracle? So remove it and we move to iteration 1 of the conversion to Oracle:

INSERT INTO TABLE1(date, username)
VALUES ('2009-01-01', 'fred')

Does it run? No.

What’s wrong this time? The date. Let’s see, the date is in ISO 8601 format, so it must need something else. Aha, the TO_DATE function, ok:

INSERT INTO TABLE1(date, username)
VALUES (TO_DATE('2009-01-01'), 'fred')

Invalid month.

What? 01 is an invalid month? January anyone? Resort to Google. Ok, different format for dates. Seems like ISO 8601 dates are not welcome in the land of Oracle.

INSERT INTO TABLE1(date, username)
VALUES (TO_DATE('01-JAN-2009'), 'fred')

Success! Unbelievably 3 iterations of this simple SQL statement were required to get it to work on Oracle, when it was previously working on MySQL.
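One way to sidestep both the semicolon and the date literal is to let the JDBC driver do the formatting by binding parameters. The following is a sketch only: PortableInsert is a hypothetical class, the table and column names are copied from the statements above (rename any that are reserved words in your database), and I haven’t run it against either database.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;

final class PortableInsert {
    // No trailing ';' and no date literal: both were dialect-specific in the walkthrough above.
    static final String SQL = "INSERT INTO TABLE1(date, username) VALUES (?, ?)";

    /** Converts an ISO 8601 date string into a JDBC date value. */
    static Date toSqlDate(String isoDate) {
        return Date.valueOf(LocalDate.parse(isoDate));
    }

    /** Binds the values as parameters so the driver handles the date representation. */
    static int insert(Connection connection, String isoDate, String username)
            throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(SQL)) {
            ps.setDate(1, toSqlDate(isoDate));
            ps.setString(2, username);
            return ps.executeUpdate();
        }
    }
}
```

A call such as PortableInsert.insert(connection, "2009-01-01", "fred") then carries the date as a typed value rather than as text that each database parses differently.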

I’ve left out the other example which made use of an autoincrement column in MySQL. Selecting from a sequence from some table named DUAL just makes no sense. Why DUAL? How did the Oracle guys come up with this? It breaks my heart.

Seriously database people, this is the equivalent of me going up to the Queen of England and saying “I CAN HAS CHEEZBURGER?”

Cybersource on Simplifying ePayments

Cybersource are one of the leading e-commerce middleware providers, and this week their chief technical officer gave an interesting presentation on epayments. While much of the latter half of the presentation was about Cybersource’s products and the benefits that they bring, the first half gave an insight into just what can be expected if you try to add your own payment handling capabilities to your product.

Cheap credit cards

Perhaps the most enlightening part of the entire talk was that it’s possible to buy valid credit card numbers on the web for as little as £1. Cards with the supposedly secure CCV number can be had for less than £5, while a card with its PIN can be had for £10. That’s scary! With prices so low, it’s possible to see the potential volume of fraudulent transactions your web site may be subject to. It’s also easy to understand why around 50% of shopping carts are abandoned at the payment stage. People simply don’t feel secure entering their details online.

Collecting payments

Still thinking of the DIY solution to collecting your own payments online? At this stage I must admit that I was in two minds, sure there’s the potential for fraud, but it can’t be too hard to detect. And anyway the banks back credit cards against fraud, so you’re covered there (although this assumption was later proved to be incorrect as you are in a cardholder not present scenario). Surely collecting some card details and transmitting a payment can’t be too difficult, banks have been doing this for some time, they must have a standardized interface?

The talk moved on to cover the pitfalls of trying to implement a DIY solution to collecting payments, after which I was left with the feeling that I certainly didn’t want to try to do this myself.

Payment acceptance

Obvious really, you need to accept a payment for the product/service you are trying to sell. However that payment needs to go somewhere, and will probably go into a merchant bank account whereupon the bank will charge you around 2.5% of the value. Ouch. What’s more you will likely need to have accounts for each currency you want to collect payments in, or suffer more charges on currency conversion.
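To put a number on that charge, here is the arithmetic as a small sketch. The 2.5% rate is the approximate figure quoted above; MerchantFee and the rounding to two decimal places are assumptions for illustration, and real banks may round or tier their fees differently.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class MerchantFee {
    /** Approximate rate quoted in the talk: 2.5% of the transaction value. */
    static final BigDecimal RATE = new BigDecimal("0.025");

    /** Fee charged on an order, rounded to two decimal places (an assumption). */
    static BigDecimal fee(BigDecimal orderValue) {
        return orderValue.multiply(RATE).setScale(2, RoundingMode.HALF_UP);
    }

    /** What actually reaches the merchant account after the fee. */
    static BigDecimal net(BigDecimal orderValue) {
        return orderValue.subtract(fee(orderValue));
    }
}
```

On a 40.00 order that is a fee of 1.00, leaving 39.00, before any currency conversion charges are taken into account.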

But what about a standardized interface for banks to collect payments? Apparently not, each payment vendor may well be using their own unique proprietary interface. This in itself is a bit of a headache but could be managed with the relevant abstractions in the code. Banks require that your interactions with these interfaces be certified (and re-certified on a regular basis), which makes sense. They want you to minimize the risk of the payments failing, or you sending them rubbish. However this certification means testing, which takes time to create, run and verify, and with each bank potentially having different processes for payments, suddenly that abstraction layer is looking a little more complex.

Assuming that the interfaces can be created and that they pass the certification, there’s still a cost in their day to day usage. They must be monitored to ensure that they continue to function as expected, and they may need to be re-certified several times a year against new versions of the banking interfaces.

Order verification

Before shipping the order, the system must be satisfied that the customer is who they claim to be, and that their payment details match. In essence this means checking details such as whether the shipping address matches the billing address, or one of the other known addresses for that customer, and that multiple different account holders haven’t been seen at the same address, or if they have, that there’s a reasonable explanation for it (e.g. multiple members of the same family, shared accommodation, etc).
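The checks described above might be sketched like this. It is a toy rule set rather than anything Cybersource presented: OrderCheck, its parameters and the choice to send every shared address to manual review are my own assumptions.

```java
import java.util.Set;

final class OrderCheck {
    enum Outcome { ACCEPT, MANUAL_REVIEW }

    /**
     * @param knownAddresses          other addresses already on file for this customer
     * @param accountHoldersAtAddress number of distinct account holders seen at the shipping address
     */
    static Outcome verify(String shippingAddress, String billingAddress,
                          Set<String> knownAddresses, int accountHoldersAtAddress) {
        boolean addressKnown = shippingAddress.equals(billingAddress)
                || knownAddresses.contains(shippingAddress);
        // More than one holder may be innocent (family, shared accommodation),
        // but deciding that needs a human, so it is routed to review here.
        boolean sharedAddress = accountHoldersAtAddress > 1;
        if (addressKnown && !sharedAddress) {
            return Outcome.ACCEPT;
        }
        return Outcome.MANUAL_REVIEW;
    }
}
```

Anything the rules cannot clear automatically falls through to MANUAL_REVIEW, which is where the cost starts to mount.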

Orders that were verified and shipped, but later challenged through the banks may need to be contested. Resources will need to be allocated to deal with researching the transaction and the customer, and to liaise with the bank. Apparently the banks can reverse the transaction, taking the payment back out of the merchant’s account, without prior notification.

It’s not possible to determine the validity of all orders through an automated system; some of them will ultimately be passed into a manual process. Cybersource estimates that this is as much as 30% of all orders, which in a successful company could equate to a large amount of time needing to be spent manually reviewing these orders. Even in a small company, it’s unlikely that they would be able to support people dedicated to reviewing orders. Manual review takes time, as details such as addresses, phone numbers, email, IP addresses and order history all need to be taken into account when determining a particular order’s validity.

Processing Management

Actually collecting the payment from the customer without accruing additional bank charges provides more challenges. Banks may charge extra for any deviation from the standard happy path process. So if we leave it too long between getting an AUTH code back on the account and actually collecting the payment, we may have to pay more.

There may be issues that cause us to have to do additional processing on our side, work beyond our normal happy path collection process. The customer’s account may not actually have the money in it when we go to collect it, their card details may have expired, etc. These issues can become a bigger problem if we have already sent the product out.

Reconciliation

To balance the books we have to reconcile the bank’s records of transactions with our own records, determining which products were sent that weren’t paid for, or worse, which products were paid for but weren’t sent (seriously unhappy customers there). Reversed transactions need to be recorded and our own fraud database updated so that we can get smarter about detecting them in the future. It would be nice to assume that we could automate this reconciliation, but we face the same issue of having potentially different interfaces for each bank that we deal with.
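At its core, the comparison is a pair of set differences over order identifiers. A minimal sketch, assuming both sides can be reduced to a set of order IDs (Reconciliation and its method names are hypothetical, and real bank records will need normalising first):

```java
import java.util.Set;
import java.util.TreeSet;

final class Reconciliation {
    /** Returns the entries of left that do not appear in right, in sorted order. */
    private static Set<String> missingFrom(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<String>(left);
        result.removeAll(right);
        return result;
    }

    /** Orders we shipped for which the bank shows no payment. */
    static Set<String> shippedButUnpaid(Set<String> shipped, Set<String> paid) {
        return missingFrom(shipped, paid);
    }

    /** Orders the bank shows as paid that we never shipped. */
    static Set<String> paidButUnshipped(Set<String> shipped, Set<String> paid) {
        return missingFrom(paid, shipped);
    }
}
```

The hard part in practice is not the difference itself but getting each bank’s records into a comparable form.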

Security

Perhaps the most problematic and headache inducing area of all is that of payment security. Having payment information flowing through our system leaves us vulnerable to hackers. It’s no longer the script kiddies who we have to fear, but organized gangs with some really smart people paid to hack online retailers. Could we react quickly enough if a vulnerability is found in a product that we use? How long would we have to be exposed for before being compromised?

A breach would not only be bad for our brand integrity, it is a potentially terminal problem. Issuers may revoke our license to accept payment through their means, and banks may refuse or revoke our merchant accounts. Even if they don’t, we may be forced to notify all our customers of the security breach and that their data may be compromised, which is costly and damaging to the brand.

To combat the hackers we would need people smarter than they are, working on ways to ensure that the system remains secure, and putting in place and maintaining appropriate encryption and indirection to make it impossible to gain unauthorized access to the payment records. These people could well be better spent on ways to drive the business forward rather than spending their time building better walls and ditches in the existing system.

Summary

All of the above can lead to quite a significant cost and overhead on our business, and has certainly put me off the notion of creating my own payment collection system in any project. The talk went on to cover Cybersource’s solution to these problems, but this is well documented on their own website and literature, so would be better explained by them directly.

Overall it was quite an interesting talk, hopefully they will follow it up with some discussions around how they have put together a system to handle such large volumes of data.

JBoss Portal and overwrite-workflow

Recently I’ve been working on a custom workflow for user registration in JBoss Portal server. This is an area that is lacking in the otherwise adequate JBoss documentation, and has some quirks that confused me and led to quite a bit of time researching how it’s implemented to better understand how to make it work. I hope to cover more of the particulars of the jbpm integration in Portal, culminating in a tutorial on how to create a custom user registration process.

But back to the importance of the overwrite-workflow element in identity-ui-configuration.xml. By default when you first add a new process xml file to the /jboss-portal-ha.sar/portal-identity.sar/conf/processes directory, it will be picked up by the portal server on its next restart and loaded into its jbpm database. This can lead you to believe that if the portal server is smart enough to recognize a new process has been deployed then it will be smart enough to recognize that the process has been updated too. Therefore simply overwriting the process definition file will upload a new definition to the database, right?

Wrong. Unfortunately the server can only detect new process definition files in that directory, and won’t examine existing ones to look for changes. To have the portal server update existing processes that have already been deployed you must set overwrite-workflow to true. However, as described in the documentation, overwrite-workflow does not function quite how you might like.

overwrite-workflow: overwrites existing process definitions

As it states, it overwrites existing process definitions, but that means that it will deploy new versions of all the process definitions that exist rather than examining them and determining which of them really need to be updated. So even if your process definition has not changed, it will deploy a new version.
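The effect on version numbers can be modelled with a few lines of code. This is a toy model of the behaviour described above, not jbpm or Portal code: DefinitionStore, restart and version are hypothetical names, and starting versions at 1 is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DefinitionStore {
    private final Map<String, Integer> latestVersion = new LinkedHashMap<String, Integer>();

    /** Models one portal restart over the files found in the processes directory. */
    void restart(Iterable<String> definitionNames, boolean overwriteWorkflow) {
        for (String name : definitionNames) {
            Integer current = latestVersion.get(name);
            if (current == null) {
                latestVersion.put(name, 1);            // new file: always deployed
            } else if (overwriteWorkflow) {
                latestVersion.put(name, current + 1);  // redeployed even if unchanged
            }
            // existing file with the flag off: nothing happens, changed or not
        }
    }

    /** Latest deployed version, or 0 if the definition was never deployed. */
    int version(String name) {
        Integer v = latestVersion.get(name);
        return v == null ? 0 : v;
    }
}
```

Three restarts with the flag left on take every definition from version 1 to version 4, whether or not a single file was edited.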

Of course this may not matter, as jbpm versions all the process definitions in the database, and any suspended or inflight process instances continue to run using the version of the process definition that they started with. Production instances of portal are (hopefully) rarely restarted and therefore will rarely deploy new process definition versions. But there are a couple of gotchas to watch out for.

Portal is frequently restarted during the development phase, often several times a day. This can lead to the jbpm database quickly having many versions of all the process definitions, which in turn can lead to some interesting problems trying to debug issues as it may not be possible to easily relate a process instance to a process definition xml file. This problem will be worsened if the portal database is shared between several developers.

In production, Portal is often clustered. If overwrite-workflow is set to true on all the portal instances then new definition versions will be deployed when each of the portal instances is restarted. This could quickly become a problem if there are many instances or if the portal server needs to be restarted outside of planned maintenance windows. In production there could well be a large number of running process instances that may need to be related back to a particular version for diagnostic purposes.

What can be done?

The simplest solution is to set overwrite-workflow to true only when you actually have new versions of the process definition files to deploy. The downsides of this approach are that it will deploy new versions of all the processes found in the directory, regardless of whether or not they have changed, and that you must remember to keep toggling the switch on and off.
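To take the guesswork out of when to flip the switch, a deployment script could compare a digest of each definition file against the one recorded at the last deployment. The sketch below shows only that comparison; ChangeDetector is hypothetical, keeps its digests in memory, and is not something Portal provides.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class ChangeDetector {
    private final Map<String, String> deployedDigests = new HashMap<String, String>();

    /** Hex-encoded SHA-256 digest of the definition's content. */
    static String sha256(String content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Returns true, and remembers the digest, only when the content is new or changed. */
    boolean needsDeploy(String name, String content) {
        String digest = sha256(content);
        String previous = deployedDigests.put(name, digest);
        return !digest.equals(previous);
    }
}
```

If needsDeploy returns false for every file, overwrite-workflow can stay false for that restart.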

For clustered production nodes, I recommend that during an upgrade one instance is taken down and used to update the process definitions, after which the setting is immediately set to false again. The same configuration file can be used on all the instances, as it’s only ever modified briefly during the upgrade of one portal server instance. It should be noted though that if your new definition requires some updated code that you are also deploying, you may well need to take down all of the instances.

Summary

In brief, the original problem was that JBoss Portal does not pick up updated versions of jbpm process definition files in the /jboss-portal-ha.sar/portal-identity.sar/conf/processes directory. It will only detect process definition files the first time they are copied there. Subsequent updates to those files will not be deployed.

A workaround/correction to this issue is to set overwrite-workflow to true in identity-ui-configuration.xml. However this has the side-effects of redeploying all the process definitions in the directory, and continuing to redeploy them each time the portal is restarted, potentially leading to a large number of process definition versions in the database.