Cyberborean Chronicles

URLs in Java: Another pitfall

ClassLoader delegation, though esoteric and counterintuitive, is at least an example of deliberate design. Some other things in Java are both confusing and poorly designed.

While testing my new Java application, I noticed that it sometimes froze. It happened irregularly, and the program was doing nothing serious at the time: no complex calculations, no heavy data transfers, etc. Eventually I noticed that the freezes were related to glitches in my internet connection, which unfortunately happen here occasionally. But my program wasn’t doing anything with the internet at those moments!

I did some profiling and found the bottleneck. To my surprise, it was the get() method of a HashMap instance. Sometimes it took about 20 seconds just to check whether an object existed in the map. I was shocked. HashMaps! I had used them for ages and was sure they were a reasonably fast kind of collection.

One way or another, I found the problem: I was using java.net.URL instances as the HashMap keys. Compared with simple strings, URL objects in Java are convenient: they store a URL as a set of separate fields (protocol, host, path, etc.), guarantee that the URL is well-formed, and let you open a connection with a single openConnection() call.

But the implementation is horrible. As it turned out, two generic object methods, equals() and hashCode(), need an internet connection and domain name resolution just to check whether two URLs are equal or to compute an object’s hash code.

Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.

Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.

Since hosts comparison requires name resolution, this operation is a blocking operation.

(java.net.URL javadoc)

I have no idea what the reasons for this were, but it is bad. Simply bad. Operations as common as an equality test (and hash code computation) should rely on the object’s data only. Making them dependent on unreliable external resources is dangerous. These operations must also be as fast as possible, because they are called frequently (often inside loops). When their execution time equals the connection timeout, your program simply doesn’t work.
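To make "rely on the object’s data only" concrete, here is a minimal sketch of a comparison that follows the javadoc rule quoted above but compares host names as text instead of resolving them. The class name OfflineUrlComparison and its helper methods are my own hypothetical names, not part of the JDK, and this is not a drop-in replacement for URL.equals(); it only uses getters that read fields already stored in the URL object.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Objects;

// Hypothetical helper, for illustration only.
public class OfflineUrlComparison {

    // Compares two URLs using only the fields stored in the objects.
    // Hosts are compared as text (ignoring case), so no DNS lookup is made.
    public static boolean sameWithoutLookup(URL a, URL b) {
        return a.getProtocol().equalsIgnoreCase(b.getProtocol())
                && hostOf(a).equalsIgnoreCase(hostOf(b))
                && effectivePort(a) == effectivePort(b)
                && Objects.equals(a.getFile(), b.getFile())
                && Objects.equals(a.getRef(), b.getRef());
    }

    // Convenience overload that parses both strings first.
    public static boolean sameWithoutLookup(String a, String b) {
        return sameWithoutLookup(parse(a), parse(b));
    }

    // Parsing a URL does not touch the network; only equals()/hashCode() may.
    public static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + s, e);
        }
    }

    private static String hostOf(URL u) {
        return u.getHost() == null ? "" : u.getHost();
    }

    // getPort() returns -1 when the URL has no explicit port.
    private static int effectivePort(URL u) {
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }
}
```

A comparison like this never blocks, but it also treats two different names for the same host as different hosts, which is arguably what you want for a map key.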

Even if there were reasons for it, the result is simply wrong.

Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

(Ibid.)

Sure. For instance, from the point of view of java.net.URL, the hosts of all WordPress blogs are equivalent as long as they resolve to the same IP address, so URLs on different blogs that share the same path compare as equal. The same goes for a whole lot of other websites, unless they are hosted on dedicated servers.

URLs are just text strings with a known syntax. They should work offline, whether they are “http:” or “file:”. They don’t need to be resolved to IP addresses unless the referenced resource is actually requested.

Do not use java.net.URL. Store URLs as simple String objects. If you need to keep URLs structured and well-formed, you can use the java.net.URI class instead, but strings are faster anyway. You can always parse a string into a java.net.URI object when needed. You can parse it into a java.net.URL as well, but... better not to.
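As a rough sketch of that advice, the map below is keyed by java.net.URI, whose equals() and hashCode() work on the parsed text only. The class name UriKeyedCache, the titles field and the example addresses are hypothetical; calling normalize() before storing is my own assumption about what a reasonable key looks like, not something the URI class requires.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class: a map keyed by URI instead of URL.
public class UriKeyedCache {

    private final Map<URI, String> titles = new HashMap<>();

    // URI.create() only parses the string; normalize() removes "." and ".." segments.
    private static URI key(String address) {
        return URI.create(address).normalize();
    }

    public void put(String address, String title) {
        titles.put(key(address), title);
    }

    // Lookup uses URI.hashCode()/equals(), which never touch the network.
    public String get(String address) {
        return titles.get(key(address));
    }

    // Convert to a URL only at the moment a connection is actually needed.
    public static URL toUrlForConnection(URI uri) throws MalformedURLException {
        return uri.toURL();
    }
}
```

With this, put("http://example.com/posts/1", "First") followed by get("http://EXAMPLE.com/posts/./1") finds the entry, because URI compares host names without regard to case and the path was normalized, while a lookup for the same path on example.org finds nothing no matter which IP address the two hosts resolve to.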
