2016-05-02

PHP DateTime: Mind the exclamation mark

A software I am currently working on manages employee schedules. Employees can request certain shifts and once a month a shift schedule is generated which tries to fulfill those requests. As soon as a plan is published, requests for that month should obviously no longer be accepted.

On the 31st of March (mind the date, it will become important later on), I received a strange error description from a customer using the software: Users complained that they could not enter their preferences for May, but they could enter preferences for the already published month of April. They said that this error had not been there the day before and they had not worked on the plans. Similarly, I knew that we did not deploy a new software release in the last 24 hours. So what was going on?

After a bit of debugging, I stumbled upon the piece of code that calculates which months are published and which are not. It takes an array of month/year strings in the format "2015-01" for months which have a published plan and transforms it into an array of DateTime objects, which represent the first day of each month with a published plan. This is then passed to the GUI so it can decide whether users can enter preferences for a certain month. Here is a simplified version of that function:

public function getPublishedStatePerMonth(array $publishedMonths) {
    return array_map(function ($element) {
        return \DateTime::createFromFormat("Y-m", $element);
    }, $publishedMonths);
}

Seems innocent enough: We just take a string like "2016-04" and parse it to a DateTime which should then contain "2016-04-01 00:00:00". However, there is a little quirk to how this function handles dates: When you do not specify an exclamation mark in your format string, PHP only reads the values which are present in the format string and takes the current date and time for all other values. So, on 31 March the string "2016-04" was passed into the function. PHP created a DateTime object for "2016-04-31" and because April has only 30 days, this was converted to "2016-05-01". The GUI therefore believed that the plan for May had already been published, whereas there was no entry for April, so users could enter preferences for April, but not for May.

Of course, there is a way to prevent this problem. PHP allows you to overwrite the behavior of always defaulting all values to the current value. If you prefix your format string with an exclamation mark, all values which are not read form the format string default to the Unix epoch, giving you zeroed values for all time fields and the first day of the month for your day field.

What we can learn from this is that you should unit-test even the simplest methods. This method had never worked as intended, as the date part was always the current day and never the first of the month. However, it turns out that this was not read anywhere in the GUI. A unit test would have caught that problem and probably prompted a developer to fix it by adding the exclamation mark. In the end, we refactored this method. As correct date handling is hard in PHP, we created a helper class to take care of recurring problems for us. One of the methods in this class is now fittingly named "getFirstDayOfMonth".

2016-03-07

Slow MySQL connections when using local IP but not loopback address

Currently working on a Symfony application, I have a MySQL database server and an Apache server set up in a VirtualBox virtual machine. My own machine and my VirtualBox machine are on a VirtualBox Host Only Network. This means that there is a virtual network on which only those two machines can communicate with each other. My machine is 192.168.56.1, whereas the VM is 192.168.56.2. As I want to access this database from both my Apache inside the VM as well as from the command line outside of the VM, I have set the IP address of my VM (192.168.56.2) as the database IP address in my Symfony config.

This did work, but my Symfony site was terribly slow. It took more than 30 seconds for each HTTP request to complete, meaning I had to wait 30 seconds for the actual page to be loaded and on top of that 30 seconds for each asset the page required. All in all, loading a single page took about 2 minutes. Obviously, this speed is inacceptable when working on an application.

After quite a bit of investigation with the Xdebug Profiler, I realized that creating the database connection was the main bottleneck. I verified this manually by connecting to the IP from the command line inside the VM using the MySQL CLI client. Even though I was on the same maching as the IP I was connecting to, establishing the connection took almost 30 seconds. Having narrowed it down to this issue, finding the solution was straightforward. There are several questions on StackOverflow regarding this problem, for example: Why is connecting to MySQL server so slow?. And indeed: Changing the server IP to 127.0.0.1 fixed the problem immediately.

It might seem counter-intuitive that connecting to an IP address which is associated with your local machine is this much slower than connecting via the loopback interface on 127.0.0.1. The reason can be found when looking at the linked MySQL documentation on how MySQL handles DNS lookup during connection: When someone connects to the MySQL server, the server tries to do a reverse DNS lookup of the connecting person's IP address to determine their host name. This is then used to match the host name against the access list to find out whether the connecting user is allowed to access the server and with what permissions. Obviously, this needs to be done before letting the user into the system, so the connection is halted. On a private network, however, IP addresses can not be looked up as there will with a very high probability no DNS server available which can resolve them. This means MySQL keeps on trying until it runs into a timeout, which seems to be set to 30 seconds. The reason that connecting via 127.0.0.1 is not affected by this problem is that MySQL recognizes this as the loopback interface and knows that looking up IPs on this interface will never work, so it skips the lookup directly.

So what can be done to fix this? The easiest way is to connect via the loopback interface on 127.0.0.1. However, if you need to connect to an intranet IP address because your database is on a different machine, you should either make sure that IPs can be looked up or disable IP lookup on the MySQL server using the skip-name-resolve option on your MySQL server. Note that disabling this option will mean that you can no longer use hosts in your user permissions and need to set permissions based on IP addresses instead.

2016-03-01

StrokesPlus and GTK applications: clicking problems resolved

StrokesPlus is a neat tool for mouse gestures under Windows. The basic gist of mouse gestures is that you hold a dedicated button down and then perform a gesture by moving your mouse in a certain way. A mouse gestures tool, such as StrokesPlus, can then recognize the gesture and perform various actions based on what you configured it to do.

I have been using StrokesPlus heavily on Windows 8 and am still very satisfied with it. However, it does not work well together with GTK-based applications. I had always assumed this was a Windows 8 issue, but today found out it is actually a StrokesPlus problem. Whenever I use a GTK application, such as, e.g., GIMP, Meld, or Wireshark, I can only make one click in the application's window. Afterwards, I need to move the mouse out of the application window and back in to make the next click. This is quite annoying, especially when you have to do a lot of clicks, such as when editing images with GIMP.

Today, I stumbled upon a solution to this problem. It seems StrokesPlus draws an invisible window over all other applications to capture the mouse gestures. This probably prevents GTK applications from correctly capturing clicks, resulting in the behavior described above. The fix is to change the StrokesPlus settings so that this window is not always kept above all other windows.

To do this, go to the StrokesPlus settings and uncheck the checkbox "Keep Gesture Draw Window On Top". Afterwards, close and re-open StrokesPlus and all your GTK applications should work fine!

2016-02-29

composer.lock: Why you need it and why it should be kept in version control

Composer is a dependency manager for PHP, used by many projects, most notably the Symfony PHP framework. It allows developers to specify which dependencies a project has by creating a composer.json file. When composer is run with "composer install", it uses the composer.json file to find out which versions of the dependencies to get. I do not want to get into too much detail regarding how version constraints can be specified in the composer.json file. If you are interested, there is a detailed manual on versions for Composer. Suffice to say that you usually do not want to specify a fixed version, but instead want to specify a minimum version so that your software can be updated to use newer versions of your dependencies.

The problem with this approach should already be apparent: When you create the project or checkout the project from source control and run "composer install", you will get the newest compatible versions of the dependencies which are available at the time you run the command. However, when another developer checks out the code, the same will happen to him, only that by then there could be a newer version than what you have. It is possible, that this newer version is incompatible with your project, which you did not anticipate when defining the version constraints in the composer.json file. So now the other developer can not get your project working because the dependency fails and, being new on the project, has no idea how to fix it. You, on the other hand, are happily coding on your checked out copy of the code which uses the old version of the dependencies and will never be aware of the problem.

The gist of this example is that it is bad practice to use different versions of dependencies among developers. You want all developers to use the same version of the dependencies to get reproducible results for all checked out copies of your code. The Composer folks were aware of this problem and created composer.lock to ensure this.

When you run "composer install" for the first time, Composer will find the newest compatible version of your dependencies and install them. It will then generate the composer.lock file which contains information about the exact version which was installed so that it is possible to restore this state on your machine and on other machines. So unlike the composer.json file, the composer.lock file contains specific versions numbers for all your dependencies. The next time you or someone else runs "composer install", Composer will try to find the composer.lock file and install the version of the dependencies specified in there.

Now it should also be obvious why you should add the composer.lock file to version control: By sharing this file, you let all other developers know which versions of the dependencies you have installed and tested your code with. They will now no longer need to fix possible bugs which might happen with newer versions of your dependencies.

So why should you not specify exact version constraints directly in your composer.json file? Well, this file is used to specify which versions your software should be compatible with, so which versions it should be safe to upgrade to. If you specify version constraints correctly and all your dependencies obey semantic versioning rules, you can run "composer update" to get the newest compatible version for each dependency as specified in the composer.json file. Note that this does not read the composer.lock file, but instead updates it with the new exact version numbers you are now using. After you have tested and verified everything still works, you can commit your updated composer.lock file to the repository. All other developers can then run "composer install" again to use the same versions as you.