common table expressions in postgres


Four weeks ago I was asked to show some features of PostgreSQL. For that presentation I came up with an interesting statement that shows off a nice feature.

What I’m talking about is the usage of common table expressions (CTEs for short) together with explain.

Common table expressions create a temporary table just for this query. The result can be used anywhere in the rest of the query. This is pretty useful for splitting subselects into smaller chunks, but also for building DML statements which return data.

A statement using CTEs can look like this:

with numbers as (
  select generate_series(1,10)
)
select * from numbers;

But it gets even nicer when we use this to move data between tables, for example to archive old data.

Let’s create a table and an archive table and try it out.

$ create table foo(
  id serial primary key,
  t text
);
$ create table foo_archive(
  like foo
);
$ insert into foo(t)
  select generate_series(1,500);

The like option can be used to copy the table structure to a new table.

The table foo is now filled with data. Next we will delete all rows where the ID modulo 25 is 0 and insert those rows into the archive table.

$ with deleted_rows as (
  delete from foo where id % 25 = 0 returning *
)
insert into foo_archive select * from deleted_rows;

Another nice feature of postgres is that explain also works for delete and insert statements. So when we prepend explain to the above query, we get this plan:

                            QUERY PLAN
 Insert on foo_archive  (cost=28.45..28.57 rows=6 width=36)
   CTE deleted_rows
     ->  Delete on foo  (cost=0.00..28.45 rows=6 width=6)
           ->  Seq Scan on foo  (cost=0.00..28.45 rows=6 width=6)
                 Filter: ((id % 25) = 0)
   ->  CTE Scan on deleted_rows  (cost=0.00..0.12 rows=6 width=36)
(6 rows)

The plan shows that the delete runs as a sequential scan on foo and that its result is collected in the CTE deleted_rows, our temporary result. The CTE is then scanned and its rows are inserted into foo_archive.

range types in postgres


Nearly two years ago, Postgres got a very nice feature - range types. These are available for timestamps, dates, numerics and integers. The problem was that until now I didn’t have a good example of what one could do with them. But today someone gave me a quest to use them!

His problem was that they had ID ranges used by customers and weren’t sure whether any of them overlapped. The table looked something like this:

create table ranges(
  range_id serial primary key,
  lower_bound bigint not null,
  upper_bound bigint not null
);

With data like this

insert into ranges(lower_bound, upper_bound) values
  (120000, 120500), (123000, 123750), (123750, 124000);

They had something like 40,000 rows of that kind. So this was perfect for using range type queries.

To find out if there was an overlap, I used the following query

select *
  from ranges r1
  join ranges r2
    on int8range(r1.lower_bound, r1.upper_bound, '[]') &&
       int8range(r2.lower_bound, r2.upper_bound, '[]')
 where r1.range_id != r2.range_id;

In this case, int8range takes two bigint values and converts them to a range. The string [] defines whether the two bounds are included in or excluded from the range. In this example, both are included. The output of this query looked like the following

 range_id │ lower_bound │ upper_bound │ range_id │ lower_bound │ upper_bound
        2 │      123000 │      123750 │        3 │      123750 │      124000
        3 │      123750 │      124000 │        2 │      123000 │      123750
(2 rows)

Time: 0.317 ms
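
What the join checks can also be sketched outside the database. The following Go program is only an illustration: the names idRange, overlaps and overlappingPairs are made up for this example, and it assumes that every range has lower_bound <= upper_bound and that both bounds are included, like with '[]'.

```go
package main

import "fmt"

// idRange mirrors one row of the ranges table; both bounds are inclusive.
type idRange struct {
	id           int
	lower, upper int64
}

// overlaps reports whether two inclusive ranges share at least one value.
func overlaps(a, b idRange) bool {
	return a.lower <= b.upper && b.lower <= a.upper
}

// overlappingPairs compares every row with every other row, like the
// self join without an index. For n rows that is n*n comparisons.
func overlappingPairs(rows []idRange) [][2]int {
	var pairs [][2]int
	for _, r1 := range rows {
		for _, r2 := range rows {
			if r1.id != r2.id && overlaps(r1, r2) {
				pairs = append(pairs, [2]int{r1.id, r2.id})
			}
		}
	}
	return pairs
}

func main() {
	// the three example rows from the insert statement above
	rows := []idRange{
		{1, 120000, 120500},
		{2, 123000, 123750},
		{3, 123750, 124000},
	}
	fmt.Println(overlappingPairs(rows)) // [[2 3] [3 2]]
}
```

Rows 2 and 3 are reported in both directions because they share the value 123750, which matches the two rows in the query output.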

But as I said, the table had 40,000 rows. That means the join has to filter a set of 40,000 × 40,000 = 1.6 billion pairs. The computation of the query took a very long time, so I used another nice feature of postgres - transactions.

The idea was to add a temporary index to get the computation done much faster (the index is also described in the documentation).

begin;
create index on ranges using gist(int8range(lower_bound, upper_bound, '[]'));
select *
  from ranges r1
  join ranges r2
    on int8range(r1.lower_bound, r1.upper_bound, '[]') &&
       int8range(r2.lower_bound, r2.upper_bound, '[]')
 where r1.range_id != r2.range_id;
rollback;

The overall runtime in my case was 300ms, so the write lock wasn’t that much of a concern anymore.

learning the ansible way


Some weeks ago I read a blog post about rolling out your configs with ansible as a way to learn how to use it. The post wasn’t full of information on how to do it, but the author’s repository was a great inspiration.

As I had stopped using cfengine and wanted to use ansible instead, that was a great opportunity to learn more about it, and I have to say it is a really nice experience. Apart from a bunch of configs I find every now and then, I have everything in my config repository.

The config is split at the moment between servers and workstations, but using an inventory file with localhost. As I mostly use freebsd and archlinux, I had to set the python interpreter path to different locations. There are two ways to do that in ansible. The first is to add it to the inventory



and the other is to set it in the playbook

- hosts: hosts
  vars:
    ansible_python_interpreter: /usr/local/bin/python2
  roles:
    - vim

The latter has the small disadvantage that running plain ansible is not possible. Ansible in command and check mode also needs an inventory and uses its variables too. If they are not stated there, ansible has no idea what to do. At the moment that isn’t much of a problem, and maybe it can be solved by using a dynamic inventory.

What I can definitely recommend is using roles. These are descriptions of what to do and can be filled with variables from the outside. I have used them to bundle all tasks for one topic. Then I can include them for the hosts I want them to have, which makes for rather nice playbooks. One good example is my vim config, as it shows how to use lists.

All in all I’m pretty impressed by how well it works. At the moment I’m working on a way to provision jails automatically, so that I can run the new server completely through ansible. That should make moving to a new server in the future much easier.

playing with go


For some weeks now I have been playing with Go, a programming language developed with support from Google. I’m not really sure yet whether I like it or not.

The ugly things first - so that the nice things can be enjoyed longer.

Go’s package management is probably one of the worst points of the language. It has a built-in system to load code from any repository system, but everything has to be versioned. The weird thing is that they forgot to make it possible to pin dependencies to a specific version. Some projects are working on this feature, but it will probably take some time.

What I also miss is a shell to test code and just try stuff out, as Go is a compiled language. I really like such a shell for small code spikes, calculations and the like. I hope they will include one sometime in the future, but I doubt it.

With that comes also a very strict project directory structure, which makes it nearly impossible to just open a project and code away. One has to move into the project structure.

The naming of functions and variables is strict too. Everything is bound to the package namespace by default. If the variable, type or function begins with a capital letter, it means that the object is exported and can be used from other packages.

// a public function
func FooBar() {
}

// not a public function
func fooBar() {
}

Coming from other programming languages, it might be a bit irritating and I still don’t really like the strictness, but my hands learned the lesson and mostly capitalize it for me.

Now the most interesting part for me is that I can use Go very easily. I have to look up many of the functions, but the syntax is very easy to learn. Just for fun I built a small cassandra benchmark in a couple of hours and it works very nicely.

After some adjustments it even ran in parallel and has now been stressing a cassandra cluster for more than 3 weeks. That was a very nice experience.

Starting a thread (a goroutine) in Go is surprisingly easy. Not much is needed to get it started.

go function(arg1, arg2)

It is really nice to just add a small two letter keyword to get the function to run in parallel.
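
One thing to keep in mind is that the go keyword does not wait for the function to finish. The following is a minimal sketch of how that can be handled with sync.WaitGroup from the standard library. The functions square and squareAll are made up for this example, and square just stands in for real work like one benchmark query.

```go
package main

import (
	"fmt"
	"sync"
)

// square is a stand-in for real work.
func square(n int) int {
	return n * n
}

// squareAll runs square for every input in its own goroutine and
// collects the results in input order.
func squareAll(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, n := range inputs {
		wg.Add(1)
		// each goroutine writes to its own slot, so no lock is needed
		go func(i, n int) {
			defer wg.Done()
			results[i] = square(n)
		}(i, n)
	}
	// block until every goroutine has called Done
	wg.Wait()
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

Without the call to wg.Wait the program could exit before any of the goroutines had a chance to run.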

Go also includes a feature I have wished for in Ruby for some time. Here is an example of what I mean

def foo(arg1)
  return unless arg1.respond_to?(:bar)
end

What this function does is test the argument for a specific method. Essentially it is an interface without a name. For some time I found that pretty nice to ask for methods instead of some weird name someone put behind the class name.

The Go designers found another way to solve the same problem. They also called them interfaces, but they work a bit differently. The same example, but this time in Go

type Barer interface {
  Bar()
}

func foo(b Barer) {
}

In Go, we give our method constraint a name and use that in the function definition. But instead of adding the name to the struct or class like in Java, only the method has to be implemented and the compiler takes care of the rest.
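
To make that a bit more concrete, here is a small runnable sketch. The type Pub and the function describe are made up for this example, and Bar returns a string here only so that there is something to print. Pub never mentions Barer anywhere, but it can still be passed to describe because it has a matching Bar method.

```go
package main

import "fmt"

// Barer is satisfied by any type that has a Bar method.
type Barer interface {
	Bar() string
}

// Pub does not declare that it implements Barer.
type Pub struct {
	Name string
}

// Bar gives Pub the method set that Barer asks for.
func (p Pub) Bar() string {
	return "bar at " + p.Name
}

// describe accepts anything that satisfies Barer.
func describe(b Barer) string {
	return b.Bar()
}

func main() {
	fmt.Println(describe(Pub{Name: "the corner"})) // bar at the corner
}
```

If the Bar method is removed from Pub, the call to describe no longer compiles, so the check happens at compile time instead of at runtime like with respond_to? in Ruby.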

But the biggest improvement for me is the tooling around Go. It ships with a formatting tool, a documentation tool and a test tool. And everything works blazingly fast. Even the compiler runs in mere seconds instead of minutes. It is actually fun to have such a fast feedback cycle with a compiled language.

So for me, Go is definitely an interesting but not perfect project. The language definition is great and the tooling is good. But the strict and weird project directory structure and project management is currently a big problem for me.

I hope they get that figured out and then I will gladly use Go for some stuff.

no cfengine anymore


I thought I could write more good stuff about cfengine, but it had some pretty serious issues for me.

The first issue is the documentation. There are two documents available: one for an older version, which is very well written, and a newer one, which is a nightmare to navigate. I would use the older version if it worked all the time.

The second issue is that cfengine can destroy itself. cfengine is one of the oldest configuration management systems and I didn’t expect that.

Given a configuration error, the server will still hand out the files to the agents. As the agent pulls are configured in the same promise files as the rest of the system, an error in any file will result in the agent not being able to pull any new version.

Furthermore, the syntax is not easy at all and has some bogus limitations. For example, it is not allowed to name a promise file with a dash. But instead of a warning or error, cfengine just can’t find the file.

This is not at all what I expect to get.

What I need is a system which can’t deactivate itself or, even better, just runs on a central server. I also didn’t want to run weird scripts just to get ruby compiled on the system to set up the configuration management. In my eyes, that is part of the job of the tool.

The only one I found which can handle that seems to be ansible. It is written in python and runs all commands remotely with the help of python or in a raw mode. The first tests also looked very promising. I will keep posting about how it is going.

scan to samba share with HP Officejet pro 8600


Yesterday I bought a printer/scanner combination, an HP Officejet Pro 8600. It has some nice functions included, but the most important for us was the ability to scan to network storage. As I did not find any documentation on how to get the printer to talk to a samba share, I will describe it here.

To get started, I assume that you already have a configured and running samba server.

The first step is to create a new system user and group. This user will be used to create a login on the samba server for the scanner. The group will hold all users which should have access to the scanned documents. The following commands are for freebsd, but there should be an equivalent for any other system (like useradd).

pw groupadd -n scans
pw useradd -n scans -u 10000 -c "login for scanner" -d /nonexistent -g scans -s /usr/sbin/nologin

We can already add the user to the samba user management. Don’t forget to set a strong password.

smbpasswd -a scans

As we have the group for all scan users, we can add every account which should have access

pw groupmod scans -m gibheer,stormwind

Now we need a directory to store the scans in. We make sure that nobody other than group members can modify data in that directory.

zfs create rpool/export/scans
chown scans:scans /export/scans
chmod 770 /export/scans

Now that we have the system stuff done, we need to configure the share in the samba config. Add and modify the following part

comment = scan directory
path = /export/scans
writeable = yes
create mode = 0660
guest ok = no
valid users = @scans

Now restart/reload the samba server and the share should be good to go. The only thing left is to configure the scanner to use that share. I did it through the web interface. For that, go to https://<yourscannerhere>/#hId-NetworkFolderAccounts. Then we add a new network folder with the following data:

  • display name: scans
  • network path:
  • user name: scans
  • password:

In the next step, you can secure the network drive with a PIN. In the third step you can set the default scan settings, and then you are done. Save and test the settings and everything should work fine. The first scan will be named scan.pdf and all following ones have an ID appended. Too bad there isn’t a setting to append a timestamp instead. But it is still very nice to be able to scan to a network device.

[cfengine] log to syslog


When you want to start with cfengine, it is not exactly obvious how some stuff works. To make it easier for others, I will write about some stuff I find out in the process.

For a start, here is the first thing I found out. By default cfengine logs to files in the work directory. This can get a bit ugly when the agent is running every 5 minutes. As I use cf-execd, I added the option executorfacility to the executor control section.

body executor control {
  executorfacility => "LOG_LOCAL7";
}

After that, a restart of cf-execd will result in logs appearing through syslog.

overhaul of the blog


The new blog is finally online. It took us more than a year to finally get the new design done.

First we replaced thin with puma. Thin was becoming more and more of a bother and didn’t really work reliably anymore. Because of the software needed, it was pinned to a specific version of rack, thin, rubinius and some other stuff. Changing one dependency meant a lot of work to get it going again. Puma together with rubinius makes a pretty nice stack and so far it has worked pretty well. We will see how well it handles running longer than a few hours.

The next part we did was throw out sinatra and replace it with zero, our own toolkit for building small web applications. But instead of building yet another object spawning machine, we tried something different. The new blog uses a chain of functions to process a request into a response. This has the advantage that the number of objects kept around for the lifetime of a request is minimized, the stack level is smaller and in all it should now need much less memory to process a request. From the numbers, things are looking good, but we will see how it behaves in the future.

On the frontend we minimized the layout further, but found some nice functionality. It is now possible to view one post after another through the same pagination mechanism. This should make for a nice experience when reading a number of posts one after another.

We hope you like the new design and will enjoy reading our stuff in the future too.

block mails for unknown users


Postfix’ policy system is a bit confusing. There are so many knobs to avoid receiving mails which do not belong to any account on the system and most of them check multiple things at once, which makes building restrictions a bit of a gamble.

After I finally enabled the security reports in freebsd, the amount of mails in the mail queue hit me. After some further investigation I even found error messages from dspam, which had trouble rating spam for recipients that were not even in the system.

To fix it, I read the postfix documentation again and built new and hopefully better restrictions. The result was even more spam getting through. After a day had gone by and my head was clear, I read the documentation again and found the following in the postfix manual

The virtual_mailbox_maps parameter specifies the lookup table with all valid recipient addresses. The lookup result value is ignored by Postfix.

So instead of one of the many restrictions, a completely unrelated parameter is responsible for blocking mails for unknown users. Another related parameter is smtpd_reject_unlisted_recipient. This is the only other place I could find which listed virtual_mailbox_maps, and I only found it when looking for links for this blog entry.

So if you ever have problems with receiving mails for unknown users, check smtpd_reject_unlisted_recipient and virtual_mailbox_maps.

choosing a firewall on freebsd


As I was setting up a firewall on my freebsd server, I had to choose one of the three firewalls available.

There is the freebsd developed firewall ipfw, the older filter ipf and the openbsd developed pf. Feature-wise they all have their advantages and disadvantages. It is best to read the firewall documentation of freebsd.

In the end my decision was to use pf for one reason - it can check the syntax before it is running any command. This was very important for me, as I’m not able to get direct access to the server easily.

ipf and ipfw both get initialized by a series of shell commands. That means the firewall control program gets called once for every command. If one command fails, the script may fail and the firewall ends up in a state undefined by the script. You may not even get into the server by ssh anymore, and it then needs a reboot.

This is less of a problem with pf, as it does a syntax check on the configuration beforehand. It is not possible to throw pf into an undefined state because of a typo. So the only way left to lock yourself out is to forget to allow ssh access or something similar.

I found the syntax of pf a bit weird, but I got a working firewall up and running which seems to work pretty well. ipfw looks similar, so maybe I will try it next time.
