Tämä asiakirja on saatavilla myös suomeksi. Dieses Dokument ist auch auf Deutsch erhältlich.

I like combinatorial problems, but I tend to solve them by brute force. Around the year 2000, I enumerated the solutions of the 6×10 Pentomino puzzle in a couple of CPU weeks on some 266 MHz Pentium II workstations.

My solution in C is a brute-force depth-first traversal method. In 2013, I made an obvious improvement to prune those search trees where there exists a gap whose size is not a multiple of 5. The pruning can pe achieved by flood-filling each gap in a copy of the current board and determining how big each gap was. On an Intel Core i5-2520M system at 2.5 GHz, an optimized build of the C program will cover the whole search space in about 4 minutes.

A C++ version
of my solution makes use of the STL.
On the platform that I tried, the optimized C++ build was slower than
the C program due to the dynamic memory allocation overhead in the
queue management of the flood-fill routine. The C version would use
an array allocated from the stack, while the C++ version would use
`std::list`

.

The performance difference went away in January 2014 when I noticed that the work queue can more appropriately be represented as a set. The original implementation of the flood-fill algorithm could dequeue the same work item multiple times. In this case, the set can be efficiently represented as a bitmap, more precisely in a 64-bit integer, just like the 6×10 Pentomino board. The performance improved by 60% when I exploited the fact that the already filled space can easily be subtracted from the bitmap. In this way, we will never dequeue an item that has already been processed. The performance improved even further (40 seconds to return all 2,339 solutions) when I eliminated symmetric configurations for the first piece. There were 56 configurations for the I-shaped piece before and 14 after the optimization.

In December 2016, I refined the C++ implementation to make use of
the C++14 `constexpr`

keyword, so that the bit patterns for
the pieces (up to 8 distinct orientations per piece) can be converted
at compilation time. I experimented with constructing all 2,014
distinct placements of the pieces at compilation time. This turned out
to be a bad idea not only in terms of Kolmogorov
complexity (the placements can be constructed in less than
2,014×64 bits of code) but also because clang hit a resource limit while
initializing the `constexpr`

data.

In August 2019, I replaced procedural recursion with a loop and an
explicit search stack. With a statically allocated array for the
stack, it is slightly faster than the original procedural recursion
(`-DRECURSION`

). As expected, using dynamic allocation for
the stack (`-DHEAP_STACK`

, only implemented in C++) is
slightly slower. Note: the option `-DRECURSION`

will
traverse the search space in a different order than the explicit
search stack.

Earlier, I experimented with expanding the 12-deep recursion at compilation time, using templates, but that resulted in slightly slower execution time.

When compiled with clang 8.0.1
using `-O3 -DNDEBUG -mtune=native -march=native`

,
the C++ program finished 8% faster than the C program. With GCC 9.2.1 and the same switches, the C
program was 3% faster than as C++ program. The biggest outlier is the
clang-compiled C program, which is clearly slower than the others.

variant | GCC 9.2.1 | clang 8.0.1 |
---|---|---|

C `-DRECURSION` |
27.7 s | 31.2 s |

C++ `-DRECURSION` |
28.7 s | 27.6 s |

C | 26.9 s | 29.4 s |

C++ | 27.6 s | 27.1 s |

C++ `-DHEAP_STACK` |
27.7 s | 28.1 s |

In early 2014, my wife picked up a 4×4×4 puzzle at a flea market. The cube is filled with one piece of volume 4 and twelve pieces of volume 5, which yields a total volume of 4×4×4=4+12×5=64. Unlike in the Pentomino puzzle, there are many more possible pieces of this size, and there seem to be 20,434 solutions. With minor modifications, the same solution algorithm is applicable:

- For simplicity, the pieces are represented as character strings.
- Each string is converted to the representation of the puzzle board (or cube).
- Each piece will be rotated and shifted to generate all possible
positions on the puzzle board.
- Each position will be represented as a puzzle board (64-bit integer) containing only one piece.
- In the 2-dimensional case, perform two nested loops, flipping the piece or rotating it 90 degrees at each step.
- In the 3-dimensional case, perform a loop for each dimension, rotating the piece 90 degrees at each step.
- For the first piece, only such shifted positions are allowed that are not equivalent to rotating or flipping other positions. In this way, we rule out symmetric solutions.
- To shift a piece in the board, simply shift the bitmap after ensuring that the piece is not touching the edge or face already.

- Start the depth-first search with the 4-unit piece.
- If some free space is not divisible by 5, reject the solution, since it would not be possible to place all 5-unit pieces.

In December 2016, when I refactored the 3-dimensional puzzle solver in C++14, I noticed that there had been a bug in my symmetry reducer. When a piece is rotated along each axis, there are 4×4×4 or 64 potential configurations. For the small piece in my puzzle, 12 of these configurations are distinct. But, many of these 12 configurations are actually symmetric with each other, if we place the piece to a specific corner of the box and then rotate the entire box in all 64 ways, without shifting the piece relative to the box. It turns out that there are only 15 distinct configurations of the small piece, instead of the 261 that my original buggy symmetry reduction produced.

GCC 9.2.1 generates a much faster solver for AMD64 than clang 8.0.1. In August 2019, the GCC-compiled solver completed the search in 14,063 seconds (3 hours, 54 minutes and 23 seconds), while the clang-solver needed 15,599 seconds (4 hours, 19 minutes and 59 seconds).

- pentomino.c
- Pentomino solver in C
- pentomino.C
- Pentomino solver in C++
- pentomino.png
- Image of all 2,339 solutions of the 6×10 Pentomino puzzle
- pentomino.pl
- A small utility for composing the image of all solutions.
- puzzle3d.C
- Solver of a 3-dimensional 4×4×4 puzzle in C++
- puzzle3d.txt.gz
- The 20,434 solutions of the 4×4×4 puzzle