We present graphics processing unit accelerated implementations of Markov chain algorithms for sampling random tilings, dimers, and the six-vertex model.

Dedicated to the memory of Ludwig Faddeev

The study of tilings of planar regions by dominos, lozenges, and other shapes has a long history in mathematics, physics, and computer science. In recent decades, the discovery of arctic circles, limit shapes, and other phenomena in random tilings of large regions has sparked renewed excitement in their investigation. For a survey of these developments, see Ref. 11.

A remarkable tool in the study of tilings has been the use of various computer programs to generate and visualize random tilings. A variety of powerful algorithms and techniques for sampling tilings have been developed and studied (see, for example, Refs. 15, 24, 1, 21, and 18), and their standard implementations are widely available and well-utilized by researchers in this field.

On the other hand, recent years have seen the advent of many new tools and techniques in high performance computing. Particularly notable is the resurgence of parallel computing, driven by the fact that, as the speeds of single processors reach their physical limits, computers increasingly rely on multicore processors for performance and efficiency. While previously restricted to supercomputing applications, parallel computing is nowadays essential to fully exploit modern computational power.

A powerful element of multicore hardware is the graphics processing unit (GPU). Designed specifically for certain computations in 3D computer graphics, GPUs employ a massively parallel architecture with up to thousands of limited-functionality processing cores working synchronously. Thanks to the popularity of video games, powerful GPUs are now commonplace in nearly all personal computers, as well as PlayStations, Xboxes, and other devices. Utilizing these computational resources for tasks beyond computer graphics is a tantalizing prospect. Indeed, GPUs have proven to be well suited for many other types of problems, and general purpose computing with the GPU (GPGPU) has become increasingly popular and successful in many fields. For an overview of applications of GPUs to other Monte Carlo simulations, see Ref. 23.

The main purpose of this note is to demonstrate the use of GPUs to generate random tilings. Our approach is based on Glauber dynamics Markov chains, in which the Markov moves are local “flips” of the tiling. For parallelization, we consider non-local updates consisting of clusters of flips, which, when chosen according to a domain decomposition, can be generated and executed independently and in parallel on GPU cores. We implemented the program in C++ and OpenCL and have made our source code available (Ref. 9).

The structure of the remainder of the paper is as follows: in Sec. II, we recall the basics of domino tilings and Markov chain algorithms for generating random tilings and briefly discuss generalizations to some other models. In Sec. III, after reviewing some important aspects of graphics hardware architecture and programming, we explain our implementation of the Markov chain algorithm on the GPU. In the Conclusion, we present the results of some experiments conducted to test the program.

By the square lattice, we mean the planar graph embedded in the Euclidean plane with vertices at coordinates $(i, j) \in \mathbb{Z}^2$ and edges joining all pairs of vertices unit distance apart. A domain $D$ is the union of a finite set of faces of the square lattice. We assume that $D$ is simply connected.

A domino is the union of two square lattice faces that share an edge. A tiling $T$ of $D$ is a set of dominos whose interiors are disjoint and whose union is $D$. We denote by $\Omega_D$ the set of tilings of $D$. A domino tiling $T$ can equivalently be viewed as a perfect matching, or dimer cover, $T^*$ of the dual graph, or as a lattice routing, as shown in Fig. 1.

FIG. 1. (a) A domino tiling of a rectangular domain. (b) The corresponding perfect matching of the dual graph. (c) The corresponding routing, obtained by drawing paths through dominos as shown.

By identifying tilings with perfect matchings of a bipartite graph, each tiling $T$ can be associated with an integer-valued height function $h_T$ on vertices $v \in D$, according to the rules shown in Fig. 2. Height functions induce a partial ordering on tilings, where $T < T'$ if $h_T(v) < h_{T'}(v)$ for all $v \in D$. Moreover, the point-wise maximum (or minimum) of two height functions also defines a height function and a corresponding tiling, which endows $\Omega_D$ with the structure of a distributive lattice. We denote by $T_{\max}$ and $T_{\min}$ the unique maximal and minimal elements of $\Omega_D$, respectively; see Fig. 2.

FIG. 2. (a) The height function is defined by first fixing the height at a reference face (say the bottom-most, left-most face) to zero and then propagating the height function across unmatched edges according to the rules at left. (b) The maximal and minimal tilings of a square domain.

We say a domain $D$ is tileable if $\Omega_D$ is non-empty. An efficient algorithm due to Ref. 22 determines the tileability of a domain and returns the maximal (or minimal) tiling in time proportional to the size of the domain.

1. Weights

It is often important to consider various probability measures on ΩD. Gibbs measures originate from models of statistical physics, in which dimer covers of the lattice correspond to bond configurations in crystals.

These measures are defined by edge weights that assign to each dual edge $e$ a positive number $w_e$ and to each tiling $T$ the weight $W(T) = \prod_{e \in T^*} w_e$. A tiling $T$ then has the probability

    $P(T) = W(T)/Z$, where $Z = \sum_{T' \in \Omega_D} W(T')$.

Equivalently, the Gibbs measure can be defined by volume weights that assign to each vertex $v$ a positive real number $q_v$ and to each tiling $T$ the weight $W(T) = \prod_{v \in D} q_v^{h_T(v)}$. Choosing all weights to be 1 gives the uniform distribution on $\Omega_D$. See Figs. 11–14 for examples of domino tilings.

2. Moves on tilings

We say a tiling $T$ is rotatable at a vertex $v$ if the faces adjacent to $v$ are covered by two parallel dominos. An elementary rotation at $v$ rotates the dominos in place, as shown in Fig. 3. More precisely, an up-rotation replaces two horizontal dominos by two vertical dominos, and a down-rotation does the opposite. We denote by $R_{\pm v}\colon \Omega_D \to \Omega_D$ the function that maps a tiling $T$ to the tiling obtained from $T$ by rotating up/down at $v$, if possible. Here, we adopt the convention of formally signing vertices to denote up- and down-rotations.

FIG. 3. An elementary rotation at a vertex.

A theorem of Ref. 22 states that any two tilings $T$ and $T'$ of a domain $D$ are connected by a sequence of elementary rotations.

It is natural to consider simultaneous rotations on clusters of faces. Note that if two signed vertices $v$ and $v'$ are not adjacent, then $R_v R_{v'} = R_{v'} R_v$. We call a subset $S$ of signed vertices an admissible cluster if no two of its vertices are adjacent. Then, the cluster rotation $R_S = \prod_{v \in S} R_v$ is well defined. In practice, a convenient class of admissible clusters is found by fixing a vertex coloring and choosing subsets of vertices of the same color.

3. Markov chain on tilings

A random walk on $\Omega_D$ is defined by an initial tiling $T^{(0)}$ and a sequence of randomly chosen admissible clusters $\{S_i\}_{i=1}^{\infty}$, where the $n$th step is

    $T^{(n)} = R_{S_n}(T^{(n-1)})$.   (2.1)

In other words, at each step, the Markov chain chooses a random cluster $S$ and moves to $R_S(T)$. It is straightforward to check that this Markov chain is irreducible and aperiodic, from which it follows that in the limit $n \to \infty$ the random tiling $T^{(n)}$ is uniformly distributed.
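For concreteness, one step of this chain can be sketched serially in C++ as follows. This is a minimal sketch, not our GPU code: it uses the "tilestate" encoding introduced in Sec. III (a vertex is rotatable exactly when its state is 3 or 12), and the helper conventions are our own.

    #include <cstdint>
    #include <random>
    #include <vector>

    // One step of the cluster-rotation chain (2.1) on an N x N array of
    // vertex states. The cluster S is drawn by picking a checkerboard
    // color class and including each vertex of that class independently.
    // After flipping one color class, the states of the other class are
    // stale and must be recomputed from the tiling (cf. the Update
    // kernels of Sec. III); that bookkeeping is elided here.
    void clusterStep(std::vector<uint8_t>& state, int N, std::mt19937& rng) {
        std::bernoulli_distribution coin(0.5);
        int color = coin(rng);                 // which checkerboard class S lives on
        for (int x = 0; x < N; ++x)
            for (int y = 0; y < N; ++y) {
                if ((x + y) % 2 != color) continue;
                if (!coin(rng)) continue;      // include this vertex in S with prob. 1/2
                uint8_t& s = state[x * N + y];
                if (s == 3) s = 12;            // up-rotation
                else if (s == 12) s = 3;       // down-rotation
            }
    }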

In reality, the Markov chain is run for a finite but large time determined by the rate of convergence, or mixing time. Although mixing times of Markov chains on tilings have been studied by many authors (see, for example, Refs. 15, 24, and 13) and upper bounds have been rigorously established in many particular settings, very little is known in general. In practice, mixing times can often be estimated empirically by heuristic techniques such as a self-consistent analysis of autocorrelation times.

4. Perfect sampling with coupling from the past

When statistical soundness is paramount, exact sampling can be accomplished using the coupling-from-the-past algorithm (Ref. 18), which effectively simulates running the Markov chain for an infinite time. It works as follows: given a sequence of random clusters $\{S_i\}_{i=1}^{\infty}$, define the backwards walk

    $R^{(-n)} = R_{S_1} \circ R_{S_2} \circ \cdots \circ R_{S_n}$.

Almost surely, there exists an $n$ for which $|R^{(-n)}(\Omega_D)| = 1$ (and, in fact, for all earlier times $m \geq n$). The Markov chain is then said to have collapsed, and the unique element in the range of $R^{(-n)}$ is distributed according to the stationary distribution.

For Markov chains with large state spaces, checking for collapse can be impractical. However, in the case that the state space is partially ordered and the Markov moves are monotone, as is the case for domino tilings, the state space collapses if and only if $R^{(-n)}(T_{\max}) = R^{(-n)}(T_{\min})$. Consequently, it is sufficient to check only the maximal and minimal states.

The machinery described above for domino tilings can be generalized to a variety of other models. Let us briefly describe a few examples:

1. Lozenge tilings

Lozenge tilings are the triangular-lattice analog of domino tilings; see Fig. 4. They correspond to dimers on the hexagonal lattice, whose bipartite structure allows the introduction of a height function at vertices (easy to visualize by imagining a tiling as a stack of cubes), which in turn induces a partial order and lattice structure on the set of tilings. The set of tilings is connected by elementary rotations at vertices of the triangular lattice, as shown in Fig. 4. Cluster rotations can be generated by three-coloring the vertices of the triangular lattice. See Figs. 15 and 17(a) for examples of lozenge tilings.

FIG. 4. (a) A lozenge is a pair of equilateral triangles glued along a side. (b) A lozenge tiling and the corresponding matching on the hexagonal lattice. (c) An elementary rotation.

2. Bibone tilings

Bibone tilings are the hexagonal-lattice analog of domino tilings, see Fig. 5. While they correspond to dimers on the triangular lattice, techniques for bipartite dimers do not apply; for example, bibone tilings do not admit height functions. Nonetheless, the author of Ref. 10 showed the connectedness of the set of tilings under three types of elementary moves as shown in Fig. 5. For parallelization, we consider clusters of the same type of move, with a different domain decomposition for each type. See Fig. 16 for an example of a bibone tiling.

FIG. 5. (a) A bibone is a pair of hexagons glued along an edge. (b) A bibone tiling and the corresponding matching on the triangular lattice. (c) Up to orientation and reflection, there are three types of elementary moves.

3. Rectangle-triangle tilings

Rectangle-triangle tilings were studied by Nienhuis (Ref. 16). The tiles are isosceles triangles and rectangles with side lengths 1 and $\sqrt{3}$. We focus on tilings of domains of the triangular lattice. Like lozenge tilings, rectangle-triangle tilings can be visualized in 3D as stacks of half-cubes, which gives a partial ordering on the set of tilings. It is easy to check that the set of tilings is connected by an elementary move at vertices, as shown in Fig. 6. In practice, we allow many other local moves to improve the mixing rate.

FIG. 6. (a) Rectangle and triangle tiles. (b) A rectangle-triangle tiling. (c) Up to orientation and reflection, there is one elementary move.

Local weights can be introduced by assigning to each face of the triangular lattice a weight depending on the tiles that cover it (Ref. 16). There are three possibilities for the types of tiles that can cover a face, labeled (A), (B), and (C), which are assigned weights t, c, and r, respectively; the weight of a tiling is given by the product of the weights of all faces. See Figs. 17(b) and 18 for examples of rectangle-triangle tilings.

4. The six-vertex model

The six-vertex model is defined on a domain $D$ of the square lattice. A configuration $S$ of the model is an assignment of “occupied” or “unoccupied” to each edge in $D$, subject to the condition that at every vertex $v$ in the interior of $D$, the edges adjacent to $v$ form one of the six local configurations shown in Fig. 7. A boundary condition for the six-vertex model fixes the states of the edges intersecting the boundary of $D$.

FIG. 7. (a) The six vertex types with six weights. (b) A six-vertex configuration. (c) A local move.

Six-vertex configurations correspond bijectively to height functions, as shown in Fig. 8. This endows the set of configurations with a partial ordering and the structure of a distributive lattice.

FIG. 8. The height function for the six-vertex model.

Gibbs measures are defined by assigning a positive real weight to each of the six vertex types. The weight of a configuration $S$ is $W(S) = \prod_{v \in D} w(v, S)$, where $w(v, S)$ is the weight of the vertex $v$ in the configuration $S$, depending on the vertex type. The probability of $S$ is then

    $P(S) = W(S)/Z$, where $Z = \sum_{S'} W(S')$.
The weights a = 1, b = 1, c = 1 give the uniform distribution on configurations.

With fixed boundary conditions, the set of configurations is connected by elementary c-flips at faces, as shown in Fig. 7. In general, variations of c-flips at boundary faces are also required. For parallelization, cluster moves can be generated by coloring faces according to the parities of both the x- and y-coordinates of the face, as sketched below.
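As an illustration (the function name and conventions here are our own), the face classes can be computed as follows; two faces in the same class never share an edge or a vertex, so c-flips attempted on one class at a time do not interact and can run in parallel.

    // Four-coloring of square-lattice faces by coordinate parity.
    // Faces of the same class differ by even offsets in both
    // coordinates, so their edge sets are disjoint.
    int faceClass(int x, int y) {          // (x, y) = coordinates of a face
        return (x & 1) | ((y & 1) << 1);   // class in {0, 1, 2, 3}
    }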

It can be checked that the local moves preserve the partial order only if the weights satisfy $a \leq c$ and $b \leq c$; this means that coupling-from-the-past can be used only for such weights, although the standard Markov chain sampling can be used for all weights. For the weights (a = 1, b = 1, c = 1), a much faster program is possible due to the combinatorial nature of the model, although we do not pursue it here. See Figs. 19–22 for examples of six-vertex configurations.

Certain properties of graphics hardware have great significance in the design and implementation of efficient algorithms for the GPU. Let us briefly mention a few important aspects and refer the reader to Ref. 7 for details.

1. GPU design and architecture

Many tasks in 3D computer graphics involve carrying out an identical sequence of operations independently on a large set of input primitives. Rendering a surface, for instance, could require, for each pixel drawn to the screen: computing the local surface normal, computing directions to the viewer and light sources, and finally computing the color according to some lighting and shading model. Such tasks are often called data parallel, and the routine executed on each input is called a shader or kernel.

GPUs are designed specifically for data parallel tasks. At the hardware level, a typical GPU consists of a number of compute units, each containing an instruction block, shared memory caches, and a number of processing elements that carry out arithmetic computations. A parallel computation is organized by issuing a thread for each kernel execution instance; the threads are divided into blocks and distributed to compute units for execution. Each compute unit executes the kernel in single instruction multiple data (SIMD) fashion, by issuing at each time step the same kernel instruction to every processing element to carry out in a different thread.

For tasks well-suited to this architecture, GPUs can perform up to hundreds of times faster than conventional CPUs.

Performance can suffer drastically, however, when programs are not tuned for the graphics hardware. Due to the SIMD execution model, GPU performance can be severely affected by branch divergences, which occur when conditional expressions in the kernel cause different threads to follow different execution paths. Furthermore, compared to conventional processors, GPUs have little hardware dedicated to speeding up the execution of sequential code. In particular, GPU cores have relatively small memory caches and limited bandwidth for memory access. While GPUs hide memory-access latency by multithreading, their performance is nonetheless sensitive to the pattern of memory access and the organization of data in memory.

2. The OpenCL framework

In the early years, general purpose computing with GPUs required reformulating a task in terms of graphics primitives and operations. Since then, GPUs have evolved into highly flexible and easily programmable processors. Frameworks such as NVidia’s Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) define C-like kernel programming languages and provide C/C++ libraries to access GPU devices.

For this work, we chose to use OpenCL.

In the OpenCL framework, a computing platform consists of a host (typically the CPU) and any number of devices (typically GPUs). The kernel source code is loaded and compiled for devices by OpenCL at run time. The host issues commands to the device, such as kernel executions and data transfers, via a command queue. Before launch, the set of kernel execution threads is arranged by the user into a grid of work-items. The grid coordinates of a work-item can be accessed from within the kernel program and used to identify the thread. The grid is further divided into work-groups, each of which is executed on one compute unit of the GPU, and whose work-items can communicate via shared local memory and can synchronize with barriers.

A typical OpenCL host program first initializes the platform, then loads and compiles the kernel source, transfers relevant data to the device, sets up and launches the kernel on the device, and finally reads back the output data from the device.
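To fix ideas, the following stripped-down host program walks through these steps using the OpenCL C API. This is our own illustrative sketch, not an excerpt from our code: error checking is omitted, and the toy kernel simply doubles each array entry.

    #include <CL/cl.h>
    #include <vector>

    int main() {
        // 1. Initialize the platform: pick the first platform and GPU device.
        cl_platform_id platform;  clGetPlatformIDs(1, &platform, nullptr);
        cl_device_id   device;    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU,
                                                 1, &device, nullptr);
        cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, nullptr);

        // 2. Load and compile the kernel source at run time.
        const char* src =
            "__kernel void scale(__global int* a) {"
            "  size_t i = get_global_id(0);"
            "  a[i] *= 2;"
            "}";
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, nullptr);
        clBuildProgram(prog, 1, &device, "", nullptr, nullptr);
        cl_kernel kernel = clCreateKernel(prog, "scale", nullptr);

        // 3. Transfer data to the device.
        std::vector<int> data(1024, 1);
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    data.size() * sizeof(int), data.data(), nullptr);

        // 4. Set arguments and launch a 1D grid of work-items.
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
        size_t global = data.size();
        clEnqueueNDRangeKernel(q, kernel, 1, nullptr, &global, nullptr,
                               0, nullptr, nullptr);

        // 5. Read the output back to the host (blocking read).
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, data.size() * sizeof(int),
                            data.data(), 0, nullptr, nullptr);
    }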

Having laid the necessary foundations, let us now explain our implementation.

To each vertex $v$, we associate an integer state representing the tiling of the faces adjacent to $v$ as follows: enumerating the edges adjacent to $v$ in the order North, South, East, and West, the tilestate $s_v$ is defined as

    $s_v = e_N + 2\,e_S + 4\,e_E + 8\,e_W$,

where $e_i$ is 1 if a domino of $T$ crosses edge $i$ and 0 otherwise. A vertex $v$ is rotatable if its state is $s_v = 12$ or $s_v = 3$.
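In code, the encoding and the rotatability test reduce to a few bit operations (a small sketch in the spirit of our implementation; the helper names are ours):

    // Pack the four edge indicators (N, S, E, W) into a tilestate.
    // Two horizontal dominos around v cross its N and S edges, giving
    // s = 1 + 2 = 3; two vertical dominos cross its E and W edges,
    // giving s = 4 + 8 = 12. These are exactly the rotatable states.
    typedef unsigned char tilestate;

    tilestate encode(int eN, int eS, int eE, int eW) {
        return (tilestate)(eN | (eS << 1) | (eE << 2) | (eW << 3));
    }

    int rotatable(tilestate s) { return s == 3 || s == 12; }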

A tiling is represented by the state of every vertex of the domain D, as in Fig. 9. Assuming the domain D is contained in an N × N square domain in the first quadrant of the Euclidean plane, we store a tiling T as an N × N array T, where the (i, j)th entry of T is the state of the vertex with coordinate (i, j).

FIG. 9. A tiling and its tilestate (without zero padding).

To optimize kernel occupancy and memory transfer speed, we divide the tiling into sub-arrays Tb and Tw of black and white vertices, respectively, and store each contiguously in the GPU device memory. For simplicity, we assume N is even, so that, for example, the (i, j)th component of the array Tb corresponds to the state of the vertex (i, 2j + i mod 2) in D. Moreover, to avoid special cases at the boundary, we assume all arrays are sufficiently zero-padded.
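The bookkeeping between sub-array indices and lattice coordinates can be captured as follows (hypothetical helper names; the formulas match the layout just described):

    // Entry (i, j) of the black sub-array Tb holds the vertex at
    // lattice coordinate (i, 2j + i mod 2); black vertices are those
    // with even coordinate sum.
    struct Coord { int x, y; };

    Coord blackEntryToVertex(int i, int j) {
        return { i, 2 * j + (i % 2) };
    }

    // Inverse map: a black vertex (x, y) sits at row x, column y / 2
    // (integer division absorbs the i mod 2 offset).
    void vertexToBlackEntry(int x, int y, int* i, int* j) {
        *i = x;
        *j = y / 2;
    }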

In our implementation, the primitives of GPU operations are the vertices of the domain D. We define a kernel function that rotates a vertex v with some fixed probability and kernel functions that check the neighbors of v and recompute its state. More precisely,

  • The kernel Rotate takes as a parameter a tiling sub-array T. The work-item (i, j) first generates a pseudo-random number and then, with fixed probability, attempts to either rotate up or rotate down the state T[i, j]. The state can be rotated up if T[i, j] = 3, with the rotation accomplished by setting T[i, j] = 12, and similarly for down-rotation. (Both kernels are sketched after this list.)

  • The kernels UpdateBlack and UpdateWhite take as parameters both sub-arrays Tb and Tw. The work-item (i, j) of UpdateBlack recomputes the state of the vertex with index (i, j) in Tb in terms of the neighboring vertices in Tw. This can be done efficiently with bitwise operations: each edge of a vertex is the opposite edge of a neighbor, so the new state is assembled as

    $s_v = e_S(v_N) + 2\,e_N(v_S) + 4\,e_W(v_E) + 8\,e_E(v_W)$,

where $v_N$, $v_S$, $v_E$, $v_W$ denote the four neighboring vertices and each edge indicator is extracted from the neighbor's state by shifting and masking.
The UpdateWhite kernel is defined similarly.
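The following OpenCL C sketch illustrates both kernels. It deviates from our actual code in labeled ways: it works on a single full array with a parity parameter instead of the split sub-arrays, and it substitutes an inline xorshift step for the per-vertex TinyMT generators of Sec. III B 1.

    // Attempt a rotation at every vertex of one checkerboard color.
    // rng holds one (nonzero) 32-bit state per vertex; the xorshift
    // update below is a placeholder for TinyMT.
    __kernel void Rotate(__global uchar* T, __global uint* rng,
                         uint N, uint parity) {
        size_t x = get_global_id(0), y = get_global_id(1);
        if (((x + y) & 1) != parity) return;
        size_t idx = y * N + x;
        uint r = rng[idx];
        r ^= r << 13; r ^= r >> 17; r ^= r << 5;
        rng[idx] = r;
        if (r & 1) {                          // attempt a flip with probability 1/2
            if (T[idx] == 3)       T[idx] = 12;   // up-rotation
            else if (T[idx] == 12) T[idx] = 3;    // down-rotation
        }
    }

    // Recompute the state of every vertex of the other color from its
    // four neighbors: each edge of v is the opposite edge of a neighbor.
    // The border is assumed zero-padded, as described in the text, so
    // reads and writes never touch the same color class.
    __kernel void Update(__global uchar* T, uint N, uint parity) {
        size_t x = get_global_id(0), y = get_global_id(1);
        if (x == 0 || y == 0 || x >= N - 1 || y >= N - 1) return;
        if (((x + y) & 1) != parity) return;
        uchar n = T[(y + 1) * N + x];         // its S edge (bit 1) is our N edge
        uchar s = T[(y - 1) * N + x];         // its N edge (bit 0) is our S edge
        uchar e = T[y * N + (x + 1)];         // its W edge (bit 3) is our E edge
        uchar w = T[y * N + (x - 1)];         // its E edge (bit 2) is our W edge
        T[y * N + x] = ((n >> 1) & 1) | ((s & 1) << 1)
                     | (((e >> 3) & 1) << 2) | (((w >> 2) & 1) << 3);
    }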

Other kernels used to generate pseudo-random numbers are discussed in Sec. III B 1.

A random cluster rotation is accomplished by launching Rotate on all black vertices and Update on all white vertices, or Rotate on all white vertices and Update on all black vertices. We define the following host functions:

  • The function RandomWalk takes as parameters a tiling T, a random number seed s, and a natural number nSteps. The function first divides T into the two sub-arrays Tb and Tw and loads them into the GPU memory. It then seeds all pseudo-random number generators with s. Then, looping nSteps times, with equal probability it either runs Rotate on all black vertices and Update on all white vertices, or vice versa. Finally, after the loop is complete, it reads back Tb and Tw and recombines the two sub-arrays into the tiling T.

  • The function DominoTilerCFTP generates a random tiling using the coupling-from-the-past algorithm. The pseudo-code is

DominoTilerCFTP:
    Compute the extremal tilings T_max and T_min.
    Initialize a list seeds with a random number.
    Repeat:
        Initialize T_top = T_max.
        Initialize T_bottom = T_min.
        For i = 1 to length of seeds:
            Set T_top = RandomWalk(T_top, seeds_i, 2^i).
            Set T_bottom = RandomWalk(T_bottom, seeds_i, 2^i).
        If T_top = T_bottom, return T_bottom.
        Else, push a new random number to the beginning of seeds.
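In C++, the control flow of the doubling loop might be sketched as follows; the type alias and the three declared functions stand in for the GPU routines described above, so this shows the structure only, not our full program.

    #include <cstdint>
    #include <deque>
    #include <random>
    #include <vector>

    using Tiling = std::vector<uint8_t>;

    // Provided elsewhere: the extremal tilings of the domain and the
    // GPU random walk described above (declarations only).
    Tiling maximalTiling();
    Tiling minimalTiling();
    Tiling RandomWalk(Tiling T, uint64_t seed, uint64_t nSteps);

    Tiling DominoTilerCFTP() {
        std::random_device rd;
        std::deque<uint64_t> seeds{rd()};         // newest seed sits at the front
        for (;;) {
            Tiling top = maximalTiling();
            Tiling bottom = minimalTiling();
            for (std::size_t i = 0; i < seeds.size(); ++i) {
                uint64_t steps = 1ull << (i + 1); // 2^(i+1): run lengths double
                top    = RandomWalk(top,    seeds[i], steps);
                bottom = RandomWalk(bottom, seeds[i], steps);
            }
            if (top == bottom) return bottom;     // collapsed: an exact sample
            seeds.push_front(rd());               // extend further into the past
        }
    }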

1. Pseudo-random number generators

Random number generators for massively parallel computing are an active area of research. Most applications require a stream of pseudo-random numbers for each thread, with good statistical properties both within each stream and across different streams. The statistical independence is crucial in our application to ensure that the set of tilings is connected by the cluster moves generated by the Rotate kernel.

For our implementation, we adopted a variant of the well-known Mersenne Twister pseudo-random number generator known as TinyMT (Ref. 20). Although the period of TinyMT ($2^{127} - 1$) is relatively small compared to that of the Mersenne Twister, TinyMT admits a large number (up to $2^{48}$) of internal parameter values that ensure statistical independence (see Ref. 20) of generators with different parameter values. For tilings with fewer than $2^{48}$ vertices, it is convenient to simply instantiate a TinyMT generator with a unique parameter value for each vertex of the domain.
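A host-side sketch of this per-vertex seeding, using the C API of the TinyMT library (Ref. 20), is given below; the params array of precomputed parameter triples is a placeholder for the output of the TinyMT parameter-set generator.

    #include <stdint.h>
    #include "tinymt32.h"   // from the TinyMT library (Ref. 20)

    typedef struct { uint32_t mat1, mat2, tmat; } tinymt_params;

    // One generator per vertex, each with its own parameter set.
    void seedGenerators(tinymt32_t* gen, const tinymt_params* params,
                        size_t nVertices, uint32_t seed) {
        for (size_t v = 0; v < nVertices; ++v) {
            gen[v].mat1 = params[v].mat1;   // distinct parameters give
            gen[v].mat2 = params[v].mat2;   // statistically independent streams
            gen[v].tmat = params[v].tmat;
            tinymt32_init(&gen[v], seed);   // same seed, different parameters
        }
    }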

We tested our implementation using a few different graphics cards: an Intel HD 510 integrated GPU, an NVidia Tesla P100, and an AMD Radeon Pro 555. Although a rigorous performance comparison of GPU and CPU algorithms can be a delicate matter (see Ref. 14), we nevertheless compared our implementation with standard CPU algorithms; see Fig. 10. We found that for smaller domains, the processing time is dominated by memory transfers and other overheads, and the CPU is often faster. For larger domains, the parallelism pays off, and we found significant speed-ups of at least an order of magnitude, depending on the quality of the graphics hardware.

FIG. 10. (a) The time T in seconds to generate, with coupling-from-the-past, a random configuration of the six-vertex model on an N × N sized domain with domain wall boundary conditions and weights (a, b, c) = (1, 1, 1). We used an Nvidia Tesla P100 GPU and a 2.2 GHz Intel Xeon E5 CPU. (b) The time to generate a random domino tiling of an N × N square. Here we used a laptop with an Intel 510 Integrated GPU and a 2.10 GHz Pentium CPU.

There are several avenues for improvement and generalization that we leave for future work.

First, our implementation leaves some opportunities for optimization, which we have forgone in favor of simplicity and flexibility. On the other hand, the fundamental limiting factor of GPU performance, particularly for large domains, is the number of available processing cores. Therefore, a natural next step is the use of multiple GPUs in parallel with message passing, which has proven successful in other applications.

More broadly, various other algorithms have been developed for generating tilings and could be adapted effectively for the GPU. Among the earliest were shuffling algorithms for the exact sampling of tilings of the Aztec diamond. Shuffling-type algorithms have since been generalized to many other settings (Ref. 2).

Finally, there are many models closely related to tilings whose simulations are of great interest. In particular, considerable recent work has focused on certain stochastic processes, including the stochastic six-vertex model, exclusion processes, Schur processes, and others. While these models in a sense arise as special limits of dimer and vertex models, it is clear that their simulation could benefit from a different approach (Fig. 23).

FIG. 11. Some domino tilings.
FIG. 12. The author of Ref. 8 showed that the fluctuations of the top-most path, above which all tiles are horizontal, converge to the Airy process. In particular, the y-intercept of the path as shown in (a), after appropriate rescaling, converges to the Tracy-Widom distribution $F_2$. (b) shows a normalized histogram of the y-intercept computed from 100 random tilings of an Aztec diamond of size 300, with the distribution $F_2$ superimposed in bold.
FIG. 13. A tiling of a rectangular Aztec diamond, with the arctic curve computed by Ref. 4 superimposed in red.
FIG. 14. A random tiling of the Aztec diamond with volume weights (see Sec. II A 1) q = 20 for all black vertices and q = 1/20 for all white vertices. See Ref. 2 for details.
FIG. 15. A random tiling of a weird region by lozenges.
FIG. 16. A tiling of a trapezoid by bibones. Bibone tilings, which correspond to dimers on the triangular lattice, seem not to develop arctic curves or limit shapes.
FIG. 17. A tiling of a partial hexagon (a) by rectangles and triangles, and (b) by lozenges.
FIG. 18. Choosing weights t = 0.5, r = 1, c = 1 (see Sec. II B 3) produces tilings that look like snowflakes. We observed large fluctuations in the boundaries of the arms as compared to arctic curves of lozenge tilings.
FIG. 19. The six-vertex model with weights $a = 1$, $b = 1$, $c = \sqrt{8}$ ($\Delta = -3$), in a square with fixed domain wall boundary conditions. (a) shows a random configuration, and (b) shows the c-vertices of the random configuration in black. (c) shows the average density of horizontal edges, computed with 1000 random configurations. The arctic curve computed by Ref. 5 is superimposed in red. (d) shows the average density of c-vertices. (e) shows the cross-sectional profile of the horizontal edge density along a diagonal slice, which was studied in Ref. 12.
FIG. 20. The average density of horizontal edges for weights with Δ = 0 in an L-shaped region with domain-wall-type boundary conditions, computed with 1000 samples.
FIG. 21. The average density of horizontal edges with weights a = 2b, Δ = −3, computed with 1000 samples. The red curve is the arctic curve computed by Ref. 5.
FIG. 22. By the Wulff construction, the toroidal free energy f(H, V) determines the limit shape of the volume-constrained model with special boundary conditions. (a) shows a random configuration with weights a = 2, b = 1, c = 0.8 and volume weights. (b) shows the free energy phase diagram for the same weights (Ref. 17).
FIG. 23. The six-vertex model at the stochastic point with weights (a1, a2, b1, b2, c1, c2) = (1, 1, 0.3, 0.7, 0.3, 0.7) on a cylinder with fixed step boundary conditions at the bottom and free boundary conditions at the top. For details about the stochastic six-vertex model, see Refs. 6, 3, and 19. (a) shows a random configuration on the cylinder. (b) shows the average density of paths, taken over 100 samples, with empty space shaded in white. In the thermodynamic limit, the density of paths is described by a Burgers-type equation that can be solved by characteristics. (c) shows the characteristic lines, with shocks drawn in bold and the rarefaction fans shaded in gray.

We are very grateful to Nicolai Reshetikhin for many helpful discussions during the course of this work. D.K. was supported by NSF Grant No. DMS-1601947. A.S. was supported by NSF Grant No. DMS-1601947 and by the DFG Collaborative Research Center TRR 109 “Discretization in Geometry and Dynamics.”

1. Allison, D. and Reshetikhin, N., “Numerical study of the 6-vertex model with domain wall boundary conditions,” Ann. Inst. Fourier 55(6), 1847–1869 (2005).
2. Betea, D., Boutillier, C., Bouttier, J., Chapuy, G., Corteel, S., and Vuletic, M., “Perfect sampling algorithm for Schur processes,” Markov Process. Related Fields 24(3) (2018).
3. Borodin, A., Corwin, I., and Gorin, V., “Stochastic six-vertex model,” Duke Math. J. 165(3), 563–624 (2016).
4. Bufetov, A. and Knizel, A., “Asymptotics of random domino tilings of rectangular Aztec diamonds,” Ann. Inst. Henri Poincare 54, 1250 (2016).
5. Colomo, F., Pronko, A. G., and Zinn-Justin, P., “The arctic curve of the domain wall six-vertex model in its antiferroelectric regime,” J. Stat. Mech.: Theory Exp. 2010, L03002.
6. Gwa, L.-H. and Spohn, H., “Six-vertex model, roughened surfaces, and an asymmetric spin Hamiltonian,” Phys. Rev. Lett. 68(6), 725–728 (1992).
7. Hennessy, J. L. and Patterson, D. A., Computer Architecture: A Quantitative Approach (Elsevier, 2011).
8. Johansson, K., “The arctic circle boundary and the Airy process,” Ann. Probab. 33(1), 1–30 (2005).
9. Keating, D. and Sridhar, A., https://github.com/GPUTilings.
10. Kenyon, C., “Perfect matchings in the triangular lattice,” Discrete Math. 152, 191–210 (1996).
11. Kenyon, R., “The planar dimer model with boundary: A survey,” in Directions in Mathematical Quasicrystals, CRM Monograph Series Vol. 13 (American Mathematical Society, Providence, RI, 2000), pp. 307–328.
12. Korepin, V., Lyberg, I., and Viti, J., “The density profile of the six vertex model with domain wall boundary conditions,” J. Stat. Mech.: Theory Exp. 2017, 053103.
13. Laslier, B. and Toninelli, F. L., “Lozenge tilings, Glauber dynamics and macroscopic shape,” Commun. Math. Phys. 338, 1287 (2015).
14. Lee, V. et al., “Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU,” in Proceedings of the 37th Annual International Symposium on Computer Architecture (ACM, 2010), Vol. 38(3), pp. 451–460.
15. Luby, M., Randall, D., and Sinclair, A., “Markov chain algorithms for planar lattice structures,” SIAM J. Comput. 31(1), 167–192 (2001).
16. Nienhuis, B., “Unfinished business,” in Ingenuity and Integrability in Statistical Mechanics Conference (Amsterdam, 2017).
17. Palamarchuk, K. and Reshetikhin, N., “The 6-vertex model with fixed boundary conditions,” e-print arXiv:1010.5011 [math-ph].
18. Propp, J. and Wilson, D., “Coupling from the past: A user’s guide,” Microsurveys Discrete Probab. 41, 181–192 (1998).
19. Reshetikhin, N. and Sridhar, A., “Limit shapes in the stochastic six-vertex model,” e-print arXiv:1609.01756v1 [math-ph].
20. Saito, M. and Matsumoto, M., TinyMT, https://github.com/MersenneTwister-Lab/TinyMT.
21. Syljuasen, O. and Zvonarev, M., “Directed-loop Monte Carlo simulations of vertex models,” Phys. Rev. E 70, 016118 (2004).
22. Thurston, W. P., “Conway’s tiling group,” Am. Math. Mon. 97, 757–773 (1990).
23. Weigel, M., “Monte Carlo methods for massively parallel computers,” in Order, Disorder and Criticality, edited by Yu. Holovatch (World Scientific, Singapore, 2018), Vol. 5, pp. 271–340.
24. Wilson, D. B., “Mixing times of lozenge tiling and card shuffling Markov chains,” Ann. Appl. Probab. 14, 274–325 (2004).