Electron-beam direct-write lithography systems must in the future transmit terabits of information per second to be viable for commercial semiconductor manufacturing. Lossless layout image compression algorithms with high decoding throughputs and modest decoding resources are tools to address the data transfer portion of the throughput problem. The earlier lossless layout image compression algorithm Corner2 is designed for binary layout images on raster-scanning systems. The authors propose variations of Corner2 collectively called Corner2-EPC and Paeth-EPC which apply to electron-beam proximity corrected layout images and offer interesting trade-offs between compression ratios and decoding speeds. Most of our algorithms achieve better overall compression performance than portable network graphics, Block C4, and LineDiffEntropy while having low decoding times and resources.

## I. INTRODUCTION

Electron-beam direct-write (EBDW) lithography is an attractive candidate for next-generation lithography because electron beam lithography systems can attain very high resolution,^{1} and EBDW lithography systems do not need costly physical masks since the electron beam writer writes the mask pattern directly to the photoresist layer. The widespread use of EBDW lithography in commercial semiconductor manufacturing has been hindered by its low throughput in the patterning of wafers. To address this issue, future EBDW lithography systems must transmit terabits of information per second; for example, Levinson^{2} estimated that the patterning of one 300-mm wafer per minute with 1-nm edge placement for 22-nm technology requires the transmission of about 48.7 Tbits/s of data. There have been endeavors to tackle the data transfer problem by using massively parallel electron beams to write multiple pixels at a time.^{3–6} However, multiple electron beam lithography systems are not currently ready to replace optical lithography for high-volume manufacturing, and among the problems to be addressed are datapath issues; i.e., there are questions about how to provide the massive layout image data to the array of electron beam writers.

Dai and Zakhor^{4,7} considered a datapath system which uses lossless data compression to approach this issue (see Fig. 1). Here, compressed layout images are cached in storage devices and sent to the processor memory board. To satisfy the throughput requirements, the decoder embedded within the array of electron beam writers must be able to rapidly recover the original layout images from the compressed files.

The Block C4 algorithm^{4} (BC4) is intended for a raster scanning system in which each pixel is quantized to one of 32 levels. This lossless layout image compression algorithm is based on a combination of context prediction and finding repeated regions within a layout image. The Block C4 algorithm inspired other lossless layout image compression algorithms including those by Cramer *et al.*^{9} and Carroll *et al.*^{10} for the reflective electron beam lithography (REBL)^{5} system.

There have been other lossless layout image compression algorithms for the datapath system of Dai and Zakhor, which utilize different techniques than the Block C4 algorithm and related algorithms. The most recent of these^{11} proposes a hybrid of standard data compression algorithms that are widely applied outside of lithography. In this paper, we will focus on a collection of gray-level layout image compression algorithms that are based on or have similarity to the lossless compression scheme Corner2 (Refs. 12 and 13) for binary layout images on raster-scanning systems. The Corner2 algorithm is based on a transform which exploits the fact that most polygons in layout images have right angle corners. The Corner2 decoder can reproduce layout images with only a small cache, and Corner2 and its precursor Corner^{14} were found to have better compression performance than the Block C4 algorithm. The earlier gray-level layout image compression algorithms that are the most similar to Corner2 are CornerGray^{15} and LineDiffEntropy (LDE),^{16} and we will discuss them in Sec. II B.

Krecinic *et al.*^{17} independently proposed a vertex-based circuit layout image representation format, which can be viewed as a variant of the Corner2 algorithm. However, Corner2 also incorporated ways to take advantage of regularity within circuits as well as advanced entropy encoding techniques to further compress the representation; the algorithm of Krecinic *et al.* did not include similar features.

In this paper, we propose a collection of schemes called Corner2-EPC and Paeth-EPC which extend the image transformation technique of Corner2 to handle pixels which can be quantized to an arbitrary number of gray levels; we used 32. The layout images used in our experiments were processed with the electron beam proximity effect correction algorithm of GenISys, Inc., beamer_v4.6.2_x64.

We briefly comment on the range of values used by the proximity effect correction algorithm and the control of edge placement. High-speed electron-beam direct-write lithography is intended for the high-volume manufacturing of semiconductors and therefore applies to standard Si and SiO_{2} substrate materials. The dynamic dose range of these materials is generally 0.75–1.45 for a range of pattern densities. This range is within a dose factor of 2. If the 32nd dose is set to the dose factor of 1.45, then the 0.75 dose would fall in about the middle of the dose step range, leaving the bottom half of the 32 dose steps available for edge movements of the lowest dosed features, which would appear to be the worst case. With a 22 nm writing pitch, this would potentially give control of edge movements of about 1.4 nm, but this is an incomplete discussion. The doses of less than 1.00 are for the interiors of large shapes, which have no bearing on the critical dimensions or edge placements. The interiors of shapes do not need this granularity of adjustment; they just need enough dose to clear. Therefore, all of the doses below 1.00 can be used for edge movements, which gives some 24 of the 32 dose steps; for a writing pitch of 22 nm, this results in sub-1 nm edge control.
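As a rough check of the figures quoted in this discussion, assume (this is our reading of the argument, not a formula stated explicitly) that the achievable edge movement is approximately the writing pitch divided by the number of dose steps available for edge placement. Then

$$
\frac{0.75}{1.45}\times 32 \approx 16.6, \qquad \frac{22\ \text{nm}}{16} \approx 1.4\ \text{nm}, \qquad \frac{22\ \text{nm}}{24} \approx 0.92\ \text{nm}.
$$

The first value places the 0.75 dose near the middle of the 32 dose steps; the second and third reproduce the 1.4 nm and sub-1 nm edge-control estimates for 16 and 24 available dose steps, respectively.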

Most of the compression schemes we discuss attain better overall compression performance than the standard image compression algorithm portable network graphics (PNG) (Ref. 18) as well as Block C4 and LineDiffEntropy while having modest decoding time and complexity. Although we present results for a software implementation, Corner2-EPC has been designed to be amenable to hardware implementation. Paeth-EPC can also be implemented in hardware; it attains better compression ratios at the cost of slower encoding and decoding.

The remainder of the paper is organized as follows. In Sec. II, we present more information about some of the earlier work on layout image compression algorithms. In Sec. III, we introduce the Corner2-EPC schemes and their variants, the Paeth-EPC schemes. In Sec. IV, we provide experimental results for two circuits.

## II. ON LOSSLESS LAYOUT IMAGE COMPRESSION

### A. Some algorithms related to the Lempel–Ziv codes

The earliest lossless data compression schemes to be devised were for one-dimensional data.^{19} Entropy encoding and entropy decoding, respectively, refer to the encoding and decoding/decompression of one-dimensional data, and image compression/decompression schemes often incorporate entropy encoding and decoding algorithms.^{18}

The Lempel–Ziv codes^{18–21} are among the most widely used lossless data compression schemes. The original Lempel–Ziv codes^{20,21} date back to the 1970s, and there are numerous variants of these codes,^{19} which are the foundation for many practical compression schemes because they have simple implementations, their decoding is comparatively fast, and they often need comparatively little memory.^{18} The Lempel–Ziv schemes and their variations encode variable-length segments of data by pointing to the preceding appearance of each segment; the segments are kept in a dictionary which is being continually updated according to the specific algorithm.
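The pointer-to-earlier-occurrence idea can be illustrated with a toy sliding-window parser in the style of LZ'77. This is only a sketch for intuition: the function names, the triple format, and the window size are illustrative assumptions, and practical variants (including the one inside deflate) differ in many details.

```python
def lz77_encode(data, window=255):
    """Greedy LZ'77-style parse into (offset, length, next_symbol) triples."""
    i, triples = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Stop one symbol early so an explicit "next symbol" always remains.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples


def lz77_decode(triples):
    """Rebuild the sequence by copying from already decoded symbols."""
    out = []
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])  # one symbol at a time, so overlapping copies work
        out.append(nxt)
    return out


data = list("abababababx")
triples = lz77_encode(data)
restored = lz77_decode(triples)
```

Here the repeated "ab" pattern is described by a single back-pointer, which is why such codes do well on data with recurring segments.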

One of the ways to assess the efficiency of a lossless layout image compression algorithm is to compare its performance to that of a general purpose image compression algorithm. The one we will use for this purpose is PNG.^{18} The compression algorithm utilized by this Internet standard is based on *deflate*,^{22} which is one combination of the Lempel–Ziv code LZ'77 (Ref. 20) with the Huffman code^{23} or an approximation to it; the Huffman code is another famous entropy code.

The Block C4 (Ref. 4) algorithm was one of the lossless layout image compression algorithms proposed by Dai and Zakhor. The algorithm is based on a two-dimensional version of a Lempel–Ziv code and also incorporates context prediction. The algorithms discussed in Refs. 9, 24, and 25 are refinements to this compression scheme. Carroll *et al.*^{10} developed a lossless layout image compression algorithm, which they implemented in the digital pattern generator of the REBL system. Although the exact details of the scheme were not provided, it appears to be computationally simpler than the Block C4 algorithm as the authors were concerned with issues like space constraints, operating speed, and power consumption. The authors of Ref. 10 segment layout images into “triLines,” which are groups of three rows; each “superpixel” of a triLine consists of a column of three pixels from the three rows, which have been aggregated. These triLines are encoded by a hybrid of Lempel–Ziv-like compression with run-length encoding^{26} and repeat and edit features. This approach accounts for some of the data dependencies between successive rows although it generally cannot handle many dependencies between pixels. This scheme more closely resembles a one-dimensional version of a Lempel–Ziv code than a two-dimensional variant of a Lempel–Ziv code. Carroll *et al.* reported that the compression ratio and throughput of the decoder needed were not consistently achieved although they had success with “specially contrived patterns... such as line-space patterns and via patterns whose pitch is compatible with the raster pixel-pitch.”

While the Lempel–Ziv codes are popular for rapid implementations in designs where decoding speed and memory are important, this popularity is due in part to the ability of the Lempel–Ziv codes to attain some compression on a large collection of data types. Hence, it is interesting to consider compression schemes which may better match certain features of the lossless layout image compression problem.

### B. Some algorithms related to the Corner2 algorithm

The lossless compression scheme Corner2 (Refs. 12 and 13) for binary layout images on raster-scanning systems was motivated by the GDSII (Ref. 27) format, which is one of the standard formats for storing circuit layouts. The GDSII format represents circuit features such as polygons and lines by their corner points. Circuit data, which are described in GDSII format, are far more compact than the uncompressed image of a circuit layer. However, the GDSII format is not well-suited for EBDW applications because it takes hours on a complex computer system with large memory to convert a GDSII representation into the layout images needed for the lithography process. The Corner2 algorithm is designed to take advantage of the idea of corner representation while avoiding the complex processes used to convert a GDSII file into layout images.

We can summarize the operation of the Corner2 algorithm as follows. The algorithm attempts to account for some of the regularity within the circuit by creating a dictionary of frequently occurring patterns and modifying the layout image to replace these patterns with pointers to them. The next step is a corner transformation process on the modified image to create a third image; we will consider extensions of this transformation in Sec. III. This transformation is based on the observation that a pixel in a layout image often has the same value as the previous pixel in the horizontal direction and/or the previous pixel in the vertical direction. This third image generally contains many pixels with value zero. This third image is processed with a combination of run length encoding^{26} and end-of-block coding to obtain a one-dimensional data stream; we will discuss run length encoding and end-of-block coding in Sec. III. The final step in the Corner2 encoding of a layout image is to apply arithmetic coding^{18,19,28–31} to the one-dimensional data stream; this step is one of the options we will consider in Sec. III. The Corner2 algorithm has a simple decoding process.

Most of the earlier extensions of the Corner2 algorithm were intended for binary layout images. Reference 32 improves upon the frequent pattern replacement portion of the Corner2 algorithm. References 33 and 34 discuss Corner2-MEB, which modifies the Corner2 algorithm to account for the lattice formation of the positions and the zigzag writing strategies of the electron beam writers in the MAPPER (Ref. 6) system.

Corner2 greatly outperformed the Block C4 algorithm in compression ratio, encoding/decoding times and memory usage, but it did not account for the properties of the objects in a layout image that has been processed by software for electron-beam proximity effect corrections. While the input to the Block C4 algorithm is a gray layout image, the discussion about Block C4 only considered edge placement and not proximity effect corrections. The CornerGray^{15} algorithm was the first attempt to extend the Corner2 algorithm to gray-level images. It omits the frequent pattern replacement step of the Corner2 algorithm and assumes the gray-level layout images have objects with pixel intensities that are fully filled inside of polygon outlines, empty outside of polygon outlines, and taking on intermediate values along polygon outlines, which have a uniform depth of one pixel. The transformation step of the CornerGray algorithm produces a “corner stream,” which identifies the horizontal/vertical transition points, and an “intensity stream,” which represents the corner/edge intensities. These streams are processed by separate entropy encoders. The encoding of the “corner stream” closely resembles most of the steps of the Corner2 transformation. The encoding procedure for the “intensity stream” uses information about the run lengths of a single intensity value along a horizontal edge or along a vertical edge in determining the description of the intensity stream. The output of that process is then compressed by a combination of the LZ'77 algorithm and the Huffman code. The experimental results reported on a 500 nm memory core with repeated memory cell structure were a slight improvement in overall compression ratio and significant improvements in encoding and in decoding times over the Block C4 algorithm.

The LineDiffEntropy^{16} algorithm is a gray-level layout image compression algorithm, which is also designed to take advantage of the fact that a pixel in a layout image often has the same value as the previous pixel in the horizontal direction and/or the previous pixel in the vertical direction. The authors of Ref. 16 provide an explicit description of the compression and decompression schemes they have proposed. As with the Corner2-MEB algorithm, the LineDiffEntropy algorithm segments the layout image into blocks which correspond to the writing region of a single electron beam writer. Like the Corner2 algorithm and its variants, the LineDiffEntropy encoder and decoder operate in a row-by-row fashion. However, the LineDiffEntropy algorithm does not use a transformation like the Corner2 algorithm or the variations we will propose in Sec. III. Instead, the LineDiffEntropy algorithm uses string matching to determine regions where a row of pixels is identical to the previous row and for other regions it describes pairs of intensity values and run lengths of those intensity values. Like the Corner2 algorithm and its variants, the LineDiffEntropy algorithm uses a symbol that is essentially identical to the end-of-block symbol of the Corner2 algorithm, but it differs from the Corner2 algorithm in that it does not use run length encoding on the end-of-block symbols. There is a compaction step to make the resulting description of sequences of rows more succinct, and there are 1056 different possible symbols that can be used by the encoder up to this point. Each of these 1056 symbols is represented by a binary codeword of a prefix condition code. The authors report improvements in compression ratios and significant improvements in encoding and decoding times over the Block C4 algorithm.

## III. MODELING AND ALGORITHMS

We can outline the operation of the Corner2-EPC compression algorithms as follows. First, each layout image goes through a transformation. Second, we may elect to use an additional entropy coding step to attain further compression. In Subsections III A and III B, we will offer more details about these steps. In Subsection III C, we will describe how to invert the image created during the transformation to reproduce the original layout image. In Subsection III D, we will describe the Paeth-EPC variation, which alters the first step of the Corner2-EPC transformation to attain improved compression ratios at the expense of increases in encoding and decoding times.

### A. Corner2-EPC transformation

A pixel in a proximity corrected layout image frequently has the same value as the previous pixel in the horizontal direction and/or the previous pixel in the vertical direction. Figure 2 illustrates a two-step transformation in which the intermediate image is obtained by replacing each pixel in a row after the first one with the difference between the pixel value and the preceding pixel value from the original image. The final image is obtained by replacing each pixel in a column after the first one with the difference between the pixel value and the preceding pixel value from the intermediate image. Although it is conceptually simple to describe this transformation as a two-step process, it is more effective to implement it in one step, which we summarize in Algorithm 1. In the algorithm, *x* denotes the column index $[1,\ldots,C]$ of the image and *y* denotes the row index of the image $[1,\ldots,R]$. Observe that if the pixels in the original layout image take on values in the set $\{0, 1, \ldots, 31\}$, then the pixels in the Corner2-EPC transformed image take on values in the set $\{-62, -61, \ldots, 61, 62\}$.

Algorithm 1. Corner2-EPC transformation.

Input: Layer image $IN \in \{0,1,\ldots,31\}^{C \cdot R}$

Output: Corner image $OUT \in \{-62,-61,\ldots,61,62\}^{C \cdot R}$

1: Initialize $OUT(1,1)=IN(1,1)$.

2: for $x = 2$ to $C$ do

3: $OUT(x,1)=IN(x,1)-IN(x-1,1)$.

4: end for

5: for $y = 2$ to $R$ do

6: $OUT(1,y)=IN(1,y)-IN(1,y-1)$.

7: for $x = 2$ to $C$ do

8: $OUT(x,y)=IN(x,y)+IN(x-1,y-1)-IN(x-1,y)-IN(x,y-1)$.

9: end for

10: end for
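The one-step transformation of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the C/C++ implementation used in the experiments: the function name `corner_transform`, the list-of-rows image layout, and the 0-based indices are assumptions. Treating pixels outside the image as zero reproduces the three boundary cases of Algorithm 1 with a single expression.

```python
def corner_transform(img):
    """Apply the Corner2-EPC transformation to an image given as a list of rows."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Neighbors outside the image are taken to be zero.
            left = img[y][x - 1] if x > 0 else 0
            up = img[y - 1][x] if y > 0 else 0
            diag = img[y - 1][x - 1] if x > 0 and y > 0 else 0
            out[y][x] = img[y][x] + diag - left - up
    return out


# A 2x2 square of gray level 5 inside a 4x4 image (hypothetical test data).
square = [
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
]
corners = corner_transform(square)
```

For this input only four output pixels are nonzero, one for each corner of the square, which is the sparsity the subsequent run length encoding relies on.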


As Fig. 2 suggests, the transformed image tends to have many zeroes, and it is therefore effective to represent it with some form of run length encoding.^{26} The idea in the simplest version of run-length encoding is to describe nonzero values as they are but to count the repeated zero values between successive nonzero values and to describe each “run” of zero values with an encoding of its run length. Run length encodings are widely used,^{18} and there are many variable-length representations of the integers.^{26,35} Run length codes such as the Golomb–Rice code and the Exponential-Golomb code are components of video standards.^{36} Moussalli *et al.*^{37} offer a field-programmable gate array (FPGA) implementation of a Golomb–Rice decoder with 7.8 Gbits/s throughput using 10% of the available resources of a midsize to low-size FPGA. In 2014, a typical Golomb–Rice decoder implemented as part of a video decoder had a throughput of 10 Gbits/s on a 500 MHz/16 nm process and 7.6 Gbits/s on a 380 MHz/28 nm process.^{38}

The form of run length encoding that we consider here is taken from the Corner2 algorithm and its variants. Our data initially takes on values in the range $\{-62, -61, \ldots, 61, 62\}$. The first step is to segment the data into blocks of a predetermined length *L*. In the example below, let $0^i$ denote a sequence of *i* zeroes. Suppose *L* = 7 and our initial data sequence is segmented as

In the next step of converting this stream, we introduce a temporary “end-of-block” symbol *X* which indicates that the remainder of the block is a sequence of zeroes. Our intermediate representation of the preceding data sequence becomes

For our final representation of the data we introduce *M* + *N* symbols and remove the earlier symbols 0 and *X*. *M* of the symbols denote the base-*M* symbols “$0_M$,” “$1_M$,” …, “$(M-1)_M$,” and for each $1 \le i < L$ we replace every occurrence of $0^i$ with the base-*M* representation of *i* using these *M* symbols. A typical intermediate data sequence has (sometimes long) runs of “X”s. Our other *N* symbols denote the base-*N* symbols “$0_N$,” “$1_N$,” …, “$(N-1)_N$,” and we replace each run of “X”s with the base-*N* representation of its run length using these *N* characters. In a hardware implementation, it may be necessary to restrict the maximum length of runs of “X”s that can be decoded at once.
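The block segmentation, end-of-block symbol, and base-*M*/base-*N* run lengths can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the tuple-based symbol representation (`'v'` for a nonzero value, `'m'` for a base-*M* digit, `'n'` for a base-*N* digit), the most-significant-digit-first order, the small values *M* = *N* = 4, and the sample data are all hypothetical.

```python
def to_digits(n, base):
    """Digits of n >= 1 in the given base, most significant first (assumed order)."""
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits[::-1]


def rle_encode(values, L=7, M=4, N=4):
    # Pass 1: per block, turn inner zero runs into counts and trailing zeros into 'X'.
    tokens = []
    for start in range(0, len(values), L):
        block = values[start:start + L]
        end = len(block)
        while end > 0 and block[end - 1] == 0:
            end -= 1
        run = 0
        for v in block[:end]:
            if v == 0:
                run += 1
            else:
                if run:
                    tokens.append(('z', run))
                    run = 0
                tokens.append(('v', v))
        if end < len(block):
            tokens.append(('X', 1))
    # Pass 2: write zero runs in base M and runs of 'X' in base N.
    out, i = [], 0
    while i < len(tokens):
        kind, val = tokens[i]
        if kind == 'X':
            j = i
            while j < len(tokens) and tokens[j][0] == 'X':
                j += 1
            out += [('n', d) for d in to_digits(j - i, N)]
            i = j
        else:
            out += [('m', d) for d in to_digits(val, M)] if kind == 'z' else [(kind, val)]
            i += 1
    return out


def rle_decode(symbols, total, L=7, M=4, N=4):
    out, i = [], 0
    while i < len(symbols):
        kind, val = symbols[i]
        if kind == 'v':
            out.append(val)
            i += 1
            continue
        base, count = (M if kind == 'm' else N), 0
        while i < len(symbols) and symbols[i][0] == kind:
            count = count * base + symbols[i][1]
            i += 1
        if kind == 'm':
            out.extend([0] * count)
        else:
            for _ in range(count):  # each 'X' zero-fills the rest of the current block
                out.extend([0] * min(L - len(out) % L, total - len(out)))
    return out


vals = [5, 0, 0, -3, 0, 0, 0] + [0] * 7 + [0, 2, 0, 0, 0, 0, 0] + [1, 0, 0, 0]
enc = rle_encode(vals)
dec = rle_decode(enc, len(vals))
```

Because zero-run digits and end-of-block-run digits come from disjoint alphabets, the decoder can tell them apart without separators; the decoder in this sketch additionally needs the total number of values to handle a short final block.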

Observe that there are $124+M+N$ possible symbols used to represent the transformed layout image. For ease of implementation it is preferable to select *M* and *N* to be powers of two. As we will explain in Subsection III B, it is convenient to choose $124+M+N \le 255$ to take advantage of compression algorithms that are available online. However, for one Corner2-EPC scheme we examine we assign an 8-bit string to each of the $124+M+N$ possible symbols, and we use no further encoding.
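To make the symbol count concrete, the sketch below builds one possible assignment of 8-bit codes to the $124+M+N$ symbols. The particular ordering, the tuple labels, and the choice *M* = *N* = 64 are hypothetical; the text does not specify which byte is assigned to which symbol.

```python
def build_symbol_table(M=64, N=64, max_abs=62):
    """Map each symbol to a distinct byte value (hypothetical ordering)."""
    # The 124 nonzero pixel values -62..-1 and 1..62; zero never appears after run length encoding.
    nonzero = [v for v in range(-max_abs, max_abs + 1) if v != 0]
    symbols = ([('v', v) for v in nonzero]      # nonzero transformed pixel values
               + [('m', d) for d in range(M)]   # base-M digits for zero runs
               + [('n', d) for d in range(N)])  # base-N digits for end-of-block runs
    if len(symbols) > 255:
        raise ValueError("alphabet does not satisfy 124 + M + N <= 255")
    return {sym: code for code, sym in enumerate(symbols)}


table = build_symbol_table()
```

With *M* = *N* = 64 the alphabet has 252 symbols, so every symbol fits in one byte; *M* = *N* = 128 would exceed the limit.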

Although we have separately discussed the image transformation algorithm and the run length encoding algorithm, these are implemented together. If an additional entropy encoding step is used, then we handle it separately so that we can use open source implementations.

### B. Additional entropy encoding

As we will discuss in Sec. IV, we attain better overall compression ratios than the PNG image compression standard without additional entropy encoding. However, the data stream produced by the algorithm of Subsection III A can be compressed further, and it is interesting to consider ways to do so which are amenable to hardware implementations. As with the Corner2 algorithm and its variants, we consider arithmetic coding. Arithmetic coding is widely implemented in video coders and decoders.^{29–31,36} We use an open source implementation which permits a maximum input symbol alphabet size of 256.

There have been hardware implementations of the Lempel–Ziv codes targeting high throughput, which are motivated by the needs of data centers and communication networks.^{39} IBM researchers^{40} have sustained throughputs of 3 Gbytes/s (Ref. 41) for gzip (which combines LZ'77 and the Huffman code or an approximation to it) on the Canterbury corpus and the Large corpus.^{18} The AHA3642 integrated circuit offers a 20 Gbits/s compression/decompression throughput and is described as “low cost.” Because deflate is the basis for gzip, we use the standard *zlib* implementation^{22} of deflate, which permits a maximum input symbol alphabet size of 256.
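As an illustration of the additional deflate step, the sketch below compresses a byte-per-symbol stream with Python's binding to zlib and restores it. The byte values are arbitrary placeholders rather than real layout data, and the default compression level is used because the text does not state which level was chosen.

```python
import zlib

# Hypothetical symbol stream: one byte per symbol, values chosen arbitrarily.
stream = bytes([130, 7, 200, 7] * 2500)

compressed = zlib.compress(stream)      # deflate with zlib framing, default level
restored = zlib.decompress(compressed)

# Ratio of the stream size before and after the additional entropy coding step.
ratio = len(stream) / len(compressed)
```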

### C. Inversion of the Corner2-EPC transformation

Just as the Corner2-EPC transformation involves a transformation of a layout image and some form of run length encoding, the inversion of this process requires the corresponding run length decoding and the inverse transformation of an image. Algorithm 2 summarizes the inverse image transformation algorithm.

Algorithm 2. Inverse Corner2-EPC transformation.

Input: Corner image $IN \in \{-62,-61,\ldots,61,62\}^{C \cdot R}$

Output: Layer image $OUT \in \{0,1,\ldots,31\}^{C \cdot R}$

1: Initialize $OUT(1,1)=IN(1,1)$.

2: for $x = 2$ to $C$ do

3: $OUT(x,1)=IN(x,1)+OUT(x-1,1)$.

4: end for

5: for $y = 2$ to $R$ do

6: $OUT(1,y)=IN(1,y)+OUT(1,y-1)$.

7: for $x = 2$ to $C$ do

8: $OUT(x,y)=IN(x,y)-OUT(x-1,y-1)+OUT(x-1,y)+OUT(x,y-1)$.

9: end for

10: end for
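The inverse transformation of Algorithm 2 can be sketched as a generator that decodes one row at a time and keeps only the previously decoded row. As before, this is an illustrative sketch: the name `inverse_rows`, the 0-based indices, and the zero treatment of out-of-image neighbors (equivalent to the boundary cases of Algorithm 2) are assumptions.

```python
def inverse_rows(corner_rows):
    """Yield decoded layer-image rows from rows of the corner image."""
    prev = None  # previously decoded row; nothing older is needed
    for row in corner_rows:
        cur = []
        for x, t in enumerate(row):
            left = cur[x - 1] if x > 0 else 0
            up = prev[x] if prev is not None else 0
            diag = prev[x - 1] if prev is not None and x > 0 else 0
            cur.append(t - diag + left + up)
        yield cur
        prev = cur


# Corner image with four nonzero pixels (hypothetical test data).
corner_image = [
    [0, 0, 0, 0],
    [0, 5, 0, -5],
    [0, 0, 0, 0],
    [0, -5, 0, 5],
]
decoded = list(inverse_rows(corner_image))
```

The four corner values expand back into a 2×2 square of gray level 5, and the working memory is a single image row regardless of the image height.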


Just as we implement the image transformation algorithm together with the run length encoding algorithm, we implement the run length decoding algorithm together with the inverse image transformation because it is impractical to store entire images. As Algorithm 2 indicates, a layer image can be decoded on a row-by-row basis. If the decoder is currently working on row *y*, it uses the output of the run length encoding step to recover row *y* of the transformed image, and it then combines that row with the previous row of the original layer image to reconstruct the current row.

When an additional entropy encoder is used, we can decode that last step separately, and this is the approach we use for deflate. For arithmetic coding we adopt a “memory save” mode in which arithmetic decoding is combined with all of the other decoding operations.

### D. Paeth-EPC transformation

The first step of the Corner2-EPC transformation inputs an image with pixels taking on values $\{0, 1, \ldots, 31\}$ and outputs a sparse image with pixels taking on values $\{-62, -61, \ldots, 62\}$. We can alternatively use a computationally more demanding scheme motivated by the Paeth^{42} filter, which produces another sparse output image but has pixels taking on values $\{0, 1, \ldots, 31\}$. In addition to the input image $IN(x,y)$ and output image $OUT(x,y)$, the first step of the Paeth-EPC transformation must also maintain a “prediction” image $PRED(x,y)$, which has pixels taking on values $\{0, 1, \ldots, 31\}$ and is defined as follows:

Although the prediction image is explicitly provided by the Paeth filter, the way to use it to produce an output image is not. For computational simplicity, we choose

At the decoder, suppose the input pixel is $IN_d(x,y)$ and the decoder wishes to recover the original pixel $OUT_d(x,y)$. Then, the decoder will use the earlier decoded pixels $OUT_d(x-1,y)$, $OUT_d(x,y-1)$, and $OUT_d(x-1,y-1)$ to calculate the value of $PRED(x,y)$ and will subsequently compute

We use the same approach to run length encoding as we do for the Corner2-EPC transformation. As we will see in Sec. IV, this approach leads to better compression ratios at the expense of encoding and decoding times.
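A sketch of a Paeth-style predictive transform and its inverse is given below. The predictor follows the Paeth filter as defined for PNG. The rule for forming the output pixel (the difference between the pixel and its prediction, taken modulo the number of gray levels) and the zero treatment of out-of-image neighbors are assumptions made here only so that the output stays in $\{0, 1, \ldots, 31\}$ and remains invertible; they may differ from the choice made for Paeth-EPC. All function names are hypothetical.

```python
def paeth(a, b, c):
    """Paeth predictor: a = left, b = above, c = upper-left neighbor."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def neighbors(cur, prev, x):
    """Left, above, and upper-left values; missing neighbors count as zero (assumption)."""
    a = cur[x - 1] if x > 0 else 0
    b = prev[x] if prev is not None else 0
    c = prev[x - 1] if prev is not None and x > 0 else 0
    return a, b, c


def paeth_encode(img, levels=32):
    out, prev = [], None
    for row in img:
        # Assumed residual rule: (pixel - prediction) mod levels.
        out.append([(v - paeth(*neighbors(row, prev, x))) % levels
                    for x, v in enumerate(row)])
        prev = row
    return out


def paeth_decode(res, levels=32):
    out, prev = [], None
    for row in res:
        cur = []
        for x, r in enumerate(row):
            # The prediction uses only pixels that have already been decoded.
            cur.append((r + paeth(*neighbors(cur, prev, x))) % levels)
        out.append(cur)
        prev = cur
    return out


image = [
    [0, 0, 0, 0],
    [0, 31, 31, 0],
    [0, 31, 20, 0],
]
residual = paeth_encode(image)
recovered = paeth_decode(residual)
```

Because the prediction is always one of the three neighboring values, it lies in the same range as the pixels, and the modular difference keeps the residual in that range as well.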

## IV. RESULTS AND DISCUSSION

There are two circuits for which we provide experimental results. The experiments were performed on Intel i7-2600 CPU processors at 3.40 GHz with 8 GB of RAM using a WD Elements 1 TB portable external drive, a Windows 7 Enterprise operating system, and the electron beam proximity effect correction algorithm of GenISys, Inc., beamer_v4.6.2_x64. The implementations of the algorithms we propose and LDE are written in C/C++; the BC4 algorithm is in C#.

The output of the beamer software is in PNG format, which initially represents the value of each pixel with eight bits instead of five. Therefore, the input to all of the algorithms is initially in PNG format, but this input is subsequently converted to pixels. We define the compression ratio of a layer as

The file sizes contain any overheads needed by the decoder and are all measured in bytes; for example, we used four bytes each to represent the width of the image, the height of the image and the length of the data stream produced at the end of the run length encoding step. The last row of Tables I and III is not the average of the preceding rows in the respective tables, but instead

With the exception of the LineDiffEntropy results, our encoding time measurements include the write time of the compressed file to the portable drive, and our decoding time measurements include the read time of the compressed file from the portable drive. We make an exception with the LineDiffEntropy results because we are not using the original code for this algorithm.

| Layer | PNG | BC4 | LDE | Corner2-EPC (plain) | Corner2-EPC (AC) | Corner2-EPC (deflate) | Paeth-EPC (deflate) |
|---|---|---|---|---|---|---|---|
| 1 | 695.8 | 909.6 | 1310.4 | 1295.1 | 1690.3 | 2766.7 | 3221.9 |
| 2 | 998.2 | 1339.7 | 9832.7 | 44 661.2 | 64 367.8 | 395 852.5 | 500 239.1 |
| 3 | 364.2 | 436.4 | 320.8 | 237.1 | 327.9 | 793.5 | 835.7 |
| 4 | 395.9 | 503.0 | 357.2 | 250.3 | 347.5 | 852.7 | 874.4 |
| 5 | 272.6 | 308.1 | 283.0 | 249.9 | 335.1 | 488.2 | 549.7 |
| 6 | 952.7 | 1364.0 | 113 246.6 | 5 520 602.2 | 6 229 328.2 | 6 447 136.9 | 6 931 884.0 |
| 7 | 1026.6 | 1363.6 | 110 956.9 | 1 132 605.1 | 1 575 966.8 | 2 604 351.9 | 2 664 568.1 |
| 8 | 247.0 | 279.6 | 210.0 | 153.7 | 207.3 | 404.0 | 415.1 |
| 9 | 482.2 | 568.5 | 535.5 | 364.4 | 520.6 | 1020.3 | 1027.9 |
| 10 | 429.5 | 538.7 | 605.8 | 465.3 | 639.1 | 782.6 | 1001.0 |
| 11 | 1026.6 | 1363.9 | 80 659.7 | 2 044 214.1 | 2 908 329.9 | 7 146 826.1 | 7 495 451.8 |
| 12 | 437.3 | 498.5 | 418.6 | 294.0 | 407.4 | 838.0 | 869.5 |
| 13 | 836.0 | 1105.7 | 1864.4 | 2663.5 | 3739.3 | 6273.0 | 6341.9 |
| 14 | 766.2 | 1114.3 | 1469.3 | 3284.2 | 4551.7 | 6694.0 | 6918.9 |
| 15 | 1026.6 | 1364.1 | 117 504.5 | 6 065 398.5 | 6 778 974.8 | 6 680 728.8 | 7 435 004.6 |
| 16 | 770.8 | 1019.1 | 1420.4 | 1935.7 | 2742.8 | 5089.4 | 5114.9 |
| 17 | 1025.4 | 1362.9 | 83 980.7 | 717 463.5 | 911 008.5 | 1 403 258.1 | 1 501 531.9 |
| 18 | 1025.4 | 1363.4 | 85 961.8 | 1 074 522.8 | 1 268 143.8 | 1 477 468.9 | 1 790 175.9 |
| 19 | 1026.6 | 1364.3 | 116 040.3 | 14 870 009.2 | 15 895 527.1 | 18 438 811.4 | 20 042 186.4 |
| 20 | 1025.4 | 1362.9 | 83 866.1 | 704 309.1 | 896 829.4 | 1 390 558.9 | 1 487 000.9 |
| 21 | 1026.6 | 1364.3 | 118 258.2 | 11 381 982.4 | 12 458 656.4 | 11 670 133.8 | 13 170 579.6 |
| 22 | 1016.2 | 1364.3 | 119 592.8 | 17 072 973.6 | 17 729 626.4 | 16 174 396.0 | 17 729 626.4 |
| 23 | 1026.6 | 1364.3 | 118 258.2 | 11 381 982.4 | 12 458 656.4 | 11 670 133.8 | 13 170 579.6 |
| 24 | 926.3 | 1331.3 | 20 586.4 | 26 130.6 | 38 607.2 | 73 397.1 | 75 879.9 |
| 25 | 946.7 | 1362.7 | 88 682.2 | 863 240.2 | 1 322 726.8 | 1 726 480.5 | 1 866 276.5 |
| All | 643.4 | 809.0 | 1083.5 | 860.6 | 1181.2 | 2213.5 | 2377.1 |


Layer | PNG | BC4 | LDE | Corner2-EPC (plain) | Corner2-EPC (AC) | Corner2-EPC (deflate) | P-EPC (deflate) |
---|---|---|---|---|---|---|---|
1 | 1014.4 | 1357.3 | 54 948.4 | 160 735.2 | 243 394.2 | 347 199.8 | 415 734.3 |
2 | 958.6 | 1320.3 | 5116.2 | 25 191.7 | 36 472.7 | 49 379.3 | 60 172.4 |
3 | 899.1 | 1230.4 | 2102.1 | 7327.9 | 10 033.9 | 16 600.7 | 20 200.2 |
4 | 948.0 | 1307.3 | 3482.1 | 18 693.6 | 25 698.2 | 41 625.2 | 50 916.3 |
5 | 887.0 | 1227.7 | 1745.4 | 5501.9 | 7950.7 | 17 830.3 | 21 953.6 |
6 | 870.2 | 1205.8 | 1635.4 | 4800.1 | 6448.9 | 15 171.1 | 18 603.2 |
7 | 412.6 | 613.5 | 77.6 | 54.5 | 117.7 | 1483.7 | 1889.5 |
8 | 761.6 | 1035.4 | 871.6 | 2392.9 | 3251.0 | 5253.5 | 6428.3 |
9 | 712.9 | 1018.5 | 462.2 | 454.0 | 810.1 | 7799.9 | 8177.1 |
10 | 788.4 | 1083.4 | 780.6 | 2655.2 | 3652.9 | 7294.8 | 8397.7 |
11 | 991.1 | 1349.3 | 28 198.8 | 76 023.5 | 131 744.4 | 390 702.9 | 452 622.2 |
12 | 751.3 | 1063.4 | 607.8 | 573.2 | 1020.4 | 10 684.0 | 11 297.3 |
13 | 885.3 | 1230.5 | 1628.8 | 5741.0 | 8055.5 | 20 852.0 | 24 221.6 |
14 | 753.6 | 1064.2 | 635.4 | 595.8 | 1057.7 | 11 648.5 | 12 247.6 |
15 | 888.4 | 1234.5 | 1757.3 | 5987.9 | 8463.0 | 22 302.9 | 25 905.7 |
16 | 760.8 | 1082.9 | 673.8 | 653.0 | 1187.6 | 14 305.8 | 14 921.1 |
17 | 890.1 | 1239.8 | 1978.6 | 5890.1 | 8707.9 | 25 277.4 | 29 534.5 |
18 | 979.4 | 1319.8 | 5712.8 | 24 011.9 | 34 993.9 | 71 174.4 | 83 414.3 |
All | 808.6 | 1128.7 | 685.8 | 654.7 | 1293.1 | 10 242.8 | 12 172.3 |

The first circuit we consider is a 25-layer image compression block based on the FREEPDK45 45 nm library with a minimum element of 60 nm. We use a pixel size of 30 nm × 30 nm, and each layout image consists of 30 403 × 30 324 pixels. For both circuits, we experienced a memory shortage in the encoding process when we attempted to run BC4 on an entire layout image, and hence we had to segment the image into the largest components to which BC4 could be applied. In the case of the image compression block, we split each layout image into four segments, which are approximately quadrants of the image. The experiments for all other algorithms were run on full layout images.

The second circuit we study is an 18-layer binary frequency shift keying (BFSK) transmitter targeting 250 nm lithography technology. We use a pixel size of 40 nm × 40 nm, and each layout image consists of 79 050 × 79 050 pixels. For the BC4 experiments, we split each layout image into 28 segments where each segment consists of about one quarter of the rows and one seventh of the columns.
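The segmentation used for the BC4 runs can be sketched as a near-equal tiling of the pixel grid. This is only an illustration: the function names are hypothetical, the exact segment boundaries used in the experiments are not stated, and the image dimensions are assumed to be given as rows × columns.

```python
def split_points(length, parts):
    """Return parts + 1 boundaries that cut range(length) into near-equal pieces."""
    return [i * length // parts for i in range(parts + 1)]


def segment_bounds(n_rows, n_cols, row_parts, col_parts):
    """List (row_start, row_end, col_start, col_end) for each tile, ends exclusive."""
    rows = split_points(n_rows, row_parts)
    cols = split_points(n_cols, col_parts)
    return [
        (rows[i], rows[i + 1], cols[j], cols[j + 1])
        for i in range(row_parts)
        for j in range(col_parts)
    ]


# BFSK transmitter: about one quarter of the rows by one seventh of the columns.
bfsk_segments = segment_bounds(79050, 79050, 4, 7)

# Image compression block: approximately quadrants (assumed 2 x 2 split).
block_segments = segment_bounds(30403, 30324, 2, 2)


def total_area(segments):
    """Number of pixels covered by a list of non-overlapping tiles."""
    return sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in segments)
```

Because neither 79 050 nor the block dimensions divide evenly by the number of parts, tile sizes differ by at most one row or column, which matches the "about" and "approximately" in the description.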

We briefly comment on the proximity effect correction algorithm we used and the dose assignments for primitives. In the layout images used for the experiments, most of the shapes were small enough that a single calculated dose was applied to each shape, and they did not require fracturing into smaller primitives for dose adjustment. If a shape is large enough and is influenced by other nearby shapes, the proximity effect correction algorithm of the GenISys software BEAMER will physically fracture it into smaller primitive shapes to achieve the proper doses for maintaining the critical dimension and the edge placement accuracy of the whole original shape. The proximity effect correction algorithm is not applied on a pixel-by-pixel basis. When edges had to be moved because of differences between the design grid and the writing grid, this was done after the proximity effect correction algorithm was applied, and it could generate an additional one to eight primitives, which could be as small as one pixel.

The earlier image compression algorithms we consider in our experiments are the PNG standard, the BC4 algorithm, and the LDE algorithm. For the four algorithms of ours for which we report results, we use the parameters $M=N=64$ and set *L* to be the number of pixels in a row of the layout image. We report results for the Corner2-EPC algorithm without additional entropy encoding [Corner2-EPC (plain)], the Corner2-EPC algorithm followed by arithmetic coding [Corner2-EPC (AC)], the Corner2-EPC algorithm followed by deflate [Corner2-EPC (deflate)], and the Paeth-EPC algorithm followed by deflate [P-EPC (deflate)].
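The "transform followed by deflate" structure of the deflate variants can be illustrated with a toy pipeline. The sketch below is not Corner2-EPC or Paeth-EPC: it substitutes a simple XOR with the previous row for the first transformation step, uses Python's `zlib` module (which wraps a deflate stream), and assumes one byte per pixel and a compression ratio defined as uncompressed size over compressed size. All function names are hypothetical.

```python
import zlib


def xor_rows(rows):
    """Replace each row by its XOR with the previous row (the first row is kept)."""
    out, prev = [], bytes(len(rows[0]))
    for row in rows:
        out.append(bytes(a ^ b for a, b in zip(row, prev)))
        prev = row
    return out


def unxor_rows(rows):
    """Invert xor_rows by accumulating XORs from the top row down."""
    out, prev = [], bytes(len(rows[0]))
    for row in rows:
        prev = bytes(a ^ b for a, b in zip(row, prev))
        out.append(prev)
    return out


def encode(rows):
    """Stand-in transform, then deflate at the highest compression level."""
    return zlib.compress(b"".join(xor_rows(rows)), 9)


def decode(blob, width):
    """Inflate, cut the byte stream back into rows, and undo the transform."""
    raw = zlib.decompress(blob)
    rows = [raw[i:i + width] for i in range(0, len(raw), width)]
    return unxor_rows(rows)


# Synthetic 256 x 256 image with two rectangles carrying different dose values.
width = height = 256
canvas = [bytearray(width) for _ in range(height)]
for r in range(40, 120):
    canvas[r][30:90] = bytes([17]) * 60
for r in range(150, 200):
    canvas[r][100:220] = bytes([23]) * 120
image = [bytes(row) for row in canvas]

blob = encode(image)
ratio = (width * height) / len(blob)  # assumed definition of compression ratio
```

The decoder mirrors the encoder in reverse order, which is the property that matters for the decoding-time comparisons: the entropy decoder runs first and the inverse transform runs on its output.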

Tables I and II provide the compression ratios and a summary of encoding and decoding time statistics for the image compression block, and Tables III and IV provide the corresponding results for the BFSK transmitter circuit. We include the maximum of the decoding times among layers because this worst case may be important in a production environment. In all cases, we provide an approximation of the PNG encoding and decoding times, as well as an underestimate of the LineDiffEntropy encoding and decoding times. The PNG “encoding” time of an image is the time used by the libpng library to write the compressed file to disk one row at a time, and the “decoding” time is the time used by the libpng library to read the compressed file from the disk 5000 rows at a time. The LineDiffEntropy encoding and decoding times do not include the times to, respectively, write to and read from the disk.
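A minimal sketch of how per-layer timings can be reduced to best, worst, and average values, with disk I/O kept outside the timed region, is given below. It assumes that "average" means the arithmetic mean over layers; the helper names and the synthetic layers are hypothetical, and `zlib.decompress` merely stands in for a decoder.

```python
import time
import zlib
from statistics import mean


def time_decode(blob, decoder=zlib.decompress):
    """Time one in-memory decode; reading the blob from disk is not included."""
    start = time.perf_counter()
    data = decoder(blob)
    return time.perf_counter() - start, data


def summarize(times):
    """Best, worst, and average of per-layer times, in seconds."""
    return min(times), max(times), mean(times)


# Hypothetical stand-ins for layers (not the data used in the experiments).
layers = [bytes([k]) * 1_000_000 for k in range(3)]
blobs = [zlib.compress(layer) for layer in layers]

decode_times = []
for blob, layer in zip(blobs, layers):
    elapsed, data = time_decode(blob)
    assert data == layer  # the round trip must be lossless
    decode_times.append(elapsed)

best, worst, average = summarize(decode_times)
```

Timing only the in-memory call corresponds to the underestimate described for LineDiffEntropy, whereas the PNG figures also include file access.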

Algorithm | Encoding best (s) | Encoding worst (s) | Encoding average (s) | Decoding best (s) | Decoding worst (s) | Decoding average (s) |
---|---|---|---|---|---|---|
PNG | 14.24 | 16.62 | 15.69 | 0.83 | 3.75 | 1.60 |
BC4 | 3464.49 | 3581.30 | 3500.66 | 72.39 | 113.67 | 89.32 |
LDE | 1.75 | 2.29 | 1.87 | 0.88 | 4.99 | 2.01 |
Corner2-EPC (plain) | 3.75 | 4.12 | 3.84 | 2.00 | 2.69 | 2.19 |
Corner2-EPC (AC) | 3.75 | 5.04 | 4.04 | 1.99 | 3.69 | 2.41 |
Corner2-EPC (deflate) | 3.26 | 3.74 | 3.37 | 1.97 | 2.63 | 2.19 |
P-EPC (deflate) | 7.25 | 7.91 | 7.42 | 5.13 | 6.00 | 5.50 |

Algorithm | Encoding best (s) | Encoding worst (s) | Encoding average (s) | Decoding best (s) | Decoding worst (s) | Decoding average (s) |
---|---|---|---|---|---|---|
PNG | 98.34 | 109.51 | 102.73 | 6.64 | 27.41 | 16.70 |
BC4 | 23 126.36 | 23 751.60 | 23 400.18 | 487.28 | 937.71 | 599.39 |
LDE | 11.79 | 19.62 | 12.66 | 6.77 | 43.14 | 19.05 |
Corner2-EPC (plain) | 27.44 | 34.05 | 28.03 | 14.10 | 20.10 | 15.66 |
Corner2-EPC (AC) | 27.26 | 46.41 | 29.15 | 13.86 | 37.73 | 17.04 |
Corner2-EPC (deflate) | 27.25 | 31.96 | 27.92 | 13.89 | 19.76 | 15.46 |
P-EPC (deflate) | 51.16 | 55.67 | 52.33 | 36.62 | 41.95 | 38.20 |

In terms of compression ratios, the proposed algorithms that use deflate outperform PNG, BC4, and LDE on every layer of both circuits, and P-EPC (deflate) attains the highest ratio on every layer, tied only with Corner2-EPC (AC) on layer 22 of the image compression block. The original Lempel–Ziv codes are known^{21,43} to asymptotically attain maximum data compression rates for certain types of one-dimensional data, but there are no comparable results for their performance on two-dimensional data. Although the proposed algorithms do not use a frequent pattern replacement step resembling the one in Corner2, the deflate algorithm effectively plays a similar role. We also observe that most of the compression from our proposed algorithms comes from the first transformation step.

In terms of encoding times, we provide an approximation for the PNG algorithm and an underestimate for the LineDiffEntropy algorithm. The LineDiffEntropy algorithm has some additional advantage over the proposed algorithms because it does not use an image transformation technique and it has one run-length encoding part instead of two. However, the algorithms we provide have reasonable encoding times.

Decoding times are important for the electron-beam direct-write application. Here, the plain and deflate versions of Corner2-EPC have the lowest worst-case decoding times of a layer for both circuits, and for the BFSK transmitter, the deflate version also has the lowest total decoding time over all layers.
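The comparison can be checked directly from the worst-case and average decoding times reported in the two timing tables. The sketch below copies those values and assumes that the table with the smaller times belongs to the image compression block and the other to the BFSK transmitter; the helper name `fastest` is hypothetical.

```python
# Decoding times in seconds, copied from the timing tables as (worst, average).
block = {
    "PNG": (3.75, 1.60),
    "BC4": (113.67, 89.32),
    "LDE": (4.99, 2.01),
    "Corner2-EPC (plain)": (2.69, 2.19),
    "Corner2-EPC (AC)": (3.69, 2.41),
    "Corner2-EPC (deflate)": (2.63, 2.19),
    "P-EPC (deflate)": (6.00, 5.50),
}
bfsk = {
    "PNG": (27.41, 16.70),
    "BC4": (937.71, 599.39),
    "LDE": (43.14, 19.05),
    "Corner2-EPC (plain)": (20.10, 15.66),
    "Corner2-EPC (AC)": (37.73, 17.04),
    "Corner2-EPC (deflate)": (19.76, 15.46),
    "P-EPC (deflate)": (41.95, 38.20),
}

WORST, AVERAGE = 0, 1


def fastest(table, column):
    """Name of the algorithm with the smallest time in the given column."""
    return min(table, key=lambda name: table[name][column])


block_worst = fastest(block, WORST)
bfsk_worst = fastest(bfsk, WORST)
bfsk_average = fastest(bfsk, AVERAGE)
```

Since every algorithm decodes the same number of layers, ranking by average decoding time is equivalent to ranking by total decoding time over all layers.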

As we suggested earlier, the Paeth-EPC algorithms improve the compression ratios of the Corner2-EPC algorithms at the expense of encoding and decoding times. The individual compression gains of both Paeth and deflate are greater in the BFSK transmitter circuit than in the image compression block; we speculate that this is partly because there is more alignment of patterns within the BFSK transmitter circuit than within the image compression block.

## V. SUMMARY AND CONCLUSIONS

We have presented a group of layout image compression algorithms that offer high compression ratios and show trade-offs between compression ratios and decoding times. The Block C4 algorithm offers relatively uniform compression ratios among layers. The LineDiffEntropy algorithm offers low decoding times for sparse layers. The Corner2-EPC algorithm with no additional entropy encoding appears to have the simplest decoding in terms of memory requirements. The Corner2-EPC algorithm with arithmetic coding offers a compromise between simplicity of decoding and compression ratios. The Corner2-EPC algorithm with deflate offers the best overall compression ratios among the Corner2-EPC variants without compromising decompression time, but it may have higher memory requirements than the previous algorithms. The Paeth-EPC algorithms offer the best compression ratios at the expense of decoding time and memory.

For future research, Lin^{44} discussed the need to add redundancy to layout image data being written to the REBL system to reduce sensitivity to contamination on the digital pattern generator; it would be desirable to understand how this issue affects the data delivery problem for the REBL system.

## ACKNOWLEDGMENTS

This work was supported in part by NSF Grant No. ECCS-1201994 and was made possible through an Agreement of Cooperation between GenISys, Inc. and Texas A&M Engineering Experiment Station at College Station. The authors also thank S. Khatri, D. Lie, and S. Mukhopadhyay for their circuit data and V. Dai for providing an implementation of the Block C4 algorithm. The authors are grateful to J. Yang for providing them with an implementation of the Corner2 algorithm and for other helpful correspondence.