# 2nd Summit on Advances in Programming Languages (SNAPL 2017)

## Transcription

2nd Summit on Advances in Programming Languages, SNAPL 2017, May 7–10, 2017, Asilomar, CA, USA. Edited by Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi. LIPIcs Vol. 71, SNAPL. www.dagstuhl.de/lipics

Editors: Benjamin S. Lerner (Northeastern University, USA), Rastislav Bodík (University of California Berkeley, USA), and Shriram Krishnamurthi (Brown University, USA). ACM Classification 1998: D.3 Programming Languages.

Published online and open access by Schloss Dagstuhl Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, Saarbrücken/Wadern, Germany. Publication date: May, 2017. Bibliographic information published by the Deutsche Nationalbibliothek: the Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet.

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC-BY 3.0). In brief, this license authorizes each and everybody to share (to copy, distribute and transmit) the work under the following conditions, without impairing or restricting the authors' moral rights: Attribution: the work must be attributed to its authors. The copyright is retained by the corresponding authors. Digital Object Identifier: /LIPIcs.SNAPL. ISBN. ISSN.

LIPIcs, the Leibniz International Proceedings in Informatics, is a series of high-quality conference proceedings across all fields in informatics. LIPIcs volumes are published according to the principle of Open Access, i.e., they are available online and free of charge. Editorial Board: Susanne Albers (TU München), Chris Hankin (Imperial College London), Deepak Kapur (University of New Mexico), Michael Mitzenmacher (Harvard University), Madhavan Mukund (Chennai Mathematical Institute), Catuscia Palamidessi (INRIA), Wolfgang Thomas (Chair, RWTH Aachen), Pascal Weil (CNRS and University Bordeaux), Reinhard Wilhelm (Saarland University). ISSN.


Contents

- Preface. Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi. 0:vii
- Everest: Towards a Verified, Drop-in Replacement of HTTPS. Karthikeyan Bhargavan, Barry Bond, Antoine Delignat-Lavaud, Cédric Fournet, Chris Hawblitzel, Cătălin Hriţcu, Samin Ishtiaq, Markulf Kohlweiss, Rustan Leino, Jay Lorch, Kenji Maillard, Jianyang Pang, Bryan Parno, Jonathan Protzenko, Tahina Ramananandro, Ashay Rane, Aseem Rastogi, Nikhil Swamy, Laure Thompson, Peng Wang, Santiago Zanella-Béguelin, and Jean-Karim Zinzindohoué. 1:1–1:12
- Domain-Specific Symbolic Compilation. Rastislav Bodik, Kartik Chandra, Phitchaya Mangpo Phothilimthana, and Nathaniel Yazdani. 2:1–2:17
- The End of History? Using a Proof Assistant to Replace Language Design with Library Design. Adam Chlipala, Benjamin Delaware, Samuel Duchovni, Jason Gross, Clément Pit-Claudel, Sorawit Suriyakarn, Peng Wang, and Katherine Ye. 3:1–3:15
- Natural Language is a Programming Language: Applying Natural Language Processing to Software Development. Michael D. Ernst. 4:1–4:14
- Fission: Secure Dynamic Code-Splitting for JavaScript. Arjun Guha, Jean-Baptiste Jeannin, Rachit Nigam, Rian Shambaugh, and Jane Tangen. 5:1–5:13
- I Can Parse You: Grammars for Dialogs. Martin Hirzel, Louis Mandel, Avraham Shinnar, Jérôme Siméon, and Mandana Vaziri. 6:1–6:15
- Leveraging Sequential Computation for Programming Efficient and Reliable Distributed Systems. Ivan Kuraj and Armando Solar-Lezama. 7:1–7:15
- Intermittent Computing: Challenges and Opportunities. Brandon Lucia, Vignesh Balaji, Alexei Colin, Kiwan Maeng, and Emily Ruppel. 8:1–8:14
- Uncanny Valleys in Declarative Language Design. Mark S. Miller, Daniel von Dincklage, Vuk Ercegovac, and Brian Chin. 9:1–9:12
- Programming Language Tools and Techniques for 3D Printing. Chandrakana Nandi, Anat Caspi, Dan Grossman, and Zachary Tatlock. 10:1–10:12
- Toward Semantic Foundations for Program Editors. Cyrus Omar, Ian Voysey, Michael Hilton, Joshua Sunshine, Claire Le Goues, Jonathan Aldrich, and Matthew A. Hammer. 11:1–11:12
- Linking Types for Multi-Language Software: Have Your Cake and Eat It Too. Daniel Patterson and Amal Ahmed. 12:1–12:15

2nd Summit on Advances in Programming Languages (SNAPL 2017). Editors: Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi. Leibniz International Proceedings in Informatics. Schloss Dagstuhl Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

- Teaching Programming Languages by Experimental and Adversarial Thinking. Justin Pombrio, Shriram Krishnamurthi, and Kathi Fisler. 13:1–13:9
- Let's Fix OpenGL. Adrian Sampson. 14:1–14:12
- Search for Program Structure. Gabriel Scherer. 15:1–15:14
- AP: Artificial Programming. Rishabh Singh and Pushmeet Kohli. 16:1–16:12
- Migratory Typing: Ten Years Later. Sam Tobin-Hochstadt, Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Ben Greenman, Andrew M. Kent, Vincent St-Amour, T. Stephen Strickland, and Asumu Takikawa. 17:1–17:17
- Taming the Static Analysis Beast. John Toman and Dan Grossman. 18:1–18:14
- Programming Language Abstractions for Modularly Verified Distributed Systems. James R. Wilcox, Ilya Sergey, and Zachary Tatlock. 19:1–19:12

Preface

This is the second running of the Summit on Advances in Programming Languages (SNAPL), a new venue for the programming languages community. The goal of SNAPL is to complement existing conferences by discussing big-picture questions. After the success of the first SNAPL in 2015, we hope to continue growing the venue into a place where our community comes to enjoy talks with inspiring ideas, fresh insights, and lots of discussion. Open to perspectives from both industry and academia, SNAPL values innovation, experience-based insight, and vision. Not affiliated with any other organization, SNAPL is organized by the PL community for the PL community. We planned to hold SNAPL every two years in early May in Asilomar, California, and this second running is consistent with that plan. SNAPL has drawn on elements from many successful meeting formats, such as the database community's CIDR conference and the security community's NSPW workshop, and continues to evolve its own particular flavor. The focus of SNAPL is not primarily on papers but rather on talks and interaction. Nevertheless, a short paper is the primary medium by which authors request and obtain time to speak. A good SNAPL entry, however, does not have the character of a regular conference submission; we already have plenty of venues for those. Rather, it is closer to the character of an invited talk, encompassing all the diversity that designation suggests: visionary idea, progress report, retrospective, analysis of mistakes, call to action, and more. Thus, a SNAPL submission should be viewed more as a request to give an invited talk. In the process of assembling SNAPL 2017, we also realized that SNAPL serves an additional useful role: serving as a discussion venue for programming languages akin to the role the Snowbird conference plays for computer science chairs and deans.
In this spirit, we invited the community to suggest the names of junior researchers in programming languages who would be interesting to invite, and selected a few names from this long and impressive list. We also intend to invite a few senior researchers to address the gathering. Overall, the submissions suggest SNAPL remains an interesting and valuable venue. Its main weakness was fewer submissions than we expected (28, but of sufficient quality that we were able to accept 18). We have heard of three problems with the organization: the submission date was poorly timed (January 6 was too close to or amidst vacation time), there was insufficiently broad publicity, and the chosen date clashed with a few other venues. The first two of these, in particular, seem easy to remedy in the future.

Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi. May 2017


Everest: Towards a Verified, Drop-in Replacement of HTTPS

Karthikeyan Bhargavan, Barry Bond, Antoine Delignat-Lavaud, Cédric Fournet, Chris Hawblitzel, Cătălin Hriţcu, Samin Ishtiaq, Markulf Kohlweiss, Rustan Leino, Jay Lorch, Kenji Maillard, Jianyang Pang, Bryan Parno, Jonathan Protzenko, Tahina Ramananandro, Ashay Rane, Aseem Rastogi, Nikhil Swamy, Laure Thompson, Peng Wang, Santiago Zanella-Béguelin, and Jean-Karim Zinzindohoué. Affiliations: Inria, Paris, France (Bhargavan, Hriţcu, Maillard, Pang, Zinzindohoué); Microsoft Research, Cambridge, UK (Delignat-Lavaud, Fournet, Ishtiaq, Kohlweiss, Wang, Zanella-Béguelin); Microsoft Research, Bangalore, India (Rastogi); Microsoft Research, Redmond, WA, USA (the others).

Abstract. The HTTPS ecosystem is the foundation on which Internet security is built. At the heart of this ecosystem is the Transport Layer Security (TLS) protocol, which in turn uses the X.509 public-key infrastructure and numerous cryptographic constructions and algorithms. Unfortunately, this ecosystem is extremely brittle, with headline-grabbing attacks and emergency patches many times a year. We describe our ongoing efforts in Everest,¹ a project that aims to build and deploy a verified version of TLS and other components of HTTPS, replacing the current infrastructure with proven, secure software.
Aiming both at full verification and usability, we conduct high-level, code-based, game-playing proofs of security on cryptographic implementations that yield efficient, deployable code at the level of C and assembly. Concretely, we use F*, a dependently typed language for programming, meta-programming, and proving at a high level, while relying on low-level DSLs embedded within

¹ The Everest VERified End-to-end Secure Transport: https://project-everest.github.io.

Licensed under Creative Commons License CC-BY. 2nd Summit on Advances in Programming Languages (SNAPL 2017). Editors: Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi; Article No. 1; pp. 1:1–1:12. Leibniz International Proceedings in Informatics. Schloss Dagstuhl Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

F* for programming low-level components when necessary for performance and, sometimes, side-channel resistance. To compose the pieces, we compile all our code to source-like C and assembly, suitable for deployment and integration with existing code bases, as well as for audit by independent security experts. Our main results so far include (1) the design of Low*, a subset of F* designed for C-like imperative programming but with high-level verification support, and KreMLin, a compiler that extracts Low* programs to C; (2) an implementation of the TLS-1.3 record layer in Low*, together with a proof of its concrete cryptographic security; (3) Vale, a new DSL for verified assembly language, and several optimized cryptographic primitives proven functionally correct and side-channel resistant. In an early deployment, all our verified software is integrated and deployed within libcurl, a widely used library of networking protocols.

ACM Subject Classification: F.3.1 Specifying and Verifying and Reasoning about Programs. Keywords and phrases: Security, Cryptography, Verification, TLS. Digital Object Identifier: /LIPIcs.SNAPL.

1 Introduction

The Internet's core security infrastructure is frighteningly brittle. As more and more services rely on encryption, security best practices urge developers to use standard, widely-used components like HTTPS and SSL (the latter is now standardized as TLS). As a result, the same pervasive components are used for securing communications on the Web and for VoIP, VPNs, and the IoT. Unfortunately, these standard components are themselves often broken in many ways. Even before recent headline-grabbing attacks like HeartBleed,² FREAK,³ and Logjam,⁴ entire papers [21, 10] were published just to summarize all of the academically interesting ways TLS implementations have been broken, without even getting into boring vulnerabilities like buffer overflows and other basic coding mistakes.
This tide of flaws shows no signs of abating; in the year after those papers were published, 54 new CVE security advisories were published just for TLS. These flaws are frequently found and fixed in all of the widely used TLS implementations, as well as in the larger HTTPS ecosystem. They span a wide gamut, including memory management mistakes, errors in protocol state machines, lax X.509 certificate parsing and validation, weak or badly implemented cryptographic algorithms, side channels, and even genuine design flaws⁵ in the standards. Furthermore, because many TLS implementations expose truly terrible APIs, HTTPS applications built on them regularly make devastating mistakes [12]. These persistent problems have generated sufficient industry concern that both Google and the OpenBSD project are building separate forks of OpenSSL (BoringSSL and LibreSSL, respectively), while Amazon is developing a brand new implementation (s2n). Many corporations have even joined the multi-million-dollar Core Infrastructure Initiative to support additional testing and security auditing of open-source projects, starting with OpenSSL.

² https://heartbleed.com/
³ https://freakattack.com/
⁴ https://weakdh.org/
⁵ https://mitls.org/pages/attacks/3shake

Figure 1: Everest architecture.

1.1 A Need for Verified Deployments Now

While the industry is taking incremental steps to try to stem the persistent tide of vulnerabilities, the programming-language community is uniquely positioned to definitively solve this problem. The science of software verification has progressed to a point where a large team of experts can reasonably expect to build a fully verified software stack, e.g., seL4 [16], Ironclad [14], and CertiKOS [13], with still more ambitious, broadly ranging efforts already underway. Yet, even when augmented with secure communication components like TLS, a fully verified stack would not meet today's pressing needs, since a wholesale replacement of the software stack is not in the offing. Improving the current software landscape requires both a dramatic shift in software development methodology and an incremental approach to the deployment of verified software. Everest is a new joint project between Microsoft Research and INRIA that aims to build verified software components and deploy them within the existing software ecosystem. Specifically, Everest develops new implementations of existing protocols and standards used for web communications. At a minimum, we prove our implementations functionally correct. Beyond functional correctness, we integrate cryptographic modeling and protocol analysis to prove, by reduction to computational assumptions on their core algorithms, that our implementations provide secure-channel abstractions between the communicating endpoints. Our verified code is implemented in F* [25, 1], a dependently typed programming language and proof assistant, and in several DSLs embedded within F*.
After verification, in support of incremental deployment, our code is extracted by verified tools to C and assembly, and compiled further by off-the-shelf C compilers (e.g., gcc and clang, but also, at a performance cost, verified compilers like CompCert [18]) and composed with adapters that interface our verified code with existing software components, like the web browsers, servers, and other communication software shown in Figure 1. Because a system is only as strong as its weakest component, software systems that include verified Everest code may not be impervious to attack; yet, any attack such a system suffers will be attributable to a flaw in a component outside Everest, while simple, critical systems may be within reach of full verification with a reasonable marginal effort.

Figure 2: Low* embedded in F*, compiled to C (via erasure to first-order F*, KreMLin, and Clight, then GCC/CompCert/MSVC), with soundness and security guarantees.

1.2 Structure of this Paper

Everest is a work in progress; we present an overview of the methodology we have used so far. Our main guiding principle is to provide low-level, efficient implementations of various protocol standards by extracting them from fresh code, programmed and verified at a high level of abstraction. This principle applies best to relatively small and complex code, such as a secure protocol stack. To this end, in §2, we present Low*, an embedded sub-language of F* geared towards programming against a C-like memory model, while specifying and proving these programs within F*'s dependent type theory. Low* programs are extracted to C by a new tool called KreMLin: although we have yet to verify the implementation of KreMLin, we have formally modeled its translation and proved on paper that it preserves the functionality of programs down to the level of CompCert's Clight. We have also shown that compilation from Low* to Clight does not introduce any side-channels due to memory access patterns. In §3, we sketch several examples of verified code and their specifications in Low*, showing how we state and prove memory safety, functional correctness, and cryptographic security. In §4, we discuss a few strands of ongoing work. This includes the design of Vale, a new DSL for verified assembly language programming, and its use in producing even lower-level, efficient, functionally correct implementations of the AES and SHA256 algorithms whose performance is on par with OpenSSL, the most widely used implementation. We also discuss an early deployment of Everest software as a drop-in replacement for TLS in libcurl, and its use from within a command-line git client.
Finally, §5 presents some parting thoughts, covering some opportunities and challenges. For more technical details, we refer the reader to three recent papers describing our verification of the TLS-1.3 record layer [6]; the design and implementation of Low* and KreMLin [5]; and the design and implementation of Vale [9].

2 Low*: Verified Low-level Programming Embedded in F*

We aim to bridge the gap between high-level, safe-by-construction code, optimized for clarity and ease of verification, and low-level code exerting fine control over data representations and memory layout in order to achieve better performance. Towards this end, we introduce Low*, a DSL for verified, efficient low-level programming, embedded within F*, a verification-oriented, ML-like language with dependent types and SMT-based automation [11]. Figure 2 illustrates the high-level design of Low* and its compilation to native code.

Libraries for low-level programming within F*. At its core, F* is a purely functional language to which effects like state are added programmatically using monads. We instantiate the state monad of F* to use a CompCert-like structured memory model [18, 19] that separates the stack and the heap, supporting transient allocation on the stack, and allocating and freeing individual reference cells on the heap; this is not the "big array of bytes" model systems programmers sometimes use. The heap is organized into disjoint logical regions, which enables stating the separation properties necessary for modular, stateful verification. On top of this, we program a library of buffers: C-style arrays passed by reference, with support for pointer arithmetic and for referring to only part of an array. By virtue of F* typing, our libraries and all their well-typed clients are guaranteed to be memory safe, e.g., they never access out-of-bounds or deallocated memory.

Designing Low*, a subset of F* easily compiled to C. We intend to give Low* programmers precise control over the performance profile of the generated C code. As much as possible, we aim for the programmer to have control even over the syntactic structure of the target C code, to facilitate its review by security experts unfamiliar with F*. As such, to a first approximation, Low* programs are F* programs well-typed in the state monad described above which, after all their computationally irrelevant (ghost) parts have been erased, must satisfy several requirements. Specifically, the code (1) must be first order, to prevent the need to allocate closures in C; (2) must not perform any implicit allocations; (3) must not use recursive datatypes, since these would have to be compiled using additional indirections to C structs; and (4) must be monomorphic, since C does not support polymorphism directly.
We emphasize that these restrictions apply only to computationally relevant code: proofs and specifications are free to use arbitrary higher-order, dependently typed F*.

A dual interpretation of Low*, via compilation to OCaml or Clight. Low* programs interoperate naturally with other F* programs, and precise specifications of Low* and F* code are intermingled when proving properties of their combination. As usual in F*, programs are type-checked and compiled to OCaml for execution, after erasing the computationally irrelevant parts of a program, e.g., proofs and specifications, using a process similar to Coq's extraction mechanism [20]. Importantly, Low* programs have a second, equivalent but more efficient semantics via compilation to C, with a predictable performance model including manual memory management; this is implemented by KreMLin, a new compiler from the Low* subset of F* to C. Justifying its dual interpretation as a subset of F* and a subset of C, we give a translation from Low*, via two intermediate languages, to CompCert's Clight [8] and show that it preserves trace equivalence with respect to the original F* semantics of the program. In addition to ensuring that the functional behavior of a program is preserved, our trace equivalence also guarantees that the compiler does not introduce unexpected side-channels due to memory access patterns, at least until it reaches Clight, a useful sanity check for cryptographic code.

KreMLin, a compiler from Low* to C. Our formal model guides the implementation of KreMLin, a new tool that emits C code from Low*. Although the implementation of KreMLin is not verified yet, we plan to verify its main translation phases based on our formal model in the near future. KreMLin is designed to emit well-formatted, idiomatic C code suitable for manual review and audit by independent security experts unfamiliar with our verification methodology. The resulting C programs can be compiled with CompCert

for greatest assurance, and with mainstream C compilers, including GCC and Clang, for greatest performance. We have used KreMLin to extract to C the 20,000+ lines of Low* code we have written so far. Our formal results cover the translation of standalone Low* programs to C, proving that execution in C preserves the original F* semantics of the Low* program. More pragmatically, we have built several cryptographic libraries in Low* and compiled them to ABI-compatible C, allowing them to be used as drop-in replacements for existing libraries in C or any other language. The performance of our verified code after KreMLin extraction is comparable to that of existing (unverified) cryptographic libraries in C.

3 Proving Cryptographic Implementations in Low*

In this section, we sketch a few simple fragments of code from HACL*, a high-assurance cryptographic library programmed and verified in Low* and used in our verified implementation of the TLS-1.3 record layer. First, we illustrate functional correctness properties proven of an efficient implementation of the Poly1305 Message Authentication Code (MAC) algorithm [2]. Then, we discuss our model of game-based cryptography in F* and its use in proving security of the main authenticated encryption construction used in TLS 1.3.

3.1 Functional Correctness of Poly1305

Arithmetic for the Poly1305 MAC algorithm is performed modulo the prime 2^130 - 5, i.e., the algorithm works in the finite field GF(2^130 - 5). To specify modular arithmetic in this field in F*, we make use of refinement types, as shown below.
```fstar
let p = 2^130 - 5 (* the prime order of the field *)
type elem = n:nat{n < p} (* abstract field element *)
let ( + ) (x y : elem) : elem = (x + y) % p (* field addition *)
let ( * ) (x y : elem) : elem = (x * y) % p (* field multiplication *)
```

This code uses F*'s infinite-precision natural numbers (nat) to define the prime order p of the field and the type of field elements, elem, inhabited by natural numbers n smaller than p. It also defines two infix operators for addition and multiplication in the field in terms of arithmetic on nat. Their result is annotated with elem, to indicate that these operations return field elements. The F* typechecker automatically checks that the result is in the field; it would report an error if, for example, we omitted the reduction modulo p. These operations are convenient for specifying polynomial computations but are much too inefficient for deployable code. Instead, typical 32-bit implementations of Poly1305 represent field elements as mutable arrays of five unsigned 32-bit integers, each holding 26 bits. This representation evenly spreads out the bits across the integers, so that carries during arithmetic operations can be delayed. It also enables an efficient modulo operation for p. We show below an excerpt of the interface of our lower-level verified implementation, relying on the definitions above to specify its correctness. The type repr defines the representation of field elements as buffers (mutable arrays) of five 32-bit integers. It is marked as abstract, to protect the representation invariant from the rest of the code.

```fstar
1 abstract type repr = buffer UInt32.t 5 (* 5-limb representation *)
2 val _.[_] : memory -> repr -> Ghost elem (* m.[r] is the value of r in m *)
3 val multiply: e0:repr -> e1:repr -> ST unit (* e0 := e0 * e1 *)
4   (requires (λ m -> e0 ∈ m ∧ e1 ∈ m ∧ disjoint e0 e1))
5   (ensures (λ m0 _ m1 -> modifies {e0} m0 m1 ∧ m1.[e0] = m0.[e0] * m0.[e1]))
```
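To make the field arithmetic and the 5-limb, 26-bit-radix representation described above concrete, here is a small Python sketch; the helper names (`fadd`, `fmul`, `to_limbs`, `from_limbs`) are ours, not part of HACL or F*:

```python
# Field GF(2^130 - 5) arithmetic and the 5-limb representation of field
# elements, as described in the text. Illustrative sketch only.

p = 2**130 - 5                          # the prime order of the field

def fadd(x, y):
    return (x + y) % p                  # field addition

def fmul(x, y):
    return (x * y) % p                  # field multiplication

def to_limbs(n):
    """Split a field element into five 26-bit limbs, least significant first."""
    return [(n >> (26 * i)) & ((1 << 26) - 1) for i in range(5)]

def from_limbs(limbs):
    """Recombine limbs into a field element (5 x 26 = 130 bits)."""
    return sum(l << (26 * i) for i, l in enumerate(limbs)) % p

x = 2**129 + 12345
assert from_limbs(to_limbs(x)) == x     # round-trip for x < p
assert fadd(p - 1, 2) == 1              # wraps around the prime
assert fmul(x, 1) == x                  # multiplicative identity
```

Spreading 130 bits over five 32-bit words leaves 6 spare bits per limb, which is what lets an implementation accumulate carries across several additions before normalizing.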

Table 1: Performance in CPU cycles: 64-bit HACL*, 64-bit Sodium (pure C, no assembly), 64-bit OpenSSL (pure C, no assembly). All code was compiled using gcc -O3 and run on a 64-bit Intel Xeon CPU E. Results are averaged over 1000 measurements, each processing a random block of 2^14 bytes; Curve25519 was averaged over 1000 random key-pairs.

| Algorithm  | HACL*    | Sodium      | OpenSSL     |
|------------|----------|-------------|-------------|
| ChaCha20   | cy/b     | 6.97 cy/b   | 8.04 cy/b   |
| Salsa20    | cy/b     | 8.44 cy/b   | N/A         |
| Poly1305   | cy/b     | 2.48 cy/b   | 2.16 cy/b   |
| Curve25519 | k cy/mul | 162k cy/mul | 359k cy/mul |

Functions are declared with a series of argument types (separated by ->) ending with an effect and a return type (e.g., Ghost elem, ST unit, etc.). Functions may have logical pre- and post-conditions that refer to their arguments, their result, and their effects on the memory. If they access buffers, they typically have a pre-condition requiring their caller to prove that the buffers are live in the current memory. The term m.[r] selects the contents of a buffer r from a memory m; it is used in specifications only, as indicated by the Ghost effect annotation on the final arrow of its type on line 2. The multiply function is marked as ST, to indicate a stateful computation that may read, write, and allocate state. In a computation type ST a (requires pre) (ensures post), a is the result type of the computation, pre is a predicate on the input state, and post is a relation between the input state, the result value, and the final state. ST computations are also guaranteed not to leak any memory. The specification of multiply requires that its arguments be live in the initial memory (m) and refer to non-overlapping regions of memory; it computes the product of its two arguments and overwrites e0 with the result. Its post-condition states that the value of e0 in the final memory is the field multiplication of its value in the initial memory with that of e1, and that multiply does not modify any other existing memory location.
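The requires/ensures contract style of ST computations described above can be mimicked with a runtime-checked sketch in Python. In F* these conditions are discharged statically by the typechecker; the `st` decorator below (all names are ours, purely illustrative) merely checks them dynamically:

```python
# Runtime-checked sketch of ST-style contracts: a pre-condition on the
# input state and a post-condition relating initial state, result, and
# final state. In F* these are proved statically; here they are merely
# checked when the function runs.

import copy

def st(requires, ensures):
    def wrap(f):
        def run(state, *args):
            assert requires(state, *args), "pre-condition failed"
            before = copy.deepcopy(state)              # snapshot, like m0
            result = f(state, *args)
            assert ensures(before, result, state, *args), "post-condition failed"
            return result
        return run
    return wrap

p = 2**130 - 5

@st(requires=lambda m, e0, e1: e0 in m and e1 in m and e0 != e1,  # liveness, disjointness
    ensures=lambda m0, _, m1, e0, e1:
        m1[e0] == (m0[e0] * m0[e1]) % p                # e0 := e0 * e1
        and all(m1[k] == m0[k] for k in m0 if k != e0))  # modifies {e0} only
def multiply(mem, e0, e1):
    mem[e0] = (mem[e0] * mem[e1]) % p

mem = {"e0": 3, "e1": 5}
multiply(mem, "e0", "e1")
assert mem == {"e0": 15, "e1": 5}
```

The dynamic check is, of course, much weaker than the static proof: it validates one execution, whereas the F* type ascribes the property to all executions.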
Implementing and proving that multiply meets its mathematical specification involves hundreds of lines of source code, including a custom, verified Bignum library in Low* [26]. Using this library, we implement poly1305_mac and prove it functionally correct.

```fstar
val poly1305_mac:
  tag:nbytes 16ul ->
  len:u32 -> msg:nbytes len{disjoint tag msg} ->
  key:nbytes 32ul{disjoint msg key ∧ disjoint tag key} -> ST unit
  (requires (λ h -> msg ∈ h ∧ key ∈ h ∧ tag ∈ h))
  (ensures (λ h0 _ h1 ->
    let r = Spec.clamp h0.[sub key 0ul 16ul] in
    let s = h0.[sub key 16ul 16ul] in
    modifies {tag} h0 h1 ∧
    h1.[tag] == Spec.mac_1305 (encode_bytes h0.[msg]) r s))
```

Its specification above states that the final value of the 16-byte tag (h1.[tag]) is the value of Spec.mac_1305, a polynomial of the message and the key encoded as field elements.

Performance of HACL*. Besides the Poly1305 MAC, HACL* provides functionally correct, side-channel resistant implementations of the ChaCha20 [22] and Salsa20 [4] ciphers, and multiplication on the Curve25519 elliptic curve [3]. After verification, F* types and specifications are erased during compilation, and the compiled code only performs efficient low-level operations. Indeed, after extraction to C by KreMLin, our verified implementations are very slightly but consistently faster than unverified C implementations of the same primitives in libsodium [24] and OpenSSL (Table 1).
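Spec.clamp and Spec.mac_1305 mentioned above correspond to the standard Poly1305 definition. An executable Python sketch of that mathematical specification, following RFC 8439 and using our own function names (not the HACL source), is:

```python
# Executable sketch of the Poly1305 specification that poly1305_mac is
# verified against: clamp the first 16 key bytes into r, evaluate a
# polynomial in r over GF(2^130 - 5) on the message blocks, then add
# the last 16 key bytes s modulo 2^128 (per RFC 8439).

p = 2**130 - 5

def clamp(r_bytes):
    """Mask out certain bits of the 16-byte r half of the key."""
    return int.from_bytes(r_bytes, "little") & 0x0ffffffc0ffffffc0ffffffc0fffffff

def mac_1305(msg, key):
    r = clamp(key[:16])
    s = int.from_bytes(key[16:32], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # encode each block as a field element, appending a 0x01 byte
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = ((acc + n) * r) % p
    return ((acc + s) % 2**128).to_bytes(16, "little")

key = bytes(range(32))
assert len(mac_1305(b"hello Poly1305", key)) == 16
# With an empty message the accumulator stays 0, so the tag is exactly s:
assert mac_1305(b"", key) == key[16:32]
```

The verified 32-bit implementation computes the same function, but over the delayed-carry 5-limb representation rather than Python's arbitrary-precision integers.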

3.2 Game-Based Cryptography

Going beyond functional correctness, we sketch how we use Low* to do security proofs in the standard model of cryptography, using authenticated encryption with associated data (AEAD) as a sample construction. AEAD is the main protection mechanism for the TLS record layer; it secures most Internet traffic. AEAD has a generic security proof by reduction to two core functionalities: a stream cipher (such as ChaCha20) and a one-time MAC (such as Poly1305). The cryptographic, game-based argument supposes that these two algorithms meet their intended ideal functionalities, e.g., that the cipher is indistinguishable from a random function. Idealization is not perfect, but is supposed to hold against computationally limited adversaries, except with small probabilities, say ε_ChaCha20 and ε_Poly1305. The argument then shows that the AEAD construction also meets its own ideal functionality, except with probability, say, ε_ChaCha20 + ε_Poly1305. To apply this security argument to our implementation of AEAD, we need to encode such assumptions. To this end, we supplement our real Low* code with ideal F* code. For example, ideal AEAD is programmed as follows: encryption generates a fresh random ciphertext, and records it together with the encryption arguments in a log; decryption simply looks up an entry in the log that matches its arguments and returns the corresponding plaintext, or reports an error. These functions capture both confidentiality (ciphertexts do not depend on plaintexts) and integrity (decryption only succeeds on ciphertexts output by encryption). Their behaviors are precisely captured by typing, using pre- and post-conditions about the ghost log shared between them, and abstract types to protect plaintexts and keys.
The abstract type of keys and the encryption function for idealizing AEAD are below:

type entry = cipher * data * nonce * plain
abstract type key = {
    key: keybytes;
    log: if Flag.aead then ref (seq entry) else unit }

let encrypt (k:key) (n:nonce) (p:plain) (a:data) =
    if Flag.aead then
        let c = random_bytes cipher_len in
        k.log := (c, a, n, p) :: k.log;
        c
    else encrypt k.key n p a

A module Flag declares a set of abstract booleans (idealization flags) that precisely capture each cryptographic assumption. For every real functionality that we wish to idealize, we branch on the corresponding flag. This style of programming heavily relies on the normalization capabilities of F*. At verification time, flags are kept abstract, so that we verify both the real and ideal versions of the code. At extraction time, we reveal these booleans to be false. The F* normalizer then drops the then branch, and replaces the log field with (), meaning that both the high-level, list-manipulating code and the corresponding type definitions are erased, leaving only the low-level code from the else branch to be extracted. Using this technique, we verify by typing, e.g., that our AEAD code, when using any ideal cipher and one-time MAC, perfectly implements ideal AEAD. We also rely on typing to verify that our code complies with the pre-conditions of the intermediate proof steps. As a consequence of our proof, we are forced to establish various intensional properties of our code, e.g., that our code does not reuse nonces (a common cryptographic pitfall), and that it has no buffer overruns (a common pitfall of low-level programming).
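For intuition, the real/ideal switch can be mimicked in ordinary code. The sketch below is a Python analogue of the F* fragment above (the names IdealAEADKey, IDEAL_AEAD, and the real_* stubs are hypothetical, not Everest APIs): under the ideal flag, encryption returns fresh randomness and records the call in a log, and decryption succeeds only on logged ciphertexts:

```python
import os

IDEAL_AEAD = True  # plays the role of Flag.aead; "revealed" to be False at extraction time

def real_encrypt(key, nonce, plain, data):
    raise NotImplementedError("stand-in for the real low-level AEAD code path")

def real_decrypt(key, nonce, cipher, data):
    raise NotImplementedError("stand-in for the real low-level AEAD code path")

class IdealAEADKey:
    """A key packaging the raw bytes with a ghost log, as in the F* record above."""
    def __init__(self, key_bytes: bytes):
        self.key = key_bytes
        self.log = []  # entries: (cipher, data, nonce, plain)

    def encrypt(self, nonce, plain, data):
        if IDEAL_AEAD:
            c = os.urandom(len(plain) + 16)  # fresh random ciphertext, independent of plain
            self.log.append((c, data, nonce, plain))
            return c
        return real_encrypt(self.key, nonce, plain, data)

    def decrypt(self, nonce, cipher, data):
        if IDEAL_AEAD:
            for (c, a, n, p) in self.log:
                if (c, a, n) == (cipher, data, nonce):
                    return p  # integrity: only logged ciphertexts decrypt
            return None
        return real_decrypt(self.key, nonce, cipher, data)
```

Confidentiality holds because the ciphertext is sampled independently of the plaintext; integrity holds because decryption consults only the log.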

4 Ongoing Work

4.1 Verified Assembly Language and Safe Interoperability with C

While Low* and KreMLin provide reasonably efficient C-like implementations of cryptography, for the highest performance, cryptographic code often relies on complex hand-tuned assembly routines that are customized for individual hardware platforms. For this we have designed Vale, a new DSL that supports foundational, automated verification of high-performance assembly code. The Vale tool transforms annotated assembly programs into deeply embedded terms in a verification language, together with proofs that the terms meet their specifications. So far, we have used Dafny [17] as the embedding language for Vale and used this tool chain to verify the correctness and security of implementations of SHA-256 on x86 and ARM, Poly1305 on x64, and hardware-accelerated AES-CBC on x86. Several implementations meet or beat the performance of unverified, state-of-the-art cryptographic libraries. In ongoing work, we have begun to use F* as the embedding language for Vale, and are mechanizing a formal model of interoperability between Low* and Vale. By defining the deeply embedded semantics of Vale in F* as a Low* interpreter for assembly language terms, we aim to show that invocations from C to assembly can be accounted for within a single semantics for both DSLs. A key challenge here is to reconcile Low*'s CompCert-like structured memory model with Vale's "big array of bytes" view of memory.

4.2 An Early Deployment of Everest within libcurl

Emphasizing the incremental deployment of our verified code, we have integrated our verified TLS record layer extracted from Low* to C, as well as Vale implementations in assembly, within a new version of miTLS [7], implemented in F* and covering TLS 1.2 and TLS 1.3. Our eventual goal is for miTLS to be implemented entirely in the Low* subset of F* and extracted solely to C, with functional correctness and security proofs. 
As of now, however, miTLS is only partially verified (the handshake being the main remaining unverified component) and extracts to OCaml. Already, by virtue of basic type safety, our partially verified version of miTLS provides safety against low-level attacks (e.g., Heartbleed) similar to other OCaml-based implementations of TLS [15]. Relying on OCaml's foreign function interface, we integrate miTLS extracted to OCaml with the verified TLS record layer extracted to C-and-assembly. Dually, we provide C bindings for calling into a miTLS layer which handles the socket abstraction, fragmenting buffers, etc. The result is a libmitls.dll, which we integrate within libcurl, a popular open-source library used pervasively in many tools. This early integration allows us to use our verified software from within popular command-line tools, like git, providing immediate benefits.

5 Parting Thoughts

A significant novelty of our proposed work is that we aim to replace security-critical components of existing software with provably correct versions. Our careful choice of the problem domain is crucial: verified OS kernels and compilers can only succeed by replacing the software stack or development toolchain; verified, standardized security protocols, and HTTPS and TLS in particular, can be deployed within the existing ecosystem, providing a large boost to software security at a small overall cost.

With the emergence of TLS 1.3, most TLS implementers will rewrite their code from scratch, and the world will be faced with migrating towards brand new implementations. History tells us that widespread adoption of a new version of TLS may take almost an entire decade. Given a similar time line for the adoption of TLS 1.3, if we distribute Everest within 2-4 years in a form where the cost of switching to it is negligible, then we are optimistic that it stands a chance of widespread adoption. Despite this once-in-a-decade opportunity, several challenges remain. How will verified code authored in advanced programming languages be maintained and evolved going forward? Distributing our code as well-documented, source-like C may help somewhat, but to evolve the code while maintaining verification guarantees will require continued support from the Everest team, as well as outreach and education. Will the software industry at large be able to appreciate the technical benefits of verified code? How can we empirically prove that verified software is better? One direction may be to deploy standalone, small-TCB versions of Everest and to demonstrate that they are resistant to practical attacks; this raises the possibility of deployments of Everest within fully verified stacks [16, 14, 13] or sandboxes [23].

References

1 Danel Ahman, Cătălin Hriţcu, Kenji Maillard, Guido Martínez, Gordon Plotkin, Jonathan Protzenko, Aseem Rastogi, and Nikhil Swamy. Dijkstra monads for free. In 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). ACM. URL: https://www.fstar-lang.org/papers/dm4free/.
2 Daniel J. Bernstein. The Poly1305-AES message-authentication code. In International Workshop on Fast Software Encryption. Springer.
3 Daniel J. Bernstein. Curve25519: new Diffie-Hellman speed records. In International Workshop on Public Key Cryptography. Springer.
4 Daniel J. Bernstein. The Salsa20 family of stream ciphers. In New Stream Cipher Designs. Springer.
5 Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Cătălin Hriţcu, Jonathan Protzenko, Tahina Ramanandro, Aseem Rastogi, Nikhil Swamy, Peng Wang, Santiago Zanella-Béguelin, and Jean Karim Zinzindohoué. Verified low-level programming embedded in F*. Preprint, https://arxiv.org/abs/.
6 Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Markulf Kohlweiss, Jianyang Pan, Jonathan Protzenko, Aseem Rastogi, Nikhil Swamy, Santiago Zanella-Béguelin, and Jean Karim Zinzindohoué. Implementing and proving the TLS 1.3 record layer. Cryptology ePrint Archive, Report 2016/1178.
7 Karthikeyan Bhargavan, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti, and Pierre-Yves Strub. Implementing TLS with verified cryptographic security. In 2013 IEEE Symposium on Security and Privacy.
8 Sandrine Blazy and Xavier Leroy. Mechanized semantics for the Clight subset of the C language. Journal of Automated Reasoning, 43(3).
9 Barry Bond, Chris Hawblitzel, Jay Lorch, Rustan Leino, Bryan Parno, Ashay Rane, and Laure Thompson. Verifying high-performance cryptographic assembly code. https://project-everest.github.io/papers/.
10 J. Clark and P. C. van Oorschot. SoK: SSL and HTTPS: Revisiting Past Challenges and Evaluating Certificate Trust Model Enhancements. In 2013 IEEE Symposium on Security and Privacy.

11 Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: an efficient SMT solver. In 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 4963 of Lecture Notes in Computer Science. Springer.
12 Martin Georgiev, Subodh Iyengar, Suman Jana, Rishita Anubhai, Dan Boneh, and Vitaly Shmatikov. The most dangerous code in the world: Validating SSL certificates in non-browser software. In Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS '12), pages 38-49. ACM, New York, NY, USA.
13 Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. CertiKOS: An extensible architecture for building certified concurrent OS kernels. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI '16). USENIX Association, Berkeley, CA, USA.
14 Chris Hawblitzel, Jon Howell, Jacob R. Lorch, Arjun Narayan, Bryan Parno, Danfeng Zhang, and Brian Zill. Ironclad Apps: End-to-end security via automated full-system verification. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI '14). USENIX Association, Berkeley, CA, USA.
15 David Kaloper-Meršinjak, Hannes Mehnert, Anil Madhavapeddy, and Peter Sewell. Not-quite-so-broken TLS: Lessons in re-engineering a security protocol specification and implementation. In 24th USENIX Security Symposium (USENIX Security 15).
16 Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal Verification of an OS Kernel. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). ACM, New York, NY, USA.
17 K. Rustan M. Leino. Dafny: An automatic program verifier for functional correctness. In Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR '10). Springer-Verlag, Berlin, Heidelberg.
18 Xavier Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7).
19 Xavier Leroy, Andrew W. Appel, Sandrine Blazy, and Gordon Stewart. The CompCert memory model, version 2. Research report RR-7987, INRIA. URL: http://hal.inria.fr/hal.
20 Pierre Letouzey. A new extraction for Coq. In Types for Proofs and Programs. Springer.
21 Christopher Meyer and Jörg Schwenk. SoK: Lessons learned from SSL/TLS attacks. In Revised Selected Papers of the 14th International Workshop on Information Security Applications (WISA 2013), volume 8267. Springer-Verlag New York, Inc., New York, NY, USA.
22 Yoav Nir and Adam Langley. ChaCha20 and Poly1305 for IETF protocols. IETF RFC 7539.
23 Rohit Sinha, Manuel Costa, Akash Lal, Nuno P. Lopes, Sriram Rajamani, Sanjit A. Seshia, and Kapil Vaswani. A design and verification methodology for secure isolated regions. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16). ACM, New York, NY, USA.
24 The Sodium crypto library (libsodium). URL: https://www.gitbook.com/book/jedisct1/libsodium/details.
25 Nikhil Swamy, Cătălin Hriţcu, Chantal Keller, Aseem Rastogi, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bhargavan, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, Jean-Karim Zinzindohoué, and Santiago Zanella-Béguelin. Dependent types and multi-monadic effects in F*. In 43rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM. URL: https://www.fstar-lang.org/papers/mumon/.
26 Jean Karim Zinzindohoué, Evmorfia-Iro Bartzia, and Karthikeyan Bhargavan. A verified extensible library of elliptic curves. In IEEE Computer Security Foundations Symposium (CSF), 2016.

21 Domain-Specific Symbolic Compilation

Rastislav Bodik 1, Kartik Chandra 2, Phitchaya Mangpo Phothilimthana 3, and Nathaniel Yazdani 4
1 University of Washington, Seattle, WA, USA
2 Gunn High School, Palo Alto, CA, USA
3 University of California, Berkeley, CA, USA
4 University of Washington, Seattle, WA, USA

Abstract

A symbolic compiler translates a program to symbolic constraints, automatically reducing model checking and synthesis to constraint solving. We show that new applications of constraint solving require domain-specific encodings that yield orders of magnitude improvements in solver efficiency. Unfortunately, these encodings cannot be obtained with today's symbolic compilation. We introduce symbolic languages that encapsulate domain-specific encodings under abstractions that behave as their non-symbolic counterparts: client code using the abstractions can be tested and debugged on concrete inputs. When client code is symbolically compiled, the resulting constraints use domain-specific encodings. We demonstrate the idea on the first fully symbolic checker of type systems; a program partitioner; and a parallelizer of tree computations. In each of these case studies, symbolic languages improved on classical symbolic compilers by orders of magnitude.

1998 ACM Subject Classification D.2.2 [Software Engineering] Design Tools and Techniques, D.3.3 [Programming Languages] Language Constructs and Features

Keywords and phrases Symbolic evaluation, constraint solvers, program synthesis

Digital Object Identifier /LIPIcs.SNAPL

1 Introduction

A symbolic compiler translates a program p to constraints that model its behavior [3, 18, 20]. The unknowns in the constraints typically represent the symbolic inputs to p, and the solution to the constraints is an input that induces a particular program behavior. For example, with a symbolic compiler and a solver, by just writing the program p, we obtain, for free, a program checker that produces an input to p leading to an assertion failure. 
The applications of symbolic compilation become even more interesting when the input to the program p is itself a program:

If the program p is an interpreter, then constraint solving can find a program that forces the interpreter into a violation due to unsoundness in its type system. In Section 3, we explore finding such witnesses to unsoundness by symbolically compiling interpreters.

If p is a type checker, then constraint solving performs type inference. In Section 4, we partition a program onto a many-core processor. We model this program transformation with a hardware-specific place type system, relying on symbolic compilation of type checkers to produce constraints for type inference.

Rastislav Bodik, Kartik Chandra, Phitchaya Mangpo Phothilimthana, and Nathaniel Yazdani; licensed under Creative Commons License CC-BY. 2nd Summit on Advances in Programming Languages (SNAPL 2017). Editors: Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi; Article No. 2; pp. 2:1-2:17. Leibniz International Proceedings in Informatics. Schloss Dagstuhl Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

If p is an execution runtime for parallel programs, then constraint solving finds a parallel execution strategy, effectively parallelizing p for us. In Section 5, we synthesize parallel evaluators for attribute grammars by modeling the strategies as schedules and symbolically compiling interpreters of such schedules.

More formally, assume we have access to a solver sol that accepts a constraint φ and returns a solution, i.e., a value x such that φ(x) holds. If no such x exists, then sol returns ⊥. A symbolic compiler sym translates a program p into a logical formula φ that models the input-output semantics of p. It is convenient to think of a symbolic compiler as an execution inverter: sym accepts p and an output value y and produces a formula φ over the program input variable x such that the solution x to φ makes the program output y.

Model checking and program synthesis are two common applications of symbolic compilation. In bounded model checking, we want to compute a program input that leads to a failure. First, we modify the program so that failed assertions exit the program, returning a special value fail. The call sol(sym(p, fail)) produces a failing input if one exists. In inductive synthesis, we have a sketch program def sk(x, h) = e, where e uses an (unknown) function h. We want to find a function f such that substituting f for h gives sk the desired behavior. The behavior is often given with an input-output pair of values (x0, y0), i.e., we want sk(x0, f) to evaluate to y0. In many settings, y0 is simply success or fail. Symbolic compilation produces such a function f with the call sol(sym(sk(x0), y0)). The notation sk(x0) denotes a partially applied function sk, i.e., a function of h. Note that to produce a function, the solver need not be second-order; the function f can be represented as a list of constants that define a derivation of the syntax of f from a suitable grammar. 
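To make the execution-inverter reading concrete, here is a toy model in Python: the "solver" is exhaustive search over a finite input domain, standing in for constraint solving (all names here are illustrative; a real symbolic compiler would instead emit a formula for an SMT solver):

```python
def sym_sol(p, y, domain):
    """Toy stand-in for sol(sym(p, y)): search for an input x with p(x) == y."""
    for x in domain:
        if p(x) == y:
            return x
    return None  # plays the role of the solver returning bottom (unsatisfiable)

def program(x):
    # A program that "fails" on any root of x^2 - 5x + 6 other than 2.
    if x * x - 5 * x + 6 == 0 and x != 2:
        return "fail"
    return "ok"

# Bounded model checking: find an input that drives the program to fail.
witness = sym_sol(program, "fail", range(100))  # the witness is x = 3
```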
Revisiting the three case studies clarifies the task of symbolic compilation:

Checking soundness of type systems. We want to check the type system of an interpreter int that is composed of a type checker and an evaluator. Assume that the interpreter outputs fail when a program passes the type checker but fails in the evaluator. The call sol(sym(int, fail)) then finds a program that is deemed type-safe but encounters a run-time violation. (We assume the interpreted programs have no parameters.) The benefit of using symbolic compilation is that one needs just an implementation of the interpreter. There is no need for fuzzers or tools that translate language semantics to constraints.

Program partitioning. A spatial type system maps variables and operations to CPU cores, partitioning the program. If the type checker has two parameters, a program r and the values of r's type variables, then the call sol(sym(typechecker(r), success)) produces the type assignment that partitions the program while satisfying all program and hardware constraints checked by the type checker. Symbolic evaluation produces type constraints whose unknowns are the type variables; solving performs type inference. Formulating type inference as constraint solving is nothing new, of course. Our goal is to make this idea easier to apply by automatically obtaining high-performance type inference from a type checker.

Parallelizing tree computations. We want to efficiently compute the attributes of a tree t defined by an attribute grammar G. The parallel tree evaluator may need to perform multiple tree traversals, some bottom-up, some top-down, some in-order, subject to value dependences in G. The evaluation strategy can be described with a schedule of traversals. The schedule language is defined with an interpreter that reads the grammar, the tree, and the schedule. Symbolic evaluation of the interpreter can give us a legal schedule for free with the call sol(sym(int(G, t), success)). 
Constraint solving sidesteps the error-prone process of analyzing the grammar and operationally constructing a valid schedule.
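The schedule-synthesis query sol(sym(int(G, t), success)) can be imitated at toy scale by searching the space of evaluation orders for one that the interpreter accepts. In the sketch below, the names and the hard-coded dependences are hypothetical (a real system derives them from the attribute grammar):

```python
from itertools import permutations

def find_schedule(attributes, depends_on):
    """Toy stand-in for sol(sym(int(G, t), success)): search evaluation orders
    and return one in which every attribute follows all of its dependences."""
    for order in permutations(attributes):
        pos = {a: i for i, a in enumerate(order)}
        if all(pos[d] < pos[a] for a in attributes for d in depends_on.get(a, [])):
            return list(order)
    return None  # no legal schedule exists

# Hypothetical grammar fragment: 'x' needs 'width'; 'render' needs both.
deps = {"x": ["width"], "render": ["x", "width"]}
schedule = find_schedule(["width", "x", "render"], deps)
```

Here the search does the work that the constraint solver performs in the symbolic setting, so the author never writes the scheduling logic operationally.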

23 R. Bodik, K. Chandra, P. M. Phothilimthana, and N. Yazdani 2:3

2 Architectures for Symbolic Compilation

We discuss three approaches for generating constraints. We first compare two existing architectures, constraint generators and general-purpose specializing symbolic compilers, and then introduce domain-specific symbolic compilers. We describe the architectures by composing interpreters, compilers, and specializers. Borrowing the notation from [4], we summarize here their definitions:

interpreter int : int(p, x) = p(x)
compiler comp : (comp p)(x) = p(x)
specializer s : (s(p, x_s))(x_d) = p(x_s, x_d)
symbolic compiler sym : p(sol(sym(p, y))) = y

2.1 Constraint generators

A generator gen reads a problem instance and produces constraints whose solution solves the problem instance. As our running example, consider the synthesis of schedules for parallel tree evaluation that we introduced above. The problem instance is an attribute grammar G, and the call sol(gen(G)) produces the schedule for G.¹ Notice that a constraint generator gen is not asked to invert a program, unlike the symbolic compiler sym. This frees the author of the generator to employ a clever problem-specific encoding. For example, Gulwani et al. phrased synthesis of loop-free programs as constraint-based synthesis of a network that connects available instructions [6]. Kuchcinski solved scheduling and resource assignment by modeling a system as a set of finite-domain variables [10]. Hamza et al. synthesized linear-time programs by converting an automaton recognizing the input/output relation [7]. On the other hand, since the generator receives only a problem instance but not the program to be inverted, the semantics of the generated constraints must come entirely from the author of the generator. Consider again the synthesis of schedules: sym received the schedule interpreter, which it can use to automatically produce constraints that encode schedules. 
In contrast, the semantics of schedules must be hard-coded into gen by the programmer. Our running example shows why implementing generators is challenging. The programmer needs to wrangle the semantics of three languages, the language of attribute grammars AG, the language of schedules Sch, and the constraint language Φ, reasoning across a multi-step indirection:

1. The programmer reads the specifications of the three languages and writes a constraint generator gen.
2. The generator gen reads the grammar G and outputs a constraint φ.
3. The solver sol reads φ and produces a solution σ.
4. The solution σ indirectly encodes a schedule s ∈ Sch.
5. The schedule s evaluates a tree according to the input grammar G ∈ AG.

The programmer must ensure that the generator written in Step 1 produces a schedule that in Step 4 correctly encodes, say, in-order traversals and in Step 5 evaluates the tree in accordance with the attribute grammar semantics.

¹ The solution to constraints must typically be converted back to the problem domain. For example, if using SAT constraints, a Boolean vector that solves the SAT problem is translated to a program in the scheduling language. This code-generation problem is important but we ignore its automation in this paper.

This reasoning may explain why in

our previous work on synthesizing schedules, we failed to fully debug our generator once the schedule language became moderately sophisticated.

2.2 General-Purpose Specializing Symbolic Compilers

This is the architecture of Sketch [17] and, to a large extent, Rosette [20]. The architecture has three components and relies heavily on specialization:

1. int : (D → D)_L × D → D, an interpreter implemented in a metaprogramming language L_m. The interpreter implements the language L of the input program p. It accepts the program p : (D → D) and p's input, producing p's output.
2. s : ((D × D) → D)_Lm × D → (D → D)_Ls, a specializer of programs in L_m producing programs in L_s. In particular, s will specialize int with respect to p, producing a residual program int_p.
3. xlate : (D → D)_Ls × D → Φ, which translates a symbolic program to the language of constraints Φ. xlate also receives the output value y ∈ D and produces a formula φ(x) that is satisfied iff p(x) outputs y.

The symbolic compiler is thus sym(p, y) = xlate(s(int, p), y). In Rosette, L_m is a subset of Racket maintaining many metaprogramming facilities of Racket; L_s is the symbolic expression language; and Φ can be one of several subsets of SMT languages, such as the bitvector language. Note that s and xlate are part of the framework, while int is developed by the user. The core of symbolic evaluation happens in the specializer, which explores all paths of the program, producing a functional residual program that reflects the shape of the final constraints. The translator xlate performs algebraic optimizations followed by local code generation from the residual program to the constraint language. The downside of this architecture is that symbolic compilation must typically follow the forward symbolic execution that merges constraints under their path conditions [9]. This algorithm may suffer from path explosion and does not lend itself to constraints other than SAT or SMT. 
Thus, integer linear programming (ILP) constraints, often domain-specific and highly efficient, are the constraints of choice produced by constraint generators.

2.3 Domain-specific symbolic compilers

We modify the specializing architecture by introducing a new abstraction for implementing the interpreter of L. This interpreter, int_Ld, now has two parts:

1. int_L, an interpreter of L implemented in the domain-specific symbolic language L_d.
2. int_d, an interpreter of L_d implemented in a metaprogramming language L_m.

The symbolic compiler pipeline is now xlate(s(int_Ld, p), y). Ideally, the interpreter int_d meets two informally stated properties: (1) symbolic evaluation of int_d produces easier-to-solve constraints; and (2) symbolic evaluation of int_d is faster than that of int, for example because L_d reduces path explosion. This paper shows that domain-specific symbolic compilers can be implemented as a library on top of a classical symbolic compiler such as Rosette [19]. The library provides a symbolic language that implements the domain-specific encoding while hiding the encoding from both the programmer and the underlying symbolic compiler.

In Section 3, we check type systems for soundness errors. We introduce a Bonsai tree that serves as the symbolic input into a type checker and an interpreter, which are implemented on top of the Bonsai library. Symbolic evaluation of the two components produces constraints

that encode a space of abstract syntax trees (ASTs). The solution is a witness: a tree that succeeds in the type checker but fails in the interpreter. The Bonsai tree has the usual interface but internally produces a special encoding that allows symbolically evaluating the interpreter on trees that are not necessarily syntactically correct or type correct. This is key to finding witnesses, for the first time, without enumerating or sampling the program space, allowing us to compute the witness for a tricky soundness bug [1].

In Section 4, we partition a program onto a many-core processor. Mapping of program operations to cores is modeled with a place type system ensuring that each code fragment fits into its core. Partitioning is thus type inference. To infer types, we symbolically evaluate the type checker with respect to a program whose type variables are symbolic. We design a symbolic language for querying properties of the symbolic location of a computation. Under this abstraction, we switch from the standard SMT encoding to an ILP encoding. The resulting ILP encoding solves previously inaccessible partitioning problems.

Finally, in Section 5, we synthesize parallel tree programs as used in page layout and data visualizations. The programs are formalized as schedules for evaluation of attribute grammars. We design a symbolic trace language, an abstraction for writing interpreters of such schedules. Under this abstraction, we can (1) sidestep the expensive symbolic state that is maintained by the standard symbolic evaluator and (2) switch from ensuring that all dependences are met to ensuring that all anti-dependences are avoided.

3 Checking Type Systems with Bonsai Trees

Model checking of type systems. Bonsai uses model checking to search for soundness errors in type systems. 
The user provides a typechecker and an interpreter for their language, and Bonsai searches for a counterexample program that passes the typechecker while causing the interpreter to crash. If such a counterexample can be found, then it is evidence of a soundness bug in the type system. Furthermore, such a counterexample provides helpful feedback for the user to understand and fix the bug. On the other hand, if no counterexample can be found, the user has some assurance that the typechecker is sound. The most common existing typechecker-checking technique is fuzzing. A fuzzer generates random terms and uses them to test a typechecker and interpreter. Fuzzers may sample from the space of syntactically-correct terms or the space of well-typed terms; however, in both cases, the probability of generating a counterexample by chance is extremely low. Thus, fuzzers often need hours or days of guessing to find even simple type checker bugs (an example of a simple bug is assigning cons the return type a instead of Listof a). Bonsai can be regarded as a final successor to typechecker fuzzing: rather than randomly sampling from the space of syntactically-correct or well-typed terms, Bonsai symbolically compiles an executable language specification to constraints, and then utilizes the backward reasoning of a constraint solver to sample directly from the space of counterexamples. This makes Bonsai much more efficient than traditional fuzzers: Bonsai finds the above bug in just 1.3 seconds compared to hours or days needed by fuzzers. Bonsai consists of the algorithm shown in Figure 1. First, Bonsai initializes a symbolic representation of A, the set of all trees up to some maximum size m. Next, it computes symbolic representations of the subsets of A that (a) are syntactically valid; (b) pass the typechecker; (c) fail in the interpreter. Finally, it asks the solver to find a tree in the intersection of these three sets. If such a tree exists, it represents a counterexample. 
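The search that Bonsai automates can be modeled, at toy scale, by enumeration: generate terms, keep those the typechecker accepts, and report one that crashes the interpreter. The sketch below uses a deliberately too-permissive checker for a tiny division language (all names are illustrative; Bonsai replaces the enumeration with constraint solving over a symbolic tree):

```python
from itertools import product

# A tiny expression language: ("lit", n) and ("div", t1, t2).
# The checker is deliberately unsound: it accepts terms that divide
# by a zero literal, which crashes the interpreter.

def typechecks(t):
    if t[0] == "lit":
        return True
    if t[0] == "div":
        return typechecks(t[1]) and typechecks(t[2])
    return False

def interpret(t):
    if t[0] == "lit":
        return t[1]
    if t[0] == "div":
        return interpret(t[1]) // interpret(t[2])  # may raise ZeroDivisionError
    raise ValueError("ill-formed term")

def all_terms(depth):
    """Enumerate terms up to a depth bound (the role of Bonsai's symbolic tree)."""
    if depth == 0:
        return [("lit", 0), ("lit", 1)]
    smaller = all_terms(depth - 1)
    return smaller + [("div", a, b) for a, b in product(smaller, smaller)]

def find_counterexample(depth=1):
    for t in all_terms(depth):
        if typechecks(t):
            try:
                interpret(t)
            except ZeroDivisionError:
                return t  # accepted by the checker, crashes the interpreter
    return None
```

The enumeration loop is exactly what makes fuzzing slow on realistic languages; Bonsai's contribution is asking the solver for a member of the counterexample set directly.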

Figure 1 Bonsai performs three independent symbolic evaluations, interestingly executing the interpreter also on trees that are both syntactically and type-incorrect. (Pipeline: a symbolic tree representing all programs up to depth d is asserted to parse, to typecheck, and to fail in the interpreter; the solver then produces a counterexample.)

Figure 2 The classical symbolic representation of a set of trees grows very quickly even when small trees are merged, even if their subtrees are shared.

Figure 3 A stepwise overview of the Bonsai tree encoding: (a) Bonsai trees for x, λy.x, and x y; (b) embedding (a) in perfect binary trees; (c) symbolic encoding of the tree for λy.x; (d) merging three trees under path conditions φ1, φ2, φ3.

General-purpose symbolic evaluation. To perform symbolic evaluation of a type checker, we need to create a symbolic abstract syntax tree that represents A, the set of concrete abstract syntax trees. The standard approach would produce trees such as those shown in Figure 2, where sets of trees are merged by creating symbolic choices to select among potential children at each node. This merged symbolic tree could then be supplied to an existing type checker and interpreter by symbolically evaluating them on the tree. Though this classical approach would work in principle, it fails to scale to trees that are deep enough to explore large programs. Furthermore, each operation on such a symbolic tree causes its representation to grow even more complex, and the large data structures prevent scalable symbolic execution. Thus, the challenge here is to optimize the speed of symbolic compilation, rather than the speed of solving.

Domain-specific symbolic evaluation. Bonsai solves this problem by creating a new encoding for sets of trees that limits growth by efficiently merging trees within the set. This Bonsai tree is compatible with a standard symbolic evaluator, and language engineers can use this symbolic tree nearly as if it were a concrete tree. Figure 3 gives a stepwise explanation of the Bonsai symbolic tree, starting from concrete Bonsai trees for three program terms (a). These concrete trees are embedded in a perfect binary tree (b). The embedding is represented with two predicates for each node: the first determines whether the node is internal or a leaf; the second determines the terminal for leaves (c). By allowing the predicates to be symbolic expressions, a single tree can represent multiple Bonsai trees. In (d), we show how symbolic Bonsai trees arise; here we merge three trees at an if-statement. 
Despite having a different underlying representation, Bonsai trees can be easily manipulated by programmers, just as if they were concrete trees. Bonsai provides utilities for creating, modifying, and pattern-matching symbolic trees, allowing programmers to implement typecheckers and interpreters without having to focus on the details of symbolic execution.

Evaluation. Figure 4a shows an empirical comparison between the classical and Bonsai encodings. Symbolic terms of various sizes were executed with identical typecheckers and interpreters, varying only the underlying encoding. Bonsai's encoding was consistently several orders of magnitude faster. In under an hour, Bonsai explores programs much larger than the counterexamples created by human experts who report soundness bugs, thus providing users with a margin of assurance. Bonsai has reproduced many soundness bugs in a variety of languages, notably including (1) unsound argument covariance in a model of Java, and (2) a subtle issue with Scala's existential types, discovered in 2016 by Nada Amin and Ross Tate [1]. Slight modifications to the algorithm also allow users to ask intriguing new questions that fuzzers cannot easily answer, such as "On what programs do typecheckers t1 and t2 disagree?" or "Does my typechecker reject programs that don't fail?" Finally, by making the typechecker symbolic, Bonsai can synthesize suggestions for how to fix an unsound type system.

4 Program Partitioning

The Code Partitioning Problem. Compilers for fine-grained many-core architectures must partition the program into tiny code fragments mapped onto physical cores. Chlorophyll [15, 16] is a language for GA144, an ultra-low-power processor with 144 tiny cores [5]. The Chlorophyll type system ensures that no fragment overflows the 64-word capacity of its core.

[Figure 4: Experimental evaluations. (a) Bonsai tree: time taken (ms) for symbolic execution and solving under the Bonsai and classical encodings, as a function of the number of syntactically valid trees. (b) Program partitioning: solving time (s) for SMT (Z3), domain-optimized SMT (Z3), ILP (Z3), and ILP (CPLEX) on the Conv, Prefixsum, FIR, and SSD benchmarks. (c) Tree scheduling.]

Listing 1 Original type checker, ensuring that code fragments fit into cores.

```racket
 1 (define cores-space (make-vector n-cores 0)) ; space used up on each core
 2 (define (inc-space p size)
 3   (vector-set! cores-space p (+ (vector-ref cores-space p) size)))
 4
 5 ; Increase code size whenever core p sends a value to core r.
 6 (define (comm p r) (when (not (= p r)) (begin (inc-space p 2) (inc-space r 2))))
 7
 8 ; Increase code size for broadcast communication from p to ps. ps may contain duplicates.
 9 (define (broadcast p ps)
10   (define remote-ps (length (remove p (unique ps)))) ; # of unique cores in ps excluding p
11   (inc-space p (* 2 remote-ps))      ; space used in the sender core
12   (for ([r ps]) (inc-space r 2)))    ; space used in the receiver cores
13
14 (define (count-space node) ; Count space needed by an AST node.
15   (cond
16     [(var? node) (inc-space (place-type node) 1)]
17     [(binexpr? node) ; The inputs to this operation come from binexpr-e1 and binexpr-e2.
18      (define p (place-type node))
19      (inc-space p (size node)) ; space taken by the operation
20      (comm (place-type (binexpr-e1 node)) p) ; Add space for communication code when
21      (comm (place-type (binexpr-e2 node)) p)] ; operands come from other cores.
22     [(if? node)
23      ; If is replicated on all cores that run any part of if's body.
24      ; We omit inc-space here.
25      ; The condition result is broadcast to all cores used in if's body.
26      (broadcast (place-type (if-test node))
27                 (append (all-cores (if-then node)) (all-cores (if-else node))))]
28     ...))
29
30 (tree-map count-space ast)
31 (for ([space cores-space]) (assert (< space core-capacity)))
32 (minimize (apply + cores-space)) ; used during inference only
```

Each variable and operation has a place type whose value is a core ID. The type checker in Listing 1 computes the code size of each fragment. The tree-map function traverses a program AST in post-order and applies the function count-space to each node in the AST (line 30). The checker accumulates the space taken by each node (e.g., a binexpr node on line 19), as well as the space occupied by communication code, both for one-to-one communication (e.g., sending operand values to an operator, lines 20-21) and for broadcast communication (e.g., sending a condition result to all nodes in the body of an if, lines 26-27).

Automatic Program Partitioning as Type Inference. When a program omits some place types, the compiler infers them, effectively partitioning the program. Chlorophyll implements the type inference using Rosette [19, 20], which symbolically evaluates the type checker in Listing 1 with respect to the program. The type checker needs no changes; we only need to
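With fully concrete place types, the accounting of Listing 1 is ordinary bookkeeping. Below is a minimal Python analogue (names mirror Listing 1, but the AST encoding and the unit operation size are assumptions of this sketch) for the expression a + b placed across two cores:

```python
# Python stand-in for Listing 1's space accounting, concrete places only.
N_CORES = 2
CORE_CAPACITY = 64                 # 64-word capacity of a GA144 core
cores_space = [0] * N_CORES        # space used up on each core

def inc_space(p, size):
    cores_space[p] += size

def comm(p, r):                    # one-to-one communication: 2 words each side
    if p != r:
        inc_space(p, 2)
        inc_space(r, 2)

# AST for "a + b": ('bin', place, e1, e2) and ('var', place, name)
ast = ('bin', 0, ('var', 0, 'a'), ('var', 1, 'b'))

def count_space(node):
    if node[0] == 'var':
        inc_space(node[1], 1)
    elif node[0] == 'bin':
        _, p, e1, e2 = node
        count_space(e1)
        count_space(e2)
        inc_space(p, 1)            # space taken by the operation itself
        comm(e1[1], p)             # operand communication, as in Listing 1
        comm(e2[1], p)

count_space(ast)
assert cores_space == [4, 3]       # b must be sent from core 1 to core 0
assert all(s < CORE_CAPACITY for s in cores_space)
```

Core 0 holds a, the +, and the receive stub (1 + 1 + 2 = 4 words); core 1 holds b and the send stub (1 + 2 = 3 words). Type inference asks the same bookkeeping to run with the places left symbolic.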

initialize the (unknown) place types in the program to symbolic values (i.e., p$0, p$1, ...). Figure 5a shows an example of a program AST with unknown places; each node in the AST is annotated with its symbolic place type. Figure 5b shows the conceptual partially evaluated type checker after checking the yellow nodes in the example AST: concrete expressions are fully evaluated, and expressions with symbolic variables remain. After we symbolically evaluate the residual type checker in Figure 5b, we obtain the cores-space shown in Figure 6a. Rosette then uses Z3 to solve the generated SMT constraints on cores-space (the assert in Listing 1) and to minimize the total code size (the minimize in Listing 1). Hence, we obtain our type inference just by implementing a type checker. The development process requires little effort, but the resulting type inference is slow at inferring place types.

[Figure 5: Running example of program partitioning. (a) Example program AST; each node is annotated with its place type (p$a, p$b, p$+) below it, and the yellow nodes are the ones that have been interpreted. (b) Residual type-checking program after traversing the yellow nodes of the AST in post-order: (inc-space p$a 1) for node a [def], (inc-space p$a 1) for node a [use], (inc-space p$b 1) for node b, (inc-space p$+ 1) for node +, then (comm p$a p$+) and (comm p$b p$+); comments in the figure indicate the Listing 1 lines each expression comes from.]

Listing 2 Type checker in the resource language, producing ILP constraints.

```racket
 1 (define n (make-parameter #f)) ; a parameter procedure for dynamic binding
 2 (define (comm p r) (inc-space (n) (* 2 (+ (different? p r (n)) (different? r p (n))))))
 3 (define (broadcast p ps)
 4   (inc-space (n) (* 2 (+ (count-different p ps (n)) ;; space used in the sender core
 5                          (different? ps p (n))))))  ;; space used in the receiver cores
 6
 7 ;; the function count-space is changed in one place (see text); inc-space is unchanged
 8
 9 (for ([i n-cores]) (parameterize ([n i]) (tree-map count-space ast)))
10 (for ([space cores-space]) (assert (< space core-capacity)))
11 (minimize (apply + cores-space))
```

Symbolic Evaluation to ILP Constraints. It is known that partitioning and scheduling problems can be solved efficiently using ILP [21, 14, 13, 8].
However, if we follow the standard way of generating ILP constraints, we will not be able to simply turn type checking into type inference. Here, we turn to our key idea and introduce a symbolic language that generates ILP constraints. The programmer implements the type checker as before, but in our resource language. The programmer is prohibited from writing programs with symbolic path conditions, because such path conditions create non-linear constraints; if a program contains a symbolic path condition, the compiler raises an exception.

The resource language is embedded in Rosette. It provides additional operations, mapped-to?, different?, and count-different, described in Table 1. We make four minimal changes to our original type checker, shown in Listing 2. First, we traverse the AST once for every core (line 9); each iteration i is responsible for accumulating the space used in core i. Second, in the function count-space, we change the expression that increases the size of core p by the size of a node's operation from (inc-space p (size node)) to (inc-space (n) (* (size node) (mapped-to? p (n)))). The previous call produces a non-linear equation because its first argument p, which is symbolic, is used as a path condition inside inc-space. Third, we avoid symbolic path conditions inside the function comm by using (different? p r (n)) to compute the size of the code for sending data at core (n), and similarly for receiving data. Last, in the function broadcast, we use count-different to compute the space taken by the code for broadcasting a value to a set of cores.

Table 1 Description of resource language operations. Sym/conc stands for symbolic or concrete.

| Function | Types | Description |
| --- | --- | --- |
| (mapped-to? p n) | p: sym/conc integer; n: concrete integer; return: sym/conc integer | returns 1 if place p is core n (i.e. p = n), otherwise returns 0 |
| (different? p r n) | p: sym/conc integer; r: sym/conc integer; n: concrete integer; return: sym/conc integer | returns 1 if places p and r are different and place p is core n (i.e. (p ≠ r) ∧ (p = n)), otherwise returns 0 |
| (different? ps r n) | ps: list of sym/conc integers; r: sym/conc integer; n: concrete integer; return: sym/conc integer | returns 1 if there is at least one place p in ps such that (p ≠ r) ∧ (p = n), otherwise returns 0 |
| (count-different p rs n) | p: sym/conc integer; rs: list of sym/conc integers; n: concrete integer; return: sym/conc integer | returns the number of unique places in rs that differ from p if place p is core n, otherwise returns 0 |

Figure 6 Symbolic expression of the space occupied in each core after running a type checker on the yellow nodes in the example AST (Figure 5a).

(a) Original symbolic expression generated from the original type checker (Listing 1):

```racket
> cores-space
#( ;; core 0
  (+ (ite (= p$a 0) 1 0) ;; a [def]
     (ite (= p$a 0) 1 0) ;; a [use]
     (ite (= p$b 0) 1 0) ;; b
     (ite (= p$+ 0) 1 0) ;; +
     (ite (and (or (= p$+ 0) (= p$a 0)) (! (= p$+ p$a))) 2 0)  ;; a -> +
     (ite (and (or (= p$+ 0) (= p$b 0)) (! (= p$+ p$b))) 2 0)) ;; b -> +
   ;; core 1
   (+ ...))
```

(b) ILP symbolic expression generated from the modified type checker (Listing 2):

```racket
> cores-space
#( ;; core 0
  (+ (* 1 Mpn(p$a,0)) ;; a [def]
     (* 1 Mpn(p$a,0)) ;; a [use]
     (* 1 Mpn(p$b,0)) ;; b
     (* 1 Mpn(p$+,0)) ;; +
     (* 2 Remote_prn(p$+,p$a,0)) ;; a -> +
     (* 2 Remote_prn(p$a,p$+,0))
     (* 2 Remote_prn(p$+,p$b,0)) ;; b -> +
     (* 2 Remote_prn(p$b,p$+,0)))
   ;; core 1
   (+ ...))
> (asserts) ;; global assertions
((and (<= 0 Mpn(p$a,0)) (>= 1 Mpn(p$a,0)))
 (and (<= 0 Mpn(p$a,1)) (>= 1 Mpn(p$a,1)))
 (= 1 (+ Mpn(p$a,0) Mpn(p$a,1))) ;; a
 ...
 (<= 0 Remote_prn(p$+,p$a,0)) ;; a -> +
 (>= 1 Remote_prn(p$+,p$a,0))
 (>= Remote_prn(p$+,p$a,0) (- Mpn(p$+,0) Mpn(p$a,0)))
 ...)
```

Implementation. Table 2 details the implementation of the additional operations provided by our symbolic language. Under the abstraction, (mapped-to? p n) creates symbolic variables M_pn(p, n′) for all n′ ∈ N, where N is the set of values that p can take, and returns M_pn(p, n).
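The intent of the linear encoding can be checked on a tiny case. The Python sketch below is a hypothetical simplification of Table 2's definitions, with minimization replaced by taking the smallest feasible value. It verifies that the Remote_prn-based cost agrees with the ite-based cost of Figure 6a on every placement of two places over two cores, and that count-different's big-M offset deactivates its bound when p is not on core n:

```python
def mapped_to(p, n):                     # Mpn(p, n)
    return 1 if p == n else 0

def different(p, r, n):                  # Remote_prn(p, r, n), minimized:
    # smallest value with 0 <= Remote <= 1 and Remote >= Mpn(p,n) - Mpn(r,n)
    return max(0, mapped_to(p, n) - mapped_to(r, n))

def ite_cost(p, r, n):                   # Figure 6a-style cost on core n:
    # 2 words if core n participates in a genuine cross-core transfer
    return 2 if ((p == n or r == n) and p != r) else 0

for p in range(2):
    for r in range(2):
        for n in range(2):
            linear = 2 * different(p, r, n) + 2 * different(r, p, n)
            assert linear == ite_cost(p, r, n)

MAX_INT = 10**6
def count_lower_bound(m_pn, sum_remote): # Count_prsn's binding lower bound
    offset = (m_pn - 1) * MAX_INT        # 0 when p is on core n, -MAX_INT otherwise
    return max(0, sum_remote + offset)   # minimization picks exactly this value

assert count_lower_bound(1, 3) == 3      # p on core n: count the 3 remote cores
assert count_lower_bound(0, 3) == 0      # p elsewhere: the bound is deactivated
```

The exhaustive loop is the point: the linear expression and the nested-ite expression denote the same cost function, so replacing one with the other preserves the meaning of the type checker while making the constraints ILP-friendly.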

M_pn(p, n) = 1 if p = n, and M_pn(p, n) = 0 otherwise. Since p can be mapped to only one value, the function also adds the constraint Σ_{n′ ∈ N} M_pn(p, n′) = 1 to the global list of assertions. (different? p r n) creates and returns a variable Remote_prn(p, r, n), and adds Remote_prn(p, r, n) ≥ M_pn(p, n) − M_pn(r, n) and 0 ≤ Remote_prn(p, r, n) ≤ 1 to the global list of assertions. Note that Remote_prn(p, r, n) can be either 0 or 1 when p = r, which is not what we want. However, the encoding is valid if Remote_prn(p, r, n) is (indirectly) minimized, in which case it is 0 when p = r, as we expect. The validity check happens when minimize is called. Our approach requires no change to Rosette's internals; the additional operations simply generate Rosette assertions. We implement our own minimize function, which performs the validity check before calling Rosette's minimize.

Table 2 Implementation of resource language operations. sum and offset are temporary variables.

| Function | Returns | Created variables | Additional assertions |
| --- | --- | --- | --- |
| (mapped-to? p n) | M_pn(p, n) | M_pn(p, n′) for all n′ ∈ N | ∀n′ ∈ N: 0 ≤ M_pn(p, n′) ≤ 1; Σ_{n′ ∈ N} M_pn(p, n′) = 1 |
| (different? p r n) | Remote_prn(p, r, n) | Remote_prn(p, r, n) | 0 ≤ Remote_prn(p, r, n) ≤ 1; Remote_prn(p, r, n) ≥ M_pn(p, n) − M_pn(r, n) |
| (different? ps r n) | Remote_prsn(ps, r, n) | Remote_prsn(ps, r, n) | 0 ≤ Remote_prsn(ps, r, n) ≤ 1; ∀p ∈ ps: Remote_prsn(ps, r, n) ≥ Remote_prn(p, r, n) |
| (count-different p rs n) | Count_prsn(p, rs, n) | Count_prsn(p, rs, n); M_rsn(n′) for all n′ ∈ N | ∀n′ ∈ N: 0 ≤ M_rsn(n′) ≤ 1; ∀n′ ∈ N, ∀r ∈ rs: M_rsn(n′) ≥ M_pn(r, n′); sum = Σ_{n′ ∈ N∖{n}} M_rsn(n′); offset = (M_pn(p, n) − 1) · MAX_INT; Count_prsn(p, rs, n) ≥ 0; Count_prsn(p, rs, n) ≥ sum + offset |

Figure 6b shows the symbolic expression of cores-space, along with the additional assertions, after symbolically executing the modified type checker on the yellow nodes of the AST in Figure 5a. Notice that the new expression is linear, while the original one is not.

Evaluation.
The ILP encoding produced by our abstraction solves problems inaccessible to the SMT-based partitioner, and it is faster than an SMT encoding optimized for the domain of partitioning problems (namely, one that flattens deeply nested ite expressions). Figure 4b shows the median time, over three runs, to partition four benchmarks; we set the timeout to 30 minutes. In summary, plain SMT always timed out; domain-optimized SMT constraints solved half of the benchmarks; and the ILP encoding solved all benchmarks.

5 Synthesis of Parallel Tree Programs

Attribute Grammars and Static Scheduling. Tree computations, such as document layout for data visualization or CSS, are naturally specified as attribute grammars [12]. For the sake of efficiency, high-performance layout engines in web browsers schedule these tree computations statically, by assigning the statement that computes an attribute to a predetermined position in a sequence of tree traversals. Static scheduling avoids the overhead of determining dynamically when an attribute is ready to be computed.

We express a static schedule for an attribute grammar as a program in a domain-specific language of tree traversal schedules, L_S. A schedule consists of tree traversal passes, each

of which executes statements from the attribute grammar. For instance, the schedule

post{ Inner{w, h}, Leaf{w, h} } ; pre{ Inner{x, y}, Leaf{x, y} }

performs a post-order traversal computing w and then h at both Inner and Leaf nodes, and then performs a pre-order traversal computing x and then y at both Inner and Leaf nodes.

Listing 3 The original interpreter of tree traversal schedules, int_S.

```racket
 1 (define (int_S G t s)
 2   (match s
 3     [(seq s1 s2)
 4      (int_S G t s1)
 5      (int_S G t s2)]
 6     [(par s1 s2)
 7      ; check data independence of forward
 8      ; and backward orders
 9      (int_S G (copy t) (seq s2 s1))
10      (int_S G t (seq s1 s2))]
11     [(pre visits)
12      (preorder (visitor G visits) t)]
13     [(post visits)
14      (postorder (visitor G visits) t)]))
15
16 (define ((visitor G visits) node)
17   (let ([class (get-class G node)])
18     (for ([slot (get-slots visits class)])
19       (eval class node slot))))
20
21 (define (eval class node slot)
22   (let* ([rule (get-rule class slot)]
23          [attr (target node rule)])
24     (for ([dep (get-deps node rule)])
25       (assert (ready? dep)))
26     (assert (not (ready? attr)))
27     (set-ready! attr)))
```

Listing 4 The interpreter int_ST, written with the symbolic trace language.

```racket
 1 (define (int_ST G t s)
 2   (match s
 3     [(seq s1 s2)
 4      (int_ST G t s1)
 5      (int_ST G t s2)]
 6     [(par s1 s2)
 7      (parallel
 8       (int_ST G t s1)
 9       (int_ST G t s2))]
10     [(pre visits)
11      (preorder (visitor G visits) t)]
12     [(post visits)
13      (postorder (visitor G visits) t)]))
14
15 (define ((visitor G visits) node)
16   (let ([class (get-class G node)])
17     (for ([slot* (get-slots visits class)])
18       (fork ([slot slot*])
19         (eval class node slot)))))
20
21 (define (eval class node slot)
22   (let* ([rule (get-rule class slot)]
23          [attr (target node rule)])
24     (for ([dep (get-deps node rule)])
25       (read dep))
26     (write attr)
27     (step)))
```
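The checks performed by int_S, and the search over candidate schedules, can be mimicked in miniature. This Python sketch uses a toy grammar with one bottom-up attribute w and one top-down attribute x; the tree, the dependency functions, and the two-pass schedule shape are invented for illustration. It enumerates the schedule choices by brute force, where a symbolic evaluator would instead issue one solver query:

```python
import itertools

# Toy instance: a root with two leaves; w flows bottom-up, x flows top-down.
tree = {'root': ['l1', 'l2'], 'l1': [], 'l2': []}
parent = {'l1': 'root', 'l2': 'root'}
deps = {'w': lambda n: [(c, 'w') for c in tree[n]],                 # w reads children's w
        'x': lambda n: [(parent[n], 'x')] if n in parent else []}   # x reads parent's x

def correct(schedule):
    """Replay Listing 3's checks: every read sees a ready attribute,
    and every attribute is written exactly once."""
    post_attr, pre_attr = schedule       # the two symbolic choices
    ready = set()
    visits = [(n, post_attr) for n in ['l1', 'l2', 'root']] \
           + [(n, pre_attr) for n in ['root', 'l1', 'l2']]
    for node, attr in visits:
        if any(d not in ready for d in deps[attr](node)):   # uninitialized read
            return False
        if (node, attr) in ready:                           # double assignment
            return False
        ready.add((node, attr))
    return len(ready) == 6               # every attribute computed exactly once

solutions = [s for s in itertools.product(['w', 'x'], repeat=2) if correct(s)]
assert solutions == [('w', 'x')]         # post-order w, then pre-order x
```

Only the schedule that computes w bottom-up and then x top-down survives the checks; the other three assignments of the two slots either read an attribute before it is ready or write one twice.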
A statement to execute is indicated by the name of its target attribute, since attributes and the statements that compute them correspond one-to-one. Listing 3 presents a definitional interpreter for the scheduling language L_S. The interpreter checks the correctness of a schedule on a given input tree. Among the checks are the absence of reads from uninitialized attributes (line 25) and single assignment (line 26), which together ensure that data dependencies are satisfied.

Schedule Synthesis. To synthesize a legal schedule, we first define the space of candidate schedules by creating a partially symbolic schedule, such as

post{ Inner{??1, ??2}, Leaf{??3, ??4} } ; pre{ Inner{??5, ??6}, Leaf{??7, ??8} },

where ??i indicates a symbolic choice ranging over the statements from the relevant node class (e.g., ??1 ranges over the statements for Inner nodes). This schedule is desugared to a schedule-generating function sch : D^8 → L_S whose parameters control the symbolic choices ??i. Note that the schedule is partly concrete: it specifies two concrete traversals, leaving symbolic only the slots in the node visitors. This limited symbolic nature simplifies symbolic compilation. To consider other traversal patterns, we can apply a standard technique for prioritized enumeration of sketches [2].

General-Purpose Symbolic Evaluation. With the schedule-generating function sch in hand, the call sol(sym(int_S(G, t) ∘ sch, success)) synthesizes the slots for the partially concrete schedule, correct on a tree t. When evaluating a symbolic choice ??i, symbolic evaluation considers each alternative concrete statement (line 18 in Listing 3), and generates the constraints stating that the dependencies are ready and the target has not been computed (lines 25 and
