diff --git a/src/Doc/Implementation/ML.thy b/src/Doc/Implementation/ML.thy --- a/src/Doc/Implementation/ML.thy +++ b/src/Doc/Implementation/ML.thy @@ -1,2211 +1,2211 @@ (*:maxLineLen=78:*) theory "ML" imports Base begin chapter \Isabelle/ML\ text \ Isabelle/ML is best understood as a certain culture based on Standard ML. Thus it is not a new programming language, but a certain way to use SML at an advanced level within the Isabelle environment. This covers a variety of aspects that are geared towards an efficient and robust platform for applications of formal logic with fully foundational proof construction --- according to the well-known \<^emph>\LCF principle\. There is specific infrastructure with library modules to address the needs of this difficult task. For example, the raw parallel programming model of Poly/ML is presented as a considerably more abstract concept of \<^emph>\futures\, which is then used to augment the inference kernel, Isar theory and proof interpreter, and PIDE document management. The main aspects of Isabelle/ML are introduced below. These first-hand explanations should help to understand how proper Isabelle/ML is to be read and written, and to get access to the wealth of experience that is expressed in the source text and its history of changes.\<^footnote>\See \<^url>\https://isabelle.in.tum.de/repos/isabelle\ for the full Mercurial history. There are symbolic tags to refer to official Isabelle releases, as opposed to arbitrary \<^emph>\tip\ versions that merely reflect snapshots that are never really up-to-date.\ \ section \Style and orthography\ text \ The sources of Isabelle/Isar are optimized for \<^emph>\readability\ and \<^emph>\maintainability\. The main purpose is to tell an informed reader what is really going on and how things really work.
This is a non-trivial aim, but it is supported by a certain style of writing Isabelle/ML that has emerged from long years of system development.\<^footnote>\See also the interesting style guide for OCaml \<^url>\https://caml.inria.fr/resources/doc/guides/guidelines.en.html\ which shares many of our means and ends.\ The main principle behind any coding style is \<^emph>\consistency\. For a single author of a small program this merely means ``choose your style and stick to it''. A complex project like Isabelle, with long years of development and different contributors, requires more standardization. A coding style that is changed every few years or with every new contributor is no style at all, because consistency is quickly lost. Global consistency is hard to achieve, though. Nonetheless, one should always strive at least for local consistency of modules and sub-systems, without deviating from some general principles of how to write Isabelle/ML. In a sense, good coding style is like an \<^emph>\orthography\ for the sources: it helps to read quickly over the text and see through the main points, without getting distracted by accidental presentation of free-style code. \ subsection \Header and sectioning\ text \ Isabelle source files have a certain standardized header format (with precise spacing) that follows ancient traditions reaching back to the earliest versions of the system by Larry Paulson. See \<^file>\~~/src/Pure/thm.ML\, for example. The header includes at least \<^verbatim>\Title\ and \<^verbatim>\Author\ entries, followed by a prose description of the purpose of the module. The latter can range from a single line to several paragraphs of explanations. The rest of the file is divided into chapters, sections, subsections, subsubsections, paragraphs etc.\ using a simple layout via ML comments as follows.
@{verbatim [display] \ (**** chapter ****) (*** section ***) (** subsection **) (* subsubsection *) (*short paragraph*) (* long paragraph, with more text *)\} As in regular typography, there is some extra space \<^emph>\before\ section headings that are adjacent to plain text, but not other headings as in the example above. \<^medskip> The precise wording of the prose text given in these headings is chosen carefully to introduce the main theme of the subsequent formal ML text. \ subsection \Naming conventions\ text \ Since ML is the primary medium to express the meaning of the source text, naming of ML entities requires special care. \ paragraph \Notation.\ text \ A name consists of 1--3 \<^emph>\words\ (rarely 4, but not more) that are separated by underscore. There are three variants concerning upper or lower case letters, which are used for certain ML categories as follows: \<^medskip> \begin{tabular}{lll} variant & example & ML categories \\\hline lower-case & \<^ML_text>\foo_bar\ & values, types, record fields \\ capitalized & \<^ML_text>\Foo_Bar\ & datatype constructors, structures, functors \\ upper-case & \<^ML_text>\FOO_BAR\ & special values, exception constructors, signatures \\ \end{tabular} \<^medskip> For historical reasons, many capitalized names omit underscores, e.g.\ old-style \<^ML_text>\FooBar\ instead of \<^ML_text>\Foo_Bar\. Genuine mixed-case names are \<^emph>\not\ used, because clear division of words is essential for readability.\<^footnote>\Camel-case was invented to work around the lack of underscore in some early non-ASCII character sets. Later it became habitual in some language communities that are now strong in numbers.\ A single (capital) character does not count as ``word'' in this respect: some Isabelle/ML names are suffixed by extra markers like this: \<^ML_text>\foo_barT\. Name variants are produced by adding 1--3 primes, e.g.\ \<^ML_text>\foo'\, \<^ML_text>\foo''\, or \<^ML_text>\foo'''\, but not \<^ML_text>\foo''''\ or more.
Decimal digits scale better to larger numbers, e.g.\ \<^ML_text>\foo0\, \<^ML_text>\foo1\, \<^ML_text>\foo42\. \ paragraph \Scopes.\ text \ Apart from very basic library modules, ML structures are not ``opened'', but names are referenced with explicit qualification, as in \<^ML>\Syntax.string_of_term\ for example. When devising names for structures and their components it is important to aim at eye-catching compositions of both parts, because this is how they are seen in the sources and documentation. For the same reasons, aliases of well-known library functions should be avoided. Local names of function abstraction or case/let bindings are typically shorter, sometimes using only rudiments of ``words'', while still avoiding cryptic shorthands. An auxiliary function called \<^ML_text>\helper\, \<^ML_text>\aux\, or \<^ML_text>\f\ is considered bad style. Example: @{verbatim [display] \ (* RIGHT *) fun print_foo ctxt foo = let fun print t = ... Syntax.string_of_term ctxt t ... in ... end; (* RIGHT *) fun print_foo ctxt foo = let val string_of_term = Syntax.string_of_term ctxt; fun print t = ... string_of_term t ... in ... end; (* WRONG *) val string_of_term = Syntax.string_of_term; fun print_foo ctxt foo = let fun aux t = ... string_of_term ctxt t ... in ... end;\} \ paragraph \Specific conventions.\ text \ Here are some specific name forms that occur frequently in the sources. \<^item> A function that maps \<^ML_text>\foo\ to \<^ML_text>\bar\ is called \<^ML_text>\foo_to_bar\ or \<^ML_text>\bar_of_foo\ (never \<^ML_text>\foo2bar\, nor \<^ML_text>\bar_from_foo\, nor \<^ML_text>\bar_for_foo\, nor \<^ML_text>\bar4foo\). \<^item> The name component \<^ML_text>\legacy\ means that the operation is about to be discontinued soon. 
\<^item> The name component \<^ML_text>\global\ means that this works with the background theory instead of the regular local context (\secref{sec:context}), sometimes for historical reasons, sometimes due to a genuine lack of locality of the concept involved, sometimes as a fall-back for the lack of a proper context in the application code. Whenever there is a non-global variant available, the application should be migrated to use it with a proper local context. \<^item> Variables of the main context types of the Isabelle/Isar framework (\secref{sec:context} and \chref{ch:local-theory}) have firm naming conventions as follows: \<^item> theories are called \<^ML_text>\thy\, rarely \<^ML_text>\theory\ (never \<^ML_text>\thry\) \<^item> proof contexts are called \<^ML_text>\ctxt\, rarely \<^ML_text>\context\ (never \<^ML_text>\ctx\) \<^item> generic contexts are called \<^ML_text>\context\ \<^item> local theories are called \<^ML_text>\lthy\, except for local theories that are treated as proof context (which is a semantic super-type) Variations with primed or decimal numbers are always possible, as well as semantic prefixes like \<^ML_text>\foo_thy\ or \<^ML_text>\bar_ctxt\, but the base conventions above need to be preserved. This makes it possible to emphasize their data flow via plain regular expressions in the text editor.
\<^item> The main logical entities (\secref{ch:logic}) have established naming conventions as follows: \<^item> sorts are called \<^ML_text>\S\ \<^item> types are called \<^ML_text>\T\, \<^ML_text>\U\, or \<^ML_text>\ty\ (never \<^ML_text>\t\) \<^item> terms are called \<^ML_text>\t\, \<^ML_text>\u\, or \<^ML_text>\tm\ (never \<^ML_text>\trm\) \<^item> certified types are called \<^ML_text>\cT\, rarely \<^ML_text>\T\, with variants as for types \<^item> certified terms are called \<^ML_text>\ct\, rarely \<^ML_text>\t\, with variants as for terms (never \<^ML_text>\ctrm\) \<^item> theorems are called \<^ML_text>\th\ or \<^ML_text>\thm\ Proper semantic names override these conventions completely. For example, the left-hand side of an equation (as a term) can be called \<^ML_text>\lhs\ (not \<^ML_text>\lhs_tm\). Or a term that is known to be a variable can be called \<^ML_text>\v\ or \<^ML_text>\x\. \<^item> Tactics (\secref{sec:tactics}) are sufficiently important to have specific naming conventions. The name of a basic tactic definition always has a \<^ML_text>\_tac\ suffix, the subgoal index (if applicable) is always called \<^ML_text>\i\, and the goal state (if made explicit) is usually called \<^ML_text>\st\ instead of the somewhat misleading \<^ML_text>\thm\. Any other arguments are given before the latter two, and the general context is given first. Example: @{verbatim [display] \ fun my_tac ctxt arg1 arg2 i st = ...\} Note that the goal state \<^ML_text>\st\ above is rarely made explicit, if tactic combinators (tacticals) are used as usual. A tactic that requires a proof context needs to make that explicit as seen in the \<^verbatim>\ctxt\ argument above. Do not refer to the background theory of \<^verbatim>\st\ -- it is not a proper context, but merely a formal certificate.
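\<^medskip> For illustration, a hypothetical declaration that follows the above conventions for logical entities might read (names and body are invented for demonstration only, not taken from the Isabelle sources): @{verbatim [display] \ fun subst_var (v, t) u = ... (*v a variable, t and u terms*)\}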
\ subsection \General source layout\ text \ The general Isabelle/ML source layout imitates regular type-setting conventions, augmented by the requirements for deeply nested expressions that are commonplace in functional programming. \ paragraph \Line length\ text \ is limited to 80 characters according to ancient standards, but we allow as much as 100 characters (not more).\<^footnote>\Readability requires keeping the beginning of a line in view while watching its end. Modern wide-screen displays do not change the way the human brain works. Sources also need to be printable on plain paper with reasonable font-size.\ The extra 20 characters acknowledge the space requirements due to qualified library references in Isabelle/ML. \ paragraph \White-space\ text \ is used to emphasize the structure of expressions, following mostly standard conventions for mathematical typesetting, as can be seen in plain {\TeX} or {\LaTeX}. This defines positioning of spaces for parentheses, punctuation, and infixes as illustrated here: @{verbatim [display] \ val x = y + z * (a + b); val pair = (a, b); val record = {foo = 1, bar = 2};\} Lines are normally broken \<^emph>\after\ an infix operator or punctuation character. For example: @{verbatim [display] \ val x = a + b + c; val tuple = (a, b, c); \} Some special infixes (e.g.\ \<^ML_text>\|>\) work better at the start of the line, but punctuation is always at the end. Function application follows the tradition of \\\-calculus, not informal mathematics. For example: \<^ML_text>\f a b\ for a curried function, or \<^ML_text>\g (a, b)\ for a tupled function. Note that the space between \<^ML_text>\g\ and the pair \<^ML_text>\(a, b)\ follows the important principle of \<^emph>\compositionality\: the layout of \<^ML_text>\g p\ does not change when \<^ML_text>\p\ is refined to the concrete pair \<^ML_text>\(a, b)\.
\ paragraph \Indentation\ text \ uses plain spaces, never hard tabulators.\<^footnote>\Tabulators were invented to move the carriage of a type-writer to certain predefined positions. In software they could be used as a primitive run-length compression of consecutive spaces, but the precise result would depend on non-standardized text editor configuration.\ Each level of nesting is indented by 2 spaces, sometimes 1, very rarely 4, never 8 or any other odd number. Indentation follows a simple logical format that only depends on the nesting depth, not the accidental length of the text that initiates a level of nesting. Example: @{verbatim [display] \ (* RIGHT *) if b then expr1_part1 expr1_part2 else expr2_part1 expr2_part2 (* WRONG *) if b then expr1_part1 expr1_part2 else expr2_part1 expr2_part2\} The second form has many problems: it assumes a fixed-width font when viewing the sources, it uses more space on the line and thus makes it hard to observe its strict length limit (working against \<^emph>\readability\), it requires extra editing to adapt the layout to changes of the initial text (working against \<^emph>\maintainability\) etc. \<^medskip> For similar reasons, any kind of two-dimensional or tabular layouts, ASCII-art with lines or boxes of asterisks etc.\ should be avoided. \ paragraph \Complex expressions\ text \ that consist of multi-clausal function definitions, \<^ML_text>\handle\, \<^ML_text>\case\, \<^ML_text>\let\ (and combinations) require special attention. The syntax of Standard ML is quite ambitious and admits a lot of variance that can distort the meaning of the text. Multiple clauses of \<^ML_text>\fun\, \<^ML_text>\fn\, \<^ML_text>\handle\, \<^ML_text>\case\ get extra indentation to indicate the nesting clearly. 
Example: @{verbatim [display] \ (* RIGHT *) fun foo p1 = expr1 | foo p2 = expr2 (* WRONG *) fun foo p1 = expr1 | foo p2 = expr2\} Body expressions consisting of \<^ML_text>\case\ or \<^ML_text>\let\ require care to maintain compositionality, to prevent loss of logical indentation where it is especially important to see the structure of the text. Example: @{verbatim [display] \ (* RIGHT *) fun foo p1 = (case e of q1 => ... | q2 => ...) | foo p2 = let ... in ... end (* WRONG *) fun foo p1 = case e of q1 => ... | q2 => ... | foo p2 = let ... in ... end\} Extra parentheses around \<^ML_text>\case\ expressions are optional, but help to analyse the nesting based on character matching in the text editor. \<^medskip> There are two main exceptions to the overall principle of compositionality in the layout of complex expressions. \<^enum> \<^ML_text>\if\ expressions are iterated as if ML had multi-branch conditionals, e.g. @{verbatim [display] \ (* RIGHT *) if b1 then e1 else if b2 then e2 else e3\} \<^enum> \<^ML_text>\fn\ abstractions are often laid out as if they would lack any structure by themselves. This traditional form is motivated by the possibility to shift function arguments back and forth wrt.\ additional combinators. Example: @{verbatim [display] \ (* RIGHT *) fun foo x y = fold (fn z => expr)\} Here the visual appearance is that of three arguments \<^ML_text>\x\, \<^ML_text>\y\, \<^ML_text>\z\ in a row. Such weakly structured layout should be used with great care. Here are some counter-examples involving \<^ML_text>\let\ expressions: @{verbatim [display] \ (* WRONG *) fun foo x = let val y = ... in ... end (* WRONG *) fun foo x = let val y = ... in ... end (* WRONG *) fun foo x = let val y = ... in ... end (* WRONG *) fun foo x = let val y = ... in ... end\} \<^medskip> In general the source layout is meant to emphasize the structure of complex language expressions, not to pretend that SML had a completely different syntax (say that of Haskell, Scala, Java).
\ section \ML embedded into Isabelle/Isar\ text \ ML and Isar are intertwined via an open-ended bootstrap process that provides more and more programming facilities and logical content in an alternating manner. Bootstrapping starts from the raw environment of existing implementations of Standard ML (mainly Poly/ML). Isabelle/Pure marks the point where the raw ML toplevel is superseded by Isabelle/ML within the Isar theory and proof language, with a uniform context for arbitrary ML values (see also \secref{sec:context}). This formal environment holds ML compiler bindings, logical entities, and many other things. Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar environment by introducing suitable theories with associated ML modules, either inlined within \<^verbatim>\.thy\ files, or as separate \<^verbatim>\.ML\ files that are loaded from some theory. Thus Isabelle/HOL is defined as a regular user-space application within the Isabelle framework. Further add-on tools can be implemented in ML within the Isar context in the same manner: ML is part of the standard repertoire of Isabelle, and there is no distinction between ``users'' and ``developers'' in this respect. \ subsection \Isar ML commands\ text \ The primary Isar source language provides facilities to ``open a window'' to the underlying ML compiler. Especially see the Isar commands @{command_ref "ML_file"} and @{command_ref "ML"}: both work the same way, but the source text is provided differently, via a file vs.\ inlined, respectively. Apart from embedding ML into the main theory definition like that, there are many more commands that refer to ML source, such as @{command_ref setup} or @{command_ref declaration}. Even more fine-grained embedding of ML into Isar is encountered in the proof method @{method_ref tactic}, which refines the pending goal state via a given expression of type \<^ML_type>\tactic\.
\ text %mlex \ The following artificial example demonstrates some ML toplevel declarations within the implicit Isar theory context. This is regular functional programming without referring to logical entities yet. \ ML \ fun factorial 0 = 1 | factorial n = n * factorial (n - 1) \ text \ Here the ML environment is already managed by Isabelle, i.e.\ the \<^ML>\factorial\ function is not yet accessible in the preceding paragraph, nor in a different theory that is independent from the current one in the import hierarchy. Removing the above ML declaration from the source text will remove any trace of this definition, as expected. The Isabelle/ML toplevel environment is managed in a \<^emph>\stateless\ way: in contrast to the raw ML toplevel, there are no global side-effects involved here.\<^footnote>\Such a stateless compilation environment is also a prerequisite for robust parallel compilation within independent nodes of the implicit theory development graph.\ \<^medskip> The next example shows how to embed ML into Isar proofs, using @{command_ref "ML_prf"} instead of @{command_ref "ML"}. As illustrated below, the effect on the ML environment is local to the whole proof body, but ignoring the block structure. \ notepad begin ML_prf %"ML" \val a = 1\ { ML_prf %"ML" \val b = a + 1\ } \ \Isar block structure ignored by ML environment\ ML_prf %"ML" \val c = b + 1\ end text \ By side-stepping the normal scoping rules for Isar proof blocks, embedded ML code can refer to the different contexts and manipulate corresponding entities, e.g.\ export a fact from a block context. \<^medskip> Two further ML commands are useful in certain situations: @{command_ref ML_val} and @{command_ref ML_command} are \<^emph>\diagnostic\ in the sense that there is no effect on the underlying environment, and can thus be used anywhere. 
The examples below produce long strings of digits by invoking \<^ML>\factorial\: @{command ML_val} takes care of printing the ML toplevel result, but @{command ML_command} is silent so we produce an explicit output message. \ ML_val \factorial 100\ ML_command \writeln (string_of_int (factorial 100))\ notepad begin ML_val \factorial 100\ ML_command \writeln (string_of_int (factorial 100))\ end subsection \Compile-time context\ text \ Whenever the ML compiler is invoked within Isabelle/Isar, the formal context is passed as a thread-local reference variable. Thus ML code may access the theory context during compilation, by reading or writing the (local) theory under construction. Note that such direct access to the compile-time context is rare. In practice it is typically done via some derived ML functions instead. \ text %mlref \ \begin{mldecls} @{define_ML Context.the_generic_context: "unit -> Context.generic"} \\ @{define_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\ @{define_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\ @{define_ML ML_Thms.bind_thm: "string * thm -> unit"} \\ \end{mldecls} \<^descr> \<^ML>\Context.the_generic_context ()\ refers to the theory context of the ML toplevel --- at compile time. ML code needs to take care to refer to \<^ML>\Context.the_generic_context ()\ correctly. Recall that evaluation of a function body is delayed until actual run-time. \<^descr> \<^ML>\Context.>>\~\f\ applies context transformation \f\ to the implicit context of the ML toplevel. \<^descr> \<^ML>\ML_Thms.bind_thms\~\(name, thms)\ stores a list of theorems produced in ML both in the (global) theory context and the ML toplevel, associating it with the provided name. \<^descr> \<^ML>\ML_Thms.bind_thm\ is similar to \<^ML>\ML_Thms.bind_thms\ but refers to a singleton fact. It is important to note that the above functions are really restricted to the compile time, even though the ML compiler is invoked at run-time. 
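\<^medskip> As a minimal sketch of such a compile-time context update (the path name \<^verbatim>\demo\ is an arbitrary placeholder): @{verbatim [display] \ ML \Context.>> (Context.map_theory (Sign.add_path "demo"))\\} Here \<^ML>\Context.map_theory\ lifts a theory transformation to the generic context expected by \<^ML>\Context.>>\.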
The majority of ML code either uses static antiquotations (\secref{sec:ML-antiq}) or refers to the theory or proof context at run-time, by explicit functional abstraction. \ subsection \Antiquotations \label{sec:ML-antiq}\ text \ A very important consequence of embedding ML into Isar is the concept of \<^emph>\ML antiquotation\. The standard token language of ML is augmented by special syntactic entities of the following form: \<^rail>\ @{syntax_def antiquote}: '@{' name args '}' \ Here @{syntax name} and @{syntax args} are outer syntax categories, as defined in @{cite "isabelle-isar-ref"}. \<^medskip> A regular antiquotation \@{name args}\ processes its arguments by the usual means of the Isar source language, and produces corresponding ML source text, either as literal \<^emph>\inline\ text (e.g.\ \@{term t}\) or abstract \<^emph>\value\ (e.g. \@{thm th}\). This pre-compilation scheme makes it possible to refer to formal entities in a robust manner, with proper static scoping and with some degree of logical checking of small portions of the code. \ subsection \Printing ML values\ text \ The ML compiler knows about the structure of values according to their static type, and can print them in the manner of its toplevel, although the details are non-portable. The antiquotations @{ML_antiquotation_def "make_string"} and @{ML_antiquotation_def "print"} provide a quasi-portable way to refer to this potential capability of the underlying ML system in generic Isabelle/ML sources. This is occasionally useful for diagnostic or demonstration purposes. Note that production-quality tools require proper user-level error messages, avoiding raw ML values in the output. \ text %mlantiq \ \begin{matharray}{rcl} @{ML_antiquotation_def "make_string"} & : & \ML_antiquotation\ \\ @{ML_antiquotation_def "print"} & : & \ML_antiquotation\ \\ \end{matharray} \<^rail>\ @@{ML_antiquotation make_string} ; @@{ML_antiquotation print} embedded?
\ \<^descr> \@{make_string}\ inlines a function to print arbitrary values similar to the ML toplevel. The result is compiler dependent and may fall back on "?" in certain situations. The value of configuration option @{attribute_ref ML_print_depth} determines further details of output. \<^descr> \@{print f}\ uses the ML function \f: string -> unit\ to output the result of \@{make_string}\ above, together with the source position of the antiquotation. The default output function is \<^ML>\writeln\. \ text %mlex \ The following artificial examples show how to produce adhoc output of ML values for debugging purposes. \ ML_val \ val x = 42; val y = true; writeln (\<^make_string> {x = x, y = y}); \<^print> {x = x, y = y}; \<^print>\tracing\ {x = x, y = y}; \ section \Canonical argument order \label{sec:canonical-argument-order}\ text \ Standard ML is a language in the tradition of \\\-calculus and \<^emph>\higher-order functional programming\, similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical languages. Getting acquainted with the native style of representing functions in that setting can save a lot of extra boiler-plate of redundant shuffling of arguments, auxiliary abstractions etc. Functions are usually \<^emph>\curried\: the idea of turning arguments of type \\\<^sub>i\ (for \i \ {1, \ n}\) into a result of type \\\ is represented by the iterated function space \\\<^sub>1 \ \ \ \\<^sub>n \ \\. This is isomorphic to the well-known encoding via tuples \\\<^sub>1 \ \ \ \\<^sub>n \ \\, but the curried version fits more smoothly into the basic calculus.\<^footnote>\The difference is even more significant in HOL, because the redundant tuple structure needs to be accommodated by extraneous proof steps.\ Currying gives some flexibility due to \<^emph>\partial application\. A function \f: \\<^sub>1 \ \\<^sub>2 \ \\ can be applied to \x: \\<^sub>1\ and the remaining \(f x): \\<^sub>2 \ \\ passed to another function etc.
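\<^medskip> In plain SML terms, partial application looks like this (a deliberately trivial sketch): @{verbatim [display] \ val add = fn x => fn y => x + y; (*int -> int -> int*) val incr = add 1; (*int -> int, by partial application*) val ys = map incr [1, 2, 3]; (*[2, 3, 4]*)\}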
How well this works in practice depends on the order of arguments. In the worst case, arguments are arranged erratically, and using a function in a certain situation always requires some glue code. Thus we would get exponentially many opportunities to decorate the code with meaningless permutations of arguments. This can be avoided by \<^emph>\canonical argument order\, which observes certain standard patterns and minimizes adhoc permutations in their application. In Isabelle/ML, large portions of text can be written without auxiliary operations like \swap: \ \ \ \ \ \ \\ or \C: (\ \ \ \ \) \ (\ \ \ \ \)\ (the latter is not present in the Isabelle/ML library). \<^medskip> The main idea is that arguments that vary less are moved further to the left than those that vary more. Two particularly important categories of functions are \<^emph>\selectors\ and \<^emph>\updates\. The subsequent scheme is based on a hypothetical set-like container of type \\\ that manages elements of type \\\. Both the names and types of the associated operations are canonical for Isabelle/ML. \begin{center} \begin{tabular}{ll} kind & canonical name and type \\\hline selector & \member: \ \ \ \ bool\ \\ update & \insert: \ \ \ \ \\ \\ \end{tabular} \end{center} Given a container \B: \\, the partially applied \member B\ is a predicate over elements \\ \ bool\, and thus represents the intended denotation directly. It is customary to pass the abstract predicate to further operations, not the concrete container. The argument order makes it easy to use other combinators: \forall (member B) list\ will check a list of elements for membership in \B\ etc. Often the explicit \list\ is pointless and can be contracted to \forall (member B)\ to get directly a predicate again. In contrast, an update operation varies the container, so it moves to the right: \insert a\ is a function \\ \ \\ to insert a value \a\. These can be composed naturally as \insert c \ insert b \ insert a\. 
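\<^medskip> Spelled out with plain application, the composed updates above amount to the following (an illustrative fragment for the hypothetical container operations): @{verbatim [display] \ val B' = insert c (insert b (insert a B));\}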
The slightly awkward inversion of the composition order is due to conventional mathematical notation, which can be easily amended as explained below. \ subsection \Forward application and composition\ text \ Regular function application and infix notation work best for relatively deeply structured expressions, e.g.\ \h (f x y + g z)\. The important special case of \<^emph>\linear transformation\ applies a cascade of functions \f\<^sub>n (\ (f\<^sub>1 x))\. This becomes hard to read and maintain if the functions are themselves given as complex expressions. The notation can be significantly improved by introducing \<^emph>\forward\ versions of application and composition as follows: \<^medskip> \begin{tabular}{lll} \x |> f\ & \\\ & \f x\ \\ \(f #> g) x\ & \\\ & \x |> f |> g\ \\ \end{tabular} \<^medskip> This makes it convenient to write \x |> f\<^sub>1 |> \ |> f\<^sub>n\ or \f\<^sub>1 #> \ #> f\<^sub>n\ for its functional abstraction over \x\. \<^medskip> There is an additional set of combinators to accommodate multiple results (via pairs) that are passed on as multiple arguments (via currying). \<^medskip> \begin{tabular}{lll} \(x, y) |-> f\ & \\\ & \f x y\ \\ \(f #-> g) x\ & \\\ & \x |> f |-> g\ \\ \end{tabular} \<^medskip> \ text %mlref \ \begin{mldecls} @{define_ML_infix "|>" : "'a * ('a -> 'b) -> 'b"} \\ @{define_ML_infix "|->" : "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\ @{define_ML_infix "#>" : "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\ @{define_ML_infix "#->" : "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\ \end{mldecls} \ subsection \Canonical iteration\ text \ As explained above, a function \f: \ \ \ \ \\ can be understood as an update on a configuration of type \\\, parameterized by an argument of type \\\. Given \a: \\ the partial application \(f a): \ \ \\ operates homogeneously on \\\. This can be iterated naturally over a list of parameters \[a\<^sub>1, \, a\<^sub>n]\ as \f a\<^sub>1 #> \ #> f a\<^sub>n\.
The latter expression is again a function \\ \ \\. It can be applied to an initial configuration \b: \\ to start the iteration over the given list of arguments: each \a\ in \a\<^sub>1, \, a\<^sub>n\ is applied consecutively by updating a cumulative configuration. The \fold\ combinator in Isabelle/ML lifts a function \f\ as above to its iterated version over a list of arguments. Lifting can be repeated, e.g.\ \(fold \ fold) f\ iterates over a list of lists as expected. The variant \fold_rev\ works inside-out over the list of arguments, such that \fold_rev f \ fold f \ rev\ holds. The \fold_map\ combinator essentially performs \fold\ and \map\ simultaneously: each application of \f\ produces an updated configuration together with a side-result; the iteration collects all such side-results as a separate list. \ text %mlref \ \begin{mldecls} @{define_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\ @{define_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\ @{define_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\ \end{mldecls} \<^descr> \<^ML>\fold\~\f\ lifts the parametrized update function \f\ to a list of parameters. \<^descr> \<^ML>\fold_rev\~\f\ is similar to \<^ML>\fold\~\f\, but works inside-out, as if the list would be reversed. \<^descr> \<^ML>\fold_map\~\f\ lifts the parametrized update function \f\ (with side-result) to a list of parameters and cumulative side-results. \begin{warn} The literature on functional programming provides a confusing multitude of combinators called \foldl\, \foldr\ etc. SML97 provides its own variations as \<^ML>\List.foldl\ and \<^ML>\List.foldr\, while the classic Isabelle library also has the historic \<^ML>\Library.foldl\ and \<^ML>\Library.foldr\. To avoid unnecessary complication, all these historical versions should be ignored, and the canonical \<^ML>\fold\ (or \<^ML>\fold_rev\) used exclusively. 
\end{warn} \ text %mlex \ The following example shows how to fill a text buffer incrementally by adding strings, either individually or from a given list. \ ML_val \ val s = Buffer.empty |> Buffer.add "digits: " |> fold (Buffer.add o string_of_int) (0 upto 9) |> Buffer.content; \<^assert> (s = "digits: 0123456789"); \ text \ Note how \<^ML>\fold (Buffer.add o string_of_int)\ above saves an extra \<^ML>\map\ over the given list. This kind of peephole optimization reduces both the code size and the tree structures in memory (``deforestation''), but it requires some practice to read and write fluently. \<^medskip> The next example elaborates the idea of canonical iteration, demonstrating fast accumulation of tree content using a text buffer. \ ML \ datatype tree = Text of string | Elem of string * tree list; fun slow_content (Text txt) = txt | slow_content (Elem (name, ts)) = "<" ^ name ^ ">" ^ implode (map slow_content ts) ^ "" fun add_content (Text txt) = Buffer.add txt | add_content (Elem (name, ts)) = Buffer.add ("<" ^ name ^ ">") #> fold add_content ts #> Buffer.add (""); fun fast_content tree = Buffer.empty |> add_content tree |> Buffer.content; \ text \ The slowness of \<^ML>\slow_content\ is due to the \<^ML>\implode\ of the recursive results, because it copies previously produced strings again and again. The incremental \<^ML>\add_content\ avoids this by operating on a buffer that is passed through in a linear fashion. Using \<^ML_text>\#>\ and contraction over the actual buffer argument saves some additional boiler-plate. Of course, the two \<^ML>\Buffer.add\ invocations with concatenated strings could have been split into smaller parts, but this would have obfuscated the source without making a big difference in performance. Here we have done some peephole-optimization for the sake of readability. Another benefit of \<^ML>\add_content\ is its ``open'' form as a function on buffers that can be continued in further linear transformations, folding etc. 
Thus it is more compositional than the naive \<^ML>\slow_content\. As a realistic example, compare the old-style \<^ML>\Term.maxidx_of_term: term -> int\ with the newer \<^ML>\Term.maxidx_term: term -> int -> int\ in Isabelle/Pure. Note that \<^ML>\fast_content\ above is only defined as an example. In many practical situations, it is customary to provide the incremental \<^ML>\add_content\ only and leave the initialization and termination to the concrete application by the user. \ section \Message output channels \label{sec:message-channels}\ text \ Isabelle provides output channels for different kinds of messages: regular output, high-volume tracing information, warnings, and errors. Depending on the user interface involved, these messages may appear in different text styles or colours. The standard output for batch sessions prefixes each line of warnings by \<^verbatim>\###\ and errors by \<^verbatim>\***\, but leaves anything else unchanged. The message body may contain further markup and formatting, which is routinely used in the Prover IDE @{cite "isabelle-jedit"}. Messages are associated with the transaction context of the running Isar command. This enables the front-end to manage commands and resulting messages together. For example, after deleting a command from a given theory document version, the corresponding message output can be retracted from the display. \ text %mlref \ \begin{mldecls} @{define_ML writeln: "string -> unit"} \\ @{define_ML tracing: "string -> unit"} \\ @{define_ML warning: "string -> unit"} \\ @{define_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\ \end{mldecls} \<^descr> \<^ML>\writeln\~\text\ outputs \text\ as a regular message. This is the primary message output operation of Isabelle and should be used by default. \<^descr> \<^ML>\tracing\~\text\ outputs \text\ as a special tracing message, indicating potential high-volume output to the front-end (hundreds or thousands of messages issued by a single command).
The idea is to allow the user-interface to downgrade the quality of message display to achieve higher throughput. Note that the user might have to take special actions to see tracing output, e.g.\ switch to a different output window. So this channel should not be used for regular output. \<^descr> \<^ML>\warning\~\text\ outputs \text\ as warning, which typically means some extra emphasis on the front-end side (color highlighting, icons, etc.). \<^descr> \<^ML>\error\~\text\ raises exception \<^ML>\ERROR\~\text\ and thus lets the Isar toplevel print \text\ on the error channel, which typically means some extra emphasis on the front-end side (color highlighting, icons, etc.). This assumes that the exception is not handled before the command terminates. Handling exception \<^ML>\ERROR\~\text\ is a perfectly legal alternative: it means that the error is absorbed without any message output. \begin{warn} The actual error channel is accessed via \<^ML>\Output.error_message\, but this is normally not used directly in user code. \end{warn} \begin{warn} Regular Isabelle/ML code should output messages exclusively by the official channels. Using raw I/O on \<^emph>\stdout\ or \<^emph>\stderr\ instead (e.g.\ via \<^ML>\TextIO.output\) is apt to cause problems in the presence of parallel and asynchronous processing of Isabelle theories. Such raw output might be displayed by the front-end in some system console log, with a low chance that the user will ever see it. Moreover, as a genuine side-effect on global process channels, there is no proper way to retract output when Isar command transactions are reset by the system. \end{warn} \begin{warn} The message channels should be used in a message-oriented manner. This means that multi-line output that logically belongs together is issued by a single invocation of \<^ML>\writeln\ etc.\ with the functional concatenation of all message constituents. \end{warn} \ text %mlex \ The following example demonstrates a multi-line warning. 
Note that in some situations the user sees only the first line, so the most important point should be made first. \ ML_command \ warning (cat_lines ["Beware the Jabberwock, my son!", "The jaws that bite, the claws that catch!", "Beware the Jubjub Bird, and shun", "The frumious Bandersnatch!"]); \ text \ \<^medskip> An alternative is to make a paragraph of freely-floating words as follows. \ ML_command \ warning (Pretty.string_of (Pretty.para "Beware the Jabberwock, my son! \ \The jaws that bite, the claws that catch! \ \Beware the Jubjub Bird, and shun \ \The frumious Bandersnatch!")) \ text \ This has advantages with variable window / popup sizes, but might make it harder to search for message content systematically, e.g.\ by other tools or by humans expecting the ``verse'' of a formal message in a fixed layout. \ section \Exceptions \label{sec:exceptions}\ text \ The Standard ML semantics of strict functional evaluation together with exceptions is rather well defined, but some delicate points need to be observed to avoid that ML programs go wrong despite static type-checking. Exceptions in Isabelle/ML are subsequently categorized as follows. \ paragraph \Regular user errors.\ text \ These are meant to provide informative feedback about malformed input etc. The \<^emph>\error\ function raises the corresponding \<^ML>\ERROR\ exception, with a plain text message as argument. \<^ML>\ERROR\ exceptions can be handled internally, in order to be ignored, turned into other exceptions, or cascaded by appending messages. If the corresponding Isabelle/Isar command terminates with an \<^ML>\ERROR\ exception state, the system will print the result on the error channel (see \secref{sec:message-channels}). It is considered bad style to refer to internal function names or values in ML source notation in user error messages. Do not use \@{make_string}\ nor \@{here}\! 
Grammatical correctness of error messages can be improved by \<^emph>\omitting\ final punctuation: messages are often concatenated or put into a larger context (e.g.\ augmented with source position). Note that punctuation after formal entities (types, terms, theorems) is particularly prone to user confusion. \ paragraph \Program failures.\ text \ There is a handful of standard exceptions that indicate general failure situations, or failures of core operations on logical entities (types, terms, theorems, theories, see \chref{ch:logic}). These exceptions indicate a genuine breakdown of the program, so the main purpose is to determine quickly what has happened where. Traditionally, the (short) exception message would include the name of an ML function, although this is no longer necessary, because the ML runtime system attaches a detailed source position stemming from the corresponding \<^ML_text>\raise\ keyword. \<^medskip> User modules can always introduce their own custom exceptions locally, e.g.\ to organize internal failures robustly without overlapping with existing exceptions. Exceptions that are exposed in module signatures require extra care, though, and should \<^emph>\not\ be introduced by default. Surprise by users of a module can often be minimized by using plain user errors instead. \ paragraph \Interrupts.\ text \ These indicate arbitrary system events: both the ML runtime system and the Isabelle/ML infrastructure signal various exceptional situations by raising the special \<^ML>\Exn.Interrupt\ exception in user code. This is the one and only way that physical events can intrude an Isabelle/ML program. Such an interrupt can mean out-of-memory, stack overflow, timeout, internal signaling of threads, or a POSIX process signal. An Isabelle/ML program that intercepts interrupts becomes dependent on physical effects of the environment.
Even worse, exception handling patterns that are too general by accident, e.g.\ by misspelled exception constructors, will cover interrupts unintentionally and thus render the program semantics ill-defined. Note that the Interrupt exception dates back to the original SML90 language definition. It was excluded from the SML97 version to avoid its malign impact on ML program semantics, but without providing a viable alternative. Isabelle/ML recovers physical interruptibility (which is an indispensable tool to implement managed evaluation of command transactions), but requires user code to be strictly transparent wrt.\ interrupts. \begin{warn} Isabelle/ML user code needs to terminate promptly on interruption, without guessing at its meaning to the system infrastructure. Temporary handling of interrupts for cleanup of global resources etc.\ needs to be followed immediately by re-raising of the original exception. \end{warn} \ text %mlref \ \begin{mldecls} @{define_ML try: "('a -> 'b) -> 'a -> 'b option"} \\ @{define_ML can: "('a -> 'b) -> 'a -> bool"} \\ @{define_ML_exception ERROR of string} \\ @{define_ML_exception Fail of string} \\ @{define_ML Exn.is_interrupt: "exn -> bool"} \\ @{define_ML Exn.reraise: "exn -> 'a"} \\ @{define_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\ \end{mldecls} \<^rail>\ (@@{ML_antiquotation try} | @@{ML_antiquotation can}) embedded \ \<^descr> \<^ML>\try\~\f x\ makes the partiality of evaluating \f x\ explicit via the option datatype. Interrupts are \<^emph>\not\ handled here, i.e.\ this form serves as safe replacement for the \<^emph>\unsafe\ version \<^ML_text>\(SOME\~\f x\~\<^ML_text>\handle _ => NONE)\ that is occasionally seen in books about SML97, but not in Isabelle/ML. \<^descr> \<^ML>\can\ is similar to \<^ML>\try\ with more abstract result. \<^descr> \<^ML>\ERROR\~\msg\ represents user errors; this exception is normally raised indirectly via the \<^ML>\error\ function (see \secref{sec:message-channels}). 
\<^descr> \<^ML>\Fail\~\msg\ represents general program failures. \<^descr> \<^ML>\Exn.is_interrupt\ identifies interrupts robustly, without mentioning concrete exception constructors in user code. Handled interrupts need to be re-raised promptly! \<^descr> \<^ML>\Exn.reraise\~\exn\ raises exception \exn\ while preserving its implicit position information (if possible, depending on the ML platform). \<^descr> \<^ML>\Runtime.exn_trace\~\<^ML_text>\(fn () =>\~\e\\<^ML_text>\)\ evaluates expression \e\ while printing a full trace of its stack of nested exceptions (if possible, depending on the ML platform). Inserting \<^ML>\Runtime.exn_trace\ into ML code temporarily is useful for debugging, but not suitable for production code. \ text %mlantiq \ \begin{matharray}{rcl} @{ML_antiquotation_def "try"} & : & \ML_antiquotation\ \\ @{ML_antiquotation_def "can"} & : & \ML_antiquotation\ \\ @{ML_antiquotation_def "assert"} & : & \ML_antiquotation\ \\ @{ML_antiquotation_def "undefined"} & : & \ML_antiquotation\ \\ \end{matharray} \<^descr> \@{try}\ and \@{can}\ are similar to the corresponding functions, but the argument is taken directly as an ML expression: functional abstraction and application are done implicitly. \<^descr> \@{assert}\ inlines a function \<^ML_type>\bool -> unit\ that raises \<^ML>\Fail\ if the argument is \<^ML>\false\. Due to inlining the source position of failed assertions is included in the error output. \<^descr> \@{undefined}\ inlines \<^verbatim>\raise\~\<^ML>\Match\, i.e.\ the ML program behaves as in some function application of an undefined case. \ text %mlex \ We define a total version of division: any failures are swept under the carpet and mapped to a default value. Thus division-by-zero becomes 0, but there could be other exceptions like overflow that produce the same result (for unbounded integers this does not happen).
\ ML \ fun div_total x y = \<^try>\x div y\ |> the_default 0; \<^assert> (div_total 1 0 = 0); \ text \ The ML function \<^ML>\undefined\ is defined in \<^file>\~~/src/Pure/library.ML\ as follows: \ ML \fun undefined _ = raise Match\ text \ \<^medskip> The following variant uses the antiquotation @{ML_antiquotation undefined} instead: \ ML \fun undefined _ = @{undefined}\ text \ \<^medskip> Here is the same, using control-symbol notation for the antiquotation, with special rendering of \<^verbatim>\\<^undefined>\: \ ML \fun undefined _ = \<^undefined>\ text \ \<^medskip> Semantically, all forms are equivalent to a function definition without any clauses, but that is syntactically not allowed in ML. \ section \Strings of symbols \label{sec:symbols}\ text \ A \<^emph>\symbol\ constitutes the smallest textual unit in Isabelle/ML --- raw ML characters are normally not encountered at all. Isabelle strings consist of a sequence of symbols, represented as a packed string or an exploded list of strings. Each symbol is in itself a small string, which has either one of the following forms: \<^enum> a single ASCII character ``\c\'', for example ``\<^verbatim>\a\'', \<^enum> a codepoint according to UTF-8 (non-ASCII byte sequence), \<^enum> a regular symbol ``\<^verbatim>\\\'', for example ``\<^verbatim>\\\'', \<^enum> a control symbol ``\<^verbatim>\\<^ident>\'', for example ``\<^verbatim>\\<^bold>\'', The \ident\ syntax for symbol names is \letter (letter | digit)\<^sup>*\, where \letter = A..Za..z\ and \digit = 0..9\. There are infinitely many regular symbols and control symbols, but a fixed collection of standard symbols is treated specifically. For example, ``\<^verbatim>\\\'' is classified as a letter, which means it may occur within regular Isabelle identifiers. The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit character sequences are passed-through unchanged.
Unicode/UCS data in UTF-8 encoding is processed in a non-strict fashion, such that well-formed code sequences are recognized accordingly. Unicode provides its own collection of mathematical symbols, but within the core Isabelle/ML world there is no link to the standard collection of Isabelle regular symbols. \<^medskip> Output of Isabelle symbols depends on the print mode. For example, the standard {\LaTeX} setup of the Isabelle document preparation system would present ``\<^verbatim>\\\'' as \\\, and ``\<^verbatim>\\<^bold>\\'' as \\<^bold>\\. On-screen rendering usually works by mapping a finite subset of Isabelle symbols to suitable Unicode characters. \ text %mlref \ \begin{mldecls} @{define_ML_type Symbol.symbol = string} \\ @{define_ML Symbol.explode: "string -> Symbol.symbol list"} \\ @{define_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\ @{define_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\ @{define_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\ @{define_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\ \end{mldecls} \begin{mldecls} @{define_ML_type "Symbol.sym"} \\ @{define_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\ \end{mldecls} \<^descr> Type \<^ML_type>\Symbol.symbol\ represents individual Isabelle symbols. \<^descr> \<^ML>\Symbol.explode\~\str\ produces a symbol list from the packed form. This function supersedes \<^ML>\String.explode\ for virtually all purposes of manipulating text in Isabelle!\<^footnote>\The runtime overhead for exploded strings is mainly that of the list structure: individual symbols that happen to be a singleton string do not require extra memory in Poly/ML.\ \<^descr> \<^ML>\Symbol.is_letter\, \<^ML>\Symbol.is_digit\, \<^ML>\Symbol.is_quasi\, \<^ML>\Symbol.is_blank\ classify standard symbols according to fixed syntactic conventions of Isabelle, cf.\ @{cite "isabelle-isar-ref"}. 
\<^descr> Type \<^ML_type>\Symbol.sym\ is a concrete datatype that represents the different kinds of symbols explicitly, with constructors \<^ML>\Symbol.Char\, \<^ML>\Symbol.UTF8\, \<^ML>\Symbol.Sym\, \<^ML>\Symbol.Control\, \<^ML>\Symbol.Malformed\. \<^descr> \<^ML>\Symbol.decode\ converts the string representation of a symbol into the datatype version. \ paragraph \Historical note.\ text \ In the original SML90 standard the primitive ML type \<^ML_type>\char\ did not exist, and \<^ML_text>\explode: string -> string list\ produced a list of singleton strings like \<^ML>\raw_explode: string -> string list\ in Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat anachronistic 8-bit or 16-bit characters, but the idea of exploding a string into a list of small strings was extended to ``symbols'' as explained above. Thus Isabelle sources can refer to an infinite store of user-defined symbols, without having to worry about the multitude of Unicode encodings that have emerged over the years. \ section \Basic data types\ text \ The basis library proposal of SML97 needs to be treated with caution. Many of its operations simply do not fit with important Isabelle/ML conventions (like ``canonical argument order'', see \secref{sec:canonical-argument-order}), others cause problems with the parallel evaluation model of Isabelle/ML (such as \<^ML>\TextIO.print\ or \<^ML>\OS.Process.system\). Subsequently we give a brief overview of important operations on basic ML data types. \ subsection \Characters\ text %mlref \ \begin{mldecls} @{define_ML_type char} \\ \end{mldecls} \<^descr> Type \<^ML_type>\char\ is \<^emph>\not\ used. The smallest textual unit in Isabelle is represented as a ``symbol'' (see \secref{sec:symbols}). \ subsection \Strings\ text %mlref \ \begin{mldecls} @{define_ML_type string} \\ \end{mldecls} \<^descr> Type \<^ML_type>\string\ represents immutable vectors of 8-bit characters.
There are operations in SML to convert back and forth to actual byte vectors, which are seldom used. This historically important raw text representation is used for Isabelle-specific purposes with the following implicit substructures packed into the string content: \<^enum> sequence of Isabelle symbols (see also \secref{sec:symbols}), with \<^ML>\Symbol.explode\ as key operation; \<^enum> XML tree structure via YXML (see also @{cite "isabelle-system"}), with \<^ML>\YXML.parse_body\ as key operation. Note that Isabelle/ML string literals may refer to Isabelle symbols like ``\<^verbatim>\\\'' natively, \<^emph>\without\ escaping the backslash. This is a consequence of Isabelle treating all source text as strings of symbols, instead of raw characters. \ text %mlex \ The subsequent example illustrates the difference of physical addressing of bytes versus logical addressing of symbols in Isabelle strings. \ ML_val \ val s = "\"; \<^assert> (length (Symbol.explode s) = 1); \<^assert> (size s = 4); \ text \ Note that in Unicode renderings of the symbol \\\, variations of encodings like UTF-8 or UTF-16 pose delicate questions about the multi-byte representations of its codepoint, which is outside of the 16-bit address space of the original Unicode standard from the 1990s. In Isabelle/ML it is just ``\<^verbatim>\\\'' literally, using plain ASCII characters beyond any doubts. \ subsection \Integers\ text %mlref \ \begin{mldecls} @{define_ML_type int} \\ \end{mldecls} \<^descr> Type \<^ML_type>\int\ represents regular mathematical integers, which are \<^emph>\unbounded\. Overflow is treated properly, but should never happen in practice.\<^footnote>\The size limit for integer bit patterns in memory is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.\ Structure \<^ML_structure>\IntInf\ of SML97 is obsolete and superseded by \<^ML_structure>\Int\. Structure \<^ML_structure>\Integer\ in \<^file>\~~/src/Pure/General/integer.ML\ provides some additional operations.
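\

text %mlex \
  The following contrived example illustrates unbounded integer arithmetic:
  the value \2^64\ exceeds common machine word sizes, but is still computed
  exactly. (The concrete numbers merely serve as illustration here.)
\

ML_val \
  (*64 doublings of 1 yield 2^64, beyond the range of 64-bit machine words*)
  val n = fold (fn _ => fn k => 2 * k) (1 upto 64) 1;
  \<^assert> (n = 18446744073709551616);
\

text \
  \<^medskip>
  Note that such computations remain exact: there is no implicit truncation
  to machine words.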
\ subsection \Rational numbers\ text %mlref \ \begin{mldecls} @{define_ML_type Rat.rat} \\ \end{mldecls} \<^descr> Type \<^ML_type>\Rat.rat\ represents rational numbers, based on the unbounded integers of Poly/ML. Literal rationals may be written with special antiquotation syntax \<^verbatim>\@\\int\\<^verbatim>\/\\nat\ or \<^verbatim>\@\\int\ (without any white space). For example \<^verbatim>\@~1/4\ or \<^verbatim>\@10\. The ML toplevel pretty printer uses the same format. Standard operations are provided via ad-hoc overloading of \<^verbatim>\+\, \<^verbatim>\-\, \<^verbatim>\*\, \<^verbatim>\/\, etc. \ subsection \Time\ text %mlref \ \begin{mldecls} @{define_ML_type Time.time} \\ @{define_ML seconds: "real -> Time.time"} \\ \end{mldecls} \<^descr> Type \<^ML_type>\Time.time\ represents time abstractly according to the SML97 basis library definition. This is adequate for internal ML operations, but awkward in concrete time specifications. \<^descr> \<^ML>\seconds\~\s\ turns the concrete scalar \s\ (measured in seconds) into an abstract time value. Floating point numbers are easy to use as configuration options in the context (see \secref{sec:config-options}) or system options that are maintained externally. \ subsection \Options\ text %mlref \ \begin{mldecls} @{define_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\ @{define_ML is_some: "'a option -> bool"} \\ @{define_ML is_none: "'a option -> bool"} \\ @{define_ML the: "'a option -> 'a"} \\ @{define_ML these: "'a list option -> 'a list"} \\ @{define_ML the_list: "'a option -> 'a list"} \\ @{define_ML the_default: "'a -> 'a option -> 'a"} \\ \end{mldecls} \ text \ Apart from \<^ML>\Option.map\ most other operations defined in structure \<^ML_structure>\Option\ are alien to Isabelle/ML and never used. The operations shown above are defined in \<^file>\~~/src/Pure/General/basics.ML\. 
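\

text %mlex \
  The following example exercises some of these operations; it is a plain
  illustration of the involved types, not a canonical programming idiom.
\

ML_val \
  \<^assert> (is_some (SOME 1) andalso is_none NONE);
  \<^assert> (the (SOME "a") = "a");
  \<^assert> (the_default 0 NONE = 0);
  \<^assert> (these NONE = [] andalso the_list (SOME 1) = [1]);
\

text \
  \<^medskip>
  Thus routine plumbing of option values works without explicit pattern
  matching on the constructors \<^ML>\SOME\ and \<^ML>\NONE\.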
\ subsection \Lists\ text \ Lists are ubiquitous in ML as simple and light-weight ``collections'' for many everyday programming tasks. Isabelle/ML provides important additions and improvements over operations that are predefined in the SML97 library. \ text %mlref \ \begin{mldecls} @{define_ML cons: "'a -> 'a list -> 'a list"} \\ @{define_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\ @{define_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\ @{define_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\ @{define_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\ \end{mldecls} \<^descr> \<^ML>\cons\~\x xs\ evaluates to \x :: xs\. Tupled infix operators are a historical accident in Standard ML. The curried \<^ML>\cons\ amends this, but it should be only used when partial application is required. \<^descr> \<^ML>\member\, \<^ML>\insert\, \<^ML>\remove\, \<^ML>\update\ treat lists as a set-like container that maintains the order of elements. See \<^file>\~~/src/Pure/library.ML\ for the full specifications (written in ML). There are some further derived operations like \<^ML>\union\ or \<^ML>\inter\. Note that \<^ML>\insert\ is conservative about elements that are already a \<^ML>\member\ of the list, while \<^ML>\update\ ensures that the latest entry is always put in front. The latter discipline is often more appropriate in declarations of context data (\secref{sec:context-data}) that are issued by the user in Isar source: later declarations take precedence over earlier ones. \ text %mlex \ Using canonical \<^ML>\fold\ together with \<^ML>\cons\ (or similar standard operations) alternates the orientation of data. This is quite natural and should not be altered forcibly by inserting extra applications of \<^ML>\rev\. The alternative \<^ML>\fold_rev\ can be used in the few situations where alternation should be prevented.
\ ML_val \ val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; val list1 = fold cons items []; \<^assert> (list1 = rev items); val list2 = fold_rev cons items []; \<^assert> (list2 = items); \ text \ The subsequent example demonstrates how to \<^emph>\merge\ two lists in a natural way. \ ML_val \ fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs; \ text \ Here the first list is treated conservatively: only the new elements from the second list are inserted. The inside-out order of insertion via \<^ML>\fold_rev\ attempts to preserve the order of elements in the result. This way of merging lists is typical for context data (\secref{sec:context-data}). See also \<^ML>\merge\ as defined in \<^file>\~~/src/Pure/library.ML\. \ subsection \Association lists\ text \ The operations for association lists interpret a concrete list of pairs as a finite function from keys to values. Redundant representations with multiple occurrences of the same key are implicitly normalized: lookup and update only take the first occurrence into account. \ text \ \begin{mldecls} @{define_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\ @{define_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\ @{define_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\ \end{mldecls} \<^descr> \<^ML>\AList.lookup\, \<^ML>\AList.defined\, \<^ML>\AList.update\ implement the main ``framework operations'' for mappings in Isabelle/ML, following standard conventions for their names and types. Note that a function called \<^verbatim>\lookup\ is obliged to express its partiality via an explicit option element. There is no choice to raise an exception, without changing the name to something like \the_element\ or \get\. The \defined\ operation is essentially a contraction of \<^ML>\is_some\ and \<^verbatim>\lookup\, but this is sufficiently frequent to justify its independent existence. 
This also gives the implementation some opportunity for peep-hole optimization. Association lists are adequate as simple implementation of finite mappings in many practical situations. A more advanced table structure is defined in \<^file>\~~/src/Pure/General/table.ML\; that version scales easily to thousands or millions of elements. \ subsection \Unsynchronized references\ text %mlref \ \begin{mldecls} - @{define_ML_type "'a Unsynchronized.ref"} \\ + @{define_ML_type 'a "Unsynchronized.ref"} \\ @{define_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\ @{define_ML "!": "'a Unsynchronized.ref -> 'a"} \\ @{define_ML_infix ":=" : "'a Unsynchronized.ref * 'a -> unit"} \\ \end{mldecls} \ text \ Due to ubiquitous parallelism in Isabelle/ML (see also \secref{sec:multi-threading}), the mutable reference cells of Standard ML are notorious for causing problems. In a highly parallel system, both correctness \<^emph>\and\ performance are easily degraded when using mutable data. The unwieldy name of \<^ML>\Unsynchronized.ref\ for the constructor for references in Isabelle/ML emphasizes the inconveniences caused by mutability. Existing operations \<^ML>\!\ and \<^ML_infix>\:=\ are unchanged, but should be used with special precautions, say in a strictly local situation that is guaranteed to be restricted to sequential evaluation --- now and in the future. \begin{warn} Never \<^ML_text>\open Unsynchronized\, not even in a local scope! Pretending that mutable state is no problem is a very bad idea. \end{warn} \ section \Thread-safe programming \label{sec:multi-threading}\ text \ Multi-threaded execution has become an everyday reality in Isabelle since Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides implicit and explicit parallelism by default, and there is no way for user-space tools to ``opt out''. 
ML programs that are purely functional, output messages only via the official channels (\secref{sec:message-channels}), and do not intercept interrupts (\secref{sec:exceptions}) can participate in the multi-threaded environment immediately without further ado. More ambitious tools with more fine-grained interaction with the environment need to observe the principles explained below. \ subsection \Multi-threading with shared memory\ text \ Multiple threads help to organize advanced operations of the system, such as real-time conditions on command transactions, sub-components with explicit communication, general asynchronous interaction etc. Moreover, parallel evaluation is a prerequisite to make adequate use of the CPU resources that are available on multi-core systems.\<^footnote>\Multi-core computing does not mean that there are ``spare cycles'' to be wasted. It means that the continued exponential speedup of CPU performance due to ``Moore's Law'' follows different rules: clock frequency has reached its peak around 2005, and applications need to be parallelized in order to avoid a perceived loss of performance. See also @{cite "Sutter:2005"}.\ Isabelle/Isar exploits the inherent structure of theories and proofs to support \<^emph>\implicit parallelism\ to a large extent. LCF-style theorem proving provides almost ideal conditions for that, see also @{cite "Wenzel:2009"}. This means that significant parts of theory and proof checking are parallelized by default. In Isabelle2013, a maximum speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be expected @{cite "Wenzel:2013:ITP"}. \<^medskip> ML threads lack the memory protection of separate processes, and operate concurrently on shared heap memory. This has the advantage that results of independent computations are directly available to other threads: abstract values can be passed without copying or awkward serialization that is typically required for separate processes.
To make shared-memory multi-threading work robustly and efficiently, some programming guidelines need to be observed. While the ML system is responsible for maintaining basic integrity of the representation of ML values in memory, the application programmer needs to ensure that multi-threaded execution does not break the intended semantics. \begin{warn} To participate in implicit parallelism, tools need to be thread-safe. A single ill-behaved tool can affect the stability and performance of the whole system. \end{warn} Apart from observing the principles of thread-safeness passively, advanced tools may also exploit parallelism actively, e.g.\ by using library functions for parallel list operations (\secref{sec:parlist}). \begin{warn} Parallel computing resources are managed centrally by the Isabelle/ML infrastructure. User programs should not fork their own ML threads to perform heavy computations. \end{warn} \ subsection \Critical shared resources\ text \ Thread-safeness is mainly concerned with concurrent read/write access to shared resources, which are outside the purely functional world of ML. This covers the following in particular. \<^item> Global references (or arrays), i.e.\ mutable memory cells that persist over several invocations of associated operations.\<^footnote>\This is independent of the visibility of such mutable values in the toplevel scope.\ \<^item> Global state of the running Isabelle/ML process, i.e.\ raw I/O channels, environment variables, current working directory. \<^item> Writable resources in the file-system that are shared among different threads or external processes. Isabelle/ML provides various mechanisms to avoid critical shared resources in most situations. As a last resort there are some mechanisms for explicit synchronization. The following guidelines help to make Isabelle/ML programs work smoothly in a concurrent environment. \<^item> Avoid global references altogether.
Isabelle/Isar maintains a uniform context that incorporates arbitrary data declared by user programs (\secref{sec:context-data}). This context is passed as plain value and user tools can get/map their own data in a purely functional manner. Configuration options within the context (\secref{sec:config-options}) provide simple drop-in replacements for historic reference variables. \<^item> Keep components with local state information re-entrant. Instead of poking initial values into (private) global references, a new state record can be created on each invocation, and passed through any auxiliary functions of the component. The state record may contain mutable references in special situations, without requiring any synchronization, as long as each invocation gets its own copy and the tool itself is single-threaded. \<^item> Avoid raw output on \stdout\ or \stderr\. The Poly/ML library is thread-safe for each individual output operation, but the ordering of parallel invocations is arbitrary. This means raw output will appear on some system console with unpredictable interleaving of atomic chunks. Note that this does not affect regular message output channels (\secref{sec:message-channels}). An official message id is associated with the command transaction from where it originates, independently of other transactions. This means each running Isar command has effectively its own set of message channels, and interleaving can only happen when commands use parallelism internally (and only at message boundaries). \<^item> Treat environment variables and the current working directory of the running process as read-only. \<^item> Restrict writing to the file-system to unique temporary files. Isabelle already provides a temporary directory that is unique for the running process, and there is a centralized source of unique serial numbers in Isabelle/ML. Thus temporary files that are passed to some external process will always be disjoint, and thus thread-safe.
\ text %mlref \ \begin{mldecls} @{define_ML File.tmp_path: "Path.T -> Path.T"} \\ @{define_ML serial_string: "unit -> string"} \\ \end{mldecls} \<^descr> \<^ML>\File.tmp_path\~\path\ relocates the base component of \path\ into the unique temporary directory of the running Isabelle/ML process. \<^descr> \<^ML>\serial_string\~\()\ creates a new serial number that is unique over the runtime of the Isabelle/ML process. \ text %mlex \ The following example shows how to create unique temporary file names. \ ML_val \ val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ())); val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ())); \<^assert> (tmp1 <> tmp2); \ subsection \Explicit synchronization\ text \ Isabelle/ML provides explicit synchronization for mutable variables over immutable data, which may be updated atomically and exclusively. This addresses the rare situations where mutable shared resources are really required. Synchronization in Isabelle/ML is based on primitives of Poly/ML, which have been adapted to the specific assumptions of the concurrent Isabelle environment. User code should not break this abstraction, but stay within the confines of concurrent Isabelle/ML. A \<^emph>\synchronized variable\ is an explicit state component associated with mechanisms for locking and signaling. There are operations to await a condition, change the state, and signal the change to all other waiting threads. Synchronized access to the state variable is \<^emph>\not\ re-entrant: direct or indirect nesting within the same thread will cause a deadlock! 
\ text %mlref \ \begin{mldecls} - @{define_ML_type "'a Synchronized.var"} \\ + @{define_ML_type 'a "Synchronized.var"} \\ @{define_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\ @{define_ML Synchronized.guarded_access: "'a Synchronized.var -> ('a -> ('b * 'a) option) -> 'b"} \\ \end{mldecls} \<^descr> Type \<^ML_type>\'a Synchronized.var\ represents synchronized variables with state of type \<^ML_type>\'a\. \<^descr> \<^ML>\Synchronized.var\~\name x\ creates a synchronized variable that is initialized with value \x\. The \name\ is used for tracing. \<^descr> \<^ML>\Synchronized.guarded_access\~\var f\ lets the function \f\ operate within a critical section on the state \x\ as follows: if \f x\ produces \<^ML>\NONE\, it continues to wait on the internal condition variable, expecting that some other thread will eventually change the content in a suitable manner; if \f x\ produces \<^ML>\SOME\~\(y, x')\ it is satisfied and assigns the new state value \x'\, broadcasts a signal to all waiting threads on the associated condition variable, and returns the result \y\. There are some further variants of the \<^ML>\Synchronized.guarded_access\ combinator, see \<^file>\~~/src/Pure/Concurrent/synchronized.ML\ for details. \ text %mlex \ The following example implements a counter that produces positive integers that are unique over the runtime of the Isabelle process: \ ML_val \ local val counter = Synchronized.var "counter" 0; in fun next () = Synchronized.guarded_access counter (fn i => let val j = i + 1 in SOME (j, j) end); end; val a = next (); val b = next (); \<^assert> (a <> b); \ text \ \<^medskip> See \<^file>\~~/src/Pure/Concurrent/mailbox.ML\ how to implement a mailbox as synchronized variable over a purely functional list. \ section \Managed evaluation\ text \ Execution of Standard ML follows the model of strict functional evaluation with optional exceptions. Evaluation happens whenever some function is applied to (sufficiently many) arguments. 
The result is either an explicit value or an implicit exception. \<^emph>\Managed evaluation\ in Isabelle/ML organizes expressions and results to control certain physical side-conditions, to say more specifically when and how evaluation happens. For example, the Isabelle/ML library supports lazy evaluation with memoing, parallel evaluation via futures, asynchronous evaluation via promises, evaluation with time limit etc. \<^medskip> An \<^emph>\unevaluated expression\ is represented either as unit abstraction \<^verbatim>\fn () => a\ of type \<^verbatim>\unit -> 'a\ or as regular function \<^verbatim>\fn a => b\ of type \<^verbatim>\'a -> 'b\. Both forms occur routinely, and special care is required to tell them apart --- the static type-system of SML is only of limited help here. The first form is more intuitive: some combinator \<^verbatim>\(unit -> 'a) -> 'a\ applies the given function to \<^verbatim>\()\ to initiate the postponed evaluation process. The second form is more flexible: some combinator \<^verbatim>\('a -> 'b) -> 'a -> 'b\ acts like a modified form of function application; several such combinators may be cascaded to modify a given function, before it is ultimately applied to some argument. \<^medskip> \<^emph>\Reified results\ make the disjoint sum of regular values versus exceptional situations explicit as ML datatype: \'a result = Res of 'a | Exn of exn\. This is typically used for administrative purposes, to store the overall outcome of an evaluation process. \<^emph>\Parallel exceptions\ aggregate reified results, such that multiple exceptions are digested as a collection in canonical form that identifies exceptions according to their original occurrence.
This is particularly important for parallel evaluation via futures (\secref{sec:futures}), which are organized as an acyclic graph of evaluations that depend on other evaluations: exceptions stemming from shared sub-graphs are exposed exactly once and in the order of their original occurrence (e.g.\ when printed at the toplevel). Interrupt counts as a neutral element here: it is treated as minimal information about some canceled evaluation process, and is absorbed by the presence of regular program exceptions. \ text %mlref \ \begin{mldecls} - @{define_ML_type "'a Exn.result"} \\ + @{define_ML_type 'a "Exn.result"} \\ @{define_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\ @{define_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\ @{define_ML Exn.release: "'a Exn.result -> 'a"} \\ @{define_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\ @{define_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\ \end{mldecls} \<^descr> Type \<^ML_type>\'a Exn.result\ represents the disjoint sum of ML results explicitly, with constructor \<^ML>\Exn.Res\ for regular values and \<^ML>\Exn.Exn\ for exceptions. \<^descr> \<^ML>\Exn.capture\~\f x\ manages the evaluation of \f x\ such that exceptions are made explicit as \<^ML>\Exn.Exn\. Note that this includes physical interrupts (see also \secref{sec:exceptions}), so the same precautions apply to user code: interrupts must not be absorbed accidentally! \<^descr> \<^ML>\Exn.interruptible_capture\ is similar to \<^ML>\Exn.capture\, but interrupts are immediately re-raised as required for user code. \<^descr> \<^ML>\Exn.release\~\result\ releases the original runtime result, exposing its regular value or raising the reified exception. \<^descr> \<^ML>\Par_Exn.release_all\~\results\ combines results that were produced independently (e.g.\ by parallel evaluation). If all results are regular values, that list is returned.
Otherwise, the collection of all exceptions is raised, wrapped up as a collective parallel exception. Note that the latter prevents access to individual exceptions by conventional \<^verbatim>\handle\ of ML. \<^descr> \<^ML>\Par_Exn.release_first\ is similar to \<^ML>\Par_Exn.release_all\, but only the first (meaningful) exception that has occurred in the original evaluation process is raised again; the others are ignored. That single exception may get handled by conventional means in ML. \ subsection \Parallel skeletons \label{sec:parlist}\ text \ Algorithmic skeletons are combinators that operate on lists in parallel, in the manner of well-known \map\, \exists\, \forall\ etc. Management of futures (\secref{sec:futures}) and their results as reified exceptions is wrapped up into simple programming interfaces that resemble the sequential versions. What remains is the application-specific problem of presenting expressions with suitable \<^emph>\granularity\: each list element corresponds to one evaluation task. If the granularity is too coarse, the available CPUs are not saturated. If it is too fine-grained, CPU cycles are wasted due to the overhead of organizing parallel processing. In the worst case, parallel performance will be worse than that of the sequential counterpart! \ text %mlref \ \begin{mldecls} @{define_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\ @{define_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\ \end{mldecls} \<^descr> \<^ML>\Par_List.map\~\f [x\<^sub>1, \, x\<^sub>n]\ is like \<^ML>\map\~\f [x\<^sub>1, \, x\<^sub>n]\, but the evaluation of \f x\<^sub>i\ for \i = 1, \, n\ is performed in parallel. An exception in any \f x\<^sub>i\ cancels the overall evaluation process. The final result is produced via \<^ML>\Par_Exn.release_first\ as explained above, which means the first program exception that happened to occur in the parallel evaluation is propagated, and all other failures are ignored.
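For example, \<^verbatim>\Par_List.map (fn i => i * i) (1 upto 4)\ yields \<^verbatim>\[1, 4, 9, 16]\, with the element-wise evaluations distributed over the worker-thread farm. This minimal sketch is merely illustrative: such tiny tasks are below the granularity where parallelism pays off.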
\<^descr> \<^ML>\Par_List.get_some\~\f [x\<^sub>1, \, x\<^sub>n]\ produces some \f x\<^sub>i\ that is of the form \SOME y\<^sub>i\, if that exists, otherwise \NONE\. Thus it is similar to \<^ML>\Library.get_first\, but subject to a non-deterministic parallel choice process. The first successful result cancels the overall evaluation process; other exceptions are propagated as for \<^ML>\Par_List.map\. This generic parallel choice combinator is the basis for derived forms, such as \<^ML>\Par_List.find_some\, \<^ML>\Par_List.exists\, \<^ML>\Par_List.forall\. \ text %mlex \ Subsequently, the Ackermann function is evaluated in parallel for some ranges of arguments. \ ML_val \ fun ackermann 0 n = n + 1 | ackermann m 0 = ackermann (m - 1) 1 | ackermann m n = ackermann (m - 1) (ackermann m (n - 1)); Par_List.map (ackermann 2) (500 upto 1000); Par_List.map (ackermann 3) (5 upto 10); \ subsection \Lazy evaluation\ text \ Classic lazy evaluation works via the \lazy\~/ \force\ pair of operations: \lazy\ to wrap an unevaluated expression, and \force\ to evaluate it once and store its result persistently. Later invocations of \force\ retrieve the stored result without another evaluation. Isabelle/ML refines this idea to accommodate the aspects of multi-threading, synchronous program exceptions and asynchronous interrupts. The first thread that invokes \force\ on an unfinished lazy value changes its state into a \<^emph>\promise\ of the eventual result and starts evaluating it. Any other threads that \force\ the same lazy value in the meantime need to wait for it to finish, by producing a regular result or program exception. If the evaluation attempt is interrupted, this event is propagated to all waiting threads and the lazy value is reset to its original state. This means a lazy value is completely evaluated at most once, in a thread-safe manner. There might be multiple interrupted evaluation attempts, and multiple receivers of intermediate interrupt events. 
Interrupts are \<^emph>\not\ made persistent: later evaluation attempts start again from the original expression. \ text %mlref \ \begin{mldecls} - @{define_ML_type "'a lazy"} \\ + @{define_ML_type 'a "lazy"} \\ @{define_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\ @{define_ML Lazy.value: "'a -> 'a lazy"} \\ @{define_ML Lazy.force: "'a lazy -> 'a"} \\ \end{mldecls} \<^descr> Type \<^ML_type>\'a lazy\ represents lazy values over type \<^verbatim>\'a\. \<^descr> \<^ML>\Lazy.lazy\~\(fn () => e)\ wraps the unevaluated expression \e\ as unfinished lazy value. \<^descr> \<^ML>\Lazy.value\~\a\ wraps the value \a\ as finished lazy value. When forced, it returns \a\ without any further evaluation. There is very low overhead for this proforma wrapping of strict values as lazy values. \<^descr> \<^ML>\Lazy.force\~\x\ produces the result of the lazy value in a thread-safe manner as explained above. Thus it may cause the current thread to wait on a pending evaluation attempt by another thread. \ subsection \Futures \label{sec:futures}\ text \ Futures help to organize parallel execution in a value-oriented manner, with \fork\~/ \join\ as the main pair of operations, and some further variants; see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy values, futures are evaluated strictly and spontaneously on separate worker threads. Futures may be canceled, which leads to interrupts on running evaluation attempts, and forces structurally related futures to fail for all time; already finished futures remain unchanged. Exceptions between related futures are propagated as well, and turned into parallel exceptions (see above). Technically, a future is a single-assignment variable together with a \<^emph>\task\ that serves administrative purposes, notably within the \<^emph>\task queue\ where new futures are registered for eventual evaluation and the worker threads retrieve their work. 
The pool of worker threads is limited, in correlation with the number of physical cores on the machine. Note that allocation of runtime resources may be distorted either if workers yield CPU time (e.g.\ via system sleep or wait operations), or if non-worker threads contend for significant runtime resources independently. There is a limited number of replacement worker threads that get activated in certain explicit wait conditions, after a timeout. \<^medskip> Each future task belongs to some \<^emph>\task group\, which represents the hierarchic structure of related tasks, together with the exception status at that point. By default, the task group of a newly created future is a new sub-group of the presently running one, but it is also possible to indicate different group layouts under program control. Cancellation of futures actually refers to the corresponding task group and all its sub-groups. Thus interrupts are propagated down the group hierarchy. Regular program exceptions are treated likewise: failure of the evaluation of some future task affects its own group and all sub-groups. Given a particular task group, its \<^emph>\group status\ cumulates all relevant exceptions according to its position within the group hierarchy. Interrupted tasks that lack regular result information will pick up parallel exceptions from the cumulative group status. \<^medskip> A \<^emph>\passive future\ or \<^emph>\promise\ is a future with slightly different evaluation policies: there is only a single-assignment variable and some expression to evaluate for the \<^emph>\failed\ case (e.g.\ to clean up resources when canceled). A regular result is produced by external means, using a separate \<^emph>\fulfill\ operation. Promises are managed in the same task queue, so regular futures may depend on them.
This allows a form of reactive programming, where some promises are used as minimal elements (or guards) within the future dependency graph: when these promises are fulfilled the evaluation of subsequent futures starts spontaneously, according to their own inter-dependencies. \ text %mlref \ \begin{mldecls} - @{define_ML_type "'a future"} \\ + @{define_ML_type 'a "future"} \\ @{define_ML Future.fork: "(unit -> 'a) -> 'a future"} \\ @{define_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\ @{define_ML Future.join: "'a future -> 'a"} \\ @{define_ML Future.joins: "'a future list -> 'a list"} \\ @{define_ML Future.value: "'a -> 'a future"} \\ @{define_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\ @{define_ML Future.cancel: "'a future -> unit"} \\ @{define_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex] @{define_ML Future.promise: "(unit -> unit) -> 'a future"} \\ @{define_ML Future.fulfill: "'a future -> 'a -> unit"} \\ \end{mldecls} \<^descr> Type \<^ML_type>\'a future\ represents future values over type \<^verbatim>\'a\. \<^descr> \<^ML>\Future.fork\~\(fn () => e)\ registers the unevaluated expression \e\ as unfinished future value, to be evaluated eventually on the parallel worker-thread farm. This is a shorthand for \<^ML>\Future.forks\ below, with default parameters and a single expression. \<^descr> \<^ML>\Future.forks\~\params exprs\ is the general interface to fork several futures simultaneously. The \params\ consist of the following fields: \<^item> \name : string\ (default \<^ML>\""\) specifies a common name for the tasks of the forked futures, which serves diagnostic purposes. \<^item> \group : Future.group option\ (default \<^ML>\NONE\) specifies an optional task group for the forked futures. \<^ML>\NONE\ means that a new sub-group of the current worker-thread task context is created. If this is not a worker thread, the group will be a new root in the group hierarchy. 
\<^item> \deps : Future.task list\ (default \<^ML>\[]\) specifies dependencies on other future tasks, i.e.\ the adjacency relation in the global task queue. Dependencies on already finished tasks are ignored. \<^item> \pri : int\ (default \<^ML>\0\) specifies a priority within the task queue. Typically there is only little deviation from the default priority \<^ML>\0\. As a rule of thumb, \<^ML>\~1\ means ``low priority'' and \<^ML>\1\ means ``high priority''. Note that the task priority only affects the position in the queue, not the thread priority. When a worker thread picks up a task for processing, it runs with the normal thread priority to the end (or until canceled). Higher priority tasks that are queued later need to wait until this (or another) worker thread becomes free again. \<^item> \interrupts : bool\ (default \<^ML>\true\) tells whether the worker thread that processes the corresponding task is initially put into interruptible state. This state may change again while running, by modifying the thread attributes. With interrupts disabled, a running future task cannot be canceled. It is the responsibility of the programmer that this special state is retained only briefly. \<^descr> \<^ML>\Future.join\~\x\ retrieves the value of an already finished future, which may lead to an exception, according to the result of its previous evaluation. For an unfinished future there are several cases depending on the role of the current thread and the status of the future. A non-worker thread waits passively until the future is eventually evaluated. A worker thread temporarily changes its task context and takes over the responsibility to evaluate the future expression on the spot. The latter is done in a thread-safe manner: other threads that intend to join the same future need to wait until the ongoing evaluation is finished.
Note that excessive use of dynamic dependencies of futures by adhoc joining may lead to bad utilization of CPU cores, due to threads waiting on other threads to finish required futures. The future task farm has a limited amount of replacement threads that continue working on unrelated tasks after some timeout. Whenever possible, static dependencies of futures should be specified explicitly when forked (see \deps\ above). Thus the evaluation can work from the bottom up, without join conflicts and wait states. \<^descr> \<^ML>\Future.joins\~\xs\ joins the given list of futures simultaneously, which is more efficient than \<^ML>\map Future.join\~\xs\. Based on the dependency graph of tasks, the current thread takes over the responsibility to evaluate future expressions that are required for the main result, working from the bottom up. Waiting on future results that are presently evaluated on other threads only happens as last resort, when no other unfinished futures are left over. \<^descr> \<^ML>\Future.value\~\a\ wraps the value \a\ as finished future value, bypassing the worker-thread farm. When joined, it returns \a\ without any further evaluation. There is very low overhead for this proforma wrapping of strict values as futures. \<^descr> \<^ML>\Future.map\~\f x\ is a fast-path implementation of \<^ML>\Future.fork\~\(fn () => f (\\<^ML>\Future.join\~\x))\, which avoids the full overhead of the task queue and worker-thread farm as far as possible. The function \f\ is supposed to be some trivial post-processing or projection of the future result. \<^descr> \<^ML>\Future.cancel\~\x\ cancels the task group of the given future, using \<^ML>\Future.cancel_group\ below. \<^descr> \<^ML>\Future.cancel_group\~\group\ cancels all tasks of the given task group for all time. Threads that are presently processing a task of the given group are interrupted: it may take some time until they are actually terminated. 
Tasks that are queued but not yet processed are dequeued and forced into interrupted state. Since the task group is itself invalidated, any further attempt to fork a future that belongs to it will yield a canceled result as well. \<^descr> \<^ML>\Future.promise\~\abort\ registers a passive future with the given \abort\ operation: it is invoked when the future task group is canceled. \<^descr> \<^ML>\Future.fulfill\~\x a\ finishes the passive future \x\ by the given value \a\. If the promise has already been canceled, the attempt to fulfill it causes an exception. \ end diff --git a/src/Doc/Isar_Ref/Document_Preparation.thy b/src/Doc/Isar_Ref/Document_Preparation.thy --- a/src/Doc/Isar_Ref/Document_Preparation.thy +++ b/src/Doc/Isar_Ref/Document_Preparation.thy @@ -1,718 +1,724 @@ (*:maxLineLen=78:*) theory Document_Preparation imports Main Base begin chapter \Document preparation \label{ch:document-prep}\ text \ Isabelle/Isar provides a simple document preparation system based on {PDF-\LaTeX}, with support for hyperlinks and bookmarks within that format. This makes it possible to produce papers, books, theses etc.\ from Isabelle theory sources. {\LaTeX} output is generated while processing a \<^emph>\session\ in batch mode, as explained in \<^emph>\The Isabelle System Manual\ @{cite "isabelle-system"}. The main Isabelle tools to get started with document preparation are @{tool_ref mkroot} and @{tool_ref build}. The classic Isabelle/HOL tutorial @{cite "isabelle-hol-book"} also explains some aspects of theory presentation.
\ section \Markup commands \label{sec:markup}\ text \ \begin{matharray}{rcl} @{command_def "chapter"} & : & \any \ any\ \\ @{command_def "section"} & : & \any \ any\ \\ @{command_def "subsection"} & : & \any \ any\ \\ @{command_def "subsubsection"} & : & \any \ any\ \\ @{command_def "paragraph"} & : & \any \ any\ \\ @{command_def "subparagraph"} & : & \any \ any\ \\ @{command_def "text"} & : & \any \ any\ \\ @{command_def "txt"} & : & \any \ any\ \\ @{command_def "text_raw"} & : & \any \ any\ \\ \end{matharray} Markup commands provide a structured way to insert text into the document generated from a theory. Each markup command takes a single @{syntax text} argument, which is passed as argument to a corresponding {\LaTeX} macro. The default macros provided by \<^file>\~~/lib/texinputs/isabelle.sty\ can be redefined according to the needs of the underlying document and {\LaTeX} styles. Note that formal comments (\secref{sec:comments}) are similar to markup commands, but have a different status within Isabelle/Isar syntax. \<^rail>\ (@@{command chapter} | @@{command section} | @@{command subsection} | @@{command subsubsection} | @@{command paragraph} | @@{command subparagraph}) @{syntax text} ';'? | (@@{command text} | @@{command txt} | @@{command text_raw}) @{syntax text} \ \<^descr> @{command chapter}, @{command section}, @{command subsection} etc.\ mark section headings within the theory source. This works in any context, even before the initial @{command theory} command. The corresponding {\LaTeX} macros are \<^verbatim>\\isamarkupchapter\, \<^verbatim>\\isamarkupsection\, \<^verbatim>\\isamarkupsubsection\ etc.\ \<^descr> @{command text} and @{command txt} specify paragraphs of plain text. This corresponds to a {\LaTeX} environment \<^verbatim>\\begin{isamarkuptext}\ \\\ \<^verbatim>\\end{isamarkuptext}\ etc. \<^descr> @{command text_raw} is similar to @{command text}, but without any surrounding markup environment. 
This makes it possible to inject arbitrary {\LaTeX} source into the generated document. All text passed to any of the above markup commands may refer to formal entities via \<^emph>\document antiquotations\; see also \secref{sec:antiq}. These are interpreted in the present theory or proof context. \<^medskip> The proof markup commands closely resemble those for theory specifications, but have a different formal status and produce different {\LaTeX} macros. \ section \Document antiquotations \label{sec:antiq}\ text \ \begin{matharray}{rcl} @{antiquotation_def "theory"} & : & \antiquotation\ \\ @{antiquotation_def "thm"} & : & \antiquotation\ \\ @{antiquotation_def "lemma"} & : & \antiquotation\ \\ @{antiquotation_def "prop"} & : & \antiquotation\ \\ @{antiquotation_def "term"} & : & \antiquotation\ \\ @{antiquotation_def term_type} & : & \antiquotation\ \\ @{antiquotation_def typeof} & : & \antiquotation\ \\ @{antiquotation_def const} & : & \antiquotation\ \\ @{antiquotation_def abbrev} & : & \antiquotation\ \\ @{antiquotation_def typ} & : & \antiquotation\ \\ @{antiquotation_def type} & : & \antiquotation\ \\ @{antiquotation_def class} & : & \antiquotation\ \\ @{antiquotation_def locale} & : & \antiquotation\ \\ @{antiquotation_def "text"} & : & \antiquotation\ \\ @{antiquotation_def goals} & : & \antiquotation\ \\ @{antiquotation_def subgoals} & : & \antiquotation\ \\ @{antiquotation_def prf} & : & \antiquotation\ \\ @{antiquotation_def full_prf} & : & \antiquotation\ \\ @{antiquotation_def ML_text} & : & \antiquotation\ \\ @{antiquotation_def ML} & : & \antiquotation\ \\ @{antiquotation_def ML_def} & : & \antiquotation\ \\ @{antiquotation_def ML_ref} & : & \antiquotation\ \\ @{antiquotation_def ML_infix} & : & \antiquotation\ \\ @{antiquotation_def ML_infix_def} & : & \antiquotation\ \\ @{antiquotation_def ML_infix_ref} & : & \antiquotation\ \\ @{antiquotation_def ML_type} & : & \antiquotation\ \\ @{antiquotation_def ML_type_def} & : & \antiquotation\ \\ @{antiquotation_def
ML_type_ref} & : & \antiquotation\ \\ @{antiquotation_def ML_structure} & : & \antiquotation\ \\ @{antiquotation_def ML_structure_def} & : & \antiquotation\ \\ @{antiquotation_def ML_structure_ref} & : & \antiquotation\ \\ @{antiquotation_def ML_functor} & : & \antiquotation\ \\ @{antiquotation_def ML_functor_def} & : & \antiquotation\ \\ @{antiquotation_def ML_functor_ref} & : & \antiquotation\ \\ @{antiquotation_def emph} & : & \antiquotation\ \\ @{antiquotation_def bold} & : & \antiquotation\ \\ @{antiquotation_def verbatim} & : & \antiquotation\ \\ @{antiquotation_def bash_function} & : & \antiquotation\ \\ @{antiquotation_def system_option} & : & \antiquotation\ \\ @{antiquotation_def session} & : & \antiquotation\ \\ @{antiquotation_def "file"} & : & \antiquotation\ \\ @{antiquotation_def "url"} & : & \antiquotation\ \\ @{antiquotation_def "cite"} & : & \antiquotation\ \\ @{command_def "print_antiquotations"}\\<^sup>*\ & : & \context \\ \\ \end{matharray} The overall content of an Isabelle/Isar theory may alternate between formal and informal text. The main body consists of formal specification and proof commands, interspersed with markup commands (\secref{sec:markup}) or document comments (\secref{sec:comments}). The argument of markup commands quotes informal text to be printed in the resulting document, but may again refer to formal entities via \<^emph>\document antiquotations\. For example, embedding \<^verbatim>\@{term [show_types] "f x = a + x"}\ within a text block makes \isa{{\isacharparenleft}f{\isasymColon}{\isacharprime}a\ {\isasymRightarrow}\ {\isacharprime}a{\isacharparenright}\ {\isacharparenleft}x{\isasymColon}{\isacharprime}a{\isacharparenright}\ {\isacharequal}\ {\isacharparenleft}a{\isasymColon}{\isacharprime}a{\isacharparenright}\ {\isacharplus}\ x} appear in the final {\LaTeX} document. Antiquotations usually spare the author tedious typing of logical entities in full detail. 
Even more importantly, some degree of consistency-checking between the main body of formal text and its informal explanation is achieved, since terms and types appearing in antiquotations are checked within the current theory or proof context. \<^medskip> Antiquotations are in general written as \<^verbatim>\@{\\name\~\<^verbatim>\[\\options\\<^verbatim>\]\~\arguments\\<^verbatim>\}\. The short form \<^verbatim>\\\\<^verbatim>\<^\\name\\<^verbatim>\>\\\argument_content\\ (without surrounding \<^verbatim>\@{\\\\\<^verbatim>\}\) works for a single argument that is a cartouche. A cartouche without special decoration is equivalent to \<^verbatim>\\<^cartouche>\\\argument_content\\, which is equivalent to \<^verbatim>\@{cartouche\~\\argument_content\\\<^verbatim>\}\. The special name @{antiquotation_def cartouche} is defined in the context: Isabelle/Pure introduces that as an alias to @{antiquotation_ref text} (see below). Consequently, \\foo_bar + baz \ bazar\\ prints literal quasi-formal text (unchecked). A control symbol \<^verbatim>\\\\<^verbatim>\<^\\name\\<^verbatim>\>\ within the body text, but without a subsequent cartouche, is equivalent to \<^verbatim>\@{\\name\\<^verbatim>\}\. \begingroup \def\isasymcontrolstart{\isatt{\isacharbackslash\isacharless\isacharcircum}} \<^rail>\ @{syntax_def antiquotation}: '@{' antiquotation_body '}' | '\' @{syntax_ref name} '>' @{syntax_ref cartouche} | @{syntax_ref cartouche} ; options: '[' (option * ',') ']' ; option: @{syntax name} | @{syntax name} '=' @{syntax name} ; \ \endgroup Note that the syntax of antiquotations may \<^emph>\not\ include source comments \<^verbatim>\(*\~\\\~\<^verbatim>\*)\ nor verbatim text \<^verbatim>\{*\~\\\~\<^verbatim>\*}\. %% FIXME less monolithic presentation, move to individual sections!? 
\<^rail>\ @{syntax_def antiquotation_body}: (@@{antiquotation text} | @@{antiquotation cartouche} | @@{antiquotation theory_text}) options @{syntax text} | @@{antiquotation theory} options @{syntax embedded} | @@{antiquotation thm} options styles @{syntax thms} | @@{antiquotation lemma} options @{syntax prop} @'by' @{syntax method} @{syntax method}? | @@{antiquotation prop} options styles @{syntax prop} | @@{antiquotation term} options styles @{syntax term} | @@{antiquotation (HOL) value} options styles @{syntax term} | @@{antiquotation term_type} options styles @{syntax term} | @@{antiquotation typeof} options styles @{syntax term} | @@{antiquotation const} options @{syntax term} | @@{antiquotation abbrev} options @{syntax term} | @@{antiquotation typ} options @{syntax type} | @@{antiquotation type} options @{syntax embedded} | @@{antiquotation class} options @{syntax embedded} | @@{antiquotation locale} options @{syntax embedded} | (@@{antiquotation command} | @@{antiquotation method} | @@{antiquotation attribute}) options @{syntax name} ; @{syntax antiquotation}: @@{antiquotation goals} options | @@{antiquotation subgoals} options | @@{antiquotation prf} options @{syntax thms} | @@{antiquotation full_prf} options @{syntax thms} | @@{antiquotation ML_text} options @{syntax text} | @@{antiquotation ML} options @{syntax text} | @@{antiquotation ML_infix} options @{syntax text} | - @@{antiquotation ML_type} options @{syntax text} | + @@{antiquotation ML_type} options @{syntax typeargs} @{syntax text} | @@{antiquotation ML_structure} options @{syntax text} | @@{antiquotation ML_functor} options @{syntax text} | @@{antiquotation emph} options @{syntax text} | @@{antiquotation bold} options @{syntax text} | @@{antiquotation verbatim} options @{syntax text} | @@{antiquotation bash_function} options @{syntax embedded} | @@{antiquotation system_option} options @{syntax embedded} | @@{antiquotation session} options @{syntax embedded} | @@{antiquotation path} options 
@{syntax embedded} | @@{antiquotation "file"} options @{syntax name} | @@{antiquotation dir} options @{syntax name} | @@{antiquotation url} options @{syntax embedded} | @@{antiquotation cite} options @{syntax cartouche}? (@{syntax name} + @'and') ; styles: '(' (style + ',') ')' ; style: (@{syntax name} +) ; @@{command print_antiquotations} ('!'?) \ \<^descr> \@{text s}\ prints uninterpreted source text \s\, i.e.\ inner syntax. This is particularly useful to print portions of text according to the Isabelle document style, without demanding well-formedness, e.g.\ small pieces of terms that should not be parsed or type-checked yet. It is also possible to write this in the short form \\s\\ without any further decoration. \<^descr> \@{theory_text s}\ prints uninterpreted theory source text \s\, i.e.\ outer syntax with command keywords and other tokens. \<^descr> \@{theory A}\ prints the session-qualified theory name \A\, which is guaranteed to refer to a valid ancestor theory in the current context. \<^descr> \@{thm a\<^sub>1 \ a\<^sub>n}\ prints theorems \a\<^sub>1 \ a\<^sub>n\. Full fact expressions are allowed here, including attributes (\secref{sec:syn-att}). \<^descr> \@{prop \}\ prints a well-typed proposition \\\. \<^descr> \@{lemma \ by m}\ proves a well-typed proposition \\\ by method \m\ and prints the original \\\. \<^descr> \@{term t}\ prints a well-typed term \t\. \<^descr> \@{value t}\ evaluates a term \t\ and prints its result, see also @{command_ref (HOL) value}. \<^descr> \@{term_type t}\ prints a well-typed term \t\ annotated with its type. \<^descr> \@{typeof t}\ prints the type of a well-typed term \t\. \<^descr> \@{const c}\ prints a logical or syntactic constant \c\. \<^descr> \@{abbrev c x\<^sub>1 \ x\<^sub>n}\ prints a constant abbreviation \c x\<^sub>1 \ x\<^sub>n \ rhs\ as defined in the current context. \<^descr> \@{typ \}\ prints a well-formed type \\\. \<^descr> \@{type \}\ prints a (logical or syntactic) type constructor \\\. 
\<^descr> \@{class c}\ prints a class \c\. \<^descr> \@{locale c}\ prints a locale \c\. \<^descr> \@{command name}\, \@{method name}\, \@{attribute name}\ print checked entities of the Isar language. \<^descr> \@{goals}\ prints the current \<^emph>\dynamic\ goal state. This is mainly for support of tactic-emulation scripts within Isar. Presentation of goal states does not conform to the idea of human-readable proof documents! When explaining proofs in detail it is usually better to spell out the reasoning via proper Isar proof commands, instead of peeking at the internal machine configuration. \<^descr> \@{subgoals}\ is similar to \@{goals}\, but does not print the main goal. \<^descr> \@{prf a\<^sub>1 \ a\<^sub>n}\ prints the (compact) proof terms corresponding to the theorems \a\<^sub>1 \ a\<^sub>n\. Note that this requires proof terms to be switched on for the current logic session. \<^descr> \@{full_prf a\<^sub>1 \ a\<^sub>n}\ is like \@{prf a\<^sub>1 \ a\<^sub>n}\, but prints the full proof terms, i.e.\ also displays information omitted in the compact proof term, which is denoted by ``\_\'' placeholders there. \<^descr> \@{ML_text s}\ prints ML text verbatim: only the token language is checked. \<^descr> \@{ML s}\, \@{ML_infix s}\, \@{ML_type s}\, \@{ML_structure s}\, and \@{ML_functor s}\ check text \s\ as ML value, infix operator, type, structure, and functor, respectively. The source is printed verbatim. The variants \@{ML_def s}\ and \@{ML_ref s}\ etc. maintain the document index: ``def'' means to make a bold entry, ``ref'' means to make a regular entry. + There are two forms for type constructors, with or without separate type + arguments: this impacts only the index entry. For example, \@{ML_type_ref + \'a list\}\ makes an entry literally for ``\'a list\'' (sorted under the + letter ``a''), but \@{ML_type_ref 'a \list\}\ makes an entry for the + constructor name ``\list\''.
+ \<^descr> \@{emph s}\ prints document source recursively, with {\LaTeX} markup \<^verbatim>\\emph{\\\\\<^verbatim>\}\. \<^descr> \@{bold s}\ prints document source recursively, with {\LaTeX} markup \<^verbatim>\\textbf{\\\\\<^verbatim>\}\. \<^descr> \@{verbatim s}\ prints uninterpreted source text literally as ASCII characters, using some type-writer font style. \<^descr> \@{bash_function name}\ prints the given GNU bash function verbatim. The name is checked wrt.\ the Isabelle system environment @{cite "isabelle-system"}. \<^descr> \@{system_option name}\ prints the given system option verbatim. The name is checked wrt.\ cumulative \<^verbatim>\etc/options\ of all Isabelle components, notably \<^file>\~~/etc/options\. \<^descr> \@{session name}\ prints given session name verbatim. The name is checked wrt.\ the dependencies of the current session. \<^descr> \@{path name}\ prints the file-system path name verbatim. \<^descr> \@{file name}\ is like \@{path name}\, but ensures that \name\ refers to a plain file. \<^descr> \@{dir name}\ is like \@{path name}\, but ensures that \name\ refers to a directory. \<^descr> \@{url name}\ produces markup for the given URL, which results in an active hyperlink within the text. \<^descr> \@{cite name}\ produces a citation \<^verbatim>\\cite{name}\ in {\LaTeX}, where the name refers to some Bib{\TeX} database entry. This is only checked in batch-mode session builds. The variant \@{cite \opt\ name}\ produces \<^verbatim>\\cite[opt]{name}\ with some free-form optional argument. Multiple names are output with commas, e.g. \@{cite foo \ bar}\ becomes \<^verbatim>\\cite{foo,bar}\. The {\LaTeX} macro name is determined by the antiquotation option @{antiquotation_option_def cite_macro}, or the configuration option @{attribute cite_macro} in the context. For example, \@{cite [cite_macro = nocite] foobar}\ produces \<^verbatim>\\nocite{foobar}\. 
\<^descr> @{command "print_antiquotations"} prints all document antiquotations that are defined in the current context; the ``\!\'' option indicates extra verbosity. \ subsection \Styled antiquotations\ text \ The antiquotations \thm\, \prop\ and \term\ admit an extra \<^emph>\style\ specification to modify the printed result. A style is specified by a name with a possibly empty list of arguments; multiple styles can be sequenced with commas. The following standard styles are available: \<^descr> \lhs\ extracts the first argument of any application form with at least two arguments --- typically meta-level or object-level equality, or any other binary relation. \<^descr> \rhs\ is like \lhs\, but extracts the second argument. \<^descr> \concl\ extracts the conclusion \C\ from a rule in Horn-clause normal form \A\<^sub>1 \ \ A\<^sub>n \ C\. \<^descr> \prem\ \n\ extracts premise number \n\ from a rule in Horn-clause normal form \A\<^sub>1 \ \ A\<^sub>n \ C\. \ subsection \General options\ text \ The following options are available to tune the printed output of antiquotations. Note that many of these coincide with system and configuration options of the same names. \<^descr> @{antiquotation_option_def show_types}~\= bool\ and @{antiquotation_option_def show_sorts}~\= bool\ control printing of explicit type and sort constraints. \<^descr> @{antiquotation_option_def show_structs}~\= bool\ controls printing of implicit structures. \<^descr> @{antiquotation_option_def show_abbrevs}~\= bool\ controls folding of abbreviations. \<^descr> @{antiquotation_option_def names_long}~\= bool\ forces names of types and constants etc.\ to be printed in their fully qualified internal form. \<^descr> @{antiquotation_option_def names_short}~\= bool\ forces names of types and constants etc.\ to be printed unqualified. Note that internalizing the output again in the current context may well yield a different result.
\<^descr> @{antiquotation_option_def names_unique}~\= bool\ determines whether the printed version of qualified names should be made sufficiently long to avoid overlap with names declared further back. Set to \false\ for more concise output. \<^descr> @{antiquotation_option_def eta_contract}~\= bool\ prints terms in \\\-contracted form. \<^descr> @{antiquotation_option_def display}~\= bool\ indicates if the text is to be output as multi-line ``display material'', rather than a small piece of text without line breaks (which is the default). In this mode the embedded entities are printed in the same style as the main theory text. \<^descr> @{antiquotation_option_def break}~\= bool\ controls line breaks in non-display material. \<^descr> @{antiquotation_option_def cartouche}~\= bool\ indicates if the output should be delimited as cartouche. \<^descr> @{antiquotation_option_def quotes}~\= bool\ indicates if the output should be delimited via double quotes (option @{antiquotation_option cartouche} takes precedence). Note that the Isabelle {\LaTeX} styles may suppress quotes on their own account. \<^descr> @{antiquotation_option_def mode}~\= name\ adds \name\ to the print mode to be used for presentation. Note that the standard setup for {\LaTeX} output is already present by default, with mode ``\latex\''. \<^descr> @{antiquotation_option_def margin}~\= nat\ and @{antiquotation_option_def indent}~\= nat\ change the margin or indentation for pretty printing of display material. \<^descr> @{antiquotation_option_def goals_limit}~\= nat\ determines the maximum number of subgoals to be printed (for goal-based antiquotation). \<^descr> @{antiquotation_option_def source}~\= bool\ prints the original source text of the antiquotation arguments, rather than its internal representation. Note that formal checking of @{antiquotation "thm"}, @{antiquotation "term"}, etc. is still enabled; use the @{antiquotation "text"} antiquotation for unchecked output. 
Regular \term\ and \typ\ antiquotations with \source = false\ involve a full round-trip from the original source to an internalized logical entity back to a source form, according to the syntax of the current context. Thus the printed output is not under direct control of the author; it may even fluctuate a bit as the underlying theory is changed later on. In contrast, @{antiquotation_option source}~\= true\ admits direct printing of the given source text, with the desirable well-formedness check in the background, but without modification of the printed text. Cartouche delimiters of the argument are stripped for antiquotations that are internally categorized as ``embedded''. \<^descr> @{antiquotation_option_def source_cartouche} is like @{antiquotation_option source}, but cartouche delimiters are always preserved in the output. For Boolean flags, ``\name = true\'' may be abbreviated as ``\name\''. All of the above flags are disabled by default, unless changed specifically for a logic session in the corresponding \<^verbatim>\ROOT\ file. \ section \Markdown-like text structure\ text \ The markup commands @{command_ref text}, @{command_ref txt}, @{command_ref text_raw} (\secref{sec:markup}) consist of plain text. Its internal structure consists of paragraphs and (nested) lists, using special Isabelle symbols and some rules for indentation and blank lines. This quasi-visual format resembles \<^emph>\Markdown\\<^footnote>\\<^url>\http://commonmark.org\\, but the full complexity of that notation is avoided. This is a summary of the main principles of minimal Markdown in Isabelle: \<^item> List items start with the following markers \<^descr>[itemize:] \<^verbatim>\\<^item>\ \<^descr>[enumerate:] \<^verbatim>\\<^enum>\ \<^descr>[description:] \<^verbatim>\\<^descr>\ \<^item> Adjacent list items with same indentation and same marker are grouped into a single list. \<^item> Singleton blank lines separate paragraphs.
\<^item> Multiple blank lines escape from the current list hierarchy. Notable differences to official Markdown: \<^item> Indentation of list items needs to match exactly. \<^item> Indentation is unlimited (official Markdown interprets four spaces as block quote). \<^item> List items always consist of paragraphs --- there is no notion of ``tight'' list. \<^item> Section headings are expressed via Isar document markup commands (\secref{sec:markup}). \<^item> URLs, font styles, other special content is expressed via antiquotations (\secref{sec:antiq}), usually with proper nesting of sub-languages via text cartouches. \ section \Document markers and command tags \label{sec:document-markers}\ text \ \emph{Document markers} are formal comments of the form \\<^marker>\marker_body\\ (using the control symbol \<^verbatim>\\<^marker>\) and may occur anywhere within the outer syntax of a command: the inner syntax of a marker body resembles that for attributes (\secref{sec:syn-att}). In contrast, \emph{Command tags} may only occur after a command keyword and are treated as special markers as explained below. \<^rail>\ @{syntax_def marker}: '\<^marker>' @{syntax cartouche} ; @{syntax_def marker_body}: (@{syntax name} @{syntax args} * ',') ; @{syntax_def tags}: tag* ; tag: '%' (@{syntax short_ident} | @{syntax string}) \ Document markers are stripped from the document output, but surrounding white-space is preserved: e.g.\ a marker at the end of a line does not affect the subsequent line break. Markers operate within the semantic presentation context of a command, and may modify it to change the overall appearance of a command span (e.g.\ by adding tags). 
Each document marker has its own syntax defined in the theory context; the following markers are predefined in Isabelle/Pure: \<^rail>\ (@@{document_marker_def title} | @@{document_marker_def creator} | @@{document_marker_def contributor} | @@{document_marker_def date} | @@{document_marker_def license} | @@{document_marker_def description}) @{syntax embedded} ; @@{document_marker_def tag} (scope?) @{syntax name} ; scope: '(' ('proof' | 'command') ')' \ \<^descr> \\<^marker>\title arg\\, \\<^marker>\creator arg\\, \\<^marker>\contributor arg\\, \\<^marker>\date arg\\, \\<^marker>\license arg\\, and \\<^marker>\description arg\\ produce markup in the PIDE document, without any immediate effect on typesetting. This vocabulary is taken from the Dublin Core Metadata Initiative\<^footnote>\\<^url>\https://www.dublincore.org/specifications/dublin-core/dcmi-terms\\. The argument is an uninterpreted string, except for @{document_marker description}, which consists of words that are subject to spell-checking. \<^descr> \\<^marker>\tag name\\ updates the list of command tags in the presentation context: later declarations take precedence, so \\<^marker>\tag a, tag b, tag c\\ produces a reversed list. The default tags are given by the original \<^theory_text>\keywords\ declaration of a command, and the system option @{system_option_ref document_tags}. The optional \scope\ tells how far the tagging is applied to subsequent proof structure: ``\<^theory_text>\("proof")\'' means it applies to the following proof text, and ``\<^theory_text>\(command)\'' means it only applies to the current command. The default within a proof body is ``\<^theory_text>\("proof")\'', but for toplevel goal statements it is ``\<^theory_text>\(command)\''. Thus a \tag\ marker for \<^theory_text>\theorem\, \<^theory_text>\lemma\ etc. does \emph{not} affect its proof by default. 
An old-style command tag \<^verbatim>\%\\name\ is treated like a document marker \\<^marker>\tag (proof) name\\: the list of command tags precedes the list of document markers. The head of the resulting tags in the presentation context is turned into {\LaTeX} environments to modify the type-setting. The following tags are pre-declared for certain classes of commands, and serve as their default markup: \<^medskip> \begin{tabular}{ll} \document\ & document markup commands \\ \theory\ & theory begin/end \\ \proof\ & all proof commands \\ \ML\ & all commands involving ML code \\ \end{tabular} \<^medskip> The Isabelle document preparation system @{cite "isabelle-system"} allows tagged command regions to be presented specifically, e.g.\ to fold proof texts, or drop parts of the text completely. For example, ``\<^theory_text>\by auto\~\\<^marker>\tag invisible\\'' causes that piece of proof to be treated as \invisible\ instead of \proof\ (the default), which may be shown or hidden depending on the document setup. In contrast, ``\<^theory_text>\by auto\~\\<^marker>\tag visible\\'' forces this text to be shown invariably. Explicit tag specifications within a proof apply to all subsequent commands of the same level of nesting. For example, ``\<^theory_text>\proof\~\\<^marker>\tag invisible\ \\~\<^theory_text>\qed\'' forces the whole sub-proof to be typeset as \invisible\ (unless some of its parts are tagged differently). \<^medskip> Command tags merely produce certain markup environments for type-setting. The meaning of these is determined by {\LaTeX} macros, as defined in \<^file>\~~/lib/texinputs/isabelle.sty\ or by the document author. The Isabelle document preparation tools also provide some high-level options to specify the meaning of arbitrary tags to ``keep'', ``drop'', or ``fold'' the corresponding parts of the text. Logic sessions may also specify ``document versions'', where given tags are interpreted in some particular way.
Again see @{cite "isabelle-system"} for further details. \ section \Railroad diagrams\ text \ \begin{matharray}{rcl} @{antiquotation_def "rail"} & : & \antiquotation\ \\ \end{matharray} \<^rail>\ 'rail' @{syntax text} \ The @{antiquotation rail} antiquotation allows to include syntax diagrams into Isabelle documents. {\LaTeX} requires the style file \<^file>\~~/lib/texinputs/railsetup.sty\, which can be used via \<^verbatim>\\usepackage{railsetup}\ in \<^verbatim>\root.tex\, for example. The rail specification language is quoted here as Isabelle @{syntax string} or text @{syntax "cartouche"}; it has its own grammar given below. \begingroup \def\isasymnewline{\isatt{\isacharbackslash\isacharless newline\isachargreater}} \<^rail>\ rule? + ';' ; rule: ((identifier | @{syntax antiquotation}) ':')? body ; body: concatenation + '|' ; concatenation: ((atom '?'?) +) (('*' | '+') atom?)? ; atom: '(' body? ')' | identifier | '@'? (string | @{syntax antiquotation}) | '\' \ \endgroup The lexical syntax of \identifier\ coincides with that of @{syntax short_ident} in regular Isabelle syntax, but \string\ uses single quotes instead of double quotes of the standard @{syntax string} category. Each \rule\ defines a formal language (with optional name), using a notation that is similar to EBNF or regular expressions with recursion. The meaning and visual appearance of these rail language elements is illustrated by the following representative examples. 
\<^item> Empty \<^verbatim>\()\ \<^rail>\()\ \<^item> Nonterminal \<^verbatim>\A\ \<^rail>\A\ \<^item> Nonterminal via Isabelle antiquotation \<^verbatim>\@{syntax method}\ \<^rail>\@{syntax method}\ \<^item> Terminal \<^verbatim>\'xyz'\ \<^rail>\'xyz'\ \<^item> Terminal in keyword style \<^verbatim>\@'xyz'\ \<^rail>\@'xyz'\ \<^item> Terminal via Isabelle antiquotation \<^verbatim>\@@{method rule}\ \<^rail>\@@{method rule}\ \<^item> Concatenation \<^verbatim>\A B C\ \<^rail>\A B C\ \<^item> Newline inside concatenation \<^verbatim>\A B C \ D E F\ \<^rail>\A B C \ D E F\ \<^item> Variants \<^verbatim>\A | B | C\ \<^rail>\A | B | C\ \<^item> Option \<^verbatim>\A ?\ \<^rail>\A ?\ \<^item> Repetition \<^verbatim>\A *\ \<^rail>\A *\ \<^item> Repetition with separator \<^verbatim>\A * sep\ \<^rail>\A * sep\ \<^item> Strict repetition \<^verbatim>\A +\ \<^rail>\A +\ \<^item> Strict repetition with separator \<^verbatim>\A + sep\ \<^rail>\A + sep\ \ end diff --git a/src/Doc/Isar_Ref/Outer_Syntax.thy b/src/Doc/Isar_Ref/Outer_Syntax.thy --- a/src/Doc/Isar_Ref/Outer_Syntax.thy +++ b/src/Doc/Isar_Ref/Outer_Syntax.thy @@ -1,603 +1,607 @@ (*:maxLineLen=78:*) theory Outer_Syntax imports Main Base begin chapter \Outer syntax --- the theory language \label{ch:outer-syntax}\ text \ The rather generic framework of Isabelle/Isar syntax emerges from three main syntactic categories: \<^emph>\commands\ of the top-level Isar engine (covering theory and proof elements), \<^emph>\methods\ for general goal refinements (analogous to traditional ``tactics''), and \<^emph>\attributes\ for operations on facts (within a certain context). Subsequently we give a reference of basic syntactic entities underlying Isabelle/Isar syntax in a bottom-up manner. Concrete theory and proof language elements will be introduced later on. 
\<^medskip> In order to get started with writing well-formed Isabelle/Isar documents, the most important aspect to be noted is the difference of \<^emph>\inner\ versus \<^emph>\outer\ syntax. Inner syntax is that of Isabelle types and terms of the logic, while outer syntax is that of Isabelle/Isar theory sources (specifications and proofs). As a general rule, inner syntax entities may occur only as \<^emph>\atomic entities\ within outer syntax. For example, the string \<^verbatim>\"x + y"\ and identifier \<^verbatim>\z\ are legal term specifications within a theory, while \<^verbatim>\x + y\ without quotes is not. Printed theory documents usually omit quotes to gain readability (this is a matter of {\LaTeX} macro setup, say via \<^verbatim>\\isabellestyle\, see also @{cite "isabelle-system"}). Experienced users of Isabelle/Isar may easily reconstruct the lost technical information, while mere readers need not care about quotes at all. \ section \Commands\ text \ \begin{matharray}{rcl} @{command_def "print_commands"}\\<^sup>*\ & : & \any \\ \\ @{command_def "help"}\\<^sup>*\ & : & \any \\ \\ \end{matharray} \<^rail>\ @@{command help} (@{syntax name} * ) \ \<^descr> @{command "print_commands"} prints all outer syntax keywords and commands. \<^descr> @{command "help"}~\pats\ retrieves outer syntax commands according to the specified name patterns. \ subsubsection \Examples\ text \ Some common diagnostic commands are retrieved like this (according to usual naming conventions): \ help "print" help "find" section \Lexical matters \label{sec:outer-lex}\ text \ The outer lexical syntax consists of three main categories of syntax tokens: \<^enum> \<^emph>\major keywords\ --- the command names that are available in the present logic session; \<^enum> \<^emph>\minor keywords\ --- additional literal tokens required by the syntax of commands; \<^enum> \<^emph>\named tokens\ --- various categories of identifiers etc. 
Major keywords and minor keywords are guaranteed to be disjoint. This helps user-interfaces to determine the overall structure of a theory text, without knowing the full details of command syntax. Internally, there is some additional information about the kind of major keywords, which approximates the command type (theory command, proof command etc.). Keywords override named tokens. For example, the presence of a command called \<^verbatim>\term\ inhibits the identifier \<^verbatim>\term\, but the string \<^verbatim>\"term"\ can be used instead. By convention, the outer syntax always allows quoted strings in addition to identifiers, wherever a named entity is expected. When tokenizing a given input sequence, the lexer repeatedly takes the longest prefix of the input that forms a valid token. Spaces, tabs, newlines and formfeeds between tokens serve as explicit separators. \<^medskip> The categories for named tokens are defined once and for all as follows. \begin{center} \begin{supertabular}{rcl} @{syntax_def short_ident} & = & \letter (subscript\<^sup>? 
quasiletter)\<^sup>*\ \\ @{syntax_def long_ident} & = & \short_ident(\\<^verbatim>\.\\short_ident)\<^sup>+\ \\ @{syntax_def sym_ident} & = & \sym\<^sup>+ |\~~\<^verbatim>\\\\<^verbatim>\<\\short_ident\\<^verbatim>\>\ \\ @{syntax_def nat} & = & \digit\<^sup>+\ \\ @{syntax_def float} & = & @{syntax_ref nat}\<^verbatim>\.\@{syntax_ref nat}~~\|\~~\<^verbatim>\-\@{syntax_ref nat}\<^verbatim>\.\@{syntax_ref nat} \\ @{syntax_def term_var} & = & \<^verbatim>\?\\short_ident |\~~\<^verbatim>\?\\short_ident\\<^verbatim>\.\\nat\ \\ @{syntax_def type_ident} & = & \<^verbatim>\'\\short_ident\ \\ @{syntax_def type_var} & = & \<^verbatim>\?\\type_ident |\~~\<^verbatim>\?\\type_ident\\<^verbatim>\.\\nat\ \\ @{syntax_def string} & = & \<^verbatim>\"\ \\\ \<^verbatim>\"\ \\ @{syntax_def altstring} & = & \<^verbatim>\`\ \\\ \<^verbatim>\`\ \\ @{syntax_def cartouche} & = & \<^verbatim>\\\ \\\ \<^verbatim>\\\ \\ @{syntax_def verbatim} & = & \<^verbatim>\{*\ \\\ \<^verbatim>\*}\ \\[1ex] \letter\ & = & \latin |\~~\<^verbatim>\\\\<^verbatim>\<\\latin\\<^verbatim>\>\~~\|\~~\<^verbatim>\\\\<^verbatim>\<\\latin latin\\<^verbatim>\>\~~\| greek |\ \\ \subscript\ & = & \<^verbatim>\\<^sub>\ \\ \quasiletter\ & = & \letter | digit |\~~\<^verbatim>\_\~~\|\~~\<^verbatim>\'\ \\ \latin\ & = & \<^verbatim>\a\~~\| \ |\~~\<^verbatim>\z\~~\|\~~\<^verbatim>\A\~~\| \ |\~~\<^verbatim>\Z\ \\ \digit\ & = & \<^verbatim>\0\~~\| \ |\~~\<^verbatim>\9\ \\ \sym\ & = & \<^verbatim>\!\~~\|\~~\<^verbatim>\#\~~\|\~~\<^verbatim>\$\~~\|\~~\<^verbatim>\%\~~\|\~~\<^verbatim>\&\~~\|\~~\<^verbatim>\*\~~\|\~~\<^verbatim>\+\~~\|\~~\<^verbatim>\-\~~\|\~~\<^verbatim>\/\~~\|\ \\ & & \<^verbatim>\<\~~\|\~~\<^verbatim>\=\~~\|\~~\<^verbatim>\>\~~\|\~~\<^verbatim>\?\~~\|\~~\<^verbatim>\@\~~\|\~~\<^verbatim>\^\~~\|\~~\<^verbatim>\_\~~\|\~~\<^verbatim>\|\~~\|\~~\<^verbatim>\~\ \\ \greek\ & = & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & 
\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\ \\ & & \<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\~~\|\~~\<^verbatim>\\\ \\ \end{supertabular} \end{center} A @{syntax_ref term_var} or @{syntax_ref type_var} describes an unknown, which is internally a pair of base name and index (ML type \<^ML_type>\indexname\). These components are either separated by a dot as in \?x.1\ or \?x7.3\ or run together as in \?x1\. The latter form is possible if the base name does not end with digits. If the index is 0, it may be dropped altogether: \?x\ and \?x0\ and \?x.0\ all refer to the same unknown, with basename \x\ and index 0. The syntax of @{syntax_ref string} admits any characters, including newlines; ``\<^verbatim>\"\'' (double-quote) and ``\<^verbatim>\\\'' (backslash) need to be escaped by a backslash; arbitrary character codes may be specified as ``\<^verbatim>\\\\ddd\'', with three decimal digits. Alternative strings according to @{syntax_ref altstring} are analogous, using single back-quotes instead. The body of @{syntax_ref verbatim} may consist of any text not containing ``\<^verbatim>\*}\''; this allows to include quotes without further escapes, but there is no way to escape ``\<^verbatim>\*}\''. Cartouches do not have this limitation. A @{syntax_ref cartouche} consists of arbitrary text, with properly balanced blocks of ``@{verbatim "\"}~\\\~@{verbatim "\"}''. 
Note that the rendering of cartouche delimiters is usually like this: ``\\ \ \\''. Source comments take the form \<^verbatim>\(*\~\\\~\<^verbatim>\*)\ and may be nested: the text is removed after lexical analysis of the input and thus not suitable for documentation. The Isar syntax also provides proper \<^emph>\document comments\ that are considered as part of the text (see \secref{sec:comments}). Common mathematical symbols such as \\\ are represented in Isabelle as \<^verbatim>\\\. There are infinitely many Isabelle symbols like this, although proper presentation is left to front-end tools such as {\LaTeX} or Isabelle/jEdit. A list of predefined Isabelle symbols that work well with these tools is given in \appref{app:symbols}. Note that \<^verbatim>\\\ does not belong to the \letter\ category, since it is already used differently in the Pure term language. \ section \Common syntax entities\ text \ We now introduce several basic syntactic entities, such as names, terms, and theorem specifications, which are factored out of the actual Isar language elements to be described later. \ subsection \Names\ text \ Entity @{syntax name} usually refers to any name of types, constants, theorems etc.\ Quoted strings provide an escape for non-identifier names or those ruled out by outer syntax keywords (e.g.\ quoted \<^verbatim>\"let"\). \<^rail>\ @{syntax_def name}: @{syntax short_ident} | @{syntax long_ident} | @{syntax sym_ident} | @{syntax nat} | @{syntax string} ; @{syntax_def par_name}: '(' @{syntax name} ')' \ A @{syntax_def system_name} is like @{syntax name}, but it excludes white-space characters and needs to conform to file-name notation. Name components that are special on Windows (e.g.\ \<^verbatim>\CON\, \<^verbatim>\PRN\, \<^verbatim>\AUX\) are excluded on all platforms. \ subsection \Numbers\ text \ The outer lexical syntax (\secref{sec:outer-lex}) admits natural numbers and floating point numbers. 
These are combined as @{syntax int} and @{syntax real} as follows. \<^rail>\ @{syntax_def int}: @{syntax nat} | '-' @{syntax nat} ; @{syntax_def real}: @{syntax float} | @{syntax int} \ Note that there is an overlap with the category @{syntax name}, which also includes @{syntax nat}. \ subsection \Embedded content\ text \ Entity @{syntax embedded} refers to content of other languages: cartouches allow arbitrary nesting of sub-languages that respect the recursive balancing of cartouche delimiters. Quoted strings are possible as well, but require escaped quotes when nested. As a shortcut, tokens that appear as plain identifiers in the outer language may be used as inner language content without delimiters. \<^rail>\ @{syntax_def embedded}: @{syntax cartouche} | @{syntax string} | @{syntax short_ident} | @{syntax long_ident} | @{syntax sym_ident} | @{syntax term_var} | @{syntax type_ident} | @{syntax type_var} | @{syntax nat} \ \ subsection \Document text\ text \ A chunk of document @{syntax text} is usually given as @{syntax cartouche} \\\\\ or @{syntax verbatim}, i.e.\ enclosed in \<^verbatim>\{*\~\\\~\<^verbatim>\*}\. For convenience, any of the smaller text units that conform to @{syntax name} are admitted as well. \<^rail>\ @{syntax_def text}: @{syntax embedded} | @{syntax verbatim} \ Typical uses are document markup commands, like \<^theory_text>\chapter\, \<^theory_text>\section\ etc. (\secref{sec:markup}). \ subsection \Document comments \label{sec:comments}\ text \ Formal comments are an integral part of the document, but are logically void and removed from the resulting theory or term content. The output of document preparation (\chref{ch:document-prep}) supports various styles, according to the following kinds of comments. \<^item> Marginal comment of the form \<^verbatim>\\\~\\text\\ or \\\~\\text\\, usually with a single space between the comment symbol and the argument cartouche.
The given argument is typeset as regular text, with formal antiquotations (\secref{sec:antiq}). \<^item> Canceled text of the form \<^verbatim>\\<^cancel>\\\text\\ (no white space between the control symbol and the argument cartouche). The argument is typeset as formal Isabelle source and overlaid with a ``strike-through'' pattern, e.g. \<^theory_text>\\<^cancel>\bad\\. \<^item> Raw {\LaTeX} source of the form \<^verbatim>\\<^latex>\\\argument\\ (no white space between the control symbol and the argument cartouche). This allows to augment the generated {\TeX} source arbitrarily, without any sanity checks! These formal comments work uniformly in outer syntax, inner syntax (term language), Isabelle/ML, and some other embedded languages of Isabelle. \ subsection \Type classes, sorts and arities\ text \ Classes are specified by plain names. Sorts have a very simple inner syntax, which is either a single class name \c\ or a list \{c\<^sub>1, \, c\<^sub>n}\ referring to the intersection of these classes. The syntax of type arities is given directly at the outer level. \<^rail>\ @{syntax_def classdecl}: @{syntax name} (('<' | '\') (@{syntax name} + ','))? ; @{syntax_def sort}: @{syntax embedded} ; @{syntax_def arity}: ('(' (@{syntax sort} + ',') ')')? @{syntax sort} \ \ subsection \Types and terms \label{sec:types-terms}\ text \ The actual inner Isabelle syntax, that of types and terms of the logic, is far too sophisticated in order to be modelled explicitly at the outer theory level. Basically, any such entity has to be quoted to turn it into a single token (the parsing and type-checking is performed internally later). For convenience, a slightly more liberal convention is adopted: quotes may be omitted for any type or term that is already atomic at the outer level. For example, one may just write \<^verbatim>\x\ instead of quoted \<^verbatim>\"x"\. 
Note that symbolic identifiers (e.g.\ \<^verbatim>\++\ or \\\) are available as well, provided these have not been superseded by commands or other keywords already (such as \<^verbatim>\=\ or \<^verbatim>\+\). \<^rail>\ @{syntax_def type}: @{syntax embedded} ; @{syntax_def term}: @{syntax embedded} ; @{syntax_def prop}: @{syntax embedded} \ Positional instantiations are specified as a sequence of terms, or the placeholder ``\_\'' (underscore), which means to skip a position. \<^rail>\ @{syntax_def inst}: '_' | @{syntax term} ; @{syntax_def insts}: (@{syntax inst} *) \ Named instantiations are specified as pairs of assignments \v = t\, which refer to schematic variables in some theorem that is instantiated. Both type and term instantiations are admitted, and distinguished by the usual syntax of variable names. \<^rail>\ @{syntax_def named_inst}: variable '=' (type | term) ; @{syntax_def named_insts}: (named_inst @'and' +) ; variable: @{syntax name} | @{syntax term_var} | @{syntax type_ident} | @{syntax type_var} \ Type declarations and definitions usually refer to @{syntax typespec} on the left-hand side. This models basic type constructor application at the outer syntax level. Note that only plain postfix notation is available here, but no infixes. \<^rail>\ - @{syntax_def typespec}: - (() | @{syntax type_ident} | '(' ( @{syntax type_ident} + ',' ) ')') @{syntax name} + @{syntax_def typeargs}: + (() | @{syntax type_ident} | '(' ( @{syntax type_ident} + ',' ) ')') ; - @{syntax_def typespec_sorts}: + @{syntax_def typeargs_sorts}: (() | (@{syntax type_ident} ('::' @{syntax sort})?) | - '(' ( (@{syntax type_ident} ('::' @{syntax sort})?) + ',' ) ')') @{syntax name} + '(' ( (@{syntax type_ident} ('::' @{syntax sort})?)
+ ',' ) ')') + ; + @{syntax_def typespec}: @{syntax typeargs} @{syntax name} + ; + @{syntax_def typespec_sorts}: @{syntax typeargs_sorts} @{syntax name} \ \ subsection \Term patterns and declarations \label{sec:term-decls}\ text \ Wherever explicit propositions (or term fragments) occur in a proof text, casual binding of schematic term variables may be specified via patterns of the form ``\<^theory_text>\(is p\<^sub>1 \ p\<^sub>n)\''. This works both for @{syntax term} and @{syntax prop}. \<^rail>\ @{syntax_def term_pat}: '(' (@'is' @{syntax term} +) ')' ; @{syntax_def prop_pat}: '(' (@'is' @{syntax prop} +) ')' \ \<^medskip> Declarations of local variables \x :: \\ and logical propositions \a : \\ represent different views on the same principle of introducing a local scope. In practice, one may usually omit the typing of @{syntax vars} (due to type-inference), and the naming of propositions (due to implicit references of current facts). In any case, Isar proof elements usually allow multiple such items to be introduced simultaneously. \<^rail>\ @{syntax_def vars}: (((@{syntax name} +) ('::' @{syntax type})? | @{syntax name} ('::' @{syntax type})? @{syntax mixfix}) + @'and') ; @{syntax_def props}: @{syntax thmdecl}? (@{syntax prop} @{syntax prop_pat}? +) ; @{syntax_def props'}: (@{syntax prop} @{syntax prop_pat}? +) \ The treatment of multiple declarations corresponds to the complementary focus of @{syntax vars} versus @{syntax props}. In ``\x\<^sub>1 \ x\<^sub>n :: \\'' the typing refers to all variables, while in \a: \\<^sub>1 \ \\<^sub>n\ the naming refers to all propositions collectively. Isar language elements that refer to @{syntax vars} or @{syntax props} typically admit separate typings or namings via another level of iteration, with explicit @{keyword_ref "and"} separators; e.g.\ see @{command "fix"} and @{command "assume"} in \secref{sec:proof-context}.
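\<^medskip> As a small illustration (a hypothetical proof fragment, assuming predicates \P\ and \Q\ over suitable types), separate typings and namings can be combined via such @{keyword "and"} separators:

```isabelle
(* sketch only: P and Q are assumed predicates, not from the Isabelle sources *)
fix x y :: 'a and z :: 'b
assume a: "P x" "P y" and b: "Q z"
```

Here the typing \'a\ applies to \x\ and \y\ collectively, while the name \a\ refers to both propositions \P x\ and \P y\ together.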
\ subsection \Attributes and theorems \label{sec:syn-att}\ text \ Attributes have their own ``semi-inner'' syntax, in the sense that input conforming to @{syntax args} below is parsed by the attribute a second time. The attribute argument specifications may be any sequence of atomic entities (identifiers, strings etc.), or properly bracketed argument lists. Below @{syntax atom} refers to any atomic entity, including any @{syntax keyword} conforming to @{syntax sym_ident}. \<^rail>\ @{syntax_def atom}: @{syntax name} | @{syntax type_ident} | @{syntax type_var} | @{syntax term_var} | @{syntax nat} | @{syntax float} | @{syntax keyword} | @{syntax cartouche} ; arg: @{syntax atom} | '(' @{syntax args} ')' | '[' @{syntax args} ']' ; @{syntax_def args}: arg * ; @{syntax_def attributes}: '[' (@{syntax name} @{syntax args} * ',') ']' \ Theorem specifications come in several flavors: @{syntax axmdecl} and @{syntax thmdecl} usually refer to axioms, assumptions or results of goal statements, while @{syntax thmdef} collects lists of existing theorems. Existing theorems are given by @{syntax thm} and @{syntax thms}, the former requires an actual singleton result. There are three forms of theorem references: \<^enum> named facts \a\, \<^enum> selections from named facts \a(i)\ or \a(j - k)\, \<^enum> literal fact propositions using token syntax @{syntax_ref altstring} \<^verbatim>\`\\\\\<^verbatim>\`\ or @{syntax_ref cartouche} \\\\\ (see also method @{method_ref fact}). Any kind of theorem specification may include lists of attributes both on the left and right hand sides; attributes are applied to any immediately preceding fact. If names are omitted, the theorems are not stored within the theorem database of the theory or proof context, but any given attributes are applied nonetheless. An extra pair of brackets around attributes (like ``\[[simproc a]]\'') abbreviates a theorem reference involving an internal dummy fact, which will be ignored later on. 
So only the effect of the attribute on the background context will persist. This form of in-place declarations is particularly useful with commands like @{command "declare"} and @{command "using"}. \<^rail>\ @{syntax_def axmdecl}: @{syntax name} @{syntax attributes}? ':' ; @{syntax_def thmbind}: @{syntax name} @{syntax attributes} | @{syntax name} | @{syntax attributes} ; @{syntax_def thmdecl}: thmbind ':' ; @{syntax_def thmdef}: thmbind '=' ; @{syntax_def thm}: (@{syntax name} selection? | @{syntax altstring} | @{syntax cartouche}) @{syntax attributes}? | '[' @{syntax attributes} ']' ; @{syntax_def thms}: @{syntax thm} + ; selection: '(' ((@{syntax nat} | @{syntax nat} '-' @{syntax nat}?) + ',') ')' \ \ subsection \Structured specifications\ text \ Structured specifications use propositions with explicit notation for the ``eigen-context'' to describe rule structure: \\x. A x \ \ \ B x\ is expressed as \<^theory_text>\B x if A x and \ for x\. It is also possible to use dummy terms ``\_\'' (underscore) to refer to locally fixed variables anonymously. Multiple specifications are delimited by ``\|\'' to emphasize separate cases: each with its own scope of inferred types for free variables. \<^rail>\ @{syntax_def for_fixes}: (@'for' @{syntax vars})? ; @{syntax_def multi_specs}: (@{syntax structured_spec} + '|') ; @{syntax_def structured_spec}: @{syntax thmdecl}? @{syntax prop} @{syntax spec_prems} @{syntax for_fixes} ; @{syntax_def spec_prems}: (@'if' ((@{syntax prop}+) + @'and'))? 
; @{syntax_def specification}: @{syntax vars} @'where' @{syntax multi_specs} \ \ section \Diagnostic commands\ text \ \begin{matharray}{rcl} @{command_def "print_theory"}\\<^sup>*\ & : & \context \\ \\ @{command_def "print_definitions"}\\<^sup>*\ & : & \context \\ \\ @{command_def "print_methods"}\\<^sup>*\ & : & \context \\ \\ @{command_def "print_attributes"}\\<^sup>*\ & : & \context \\ \\ @{command_def "print_theorems"}\\<^sup>*\ & : & \context \\ \\ @{command_def "find_theorems"}\\<^sup>*\ & : & \context \\ \\ @{command_def "find_consts"}\\<^sup>*\ & : & \context \\ \\ @{command_def "thm_deps"}\\<^sup>*\ & : & \context \\ \\ @{command_def "unused_thms"}\\<^sup>*\ & : & \context \\ \\ @{command_def "print_facts"}\\<^sup>*\ & : & \context \\ \\ @{command_def "print_term_bindings"}\\<^sup>*\ & : & \context \\ \\ \end{matharray} \<^rail>\ (@@{command print_theory} | @@{command print_definitions} | @@{command print_methods} | @@{command print_attributes} | @@{command print_theorems} | @@{command print_facts}) ('!'?) ; @@{command find_theorems} ('(' @{syntax nat}? 'with_dups'? ')')? \ (thm_criterion*) ; thm_criterion: ('-'?) ('name' ':' @{syntax name} | 'intro' | 'elim' | 'dest' | 'solves' | 'simp' ':' @{syntax term} | @{syntax term}) ; @@{command find_consts} (const_criterion*) ; const_criterion: ('-'?) ('name' ':' @{syntax name} | 'strict' ':' @{syntax type} | @{syntax type}) ; @@{command thm_deps} @{syntax thmrefs} ; @@{command unused_thms} ((@{syntax name} +) '-' (@{syntax name} * ))? \ These commands print certain parts of the theory and proof context. Note that there are some further ones available, such as for the set of rules declared for simplifications. \<^descr> @{command "print_theory"} prints the main logical content of the background theory; the ``\!\'' option indicates extra verbosity. 
\<^descr> @{command "print_definitions"} prints dependencies of definitional specifications within the background theory, which may be constants (\secref{sec:term-definitions}, \secref{sec:overloading}) or types (\secref{sec:types-pure}, \secref{sec:hol-typedef}); the ``\!\'' option indicates extra verbosity. \<^descr> @{command "print_methods"} prints all proof methods available in the current theory context; the ``\!\'' option indicates extra verbosity. \<^descr> @{command "print_attributes"} prints all attributes available in the current theory context; the ``\!\'' option indicates extra verbosity. \<^descr> @{command "print_theorems"} prints theorems of the background theory resulting from the last command; the ``\!\'' option indicates extra verbosity. \<^descr> @{command "print_facts"} prints all local facts of the current context, both named and unnamed ones; the ``\!\'' option indicates extra verbosity. \<^descr> @{command "print_term_bindings"} prints all term bindings that are present in the context. \<^descr> @{command "find_theorems"}~\criteria\ retrieves facts from the theory or proof context matching all of the given search criteria. The criterion \name: p\ selects all theorems whose fully qualified name matches pattern \p\, which may contain ``\*\'' wildcards. The criteria \intro\, \elim\, and \dest\ select theorems that match the current goal as introduction, elimination or destruction rules, respectively. The criterion \solves\ returns all rules that would directly solve the current goal. The criterion \simp: t\ selects all rewrite rules whose left-hand side matches the given term. The criterion term \t\ selects all theorems that contain the pattern \t\ -- as usual, patterns may contain occurrences of the dummy ``\_\'', schematic variables, and type constraints. Criteria can be preceded by ``\-\'' to select theorems that do \<^emph>\not\ match. Note that giving the empty list of criteria yields \<^emph>\all\ currently known facts.
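For example (a hypothetical query, with name patterns chosen purely for illustration), several criteria may be combined in a single search:

```isabelle
(* sketch: find at most 100 facts containing an addition equation,
   whose name matches "add" but not "nat" *)
find_theorems (100) "_ + _ = _ + _" name: add -name: nat
```

Each criterion narrows the result further; the negated \name\ criterion excludes facts rather than selecting them.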
An optional limit for the number of printed facts may be given; the default is 40. By default, duplicates are removed from the search result. Use \with_dups\ to display duplicates. \<^descr> @{command "find_consts"}~\criteria\ prints all constants whose type meets all of the given criteria. The criterion \strict: ty\ is met by any type that matches the type pattern \ty\. Patterns may contain both the dummy type ``\_\'' and sort constraints. The criterion \ty\ is similar, but it also matches against subtypes. The criterion \name: p\ and the prefix ``\-\'' function as described for @{command "find_theorems"}. \<^descr> @{command "thm_deps"}~\thms\ prints immediate theorem dependencies, i.e.\ the union of all theorems that are used directly to prove the argument facts, without going deeper into the dependency graph. \<^descr> @{command "unused_thms"}~\A\<^sub>1 \ A\<^sub>m - B\<^sub>1 \ B\<^sub>n\ displays all theorems that are proved in theories \B\<^sub>1 \ B\<^sub>n\ or their parents but not in \A\<^sub>1 \ A\<^sub>m\ or their parents and that are never used. If \n\ is \0\, the end of the range of theories defaults to the current theory. If no range is specified, only the unused theorems in the current theory are displayed. \ end diff --git a/src/Pure/Thy/document_antiquotations.ML b/src/Pure/Thy/document_antiquotations.ML --- a/src/Pure/Thy/document_antiquotations.ML +++ b/src/Pure/Thy/document_antiquotations.ML @@ -1,437 +1,446 @@ (* Title: Pure/Thy/document_antiquotations.ML Author: Makarius Miscellaneous document antiquotations. 
*) structure Document_Antiquotations: sig end = struct (* basic entities *) local type style = term -> term; fun pretty_term_style ctxt (style: style, t) = Document_Output.pretty_term ctxt (style t); fun pretty_thms_style ctxt (style: style, ths) = map (fn th => Document_Output.pretty_term ctxt (style (Thm.full_prop_of th))) ths; fun pretty_term_typ ctxt (style: style, t) = let val t' = style t in Document_Output.pretty_term ctxt (Type.constraint (Term.fastype_of t') t') end; fun pretty_term_typeof ctxt (style: style, t) = Syntax.pretty_typ ctxt (Term.fastype_of (style t)); fun pretty_const ctxt c = let val t = Const (c, Consts.type_scheme (Proof_Context.consts_of ctxt) c) handle TYPE (msg, _, _) => error msg; val (t', _) = yield_singleton (Variable.import_terms true) t ctxt; in Document_Output.pretty_term ctxt t' end; fun pretty_abbrev ctxt s = let val t = Syntax.read_term (Proof_Context.set_mode Proof_Context.mode_abbrev ctxt) s; fun err () = error ("Abbreviated constant expected: " ^ Syntax.string_of_term ctxt t); val (head, args) = Term.strip_comb t; val (c, T) = Term.dest_Const head handle TERM _ => err (); val (U, u) = Consts.the_abbreviation (Proof_Context.consts_of ctxt) c handle TYPE _ => err (); val t' = Term.betapplys (Envir.expand_atom T (U, u), args); val eq = Logic.mk_equals (t, t'); val ctxt' = Proof_Context.augment eq ctxt; in Proof_Context.pretty_term_abbrev ctxt' eq end; fun pretty_locale ctxt (name, pos) = let val thy = Proof_Context.theory_of ctxt in Pretty.str (Locale.extern thy (Locale.check thy (name, pos))) end; fun pretty_class ctxt s = Pretty.str (Proof_Context.extern_class ctxt (Proof_Context.read_class ctxt s)); fun pretty_type ctxt s = let val Type (name, _) = Proof_Context.read_type_name {proper = true, strict = false} ctxt s in Pretty.str (Proof_Context.extern_type ctxt name) end; fun pretty_prf full ctxt = Proof_Syntax.pretty_standard_proof_of ctxt full; fun pretty_theory ctxt (name, pos) = (Theory.check {long = true} ctxt (name, 
pos); Pretty.str name); val basic_entity = Document_Output.antiquotation_pretty_source_embedded; fun basic_entities name scan pretty = Document_Antiquotation.setup name scan (fn {context = ctxt, source = src, argument = xs} => Document_Output.pretty_items_source ctxt {embedded = false} src (map (pretty ctxt) xs)); val _ = Theory.setup (basic_entity \<^binding>\prop\ (Term_Style.parse -- Args.prop) pretty_term_style #> basic_entity \<^binding>\term\ (Term_Style.parse -- Args.term) pretty_term_style #> basic_entity \<^binding>\term_type\ (Term_Style.parse -- Args.term) pretty_term_typ #> basic_entity \<^binding>\typeof\ (Term_Style.parse -- Args.term) pretty_term_typeof #> basic_entity \<^binding>\const\ (Args.const {proper = true, strict = false}) pretty_const #> basic_entity \<^binding>\abbrev\ (Scan.lift Args.embedded_inner_syntax) pretty_abbrev #> basic_entity \<^binding>\typ\ Args.typ_abbrev Syntax.pretty_typ #> basic_entity \<^binding>\locale\ (Scan.lift Args.embedded_position) pretty_locale #> basic_entity \<^binding>\class\ (Scan.lift Args.embedded_inner_syntax) pretty_class #> basic_entity \<^binding>\type\ (Scan.lift Args.embedded_inner_syntax) pretty_type #> basic_entity \<^binding>\theory\ (Scan.lift Args.embedded_position) pretty_theory #> basic_entities \<^binding>\prf\ Attrib.thms (pretty_prf false) #> basic_entities \<^binding>\full_prf\ Attrib.thms (pretty_prf true) #> Document_Antiquotation.setup \<^binding>\thm\ (Term_Style.parse -- Attrib.thms) (fn {context = ctxt, source = src, argument = arg} => Document_Output.pretty_items_source ctxt {embedded = false} src (pretty_thms_style ctxt arg))); in end; (* Markdown errors *) local fun markdown_error binding = Document_Antiquotation.setup binding (Scan.succeed ()) (fn {source = src, ...} => error ("Bad Markdown structure: illegal " ^ quote (Binding.name_of binding) ^ Position.here (Position.no_range_position (#1 (Token.range_of src))))) val _ = Theory.setup (markdown_error \<^binding>\item\ #> 
markdown_error \<^binding>\enum\ #> markdown_error \<^binding>\descr\); in end; (* control spacing *) val _ = Theory.setup (Document_Output.antiquotation_raw \<^binding>\noindent\ (Scan.succeed ()) (fn _ => fn () => Latex.string "\\noindent") #> Document_Output.antiquotation_raw \<^binding>\smallskip\ (Scan.succeed ()) (fn _ => fn () => Latex.string "\\smallskip") #> Document_Output.antiquotation_raw \<^binding>\medskip\ (Scan.succeed ()) (fn _ => fn () => Latex.string "\\medskip") #> Document_Output.antiquotation_raw \<^binding>\bigskip\ (Scan.succeed ()) (fn _ => fn () => Latex.string "\\bigskip")); (* nested document text *) local fun nested_antiquotation name s1 s2 = Document_Output.antiquotation_raw_embedded name (Scan.lift Args.cartouche_input) (fn ctxt => fn txt => (Context_Position.reports ctxt (Document_Output.document_reports txt); Latex.enclose_block s1 s2 (Document_Output.output_document ctxt {markdown = false} txt))); val _ = Theory.setup (nested_antiquotation \<^binding>\footnote\ "\\footnote{" "}" #> nested_antiquotation \<^binding>\emph\ "\\emph{" "}" #> nested_antiquotation \<^binding>\bold\ "\\textbf{" "}"); in end; (* index entries *) local val index_like = Parse.$$$ "(" |-- Parse.!!! (Parse.$$$ "is" |-- Args.name --| Parse.$$$ ")"); val index_args = Parse.enum1 "!" 
(Args.embedded_input -- Scan.option index_like); fun output_text ctxt = Latex.block o Document_Output.output_document ctxt {markdown = false}; fun index binding def = Document_Output.antiquotation_raw binding (Scan.lift index_args) (fn ctxt => fn args => let val _ = Context_Position.reports ctxt (maps (Document_Output.document_reports o #1) args); fun make_item (txt, opt_like) = let val text = output_text ctxt txt; val like = (case opt_like of SOME s => s | NONE => Document_Antiquotation.approx_content ctxt (Input.string_of txt)); val _ = if is_none opt_like andalso Context_Position.is_visible ctxt then writeln ("(" ^ Markup.markup Markup.keyword2 "is" ^ " " ^ quote like ^ ")" ^ Position.here (Input.pos_of txt)) else (); in {text = text, like = like} end; in Latex.index_entry {items = map make_item args, def = def} end); val _ = Theory.setup (index \<^binding>\index_ref\ false #> index \<^binding>\index_def\ true); in end; (* quasi-formal text (unchecked) *) local fun report_text ctxt text = let val pos = Input.pos_of text in Context_Position.reports ctxt [(pos, Markup.language_text (Input.is_delimited text)), (pos, Markup.raw_text)] end; fun prepare_text ctxt = Input.source_content #> #1 #> Document_Antiquotation.prepare_lines ctxt; fun text_antiquotation name = Document_Output.antiquotation_raw_embedded name (Scan.lift Args.text_input) (fn ctxt => fn text => let val _ = report_text ctxt text; in prepare_text ctxt text |> Document_Output.output_source ctxt |> Document_Output.isabelle ctxt end); val theory_text_antiquotation = Document_Output.antiquotation_raw_embedded \<^binding>\theory_text\ (Scan.lift Args.text_input) (fn ctxt => fn text => let val keywords = Thy_Header.get_keywords' ctxt; val _ = report_text ctxt text; val _ = Input.source_explode text |> Token.tokenize keywords {strict = true} |> maps (Token.reports keywords) |> Context_Position.reports_text ctxt; in prepare_text ctxt text |> Token.explode0 keywords |> maps (Document_Output.output_token ctxt) 
|> Document_Output.isabelle ctxt end); val _ = Theory.setup (text_antiquotation \<^binding>\text\ #> text_antiquotation \<^binding>\cartouche\ #> theory_text_antiquotation); in end; (* goal state *) local fun goal_state name main = Document_Output.antiquotation_pretty name (Scan.succeed ()) (fn ctxt => fn () => Goal_Display.pretty_goal (Config.put Goal_Display.show_main_goal main ctxt) (#goal (Proof.goal (Toplevel.proof_of (Toplevel.presentation_state ctxt))))); val _ = Theory.setup (goal_state \<^binding>\goals\ true #> goal_state \<^binding>\subgoals\ false); in end; (* embedded lemma *) val _ = Theory.setup (Document_Antiquotation.setup \<^binding>\lemma\ (Scan.lift (Scan.ahead Parse.not_eof) -- Args.prop -- Scan.lift (Parse.position (Parse.reserved "by") -- Method.parse -- Scan.option Method.parse)) (fn {context = ctxt, source = src, argument = ((prop_tok, prop), (((_, by_pos), m1), m2))} => let val reports = (by_pos, Markup.keyword1 |> Markup.keyword_properties) :: maps Method.reports_of (m1 :: the_list m2); val _ = Context_Position.reports ctxt reports; (* FIXME check proof!? 
*) val _ = ctxt |> Proof.theorem NONE (K I) [[(prop, [])]] |> Proof.global_terminal_proof (m1, m2); in Document_Output.pretty_source ctxt {embedded = false} [hd src, prop_tok] (Document_Output.pretty_term ctxt prop) end)); (* verbatim text *) val _ = Theory.setup (Document_Output.antiquotation_verbatim_embedded \<^binding>\verbatim\ (Scan.lift Args.text_input) (fn ctxt => fn text => let val pos = Input.pos_of text; val _ = Context_Position.reports ctxt [(pos, Markup.language_verbatim (Input.is_delimited text)), (pos, Markup.raw_text)]; in #1 (Input.source_content text) end)); (* bash functions *) val _ = Theory.setup (Document_Output.antiquotation_verbatim_embedded \<^binding>\bash_function\ (Scan.lift Args.embedded_position) Isabelle_System.check_bash_function); (* system options *) val _ = Theory.setup (Document_Output.antiquotation_verbatim_embedded \<^binding>\system_option\ (Scan.lift Args.embedded_position) (fn ctxt => fn (name, pos) => let val _ = Completion.check_option (Options.default ()) ctxt (name, pos); in name end)); (* ML text *) local fun test_val (ml1, []) = ML_Lex.read "fn _ => (" @ ml1 @ ML_Lex.read ");" | test_val (ml1, ml2) = ML_Lex.read "fn _ => (" @ ml1 @ ML_Lex.read " : " @ ml2 @ ML_Lex.read ");"; fun test_op (ml1, ml2) = test_val (ML_Lex.read "op " @ ml1, ml2); fun test_type (ml1, []) = ML_Lex.read "val _ = NONE : (" @ ml1 @ ML_Lex.read ") option;" | test_type (ml1, ml2) = ML_Lex.read "val _ = [NONE : (" @ ml1 @ ML_Lex.read ") option, NONE : (" @ ml2 @ ML_Lex.read ") option];"; fun test_exn (ml1, []) = ML_Lex.read "fn _ => (" @ ml1 @ ML_Lex.read " : exn);" | test_exn (ml1, ml2) = ML_Lex.read "fn _ => (" @ ml1 @ ML_Lex.read " : " @ ml2 @ ML_Lex.read " -> exn);"; fun test_struct (ml, _) = ML_Lex.read "functor XXX() = struct structure XX = " @ ml @ ML_Lex.read " end;"; fun test_functor (Antiquote.Text tok :: _, _) = ML_Lex.read "ML_Env.check_functor " @ ML_Lex.read (ML_Syntax.print_string (ML_Lex.content_of tok)) | test_functor _ = raise Fail 
"Bad ML functor specification"; -val parse_ml0 = Args.text_input >> rpair Input.empty; -val parse_ml = Args.text_input -- Scan.optional (Args.colon |-- Args.text_input) Input.empty; -val parse_type = Args.text_input -- Scan.optional (Args.$$$ "=" |-- Args.text_input) Input.empty; -val parse_exn = Args.text_input -- Scan.optional (Args.$$$ "of" |-- Args.text_input) Input.empty; +val parse_ml0 = Args.text_input >> (fn source => ("", (source, Input.empty))); + +val parse_ml = + Args.text_input -- Scan.optional (Args.colon |-- Args.text_input) Input.empty >> pair ""; + +val parse_exn = + Args.text_input -- Scan.optional (Args.$$$ "of" |-- Args.text_input) Input.empty >> pair ""; + +val parse_type = + (Parse.type_args >> (fn [] => "" | [a] => a ^ " " | bs => enclose "(" ") " (commas bs))) -- + (Args.text_input -- Scan.optional (Args.$$$ "=" |-- Args.text_input) Input.empty); fun eval ctxt pos ml = ML_Context.eval_in (SOME ctxt) ML_Compiler.flags pos ml handle ERROR msg => error (msg ^ Position.here pos); fun make_text sep sources = let val (txt1, txt2) = apply2 (#1 o Input.source_content) sources; val is_ident = (case try List.last (Symbol.explode txt1) of NONE => false | SOME s => Symbol.is_ascii_letdig s); val txt = if txt2 = "" then txt1 else if sep = ":" andalso is_ident then txt1 ^ ": " ^ txt2 else txt1 ^ " " ^ sep ^ " " ^ txt2 in (txt, txt1) end; fun antiquotation_ml parse test kind show_kind binding index = Document_Output.antiquotation_raw binding (Scan.lift parse) - (fn ctxt => fn sources => + (fn ctxt => fn (txt0, sources) => let - val _ = apply2 ML_Lex.read_source sources |> test |> eval ctxt (Input.pos_of (#1 sources)); + val (ml1, ml2) = apply2 ML_Lex.read_source sources; + val ml0 = ML_Lex.read_source (Input.string txt0); + val _ = test (ml0 @ ml1, ml2) |> eval ctxt (Input.pos_of (#1 sources)); val sep = if kind = "type" then "=" else if kind = "exception" then "of" else ":"; val (txt, idx) = make_text sep sources; val main_text = Document_Output.verbatim 
ctxt - (if kind = "" orelse not show_kind then txt else kind ^ " " ^ txt); + ((if kind = "" orelse not show_kind then "" else kind ^ " ") ^ txt0 ^ txt); val index_text = index |> Option.map (fn def => let val ctxt' = Config.put Document_Antiquotation.thy_output_display false ctxt; val kind' = if kind = "" then " (ML)" else " (ML " ^ kind ^ ")"; val txt' = Latex.block [Document_Output.verbatim ctxt' idx, Latex.string kind']; val like = Document_Antiquotation.approx_content ctxt' idx; in Latex.index_entry {items = [{text = txt', like = like}], def = def} end); in Latex.block (the_list index_text @ [main_text]) end); fun antiquotation_ml0 test kind = antiquotation_ml parse_ml0 test kind false; fun antiquotation_ml1 parse test kind binding = antiquotation_ml parse test kind true binding (SOME true); in val _ = Theory.setup (Latex.index_variants (antiquotation_ml0 test_val "") \<^binding>\ML\ #> Latex.index_variants (antiquotation_ml0 test_op "infix") \<^binding>\ML_infix\ #> Latex.index_variants (antiquotation_ml0 test_type "type") \<^binding>\ML_type\ #> Latex.index_variants (antiquotation_ml0 test_struct "structure") \<^binding>\ML_structure\ #> Latex.index_variants (antiquotation_ml0 test_functor "functor") \<^binding>\ML_functor\ #> antiquotation_ml0 (K []) "text" \<^binding>\ML_text\ NONE #> antiquotation_ml1 parse_ml test_val "" \<^binding>\define_ML\ #> antiquotation_ml1 parse_ml test_op "infix" \<^binding>\define_ML_infix\ #> antiquotation_ml1 parse_type test_type "type" \<^binding>\define_ML_type\ #> antiquotation_ml1 parse_exn test_exn "exception" \<^binding>\define_ML_exception\ #> antiquotation_ml1 parse_ml0 test_struct "structure" \<^binding>\define_ML_structure\ #> antiquotation_ml1 parse_ml0 test_functor "functor" \<^binding>\define_ML_functor\); end; (* URLs *) val escape_url = translate_string (fn c => if c = "%" orelse c = "#" orelse c = "^" then "\\" ^ c else c); val _ = Theory.setup (Document_Output.antiquotation_raw_embedded \<^binding>\url\ 
(Scan.lift Args.embedded_input) (fn ctxt => fn source => let val url = Input.string_of source; val pos = Input.pos_of source; val delimited = Input.is_delimited source; val _ = Context_Position.reports ctxt [(pos, Markup.language_url delimited), (pos, Markup.url url)]; in Latex.enclose_block "\\url{" "}" [Latex.string (escape_url url)] end)); (* formal entities *) local fun entity_antiquotation name check bg en = Document_Output.antiquotation_raw name (Scan.lift Args.name_position) (fn ctxt => fn (name, pos) => let val _ = check ctxt (name, pos) in Latex.enclose_block bg en [Latex.string (Output.output name)] end); val _ = Theory.setup (entity_antiquotation \<^binding>\command\ Outer_Syntax.check_command "\\isacommand{" "}" #> entity_antiquotation \<^binding>\method\ Method.check_name "\\isa{" "}" #> entity_antiquotation \<^binding>\attribute\ Attrib.check_name "\\isa{" "}"); in end; end;