diff --git a/thys/Generalized_Counting_Sort/Algorithm.thy b/thys/Generalized_Counting_Sort/Algorithm.thy new file mode 100644 --- /dev/null +++ b/thys/Generalized_Counting_Sort/Algorithm.thy @@ -0,0 +1,2159 @@ +(* Title: An Efficient Generalization of Counting Sort for Large, possibly Infinite Key Ranges + Author: Pasquale Noce + Software Engineer at HID Global, Italy + pasquale dot noce dot lavoro at gmail dot com + pasquale dot noce at hidglobal dot com +*) + +section "Algorithm's description, analysis, and formalization" + +theory Algorithm + imports Main +begin + +text \ +\null + +\emph{This paper is dedicated to Gaia, my sweet niece, whose arrival has blessed me and my family +with joy and tenderness.} + +\null + +\emph{Moreover, I would like to thank my colleague Iacopo Rippa, who patiently listened to the ideas +underlying sections \ref{SEC1} and \ref{SEC3}, and helped me expand those ideas into complete proofs +by providing me with valuable hints and test data.} +\ + + +subsection "Introduction" + +subsubsection "Counting sort" + +text \ +\emph{Counting sort} is a well-known algorithm that sorts a collection of objects of any kind, as +long as each such object is associated with a signed integer key, according to their respective keys +(cf. \cite{R1}, \cite{R2}). If $xs$ is the input array containing $n$ objects to be sorted, $out$ is +the output, sorted array, and $key$ is the function mapping objects to keys, counting sort works as +follows (assuming arrays to be zero-based): + +\begin{enumerate} + +\item +Search the minimum key $mi$ and the maximum key $ma$ occurring within $xs$ (which can be done via a +single loop over $xs$). + +\item +Allocate an array $ns$ of $ma - mi + 2$ unsigned integers and initialize all its elements to 0. + +\item +For each $i$ from 0 to $n-1$, increase $ns[key(xs[i]) - mi + 1]$ by 1. + +\item +For each $i$ from 2 to $ma - mi$, increase $ns[i]$ by $ns[i - 1]$. + +\item +For each $i$ from 0 to $n-1$, set $out[ns[key(xs[i]) - mi]]$ to $xs[i]$ and increase +$ns[key(xs[i]) - mi]$ by 1. + +\end{enumerate} + +Steps 1 and 2 take $O(n)$ and $O(ma - mi)$ time, respectively. Step 3 counts how many times each +possible key occurs within $xs$, and takes $O(n)$ time. Step 4 computes the offset within $out$ of +the first object in $xs$, if any, having each possible key, and takes $O(ma - mi)$ time. Finally, +step 5 fills $out$, taking $O(n)$ time. Thus, the overall running time is $O(n) + O(ma - mi)$, and +the same is obviously true of memory space. + +If the range of all the keys possibly occurring within $xs$, henceforth briefly referred to as the +\emph{key range}, is known in advance, the first two steps can be skipped by using the minimum and +maximum keys in the key range as $mi$, $ma$ and pre-allocating (possibly statically, rather than +dynamically, in real-world implementations) an array $ns$ of size $ma - mi + 2$. However, this does +not affect the asymptotic running time and memory space required by the algorithm, since both keep +being $O(n) + O(ma - mi)$ independently of the distribution of the keys actually occurring in $xs$ +within the key range. + +As a result, counting sort is suitable for direct use, viz. not just as a subroutine of another +sorting algorithm such as radix sort, only if the key range is not significantly larger than $n$. 
+Indeed, if 100 objects with 100,000 possible keys have to be sorted, accomplishing this task by +allocating, and iterating over, an array of 100,000 unsigned integers to count keys' occurrences +would be quite impractical! Whence the question that this paper will try to answer: how can counting +sort be generalized for direct use in case of a large key range? + +Solving this problem clearly requires to renounce having one counter per key, rather using a bounded +number of counters, independent of the key range's cardinality, and partitioning the key range into +some number of intervals compatible with the upper bound on the counters' number. The resulting key +intervals will then form as many \emph{buckets}, and what will have to be counted is the number of +the objects contained in each bucket. + +Counting objects per bucket, rather than per single key, has the following major consequences, the +former good, the latter bad: + +\begin{itemize} + +\item +\emph{Keys are no longer constrained to be integers, but may rather be elements of any linear order, +even of infinite cardinality.} +\\In fact, in counting sort keys must be integers -- or anything else in one-to-one correspondence +with some subset of the integers, such as alphabet letters -- since this ensures that the key range +contains finitely many keys, so that finitely many counters are needed. Thus, the introduction of an +upper bound for the number of counters makes this constraint vanish. As a result, keys of any kind +are now allowed and the key range can even be infinite (mathematically, since any representation of +the key range on a computer will always be finite). Notably, rational or real numbers may be used as +keys, too. +\\This observation considerably extends the scope of application of the special case where function +$key$ matches the identity function. In counting sort, this option is viable only if the objects to +be sorted are themselves integers, whereas in the generalized algorithm it is viable whenever they +are elements of any linear order, which also happens if they are rational or real numbers. + +\item +\emph{Recursion needs to be introduced, since any bucket containing more than one object is in turn +required to be sorted.} +\\In fact, nothing prevents multiple objects from falling in the same bucket, and while this happens +sorting is not accomplished. Therefore, the generalized algorithm must provide for recursive rounds, +where each round splits any bucket containing multiple objects into finer-grained buckets containing +fewer objects. Recursion will then go on until every bucket contains at most one object, viz. until +there remains no counter larger than one. + +\end{itemize} + +Of course, the fewer recursive rounds are required to complete sorting, the more the algorithm will +be efficient, whence the following, fundamental question: how to minimize the number of the rounds? +That is to say, how to maximize the probability that, as a result of the execution of a round, there +be at most one object in each bucket, so that no more rounds be required? + +The intuitive answer is: first, by making the buckets equiprobable -- or at least, by making their +probabilities as much uniform as possible --, and second, by increasing the number of the buckets as +much as possible. Providing pen-and-paper proofs of both of these statements, and showing how they +can be enforced, is the purpose of the following sections. 
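+
+As a concrete reference, the five steps of counting sort recalled at the beginning of this section
+can be summarized by the following minimal sketch, written in Python merely for illustration (it is
+not part of the formal development below, and assumes a non-empty input list and an integer-valued
+key function):
+
+\begin{verbatim}
+def counting_sort(xs, key):
+    mi = min(key(x) for x in xs)        # step 1: minimum key
+    ma = max(key(x) for x in xs)        # step 1: maximum key
+    ns = [0] * (ma - mi + 2)            # step 2: counters, all initialized to 0
+    for x in xs:                        # step 3: count the keys' occurrences
+        ns[key(x) - mi + 1] += 1
+    for i in range(2, ma - mi + 1):     # step 4: cumulative offsets
+        ns[i] += ns[i - 1]
+    out = [None] * len(xs)              # step 5: stable placement into out
+    for x in xs:
+        out[ns[key(x) - mi]] = x
+        ns[key(x) - mi] += 1
+    return out
+\end{verbatim}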
+\ + +subsubsection "Buckets' probability -- Proof" + +text \ +\label{SEC1} + +Suppose that $k$ objects be split randomly among $n$ equiprobable buckets, where $k \leq n$. This +operation is equivalent to selecting at random a sequence of $k$ buckets, possibly with repetitions, +so that the first object be placed into the first bucket of the sequence, the second object into the +second bucket, and so on. Thus, the probability $P$ that each bucket will contain at most one object +-- which will be called \emph{event E} in what follows -- is equal to the probability of selecting a +sequence without repetitions among all the possible sequences of $k$ buckets formed with the $n$ +given ones. + +Since buckets are assumed to be equiprobable, so are all such sequences. Hence, $P$ is equal to the +ratio of the number of the sequences without repetitions to the number of all sequences, namely: + +\begin{equation} +\label{EQ1} +P=\frac{n!}{(n-k)!n^k} +\end{equation} + +In the special case where $k = n$, this equation takes the following, simpler form: + +\begin{equation} +\label{EQ2} +P=\frac{n!}{n^n} +\end{equation} + +Now, suppose that the $n$ buckets be no longer equiprobable, viz. that they no longer have the same, +uniform probability $1/n$, rather having arbitrary, nonuniform probabilities $p_1$, ..., $p_n$. The +equation for probability $P$ applying to this case can be obtained through an iterative procedure, +as follows. + +Let $i$ be an index in the range 1 to $n$ such that $p_i$ is larger than $1/n$. After swapping index +$i$ for 1, let $x_1$ be the increment in probability $p_1$ with respect to $1/n$, so that $p_1 = +a_0/n + x_1$ with $a_0 = 1$ and $0 < x_1 \leq a_0(n-1)/n$ (as $p_1 = 1$ for $x_1 = a_0(n-1)/n$). +Then, let $P_1$ be the probability of event $E$ in case the first bucket has probability $p_1$ and +all the other $n-1$ buckets have the same, uniform probability $q_1 = a_0/n - x_1/(n-1)$. + +If $k < n$, event $E$ occurs just in case either all $k$ objects fall in as many distinct buckets +with probability $q_1$, or $k - 1$ objects do so whereas the remaining object falls in the bucket +with probability $p_1$. As these events, say $E_A$ and $E_B$, are incompatible, $P_1$ matches the +sum of their respective probabilities. + +Since all the possible choices of $k$ distinct buckets are mutually incompatible, while those of the +buckets containing any two distinct objects are mutually independent, the probability of event $E_A$ +is equal to the product of the following factors: + +\begin{itemize} + +\item +The number of the sequences without repetitions of $k$ buckets formed with the $n-1$ ones with +probability $q_1$, i.e. $(n-1)!/(n-1-k)! = (n-k)(n-1)!/(n-k)!$. + +\item +The probability of any such sequence, i.e. $q_1^k$. + +\end{itemize} + +By virtue of similar considerations, the probability of event $E_B$ turns out to match the product +of the following factors: + +\begin{itemize} + +\item +The number of the sequences without repetitions of $k - 1$ buckets formed with the $n-1$ ones with +probability $q_1$, i.e. $(n-1)!/(n-\bcancel{1}-k+\bcancel{1})! = (n-1)!/(n-k)!$. + +\item +The probability of any such sequence, i.e. $q_1^{k-1}$. + +\item +The number of the possible choices of the object falling in the first bucket, i.e. $k$. + +\item +The probability of the first bucket, i.e. $p_1$. 
+ +\end{itemize} + +Therefore, $P_1$ is provided by the following equation: + +\begin{equation} +\label{EQ3} +\begin{split} +P_1&=\frac{(n-k)(n-1)!}{(n-k)!}\left(\frac{a_0}{n} - \frac{x_1}{n-1}\right)^k\\ +&\quad+k\frac{(n-1)!}{(n-k)!}\left(\frac{a_0}{n} - \frac{x_1}{n-1}\right)^{k-1} +\left(\frac{a_0}{n} + x_1\right) +\end{split} +\end{equation} + +The correctness of this equation is confirmed by the fact that its right-hand side matches that of +equation \eqref{EQ1} for $x_1 = 0$, since $P_1$ must degenerate to $P$ in this case. In fact, being +$a_0 = 1$, it results: + +\begin{align*} +&\frac{(n-k)(n-1)!}{(n-k)!}\left(\frac{a_0}{n} - 0\right)^k ++k\frac{(n-1)!}{(n-k)!}\left(\frac{a_0}{n} - 0\right)^{k-1}\left(\frac{a_0}{n} + 0\right)\\ +&\quad=(n-\bcancel{k}+\bcancel{k})\frac{(n-1)!}{(n-k)!}\left(\frac{a_0}{n}\right)^k\\ +&\quad=\frac{n!}{(n-k)!n^k} +\end{align*} + +If $k = n$, event $E_A$ is impossible, as there is no way to accommodate $n$ objects within $n-1$ +buckets without repetitions. Thus, $P_1$ is given by the following equation, derived by deleting the +first addend and replacing $k$ with $n$ in the right-hand side of equation \eqref{EQ3}: + +\begin{equation} +\label{EQ4} +P_1=n!\left(\frac{a_0}{n} - \frac{x_1}{n-1}\right)^{n-1}\left(\frac{a_0}{n} + x_1\right) +\end{equation} + +Likewise, the right-hand side of this equation matches that of equation \eqref{EQ2} for $x_1 = 0$, +which confirms its correctness. + +The conclusions reached so far can be given a concise form, suitable for generalization, through the +following definitions, where $i$ and $j$ are any two natural numbers such that $0 < k-j \leq n-i$ +and $a_i$ is some assigned real number: + +\begin{align*} +A_{i,j}&\equiv\frac{(n-i)!}{(n-i-k+j)!}\left(\frac{a_i}{n-i}\right)^{k-j}\\ +F_{i,j}&\equiv(k-j+1)A_{i,j}p_i\\ +G_{i,j}&\equiv A_{i,j-1}+F_{i,j} +\end{align*} + +Then, denoting the value of $P$ in the uniform probability case with $P_0$, and $a_0(n-1)/n - x_1$ +with $a_1$, so that $q_1 = a_1/(n-1)$, equations \eqref{EQ1}, \eqref{EQ3}, and \eqref{EQ4} can be +rewritten as follows: + +\begin{align} +\label{EQ5} +P_0&=A_{0,0}\\ +\label{EQ6} +P_1&= +\begin{cases} +G_{1,1}=A_{1,0}+kA_{1,1}p_1&\quad\text{if $k < n$,}\\ +F_{1,1}=kA_{1,1}p_1&\quad\text{if $k = n$.} +\end{cases} +\end{align} + +Even more than for their conciseness, these equations are significant insofar as they show that the +right-hand side of equation \eqref{EQ6} can be obtained from the one of equation \eqref{EQ5} by +replacing $A_{0,0}$ with either $G_{1,1}$ or $F_{1,1}$, depending on whether $k < n$ or $k = n$. + +If $p_i$ matches $q_1$ for any $i$ in the range 2 to $n$, $P = P_1$, thus $P$ is given by equation +\eqref{EQ6}. Otherwise, the procedure that has led to equation \eqref{EQ6} can be applied again. For +some index $i$ in the range 2 to $n$ such that $p_i$ is larger than $q_1$, swap $i$ for 2, and let +$x_2 = p_2 - a_1/(n-1)$, $a_2 = a_1(n-2)/(n-1) - x_2$, with $0 < x_2 \leq a_1(n-2)/(n-1)$. Moreover, +let $P_2$ be the probability of event $E$ if the first two buckets have probabilities $p_1$, $p_2$ +and the other $n-2$ buckets have the same probability $q_2 = a_2/(n-2)$. + +Then, reasoning as before, it turns out that the equation for $P_2$ can be obtained from equation +\eqref{EQ6} by replacing: + +\begin{itemize} + +\item +$A_{1,0}$ with $G_{2,1}$ or $F_{2,1}$, depending on whether $k < n-1$ or $k = n-1$, and + +\item +$A_{1,1}$ with $G_{2,2}$ or $F_{2,2}$, depending on whether $k-1 < n-1$, i.e. $k < n$, or $k-1 = +n-1$, i.e. $k = n$. 
+ +\end{itemize} + +As a result, $P_2$ is provided by the following equation: + +\begin{equation} +\label{EQ7} +P_2= +\begin{cases} +G_{2,1}+kG_{2,2}p_1=A_{2,0}+kA_{2,1}p_2+k[A_{2,1}+(k-1)A_{2,2}p_2]p_1\\ +\quad\text{if $k < n-1$,}\\ +F_{2,1}+kG_{2,2}p_1=kA_{2,1}p_2+k[A_{2,1}+(k-1)A_{2,2}p_2]p_1\\ +\quad\text{if $k = n-1$,}\\ +kF_{2,2}p_1=k(k-1)A_{2,2}p_2p_1\\ +\quad\text{if $k = n$.} +\end{cases} +\end{equation} + +Since the iterative procedure used to derive equations \eqref{EQ6} and \eqref{EQ7} can be further +applied as many times as required, it follows that for any nonuniform probability distribution +$p_1$, ..., $p_n$, the equation for $P$ can be obtained from equation \eqref{EQ5} with $n-1$ steps +at most, where each step consists of replacing terms of the form $A_{i,j}$ with terms of either form +$G_{i+1,j+1}$ or $F_{i+1,j+1}$, depending on whether $k-j < n-i$ or $k-j = n-i$. + +Let us re-use letters $n$, $k$ in lieu of $n-i$ and $k-j$, and use letters $a$, $x$ as aliases for +$a_i$ and $x_{i+1}$. Then, any aforesaid replacement is equivalent to the insertion of either of the +following expressions, regarded as images of as many functions $G$, $F$ of real variable $x$: + +\begin{align} +\nonumber +G(x)&=\frac{(n-k)(n-1)!}{(n-k)!}\left(\frac{a}{n} - \frac{x}{n-1}\right)^k\\ +\label{EQ8} +&\quad+k\frac{(n-1)!}{(n-k)!}\left(\frac{a}{n} - \frac{x}{n-1}\right)^{k-1} +\left(\frac{a}{n} + x\right) +&& \text{for $k < n$,}\\ +\label{EQ9} +F(x)&=n!\left(\frac{a}{n} - \frac{x}{n-1}\right)^{n-1}\left(\frac{a}{n} + x\right) +&& \text{for $k = n$} +\end{align} + +in place of the following expression: + +\begin{equation} +\label{EQ10} +\frac{n!}{(n-k)!}\left(\frac{a}{n}\right)^k= +\begin{cases} +G(0)&\quad\text{if $k < n$,}\\ +F(0)&\quad\text{if $k = n$.} +\end{cases} +\end{equation} + +Equation \eqref{EQ10} can be obtained from equations \eqref{EQ8} and \eqref{EQ9} in the same way as +equations \eqref{EQ3} and \eqref{EQ4} have previously been shown to match equations \eqref{EQ1} and +\eqref{EQ2} for $x_1 = 0$. + +Since every such replacement takes place within a sum of nonnegative terms, $P$ can be proven to be +increasingly less than $P_0$ for increasingly nonuniform probability distributions -- which implies +that the probability of event $E$ is maximum in case of equiprobable buckets -- by proving that +functions $G$ and $F$ are strictly decreasing in $[0, b]$, where $b = a(n-1)/n$. + +The slopes of the segments joining points $(0, G(0))$, $(b, G(b))$ and $(0, F(0))$, $(b, F(b))$ are: + +\begin{align*} +\frac{G(b)-G(0)}{b-0}&=\frac{0-\dfrac{n!}{(n-k)!}\left(\dfrac{a}{n}\right)^k}{b}<0\text{,}\\ +\frac{F(b)-F(0)}{b-0}&=\frac{0-n!\left(\dfrac{a}{n}\right)^n}{b}<0\text{.} +\end{align*} + +Therefore, by Lagrange's mean value theorem, there exist $c, d \in (0, b)$ such that $G'(c) < 0$ and +$F'(d) < 0$. On the other hand, it is: + +\begin{align*} +G'(x)&=-k\frac{(n-1)!}{(n-k)!}\frac{n-k}{n-1}\left(\frac{a}{n}-\frac{x}{n-1}\right)^{k-1}\\ +&\quad-k\frac{(n-1)!}{(n-k)!}\frac{k-1}{n-1}\left(\frac{a}{n}-\frac{x}{n-1}\right)^{k-2} +\left(\frac{a}{n}+x\right)\\ +&\quad+k\frac{(n-1)!}{(n-k)!}\left(\frac{a}{n}-\frac{x}{n-1}\right)^{k-1}\text{,}\\ +F'(x)&=-n!\left(\frac{a}{n}-\frac{x}{n-1}\right)^{n-2} +\left(\frac{a}{n}+x\right) ++n!\left(\frac{a}{n}-\frac{x}{n-1}\right)^{n-1}\text{.} +\end{align*} + +Thus, solving equations $G'(x) = 0$ and $F'(x) = 0$ for $x \neq b$, viz. 
for $a/n - x/(n-1) \neq 0$, +it results: + +\begin{align*} +&G'(x)=0\\ +&\quad\Rightarrow\bcancel{k\frac{(n-1)!}{(n-k)!}\left(\frac{a}{n}-\frac{x}{n-1}\right)^{k-2}} +\biggl[\frac{k-n}{n-1}\left(\frac{a}{n}-\frac{x}{n-1}\right)\\ +&\quad\quad\quad+\frac{1-k}{n-1}\left(\frac{a}{n}+x\right)+\frac{a}{n}-\frac{x}{n-1}\biggr]=0\\ +&\quad\Rightarrow\frac{1}{\bcancel{n(n-1)^2}}\bigl\{(k-n)[(n-1)a-nx]+(1-k)[(n-1)a+n(n-1)x]\\ +&\quad\quad\quad+(n-1)^2a-n(n-1)x\bigr\}=0\\ +&\quad\Rightarrow\bcancel{k(n-1)a}-knx-n(n-1)a+n^2x+(n-1)a+\bcancel{n(n-1)x}\\ +&\quad\quad\quad-\bcancel{k(n-1)a}-kn(n-1)x+(n-1)^2a-\bcancel{n(n-1)x}=0\\ +&\quad\Rightarrow-\bcancel{knx}-\bcancel{n^2a}+\bcancel{na}+n^2x+\bcancel{na}-\bcancel{a} +-kn^2x+\bcancel{knx}+\bcancel{n^2a}-\bcancel{2na}+\bcancel{a}=0\\ +&\quad\Rightarrow\bcancel{n^2(1-k)}x=0\\ +&\quad\Rightarrow x=0\text{,} +\end{align*} + +\begin{align*} +&F'(x)=0\\ +&\quad\Rightarrow\bcancel{n!\left(\frac{a}{n}-\frac{x}{n-1}\right)^{n-2}} +\left(-\bcancel{\frac{a}{n}}-x+\bcancel{\frac{a}{n}}-\frac{x}{n-1}\right)=0\\ +&\quad\Rightarrow\bcancel{\frac{n}{1-n}}x=0\\ +&\quad\Rightarrow x=0\text{.} +\end{align*} + +Hence, there is no $x \in (0, b)$ such that $G'(x) = 0$ or $F'(x) = 0$. Moreover, if there existed +$y, z \in (0, b)$ such that $G'(y) > 0$ or $F'(z) > 0$, by Bolzano's theorem there would also exist +$u, v$ in the open intervals with endpoints $c, y$ and $d, z$, both included in $(0, b)$, such that +$G'(u) = 0$ or $F'(v) = 0$, which is not the case. Therefore, $G'(x)$ and $F'(x)$ are negative for +any $x \in (0, b)$, so that functions $G$ and $F$ are strictly decreasing in $[0, b]$, Q.E.D.. +\ + +subsubsection "Buckets' probability -- Implementation" + +text \ +\label{SEC2} + +Given $n > 1$ buckets, numbered with indices 0 to $n - 1$, and a finite set $A$ of objects having +minimum key $mi$ and maximum key $ma$, let $E(k)$, $I(mi,ma)$ be the following events, defined as +subsets of the whole range $R$ of function $key$, with $k$ varying over $R$: + +\begin{align*} +E(k)&\equiv\{k'\in R.\;k'\leq k\}\\ +I(mi,ma)&\equiv\{k'\in R.\;mi\leq k'\leq ma\} +\end{align*} + +Furthermore, define functions $r$, $f$ as follows: + +\begin{align*} +r(k,n,mi,ma)&\equiv (n-1)\cdot P(E(k)\;|\;I(mi,ma))\\ +f(k,n,mi,ma)&\equiv floor(r(k,n,mi,ma)) +\end{align*} + +where $P(E(k)\;|\;I(mi,ma))$ denotes the conditional probability of event $E(k)$, viz. for a key not +to be larger than $k$, given event $I(mi,ma)$, viz. if the key is comprised between $mi$ and $ma$. + +Then, the buckets' probabilities can be made as much uniform as possible by placing each object $x +\in A$ into the bucket whose index matches the following value: + +\begin{equation*} +index(key,x,n,mi,ma)\equiv f(key(x),n,mi,ma) +\end{equation*} + +For example, given $n = 5$ buckets, suppose that the image of set $A$ under function $key$ consists +of keys $k_1 = mi$, $k_2$, ..., $k_9 = ma$, where the conditional probabilities for a key comprised +between $k_1$ and $k_9$ to match each of these keys have the following values: + +\begin{align*} +P_1&=0.05,\\ +P_2&=0.05,\\ +P_3&=0.15,\\ +P_4&=0.075,\\ +P_5&=0.2,\\ +P_6&=0.025,\\ +P_7&=0.1,\\ +P_8&=0.25,\\ +P_9&=0.1 +\end{align*} + +Evidently, there is no way of partitioning set $\{k_1, ..., k_9\}$ into five equiprobable subsets +comprised of contiguous keys. 
However, it results: + +\begin{equation*} +floor\left(4\cdot\sum_{i=1}^{n}P_i\right)= +\begin{cases} +0&\quad\text{for $n = 1, 2$,}\\ +1&\quad\text{for $n = 3, 4$,}\\ +2&\quad\text{for $n = 5, 6, 7$,}\\ +3&\quad\text{for $n = 8$,}\\ +4&\quad\text{for $n = 9$.} +\end{cases} +\end{equation*} + +Hence, in spite of the highly nonuniform distribution of the keys' probabilities -- key $k_8$'s +probability is 10 times that of key $k_6$ --, function $index$ manages all the same to split the +objects in $A$ so as to make the buckets' probabilities more uniform -- with the maximum one being +about 3 times the minimum one --, as follows: + +\begin{itemize} + +\item +Bucket 0 has probability 0.1, as it collects the objects with keys $k_1$, $k_2$. + +\item +Bucket 1 has probability 0.225, as it collects the objects with keys $k_3$, $k_4$. + +\item +Bucket 2 has probability 0.325, as it collects the objects with keys $k_5$, $k_6$, $k_7$. + +\item +Bucket 3 has probability 0.25, as it collects the objects with key $k_8$. + +\item +Bucket 4 has probability 0.1, as it collects the objects with key $k_9$. + +\end{itemize} + +Remarkably, function $index$ makes the buckets' probabilities exactly or almost uniform -- meaning +that the maximum one is at most twice the minimum nonzero one, possibly except for the last bucket +alone -- in the following most common, even though special, cases: + +\begin{enumerate} + +\item +$I(mi,ma)$ is a finite set of equiprobable keys. + +\item +$I(mi,ma)$ is a closed interval of real numbers, i.e. $I(mi,ma) = [mi,ma] \subset \mathbb{R}$, with +$P(\{mi\}) = 0$, and function $r$ is continuous for $k \in [mi,ma]$. + +\end{enumerate} + +In case 1, let $m$ be the cardinality of $I(mi,ma)$. It is $m > 0$ since $mi \in I(mi,ma)$, so that +each key in $I(mi,ma)$ has probability $1/m$. + +If $m \leq n - 1$, then $(n - 1)/m \geq 1$, thus $f$ is nonzero and strictly increasing for $k \in +I(mi,ma)$. Thus, in this subcase function $index$ fills exactly $m$ buckets, one for each single key +in $I(mi,ma)$, whereas the remaining $n - m$ buckets, particularly the first one, are left unused. +Therefore, every used bucket has probability $1/m$. + +If $m > n - 1$ and $m$ is divisible by $n - 1$, let $q > 1$ be the quotient of the division, so that +$m = q(n - 1)$. Dividing both sides of this equation by $m(n - 1)$, it turns out that $1/(n - 1) = +q/m$, and then $1/(n - 1) - 1/m = (q - 1)/m$. Hence, $f$ matches zero for the first $q - 1$ keys in +$I(mi,ma)$, increases by one for each of the $n - 2$ subsequent groups of $q$ contiguous keys, and +reaches value $n - 1$ in correspondence with the last key. Indeed, $q - 1 + q(n - 2) + 1 = q + +q(n - 2) = q(n - 1) = m$. + +Consequently, in this subcase function $index$ places the objects mapped to the first $q - 1$ keys +into the first bucket -- which then has probability $(q - 1)/m$ --, the objects mapped to the $i$-th +subsequent group of $q$ keys, where $1 \leq i \leq n - 2$, into the bucket with index $i$ -- which +then has probability $q/m$ -- and the objects mapped to the last key into the last bucket -- which +then has probability $1/m$ --. Since $2(q - 1)/m = 2q/m - 2/m \geq 2q/m - q/m = q/m$, the maximum +probability is at most twice the minimum one, excluding the last bucket if $q > 2$. + +If $m > n - 1$ and $m$ is not divisible by $n - 1$, let $q$, $r$ be the quotient and the remainder +of the division, where $q > 0$ and $n - 1 > r > 0$. 
For any $i > 0$, it is:
+
+\begin{align}
+\nonumber
+&m=q(n-1)+r\\
+\nonumber
+&\quad\Rightarrow\frac{\bcancel{m}}{\bcancel{m}(n-1)}=
+\frac{q\bcancel{(n-1)}}{m\bcancel{(n-1)}}+\frac{r}{m(n-1)}\\
+\nonumber
+&\quad\Rightarrow\frac{i}{n-1}=\frac{iq}{m}+i\frac{r}{m(n-1)}\\
+\label{EQ11}
+&\quad\Rightarrow\frac{iq}{m}=\frac{i}{n-1}-\left(i\frac{r}{n-1}\right)\frac{1}{m}\\
+\label{EQ12}
+&\quad\Rightarrow\frac{iq+1}{m}=\frac{i}{n-1}+\left(1-i\frac{r}{n-1}\right)\frac{1}{m}
+\end{align}
+
+Both equations \eqref{EQ11} and \eqref{EQ12} have something significant to say for $i = 1$.
+
+Equation \eqref{EQ11} takes the following form:
+
+\begin{equation*}
+\frac{q}{m}=\frac{1}{n-1}-\left(\frac{r}{n-1}\right)\frac{1}{m}
+\end{equation*}
+
+where $r/(n - 1) > 0$, so that $q/m < 1/(n - 1)$. This implies that, if $k$ is the first key in
+$I(mi,ma)$ for which $f$ matches any given value, the subsequent $q - 1$ keys are never sufficient
+to increase $f$ by one. Thus, function $index$ fills every bucket but the last one -- which collects
+the objects mapped to the last key only -- with the objects mapped to $1 + q - 1 = q$ keys at least.
+
+For its part, equation \eqref{EQ12} takes the following form:
+
+\begin{equation*}
+\frac{q+1}{m}=\frac{1}{n-1}+\left(1-\frac{r}{n-1}\right)\frac{1}{m}
+\end{equation*}
+
+where $1 - r/(n - 1) > 0$, so that $(q + 1)/m > 1/(n - 1)$. Therefore, the $q$ keys following any
+aforesaid key $k$ are always sufficient to increase $f$ by one. Hence, function $index$ fills every
+bucket with the objects mapped to $1 + q = q + 1$ keys at most. A further consequence is that $f$
+changes from zero to one for $k$ matching the $(q + 1)$-th key in $I(mi,ma)$, which entails that the
+first bucket collects the objects mapped to exactly the first $q$ keys.
+
+Which is the first $i_1$, if any, such that the bucket with index $i_1$ collects the objects mapped
+to $q + 1$, rather than $q$, keys? Such bucket, if any, is preceded by $i_1$ buckets (as indices are
+zero-based), whose total probability is $i_1q/m$ (as each of those buckets accommodates a group of
+$q$ keys). So, $i_1$ is the least index, if any, such that $0 < i_1 < n - 1$ and $[(i_1 + 1)q + 1]/m
+< (i_1 + 1)/(n - 1)$. Rewriting the latter inequality using equation \eqref{EQ12}, it results:
+
+\begin{align*}
+&\bcancel{\frac{i_1+1}{n-1}}+\left[1-(i_1+1)\frac{r}{n-1}\right]\frac{1}{m}<
+\bcancel{\frac{i_1+1}{n-1}}\\
+&\quad\Rightarrow\left[1-(i_1+1)\frac{r}{n-1}\right]\bcancel{\frac{1}{m}}<0\\
+&\quad\Rightarrow(i_1+1)\frac{r}{n-1}>1\\
+&\quad\Rightarrow i_1>\frac{n-1}{r}-1
+\end{align*}
+
+where $(n - 1)/r - 1 > 0$ since $r < n - 1$. Hence, index $i_1$ exists just in case:
+
+\begin{align*}
+&\frac{n-1}{r}-1<n-2\\
+&\quad\Rightarrow\frac{n-1}{r}<n-1\\
+&\quad\Rightarrow r>1
+\end{align*}
+
+Likewise, let $i_2$ be the next index, if any, such that the bucket with index $i_2$ accommodates a
+group of $q + 1$ keys. Such bucket, if any, is preceded by $i_2 - 1$ buckets accommodating $q$ keys
+and one bucket accommodating $q + 1$ keys, whose total probability is $(i_2q + 1)/m$. Thus, $i_2$ is
+the least index, if any, such that $i_1 < i_2 < n - 1$ and $[(i_2 + 1)q + 2]/m < (i_2 + 1)/(n - 1)$.
+Adding term $1/m$ to both sides of equation \eqref{EQ12}, the latter inequality can be rewritten as
+follows:
+
+\begin{align*}
+&\bcancel{\frac{i_2+1}{n-1}}+\left[2-(i_2+1)\frac{r}{n-1}\right]\frac{1}{m}<
+\bcancel{\frac{i_2+1}{n-1}}\\
+&\quad\Rightarrow\left[2-(i_2+1)\frac{r}{n-1}\right]\bcancel{\frac{1}{m}}<0\\
+&\quad\Rightarrow(i_2+1)\frac{r}{n-1}>2\\
+&\quad\Rightarrow i_2>\frac{2(n-1)}{r}-1
+\end{align*}
+
+where $2(n - 1)/r - 1 > [(n - 1)/r - 1] + 1 \geq i_1$. Hence, index $i_2$ exists just in case:
+
+\begin{align*}
+&\frac{2(n-1)}{r}-1<n-2\\
+&\quad\Rightarrow\frac{2(n-1)}{r}<n-1\\
+&\quad\Rightarrow r>2
+\end{align*}
+
+To sum up, in this subcase function $index$ turns out to work as follows:
+
+\begin{itemize}
+
+\item
+The $r - 1$ buckets whose indices $i_j$ match the least solutions of inequalities $i_j > j(n - 1)/r
+- 1$, for $1 \leq j \leq r - 1$, accommodate a group of $q + 1$ contiguous keys each, so that each
+one has probability $(q + 1)/m$.
+
+\item
+The other $n - 1 - (r - 1) = n - r$ buckets excluding the last one, particularly the first bucket,
+accommodate a group of $q$ contiguous keys each, so that each one has probability $q/m$.
+
+\item
+The last bucket accommodates the last key alone, so that its probability is $1/m$.
+
+\end{itemize}
+
+Indeed, $(q + 1)(r - 1) + q(n - r) + 1 = \bcancel{qr} - q + r - \bcancel{1} + qn - \bcancel{qr} +
+\bcancel{1} = q(n - 1) + r = m$. Furthermore, being $2q/m \geq (q + 1)/m$, the maximum value among
+buckets' probabilities is at most twice the minimum one, excluding the last bucket if $q > 2$.
+
+Two further observations can be made concerning case 1. First, if $m > n - 1$, then the larger $q$
+gets, the more efficient it becomes to use the buckets' number $n$ itself instead of $n - 1$ within
+function $r$, placing the objects with index $n$, viz. mapped to the last key, into the bucket with
+index $n - 1$. In fact, this ensures that all the buckets have almost uniform probabilities rather
+than leaving a bucket, the last one, with a small, or even negligible, probability.
+
+Second, if keys are integers and $I(mi,ma)$ includes all the integers comprised between $mi$ and
+$ma$, it is $m = ma - mi + 1$, whereas the cardinality of set $E(k) \cap I(mi,ma)$ is $k - mi + 1$
+for any $k \in I(mi,ma)$. Therefore, it results:
+
+\begin{equation*}
+r(k,n,mi,ma)=(n-1)\frac{k - mi + 1}{ma - mi + 1},
+\end{equation*}
+
+so that function $r$ resembles the approximate rank function $R$ described in \cite{R3}.
+
+In case 2, let $Z$ be the set of the integers $i$ such that $0 \leq i \leq n-1$. As $r(k,n,mi,ma)$
+matches 0 for $k = mi$ and $n-1$ for $k = ma$, by the intermediate value theorem, for each $i \in Z$
+there exists a least $k_i \in [mi,ma]$ such that $r(k_i,n,mi,ma) = i$, where $k_0 = mi$. Then, let
+$B_i = [k_i, k_{i+1})$ for each $i \in Z - \{n-1\}$ and $B_{n-1} = [k_{n-1}, ma]$.
+
+For any $i \in Z - \{n-1\}$, $k \in B_i$, it is $r(k,n,mi,ma) \neq i+1$, since otherwise there would
+exist some $k < k_{i+1}$ in $[mi,ma]$ such that $r(k,n,mi,ma) = i+1$. On the other hand, being $k <
+k_{i+1}$, it is $r(k,n,mi,ma) \leq i+1$, since function $r$ is increasing with respect to variable
+$k$. Hence, it turns out that $r(k,n,mi,ma) < i+1$. Moreover, the monotonicity of $r$ also implies
+that $r(k,n,mi,ma) \geq i$. Therefore, it is $f(k,n,mi,ma) = i$, so that for any $i \in Z$, function
+$index$ fills the bucket with index $i$ with the objects mapped to the keys in $B_i$.
+ +Consequently, for each $i \in Z - \{n-1\}$, the probability of the bucket with index $i$ is: + +\begin{align*} +&P(B_i\;|\;I(mi,ma))\\ +&\quad=\frac{P(B_i\cap I(mi,ma))}{P(I(mi,ma))}\\ +&\quad=\frac{P((k_i,k_{i+1}]\cap I(mi,ma))}{P(I(mi,ma))}\\ +&\quad=\frac{P(E(k_{i+1})\cap I(mi,ma))-P(E(k_i)\cap I(mi,ma))}{P(I(mi,ma))}\\ +&\quad=\frac{P(E(k_{i+1})\cap I(mi,ma))}{P(I(mi,ma))}-\frac{P(E(k_i)\cap I(mi,ma))}{P(I(mi,ma))}\\ +&\quad=P(E(k_{i+1})\;|\;I(mi,ma))-P(E(k_i)\;|\;I(mi,ma))\\ +&\quad=\frac{(n-1)\cdot P(E(k_{i+1})\;|\;I(mi,ma))-(n-1)\cdot P(E(k_i)\;|\;I(mi,ma))}{n-1}\\ +&\quad=\frac{r(k_{i+1},n,mi,ma)-r(k_i,n,mi,ma)}{n-1}\\ +&\quad=\frac{\bcancel{i}+1-\bcancel{i}}{n-1}\\ +&\quad=\frac{1}{n-1} +\end{align*} + +Observe that the computation uses: + +\begin{itemize} + +\item +The definition of conditional probability. + +\item +The fact that events $B_i$ and $(k_i, k_{i+1}]$ differ by singletons $\{k_i\}$ and $\{k_{i+1}\}$, +whose probability is zero. +\\Indeed, it is $P(\{k_0\}) = P(\{mi\}) = 0$ by hypothesis, whereas for any $k \in (mi,ma]$, it is +$P(\{k\}) = 0$ due to the continuity of function $r$, and then of function $P(E(k) \cap I(mi,ma))$, +in point $k$. In fact, for any $k' \in (mi,k)$ it is $E(k') \cap I(mi,ma) = [mi,k'] \subset [mi,k)$, +so that $P(E(k') \cap I(mi,ma)) \leq P([mi,k))$. However, it is also $E(k) \cap I(mi,ma) = [mi,k] = +[mi,k) \cup \{k\}$, so that $P(E(k) \cap I(mi,ma)) = P([mi,k)) + P(\{k\})$. Thus, if $P(\{k\}) > 0$, +then $P(E(k) \cap I(mi,ma)) > P([mi,k))$, in contradiction with the assumption that: + +\begin{equation*} +\lim_{k'\to k^-}P(E(k') \cap I(mi,ma))=P(E(k) \cap I(mi,ma)) +\end{equation*} + +\item +The fact that event $E(k_{i+1}) \cap I(mi,ma)$ is equal to the union of the disjoint events $E(k_i) +\cap I(mi,ma)$ and $(k_i, k_{i+1}] \cap I(mi,ma)$, so that the probability of the former event is +equal to the sum of the probabilities of the latter ones. + +\end{itemize} + +As a result, all the buckets but the last one are equiprobable, whereas the last one has probability +zero. Thus, in this case it is again more efficient to replace $n - 1$ with $n$ within function $r$, +assigning the objects with index $n$, viz. mapped to the keys falling in $B_n$, to the bucket with +index $n - 1$, which ensures that all the buckets have uniform probabilities. + +If function $r$ is linear for $k \in [mi,ma]$, viz. if interval $[mi,ma]$ is endowed with a constant +probability density, then the function's graph (with factor $n - 1$ replaced by $n$) is the straight +line passing through points $(mi,0)$ and $(ma,n)$. Therefore, it results: + +\begin{equation*} +r(k,n,mi,ma)=n\frac{k - mi}{ma - mi}, +\end{equation*} + +so that function $r$ matches the approximate rank function $R$ described in \cite{R3}. 
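+
+As an illustration of the two special cases just examined, the resulting index functions can be
+sketched as follows (in Python, outside the formal development; the names index\_int and
+index\_real, as well as the assumption $mi < ma$, are introduced only for this example):
+
+\begin{verbatim}
+from math import floor
+
+def index_int(k, n, mi, ma):
+    # case 1: integer keys, all values in [mi, ma] equiprobable,
+    # with factor n - 1 as in the displayed formula for r
+    return floor((n - 1) * (k - mi + 1) / (ma - mi + 1))
+
+def index_real(k, n, mi, ma):
+    # case 2: real keys with constant density on [mi, ma], with
+    # factor n - 1 replaced by n; keys of index n (viz. k = ma)
+    # are assigned to the last bucket, of index n - 1
+    return min(n - 1, floor(n * (k - mi) / (ma - mi)))
+\end{verbatim}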
+\ + +subsubsection "Buckets' number -- Proof" + +text \ +\label{SEC3} + +Given $n$ equiprobable buckets and $k$ objects to be partitioned randomly among such buckets, where +$1 < k \leq n$, the probability $P_{n,k}$ that each bucket will contain at most one object is given +by equation \eqref{EQ1}, namely: + +\begin{equation*} +P_{n,k}=\frac{n!}{(n-k)!n^k} +\end{equation*} + +Thus, it is: + +\begin{align*} +&P_{n+1,k}-P_{n,k}\\ +&\quad=\frac{(n+1)!}{(n-k+1)(n-k)!(n+1)^k}-\frac{n!}{(n-k)!n^k}\\ +&\quad=\frac{(n+1)!n^k-n!(n-k+1)(n+1)^k}{(n-k+1)(n-k)!n^k(n+1)^k} +\end{align*} + +Using the binomial theorem and Pascal's rule, the numerator of the fraction in the right-hand side +of this equation can be expressed as follows: + +\begin{align*} +&(n+1)!n^k-(n-k+1)n!(n+1)^k\\ +&\quad=n!(n+1)n^k+n!k(n+1)^k-n!n(n+1)^k-n!(n+1)^k\\ +&\quad=n!n^{k+1}+n!n^k\\ +&\quad\quad+n!k\left[n^k+\binom{k}{1}n^{k-1}+\binom{k}{2}n^{k-2}+... ++\binom{k}{k-1}n+\binom{k}{k}\right]\\ +&\quad\quad-n!n\left[n^k+\binom{k}{1}n^{k-1}+\binom{k}{2}n^{k-2}+... ++\binom{k}{k-1}n+\binom{k}{k}\right]\\ +&\quad\quad-n!\left[n^k+\binom{k}{1}n^{k-1}+\binom{k}{2}n^{k-2}+... ++\binom{k}{k-1}n+\binom{k}{k}\right]\\ +&\quad=\bcancel{n!n^{k+1}}+\bcancel{n!n^k}+\bcancel{n!kn^k}\\ +&\quad\quad+n!k\binom{k}{1}n^{k-1}+n!k\binom{k}{2}n^{k-2}+... ++n!k\binom{k}{k-1}n+n!k\\ +&\quad\quad-\bcancel{n!n^{k+1}}-\bcancel{n!kn^k}-n!\binom{k}{2}n^{k-1}-... +-n!\binom{k}{k-1}n^2-n!\binom{k}{k}n\\ +&\quad\quad-\bcancel{n!n^k}-n!\binom{k}{1}n^{k-1}-n!\binom{k}{2}n^{k-2}-... +-n!\binom{k}{k-1}n-n!\\ +&\quad=n!(k-1)+n!n^{k-1}\left[k\binom{k}{1}-\binom{k}{1}-\binom{k}{2}\right]+...\\ +&\quad\quad+n!n\left[k\binom{k}{k-1}-\binom{k}{k-1}-\binom{k}{k}\right]\\ +&\quad=n!(k-1)+n!\cdot\sum_{i=1}^{k-1} +n^{k-i}\left\{k\binom{k}{i}-\left[\binom{k}{i}+\binom{k}{i+1}\right]\right\}\\ +&\quad=n!(k-1)+n!\cdot\sum_{i=1}^{k-1} +n^{k-i}\left[k\binom{k}{i}-\binom{k+1}{i+1}\right]\\ +&\quad=n!(k-1)+n!\cdot\sum_{i=1}^{k-1} +n^{k-i}\left[\frac{kk!}{i!(k-i)!}-\frac{(k+1)!}{(i+1)i!(k-i)!}\right]\\ +&\quad=n!(k-1)+n!\cdot\sum_{i=1}^{k-1} +n^{k-i}k!\frac{(i+1)k-(k+1)}{(i+1)i!(k-i)!}\\ +&\quad=n!(k-1)+n!\cdot\sum_{i=1}^{k-1} +n^{k-i}k!\frac{ik-1}{(i+1)i!(k-i)!}>0 +\end{align*} + +Therefore, for any fixed $k > 1$, sequence $(P_{n,k})_{n \geq k}$ is strictly increasing, viz. the +larger $n$ is, such is also the probability that each of the $n$ equiprobable buckets will contain +at most one of the $k$ given objects. + +Moreover, it is $P_{n,k} = [n(n-1)(n-2)(n-3)...(n-k+1)]/n^k < n^k/n^k = 1$, as the product enclosed +within the square brackets comprises $k$ factors, one equal to $n$ and the other ones less than $n$. + +On the other hand, it turns out that: + +\begin{align*} +&n(n-1)(n-2)(n-3)...(n-k+1)\\ +&\quad=n^2[(n-2)(n-3)...(n-k+1)]-n[(n-2)(n-3)...(n-k+1)]\\ +&\quad\geq n^2[(n-2)(n-3)...(n-k+1)]-n\cdot n^{k-2}\\ +&\quad=n[n(n-2)(n-3)...(n-k+1)]-n^{k-1}\\ +&\quad=n\cdot n^2[(n-3)...(n-k+1)]-n\cdot 2n[(n-3)...(n-k+1)]-n^{k-1}\\ +&\quad\geq n^3[(n-3)...(n-k+1)]-2n^2\cdot n^{k-3}-n^{k-1}\\ +&\quad=n^2[n(n-3)...(n-k+1)]-(1+2)n^{k-1}\;... 
+\end{align*} + +Hence, applying the same line of reasoning until the product within the square brackets disappears, +it results: + +\begin{align*} +&n(n-1)(n-2)(n-3)...(n-k+1)\\ +&\quad\geq n^k-[1+2+...+(k-1)]n^{k-1}\\ +&\quad=n^k-\frac{k(k-1)}{2}n^{k-1}, +\end{align*} + +so that: + +\begin{equation*} +P_{n,k}=\frac{n(n-1)(n-2)(n-3)...(n-k+1)}{n^k}\geq 1-\frac{k(k-1)}{2n} +\end{equation*} + +Therefore, for any fixed $k > 1$, the terms of sequence $(P_{n,k})_{n \geq k}$ are comprised between +the corresponding terms of sequence $(1 - k(k-1)/2n)_{n \geq k}$ and constant sequence $(1)_{n \geq +k}$. Since both of these sequences converge to 1, by the squeeze theorem it is: + +\begin{equation*} +\lim_{n\to\infty}P_{n,k}=1, +\end{equation*} + +viz. the larger $n$ is, the closer to 1 is the probability that each of the $n$ equiprobable buckets +will contain at most one of the $k$ given objects. + +As a result, the probability of placing at most one object into each bucket in any algorithm's round +is maximized by increasing the number of the buckets as much as possible, Q.E.D.. +\ + +subsubsection "Buckets' number -- Implementation" + +text \ +\label{SEC4} + +Let $n$ be the number of the objects to be sorted, and $p$ the upper bound on the counters' number +-- and then on the buckets' number as well, since there must be exactly one counter per bucket --. +This means that before the round begins, the objects to be split are located in $m$ buckets $B_1$, +..., $B_m$, where $0 < m \leq p$, respectively containing $n_1$, ..., $n_m$ objects, where $n_i > 0$ +for each $i$ from 1 to $m$ and $n_1 + ... + n_m = n$. + +Moreover, let $c$ be the number of the objects known to be the sole elements of their buckets, viz. +to require no partition into finer-grained buckets, at the beginning of a given algorithm's round. +Then, the number of the objects requiring to be split into finer-grained buckets in that round is +$n - c$, whereas the number of the available buckets is $p - c$, since $c$ counters must be left to +store as many 1s, one for each singleton bucket. + +How to compute $c$? At first glance, the answer seems trivial: by counting, among the counters input +(either by the algorithm's caller or by the previous round) to the round under consideration, those +that match 1. However, this value would not take into account the fact that, for each non-singleton +bucket, the algorithm must find the leftmost occurrence of the minimum key, as well as the rightmost +occurrence of the maximum key, and place the corresponding objects into two new singleton buckets, +which shall be the first and the last finer-grained bucket, respectively. + +The most fundamental reason for this is that, as a result of the partition of such a bucket, nothing +prevents all its objects from falling in the same finer-grained bucket -- particularly, this happens +whenever all its objects have the same key --, in which case the algorithm does not terminate unless +some object is removed from the bucket prior to the partition, so as to reduce its size. Just as +clearly, the algorithm must know where to place the finer-grained buckets containing the removed +objects with respect to the finer-grained buckets resulting from the partition. This is exactly what +is ensured by removing objects with minimum or maximum keys, whereas selecting the leftmost or the +rightmost ones, respectively, preserves the algorithm's stability. 
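+
+For instance, finding both positions can be sketched as follows (in Python, merely as an
+illustration; the formal counterparts are the functions mini and maxi defined in the sequel):
+
+\begin{verbatim}
+def mini_maxi(ws, key):
+    # positions of the leftmost minimum and of the rightmost maximum
+    # key within a non-empty bucket ws
+    nmi, nma = 0, 0
+    for i in range(1, len(ws)):
+        if key(ws[i]) < key(ws[nmi]):    # strict: keep the leftmost minimum
+            nmi = i
+        if key(ws[nma]) <= key(ws[i]):   # non-strict: keep the rightmost maximum
+            nma = i
+    return nmi, nma
+\end{verbatim}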
+ +Actually, the algorithm's termination requires the removal of at least one object per non-singleton +bucket, so the removal of one object only, either with minimum or maximum key, would be sufficient. +Nonetheless, the leftmost minimum and the rightmost maximum can be searched via a single loop, and +finding both of them enables to pass them as inputs to the function $index$ described in section +\ref{SEC2}, or to whatever other function used to split buckets into finer-grained ones. Moreover, +non-singleton buckets whose objects all have the same key can be detected as those whose minimum and +maximum keys are equal. This allows to optimize the algorithm by preventing it from unnecessarily +applying multiple recursive rounds to any such bucket; it shall rather be left as is, just replacing +its counter with as many 1s as its size to indicate that it is already sorted. + +Therefore, as the round begins, the objects already known to be placed in singleton buckets are one +for each bucket whose counter matches 1, and two for each bucket whose counter is larger than 1. As +a result, $c$ shall be computed as follows. First, initialize $c$ to zero. Then, for each $i$ from 1 +to $m$, increase $c$ by one if $n_i = 1$, by two otherwise. + +Conversely, for any such $i$, the number $N_i$ of the objects contained in bucket $B_i$ having to be +partitioned into finer-grained buckets is 0 if $n_i = 1$, $n_i - 2$ otherwise, so that $N_1 + ... + +N_m = n - c$. According to the findings of section \ref{SEC3}, the number $N'_i$ of the resulting +finer-grained buckets should be maximized, and the most efficient way to do this is to render $N'_i$ +proportional to $N_i$, since otherwise, viz. if some buckets were preferred to some other ones, the +unprivileged buckets would form as many bottlenecks. + +This can be accomplished by means of the following procedure. First, initialize integers $R$ and $U$ +to 0. Then, for each $i$ from 1 to $m$, check whether $N_i \leq 1$: + +\begin{itemize} + +\item +If so, set $N'_i$ to $N_i$. +\\In fact, no finer-grained bucket is necessary if there are no objects to be split, while a single +finer-grained bucket is sufficient for a lonely object. + +\item +Otherwise, perform the integer division of $N_i \cdot (p - c) + R$ by $n - c$, and set integer $Q$ +to the resulting quotient and $R$ to the resulting remainder. Then, if the minimum and maximum keys +occurring in bucket $B_i$ are equal, increase $U$ by $Q - N_i$, otherwise set $N'_i$ to $U + Q$ and +reset $U$ to 0. +\\In fact, as observed above, if its minimum and maximum keys are equal, bucket $B_i$ can be split +into $n_i = N_i + 2$ singleton buckets. Hence, the difference $Q - N_i$ between the number of the +available finer-grained buckets, i.e. $Q + 2$ (where 2 is the number of the buckets containing the +leftmost minimum and the rightmost maximum), and the number of those being used, i.e. $N_i + 2$, can +be added to the total number $U$ of the available finer-grained buckets still unused in the current +round. Such buckets can then be utilized as soon as a bucket $B_j$ with $N_j > 1$ whose minimum and +maximum keys do not match is encountered next, in addition to those already reserved for $B_j$. + +\end{itemize} + +Of course, for any $i$ from 1 to $m$ such that $N_i > 1$, it is $N_i \leq Q$ -- viz. 
the number of +the objects in $B_i$ to be split is not larger than that of the finer-grained buckets where they are +to be placed even if $U = 0$, so that the probability of placing at most one object into each bucket +is nonzero -- just in case $n - c \leq p - c$, i.e. $n \leq p$. Indeed, it will be formally proven +that for $n \leq p$, this procedure is successful in maximizing the buckets' number in each round +never exceeding the upper bound $p$. +\ + +subsubsection "Generalized counting sort (GCsort)" + +text \ +The conclusions of the efficiency analysis performed so far, put together, result in the following +\emph{generalized counting sort (GCsort)} algorithm. + +Let $xs$ be the input array containing $n$ objects to be sorted, and $ns$ an array of $p$ integers, +where $0 < p$ and $n \leq p$. Moreover, let $xs'$ and $ns'$ be two further arrays of the same type +and size of $xs$ and $ns$, respectively, and let $i$, $i'$, and $j$ be as many integers. + +Then, GCsort works as follows (assuming arrays to be zero-based): + +\begin{enumerate} + +\item +Initialize the first element of $ns$ to $n$ and any other element to 0. + +\item +Check whether $ns$ contains any element larger than 1. +\\If not, terminate the algorithm and output $xs$ as the resulting sorted array. + +\item +Initialize $i$, $i'$, and $j$ to 0. + +\item +Check whether $ns[i] = 1$ or $ns[i] > 1$: + + \begin{enumerate} + + \item + In the former case, set $xs'[j]$ to $xs[j]$ and $ns'[i']$ to 1. + \\Then, increase $i'$ and $j$ by 1. + + \item + In the latter case, partition the bucket comprised of objects $xs[j]$ to $xs[j+ns[i]-1]$ into + finer-grained buckets according to section \ref{SEC4}, storing the resulting $n'$ buckets in + $xs'[j]$ to $xs'[j+ns[i]-1]$ and their sizes in $ns'[i']$ to $ns'[i'+n'-1]$. + \\Then, increase $i'$ by $n'$ and $j$ by $ns[i]$. + + \end{enumerate} + +\item +Increase $i$ by 1, and then check whether $i < p$. +\\If so, go back to step 4. +\\Otherwise, perform the following operations: + + \begin{enumerate} + + \item + If $i' < p$, set integers $ns'[i']$ to $ns'[p-1]$ to 0. + + \item + Swap addresses $xs$ and $xs'$, as well as addresses $ns$ and $ns'$. + + \item + Go back to step 2. + + \end{enumerate} + +\end{enumerate} + +Since the algorithm is tail-recursive, the memory space required for its execution matches the one +required for a single recursive round, which is $O(n) + O(p)$. + +The best-case running time can be computed easily. The running time taken by step 1 is equal to $p$. +Moreover, the partition of a bucket into finer-grained ones is performed by determining their sizes, +computing the cumulative sum of such sizes, and rearranging the bucket's objects according to the +resulting offsets. All these operations only involve sequential, non-nested loops, which iterate +through either the objects or the finer-grained counters pertaining to the processed bucket alone. +Hence, the running time taken by a single recursive round is $O(n) + O(p)$, so that in the best case +where at most one round is executed after step 1, the running time taken by the algorithm is $O(n) + +O(p)$, too. + +The asymptotic worst-case running time can be computed as follows. Let $t_{n,p}$ be the worst-case +running time taken by a single round. As $t_{n,p}$ is $O(n) + O(p)$, there exist three real numbers +$a > 0$, $b > 0$, and $c$ such that $t_{n,p} \leq an + bp + c$. 
Moreover, let $U_{n,p}$ be the set +of the $p$-tuples of natural numbers such that the sum of their elements matches $n$, and $max(u)$ +the maximum element of a given $u \in U_{n,p}$. Finally, let $T_{n,p,u}$ be the worst-case running +time taken by the algorithm if it starts from step 2, viz. skipping step 1, using as initial content +of array $ns$ the $p$-tuple $u \in U_{n,p}$. + +Then, it can be proven by induction on $max(u)$ that: + +\begin{equation} +\label{EQ13} +T_{n,p,u}\leq +\begin{cases} +a\dfrac{max(u)}{2}n+\left[b\dfrac{max(u)}{2}+1\right]p+c\dfrac{max(u)}{2}\\ +\quad\text{if $max(u)$ is even,}\\ +\\ +a\dfrac{max(u)-1}{2}n+\left[b\dfrac{max(u)-1}{2}+1\right]p+c\dfrac{max(u)-1}{2}\\ +\quad\text{if $max(u)$ is odd} +\end{cases} +\end{equation} + +In fact, if $max(u) = 0$, the initial $p$-tuple $u$ matches the all-zero one. Hence, the algorithm +executes step 2 just once and then immediately terminates. Therefore, the running time is $p$, which +matches the right-hand side of inequality \eqref{EQ13} for $max(u) = 0$. + +If $max(u) = 1$, $u$ contains no element larger than 1. Thus, again, the algorithm terminates just +after the first execution of step 2. As a result, the running time is still $p$, which matches the +right-hand side of inequality \eqref{EQ13} for $max(u) = 1$. + +If $max(u) = 2$, $u$ contains some element larger than 1, so that one round is executed, taking time +$t_{n,p}$ in the worst case. Once this round is over, array $ns$ will contain a $p$-tuple $u' \in +U_{n,p}$ such that $max(u') = 1$. Hence, step 2 is executed again, taking time $p$, and then the +algorithm terminates. As a result, it is: + +\begin{equation*} +T_{n,p,u}\leq an+bp+c+p=an+(b+1)p+c, +\end{equation*} + +which matches the right-hand side of inequality \eqref{EQ13} for $max(u) = 2$. + +Finally, if $max(u) > 2$, $u$ has some element larger than 1, so one round is executed, taking time +$t_{n,p}$ in the worst case. Once this round is over, array $ns$ will contain a $p$-tuple $u' \in +U_{n,p}$ such that $max(u') \leq max(u) - 2$, because of the removal of the leftmost minimum and the +rightmost maximum from any non-singleton bucket. By the induction hypothesis, the worst-case time +$T_{n,p,u'}$ taken by the algorithm from this point onward complies with inequality \eqref{EQ13}, +whose right-hand side is maximum if such is $max(u')$, viz. if $max(u') = max(u) - 2$. + +As a result, if $max(u)$ is even, it is: + +\begin{align*} +&T_{n,p,u}\leq an+bp+c\\ +&\quad\quad+a\frac{max(u)-2}{2}n+\left[b\frac{max(u)-2}{2}+1\right]p+c\frac{max(u)-2}{2}\\ +&\quad=a\frac{max(u)}{2}n+\left[b\frac{max(u)}{2}+1\right]p+c\frac{max(u)}{2}, +\end{align*} + +which matches the right-hand side of inequality \eqref{EQ13} for an even $max(u)$. + +Similarly, if $max(u)$ is odd, it is: + +\begin{align*} +&T_{n,p,u}\leq an+bp+c\\ +&\quad\quad+a\frac{max(u)-3}{2}n+\left[b\frac{max(u)-3}{2}+1\right]p+c\frac{max(u)-3}{2}\\ +&\quad=a\frac{max(u)-1}{2}n+\left[b\frac{max(u)-1}{2}+1\right]p+c\frac{max(u)-1}{2}, +\end{align*} + +which matches the right-hand side of inequality \eqref{EQ13} for an odd $max(u)$. + +With this, the proof of inequality \eqref{EQ13} is complete. Now, let $T_{n,p}$ be the worst-case +time taken by the algorithm executed in full. Step 1 is executed first, taking time $p$. Then, array +$ns$ contains a $p$-tuple $u$ such that $max(u) = n$, and by definition, the worst-case time taken +by the algorithm from this point onward is $T_{n,p,u}$. 
Therefore, applying inequality \eqref{EQ13}, +it turns out that: + +\begin{align*} +&T_{n,p}=p+T_{n,p,u}\\ +&\quad\leq +\begin{cases} +a\dfrac{n^2}{2}+\left(b\dfrac{n}{2}+2\right)p+c\dfrac{n}{2} +&\quad\text{if $n$ is even,}\\ +\\ +a\dfrac{n^2-n}{2}+\left(b\dfrac{n-1}{2}+2\right)p+c\dfrac{n-1}{2} +&\quad\text{if $n$ is odd} +\end{cases} +\end{align*} + +As a result, the asymptotic worst-case running time taken by the algorithm is $O(n^2) + O(np)$. +\ + + +subsection "Formal definitions" + +text \ +Here below, a formal definition of GCsort is provided, which will later enable to formally prove the +correctness of the algorithm. Henceforth, the main points of the formal definitions and proofs are +commented. For further information, see Isabelle documentation, particularly \cite{R5}, \cite{R6}, +\cite{R7}, and \cite{R8}. + +The following formalization of GCsort does not define any specific function @{text index} to be used +to split buckets into finer-grained ones. It rather defines only the type @{text index_sign} of such +functions, matching the signature of the function $index$ described in section \ref{SEC2}, along +with three predicates that whatever chosen @{text index} function is required to satisfy for GCsort +to work correctly: + +\begin{itemize} + +\item +Predicate @{text index_less} requires function @{text index} to map any object within a given bucket +to an index less than the number of the associated finer-grained buckets (\emph{less than} instead +of \emph{not larger than} since type @{typ "'a list"} is zero-based). + +\item +Predicate @{text index_mono} requires function @{text index} to be monotonic with respect to the +keys of the objects within a given bucket. + +\item +Predicate @{text index_same} requires function @{text index} to map any two distinct objects within +a given bucket that have the same key to the same index (premise \emph{distinct} is added to enable +this predicate to be used by the simplifier). + +\end{itemize} + +\null +\ + +type_synonym ('a, 'b) index_sign = "('a \ 'b) \ 'a \ nat \ 'b \ 'b \ nat" + +definition index_less :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ bool" +where +"index_less index key \ + \x n mi ma. key x \ {mi..ma} \ 0 < n \ + index key x n mi ma < n" + +definition index_mono :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ bool" +where +"index_mono index key \ + \x y n mi ma. {key x, key y} \ {mi..ma} \ key x \ key y \ + index key x n mi ma \ index key y n mi ma" + +definition index_same :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ bool" +where +"index_same index key \ + \x y n mi ma. key x \ {mi..ma} \ x \ y \ key x = key y \ + index key x n mi ma = index key y n mi ma" + +text \ +\null + +Functions @{text bn_count} and @{text bn_comp} count, respectively, the objects known to be placed +in singleton buckets in a given round, and the finer-grained buckets available to partition a given +non-singleton bucket, according to section \ref{SEC4}. 
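+
+The computation performed by these two functions may be previewed by the following sketch (in
+Python, for illustration only; it mirrors the recursion equations stated below):
+
+\begin{verbatim}
+def bn_count(ns):
+    # objects already known to end up in singleton buckets: a bucket
+    # of size 0 to 3 contributes its own size (removing the leftmost
+    # minimum and the rightmost maximum leaves at most one object),
+    # whereas any larger bucket contributes 2
+    return sum(n if n <= 3 else 2 for n in ns)
+
+def bn_comp(n, p, q, r):
+    # finer-grained buckets granted to a bucket with n objects still
+    # to be split, where p counters and q objects remain available
+    # overall and r is the remainder carried over from the previous
+    # bucket (integer division with remainder propagation)
+    if n <= 1:
+        return n, r
+    return divmod(n * p + r, q)    # (quotient, new remainder)
+\end{verbatim}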
+ +\null +\ + +fun bn_count :: "nat list \ nat" where +"bn_count [] = 0" | +"bn_count (Suc (Suc (Suc (Suc n))) # ns) = Suc (Suc (bn_count ns))" | +"bn_count (n # ns) = n + bn_count ns" + +fun bn_comp :: "nat \ nat \ nat \ nat \ nat \ nat" where +"bn_comp (Suc (Suc n)) p q r = + ((Suc (Suc n) * p + r) div q, (Suc (Suc n) * p + r) mod q)" | +"bn_comp n p q r = (n, r)" + +fun bn_valid :: "nat \ nat \ nat \ bool" where +"bn_valid (Suc (Suc n)) p q = (q \ {0<..p})" | +"bn_valid n p q = True" + +text \ +\null + +Functions @{text mini} and @{text maxi} return the indices of the leftmost minimum and the rightmost +maximum within a given non-singleton bucket. + +\null +\ + +primrec (nonexhaustive) mini :: "'a list \ ('a \ 'b::linorder) \ nat" where +"mini (x # xs) key = + (let m = mini xs key in if xs = [] \ key x \ key (xs ! m) then 0 else Suc m)" + +primrec (nonexhaustive) maxi :: "'a list \ ('a \ 'b::linorder) \ nat" where +"maxi (x # xs) key = + (let m = maxi xs key in if xs = [] \ key (xs ! m) < key x then 0 else Suc m)" + +text \ +\null + +Function @{text enum} counts the objects contained in each finer-grained bucket reserved for the +partition of a given non-singleton bucket. + +\null +\ + +primrec enum :: "'a list \ ('a, 'b) index_sign \ ('a \ 'b) \ + nat \ 'b \ 'b \ nat list" where +"enum [] index key n mi ma = replicate n 0" | +"enum (x # xs) index key n mi ma = + (let i = index key x n mi ma; + ns = enum xs index key n mi ma + in ns[i := Suc (ns ! i)])" + +text \ +\null + +Function @{text offs} computes the cumulative sum of the resulting finer-grained buckets' sizes so +as to generate the associated offsets' list. + +\null +\ + +primrec offs :: "nat list \ nat \ nat list" where +"offs [] i = []" | +"offs (n # ns) i = i # offs ns (i + n)" + +text \ +\null + +Function @{text fill} fills the finer-grained buckets with their respective objects. + +\null +\ + +primrec fill :: "'a list \ nat list \ ('a, 'b) index_sign \ ('a \ 'b) \ + nat \ 'b \ 'b \ 'a option list" where +"fill [] ns index key n mi ma = replicate n None" | +"fill (x # xs) ns index key n mi ma = + (let i = index key x (length ns) mi ma; + ys = fill xs (ns[i := Suc (ns ! i)]) index key n mi ma + in ys[ns ! i := Some x])" + +text \ +\null + +Then, function @{text round} formalizes a single GCsort's recursive round. + +\null +\ + +definition round_suc_suc :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ + 'a list \ nat \ nat \ nat \ nat \ nat list \ 'a list" where +"round_suc_suc index key ws n n' u \ + let nmi = mini ws key; nma = maxi ws key; + xmi = ws ! nmi; xma = ws ! 
nma; mi = key xmi; ma = key xma + in if mi = ma + then (u + n' - n, replicate (Suc (Suc n)) (Suc 0), ws) + else + let k = case n of Suc (Suc i) \ u + n' | _ \ n; + zs = nths ws (- {nmi, nma}); ms = enum zs index key k mi ma + in (u + n' - k, Suc 0 # ms @ [Suc 0], + xmi # map the (fill zs (offs ms 0) index key n mi ma) @ [xma])" + +fun round :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ nat \ nat \ nat \ + nat \ nat list \ 'a list \ nat \ nat list \ 'a list" where +"round index key p q r (u, [], xs) = (u, [], xs)" | +"round index key p q r (u, 0 # ns, xs) = round index key p q r (u, ns, xs)" | +"round index key p q r (u, Suc 0 # ns, xs) = + (let (u', ns', xs') = round index key p q r (u, ns, tl xs) + in (u', Suc 0 # ns', hd xs # xs'))" | +"round index key p q r (u, Suc (Suc n) # ns, xs) = + (let ws = take (Suc (Suc n)) xs; (n', r') = bn_comp n p q r; + (v, ms', ws') = round_suc_suc index key ws n n' u; + (u', ns', xs') = round index key p q r' (v, ns, drop (Suc (Suc n)) xs) + in (u', ms' @ ns', ws' @ xs'))" + +text \ +\null + +Finally, function @{text gcsort_aux} formalizes GCsort. Since the algorithm is tail-recursive, this +function complies with the requirements for an auxiliary tail-recursive function applying to step 1 +of the proof method described in \cite{R4} -- henceforth briefly referred to as the \emph{proof +method} --. This feature will later enable to formally prove the algorithm's correctness properties +by means of such method. + +\null +\ + +abbreviation gcsort_round :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ + nat \ nat list \ 'a list \ nat \ nat list \ 'a list" where +"gcsort_round index key p ns xs \ + round index key (p - bn_count ns) (length xs - bn_count ns) 0 (0, ns, xs)" + +function gcsort_aux :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ nat \ + nat \ nat list \ 'a list \ nat \ nat list \ 'a list" where +"gcsort_aux index key p (u, ns, xs) = (if find (\n. Suc 0 < n) ns = None + then (u, ns, xs) + else gcsort_aux index key p (gcsort_round index key p ns xs))" +by auto + +text \ +\null + +First of all, even before accomplishing step 2 of the proof method, it is necessary to prove that +function @{const gcsort_aux} always terminates by showing that the maximum bucket's size decreases +in each recursive round. + +\null +\ + +lemma add_zeros: + "foldl (+) (m :: nat) (replicate n 0) = m" +by (induction n, simp_all) + +lemma add_suc: + "foldl (+) (Suc m) ns = Suc (foldl (+) m ns)" +by (induction ns arbitrary: m, simp_all) + +lemma add_update: + "i < length ns \ foldl (+) m (ns[i := Suc (ns ! 
i)]) = Suc (foldl (+) m ns)" +by (induction ns arbitrary: i m, simp_all add: add_suc split: nat.split) + +lemma add_le: + "(m :: nat) \ foldl (+) m ns" +by (induction ns arbitrary: m, simp_all, rule order_trans, rule le_add1) + +lemma add_mono: + "(m :: nat) \ n \ foldl (+) m ns \ foldl (+) n ns" +by (induction ns arbitrary: m n, simp_all) + +lemma add_max [rule_format]: + "ns \ [] \ Max (set ns) \ foldl (+) (0 :: nat) ns" +by (induction ns, simp_all add: add_le, erule impCE, simp, rule ballI, drule bspec, + assumption, rule order_trans, assumption, rule add_mono, simp) + +lemma enum_length: + "length (enum xs index key n mi ma) = n" +by (induction xs, simp_all add: Let_def) + +lemma enum_add_le: + "foldl (+) 0 (enum xs index key n mi ma) \ length xs" +proof (induction xs, simp_all add: Let_def add_zeros) + fix x xs + let ?i = "index key x n mi ma" + assume "foldl (+) 0 (enum xs index key n mi ma) \ length xs" + (is "foldl _ _ ?ns \ _") + thus "foldl (+) 0 (?ns[?i := Suc (?ns ! ?i)]) \ Suc (length xs)" + by (cases "?i < length ?ns", simp_all add: add_update) +qed + +lemma enum_max_le: + "0 < n \ Max (set (enum xs index key n mi ma)) \ length xs" + (is "_ \ Max (set ?ns) \ _") +by (insert add_max [of ?ns], insert enum_add_le [of xs index key n mi ma], + simp only: length_greater_0_conv [symmetric] enum_length, simp) + +lemma mini_less: + "0 < length xs \ mini xs key < length xs" +by (induction xs, simp_all add: Let_def) + +lemma maxi_less: + "0 < length xs \ maxi xs key < length xs" +by (induction xs, simp_all add: Let_def) + +lemma mini_lb: + "x \ set xs \ key (xs ! mini xs key) \ key x" +by (induction xs, simp_all add: Let_def, auto) + +lemma maxi_ub: + "x \ set xs \ key x \ key (xs ! maxi xs key)" +by (induction xs, simp_all add: Let_def, auto) + +lemma mini_maxi_neq [rule_format]: + "Suc 0 < length xs \ mini xs key \ maxi xs key" +proof (induction xs, simp_all add: Let_def, rule conjI, (rule impI)+, + (rule_tac [2] impI)+, rule_tac [2] notI, simp_all, rule ccontr) + fix x xs + assume "key (xs ! maxi xs key) < key x" and "key x \ key (xs ! mini xs key)" + hence "key (xs ! maxi xs key) < key (xs ! mini xs key)" by simp + moreover assume "xs \ []" + hence "0 < length xs" by simp + hence "mini xs key < length xs" + by (rule mini_less) + hence "xs ! mini xs key \ set xs" by simp + hence "key (xs ! mini xs key) \ key (xs ! maxi xs key)" + by (rule maxi_ub) + ultimately show False by simp +qed + +lemma mini_maxi_nths: + "length (nths xs (- {mini xs key, maxi xs key})) = + (case length xs of 0 \ 0 | Suc 0 \ 0 | Suc (Suc n) \ n)" +proof (simp add: length_nths split: nat.split, rule allI, rule conjI, rule_tac [2] allI, + (rule_tac [!] impI)+, simp add: length_Suc_conv, erule exE, simp, blast) + fix n + assume A: "length xs = Suc (Suc n)" + hence B: "Suc 0 < length xs" by simp + hence C: "0 < length xs" by arith + have "{i. i < Suc (Suc n) \ i \ mini xs key \ i \ maxi xs key} = + {.. i \ mini xs key \ i \ maxi xs key} = n" + by (simp add: card_Diff_singleton_if, insert mini_maxi_neq [OF B, of key], + simp add: mini_less [OF C] maxi_less [OF C] A [symmetric]) +qed + +lemma mini_maxi_nths_le: + "length xs \ Suc (Suc n) \ length (nths xs (- {mini xs key, maxi xs key})) \ n" +by (simp add: mini_maxi_nths split: nat.split) + +lemma round_nil: + "(fst (snd (round index key p q r t)) \ []) = (\n \ set (fst (snd t)). 
0 < n)" +by (induction index key p q r t rule: round.induct, + simp_all add: round_suc_suc_def Let_def split: prod.split) + +lemma round_max_eq [rule_format]: + "fst (snd t) \ [] \ Max (set (fst (snd t))) = Suc 0 \ + Max (set (fst (snd (round index key p q r t)))) = Suc 0" +proof (induction index key p q r t rule: round.induct, simp_all add: Let_def split: + prod.split del: all_simps, rule impI, (rule_tac [2] allI)+, (rule_tac [2] impI)+, + (rule_tac [3] allI)+, (rule_tac [3] impI)+, rule_tac [3] FalseE) + fix index p q r u ns xs and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, xs)" + assume "ns \ [] \ Max (set ns) = Suc 0 \ + Max (set (fst (snd ?t))) = Suc 0" + moreover assume A: "Max (insert 0 (set ns)) = Suc 0" + hence "ns \ []" + by (cases ns, simp_all) + moreover from this have "Max (set ns) = Suc 0" + using A by simp + ultimately show "Max (set (fst (snd ?t))) = Suc 0" + by simp +next + fix index p q r u ns xs u' ns' xs' and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, tl xs)" + assume A: "?t = (u', ns', xs')" and + "ns \ [] \ Max (set ns) = Suc 0 \ Max (set (fst (snd ?t))) = Suc 0" + hence B: "ns \ [] \ Max (set ns) = Suc 0 \ Max (set ns') = Suc 0" + by simp + assume C: "Max (insert (Suc 0) (set ns)) = Suc 0" + show "Max (insert (Suc 0) (set ns')) = Suc 0" + proof (cases "ns' = []", simp) + assume D: "ns' \ []" + hence "fst (snd ?t) \ []" + using A by simp + hence "\n \ set ns. 0 < n" + by (simp add: round_nil) + then obtain n where E: "n \ set ns" and F: "0 < n" .. + hence G: "ns \ []" + by (cases ns, simp_all) + moreover have "n \ Max (set ns)" + using E by (rule_tac Max_ge, simp_all) + hence "0 < Max (set ns)" + using F by simp + hence "Max (set ns) = Suc 0" + using C and G by simp + ultimately have "Max (set ns') = Suc 0" + using B by simp + thus ?thesis + using D by simp + qed +next + fix n ns + assume "Max (insert (Suc (Suc n)) (set ns)) = Suc 0" + thus False + by (cases ns, simp_all) +qed + +lemma round_max_less [rule_format]: + "fst (snd t) \ [] \ Suc 0 < Max (set (fst (snd t))) \ + Max (set (fst (snd (round index key p q r t)))) < Max (set (fst (snd t)))" +proof (induction index key p q r t rule: round.induct, simp_all add: Let_def split: + prod.split del: all_simps, rule impI, (rule_tac [2] allI)+, (rule_tac [2] impI)+, + (rule_tac [3] allI)+, (rule_tac [3] impI)+, rule_tac [2] ballI) + fix index p q r u ns xs and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, xs)" + assume "ns \ [] \ Suc 0 < Max (set ns) \ + Max (set (fst (snd ?t))) < Max (set ns)" + moreover assume A: "Suc 0 < Max (insert 0 (set ns))" + hence "ns \ []" + by (cases ns, simp_all) + moreover from this have "Suc 0 < Max (set ns)" + using A by simp + ultimately show "Max (set (fst (snd ?t))) < Max (insert 0 (set ns))" + by simp +next + fix index p q r u ns xs u' ns' xs' i and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, tl xs)" + assume + "?t = (u', ns', xs')" and + "ns \ [] \ Suc 0 < Max (set ns) \ + Max (set (fst (snd ?t))) < Max (set ns)" + hence "ns \ [] \ Suc 0 < Max (set ns) \ + Max (set ns') < Max (set ns)" + by simp + moreover assume A: "Suc 0 < Max (insert (Suc 0) (set ns))" + hence B: "ns \ []" + by (cases ns, simp_all) + moreover from this have "Suc 0 < Max (set ns)" + using A by simp + ultimately have "Max (set ns') < Max (set ns)" by simp + moreover assume "i \ set ns'" + hence "i \ Max (set ns')" + by (rule_tac Max_ge, simp) + ultimately show "i < Max (insert (Suc 0) (set ns))" + using B by simp +next + fix index p q r u n ns n' r' v ms' ws' 
u' ns' xs' + and key :: "'a \ 'b" and xs :: "'a list" + let ?ws = "take (Suc (Suc n)) xs" + let ?ys = "drop (Suc (Suc n)) xs" + let ?r = "\n'. round_suc_suc index key ?ws n n' u" + let ?t = "\r' v. round index key p q r' (v, ns, ?ys)" + assume + A: "?r n' = (v, ms', ws')" and + B: "?t r' v = (u', ns', xs')" + moreover assume "\ws a b c d e f g h. + ws = ?ws \ a = bn_comp n p q r \ (b, c) = bn_comp n p q r \ + d = ?r b \ (e, f) = ?r b \ (g, h) = f \ + ns \ [] \ Suc 0 < Max (set ns) \ + Max (set (fst (snd (?t c e)))) < Max (set ns)" and + "bn_comp n p q r = (n', r')" + ultimately have C: "ns \ [] \ Suc 0 < Max (set ns) \ + Max (set ns') < Max (set ns)" + by simp + from A [symmetric] show "Max (set ms' \ set ns') < + Max (insert (Suc (Suc n)) (set ns))" + proof (simp add: round_suc_suc_def Let_def, subst Max_less_iff, simp_all, + rule_tac impI, simp add: Let_def split: if_split_asm, rule_tac ballI, + erule_tac UnE, simp add: Let_def split: if_split_asm, (erule_tac [1-2] conjE)+) + fix i + assume "i = Suc 0 \ i = Suc 0 \ 0 < n" + hence "i = Suc 0" by blast + hence "i < Suc (Suc n)" by simp + also have "\ \ Max (insert (Suc (Suc n)) (set ns))" + by (rule Max_ge, simp_all) + finally show "i < Max (insert (Suc (Suc n)) (set ns))" . + next + fix i + let ?nmi = "mini ?ws key" + let ?nma = "maxi ?ws key" + let ?xmi = "?ws ! ?nmi" + let ?xma = "?ws ! ?nma" + let ?mi = "key ?xmi" + let ?ma = "key ?xma" + let ?k = "case n of 0 \ n | Suc 0 \ n | Suc (Suc i) \ u + n'" + let ?zs = "nths ?ws (- {?nmi, ?nma})" + let ?ms = "enum ?zs index key ?k ?mi ?ma" + assume "i = Suc 0 \ i \ set ?ms" + moreover { + assume "i = Suc 0" + hence "i < Suc (Suc n)" by simp + } + moreover { + assume D: "i \ set ?ms" + hence "i \ Max (set ?ms)" + by (rule_tac Max_ge, simp) + moreover have "0 < length ?ms" + using D by (rule length_pos_if_in_set) + hence "0 < ?k" + by (simp add: enum_length) + hence "Max (set ?ms) \ length ?zs" + by (rule enum_max_le) + ultimately have "i \ length ?zs" by simp + moreover have "length ?zs \ n" + by (rule mini_maxi_nths_le, simp) + ultimately have "i < Suc (Suc n)" by simp + } + ultimately have "i < Suc (Suc n)" .. + also have "\ \ Max (insert (Suc (Suc n)) (set ns))" + by (rule Max_ge, simp_all) + finally show "i < Max (insert (Suc (Suc n)) (set ns))" . + next + fix i + assume D: "i \ set ns'" + hence "0 < length ns'" + by (rule length_pos_if_in_set) + hence "fst (snd (?t r' v)) \ []" + using B by simp + hence E: "\n \ set ns. 0 < n" + by (simp add: round_nil) + hence F: "ns \ []" + by (cases ns, simp_all) + show "i < Max (insert (Suc (Suc n)) (set ns))" + proof (cases "Suc 0 < Max (set ns)") + case True + hence "Max (set ns') < Max (set ns)" + using C and F by simp + moreover have "i \ Max (set ns')" + using D by (rule_tac Max_ge, simp) + ultimately show ?thesis + using F by simp + next + case False + moreover from E obtain j where G: "j \ set ns" and H: "0 < j" .. + have "j \ Max (set ns)" + using G by (rule_tac Max_ge, simp) + hence "0 < Max (set ns)" + using H by simp + ultimately have "Max (set ns) = Suc 0" by simp + hence "Max (set (fst (snd (?t r' v)))) = Suc 0" + using F by (rule_tac round_max_eq, simp_all) + hence "Max (set ns') = Suc 0" + using B by simp + moreover have "i \ Max (set ns')" + using D by (rule_tac Max_ge, simp) + ultimately have "i < Suc (Suc n)" by simp + also have "\ \ Max (insert (Suc (Suc n)) (set ns))" + by (rule Max_ge, simp_all) + finally show ?thesis . + qed + qed +qed + +termination gcsort_aux +proof (relation "measure (\(index, key, p, t). 
Max (set (fst (snd t))))", + simp_all add: find_None_iff, erule exE, erule conjE) + fix index p ns xs i and key :: "'a \ 'b" + let ?t = "gcsort_round index key p ns xs" + assume A: "Suc 0 < i" and B: "i \ set ns" + have C: "0 < length ns" + using B by (rule length_pos_if_in_set) + moreover have "\i \ set ns. Suc 0 < i" + using A and B .. + hence "Suc 0 < Max (set ns)" + using C by (subst Max_gr_iff, simp_all) + ultimately have "Max (set (fst (snd ?t))) < Max (set (fst (snd (0, ns, xs))))" + by (insert round_max_less [of "(0, ns, xs)"], simp) + thus "Max (set (fst (snd ?t))) < Max (set ns)" by simp +qed + +text \ +\null + +Now steps 2, 3, and 4 of the proof method, which are independent of the properties to be proven, can +be accomplished. Particularly, function @{text gcsort} constitutes the complete formal definition of +GCsort, as it puts the algorithm's inputs and outputs into their expected form. + +Observe that the conditional expression contained in the definition of function @{const gcsort_aux} +need not be reflected in the definition of inductive set @{text gcsort_set} as just one alternative +gives rise to a recursive call, viz. as its only purpose is to ensure the function's termination. + +\null +\ + +definition gcsort_in :: "'a list \ nat \ nat list \ 'a list" where +"gcsort_in xs \ (0, [length xs], xs)" + +definition gcsort_out :: "nat \ nat list \ 'a list \ 'a list" where +"gcsort_out \ snd \ snd" + +definition gcsort :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ nat \ + 'a list \ 'a list" where +"gcsort index key p xs \ gcsort_out (gcsort_aux index key p (gcsort_in xs))" + +inductive_set gcsort_set :: "('a, 'b::linorder) index_sign \ ('a \ 'b) \ nat \ + nat \ nat list \ 'a list \ (nat \ nat list \ 'a list) set" +for index key p t where +R0: "t \ gcsort_set index key p t" | +R1: "(u, ns, xs) \ gcsort_set index key p t \ + gcsort_round index key p ns xs \ gcsort_set index key p t" + +lemma gcsort_subset: + assumes A: "t' \ gcsort_set index key p t" + shows "gcsort_set index key p t' \ gcsort_set index key p t" +by (rule subsetI, erule gcsort_set.induct, rule A, rule R1) + +lemma gcsort_aux_set: + "gcsort_aux index key p t \ gcsort_set index key p t" +proof (induction index key p t rule: gcsort_aux.induct, simp, rule conjI, + rule_tac [!] impI, rule R0, simp) + fix index p u ns xs and key :: "'a \ 'b" + let ?t = "gcsort_round index key p ns xs" + assume "gcsort_aux index key p ?t \ gcsort_set index key p ?t" + moreover have "(u, ns, xs) \ gcsort_set index key p (u, ns, xs)" + by (rule R0) + hence "?t \ gcsort_set index key p (u, ns, xs)" + by (rule R1) + hence "gcsort_set index key p ?t \ gcsort_set index key p (u, ns, xs)" + by (rule gcsort_subset) + ultimately show "gcsort_aux index key p ?t + \ gcsort_set index key p (u, ns, xs)" .. +qed + + +subsection "Proof of a preliminary invariant" + +text \ +This section is dedicated to the proof of the invariance of predicate @{text add_inv}, defined here +below, over inductive set @{const gcsort_set}. This invariant will later be used to prove GCsort's +correctness properties. + +Another predicate, @{text bn_inv}, is also defined, using predicate @{const bn_valid} defined above. + +\null +\ + +fun bn_inv :: "nat \ nat \ nat \ nat list \ 'a list \ bool" where +"bn_inv p q (u, ns, xs) = + (\n \ set ns. 
case n of Suc (Suc m) \ bn_valid m p q | _ \ True)" + +fun add_inv :: "nat \ nat \ nat list \ 'a list \ bool" where +"add_inv n (u, ns, xs) = + (foldl (+) 0 ns = n \ length xs = n)" + +lemma gcsort_add_input: + "add_inv (length xs) (0, [length xs], xs)" +by simp + +lemma add_base: + "foldl (+) (k + m) ns = foldl (+) m ns + (k :: nat)" +by (induction ns arbitrary: m, simp_all, subst add.assoc, simp) + +lemma add_base_zero: + "foldl (+) k ns = foldl (+) 0 ns + (k :: nat)" +by (insert add_base [of k 0 ns], simp) + +lemma bn_count_le: + "bn_count ns \ foldl (+) 0 ns" +by (induction ns rule: bn_count.induct, simp_all add: add_suc, subst add_base_zero, + simp) + +text \ +\null + +Here below is the proof of the main property of predicate @{const bn_inv}, which states that if the +objects' number is not larger than the counters' upper bound, then, as long as there are buckets to +be split, the arguments $p$ and $q$ passed by function @{const round} to function @{const bn_comp} +are such that $0 < q \leq p$. + +\null +\ + +lemma bn_inv_intro [rule_format]: + "foldl (+) 0 ns \ p \ + bn_inv (p - bn_count ns) (foldl (+) 0 ns - bn_count ns) (u, ns, xs)" +proof (induction ns, simp_all, (rule impI)+, subst (asm) (3) add_base_zero, + subst (1 2) add_base_zero, simp) + fix n ns + assume + A: "\x \ set ns. case x of 0 \ True | Suc 0 \ True | Suc (Suc m) \ + bn_valid m (p - bn_count ns) (foldl (+) 0 ns - bn_count ns)" and + B: "foldl (+) 0 ns + n \ p" + show + "(case n of 0 \ True | Suc 0 \ True | Suc (Suc m) \ + bn_valid m (p - bn_count (n # ns)) + (foldl (+) 0 ns + n - bn_count (n # ns))) \ + (\x \ set ns. case x of 0 \ True | Suc 0 \ True | Suc (Suc m) \ + bn_valid m (p - bn_count (n # ns)) + (foldl (+) 0 ns + n - bn_count (n # ns)))" + proof (rule conjI, rule_tac [2] ballI, simp_all split: nat.split, rule_tac [!] allI, + rule_tac [!] impI) + fix m + assume C: "n = Suc (Suc m)" + show "bn_valid m (p - bn_count (Suc (Suc m) # ns)) + (Suc (Suc (foldl (+) 0 ns + m)) - bn_count (Suc (Suc m) # ns))" + (is "bn_valid _ ?p ?q") + proof (rule bn_valid.cases [of "(m, ?p, ?q)"], simp_all, erule conjE, rule conjI) + fix k + have "bn_count ns \ foldl (+) 0 ns" + by (rule bn_count_le) + thus "bn_count ns < Suc (Suc (foldl (+) 0 ns + k))" by simp + next + fix k + assume "m = Suc (Suc k)" + hence "Suc (Suc (foldl (+) 0 ns + k)) - bn_count ns = + foldl (+) 0 ns + n - Suc (Suc (bn_count ns))" + using C by simp + also have "\ \ p - Suc (Suc (bn_count ns))" + using B by simp + finally show "Suc (Suc (foldl (+) 0 ns + k)) - bn_count ns \ + p - Suc (Suc (bn_count ns))" . + qed + next + fix n' m + assume "n' \ set ns" + with A have "case n' of 0 \ True | Suc 0 \ True | Suc (Suc m) \ + bn_valid m (p - bn_count ns) (foldl (+) 0 ns - bn_count ns)" .. + moreover assume "n' = Suc (Suc m)" + ultimately have "bn_valid m (p - bn_count ns) (foldl (+) 0 ns - bn_count ns)" + by simp + thus "bn_valid m (p - bn_count (n # ns)) + (foldl (+) 0 ns + n - bn_count (n # ns))" + (is "bn_valid _ ?p ?q") + proof (rule_tac bn_valid.cases [of "(m, ?p, ?q)"], simp_all, (erule_tac conjE)+, + simp) + fix p' q' + assume "bn_count ns < foldl (+) 0 ns" + moreover assume "p - bn_count (n # ns) = p'" + hence "p' = p - bn_count (n # ns)" .. + moreover assume "foldl (+) 0 ns + n - bn_count (n # ns) = q'" + hence "q' = foldl (+) 0 ns + n - bn_count (n # ns)" .. 
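+      (* Explanatory note: combining the strict bound bn_count ns < foldl (+) 0 ns
+         with premise B, i.e. foldl (+) 0 ns + n <= p, the case analysis on the
+         head counter n performed by bn_count.cases yields both 0 < q' and
+         q' <= p', as required by bn_valid. *)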
+ ultimately show "0 < q' \ q' \ p'" + using B by (rule_tac bn_count.cases [of "n # ns"], simp_all) + qed + qed +qed + +text \ +\null + +In what follows, the invariance of predicate @{const add_inv} over inductive set @{const gcsort_set} +is then proven as lemma @{text gcsort_add_inv}. It holds under the conditions that the objects' +number is not larger than the counters' upper bound and function @{text index} satisfies predicate +@{const index_less}, and states that, if the counters' sum initially matches the objects' number, +this is still true after any recursive round. + +\null +\ + +lemma bn_comp_fst_ge [rule_format]: + "bn_valid n p q \ n \ fst (bn_comp n p q r)" +proof (induction n p q r rule: bn_comp.induct, simp_all del: mult_Suc, + rule impI, erule conjE) + fix n p r and q :: nat + assume "0 < q" + hence "Suc (Suc n) = Suc (Suc n) * q div q" by simp + also assume "q \ p" + hence "Suc (Suc n) * q \ Suc (Suc n) * p" + by (rule mult_le_mono2) + hence "Suc (Suc n) * q div q \ (Suc (Suc n) * p + r) div q" + by (rule_tac div_le_mono, simp) + finally show "Suc (Suc n) \ (Suc (Suc n) * p + r) div q" . +qed + +lemma bn_comp_fst_nonzero: + "bn_valid n p q \ 0 < n \ 0 < fst (bn_comp n p q r)" +by (drule bn_comp_fst_ge [where r = r], simp) + +lemma bn_comp_snd_less: + "r < q \ snd (bn_comp n p q r) < q" +by (induction n p q r rule: bn_comp.induct, simp_all) + +lemma add_replicate: + "foldl (+) k (replicate m n) = k + m * n" +by (induction m arbitrary: k, simp_all) + +lemma fill_length: + "length (fill xs ns index key n mi ma) = n" +by (induction xs arbitrary: ns, simp_all add: Let_def) + +lemma enum_add [rule_format]: + assumes + A: "index_less index key" and + B: "0 < n" + shows "(\x \ set xs. key x \ {mi..ma}) \ + foldl (+) 0 (enum xs index key n mi ma) = length xs" +proof (induction xs, simp_all add: Let_def add_zeros, rule impI, (erule conjE)+, + simp) + fix x xs + assume "mi \ key x" and "key x \ ma" + hence "index key x n mi ma < n" + (is "?i < _") + using A and B by (simp add: index_less_def) + hence "?i < length (enum xs index key n mi ma)" + (is "_ < length ?ns") + by (simp add: enum_length) + hence "foldl (+) 0 (?ns[?i := Suc (?ns ! ?i)]) = Suc (foldl (+) 0 ?ns)" + by (rule add_update) + moreover assume "foldl (+) 0 ?ns = length xs" + ultimately show "foldl (+) 0 (?ns[?i := Suc (?ns ! ?i)]) = Suc (length xs)" + by simp +qed + +lemma round_add_inv [rule_format]: + "index_less index key \ bn_inv p q t \ add_inv n t \ + add_inv n (round index key p q r t)" +proof (induction index key p q r t arbitrary: n rule: round.induct, simp_all + add: Let_def split: prod.split, (rule allI)+, (rule impI)+, erule conjE, + (rule_tac [2] allI)+, (rule_tac [2] impI)+, (erule_tac [2] conjE)+, + rule_tac [2] ssubst [OF add_base_zero], simp_all add: add_suc) + fix n ns ns' and xs' :: "'a list" + assume "\n'. foldl (+) 0 ns = n' \ n - Suc 0 = n' \ + foldl (+) 0 ns' = n' \ length xs' = n'" + hence "foldl (+) 0 ns = n - Suc 0 \ + foldl (+) 0 ns' = n - Suc 0 \ length xs' = n - Suc 0" + by simp + moreover assume "Suc (foldl (+) 0 ns) = n" + ultimately show "Suc (foldl (+) 0 ns') = n \ Suc (length xs') = n" by simp +next + fix index p q r u m m' ns v ms' ws' ns' n + and key :: "'a \ 'b" and xs :: "'a list" and xs' :: "'a list" and r' :: nat + let ?ws = "take (Suc (Suc m)) xs" + assume + A: "round_suc_suc index key ?ws m m' u = (v, ms', ws')" and + B: "bn_comp m p q r = (m', r')" and + C: "index_less index key" and + D: "bn_valid m p q" and + E: "length xs = n" + assume "\ws a b c d e f g h n'. 
+ ws = ?ws \ a = (m', r') \ b = m' \ c = r' \ + d = (v, ms', ws') \ e = v \ f = (ms', ws') \ g = ms' \ h = ws' \ + foldl (+) 0 ns = n' \ n - Suc (Suc m) = n' \ + foldl (+) 0 ns' = n' \ length xs' = n'" + moreover assume "Suc (Suc (foldl (+) m ns)) = n" + hence F: "foldl (+) 0 ns + Suc (Suc m) = n" + by (subst (asm) add_base_zero, simp) + ultimately have + G: "foldl (+) 0 ns' = n - Suc (Suc m) \ length xs' = n - Suc (Suc m)" + by simp + from A show "foldl (+) 0 ns' + foldl (+) 0 ms' = n \ + length ws' + length xs' = n" + proof (subst (2) add_base_zero, simp add: round_suc_suc_def Let_def split: + if_split_asm, (erule_tac [!] conjE)+, simp_all) + assume "Suc 0 # Suc 0 # replicate m (Suc 0) = ms'" + hence "ms' = Suc 0 # Suc 0 # replicate m (Suc 0)" .. + hence "foldl (+) 0 ms' = Suc (Suc m)" + by (simp add: add_replicate) + hence "foldl (+) 0 ns' + foldl (+) 0 ms' = n" + using F and G by simp + moreover assume "?ws = ws'" + hence "ws' = ?ws" .. + hence "length ws' = Suc (Suc m)" + using F and E by simp + hence "length ws' + length xs' = n" + using F and G by simp + ultimately show ?thesis .. + next + let ?nmi = "mini ?ws key" + let ?nma = "maxi ?ws key" + let ?xmi = "?ws ! ?nmi" + let ?xma = "?ws ! ?nma" + let ?mi = "key ?xmi" + let ?ma = "key ?xma" + let ?k = "case m of 0 \ m | Suc 0 \ m | Suc (Suc i) \ u + m'" + let ?zs = "nths ?ws (- {?nmi, ?nma})" + let ?ms = "enum ?zs index key ?k ?mi ?ma" + assume "Suc 0 # ?ms @ [Suc 0] = ms'" + hence "ms' = Suc 0 # ?ms @ [Suc 0]" .. + moreover assume + "?xmi # map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma) @ [?xma] = ws'" + hence "ws' = ?xmi # map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma) + @ [?xma]" .. + ultimately show ?thesis + proof (simp add: fill_length, subst (2) add_base_zero, simp, cases m) + case 0 + moreover from this have "length ?ms = 0" + by (simp add: enum_length) + ultimately show "Suc (Suc (foldl (+) 0 ns' + foldl (+) 0 ?ms)) = n \ + Suc (Suc (m + length xs')) = n" + using F and G by simp + next + case Suc + moreover from this have "0 < fst (bn_comp m p q r)" + by (rule_tac bn_comp_fst_nonzero [OF D], simp) + hence "0 < m'" + using B by simp + ultimately have H: "0 < ?k" + by (simp split: nat.split) + have "foldl (+) 0 ?ms = length ?zs" + by (rule enum_add [OF C H], simp, rule conjI, + ((rule mini_lb | rule maxi_ub), erule in_set_nthsD)+) + moreover have "length ?ws = Suc (Suc m)" + using F and E by simp + hence "length ?zs = m" + by (simp add: mini_maxi_nths) + ultimately show "Suc (Suc (foldl (+) 0 ns' + foldl (+) 0 ?ms)) = n \ + Suc (Suc (m + length xs')) = n" + using F and G by simp + qed + qed +qed + +lemma gcsort_add_inv: + assumes A: "index_less index key" + shows "\t' \ gcsort_set index key p t; add_inv n t; n \ p\ \ + add_inv n t'" +by (erule gcsort_set.induct, simp, rule round_add_inv [OF A], simp_all del: + bn_inv.simps, erule conjE, frule sym, erule subst, rule bn_inv_intro, simp) + + +subsection "Proof of counters' optimization" + +text \ +In this section, it is formally proven that the number of the counters (and then of the buckets as +well) used in each recursive round is maximized never exceeding the fixed upper bound. + +This property is formalized by theorem @{text round_len}, which holds under the condition that the +objects' number is not larger than the counters' upper bound and states what follows: + +\begin{itemize} + +\item +While there is some bucket with size larger than two, the sum of the number of the used counters and +the number of the unused ones -- viz. 
those, if any, left unused due to the presence of some bucket +with size larger than two and equal minimum and maximum keys (cf. section \ref{SEC4}) -- matches the +counters' upper bound. +\\In addition to ensuring the upper bound's enforcement, this implies that the number of the used +counters matches the upper bound unless there is some aforesaid bucket not followed by any other +bucket with size larger than two and distinct minimum and maximum keys. + +\item +Once there is no bucket with size larger than two -- in which case a round is executed just in case +there is some bucket with size two --, the number of the used counters matches the objects' number. +\\In fact, the algorithm immediately terminates after such a round since every resulting bucket has +size one, so that increasing the number of the used counters does not matter in this case. + +\end{itemize} + +\null +\ + +lemma round_len_less [rule_format]: + "bn_inv p q t \ r < q \ + (r + (foldl (+) 0 (fst (snd t)) - bn_count (fst (snd t))) * p) mod q = 0 \ + (fst (round index key p q r t) + + length (fst (snd (round index key p q r t)))) * q = + (fst t + bn_count (fst (snd t))) * q + + (foldl (+) 0 (fst (snd t)) - bn_count (fst (snd t))) * p + r" +proof (induction index key p q r t rule: round.induct, simp_all add: Let_def + split: prod.split del: all_simps, ((rule allI)+, (rule impI)+, simp add: + add_suc)+, subst (asm) (3) add_base_zero, subst add_base_zero, erule conjE) + fix index p q r u n ns n' r' v ms' ws' u' + and key :: "'a \ 'b" and xs :: "'a list" and ns' :: "nat list" + let ?ws = "take (Suc (Suc n)) xs" + assume + A: "round_suc_suc index key ?ws n n' u = (v, ms', ws')" and + B: "bn_comp n p q r = (n', r')" and + C: "bn_valid n p q" + have D: "bn_count ns \ foldl (+) 0 ns" + by (rule bn_count_le) + assume "\ws a b c d e f g h. 
+ ws = ?ws \ a = (n', r') \ b = n' \ c = r' \ + d = (v, ms', ws') \ e = v \ f = (ms', ws') \ g = ms' \ h = ws' \ + r' < q \ (r' + (foldl (+) 0 ns - bn_count ns) * p) mod q = 0 \ + (u' + length ns') * q = + (v + bn_count ns) * q + (foldl (+) 0 ns - bn_count ns) * p + r'" + moreover assume "r < q" + hence "snd (bn_comp n p q r) < q" + by (rule bn_comp_snd_less) + hence "r' < q" + using B by simp + moreover assume E: "(r + (Suc (Suc (foldl (+) 0 ns + n)) - + bn_count (Suc (Suc n) # ns)) * p) mod q = 0" + from B [symmetric] have "(r' + (foldl (+) 0 ns - bn_count ns) * p) mod q = 0" + proof (rule_tac trans [OF _ E], rule_tac bn_comp.cases [of "(n, p, q, r)"], + simp_all add: add_mult_distrib diff_mult_distrib mod_add_left_eq, + rule_tac arg_cong2 [where f = "(mod)"], simp_all) + fix n p q r + have "bn_count ns * p \ foldl (+) 0 ns * p" + using D by (rule mult_le_mono1) + thus "p + (p + (n * p + (foldl (+) 0 ns * p - bn_count ns * p))) = + p + (p + (foldl (+) 0 ns * p + n * p)) - bn_count ns * p" + by arith + qed + ultimately have "(u' + length ns') * q = + (v + bn_count ns) * q + (foldl (+) 0 ns - bn_count ns) * p + r'" + by simp + with A [symmetric] and B [symmetric] show + "(u' + (length ms' + length ns')) * q = + (u + bn_count (Suc (Suc n) # ns)) * q + + (Suc (Suc (foldl (+) 0 ns + n)) - bn_count (Suc (Suc n) # ns)) * p + r" + proof (rule_tac bn_comp.cases [of "(n, p, q, r)"], + simp_all add: round_suc_suc_def Let_def enum_length split: if_split_asm) + fix m p' q' r' + assume + "n = Suc (Suc m)" and + "p = p'" and + "q = q'" and + "r = r'" + moreover have "n \ fst (bn_comp n p q r)" + using C by (rule bn_comp_fst_ge) + ultimately have "Suc (Suc m) \ (p' + (p' + m * p') + r') div q'" + (is "_ \ ?a div _") + by simp + hence F: "Suc (Suc m) * q' \ ?a div q' * q'" + by (rule mult_le_mono1) + moreover assume "(u' + length ns') * q' = + (u + ?a div q' - Suc (Suc m) + bn_count ns) * q' + + (foldl (+) 0 ns - bn_count ns) * p' + ?a mod q'" + ultimately have "(u' + length ns') * q' + Suc (Suc m) * q' = + (u + bn_count ns) * q' + (foldl (+) 0 ns - bn_count ns) * p' + + ?a div q' * q' + ?a mod q'" + (is "?c = ?d") + proof (simp add: add_mult_distrib diff_mult_distrib) + have "Suc (Suc m) * q' \ ?a div q' * q' + ?a mod q'" + using F by arith + hence "q' + (q' + m * q') \ ?a" + (is "?b \ _") + by simp + thus + "?a + (u * q' + (bn_count ns * q' + + (foldl (+) 0 ns * p' - bn_count ns * p'))) - ?b + ?b = + ?a + (u * q' + (bn_count ns * q' + + (foldl (+) 0 ns * p' - bn_count ns * p')))" + by simp + qed + moreover have "?c = q' + (q' + (u' + (m + length ns')) * q')" + (is "_ = ?e") + by (simp add: add_mult_distrib) + moreover have "bn_count ns * p' \ foldl (+) 0 ns * p'" + using D by (rule mult_le_mono1) + hence "?d = (u + bn_count ns) * q' + + ((Suc (Suc (foldl (+) 0 ns + m)) - bn_count ns) * p' + r')" + (is "_ = ?f") + by (simp (no_asm_simp) add: add_mult_distrib diff_mult_distrib) + ultimately show "?e = ?f" by simp + next + fix m p' q' r' + assume "(u' + length ns') * q' = bn_count ns * q' + + (foldl (+) 0 ns - bn_count ns) * p' + (p' + (p' + m * p') + r') mod q'" + (is "_ = _ + _ + ?a mod _") + hence "(u' + length ns') * q' + (u + ?a div q') * q' = + (u + bn_count ns) * q' + (foldl (+) 0 ns - bn_count ns) * p' + ?a" + (is "?c = ?d") + by (simp add: add_mult_distrib) + moreover have "?c = (u' + (u + ?a div q' + length ns')) * q'" + (is "_ = ?e") + by (simp add: add_mult_distrib) + moreover have "bn_count ns * p' \ foldl (+) 0 ns * p'" + using D by (rule mult_le_mono1) + hence "?d = (u + bn_count ns) * q' + 
+ ((Suc (Suc (foldl (+) 0 ns + m)) - bn_count ns) * p' + r')" + (is "_ = ?f") + by (simp (no_asm_simp) add: add_mult_distrib diff_mult_distrib) + ultimately show "?e = ?f" by simp + qed +qed + +lemma round_len_eq [rule_format]: + "bn_count (fst (snd t)) = foldl (+) 0 (fst (snd t)) \ + length (fst (snd (round index key p q r t))) = foldl (+) 0 (fst (snd t))" +proof (induction index key p q r t rule: round.induct, simp_all add: Let_def + split: prod.split del: all_simps, ((rule allI)+, (rule impI)+, simp add: + add_suc)+, subst (asm) (3) add_base_zero, subst add_base_zero) + fix index p q r u n ns n' v ms' ws' + and key :: "'a \ 'b" and xs :: "'a list" and ns' :: "nat list" and r' :: nat + let ?ws = "take (Suc (Suc n)) xs" + assume + A: "round_suc_suc index key ?ws n n' u = (v, ms', ws')" and + B: "bn_count (Suc (Suc n) # ns) = Suc (Suc (foldl (+) 0 ns + n))" + assume "\ws a b c d e f g h. + ws = ?ws \ a = (n', r') \ b = n' \ c = r' \ + d = (v, ms', ws') \ e = v \ f = (ms', ws') \ g = ms' \ h = ws' \ + bn_count ns = foldl (+) 0 ns \ length ns' = foldl (+) 0 ns" + moreover have C: "n = 0 \ n = Suc 0" + using B by (rule_tac bn_comp.cases [of "(n, p, q, r)"], + insert bn_count_le [of ns], simp_all) + hence "bn_count ns = foldl (+) 0 ns" + using B by (erule_tac disjE, simp_all) + ultimately have "length ns' = foldl (+) 0 ns" by simp + with A [symmetric] show "length ms' + length ns' = + Suc (Suc (foldl (+) 0 ns + n))" + by (rule_tac disjE [OF C], simp_all + add: round_suc_suc_def Let_def enum_length split: if_split_asm) +qed + +theorem round_len: + assumes + A: "length xs = foldl (+) 0 ns" and + B: "length xs \ p" + shows "if bn_count ns < length xs + then fst (gcsort_round index key p ns xs) + + length (fst (snd (gcsort_round index key p ns xs))) = p + else length (fst (snd (gcsort_round index key p ns xs))) = length xs" + (is "if _ then fst ?t + _ = _ else _") +proof (split if_split, rule conjI, rule_tac [!] impI) + assume C: "bn_count ns < length xs" + moreover have + "bn_inv (p - bn_count ns) (foldl (+) 0 ns - bn_count ns) (0, ns, xs)" + using A and B by (rule_tac bn_inv_intro, simp) + ultimately have + "(fst ?t + length (fst (snd ?t))) * (length xs - bn_count ns) = + bn_count ns * (length xs - bn_count ns) + + (p - bn_count ns) * (length xs - bn_count ns)" + (is "?a * ?b = ?c") + using A by (subst round_len_less, simp_all) + also have "bn_count ns \ p" + using B and C by simp + hence "bn_count ns * ?b \ p * ?b" + by (rule mult_le_mono1) + hence "?c = p * ?b" + by (simp (no_asm_simp) add: diff_mult_distrib) + finally have "?a * ?b = p * ?b" . + thus "?a = p" + using C by simp +next + assume "\ bn_count ns < length xs" + moreover have "bn_count ns \ foldl (+) 0 ns" + by (rule bn_count_le) + ultimately show "length (fst (snd ?t)) = length xs" + using A by (subst round_len_eq, simp_all) +qed + +end diff --git a/thys/Generalized_Counting_Sort/Conservation.thy b/thys/Generalized_Counting_Sort/Conservation.thy new file mode 100644 --- /dev/null +++ b/thys/Generalized_Counting_Sort/Conservation.thy @@ -0,0 +1,2036 @@ +(* Title: An Efficient Generalization of Counting Sort for Large, possibly Infinite Key Ranges + Author: Pasquale Noce + Software Engineer at HID Global, Italy + pasquale dot noce dot lavoro at gmail dot com + pasquale dot noce at hidglobal dot com +*) + +section "Proof of objects' conservation" + +theory Conservation + imports + Algorithm + "HOL-Library.Multiset" +begin + +text \ +\null + +In this section, it is formally proven that GCsort \emph{conserves objects}, viz. 
that the objects' +list returned by function @{const gcsort} contains as many occurrences of any given object as the +input objects' list. + +Here below, steps 5, 6, and 7 of the proof method are accomplished. Particularly, @{text count_inv} +is the predicate that will be shown to be invariant over inductive set @{const gcsort_set}. + +\null +\ + +fun count_inv :: "('a \ nat) \ nat \ nat list \ 'a list \ bool" where +"count_inv f (u, ns, xs) = (\x. count (mset xs) x = f x)" + +lemma gcsort_count_input: + "count_inv (count (mset xs)) (0, [length xs], xs)" +by simp + +lemma gcsort_count_intro: + "count_inv f t \ count (mset (gcsort_out t)) x = f x" +by (cases t, simp add: gcsort_out_def) + +text \ +\null + +The main task to be accomplished to prove that GCsort conserves objects is to prove that so does +function @{const fill} in case its input offsets' list is computed via the composition of functions +@{const offs} and @{const enum}, as happens within function @{const round}. + +To achieve this result, a multi-step strategy will be adopted. The first step, addressed here below, +opens with the definition of predicate @{text offs_pred}, satisfied by an offsets' list $ns$ and an +objects' list $xs$ just in case each bucket delimited by $ns$ is sufficiently large to accommodate +the corresponding objects in $xs$. Then, lemma @{text offs_pred_cons} shows that this predicate, if +satisfied initially, keeps being true if each object in $xs$ is consumed as happens within function +@{const fill}, viz. increasing the corresponding offset in $ns$ by one. + +\null +\ + +definition offs_num :: "nat \ 'a list \ ('a, 'b) index_sign \ + ('a \ 'b) \ 'b \ 'b \ nat \ nat" where +"offs_num n xs index key mi ma i \ + length [x\xs. index key x n mi ma = i]" + +abbreviation offs_set_next :: "nat list \ 'a list \ ('a, 'b) index_sign \ + ('a \ 'b) \ 'b \ 'b \ nat \ nat set" where +"offs_set_next ns xs index key mi ma i \ + {k. k < length ns \ i < k \ 0 < offs_num (length ns) xs index key mi ma k}" + +abbreviation offs_set_prev :: "nat list \ 'a list \ ('a, 'b) index_sign \ + ('a \ 'b) \ 'b \ 'b \ nat \ nat set" where +"offs_set_prev ns xs index key mi ma i \ + {k. i < length ns \ k < i \ 0 < offs_num (length ns) xs index key mi ma k}" + +definition offs_next :: "nat list \ nat \ 'a list \ ('a, 'b) index_sign \ + ('a \ 'b) \ 'b \ 'b \ nat \ nat" where +"offs_next ns ub xs index key mi ma i \ + if offs_set_next ns xs index key mi ma i = {} + then ub else ns ! Min (offs_set_next ns xs index key mi ma i)" + +definition offs_none :: "nat list \ nat \ 'a list \ ('a, 'b) index_sign \ + ('a \ 'b) \ 'b \ 'b \ nat \ bool" where +"offs_none ns ub xs index key mi ma i \ + (\j < length ns. 0 < offs_num (length ns) xs index key mi ma j \ + i \ {ns ! j + offs_num (length ns) xs index key mi ma j..< + offs_next ns ub xs index key mi ma j}) \ + offs_num (length ns) xs index key mi ma 0 = 0 \ + i < offs_next ns ub xs index key mi ma 0 \ + 0 < offs_num (length ns) xs index key mi ma 0 \ + i < ns ! 0" + +definition offs_pred :: "nat list \ nat \ 'a list \ ('a, 'b) index_sign \ + ('a \ 'b) \ 'b \ 'b \ bool" where +"offs_pred ns ub xs index key mi ma \ + \i < length ns. offs_num (length ns) xs index key mi ma i \ + offs_next ns ub xs index key mi ma i - ns ! 
i" + +lemma offs_num_cons: + "offs_num n (x # xs) index key mi ma i = + (if index key x n mi ma = i then Suc else id) (offs_num n xs index key mi ma i)" +by (simp add: offs_num_def) + +lemma offs_next_prev: + "(0 < offs_num (length ns) xs index key mi ma i \ + offs_set_next ns xs index key mi ma i \ {} \ + Min (offs_set_next ns xs index key mi ma i) = j) = + (0 < offs_num (length ns) xs index key mi ma j \ + offs_set_prev ns xs index key mi ma j \ {} \ + Max (offs_set_prev ns xs index key mi ma j) = i)" + (is "?P = ?Q") +proof (rule iffI, (erule_tac [!] conjE)+) + let ?A = "offs_set_next ns xs index key mi ma i" + let ?B = "offs_set_prev ns xs index key mi ma j" + assume + A: "0 < offs_num (length ns) xs index key mi ma i" and + B: "?A \ {}" and + C: "Min ?A = j" + have "Min ?A \ ?A" + using B by (rule_tac Min_in, simp) + hence D: "j \ ?A" + using C by simp + hence E: "i \ ?B" + using A by simp + show ?Q + proof (rule conjI, rule_tac [2] conjI) + show "0 < offs_num (length ns) xs index key mi ma j" + using D by simp + next + show "?B \ {}" + using E by blast + next + from E show "Max ?B = i" + proof (subst Max_eq_iff, simp, blast, simp, rule_tac allI, rule_tac impI, + (erule_tac conjE)+, rule_tac ccontr, simp) + fix k + assume F: "k < j" and "j < length ns" + hence "k < length ns" by simp + moreover assume + "\ k \ i" and + "0 < offs_num (length ns) xs index key mi ma k" + ultimately have "k \ ?A" by simp + hence "Min ?A \ k" + by (rule_tac Min_le, simp) + thus False + using C and F by simp + qed + qed +next + let ?A = "offs_set_prev ns xs index key mi ma j" + let ?B = "offs_set_next ns xs index key mi ma i" + assume + A: "0 < offs_num (length ns) xs index key mi ma j" and + B: "?A \ {}" and + C: "Max ?A = i" + have "Max ?A \ ?A" + using B by (rule_tac Max_in, simp) + hence D: "i \ ?A" + using C by simp + hence E: "j \ ?B" + using A by simp + show ?P + proof (rule conjI, rule_tac [2] conjI) + show "0 < offs_num (length ns) xs index key mi ma i" + using D by simp + next + show "?B \ {}" + using E by blast + next + from E show "Min ?B = j" + proof (subst Min_eq_iff, simp, blast, simp, rule_tac allI, rule_tac impI, + (erule_tac conjE)+, rule_tac ccontr, simp) + fix k + assume + "j < length ns" and + "\ j \ k" and + "0 < offs_num (length ns) xs index key mi ma k" + hence "k \ ?A" by simp + hence "k \ Max ?A" + by (rule_tac Max_ge, simp) + moreover assume "i < k" + ultimately show False + using C by simp + qed + qed +qed + +lemma offs_next_cons_eq: + assumes + A: "index key x (length ns) mi ma = i" and + B: "i < length ns" and + C: "0 < offs_num (length ns) (x # xs) index key mi ma j" + shows + "offs_set_prev ns (x # xs) index key mi ma i = {} \ + Max (offs_set_prev ns (x # xs) index key mi ma i) \ j \ + offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j = + offs_next ns ub (x # xs) index key mi ma j" + (is "?P \ ?Q \ _") +proof (simp only: disj_imp, cases "j < i") + let ?A = "offs_set_prev ns (x # xs) index key mi ma i" + let ?B = "offs_set_next (ns[i := Suc (ns ! i)]) xs index key mi ma j" + let ?C = "offs_set_next ns (x # xs) index key mi ma j" + case True + hence D: "j \ ?A" + using B and C by simp + hence "j \ Max ?A" + by (rule_tac Max_ge, simp) + moreover assume "\ ?P \ ?Q" + hence ?Q + using D by blast + ultimately have E: "j < Max ?A" by simp + have F: "Max ?A \ ?A" + using D by (rule_tac Max_in, simp, blast) + have G: "Max ?A \ ?B" + proof (simp, rule conjI, rule_tac [2] conjI) + show "Max ?A < length ns" + using F by auto + next + show "j < Max ?A" + using E . 
+ next + have "0 < offs_num (length ns) (x # xs) index key mi ma (Max ?A)" + using F by blast + moreover have "Max ?A < i" + using F by blast + ultimately show "0 < offs_num (length ns) xs index key mi ma (Max ?A)" + using A by (simp add: offs_num_cons) + qed + hence H: "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j = + ns[i := Suc (ns ! i)] ! Min ?B" + by (simp only: offs_next_def split: if_split, blast) + have "Min ?B \ Max ?A" + using G by (rule_tac Min_le, simp) + moreover have "Max ?A < i" + using F by blast + ultimately have I: "Min ?B < i" by simp + hence J: "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j = + ns ! Min ?B" + using H by simp + have "Min ?B \ ?B" + using G by (rule_tac Min_in, simp, blast) + hence K: "Min ?B \ ?C" + using A and I by (simp add: offs_num_cons) + hence L: "Min ?C \ Min ?B" + by (rule_tac Min_le, simp) + have "Min ?C \ ?C" + using K by (rule_tac Min_in, simp, blast) + moreover have "Min ?C < i" + using L and I by simp + ultimately have "Min ?C \ ?B" + using A by (simp add: offs_num_cons) + hence "Min ?B \ Min ?C" + using G by (rule_tac Min_le, simp) + hence "Min ?B = Min ?C" + using L by simp + moreover have "offs_next ns ub (x # xs) index key mi ma j = ns ! Min ?C" + using K by (simp only: offs_next_def split: if_split, blast) + ultimately show ?thesis + using J by simp +next + let ?A = "offs_set_next (ns[i := Suc (ns ! i)]) xs index key mi ma j" + let ?B = "offs_set_next ns (x # xs) index key mi ma j" + case False + hence "?A = ?B" + using A by (rule_tac set_eqI, simp add: offs_num_cons) + thus ?thesis + proof (simp only: offs_next_def split: if_split, + (rule_tac conjI, blast, rule_tac impI)+) + assume "?B \ {}" + hence "Min ?B \ ?B" + by (rule_tac Min_in, simp) + hence "i < Min ?B" + using False by simp + thus "ns[i := Suc (ns ! i)] ! Min ?B = ns ! Min ?B" by simp + qed +qed + +lemma offs_next_cons_neq: + assumes + A: "index key x (length ns) mi ma = i" and + B: "offs_set_prev ns (x # xs) index key mi ma i \ {}" and + C: "Max (offs_set_prev ns (x # xs) index key mi ma i) = j" + shows "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j = + (if 0 < offs_num (length ns) xs index key mi ma i + then Suc (ns ! i) + else offs_next ns ub (x # xs) index key mi ma i)" +proof (simp, rule conjI, rule_tac [!] impI) + let ?A = "offs_set_next ns (x # xs) index key mi ma j" + assume "0 < offs_num (length ns) xs index key mi ma i" + with A have "offs_set_next (ns[i := Suc (ns ! i)]) xs index key mi ma j = ?A" + by (rule_tac set_eqI, rule_tac iffI, simp_all add: offs_num_cons split: + if_split_asm) + moreover have "0 < offs_num (length ns) (x # xs) index key mi ma i" + using A by (simp add: offs_num_def) + hence "0 < offs_num (length ns) (x # xs) index key mi ma j \ ?A \ {} \ + Min ?A = i" + using B and C by (subst offs_next_prev, simp) + ultimately show "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j = + Suc (ns ! i)" + using B by (simp only: offs_next_def, simp, subst nth_list_update_eq, blast, + simp) +next + let ?A = "offs_set_prev ns (x # xs) index key mi ma i" + assume "offs_num (length ns) xs index key mi ma i = 0" + with A have "offs_set_next (ns[i := Suc (ns ! i)]) xs index key mi ma j = + offs_set_next ns (x # xs) index key mi ma i" + proof (rule_tac set_eqI, rule_tac iffI, simp_all add: offs_num_cons split: + if_split_asm, rule_tac conjI, rule_tac notI, simp, rule_tac impI, + (erule_tac [!] 
conjE)+, rule_tac ccontr, simp_all) + fix k + have "i < length ns" + using B by blast + moreover assume "i \ k" and "\ i < k" + hence "k < i" by simp + moreover assume "0 < offs_num (length ns) xs index key mi ma k" + ultimately have "k \ ?A" + by (simp add: offs_num_cons) + hence "k \ Max ?A" + by (rule_tac Max_ge, simp) + moreover assume "j < k" + ultimately show False + using C by simp + next + fix k + have "Max ?A \ ?A" + using B by (rule_tac Max_in, simp_all) + hence "j < i" + using C by simp + moreover assume "i < k" + ultimately show "j < k" by simp + qed + with B show "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j = + offs_next ns ub (x # xs) index key mi ma i" + proof (simp only: offs_next_def split: if_split, (rule_tac conjI, rule_tac [!] impI, + simp)+, subst nth_list_update, blast, simp (no_asm_simp)) + assume "offs_set_next ns (x # xs) index key mi ma i \ {}" + hence "Min (offs_set_next ns (x # xs) index key mi ma i) + \ offs_set_next ns (x # xs) index key mi ma i" + by (rule_tac Min_in, simp) + thus "i \ Min (offs_set_next ns (x # xs) index key mi ma i)" by simp + qed +qed + +lemma offs_pred_ub_aux [rule_format]: + assumes A: "offs_pred ns ub xs index key mi ma" + shows "i < length ns \ + \j < length ns. i \ j \ 0 < offs_num (length ns) xs index key mi ma j \ + ns ! j + offs_num (length ns) xs index key mi ma j \ ub" +proof (erule strict_inc_induct, rule_tac [!] allI, (rule_tac [!] impI)+, + drule le_less_Suc_eq, simp_all) + fix i + assume B: "length ns = Suc i" + hence "offs_num (Suc i) xs index key mi ma i \ + offs_next ns ub xs index key mi ma i - ns ! i" + using A by (simp add: offs_pred_def) + moreover have "offs_next ns ub xs index key mi ma i = ub" + using B by (simp add: offs_next_def) + ultimately have "offs_num (Suc i) xs index key mi ma i \ ub - ns ! i" by simp + moreover assume "0 < offs_num (Suc i) xs index key mi ma i" + ultimately show "ns ! i + offs_num (Suc i) xs index key mi ma i \ ub" by simp +next + fix i j + assume "j < length ns" + hence "offs_num (length ns) xs index key mi ma j \ + offs_next ns ub xs index key mi ma j - ns ! j" + using A by (simp add: offs_pred_def) + moreover assume + B: "\k < length ns. Suc i \ k \ + 0 < offs_num (length ns) xs index key mi ma k \ + ns ! k + offs_num (length ns) xs index key mi ma k \ ub" and + C: "i \ j" + have "offs_next ns ub xs index key mi ma j \ ub" + proof (simp only: offs_next_def split: if_split, rule conjI, simp, rule impI) + let ?A = "offs_set_next ns xs index key mi ma j" + assume "?A \ {}" + hence "Min ?A \ ?A" + by (rule_tac Min_in, simp) + hence "ns ! Min ?A + offs_num (length ns) xs index key mi ma (Min ?A) \ ub" + using B and C by simp + thus "ns ! Min ?A \ ub" by simp + qed + ultimately have "offs_num (length ns) xs index key mi ma j \ ub - ns ! j" + by simp + moreover assume "0 < offs_num (length ns) xs index key mi ma j" + ultimately show "ns ! j + offs_num (length ns) xs index key mi ma j \ ub" + by simp +qed + +lemma offs_pred_ub: + "\offs_pred ns ub xs index key mi ma; i < length ns; + 0 < offs_num (length ns) xs index key mi ma i\ \ + ns ! i + offs_num (length ns) xs index key mi ma i \ ub" +by (drule offs_pred_ub_aux, assumption+, simp) + +lemma offs_pred_asc_aux [rule_format]: + assumes A: "offs_pred ns ub xs index key mi ma" + shows "i < length ns \ + \j k. k < length ns \ i \ j \ j < k \ + 0 < offs_num (length ns) xs index key mi ma j \ + 0 < offs_num (length ns) xs index key mi ma k \ + ns ! j + offs_num (length ns) xs index key mi ma j \ ns ! 
k" +proof (erule strict_inc_induct, simp, (rule allI)+, (rule impI)+, simp) + fix i j k + assume + B: "k < length ns" and + C: "j < k" + hence "j < length ns" by simp + hence "offs_num (length ns) xs index key mi ma j \ + offs_next ns ub xs index key mi ma j - ns ! j" + using A by (simp add: offs_pred_def) + moreover assume + D: "\j k. k < length ns \ Suc i \ j \ j < k \ + 0 < offs_num (length ns) xs index key mi ma j \ + 0 < offs_num (length ns) xs index key mi ma k \ + ns ! j + offs_num (length ns) xs index key mi ma j \ ns ! k" and + E: "i \ j" and + F: "0 < offs_num (length ns) xs index key mi ma k" + have "offs_next ns ub xs index key mi ma j \ ns ! k" + proof (simp only: offs_next_def split: if_split, rule conjI, rule_tac [!] impI, + rule ccontr, simp) + assume "\n > j. n < length ns \ + offs_num (length ns) xs index key mi ma n = 0" + hence "offs_num (length ns) xs index key mi ma k = 0" + using B and C by simp + thus False + using F by simp + next + let ?A = "offs_set_next ns xs index key mi ma j" + have G: "k \ ?A" + using B and C and F by simp + hence "Min ?A \ k" + by (rule_tac Min_le, simp) + hence "Min ?A < k \ Min ?A = k" + by (simp add: le_less) + moreover { + have "Suc i \ Min ?A \ Min ?A < k \ + 0 < offs_num (length ns) xs index key mi ma (Min ?A) \ + ns ! Min ?A + offs_num (length ns) xs index key mi ma (Min ?A) \ ns ! k" + using B and D and F by simp + moreover assume "Min ?A < k" + moreover have "Min ?A \ ?A" + using G by (rule_tac Min_in, simp, blast) + ultimately have "ns ! Min ?A < ns ! k" + using E by simp + } + moreover { + assume "Min ?A = k" + hence "ns ! Min ?A = ns ! k" by simp + } + ultimately show "ns ! Min ?A \ ns ! k" + by (simp add: le_less, blast) + qed + ultimately have "offs_num (length ns) xs index key mi ma j \ ns ! k - ns ! j" + by simp + moreover assume "0 < offs_num (length ns) xs index key mi ma j" + ultimately show "ns ! j + offs_num (length ns) xs index key mi ma j \ ns ! k" + by simp +qed + +lemma offs_pred_asc: + "\offs_pred ns ub xs index key mi ma; i < j; j < length ns; + 0 < offs_num (length ns) xs index key mi ma i; + 0 < offs_num (length ns) xs index key mi ma j\ \ + ns ! i + offs_num (length ns) xs index key mi ma i \ ns ! j" +by (drule offs_pred_asc_aux, erule less_trans, assumption+, rule order_refl) + +lemma offs_pred_next: + assumes + A: "offs_pred ns ub xs index key mi ma" and + B: "i < length ns" and + C: "0 < offs_num (length ns) xs index key mi ma i" + shows "ns ! i < offs_next ns ub xs index key mi ma i" +proof (simp only: offs_next_def split: if_split, rule conjI, rule_tac [!] impI) + have "ns ! i + offs_num (length ns) xs index key mi ma i \ ub" + using A and B and C by (rule offs_pred_ub) + thus "ns ! i < ub" + using C by simp +next + assume "offs_set_next ns xs index key mi ma i \ {}" + hence "Min (offs_set_next ns xs index key mi ma i) + \ offs_set_next ns xs index key mi ma i" + by (rule_tac Min_in, simp) + hence "ns ! i + offs_num (length ns) xs index key mi ma i \ + ns ! Min (offs_set_next ns xs index key mi ma i)" + using C by (rule_tac offs_pred_asc [OF A], simp_all) + thus "ns ! i < ns ! Min (offs_set_next ns xs index key mi ma i)" + using C by simp +qed + +lemma offs_pred_next_cons_less: + assumes + A: "offs_pred ns ub (x # xs) index key mi ma" and + B: "index key x (length ns) mi ma = i" and + C: "offs_set_prev ns (x # xs) index key mi ma i \ {}" and + D: "Max (offs_set_prev ns (x # xs) index key mi ma i) = j" + shows "offs_next ns ub (x # xs) index key mi ma j < + offs_next (ns[i := Suc (ns ! 
i)]) ub xs index key mi ma j" + (is "?M < ?N") +proof - + have E: "0 < offs_num (length ns) (x # xs) index key mi ma i" + using B by (simp add: offs_num_cons) + hence "0 < offs_num (length ns) (x # xs) index key mi ma j \ + offs_set_next ns (x # xs) index key mi ma j \ {} \ + Min (offs_set_next ns (x # xs) index key mi ma j) = i" + using C and D by (subst offs_next_prev, simp) + hence F: "?M = ns ! i" + by (simp only: offs_next_def, simp) + have "?N = (if 0 < offs_num (length ns) xs index key mi ma i + then Suc (ns ! i) + else offs_next ns ub (x # xs) index key mi ma i)" + using B and C and D by (rule offs_next_cons_neq) + thus ?thesis + proof (split if_split_asm) + assume "?N = Suc (ns ! i)" + thus ?thesis + using F by simp + next + assume "?N = offs_next ns ub (x # xs) index key mi ma i" + moreover have "ns ! i < offs_next ns ub (x # xs) index key mi ma i" + using C by (rule_tac offs_pred_next [OF A _ E], blast) + ultimately show ?thesis + using F by simp + qed +qed + +lemma offs_pred_next_cons: + assumes + A: "offs_pred ns ub (x # xs) index key mi ma" and + B: "index key x (length ns) mi ma = i" and + C: "i < length ns" and + D: "0 < offs_num (length ns) (x # xs) index key mi ma j" + shows "offs_next ns ub (x # xs) index key mi ma j \ + offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma j" + (is "?M \ ?N") +proof - + let ?P = "offs_set_prev ns (x # xs) index key mi ma i \ {} \ + Max (offs_set_prev ns (x # xs) index key mi ma i) = j" + have "?P \ \ ?P" by blast + moreover { + assume ?P + hence "?M < ?N" + using B by (rule_tac offs_pred_next_cons_less [OF A], simp_all) + hence ?thesis by simp + } + moreover { + assume "\ ?P" + hence "?N = ?M" + by (rule_tac offs_next_cons_eq [OF B C D], blast) + hence ?thesis by simp + } + ultimately show ?thesis .. +qed + +lemma offs_pred_cons: + assumes + A: "offs_pred ns ub (x # xs) index key mi ma" and + B: "index key x (length ns) mi ma = i" and + C: "i < length ns" + shows "offs_pred (ns[i := Suc (ns ! i)]) ub xs index key mi ma" + using A +proof (simp add: offs_pred_def, rule_tac allI, rule_tac impI, simp) + let ?ns' = "ns[i := Suc (ns ! i)]" + fix j + assume + "\j < length ns. offs_num (length ns) (x # xs) index key mi ma j \ + offs_next ns ub (x # xs) index key mi ma j - ns ! j" and + "j < length ns" + hence D: "offs_num (length ns) (x # xs) index key mi ma j \ + offs_next ns ub (x # xs) index key mi ma j - ns ! j" + by simp + from B and C show "offs_num (length ns) xs index key mi ma j \ + offs_next ?ns' ub xs index key mi ma j - ?ns' ! j" + proof (cases "j = i", case_tac [2] "0 < offs_num (length ns) xs index key mi ma j", + simp_all) + assume "j = i" + hence "offs_num (length ns) xs index key mi ma i \ + offs_next ns ub (x # xs) index key mi ma i - Suc (ns ! i)" + using B and D by (simp add: offs_num_cons) + moreover have "offs_next ns ub (x # xs) index key mi ma i \ + offs_next ?ns' ub xs index key mi ma i" + using B by (rule_tac offs_pred_next_cons [OF A _ C], simp_all add: + offs_num_cons) + ultimately show "offs_num (length ns) xs index key mi ma i \ + offs_next ?ns' ub xs index key mi ma i - Suc (ns ! i)" + by simp + next + assume "j \ i" + hence "offs_num (length ns) xs index key mi ma j \ + offs_next ns ub (x # xs) index key mi ma j - ns ! 
j" + using B and D by (simp add: offs_num_cons) + moreover assume "0 < offs_num (length ns) xs index key mi ma j" + hence "offs_next ns ub (x # xs) index key mi ma j \ + offs_next ?ns' ub xs index key mi ma j" + using B by (rule_tac offs_pred_next_cons [OF A _ C], simp_all add: + offs_num_cons) + ultimately show "offs_num (length ns) xs index key mi ma j \ + offs_next ?ns' ub xs index key mi ma j - ns ! j" + by simp + qed +qed + +text \ +\null + +The next step consists of proving, as done in lemma @{text fill_count_item} in what follows, that if +certain conditions hold, particularly if offsets' list $ns$ and objects' list $xs$ satisfy predicate +@{const offs_pred}, then function @{const fill} conserves objects if called using $xs$ and $ns$ as +its input arguments. + +This lemma is proven by induction on $xs$. Hence, lemma @{thm [source] offs_pred_cons}, proven in +the previous step, is used to remove the antecedent containing predicate @{const offs_pred} from the +induction hypothesis, which has the form of an implication. + +\null +\ + +lemma offs_next_zero: + assumes + A: "i < length ns" and + B: "offs_num (length ns) xs index key mi ma i = 0" and + C: "offs_set_prev ns xs index key mi ma i = {}" + shows "offs_next ns ub xs index key mi ma 0 = + offs_next ns ub xs index key mi ma i" +proof - + have "offs_set_next ns xs index key mi ma 0 = + offs_set_next ns xs index key mi ma i" + proof (rule set_eqI, rule iffI, simp_all, (erule conjE)+, rule ccontr, simp add: + not_less) + fix j + assume D: "0 < offs_num (length ns) xs index key mi ma j" + assume "j \ i" + hence "j < i \ j = i" + by (simp add: le_less) + moreover { + assume "j < i" + hence "j \ offs_set_prev ns xs index key mi ma i" + using A and D by simp + hence False + using C by simp + } + moreover { + assume "j = i" + hence False + using B and D by simp + } + ultimately show False .. + qed + thus ?thesis + by (simp only: offs_next_def) +qed + +lemma offs_next_zero_cons_eq: + assumes + A: "index key x (length ns) mi ma = i" and + B: "offs_num (length ns) (x # xs) index key mi ma 0 = 0" and + C: "offs_set_prev ns (x # xs) index key mi ma i \ {}" + (is "?A \ _") + shows "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma 0 = + offs_next ns ub (x # xs) index key mi ma 0" +proof - + have D: "Min ?A \ ?A" + using C by (rule_tac Min_in, simp) + moreover have E: "0 < Min ?A" + proof (rule ccontr, simp) + assume "Min ?A = 0" + hence "offs_num (length ns) (x # xs) index key mi ma (Min ?A) = 0" + using B by simp + moreover have "0 < offs_num (length ns) (x # xs) index key mi ma (Min ?A)" + using D by auto + ultimately show False by simp + qed + ultimately have "Min ?A \ offs_set_next ns (x # xs) index key mi ma 0" + (is "_ \ ?B") + by auto + moreover from this have "Min ?B = Min ?A" + proof (subst Min_eq_iff, simp, blast, simp, rule_tac allI, rule_tac impI, + (erule_tac conjE)+, rule_tac ccontr, simp add: not_le) + fix j + assume F: "j < Min ?A" + have "i < length ns" + using D by simp + moreover have "Min ?A < i" + using D by auto + hence "j < i" + using F by simp + moreover assume "0 < offs_num (length ns) (x # xs) index key mi ma j" + ultimately have "j \ ?A" by simp + hence "Min ?A \ j" + by (rule_tac Min_le, simp) + thus False + using F by simp + qed + moreover have "Min ?A \ offs_set_next (ns[i := Suc (ns ! 
i)]) + xs index key mi ma 0" + (is "_ \ ?C") + using D and E by (auto, simp add: offs_num_cons A [symmetric]) + moreover from this have "Min ?C = Min ?A" + proof (subst Min_eq_iff, simp, blast, simp, rule_tac allI, rule_tac impI, + (erule_tac conjE)+, rule_tac ccontr, simp add: not_le) + fix j + assume F: "j < Min ?A" + have "i < length ns" + using D by simp + moreover have "Min ?A < i" + using D by auto + hence "j < i" + using F by simp + moreover assume "0 < offs_num (length ns) xs index key mi ma j" + hence "0 < offs_num (length ns) (x # xs) index key mi ma j" + by (simp add: offs_num_cons) + ultimately have "j \ ?A" by simp + hence "Min ?A \ j" + by (rule_tac Min_le, simp) + thus False + using F by simp + qed + moreover have "Min ?A < i" + using D by auto + ultimately show ?thesis + by (simp only: offs_next_def split: if_split, + (rule_tac conjI, blast, rule_tac impI)+, simp) +qed + +lemma offs_next_zero_cons_neq: + assumes + A: "index key x (length ns) mi ma = i" and + B: "i < length ns" and + C: "0 < i" and + D: "offs_set_prev ns (x # xs) index key mi ma i = {}" + shows "offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma 0 = + (if 0 < offs_num (length ns) xs index key mi ma i + then Suc (ns ! i) + else offs_next ns ub (x # xs) index key mi ma i)" +proof (simp, rule conjI, rule_tac [!] impI) + let ?ns' = "ns[i := Suc (ns ! i)]" + assume "0 < offs_num (length ns) xs index key mi ma i" + with A have "offs_set_next ?ns' xs index key mi ma 0 = + offs_set_next ns (x # xs) index key mi ma 0" + by (rule_tac set_eqI, rule_tac iffI, simp_all add: offs_num_cons split: + if_split_asm) + moreover have "i \ offs_set_next ns (x # xs) index key mi ma 0" + using A and B and C by (simp add: offs_num_cons) + moreover from this have + "Min (offs_set_next ns (x # xs) index key mi ma 0) = i" + proof (subst Min_eq_iff, simp, blast, simp, rule_tac allI, rule_tac impI, + (erule_tac conjE)+, rule_tac ccontr, simp add: not_le) + fix j + assume "j < i" and "0 < offs_num (length ns) (x # xs) index key mi ma j" + hence "j \ offs_set_prev ns (x # xs) index key mi ma i" + using B by simp + thus False + using D by simp + qed + ultimately show "offs_next ?ns' ub xs index key mi ma 0 = Suc (ns ! i)" + by (simp only: offs_next_def split: if_split, rule_tac conjI, rule_tac [!] impI, + simp_all) +next + let ?ns' = "ns[i := Suc (ns ! i)]" + assume "offs_num (length ns) xs index key mi ma i = 0" + moreover have "offs_set_prev ?ns' xs index key mi ma i = {}" + using D by (simp add: offs_num_cons split: if_split_asm, blast) + ultimately have "offs_next ?ns' ub xs index key mi ma 0 = + offs_next ?ns' ub xs index key mi ma i" + using B by (rule_tac offs_next_zero, simp_all) + moreover have "offs_next ?ns' ub xs index key mi ma i = + offs_next ns ub (x # xs) index key mi ma i" + using A and B and D by (rule_tac offs_next_cons_eq, simp_all add: + offs_num_cons) + ultimately show "offs_next ?ns' ub xs index key mi ma 0 = + offs_next ns ub (x # xs) index key mi ma i" + by simp +qed + +lemma offs_pred_zero_cons_less: + assumes + A: "offs_pred ns ub (x # xs) index key mi ma" and + B: "index key x (length ns) mi ma = i" and + C: "i < length ns" and + D: "0 < i" and + E: "offs_set_prev ns (x # xs) index key mi ma i = {}" + shows "offs_next ns ub (x # xs) index key mi ma 0 < + offs_next (ns[i := Suc (ns ! 
i)]) ub xs index key mi ma 0" + (is "?M < ?N") +proof - + have "i \ offs_set_next ns (x # xs) index key mi ma 0" + using B and C and D by (simp add: offs_num_cons) + moreover from this have + "Min (offs_set_next ns (x # xs) index key mi ma 0) = i" + proof (subst Min_eq_iff, simp, blast, simp, rule_tac allI, rule_tac impI, + (erule_tac conjE)+, rule_tac ccontr, simp add: not_le) + fix j + assume "j < i" and "0 < offs_num (length ns) (x # xs) index key mi ma j" + hence "j \ offs_set_prev ns (x # xs) index key mi ma i" + using C by simp + thus False + using E by simp + qed + ultimately have F: "?M = ns ! i" + by (simp only: offs_next_def split: if_split, blast) + have "?N = (if 0 < offs_num (length ns) xs index key mi ma i + then Suc (ns ! i) + else offs_next ns ub (x # xs) index key mi ma i)" + using B and C and D and E by (rule offs_next_zero_cons_neq) + thus ?thesis + proof (split if_split_asm) + assume "?N = Suc (ns ! i)" + thus ?thesis + using F by simp + next + assume "?N = offs_next ns ub (x # xs) index key mi ma i" + moreover have "ns ! i < offs_next ns ub (x # xs) index key mi ma i" + using B by (rule_tac offs_pred_next [OF A C], simp add: offs_num_cons) + ultimately show ?thesis + using F by simp + qed +qed + +lemma offs_pred_zero_cons: + assumes + A: "offs_pred ns ub (x # xs) index key mi ma" and + B: "index key x (length ns) mi ma = i" and + C: "i < length ns" and + D: "offs_num (length ns) (x # xs) index key mi ma 0 = 0" + shows "offs_next ns ub (x # xs) index key mi ma 0 \ + offs_next (ns[i := Suc (ns ! i)]) ub xs index key mi ma 0" + (is "?M \ ?N") +proof (cases "offs_set_prev ns (x # xs) index key mi ma i = {}") + case True + have "0 < i" + using B and D by (rule_tac ccontr, simp add: offs_num_cons) + hence "?M < ?N" + using True and B by (rule_tac offs_pred_zero_cons_less [OF A _ C]) + thus ?thesis by simp +next + case False + hence "?N = ?M" + by (rule offs_next_zero_cons_eq [OF B D]) + thus ?thesis by simp +qed + +lemma replicate_count: + "count (mset (replicate n x)) x = n" +by (induction n, simp_all) + +lemma fill_none [rule_format]: + assumes A: "index_less index key" + shows + "(\x \ set xs. key x \ {mi..ma}) \ + ns \ [] \ + offs_pred ns ub xs index key mi ma \ + offs_none ns ub xs index key mi ma i \ + fill xs ns index key ub mi ma ! i = None" +proof (induction xs arbitrary: ns, simp add: offs_none_def offs_num_def offs_next_def, + (rule impI)+, simp add: Let_def, (erule conjE)+) + fix x xs and ns :: "nat list" + let ?i' = "index key x (length ns) mi ma" + let ?ns' = "ns[?i' := Suc (ns ! ?i')]" + assume + B: "offs_pred ns ub (x # xs) index key mi ma" and + C: "offs_none ns ub (x # xs) index key mi ma i" + assume + D: "ns \ []" and "mi \ key x" and "key x \ ma" + moreover from this have + E: "?i' < length ns" + using A by (simp add: index_less_def) + hence "offs_pred ?ns' ub xs index key mi ma" + by (rule_tac offs_pred_cons [OF B], simp) + moreover assume "\ns. ns \ [] \ offs_pred ns ub xs index key mi ma \ + offs_none ns ub xs index key mi ma i \ + fill xs ns index key ub mi ma ! i = None" + ultimately have + F: "offs_none ?ns' ub xs index key mi ma i \ + fill xs ?ns' index key ub mi ma ! i = None" + by simp + show "(fill xs ?ns' index key ub mi ma)[ns ! ?i' := Some x] ! 
i = None" + proof (insert C, simp add: offs_none_def, erule disjE, erule_tac [2] disjE, simp_all + add: offs_num_cons split: if_split_asm, erule conjE, rule case_split, drule mp, + assumption, simp_all, (erule conjE)+, (erule_tac [2] conjE)+, + erule_tac [3] conjE, erule_tac [5] conjE) + fix j + assume + G: "?i' = j" and + H: "j < length ns" and + I: "Suc (ns ! j + offs_num (length ns) xs index key mi ma j) \ i" and + J: "i < offs_next ns ub (x # xs) index key mi ma j" + show "fill xs (ns[j := Suc (ns ! j)]) index key ub mi ma ! i = None" + proof (cases "0 < offs_num (length ns) xs index key mi ma j", + case_tac [2] "offs_set_prev ns (x # xs) index key mi ma j \ {}", + simp_all only: not_not not_gr0) + have "j < length ns \ 0 < offs_num (length ns) xs index key mi ma j \ + ?ns' ! j + offs_num (length ns) xs index key mi ma j \ i \ + i < offs_next ?ns' ub xs index key mi ma j \ + fill xs ?ns' index key ub mi ma ! i = None" + using F by (simp add: offs_none_def, blast) + moreover assume "0 < offs_num (length ns) xs index key mi ma j" + moreover have "?ns' ! j + offs_num (length ns) xs index key mi ma j \ i" + using G and H and I by simp + moreover have "0 < offs_num (length ns) (x # xs) index key mi ma j" + using G by (simp add: offs_num_cons) + hence "offs_next ns ub (x # xs) index key mi ma j \ + offs_next ?ns' ub xs index key mi ma j" + using G and H by (rule_tac offs_pred_next_cons [OF B], simp_all) + hence "i < offs_next ?ns' ub xs index key mi ma j" + using J by simp + ultimately have "fill xs ?ns' index key ub mi ma ! i = None" + using H by blast + thus ?thesis + using G by simp + next + let ?j' = "Max (offs_set_prev ns (x # xs) index key mi ma j)" + have "?j' < length ns \ 0 < offs_num (length ns) xs index key mi ma ?j' \ + ?ns' ! ?j' + offs_num (length ns) xs index key mi ma ?j' \ i \ + i < offs_next ?ns' ub xs index key mi ma ?j' \ + fill xs ?ns' index key ub mi ma ! i = None" + using F by (simp add: offs_none_def, blast) + moreover assume K: "offs_set_prev ns (x # xs) index key mi ma j \ {}" + hence "?j' \ offs_set_prev ns (x # xs) index key mi ma j" + by (rule_tac Max_in, simp) + hence L: "?j' < length ns \ ?j' < j \ + 0 < offs_num (length ns) xs index key mi ma ?j'" + using G by (auto, subst (asm) (2) offs_num_cons, simp) + moreover have "ns ! ?j' + offs_num (length ns) (x # xs) + index key mi ma ?j' \ ns ! j" + using G and H and L by (rule_tac offs_pred_asc [OF B], simp_all add: + offs_num_cons) + hence "?ns' ! ?j' + offs_num (length ns) xs index key mi ma ?j' \ ns ! j" + using G and H and L by (subst nth_list_update, simp_all add: offs_num_cons) + hence "?ns' ! ?j' + offs_num (length ns) xs index key mi ma ?j' \ i" + using I by simp + moreover assume M: "offs_num (length ns) xs index key mi ma j = 0" + have "offs_next (ns[j := Suc (ns ! j)]) ub xs index key mi ma ?j' = + (if 0 < offs_num (length ns) xs index key mi ma j + then Suc (ns ! j) + else offs_next ns ub (x # xs) index key mi ma j)" + using G and K by (rule_tac offs_next_cons_neq, simp_all) + hence "offs_next ?ns' ub xs index key mi ma ?j' = + offs_next ns ub (x # xs) index key mi ma j" + using G and M by simp + hence "i < offs_next ?ns' ub xs index key mi ma ?j'" + using J by simp + ultimately have "fill xs ?ns' index key ub mi ma ! i = None" by blast + thus ?thesis + using G by simp + next + have "offs_num (length ns) xs index key mi ma 0 = 0 \ + i < offs_next ?ns' ub xs index key mi ma 0 \ + fill xs ?ns' index key ub mi ma ! 
i = None" + using F by (simp add: offs_none_def) + moreover assume + K: "offs_set_prev ns (x # xs) index key mi ma j = {}" and + L: "offs_num (length ns) xs index key mi ma j = 0" + have "offs_set_prev ns (x # xs) index key mi ma j = + offs_set_prev ?ns' xs index key mi ma j" + using G by (rule_tac set_eqI, rule_tac iffI, + simp_all add: offs_num_cons split: if_split_asm) + hence M: "offs_set_prev ?ns' xs index key mi ma j = {}" + using K by simp + hence "offs_num (length ns) xs index key mi ma 0 = 0" + using H and L by (cases j, simp_all) + moreover have N: "offs_next ?ns' ub xs index key mi ma 0 = + offs_next ?ns' ub xs index key mi ma j" + using H and L and M by (rule_tac offs_next_zero, simp_all) + have "offs_next ?ns' ub xs index key mi ma j = + offs_next ns ub (x # xs) index key mi ma j" + using G and H and K by (subst offs_next_cons_eq, simp_all add: + offs_num_cons) + hence "i < offs_next ?ns' ub xs index key mi ma 0" + using J and N by simp + ultimately have "fill xs ?ns' index key ub mi ma ! i = None" by blast + thus ?thesis + using G by simp + qed + next + fix j + assume + G: "?i' \ j" and + H: "j < length ns" and + I: "ns ! j + offs_num (length ns) xs index key mi ma j \ i" and + J: "i < offs_next ns ub (x # xs) index key mi ma j" and + K: "0 < offs_num (length ns) xs index key mi ma j" + from G have "ns ! ?i' \ i" + proof (rule_tac notI, cases "?i' < j", simp_all add: not_less le_less) + assume "?i' < j" + hence "ns ! ?i' + offs_num (length ns) (x # xs) index key mi ma ?i' \ ns ! j" + using H and K by (rule_tac offs_pred_asc [OF B], simp_all add: + offs_num_cons) + moreover assume "ns ! ?i' = i" + ultimately have "i < ns ! j" by (simp add: offs_num_cons) + thus False + using I by simp + next + assume "j < ?i'" + hence L: "?i' \ offs_set_next ns (x # xs) index key mi ma j" + (is "_ \ ?A") + using E by (simp add: offs_num_cons) + hence "Min ?A \ ?i'" + by (rule_tac Min_le, simp) + hence "Min ?A < ?i' \ Min ?A = ?i'" + by (simp add: le_less) + hence "ns ! Min ?A \ ns ! ?i'" + proof (rule disjE, simp_all) + assume "Min ?A < ?i'" + moreover have "Min ?A \ ?A" + using L by (rule_tac Min_in, simp, blast) + hence "0 < offs_num (length ns) (x # xs) index key mi ma (Min ?A)" + by simp + ultimately have "ns ! Min ?A + offs_num (length ns) (x # xs) + index key mi ma (Min ?A) \ ns ! ?i'" + using E by (rule_tac offs_pred_asc [OF B], simp_all add: offs_num_cons) + thus ?thesis by simp + qed + hence "offs_next ns ub (x # xs) index key mi ma j \ ns ! ?i'" + using L by (simp only: offs_next_def split: if_split, blast) + moreover assume "ns ! ?i' = i" + ultimately show False + using J by simp + qed + thus "(fill xs ?ns' index key ub mi ma)[ns ! ?i' := Some x] ! i = None" + proof simp + have "j < length ns \ 0 < offs_num (length ns) xs index key mi ma j \ + ?ns' ! j + offs_num (length ns) xs index key mi ma j \ i \ + i < offs_next ?ns' ub xs index key mi ma j \ + fill xs ?ns' index key ub mi ma ! i = None" + using F by (simp add: offs_none_def, blast) + moreover have "offs_next ns ub (x # xs) index key mi ma j \ + offs_next ?ns' ub xs index key mi ma j" + using E and K by (rule_tac offs_pred_next_cons [OF B], simp_all add: + offs_num_cons) + hence "i < offs_next ?ns' ub xs index key mi ma j" + using J by simp + ultimately show "fill xs ?ns' index key ub mi ma ! i = None" + using G and H and I and K by simp + qed + next + assume + G: "0 < ?i'" and + H: "offs_num (length ns) xs index key mi ma 0 = 0" and + I: "i < offs_next ns ub (x # xs) index key mi ma 0" + have "ns ! 
?i' \ i" + proof + have "0 < offs_num (length ns) (x # xs) index key mi ma ?i'" + by (simp add: offs_num_cons) + hence L: "?i' \ offs_set_next ns (x # xs) index key mi ma 0" + (is "_ \ ?A") + using E and G by simp + hence "Min ?A \ ?i'" + by (rule_tac Min_le, simp) + hence "Min ?A < ?i' \ Min ?A = ?i'" + by (simp add: le_less) + hence "ns ! Min ?A \ ns ! ?i'" + proof (rule disjE, simp_all) + assume "Min ?A < ?i'" + moreover have "Min ?A \ ?A" + using L by (rule_tac Min_in, simp, blast) + hence "0 < offs_num (length ns) (x # xs) index key mi ma (Min ?A)" + by simp + ultimately have "ns ! Min ?A + offs_num (length ns) (x # xs) + index key mi ma (Min ?A) \ ns ! ?i'" + using E by (rule_tac offs_pred_asc [OF B], simp_all add: offs_num_cons) + thus ?thesis by simp + qed + hence "offs_next ns ub (x # xs) index key mi ma 0 \ ns ! ?i'" + using L by (simp only: offs_next_def split: if_split, blast) + moreover assume "ns ! ?i' = i" + ultimately show False + using I by simp + qed + thus "(fill xs ?ns' index key ub mi ma)[ns ! ?i' := Some x] ! i = None" + proof simp + have "offs_num (length ns) xs index key mi ma 0 = 0 \ + i < offs_next ?ns' ub xs index key mi ma 0 \ + fill xs ?ns' index key ub mi ma ! i = None" + using F by (simp add: offs_none_def) + moreover have "offs_next ns ub (x # xs) index key mi ma 0 \ + offs_next ?ns' ub xs index key mi ma 0" + using E and G and H by (rule_tac offs_pred_zero_cons [OF B], + simp_all add: offs_num_cons) + hence "i < offs_next ?ns' ub xs index key mi ma 0" + using I by simp + ultimately show "fill xs ?ns' index key ub mi ma ! i = None" + using H by simp + qed + next + assume + G: "?i' = 0" and + H: "i < ns ! 0" + show "fill xs (ns[0 := Suc (ns ! 0)]) index key ub mi ma ! i = None" + proof (cases "0 < offs_num (length ns) xs index key mi ma 0", simp_all) + have "0 < offs_num (length ns) xs index key mi ma 0 \ i < ?ns' ! 0 \ + fill xs ?ns' index key ub mi ma ! i = None" + using F by (simp add: offs_none_def) + moreover assume "0 < offs_num (length ns) xs index key mi ma 0" + moreover have "i < ?ns' ! 0" + using D and G and H by simp + ultimately show ?thesis + using G by simp + next + have "offs_num (length ns) xs index key mi ma 0 = 0 \ + i < offs_next ?ns' ub xs index key mi ma 0 \ + fill xs ?ns' index key ub mi ma ! i = None" + using F by (simp add: offs_none_def) + moreover assume "offs_num (length ns) xs index key mi ma 0 = 0" + moreover have I: "offs_next ?ns' ub xs index key mi ma 0 = + offs_next ns ub (x # xs) index key mi ma 0" + using D and G by (rule_tac offs_next_cons_eq, simp_all add: + offs_num_cons) + have "ns ! 0 < offs_next ns ub (x # xs) index key mi ma 0" + using D and G by (rule_tac offs_pred_next [OF B], simp_all add: + offs_num_cons) + hence "i < offs_next ?ns' ub xs index key mi ma 0" + using H and I by simp + ultimately show ?thesis + using G by simp + qed + next + assume + G: "0 < ?i'" and + H: "0 < offs_num (length ns) xs index key mi ma 0" and + I: "i < ns ! 0" + have "ns ! ?i' \ i" + proof - + have "ns ! 0 + offs_num (length ns) (x # xs) index key mi ma 0 \ ns ! ?i'" + using H by (rule_tac offs_pred_asc [OF B G E], simp_all add: + offs_num_cons) + moreover have "0 < offs_num (length ns) (x # xs) index key mi ma 0" + using H by (simp add: offs_num_cons) + ultimately show ?thesis + using I by simp + qed + thus "(fill xs ?ns' index key ub mi ma)[ns ! ?i' := Some x] ! i = None" + proof simp + have "0 < offs_num (length ns) xs index key mi ma 0 \ i < ?ns' ! 0 \ + fill xs ?ns' index key ub mi ma ! 
i = None" + using F by (simp add: offs_none_def) + moreover have "i < ?ns' ! 0" + using G and I by simp + ultimately show "fill xs ?ns' index key ub mi ma ! i = None" + using H by simp + qed + qed +qed + +lemma fill_index_none [rule_format]: + assumes + A: "index_less index key" and + B: "key x \ {mi..ma}" and + C: "ns \ []" and + D: "offs_pred ns ub (x # xs) index key mi ma" + shows "\x \ set xs. key x \ {mi..ma} \ + fill xs (ns[(index key x (length ns) mi ma) := + Suc (ns ! index key x (length ns) mi ma)]) index key ub mi ma ! + (ns ! index key x (length ns) mi ma) = None" + (is "_ \ fill _ ?ns' _ _ _ _ _ ! (_ ! ?i) = _") + using A and B and C and D +proof (rule_tac fill_none, simp_all, rule_tac offs_pred_cons, + simp_all, simp add: index_less_def, cases "0 < ?i", + cases "offs_set_prev ns (x # xs) index key mi ma ?i = {}", + case_tac [3] "0 < offs_num (length ns) xs index key mi ma 0") + assume + E: "0 < ?i" and + F: "offs_set_prev ns (x # xs) index key mi ma ?i = {}" + have G: "?i < length ns" + using A and B and C by (simp add: index_less_def) + hence "offs_num (length ns) (x # xs) index key mi ma 0 = 0" + using E and F by (rule_tac ccontr, simp) + hence "offs_num (length ns) xs index key mi ma 0 = 0" + by (simp add: offs_num_cons split: if_split_asm) + moreover have "offs_next ?ns' ub xs index key mi ma 0 = + (if 0 < offs_num (length ns) xs index key mi ma ?i + then Suc (ns ! ?i) + else offs_next ns ub (x # xs) index key mi ma ?i)" + using E and F and G by (rule_tac offs_next_zero_cons_neq, simp_all) + hence "ns ! ?i < offs_next ?ns' ub xs index key mi ma 0" + by (simp split: if_split_asm, rule_tac offs_pred_next [OF D G], simp add: + offs_num_cons) + ultimately show "offs_none ?ns' ub xs index key mi ma (ns ! ?i)" + by (simp add: offs_none_def) +next + assume + E: "0 < ?i" and + F: "offs_set_prev ns (x # xs) index key mi ma ?i \ {}" + (is "?A \ _") + have G: "?i < length ns" + using A and B and C by (simp add: index_less_def) + have H: "Max ?A \ ?A" + using F by (rule_tac Max_in, simp) + hence I: "Max ?A < ?i" by blast + have "Max ?A < length ns" + using H by auto + moreover have "0 < offs_num (length ns) (x # xs) index key mi ma (Max ?A)" + using H by auto + hence "0 < offs_num (length ns) xs index key mi ma (Max ?A)" + using I by (subst (asm) offs_num_cons, split if_split_asm, simp_all) + moreover have "ns ! Max ?A + offs_num (length ns) (x # xs) + index key mi ma (Max ?A) \ ns ! ?i" + using G and H by (rule_tac offs_pred_asc [OF D], simp_all add: offs_num_cons) + hence "?ns' ! Max ?A + offs_num (length ns) xs + index key mi ma (Max ?A) \ ns ! ?i" + using I by (simp add: offs_num_cons) + moreover have "offs_next ?ns' ub xs index key mi ma (Max ?A) = + (if 0 < offs_num (length ns) xs index key mi ma ?i + then Suc (ns ! ?i) + else offs_next ns ub (x # xs) index key mi ma ?i)" + using F and I by (rule_tac offs_next_cons_neq, simp_all) + hence "ns ! ?i < offs_next ?ns' ub xs index key mi ma (Max ?A)" + by (simp split: if_split_asm, rule_tac offs_pred_next [OF D G], simp add: + offs_num_cons) + ultimately show "offs_none ?ns' ub xs index key mi ma (ns ! ?i)" + by (simp add: offs_none_def, blast) +next + assume "0 < offs_num (length ns) xs index key mi ma 0" and "\ 0 < ?i" + moreover have "?i < length ns" + using A and B and C by (simp add: index_less_def) + ultimately show "offs_none ?ns' ub xs index key mi ma (ns ! 
?i)" + by (simp add: offs_none_def) +next + assume + E: "\ 0 < ?i" and + F: "\ 0 < offs_num (length ns) xs index key mi ma 0" + have G: "?i < length ns" + using A and B and C by (simp add: index_less_def) + have "offs_num (length ns) xs index key mi ma 0 = 0" + using F by simp + moreover have "offs_next ?ns' ub xs index key mi ma ?i = + offs_next ns ub (x # xs) index key mi ma ?i" + using E and G by (rule_tac offs_next_cons_eq, simp_all add: offs_num_cons) + hence "ns ! ?i < offs_next ?ns' ub xs index key mi ma ?i" + by (simp, rule_tac offs_pred_next [OF D G], simp add: offs_num_cons) + hence "ns ! ?i < offs_next ?ns' ub xs index key mi ma 0" + using E by simp + ultimately show "offs_none ?ns' ub xs index key mi ma (ns ! ?i)" + by (simp add: offs_none_def) +qed + +lemma fill_count_item [rule_format]: + assumes A: "index_less index key" + shows + "(\x \ set xs. key x \ {mi..ma}) \ + ns \ [] \ + offs_pred ns ub xs index key mi ma \ + length xs \ ub \ + count (mset (map the (fill xs ns index key ub mi ma))) x = + count (mset xs) x + (if the None = x then ub - length xs else 0)" +proof (induction xs arbitrary: ns, simp add: replicate_count, (rule impI)+, + simp add: Let_def map_update del: count_add_mset mset_map split del: if_split, + (erule conjE)+, subst add_mset_add_single, simp only: count_single count_union) + fix y xs and ns :: "nat list" + let ?i = "index key y (length ns) mi ma" + let ?ns' = "ns[?i := Suc (ns ! ?i)]" + assume + B: "\x \ set xs. mi \ key x \ key x \ ma" and + C: "mi \ key y" and + D: "key y \ ma" and + E: "ns \ []" and + F: "offs_pred ns ub (y # xs) index key mi ma" and + G: "Suc (length xs) \ ub" + have H: "?i < length ns" + using A and C and D and E by (simp add: index_less_def) + assume "\ns. ns \ [] \ offs_pred ns ub xs index key mi ma \ + count (mset (map the (fill xs ns index key ub mi ma))) x = + count (mset xs) x + (if the None = x then ub - length xs else 0)" + moreover have "offs_pred ?ns' ub xs index key mi ma" + using F and H by (rule_tac offs_pred_cons, simp_all) + ultimately have "count (mset (map the (fill xs ?ns' index key ub mi ma))) x = + count (mset xs) x + (if the None = x then ub - length xs else 0)" + using E by simp + moreover have "ns ! ?i + offs_num (length ns) (y # xs) + index key mi ma ?i \ ub" + using F and H by (rule offs_pred_ub, simp add: offs_num_cons) + hence "ns ! ?i < ub" + by (simp add: offs_num_cons) + ultimately show "count (mset ((map the (fill xs ?ns' index key ub mi ma)) + [ns ! ?i := y])) x = count (mset xs) x + (if y = x then 1 else 0) + + (if the None = x then ub - length (y # xs) else 0)" + proof (subst mset_update, simp add: fill_length, subst add_mset_add_single, simp + only: count_diff count_single count_union, subst nth_map, simp add: fill_length, + subst add.assoc, subst (3) add.commute, subst add.assoc [symmetric], + subst add_right_cancel) + have "fill xs ?ns' index key ub mi ma ! (ns ! ?i) = None" + using B and C and D and E by (rule_tac fill_index_none [OF A _ _ F], + simp_all) + thus "count (mset xs) x + (if the None = x then ub - length xs else 0) - + (if the (fill xs ?ns' index key ub mi ma ! (ns ! ?i)) = x then 1 else 0) = + count (mset xs) x + (if the None = x then ub - length (y # xs) else 0)" + using G by simp + qed +qed + +text \ +\null + +Finally, lemma @{text offs_enum_pred} here below proves that, if $ns$ is the offsets' list obtained +by applying the composition of functions @{const offs} and @{const enum} to objects' list $xs$, then +predicate @{const offs_pred} is satisfied by $ns$ and $xs$. 
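+
+As an aside, a small numeric example, whose values are of course merely illustrative and not part
+of the formal development, may help to visualize how these functions interact. Suppose that the
+counters' list computed by function @{const enum} for a five-object list $xs$ is $[2, 0, 3]$, viz.
+two objects fall in the first bucket, none in the second one, and three in the third one. By lemma
+@{thm [source] offs_add} proven below, each element of the offsets' list returned by function
+@{const offs} is the sum of the initial offset and the preceding counters, so that @{const offs}
+applied to this counters' list with initial offset 0 yields $[0, 2, 2]$. Accordingly, function
+@{const fill} places the objects of the first bucket at positions 0 and 1 and those of the third
+bucket at positions 2, 3, and 4, whereas the empty second bucket takes up no room. Predicate
+@{const offs_pred} captures exactly this compatibility between offsets and counters, namely that
+each bucket is allotted enough room before the offset of the next non-empty bucket, or before the
+upper bound $ub$ if no such bucket exists, which is precisely what lemma @{text offs_enum_pred}
+guarantees for the offsets' list computed by composing @{const offs} with @{const enum}.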
+ +This result is in turn used, together with lemma @{thm [source] fill_count_item}, to prove lemma +@{text fill_offs_enum_count_item}, which states that function @{const fill} conserves objects if its +input offsets' list is computed via the composition of functions @{const offs} and @{const enum}. + +\null +\ + +lemma enum_offs_num: + "i < n \ enum xs index key n mi ma ! i = offs_num n xs index key mi ma i" +by (induction xs, simp add: offs_num_def, simp add: Let_def offs_num_cons, + subst nth_list_update_eq, simp_all add: enum_length) + +lemma offs_length: + "length (offs ns i) = length ns" +by (induction ns arbitrary: i, simp_all) + +lemma offs_add [rule_format]: + "i < length ns \ offs ns k ! i = foldl (+) k (take i ns)" +by (induction ns arbitrary: i k, simp, simp add: nth_Cons split: nat.split) + +lemma offs_mono_aux: + "i \ j \ j < length ns \ offs ns k ! i \ offs ns k ! (i + (j - i))" +by (simp only: offs_add take_add, simp add: add_le) + +lemma offs_mono: + "i \ j \ j < length ns \ offs ns k ! i \ offs ns k ! j" +by (frule offs_mono_aux, simp_all) + +lemma offs_update: + "j < length ns \ + offs (ns[i := Suc (ns ! i)]) k ! j = (if j \ i then id else Suc) (offs ns k ! j)" +by (simp add: offs_add not_le take_update_swap, rule impI, subst nth_take [symmetric], + assumption, subst add_update, simp_all) + +lemma offs_equal_suc: + assumes + A: "Suc i < length ns" and + B: "ns ! i = 0" + shows "offs ns m ! i = offs ns m ! Suc i" +proof - + have "offs ns m ! i = foldl (+) m (take i ns)" + using A by (subst offs_add, simp_all) + also have "\ = foldl (+) m (take i ns @ [ns ! i])" + using B by simp + also have "\ = foldl (+) m (take (Suc i) ns)" + using A by (subst take_Suc_conv_app_nth, simp_all) + also have "\ = offs ns m ! Suc i" + using A by (subst offs_add, simp_all) + finally show ?thesis . +qed + +lemma offs_equal [rule_format]: + "i < j \ j < length ns \ + (\k \ {i.. offs ns m ! i = offs ns m ! j" +proof (erule strict_inc_induct, rule_tac [!] impI, simp_all, erule offs_equal_suc, simp) + fix i + assume A: "i < j" and "j < length ns" + hence "Suc i < length ns" by simp + moreover assume "\k \ {i.. = offs ns m ! j" + finally show "offs ns m ! i = offs ns m ! j" . +qed + +lemma offs_enum_last [rule_format]: + assumes + A: "index_less index key" and + B: "0 < n" and + C: "\x \ set xs. key x \ {mi..ma}" + shows "offs (enum xs index key n mi ma) k ! (n - Suc 0) + + offs_num n xs index key mi ma (n - Suc 0) = length xs + k" +proof - + let ?ns = "enum xs index key n mi ma" + from B have D: "last ?ns = offs_num n xs index key mi ma (n - Suc 0)" + by (subst last_conv_nth, subst length_0_conv [symmetric], simp_all add: + enum_length, subst enum_offs_num, simp_all) + have "offs ?ns k ! (n - Suc 0) = foldl (+) k (take (n - Suc 0) ?ns)" + using B by (rule_tac offs_add, simp add: enum_length) + also have "\ = foldl (+) k (butlast ?ns)" + by (simp add: butlast_conv_take enum_length) + finally have "offs ?ns k ! (n - Suc 0) + offs_num n xs index key mi ma + (n - Suc 0) = foldl (+) k (butlast ?ns @ [last ?ns])" + using D by simp + also have "\ = foldl (+) k ?ns" + using B by (subst append_butlast_last_id, subst length_0_conv [symmetric], + simp_all add: enum_length) + also have "\ = foldl (+) 0 ?ns + k" + by (rule add_base_zero) + also have "\ = length xs + k" + using A and B and C by (subst enum_add, simp_all) + finally show ?thesis . +qed + +lemma offs_enum_ub [rule_format]: + assumes + A: "index_less index key" and + B: "i < n" and + C: "\x \ set xs. 
key x \ {mi..ma}" + shows "offs (enum xs index key n mi ma) k ! i \ length xs + k" +proof - + let ?ns = "enum xs index key n mi ma" + have "offs ?ns k ! i \ offs ?ns k ! (n - Suc 0)" + using B by (rule_tac offs_mono, simp_all add: enum_length) + also have "\ \ offs ?ns k ! (n - Suc 0) + + offs_num n xs index key mi ma (n - Suc 0)" + by simp + also have "\ = length xs + k" + using A and B and C by (rule_tac offs_enum_last, simp_all) + finally show ?thesis . +qed + +lemma offs_enum_next_ge [rule_format]: + assumes + A: "index_less index key" and + B: "i < n" + shows "\x \ set xs. key x \ {mi..ma} \ + offs (enum xs index key n mi ma) k ! i \ + offs_next (offs (enum xs index key n mi ma) k) (length xs + k) + xs index key mi ma i" + (is "_ \ offs ?ns _ ! _ \ _") +proof (simp only: offs_next_def split: if_split, rule conjI, rule_tac [!] impI, + rule offs_enum_ub [OF A B], simp) + assume "offs_set_next (offs ?ns k) xs index key mi ma i \ {}" + (is "?A \ _") + hence C: "Min ?A \ ?A" + by (rule_tac Min_in, simp) + hence "i \ Min ?A" by simp + moreover have "Min ?A < length ?ns" + using C by (simp add: offs_length) + ultimately show "offs ?ns k ! i \ offs ?ns k ! Min ?A" + by (rule offs_mono) +qed + +lemma offs_enum_zero_aux [rule_format]: + "\index_less index key; 0 < n; \x \ set xs. key x \ {mi..ma}; + offs_num n xs index key mi ma (n - Suc 0) = 0\ \ + offs (enum xs index key n mi ma) k ! (n - Suc 0) = length xs + k" +by (subst offs_enum_last [where key = key and mi = mi and ma = ma, + symmetric], simp+) + +lemma offs_enum_zero [rule_format]: + assumes + A: "index_less index key" and + B: "i < n" and + C: "\x \ set xs. key x \ {mi..ma}" and + D: "offs_num n xs index key mi ma i = 0" + shows "offs (enum xs index key n mi ma) k ! i = + offs_next (offs (enum xs index key n mi ma) k) (length xs + k) + xs index key mi ma i" +proof (simp only: offs_next_def split: if_split, rule conjI, rule_tac [!] impI, + cases "i = n - Suc 0", simp) + assume "i = n - Suc 0" + thus "offs (enum xs index key n mi ma) k ! (n - Suc 0) = length xs + k" + using A and B and C and D by (rule_tac offs_enum_zero_aux, simp_all) +next + let ?ns = "enum xs index key n mi ma" + assume E: "offs_set_next (offs ?ns k) xs index key mi ma i = {}" + (is "?A = _") + assume "i \ n - Suc 0" + hence F: "i < n - Suc 0" + using B by simp + hence "offs ?ns k ! i = offs ?ns k ! (n - Suc 0)" + proof (rule offs_equal, simp_all add: enum_length le_less, + erule_tac conjE, erule_tac disjE, rule_tac ccontr, drule_tac [2] sym, simp_all) + fix j + assume G: "j < n - Suc 0" + hence "j < length (offs ?ns k)" + by (simp add: offs_length enum_length) + moreover assume "i < j" + moreover assume "0 < ?ns ! j" + hence "0 < offs_num (length (offs ?ns k)) xs index key mi ma j" + using G by (subst (asm) enum_offs_num, simp_all add: + offs_length enum_length) + ultimately have "j \ ?A" by simp + thus False + using E by simp + next + show "?ns ! i = 0" + using B and D by (subst enum_offs_num, simp_all) + qed + also from A and B and C have "\ = length xs + k" + proof (rule_tac offs_enum_zero_aux, simp_all, rule_tac ccontr, simp) + have "n - Suc 0 < length (offs ?ns k)" + using B by (simp add: offs_length enum_length) + moreover assume "0 < offs_num n xs index key mi ma (n - Suc 0)" + hence "0 < offs_num (length (offs ?ns k)) xs index key mi ma (n - Suc 0)" + by (simp add: offs_length enum_length) + ultimately have "n - Suc 0 \ ?A" + using F by simp + thus False + using E by simp + qed + finally show "offs (enum xs index key n mi ma) k ! i = length xs + k" . 
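+  (* The remaining case, dealt with below, is that of a non-empty offs_set_next, where the offset
+     at position i coincides with that of the leftmost non-empty bucket to its right. *)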
+next + let ?ns = "enum xs index key n mi ma" + assume "offs_set_next (offs ?ns k) xs index key mi ma i \ {}" + (is "?A \ _") + hence "Min ?A \ ?A" + by (rule_tac Min_in, simp) + thus "offs ?ns k ! i = offs ?ns k ! Min ?A" + proof (rule_tac offs_equal, simp_all add: le_less, simp add: offs_length, + (erule_tac conjE)+, erule_tac disjE, rule_tac ccontr, drule_tac [2] sym, simp_all) + fix j + assume E: "j < Min ?A" and "Min ?A < length (offs ?ns k)" + hence F: "j < length (offs ?ns k)" by simp + moreover assume "i < j" + moreover assume "0 < ?ns ! j" + hence "0 < offs_num (length (offs ?ns k)) xs index key mi ma j" + using F by (subst (asm) enum_offs_num, simp_all add: + offs_length enum_length) + ultimately have "j \ ?A" by simp + hence "Min ?A \ j" + by (rule_tac Min_le, simp) + thus False + using E by simp + next + show "?ns ! i = 0" + using B and D by (subst enum_offs_num, simp_all) + qed +qed + +lemma offs_enum_next_cons [rule_format]: + assumes + A: "index_less index key" and + B: "\x \ set xs. key x \ {mi..ma}" + shows "(if i < index key x n mi ma then (\) else (<)) + (offs_next (offs (enum xs index key n mi ma) k) + (length xs + k) xs index key mi ma i) + (offs_next (offs ((enum xs index key n mi ma) [index key x n mi ma := + Suc (enum xs index key n mi ma ! index key x n mi ma)]) k) + (Suc (length xs + k)) (x # xs) index key mi ma i)" + (is "(if i < ?i' then _ else _) + (offs_next (offs ?ns _) _ _ _ _ _ _ _) + (offs_next (offs ?ns' _) _ _ _ _ _ _ _)") +proof (simp_all only: offs_next_def not_less split: if_split, (rule conjI, rule impI)+, + simp, simp, (rule_tac [!] impI, (rule_tac [!] conjI)?)+, rule_tac [2-3] FalseE, + rule_tac [4] conjI, rule_tac [4-5] impI) + assume + C: "offs_set_next (offs ?ns k) xs index key mi ma i = {}" + (is "?A = _") and + D: "offs_set_next (offs ?ns' k) (x # xs) index key mi ma i \ {}" + (is "?A' \ _") and + E: "i < ?i'" + from C have F: "\j \ ?i'. j \ ?A'" + by (rule_tac allI, rule_tac impI, rule_tac notI, simp add: enum_length + offs_length offs_num_cons split: if_split_asm, (erule_tac conjE)+, simp) + from D have "Min ?A' \ ?A'" + by (rule_tac Min_in, simp) + hence G: "Min ?A' < n" + by (simp add: offs_length enum_length) + have H: "Min ?A' = ?i'" + proof (rule Min_eqI, simp, rule eq_refl, erule contrapos_pp, insert F, simp) + have "\j. j \ ?A'" + using D by blast + then obtain j where "j \ ?A'" .. + moreover from this have "j = ?i'" + by (erule_tac contrapos_pp, insert F, simp) + ultimately show "?i' \ ?A'" by simp + qed + with G have "offs ?ns' k ! Min ?A' = offs ?ns k ! Min ?A'" + by (subst offs_update, simp_all add: enum_length) + also from A and B and G and H have + "\ = offs_next (offs ?ns k) (length xs + k) xs index key mi ma (Min ?A')" + proof (rule_tac offs_enum_zero, simp_all, rule_tac ccontr, simp) + assume "?i' < n" and "0 < offs_num n xs index key mi ma ?i'" + hence "?i' \ ?A" + using E by (simp add: offs_length enum_length) + thus False + using C by simp + qed + also have "\ = length xs + k" + proof (simp only: offs_next_def split: if_split, rule conjI, simp, rule impI, + rule FalseE, simp, erule exE, (erule conjE)+) + fix j + assume "j < length (offs ?ns k)" + moreover assume "Min ?A' < j" + hence "i < j" + using E and H by simp + moreover assume "0 < offs_num (length (offs ?ns k)) xs index key mi ma j" + ultimately have "j \ ?A" by simp + thus False + using C by simp + qed + finally show "length xs + k \ offs ?ns' k ! 
Min ?A'" by simp +next + assume + C: "offs_set_next (offs ?ns k) xs index key mi ma i = {}" + (is "?A = _") and + D: "offs_set_next (offs ?ns' k) (x # xs) index key mi ma i \ {}" + (is "?A' \ _") and + E: "?i' \ i" + have "\j. j \ ?A'" + using D by blast + then obtain j where F: "j \ ?A'" .. + hence "j < length (offs ?ns k)" + by (simp add: offs_length) + moreover have "i < j" + using F by simp + moreover from this have + "0 < offs_num (length (offs ?ns k)) xs index key mi ma j" + using E and F by (simp add: offs_length enum_length offs_num_cons) + ultimately have "j \ ?A" by simp + thus False + using C by simp +next + assume + C: "offs_set_next (offs ?ns k) xs index key mi ma i \ {}" + (is "?A \ _") and + D: "offs_set_next (offs ?ns' k) (x # xs) index key mi ma i = {}" + (is "?A' = _") + have "\j. j \ ?A" + using C by blast + then obtain j where E: "j \ ?A" .. + hence "j < length (offs ?ns' k)" + by (simp add: offs_length) + moreover have "i < j" + using E by simp + moreover have "0 < offs_num (length (offs ?ns' k)) (x # xs) index key mi ma j" + using E by (simp add: offs_length enum_length offs_num_cons) + ultimately have "j \ ?A'" by simp + thus False + using D by simp +next + assume "offs_set_next (offs ?ns k) xs index key mi ma i \ {}" + (is "?A \ _") + hence "Min ?A \ ?A" + by (rule_tac Min_in, simp) + hence C: "Min ?A < n" + by (simp add: offs_length enum_length) + assume "offs_set_next (offs ?ns' k) (x # xs) index key mi ma i \ {}" + (is "?A' \ _") + hence D: "Min ?A' \ ?A'" + by (rule_tac Min_in, simp) + hence E: "Min ?A' < n" + by (simp add: offs_length enum_length) + have "offs ?ns k ! Min ?A \ offs ?ns k ! Min ?A'" + proof (cases "offs_num n xs index key mi ma (Min ?A') = 0") + case True + have "offs ?ns k ! Min ?A' = + offs_next (offs ?ns k) (length xs + k) xs index key mi ma (Min ?A')" + using A and B and E and True by (rule_tac offs_enum_zero, simp_all) + also from A and B and C have "\ \ offs ?ns k ! Min ?A" + proof (simp only: offs_next_def split: if_split, rule_tac conjI, rule_tac [!] impI, + rule_tac offs_enum_ub, simp, simp, simp) + assume "offs_set_next (offs ?ns k) xs index key mi ma (Min ?A') \ {}" + (is "?B \ _") + hence "Min ?B \ ?B" + by (rule_tac Min_in, simp) + hence "Min ?B \ ?A" + using D by simp + moreover from this have "Min ?A \ Min ?B" + by (rule_tac Min_le, simp) + ultimately show "offs ?ns k ! Min ?A \ offs ?ns k ! Min ?B" + by (rule_tac offs_mono, simp_all add: offs_length) + qed + finally show ?thesis . + next + case False + hence "Min ?A' \ ?A" + using D by (simp add: offs_length enum_length) + hence "Min ?A \ Min ?A'" + by (rule_tac Min_le, simp) + thus ?thesis + by (rule offs_mono, simp_all add: enum_length E) + qed + also have "\ \ offs ?ns' k ! Min ?A'" + using E by (subst offs_update, simp_all add: enum_length) + finally show "offs ?ns k ! Min ?A \ offs ?ns' k ! Min ?A'" . +next + let ?A = "offs_set_next (offs ?ns k) xs index key mi ma i" + assume "offs_set_next (offs ?ns' k) (x # xs) index key mi ma i \ {}" + (is "?A' \ _") + hence C: "Min ?A' \ ?A'" + by (rule_tac Min_in, simp) + hence D: "Min ?A' < n" + by (simp add: offs_length enum_length) + assume "?i' \ i" + hence E: "?i' < Min ?A'" + using C by simp + hence "0 < offs_num n xs index key mi ma (Min ?A')" + using C by (simp add: offs_length enum_length offs_num_cons) + hence "Min ?A' \ ?A" + using C by (simp add: offs_length enum_length) + hence "Min ?A \ Min ?A'" + by (rule_tac Min_le, simp) + hence "offs ?ns k ! Min ?A \ offs ?ns k ! 
Min ?A'" + by (rule offs_mono, simp_all add: enum_length D) + also have "\ < offs ?ns' k ! Min ?A'" + using E by (subst offs_update, simp_all add: enum_length D) + finally show "offs ?ns k ! Min ?A < offs ?ns' k ! Min ?A'" . +qed + +lemma offs_enum_pred [rule_format]: + assumes A: "index_less index key" + shows "(\x \ set xs. key x \ {mi..ma}) \ + offs_pred (offs (enum xs index key n mi ma) k) (length xs + k) + xs index key mi ma" +proof (induction xs, simp add: offs_pred_def offs_num_def, + simp add: Let_def offs_pred_def offs_length enum_length, rule impI, (erule conjE)+, + simp, rule allI, rule impI, erule allE, drule mp, assumption) + fix x xs i + let ?i' = "index key x n mi ma" + let ?ns = "enum xs index key n mi ma" + let ?ns' = "?ns[?i' := Suc (?ns ! ?i')]" + assume + B: "\x \ set xs. mi \ key x \ key x \ ma" and + C: "i < n" and + D: "offs_num n xs index key mi ma i \ + offs_next (offs ?ns k) (length xs + k) xs index key mi ma i - offs ?ns k ! i" + have E: "(if i < ?i' then (\) else (<)) + (offs_next (offs ?ns k) (length xs + k) xs index key mi ma i) + (offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma i)" + using A and B by (subst offs_enum_next_cons, simp_all) + show "offs_num n (x # xs) index key mi ma i \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma i - + offs ?ns' k ! i" + proof (subst offs_update, simp add: enum_length C, rule linorder_cases [of i ?i'], + simp_all add: offs_num_cons) + assume "i < ?i'" + hence "offs_next (offs ?ns k) (length xs + k) xs index key mi ma i \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma i" + using E by simp + thus "offs_num n xs index key mi ma i \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma i - + offs ?ns k ! i" + using D by arith + next + assume F: "i = ?i'" + hence "Suc (offs_num n xs index key mi ma ?i') \ + Suc (offs_next (offs ?ns k) (length xs + k) xs index key mi ma ?i' - + offs ?ns k ! ?i')" + using D by simp + also from A and B and C and F have "\ = + Suc (offs_next (offs ?ns k) (length xs + k) xs index key mi ma ?i') - + offs ?ns k ! ?i'" + by (rule_tac Suc_diff_le [symmetric], rule_tac offs_enum_next_ge, simp_all) + finally have "Suc (offs_num n xs index key mi ma ?i') \ + Suc (offs_next (offs ?ns k) (length xs + k) xs index key mi ma ?i') - + offs ?ns k ! ?i'" . + moreover have + "Suc (offs_next (offs ?ns k) (length xs + k) xs index key mi ma ?i') \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma ?i'" + using E and F by simp + ultimately show "Suc (offs_num n xs index key mi ma ?i') \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma ?i' - + offs ?ns k ! ?i'" + by arith + next + assume "?i' < i" + hence "Suc (offs_next (offs ?ns k) (length xs + k) xs index key mi ma i) \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma i" + using E by simp + thus "offs_num n xs index key mi ma i \ + offs_next (offs ?ns' k) (Suc (length xs + k)) (x # xs) index key mi ma i - + Suc (offs ?ns k ! i)" + using D by arith + qed +qed + +lemma fill_offs_enum_count_item [rule_format]: + "\index_less index key; \x \ set xs. 
key x \ {mi..ma}; 0 < n\ \ + count (mset (map the (fill xs (offs (enum xs index key n mi ma) 0) + index key (length xs) mi ma))) x = + count (mset xs) x" +by (subst fill_count_item, simp_all, simp only: length_greater_0_conv [symmetric] + offs_length enum_length, insert offs_enum_pred [of index key xs mi ma n 0], simp) + +text \ +\null + +Using lemma @{thm [source] fill_offs_enum_count_item}, step 9 of the proof method can now be dealt +with. It is accomplished by proving lemma @{text gcsort_count_inv}, which states that the number of +the occurrences of whatever object in the objects' list is still the same after any recursive round. + +\null +\ + +lemma nths_count: + "count (mset (nths xs A)) x = + count (mset xs) x - card {i. i < length xs \ i \ A \ xs ! i = x}" +proof (induction xs arbitrary: A, simp_all add: nths_Cons) + fix v xs A + let ?B = "{i. i < length xs \ Suc i \ A \ xs ! i = x}" + let ?C = "\v. {i. i < Suc (length xs) \ i \ A \ (v # xs) ! i = x}" + have A: "\v. ?C v = ?C v \ {0} \ ?C v \ {i. \j. i = Suc j}" + by (subst Int_Un_distrib [symmetric], auto, arith) + have "\v. card (?C v) = card (?C v \ {0}) + card (?C v \ {i. \j. i = Suc j})" + by (subst A, rule card_Un_disjoint, simp_all, blast) + moreover have "\v. card ?B = card (?C v \ {i. \j. i = Suc j})" + by (rule bij_betw_same_card [of Suc], auto) + ultimately show + "(0 \ A \ + (v = x \ Suc (count (mset xs) x - card ?B) = + Suc (count (mset xs) x) - card (?C x)) \ + (v \ x \ count (mset xs) x - card ?B = + count (mset xs) x - card (?C v))) \ + (0 \ A \ + (v = x \ count (mset xs) x - card ?B = + Suc (count (mset xs) x) - card (?C x)) \ + (v \ x \ count (mset xs) x - card ?B = + count (mset xs) x - card (?C v)))" + proof ((rule_tac [!] conjI, rule_tac [!] impI)+, simp_all) + have "card ?B \ count (mset xs) x" + by (simp add: count_mset length_filter_conv_card, rule card_mono, + simp, blast) + thus "Suc (count (mset xs) x - card ?B) = Suc (count (mset xs) x) - card ?B" + by (rule Suc_diff_le [symmetric]) + qed +qed + +lemma round_count_inv [rule_format]: + "index_less index key \ bn_inv p q t \ add_inv n t \ count_inv f t \ + count_inv f (round index key p q r t)" +proof (induction index key p q r t arbitrary: n f rule: round.induct, + (rule_tac [!] impI)+, simp, simp, simp_all only: simp_thms) + fix index p q r u ns xs n f and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, tl xs)" + assume + "\n f. bn_inv p q (u, ns, tl xs) \ add_inv n (u, ns, tl xs) \ + count_inv f (u, ns, tl xs) \ count_inv f ?t" and + "bn_inv p q (u, Suc 0 # ns, xs)" and + "add_inv n (u, Suc 0 # ns, xs)" and + "count_inv f (u, Suc 0 # ns, xs)" + thus "count_inv f (round index key p q r (u, Suc 0 # ns, xs))" + proof (cases ?t, simp add: add_suc, rule_tac allI, cases xs, + simp_all add: disj_imp split: if_split_asm) + fix y ys x and xs' :: "'a list" + assume "\n f. foldl (+) 0 ns = n \ length ys = n \ + (\x. count (mset ys) x = f x) \ (\x. count (mset xs') x = f x)" + moreover assume "Suc (foldl (+) 0 ns) = n" and "Suc (length ys) = n" + ultimately have "\n f. (\x. count (mset ys) x = f x) \ + (\x. count (mset xs') x = f x)" + by blast + moreover assume A: "\x. (y = x \ Suc (count (mset ys) x) = f x) \ + (y \ x \ count (mset ys) x = f x)" + have "\x. count (mset ys) x = (f(y := f y - Suc 0)) x" + (is "\x. _ = ?f' x") + by (simp add: A, insert spec [OF A, where x = y], simp) + ultimately have "\x. count (mset xs') x = ?f' x" .. 
+ thus "(y = x \ Suc (count (mset xs') x) = f x) \ + (y \ x \ count (mset xs') x = f x)" + by (simp, insert spec [OF A, where x = y], rule_tac impI, simp) + qed +next + fix index p q r u m ns n f and key :: "'a \ 'b" and xs :: "'a list" + let ?ws = "take (Suc (Suc m)) xs" + let ?ys = "drop (Suc (Suc m)) xs" + let ?r = "\m'. round_suc_suc index key ?ws m m' u" + let ?t = "\r' v. round index key p q r' (v, ns, ?ys)" + assume A: "index_less index key" + assume + "\ws a b c d e g h i n f. + ws = ?ws \ a = bn_comp m p q r \ (b, c) = bn_comp m p q r \ + d = ?r b \ (e, g) = ?r b \ (h, i) = g \ + bn_inv p q (e, ns, ?ys) \ add_inv n (e, ns, ?ys) \ + count_inv f (e, ns, ?ys) \ count_inv f (?t c e)" and + "bn_inv p q (u, Suc (Suc m) # ns, xs)" and + "add_inv n (u, Suc (Suc m) # ns, xs)" and + "count_inv f (u, Suc (Suc m) # ns, xs)" + thus "count_inv f (round index key p q r (u, Suc (Suc m) # ns, xs))" + proof (simp split: prod.split, ((rule_tac allI)+, ((rule_tac impI)+)?)+, + (erule_tac conjE)+, subst (asm) (2) add_base_zero, simp) + fix m' r' v ms' ws' xs' x + assume + B: "?r m' = (v, ms', ws')" and + C: "bn_comp m p q r = (m', r')" and + D: "bn_valid m p q" and + E: "Suc (Suc (foldl (+) 0 ns + m)) = n" and + F: "length xs = n" + assume "\ws a b c d e g h i n' f. + ws = ?ws \ a = (m', r') \ b = m' \ c = r' \ + d = (v, ms', ws') \ e = v \ g = (ms', ws') \ h = ms' \ i = ws' \ + foldl (+) 0 ns = n' \ n - Suc (Suc m) = n' \ + (\x. count (mset ?ys) x = f x) \ (\x. count (mset xs') x = f x)" + hence "foldl (+) 0 ns = n - Suc (Suc m) \ + (\x. count (mset xs') x = count (mset ?ys) x)" + by simp + hence "count (mset xs') x = count (mset ?ys) x" + using E by simp + moreover assume "\x. count (mset xs) x = f x" + ultimately have "f x = count (mset ?ws) x + count (mset xs') x" + by (subst (asm) append_take_drop_id [of "Suc (Suc m)", symmetric], + subst (asm) mset_append, simp) + with B [symmetric] show "count (mset ws') x + count (mset xs') x = f x" + proof (simp add: round_suc_suc_def Let_def del: count_add_mset mset_map + split: if_split_asm, subst (1 2) add_mset_add_single, simp + only: count_single count_union) + let ?nmi = "mini ?ws key" + let ?nma = "maxi ?ws key" + let ?xmi = "?ws ! ?nmi" + let ?xma = "?ws ! ?nma" + let ?mi = "key ?xmi" + let ?ma = "key ?xma" + let ?k = "case m of 0 \ m | Suc 0 \ m | Suc (Suc i) \ u + m'" + let ?zs = "nths ?ws (- {?nmi, ?nma})" + let ?ms = "enum ?zs index key ?k ?mi ?ma" + let ?A = "{i. i < Suc (Suc m) \ (i = ?nmi \ i = ?nma) \ ?ws ! i = x}" + have G: "length ?ws = Suc (Suc m)" + using E and F by simp + hence H: "card ?A \ count (mset ?ws) x" + by (simp add: count_mset length_filter_conv_card, rule_tac card_mono, + simp, blast) + show "count (mset (map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma))) x + + (if ?xma = x then 1 else 0) + (if ?xmi = x then 1 else 0) = + count (mset ?ws) x" + proof (cases "m = 0") + case True + hence I: "length ?zs = 0" + using G by (simp add: mini_maxi_nths) + have "count (mset ?zs) x = count (mset ?ws) x - card ?A" + using G by (subst nths_count, simp) + hence J: "count (mset ?ws) x = card ?A" + using H and I by simp + from I show ?thesis + proof (simp, (rule_tac [!] conjI, rule_tac [!] 
impI)+, + simp_all (no_asm_simp) add: True) + assume "?xmi = x" and "?xma = x" + with G have "?A = {?nmi, ?nma}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, erule_tac disjE, + insert mini_less [of ?ws key], insert maxi_less [of ?ws key], + simp_all) + with G have "card ?A = Suc (Suc 0)" + by (simp, subst card_insert_disjoint, simp_all, + rule_tac mini_maxi_neq, simp) + thus "Suc (Suc 0) = count (mset (take (Suc (Suc 0)) xs)) x" + using True and J by simp + next + assume "?xmi \ x" and "?xma = x" + with G have "?A = {?nma}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, (erule_tac conjE)+, + erule_tac disjE, insert maxi_less [of ?ws key], simp_all) + thus "Suc 0 = count (mset (take (Suc (Suc 0)) xs)) x" + using True and J by simp + next + assume "?xmi = x" and "?xma \ x" + with G have "?A = {?nmi}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, (erule_tac conjE)+, + erule_tac disjE, insert mini_less [of ?ws key], simp_all) + thus "Suc 0 = count (mset (take (Suc (Suc 0)) xs)) x" + using True and J by simp + next + assume "?xmi \ x" and "?xma \ x" + hence "?A = {}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, (erule_tac conjE)+, + erule_tac disjE, simp_all) + hence "count (mset ?ws) x = 0" + using J by simp + thus "x \ set (take (Suc (Suc 0)) xs)" + using True by simp + qed + next + case False + hence "0 < ?k" + by (simp, drule_tac bn_comp_fst_nonzero [OF D], subst (asm) C, + simp split: nat.split) + hence "count (mset (map the (fill ?zs (offs ?ms 0) index key + (length ?zs) ?mi ?ma))) x = count (mset ?zs) x" + by (rule_tac fill_offs_enum_count_item [OF A], simp, rule_tac conjI, + ((rule_tac mini_lb | rule_tac maxi_ub), erule_tac in_set_nthsD)+) + with G show ?thesis + proof (simp, (rule_tac [!] conjI, rule_tac [!] impI)+, + simp_all add: mini_maxi_nths nths_count) + assume "?xmi = x" and "?xma = x" + with G have "?A = {?nmi, ?nma}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, erule_tac disjE, + insert mini_less [of ?ws key], insert maxi_less [of ?ws key], + simp_all) + with G have "card ?A = Suc (Suc 0)" + by (simp, subst card_insert_disjoint, simp_all, + rule_tac mini_maxi_neq, simp) + thus "Suc (Suc (count (mset ?ws) x - card ?A)) = count (mset ?ws) x" + using H by simp + next + assume "?xmi \ x" and "?xma = x" + with G have "?A = {?nma}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, (erule_tac conjE)+, + erule_tac disjE, insert maxi_less [of ?ws key], simp_all) + thus "Suc (count (mset ?ws) x - card ?A) = count (mset ?ws) x" + using H by simp + next + assume "?xmi = x" and "?xma \ x" + with G have "?A = {?nmi}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, (erule_tac conjE)+, + erule_tac disjE, insert mini_less [of ?ws key], simp_all) + thus "Suc (count (mset ?ws) x - card ?A) = count (mset ?ws) x" + using H by simp + next + assume "?xmi \ x" and "?xma \ x" + hence "?A = {}" + by (rule_tac set_eqI, rule_tac iffI, simp_all, (erule_tac conjE)+, + erule_tac disjE, simp_all) + thus "count (mset ?ws) x - card ?A = count (mset ?ws) x" + by (simp (no_asm_simp)) + qed + qed + qed + qed +qed + +lemma gcsort_count_inv: + assumes + A: "index_less index key" and + B: "add_inv n t" and + C: "n \ p" + shows "\t' \ gcsort_set index key p t; count_inv f t\ \ + count_inv f t'" +by (erule gcsort_set.induct, simp, drule gcsort_add_inv [OF A _ B C], + rule round_count_inv [OF A], simp_all del: bn_inv.simps, erule conjE, + frule sym, erule subst, rule bn_inv_intro, insert C, simp) + +text \ +\null + +The only remaining task is to address step 10 of the proof method, which is done by 
proving theorem +@{text gcsort_count}. It holds under the conditions that the objects' number is not larger than the +counters' upper bound and function @{text index} satisfies predicate @{const index_less}, and states +that for any object, function @{const gcsort} leaves unchanged the number of its occurrences within +the input objects' list. + +\null +\ + +theorem gcsort_count: + assumes + A: "index_less index key" and + B: "length xs \ p" + shows "count (mset (gcsort index key p xs)) x = count (mset xs) x" +proof - + let ?t = "(0, [length xs], xs)" + have "count_inv (count (mset xs)) (gcsort_aux index key p ?t)" + by (rule gcsort_count_inv [OF A _ B], rule gcsort_add_input, + rule gcsort_aux_set, rule gcsort_count_input) + hence "count (mset (gcsort_out (gcsort_aux index key p ?t))) x = + (count (mset xs)) x" + by (rule gcsort_count_intro) + moreover have "?t = gcsort_in xs" + by (simp add: gcsort_in_def) + ultimately show ?thesis + by (simp add: gcsort_def) +qed + +end diff --git a/thys/Generalized_Counting_Sort/ROOT b/thys/Generalized_Counting_Sort/ROOT new file mode 100644 --- /dev/null +++ b/thys/Generalized_Counting_Sort/ROOT @@ -0,0 +1,10 @@ +session Generalized_Counting_Sort = "HOL-Library" + + options [timeout = 600] + theories + Algorithm + Conservation + Sorting + Stability + document_files + "root.bib" + "root.tex" diff --git a/thys/Generalized_Counting_Sort/Sorting.thy b/thys/Generalized_Counting_Sort/Sorting.thy new file mode 100644 --- /dev/null +++ b/thys/Generalized_Counting_Sort/Sorting.thy @@ -0,0 +1,1267 @@ +(* Title: An Efficient Generalization of Counting Sort for Large, possibly Infinite Key Ranges + Author: Pasquale Noce + Software Engineer at HID Global, Italy + pasquale dot noce dot lavoro at gmail dot com + pasquale dot noce at hidglobal dot com +*) + +section "Proof of objects' sorting" + +theory Sorting + imports Conservation +begin + +text \ +\null + +In this section, it is formally proven that GCsort actually sorts objects. + +Here below, steps 5, 6, and 7 of the proof method are accomplished. Predicate @{text sort_inv} is +satisfied just in case, for any bucket delimited by the input counters' list $ns$, the keys of the +corresponding objects within the input objects' list $xs$ are not larger than those of the objects, +if any, to the right of that bucket. The underlying idea is that this predicate: + +\begin{itemize} + +\item +is trivially satisfied by the output of function @{const gcsort_in}, which places all objects into a +single bucket, and + +\item +implies that $xs$ is sorted if every bucket delimited by $ns$ has size one, as happens when function +@{const gcsort_aux} terminates. + +\end{itemize} + +\null +\ + +fun sort_inv :: "('a \ 'b::linorder) \ nat \ nat list \ 'a list \ bool" where +"sort_inv key (u, ns, xs) = + (\i < length ns. \j < offs ns 0 ! i. \k \ {offs ns 0 ! i.. key (xs ! k))" + +lemma gcsort_sort_input: + "sort_inv key (0, [length xs], xs)" +by simp + +lemma offs_nth: + assumes + A: "find (\n. Suc 0 < n) ns = None" and + B: "foldl (+) 0 ns = n" and + C: "k < n" + shows "\i < length ns. offs ns 0 ! i = k" +proof (cases ns, insert B C, simp, cases "k = 0", rule exI [of _ 0], simp, + rule ccontr, simp (no_asm_use)) + fix m ms + assume + D: "ns = m # ms" and + E: "0 < k" and + F: "\i < length ns. offs ns 0 ! i \ k" + have G: "\n \ set ns. n \ Suc 0" + using A by (auto simp add: find_None_iff) + let ?A = "{i. i < length ns \ offs ns 0 ! 
i < k}" + have H: "Max ?A \ ?A" + using D and E by (rule_tac Max_in, simp_all, rule_tac exI [of _ 0], simp) + hence I: "Max ?A < length ns" + by simp + hence J: "offs ns 0 ! Max ?A = foldl (+) 0 (take (Max ?A) ns)" + by (rule offs_add) + have "Max ?A < length ns - Suc 0 \ Max ?A = length ns - Suc 0" + (is "?P \ ?Q") + using H by (simp, arith) + moreover { + assume ?P + hence K: "Suc (Max ?A) < length ns" by simp + hence "offs ns 0 ! Suc (Max ?A) = foldl (+) 0 (take (Suc (Max ?A)) ns)" + by (rule offs_add) + moreover have "take (Suc (Max ?A)) ns = take (Max ?A) ns @ [ns ! Max ?A]" + using I by (rule take_Suc_conv_app_nth) + ultimately have "offs ns 0 ! Suc (Max ?A) = + offs ns 0 ! Max ?A + ns ! Max ?A" + using J by simp + moreover have "offs ns 0 ! Max ?A < k" + using H by simp + moreover have "ns ! Max ?A \ set ns" + using I by (rule nth_mem) + with G have "ns ! Max ?A \ Suc 0" .. + ultimately have "offs ns 0 ! Suc (Max ?A) \ k" by simp + moreover have "offs ns 0 ! Suc (Max ?A) \ k" + using F and K by simp + ultimately have "offs ns 0 ! Suc (Max ?A) < k" by simp + hence "Suc (Max ?A) \ ?A" + using K by simp + hence "Suc (Max ?A) \ Max ?A" + by (rule_tac Max_ge, simp) + hence False by simp + } + moreover { + assume ?Q + hence "offs ns 0 ! Max ?A = foldl (+) 0 (take (length ns - Suc 0) ns)" + using J by simp + moreover have K: "length ns - Suc 0 < length ns" + using D by simp + hence "take (Suc (length ns - Suc 0)) ns = + take (length ns - Suc 0) ns @ [ns ! (length ns - Suc 0)]" + by (rule take_Suc_conv_app_nth) + hence "foldl (+) 0 ns = + foldl (+) 0 (take (length ns - Suc 0) ns @ [ns ! (length ns - Suc 0)])" + by simp + ultimately have "n = offs ns 0 ! Max ?A + ns ! (length ns - Suc 0)" + using B by simp + moreover have "offs ns 0 ! Max ?A < k" + using H by simp + moreover have "ns ! (length ns - Suc 0) \ set ns" + using K by (rule nth_mem) + with G have "ns ! (length ns - Suc 0) \ Suc 0" .. + ultimately have "n \ k" by simp + with C have False by simp + } + ultimately show False .. +qed + +lemma gcsort_sort_intro: + "\sort_inv key t; add_inv n t; find (\n. Suc 0 < n) (fst (snd t)) = None\ \ + sorted (map key (gcsort_out t))" +proof (cases t, simp add: sorted_iff_nth_mono_less gcsort_out_def, + erule conjE, (rule allI)+, (rule impI)+) + fix ns xs j k + assume "find (\n. Suc 0 < n) ns = None" and "foldl (+) 0 ns = n" + moreover assume A: "k < n" + ultimately have "\i < length ns. offs ns 0 ! i = k" + by (rule offs_nth) + then obtain i where "i < length ns \ offs ns 0 ! i = k" .. + moreover assume "\i < length ns. \j < offs ns 0 ! i. \k \ {offs ns 0 ! i.. key (xs ! k)" + hence "i < length ns \ j < offs ns 0 ! i \ k \ {offs ns 0 ! i.. + key (xs ! j) \ key (xs ! k)" + by simp + ultimately have "j < k \ k < n \ key (xs ! j) \ key (xs ! k)" + by simp + moreover assume "j < k" + ultimately show "key (xs ! j) \ key (xs ! k)" + using A by simp +qed + +text \ +\null + +As lemma @{thm [source] gcsort_sort_intro} comprises an additional assumption concerning the form of +the fixed points of function @{const gcsort_aux}, step 8 of the proof method is necessary this time +to prove that such assumption is satisfied. + +\null +\ + +lemma gcsort_sort_form: + "find (\n. Suc 0 < n) (fst (snd (gcsort_aux index key p t))) = None" +by (induction index key p t rule: gcsort_aux.induct, simp) + +text \ +\null + +Here below, step 9 of the proof method is accomplished. 
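+
+Before entering the proof, it may help to recall what the invariant requires by means of a small
+example, whose values are merely illustrative. Let the counters' list be $[2, 3]$, so that the
+first bucket consists of the objects at positions 0 and 1 and the second one of those at positions
+2, 3, and 4. Then @{text sort_inv} holds for an objects' list whose keys are $5, 3, 7, 9, 8$, as no
+key in the first bucket is larger than any key in the second one, even though the list is not yet
+sorted; it fails instead for keys $5, 8, 7, 9, 3$, since the key 8 in the first bucket exceeds the
+key 3 in the second one. As observed above, once every bucket has size one, the invariant forces
+the whole objects' list to be sorted.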
+ +In the most significant case of the proof by recursion induction of lemma @{text round_sort_inv}, +namely that of a bucket $B$ with size larger than two and distinct minimum and maximum keys, the +following line of reasoning is adopted. Let $x$ be an object contained in a finer-grained bucket +$B'$ resulting from $B$'s partition, and $y$ an object to the right of $B'$. Then: + +\begin{itemize} + +\item +If $y$ is contained in some other finer-grained bucket resulting from $B$'s partition, inequality +@{term "key x \ key y"} holds because predicate @{const sort_inv} is satisfied by a counters' list +generated by function @{const enum} and an objects' list generated by function @{const fill} in case +@{const fill}'s input offsets' list is computed via the composition of functions @{const offs} and +@{const enum}, as happens within function @{const round}. +\\This is proven beforehand in lemma @{text fill_sort_inv}. + +\item +Otherwise, inequality @{term "key x \ key y"} holds as well because object $x$ was contained in $B$ +by lemma @{thm [source] fill_offs_enum_count_item}, object $y$ occurred to the right of $B$ by lemma +@{thm [source] round_count_inv}, and by hypothesis, the key of any object in $B$ was not larger than +that of any object to the right of $B$. + +\end{itemize} + +Using lemma @{text round_sort_inv}, the invariance of predicate @{const sort_inv} over inductive set +@{const gcsort_set} is then proven in lemma @{text gcsort_sort_inv}. + +\null +\ + +lemma mini_maxi_keys_le: + "x \ set xs \ key (xs ! mini xs key) \ key (xs ! maxi xs key)" +by (frule mini_lb, drule maxi_ub, erule order_trans) + +lemma mini_maxi_keys_eq [rule_format]: + "key (xs ! mini xs key) = key (xs ! maxi xs key) \ x \ set xs \ + key x = key (xs ! maxi xs key)" +by (induction xs, simp_all add: Let_def, (rule_tac [!] impI, (rule_tac [!] conjI)?)+, + rule_tac [2-4] impI, frule_tac [1-3] mini_maxi_keys_le [where key = key], simp_all) + +lemma offs_suc: + "i < length ns \ offs ns (Suc k) ! i = Suc (offs ns k ! i)" +by (simp add: offs_add add_suc) + +lemma offs_base_zero: + "i < length ns \ offs ns k ! i = offs ns 0 ! i + k" +by (simp add: offs_add, subst add_base_zero, simp) + +lemma offs_append: + "offs (ms @ ns) k = offs ms k @ offs ns (foldl (+) k ms)" +by (induction ms arbitrary: k, simp_all) + +lemma offs_enum_next_le [rule_format]: + assumes + A: "index_less index key" and + B: "i < j" and + C: "j < n" and + D: "\x \ set xs. key x \ {mi..ma}" + shows "offs_next (offs (enum xs index key n mi ma) k) (length xs + k) + xs index key mi ma i \ offs (enum xs index key n mi ma) k ! j" + (is "_ \ offs ?ns _ ! _") +proof (rule ccontr, simp add: not_le) + assume E: "offs ?ns k ! j < + offs_next (offs ?ns k) (length xs + k) xs index key mi ma i" + from B have "offs_set_next (offs ?ns k) xs index key mi ma i = + offs_set_next (offs ?ns k) xs index key mi ma j" + proof (rule_tac set_eqI, rule_tac iffI, simp_all, rule_tac ccontr, simp add: not_less) + fix m + assume "m < length (offs ?ns k) \ i < m \ + 0 < offs_num (length (offs ?ns k)) xs index key mi ma m" + hence F: "m \ offs_set_next (offs ?ns k) xs index key mi ma i" + (is "_ \ ?A") + by simp + hence "Min ?A \ m" + by (rule_tac Min_le, simp) + moreover assume "m \ j" + ultimately have "offs ?ns k ! Min ?A \ offs ?ns k ! j" + using C by (rule_tac offs_mono, simp_all add: enum_length) + hence "offs_next (offs ?ns k) (length xs + k) xs index key mi ma i \ + offs ?ns k ! 
j" + using F by (simp only: offs_next_def split: if_split, rule_tac conjI, + blast, simp) + thus False + using E by simp + qed + hence "offs ?ns k ! j < + offs_next (offs ?ns k) (length xs + k) xs index key mi ma j" + using E by (simp only: offs_next_def) + moreover have "offs_num n xs index key mi ma j = 0" + proof (rule ccontr, simp) + assume "0 < offs_num n xs index key mi ma j" + hence "j \ offs_set_next (offs ?ns k) xs index key mi ma i" + (is "_ \ ?A") + using B and C by (simp add: offs_length enum_length) + moreover from this have "Min ?A \ j" + by (rule_tac Min_le, simp) + hence "offs ?ns k ! Min ?A \ offs ?ns k ! j" + using C by (erule_tac offs_mono, simp add: enum_length) + ultimately have "offs_next (offs ?ns k) (length xs + k) xs index key mi ma i \ + offs ?ns k ! j" + by (simp only: offs_next_def split: if_split, rule_tac conjI, blast, simp) + thus False + using E by simp + qed + hence "offs ?ns k ! j = + offs_next (offs ?ns k) (length xs + k) xs index key mi ma j" + using A and C and D by (rule_tac offs_enum_zero, simp_all) + ultimately show False by simp +qed + +lemma offs_pred_ub_less: + "\offs_pred ns ub xs index key mi ma; i < length ns; + 0 < offs_num (length ns) xs index key mi ma i\ \ + ns ! i < ub" +by (drule offs_pred_ub, assumption+, simp) + +lemma fill_count_none [rule_format]: + assumes A: "index_less index key" + shows + "(\x \ set xs. key x \ {mi..ma}) \ + ns \ [] \ + offs_pred ns ub xs index key mi ma \ + length xs \ ub \ + count (mset (fill xs ns index key ub mi ma)) None = ub - length xs" + using A +proof (induction xs arbitrary: ns, simp_all add: replicate_count Let_def, + (rule_tac impI)+, (erule_tac conjE)+, subst mset_update, simp add: fill_length, + erule_tac offs_pred_ub_less, simp_all add: index_less_def offs_num_cons del: + not_None_eq, subst conj_commute, rule_tac conjI, rule_tac [!] impI, rotate_tac, + rotate_tac, erule_tac contrapos_np, rule_tac fill_index_none, simp_all) + fix x xs and ns :: "nat list" + let ?i = "index key x (length ns) mi ma" + let ?ns' = "ns[?i := Suc (ns ! ?i)]" + assume "\ns. ns \ [] \ offs_pred ns ub xs index key mi ma \ + count (mset (fill xs ns index key ub mi ma)) None = ub - length xs" + moreover assume B: "ns \ []" + moreover assume + "offs_pred ns ub (x # xs) index key mi ma" and + "mi \ key x" and + "key x \ ma" + hence "offs_pred ?ns' ub xs index key mi ma" + using A and B by (rule_tac offs_pred_cons, simp_all add: index_less_def) + ultimately show "count (mset (fill xs ?ns' index key ub mi ma)) None - + Suc 0 = ub - Suc (length xs)" + by simp +qed + +lemma fill_offs_enum_count_none [rule_format]: + "\index_less index key; \x \ set xs. key x \ {mi..ma}; 0 < n\ \ + count (mset (fill xs (offs (enum xs index key n mi ma) 0) + index key (length xs) mi ma)) None = 0" +by (subst fill_count_none, simp_all, simp only: length_greater_0_conv [symmetric] + offs_length enum_length, insert offs_enum_pred [of index key xs mi ma n 0], simp) + +lemma fill_index [rule_format]: + assumes A: "index_less index key" + shows + "(\x \ set xs. key x \ {mi..ma}) \ + offs_pred ns ub xs index key mi ma \ + i < length ns \ + 0 < offs_num (length ns) xs index key mi ma i \ + j \ {ns ! i.. + fill xs ns index key ub mi ma ! j = Some x \ + index key x (length ns) mi ma = i" +proof (induction xs arbitrary: ns, simp add: offs_num_def, simp add: Let_def, + (rule impI)+, (erule conjE)+, simp) + fix y xs and ns :: "nat list" + let ?i = "index key x (length ns) mi ma" + let ?i' = "index key y (length ns) mi ma" + let ?ns' = "ns[?i' := Suc (ns ! 
?i')]" + let ?P = "\ns. + offs_pred ns ub xs index key mi ma \ + i < length ns \ + 0 < offs_num (length ns) xs index key mi ma i \ + ns ! i \ j \ j < offs_next ns ub xs index key mi ma i \ + fill xs ns index key ub mi ma ! j = Some x \ + index key x (length ns) mi ma = i" + assume + B: "\x \ set xs. mi \ key x \ key x \ ma" and + C: "mi \ key y" and + D: "key y \ ma" and + E: "offs_pred ns ub (y # xs) index key mi ma" and + F: "i < length ns" and + G: "0 < offs_num (length ns) (y # xs) index key mi ma i" and + H: "ns ! i \ j" and + I: "j < offs_next ns ub (y # xs) index key mi ma i" and + J: "\ns. ?P ns" and + K: "(fill xs ?ns' index key ub mi ma)[ns ! ?i' := Some y] ! j = Some x" + have "0 < length ns" + using F by arith + hence L: "?i' < length ns" + using A and C and D by (simp add: index_less_def) + hence "ns ! ?i' + offs_num (length ns) (y # xs) index key mi ma ?i' \ ub" + by (rule_tac offs_pred_ub [OF E], simp_all add: offs_num_cons) + hence "ns ! ?i' < ub" + by (simp add: offs_num_cons) + with K show "?i = i" + proof (cases "j = ns ! ?i'", simp, subst (asm) nth_list_update_eq, simp_all + add: fill_length) + assume + M: "j = ns ! ?i" and + N: "y = x" + show ?thesis + proof (rule ccontr, erule neqE) + assume "?i < i" + hence "ns ! ?i + offs_num (length ns) (y # xs) index key mi ma ?i \ ns ! i" + using F and G and N by (rule_tac offs_pred_asc [OF E], simp_all + add: offs_num_cons) + hence "j < ns ! i" + using M and N by (simp add: offs_num_cons) + thus False + using H by simp + next + let ?A = "offs_set_next ns (y # xs) index key mi ma i" + assume "i < ?i" + hence O: "?i \ ?A" + using N and L by (simp add: offs_num_cons) + hence P: "Min ?A \ ?A" + by (rule_tac Min_in, simp, blast) + have "Min ?A \ ?i" + using O by (rule_tac Min_le, simp) + moreover have "?A \ {}" + using O by blast + ultimately have "offs_next ns ub (y # xs) index key mi ma i \ j" + using M proof (simp only: offs_next_def, simp, subst (asm) le_less, + erule_tac disjE, simp_all) + assume "Min ?A < ?i" + hence "ns ! Min ?A + offs_num (length ns) (y # xs) index key mi ma + (Min ?A) \ ns ! ?i" + using O and P by (rule_tac offs_pred_asc [OF E], simp_all) + thus "ns ! Min ?A \ ns ! ?i" by simp + qed + thus False + using I by simp + qed + next + assume + M: "j \ ns ! ?i'" and + N: "fill xs ?ns' index key ub mi ma ! j = Some x" + have "?P ?ns'" using J . + moreover from D and F have "offs_pred ?ns' ub xs index key mi ma" + using L by (rule_tac offs_pred_cons [OF E], simp) + moreover have "i < length ?ns'" + using F by simp + moreover have "0 < offs_num (length ?ns') xs index key mi ma i" + proof (rule ccontr, simp) + assume O: "offs_num (length ns) xs index key mi ma i = 0" + hence P: "offs_num (length ns) (y # xs) index key mi ma i = Suc 0" + using G by (simp add: offs_num_cons split: if_split_asm) + hence "i = ?i'" + using O by (simp add: offs_num_cons split: if_split_asm) + hence "ns ! i < j" + using H and M by simp + hence "ns ! i + offs_num (length ns) (y # xs) index key mi ma i \ j" + using P by simp + with F and G and I have "offs_none ns ub (y # xs) index key mi ma j" + by (simp add: offs_none_def, rule_tac disjI1, rule_tac exI + [where x = i], simp) + with B and C and D and E and F have + "fill (y # xs) ns index key ub mi ma ! j = None" + by (rule_tac fill_none [OF A], simp_all, erule_tac disjE, simp_all, auto) + thus False + using K by (simp add: Let_def) + qed + moreover have "?ns' ! 
i \ j" + using F and H and M by (cases "?i' = i", simp_all) + moreover have "offs_next ns ub (y # xs) index key mi ma i \ + offs_next ?ns' ub xs index key mi ma i" + using G and L by (rule_tac offs_pred_next_cons [OF E], simp) + hence "j < offs_next ?ns' ub xs index key mi ma i" + using I by simp + ultimately show ?thesis + using N by simp + qed +qed + +lemma fill_offs_enum_index [rule_format]: + "index_less index key \ + \x \ set xs. key x \ {mi..ma} \ + i < n \ + 0 < offs_num n xs index key mi ma i \ + j \ {offs (enum xs index key n mi ma) 0 ! i..< + offs_next (offs (enum xs index key n mi ma) 0) (length xs) + xs index key mi ma i} \ + fill xs (offs (enum xs index key n mi ma) 0) index key (length xs) + mi ma ! j = Some x \ + index key x n mi ma = i" +by (insert fill_index [of index key xs mi ma "offs (enum xs index key n mi ma) 0" + "length xs" i j x], insert offs_enum_pred [of index key xs mi ma n 0], + simp add: offs_length enum_length) + +lemma fill_sort_inv [rule_format]: + assumes + A: "index_less index key" and + B: "index_mono index key" and + C: "\x \ set xs. key x \ {mi..ma}" + shows "sort_inv key (u, enum xs index key n mi ma, + map the (fill xs (offs (enum xs index key n mi ma) 0) + index key (length xs) mi ma))" + (is "sort_inv _ (_, ?ns, _)") +proof (simp, (rule allI, rule impI)+, rule ballI, simp add: enum_length fill_length, + erule conjE) + fix i j k + let ?A = "{i. i < n \ 0 < ?ns ! i}" + let ?B = "{i. i < n \ offs ?ns 0 ! i \ j \ 0 < offs_num n xs index key mi ma i}" + let ?C = "{i. i < n \ offs ?ns 0 ! i \ k \ 0 < offs_num n xs index key mi ma i}" + assume + D: "i < n" and + E: "j < offs ?ns 0 ! i" and + F: "offs ?ns 0 ! i \ k" and + G: "k < length xs" + have H: "\i < length xs. + \x. fill xs (offs ?ns 0) index key (length xs) mi ma ! i = Some x" + proof (rule allI, rule impI, rule ccontr, simp) + fix m + assume "fill xs (offs ?ns 0) index key (length xs) mi ma ! m = None" + moreover assume "m < length xs" + hence "fill xs (offs ?ns 0) index key (length xs) mi ma ! m + \ set (fill xs (offs ?ns 0) index key (length xs) mi ma)" + by (rule_tac nth_mem, simp add: fill_length) + ultimately have "None \ set (fill xs (offs ?ns 0) index key (length xs) mi ma)" + by simp + moreover have + "count (mset (fill xs (offs ?ns 0) index key (length xs) mi ma)) None = 0" + using C and D by (rule_tac fill_offs_enum_count_none [OF A], simp_all) + ultimately show False by simp + qed + have "\y ys. xs = y # ys" + using G by (rule_tac list.exhaust [of xs], simp_all) + then obtain z and zs where I: "xs = z # zs" by blast + hence "index key z n mi ma < n" + using A and C and D by (simp add: index_less_def) + moreover from this have "0 < ?ns ! index key z n mi ma" + using I by (simp add: Let_def, subst nth_list_update_eq, simp_all add: + enum_length) + ultimately have "index key z n mi ma \ ?A" by simp + hence J: "Min ?A \ ?A" + by (rule_tac Min_in, simp, blast) + hence "offs ?ns 0 ! 0 = offs ?ns 0 ! Min ?A" + proof (cases "Min ?A = 0", simp_all, erule_tac offs_equal, simp_all add: + enum_length, rule_tac ccontr, erule_tac conjE, simp) + fix m + assume "m < Min ?A" and "Min ?A < n" and "0 < ?ns ! m" + moreover from this have "Min ?A \ m" + by (rule_tac Min_le, simp_all) + ultimately show False by simp + qed + moreover have "\m ms. ?ns = m # ms" + using D by (rule_tac list.exhaust [of ?ns], simp_all, + simp only: length_0_conv [symmetric] enum_length) + then obtain m and ms where "?ns = m # ms" by blast + ultimately have "offs ?ns 0 ! 
Min ?A = 0" by simp + hence "Min ?A \ ?B" + using J by (simp, subst enum_offs_num [symmetric], simp_all) + hence K: "Max ?B \ ?B" + by (rule_tac Max_in, simp, blast) + moreover have "j < offs_next (offs ?ns 0) (length xs) + xs index key mi ma (Max ?B)" + proof (simp only: offs_next_def split: if_split, rule conjI, rule_tac [!] impI, + rule_tac [2] ccontr, simp_all only: not_less) + show "j < length xs" + using E and F and G by simp + next + assume "offs_set_next (offs ?ns 0) xs index key mi ma (Max ?B) \ {}" + (is "?Z \ _") + hence L: "Min ?Z \ ?Z" + by (rule_tac Min_in, simp) + moreover assume "offs ?ns 0 ! Min ?Z \ j" + ultimately have "Min ?Z \ ?B" + by (simp add: offs_length enum_length) + hence "Min ?Z \ Max ?B" + by (rule_tac Max_ge, simp) + thus False + using L by simp + qed + moreover have + "\x. fill xs (offs ?ns 0) index key (length xs) mi ma ! j = Some x" + using E and F and G and H by simp + then obtain x where + L: "fill xs (offs ?ns 0) index key (length xs) mi ma ! j = Some x" .. + ultimately have M: "index key x n mi ma = Max ?B" + using C by (rule_tac fill_offs_enum_index [OF A], simp_all) + have N: "Max ?B \ ?C" + using E and F and K by simp + hence "Max ?C \ ?C" + by (rule_tac Max_in, simp, blast) + moreover have O: "k < offs_next (offs ?ns 0) (length xs) + xs index key mi ma (Max ?C)" + proof (simp only: offs_next_def split: if_split, rule conjI, rule_tac [!] impI, + rule_tac [2] ccontr, simp_all only: not_less) + show "k < length xs" using G . + next + assume "offs_set_next (offs ?ns 0) xs index key mi ma (Max ?C) \ {}" + (is "?Z \ _") + hence P: "Min ?Z \ ?Z" + by (rule_tac Min_in, simp) + moreover assume "offs ?ns 0 ! Min ?Z \ k" + ultimately have "Min ?Z \ ?C" + by (simp add: offs_length enum_length) + hence "Min ?Z \ Max ?C" + by (rule_tac Max_ge, simp) + thus False + using P by simp + qed + moreover have + "\x. fill xs (offs ?ns 0) index key (length xs) mi ma ! k = Some x" + using G and H by simp + then obtain y where + P: "fill xs (offs ?ns 0) index key (length xs) mi ma ! k = Some y" .. + ultimately have Q: "index key y n mi ma = Max ?C" + using C by (rule_tac fill_offs_enum_index [OF A], simp_all) + have "Max ?B \ Max ?C" + using N by (rule_tac Max_ge, simp) + hence "Max ?B < Max ?C" + proof (rule_tac ccontr, simp) + assume "Max ?B = Max ?C" + hence "offs ?ns 0 ! i < offs_next (offs ?ns 0) (length xs) + xs index key mi ma (Max ?B)" + using F and O by simp + moreover have "offs ?ns 0 ! Max ?B < offs ?ns 0 ! i" + using E and K by simp + with K have "Max ?B < i" + by (erule_tac contrapos_pp, subst not_less, subst (asm) not_less, + erule_tac offs_mono, simp add: enum_length) + hence "offs_next (offs ?ns 0) (length xs + 0) xs index key mi ma (Max ?B) \ + offs ?ns 0 ! i" + using C and D by (rule_tac offs_enum_next_le [OF A], simp_all) + ultimately show False by simp + qed + hence R: "index key x n mi ma < index key y n mi ma" + using M and Q by simp + have "count (mset (map the (fill xs (offs ?ns 0) index key + (length xs) mi ma))) x = count (mset xs) x" + using C and D by (rule_tac fill_offs_enum_count_item [OF A], simp_all) + moreover have S: "j < length (map the (fill xs (offs ?ns 0) index key + (length xs) mi ma))" + using E and F and G by (simp add: fill_length) + hence "map the (fill xs (offs ?ns 0) index key (length xs) mi ma) ! 
j + \ set (map the (fill xs (offs ?ns 0) index key (length xs) mi ma))" + by (rule nth_mem) + hence "0 < count (mset (map the (fill xs (offs ?ns 0) index key + (length xs) mi ma))) x" + using L and S by simp + ultimately have T: "x \ set xs" by simp + have "count (mset (map the (fill xs (offs ?ns 0) index key + (length xs) mi ma))) y = count (mset xs) y" + using C and D by (rule_tac fill_offs_enum_count_item [OF A], simp_all) + moreover have U: "k < length (map the (fill xs (offs ?ns 0) index key + (length xs) mi ma))" + using G by (simp add: fill_length) + hence "map the (fill xs (offs ?ns 0) index key (length xs) mi ma) ! k + \ set (map the (fill xs (offs ?ns 0) index key (length xs) mi ma))" + by (rule nth_mem) + hence "0 < count (mset (map the (fill xs (offs ?ns 0) index key + (length xs) mi ma))) y" + using P and U by simp + ultimately have V: "y \ set xs" by simp + have "key y \ key x \ index key y n mi ma \ index key x n mi ma" + using B and C and T and V by (simp add: index_mono_def) + hence "key x < key y" + by (rule_tac contrapos_pp [OF R], simp add: not_less) + thus "key (the (fill xs (offs ?ns 0) index key (length xs) mi ma ! j)) \ + key (the (fill xs (offs ?ns 0) index key (length xs) mi ma ! k))" + using L and P by simp +qed + +lemma round_sort_inv [rule_format]: + "index_less index key \ index_mono index key \ bn_inv p q t \ + add_inv n t \ sort_inv key t \ sort_inv key (round index key p q r t)" +proof (induction index key p q r t arbitrary: n rule: round.induct, + (rule_tac [!] impI)+, simp, simp_all only: simp_thms) + fix index p q r u ns xs n and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, xs)" + assume "\n. bn_inv p q (u, ns, xs) \ add_inv n (u, ns, xs) \ + sort_inv key (u, ns, xs) \ sort_inv key ?t" + hence "bn_inv p q (u, ns, xs) \ add_inv n (u, ns, xs) \ + sort_inv key (u, ns, xs) \ sort_inv key ?t" . + moreover assume + "bn_inv p q (u, 0 # ns, xs)" + "add_inv n (u, 0 # ns, xs)" and + "sort_inv key (u, 0 # ns, xs)" + ultimately show "sort_inv key (round index key p q r (u, 0 # ns, xs))" + by auto +next + fix index p q r u ns xs n and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, tl xs)" + assume + A: "index_less index key" and + B: "bn_inv p q (u, Suc 0 # ns, xs)" + moreover assume + "\n. bn_inv p q (u, ns, tl xs) \ add_inv n (u, ns, tl xs) \ + sort_inv key (u, ns, tl xs) \ sort_inv key ?t" and + "add_inv n (u, Suc 0 # ns, xs)" and + "sort_inv key (u, Suc 0 # ns, xs)" + ultimately show "sort_inv key (round index key p q r (u, Suc 0 # ns, xs))" + proof (cases ?t, cases xs, simp_all add: add_suc, (rule_tac allI, rule_tac impI)+, + rule_tac ballI, simp, (erule_tac conjE)+, simp) + fix i j k y ys u' ns' xs' + assume + C: "round index key p q r (u, ns, ys) = (u', ns', xs')" and + D: "Suc (foldl (+) 0 ns) = n" and + E: "Suc (length ys) = n" and + F: "\i < Suc (length ns). + \j < (0 # offs ns (Suc 0)) ! i. \k \ {(0 # offs ns (Suc 0)) ! i.. key (ys ! (k - Suc 0))" + assume A': "j < (0 # offs ns' (Suc 0)) ! i" + hence "\i'. i = Suc i'" + by (rule_tac nat.exhaust [of i], simp_all) + then obtain i' where B': "i = Suc i'" .. + assume "i < Suc (length ns')" + hence G: "i' < length ns'" + using B' by simp + hence H: "j < Suc (offs ns' 0 ! i')" + using A' and B' by (simp add: offs_suc) + assume "(0 # offs ns' (Suc 0)) ! i \ k" + hence "Suc (offs ns' 0 ! i') \ k" + using B' and G by (simp add: offs_suc) + moreover from this have "\k'. k = Suc k'" + by (rule_tac nat.exhaust [of k], simp_all) + then obtain k' where I: "k = Suc k'" .. 
+ ultimately have J: "offs ns' 0 ! i' \ k'" by simp + assume "k < Suc (length xs')" + hence K: "k' < length xs'" + using I by simp + let ?P = "\n. foldl (+) 0 ns = n \ length ys = n" + let ?Q = "\n. \i < length ns. + \j < offs ns 0 ! i. \k \ {offs ns 0 ! i.. key (ys ! k)" + let ?R = "\i < length ns'. + \j < offs ns' 0 ! i. \k \ {offs ns' 0 ! i.. key (xs' ! k)" + assume "\n. ?P n \ ?Q n \ ?R" + hence "?P (n - Suc 0) \ ?Q (n - Suc 0) \ ?R" . + hence "?Q (n - Suc 0) \ ?R" + using D and E by auto + moreover have "?Q (n - Suc 0)" + proof ((rule allI, rule impI)+, rule ballI, simp, erule conjE) + fix i j k + have "Suc i < Suc (length ns) \ (\j < (0 # offs ns (Suc 0)) ! Suc i. + \k \ {(0 # offs ns (Suc 0)) ! Suc i.. key (ys ! (k - Suc 0)))" + using F .. + moreover assume "i < length ns" + ultimately have "Suc j < Suc (offs ns 0 ! i) \ + (\k \ {Suc (offs ns 0 ! i).. key (ys ! (k - Suc 0)))" + by (auto simp add: offs_suc) + moreover assume "j < offs ns 0 ! i" + ultimately have "\k \ {Suc (offs ns 0 ! i).. key (ys ! (k - Suc 0))" + by simp + moreover assume "offs ns 0 ! i \ k" and "k < n - Suc 0" + hence "Suc k \ {Suc (offs ns 0 ! i).. key (ys ! (Suc k - Suc 0))" .. + thus "key (ys ! j) \ key (ys ! k)" by simp + qed + ultimately have ?R .. + from I show "key ((y # xs') ! j) \ key (xs' ! (k - Suc 0))" + proof (cases j, simp_all) + have "bn_inv p q (u, ns, ys)" + using B by simp + moreover have "add_inv (n - Suc 0) (u, ns, ys)" + using D and E by auto + moreover have "count_inv (\x. count (mset ys) x) (u, ns, ys)" by simp + ultimately have "count_inv (\x. count (mset ys) x) + (round index key p q r (u, ns, ys))" + by (rule round_count_inv [OF A]) + hence "count (mset xs') (xs' ! k') = count (mset ys) (xs' ! k')" + using C by simp + moreover have "0 < count (mset xs') (xs' ! k')" + using K by simp + ultimately have "xs' ! k' \ set ys" by simp + hence "\m < length ys. ys ! m = xs' ! k'" + by (simp add: in_set_conv_nth) + then obtain m where L: "m < length ys \ ys ! m = xs' ! k'" .. + have "Suc 0 < Suc (length ns) \ (\j < (0 # offs ns (Suc 0)) ! Suc 0. + \k \ {(0 # offs ns (Suc 0)) ! Suc 0.. key (ys ! (k - Suc 0)))" + using F .. + moreover have M: "0 < length ns" + using D [symmetric] and E and L by (rule_tac ccontr, simp) + ultimately have "\k \ {Suc (offs ns 0 ! 0).. key (ys ! (k - Suc 0))" + by (auto simp add: offs_suc) + hence "\k \ {Suc 0.. key (ys ! (k - Suc 0))" + using M by (cases ns, simp_all) + moreover have "Suc m \ {Suc 0.. key (ys ! (Suc m - Suc 0))" .. + thus "key y \ key (xs' ! k')" + using L by simp + next + case (Suc j') + moreover have "i' < length ns' \ (\j < offs ns' 0 ! i'. + \k \ {offs ns' 0 ! i'.. key (xs' ! k))" + using `?R` .. + hence "j' < offs ns' 0 ! i' \ + (\k \ {offs ns' 0 ! i'.. key (xs' ! k))" + using G by simp + ultimately have "\k \ {offs ns' 0 ! i'.. key (xs' ! k)" + using H by simp + moreover have "k' \ {offs ns' 0 ! i'.. key (xs' ! k')" .. + qed + qed +next + fix index p q r u m ns n and key :: "'a \ 'b" and xs :: "'a list" + let ?ws = "take (Suc (Suc m)) xs" + let ?ys = "drop (Suc (Suc m)) xs" + let ?r = "\m'. round_suc_suc index key ?ws m m' u" + let ?t = "\r' v. round index key p q r' (v, ns, ?ys)" + assume + A: "index_less index key" and + B: "index_mono index key" and + C: "bn_inv p q (u, Suc (Suc m) # ns, xs)" + assume + "\ws a b c d e f g h n. 
+ ws = ?ws \ a = bn_comp m p q r \ (b, c) = bn_comp m p q r \ + d = ?r b \ (e, f) = ?r b \ (g, h) = f \ + bn_inv p q (e, ns, ?ys) \ add_inv n (e, ns, ?ys) \ + sort_inv key (e, ns, ?ys) \ sort_inv key (?t c e)" and + "add_inv n (u, Suc (Suc m) # ns, xs)" and + "sort_inv key (u, Suc (Suc m) # ns, xs)" + with C show "sort_inv key (round index key p q r (u, Suc (Suc m) # ns, xs))" + proof (simp split: prod.split, ((rule_tac allI)+, (rule_tac impI)+)+, + rule_tac ballI, simp, (erule_tac conjE)+, subst (asm) (2) add_base_zero, simp) + fix m' r' v ms' ws' u' ns' xs' i j k + let ?nmi = "mini ?ws key" + let ?nma = "maxi ?ws key" + let ?xmi = "?ws ! ?nmi" + let ?xma = "?ws ! ?nma" + let ?mi = "key ?xmi" + let ?ma = "key ?xma" + let ?k = "case m of 0 \ m | Suc 0 \ m | Suc (Suc i) \ u + m'" + let ?zs = "nths ?ws (- {?nmi, ?nma})" + let ?ms = "enum ?zs index key ?k ?mi ?ma" + let ?P = "\n'. foldl (+) 0 ns = n' \ n - Suc (Suc m) = n'" + let ?Q = "\n'. \i < length ns. + \j < offs ns 0 ! i. \k \ {offs ns 0 ! i.. key (xs ! Suc (Suc (m + k)))" + let ?R = "\i < length ns'. + \j < offs ns' 0 ! i. \k \ {offs ns' 0 ! i.. key (xs' ! k)" + assume + D: "?r m' = (v, ms', ws')" and + E: "?t r' v = (u', ns', xs')" and + F: "bn_comp m p q r = (m', r')" and + G: "bn_valid m p q" and + H: "Suc (Suc (foldl (+) 0 ns + m)) = n" and + I: "length xs = n" and + J: "\i < Suc (length ns). \j < (0 # offs ns (Suc (Suc m))) ! i. + \k \ {(0 # offs ns (Suc (Suc m))) ! i.. key (xs ! k)" and + K: "i < length ms' + length ns'" and + L: "j < offs (ms' @ ns') 0 ! i" and + M: "offs (ms' @ ns') 0 ! i \ k" and + N: "k < length ws' + length xs'" + have O: "length ?ws = Suc (Suc m)" + using H and I by simp + with D [symmetric] have P: "length ws' = Suc (Suc m)" + by (simp add: round_suc_suc_def Let_def fill_length split: if_split_asm) + have Q: "\i. i < m \ + the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! i) \ set ?ws" + proof - + fix i + let ?x = "the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! i)" + assume R: "i < m" + hence "0 < m" by simp + hence "0 < ?k" + by (drule_tac bn_comp_fst_nonzero [OF G], subst (asm) F, + simp split: nat.split) + hence "count (mset (map the (fill ?zs (offs ?ms 0) + index key (length ?zs) ?mi ?ma))) ?x = count (mset ?zs) ?x" + by (rule_tac fill_offs_enum_count_item [OF A], simp, rule_tac conjI, + ((rule_tac mini_lb | rule_tac maxi_ub), erule_tac in_set_nthsD)+) + hence "count (mset (map the (fill ?zs (offs ?ms 0) + index key m ?mi ?ma))) ?x = count (mset ?zs) ?x" + using O by (simp add: mini_maxi_nths) + moreover have "0 < count (mset (map the (fill ?zs (offs ?ms 0) + index key m ?mi ?ma))) ?x" + using R by (simp add: fill_length) + ultimately have "?x \ set ?zs" by simp + thus "?x \ set ?ws" + by (rule in_set_nthsD) + qed + have R: "\i. i < Suc (Suc m) \ Suc (Suc m) \ k \ + key (?ws ! i) \ key (xs' ! (k - Suc (Suc m)))" + proof - + fix i + have "bn_inv p q (v, ns, ?ys)" + using C by simp + moreover have "add_inv (n - Suc (Suc m)) (v, ns, ?ys)" + using H and I by simp + moreover have "count_inv (\x. count (mset ?ys) x) (v, ns, ?ys)" + by simp + ultimately have "count_inv (\x. count (mset ?ys) x) (?t r' v)" + by (rule round_count_inv [OF A]) + hence S: "count (mset xs') (xs' ! (k - Suc (Suc m))) = + count (mset ?ys) (xs' ! (k - Suc (Suc m)))" + using E by simp + have "k < Suc (Suc (m + length xs'))" + using N and P by simp + moreover assume "Suc (Suc m) \ k" + ultimately have "0 < count (mset xs') (xs' ! (k - Suc (Suc m)))" + by simp + hence "xs' ! 
(k - Suc (Suc m)) \ set ?ys" + using S by simp + hence "\p < length ?ys. ?ys ! p = xs' ! (k - Suc (Suc m))" + by (simp add: in_set_conv_nth) + then obtain p where + T: "p < length ?ys \ ?ys ! p = xs' ! (k - Suc (Suc m))" .. + have "Suc 0 < Suc (length ns) \ + (\j < (0 # offs ns (Suc (Suc m))) ! Suc 0. + \k \ {(0 # offs ns (Suc (Suc m))) ! Suc 0.. key (xs ! k))" + using J .. + moreover have U: "0 < length ns" + using H [symmetric] and I and T by (rule_tac ccontr, simp) + ultimately have "\j < offs ns (Suc (Suc m)) ! 0. + \k \ {offs ns (Suc (Suc m)) ! 0.. key (xs ! k)" + by simp + moreover assume "i < Suc (Suc m)" + moreover have "offs ns (Suc (Suc m)) ! 0 = Suc (Suc m)" + using U by (subst offs_base_zero, simp, cases ns, simp_all) + ultimately have "\k \ {Suc (Suc m).. key (xs ! k)" + by simp + moreover have "Suc (Suc m) + p \ {Suc (Suc m).. key (xs ! (Suc (Suc m) + p))" .. + thus "key (?ws ! i) \ key (xs' ! (k - Suc (Suc m)))" + using O and T by simp + qed + assume "\ws a b c d e f g h n'. + ws = ?ws \ a = (m', r') \ b = m' \ c = r' \ + d = (v, ms', ws') \ e = v \ f = (ms', ws') \ g = ms' \ h = ws' \ + ?P n' \ ?Q n' \ ?R" + hence "?P (n - Suc (Suc m)) \ ?Q (n - Suc (Suc m)) \ ?R" + by simp + hence "?Q (n - Suc (Suc m)) \ ?R" + using H by simp + moreover have "?Q (n - Suc (Suc m))" + proof ((rule allI, rule impI)+, rule ballI, simp, erule conjE, + subst (1 2) append_take_drop_id [of "Suc (Suc m)", symmetric], + simp only: nth_append O, simp) + fix i j k + have "Suc i < Suc (length ns) \ + (\j < (0 # offs ns (Suc (Suc m))) ! Suc i. + \k \ {(0 # offs ns (Suc (Suc m))) ! Suc i.. key (xs ! k))" + using J .. + moreover assume "i < length ns" + ultimately have "j < offs ns 0 ! i \ + (\k \ {offs ns 0 ! i + Suc (Suc m).. key (xs ! k))" + by (simp, subst (asm) offs_base_zero, simp, + subst (asm) (2) offs_base_zero, simp_all) + moreover assume "j < offs ns 0 ! i" + ultimately have "\k \ {offs ns 0 ! i + Suc (Suc m).. key (xs ! k)" + using O by simp + moreover assume "offs ns 0 ! i \ k" and "k < n - Suc (Suc m)" + hence "Suc (Suc m) + k \ {offs ns 0 ! i + Suc (Suc m).. key (xs ! (Suc (Suc m) + k))" .. + thus "key (?ys ! j) \ key (?ys ! k)" + using O by simp + qed + ultimately have ?R .. + show "key ((ws' @ xs') ! j) \ key ((ws' @ xs') ! k)" + proof (simp add: nth_append not_less P, (rule_tac [!] conjI, + rule_tac [!] impI)+, rule_tac [3] FalseE) + assume + S: "j < Suc (Suc m)" and + T: "k < Suc (Suc m)" + from D [symmetric] show "key (ws' ! j) \ key (ws' ! k)" + proof (simp add: round_suc_suc_def Let_def split: if_split_asm, + (erule_tac [2] conjE)+, simp_all) + assume U: "?mi = ?ma" + have "j < length ?ws" + using S and O by simp + hence "?ws ! j \ set ?ws" + by (rule nth_mem) + with U have "key (?ws ! j) = ?ma" + by (rule mini_maxi_keys_eq) + moreover have "k < length ?ws" + using T and O by simp + hence "?ws ! k \ set ?ws" + by (rule nth_mem) + with U have "key (?ws ! k) = ?ma" + by (rule mini_maxi_keys_eq) + ultimately show "key (?ws ! j) \ key (?ws ! k)" by simp + next + assume U: "ms' = Suc 0 # ?ms @ [Suc 0]" + hence V: "j < (0 # offs (?ms @ Suc 0 # ns') (Suc 0)) ! i" + using L by simp + hence "\i'. i = Suc i'" + by (rule_tac nat.exhaust [of i], simp_all) + then obtain i' where W: "i = Suc i'" .. + have "i < Suc (Suc (?k + length ns'))" + using K and U by (simp add: enum_length) + hence X: "i' < Suc (?k + length ns')" + using W by simp + hence Y: "j < Suc (offs (?ms @ Suc 0 # ns') 0 ! i')" + using V and W by (simp add: enum_length offs_suc) + have "(0 # offs (?ms @ Suc 0 # ns') (Suc 0)) ! 
i \ k" + using M and U by simp + hence "Suc (offs (?ms @ Suc 0 # ns') 0 ! i') \ k" + using W and X by (simp add: enum_length offs_suc) + moreover from this have "\k'. k = Suc k'" + by (rule_tac nat.exhaust [of k], simp_all) + then obtain k' where Z: "k = Suc k'" .. + ultimately have AA: "offs (?ms @ Suc 0 # ns') 0 ! i' \ k'" + by simp + have AB: "j \ k'" + using Y and AA by simp + with Z show + "key ((?xmi # map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma) @ + [?xma]) ! j) \ + key ((?xmi # map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma) @ + [?xma]) ! k)" + proof (cases j, case_tac [2] "j < Suc m", simp_all add: nth_append + not_less fill_length, rule_tac [1-2] conjI, rule_tac [1-4] impI, + rule_tac [5] FalseE, simp_all) + assume "k' < m" + hence "the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! k') \ set ?ws" + by (rule Q) + thus "?mi \ key (the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! k'))" + by (rule mini_lb) + next + assume "m \ k'" + hence "k' = m" + using T and Z by simp + thus "?mi \ key ([?xma] ! (k' - m))" + by (simp, rule_tac mini_maxi_keys_le, rule_tac nth_mem [of 0], + subst O, simp) + next + fix j' + have "sort_inv key (0, ?ms, map the (fill ?zs (offs ?ms 0) index key + (length ?zs) ?mi ?ma))" + by (rule fill_sort_inv [OF A B], simp, rule conjI, + ((rule mini_lb | rule maxi_ub), erule in_set_nthsD)+) + hence "sort_inv key (0, ?ms, map the (fill ?zs (offs ?ms 0) index key + m ?mi ?ma))" + using O by (simp add: mini_maxi_nths) + hence "\i < ?k. \j < offs ?ms 0 ! i. \k \ {offs ?ms 0 ! i.. + key (the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! k))" + by (simp add: enum_length fill_length) + moreover assume AC: "k' < m" + hence AD: "offs (?ms @ Suc 0 # ns') 0 ! i' < m" + using AA by simp + have AE: "i' < ?k" + proof (rule contrapos_pp [OF AD], simp add: not_less offs_append + nth_append offs_length enum_length) + have "0 < m" + using AC by simp + hence "0 < ?k" + by (drule_tac bn_comp_fst_nonzero [OF G], subst (asm) F, + simp split: nat.split) + with A have "foldl (+) 0 ?ms = length ?zs" + by (rule enum_add, simp, rule_tac conjI, + ((rule_tac mini_lb | rule_tac maxi_ub), erule_tac in_set_nthsD)+) + hence "foldl (+) 0 ?ms = m" + using O by (simp add: mini_maxi_nths) + with X show "m \ (foldl (+) 0 ?ms # + offs ns' (Suc (foldl (+) 0 ?ms))) ! (i' - ?k)" + by (cases "i' - ?k", simp_all, subst offs_base_zero, simp_all) + qed + ultimately have "\j < offs ?ms 0 ! i'. \k \ {offs ?ms 0 ! i'.. + key (the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! k))" + by simp + moreover assume "j = Suc j'" + with Y and AE have "j' < offs ?ms 0 ! i'" + by (simp add: offs_append nth_append offs_length enum_length) + ultimately have "\k \ {offs ?ms 0 ! i'.. + key (the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! k))" + by simp + moreover from AA and AE have "offs ?ms 0 ! i' \ k'" + by (simp add: offs_append nth_append offs_length enum_length) + hence "k' \ {offs ?ms 0 ! i'.. + key (the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! k'))" .. + next + fix j' + assume "j' < m" + hence "the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! j') \ set ?ws" + (is "?x \ _") + by (rule Q) + moreover assume "m \ k'" + hence "k' = m" + using T and Z by simp + ultimately show "key ?x \ key ([?xma] ! (k' - m))" + by (simp, rule_tac maxi_ub) + next + fix j' + assume "j = Suc j'" and "m \ j'" + hence "Suc m \ j" by simp + hence "Suc (Suc m) \ k" + using Z and AB by simp + thus False + using T by simp + qed + qed + next + assume + S: "j < Suc (Suc m)" and + T: "Suc (Suc m) \ k" + from D [symmetric] show "key (ws' ! 
j) \ key (xs' ! (k - Suc (Suc m)))" + proof (simp add: round_suc_suc_def Let_def split: if_split_asm, + rule_tac R [OF S T], cases j, case_tac [2] "j < Suc m", + simp_all add: nth_append not_less fill_length) + have "?nmi < length ?ws" + using O by (rule_tac mini_less, simp) + hence "?nmi < Suc (Suc m)" + using O by simp + thus "?mi \ key (xs' ! (k - Suc (Suc m)))" + using T by (rule R) + next + fix j' + assume "j' < m" + hence U: "the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! j') \ set ?ws" + by (rule Q) + have "\p < Suc (Suc m). ?ws ! p = + the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! j')" + using O and U by (simp add: in_set_conv_nth) + then obtain p where V: "p < Suc (Suc m) \ ?ws ! p = + the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! j')" .. + hence "key (?ws ! p) \ key (xs' ! (k - Suc (Suc m)))" + using T by (rule_tac R, simp) + thus "key (the (fill ?zs (offs ?ms 0) index key m ?mi ?ma ! j')) \ + key (xs' ! (k - Suc (Suc m)))" + using V by simp + next + fix j' + assume "j = Suc j'" and "m \ j'" + moreover from this have "Suc m \ j" by simp + hence "j = Suc m" + using S by simp + ultimately have "j' = m" by simp + thus "key ([?xma] ! (j' - m)) \ key (xs' ! (k - Suc (Suc m)))" + proof simp + have "?nma < length ?ws" + using O by (rule_tac maxi_less, simp) + hence "?nma < Suc (Suc m)" + using O by simp + thus "?ma \ key (xs' ! (k - Suc (Suc m)))" + using T by (rule R) + qed + qed + next + assume "k < Suc (Suc m)" and "Suc (Suc m) \ j" + hence "k < j" by simp + moreover have "j < k" + using L and M by simp + ultimately show False by simp + next + assume + S: "Suc (Suc m) \ j" and + T: "Suc (Suc m) \ k" + have U: "0 < length ns' \ 0 < length xs'" + proof (rule ccontr, simp, erule disjE) + have "bn_inv p q (v, ns, ?ys)" + using C by simp + moreover have "add_inv (n - Suc (Suc m)) (v, ns, ?ys)" + using H and I by simp + ultimately have "add_inv (n - Suc (Suc m)) (?t r' v)" + by (rule round_add_inv [OF A]) + hence "length xs' = foldl (+) 0 ns'" + using E by simp + moreover assume "ns' = []" + ultimately have "length xs' = 0" by simp + hence "k < Suc (Suc m)" + using N and P by simp + thus False + using T by simp + next + assume "xs' = []" + hence "k < Suc (Suc m)" + using N and P by simp + thus False + using T by simp + qed + hence V: "i - length ms' < length ns'" + using K by arith + hence W: "\j < offs ns' 0 ! (i - length ms'). + \k \ {offs ns' 0 ! (i - length ms').. key (xs' ! k)" + using `?R` by simp + from D [symmetric] have X: "foldl (+) 0 ms' = Suc (Suc m)" + proof (simp add: round_suc_suc_def Let_def add_replicate add_suc + split: if_split_asm, cases "m = 0") + case True + hence "?k = 0" by simp + hence "length ?ms = 0" + by (simp add: enum_length) + thus "foldl (+) 0 ?ms = m" + using True by simp + next + case False + hence "0 < ?k" + by (simp, drule_tac bn_comp_fst_nonzero [OF G], subst (asm) F, + simp split: nat.split) + with A have "foldl (+) 0 ?ms = length ?zs" + by (rule enum_add, simp, rule_tac conjI, + ((rule_tac mini_lb | rule_tac maxi_ub), erule_tac in_set_nthsD)+) + thus "foldl (+) 0 ?ms = m" + using O by (simp add: mini_maxi_nths) + qed + have "length ms' < i" + proof (rule ccontr, simp add: not_less) + assume "i \ length ms'" + moreover have "length ms' < length (ms' @ ns')" + using U by simp + ultimately have "offs (ms' @ ns') 0 ! i \ + offs (ms' @ ns') 0 ! (length ms')" + by (rule offs_mono) + also have "\ = Suc (Suc m)" + using U and X by (simp add: offs_append nth_append offs_length, + cases ns', simp_all) + finally have "offs (ms' @ ns') 0 ! 
i \ Suc (Suc m)" . + hence "j < Suc (Suc m)" + using L by simp + thus False + using S by simp + qed + hence Y: "offs (ms' @ ns') 0 ! i = + Suc (Suc m) + offs ns' 0 ! (i - length ms')" + using X by (simp add: offs_append nth_append offs_length, + subst offs_base_zero [OF V], simp) + hence "j < Suc (Suc m) + offs ns' 0 ! (i - length ms')" + using L by simp + moreover from this have "0 < offs ns' 0 ! (i - length ms')" + using S by (rule_tac ccontr, simp) + ultimately have "j - Suc (Suc m) < offs ns' 0 ! (i - length ms')" + by simp + hence Z: "\k \ {offs ns' 0 ! (i - length ms').. key (xs' ! k)" + using W by simp + have "offs ns' 0 ! (i - length ms') \ k - Suc (Suc m)" + using M and Y by simp + moreover have "k - Suc (Suc m) < length xs'" + using N and P and U by arith + ultimately have "k - Suc (Suc m) \ + {offs ns' 0 ! (i - length ms').. + key (xs' ! (k - Suc (Suc m)))" .. + qed + qed +qed + +lemma gcsort_sort_inv: + assumes + A: "index_less index key" and + B: "index_mono index key" and + C: "add_inv n t" and + D: "n \ p" + shows "\t' \ gcsort_set index key p t; sort_inv key t\ \ + sort_inv key t'" +by (erule gcsort_set.induct, simp, drule gcsort_add_inv [OF A _ C D], + rule round_sort_inv [OF A B], simp_all del: bn_inv.simps, erule conjE, + frule sym, erule subst, rule bn_inv_intro, insert D, simp) + +text \ +\null + +The only remaining task is to address step 10 of the proof method, which is done by proving theorem +@{text gcsort_sorted}. It holds under the conditions that the objects' number is not larger than the +counters' upper bound and function @{text index} satisfies both predicates @{const index_less} and +@{const index_mono}, and states that function @{const gcsort} is successful in sorting the input +objects' list. + +\null +\ + +theorem gcsort_sorted: + assumes + A: "index_less index key" and + B: "index_mono index key" and + C: "length xs \ p" + shows "sorted (map key (gcsort index key p xs))" +proof - + let ?t = "(0, [length xs], xs)" + have "sort_inv key (gcsort_aux index key p ?t)" + by (rule gcsort_sort_inv [OF A B _ C], rule gcsort_add_input, + rule gcsort_aux_set, rule gcsort_sort_input) + moreover have "add_inv (length xs) (gcsort_aux index key p ?t)" + by (rule gcsort_add_inv [OF A _ _ C], + rule gcsort_aux_set, rule gcsort_add_input) + ultimately have "sorted (map key (gcsort_out (gcsort_aux index key p ?t)))" + by (rule gcsort_sort_intro, simp add: gcsort_sort_form) + moreover have "?t = gcsort_in xs" + by (simp add: gcsort_in_def) + ultimately show ?thesis + by (simp add: gcsort_def) +qed + +end diff --git a/thys/Generalized_Counting_Sort/Stability.thy b/thys/Generalized_Counting_Sort/Stability.thy new file mode 100644 --- /dev/null +++ b/thys/Generalized_Counting_Sort/Stability.thy @@ -0,0 +1,763 @@ +(* Title: An Efficient Generalization of Counting Sort for Large, possibly Infinite Key Ranges + Author: Pasquale Noce + Software Engineer at HID Global, Italy + pasquale dot noce dot lavoro at gmail dot com + pasquale dot noce at hidglobal dot com +*) + +section "Proof of algorithm's stability" + +theory Stability + imports Sorting +begin + +text \ +\null + +In this section, it is formally proven that GCsort is \emph{stable}, viz. that the sublist of the +output of function @{const gcsort} built by picking out the objects having a given key matches the +sublist of the input objects' list built in the same way. + +Here below, steps 5, 6, and 7 of the proof method are accomplished. 
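As an informal aside, not part of the formal development, the stability property stated above can be
rendered as a plain executable check: for every key value, filtering the output objects' list for
the objects having that key must yield exactly the corresponding sublist of the input objects' list.
The following Python sketch performs this check on a small example, relying on the fact that
Python's built-in sort is itself stable.

  def stable(key, xs, out):
      # for every key value occurring in xs, the objects of out having that key must
      # match, in the same order, the objects of xs having that key
      return all([x for x in out if key(x) == k] == [x for x in xs if key(x) == k]
                 for k in set(map(key, xs)))

  # Python's built-in sort is stable, so pairs with equal first components keep their
  # relative order and the check succeeds
  pairs = [(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
  assert stable(lambda p: p[0], pairs, sorted(pairs, key=lambda p: p[0]))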
Particularly, @{text stab_inv} +is the predicate that will be shown to be invariant over inductive set @{const gcsort_set}. + +\null +\ + +fun stab_inv :: "('b \ 'a list) \ ('a \ 'b) \ nat \ nat list \ 'a list \ + bool" where +"stab_inv f key (u, ns, xs) = (\k. [x\xs. key x = k] = f k)" + +lemma gcsort_stab_input: + "stab_inv (\k. [x\xs. key x = k]) key (0, [length xs], xs)" +by simp + +lemma gcsort_stab_intro: + "stab_inv f key t \ [x\gcsort_out t. key x = k] = f k" +by (cases t, simp add: gcsort_out_def) + +text \ +\null + +In what follows, step 9 of the proof method is accomplished. + +First, lemma @{text fill_offs_enum_stable} proves that function @{const fill}, if its input offsets' +list is computed via the composition of functions @{const offs} and @{const enum}, does not modify +the sublist of its input objects' list formed by the objects having a given key. Moreover, lemmas +@{text mini_stable} and @{text maxi_stable} prove that the extraction of the leftmost minimum and +the rightmost maximum from an objects' list through functions @{const mini} and @{const maxi} is +endowed with the same property. + +These lemmas are then used to prove lemma @{text gcsort_stab_inv}, which states that the sublist of +the objects having a given key within the objects' list is still the same after any recursive round. + +\null +\ + +lemma fill_stable [rule_format]: + assumes + A: "index_less index key" and + B: "index_same index key" + shows + "(\x \ set xs. key x \ {mi..ma}) \ + ns \ [] \ + offs_pred ns ub xs index key mi ma \ + map the [w\fill xs ns index key ub mi ma. \x. w = Some x \ key x = k] = + [x\xs. k = key x]" +proof (induction xs arbitrary: ns, simp_all add: Let_def, rule conjI, + (rule_tac [!] impI)+, (erule_tac [!] conjE)+, simp_all) + fix x xs and ns :: "nat list" + let ?i = "index key x (length ns) mi ma" + let ?ns' = "ns[?i := Suc (ns ! ?i)]" + let ?ws = "[w\fill xs ?ns' index key ub mi ma. + \y. w = Some y \ key y = key x]" + let ?ws' = "[w\(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x]. + \y. w = Some y \ key y = key x]" + assume + C: "\x \ set xs. mi \ key x \ key x \ ma" and + D: "mi \ key x" and + E: "key x \ ma" and + F: "ns \ []" and + G: "offs_pred ns ub (x # xs) index key mi ma" + have H: "?i < length ns" + using A and D and E and F by (simp add: index_less_def) + assume "\ns. ns \ [] \ offs_pred ns ub xs index key mi ma \ + map the [w\fill xs ns index key ub mi ma. + \y. w = Some y \ key y = key x] = + [y\xs. key x = key y]" + moreover have I: "offs_pred ?ns' ub xs index key mi ma" + using G and H by (rule_tac offs_pred_cons, simp_all) + ultimately have J: "map the ?ws = [y\xs. key x = key y]" + using F by simp + have "ns ! ?i + offs_num (length ns) (x # xs) index key mi ma ?i \ ub" + using G and H by (rule offs_pred_ub, simp add: offs_num_cons) + hence K: "ns ! ?i < ub" + by (simp add: offs_num_cons) + moreover from this have + L: "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] ! (ns ! ?i) = Some x" + by (subst nth_list_update_eq, simp_all add: fill_length) + ultimately have "0 < length ?ws'" + by (simp add: length_filter_conv_card card_gt_0_iff, + rule_tac exI [where x = "ns ! ?i"], simp add: fill_length) + hence "\w ws. ?ws' = w # ws" + by (rule_tac list.exhaust [of ?ws'], simp_all) + then obtain w and ws where "?ws' = w # ws" by blast + hence "\us vs y. + (fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] = us @ w # vs \ + (\u \ set us. \y. u = Some y \ key y \ key x) \ + w = Some y \ key y = key x \ + ws = [w\vs. \y. 
w = Some y \ key y = key x]" + (is "\us vs y. ?P us vs y") + by (simp add: filter_eq_Cons_iff, blast) + then obtain us and vs and y where M: "?P us vs y" by blast + let ?A = "{i. i < length ns \ 0 < offs_num (length ns) xs index key mi ma i \ + ns ! i \ length us}" + have "length us = ns ! ?i" + proof (rule ccontr, erule neqE, cases "?A = {}", + cases "0 < offs_num (length ns) xs index key mi ma 0", simp_all only: not_gr0) + assume + N: "length us < ns ! ?i" and + O: "?A = {}" and + P: "0 < offs_num (length ns) xs index key mi ma 0" + have "length us < ns ! 0" + proof (rule ccontr, simp add: not_less) + assume "ns ! 0 \ length us" + with F and P have "0 \ ?A" by simp + with O show False by blast + qed + hence "length us < ?ns' ! 0" + using F by (cases ?i, simp_all) + hence "offs_none ?ns' ub xs index key mi ma (length us)" + using P by (simp add: offs_none_def) + hence "fill xs ?ns' index key ub mi ma ! (length us) = None" + using C and F and I by (rule_tac fill_none [OF A], simp_all) + hence "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] ! (length us) = None" + using N by simp + thus False + using M by simp + next + assume + N: "length us < ns ! ?i" and + O: "?A = {}" and + P: "offs_num (length ns) xs index key mi ma 0 = 0" + from K and N have "length us < offs_next ?ns' ub xs index key mi ma 0" + proof (rule_tac ccontr, simp only: not_less offs_next_def split: if_split_asm) + assume "offs_set_next ?ns' xs index key mi ma 0 \ {}" + (is "?B \ _") + hence "Min ?B \ ?B" + by (rule_tac Min_in, simp) + moreover assume "?ns' ! Min ?B \ length us" + hence "ns ! Min ?B \ length us" + using H by (cases "Min ?B = ?i", simp_all) + ultimately have "Min ?B \ ?A" by simp + with O show False by blast + qed + hence "offs_none ?ns' ub xs index key mi ma (length us)" + using P by (simp add: offs_none_def) + hence "fill xs ?ns' index key ub mi ma ! (length us) = None" + using C and F and I by (rule_tac fill_none [OF A], simp_all) + hence "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] ! (length us) = None" + using N by simp + thus False + using M by simp + next + assume N: "length us < ns ! ?i" + assume "?A \ {}" + hence "Max ?A \ ?A" + by (rule_tac Max_in, simp) + moreover from this have O: "Max ?A \ ?i" + using N by (rule_tac notI, simp) + ultimately have "index key y (length ?ns') mi ma = Max ?A" + using C proof (rule_tac fill_index [OF A _ I, where j = "length us"], simp_all, + rule_tac ccontr, simp only: not_less not_le offs_next_def split: if_split_asm) + assume "ub \ length us" + with N have "ub < ns ! ?i" by simp + with K show False by simp + next + let ?B = "offs_set_next ?ns' xs index key mi ma (Max ?A)" + assume "?ns' ! Min ?B \ length us" + hence "ns ! Min ?B \ length us" + using H by (cases "Min ?B = ?i", simp_all) + moreover assume "?B \ {}" + hence P: "Min ?B \ ?B" + by (rule_tac Min_in, simp) + ultimately have "Min ?B \ ?A" by simp + hence "Min ?B \ Max ?A" + by (rule_tac Max_ge, simp) + thus False + using P by simp + next + have "fill xs ?ns' index key ub mi ma ! length us = + (fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] ! length us" + using N by simp + thus "fill xs ?ns' index key ub mi ma ! length us = Some y" + using M by simp + qed + moreover have "index key y (length ?ns') mi ma = ?i" + using B and D and E and M by (cases "y = x", simp_all add: + index_same_def) + ultimately show False + using O by simp + next + assume N: "ns ! ?i < length us" + hence "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] ! (ns ! ?i) = + us ! (ns ! 
?i)" + using M by (simp add: nth_append) + hence "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] ! (ns ! ?i) \ set us" + using N by simp + thus False + using L and M by blast + qed + hence "take (ns ! ?i) ((fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x]) = us" + using M by simp + hence N: "take (ns ! ?i) (fill xs ?ns' index key ub mi ma) = us" by simp + have "fill xs ?ns' index key ub mi ma = + take (ns ! ?i) (fill xs ?ns' index key ub mi ma) @ + fill xs ?ns' index key ub mi ma ! (ns ! ?i) # + drop (Suc (ns ! ?i)) (fill xs ?ns' index key ub mi ma)" + (is "_ = ?ts @ _ # ?ds") + using K by (rule_tac id_take_nth_drop, simp add: fill_length) + moreover have "fill xs ?ns' index key ub mi ma ! (ns ! ?i) = None" + using C and D and E and F and G by (rule_tac fill_index_none [OF A], + simp_all) + ultimately have O: "fill xs ?ns' index key ub mi ma = us @ None # ?ds" + using N by simp + have "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] = + ?ts @ Some x # ?ds" + using K by (rule_tac upd_conv_take_nth_drop, simp add: fill_length) + hence "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] = + us @ Some x # ?ds" + using N by simp + hence "?ws' = Some x # [w\?ds. \y. w = Some y \ key y = key x]" + using M by simp + also have "\ = Some x # ?ws" + using M by (subst (2) O, simp) + finally show "map the ?ws' = x # [y\xs. key x = key y]" + using J by simp +next + fix x xs and ns :: "nat list" + let ?i = "index key x (length ns) mi ma" + let ?ns' = "ns[?i := Suc (ns ! ?i)]" + let ?ws = "[w\fill xs ?ns' index key ub mi ma. \y. w = Some y \ key y = k]" + let ?ws' = "[w\(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x]. + \y. w = Some y \ key y = k]" + assume + C: "\x \ set xs. mi \ key x \ key x \ ma" and + D: "mi \ key x" and + E: "key x \ ma" and + F: "ns \ []" and + G: "offs_pred ns ub (x # xs) index key mi ma" and + H: "k \ key x" + have I: "?i < length ns" + using A and D and E and F by (simp add: index_less_def) + assume "\ns. ns \ [] \ offs_pred ns ub xs index key mi ma \ + map the [w\fill xs ns index key ub mi ma. + \y. w = Some y \ key y = k] = + [y\xs. k = key y]" + moreover have "offs_pred ?ns' ub xs index key mi ma" + using G and I by (rule_tac offs_pred_cons, simp_all) + ultimately have J: "map the ?ws = [y\xs. k = key y]" + using F by simp + have "ns ! ?i + offs_num (length ns) (x # xs) index key mi ma ?i \ ub" + using G and I by (rule offs_pred_ub, simp add: offs_num_cons) + hence K: "ns ! ?i < ub" + by (simp add: offs_num_cons) + have "fill xs ?ns' index key ub mi ma = + take (ns ! ?i) (fill xs ?ns' index key ub mi ma) @ + fill xs ?ns' index key ub mi ma ! (ns ! ?i) # + drop (Suc (ns ! ?i)) (fill xs ?ns' index key ub mi ma)" + (is "_ = ?ts @ _ # ?ds") + using K by (rule_tac id_take_nth_drop, simp add: fill_length) + moreover have "fill xs ?ns' index key ub mi ma ! (ns ! ?i) = None" + using C and D and E and F and G by (rule_tac fill_index_none [OF A], + simp_all) + ultimately have L: "fill xs ?ns' index key ub mi ma = ?ts @ None # ?ds" + by simp + have "(fill xs ?ns' index key ub mi ma)[ns ! ?i := Some x] = + ?ts @ Some x # ?ds" + using K by (rule_tac upd_conv_take_nth_drop, simp add: fill_length) + moreover have "?ws = [w\?ts. \y. w = Some y \ key y = k] @ + [w\?ds. \y. w = Some y \ key y = k]" + by (subst L, simp) + ultimately have "?ws' = ?ws" + using H by simp + thus "map the ?ws' = [y\xs. k = key y]" + using J by simp +qed + +lemma fill_offs_enum_stable [rule_format]: + assumes + A: "index_less index key" and + B: "index_same index key" + shows + "\x \ set xs. 
key x \ {mi..ma} \ + 0 < n \ + [x\map the (fill xs (offs (enum xs index key n mi ma) 0) + index key (length xs) mi ma). key x = k] = [x\xs. k = key x]" + (is "_ \ _ \ [_\map the ?ys. _] = _" + is "_ \ _ \ [_\map the (fill _ ?ns _ _ _ _ _). _] = _") +proof (subst fill_stable [symmetric, OF A B, of xs mi ma ?ns], simp, + simp only: length_greater_0_conv [symmetric] offs_length enum_length, + rule offs_enum_pred [OF A], simp, subst filter_map, + simp add: filter_eq_nths fill_length) + assume + C: "\x \ set xs. mi \ key x \ key x \ ma" and + D: "0 < n" + have "{i. i < length xs \ key (the (?ys ! i)) = k} = + {i. i < length xs \ (\x. ?ys ! i = Some x \ key x = k)}" + (is "?A = ?B") + proof (rule set_eqI, rule iffI, simp_all, erule_tac [!] conjE, erule_tac [2] exE, + erule_tac [2] conjE, simp_all) + fix i + assume E: "i < length xs" and F: "key (the (?ys ! i)) = k" + have "\x. ?ys ! i = Some x" + proof (cases "?ys ! i", simp_all) + assume "?ys ! i = None" + moreover have "?ys ! i \ set ?ys" + using E by (rule_tac nth_mem, simp add: fill_length) + ultimately have "None \ set ?ys" by simp + moreover have "count (mset ?ys) None = 0" + using C and D by (rule_tac fill_offs_enum_count_none [OF A], simp) + ultimately show False by simp + qed + then obtain x where "?ys ! i = Some x" .. + moreover from this have "key x = k" + using F by simp + ultimately show "\x. ?ys ! i = Some x \ key x = k" by simp + qed + thus "map the (nths ?ys ?A) = map the (nths ?ys ?B)" by simp +qed + +lemma mini_first [rule_format]: + "xs \ [] \ i < mini xs key \ + key (xs ! mini xs key) < key (xs ! i)" +by (induction xs arbitrary: i, simp_all add: Let_def, (rule impI)+, + erule conjE, simp add: not_le nth_Cons split: nat.split) + +lemma maxi_last [rule_format]: + "xs \ [] \ maxi xs key < i \ i < length xs \ + key (xs ! i) < key (xs ! maxi xs key)" +by (induction xs arbitrary: i, simp_all add: Let_def, (rule impI)+, + rule le_less_trans, rule maxi_ub, rule nth_mem, simp) + +lemma nths_range: + "nths xs A = nths xs (A \ {..A. nths xs A = nths xs (A \ {.. A} = nths xs ({i. Suc i \ A} \ {.. A} \ {.. A \ i < length xs}" + by blast + ultimately show + "nths xs {i. Suc i \ A} = nths xs {i. Suc i \ A \ i < length xs}" + by simp +qed + +lemma filter_nths_diff: + assumes + A: "i < length xs" and + B: "\ P (xs ! i)" + shows "[x\nths xs (A - {i}). P x] = [x\nths xs A. P x]" +proof (cases "i \ A", simp_all) + case True + have C: "xs = take i xs @ xs ! i # drop (Suc i) xs" + (is "_ = ?ts @ _ # ?ds") + using A by (rule id_take_nth_drop) + have "nths xs A = nths ?ts A @ nths (xs ! i # ?ds) {j. j + length ?ts \ A}" + by (subst C, simp add: nths_append) + moreover have D: "length ?ts = i" + using A by simp + ultimately have E: "nths xs A = nths ?ts (A \ {.. A}" + (is "_ = ?vs @ _ # ?ws") + using True by (simp add: nths_Cons, subst nths_range, simp) + have "nths xs (A - {i}) = nths ?ts (A - {i}) @ + nths (xs ! i # ?ds) {j. j + length ?ts \ A - {i}}" + by (subst C, simp add: nths_append) + moreover have "(A - {i}) \ {.. {.. []" and + B: "mini xs key \ A" + (is "?nmi \ _") + shows "[x\[xs ! ?nmi] @ nths xs (A - {?nmi}). key x = k] = + [x\nths xs A. key x = k]" + (is "[x\[?xmi] @ _. _] = _") +proof (simp, rule conjI, rule_tac [!] impI, rule_tac [2] filter_nths_diff, + rule_tac [2] mini_less, simp_all add: A) + assume C: "key ?xmi = k" + moreover have "?nmi < length xs" + using A by (rule_tac mini_less, simp) + ultimately have "0 < length [x\xs. 
key x = k]" + by (simp add: length_filter_conv_card card_gt_0_iff, rule_tac exI, rule_tac conjI) + hence "\y ys. [x\xs. key x = k] = y # ys" + by (rule_tac list.exhaust [of "[x\xs. key x = k]"], simp_all) + then obtain y and ys where "[x\xs. key x = k] = y # ys" by blast + hence "\us vs. xs = us @ y # vs \ + (\u \ set us. key u \ k) \ key y = k \ ys = [x\vs. key x = k]" + (is "\us vs. ?P us vs") + by (simp add: filter_eq_Cons_iff) + then obtain us and vs where D: "?P us vs" by blast + have E: "length us = ?nmi" + proof (rule ccontr, erule neqE) + assume "length us < ?nmi" + with A have "key ?xmi < key (xs ! length us)" + by (rule mini_first) + also have "\ = k" + using D by simp + finally show False + using C by simp + next + assume F: "?nmi < length us" + hence "?xmi = us ! ?nmi" + using D by (simp add: nth_append) + hence "?xmi \ set us" + using F by simp + thus False + using C and D by simp + qed + moreover have "xs ! length us = y" + using D by simp + ultimately have F: "?xmi = y" by simp + have "nths xs A = nths us A @ nths (y # vs) {i. i + ?nmi \ A}" + using D and E by (simp add: nths_append) + also have "\ = nths us A @ y # nths vs {i. Suc i + ?nmi \ A}" + (is "_ = _ @ _ # ?ws") + using B by (simp add: nths_Cons) + finally have G: "[x\nths xs A. key x = k] = y # [x\?ws. key x = k]" + using D by (simp add: filter_empty_conv, rule_tac ballI, + drule_tac in_set_nthsD, simp) + have "nths xs (A - {?nmi}) = nths us (A - {?nmi}) @ + nths (y # vs) {i. i + ?nmi \ A - {?nmi}}" + using D and E by (simp add: nths_append) + also have "\ = nths us (A - {?nmi}) @ ?ws" + by (simp add: nths_Cons) + finally have H: "[x\nths xs (A - {?nmi}). key x = k] = [x\?ws. key x = k]" + using D by (simp add: filter_empty_conv, rule_tac ballI, + drule_tac in_set_nthsD, simp) + show "?xmi # [x\nths xs (A - {?nmi}). key x = k] = + [x\nths xs A. key x = k]" + using F and G and H by simp +qed + +lemma maxi_stable: + assumes + A: "xs \ []" and + B: "maxi xs key \ A" + (is "?nma \ _") + shows "[x\nths xs (A - {?nma}) @ [xs ! ?nma]. key x = k] = + [x\nths xs A. key x = k]" + (is "[x\_ @ [?xma]. _] = _") +proof (simp, rule conjI, rule_tac [!] impI, rule_tac [2] filter_nths_diff, + rule_tac [2] maxi_less, simp_all add: A) + assume C: "key ?xma = k" + moreover have D: "?nma < length xs" + using A by (rule_tac maxi_less, simp) + ultimately have "0 < length [x\rev xs. key x = k]" + by (simp add: length_filter_conv_card card_gt_0_iff, + rule_tac exI [where x = "length xs - Suc ?nma"], simp add: rev_nth) + hence "\y ys. [x\rev xs. key x = k] = y # ys" + by (rule_tac list.exhaust [of "[x\rev xs. key x = k]"], simp_all) + then obtain y and ys where "[x\rev xs. key x = k] = y # ys" by blast + hence "\us vs. rev xs = us @ y # vs \ + (\u \ set us. key u \ k) \ key y = k \ ys = [x\vs. key x = k]" + (is "\us vs. ?P us vs") + by (simp add: filter_eq_Cons_iff) + then obtain us and vs where E: "?P us vs" by blast + hence F: "xs = rev vs @ y # rev us" + by (simp add: rev_swap) + have G: "length us = length xs - Suc ?nma" + proof (rule ccontr, erule neqE) + assume "length us < length xs - Suc ?nma" + hence "?nma < length xs - Suc (length us)" by simp + moreover have "length xs - Suc (length us) < length xs" + using D by simp + ultimately have "key (xs ! (length xs - Suc (length us))) < key ?xma" + by (rule maxi_last [OF A]) + moreover have "length us < length xs" + using F by simp + ultimately have "key (rev xs ! length us) < key ?xma" + by (simp add: rev_nth) + moreover have "key (rev xs ! 
length us) = k" + using E by simp + ultimately show False + using C by simp + next + assume H: "length xs - Suc ?nma < length us" + hence "rev xs ! (length xs - Suc ?nma) = us ! (length xs - Suc ?nma)" + using E by (simp add: nth_append) + hence "rev xs ! (length xs - Suc ?nma) \ set us" + using H by simp + hence "?xma \ set us" + using D by (simp add: rev_nth) + thus False + using C and E by simp + qed + moreover have "rev xs ! length us = y" + using E by simp + ultimately have H: "?xma = y" + using D by (simp add: rev_nth) + have "length xs = length us + Suc ?nma" + using D and G by simp + hence I: "length vs = ?nma" + using F by simp + hence "nths xs A = nths (rev vs) A @ nths (y # rev us) {i. i + ?nma \ A}" + using F by (simp add: nths_append) + also have "\ = nths (rev vs) (A \ {.. A}" + (is "_ = ?ws @ _ # _") + using B and I by (simp add: nths_Cons, subst nths_range, simp) + finally have J: "[x\nths xs A. key x = k] = [x\?ws. key x = k] @ [y]" + using E by (simp add: filter_empty_conv, rule_tac ballI, + drule_tac in_set_nthsD, simp) + have "nths xs (A - {?nma}) = nths (rev vs) (A - {?nma}) @ + nths (y # rev us) {i. i + ?nma \ A - {?nma}}" + using F and I by (simp add: nths_append) + hence "nths xs (A - {?nma}) = nths (rev vs) ((A - {?nma}) \ {.. A}" + using I by (simp add: nths_Cons, subst nths_range, simp) + moreover have "(A - {?nma}) \ {.. {..nths xs (A - {?nma}). key x = k] = + [x\?ws. key x = k]" + using E by (simp add: filter_empty_conv, rule_tac ballI, + drule_tac in_set_nthsD, simp) + show "[x\nths xs (A - {?nma}). key x = k] @ [?xma] = + [x\nths xs A. key x = k]" + using H and J and K by simp +qed + +lemma round_stab_inv [rule_format]: + "index_less index key \ index_same index key \ bn_inv p q t \ + add_inv n t \ stab_inv f key t \ stab_inv f key (round index key p q r t)" +proof (induction index key p q r t arbitrary: n f rule: round.induct, + (rule_tac [!] impI)+, simp, simp, simp_all only: simp_thms) + fix index p q r u ns xs n f and key :: "'a \ 'b" + let ?t = "round index key p q r (u, ns, tl xs)" + assume + "\n f. bn_inv p q (u, ns, tl xs) \ add_inv n (u, ns, tl xs) \ + stab_inv f key (u, ns, tl xs) \ stab_inv f key ?t" and + "bn_inv p q (u, Suc 0 # ns, xs)" and + "add_inv n (u, Suc 0 # ns, xs)" and + "stab_inv f key (u, Suc 0 # ns, xs)" + thus "stab_inv f key (round index key p q r (u, Suc 0 # ns, xs))" + proof (cases ?t, cases xs, simp_all add: add_suc, arith, erule_tac conjE, + rule_tac allI, simp) + fix k y ys xs' + let ?f' = "f(key y := tl (f (key y)))" + assume "\n' f'. foldl (+) 0 ns = n' \ length ys = n' \ + (\k'. [x\ys. key x = k'] = f' k') \ (\k'. [x\xs'. key x = k'] = f' k')" + moreover assume "Suc (foldl (+) 0 ns) = n" and "Suc (length ys) = n" + ultimately have "(\k'. [x\ys. key x = k'] = ?f' k') \ + (\k'. [x\xs'. key x = k'] = ?f' k')" + by blast + moreover assume A: "\k'. (if key y = k' + then y # [x\ys. key x = k'] else [x\ys. key x = k']) = f k'" + hence B: "f (key y) = y # [x\ys. key x = key y]" + by (drule_tac spec [where x = "key y"], simp) + hence "\k'. [x\ys. key x = k'] = ?f' k'" + proof (rule_tac allI, simp, rule_tac impI) + fix k' + assume "k' \ key y" + thus "[x\ys. key x = k'] = f k'" + using A by (drule_tac spec [where x = k'], simp) + qed + ultimately have C: "\k'. [x\xs'. key x = k'] = ?f' k'" .. + show "(key y = k \ y # [x\xs'. key x = k] = f k) \ + (key y \ k \ [x\xs'. key x = k] = f k)" + proof (rule conjI, rule_tac [!] impI) + assume "key y = k" + moreover have "tl (f (key y)) = [x\xs'. 
key x = key y]" + using C by simp + hence "f (key y) = y # [x\xs'. key x = key y]" + using B by (subst hd_Cons_tl [symmetric], simp_all) + ultimately show "y # [x\xs'. key x = k] = f k" by simp + next + assume "key y \ k" + thus "[x\xs'. key x = k] = f k" + using C by simp + qed + qed +next + fix index p q r u m ns n f and key :: "'a \ 'b" and xs :: "'a list" + let ?ws = "take (Suc (Suc m)) xs" + let ?ys = "drop (Suc (Suc m)) xs" + let ?r = "\m'. round_suc_suc index key ?ws m m' u" + let ?t = "\r' v. round index key p q r' (v, ns, ?ys)" + assume + A: "index_less index key" and + B: "index_same index key" + assume + "\ws a b c d e g h i n f. + ws = ?ws \ a = bn_comp m p q r \ (b, c) = bn_comp m p q r \ + d = ?r b \ (e, g) = ?r b \ (h, i) = g \ + bn_inv p q (e, ns, ?ys) \ add_inv n (e, ns, ?ys) \ + stab_inv f key (e, ns, ?ys) \ stab_inv f key (?t c e)" and + "bn_inv p q (u, Suc (Suc m) # ns, xs)" and + "add_inv n (u, Suc (Suc m) # ns, xs)" and + "stab_inv f key (u, Suc (Suc m) # ns, xs)" + thus "stab_inv f key (round index key p q r (u, Suc (Suc m) # ns, xs))" + proof (simp split: prod.split, ((rule_tac allI)+, ((rule_tac impI)+)?)+, + (erule_tac conjE)+, subst (asm) (2) add_base_zero, simp) + fix m' r' v ms' ws' xs' k + let ?f = "\k. [x\?ys. key x = k]" + let ?P = "\f. \k. [x\?ys. key x = k] = f k" + let ?Q = "\f. \k. [x\xs'. key x = k] = f k" + assume + C: "?r m' = (v, ms', ws')" and + D: "bn_comp m p q r = (m', r')" and + E: "bn_valid m p q" and + F: "Suc (Suc (foldl (+) 0 ns + m)) = n" and + G: "length xs = n" + assume "\ws a b c d e g h i n' f. + ws = ?ws \ a = (m', r') \ b = m' \ c = r' \ + d = (v, ms', ws') \ e = v \ g = (ms', ws') \ h = ms' \ i = ws' \ + foldl (+) 0 ns = n' \ n - Suc (Suc m) = n' \ ?P f \ ?Q f" + hence "\f. foldl (+) 0 ns = n - Suc (Suc m) \ ?P f \ ?Q f" + by simp + hence "\f. ?P f \ ?Q f" + using F by simp + hence "?P ?f \ ?Q ?f" . + hence "[x\xs'. key x = k] = [x\?ys. key x = k]" by simp + moreover assume "\k. [x\xs. key x = k] = f k" + hence "f k = [x\?ws @ ?ys. key x = k]" by simp + ultimately have "f k = [x\?ws. key x = k] @ [x\xs'. key x = k]" + by (subst (asm) filter_append, simp) + with C [symmetric] show "[x\ws'. key x = k] @ [x\xs'. key x = k] = f k" + proof (simp add: round_suc_suc_def Let_def del: filter.simps + split: if_split_asm) + let ?nmi = "mini ?ws key" + let ?nma = "maxi ?ws key" + let ?xmi = "?ws ! ?nmi" + let ?xma = "?ws ! ?nma" + let ?mi = "key ?xmi" + let ?ma = "key ?xma" + let ?k = "case m of 0 \ m | Suc 0 \ m | Suc (Suc i) \ u + m'" + let ?zs = "nths ?ws (- {?nmi, ?nma})" + let ?ms = "enum ?zs index key ?k ?mi ?ma" + have H: "length ?ws = Suc (Suc m)" + using F and G by simp + hence I: "?nmi \ ?nma" + by (rule_tac mini_maxi_neq, simp) + have "[x\(([?xmi] @ map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma)) + @ [?xma]). key x = k] = [x\?ws. key x = k]" + proof (cases "m = 0") + case True + have "?nmi < length ?ws" + using H by (rule_tac mini_less, simp) + hence J: "?nmi < Suc (Suc 0)" + using True by simp + moreover have "?nma < length ?ws" + using H by (rule_tac maxi_less, simp) + hence K: "?nma < Suc (Suc 0)" + using True by simp + ultimately have "card ({..[?xmi] @ nths ?ws ({..[?xma]. key x = k] = [x\?ws. key x = k]" + proof (subst mini_stable, simp only: length_greater_0_conv + [symmetric] H, simp add: I J, subst filter_append [symmetric]) + show "[x\nths ?ws ({..?ws. 
key x = k]" + by (subst maxi_stable, simp only: length_greater_0_conv + [symmetric] H, simp add: K, simp add: True) + qed + qed + next + case False + hence "0 < ?k" + by (simp, drule_tac bn_comp_fst_nonzero [OF E], subst (asm) D, + simp split: nat.split) + hence "[x\map the (fill ?zs (offs ?ms 0) index key (length ?zs) ?mi ?ma). + key x = k] = [x\?zs. k = key x]" + by (rule_tac fill_offs_enum_stable [OF A B], simp, rule_tac conjI, + ((rule_tac mini_lb | rule_tac maxi_ub), erule_tac in_set_nthsD)+) + moreover have "[x\?zs. k = key x] = [x\?zs. key x = k]" + by (rule filter_cong, simp, blast) + ultimately have + J: "[x\map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma). + key x = k] = [x\?zs. key x = k]" + using H by (simp add: mini_maxi_nths) + show ?thesis + proof (simp only: filter_append J, subst Compl_eq_Diff_UNIV, + subst Diff_insert, subst filter_append [symmetric]) + show "[x\[?xmi] @ nths ?ws (UNIV - {?nma} - {?nmi}). key x = k] + @ [x\[?xma]. key x = k] = [x\?ws. key x = k]" + proof (subst mini_stable, simp only: length_greater_0_conv + [symmetric] H, simp add: I, subst filter_append [symmetric]) + show "[x\nths ?ws (UNIV - {?nma}) @ [?xma]. key x = k] = + [x\?ws. key x = k]" + by (subst maxi_stable, simp only: length_greater_0_conv + [symmetric] H, simp, subst nths_range, subst H, simp) + qed + qed + qed + thus "[x\?xmi # map the (fill ?zs (offs ?ms 0) index key m ?mi ?ma) @ + [?xma]. key x = k] = [x\?ws. key x = k]" + by simp + qed + qed +qed + +lemma gcsort_stab_inv: + assumes + A: "index_less index key" and + B: "index_same index key" and + C: "add_inv n t" and + D: "n \ p" + shows "\t' \ gcsort_set index key p t; stab_inv f key t\ \ + stab_inv f key t'" +by (erule gcsort_set.induct, simp, drule gcsort_add_inv [OF A _ C D], + rule round_stab_inv [OF A B], simp_all del: bn_inv.simps, erule conjE, + frule sym, erule subst, rule bn_inv_intro, insert D, simp) + +text \ +\null + +The only remaining task is to address step 10 of the proof method, which is done by proving theorem +@{text gcsort_stable}. It holds under the conditions that the objects' number is not larger than the +counters' upper bound and function @{text index} satisfies both predicates @{const index_less} and +@{const index_same}, and states that function @{const gcsort} leaves unchanged the sublist of the +objects having a given key within the input objects' list. + +\null +\ + +theorem gcsort_stable: + assumes + A: "index_less index key" and + B: "index_same index key" and + C: "length xs \ p" + shows "[x\gcsort index key p xs. key x = k] = [x\xs. key x = k]" +proof - + let ?t = "(0, [length xs], xs)" + have "stab_inv (\k. [x\xs. key x = k]) key (gcsort_aux index key p ?t)" + by (rule gcsort_stab_inv [OF A B _ C], rule gcsort_add_input, + rule gcsort_aux_set, rule gcsort_stab_input) + hence "[x\gcsort_out (gcsort_aux index key p ?t). key x = k] = + [x\xs. key x = k]" + by (rule gcsort_stab_intro) + moreover have "?t = gcsort_in xs" + by (simp add: gcsort_in_def) + ultimately show ?thesis + by (simp add: gcsort_def) +qed + +end diff --git a/thys/Generalized_Counting_Sort/document/root.bib b/thys/Generalized_Counting_Sort/document/root.bib new file mode 100644 --- /dev/null +++ b/thys/Generalized_Counting_Sort/document/root.bib @@ -0,0 +1,69 @@ +@book{ + R1, + author={D. E. 
Knuth},
+ title={The Art of Computer Programming, Volume 3: Sorting and Searching},
+ edition={2nd},
+ publisher={Addison-Wesley},
+ year=1998
+}
+
+@misc{
+ R2,
+ author={{Wikipedia contributors}},
+ title={Counting sort --- {Wikipedia, The Free Encyclopedia}},
+ month=sep,
+ year=2019,
+ note={\url{https://en.wikipedia.org/w/index.php?title=Counting_sort&oldid=915065502}}
+}
+
+@misc{
+ R3,
+ author={P. E. Black},
+ title={Histogram sort --- {Dictionary of Algorithms and Data Structures}},
+ month=feb,
+ year=2019,
+ note={\url{https://www.nist.gov/dads/HTML/histogramSort.html}}
+}
+
+@article{
+ R4,
+ author={Pasquale Noce},
+ title={{A General Method for the Proof of Theorems on Tail-recursive Functions}},
+ journal={Archive of Formal Proofs},
+ month=dec,
+ year=2013,
+ note={\url{http://isa-afp.org/entries/Tail_Recursive_Functions.html}, Formal proof development},
+ ISSN={2150-914x}
+}
+
+@manual{
+ R5,
+ author={Tobias Nipkow and Lawrence Paulson and Markus Wenzel},
+ title={Isabelle/HOL --- A Proof Assistant for Higher-Order Logic},
+ month=jun,
+ year=2019,
+ note={\url{https://isabelle.in.tum.de/website-Isabelle2019/dist/Isabelle2019/doc/tutorial.pdf}}
+}
+
+@manual{
+ R6,
+ author={Tobias Nipkow},
+ title={Programming and Proving in Isabelle/HOL},
+ month=jun,
+ year=2019,
+ note={\url{https://isabelle.in.tum.de/website-Isabelle2019/dist/Isabelle2019/doc/prog-prove.pdf}}
+}
+
+@manual{
+ R7,
+ author={Tobias Nipkow},
+ title={A Tutorial Introduction to Structured Isar Proofs},
+ note={\url{https://isabelle.in.tum.de/website-Isabelle2011/dist/Isabelle2011/doc/isar-overview.pdf}}
+}
+
+@manual{
+ R8,
+ author={Alexander Krauss},
+ title={Defining Recursive Functions in Isabelle/HOL},
+ note={\url{https://isabelle.in.tum.de/website-Isabelle2019/dist/Isabelle2019/doc/functions.pdf}}
+}
diff --git a/thys/Generalized_Counting_Sort/document/root.tex b/thys/Generalized_Counting_Sort/document/root.tex
new file mode 100644
--- /dev/null
+++ b/thys/Generalized_Counting_Sort/document/root.tex
@@ -0,0 +1,77 @@
+\documentclass[11pt,a4paper,fleqn]{article}
+\usepackage{isabelle,isabellesym}
+\usepackage{amsfonts}
+\usepackage{amsmath}
+\usepackage{cancel}
+\renewcommand{\isastyletxt}{\isastyletext}
+
+% further packages required for unusual symbols (see also
+% isabellesym.sty), use only when needed
+
+%\usepackage{amssymb}
+ %for \, \, \, \, \, \,
+ %\, \, \, \, \,
+ %\, \, \
+
+%\usepackage{eurosym}
+ %for \
+
+%\usepackage[only,bigsqcap]{stmaryrd}
+ %for \
+
+%\usepackage{eufrak}
+ %for \ ... \, \ ... \ (also included in amssymb)
+
+%\usepackage{textcomp}
+ %for \, \, \, \, \,
+ %\
+
+% this should be the last package used
+\usepackage{pdfsetup}
+
+% urls in roman style, theory text in math-similar italics
+\urlstyle{rm}
+\isabellestyle{it}
+
+% for uniform font size
+%\renewcommand{\isastyle}{\isastyleminor}
+
+
+\begin{document}
+
+\title{An Efficient Generalization of Counting Sort\\for Large, possibly Infinite Key Ranges}
+\author{Pasquale Noce\\Software Engineer at HID Global, Italy\\pasquale dot noce dot lavoro at gmail dot com\\pasquale dot noce at hidglobal dot com}
+\maketitle
+
+\begin{abstract}
+Counting sort is a well-known algorithm that sorts objects of any kind mapped to
+integer keys, or else to keys in one-to-one correspondence with some subset of
+the integers (e.g. alphabet letters). However, it is suitable for direct use,
+viz. not just as a subroutine of another sorting algorithm (e.g. radix sort),
+only if the key range is not significantly larger than the number of the objects
+to be sorted.
+
+This paper describes a tail-recursive generalization of counting sort making use
+of a bounded number of counters, suitable for direct use in case of a large, or
+even infinite, key range of any kind, subject to the only constraint of being a
+subset of an arbitrary linear order. After performing a pen-and-paper analysis
+of how such an algorithm has to be designed to maximize its efficiency, this paper
+formalizes the resulting generalized counting sort (GCsort) algorithm and then
+formally proves its correctness properties, namely that (a) the counters' number
+is maximized while never exceeding the fixed upper bound, (b) objects are conserved,
+(c) objects get sorted, and (d) the algorithm is stable.
+\end{abstract}
+
+\tableofcontents
+
+% sane default for proof documents
+\parindent 0pt\parskip 0.5ex
+
+% generated text of all theories
+\input{session}
+
+% bibliography
+\bibliographystyle{abbrv}
+\bibliography{root}
+
+\end{document}
diff --git a/thys/ROOTS b/thys/ROOTS --- a/thys/ROOTS +++ b/thys/ROOTS @@ -1,504 +1,505 @@ AODV Auto2_HOL Auto2_Imperative_HOL AVL-Trees AWN Abortable_Linearizable_Modules Abs_Int_ITP2012 Abstract-Hoare-Logics Abstract-Rewriting Abstract_Completeness Abstract_Soundness Adaptive_State_Counting Affine_Arithmetic Aggregation_Algebras Akra_Bazzi Algebraic_Numbers Algebraic_VCs Allen_Calculus Amortized_Complexity AnselmGod Applicative_Lifting Architectural_Design_Patterns Aristotles_Assertoric_Syllogistic ArrowImpossibilityGS AutoFocus-Stream Automatic_Refinement AxiomaticCategoryTheory BDD BNF_Operations Bell_Numbers_Spivey Berlekamp_Zassenhaus Bernoulli Bertrands_Postulate BinarySearchTree Binding_Syntax_Theory Binomial-Heaps Binomial-Queues BNF_CC Bondy Boolean_Expression_Checkers Bounded_Deducibility_Security Buchi_Complementation Budan_Fourier Buffons_Needle Buildings BytecodeLogicJmlTypes C2KA_DistributedSystems CAVA_Automata CAVA_LTL_Modelchecker CCS CISC-Kernel CRDT CYK CakeML CakeML_Codegen Call_Arity Card_Equiv_Relations Card_Multisets Card_Number_Partitions Card_Partitions Cartan_FP Case_Labeling Catalan_Numbers Category Category2 Category3 Cauchy Cayley_Hamilton Certification_Monads Chord_Segments Circus Clean ClockSynchInst CofGroups Coinductive Coinductive_Languages Collections Comparison_Sort_Lower_Bound Compiling-Exceptions-Correctly Completeness Complete_Non_Orders Complx ComponentDependencies ConcurrentGC ConcurrentIMP Concurrent_Ref_Alg Concurrent_Revisions Consensus_Refined Constructive_Cryptography Constructor_Funs Containers CoreC++ Core_DOM Count_Complex_Roots CryptHOL CryptoBasedCompositionalProperties DFS_Framework DPT-SAT-Solver DataRefinementIBP Datatype_Order_Generator Decl_Sem_Fun_PL Decreasing-Diagrams Decreasing-Diagrams-II Deep_Learning Density_Compiler Dependent_SIFUM_Refinement Dependent_SIFUM_Type_Systems Depth-First-Search Derangements Deriving Descartes_Sign_Rule Dict_Construction Differential_Dynamic_Logic Differential_Game_Logic Dijkstra_Shortest_Path Diophantine_Eqns_Lin_Hom Dirichlet_L Dirichlet_Series Discrete_Summation DiscretePricing DiskPaxos DynamicArchitectures Dynamic_Tables E_Transcendental Echelon_Form EdmondsKarp_Maxflow Efficient-Mergesort Elliptic_Curves_Group_Law Encodability_Process_Calculi Epistemic_Logic Ergodic_Theory Error_Function Euler_MacLaurin Euler_Partition Example-Submission Factored_Transition_System_Bounding Farkas FFT FLP FOL-Fitting FOL_Harrison FOL_Seq_Calc1 Falling_Factorial_Sum 
FeatherweightJava Featherweight_OCL Fermat3_4 FileRefinement FinFun Finger-Trees Finite_Automata_HF First_Order_Terms First_Welfare_Theorem Fishburn_Impossibility Fisher_Yates Flow_Networks Floyd_Warshall Flyspeck-Tame FocusStreamsCaseStudies Formal_SSA Formula_Derivatives Fourier Free-Boolean-Algebra Free-Groups FunWithFunctions FunWithTilings Functional-Automata Functional_Ordered_Resolution_Prover GPU_Kernel_PL Gabow_SCC Game_Based_Crypto Gauss-Jordan-Elim-Fun Gauss_Jordan GenClock General-Triangle +Generalized_Counting_Sort Generic_Deriving Generic_Join GewirthPGCProof Girth_Chromatic GoedelGod GraphMarkingIBP Graph_Saturation Graph_Theory Green Groebner_Bases Groebner_Macaulay Gromov_Hyperbolicity Group-Ring-Module HOL-CSP HOLCF-Prelude HRB-Slicing Heard_Of HereditarilyFinite Hermite Hidden_Markov_Models Higher_Order_Terms Hoare_Time HotelKeyCards Huffman Hybrid_Multi_Lane_Spatial_Logic Hybrid_Systems_VCs HyperCTL IEEE_Floating_Point IMAP-CRDT IMO2019 IMP2 IMP2_Binary_Heap IP_Addresses Imperative_Insertion_Sort Impossible_Geometry Incompleteness Incredible_Proof_Machine Inductive_Confidentiality InfPathElimination InformationFlowSlicing InformationFlowSlicing_Inter Integration Interval_Arithmetic_Word32 Iptables_Semantics Irrationality_J_Hancl Isabelle_Meta_Model Jacobson_Basic_Algebra Jinja JinjaThreads JiveDataStoreModel Jordan_Hoelder Jordan_Normal_Form KAD KAT_and_DRA KBPs KD_Tree Key_Agreement_Strong_Adversaries Kleene_Algebra Knot_Theory Knuth_Morris_Pratt Koenigsberg_Friendship Kruskal Kuratowski_Closure_Complement LLL_Basis_Reduction LLL_Factorization LOFT LTL LTL_to_DRA LTL_to_GBA LTL_Master_Theorem Lam-ml-Normalization LambdaAuth LambdaMu Lambda_Free_KBOs Lambda_Free_RPOs Landau_Symbols Laplace_Transform Latin_Square LatticeProperties Lambda_Free_EPO Launchbury Lazy-Lists-II Lazy_Case Lehmer Lifting_Definition_Option LightweightJava LinearQuantifierElim Linear_Inequalities Linear_Programming Linear_Recurrences Liouville_Numbers List-Index List-Infinite List_Interleaving List_Inversions List_Update LocalLexing Localization_Ring Locally-Nameless-Sigma Lowe_Ontological_Argument Lower_Semicontinuous Lp MFMC_Countable MSO_Regex_Equivalence Markov_Models Marriage Mason_Stothers Matrix Matrix_Tensor Matroids Max-Card-Matching Median_Of_Medians_Selection Menger MFOTL_Monitor MiniML Minimal_SSA Minkowskis_Theorem Minsky_Machines Modal_Logics_for_NTS Modular_Assembly_Kit_Security Monad_Memo_DP Monad_Normalisation MonoBoolTranAlgebra MonoidalCategory Monomorphic_Monad MuchAdoAboutTwo Multirelations Multi_Party_Computation Myhill-Nerode Name_Carrying_Type_Inference Nat-Interval-Logic Native_Word Nested_Multisets_Ordinals Network_Security_Policy_Verification Neumann_Morgenstern_Utility No_FTL_observers Nominal2 Noninterference_CSP Noninterference_Concurrent_Composition Noninterference_Generic_Unwinding Noninterference_Inductive_Unwinding Noninterference_Ipurge_Unwinding Noninterference_Sequential_Composition NormByEval Nullstellensatz Octonions Open_Induction OpSets Optics Optimal_BST Orbit_Stabiliser Order_Lattice_Props Ordered_Resolution_Prover Ordinal Ordinals_and_Cardinals Ordinary_Differential_Equations PCF PLM Pell POPLmark-deBruijn PSemigroupsConvolution Pairing_Heap Paraconsistency Parity_Game Partial_Function_MR Partial_Order_Reduction Password_Authentication_Protocol Perfect-Number-Thm Perron_Frobenius Pi_Calculus Pi_Transcendental Planarity_Certificates Polynomial_Factorization Polynomial_Interpolation Polynomials Pop_Refinement Posix-Lexing Possibilistic_Noninterference 
Pratt_Certificate Presburger-Automata Prim_Dijkstra_Simple Prime_Distribution_Elementary Prime_Harmonic_Series Prime_Number_Theorem Priority_Queue_Braun Priority_Search_Trees Probabilistic_Noninterference Probabilistic_Prime_Tests Probabilistic_System_Zoo Probabilistic_Timed_Automata Probabilistic_While Projective_Geometry Program-Conflict-Analysis Promela Proof_Strategy_Language PropResPI Propositional_Proof_Systems Prpu_Maxflow PseudoHoops Psi_Calculi Ptolemys_Theorem QHLProver QR_Decomposition Quantales Quaternions Quick_Sort_Cost RIPEMD-160-SPARK ROBDD RSAPSS Ramsey-Infinite Random_BSTs Randomised_BSTs Random_Graph_Subgraph_Threshold Randomised_Social_Choice Rank_Nullity_Theorem Real_Impl Recursion-Theory-I Refine_Imperative_HOL Refine_Monadic RefinementReactive Regex_Equivalence Regular-Sets Regular_Algebras Relation_Algebra Rep_Fin_Groups Residuated_Lattices Resolution_FOL Rewriting_Z Ribbon_Proofs Robbins-Conjecture Root_Balanced_Tree Routing Roy_Floyd_Warshall Safe_OCL SATSolverVerification SDS_Impossibility SIFPL SIFUM_Type_Systems SPARCv8 Secondary_Sylow Security_Protocol_Refinement Selection_Heap_Sort SenSocialChoice Separata Separation_Algebra Separation_Logic_Imperative_HOL SequentInvertibility Shivers-CFA ShortestPath Show Sigma_Commit_Crypto Signature_Groebner Simpl Simple_Firewall Simplex Skew_Heap Slicing Smooth_Manifolds Sort_Encodings Source_Coding_Theorem Special_Function_Bounds Splay_Tree Sqrt_Babylonian Stable_Matching Statecharts Stellar_Quorums Stern_Brocot Stewart_Apollonius Stirling_Formula Stochastic_Matrices Stone_Algebras Stone_Kleene_Relation_Algebras Stone_Relation_Algebras Store_Buffer_Reduction Stream-Fusion Stream_Fusion_Code Strong_Security Sturm_Sequences Sturm_Tarski Stuttering_Equivalence Subresultants SumSquares SuperCalc Surprise_Paradox Symmetric_Polynomials Szpilrajn TESL_Language TLA Tail_Recursive_Functions Tarskis_Geometry Taylor_Models Timed_Automata Topology TortoiseHare Transcendence_Series_Hancl_Rucki Transformer_Semantics Transition_Systems_and_Automata Transitive-Closure Transitive-Closure-II Treaps Tree-Automata Tree_Decomposition Triangle Trie Twelvefold_Way Tycon Types_Tableaus_and_Goedels_God Universal_Turing_Machine UPF UPF_Firewall UpDown_Scheme UTP Valuation VectorSpace Verified-Prover VerifyThis2018 VerifyThis2019 Vickrey_Clarke_Groves VolpanoSmith WHATandWHERE_Security WebAssembly Weight_Balanced_Trees Well_Quasi_Orders Winding_Number_Eval Word_Lib WorkerWrapper XML Zeta_Function ZFC_in_HOL pGCL