
% vim: ft=tex
\section{Implementation}

This chapter discusses how the concepts introduced before are implemented in
a simulator. Furthermore, the infrastructure around the simulation and some tools are
explained.

The implementation is written as a \lstinline{python3} module. This allows
users to quickly construct circuits, apply them to a state and measure
amplitudes.  Full access to the state (including intermediate states) has been
prioritized over execution speed. To keep the simulation speed as high as
possible under these constraints some parts are implemented in \lstinline{C}.

\subsection{Dense State Vector Simulation}

\subsubsection{Representation of Dense State Vectors}

Recalling \eqref{eq:ci} any $n$-qbit state can be represented as a $2^n$
component vector in the integer state basis. This representation has some
useful features when it comes to computations:

\begin{itemize}
    \item{The projection on the integer states is trivial.}
    \item{For any qbit $j$ and $0 \le i \le 2^n-1$ the coefficient $c_i$ is part of the $\ket{1}_j$ amplitude iff
        $i \& (1 << j) \neq 0$ and part of the $\ket{0}_j$ amplitude otherwise.}
    \item{For a qbit $j$ the coefficients $c_i$ and $c_{i \hat{} (1 << j)}$ are the conjugated coefficients, i.e.
        their indices differ only in bit $j$.}
\end{itemize}

Where $\hat{}$ is the binary XOR, $\&$ the binary AND and $<<$ the binary
leftshift operator.
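
The bit operations above can be illustrated with a short NumPy sketch. The
helper names \lstinline{probability_of_one} and \lstinline{partner_index} are
chosen for illustration only and are not part of the simulator:

\begin{lstlisting}[language=Python]
import numpy as np


def probability_of_one(state, j):
    """Sum |c_i|^2 over all indices i that have bit j set."""
    indices = np.arange(len(state))
    selected = (indices & (1 << j)) != 0
    return float(np.sum(np.abs(state[selected]) ** 2))


def partner_index(i, j):
    """Index that differs from i only in bit j."""
    return i ^ (1 << j)


# Two-qbit Bell state (|00> + |11>) / sqrt(2) in the integer basis.
bell = np.array([1, 0, 0, 1], dtype=np.cdouble) / np.sqrt(2)
p1 = probability_of_one(bell, 0)
partner = partner_index(0b10, 0)
\end{lstlisting}

For the Bell state the probability of finding qbit $0$ in $\ket{1}$ is one
half, and the partner of index \lstinline{0b10} with respect to qbit $0$ is
\lstinline{0b11}.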

While implementing the dense state vectors two key points were allowing
a simple and readable way to use them and simple access to the states by users
that want more information than an abstracted view could allow. To meet both
requirements the states are implemented as Python objects providing abstract
features such as normalization checking, checking for sufficient qbit number
when applying a circuit, computing overlaps with other states, a stringify
method and stored measurement results.  To store the measurement results
a NumPy \lstinline{int8} array \cite{numpy_array} is used; this is called the
classical state.  The Python states also have a NumPy \lstinline{cdouble} array
that stores the quantum mechanical state.  Using NumPy arrays has the advantage
that access to the data is simple and safe while operations on the states can
be implemented in \lstinline{C} \cite{numpy_ufunc} providing a considerable
speedup.

This quantum mechanical state is the component vector in integer basis;
therefore it has $2^n$ components.  Storing those components is feasible in
a range from roughly $1$ to $30$ qbits: every \lstinline{cdouble} component
occupies $16$ bytes, so the state requires $2^{n+4}$ bytes, which is
$1 \mbox{ GiB}$ at $n = 26$ and $16 \mbox{ GiB}$ at $n = 30$. This is in the
range of usual RAM sizes for personal computers. For higher qbit numbers moving
to high performance computers and other simulators is necessary.
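
The memory footprint follows directly from the number of components. The
following sketch assumes that only the \lstinline{cdouble} array contributes
noticeably; the function name \lstinline{state_vector_bytes} is hypothetical:

\begin{lstlisting}[language=Python]
import numpy as np


def state_vector_bytes(nqbits):
    """Bytes needed for 2^n complex double precision components."""
    return 2 ** nqbits * np.dtype(np.cdouble).itemsize


for n in (10, 20, 26, 30):
    gib = state_vector_bytes(n) / 2 ** 30
    print(n, "qbits:", gib, "GiB")
\end{lstlisting}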

\subsubsection{Gates}

Gates on dense state vectors are implemented as NumPy Universal Functions
(ufuncs) \cite{numpy_ufunc} mapping a classical and a quantum state to a new
classical state, a new quantum state and a $64 \mbox{ bit}$ integer indicating
which qbits have been measured. Using ufuncs has the great advantage that
managing memory is done by NumPy and an application programmer just has to
implement the logic of the function. Because ufuncs are written in
\lstinline{C} they provide a considerable speedup compared to an implementation
in Python.

The logic of gates is usually easy to implement using the integer basis. The
example below implements the Hadamard gate (see \ref{ref:singleqbitgates}):

\adjustbox{max width=\textwidth}{\lstinputlisting[language=C, firstline=153, lastline=178]{../pyqcs/src/pyqcs/gates/implementations/basic_gates.c}}

A basic set of gates is implemented in PyQCS:

\begin{itemize}
    \item{Hadamard $H$ gate.}
    \item{Pauli $X$ or \textit{NOT} gate.}
    \item{Pauli $Z$ gate.}
    \item{The $S$ phase gate.}
    \item{$Z$ rotation $R_\phi$ gate.}
    \item{Controlled $X$ gate: $CX$.}
    \item{Controlled $Z$ gate: $CZ$.}
    \item{The measurement ``gate'' $M$.}
\end{itemize}

To allow the implementation of possible hardware-related gates the class
\lstinline{GenericGate} takes a unitary $2\times2$ matrix as a NumPy
\lstinline{cdouble} array and builds a gate from it.
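
How a $2\times2$ matrix acts on a dense state vector in the integer basis can
be sketched in plain NumPy. This is an illustration under the assumption that
the matrix is applied to every pair of coefficients whose indices differ only
in bit $j$; the function \lstinline{apply_single_qbit_gate} is hypothetical and
is neither the \lstinline{C} implementation nor the interface of
\lstinline{GenericGate}:

\begin{lstlisting}[language=Python]
import numpy as np


def apply_single_qbit_gate(state, matrix, j):
    """Apply a 2x2 matrix to qbit j of a dense state vector."""
    mask = 1 << j
    new_state = np.zeros_like(state)
    for i in range(len(state)):
        if i & mask:
            continue  # handled together with its partner below
        k = i | mask  # partner index with bit j set
        new_state[i] = matrix[0, 0] * state[i] + matrix[0, 1] * state[k]
        new_state[k] = matrix[1, 0] * state[i] + matrix[1, 1] * state[k]
    return new_state


hadamard = np.array([[1, 1], [1, -1]], dtype=np.cdouble) / np.sqrt(2)
zero_state = np.zeros(4, dtype=np.cdouble)
zero_state[0] = 1
plus_state = apply_single_qbit_gate(zero_state, hadamard, 0)
\end{lstlisting}

Applying the Hadamard matrix to qbit $0$ of the two-qbit zero state yields
equal coefficients for the indices $0$ and $1$.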

\subsubsection{Circuits}
\label{ref:pyqcs_circuits}

As mentioned in \ref{ref:quantum_circuits} quantum circuits are central in
quantum programming.  In the implementation great care was taken to make
writing circuits as convenient and readable as possible. Users will almost
never access the actual gates that perform the operation on a state; instead
they will handle circuits.\\ Circuits can be applied to a state by multiplying
them from the left on a state object:

\begin{lstlisting}[language=Python]
new_state = circuit * state
\end{lstlisting}


The elementary gates such as $H, R_\phi, CX$ are implemented as single gate
circuits and can be constructed using the built-in generators. The generators
take the act-qbit as first argument and parameters such as the control qbit or an
angle as second argument:

%\adjustbox{max width=\textwidth}{
    \begin{lstlisting}[language=Python]
In [1]: from pyqcs import CX, CZ, H, R, Z, X 
   ...: from pyqcs import State 
   ...:  
   ...: state = State.new_zero_state(2) 
   ...: intermediate_state = H(0) * state 
   ...:  
   ...: bell_state = CX(1, 0) * intermediate_state 

In [2]: bell_state
Out[2]: (0.7071067811865476+0j)*|0b0> + (0.7071067811865476+0j)*|0b11>
\end{lstlisting}
%}

Large circuits can be constructed using the binary OR operator \lstinline{|} in
analogy to the pipeline operator on many *NIX systems. As with pipelines,
circuits are read from left to right:


%\adjustbox{max width=\textwidth}{
\begin{lstlisting}[language=Python]
In [1]: from pyqcs import CX, CZ, H, R, Z, X
   ...: from pyqcs import State
   ...:
   ...: state = State.new_zero_state(2)
   ...: 
   ...: # This is the same as
   ...: # circuit = H(0) | CX(1, 0)
   ...: circuit = H(0) | H(1) | CZ(1, 0) | H(1)
   ...:
   ...: bell_state = circuit * state

In [2]: bell_state
Out[2]: (0.7071067811865477+0j)*|0b0> + (0.7071067811865477+0j)*|0b11>
\end{lstlisting}
%}

A quick way to generate circuits programmatically is to use the \lstinline{list_to_circuit}
function:

%\adjustbox{max width=\textwidth}{
\begin{lstlisting}[language=Python]
In [1]: from pyqcs import CX, CZ, H, R, Z, X  
   ...: from pyqcs import State, list_to_circuit 
   ...:  
   ...: circuit_CX = list_to_circuit([CX(i, i-1) for i in range(1, 5)])            
   ...:  
   ...: state = (H(0) | circuit_CX) * State.new_zero_state(5)

In [2]: state
Out[2]: (0.7071067811865476+0j)*|0b0> + (0.7071067811865476+0j)*|0b11111>

\end{lstlisting}
%}

\subsection{Graphical State Simulation}

\subsubsection{Graphical States}

For the graphical state $(V, E, O)$ the list of vertices $V$ can be stored implicitly
by demanding $V = \{0, \ldots, n - 1\}$. This leaves two components that have to be stored:
the edges $E$ and the vertex operators $O$. Storing the vertex operators is done using
a \lstinline{uint8_t} array. Every local Clifford operator is associated with an integer from $0$ to $23$;
their order is

\begin{equation}
\begin{aligned}
 &\left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right), 
   \left(\begin{matrix}1 & 0\\0 & i\end{matrix}\right), 
   \left(\begin{matrix}1 & 0\\0 & 1\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\\frac{\sqrt{2} i}{2} & - \frac{\sqrt{2} i}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2}}{2} & - \frac{\sqrt{2} i}{2}\end{matrix}\right), 
   \left(\begin{matrix}1 & 0\\0 & -1\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2} i}{2} & \frac{\sqrt{2}}{2}\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}1 & 0\\0 & - i\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\\\frac{\sqrt{2} i}{2} & \frac{\sqrt{2} i}{2}\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2} i}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}\frac{1}{2} + \frac{i}{2} & \frac{1}{2} - \frac{i}{2}\\\frac{1}{2} - \frac{i}{2} & \frac{1}{2} + \frac{i}{2}\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\- \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\end{matrix}\right), 
   \left(\begin{matrix}0 & 1\\1 & 0\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\- \frac{\sqrt{2} i}{2} & \frac{\sqrt{2} i}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}0 & 1\\i & 0\end{matrix}\right), 
   \left(\begin{matrix}\frac{1}{2} - \frac{i}{2} & \frac{1}{2} + \frac{i}{2}\\- \frac{1}{2} + \frac{i}{2} & \frac{1}{2} + \frac{i}{2}\end{matrix}\right), 
   \left(\begin{matrix}0 & i\\1 & 0\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\\- \frac{\sqrt{2} i}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}\frac{1}{2} - \frac{i}{2} & - \frac{1}{2} + \frac{i}{2}\\- \frac{1}{2} + \frac{i}{2} & - \frac{1}{2} + \frac{i}{2}\end{matrix}\right), 
   \left(\begin{matrix}0 & -1\\1 & 0\end{matrix}\right), 
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\\- \frac{\sqrt{2} i}{2} & - \frac{\sqrt{2} i}{2}\end{matrix}\right), 
   \left(\begin{matrix}\frac{1}{2} - \frac{i}{2} & - \frac{1}{2} - \frac{i}{2}\\- \frac{1}{2} + \frac{i}{2} & - \frac{1}{2} - \frac{i}{2}\end{matrix}\right)
\end{aligned}
\end{equation}
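
That there are exactly $24$ such operators up to a global phase can be checked
numerically by closing the set $\{H, S\}$ under multiplication. The following
is a sketch; the function names are hypothetical, and the order in which the
operators are found does not correspond to the index order listed above:

\begin{lstlisting}[language=Python]
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)


def canonical_key(u, decimals=6):
    """Hashable key that identifies u up to a global phase."""
    flat = u.flatten()
    # Rotate the phase so the first non-zero entry is real and positive.
    pivot = flat[np.flatnonzero(np.abs(flat) > 1e-9)[0]]
    normalised = flat * (abs(pivot) / pivot)
    return tuple(np.round(normalised, decimals).tolist())


def generate_group(generators):
    """Close the generators under multiplication, modulo global phase."""
    identity = np.eye(2, dtype=complex)
    group = {canonical_key(identity): identity}
    frontier = [identity]
    while frontier:
        new_frontier = []
        for u in frontier:
            for g in generators:
                candidate = g @ u
                key = canonical_key(candidate)
                if key not in group:
                    group[key] = candidate
                    new_frontier.append(candidate)
        frontier = new_frontier
    return list(group.values())


local_cliffords = generate_group([H, S])
print(len(local_cliffords))
\end{lstlisting}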

The edges are stored in an adjacency matrix

\begin{equation}
    A = (a_{i,j})_{i,j = 0, \ldots, n-1}
\end{equation}

with

\begin{equation}
    a_{i,j} = \begin{cases} 1 & \mbox{if } \{i,j\} \in E\\
                            0 & \mbox{if } \{i,j\} \notin E \end{cases}
                            .
\end{equation}

Recalling some operations on the graph as described in
\ref{ref:dynamics_graph}, \ref{ref:meas_graph} or Lemma \ref{lemma:M_a} one
sees that it is important to efficiently access and modify the neighbourhood of
a vertex.  To ensure good performance when accessing the neighbourhood while
keeping the memory requirements low a linked list-array hybrid is used to store
the adjacency matrix. For every vertex the neighbourhood is stored in a sorted
linked list (which is a sparse representation of a column vector) and these
adjacency lists are stored in a length $n$ array.

Using this storage method all operations, including searching and toggling edges,
inherit their time complexity from the sorted linked list: they are linear in the
size of the affected neighbourhood.
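
The storage scheme can be sketched in Python as follows. This is a simplified
model, not the \lstinline{C} data structure itself; the class and method names
are hypothetical, and it is assumed that an edge $\{i,j\}$ is recorded in the
lists of both $i$ and $j$:

\begin{lstlisting}[language=Python]
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class AdjacencyLists:
    """Array of sorted linked lists, one list per vertex."""

    def __init__(self, nvertices):
        self.heads = [None] * nvertices

    def _toggle_directed(self, i, j):
        # Walk the sorted list of vertex i up to the position of j.
        previous, current = None, self.heads[i]
        while current is not None and current.value < j:
            previous, current = current, current.next
        if current is not None and current.value == j:
            replacement = current.next      # j present: remove it
        else:
            replacement = Node(j, current)  # j absent: insert it
        if previous is None:
            self.heads[i] = replacement
        else:
            previous.next = replacement

    def toggle_edge(self, i, j):
        self._toggle_directed(i, j)
        self._toggle_directed(j, i)

    def neighbourhood(self, i):
        result, current = [], self.heads[i]
        while current is not None:
            result.append(current.value)
            current = current.next
        return result


graph = AdjacencyLists(4)
for vertex in (2, 1, 3):
    graph.toggle_edge(0, vertex)
graph.toggle_edge(0, 2)  # toggling twice removes the edge again
\end{lstlisting}

Every toggle walks at most one neighbourhood per endpoint, which is where the
linear cost in the neighbourhood size comes from.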

\subsubsection{Operations on Graphical States}

Operations on Graphical States are divided into three classes: Local Clifford
operations, the CZ operation and measurements. The graphical states are
implemented in \lstinline{C} and are exported to python3 in the class
\lstinline{RawGraphState}. This class has three main methods to implement the
three classes of operations. 

\begin{description} 
    \item[\lstinline{RawGraphState.apply_C_L}]{This method
        implements local Clifford gates. It takes the qbit index and the index
        of the local Clifford operator (ranging from $0$ to $23$).}
    \item[\lstinline{RawGraphState.apply_CZ}]{Applies the $CZ$ gate to the
            state. The first argument is the act-qbit, the second the control
            qbit (note that this is just for consistency with the $CX$ gate).}
    \item[\lstinline{RawGraphState.measure}]{Using this method one can
        measure a qbit. It takes the qbit index as first argument and
        a floating point (double precision) random number as second
        argument. This random number is used to decide the measurement outcome
        in non-deterministic measurements. This method returns either $1$ or $0$ as
        a measurement result.}
\end{description}

Because this way of modifying the state is rather inconvenient and error-prone
the \lstinline{RawGraphState} is wrapped by the pure Python class
\lstinline{pyqcs.graph.state.GraphState}. It allows the use of circuits as described
in \ref{ref:pyqcs_circuits} and provides the method \lstinline{GraphState.to_naive_state}
to convert the graphical state to a dense vector state.

\subsubsection{Pure C Implementation}

Because Python tends to be rather slow and might not run on every architecture
a pure \lstinline{C} implementation of the graphical simulator is also provided.
It should be seen as a reference implementation that can be extended to the needs
of the user.

This implementation reads bytecode from a file and executes it. The execution is
always done in three steps:

\begin{enumerate}[1]
    \item{Initializing the state according to the header of the bytecode file.}
    \item{Applying operations given by the bytecode to the state. This includes local
            Clifford gates, $CZ$ gates and measurements (the measurement outcome is ignored).}
    \item{Sampling the state according to the description given in the header of the bytecode
        file and writing the sampling results to either a file or \lstinline{stdout}.}
\end{enumerate}

\subsection{Utilities}

To make using the simulators more convenient and to help with using them in
a scientific or educational context several utilities have been written. This section
explains some of them.

\subsubsection{Sampling and Circuit Generation}

The function \lstinline{pyqcs.sample} provides a simple way to sample from a state.
Copies of the state are made when necessary and the results are returned
in a \lstinline{collections.Counter} object. Several qbits can be sampled at once; they can be passed
to the function either as an integer or as a list of integers. An integer is interpreted as a bit mask and the
least significant bit is sampled first. A list of integers is interpreted as qbit indices, which are measured in the order
they appear.
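
The two ways of specifying the qbits can be illustrated with a small helper
that normalizes both forms to a list of indices. The function
\lstinline{qbits_to_sample} is hypothetical and only mirrors the convention
described above; it is not part of \lstinline{pyqcs}:

\begin{lstlisting}[language=Python]
def qbits_to_sample(qbits):
    """Return the qbit indices in the order they would be measured."""
    if isinstance(qbits, int):
        # Bit mask: least significant bit first.
        return [j for j in range(qbits.bit_length()) if qbits & (1 << j)]
    # List of indices: keep the given order.
    return list(qbits)


from_mask = qbits_to_sample(0b101)
from_list = qbits_to_sample([2, 0])
\end{lstlisting}

The mask \lstinline{0b101} selects the qbits $0$ and $2$ in this order, while
the list \lstinline{[2, 0]} measures qbit $2$ first.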

If the keyword argument \lstinline{keep_states} is \lstinline{True} the sampling
function will include the resulting states in the result. At the moment this works
for dense vectors only. Checking for equality on graphical states has yet to be implemented
but can be done in polynomial time \cite{dahlberg_ea2019}.

Writing circuits out by hand can be rather painful. The function \lstinline{pyqcs.list_to_circuit}
converts a list of circuits to a circuit. This is particularly helpful in combination
with Python's list comprehensions:

\begin{lstlisting}[language=Python]
circuit_H = list_to_circuit([H(i) for i in range(nqbits)])
\end{lstlisting}

The module \lstinline{pyqcs.util.random_circuits} implements the method described in \ref{ref:performance}
to generate random circuits for both graphical and dense vector simulation. Using the module
\lstinline{pyqcs.util.random_graphs} one can generate random graphical states, which is faster
than using random circuits.

\subsubsection{Exporting Circuits, Graphical States}


\subsection{Performance}
\label{ref:performance}

To test the performance and compare it to the dense vector simulator the Python
module is used. Although the pure \lstinline{C} implementation has potential
for better performance the Python module is more directly comparable to the dense
vector simulator which is a Python module as well.

For performance tests (and for tests against the dense vector simulator) random
circuits are used. Length $m$ circuits are generated from the probability space


\begin{equation}
    \Omega = \left(\{1, \ldots, 4n\} \times \{1, \ldots, n-1\} \times [0, 2\pi)\right)^{m}
\end{equation}

with the uniform distribution. The continuous part $[0, 2\pi)$ is unused when
generating random circuits for the graphical simulator; when generating random
circuits for dense vector simulations this is the argument $\phi$ of the
$R_\phi$ gate.

For $m=1$ an outcome is mapped to a gate using

\begin{equation}
\begin{aligned}
    F(i, k, x) = \left\{\begin{array}{cc} X(i - 1)              & \mbox{, if } i \le n \\
                                        H(i - n - 1)            & \mbox{, if } n < i \le 2n\\
                                        S(i - 2n - 1)           & \mbox{, if } 2n < i \le  3n\\
                                        CZ(i - 3n - 1, k - 1)   & \mbox{, if } i > 3n \mbox{ and } k \le i - 3n - 1 \\
                                        CZ(i - 3n - 1, k)       & \mbox{, if } i > 3n \mbox{ and } k > i - 3n - 1
                                    \end{array}\right.
                                    .
\end{aligned}
\end{equation}

This method provides equal probability for the $X, H, S$ and $CZ$ gates. For the
dense vector simulator $S$ can be replaced by $R_\phi$ with the parameter $x$.
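
The mapping $F$ and the sampling from $\Omega$ can be sketched in plain Python.
Gates are represented as tuples here to keep the example independent of the
simulator; the function names are hypothetical, $n \ge 2$ is assumed, and this
is not the code of \lstinline{pyqcs.util.random_circuits}:

\begin{lstlisting}[language=Python]
import math
import random


def outcome_to_gate(i, k, x, n, dense=False):
    """Map one outcome (i, k, x) to a gate description."""
    if i <= n:
        return ("X", i - 1)
    if i <= 2 * n:
        return ("H", i - n - 1)
    if i <= 3 * n:
        act = i - 2 * n - 1
        return ("R", act, x) if dense else ("S", act)
    act = i - 3 * n - 1
    # Skip the act-qbit so that act and control always differ.
    control = k - 1 if k <= act else k
    return ("CZ", act, control)


def random_circuit(n, m, dense=False, rng=random):
    """Draw m outcomes uniformly and map each one to a gate."""
    circuit = []
    for _ in range(m):
        i = rng.randint(1, 4 * n)
        k = rng.randint(1, n - 1)
        x = 2 * math.pi * rng.random()
        circuit.append(outcome_to_gate(i, k, x, n, dense))
    return circuit


example = random_circuit(4, 10)
\end{lstlisting}

The two $CZ$ branches together shift $k$ past the act-qbit, so the control qbit
is drawn uniformly from the remaining $n - 1$ qbits.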

Using this method circuits are generated and applied both to graphical and
dense vector states and the time required to execute the operations
\cite{timeit} is measured. The resulting graphs can be seen in Figures
\ref{fig:scaling_qbits_linear} and \ref{fig:scaling_qbits_log}. Note that in both cases the length of the circuits
has been scaled linearly with the number of qbits and the measured time was
divided by the number of qbits:

\begin{equation}
\begin{aligned}
    L_{\mbox{circuit}} &= \alpha n \\
    T_{\mbox{rescaled}} &= \frac{T_{\mbox{execution}}(L_{\mbox{circuit}})}{n}
\end{aligned}
\end{equation}
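
The rescaling can be sketched with the standard library module
\lstinline{timeit}. The callable \lstinline{run_circuit} is a placeholder for
whichever simulator is benchmarked, and taking the minimum over a few
repetitions is an assumption of this sketch, not necessarily the procedure used
for the figures:

\begin{lstlisting}[language=Python]
import timeit


def rescaled_time(run_circuit, n, alpha, repeats=3):
    """Return T_execution(alpha * n) / n for the given workload."""
    length = alpha * n
    timings = timeit.repeat(lambda: run_circuit(n, length),
                            number=1, repeat=repeats)
    return min(timings) / n


def dummy_workload(n, length):
    # Stand-in for "generate a circuit of this length and apply it".
    return sum(range(n * length))


t_rescaled = rescaled_time(dummy_workload, n=10, alpha=5)
\end{lstlisting}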

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/scaling_qbits_linear.png}
    \caption[Runtime Behaviour for Scaling Qbit Number]{Runtime Behaviour for Scaling Qbit Number}
    \label{fig:scaling_qbits_linear}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/scaling_qbits_log.png}
    \caption[Runtime Behaviour for Scaling Qbit Number (Logarithmic Scale)]{Runtime Behaviour for Scaling Qbit Number (Logarithmic Scale)}
    \label{fig:scaling_qbits_log}
\end{figure}

The reason for this scaling will become clear below; one can observe that the
performance of the graphical simulator increases in some cases with growing
number of qbits when the circuit length is constant.

As described by \cite{andersbriegel2005} the graphical simulator is exponentially
faster than the dense vector simulator. According to \cite{andersbriegel2005} it
is also considerably faster than a simulator that uses the straightforward approach
of simulating the stabilizer tableau, like CHP \cite{CHP}.

One should be aware that for the graphical simulator the gate execution time (the time required to apply a gate
to the state) depends strongly on the state it is applied to. For the dense vector
simulator and CHP this is not true: gate execution time is constant for all gates
and states. Because the graphical simulator has to toggle neighbourhoods the
gate execution time of the $CZ$ gate varies greatly. Figure \ref{fig:scaling_circuits_linear}
shows the circuit execution time for two different numbers of qbits. One can observe three
regimes:

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/regimes/scaling_circuits_linear.png}
    \caption[Circuit Execution Time for Scaling Circuit Length]{Circuit Execution Time for Scaling Circuit Length}
    \label{fig:scaling_circuits_linear}
\end{figure}

\begin{description}
    \item[Low-Linear Regime] {Here the circuit execution time scales approximately linearly
        with the number of gates in the circuit (i.e. the $CZ$ gate execution time is approximately constant).
        }
    \item[Intermediate Regime]{The circuit execution time has a nonlinear dependence on the circuit length.}
    \item[High-Linear Regime]{This regime shows a linear dependence on the circuit length; the slope is
        higher than in the low-linear regime.}
\end{description}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/regimes/graph_low_linear_regime.png}
    \caption[Typical Graphical State in the Low-Linear Regime]{Typical Graphical State in the Low-Linear Regime}
    \label{fig:graph_low_linear_regime}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/regimes/graph_high_linear_regime.png}
    \caption[Typical Graphical State in the High-Linear Regime]{Typical Graphical State in the High-Linear Regime}
    \label{fig:graph_high_linear_regime}
\end{figure}

The two linear regimes can be explained by considering the graphical states that typically live in these
regimes. With increasing circuit length the number of edges increases, which makes toggling neighbourhoods
more expensive. Two graphs from the low-linear and high-linear regime can be seen in Figures \ref{fig:graph_low_linear_regime}
and \ref{fig:graph_high_linear_regime}. The latter is hardly readable; this is due to the large number of
edges in this regime. Furthermore, these two regimes are only visible for $n>30$ qbits, so choosing smaller graphs is
not possible.

Because states with more qbits reach the intermediate regime at higher circuit lengths it is important to
account for this apparent performance boost when comparing with other simulation methods. This explains
why the circuit length in Figure \ref{fig:scaling_qbits_linear} had to be scaled with the qbit number.