bachelor_thesis/thesis/chapters/implementation.tex

% vim: ft=tex
\chapter{Implementation}

This chapter discusses how the concepts introduced before are implemented into
a simulator. Further the infrastructure around the simulation and some tools are
explained.

The implementation is written as the \lstinline{python3} package \lstinline{pyqcs} \cite{pyqcs}. This allows
users to quickly construct circuits, apply them to states and measure
amplitudes.  Full access to the states (including intermediate states) has been
prioritized over execution speed. To keep the simulation speed as high as
possible under these constraints some parts are implemented in \lstinline{C}.

This document is based on \lstinline{pyqcs} \lstinline{2.1.0} that
can be downloaded under \\
\href{https://github.com/daknuett/PyQCS/releases/tag/v2.1.0}{https://github.com/daknuett/PyQCS/releases/tag/v2.1.0}.

\section{Dense State Vector Simulation}

\subsection{Representation of Dense State Vectors}

Recalling \eqref{eq:ci} any $n$-qbit state can be represented as a $2^n$
component vector in the integer state basis. This representation has some
useful features when it comes to computations:

\begin{itemize}
    \item{The projection on the integer states is trivial.}
    \item{For any qbit $j$ and $0 \le i \le 2^n-1$ the coefficient $c_i$ is part of the $\ket{1}_j$ amplitude iff
        $i \& (1 << j)$ and part of the $\ket{0}_j$ amplitude otherwise.}
    \item{For a qbit $j$ the coefficients $c_i$ and $c_{i \wedge (1 << j)}$ are the conjugated coefficients.}
\end{itemize}

Where $\wedge$ is the binary XOR, $\&$ the binary AND and $<<$ the binary
leftshift operator.

While implementing the dense state vectors two key points were a simple way to
use them and easy access to the underlaying data.  To meet both requirements
the states are implemented as Python objects that provide abstract features such
as normalization checking, checking for sufficient qbit number when applying
a circuit, computing overlaps with other states, a stringify method and stored
measurement results.  To store the measurement results a NumPy \lstinline{int8}
array \cite{numpy_array} is used; this is called the classical state.  The
Python states also have a NumPy \lstinline{cdouble} array that stores the
quantum mechanical state.  Using NumPy arrays has the advantage that access to
the data is simple and safe while operations on the states can be implemented
in \lstinline{C} \cite{numpy_ufunc} providing a considerable speedup
(see \ref{ref:benchmark_ufunc_py}).

This quantum mechanical state is the component vector in integer basis
therefore it has $2^n$ components.  Storing those components is acceptable in
a range from $1$ to $30$ qbits; above this range the state requires space in
the order of magnitude $1 \mbox{ GiB}$  which is in the range of usual RAM sizes for
personal computers. For higher qbit numbers moving to high performance
computers and other simulators is necessary.

\subsection{Gates}

Gates on dense state vectors are implemented as NumPy Universal Functions
(ufuncs) \cite{numpy_ufunc} mapping a classical and a quantum state to a new
classical state, a new quantum state and a $64 \mbox{ bit}$ integer indicating
what qbits have been measured. Using ufuncs has the great advantage that
managing memory is done by NumPy and an application programmer just has to
implement the logic of the function. Because ufuncs are written in
\lstinline{C} they provide a considerable speedup compared to an implementation
in Python (see \ref{ref:benchmark_ufunc_py}).

The logic of gates is usually easy to implement using the integer basis. The
example below implements the Hadamard gate \ref{ref:singleqbitgates}:

\lstinputlisting[title={Implementation of the Hadamard Gate in C}, language=C, firstline=153, lastline=178, breaklines=true]{../pyqcs/src/pyqcs/gates/implementations/basic_gates.c}

A basic set of gates is implemented in PyQCS:

\begin{itemize}
    \item{Hadamard $H$ gate.}
    \item{Pauli $X$ or \textit{NOT} gate.}
    \item{Pauli $Z$ gate.}
    \item{Phase gate $S$.}
    \item{$Z$ rotation $R_\phi$ gate.}
    \item{Controlled $X$ gate: $CX$.}
    \item{Controlled $Z$ gate: $CZ$.}
    \item{The "measurement gate" $M$.}
\end{itemize}

To allow the implementation of possible hardware related gates the class
\lstinline{GenericGate} takes a unitary $2\times2$ matrix as a NumPy
\lstinline{cdouble} array and builds a gate from it.

\subsection{Circuits}
\label{ref:pyqcs_circuits}

As mentioned in \ref{ref:quantum_circuits} quantum circuits are central in
quantum programming.  In the implementation great care was taken to make
writing circuits as convenient and readable as possible. Users will almost
never access the actual gates that perform the operation on a state; instead
they will handle circuits.\\ Circuits can be applied to a state by multiplying
them from the left on a state object:

\begin{lstlisting}[language=Python]
new_state = circuit * state
\end{lstlisting}


The elementary gates are implemented as single gate
circuits and can be constructed using the built-in generators. These generators
take the act-qbit as first argument, parameters such as the control qbit or an
angle as second argument:

\begin{lstlisting}[language=Python, breaklines=true, caption={Using Single Gate Circuits}]
In [1]: from pyqcs import CX, CZ, H, R, Z, X
   ...: from pyqcs import State
   ...:
   ...: state = State.new_zero_state(2)
   ...: intermediate_state = H(0) * state
   ...:
   ...: bell_state = CX(1, 0) * intermediate_state

In [2]: bell_state
Out[2]: (0.7071067811865476+0j)*|0b0>
        + (0.7071067811865476+0j)*|0b11>
\end{lstlisting}

Large circuits can be built using the binary OR operator \lstinline{|} in
an analogy to the pipeline operator on many *NIX shells. As usual circuits are
read from left to right similar to pipelines on *NIX shells:


%\adjustbox{max width=\textwidth}{
\begin{lstlisting}[language=Python, breaklines=true, caption={Constructing Circuits Using \lstinline{|}}]
In [1]: from pyqcs import CX, CZ, H, R, Z, X
   ...: from pyqcs import State
   ...:
   ...: state = State.new_zero_state(2)
   ...:
   ...: # This is the same as
   ...: # circuit = H(0) | CX(1, 0)
   ...: circuit = H(0) | H(1) | CZ(1, 0) | H(1)
   ...:
   ...: bell_state = circuit * state

In [2]: bell_state
Out[2]: (0.7071067811865477+0j)*|0b0>
        + (0.7071067811865477+0j)*|0b11>
\end{lstlisting}
%}

A quick way to generate circuits programmatically is to use the \lstinline{list_to_circuit}
function:

%\adjustbox{max width=\textwidth}{
\begin{lstlisting}[language=Python, breaklines=true, caption={Constructing Circuits Using Python Lists}]
In [1]: from pyqcs import CX, CZ, H, R, Z, X
   ...: from pyqcs import State, list_to_circuit
   ...:
   ...: circuit_CX = list_to_circuit([CX(i, i-1) for i in range(1, 5)])
   ...:
   ...: state = (H(0) | circuit_CX) * State.new_zero_state(5)

In [2]: state
Out[2]: (0.7071067811865476+0j)*|0b0>
        + (0.7071067811865476+0j)*|0b11111>

\end{lstlisting}
%}

\section{Graphical State Simulation}

\subsection{Graphical States}
\label{ref:impl_g_states}

For the graphical state $(V, E, O)$ the list of vertices $V$ can be stored implicitly
by demanding $V = \{0, ..., n - 1\}$. This leaves two components that have to be stored:
The edges $E$ and the vertex operators $O$. Storing the vertex operators is
done using a \lstinline{uint8_t} array. Every local Clifford operator is
associated with an integer ranging from $0$ to $23$
\footnote{
    \cite{andersbriegel2005} also uses integers from $0$ to $23$ to represent
    the Clifford operators. The authors do not describe the mapping from integer
    to operator and the order might be different.
}
. Their order is

\begin{equation}
\begin{aligned}
 &\left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right),
   \left(\begin{matrix}1 & 0\\0 & i\end{matrix}\right),
   \left(\begin{matrix}1 & 0\\0 & 1\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\\frac{\sqrt{2} i}{2} & - \frac{\sqrt{2} i}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2}}{2} & - \frac{\sqrt{2} i}{2}\end{matrix}\right),
   \left(\begin{matrix}1 & 0\\0 & -1\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2} i}{2} & \frac{\sqrt{2}}{2}\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}1 & 0\\0 & - i\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\\\frac{\sqrt{2} i}{2} & \frac{\sqrt{2} i}{2}\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2} i}{2}\\\frac{\sqrt{2} i}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}\frac{1}{2} + \frac{i}{2} & \frac{1}{2} - \frac{i}{2}\\\frac{1}{2} - \frac{i}{2} & \frac{1}{2} + \frac{i}{2}\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\- \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\end{matrix}\right),
   \left(\begin{matrix}0 & 1\\1 & 0\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\- \frac{\sqrt{2} i}{2} & \frac{\sqrt{2} i}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}0 & 1\\i & 0\end{matrix}\right),
   \left(\begin{matrix}\frac{1}{2} - \frac{i}{2} & \frac{1}{2} + \frac{i}{2}\\- \frac{1}{2} + \frac{i}{2} & \frac{1}{2} + \frac{i}{2}\end{matrix}\right),
   \left(\begin{matrix}0 & i\\1 & 0\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2} i}{2}\\- \frac{\sqrt{2} i}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right), \\
   &\left(\begin{matrix}\frac{1}{2} - \frac{i}{2} & - \frac{1}{2} + \frac{i}{2}\\- \frac{1}{2} + \frac{i}{2} & - \frac{1}{2} + \frac{i}{2}\end{matrix}\right),
   \left(\begin{matrix}0 & -1\\1 & 0\end{matrix}\right),
   \left(\begin{matrix}\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\\- \frac{\sqrt{2} i}{2} & - \frac{\sqrt{2} i}{2}\end{matrix}\right),
   \left(\begin{matrix}\frac{1}{2} - \frac{i}{2} & \frac{i \left(-1 + i\right)}{2}\\- \frac{1}{2} + \frac{i}{2} & \frac{i \left(-1 + i\right)}{2}\end{matrix}\right).
\end{aligned}
\end{equation}

The edges are stored in an adjacency matrix

\begin{equation}
    A = (a_{i,j})_{i,j = 0, ..., n-1}
\end{equation}

\begin{equation}
\begin{aligned}
    a_{i,j} = \left\{ \begin{array}{c} 1 \mbox{, if } \{i,j\} \in E\\
                            0 \mbox{, if } \{i,j\} \notin E \end{array}\right.
                            .
\end{aligned}
\end{equation}

Recalling some operations on the graph as described in
\ref{ref:dynamics_graph}, \ref{ref:meas_graph} or Lemma \ref{lemma:M_a} one
sees that it is important to efficiently access and modify the neighbourhood of
a vertex.  Also one must take care to keep the memory required to store a state
low.  To meet both requirements a linked list-array hybrid is used.  For every
vertex the neighbourhood is stored in a sorted linked list (which is a sparse
representation of a column vector) and these adjacency lists are stored in
a length $n$ array. \cite{andersbriegel2005} uses the same representation for
the graph.

Using this storage method all operations - including searching and toggling edges -
inherit their time complexity from the sorted linked list.

\subsection{Operations on Graphical States}

Operations on Graphical States are divided into three classes: Local Clifford
operations, the CZ operation and measurements. The graphical states are
implemented in \lstinline{C} and are exported to \lstinline{python3} in the class
\lstinline{RawGraphState}. This class has three main methods to implement the
three classes of operations.

\begin{description}
    \item[\hspace{-1em}]{\lstinline{RawGraphState.apply_C_L}\\
        This method implements local Clifford gates. It takes the qbit index
        and the index of the local Clifford operator (ranging from $0$ to $23$).}
    \item[\hspace{-1em}]{\lstinline{RawGraphState.apply_CZ}\\
            Applies the $CZ$ gate to the state. The first argument is the
            act-qbit, the second the control qbit (note that this is just for
            consistency to the $CX$ gate).}
    \item[\hspace{-1em}]{\lstinline{RawGraphState.measure}\\
            Using this method one can measure a qbit. It takes the qbit index
            as first argument and a floating point (double precision) random
            number as second argument. This random number is used to decide the
            measurement outcome in non-deterministic measurements. This method
            returns either $1$ or $0$ as a measurement result.}
\end{description}

Because this way of modifying the state is inconvenient and might lead to many
errors the \lstinline{RawGraphState} is wrapped by the pure python class\\
\lstinline{pyqcs.graph.state.GraphState}. It allows the use of circuits as
described in \ref{ref:pyqcs_circuits} and provides the method
\lstinline{GraphState.to_naive_state} to convert the graphical state to a dense
vector state.

\subsection{Pure C Implementation}

Because python tends to be slow \cite{benchmarkgame} and might not run on any architecture
a pure \lstinline{C} implementation of the graphical simulator is also provided.
It should be seen as a reference implementation that can be extended to the needs
of the user.

This implementation reads byte code from a file and executes it. The execution is
always done in three steps:

\begin{enumerate}[1.]
    \item{Initializing the state according the header of the byte code file.}
    \item{Applying operations given by the byte code to the state. This includes local
            Clifford gates, $CZ$ gates and measurements (the measurement outcome is ignored).}
    \item{Sampling the state according the description in the header of the byte code
        file and writing the sampling results to either a file or \lstinline{stdout}. }
\end{enumerate}

\section{Utilities}

The package \lstinline{pyqcs} ships with several utilities that are supposed to
make using the simulators more convenient. Some utilities are designed to help
in a scientific and educational context.  This chapter explains some of them.

\subsection{Sampling and Circuit Generation}

The function \lstinline{pyqcs.sample} provides a simple way to sample from
a state.  Copies of the state are made when necessary, and the results are
returned in a \lstinline{collections.Counter} object. Several qbits can be
sampled at once; they can be passed to the function either as an integer which
will be interpreted as a bit mask, and the least significant bit will be sampled
first. When passing the qbits to sample as a list of integers the integers are
interpreted as qbit indices and are measured in the order they appear.

If the keyword argument \lstinline{keep_states} is \lstinline{True} the
sampling function will include the collapsed states in the result. At the
moment this works for dense vectors only. Checking for equality on graphical
states is not implemented due to the computational hardness.
\cite{bouchet1991} introduced an algorithm to test whether two VOP-free graph states
are equivalent with complexity $\mathcal{O}\left(n^4\right)$.
So checking for equivalency is $c-NP$ complete because the vertex operators have
to be checked as well. It might be necessary to compute all equivalent graphical
states. \cite{dahlberg_ea2019} showed that counting equivalent VOP-free
graphical states is $\#P$ complete. Showing that two graphical states are
equivalent might considerably hard.

Writing circuits out by hand can be inconvenient. The function\\
\lstinline{pyqcs.list_to_circuit} converts a list of circuits to a circuit.
This is particularly helpful in combination with python's
\lstinline{listcomp}:

\begin{lstlisting}[caption={Generating a Large Circuit Efficiently}]
circuit_H = list_to_circuit([H(i) for i in range(nqbits)])
\end{lstlisting}

The module \lstinline{pyqcs.util.random_circuits} provides the method described
in \ref{ref:performance} to generate random circuits for both graphical and
dense vector simulation. Using the module \lstinline{pyqcs.util.random_graphs}
one can generate random graphical states which is faster than using random
circuits.

The function \lstinline{pyqcs.util.to_circuit.graph_state_to_circuit} converts
graphical states to circuits (mapping the $\ket{0b0..0}$ to this state).
By utilization of these circuits the graphical state can be copied or converted to a
dense vector state. Further it is a way to optimize circuits and later run them on
other simulators. Also the circuits can be exported to \lstinline{qcircuit} code
(see below) which is a way to represent graphical states.

\subsection{Exporting and Flattening Circuits}

Circuits can be drawn with the \LaTeX  package \lstinline{qcircuit}; all
circuits in this documents use \lstinline{qcircuit}. To visualize
\lstinline{pyqcs} circuits the function\\
\lstinline{pyqcs.util.to_diagram.circuit_to_diagram} can be used to generate
\lstinline{qcircuit} code. The diagrams produced by this function are not
optimized, and the diagrams can be unnecessary long.  Usually this can be fixed
easily by editing the resulting code manually.

The circuits constructed using the \lstinline{|} operator have a tree structure
which is rather inconvenient when optimizing circuits or exporting them.
The function \\
\lstinline{pyqcs.util.flatten.flatten} converts a circuit
to a list of single gate circuits that can be simply analyzed or exported.


\section{Performance}
\label{ref:performance}

Testing the performance and comparing the graphical with the dense vector
simulator is done with the python module. Although the pure \lstinline{C}
implementation has potential for better performance the python module is better
comparable to the dense vector simulator which is a python module as well.

For performance tests (and for tests against the dense vector simulator) random
circuits are used. Length $m$ circuits are generated from the probability space


\begin{equation}
    \Omega = \left(\{1, ..., 4n\} \otimes \{1, ..., n-1\} \otimes [0, 2\pi)\right)^{\otimes m}
\end{equation}
with the uniform distribution. The continuous part $[0, 2\pi)$ is ignored when
generating random circuits for the graphical simulator; in order to generate
random circuits for dense vector simulations this is used as the argument $\phi$ of the
$R_\phi$ gate.  For $m=1$ an outcome is mapped to a gate with

\begin{equation}
\begin{aligned}
    F(i, k, x) = \left\{\begin{array}{cc} X(i - 1)              & \mbox{, if } i \le n \\
                                        H(i - n - 1)            & \mbox{, if } i \le 2n\\
                                        S(i - 2n - 1)           & \mbox{, if } i \le  3n\\
                                        CZ(i - 3n - 1, k - 1)   & \mbox{, if } k \le i - 3n - 1 \\
                                        CZ(i - 3n - 1, k)       & \mbox{, if } k > i - 3n - 1\\
                                    \end{array}\right.
                                    .
\end{aligned}
\end{equation}

This method provides equal probability for $X, H, S$ and $CZ$ gate. For the
dense vector simulator $S$ can be replaced by $R_\phi$ with the parameter $x$.

Using this method circuits are generated and applied both to graphical and
dense vector states and the time required to execute the operations
\cite{timeit} is measured\footnote{
    The used computer had an \lstinline{Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz} processor,
    \lstinline{7.7 GiB} RAM, and \lstinline{27 GiB} SSD swap space (which was unused).
    The operating system was \lstinline{Linux 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux}.

}. The resulting graph can be seen in
Figure \ref{fig:scaling_qbits_linear} and Figure \ref{fig:scaling_qbits_log}.
Note that in both cases the length of the circuits have been scaled linearly
with the amount of qbits and the measured time was divided by the number of
qbits:

\begin{equation}
\begin{aligned}
    L_{\mbox{circuit}} &= \alpha n \\
    T_{\mbox{rescaled}} &= \frac{T_{\mbox{execution}}(L_{\mbox{circuit}})}{n}\\
\end{aligned}
\end{equation}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/scaling_qbits_linear.png}
    \caption[Runtime Behaviour for Scaling Qbit Number]{Runtime Behaviour for Scaling Qbit Number}
    \label{fig:scaling_qbits_linear}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/scaling_qbits_log.png}
    \caption[Runtime Behaviour for Scaling Qbit Number (Logarithmic Scale)]{Runtime Behaviour for Scaling Qbit Number (Logarithmic Scale)}
    \label{fig:scaling_qbits_log}
\end{figure}

The reason for this scaling will be clear later. From simulation data one can
observe that the performance of the graphical simulator increases in some cases
with growing number of qbits when the circuit length is constant. The code used
to generate the data for these plots can be found in \ref{ref:code_benchmarks}.

As described by \cite{andersbriegel2005} the graphical simulator is exponentially
faster than the dense vector simulator. According to \cite{andersbriegel2005} it
is considerably faster than a simulator using the straight forward approach simulating
the stabilizer tableaux like CHP \cite{CHP} with an average runtime behaviour
of $\mathcal{O}\left(n\log(n)\right)$ instead of $\mathcal{O}\left(n^2\right)$.

One should be aware that the gate execution time (the time required to apply
a gate to the state) highly depends on the  state it is applied to \cite{andersbriegel2005}. For the
dense vector simulator and CHP this is not true: Gate execution time is
constant for all gates and states. Because the graphical simulator has to
toggle neighbourhoods the gate execution time of the $CZ$ and $M$ gates vary
significantly \cite{andersbriegel2005}. The plot Figure \ref{fig:scaling_circuits_linear}
shows the circuit execution time for two different numbers of qbits. One can observe three
regimes:

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/regimes/scaling_circuits_linear.png}
    \caption[Circuit Execution Time for Scaling Circuit Length]{Circuit Execution Time for Scaling Circuit Length}
    \label{fig:scaling_circuits_linear}
\end{figure}

\begin{description}
    \item[Low-Linear Regime] {\textit{(Ca. $0-10N_q$ gates)} Here the circuit execution time scales approximately linear
        with the number of gates in the circuit (i.e. the $CZ$ gate execution time is approximately constant).
        }
    \item[Intermediate Regime]{\textit{(Ca. $10N_q-20N_q$ gates)} The circuit execution time has a nonlinear dependence on the circuit length.}
    \item[High-Linear Regime]{\textit{(Above ca. $20N_q$ gates)} This regime shows a linear dependence on the circuit length; the slope is
        higher than in the low-linear regime.}
\end{description}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../performance/regimes/scaling_circuits_measurements_linear.png}
    \caption[Circuit Execution Time for Scaling Circuit Length with Random Measurements]{Circuit Execution Time for Scaling Circuit Length with Random Measurements}
    \label{fig:scaling_circuits_measurements_linear}
\end{figure}


These three regimes can be explained when considering the graphical states that
typical live in these regimes. With increased circuit length the amount of
edges increases which makes toggling neighbourhoods harder \cite{andersbriegel2005}. Graphs from the
low-linear, intermediate and high-linear regime can be seen in Figure
\ref{fig:graph_low_linear_regime}, Figure \ref{fig:graph_intermediate_regime}
and Figure \ref{fig:graph_high_linear_regime}. Due to the great amount of edges
in the intermediate and high-linear regime the pictures show a window of the
actual graph.  The full images are in \ref{ref:complete_graphs}.  Further the
regimes are not clearly visible for $n>30$ qbits therefore choosing smaller graphs is
not possible. The code that was used to generate these images can be found
in \ref{ref:code_example_graphs}.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{graphics/graph_low_linear_regime.png}
    \caption[Typical Graphical State in the Low-Linear Regime]{Typical Graphical State in the Low-Linear Regime}
    \label{fig:graph_low_linear_regime}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{graphics/graph_intermediate_regime_cut.png}
    \caption[Window of a Typical Graphical State in the Intermediate Regime]{Window of a Typical Graphical State in the Intermediate Regime}
    \label{fig:graph_intermediate_regime}
\end{figure}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{graphics/graph_high_linear_regime_cut.png}
    \caption[Window of a Typical Graphical State in the High-Linear Regime]{Window of a Typical Graphical State in the High-Linear Regime}
    \label{fig:graph_high_linear_regime}
\end{figure}

The Figure \ref{fig:scaling_circuits_measurements_linear} brings more substance
to this interpretation. In this simulation the Pauli $X$ gate has been replaced
by the measurement gate $M$, .i.e. in every gate drawn from the probability
space a qbit is measured with probability $\frac{1}{4}$. Pauli measurements
decrease the entanglement (and the amount of edges) in a state
\cite{hein_eisert_briegel2008}\cite{li_chen_fisher2019}.  The frequent
measurements in the simulation therefore keeps the amount of edges low thus
preventing a transition from the low linear regime to the intermediate regime.


Because states with more qbits reach the intermediate regime at higher circuit
lengths it is important to compensate this virtual performance boost when
comparing with other simulation methods. This explains why the circuit length
in Figure \ref{fig:scaling_qbits_linear} had to be scaled with the qbit number.

\section{Future Work}

Although the simulator(s) are in a working state and have been tested there is
still some work that can be done.  A noise model helping to teach and analyze
noisy execution would be an interesting improvement to implement. In order to
allow a user to execute circuits on other machines, including both real
hardware and simulators, a module that exports circuits to OpenQASM
\cite{openqasm} seems useful.

The current implementation of some graphical operations can be optimized. While
clearing VOPs as described in \ref{ref:dynamics_graph} the neighbourhood of
a vertex is toggled for every $L_a$ transformation. This is the most straight
forward implementation, but often the $L_a$ transformation is performed several
times on the same vertex. The neighbourhood would have to be toggled either
once or not at all depending on whether the number of $L_a$ transformations is
odd or even.

When toggling an edge the simulator uses a series of well tested basic linked
list operations: Searching an element in the list, inserting an element into
the list and deleting an element from the list. This is known to have no bugs
but the performance could be increased by operating directly on the linked
list. Some initial work to improve this behaviour is already done but does not
work at the moment.