diff --git a/exam/ex01/README.rst b/exam/ex01/README.rst
new file mode 100644
index 0000000..4968f9a
--- /dev/null
+++ b/exam/ex01/README.rst
@@ -0,0 +1,46 @@
+Wikipedia Link Graph Analyzer
+*****************************
+
+.. contents::
+
+Configuration
+=============
+
+Configuration is done in the file ``cfg.py``. There one can
+specify whether the system should use a sqlite or a mysql
+backend. The sqlite backend is faster for fetching the
+data because sqlite omits implicit keys. However, when one
+wants to analyze the data using SQL instead of the pure
+Python implementation, mysql is faster.
+
+It is therefore recommended to fetch the data with sqlite,
+then transfer it to a mysql database and use that database
+for the analysis.
+
+The main options in ``cfg.py`` select mysql or sqlite and
+configure the chosen backend.
+
+Invocation
+==========
+
+Before invoking the program one should make sure that the
+`configuration`_ is correct, in particular that the cache
+directory and cache name are set correctly for sqlite and
+that the mysql connection information is correct.
+
+Then one must edit the name of the article to analyze and
+the depth up to which links are retrieved. After this is
+done the link graph can be fetched (using ``python3
+main.py``).
+
+It might be necessary to run this part several times if the
+program was unable to fetch all links. One can check for
+unretrieved data by executing ``SELECT COUNT(*) FROM
+failed_to_fetch``; the result should be 0.
+
+The script then uses Dijkstra's algorithm in breadth-first
+mode to analyze the graph. By default this is done
+in-memory; it is however possible to do it with SQL. Using
+SQL is only recommended if the data exceeds the RAM, as it
+is far slower.
+
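As a note on the configuration section: the README only says that ``cfg.py`` selects the backend and holds the sqlite cache and mysql connection settings. A minimal sketch of such a file might look as follows; every option name here is an illustrative assumption, not the project's actual identifiers.

```python
# Hypothetical cfg.py sketch -- all names below are assumptions
# based on the README, not the project's real option names.

# Select the backend: "sqlite" (fast fetching) or "mysql"
# (faster SQL-based analysis).
backend = "sqlite"

# sqlite options: where the cache database lives.
sqlite_cache_directory = "./cache"
sqlite_cache_name = "linkgraph.db"

# mysql connection information.
mysql_host = "localhost"
mysql_user = "wiki"
mysql_password = "secret"
mysql_database = "linkgraph"
```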
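The completeness check described in the Invocation section can be run from Python with the standard ``sqlite3`` module. The snippet below demonstrates it against an in-memory database (the table schema is an assumption); for the real check one would connect to the sqlite cache file configured in ``cfg.py`` instead.

```python
import sqlite3

# Demonstration with an in-memory database; connect to the real
# sqlite cache file from cfg.py for the actual check.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE failed_to_fetch (url TEXT)")  # assumed schema

# The README's completeness check: the count should be 0 before
# moving on to the analysis step.
(remaining,) = db.execute(
    "SELECT COUNT(*) FROM failed_to_fetch"
).fetchone()
print("unfetched entries:", remaining)  # -> unfetched entries: 0
db.close()
```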
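On the analysis step: on an unweighted link graph, "Dijkstra's algorithm in breadth-first mode" amounts to a breadth-first shortest-path search, since every edge has weight 1. The following self-contained sketch illustrates the idea on a toy graph; it is not the project's in-memory implementation, just the underlying technique.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first shortest path on an unweighted link graph
    (equivalent to Dijkstra's algorithm with unit edge weights)."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # goal not reachable from start

# Toy link graph: article -> articles it links to.
links = {
    "Python": ["Guido van Rossum", "Programming language"],
    "Programming language": ["Algorithm"],
    "Algorithm": ["Dijkstra's algorithm"],
}
print(shortest_path(links, "Python", "Dijkstra's algorithm"))
```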