added README

2019-02-25 14:42:00 +01:00
parent a7539c1cad
commit bbbfd9eb57
1 changed files with 46 additions and 0 deletions
--- a/exam/ex01/README.rst
+++ b/exam/ex01/README.rst
@@ -0,0 +1,46 @@
 Wikipedia Link Graph Analyzer
 *****************************
 .. contents::
 Configuration
 =============
 Configuration is done in the file ``cfg.py``. There one can
 specify whether the system should use an SQLite or a MySQL
 backend. Using the SQLite backend is faster for fetching the
 data because SQLite omits implicit keys. However, when one
 wants to analyze the data using SQL instead of the pure
 Python implementation, MySQL is faster.
 It is recommended to use SQLite for fetching the data, then
 transfer it to a MySQL database and use that database for
 the analysis.
 The main options in ``cfg.py`` are the choice between MySQL
 and SQLite and the settings for the chosen system.
 Invocation
 ==========
 Before invoking the program, one should make sure that the
 `configuration`_ is correct, in particular that the cache
 directory and cache name are set correctly for SQLite and
 that the MySQL connection information is correct.
 Then one must edit the name of the article at the center of
 the analysis and the depth up to which links are followed.
 After this is done, the link graph can be fetched (using
 ``python3 main.py``).
 It might be necessary to run this step several times if the
 program was unable to fetch all links. One can check for
 data that is still missing by executing ``SELECT COUNT(*) FROM
 failed_to_fetch``. The result should be 0.
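 The check above can also be scripted. The following is a minimal
 sketch, assuming the SQLite backend and Python's standard ``sqlite3``
 module. Only the table name ``failed_to_fetch`` comes from this README;
 the helper name ``count_failed_fetches``, the column ``title`` and the
 in-memory demo database are assumptions made for illustration.

```python
import sqlite3


def count_failed_fetches(connection):
    """Return the number of rows still recorded in failed_to_fetch."""
    cursor = connection.execute("SELECT COUNT(*) FROM failed_to_fetch")
    (count,) = cursor.fetchone()
    return count


# Demo against an in-memory database with a stand-in table.
# The column name "title" is hypothetical; the real schema may differ.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE failed_to_fetch (title TEXT)")
demo.execute("INSERT INTO failed_to_fetch VALUES ('Example')")

print(count_failed_fetches(demo))  # 1: fetching would have to be repeated
```

 For a real run one would open the cache file configured in ``cfg.py``
 with ``sqlite3.connect(path)`` instead of the in-memory database, and
 repeat ``python3 main.py`` until the count is 0.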
 Then the script uses Dijkstra's algorithm in breadth-first
 mode to analyze the graph. By default this is done
 in memory; it is, however, possible to do it with SQL. Using
 SQL is recommended only if the data does not fit into RAM,
 as it is much slower.
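 When every link has the same cost, Dijkstra's algorithm reduces to a
 plain breadth-first search. The following is a minimal in-memory sketch
 of that idea and not the project's actual implementation: the function
 name ``link_distances`` and the dictionary-based graph layout are
 assumptions made for illustration.

```python
from collections import deque


def link_distances(links, start):
    """Return the minimal number of links from start to each reachable page.

    links maps a page title to the titles it links to (hypothetical layout).
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            # The first visit is the shortest one, because pages are
            # processed in order of increasing distance.
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


# Tiny stand-in link graph; real data would come from the fetched cache.
links = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
print(link_distances(links, "A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

 The whole graph and the distance table are kept in dictionaries here,
 which is why this variant is only suitable while the data fits into RAM.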