added README
parent
a7539c1cad
commit
bbbfd9eb57
46
exam/ex01/README.rst
Normal file
@ -0,0 +1,46 @@
Wikipedia Link Graph Analyzer
*****************************

.. contents::

Configuration
=============

Configuration is done in the file ``cfg.py``. There one can
specify whether the system should use an SQLite or a MySQL
backend. The SQLite backend is faster for fetching the data
because sqlite omits implicit keys. However, when one wants
to analyze the data using SQL instead of the pure Python
implementation, MySQL is faster.

It is recommended to use SQLite for fetching the data, then
to transfer it to a MySQL database and use that database for
the analysis.

The main options in ``cfg.py`` are the choice between MySQL
and SQLite and the backend-specific options for those
systems.
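
As a rough illustration of the options described above, a
``cfg.py`` could look like the following sketch. Every option
name and value here is hypothetical and chosen only for this
example; the real ``cfg.py`` may use different names and
structure.

```python
# Hypothetical sketch of a cfg.py; all names and values below are
# assumptions for illustration, not the project's actual options.
config = {
    # True selects the SQLite backend, False the MySQL backend.
    "use_sqlite": True,

    # Options that only matter for the SQLite backend.
    "sqlite_cache_directory": "./cache/",
    "sqlite_cache_name": "example.sqlite",

    # Options that only matter for the MySQL backend.
    "mysql_host": "localhost",
    "mysql_user": "wikipedia",
    "mysql_password": "change-me",
    "mysql_database": "wikipedia_links",
}


def backend_name(cfg):
    """Return the name of the backend selected by the given settings."""
    return "sqlite" if cfg["use_sqlite"] else "mysql"


print(backend_name(config))
```

Keeping both groups of options in the file at once makes it
easy to fetch with SQLite first and switch to MySQL for the
analysis by flipping a single flag.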

Invocation
==========

Before invoking the program one should make sure that the
`configuration`_ is correct, in particular that the cache
directory and cache name are set correctly for SQLite and
that the MySQL connection information is correct.

Then one must set the name of the article around which to
analyze and the depth to which links are fetched. After this
is done, the link graph can be fetched (using
``python3 main.py``).

It might be necessary to run this step several times if the
program was unable to fetch all links. One can check for
unfetched data by executing
``SELECT COUNT(*) FROM failed_to_fetch``. The result should
be 0.
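
The check for unfetched data can also be scripted. The
following is a minimal sketch for the SQLite backend using
Python's standard ``sqlite3`` module. Only the table name
``failed_to_fetch`` comes from this README; the helper name
``count_failed`` and the single ``title`` column of the demo
table are assumptions made for the example.

```python
import sqlite3


def count_failed(connection):
    """Return the number of rows remaining in failed_to_fetch."""
    cursor = connection.execute("SELECT COUNT(*) FROM failed_to_fetch")
    (count,) = cursor.fetchone()
    return count


# Demonstration on an in-memory database. The table layout is an
# assumption; the real cache database may define other columns.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE failed_to_fetch (title TEXT)")
demo.execute("INSERT INTO failed_to_fetch VALUES ('Example article')")
print(count_failed(demo))   # one entry still needs to be fetched

demo.execute("DELETE FROM failed_to_fetch")
print(count_failed(demo))   # nothing left to fetch
```

For a real run one would pass a connection to the cache
database configured in ``cfg.py`` instead of the in-memory
database.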

Then the script uses Dijkstra's algorithm in breadth-first
mode to analyze the graph. By default this is done in
memory; it is, however, possible to do it with SQL. Using
SQL is recommended only if the data exceeds the available
RAM, as it is considerably slower.
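
On a graph whose edges all have the same weight, Dijkstra's
algorithm visits nodes in the same order as a breadth-first
search. The following sketch shows such an in-memory search
on a small hand-made link graph. It is not the project's
implementation; the function name ``shortest_path`` and the
dictionary representation of the graph are assumptions for
this example.

```python
from collections import deque


def shortest_path(links, start, target):
    """Breadth-first search for a shortest link path.

    links maps an article title to the titles it links to.
    Returns the path as a list of titles, or None if the
    target cannot be reached from start.
    """
    predecessor = {start: None}   # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the predecessor chain back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = predecessor[current]
            return path[::-1]
        for neighbour in links.get(current, ()):
            if neighbour not in predecessor:
                predecessor[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical example graph: A links to B and C, both link to D.
example_links = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_path(example_links, "A", "D"))
```

Because every article is queued at most once, the memory
needed grows with the number of articles reached, which is
why the SQL variant only pays off once the graph no longer
fits into RAM.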