
Wikipedia Link Graph Analyzer
*****************************

.. contents::

Configuration
=============

Configuration is done in the file ``cfg.py``. There one can
specify whether the system should use an SQLite or a MySQL
backend. The SQLite backend is faster for fetching the data
because SQLite omits implicit keys. However, when one wants
to analyze the data using SQL instead of the pure Python
implementation, MySQL is faster.

It is recommended to use SQLite for fetching the data, then
transfer it to a MySQL database and use that database for
the analysis.

The main options in ``cfg.py`` are the choice between MySQL
and SQLite and the settings specific to each backend.

Invocation
==========

Before invoking the program one should make sure that the
`configuration`_ is correct, in particular that the cache
directory and cache name are set correctly for SQLite and
that the MySQL connection information is correct.

Then one must set the name of the article to center the
analysis on and the depth up to which links are followed.
After this is done the link graph can be fetched (using
``python3 main.py``).

It might be necessary to run this step several times if the
program was unable to fetch all links. One can check for
data that has not been fetched by executing ``SELECT
COUNT(*) FROM failed_to_fetch``. The result should be 0.
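
For the SQLite backend the same check can be scripted. The
following is a minimal sketch using Python's standard
``sqlite3`` module. The function name ``count_failed`` is
made up for illustration, and the in-memory database with
its single ``title`` column only stands in for the real
cache file, whose location is set in ``cfg.py`` and whose
schema may differ.

.. code-block:: python

    import sqlite3


    def count_failed(connection):
        """Return the number of rows left in the failed_to_fetch table."""
        cursor = connection.execute("SELECT COUNT(*) FROM failed_to_fetch")
        return cursor.fetchone()[0]


    # Stand-in database for demonstration only; the real cache file is
    # configured in cfg.py and its schema is an assumption here.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE failed_to_fetch (title TEXT)")
    demo.execute("INSERT INTO failed_to_fetch VALUES ('Example article')")

    # A non-zero count means the fetch step should be run again.
    remaining = count_failed(demo)
    print(remaining)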

Then the script uses Dijkstra's algorithm in breadth-first
mode to analyze the graph. By default this is done
in memory; it is, however, possible to do it with SQL. Using
SQL is recommended only if the data does not fit into RAM,
as it is much slower.
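
On a graph whose edges all have the same weight, Dijkstra's
algorithm reduces to a breadth-first search. The following
is a minimal in-memory sketch of that idea, not the
project's actual implementation. The function name
``link_distances``, the dictionary-based graph layout and
the article names are assumptions made for illustration.

.. code-block:: python

    from collections import deque


    def link_distances(graph, start):
        """Return the minimal number of links from start to each reachable article."""
        distances = {start: 0}
        queue = deque([start])
        while queue:
            article = queue.popleft()
            for target in graph.get(article, ()):
                # The first visit is always along a shortest path,
                # because articles are processed in order of distance.
                if target not in distances:
                    distances[target] = distances[article] + 1
                    queue.append(target)
        return distances


    # Hypothetical link graph: article -> articles it links to.
    example_graph = {
        "A": ["B", "C"],
        "B": ["D"],
        "C": ["D"],
        "D": ["A"],
    }

    distances = link_distances(example_graph, "A")
    print(distances)

Each article is visited once, so the running time grows
linearly with the number of articles and links.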

.. added README 2019-02-25 13:42:00 +00:00