What Is It About (WIIA)
Wikipedia Exploration Script - Personal Project

WIIA is a relatively simple recursive Python script that explores Wikipedia. You enter a word and a depth level n; the program finds the Wikipedia page for that word, then explores the pages linked from it, then the pages linked from those, and so on, n times. At the end, the program returns a list of words, each with a matching rate based on how often that word recurred during the exploration. This simulates a kind of associative thinking in which experience is replaced by the link structure of Wikipedia. For the word "Coca Cola," for example, there will be a strong match with "soda," a moderate match with "United States," a weaker match with "Santa Claus," and so on.
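To make the idea concrete, here is a minimal sketch of the recursive exploration, not the actual script. The names explore, matching_rates, get_links, and TOY_LINKS are hypothetical, and the matching rate is assumed here to be a word's share of all visits. The link lookup is passed in as a function so the example runs offline on a toy graph; with the wikipedia package, something like wikipedia.page(title).links could presumably play that role.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


def explore(word: str, depth: int,
            get_links: Callable[[str], Iterable[str]],
            counts: Optional[Counter] = None) -> Counter:
    """Count how many times each linked title is reached from `word`."""
    if counts is None:
        counts = Counter()
    if depth == 0:
        return counts
    for linked in get_links(word):
        counts[linked] += 1                      # one more visit to this title
        explore(linked, depth - 1, get_links, counts)
    return counts


def matching_rates(counts: Counter) -> Dict[str, float]:
    """Assumed definition: a word's share of all visits, most frequent first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.most_common()}


# Toy link graph standing in for Wikipedia (made-up data, for illustration only).
TOY_LINKS = {
    "Coca-Cola": ["Soda", "United States", "Santa Claus"],
    "Soda": ["Drink"],
    "United States": ["Soda"],
    "Santa Claus": ["Soda", "United States"],
}


def toy_get_links(title: str) -> Iterable[str]:
    return TOY_LINKS.get(title, [])


if __name__ == "__main__":
    rates = matching_rates(explore("Coca-Cola", 2, toy_get_links))
    for word, rate in rates.items():
        print(f"{word}: {rate:.2f}")
```

On this toy graph, "Soda" comes out ahead of "United States," which in turn comes out ahead of "Santa Claus." Because every link is followed at every level, the number of pages visited grows very quickly with n.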
After a few tests, I noticed that some words were abnormally frequent. One was "ISBN," a book identification system that appears systematically in the reference sections of Wikipedia pages. So I developed a second script, based on the first, that detects this kind of statistical aberration and blacklists the offending words. One amusing result: on a run where I had set the elimination factor too high, the program blacklisted "United States."
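One possible way to detect such words is sketched below. This is an assumption, not the method the second script actually uses: a word is blacklisted when it turns up in the results of most explorations started from unrelated words. The names find_blacklist, remove_blacklisted, min_share, and SAMPLE_RESULTS are hypothetical, and min_share is only a stand-in for the elimination factor (here, a lower value eliminates more aggressively).

```python
from collections import Counter
from typing import Dict, Set


def find_blacklist(results_per_seed: Dict[str, Counter],
                   min_share: float = 0.8) -> Set[str]:
    """Return words reached from at least `min_share` of the seed words."""
    n_seeds = len(results_per_seed)
    if n_seeds == 0:
        return set()
    seeds_containing = Counter()
    for counts in results_per_seed.values():
        seeds_containing.update(set(counts))     # count each word once per seed
    return {word for word, k in seeds_containing.items()
            if k / n_seeds >= min_share}


def remove_blacklisted(counts: Counter, blacklist: Set[str]) -> Counter:
    """Drop blacklisted words from one exploration result."""
    return Counter({w: c for w, c in counts.items() if w not in blacklist})


# Made-up exploration results for three unrelated seed words.
SAMPLE_RESULTS = {
    "Coca-Cola": Counter({"Soda": 3, "United States": 2, "ISBN": 5}),
    "Giraffe": Counter({"Africa": 4, "ISBN": 6}),
    "Violin": Counter({"Music": 3, "United States": 1, "ISBN": 4}),
}

if __name__ == "__main__":
    print(find_blacklist(SAMPLE_RESULTS))                  # only "ISBN"
    print(find_blacklist(SAMPLE_RESULTS, min_share=0.6))   # also "United States"
```

With the made-up data above, the stricter setting removes only "ISBN," while the more aggressive one also removes "United States," the same kind of over-elimination described above.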
This project was interesting, but the major flaw of my script was its very long execution time. I plan to revisit it to improve this aspect.

Technologies Used

Python 3.7
wikipedia, numpy, and collections libraries

Transversal Skills

Writing a project follow-up in English on GitHub

Documents and Associated Links

Project GitHub Repository
