
WikiCrawl - Demo on Web Automation w/ Selenium and Python3

Link to Source Code

This is a quick sample project showing the power of web automation. There are several ways to approach web automation, ranging from low-level tools (curl, Mechanize, or similar) to higher-level ones (Selenium, which typically runs a full browser instance). I recommend the high-level approach for modern, dynamic webpages that need to execute JavaScript. The project demonstrates a pattern for managing state across the different pages of an app: each page has its own controls and variables, and control passes from one page to another as you click links. Typical use cases include logging in to reach pages behind authentication, navigating through menus, entering and verifying form data, and extracting data from a website.
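The page-to-page state pattern described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `FakeDriver`, `Page`, `HomePage`, `ArticlePage`, and the `example.org` URL are all hypothetical names. `FakeDriver` only mimics the `get()` method and `current_url` attribute of a Selenium WebDriver so the sketch runs without a browser.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (no browser needed)."""

    def __init__(self):
        self.current_url = None

    def get(self, url):
        # A real WebDriver would load the page; here we only record the URL.
        self.current_url = url


class Page:
    """Base class: each page keeps the driver and a state dict shared across pages."""

    def __init__(self, driver, state):
        self.driver = driver
        self.state = state


class HomePage(Page):
    URL = "https://example.org/"  # placeholder URL, an assumption

    def open(self):
        self.driver.get(self.URL)
        return self

    def go_to_article(self, title):
        # With a real driver this would click a link; here we navigate directly.
        self.driver.get(self.URL + "wiki/" + title)
        self.state["path"] = [title]
        return ArticlePage(self.driver, self.state)


class ArticlePage(Page):
    def follow_link(self, title):
        # Moving to another article returns a new page object with the same state.
        self.driver.get(HomePage.URL + "wiki/" + title)
        self.state["path"].append(title)
        return ArticlePage(self.driver, self.state)


driver = FakeDriver()
state = {}
page = HomePage(driver, state).open().go_to_article("Python")
page = page.follow_link("Programming_language")
```

Each navigation method returns the page object for the destination, so the caller always holds the controls that are valid for the current page, while the shared `state` dict carries data between them. With Selenium installed, a real WebDriver instance could presumably be passed in place of `FakeDriver`.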

This project uses my app_skellington library to provide a CLI menu and configuration through ConfigObj. More mature options on the Python side include Typer, Click, or Baker.
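For readers who want to see the shape of a subcommand-style CLI without any third-party library, here is a rough sketch using the standard library's `argparse`. It is not the app_skellington API; it is a swapped-in stand-in, and `build_parser` is a hypothetical helper. Only the subcommand names are taken from the Usage section.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per entry in the Usage section."""
    parser = argparse.ArgumentParser(prog="road2philosophy.py")
    # dest stores the chosen subcommand name; required=True rejects an empty call.
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("open_browser", "play_single", "play_multiple"):
        subparsers.add_parser(name)
    return parser


args = build_parser().parse_args(["play_single"])
print(args.command)
```

A real entry point would call `parse_args()` with no arguments so that it reads `sys.argv`, then dispatch on `args.command`.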

I believe the Node.js implementation of Selenium is asynchronous. If performance is critical, implementing the automation there may be a good option.

This is a minimally developed app, shown as a quick demo and starter for novices interested in web automation. For example, the data layer is just a placeholder. I had planned to save the results in a local sqlite3 database but stopped working on it.
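Since the data layer was never implemented, the following is only a sketch of what a small sqlite3-backed store could look like. The table name `crawl_results`, its columns, and the helpers `open_db`, `save_result`, and `load_results` are all assumptions made for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the results table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_results ("
        " id INTEGER PRIMARY KEY,"
        " start_title TEXT NOT NULL,"
        " hops INTEGER NOT NULL,"
        " path TEXT NOT NULL)"
    )
    return conn


def save_result(conn, path):
    """Store one crawl; `path` is the list of page titles visited, in order."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO crawl_results (start_title, hops, path) VALUES (?, ?, ?)",
            (path[0], len(path) - 1, " > ".join(path)),
        )


def load_results(conn):
    """Return all stored crawls in insertion order."""
    cursor = conn.execute(
        "SELECT start_title, hops, path FROM crawl_results ORDER BY id"
    )
    return cursor.fetchall()


conn = open_db()  # in-memory database; pass a file path to persist results
save_result(conn, ["Python", "Language", "Philosophy"])
rows = load_results(conn)
```

Parameterized queries (`?` placeholders) keep page titles scraped from the web from being interpreted as SQL.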

Installation

This is a typical Python project. Either activate a virtualenv or install the third-party dependencies into the system-wide Python environment.

                    python ./setup.py install

                    .. or manually ..

                    pip install app_skellington
                    pip install configobj
                    pip install colorlog
                    pip install selenium==3.141.0
                

Usage

                    python ./road2philosophy.py -h
                    python ./road2philosophy.py open_browser
                    python ./road2philosophy.py play_single
                    python ./road2philosophy.py play_multiple