Python - Verifying Indirect Imports At Runtime

Unlike explicit imports, which fail at application startup when a module is missing, indirect imports in Python fail only when the code that needs them actually runs. Overly permissive except clauses can hide those failures and result in unexpected behavior. Here’s how to verify them at runtime.

Introduction

While developing proxy_checker, a simple Python script that verifies whether the proxy servers in a list are up and running, I ran into an interesting scenario. Out of the box, the extremely popular Requests library supports proxying your requests through HTTP and HTTPS proxies. It also supports SOCKS proxies, but this requires the additional PySocks module. You can get it either by:

  • installing the requests[socks] module instead of the vanilla requests module
  • installing the PySocks module

The strange [socks] part in the first option is what setuptools calls an extra: a named group of optional dependencies. Extras let a package declare variants of its installation so that users install only the modules they need instead of polluting their runtime environment with unnecessary ones. They are defined in a project’s setup.py. The requests module’s setup.py shows the following:

extras_require={
    'security': [],
    'socks': ['PySocks>=1.5.6, !=1.5.7'],
    'socks:sys_platform == "win32" and python_version == "2.7"': ['win_inet_pton'],
    'use_chardet_on_py3': ['chardet>=3.0.2,<5']
},

This means that if I run:

$ pip install requests[socks]

it also installs the additional PySocks module (and win_inet_pton if I am on Windows and using Python 2.7), which allows me to use SOCKS proxies. Users who don’t need this functionality install the requests library normally:

$ pip install requests
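To check which extras an installed distribution declares, the standard library's importlib.metadata can read them from the package metadata. The following is a minimal sketch rather than part of proxy_checker: the helper name declared_extras is made up for illustration, and it assumes Python 3.8 or later.

```python
from importlib import metadata


def declared_extras(dist_name):
    """Return the extras a distribution declares, or None if it is not installed.

    declared_extras is a hypothetical helper, not part of Requests or pip.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # Each extra appears as a "Provides-Extra" field in the package metadata.
    return meta.get_all("Provides-Extra") or []


if __name__ == "__main__":
    # With Requests installed, the list is expected to contain "socks".
    print(declared_extras("requests"))
```

Note that this only tells you which extras exist, not whether the dependencies behind them were installed.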

The Problem

Although I added the following to requirements.txt to make sure that users do in fact install the PySocks module:

requests[socks]==2.26.0

I ran into an issue where I had the requests module installed in a virtual environment but had forgotten to run pip install -r requirements.txt, so the additional PySocks module was never installed. My script ran, but something was off: it wouldn’t detect any SOCKS proxies, and if I gave it a list containing only SOCKS proxies, it would exit immediately without throwing an error. The corresponding code looks like this:

def check_proxy(proxy, real_ip):
    """
    checks a proxy by sending a request to URL and checking that our real IP
    is not included in the response. Only then is it recorded as a valid proxy
    :param proxy: the proxy we want to test. This is in the format:
    socks5://127.0.0.1
    :param real_ip: our real IP address
    """
    proxies = {"http": proxy, "https": proxy}
    try:
        resp = requests.get(URL, proxies=proxies, timeout=TIMEOUT)
        if resp.status_code == 200:
            if real_ip not in resp.text:
                print(proxy)
                live.append(proxy)
    # except requests.exceptions.ConnectTimeout:
    except:
        pass

The check_proxy function takes two parameters:

  • a proxy in the format https://100.53.1.63
  • your real IP

and sends a request to the httpbin service. The httpbin service returns something like this:

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.68.0",
    "X-Amzn-Trace-Id": "Root=1-61936cc1-4cdd453405d07d1913430dc4"
  },
  "origin": "183.38.151.14",
  "url": "http://httpbin.org/get"
}

If the response is a 200, the function checks whether your real IP was leaked and appends the proxy to the live list (a global variable) only if the proxy works and doesn’t leak our IP. Although it looks ugly and may not be the best option, I use a global list because my script runs 25 concurrent threads by default to speed things up, and appending to a list is a thread-safe operation in CPython.
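The shared-list pattern can be sketched as follows. This is an illustration under assumptions, not the actual proxy_checker code: fake_check is a hypothetical stand-in that does no network I/O, and ThreadPoolExecutor is just one way to run 25 workers.

```python
from concurrent.futures import ThreadPoolExecutor

live = []  # shared result list, as described above


def fake_check(proxy):
    """Hypothetical stand-in for check_proxy that does no network I/O."""
    if proxy.endswith("0"):  # pretend only some proxies are alive
        live.append(proxy)  # list.append is atomic in CPython


def run_checks(proxies, workers=25):
    # Leaving the with-block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_check, proxies))
    return live
```

The order of entries in live depends on thread scheduling, so sort the list if you need stable output.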

The culprit for the issue of not finding any SOCKS proxies was the following:

    # except requests.exceptions.ConnectTimeout:
    except:
        pass

My reasoning was that I didn’t really care about errors: I would simply skip the problematic proxy and move on. But it also meant that any user who made the same mistake would end up skipping valid proxies, including all SOCKS proxies. Using an except: pass isn’t great design, but I thought it was acceptable as missing one or two proxies wasn’t a big deal.
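If skipping failed proxies is still the goal, a catch-all handler can at least record what it skipped, so a systematic failure stands out from the occasional dead proxy. The sketch below is an illustration only: MissingDependency and fake_get are hypothetical stand-ins for requests.exceptions.InvalidSchema and requests.get, used so the example runs without network access.

```python
from collections import Counter


class MissingDependency(Exception):
    """Hypothetical stand-in for requests.exceptions.InvalidSchema."""


def fake_get(proxy):
    """Hypothetical stub for requests.get that always fails."""
    if proxy.startswith("socks"):
        # Pretend SOCKS support is not installed.
        raise MissingDependency("Missing dependencies for SOCKS support.")
    raise TimeoutError("proxy did not answer")


def check_all(proxies):
    """Skip failing proxies, but count the failures by exception type."""
    skipped = Counter()
    for proxy in proxies:
        try:
            fake_get(proxy)
        except Exception as exc:  # still broad, but no longer silent
            skipped[type(exc).__name__] += 1
    return skipped
```

Printing the counter at the end of a run would show every SOCKS proxy failing with the same exception type, which is a strong hint that the problem is the environment rather than the proxies.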

Solutions

Most required modules throw an error at the import phase as soon as you run your script. PySocks is different: it is an optional dependency that is only needed when a request goes through a SOCKS proxy, so my script starts normally, with no warning, even when the module is missing from the environment. Without the except: pass, my script would have thrown the following error:

requests.exceptions.InvalidSchema
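Libraries often implement optional dependencies with a guarded import plus a fallback that raises only when the feature is used, which is why nothing fails at startup. The sketch below is a generic illustration of that pattern, not Requests' actual code; the module name and the Manager name are deliberately made up.

```python
try:
    # Deliberately nonexistent module, standing in for an optional dependency.
    from optional_backend_that_is_not_installed_xyz import Manager
except ImportError:

    def Manager(*args, **kwargs):
        # The import failure was swallowed above; it resurfaces on first use.
        raise RuntimeError("Missing dependencies for optional feature")


def use_feature():
    """Hypothetical code path that needs the optional dependency."""
    return Manager()
```

Importing this module succeeds even though the dependency is missing. Only a call to use_feature() raises, and a broad except clause around that call would hide the error completely.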

I then updated my script to handle InvalidSchema explicitly, and it now looked like this:

def check_proxy(proxy, real_ip):
    """
    checks a proxy by sending a request to URL and checking that our real IP
    is not included in the response. Only then is it recorded as a valid proxy
    :param proxy: the proxy we want to test. This is in the format:
    socks5://127.0.0.1
    :param real_ip: our real IP address
    """
    proxies = {"http": proxy, "https": proxy}
    try:
        resp = requests.get(URL, proxies=proxies, timeout=TIMEOUT)
        if resp.status_code == 200:
            if real_ip not in resp.text:
                print(proxy)
                live.append(proxy)
    # except requests.exceptions.ConnectTimeout:
    except requests.exceptions.InvalidSchema:
        sys.exit("Exiting... Required PySocks module not found")
    except:
        pass

But this got me thinking. The above works but has the following issues:

  • I would have to add this everywhere in my code where I used a SOCKS proxy
  • what about other similar situations? I’d have to update each and every function and method that makes use of indirect dependencies to catch them
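There is one more caveat with the per-function handler, assuming check_proxy runs in worker threads started with the threading module: sys.exit() only raises SystemExit, and outside the main thread that ends just the calling thread, silently. The sketch below demonstrates this and uses a threading.Event (an addition for illustration, not part of the original script) to pass the failure to the main thread.

```python
import sys
import threading

missing_dependency = threading.Event()


def worker():
    try:
        # Stand-in for the InvalidSchema handler calling sys.exit().
        sys.exit("Exiting... Required PySocks module not found")
    except SystemExit:
        # Record the failure so the main thread can act on it, then re-raise.
        missing_dependency.set()
        raise


def run_worker():
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    # Reaching this line shows the process survived the worker's sys.exit().
    return missing_dependency.is_set()
```

In a real script, the main thread could check the event after joining its workers and exit from there.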

So I started looking for other options until I ran into a nice solution mentioned here. By using the pkg_resources module, I can consolidate my indirect requirements once without having to update my code everywhere. Here is an example of it at work:

import sys

import pkg_resources

# Distributions that are only used indirectly, so a missing one would
# otherwise go unnoticed until the code path that needs it runs.
required_modules = ["PySocks"]


def check_deps():
    """
    checks required dependencies are installed
    """
    missing = []
    for package in required_modules:
        try:
            pkg_resources.get_distribution(package)
        except pkg_resources.DistributionNotFound:
            missing.append(package)

    if missing:
        sys.exit(
            "Missing requirements: " + ", ".join(missing)
            + ". Please run: `pip install -r requirements.txt` first"
        )


if __name__ == "__main__":
    check_deps()
    ...
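Recent setuptools releases mark pkg_resources as deprecated in favor of the standard library's importlib.metadata. The same up-front check can be sketched with it as follows, assuming Python 3.8 or later; the function name missing_distributions is chosen for illustration.

```python
import sys
from importlib import metadata

required_modules = ["PySocks"]


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_deps():
    missing = missing_distributions(required_modules)
    if missing:
        sys.exit(
            "Missing requirements: " + ", ".join(missing)
            + ". Please run: `pip install -r requirements.txt` first"
        )
```

As before, check_deps() would be called once at startup, before any proxy is tested.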

Although more verbose, I prefer this approach over finding and updating every function and method that might make use of an indirect dependency. For a one-off script like proxy_checker, the first approach is perfectly fine as I only make the call in one function, but for anything serious, I definitely feel the second approach is preferable. Anyway, until next time.


© 2022. All rights reserved.