Pytosquatting

Work in progress: Fixing typosquatting+namesquatting threats in Python Package Index (PyPI).

Epilogue

We have closed the Pytosquatting initiative for now, because the Python Security Response Team (PSRT) has announced that they will take action (see the Timeline below).

Timeline

In June 2016, "Typosquatting programming language package managers" stated that urllib2 had ~4,000 downloads in 2 weeks. But in June 2017, we found the same package name vacant, so we (being the good guys) squatted it for several months up until this disclosure. We take these findings seriously.

20170519: Steve Stagg writes about how he registered stdlib names and sent emails, adding: »I raised an issue on the official pypi github issue tracker in January. This also got no reply.«

20170628: PyPI Warehouse issue #2151 is opened. Title is "Block package names that conflict with core libraries", but no names were blocked.

20170913: We squatted all available names of stdlib packages (128) - scroll down to see statistics from pingbacks.

20170914: A number of in-the-wild malicious packages on PyPI were disclosed by the Slovak National Security Authority.

20170917: PyPI's main developer Donald Stufft creates PR#2396 for database-backed blacklisting of package names. It's unclear how the blacklist will be applied, but it would mean a more efficient process for administrators. Most of the stdlib names that we squatted are blacklisted.

20170922: The Python Security Response Team (PSRT) takes action by announcing a detailed plan to mitigate future attacks. The plan is part of an overall boost of PyPI, which is receiving a $170k grant from the Mozilla Foundation.

Mitigation

Here are a couple of proposals that we originally posted, which have since been expanded nicely in PSRT's security announcement.

  • Strategy #1: We are namesquatting a bunch of names on PyPI (all available Python 2 and Python 3 standard library module names). So whether or not you use a security-hardened pip installer, we have managed to mitigate the bulk of the immediate problem.
  • Strategy #2: Use a pip installer that does safety lookups and fails loudly if the attempted package name does not validate. This should be implemented in your automated deployments and test builds!
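Until such an installer exists, the "fail loudly" part of Strategy #2 can be approximated in a build script. What follows is only a minimal sketch under our own assumptions: the function names are hypothetical, the requirement parsing is deliberately naive, and `sys.stdlib_module_names` only exists on Python 3.10 and later, so a small hand-picked fallback set (taken from the statistics in the appendix) is merged in.

```python
import re
import sys

# Assumption: sys.stdlib_module_names is available (Python 3.10+). The extra
# names are a small sample of Python 2 / stdlib names from our statistics.
STDLIB_NAMES = frozenset(getattr(sys, "stdlib_module_names", ())) | {
    "urllib2", "smtplib", "random", "glob", "shutil", "base64",
}


def requirement_name(line):
    """Return the bare project name from a requirements line, or None."""
    line = line.split("#", 1)[0].strip()      # drop comments
    if not line or line.startswith("-"):      # skip blanks and pip options
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower() if match else None


def stdlib_conflicts(lines):
    """List requirement names that collide with a stdlib module name."""
    names = (requirement_name(line) for line in lines)
    return sorted({name for name in names if name in STDLIB_NAMES})


def check_requirements(lines):
    """Abort the build if any requirement is named like a stdlib module."""
    conflicts = stdlib_conflicts(lines)
    if conflicts:
        raise SystemExit("Refusing to install stdlib names: " + ", ".join(conflicts))
```

In a deployment or test build, one could call `check_requirements(open("requirements.txt"))` before running pip, so that a stray `urllib2` line stops the build instead of pulling a package from PyPI.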

Aftermath

We had a pingback in the setup.py of the packages involved in Strategy #1, meaning that for a limited period, we gathered statistics on the extent of the issue. The callback didn't collect any data from user systems, just an IP address, so that we could count each unique system that attempted to install a non-existing package that could have been exploited.
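To make the counting concrete, here is a hedged sketch of how pingback records could be aggregated. It is not the code we ran: the record layout `(package, ip, day)`, the function names, and the definition of "average per day" (distinct IP/day pairs divided by the length of the observation window) are all assumptions made for illustration, and the IP addresses are documentation-range placeholders.

```python
from collections import defaultdict
from datetime import date


def unique_systems(pingbacks):
    """Map each package name to the set of distinct IPs that pinged for it."""
    seen = defaultdict(set)
    for package, ip, _day in pingbacks:
        seen[package].add(ip)
    return dict(seen)


def average_per_day(pingbacks, days):
    """Average number of distinct (IP, day) pairs per package over `days` days."""
    daily = defaultdict(set)
    for package, ip, day in pingbacks:
        daily[package].add((ip, day))     # repeated installs on one day count once
    return {package: len(pairs) / days for package, pairs in daily.items()}


# Hypothetical sample records, not real data.
records = [
    ("urllib2", "192.0.2.1", date(2017, 9, 13)),
    ("urllib2", "192.0.2.1", date(2017, 9, 13)),   # duplicate, ignored
    ("urllib2", "192.0.2.2", date(2017, 9, 14)),
    ("glob", "192.0.2.1", date(2017, 9, 13)),
]
print(average_per_day(records, days=2))
```

Counting IPs only approximates systems: several machines behind one NAT collapse into a single address, and one machine with a changing address is counted more than once.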

We are calling for analysis of the current PyPI resources to find in-the-wild exploits of typosquatting, as the Slovak National Security Authority has done. We hope there are none, but the problem has been around for a long time, and our primer didn't get reactions from the PyPI admins.

Mockup of Strategy #2

Once done, we hope to achieve a better pip installer that:

  • Verifies that you don't install a package with the name of a stdlib module
  • Asks a webservice or local database if you are installing a typo of a popular package

It could look like this...

pip install pipsec  # Install security-hardening plugin for pip
pip install virtualenv-wrapper  # See that it fails
pip install virtualenvwrapper  # This is correct
          

The closure of pip#4527 hints that attempts to add security on the client side aren't popular. The arguments against are weak, though, so there's no real reason not to do something like the above.
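As a rough illustration of the second check in the mockup (asking whether a name is a typo of a popular package), the lookup could be sketched with the standard library's `difflib`. Everything here is an assumption on our part: the `POPULAR` tuple stands in for the webservice or local database, the `0.8` cutoff is arbitrary, and `likely_typo_of` is a hypothetical helper, not part of pip.

```python
import difflib
import re

# Hypothetical stand-in for a webservice or local database of popular names.
POPULAR = ("virtualenvwrapper", "requests", "django", "numpy")


def squash(name):
    """Lowercase a name and drop '-', '_' and '.' so separators don't matter."""
    return re.sub(r"[-_.]+", "", name).lower()


def likely_typo_of(name, popular=POPULAR, cutoff=0.8):
    """Return the popular package that `name` resembles, or None if it looks fine."""
    if name.lower() in (p.lower() for p in popular):
        return None                       # exact match with a known package
    known = {squash(p): p for p in popular}
    key = squash(name)
    if key in known:
        return known[key]                 # differs only in separators
    close = difflib.get_close_matches(key, list(known), n=1, cutoff=cutoff)
    return known[close[0]] if close else None


for candidate in ("virtualenv-wrapper", "virtualenvwrapper", "reqeusts"):
    suspect = likely_typo_of(candidate)
    print(candidate, "->", "looks like a typo of " + suspect if suspect else "ok")
```

A similarity cutoff will produce false positives for legitimately similar names, so a real tool would probably warn and ask for confirmation rather than refuse outright.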

Media

Ars Technica: Devs unknowingly use “malicious” modules snuck into official Python repository

Golem.de: Bösartige Python-Pakete entdeckt (DE)

Hacker News: Malicious software libraries found in PyPI posing as well known libraries

Ack

Send comments or complaints to Benjamin Bach and Hanno Böck.

Check out the code for this website at https://github.com/benjaoming/pytosquatting.


Appendix

Stdlib installations

Blocked stdlib installations from 20170913 to 20170916: 10123

On 20170916, PyPI removed our Top 20 of squatted packages, so our statistics won't match up anymore. They didn't remove the other 108 squatted packages.

Rank  Package       Average per day
1     smtplib       102.1
2     random        100.0
3     glob          85.1
4     shutil        83.4
5     base64        56.4
6     zipfile       47.5
7     webbrowser    43.9
8     socketserver  41.4
9     codecs        40.0
10    urllib2       39.1
11    getpass       39.0
12    traceback     28.5
13    operator      25.7
14    copy          23.6
15    timeit        22.3
16    struct        22.0
17    pydoc         21.7
18    ftplib        21.7
19    tokenize      21.7
20    optparse      21.3