The next version of markata will be around a full second faster at building its docs, which is about a 30% performance bump at the current state. The gain shows up when the virtual environment is stored in the same directory as the source code.

What happened?

I was looking through my profiler for some unexpected performance hits, and noticed that the docs plugin was taking nearly a full second (sometimes more), just to run glob.


    |  |- 1.068 glob  markata/plugins/docs.py:40
    |  |  |- 0.838 <listcomp>  markata/plugins/docs.py:82
    |  |  |  `- 0.817 PathSpec.match_file  pathspec/pathspec.py:165
    |  |  |        [14 frames hidden]  pathspec, <built-in>, <string>

glob.glob ignores hidden directories

I started looking for different solutions and found that I was hitting pathspec with way more files than I needed to.


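import glob
from pathlib import Path

# Path.glob walks into hidden directories such as .venv
len(list(Path().glob("**/*.py")))
# 6444

# glob.glob skips them
len([Path(f) for f in glob.glob("**/*.py", recursive=True)])
# 110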

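After digging into the docs I found that glob.glob skips files and directories whose names start with a dot (such as .venv) unless the pattern itself starts with a dot, while Path.glob does not. The os.scandir call underneath only leaves out the special '.' and '..' entries, so the hidden-file filtering comes from the glob module itself.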

https://docs.python.org/3/library/os.html#os.scandir
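
To make the difference concrete, here is a minimal sketch that builds a throwaway project in a temporary directory and counts what each approach finds. The layout (a `pkg` directory next to a `.venv`) and the file names are made up for illustration and are not taken from markata.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: two source files and two files inside a hidden .venv
    Path(root, "pkg").mkdir()
    Path(root, ".venv", "lib").mkdir(parents=True)
    for rel in ("top.py", "pkg/a.py", ".venv/lib/b.py", ".venv/lib/c.py"):
        Path(root, rel).write_text("")

    # Path.glob descends into dot-directories
    pathlib_count = len(list(Path(root).glob("**/*.py")))

    # glob.glob does not match names starting with a dot via wildcards
    pattern = os.path.join(root, "**", "*.py")
    glob_count = len(glob.glob(pattern, recursive=True))

print(pathlib_count, glob_count)
```

On this toy tree Path.glob sees all four files while glob.glob only sees the two outside `.venv`, which is the same effect as the 6444 vs 110 counts above, just smaller.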

Results

Now glob from the docs plugin does not even show up in the profiler.

I opened up ipython and saw the following results. For some reason the original docs.glob only took 488 ms when called from ipython rather than the full second the profiler reported, but the new version is still a massive improvement over it.


# before
%timeit docs.glob(m)
# 488 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# after
%timeit docs.glob(m)
# 9.37 ms ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
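
If you want to try a similar comparison outside of ipython and without markata, something like the following rough sketch should work with the standard library's timeit module. The helper names (`build_tree`, `with_pathlib`, `with_glob`) and the fake `.venv` contents are assumptions for illustration; the real docs plugin also filters through pathspec, which is left out here.

```python
import glob
import os
import tempfile
import timeit
from pathlib import Path


def build_tree(root, hidden_files=200):
    """Create one source file plus many files inside a hidden .venv."""
    src = Path(root, "src")
    src.mkdir()
    (src / "app.py").write_text("")
    site = Path(root, ".venv", "lib", "site-packages")
    site.mkdir(parents=True)
    for i in range(hidden_files):
        (site / f"mod_{i}.py").write_text("")


def with_pathlib(root):
    # Includes everything under .venv
    return list(Path(root).glob("**/*.py"))


def with_glob(root):
    # Skips dot-directories such as .venv
    pattern = os.path.join(root, "**", "*.py")
    return [Path(f) for f in glob.glob(pattern, recursive=True)]


with tempfile.TemporaryDirectory() as root:
    build_tree(root)
    n_pathlib = len(with_pathlib(root))
    n_glob = len(with_glob(root))
    t_pathlib = timeit.timeit(lambda: with_pathlib(root), number=20)
    t_glob = timeit.timeit(lambda: with_glob(root), number=20)

print(f"Path.glob: {n_pathlib} files, {t_pathlib:.4f} s for 20 runs")
print(f"glob.glob: {n_glob} files, {t_glob:.4f} s for 20 runs")
```

The absolute numbers will depend on your disk and on how big the hidden directory is, so treat them as a relative comparison only.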