Thursday, 2 June 2011

GSoC: Benchmark suite is merged

I've started working on the coding part of the project. So far the PyPy and CPython benchmarks have been merged.

I've also made a couple of small commits, such as fixing typos and removing unnecessary whitespace.

The next part of the project will be implementing the configurable tool which downloads and builds interpreters based on a given configuration.

So far I'm able to load and validate a simple configuration. For now the configuration is a plain Python file, as this is the easiest way to start.
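Loading and validating a Python file as configuration could look roughly like the sketch below. The setting names `interpreters` and `benchmarks` and the function `load_config` are made up for illustration and are not the project's actual schema or API.

```python
import runpy

# Hypothetical schema: setting name -> expected type.
REQUIRED_SETTINGS = {"interpreters": list, "benchmarks": list}


def load_config(path):
    """Execute the Python file at `path` and return its validated settings."""
    # run_path executes the file and returns its global namespace as a dict.
    namespace = runpy.run_path(path)
    config = {}
    for name, expected_type in REQUIRED_SETTINGS.items():
        if name not in namespace:
            raise ValueError("missing setting: %s" % name)
        value = namespace[name]
        if not isinstance(value, expected_type):
            raise TypeError(
                "%s must be a %s, got %s"
                % (name, expected_type.__name__, type(value).__name__)
            )
        config[name] = value
    return config
```

Executing the file means a configuration can run arbitrary code, which is the price for not having to write a parser.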

I have a very long weekend ahead, from today to Monday, which will give me a lot of time to work on the project.

Tuesday, 26 April 2011

Accepted to GSoC

Yesterday the projects accepted to GSoC were announced. Among them are several interesting projects under the PSF umbrella, including mine.

During GSoC I will create a benchmark suite (based on existing ones) with "real world" benchmarks which can easily be used with every Python interpreter. Up until now each interpreter has more or less rolled its own suite of benchmarks of varying quality. This makes comparisons unnecessarily hard and ties up resources better used elsewhere.

Furthermore I will create an application which is able to download and build interpreters and execute the benchmarks with them, using a simple configuration. Up until now no such application exists; for example, http://speed.pypy.org compares released and current (from trunk) PyPy versions with released CPython versions. As nice as that is, being able to compare the most current versions of various implementations is clearly preferable.

Once that work is completed I will port the benchmark suite to Python 3.x. As several benchmarks have dependencies that do not support 3.x yet, I will not be able to port the entire suite; however, it will at least be a start when it comes to benchmarks for 3.x.

I'm currently compiling a list with information on the available benchmarks (what each one tests and how) so that people unfamiliar with them can get a quick overview. Once that is finished I will send emails to the CPython, PyPy, IronPython, Jython and Cython mailing lists with the benchmarks I propose, asking for other benchmarks or changes to my proposed list.

Further information on my project will be published here and on Twitter as soon as possible.

Monday, 4 April 2011

Writing CLI Applications in Python: A Rant

A couple of weeks ago I searched for a way to better organize my music. I have several GBs of music, all more or less properly tagged and organized; however, I wanted to be able to reorganize it, change metadata, search for and add covers, and add new music easily. The existing applications are either horribly confusing or simply don't provide the features I want, so like every programmer I decided I could do better and started a project.

As I usually write web applications and libraries, or do small research projects to learn things, I first looked into the tools I would need: something for configuration and something to deal with input from and output to the CLI.

The first thing I noticed is that there is absolutely no ready-made solution for handling configuration. I want something that handles multiple hierarchical configuration files, preserves comments in them even when the configuration changes and, ideally, supports more formats than just INI, choosing the proper parser based on the file extension.
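To make the gap concrete, here is a minimal sketch of what the standard library does cover, assuming Python 3's `configparser` and a hypothetical helper `load_layered_config`. It layers several INI files, with later files overriding earlier ones, but it neither preserves comments when writing a file back nor supports other formats.

```python
import configparser


def load_layered_config(paths):
    """Read INI files in order; values in later files override earlier ones."""
    parser = configparser.ConfigParser()
    # read() accepts a list of file names and silently skips missing files,
    # which suits an optional per-user file on top of a system-wide one.
    parser.read(paths, encoding="utf-8")
    return parser
```

Called with a system-wide file followed by a per-user file, the per-user values win wherever both define the same option.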

A search for that on PyPI shows several packages. Several of them don't have a description, those that do have one don't necessarily have documentation, and those that have documentation tend to have very little of it and provide no way to contribute to the project or to report bugs. For all intents and purposes those projects don't exist.

Trying to figure out how to handle configuration, I took a look at the Mercurial source code (another project which needs a more obvious link to a source code browser). I learned that I never want to do that again, at least when it comes to that part of the source. Oh, I nearly forgot: apparently configuration is best handled on your own, which is what Mercurial is doing.

The next thing I considered was handling CLI arguments, for which there are two widely used solutions: optparse and argparse. optparse is the older one and is probably used by more people than argparse, so I decided to look at that first. It has no way to handle commands or positional arguments, so I deemed it unusable for my purposes.

At first glance argparse seems to be almost identical to optparse. That is because the developers wanted to preserve "backwards-compatibility"; at some point they recognized that this doesn't work and changed the API, making it a merge of optparse and "something else". argparse handles arguments and provides commands; however, the latter is rather awkward to use.

You can't just create a command and add it to the parser. No, you have to call `.add_subparsers` on the parser, which does not add multiple subparsers as one might think; it returns a special object with a single `.add_parser` method, which adds a subparser to the parser `.add_subparsers` was called on. I have no idea why you have to do that, and as I value my sanity I probably really don't want to know, but something tells me that nobody sane involved ever gave the API design any consideration.

As it is really just designed as a parser, `argparse.ArgumentParser.parse_args` always returns a flat data structure which, by default, does not provide information about the command invoked; that information would certainly be helpful for calling the function implementing the command. The documented solution for this problem is to set a default `func` on every subparser (and yes, you can specify "defaults" on a subparser independent of options or arguments, which are not really defaults at all because they are never changed) and then call that function, which ends up as `func` in the result, with the result.
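For reference, the `add_subparsers` dance and the `func` pattern described above look roughly like this. The program name `music`, the commands `tag` and `search`, and the handler functions are made up for illustration.

```python
import argparse


def cmd_tag(args):
    """Hypothetical handler for the 'tag' command."""
    return "tagging %s" % args.path


def cmd_search(args):
    """Hypothetical handler for the 'search' command."""
    return "searching for %s" % args.query


def build_parser():
    parser = argparse.ArgumentParser(prog="music")
    # add_subparsers() returns a helper object; commands are added to it.
    subparsers = parser.add_subparsers(title="commands")

    tag = subparsers.add_parser("tag", help="edit metadata of a file")
    tag.add_argument("path")
    # The "default" that is never overridden: it only marks the handler.
    tag.set_defaults(func=cmd_tag)

    search = subparsers.add_parser("search", help="search the library")
    search.add_argument("query")
    search.set_defaults(func=cmd_search)
    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Without a command there is no 'func' attribute on the result.
    if not hasattr(args, "func"):
        parser.print_help()
        return None
    return args.func(args)
```

Calling `main(["tag", "song.mp3"])` parses the arguments and dispatches to `cmd_tag` with the parsed namespace.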

I realize things are almost always more difficult than they appear to be and there are probably good reasons for the decisions which have been made but surely there have to be better solutions to this problem.

User input is a somewhat ugly thing. All that parsing and validating, dealing with those idiots calling themselves users, is not really pleasant, so I was hopeful that at least output on a terminal could be considered a solved problem. It is not.

If you want to write a paragraph of text to stdout, wrapped to the width of the terminal, you have to get the width of the terminal in platform-dependent ways: via ioctl and fcntl on Linux, and I guess by wrapping the Windows API with ctypes. Luckily at least textwrap is already in the stdlib.
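On newer Python versions this particular pain is smaller. The sketch below assumes Python 3.3 or later, where `shutil.get_terminal_size` exists in the stdlib; the helper name `wrap_for_terminal` is made up for illustration.

```python
import shutil
import textwrap


def wrap_for_terminal(text, width=None):
    """Wrap `text` to `width` columns, defaulting to the terminal width."""
    if width is None:
        # Falls back to 80 columns when stdout is not attached to a terminal.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return textwrap.fill(text, width=width)
```

Passing an explicit `width` keeps the function usable when the output is redirected to a file or a pipe.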

You have to implement progress bars and coloring yourself unless you want a separate dependency for each of these things.

Also don't forget that a simple print statement may cause problems as soon as your text is not plain ASCII. Even if you encode to the proper stdout encoding where possible, user input might not be encodable (umlauts to ASCII), so you may have to transliterate unless you are willing to just replace or ignore these errors. But I'm sure every one of you does this carefully everywhere.
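The "just replace" option can be sketched as follows, assuming Python 3 strings; the helper name `safe_text` is made up for illustration, and transliteration is not covered here.

```python
import sys


def safe_text(text, encoding=None, errors="replace"):
    """Return `text` with characters the target encoding lacks replaced."""
    if encoding is None:
        # stdout may have no encoding at all (for example when replaced
        # by a custom object), so fall back to plain ASCII.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the target encoding; unencodable characters
    # become "?" with errors="replace" or vanish with errors="ignore".
    return text.encode(encoding, errors).decode(encoding)
```

With an ASCII terminal, `safe_text("Grüße", "ascii")` yields `Gr??e` instead of raising `UnicodeEncodeError`.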

The fact that there are no solutions to these problems is a really big WTF and makes writing CLI applications a pain in the ass, which really shouldn't be the case.

Sunday, 6 February 2011

Documentation Directory Structure

If you are using Sphinx for documentation, thinking about the directory structure early is important because changing it later gets painful. The builders replicate the structure of the source directory in the build directory, so if you want to change it you will have to set up redirects for the HTML documentation.

In my experience the following structure is rather flexible:

docs/user
Documentation directed at users such as tutorials, guides etc.

docs/api
API documentation mostly generated with autodoc.

docs/development
Documentation directed at developers of your project.

docs/additional
Use this for your change log, license etc.

Sunday, 30 January 2011

Privacy vs. Acceptance

In the last couple of months I've seen a lot of debate about privacy. Some people complain that privacy seems to be disappearing, others see it as a positive evolution of culture, and then there are all those people who either don't care or simply go along with it.

I see why people like privacy: exhibitionism is not a common trait and people fear the judgement of others. Knowledge gives you power; if the balance between the knowledge others have about you and the knowledge you have about them shifts to either side, disadvantages arise. Nevertheless I think there is something very important to gain by giving up privacy: acceptance of others, or tolerance towards them.

The internet gives us the possibility to re-evaluate what society looks like. It provides a picture more accurate than anything we have seen before, in a world where the media, which used to provide the mirror of society, filtered information out of the picture.

Ethics and morals will adjust to that new picture; what is considered normal and what is not will shift.

The difference between what is considered normal and the people we live and communicate with, the people we meet, might stay the same, but by changing the view of what is normal, acceptance will change for the better.

I believe that by providing a broader, more accurate picture of society, acceptance will become broader; people will not have to fear the judgement of others (as much) and are therefore encouraged to develop themselves into what they want to be.

I believe in an open world, what do you believe in?

Friday, 3 September 2010

New at Pocoo

Just a short message for those who haven't heard yet: I'm now a member of Pocoo, the umbrella project for Werkzeug, Jinja2, Sphinx and a lot of other very awesome projects :)

Thursday, 19 August 2010

A summary of the Google Summer of Code

The Google Summer of Code is nearly over and it's time to present some results. First of all, every project we did can be considered successful. The team was great and I hope to be able to work with the others in the future to achieve new great things. However, let's get to the point: as some of you already know, Sphinx now has Python 3.x support in trunk, which means that with the next non-bugfix release you will be able to use Sphinx with Python 3.x.

In separate branches, which will hopefully be merged into trunk soon, we have i18n support. This allows you to build gettext message catalogs which contain ids as comments, which you can use to identify messages that have changed in the documentation. Another great achievement is websupport, which allows you to create web applications using Sphinx with server-side search, comments on paragraphs and code blocks, and user proposals for changes to the documentation.

My contribution to i18n and websupport has been an AST-based merging algorithm which allows you to easily track changes across multiple builds of the documentation. This makes it possible to identify changes to the documentation so that we don't have to delete all the comments with every documentation rebuild.