Friday, 3 September 2010

New at Pocoo

Just a short message for those who haven't heard yet: I'm now a member of Pocoo, the umbrella project for Werkzeug, Jinja2, Sphinx and a lot of other very awesome projects :)

Thursday, 19 August 2010

A summary of the Google Summer of Code

The Google Summer of Code is nearly over and it's time to present some results. First of all, every project we did can be considered successful. The team was great and I hope to be able to work with the others in the future to achieve new great things. But let's get to the point: as some of you already know, Sphinx now has Python 3.x support in trunk, which means that with the next non-bugfix release you will be able to use Sphinx with Python 3.x.

In separate branches, which will hopefully be merged into trunk soon, we have i18n support. This allows you to build gettext message catalogs which contain ids as comments; you can use these to identify messages which have changed in the documentation. Another great achievement is websupport, which allows you to create web applications using Sphinx with server-side search, comments on paragraphs and code blocks, and user proposals for changes to the documentation.

My contribution to i18n and websupport has been an AST-based merging algorithm which allows you to easily track changes across multiple builds of the documentation. This makes it possible to identify what has changed, so that we don't have to delete all the comments with every documentation rebuild.

Saturday, 14 August 2010

Making Sphinx faster

I've recently spent a lot of time thinking about programming languages. It's something I'm very interested in, and creating my own is an item on my todo list. One topic that comes up when you think about that is parallelization, so to get my mind off sphinx-web-support for a while I looked at Sphinx to see how easily it could be parallelized. Before you get too excited: these are just a couple of thoughts on my part and there could be something I'm still missing. Now read on.

Sphinx' Design

In order to implement this feature we have to look at the design of Sphinx. The design is more or less simple: we have an Application which sets everything up and can be used to run the build process. The build process is handled by the Environment. The environment parses every document in the source directory, creates a doctree (AST), transforms it as necessary and uses the information from the doctree to populate the index. After that the doctree is stored in build/doctrees/document.doctree. Once every document has been processed the environment invokes a Builder. The builder loads each doctree, modifies it if necessary and passes it on to the Writer, which creates the code for each doctree; this is stored in the build directory under the name of the builder.

The Problem

Currently the environment does a lot more than it should, in my opinion. The index is kept global in the environment, as is everything we need to know about the current document. This makes it impossible to simply parallelize parsing and building because there is too much shared state.

The Solution

The obvious solution, and the better design, is to keep the data associated with a specific document in an object I call DocumentContext. This context stores the information needed to process a document as well as the information gathered from the document which is relevant for the Environment. After parsing, transforming and processing each document we take the context and put the relevant information into the environment.

This way the Environment is immutable from a parser perspective and we can easily use parallelization to make the entire build process a lot faster than it is currently.
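To make the idea a bit more concrete, here is a minimal sketch. All names in it (DocumentContext as a dataclass, Environment.merge, parse_document, build) are hypothetical and are not the existing Sphinx API, and the "parsing" is only a stand-in that turns capitalized words into index entries.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class DocumentContext:
    """Hypothetical per-document state, filled in while parsing."""
    docname: str
    index_entries: list = field(default_factory=list)


@dataclass
class Environment:
    """Hypothetical global state; only written to in the merge step."""
    index: dict = field(default_factory=dict)

    def merge(self, context):
        for entry in context.index_entries:
            self.index.setdefault(entry, []).append(context.docname)


def parse_document(docname, source):
    # Stand-in for parsing and transforming: every capitalized word
    # becomes an index entry. No shared state is read or written here.
    context = DocumentContext(docname)
    context.index_entries = sorted({w for w in source.split() if w.istitle()})
    return context


def build(sources):
    env = Environment()
    with ThreadPoolExecutor() as pool:
        contexts = pool.map(lambda item: parse_document(*item), sources.items())
        # Merging happens sequentially, so the environment needs no locks.
        for context in contexts:
            env.merge(context)
    return env


env = build({"intro": "Sphinx builds docs", "usage": "Run Sphinx daily"})
print(env.index)
```

The sketch uses threads only to stay short. On CPython, CPU-bound parsing would need worker processes to actually get faster, which in turn requires the contexts to be picklable.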

Another Problem and another Solution: Backwards compatibility

Changing Sphinx in the way I propose will probably break some extensions, and it will definitely break the existing domains. Personally I don't really care about this issue because I think software has to evolve and constantly change over time in order to make it in the long run.

However, I know that a lot of people do care, so I propose something a lot of people know from web applications: context locals. Basically they are proxies which point to the objects in the current context, which is either a process, a thread or something even simpler based on coroutines. Using those, the current API could be kept at least partially, and we could deprecate it first before removing it at some point in the future.
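A context local can be sketched in a few lines. This is only an illustration based on threads; LocalProxy, Document, current_document and process are hypothetical names and not part of Sphinx.

```python
import threading

_context = threading.local()


class LocalProxy:
    """Forwards attribute access to the object bound in the current thread."""

    def __init__(self, name):
        self._name = name

    def _current(self):
        try:
            return getattr(_context, self._name)
        except AttributeError:
            raise RuntimeError("no %s bound in this context" % self._name) from None

    def __getattr__(self, attr):
        # Only called for attributes the proxy itself does not have.
        return getattr(self._current(), attr)


class Document:
    def __init__(self, docname):
        self.docname = docname


# Looks like a global, but resolves to a different object in every thread.
current_document = LocalProxy("document")


def process(docname, results):
    _context.document = Document(docname)
    # Old-style code can keep using the "global" name.
    results[docname] = current_document.docname


results = {}
threads = [threading.Thread(target=process, args=(name, results))
           for name in ("intro", "usage")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(sorted(results.items()))
```

Each worker sees its own document through the same name, which is what would let existing extension code keep working during a deprecation period.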

Tuesday, 10 August 2010

Hey, what are you doing?

Those of you who follow the discussions on the IRC channel and Twitter already know, we have Python 3.x Support now in Trunk, so sphinx-py3k can be considered a success. However what else is going on?

One of the problems with both i18n and websupport is that we need a way to identify parts of the documentation across multiple builds. A simple example is a document: it has multiple paragraphs and we want to store comments for each paragraph, so we need to keep track of the paragraph even if the document changes or the paragraph itself does. If we don't, we have to throw away all the comments on every build, as we don't know where to put them or whether they still apply in case a paragraph has been removed entirely.

Identifying a changed paragraph in particular is a bit complicated and required a bit of research on my side. However, I have a solution which should mostly work. It doesn't pass all the tests I came up with yet, but I hope to finish my work soon so I can talk to birkenfeld about merging my branch into trunk, where it can be used in web-support and i18n.
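To give an idea of what the problem looks like, here is a deliberately naive sketch. It is not the algorithm from my branch: it compares plain paragraph text with difflib from the standard library instead of working on the doctree, and carry_over_ids and the 0.6 threshold are made up for this example.

```python
from difflib import SequenceMatcher
from itertools import count


def carry_over_ids(old, new_paragraphs, threshold=0.6):
    """Map each new paragraph to the id of the most similar old paragraph.

    `old` maps integer ids to paragraph text from the previous build.
    Paragraphs without a sufficiently similar predecessor get a fresh id.
    """
    fresh = count(max(old, default=0) + 1)
    unused = dict(old)
    result = {}
    for text in new_paragraphs:
        best_id, best_ratio = None, threshold
        for old_id, old_text in unused.items():
            ratio = SequenceMatcher(None, old_text, text).ratio()
            if ratio >= best_ratio:
                best_id, best_ratio = old_id, ratio
        if best_id is None:
            best_id = next(fresh)
        else:
            del unused[best_id]  # each old id is reused at most once
        result[best_id] = text
    return result


old = {
    1: "Sphinx is a documentation tool.",
    2: "It supports many output formats.",
}
new = [
    "Sphinx is a documentation generator.",  # edited paragraph
    "New: themes!",                          # added paragraph
    "It supports many output formats.",      # unchanged paragraph
]
merged = carry_over_ids(old, new)
print(merged)
```

Comments stored under ids 1 and 2 survive the rebuild, while the added paragraph gets id 3. Ids left over in `unused` would belong to removed paragraphs.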

You can take a look at the code in the Bitbucket repo. If you want to keep up to date with the most recent developments, I suggest visiting #pocoo on freenode and/or following me on Twitter. These would also be the right places to ask me questions about the project or to simply ask me about the current status.

Sunday, 11 July 2010

163 tests passed

The most important milestone of sphinx-py3k has been reached by passing the complete test suite with Python 2 and Python 3. Now it should be possible to use Sphinx with Python 3 without encountering any problems.

However nothing is ever perfect so I encourage everybody to test sphinx-py3k and report problems should they occur, so that they can be fixed.

Saturday, 26 June 2010

Current git branch in a zsh prompt

I've looked for hours to find a way to just display the current git branch in the prompt of my zsh shell and I've come to this more or less understandable solution which should work with any common VCS available:

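autoload -U colors && colors
autoload -Uz vcs_info

# The second format string ends up in vcs_info_msg_1_; %b is the branch name.
# $VCSPROMPT is not set anywhere in this snippet, so vcs_info_msg_0_ stays
# empty unless you define it yourself.
zstyle ':vcs_info:*:prompt:*' formats "$VCSPROMPT" "[%b]"

precmd() {
    vcs_info 'prompt'

    # Test the message that is actually displayed. The variable needs the
    # leading $, otherwise the test is always true.
    if [[ -n "${vcs_info_msg_1_}" ]]; then
        RPROMPT="${vcs_info_msg_1_}"
    else
        RPROMPT=""
    fi
}

PROMPT=$'%F{green}%~#%f '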


The result is that the current working directory is shown on the left side and the name of the branch in square brackets on the right. I could probably configure it in a way which shows me a lot more information, but all I want is a little reminder about the branch I'm currently in.

Friday, 18 June 2010

The current state of sphinx-py3k

I have not made a report in quite some time. I know I am supposed to make weekly reports, but until now I had not found anything really worth reporting since the last one. So what happened that you can find one here now?

I fixed every error in the test suite, about 55 in total; now there are only 7 failures left. They might be even harder to fix than a usual error, but I do not think this is going to be much of a problem.

What is a real problem are doctests. Currently there is no way of converting those automatically, and even if there was, there are certain problems like changes in the results. The representation of the string types has changed, some functions in the math module return ints instead of floats where applicable, and if you look around you can probably find more examples. A converter cannot deal with these changes easily. I have some ideas on the problem but I will have to talk to my mentor about a solution.
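A small example shows why a source converter alone is not enough. The function half_floor is made up for this illustration and does not come from Sphinx; its docstring contains the output as it would have been recorded under Python 2.

```python
import doctest
import math


def half_floor(n):
    """Return half of n, rounded down.

    >>> half_floor(5)
    2.0
    """
    return math.floor(n / 2)


# Under Python 3, math.floor() returns the int 2, so the recorded "2.0"
# no longer matches. Converting the source does not touch that output.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    half_floor.__doc__, {"half_floor": half_floor}, "half_floor", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
result = runner.run(test, out=lambda text: None)  # silence the failure report
print(result.failed, "of", result.attempted, "examples failed")
```

The code under test is correct on both versions; only the expected output in the docstring is tied to one of them.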

Sunday, 16 May 2010

GSoC: Scripts are (mostly) working

Looks like another week or so has ended which means a status report by me. Instead of porting Sphinx itself I took my time to port the scripts which are used to check the coding style and a couple of other things.

This means that there is no difference between developing Sphinx with 2.x or 3.x anymore as long as you ignore the errors and failures the test suite gives you when using 3.x.

There still are some problems with the reindent script which is in use. I got the latest one from the CPython svn, adjusted it a bit so it can be converted with 2to3 but for some reason it behaves differently with 3.x.

P.S.: You might have noticed that I changed the blog title, that is because I plan on publishing more non GSoC related posts and I think opening another blog is not worth it.

Saturday, 8 May 2010

Progress on the 3.x Port

Since the beginning of the project I was able to remove every deprecation warning which occurs when running the Sphinx test suite with python -3. It is also now possible to install Sphinx with Python 3 using distribute. However, there are still several problems which do not cause any deprecation warnings and are not fixed by 2to3, so it is not usable for now.

The remaining problems are usually problems with handling strings and files. Even when used correctly in 2.x the changes made with 3.x are causing several problems.
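A typical case looks like the following sketch. It is an illustration rather than code from Sphinx, and the file name is made up: being explicit about the encoding and about text versus binary mode is what makes file handling behave the same on both versions.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "index.rst")

# io.open() exists in Python 2.6+ and is the built-in open() in Python 3.
# Naming the encoding makes the file accept and return text on both versions.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Caf\xe9")

with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode returns bytes, which have to be decoded explicitly.
with io.open(path, "rb") as f:
    raw = f.read()

print(text == raw.decode("utf-8"))
```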

I did nearly nothing during the last week in terms of coding so I will do a lot of work this weekend and make sure to have more time in the future.

Friday, 30 April 2010

Hashable or Unhashable: That is the question!

Python 3 has changed a lot of things, one of them being that if a class implements __eq__, the __hash__ method will not be inherited. This behavior seems weird at first, but if you take a closer look at the topic it reveals a serious problem.
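The change is easy to see in an interpreter. Version is just a made-up class for this demonstration:

```python
class Version:
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number == other.number


# Defining __eq__ without __hash__ makes Python 3 set __hash__ to None.
print(Version.__hash__ is None)

try:
    {Version(1): "first release"}
except TypeError as error:
    print("cannot be used as a key:", error)
```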

Most Python developers do not really think about the design and behavior of an object when they create it. They have a class with an __init__ method, a couple of attributes and methods and maybe a __repr__. If they implement other special methods, these are usually __getattr__, __*item__ etc. But given an arbitrary class you created, could you tell me whether it is supposed to be hashable, and if it is, whether it actually is?

Usually you do not care about __hash__ at all. You inherit from object, which implements it, and on CPython the result is derived from the id of the object. On other implementations it may be the same or something different altogether, but it is unique to the object.

The result of this, in Python 2, is that every time you create a type and do not implement __hash__, objects of that type will be hashable even if they are not supposed to be. The worst case is that if you implement __eq__ but not __hash__, equal objects do not behave like equal objects when used as keys in dictionaries, when put into a set, or in anything else which relies on hashes as a way to check if objects are equal.

This results in two rules you should always stick to:
1. If you implement __eq__ also implement __hash__.
2. If your object is mutable, set __hash__ to None in order to make it unhashable. This way your object behaves the way you expect from dictionaries or lists.

If you write new code, stick to the rules and you are good to go. In your old code, make sure to implement those methods and you make it one step easier to port it to Python 3.
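Here is a short sketch of the worst case and of both rules. The classes are invented for this example; LegacyPoint explicitly keeps the identity-based hash in order to imitate what Python 2 did implicitly.

```python
class LegacyPoint:
    """Imitates Python 2: __eq__ plus the inherited identity-based hash."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, LegacyPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    __hash__ = object.__hash__


class Point:
    """Rule 1: equal objects hash equally (treated as immutable by convention)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


class Basket:
    """Rule 2: a mutable object opts out of hashing."""

    def __init__(self, items=()):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Basket):
            return NotImplemented
        return self.items == other.items

    __hash__ = None


a, b = LegacyPoint(1, 2), LegacyPoint(1, 2)
print(a == b, len({a, b}))              # equal, yet two members in the set
print(len({Point(1, 2), Point(1, 2)}))  # a single member, as expected
```

Setting __hash__ to None also works in Python 2.6 and later, so the same class definition can be shared between both versions.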

Tuesday, 27 April 2010

Congratulations! Your proposal has been accepted.

Yesterday Google released the list of accepted projects for GSoC, my project is one of them. In the next months I will be one of the three students working on Sphinx, my part will be porting Sphinx to Python 3.x and the integration of the web application which was developed during the last year.

Currently we are in the "Community Bonding Period" which means basically we are all idling in #pocoo to get in touch with the community and get an idea of the development process until May 24 which will be the day everybody starts coding. However as Sphinx is not such a big project at least in terms of the number of contributors, I think that spending so much time on "bonding" is a waste of time so I will probably start earlier.

So that's it for now. I will try to give you as much information as possible through blog posts in the future so everyone can easily follow the progress.