rocket

From: James D.
Sent on: Wednesday, August 25, 2010 3:18 AM
Hey Everyone -

James Dennis here. I felt as though I let the community down with my shoddy presentation tonight so I have decided to write about Rocket and share my thinking in a more cohesive form. It's going to be a somewhat long email so no worries if it's too long for anyone. I feel the need to properly explain the system to the community, especially after finding something quite similar implemented in xmlrpclib. Thanks to David Christian for showing me that.


First, an overview

When I got into Python, I was attempting to earn money programming and wanted to do so using Python. I enjoy building things, so I was eager to build ways of communicating with Twitter and Facebook, etc. I found myself implementing clients for remote APIs frequently and doing the whole thing by hand each time. Perhaps I could've downloaded a library, but I was teaching myself Git and didn't really know how to work my way around the world of Python yet. Excuse, excuse... Anyway, I ended up writing this a lot.

>>> import urllib
>>> import simplejson
>>> results_json = urllib.urlopen(query_url).read()
>>> results = simplejson.loads(results_json)
>>> results['results'][0]
{'iso_language_code': 'en', 'text': "Every time I talk about Rocket, I find out this work was done for XML. In fact, Python's xmlrpc seems to have extremely similar code.", 'created_at': 'Wed, 25 Aug[masked]:43:03 +0000', 'profile_image_url': 'http://a0.twimg.com/profile_images/435479248/Picture_1_normal.png', 'source': '<a href="http://twitter.com" rel="nofollow">Tweetie for Mac</a>', 'from_user': 'j2labs', 'from_user_id':[masked], 'to_user_id': None, 'geo': None, 'id':[masked], 'metadata': {'result_type': 'recent'}}
>>>

This got old quickly.

I found myself implementing many client-side APIs for systems that didn't have a full Python solution: Event Brite, Machine Translation services like Moses or Apertium (I dabbled with Machine Translation in the past), and so on. I started to get curious as to how I could save time. I didn't really want to muck about with how I handle communicating with services more than once. So when I needed Event Brite for a project, using Python, I implemented pyventbrite. Pyventbrite gets its inspiration from pyfacebook, written by Samuel Cormier-Iijima and many others. This is where I first came across IDLs in Python. Some people say IDL stands for Interface Description Language. I have heard college professors say Interface Domain Language. Wikipedia suggests that Interface Definition Language is also used. Anyway, pyfacebook is where I first saw Rocket's proxy function technique.


The time came for me to implement the Event Brite API so I stripped down pyfacebook to just the core of what I'd need for a json system and pyventbrite is where I ended up. The Facebook API had a lot of stuff in it that was unnecessary for me, so I removed as much as I could. I really just wanted a way to generate the functionality for reaching out to a remote host as quickly as possible while providing some security over whether or not my inputs made sense. I then had to implement the Sailthru API for Python but there didn't seem to be a Python driver around. Here we go again...

After making some modifications to pyventbrite, splitting the core of its functionality into a separate module and finally arriving at an implementation of Sailthru, I had the first draft of Rocket.

A quick refresher on __call__

To recall, Python's __call__() is the method that gets invoked when you call an object like a function. Defining __call__() makes an object callable. Let's take a quick look.

>>> class SomeClass(object):
...     def foo(self):
...         print 'foo'
...     def __call__(self):
...         print 'You called __call__()'
...
>>> sc = SomeClass()
>>> sc.foo()
foo
>>> sc()
You called __call__()
>>>

You can call an instance of SomeClass and it will be treated like a function. This concept is core to Rocket because instances of the Proxy class are called like functions. We'll get to what that means in a second.

Looking from the top down


A user of pysailthru, which is a module for Sailthru that implements Rocket, would write code like this:

>>> api_key = ''
>>> api_secret_key = ''
>>> email = '[address removed]'
>>>
>>> sailthru = Sailthru(api_key, api_secret_key)
>>> email = sailthru.email.get(email)

An implementer of pysailthru has some work to do, but not too much. Look at lines[masked] in pysailthru (link below) and you will see the IDL for pysailthru. The IDL is most of the work. That's how I feel it should be for someone to implement an API.


Proxies

The IDL for pysailthru starts with the following:

FUNCTIONS = {
    'email': {
        'get': [
            ('email', str, []),
        ]
...

That's the same function I called above. So how does this dictionary of dictionaries of lists get turned into something callable? We generate some code! This is why pysailthru calls rocket.generate_proxies() immediately after defining the IDL, on lines 111-113.

rocket.generate_proxies(FUNCTIONS,
                        _get_api_docstring,
                        foreign_globals=globals())

_get_api_docstring is a function that attempts to build a URL that maps to the remote function's documentation. In the case of Sailthru, the documentation for email.get exists at http://docs.sailthru.com/api/email#get-mode, so we can easily supply the URL of the documentation by inserting the namespace and the method into the URL. The implementation of _get_api_docstring on line 27 will show that.
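As a rough sketch of that idea (not the actual pysailthru code), a docstring builder could look like the following. The name get_api_docstring, its two-argument signature and the DOCS_URL template are assumptions made for illustration. Only the email.get URL itself comes from the text above.

```python
# Hypothetical sketch: the real _get_api_docstring in pysailthru may differ.
# The template is inferred from the single documented example (email.get).
DOCS_URL = 'http://docs.sailthru.com/api/%s#%s-mode'


def get_api_docstring(namespace, function):
    """Build a docstring that points at the remote function's documentation."""
    return 'Sailthru call. See %s' % (DOCS_URL % (namespace, function))


docstring = get_api_docstring('email', 'get')
```

The returned string is what ends up as the first line of the generated function's body.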


Anyway, back to the Proxies. The Proxy objects in Rocket are defined (potentially confusing elements removed) like this:

class Proxy(object):
    """Represents a namespace of API calls."""

    def __init__(self, client, name, gen_namespace_pair=gen_namespace_pair):
        self._client = client
        self._name = name

    def __call__(self, method=None, args=None, add_session_args=True):
        return self._client('%s.%s' % (self._name, method), args)

Let's take a look at that last line in __call__. If a method is passed in, we call _client with two arguments: a string and args. _client is actually our Sailthru object, so we are calling Sailthru.__call__() when we call _client like a function! Remember this?
sailthru = Sailthru(api_key, api_secret_key)
The _client call is actually like: Sailthru.__call__('sailthru.email.get', args).

Sailthru.__call__() is actually Rocket's __call__, which is defined on lines 289-330. This function takes 'sailthru.email.get', splits it into 'sailthru', 'email' and 'get' and then starts constructing the actual web request. It then sends the web request, parses the output and gives it back to the user as a native Python structure.
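To make the hand-off concrete, here is a minimal sketch with no network access. FakeClient and split_method are hypothetical stand-ins for the Rocket subclass and its string handling. The real Rocket.__call__ builds and sends a web request instead of returning a dictionary.

```python
def split_method(full_name):
    """Split 'client.namespace.function' into its three parts."""
    client, namespace, function = full_name.split('.')
    return client, namespace, function


class FakeClient(object):
    """Stand-in for a Rocket subclass: describes the request instead of sending it."""

    def __call__(self, method, args):
        client, namespace, function = split_method(method)
        return {'namespace': namespace,
                'http_method': function.upper(),
                'args': args}


class Proxy(object):
    """Simplified copy of the Proxy class shown above."""

    def __init__(self, client, name):
        self._client = client
        self._name = name

    def __call__(self, method=None, args=None):
        # Calling the client object invokes FakeClient.__call__.
        return self._client('%s.%s' % (self._name, method), args)


email = Proxy(FakeClient(), 'sailthru.email')
request = email('get', {'email': 'someone@example.com'})
```

Calling the proxy forwards 'sailthru.email.get' to the client, which is the same chain described above with the web request swapped for a plain dictionary.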

But wait a sec... How did the arguments get here? What called Proxy.__call__ or Proxy()? That's coming in a moment...


A series of __call__()'s

In Rocket's __init__ function, you will see the following lines:

for namespace in self.function_list:
    (ns_name, ns_title) = self.gen_namespace_pair(namespace)
    self.__dict__[ns_name] = eval('%sProxy(self, \'%s\')'
                                  % (ns_title,
                                     '%s.%s' % (client, ns_name)))

After the modules are in Rocket's namespace, it instantiates them and attaches them to the Rocket instance. In this case, the Rocket instance is our Sailthru object, which subclasses Rocket.

>>> sailthru = Sailthru('', '')
KeyboardInterrupt
>>> sailthru
<pysailthru.Sailthru object at 0x1011d47d0>
>>> sailthru.email
<rocket.rocket.EmailProxy object at 0x1011d4d50>
>>> sailthru.email.get
<bound method EmailProxy.get of <rocket.rocket.EmailProxy object at 0x1011d4d50>>
>>>
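The attachment step can be sketched in isolation. This version looks classes up in a plain dictionary and uses setattr, where Rocket itself uses eval against the module globals, so treat it as a variation and not as Rocket's code. PROXY_CLASSES, Client and client_name are hypothetical names.

```python
class Proxy(object):
    def __init__(self, client, name):
        self._client = client
        self._name = name


class EmailProxy(Proxy):
    pass


class SendProxy(Proxy):
    pass


# Hypothetical registry standing in for the globals that Rocket evals against.
PROXY_CLASSES = {'EmailProxy': EmailProxy, 'SendProxy': SendProxy}


class Client(object):
    client_name = 'sailthru'
    function_list = ['email', 'send']

    def __init__(self):
        for ns_name in self.function_list:
            proxy_class = PROXY_CLASSES['%sProxy' % ns_name.title()]
            # Same effect as self.__dict__[ns_name] = eval('EmailProxy(self, ...)')
            setattr(self, ns_name,
                    proxy_class(self, '%s.%s' % (self.client_name, ns_name)))


client = Client()
```

After construction, client.email and client.send are proxy instances that hold a reference back to the client, matching the REPL session above.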

So, we see the generated Proxy objects are attached to the client in Rocket, but how did we get EmailProxy? That happened in generate_proxies, which was called when the module was loaded but before Sailthru was instantiated.

generate_proxies()

When pysailthru is first loaded, we get EmailProxy as a result of pysailthru calling generate_proxies() in its loading process. The call loops over the namespaces (email, send, blast, ...), which requires another loop over the functions (HTTP methods) and then a final loop over each argument for each function. Code is generated that looks like this, for email.get:

def get(self, email):
    """Sailthru call. See http://docs.sailthru.com/api/email#get-mode"""
    args = {}
    args['email'] = email
    return self('get', args)

email.get() has no optional values and simply takes a string for its argument. A more complicated function might look like this:

def post(self, email, vars=None, lists=None, templates=None):
    args = {}
    args['email'] = email
    if isinstance(vars, list) or isinstance(vars, dict): vars = json_encode(vars)
    if vars is not None: args['vars'] = vars
    if isinstance(lists, list) or isinstance(lists, dict): lists = json_encode(lists)
    if lists is not None: args['lists'] = lists
    if isinstance(templates, list) or isinstance(templates, dict): templates = json_encode(templates)
    if templates is not None: args['templates'] = templates
    return self('post', args)

In this function (email.post) we see special handling for vars, lists and templates. First, they are listed as keyword arguments to post(). This is because they are all defined as 'optional' in the IDL. We also see that they might come in the form of a list or a dict, in which case they are JSON-encoded. In the cases where a value is not provided, we simply don't add it to the args dictionary. That dictionary is the data we intend to send along to email.post.

The body of the function, according to the inner-most loop, is a list of lines of code called 'body'. The first line is args = {}. We see this on line 159, before the loop around a function's arguments begins. It then loops over the arguments to fill in their behavior.

The first check looks at the argument's options, such as whether the argument is optional or has a default value. This list can be empty, but for optional arguments it will contain the string 'optional', as in ['optional']. It might also contain a tuple whose first element is 'default' and whose second element is the argument's default value, as in [('default', 1)]. The second check is for the param type. If the param type is rocket.json, we know to JSON-encode the structure given. The third check is simply for whether or not the value is optional.
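A minimal sketch of those per-argument checks might look like this. body_lines_for and the JSON sentinel (standing in for rocket.json) are hypothetical, and default values are left out to keep it short. The real loop in generate_proxies is more involved.

```python
# Sentinel standing in for rocket.json; an assumption for illustration only.
JSON = object()


def body_lines_for(name, param_type, options):
    """Return the generated source lines that handle one IDL argument."""
    lines = []
    optional = 'optional' in options
    if param_type is JSON:
        # JSON-typed arguments are encoded when given as a list or dict.
        lines.append('if isinstance(%s, (list, dict)): %s = json_encode(%s)'
                     % (name, name, name))
    if optional:
        # Optional arguments are only sent when the caller supplied them.
        lines.append("if %s is not None: args['%s'] = %s" % (name, name, name))
    else:
        lines.append("args['%s'] = %s" % (name, name))
    return lines


required_lines = body_lines_for('email', str, [])
optional_lines = body_lines_for('vars', JSON, ['optional'])
```

Each argument tuple from the IDL contributes one or two lines, which are appended to the 'body' list after args = {}.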

After the body is written, we have a list that represents the function body, but it doesn't have a definition line yet. First, we insert the generated doc string at the beginning of this list (see _get_api_docstring for details). Then we insert the function's definition line (def get(...)). The function's definition is essentially a collection of argument names that are required and then some keyword arguments representing the optional arguments in the IDL. vars, lists and templates were all optional for email.post, so we see them passed as vars=None, etc.

Once the loop over a namespace's functions is complete, Rocket then calls exec on the function body to instantiate it. I think this particular portion of the code could be made neater, but it currently calls eval() with the assumed method name (get for email.get) and puts the output into a list of function instances that it uses at the next layer of the loop. When it finally completes a loop iteration for the namespace (the outermost loop at this point), the generated code is instantiated as a new type called NamespaceProxy (really, this is EmailProxy).
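The exec step can be tried on its own. This sketch executes the generated source in a scratch dictionary and reads the function back by name, which differs slightly from Rocket's exec-then-eval approach. Recorder is a hypothetical stand-in for a proxy instance.

```python
source_lines = [
    'def get(self, email):',
    '    """Sailthru call. See http://docs.sailthru.com/api/email#get-mode"""',
    '    args = {}',
    "    args['email'] = email",
    "    return self('get', args)",
]

# Execute the generated source in a scratch namespace, then fetch the function.
scratch = {}
exec('\n'.join(source_lines), scratch)
get = scratch['get']


class Recorder(object):
    """Callable stand-in for a proxy: returns what it was called with."""

    def __call__(self, method, args):
        return (method, args)


result = get(Recorder(), 'someone@example.com')
```

The function object in scratch['get'] is what gets collected into the methods for the namespace's proxy class.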

Namespaces

We see the following lines at the end of generate_proxies().

proxy = type('%sProxy' % ns_title, (Proxy, ), methods)
globals()[proxy.__name__] = proxy
foreign_globals[proxy.__name__] = proxy

This is a less common use of type(). Called with one argument, type() tells you the type of some object. Called with three arguments (a name, a tuple of base classes and a dictionary of members), type() acts as a class constructor.
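Here is a small, self-contained illustration of the three-argument form. The Proxy class is a trimmed stand-in and the get function is written by hand here, whereas Rocket generates it.

```python
class Proxy(object):
    """Minimal stand-in for Rocket's Proxy base class."""

    def __init__(self, client, name):
        self._client = client
        self._name = name


def get(self, email):
    # Hand-written here; in Rocket this function is generated from the IDL.
    return (self._name, 'get', email)


ns_title = 'email'.title()   # 'Email'
methods = {'get': get}

# type(name, bases, members) builds a new class at runtime.
EmailProxy = type('%sProxy' % ns_title, (Proxy,), methods)

proxy = EmailProxy(None, 'sailthru.email')
called = proxy.get('someone@example.com')
```

The resulting class behaves just like one written with a class statement: it inherits __init__ from Proxy and carries get as a regular method.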


Now, let's recall that Proxy is a class created to give us __call__, and this Proxy's __call__ then calls Rocket.__call__ which then makes the actual web connection for us. We saw this definition above, but I will copy the important parts here too:

class Proxy(object):
    def __call__(self, method=None, args=None, add_session_args=True):
        return self._client('%s.%s' % (self._name, method), args)

The call to type() then creates a class named '%sProxy' % ns_title, where ns_title is the namespace with .title() applied, giving us EmailProxy. EmailProxy subclasses the classes in the tuple (Proxy, ) and takes the entries of methods as its members. This gives us an EmailProxy with get and post defined, but it's not available to code until we put it in the namespace.

It might be controversial to require globals() from a module implementing Rocket, but that is how it is designed as of now. After the Proxy object is put into the namespace of pysailthru (foreign_globals[proxy.__name__] = proxy), the __init__ function for Sailthru can then instantiate all of the proxy objects, even though they were constructed inside Rocket's namespace.

Remember the code in Rocket's __init__ that looked like below?

for namespace in self.function_list:
    (ns_name, ns_title) = self.gen_namespace_pair(namespace)
    self.__dict__[ns_name] = eval('%sProxy(self, \'%s\')'
                                  % (ns_title,
                                     '%s.%s' % (client, ns_name)))

Once this part is complete, the Sailthru object has a member called email, an instance of EmailProxy, which has the functions 'get()' and 'post()'.

Before we're done, let's jump back to the definition for:

def get(self, email):
    """Sailthru call. See http://docs.sailthru.com/api/email#get-mode"""
    args = {}
    args['email'] = email
    return self('get', args)

That last line is a __call__(). So the chain of __call__()'s is finally complete.

Quick recap

1) The IDL is defined.
2) The module implementing an API with Rocket generates the proxy objects from the IDL
3) These objects, called NamespaceProxy (EmailProxy, for example), each have get(), post(), delete(), put(), etc. defined.
4) The Proxy objects exist in the global namespace.
5) When a subclass of Rocket is instantiated, the Proxy objects are attached to the subclass
6) subclass.namespace.method() is now callable.
7) subclass.namespace.method() is a function that validates inputs and then calls subclass.namespace('method', args)
8) subclass.namespace('method', args) then calls subclass('subclass.namespace.method', args)
9) subclass('subclass.namespace.method', args) calls out to the Internet to complete the request.
10) Rocket parses the returned result and returns a native python structure representing the response

An implementation of a Rocket module

So, we're almost done. The general logic is covered for the code generation techniques but an implementer shouldn't have to care about that stuff. Implementers care about 1) obtaining the IDL, whether by coding it or receiving it from a remote source, and 2) implementing any protocol-specific stuff like request signatures or how API URLs are constructed.

In the case of Sailthru, the URL for email.get is http://api.sailthru.com/email. We know the namespace is email, so we just put the namespace there in the gen_query_url callback. See line 169 in pysailthru.

Each request must be signed for Sailthru to accept it, so build_query_args is essentially a way of passing sign_sorted_args as the signing algorithm. See utils.py from Rocket for argument signing functions. There's some more stuff in there, but it's not full featured yet.

check_error implements Sailthru's error response, which is to send json with the key 'error' in it.

So, to recap quickly for implementers, you probably only need to define the IDL, gen_query_url, build_query_args and a function that can handle errors from the server called check_error.
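To give a feel for the size of that job, here is a hedged sketch of the three hooks as standalone functions. The signatures, the ApiError class, the 'api_key' and 'sig' parameter names and the MD5-over-sorted-values signing scheme are all assumptions for illustration. Check pysailthru and Rocket's utils.py for the real versions.

```python
import hashlib
import json

API_HOST = 'http://api.sailthru.com'


class ApiError(Exception):
    """Hypothetical exception raised when the server reports an error."""


def gen_query_url(namespace):
    """The namespace is the last path segment, e.g. .../email."""
    return '%s/%s' % (API_HOST, namespace)


def sign_sorted_args(args, secret):
    """Illustrative signing only: hash the secret plus the sorted values."""
    values = sorted(str(value) for value in args.values())
    payload = secret + ''.join(values)
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


def build_query_args(args, api_key, secret):
    """Add the key and a signature to a copy of the call's arguments."""
    signed = dict(args)
    signed['api_key'] = api_key
    signed['sig'] = sign_sorted_args(signed, secret)
    return signed


def check_error(response_text):
    """Parse the JSON response and raise if it carries an 'error' key."""
    response = json.loads(response_text)
    if isinstance(response, dict) and 'error' in response:
        raise ApiError(response)
    return response


url = gen_query_url('email')
query = build_query_args({'email': 'someone@example.com'}, 'key', 'secret')
```

Everything else (proxy generation, the chain of __call__()s, sending the request) would stay inside Rocket.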

Fin.

I am very interested in hearing about improvements, thoughts, criticisms, etc. Passing globals() seems nasty, but I don't have a clean solution here yet.

Anyway,
James (@j2labs)
