12/18/2014

An Architecture for Real-Time Geo-tracking with Python, Celery, RabbitMQ, and More


If you look at a map online and someone asks what you see, you will most likely answer, “a map.”
That’s not the answer you would get from Ragi Burham. He sees the underlying geometry, vectors, styling, raster data, GUI controls, application logic, and data store. That’s because Burham is an expert in the area of mapping solutions, and his background includes over a decade of geo-related work with organizations like ESRI, NIMA, National Geographic, Navteq, TeleAtlas, and his own company, AmigoCloud.
At Pivotal, we get excited about people like Ragi Burham—an open source minded software engineer with a new way of doing things. At PyCon in March of 2013, Burham shared a talk titled, “Realtime Tracking and Mapping of Geographic Objects using Python”—it’s another way to look at geography-based big, fast data architectures.
Below, we recap the “Normal” and “Real-time” architectures that Burham shared in his talk. These two approaches use OpenLayers, Varnish, Nginx, Gunicorn, Django, TileStache, Memcached, Mapnik, PostgreSQL, PostGIS, Socket.io, Node.js, Celery, and RabbitMQ. Much like our recent post on scaling social media with Celery and RabbitMQ, Burham adds these two components along with Socket.io and Node.js to help his real-time architecture work in a fundamentally different way and scale.
If you have 26 minutes, you can also view the video below.

A “NORMAL” GEO-BASED ARCHITECTURE
THE CHALLENGES WITH REAL-TIME GEO-BASED ARCHITECTURES
Beyond an explanation of underlying geometry, vectors, styling, raster data, and architecture, Burham covered some of the use cases and requirements driving the specifics of what geo-based systems do: determining the amount of information on the map, omitting certain types of mapping objects, making calculations like speed or direction, alerting that something has crossed a boundary, and more.
He began outlining his “Normal” GeoStack architecture by listing the most popular open source data sets and covering some of the pros and cons of OpenStreetMap, NAIP, SRTM, and Natural Earth Data. With the data sources covered, he filled in the details of the architecture. This included a client with OpenLayers, a JavaScript library that helps with rendering maps (another option is Leaflet). Varnish was used in front of servers with the following components: Nginx, Gunicorn, TileStache, Memcached, and Mapnik. Then, a PostgreSQL database stores the map information with the PostGIS spatial database extension.
Because rendering styles can take a lot of time, Varnish is used to speed things up and can also be used as a reverse proxy. For those unfamiliar, TileStache is used to serve map tiles based on rendered geographic data, and the Mapnik toolkit is used for map development. Here is a screen shot of his normal architecture:
[Figure: “Normal” GeoStack architecture]
One of the key challenges with this type of architecture arises when you need to get state from the database. In many real-time tracking scenarios, clients end up polling. This presents a problem when, for example, you have thousands of vehicles confirming state every second. Polling the database at this rate can create an overwhelming and unnecessary amount of traffic, and it doesn’t scale.
But, what if the server could notify a client? To evolve the architecture, Burham proposes four additional components—Node.js with Socket.io and Celery with RabbitMQ.
[Figure: real-time GeoStack architecture]
By adding Socket.io to the client, we get a way to use websockets across browsers and allow data to be transported bi-directionally. We can send information from the server back to the client specifically and only when it is needed. Varnish changes purpose slightly in this architecture: it still does the HTTP caching, but now it also acts as a reverse proxy. A few lines of Varnish configuration let you deal with traffic differently. For example, if websocket traffic comes in, you can tell it to go to server A, and, if other types of traffic come in, use server B. In this case, Node.js is connected with Varnish and offers a way to push or publish notifications back out to clients.
Now, we need a way to take an incoming client’s geo-position update, asynchronously decide if some alert should happen, and distribute a notification back out to only certain clients. This is where we add Celery and RabbitMQ to the architecture. With these two components, we can support asynchronous tasks and queues. If you aren’t familiar with Celery, it is a task queue based on distributed message passing. Your code can call Celery tasks to execute concurrently and operate in either asynchronous or synchronous modes. This is great for geographic use cases like crossing a boundary or fence, when something is entering or leaving a polygon. You don’t need to do the calculation at the moment an update arrives; you can receive a task, make a calculation, and decide to notify others later. With the addition of RabbitMQ (Celery’s default broker), we then transform these tasks and calculations into a Pub/Sub channel where Node.js is listening. Whenever an appropriate event is triggered, we can push a message down the channel, and Node.js forwards the information back to the appropriate clients using Socket.io.
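As a rough illustration of the boundary-crossing check described above, the sketch below tests whether a position lies inside a polygon (ray casting) and compares the result with the previous state to decide whether an event should be emitted. The names point_in_polygon and fence_event and the event strings are hypothetical and do not come from the talk. In the architecture described here, logic like this would presumably run inside a Celery task, with the result published to the RabbitMQ channel; a production system would more likely delegate the geometry test to PostGIS.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fence_event(was_inside: bool, position: Point, fence: List[Point]) -> Optional[str]:
    """Return 'entered', 'left', or None if the fence state did not change."""
    is_inside = point_in_polygon(position, fence)
    if is_inside and not was_inside:
        return "entered"
    if was_inside and not is_inside:
        return "left"
    return None


# A square fence and three position updates for one tracked object.
fence = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(fence_event(False, (2.0, 2.0), fence))
print(fence_event(True, (5.0, 2.0), fence))
print(fence_event(True, (1.0, 1.0), fence))
```

Only the updates that return an event need to be pushed to clients; the others can be dropped without any notification.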

Flask performance optimization


Here are the performance optimization guidelines that I try to follow.

General

After transitioning from PHP to Python development, I was quite surprised by the lack of performance optimization tips and scaling advice in general. So with this post I will try to lay out the basic ideas of scaling a website from 10 to 50,000 users with the Flask framework.

Server Setup

It all starts with your server setup, and the server depends on your budget. There are several options that you can choose from:
  1. 4GB RAM 8 Core VPS from Linode ($80)
  2. 4GB RAM 2 Core VPS from DigitalOcean ($40)
  3. Or a dedicated server from any of hundreds of hosting companies.
But I would strongly advise taking something with >= 2GB RAM. This much RAM and a decent CPU will help you scale to hundreds of thousands of views per day (with the proper amount of caching, of course).

Frontend Reverse Proxy

Now that we have our server, we need to set up a frontend proxy. Your choice is between Apache, Lighttpd or Nginx. I would suggest using Nginx because it’s event-driven (which means an amazingly low memory footprint) and can serve 10,000 concurrent users with no problems at all.

Caching static pages with Nginx itself

Before a user even gets to your dynamic site, it’s usually a good idea to cache some pages with Nginx. For example, you may have a page with a list of posts that updates only once a day, or a comments page that updates only once a week. For such pages you can set up Nginx caching, so that the request won’t even reach the Python interpreter. A guide on how to set up the filesystem cache is here, or, if you want to set up Nginx + Memcached, there is a guide for that as well.

uWSGI and Gevent

Now that the user has reached the dynamic part of your application, we need to decide what kind of web server we want to use. There are different benchmarks here and here that praise different kinds of web servers. But I personally prefer uWSGI, because it’s fast and lightweight, and it offers many different CLI options. For example: auto-reload Python compiled files on every request (good for development).
Now we add some asynchronous IO and lightweight cooperative concurrency (greenlets) to it with Gevent, and boom, we have a very fast web server, able to handle thousands of concurrent connections.

Flask PyPy

If you’re not using any 3rd party extensions that are mainly written in C (like the MySQLdb driver for Python), you should consider trying PyPy for an extra performance boost. Flask is fully compatible with PyPy, although some patching will be required to use Gevent with PyPy (source).
Here are some benchmarks of CPython vs PyPy: here and here.

Flask-Cache

This should be the first extension that you install. It’s easy to setup and easy to use.
Example caching with decorator:
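# `cache` is the Flask-Cache instance created during app setup, e.g. cache = Cache(app).
@cache.cached(timeout=500, key_prefix='daily_posts')
def get_posts():
    # Runs only on a cache miss; the result is then kept for 500 seconds.
    posts = get_all_posts_with_multiple_joins_heavy_db()
    return [x.content for x in posts]

# Later calls within the timeout are served from the cache.
cached_posts = get_posts()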

Memcached/Redis

Another piece of advice I can give you is to store EVERYTHING that can be cached in memory. It doesn’t matter which system you prefer (although a horizontally scalable one would be nice): Memcached, Redis, Voldemort. The main point is: cache, cache, cache. Don’t do anything dynamic if you can get the results from cache.
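To make the “get it from cache if you can” idea concrete, here is a minimal sketch of the pattern using a small in-process dictionary with expiry times. The TTLCache class and load_posts_from_db function are hypothetical stand-ins; a real deployment would put Memcached or Redis behind the same kind of interface.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process stand-in for Memcached/Redis: key -> (expiry, value)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, ttl: float, compute: Callable[[], Any]) -> Any:
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()  # cache miss (or expired): do the work once
        self._store[key] = (now + ttl, value)
        return value


calls = 0


def load_posts_from_db():
    """Hypothetical expensive query; counts how often it really runs."""
    global calls
    calls += 1
    return ["post 1", "post 2"]


cache = TTLCache()
first = cache.get_or_compute("daily_posts", 500, load_posts_from_db)
second = cache.get_or_compute("daily_posts", 500, load_posts_from_db)
print(first == second, calls)
```

The second call never touches the “database”, which is the whole point: the dynamic work happens once per expiry window rather than once per request.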

Jinja2 template bytecode caching

With Jinja2 you have several options of caching bytecode of compiled templates:
  1. FileSystemBytecodeCache
  2. MemcachedBytecodeCache
Use them.

Consider using ElasticSearch

If you’re doing a lot of queries like this:
SELECT * FROM posts WHERE author='someauthor'
Then you should definitely consider replacing these kinds of queries with ElasticSearch. Here are some articles on ES performance: here and here
Although ES is not a database, you can feed data into it from your RDBMS and then serve the data to the user from it.
The main points are:
  1. ES is easily scalable. You can scale it from 1 to 10 servers “with just a click of a button”
  2. Very fast. Based on Apache Lucene, the full-text search capabilities are extremely fast.
  3. Easy to use. You can add and remove documents with requests to the REST API.
  4. Active development and big community.

Code Profiling and Optimization

Now we’ve come down to the last thing you must regularly do. Profile your application! Check how many queries you’re doing per page and how long the sorting takes. Maybe there are ways to optimize the nested for loops? There’s always something that can be rewritten.
Luckily, there’s a good profiling tool for Flask that makes it easy to find the bottlenecks of your application.
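As a small illustration of what profiling output looks like, the sketch below uses Python’s standard cProfile and pstats modules rather than the Flask-specific tool mentioned above. The functions slow_pairs and handle_request are made up for the example; handle_request merely stands in for a view function.

```python
import cProfile
import io
import pstats


def slow_pairs(n: int) -> int:
    """Deliberately quadratic: counts pairs (i, j) with i < j using nested loops."""
    count = 0
    for i in range(n):
        for j in range(n):
            if i < j:
                count += 1
    return count


def handle_request() -> int:
    """Hypothetical view body that calls the slow helper."""
    return slow_pairs(200)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Sort by cumulative time and show the most expensive calls.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and cumulative time, which is usually enough to spot a nested loop or a repeated query worth rewriting.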

Conclusion


I’m pretty sure everyone has their own “performance guidelines” that they usually follow with their apps. But these are mine, and I hope you found something useful here.

What is the technology stack behind Pinterest?

Paul Sciarra, I'm a co-founder.
We use python + heavily-modified Django at the application layer.  Tornado and (very selectively) node.js as web-servers.  Memcached and membase / redis for object- and logical-caching, respectively.  RabbitMQ as a message queue.  Nginx, HAproxy and Varnish for static-delivery and load-balancing.  Persistent data storage using MySQL.  MrJob on EMR for map-reduce.

Quora’s Technology Examined

Quora has taken the tech and entrepreneurial world by storm, providing a system that works so fluidly that it is sometimes hard to see what the big fuss is all about. This slick tool is powered, not only by an intelligent crowd of askers and answerers, but by a well-crafted backend created by co-founders who honed their skills at Facebook.
It is not surprising that, with all the smart people using this smart tool, there are many pondering how it works so well. The NoSQL boffins scratch their heads and ponder such questions as, “Why does Quora use MySQL as the data store rather than NoSQLs such as Cassandra, MongoDB, CouchDB, etc?”.
In this blog post I will delve into the snippets of information available on Quora and look at Quora from a technical perspective. What technical decisions have they made? What does their architecture look like? What languages and frameworks do they use? How do they make that search bar respond so quickly?

Components Of Quora

The general components that make up Quora are…
  • You can ask questions
  • You can answer questions (anonymously if you desire)
  • You can comment on answered questions
  • You can vote-up or vote-down answers to questions
  • Questions can be assigned to topics
  • You can write a post (an informative statement, rather like an orphaned answer or blog post)
  • You can follow questions, topics or other users
  • A super-fast auto-complete search-box at the top, which doubles as the method for entering new questions
The last point, the super-fast auto-complete search-box, is one of the defining features of Quora. You can see immediately, as you begin to enter a question, whether somebody else has already asked that question or if there is a topic or post on the subject. Let’s start there…

What’s Cooking Under That Hood?

Only the questions, topic labels, user names or post titles are indexed and served up to the search-box. There is no full-text search, so searching the content of questions and answers will not work. The text that is indexed is tokenized so that words in a different order will still be matched. Prefix matching enables best matches to be shown before the entire word is entered. For instance, typing “mi” might immediately show “Microsoft” in the results.
There is some simple stemming of words, since “nears” matches “near”, but “pony” does not match “ponies”. “Topic-aliases” allow for similar matches on topic names, such as “startup” and “start-up”. These topic-aliases have been manually entered by users. Otherwise these would not match.
If a duplicate question is redirected to another question (a feature of Quora), then that original question will still appear in the search results, since it increases the chances of a match. There is no n-gram indexing, so slight mis-spellings will not match. For instance, “gooogle” (with an extra “o”) finds nothing.
Previously, they used an open source search server called Sphinx. It supports the features described above, but they have since moved away from it due to real-time constraints. Their new solution is built in-house and gives them better prefix indexing and control over the matching algorithms. They built this in Python.
What libraries does Quora use for search?
Adam D’Angelo, Quora Founder (Nov 13, 2010)
Our search is custom-written. It doesn’t use any libraries aside from Thrift, and Python’s unicode library, which we use for unicode normalization.
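The tokenized, order-independent prefix matching described in this section can be sketched in a few lines of Python. This is only an illustration of the observed behaviour, not Quora’s implementation, which is not public; the PrefixIndex class and the sample titles are invented for the example, and it omits stemming and topic aliases.

```python
from collections import defaultdict
from typing import Dict, List, Set


class PrefixIndex:
    """Maps every prefix of every token to the set of titles containing that token."""

    def __init__(self) -> None:
        self._prefixes: Dict[str, Set[str]] = defaultdict(set)

    def add(self, title: str) -> None:
        for token in title.lower().split():
            for end in range(1, len(token) + 1):
                self._prefixes[token[:end]].add(title)

    def search(self, query: str) -> List[str]:
        tokens = query.lower().split()
        if not tokens:
            return []
        # Every query token must prefix-match some token of the title,
        # regardless of word order.
        matches = set(self._prefixes.get(tokens[0], set()))
        for token in tokens[1:]:
            matches &= self._prefixes.get(token, set())
        return sorted(matches)


index = PrefixIndex()
index.add("Microsoft")
index.add("Why does Quora use MySQL")
index.add("MySQL replication at Google")

print(index.search("mi"))           # prefix of a single token
print(index.search("mysql quora"))  # tokens in a different order
print(index.search("gooogle"))      # misspelling: no n-gram fallback
```

Storing every prefix trades memory for lookup speed: each keystroke becomes a dictionary lookup plus a set intersection, which fits the short titles and topic names that are indexed here.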

Speedy Queries

Did I mention that the search-box is fast? I did some tests and found the responses to be impressive. Queries are sent over AJAX as a GET request. Responses come back as JSON with the rendered HTML embedded inside the JSON. Rendering of the results on the server-side, as opposed to rendering them in JavaScript, seems to be due to the need to highlight matching words in the text. This is sometimes too complex for JavaScript. For instance, typing “categories” might highlight the word “category” in the result text.
I was seeing responses of roughly 50 milliseconds per query from my Linode machine. Quora does not short-change you when sending requests. From within the browser, I found typing “Microsoft” (9 characters) would result in nine requests to the Quora search server, no matter how fast you type. As you will see later, the server is in control, so if it did become over-loaded, then it could update the results less frequently without changing the JavaScript.
Quora uses persistent connections. An HTTP connection is established with the server when you start typing the search query. This connection is kept open and further requests are made on this same open connection. The connection will terminate (time out) if not used for 60 seconds. If a connection times out, then a new connection is established when typing begins.
To simulate the typing of a word into the search-box, I sent the following requests, character-by-character, across a persistent connection. For instance, “butler” is six requests (“b”, “bu”, “but” … “butler”).
"butler" (6 chars): duration 0.393 secs, 0.065 secs per query
"butler monkeys" (14 chars): duration 0.672 secs, 0.048 secs per query
"fasdisajfosdffsa" (16 chars): duration 0.746 secs, 0.046 secs per query
That last query was used to test if there was a slow-down for a word that would obviously not be in a caching layer. I saw no slow-down. This means that either they are not caching, caching is only used to take the load off the backend search engine, or they are doing something smarter (e.g. if there is no match for “fasd” then there will be no match for “fasdi”, so abort).
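The measurement procedure can be sketched roughly as follows. The original test script is not shown in the post, so the helper names prefixes and time_typing are assumptions, and fake_send is a placeholder for the real HTTP request over a persistent connection.

```python
import time
from typing import Callable, List, Tuple


def prefixes(text: str) -> List[str]:
    """'but' -> ['b', 'bu', 'but']: one query per typed character."""
    return [text[:end] for end in range(1, len(text) + 1)]


def time_typing(text: str, send: Callable[[str], object]) -> Tuple[int, float, float]:
    """Send every prefix through `send`; return (queries, total secs, secs per query)."""
    queries = prefixes(text)
    start = time.perf_counter()
    for query in queries:
        send(query)
    total = time.perf_counter() - start
    return len(queries), total, total / len(queries)


def fake_send(query: str) -> str:
    """Placeholder for the real search request, which is not part of this sketch."""
    return query.upper()


count, total, per_query = time_typing("butler", fake_send)
print(count, prefixes("butler"))
```

Swapping fake_send for a function that issues the GET request on a kept-alive connection would reproduce the kind of per-query figures listed above.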
Is Quora going to implement full-text search?
Adam D’Angelo, I made a lot of the early Quora … (Sep 1, 2010)
Yes, eventually. We haven’t implemented this yet because we’ve prioritized other things, but we will definitely do it in the future.

Webnode2 And LiveNode

Webnode2 and LiveNode are some of Quora’s internal systems, which were built for managing the content. Webnode2 generates HTML, CSS and JavaScript and is tightly coupled with LiveNode, which is responsible for managing the display of the content on the webpage. Charlie Cheever says that if he were to start a similar project without LiveNode, then the first thing he would do is rebuild it.
They seem very pleased with the technology they have built and struggled to find its weaknesses. One weakness is that it is tricky for LiveNode to keep track of what is happening within the browser as it pushes changes from the server. If users A and B are viewing the same question, then one user’s interactions will affect the other. For instance, if user A up-votes an answer then that answer will be promoted and will visibly move up the page. This display change will be pushed over AJAX to user B’s browser. Any prior browser-side change that user B made, such as expanding a comments section, might be lost.
LiveNode is written in Python, C++, and JavaScript. jQuery and Cython are also used.
While they would like to open-source LiveNode and have tried to keep code separation, doing so right now would be too much work and would take time away from their main goal, which is making Quora better.

Amazon Web Services

Amazon EC2 and S3 are used for their hosting. While this is not as cost-effective in the long-term as running your own servers, it is well suited to fast growing companies like Quora.

Ubuntu Linux

Quora uses Ubuntu Linux as its OS of choice. No major surprises there. It is easy to deploy and manage on Amazon EC2. Adam D’Angelo points out that he used Debian Linux at high school and college and stuck with it because “it works and there hasn’t been a compelling reason to switch”.

Static Content

You only need to look at the source HTML of any Quora webpage to see that they are using Amazon’s distributed content delivery network, CloudFront. URLs are in the form…
http://d2o7bfz2il9cb7.cloudfront.net/main-thumb-670336-25-7kmigSSkkdusoE6gHRkdQsXfjuTCaxQs.jpeg
CloudFront is used for all static images, CSS and JavaScript (except for Google’s Analytics JavaScript, which is hosted by Google). Images are uploaded to the EC2 machine, then resized and uploaded to S3. This is managed using the Python S3 API.

HAProxy Load-Balancing

Quora uses HAProxy at the front-line, which load-balances onto the distributed Nginx servers behind it.

Nginx

Behind the load-balancer, Nginx is used as a reverse-proxy server onto the web-servers.
To understand more about this setup I recommend reading “Using Nginx As Reverse-Proxy Server On High-Loaded Sites“.

Pylons And Paste

Pylons, a lightweight web framework, is used as their main web-server behind Nginx. They use the default Pylons + Paste stack.
Pylons was selected much like you would select a pumpkin for Halloween. They scooped out the insides, such as templates and the ORM, and replaced them with their own technology, written in Python. This is where LiveNode and webnode2 reside.
MochiMedia was also one of the inspirations for using Pylons, since they were using it themselves.

Python

Coming from Facebook, it was a good bet that Charlie and Adam would choose PHP for their development language. As Adam points out, “Facebook is stuck on that for legacy reasons, not because it is the best choice right now“. From this experience they knew that choosing technologies, especially programming languages, for the long-run was very important. They also looked at C#, Java, and Scala. Discounting Mono, C# would be a choice of more than just the language. It would require them to build on-top of a Microsoft stack. Python won over Java because it is more expressive and quicker to write code than Java. Scala was too new. Adam mentions speed and the lack of type-checking as drawbacks with Python, but they both already knew the language reasonably well. Where Python lacks speed for performance critical backend components, they opt to write them in C++. They saw Ruby as a close match to Python, but their experience with Python and lack of experience in Ruby, made Python the winner. Python 2.6, to be precise.
Additional benefits of using Python are data-structures that map well to JSON, code readability, a large collection of libraries, and the availability of good debuggers and reloaders. Browser-server communication using JSON is a major component of what Quora does, so this was an important factor.
No IDEs are used for development as most use the Emacs text editor. Obviously this is personal choice, and would change as the team grows.
PyPy, a project that aims to produce a flexible and fast Python implementation, was also mentioned as something that might give them a speed-boost.

Thrift

Thrift is used for communications between backend systems. The Thrift service is written in C++.
Why would you write a Thrift service in C++?
Adam D’Angelo, I’ve written a lot of Python, in… (Sep 4, 2010)
Mainly if you want to keep data in memory between requests, and want to keep your Python code stateless. Writing a Python wrapper around a C library involves some memory management with reference counting that requires some understanding of the Python internals, but writing a thrift interface is simple. You also isolate failures this way – if the service goes down it won’t take the Python code down with it.

Tornado

The Tornado web framework is used for live updating. This is their Comet server, which handles the large volumes of open connections used for long-polling and pushes updates to the browsers.

Long Polling (Comet)

Quora does not display just static web pages. Each page will update with new content as questions, answers and comments are submitted by other users. As Adam D’Angelo points out, one of the best ways to do this currently is with “long polling”. This is different to “polling”. With polling the browser will repeatedly send requests to the server saying “Any updates?” and the server will respond with “No”. A few seconds later it will ask again, “How about now?”. “No”. “How about now?”, “No, already!”. This puts the client (web-browser) in the driver’s seat. This is backwards because the client does not know how long to wait before asking again. If the client asks the server too frequently then it will unduly overload the server. If it pings the server too infrequently, then the server will be sitting on updates while it waits for the client to request them and the end-user will not see updates immediately.
Long polling, also known as Comet, puts the server in control, by making the client wait for responses. The conversation between the client and server is the same, but instead of the client waiting before making another request, the server waits before it makes the response. The server can keep the connection open for a long period of time (e.g. 60 seconds) while it waits to see if any updates come in. When updates do come in it can respond immediately to the client. On receiving the update, the client then immediately sends a new request for more updates. The server, once again, delays responding until it knows something worth telling the client or enough time has passed that it would be rude not to respond.
The benefit to long-polling is that there is less back-and-forth between the client and server. The server is in control of the timing, so updates to the browser can be made within milliseconds. This makes it ideal for chat applications or applications that want really snappy updates for their users.
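The server-side half of long polling, holding a request until there is something to say or a timeout passes, can be sketched with a condition variable. This is an in-process illustration only; the LongPollChannel class is hypothetical, and Quora’s Tornado-based Comet server is not shown in the source. A real server would do the waiting inside an HTTP handler.

```python
import threading
from typing import List, Optional


class LongPollChannel:
    """Holds a request open until an update arrives or the timeout expires."""

    def __init__(self) -> None:
        self._updates: List[str] = []
        self._cond = threading.Condition()

    def publish(self, update: str) -> None:
        with self._cond:
            self._updates.append(update)
            self._cond.notify_all()  # wake any waiting requests

    def wait_for_update(self, seen: int, timeout: float) -> Optional[List[str]]:
        """Return the updates after index `seen`, or None if the wait timed out."""
        with self._cond:
            arrived = self._cond.wait_for(lambda: len(self._updates) > seen, timeout)
            return self._updates[seen:] if arrived else None


channel = LongPollChannel()

# Nothing published yet: the "server" stalls briefly, then answers with no news.
empty = channel.wait_for_update(seen=0, timeout=0.05)

# Publish from another thread while a request is waiting.
threading.Timer(0.05, channel.publish, args=("new answer",)).start()
fresh = channel.wait_for_update(seen=0, timeout=2.0)
print(empty, fresh)
```

The second wait returns as soon as the update is published rather than after the full two seconds, which is the property that makes long polling feel immediate to the user.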
The down-side is that you are going to have lots of open connections between the clients and your servers. If you have a million users (Quora will soon) and, if only 10% of them are online on your site, you will need an architecture that can hold open at least 100,000 concurrent connections. This assumes they only have one tab open to your site. Right now I have 7 tabs open for quora.com in my browser. Each tab usually has multiple connections open to quora.com. In short, Quora must maintain a lot of open connections.
The good news is that there are technologies specifically designed for this. It costs very little to hold open connections in memory if you free up all the resources used for that connection. For instance, Nginx (Quora uses this for proxying requests) is a single-threaded event-based application and uses very little memory for each connection. Each Nginx process is actively dealing with only one connection at a time. This means it can scale to tens of thousands of concurrent connections.
How do you push messages back to a web-browser client through AJAX? Is there any way to do this without having the client constantly polling the server for updates?
Adam D’Angelo, Quora (Sep 29, 2010)
There is no reliable way to do this without having the client polling the server. However, you can make the server stall its responses (50 seconds is a safe bet) and then complete them when a message is ready for the client. This is called “long polling” and it’s how Quora, Gmail, Meebo, etc all handle the problem.
If you have a specialized server that uses epoll or kqueue, you should be able to hold on the order of 100k users per server (depending on how many messages are going). This is called the “c10k” problem. http://www.kegel.com/c10k.html
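To show why an epoll- or kqueue-based server can hold many idle connections cheaply, here is a small sketch using Python’s selectors module, which picks epoll or kqueue where available. It uses local socket pairs instead of real network clients, and the numbers (50 connections, two active senders) are arbitrary choices for the example.

```python
import selectors
import socket

selector = selectors.DefaultSelector()  # epoll, kqueue, or a portable fallback
pairs = []

# Open 50 idle "client" connections; each costs a registration, not a thread.
for client_id in range(50):
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    selector.register(server_side, selectors.EVENT_READ, data=client_id)
    pairs.append((server_side, client_side))

# Only two clients actually send something.
pairs[3][1].sendall(b"ping")
pairs[41][1].sendall(b"ping")

# One call reports just the connections that are ready to be read.
ready = sorted(key.data for key, _ in selector.select(timeout=1.0))
print(ready)

# Clean up all sockets.
for server_side, client_side in pairs:
    selector.unregister(server_side)
    server_side.close()
    client_side.close()
selector.close()
```

The idle connections consume a file descriptor and a little kernel memory each, while the single loop only does work for the connections that have data waiting.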

MySQL

Just like Facebook, where co-founder Adam D’Angelo previously worked, Quora heavily uses MySQL. In answer to the Quora question “When Adam D’Angelo says “partition your data at the application level”, what exactly does he mean?“, D’Angelo goes into the details of how to use MySQL (or relational-databases generally) as a distributed data-store.
The basic advice is to only partition data if necessary, keep data on one machine if possible and use a hash of the primary key to partition larger datasets across multiple databases. Joins must be avoided. He cites FriendFeed’s architecture as a good example of this. FriendFeed’s architecture is described by Bret Taylor in his post “How FriendFeed uses MySQL to store schema-less data“. D’Angelo also states that you should not use a NoSQL database for a social site until you have millions of users.
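Partitioning by a hash of the primary key can be sketched as below. The shard names, the key format and the choice of SHA-256 are assumptions made for the example; the source does not say which hash function or layout Quora or FriendFeed use.

```python
import hashlib
from collections import Counter

# Hypothetical shard names; in practice these would be separate MySQL databases.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(primary_key: str) -> str:
    """Pick a shard from a stable hash of the primary key."""
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# The same key always maps to the same shard, so a lookup by key
# goes to exactly one database.
print(shard_for("user:42"), shard_for("user:42"))

# Keys spread roughly evenly across the shards.
distribution = Counter(shard_for(f"user:{i}") for i in range(10_000))
print(distribution)
```

Because rows for different keys may live on different machines, queries that would join across keys have to be done in application code, which is why the advice is to avoid joins.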
It is not only Quora and FriendFeed who are heavily using MySQL. Ever heard of “Google”? It is hard to imagine, since everything Google does has to scale so well, but in the words of Google, “Google uses MySQL [...] in some of the applications that we build that are not search related”. Google has released patches for MySQL related to replication, syncing, monitoring and faster master promotion.
How does one evaluate if a database is efficient enough to not crash as it’s put under increasing load?
Adam D’Angelo, Quora (Oct 10, 2010)
One option is to simulate some load. Write a script that mimics the kinds of queries your application will be doing, and make sure it can handle the amount of load you want it to be ready for (especially as the size of the dataset changes).
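A load-simulation script of the kind described in the answer might look like the following sketch. An in-memory sqlite3 database stands in for MySQL so that the example is self-contained, and the table, query and request counts are invented; a real test would point at a copy of the production database and issue queries concurrently.

```python
import sqlite3
import statistics
import time

# sqlite3 in memory stands in for the real database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (author, body) VALUES (?, ?)",
    [(f"author{i % 100}", f"body {i}") for i in range(10_000)],
)
conn.commit()


def run_load(queries: int) -> list:
    """Issue `queries` lookups of the kind the app would make; return latencies."""
    latencies = []
    for i in range(queries):
        start = time.perf_counter()
        conn.execute(
            "SELECT id, body FROM posts WHERE author = ?", (f"author{i % 100}",)
        ).fetchall()
        latencies.append(time.perf_counter() - start)
    return latencies


latencies = run_load(200)
print(f"median {statistics.median(latencies):.6f}s, max {max(latencies):.6f}s")
```

Re-running the script as the table grows shows how latency changes with dataset size, which is the part of the advice that is easiest to overlook.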

Memcached

Memcached is used as a caching layer in front of MySQL.

Git

JavaScript Placement

If you look at Quora’s source you will see that the JavaScript comes at the end of the page. Charlie Cheever suggests that this gives the feeling of a quicker loading page, since the browser has content to display before the JavaScript has been seen.

Charlie Cheever Follows “14 Rules for Faster-Loading Web Sites”

Steve Souders, author of High Performance Web Sites and Even Faster Web Sites, lists the following rules for making websites faster. This list is mentioned by Charlie Cheever, the co-founder of Quora, as one of the reasons for Quora’s speed.
“One resource we used as a guide is Steve Souders’ list of rules for high performance websites: http://stevesouders.com/hpws/rules.php”
– Charlie Cheever, Quora
Steve Souders’ 14 rules are…
  • Make Fewer HTTP Requests
  • Use a Content Delivery Network
  • Add an Expires Header
  • Gzip Components
  • Put Stylesheets at the Top
  • Put Scripts at the Bottom
  • Avoid CSS Expressions
  • Make JavaScript and CSS External
  • Reduce DNS Lookups
  • Minify JavaScript
  • Avoid Redirects
  • Remove Duplicate Scripts
  • Configure ETags
  • Make AJAX Cacheable

Conclusion

Quora is a great example of a modern tech start-up. They are a very small team who understand the technologies they are using very well. They have made considered choices in the technology they have selected and have a good vision of which components would be better written from scratch. They seem keen to share these in-house technologies with the open-source community and I look forward to when they have the time to make this a reality.