Tuesday, November 25, 2008

Scaling MySQL for dummies

Alvinator's LOG: November 26, 2008 4:30am

I know it seems unhealthy to stay awake at this hour. I just finished my work but can't manage to sleep, so I thought I'd write a short blog post.

MySQL has been my favorite RDBMS since I started working with databases. It's free, fast, and easy to use, although I know a lot of people argue about its lack of features compared with Oracle and PostgreSQL, especially on scaling. There is MySQL Cluster, which promises high availability and performance, but I still find it pretty complex to set up and maintain. What you can do instead is put something on top of MySQL with your own elbow grease. There are a lot of techniques for doing this, and these three are the ones I've used so far:

1. Replication
You can replicate a MySQL database (n) times, then configure your database applications to issue SELECT commands on the replicas and INSERT/UPDATE/DELETE on the master database. This kind of setup is good for websites where most of the database operations are SELECTs.

Advantages: Fast Reads
Disadvantages: Only one master database for writes (a single point of failure)
Problems that may arise: Slave data might lag behind the master (for example, because of connectivity problems), making your SELECT results outdated at some point in time.
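
The read/write split described above can be sketched as a tiny router that decides which server a statement should go to. This is only a minimal sketch: the class name ReplicatedRouter, the server names, and the round-robin choice of replica are all assumptions for illustration, not part of any MySQL library.

```python
import itertools

# Statements that change data must go to the master.
WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")


class ReplicatedRouter(object):
    """Pick a server name for each SQL statement (hypothetical helper)."""

    def __init__(self, master, replicas):
        self.master = master
        # Cycle through the replicas so reads are spread evenly.
        self._replicas = itertools.cycle(replicas)

    def server_for(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in WRITE_VERBS:
            return self.master
        if verb == "SELECT":
            return next(self._replicas)
        # Anything else (DDL, transactions, empty input) goes to the master.
        return self.master


router = ReplicatedRouter("master", ["replica1", "replica2"])
print(router.server_for("SELECT * FROM users"))
print(router.server_for("UPDATE users SET nick = 'x'"))
```

Because of the replication lag mentioned above, a read that must see a write you just made would still have to go to the master; this sketch does not handle that case.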

2. Partitioning
A table with millions of rows will drag your queries down. One solution is to partition the table into multiple tables by a specific category such as date, year, or location. You can do this programmatically yourself, or you can use MySQL's built-in partitioning support (available since MySQL 5.1). I've used it when parsing logs into a database.

Advantages: Fast Reads
Disadvantages: Queries that span multiple tables are quite hairy
Problems that may arise: You're still using one MySQL server instance, and that makes it a single point of failure.
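
Doing the partitioning programmatically could look something like the sketch below, which maps a date to a per-month table name. The table prefix "access_log" and both function names are hypothetical; the second function also hints at why queries over a date range get hairy, since they have to touch every table in the range.

```python
import datetime


def table_for(day, base="access_log"):
    """Return the per-month table name that holds rows for `day`."""
    return "%s_%04d_%02d" % (base, day.year, day.month)


def tables_between(start, end, base="access_log"):
    """List every monthly table touched by the inclusive range start..end."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append("%s_%04d_%02d" % (base, year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(table_for(datetime.date(2008, 11, 25)))
print(tables_between(datetime.date(2008, 11, 1), datetime.date(2009, 1, 15)))
```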

3. Sharding
I first heard about this when I was watching the YouTube guys in a Google Tech Talk describing their experiences scaling YouTube. Sharding, in a nutshell, is where you partition your database across different MySQL servers on different machines. Each piece of data is stored in a shard (a MySQL database instance) chosen by a specific key (the sharding key) such as a name, a date, or a programmatic hash. I've personally implemented this kind of setup and I'm quite satisfied with it. Just one note: this is NOT the holy grail of MySQL scalability; your specific requirements may not be met by this kind of setup.

Advantages: Fast reads and writes, better HA since you have multiple MySQL instances
Disadvantages: You have to implement it yourself, and your database applications will need to connect to multiple databases.
Problems that may arise: Resharding is a nightmare, your system admins will hate you because manual queries will be harder, and a lookup that does not use the sharding key will require you to query across all shards.

In sharding you have to decide how to spread your data across shards and how to easily find it again. You could, for example, use the first letter of the username ('A' goes to shard1, 'B' goes to shard2, ...), which is easy for humans to understand, although there are probably more A's than X's, Y's, and Z's, right? A better solution is a programmatic hash. Using a programmatic hash spreads your data roughly evenly among your shards. In Python you can do it like this:

Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def hash_me(username, numshards):
...     return hash(username) % numshards
...
>>> hash_me("alvinator", 4)
3

That assigns "alvinator" to shard number 3 (shards are numbered 0 to 3) if you have 4 shards.
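
One caveat with the built-in hash(): its value depends on the interpreter, and in Python 3 string hashes are randomized per process by default, so the same username could land on different shards in different processes. A minimal sketch that swaps in an MD5 digest (my substitution, not what the session above uses) keeps the mapping stable. The names shard_for and moved_fraction are hypothetical, and the second one illustrates why resharding hurts with a plain modulo.

```python
import hashlib


def shard_for(username, numshards):
    """Map a username to a shard number in the range 0..numshards-1."""
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return int(digest, 16) % numshards


def moved_fraction(usernames, old_count, new_count):
    """Fraction of usernames whose shard changes when the shard count changes."""
    moved = sum(1 for name in usernames
                if shard_for(name, old_count) != shard_for(name, new_count))
    return moved / float(len(usernames))


users = ["user%d" % i for i in range(1000)]
counts = [0] * 4
for name in users:
    counts[shard_for(name, 4)] += 1
print(counts)                       # users per shard, expected to be fairly even
print(moved_fraction(users, 4, 5))  # share of users that would have to move
```

With a plain modulo, going from 4 to 5 shards relocates most of the users, which is one reason resharding is painful.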

Monday, November 10, 2008

Smart Uzzap launch

It's been almost six months since I last posted in this blog. The main reason for this long absence is this:

Our main project, Smart Uzzap, was launched on July 7, 2008, and it looks like it has a bright future ahead. The user base keeps growing to the point that our public chat rooms can barely keep up with the number of people who want to socialize.

Like all systems, we experienced a lot of birth pains and problems. We even encountered OS-level issues that were pretty alien to us (TCP Zero Window, TCP half-open scenarios, weird behavior of mobile GGSNs, fuzzy network congestion, etc.). Even amid these stressful situations, I'm quite grateful for being a part of it, since very few people get to encounter these problems and learn how to fix them.

Being a Python developer, I'm very proud to say that the Smart Uzzap server and its other components are written in Python. While it is in production, it keeps evolving silently in the background. Even though I get tired of attending meetings, fixing bugs, and introducing new features, I feel good knowing that people have started building their own community with Uzzap. There have been some eyeballs, Friendster accounts, blogs, and YouTube clips like the one below that were created by Uzzap users.

Here are the things that I've learned since the project launch:

1. Every system has its own issues so don't panic.
2. Resource monitoring and reports are a must, but at the same time you should be smart enough to interpret what they say to track down bottlenecks.
3. Do not take TCP for granted!!! You should configure your TCP settings according to how you utilize the server.
4. A low load average doesn't mean that the network traffic is low.
5. Server applications should be smart enough to reconfigure themselves at runtime.
6. Avoid single-point-of-failure systems. Always provide a backup or make each component scalable.
7. Do the network programmer loop:
Relax -> Look for bottlenecks -> Fix bottlenecks -> Test the fixes on development setup -> Apply Fixes -> go back to step 1.
8. Sharding databases is quite nice as long as you only need one sharding key to pick the database server.
9. Sometimes, a measly one-liner of code can greatly improve the system. ^_^
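
As one illustration of point 3, here is a minimal sketch of turning on TCP keepalive for a socket, which is one way to eventually notice half-open connections. The function name and the timing values are assumptions chosen for the example, not the settings we actually used, and the finer-grained options only exist on some platforms.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive; tune the timers where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These knobs are platform-specific, so only set the ones that exist.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```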

As for home projects, I'm too tired to horse around in my cave anymore, although I'm planning to try my hand at some streaming stuff like RTSP (RFC 2326). So there we go: if anyone needs some advice on scalability, don't be shy, send me an email and I'll try my best to answer.

Monday, May 5, 2008

Testing 1-2-3 and foobar

I have a habit of using strings for sanity tests and tutorials such as "hello world", "foo bar", and "Testing 1-2-3". I believe I got these from looking at other people's code and reading RFCs. Out of boredom, I googled these strings and found out they have a colorful history attached to them:

1. "hello, world" - There is almost no programming book or tutorial that doesn't teach you how to display this phrase. The string "hello, world" appeared in Brian Kernighan's 1974 Bell Labs memo "Programming in C: A Tutorial", was popularized by the book "The C Programming Language" (1978), and has become the standard first example for introducing programming languages. (http://en.wikipedia.org/wiki/Hello_world_program)

2. "Testing 1-2-3" - supposedly the first ever email message, sent by ARPANET's Ray Tomlinson in 1971, although Ray himself only vaguely remembers it (it could have been "QWERTYUIOP" or "Testing 1 2 3 4"). (http://openmap.bbn.com/~tomlinso/ray/mistakes.html)

3. "foo & bar" - if you happen to read IETF RFCs (Requests for Comments), you'll often come across these words. Most of the time I noticed that they are used as variable names. The term foobar was first found in DEC (Digital Equipment Corporation) manuals during the 1960s. One theory is that it came from the term "FUBAR", meaning "Fucked Up Beyond All Repair". (http://en.wikipedia.org/wiki/Foobar, http://en.wikipedia.org/wiki/FUBAR)

Although I can't assure you that my sources are accurate, it's still quite fun digging through Wikipedia pages. :)

Sunday, April 27, 2008

Fun with DirectX SDK

Whew! What a stressful week! I'd been looking forward to this weekend to cave in my room and relax. Instead of talking about my SCTP work, I decided to spend my weekend playing around with the DirectX SDK.

There was a time when I considered myself a hard-core gamer. I remember the days back in college when I saved every little bit of my allowance just to get that state-of-the-art video card. You'd often find me browsing the net, looking at the colorful video card performance comparison graphs on guru3d to make sure I'd get the best card for my hard-earned cash. After six months I got a Radeon 9600, which was a midrange card at that time. It was money well spent, and even after five years I still use it.

After wasting hours shooting pixelated enemies with my trusty video card, I got interested in DirectX. If you're a PC gamer, at some point you'll be asked to install DirectX before a game will work. So what the hell is DirectX?

DirectX is a set of APIs that gives software developers a direct interface to video hardware. In addition, it provides APIs for other multimedia devices like keyboard/mouse input and sound cards, which makes it a great tool for developing games. Developing with DirectX gives you some assurance that your application will work across different video cards and platforms (like the Xbox 360), and most importantly it's free! How cool is that? So I headed over to Microsoft to download the DirectX 9.0 SDK and get my hands dirty.

There's a very good DirectX game programming tutorial on the net. I recommend you drop by and read the tutorials here. The website is a good jumpstart and I plan to read it throughout this study. Another notable game programming resource is gamedev.net, which I read to keep my interest going.

With the great help of the tutorials, I managed to write code that speaks directly to my Radeon 9600! It is basically a morphing triangle that redraws at 25 frames per second. ^_^

I wrapped most of the Windows-related mumbo-jumbo in functions (especially the window creation). The code will compile as long as you add the DirectX SDK to the project. This is a very basic demonstration of how to use DirectX. The code is still in its early stages, but in truth it can be used as your default template for creating games! I added a comment in the code where you can insert your game logic. Hurry up and download the source code!

It's been my dream to write my own game. So far I've taken my first step, and I have a list of TODOs:

- Write a collection of classes to make my job easier
- Take a look at OpenGL
- Learn how to load fonts, bitmaps, and 3D models in DirectX

I hope this could mutate into something interesting. When it gets to that point, I'll be sharing my code. :)

Thursday, April 17, 2008

Writing Python modules in C

Python, like other scripting languages, is interpreted, meaning that a program written in Python is processed by an interpreter at runtime rather than being compiled ahead of time to machine code (CPython compiles it to bytecode and executes that on a virtual machine). The interpreter overhead does reduce its performance compared with compiled languages like C and C++. One advantage, though, is that programs written in interpreted languages are portable as long as the platform has the corresponding interpreter installed.

Thinking forward:
--- "Is there a way to write an application that performs as fast as possible but at the same time takes advantage of Python's rapid development?"

--- "I have a project that uses a protocol that can only be written in C. Is there a surefire way to make my life easier?"

...and the answer to both questions is "yes", there's a way.

Many of the modules that come with Python are actually written in C. They are compiled as shared object libraries, which makes applications using them run faster. It's just like writing small and fast C modules and gluing them together with a high-level language. One popular example of this technique is the game Civilization IV, which uses Python as a scripting layer on top of its C++ game code, and today I'll teach you how to do that. :)

First you need to have these installed on your system:
gcc - GNU C compiler
python-dev - python include files (you can download this on their website)

For this tutorial, I wrote a very simple and useless Python extension (saved as adder.c) that takes two integer values and returns their sum:

#include <Python.h>

static PyObject *
Add(PyObject *self, PyObject *args)
{
    int a, b, sum;  /* locals instead of globals, so calls don't share state */

    if (!PyArg_ParseTuple(args, "ii", &a, &b))
        return NULL;
    sum = a + b;

    return Py_BuildValue("i", sum);
}

static PyMethodDef adder_methods[] = {
    {"add", Add, METH_VARARGS, "Add two integers"},
    {NULL, NULL, 0, NULL}  /* sentinel marking the end of the table */
};

/* Python 2 module entry point; it must be named init<modulename> */
PyMODINIT_FUNC
initadder(void)
{
    (void) Py_InitModule("adder", adder_methods);
}

Here's a quick run-through:
Line 1: Include the Python header
Lines 3-4: Create a function named Add that accepts and returns an abstract Python object
Line 6: Declare the C integers that will hold the arguments and the result
Lines 8-9: PyArg_ParseTuple says that the function requires two integers, denoted by "ii"; if the requirement is not met, return NULL (which raises an error in Python)
Line 10: Our shameful adder logic
Line 12: Py_BuildValue converts the C integer variable sum to a Python integer, denoted by "i"
Lines 15-18: PyMethodDef defines the functions available in this module. As you can see, it defines a function named "add" that points to our C function "Add", followed by an all-NULL entry that ends the table.
Lines 21-25: Initialize our module and give it the name "adder" (this is the Python 2 style of module initialization)

To be able to use this in Python, we need to build it first. The easiest way to do this is to use Python's distutils. Create a file named setup.py:

from distutils.core import setup, Extension

setup(name = "adder",
      version = "0.1",
      ext_modules = [Extension("adder", ["adder.c"])])

Then build and install adder with these commands:
# python setup.py build (compiles it with gcc)
# sudo python setup.py install (installs it to python site-packages directory)

And there you have it: your own Python module written in C. Cool, isn't it?

Sadly this is a blog; if I had an actual web page I could explain this much further. For now, if you want to know more about the Python.h stuff that I used in this program, head over to the Python/C API reference manual.

This tutorial actually leads to my next blog, which is about how SMS (Short Message Service) actually works, and at the same time gives you a nice Python extension that I wrote (SCTP sockets in Python!). I hope that I helped somebody out there. Till next blog! ^_^

Monday, April 14, 2008

A Simple Chat Room Server

Server programming poses a lot of challenges, but nowadays, with the advent of high-level languages such as Python, the code that implements the basic server functions is taken out of the equation, giving us time to think more about the logic and features that we're going to implement.

Last March I conducted a network programming seminar at PUP's College of Engineering. I expected a good question-and-answer session, but sad to say, very few questions were thrown at me. Probably the seminar was a little too technical for the students, and network programming is very rarely taught in college. (I wish I'd at least had more time to teach them in a hands-on session.)

At the seminar I provided a simple chat room server program written in Python. It is a very good example of how easy it is to write server programs nowadays. The code is roughly 100 lines and is self-explanatory (rewriting it in C++ would take more lines).

The Chat room server implements a very simple protocol:

LIST - returns the nickname of the people that are online
LOGIN: - login to the chat room with the provided nick name (Default is 'Anonymous Coward')
EXIT - leave the chat room

The server assumes that the input is a chat message if it does not match any of the existing commands. chat_server.py is tested to work under Linux and should be started like this:

# ./chat_server.py "port number to listen to"

You can use telnet to connect to the server:

# telnet localhost "port number where the server is listening"

We use localhost, assuming that the server is running on the same machine. The interface is similar to the old Unix talk program.
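
The command handling in the protocol described above can be sketched as a small parsing function. This is not the code from chat_server.py, just a minimal illustration; the function name parse_command and the assumption that the nickname is whatever follows "LOGIN:" are mine.

```python
DEFAULT_NICK = "Anonymous Coward"


def parse_command(line):
    """Classify one line typed by a client.

    Returns a (kind, argument) tuple where kind is one of
    "list", "login", "exit" or "message".
    """
    text = line.strip()
    if text == "LIST":
        return ("list", None)
    if text == "EXIT":
        return ("exit", None)
    if text.startswith("LOGIN:"):
        # Assumption: the nickname is whatever follows the colon.
        nick = text[len("LOGIN:"):].strip() or DEFAULT_NICK
        return ("login", nick)
    # Anything that is not a known command is treated as a chat message.
    return ("message", text)


for typed in ("LOGIN: alvinator", "LIST", "hello everyone", "EXIT"):
    print(parse_command(typed))
```

The strip() call also removes the trailing "\r\n" that telnet sends with every line.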

So there, a Chat Room server written in 100 lines of code! If you're bored, you can download the Source Code and see for yourself (some explanatory comments added).

Now that I've posted some sample Python code, the next blog is about how to make your own Python modules in C. That's it for now. ^_^

Sunday, April 13, 2008

Making a living out of python

I was introduced to a number of scripting languages when I started developing Unix applications. I've tried quite a few: PHP, Perl, and even Ruby, but none compares in how many projects I've done using Python.

What I like about Python is its very simple syntax, to the point that it's self-documenting. It has its own console (just like what is shown above), which is great for debugging and for checking what methods and members are inside a package or class (using the dir() function). It has a huge community of developers writing cool modules for it, like DirectX and OpenGL wrappers, game libraries (pygame), network programming APIs (python-twisted), and more. In case you need help, there's a very active mailing list (comp.lang.python) where you can ask questions.
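
As a small illustration of the dir() trick mentioned above, the sketch below lists the public names of a module and of a string object. The helper name public_names is hypothetical; only dir() itself is built in.

```python
import math


def public_names(obj):
    """Return the names dir() reports for obj, minus the underscore ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names(math))          # functions and constants in the math module
print(public_names("any string"))  # methods available on string objects
```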

In my work, which is mostly writing web services and implementing network protocols, Python helped me think more about how to make things work rather than how I should code it. The funny thing is that I know a lot of software developers who've never heard of it (I live in the Philippines, by the way; it could differ in other places). After some googling you'll find companies like Google and YouTube using Python in their backend services. So if you have some spare time, head to python.org and see for yourself. There's also a free online book that you can download.

Okay, after this brief introduction, we'll write some Python code next. After that we'll write a Python module in C. Till next blog! :)

Saturday, April 12, 2008

Tools of Trade

I've been professionally writing software for three years and worked with a number of IDEs, platforms, and languages depending on the problem at hand. So far here are my favourites:

Platform: Linux/Unix
Editor: emacs
IDE: none
Languages: C/C++ and python

This is the combination that puts food on my table. I don't use an IDE on Linux even though there's Eclipse. So far I can stay productive hacking away at a dark console window running my favorite emacs editor. Python, hands down, is the best scripting language in my opinion.

Platform: Windows
Editor: emacs
IDE: BloodShed Dev-C++
Languages: C/C++

If I want to awe myself out of boredom, I write some useless software on Windows. Writing C++ software on Windows is really a headache, but the MSDN library and IntelliSense help ease some of the pain. Lately, I've been horsing around with the DirectX 9 SDK.

There we go. I should have added Microsoft Visual Studio but I can't afford it. Microsoft does release free "Express" editions, but MS Visual C++ Express can't even compile a simple Win32 "Hello World!" program because it doesn't include the necessary libraries.

I have a lot of plans for this blog, namely introducing the Python language. It irks me that there aren't many Python job offerings in the Philippines, considering how wonderful it is. A more advanced Python tutorial (writing Python extensions in C) will follow, given time.

Why Playground?

All right, this is my first time writing an online blog. Since I don't have the resources for my own dedicated web server, I've resorted to blogging in the meantime.

I have a lot of hobbies, especially dreaming up things to make while staring blankly out a car window during commutes. Most of the time they eventually get written in code - source code under the directory "Playground".

From here I'll be posting my ideas, code, and literature in the hope of making the world a better place. :)