Tuesday, March 2, 2010

PLM Open Source Seminar

For those who have attended the Open Source Seminar we conducted in Pamantasan ng Lungsod ng Maynila last saturday, I've posted the presentation and the cheat sheets I used below. Again, good luck on the tests. :)

Presentation:


Cheat Sheet

Friday, August 21, 2009

Flumotion 0.6.0

Just a few hours ago, Flumotion 0.6.0 "Can Punyetes" has been released. It's one of my first contributions to an open source project and I'm quite proud to see it hitting a milestone (even though I only worked on low-hanging fruits). So if you're looking for a way to stream media for free, be sure to check it out.

Release Notes: http://flumotion.net/releases/flumotion/0.6.0.html
Download: http://www.flumotion.net/download/
Demosite: http://flumotion.com/demosite/index.php

It's quite sad that I won't be able to attend the release party though. =/

Tuesday, May 5, 2009

A geek saying goodbye...


As of thursday this week, I will be no longer be tied with Smart Uzzap. Honestly it's kind of sad to leave a project that I've worked on for almost two years. I've been there since its infancy, growth pains, and up to its maturity. All I wish for now is to see my project Amazilia (the Smart Uzzap's backend) continue pumping out big numbers that will keep the project managers happy. As for now, I bid farewell buddy.

Friday, April 17, 2009

Developing Applications with Python seminar

For those who will attend my Python Seminar on April 18, 2009 at Informatics Northgate; you can view and download the materials below:

Presentation:


Source Codes:
tweet.py - A simple micro-blogging software using CherryPy.
qt-textedit.py - A basic text editor using PyQt4

Please feel free to email me if there is some inconsistencies with my materials.

Wednesday, April 8, 2009

GStreamer Hack Sprint

My target: Get to know gstreamer and write something useful with it. :)

Day 1 (Hello GTK+):

GStreamer's cool and has a good python binding although I decided to use C this time. It's been a long time since I wrote software in a low-level language and it seems now is the best time to resharpen. First up getting to know gtk, glib, and gobjects. I started my baby steps by writing a simple text editor using GTK 2.0 and Glade. Doing this exercise made me realize that I've been pampered by python for too long.


Here's the source code for those who are interested:
textedit.c
textedit.glade

Day 2 (Hello GStreamer):

After barely reawakening some of my C programming, I headed over to freedesktop to read about application programming in gstreamer. I give kudos to all people who wrote the gstreamer manual. This enabled me to create a command-line mp3 player. (I'm now listening to my favorite artist Shiina Ringo with it). ^_^


The pipeline that was used here is:
filesrc ! mad ! audioconvert ! alsasink
The mp3 player uses the mad plugin which is in 'gst-plugins-ugly' so you may have to install the gstreamer plugins-ugly to make it work (they usually don't get installed by default specially by Ubuntu). Below is the source code and a python version which does the same thing:

mp3player.c
mp3player.py

Day 3 & 4 (GtkTreeView woes):

Having some limited knowledge about GstBus (which is needed for catching GstMessages and take action on them), it seemed that it would be enough to create a simple 'playlist' engine by listening to GST_MESSAGE_EOS (end-of-stream).


Here's my baby steps to combine GTK+ and GStreamer together. It's miles away from becoming useful yet. GtkTreeView (as my playlist) is giving me a lot of challenges and parsing tags (media file info) doesn't seem to work properly.

gstplayer.c
gstplayer.glade

Tuesday, November 25, 2008

Scaling MySQL for dummies

Alvinator's LOG: November 26, 2008 4:30am

I know it seems unhealthy to stay up awake at these times. I just finished my work but I can't manage to sleep anymore so I thought about writing a short blog.

MySQL has been my favorite RDBMS since I started working with databases. It's free, fast and easy to use although I know a lot of people arguing about its lack for features vs Oracle and postgresql specially on scaling. Well there's this mysql-cluster that promises High-Availabity and Performance, but I still find it pretty complex to setup and maintain. What you can do is put something on top of MySQL and you can do that with your own elbow-grease. There's a lot of techniques in doing this and these three are those that I've used so far:

1. Replication
You can replicate MySQL database (n) times, then configure your database applications to issue the SELECT commands on the replicas and INSERT/UPDATE/DELETE on the master database. This kind of setup is good for websites where most of the database operations is SELECT.

Advantages: Fast Reads
Disadvantages: Only one Master Database for writes (one point of failure)
Problems that may arise: Slave data might lag due to connectivity problems with the master rendering your SELECTS outdated at some point in time.

2. Partitioning
A table that has multi-million rows will drag your queries. One solution is to partition your table into multiple tables on a specific category like dates, year, location, etc. You can do this programatically yourself or you can make use of MySQL 5.x's built-in partitioning support. I've used it when parsing logs into database.

Advantages: Fast Reads
Disadvantages: queries into multiple tables is quite hairy
Problems that may arise: You're still using one MySQL server instance and that makes it one point of failure.

3. Sharding
I first heard about this when I was watching the youtube guys in google tech talk giving some information about their experiences scaling youtube. Sharding in a nutshell is where you partition your database across different MySQL servers and across different machines. A data can be stored in a shard (a MySQL database instance) using a specific foreign key like name, date, or a programmatic hash. I've personally implemented this kind of setup and I'm quite satisfied with it. Just one note: This is NOT the holy grail of MySQL Scalability, your specific requirements may not be met on this kind of setup.

Advantages: Fast Read/Writes, Better HA since you have multiple MySQL instances
Disadvantages: You have to implement it yourself, your database applications will need to connect to multiple databases.
Problem that may arise: Resharding is a nightmare, your system admins will hate you because manual queries will be harder, a lookup that is not using the foreign key will require you to query across all shards.


In sharding you have to decide how you can spread your data across shards and know how to easily obtain them. You use for example the first letter in username ('A' goes to shard1, 'B' goes to shard2, ...) which is human understandable although there's probably more A's than X's, Y's, and Z's right? A better solution is by programmatic hash. Using a programmatic hash ensures that your data is equally scattered among your shards. In python you can do it like this:

Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def hash_me(username, numshards):
... return hash(username) % numshards
...
>>> hash_me("alvinator", 4)
3
>>>

That makes "alvinator" assigned to shard number 3 if you have 4 shards.

Monday, November 10, 2008

Smart Uzzap launch

It's been almost six months since I last posted in this blog. The main reason of this long absence is because of this:






Our main project Smart Uzzap has been launched last July 7, 2008 and it looks that it will have a bright future ahead. The total user base still keeps growing to the point that our public chat rooms can barely keep up to the number of people that would like to socialize.

Like all systems, we did experienced a lot of birth pains and problems. We even encountered OS level issues that was pretty alien to us (TCP Zero Window, TCP half-open scenarios, weird behavior of mobile GGSN's, fuzzy network congestions, etc.). Although amidst of these stressful situations, I'm quite greatful for being a part of it since very few people do encounter and learn how to fix these problems.

Being a python developer, I'm very proud to say that the Smart Uzzap server and its other components is written in python. While currently on production, it still keeps evolving silently in the background. Even though I get tired of attending meetings, fixing bugs, and introducing new features; I feel good knowing that people start building their own community with Uzzap. There has been some eyeballs, friendster accounts, blogs, and youtube clips like one below that was created by Uzzap users



Here's the things that I've learned since the project launch:

1. Every system has its own issues so don't panic.
2. Resource Monitoring and Reports is a must but at the same time, you should be smart enough to interpret what they say to track down bottlenecks.
3. Do not take TCP for granted!!! You should configure your TCP settings according to how you utilize the server.
4. A low load average doesn't mean that the network traffic is low.
5. Server applications should be smart enough to reconfigure itself during runtime.
6. Avoid, the one-point of failure systems. Always provide a backup or have each component scalable.
7. Do the network programmer loop:
Relax -> Look for bottlenecks -> Fix bottlenecks -> Test the fixes on development setup -> Apply Fixes -> go back to step 1.
8. Sharding databases is quite nice as long as you only have 1 foreign key to each of the database servers.
9. Sometimes, a measly one-liner of code can greatly improve the system. ^_^

As for home projects, I'm too tired to horse around in my cave anymore although I'm planning to try my hands on some streaming stuff like the RFC 2326 RTSP protocol. So there we go, if anyone needs some advise on scalability, don't be shy send me an email and I'll try my best to answer.