Anyone who has booked a Tatkal railway ticket on the IRCTC website knows that getting a ticket when the counters open is no mean feat. “Service Unavailable” and “Page not found” screens greet you, and when you finally do get past them, the seat selection module or the payment module conks out on you. These are not one-off instances. In January 2011 alone, more than 2.9 million of the 11.7 million total transactions failed, a whopping 25% failure rate. To add to the woes, failed login attempts and other request failures only go to show that something is really amiss here!
Why is booking a train ticket on India’s largest (and one of the world’s top 5) e-commerce sites such a pain? Is it because, as people suspect, the “evil” travel agents lobby to shut individual customers out, or does the problem lie somewhere else?
Before we get to that, here are some statistics (released by IRCTC in August 2010) to give us a sense of the complexities involved.
- IRCTC has more than 16 million users who book about 9 million tickets online every month, and about 40% of these are booked during the morning hours (approximately 8 AM to 10 AM IST). In other words, roughly 120,000 tickets are sold every day in that two-hour window.
- The IRCTC website and its API are open 23 hours every day (with a daily maintenance shutdown from 23:30 to 0:30).
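The morning-peak figure above follows from simple arithmetic, worked through in the short sketch below. The 30-day month and the idea of spreading sales evenly over the two-hour window are our simplifying assumptions, not IRCTC figures; real traffic is far burstier right when the counters open.

```python
# Figures quoted above (released by IRCTC in August 2010).
TICKETS_PER_MONTH = 9_000_000
MORNING_SHARE_PERCENT = 40           # share booked between 8 AM and 10 AM IST

# Assumptions made only for this back-of-the-envelope estimate.
DAYS_PER_MONTH = 30
PEAK_WINDOW_SECONDS = 2 * 60 * 60    # the two-hour morning window


def morning_tickets_per_day(tickets_per_month, share_percent, days):
    """Tickets sold per day during the morning window."""
    return tickets_per_month * share_percent // 100 // days


def average_tickets_per_second(tickets, window_seconds):
    """Average sale rate if tickets were spread evenly over the window."""
    return tickets / window_seconds


daily_peak = morning_tickets_per_day(
    TICKETS_PER_MONTH, MORNING_SHARE_PERCENT, DAYS_PER_MONTH
)
print(daily_peak)  # 120000
print(round(average_tickets_per_second(daily_peak, PEAK_WINDOW_SECONDS), 1))  # 16.7
```

An average of about 17 completed bookings per second understates the real load, since each booking involves many page requests and failed attempts are not counted.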
IRCTC’s Existing Architecture
From the information available in the public domain, we understand that IRCTC’s existing e-ticketing infrastructure is a three-tier architecture: front-end web/app servers for the website and a back-end database, with integration to payment services, third-party APIs such as Cleartrip, printing services and so on. The application is developed on the Microsoft platform and uses Oracle as the database.
Proposed Scalable Architecture
DISCLAIMER: The architecture proposed is based on our understanding of the IRCTC ticketing system and information available on the IRCTC website and in the public domain. Neither IRCTC nor Indian Railways is in any way connected to this project.
The infrastructure needs to scale dynamically to handle variable demand during the day as well as peak demand during festive seasons and holidays. A static number of servers is not ideal for such requirements, as it leads to either under-provisioning (poor performance) or over-provisioning (over-paying). This is where a cloud architecture really pays off. It can be designed for scalability, elasticity and loose coupling to provide optimal performance for all kinds of infrastructure demands. And since IRCTC pays only for what it uses, it saves a lot of capital expenditure on infrastructure.
Here’s a small video clip of the proposed architecture
Given below are the modules in the architecture:
Scalable Web/App and WebServices
A major share of customer requests is handled by the IRCTC website/app and its API modules (for third-party integration with other travel sites). These modules would be separated out and set up in auto-scaling mode, where a certain minimum number of servers is always running and the fleet automatically scales up and down according to the number of requests.
A front-end Load Balancer manages the automatically scaled servers by routing requests between them. Failover is built into the architecture: if one of the servers becomes unavailable, the Load Balancer automatically routes requests to the other servers. This also ensures high availability of the website and the APIs, since degraded servers are automatically replaced without any downtime.
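The scaling rule described above can be sketched in a few lines. This is only an illustration of the idea, not any cloud provider's actual policy engine; the function name `desired_servers`, the per-server capacity and the minimum and maximum fleet sizes are all hypothetical values we chose for the example.

```python
import math


def desired_servers(requests_per_second, capacity_per_server,
                    min_servers, max_servers):
    """Return how many servers to run for the current request rate.

    The fleet never drops below min_servers (the always-on baseline)
    and never grows beyond max_servers (a cost ceiling).
    """
    needed = math.ceil(requests_per_second / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


# Hypothetical numbers: each server handles 100 requests per second,
# with at least 2 and at most 20 servers in the fleet.
print(desired_servers(0, 100, 2, 20))     # 2  (quiet night: baseline only)
print(desired_servers(950, 100, 2, 20))   # 10 (morning rush: scale up)
print(desired_servers(5000, 100, 2, 20))  # 20 (capped at the maximum)
```

In practice the input would be a measured metric such as request count or CPU utilisation averaged over a few minutes, so that short spikes do not cause the fleet to flap up and down.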
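To make the failover behaviour concrete, here is a minimal round-robin sketch. It assumes some external health check calls `mark_down` and `mark_up`; the class and method names are our own and do not correspond to any specific load balancer product.

```python
class LoadBalancer:
    """Round-robin routing that skips servers marked as unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        """Called by a (hypothetical) health check when a server fails."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Called when a known server passes its health check again."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or raise if none is left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["web-1", "web-2", "web-3"])
print([lb.route() for _ in range(3)])  # ['web-1', 'web-2', 'web-3']
lb.mark_down("web-2")
print([lb.route() for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Users never see the failed server because requests simply stop being sent to it; the auto-scaling layer can then replace it in the background.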
Database High Availability
The Database layer would be set up in multiple Master/Slave configurations with data replication. Writes would be handled by the Master server and reads would happen on the Slaves. In addition, a Slave can be automatically promoted to Master when needed, making the database layer highly available. Other services such as reporting and printing can be handled by the Slaves directly, which takes the load off the Masters.
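The read/write split and promotion described above can be sketched as a small router. This is a simplified model under our own assumptions: node names are plain strings, replication itself is not modelled, and the Slaves are called replicas in the code. Real promotion also has to deal with replication lag, which this sketch ignores.

```python
class ReplicatedDatabase:
    """Routes writes to the master and spreads reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._reads = 0  # counter used for round-robin reads

    def node_for_write(self):
        """All writes go to the single master."""
        return self.master

    def node_for_read(self):
        """Reads rotate across replicas; fall back to master if none."""
        if not self.replicas:
            return self.master
        node = self.replicas[self._reads % len(self.replicas)]
        self._reads += 1
        return node

    def promote_replica(self):
        """Replace a failed master with the first available replica."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.master = self.replicas.pop(0)
        return self.master


db = ReplicatedDatabase("db-master", ["db-replica-1", "db-replica-2"])
print(db.node_for_write())   # db-master
print(db.node_for_read())    # db-replica-1
print(db.promote_replica())  # db-replica-1 is now the master
print(db.node_for_write())   # db-replica-1
```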
Content Distribution Network
Moving static content such as images, scripts, style sheets and HTML to a content distribution network lets it be served to the end user from the nearest edge location, which reduces latency. Because this content is served from a different infrastructure/network, the load on the main web/app servers is also reduced.
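One simple way to picture the split is a helper that decides, per asset, whether a URL should point at the CDN or at the main servers. The host names below are placeholders we made up for illustration, and the extension list is only an example.

```python
from pathlib import PurePosixPath

# Hypothetical hosts used only for this example.
CDN_HOST = "https://cdn.example.com"
ORIGIN_HOST = "https://origin.example.com"

# File types treated as static content in this sketch.
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js", ".html"}


def asset_url(path):
    """Return a CDN URL for static files and an origin URL otherwise."""
    suffix = PurePosixPath(path).suffix.lower()
    host = CDN_HOST if suffix in STATIC_EXTENSIONS else ORIGIN_HOST
    return host + path


print(asset_url("/images/logo.png"))  # https://cdn.example.com/images/logo.png
print(asset_url("/booking/confirm"))  # https://origin.example.com/booking/confirm
```

Only dynamic requests such as bookings then reach the web/app servers.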
Managed DNS Server
With a Managed DNS Server, an outage at the primary data center can trigger an automatic transition to a redundant secondary setup (Hot Disaster Recovery setup). This results in little or no service downtime even in the extreme case of a complete data center failure.
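The failover decision itself is simple, as the sketch below shows. It assumes a health-check function supplied by the caller; the function name `resolve` and the IP addresses (taken from the ranges reserved for documentation) are hypothetical. A managed DNS service would run such checks continuously and publish the answer with a short TTL so that clients pick up the change quickly.

```python
def resolve(primary, secondary, is_healthy):
    """Return the address DNS should answer with.

    Prefers the primary data center and falls back to the secondary
    (hot disaster recovery) site when the primary fails its check.
    """
    if is_healthy(primary):
        return primary
    if is_healthy(secondary):
        return secondary
    raise RuntimeError("both data centers are unavailable")


PRIMARY = "192.0.2.10"       # hypothetical primary data center
SECONDARY = "198.51.100.10"  # hypothetical disaster recovery site

# Simulate an outage at the primary data center.
down = {PRIMARY}
print(resolve(PRIMARY, SECONDARY, lambda ip: ip not in down))  # 198.51.100.10
```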
De-coupling the systems
One of the best practices in designing a highly scalable and available system is to decouple its components. In IRCTC’s case, the web/app, the APIs and the search engine should be completely decoupled so that each can be scaled independently. For example, one of the main functions of the website is train look-up, where users search for trains, routes and seats. This function can be separated out behind a search engine and an effective caching layer, removing that load from the other systems.
We believe that by utilizing the cloud and making some minor modifications to the architecture, one of the most widely used services in India can be made more robust, which would considerably lessen the agony of the person booking a train ticket.
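A caching layer in front of train look-up could work along the lines of the sketch below. The `TTLCache` class, the route key format and the 30-second lifetime are our own illustrative choices; seat availability changes quickly, so a real system would need short lifetimes or explicit invalidation for that data.

```python
import time


class TTLCache:
    """Caches look-up results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # key -> (expiry_time, value)

    def get_or_load(self, key, loader):
        """Return a cached value, calling loader(key) only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


backend_calls = []


def search_trains(route):
    """Stand-in for an expensive query against the search engine."""
    backend_calls.append(route)
    return f"trains for {route}"


cache = TTLCache(ttl_seconds=30)
cache.get_or_load("MAS-SBC", search_trains)
cache.get_or_load("MAS-SBC", search_trains)  # served from the cache
print(len(backend_calls))  # 1
```

Thousands of users searching the same popular route within the cache lifetime then cost the back end a single query.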
8KMiles is an internet company that is focused on building solutions around cloud computing. 8KMiles’ Cloud Solutions group offers cloud consulting, engineering and migration services to help companies leverage the power of cloud computing. 8KMiles is an Amazon Web Services Systems Integrator and a Microsoft SPLA partner.