Latest Posts

Monday, February 6, 2017

Designing Multi level, Multi Server, Multi Database System

I am providing consultancy in field of system design for quite some time . Recently one of my
acquaintance approach me to help him in designing a system for handling huge data gathering  and processing process.

I asked him what he meant by huge data gathering and processing. He explained me about the system He is looking forward to build a system to provide one stop solution for analyzing the data from various channels/sources. 

He has large customer base and all customer has multiple channels/sources to provide data. The size of data is varies in the rage of 3 to 5 million rows. Also the number of columns are not fix. 

As the number of columns were not fixed for unprocessed data, I asked him, "what about the number of column after processing the data". His answer was it is going to be fixed. 
on further discussion about data, We zeroed to MongoDB for storing unprocessed data and MySql for storing processed data. The primary reason behind using MongoDB for storing the unprocessed data  were
  1. The data size was huge
  2. The number of columns are not fixed i.e. schema of tables were not fixed.
  3. It is supported by strong developer community.
The processed data is structured so we decided to use MySql. MySql is also free database and well supported by developer community. 

After deciding the database we further discussed about the sources/channels of data. We zeroed that it is good strategy to bind the sources and scripts needed for gathering and processing of data. It will be further passed on to the client at the time of registration i.e. at the time of client registration the client will be mapped to sources/channels and it will automatically bind the scripts required for processing and gathering of data for the client. The advantage of using this strategy was
  1. The source/channel , script and client can work independently. If required the scripts can be added and deleted at client level.
  2. The logical change in source/channel script will not effect the current working of client
To make system more robust we save the details of client database on client level. It will help us at the time of gathering and processing process to save and manipulate the data.


Till now the structure was like

We discussed much about the data management, now we moved to data processing part. It is the point were a system can make or break. 
We started discussing about the advantages and dis-advantages of using a single server for executing scripts for gathering and processing data of all client.
Dis-Advantages
  1. Risk of mixing client data
  2. If once process fails then all next process in queue will fail 
  3. As data amount is huge so it may be slow down the server
  4. client specific customization is difficult  
Advantages  
  1. Low cost
  2. easy to maintain
We come to conclusion that it is not good idea to use single server for gathering/processing of data for all client. 
We decide to use independent server for each client. The respective server of client will take care of data gathering and processing.   

On respective client server we will set the cron job. This Cron job executes a php script file which has the client Id value. With the help of client id , server gets the detail of scripts to be executed and the DB server detail. It also gets the other detail if required for the data channel/Source from the central MongoDB.
After getting the scripts it puts it into a designated folder with respective values. On the designated folder we will creat "INCRONTAB". As soon as the files are saved into folder the incrontab will come into action. It will start executing the files parlally.

The status of execution process are maintained for the respective client on central MongoDB server with various status FLAG like running, completed , error etc.

 The final architecture of system looks like as below image



The above solution will work well on combination of Linux/Unix server. To achieve same on windows server we need to use windows services. windows services provides the same working as we do with CRON. 
The only disadvantage of this architecture is that running cost for client is going to be HIGH.  

This architecture can be further optimized by using Job scheduler Queue. By Using Job scheduler the system can handle any exception occurred during data gathering or processing.



 

No comments:

Post a Comment