How large can Mango Automation scale?

We love Mango for small projects. It’s so easy to install and configure that it makes projects like replacing a SCADA system for a small water treatment plant, pump station or energy management system a breeze.

More and more, we see the need for larger IIoT (Industrial Internet of Things) applications where 100 or 1,000 smaller systems all feed data into one large cloud-based system. One of the reasons Mango is such a great choice is that it really can scale.

Here is one real-life example: a distributed data acquisition system with 35 Mango servers across the globe. Each server has between 10,000 and 23,000 data points, and data updates on average every 2 seconds, with some points updating as fast as every 100 ms. The data points are mostly Modbus IP.

Each server uses the Mango Persistent TCP publisher to push its real-time data and sync its historical data to one large Amazon server, where all the data is available to users. Every sample is stored, which makes for a very large database that grows by about 250 GB each day.

In total there are about 400,000 data points on the Amazon server, and it consistently writes around 300,000 values per second to the database, with bursts of over 3 million values per second.
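As a rough consistency check, the figures quoted so far can be related with a little arithmetic. This sketch uses only numbers from this article (35 servers, 10,000 to 23,000 points each, an average update period of 2 seconds, 300,000 values per second, 250 GB per day). It assumes decimal gigabytes and that the daily growth comes entirely from these writes; the function names are illustrative only.

```python
from typing import Tuple

SECONDS_PER_DAY = 86_400


def ingest_rate_range(servers: int, min_points: int, max_points: int,
                      update_period_s: float) -> Tuple[float, float]:
    """Values per second if every point updates once per period (illustrative)."""
    return (servers * min_points / update_period_s,
            servers * max_points / update_period_s)


def bytes_per_value(bytes_per_day: float, values_per_second: float) -> float:
    """Average stored bytes per sample implied by the daily database growth."""
    return bytes_per_day / (values_per_second * SECONDS_PER_DAY)


low, high = ingest_rate_range(35, 10_000, 23_000, 2.0)
print(low, high)                                  # 175000.0 402500.0
print(round(bytes_per_value(250e9, 300_000), 1))  # 9.6
```

The quoted 300,000 values per second falls inside the 175,000 to 402,500 range implied by the point counts, and 250 GB per day works out to roughly 10 bytes per stored sample on average.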

Server RAM is heavily used for buffering the data streams. With this much data coming in, there needs to be a sophisticated system to buffer the data and then write it to the database in batches. If the disk is busy, the system must hold all the data in the queue and then write as fast as possible once the disk is free.
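The queue-then-batch idea can be illustrated with a small in-memory writer. This is a minimal sketch, not Mango's implementation: `BatchWriter`, the `write_batch` callable, the `(point_id, timestamp_ms, value)` sample shape and the batch size are all hypothetical. Incoming samples only touch RAM; a flush drains the queue in batches and, if storage reports it is busy, leaves everything queued for the next attempt so nothing is lost.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List, Tuple

# Hypothetical sample shape: (point_id, timestamp_ms, value).
Sample = Tuple[int, int, float]


class BatchWriter:
    """Hypothetical RAM buffer that drains to storage in batches."""

    def __init__(self, write_batch: Callable[[List[Sample]], bool],
                 batch_size: int = 10_000):
        self._queue: Deque[Sample] = deque()
        self._write_batch = write_batch  # assumed to return False when storage is busy
        self._batch_size = batch_size

    def enqueue(self, sample: Sample) -> None:
        # Hot path: append to memory only, no disk I/O.
        self._queue.append(sample)

    def pending(self) -> int:
        return len(self._queue)

    def flush(self) -> int:
        """Write as many batches as storage accepts; return samples written."""
        written = 0
        while self._queue:
            batch = list(islice(self._queue, self._batch_size))
            if not self._write_batch(batch):
                # Storage busy: keep the batch queued and retry on a later flush.
                break
            for _ in batch:
                self._queue.popleft()
            written += len(batch)
        return written


if __name__ == "__main__":
    storage_busy = True
    stored: List[Sample] = []

    def fake_write(batch: List[Sample]) -> bool:
        # Stand-in for a database write that fails while the disk is busy.
        if storage_busy:
            return False
        stored.extend(batch)
        return True

    writer = BatchWriter(fake_write, batch_size=3)
    for t in range(7):
        writer.enqueue((1, t * 2000, float(t)))
    print(writer.flush(), writer.pending())  # 0 7
    storage_busy = False
    print(writer.flush(), writer.pending())  # 7 0
```

Batching amortizes per-write overhead, so when the disk frees up the backlog can be written far faster than one sample at a time.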

Here is a 24-hour screenshot of the writes per second:

The remote Mango systems also have a robust buffering system, so if the internet connection drops or the central server is unavailable, all data is buffered on the remote system. This means that, in addition to handling the constant volume of regular 2-second updates, the central system must also keep up after a server reboot or after a remote system has built up a large buffer.
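Remote-side store-and-forward buffering can be sketched as follows. This is an illustration under stated assumptions, not the Persistent TCP publisher's actual code: `StoreAndForwardPublisher`, the `send` callable and the sample shape are hypothetical, and a real system would persist the backlog to disk rather than hold it in memory. Live samples are sent immediately when the link is up; undeliverable samples go to a local backlog that is replayed oldest-first.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List, Tuple

# Hypothetical sample shape: (point_id, timestamp_ms, value).
Sample = Tuple[int, int, float]


class StoreAndForwardPublisher:
    """Hypothetical remote publisher that never drops data while offline."""

    def __init__(self, send: Callable[[List[Sample]], bool]):
        self._send = send  # assumed to return False when the central server is unreachable
        self._backlog: Deque[Sample] = deque()

    def publish(self, sample: Sample) -> None:
        # Real-time path: deliver immediately if possible, otherwise buffer locally.
        if not self._send([sample]):
            self._backlog.append(sample)

    def drain(self, batch_size: int = 1000) -> int:
        """Replay buffered samples oldest-first; stop at the first failed send."""
        replayed = 0
        while self._backlog:
            batch = list(islice(self._backlog, batch_size))
            if not self._send(batch):
                break
            for _ in batch:
                self._backlog.popleft()
            replayed += len(batch)
        return replayed

    def backlog_size(self) -> int:
        return len(self._backlog)
```

Because live values bypass the backlog, the central server sees current data right away while history is synced in parallel; this assumes the central store orders samples by timestamp rather than arrival order.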

When the central server reboots, all remote systems buffer their data. As soon as it is back up, they reconnect their publishers, immediately start sending real-time values, and simultaneously start dumping their buffers to the central server. This can produce an inbound stream of 4 to 6 times the normal volume, lasting several hours while all the buffered data is synced.
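The length of that catch-up period follows from a simple model. Assume the backlog grows at the normal rate r for an outage of T hours, so it holds rT values. During recovery the total inbound rate is m times normal, and r of that is live data, so the backlog drains at (m - 1)r and clears after T / (m - 1) hours. The 4x to 6x surge and the 300,000 values per second come from this article; the 12-hour outage and the function names are hypothetical.

```python
def catch_up_hours(outage_hours: float, surge_multiple: float) -> float:
    """Hours to clear a backlog built at the normal rate during an outage.

    Backlog = r * outage_hours; drain rate = (surge_multiple - 1) * r,
    so the normal rate r cancels out.
    """
    if surge_multiple <= 1:
        raise ValueError("backlog never drains unless the surge exceeds the live rate")
    return outage_hours / (surge_multiple - 1)


def peak_inbound_rate(normal_rate: float, surge_multiple: float) -> float:
    """Total values per second arriving during catch-up."""
    return normal_rate * surge_multiple


# Hypothetical 12-hour outage at the article's 4x and 6x surge levels.
for m in (4, 6):
    print(m, catch_up_hours(12, m), peak_inbound_rate(300_000, m))
```

Under these assumptions a 12-hour outage clears in about 2.4 to 4 hours, and the peak inbound rate of 1.2 to 1.8 million values per second stays below the 3 million per second burst write rate quoted above.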

In this example, Mango as a large central IIoT application has been more than capable of receiving the steady stream of data and handling the hammering from system reboots and network-loss recoveries.

System Architecture:

Development History

Mango was not always capable of operating at this scale; getting here required significant development and testing resources. During this process we tested two third-party databases as the main data store: CassandraDB and InfluxDB.

CassandraDB

CassandraDB is widely used as a large-scale NoSQL database, so we initially thought it would be a good choice. We found, however, that running a Cassandra cluster would be extremely expensive. It would have taken 6 to 8 servers to reach the required write speeds, and even then the performance fell far short of what we see now. Data storage on Amazon alone was going to cost over $100,000 in the first year.

InfluxDB

This newer database is similar to Mango NoSQL in that it is designed specifically to store time series data. Setting up the initial benchmark was easy, and performance was much better than Cassandra's. Data storage was also much more compact than Cassandra's, and although the Influx Enterprise license carries a significant cost, it would have been offset by the savings in storage. InfluxDB has some other nice features and could still be a good choice for some applications.

Most of the development work needed for Mango to use CassandraDB and InfluxDB has already been done, so if you are interested in either option, contact us for more information.

Mango NoSQL

Although we expected these third-party databases to give us the performance we needed, we continued to benchmark against our own NoSQL database, which had been developed specifically for Mango. At the end of our trials, and after ongoing improvements and tweaks to the Mango NoSQL database, we were pleased to find that its final benchmarks greatly exceeded all the other tests. In addition to its exceptional performance, it cost far less to run than either Cassandra or InfluxDB, so in the end the Mango NoSQL database was the obvious choice.

There are several reasons the Mango NoSQL database outperformed the other options. One simple explanation is that it is designed specifically for Mango and does only what Mango needs it to do. More general-purpose databases carry a lot of extra complexity to accommodate thousands of different types of applications and configurations.

Conclusion

Mango shines as a large-scale IIoT platform, and users will be impressed by its performance and its extremely low cost to operate and maintain. This real-world deployment has 400,000 data points, but we expect the system to scale much larger without problems. With Mango 3.0 and the REST API, it will be possible to run a cluster of Mango servers bridged by a single UI, so the system can scale out to millions of data points.

If you need a large, medium or even very small IIoT system, don’t hesitate to contact us for more information and a free consultation.