By Andrin Meier

How we built a modern platform for ETFs using AWS

25. May 2021
inETF is a platform for quickly finding and analyzing Exchange Traded Funds (ETFs), hosted on Amazon Web Services (AWS). It provides daily updated prices and information about a variety of ETFs. Using in-house domain expertise and a small developer team, we built a solution that is easy to use and scalable. Read on to find out how we developed it, how it evolved and which challenges we overcame.

1. Introduction

Our goal was to build a platform that is user-friendly, provides added value to the investor and serves as a hub for structured ETF price and holding data. In this article we focus on the technical aspects of the solution. First we describe the high-level architecture, then we explain each component in more detail. Along the way we answer questions like “How could the technical implementation help us to achieve our goals?”, “What development has our platform undergone?” and “How can the solution be improved in the future?”. Note that, for the sake of readability, the technical terms are described in the glossary below.

2. Architecture

inETF is completely hosted on AWS and based on a serverless architecture, which means that we do not host any servers ourselves and the services only start when they are needed (the only exception is the data integration server). The main reasons for choosing a serverless architecture were the lower costs and the reduced effort to maintain the system.

The architecture is split into three parts, as visualized below: the front end, the back end and the data integration. The front end is a web application written in Angular, which is hosted on AWS Amplify. The REST API in the back end accesses a MySQL database through Lambda functions. The ETF data itself is integrated daily on an EC2 instance by an application written in Java. The following sections describe the three parts in more detail.

Figure: The architecture is split into three parts: the front end, the back end and the data integration.

2.1. Data Integration

The basis of inETF is the data, which is categorized into static data and prices. The static data is integrated manually at regular intervals, whereas the prices are loaded fully automatically on a daily basis by a self-written Java application, which runs on an EC2 instance (Linux server).

The integration of price data was a milestone. Why? Because each fund provider sends its own file by e-mail, mainly in Excel or CSV format, and these files differ in content and structure. The data loader saves these files and loads them into a MySQL database hosted on AWS RDS. This routine is driven by a configuration that is specific to each fund provider and file.
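To illustrate the idea of a provider-specific configuration, here is a minimal sketch. It is not our actual loader (which is written in Java and also handles Excel files); it is written in TypeScript only to match the other examples in this article. All names (`ProviderFileConfig`, `parsePriceFile`, the two sample providers and their ISINs) are hypothetical, and the simple split does not handle quoted CSV fields.

```typescript
// Hypothetical per-provider description of a price file layout.
interface ProviderFileConfig {
  provider: string;
  delimiter: string; // column separator used in the provider's file
  headerRows: number; // number of leading rows to skip
  isinColumn: number; // zero-based column indexes
  dateColumn: number;
  priceColumn: number;
  decimalSeparator: "." | ",";
}

// Normalized record, independent of the provider's file layout.
interface PriceRecord {
  provider: string;
  isin: string;
  date: string; // kept exactly as delivered in the file
  price: number;
}

function parsePriceFile(content: string, config: ProviderFileConfig): PriceRecord[] {
  const records: PriceRecord[] = [];
  const lines = content.split(/\r?\n/).slice(config.headerRows);
  for (const line of lines) {
    if (line.trim() === "") continue; // ignore blank lines
    const cells = line.split(config.delimiter).map((cell) => cell.trim());
    const isin = cells[config.isinColumn];
    const date = cells[config.dateColumn];
    const rawPrice = cells[config.priceColumn];
    if (isin === undefined || date === undefined || rawPrice === undefined) {
      throw new Error(`Unexpected row layout for ${config.provider}: "${line}"`);
    }
    // Normalize a decimal comma to a decimal point before converting.
    const price = Number(
      config.decimalSeparator === "," ? rawPrice.replace(",", ".") : rawPrice
    );
    if (Number.isNaN(price)) {
      throw new Error(`Invalid price "${rawPrice}" for ${isin}`);
    }
    records.push({ provider: config.provider, isin, date, price });
  }
  return records;
}

// Two made-up providers whose files differ in structure.
const providerA: ProviderFileConfig = {
  provider: "ProviderA", delimiter: ";", headerRows: 1,
  isinColumn: 0, dateColumn: 1, priceColumn: 2, decimalSeparator: ",",
};
const providerB: ProviderFileConfig = {
  provider: "ProviderB", delimiter: ",", headerRows: 2,
  isinColumn: 2, dateColumn: 0, priceColumn: 1, decimalSeparator: ".",
};
```

With this approach, supporting a new fund provider would mostly mean adding another configuration entry instead of writing new parsing code.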

In the next section the preparation of the data in the back end will be described.

2.2. Back End

The back end follows a fairly traditional approach of offering a REST API over HTTP, with the twist that we do not host any servers ourselves. Instead, we use the serverless services provided by AWS. The REST API is hosted on Amazon API Gateway, which lets us define HTTP endpoints easily. Each endpoint calls a Lambda function, which accesses the MySQL database. Lambda functions are small functions written in a language of your choice (in our case TypeScript) that run in a small virtual machine spun up on demand. The advantage of this approach is that the system scales easily: an idle Lambda function only occupies the storage for its code, and processor time is only used when the function is called. A big downside is that spinning up these virtual machines takes time (on the order of hundreds of milliseconds). This delay is usually referred to as a cold start.
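The following sketch shows what such an endpoint could look like, and one common way to soften cold starts: expensive setup is kept in module scope so that it runs once per container and is reused by later (warm) invocations. This is an illustration under assumptions, not our production code. The event and result types are simplified local stand-ins (a real project would typically use the `@types/aws-lambda` package), and `EtfRepository`, `createRepository` and the in-memory data are hypothetical placeholders for the MySQL access.

```typescript
// Simplified stand-ins for the API Gateway proxy event and result.
interface ApiEvent {
  pathParameters: Record<string, string | undefined> | null;
}
interface ApiResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

interface Etf {
  isin: string;
  name: string;
}

// Hypothetical data-access interface; the real back end queries MySQL.
interface EtfRepository {
  findByIsin(isin: string): Promise<Etf | null>;
}

// Module-scope state survives between invocations of a warm container,
// so the expensive setup below only runs on a cold start.
let repositoryPromise: Promise<EtfRepository> | undefined;
let initializations = 0; // counter only used to demonstrate the reuse

async function createRepository(): Promise<EtfRepository> {
  initializations += 1;
  // Stand-in for opening a database connection.
  const rows = new Map<string, Etf>([
    ["XX0000000001", { isin: "XX0000000001", name: "Example ETF" }],
  ]);
  return {
    findByIsin: async (isin: string) => rows.get(isin) ?? null,
  };
}

function getRepository(): Promise<EtfRepository> {
  if (repositoryPromise === undefined) {
    repositoryPromise = createRepository();
  }
  return repositoryPromise;
}

function json(statusCode: number, payload: unknown): ApiResult {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Entry point that API Gateway would invoke, e.g. for GET /etfs/{isin}.
async function handler(event: ApiEvent): Promise<ApiResult> {
  const isin = event.pathParameters?.isin;
  if (!isin) {
    return json(400, { message: "Missing path parameter: isin" });
  }
  const repository = await getRepository();
  const etf = await repository.findByIsin(isin);
  return etf ? json(200, etf) : json(404, { message: `No ETF found for ${isin}` });
}
```

Reusing the repository does not remove the cold start itself, but it keeps the cost of the setup out of every subsequent request handled by the same container.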

In the next section we will conclude our description of the architecture with a view on the front end.

2.3. Front End

To make the data available to the end user, we have built a web application using Angular. Angular is a framework written in TypeScript and developed by Google. The CSS framework we use is Materialize, which is based on the Material Design principles published by Google.

To host the web application, we use AWS Amplify. Amplify is a library as well as a service that allows users to easily host their web applications. With Amplify you can easily set up a custom domain and a free SSL certificate. The main advantage of using Amplify is that we do not have to maintain a web server ourselves.

We have now reached the end of the architecture discussion. In the next chapter we look at how we manage the infrastructure of the platform.

3. DevOps

Efficient DevOps is key for any software product, hence we follow the Infrastructure as Code (IaC) mantra and automate as much of the infrastructure work as possible. To achieve this, we decided to use the popular Serverless Framework. Each of the AWS services mentioned in this article is created and updated automatically; nothing is done by hand.
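As an illustration of what Infrastructure as Code looks like with the Serverless Framework, here is a sketch of a configuration written in TypeScript (the framework also accepts a YAML file, `serverless.yml`). The service name, region, runtime, handler path and endpoint path are assumptions made for this example and do not reflect our actual setup.

```typescript
// Sketch of a Serverless Framework configuration (serverless.ts).
// All concrete values below are hypothetical.
const serverlessConfiguration = {
  service: "etf-platform-api",
  provider: {
    name: "aws",
    runtime: "nodejs14.x",
    region: "eu-central-1",
  },
  functions: {
    getEtf: {
      // File and exported function that implement the Lambda function.
      handler: "src/handlers/getEtf.handler",
      // Expose the function as GET /etfs/{isin} through API Gateway.
      events: [{ http: { method: "get", path: "etfs/{isin}" } }],
    },
  },
};
```

In a real `serverless.ts` file this object would be exported from the module, and running `serverless deploy` would then create or update the described resources.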

However, the infrastructure was not always automated. In the next section, we look at how the current system evolved to provide more insight into the thoughts we put into inETF.

4. Initial Implementation

Earlier versions used two additional EC2 servers: a web server to host the front end and a second server to host the REST API of the back end. Over time we noticed that the overhead of managing these servers was too high. Another challenge was that updates had to be deployed manually via secure copy protocol (SCP). We solved these issues by switching to a serverless architecture, using the Serverless Framework and hosting our web application on Amplify.

Going forward, we want to look at what can be improved from a functional and non-functional perspective.

5. Future Improvements

When it comes to improving the technical implementation of inETF, the focus should be on the following topics: the front end framework, the data storage and the data integration.

5.1. Front End Framework

The bmpi software engineers acquired profound knowledge of React while building our flagship reporting software, Cinnamon Reporting. Since the existing functionality and the user interaction were kept to a minimum, rewriting the web application in React would be a massive improvement.

5.2. Data Storage

Another improvement would be to follow a more polyglot persistence approach. All the data is currently stored in a MySQL database. The main downside of this is that the schema is rigid. This can be limiting because the reference data of an ETF can differ a lot depending on its asset class, which currently results in a lot of NULL values. This issue could be solved by migrating to a more flexible NoSQL database such as DynamoDB. Furthermore, all prices are loaded into one single table. This could be improved further by using a time series database specifically designed for this purpose.
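The difference between the two storage models can be sketched with types. In the rigid, table-like shape every ETF carries every column, so attributes that do not apply to its asset class are NULL. In a document-like shape, as a NoSQL database would allow, each ETF only stores the attributes that apply to it. The asset classes and field names used here (`dividendYield`, `duration`, `underlyingCommodity`) are hypothetical examples and not our actual schema.

```typescript
// Rigid, table-like shape: one column per attribute, many of them null.
interface EtfRow {
  isin: string;
  name: string;
  assetClass: "equity" | "bond" | "commodity";
  dividendYield: number | null; // only meaningful for equity ETFs
  duration: number | null; // only meaningful for bond ETFs
  underlyingCommodity: string | null; // only meaningful for commodity ETFs
}

// Document-like shape: each asset class only carries its own attributes.
interface BaseEtf {
  isin: string;
  name: string;
}
interface EquityEtf extends BaseEtf {
  assetClass: "equity";
  dividendYield: number;
}
interface BondEtf extends BaseEtf {
  assetClass: "bond";
  duration: number;
}
interface CommodityEtf extends BaseEtf {
  assetClass: "commodity";
  underlyingCommodity: string;
}
type EtfDocument = EquityEtf | BondEtf | CommodityEtf;

// Converts a table row into a document without any null attributes.
function toDocument(row: EtfRow): EtfDocument {
  const base = { isin: row.isin, name: row.name };
  switch (row.assetClass) {
    case "equity":
      if (row.dividendYield === null) throw new Error(`Missing dividendYield for ${row.isin}`);
      return { ...base, assetClass: "equity", dividendYield: row.dividendYield };
    case "bond":
      if (row.duration === null) throw new Error(`Missing duration for ${row.isin}`);
      return { ...base, assetClass: "bond", duration: row.duration };
    case "commodity":
      if (row.underlyingCommodity === null) throw new Error(`Missing underlyingCommodity for ${row.isin}`);
      return { ...base, assetClass: "commodity", underlyingCommodity: row.underlyingCommodity };
  }
}
```

The trade-off is that the application, rather than the database schema, becomes responsible for knowing which attributes belong to which asset class.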

5.3. Data Integration

The third improvement concerns the data integration. As previously mentioned, we receive all prices by e-mail. In this form, the data is not usable by other systems. A common solution today is to introduce a so-called data lake: a central store that holds mainly raw, unstructured data and can be accessed by all other systems. In more concrete terms, the prices received in flat files could be stored in S3 buckets, from where they could be further processed and loaded into other databases.
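One design question in such a setup is how to name the objects in the bucket so that other systems can find the raw files. The sketch below builds an object key that is partitioned by provider and date of receipt. The key layout, the `RawFile` type and the function name are assumptions for illustration only, and the actual upload (for example with the AWS SDK) is not shown.

```typescript
// Hypothetical description of a raw price file received by e-mail.
interface RawFile {
  provider: string;
  receivedOn: Date;
  fileName: string;
}

// Pads day and month numbers to two digits.
function twoDigits(value: number): string {
  return value < 10 ? `0${value}` : String(value);
}

// Builds an object key partitioned by provider and date (UTC).
function rawObjectKey(file: RawFile): string {
  const year = file.receivedOn.getUTCFullYear();
  const month = twoDigits(file.receivedOn.getUTCMonth() + 1); // months are zero-based
  const day = twoDigits(file.receivedOn.getUTCDate());
  // Normalize the provider name so it is safe to use inside a key.
  const provider = file.provider.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  return `raw/prices/provider=${provider}/date=${year}-${month}-${day}/${file.fileName}`;
}

const exampleKey = rawObjectKey({
  provider: "Provider A",
  receivedOn: new Date(Date.UTC(2021, 4, 25)),
  fileName: "nav.csv",
});
// exampleKey is "raw/prices/provider=provider-a/date=2021-05-25/nav.csv"
```

A predictable layout like this lets downstream jobs pick up all files of one provider or one day by key prefix, without needing to know how the files arrived.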

6. Conclusion

The prerequisite for building such a platform is sound technical and business knowledge. With this foundation, we have shown that it is possible to quickly develop a platform in the cloud that is appealing, easy to use and scalable. Thanks to the solid architecture and the use of modern web technologies, the platform is future-proof and ready to handle growth, be it an increase in the number of users or in the amount of data to be handled. So, stay tuned for future updates and news about this product!
