How to set up Superset: a case study based on the TransIT project

Apache Superset is open-source software developed under the Apache Software Foundation. It visualizes data provided by a database connection or other sources. Superset is a BI (Business Intelligence) tool, a type of solution that is currently very popular in many companies and organizations. BI tools let us use a variety of ready-made templates to generate different types of reports. In this article, we show you how to set up a Superset instance and embed a dashboard into a front-end React application without logging into the Superset instance (the login is handled automatically on the back-end side).

It was our organization's first experience with setting up Superset from scratch. That is also why we wrote this article: to share the knowledge with others who might want to use Superset in similar projects.

Why Superset?

We have recently finished the TransIT project, a transportation system designed for low-resource and challenging last-mile environments. You can read the full case study about it on our blog.

As part of the TransIT project, we needed to decide which visualization tool to use. We chose Superset because it is open source. Using such a tool allowed us to achieve one of the key goals of the TransIT project: the software was required to be fully open source, easily accessible and to serve as a Global Good.

Moreover, Superset has a huge community and is backed by Apache. In case of issues, developers can ask for help on the project's GitHub page. Superset also provides the report and visualization templates we needed in our project (bar, pie and map charts).

Another useful feature is that Superset can easily be connected to a particular database to visualize a specific business area. The visualization can be presented either on the main page of the target app or on a related page, to show statistics or KPIs.

There are also other advantages of using this tool, such as:

  • saving time on developing a similar custom solution from scratch;
  • flexible changes to particular dashboards and SQL queries, depending on the project's implementation and needs;
  • a non-technical person can create reports or data visualizations;
  • the possibility to add new custom features.

Setting up Superset: Step by Step

In this section, we show you how to set up a Superset application and integrate it with your own application. This tutorial is based on our experience in the TransIT project, and we hope it provides valuable knowledge that you can reuse in similar projects.

Why did we decide to fork the original repository?

We decided to fork the Apache Superset repository into our TransIT workspace. The following section explains the reasons and the process that shaped this decision.

Initially, we ran Superset in the TransIT project from a Docker image, adding a Dockerfile to the back-end repository of the TransIT application. Our goal was "to embed a Superset dashboard into our front-end app without logging in". Another advantage of Superset is that it offers many configuration options that enable exposing a dashboard in another application.

These options belong in the docker/pythonpath_dev/superset_config.py file:

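# Do not add a SameSite attribute to the session cookie
SESSION_COOKIE_SAMESITE = None

# Trust X-Forwarded-* headers set by a reverse proxy
ENABLE_PROXY_FIX = True

# Give the Public role the same permissions as Gamma
PUBLIC_ROLE_LIKE_GAMMA = True

# Show the "Embed dashboard" option on dashboards
FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True
}

# CORS_OPTIONS is only applied when ENABLE_CORS is True
ENABLE_CORS = True

CORS_OPTIONS = {
    "supports_credentials": True,
    "allow_headers": ["*"],
    "resources": ["*"],
    # Origins allowed to call Superset from the browser
    "origins": ["http://localhost:8088", "http://localhost:8888"]
}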

Unfortunately, when we added these options to our Dockerfile setup, they did not work as expected: we could not see the features that allow embedding the dashboard into an external app. That is why we decided to fork the Apache Superset repository and run Superset from an image built from this fork.

The differences from the original repository are:

  • an updated docker/pythonpath_dev/superset_config.py;
  • removed 'GitHub Actions' files;
  • removed 'workflows' files;
  • an updated README that replaces the default one and states that this repository is a fork of the original;
  • changes in the Dockerfile and docker-compose to add the transit/superset networks;
  • support for loading default data from an unpacked exported ZIP;
  • setting the env variable SUPERSET_LOAD_EXAMPLES=no in docker/.env-non-dev: this option controls whether Superset loads its default examples;
  • setting SUPERSET_FEATURE_EMBEDDED_SUPERSET=true in docker/.env-non-dev: this option enables the feature for embedding dashboards (see the section on embedding a Superset dashboard into an external app);
  • more details about what was changed are available in this PR.
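
Superset's configuration also picks up feature flags from environment variables whose names start with SUPERSET_FEATURE_, which is why SUPERSET_FEATURE_EMBEDDED_SUPERSET=true has the same effect as the FEATURE_FLAGS entry in superset_config.py. The sketch below imitates that mapping to show how a variable name becomes a flag; the helper name feature_flags_from_env and the exact set of accepted "true" strings are our own assumptions rather than Superset's code.

```python
import re
from typing import Dict, Mapping

PREFIX = "SUPERSET_FEATURE_"
# Strings treated as "on" (assumption; Superset's own parser accepts a similar set)
TRUTHY = {"1", "true", "t", "yes", "y", "on"}


def feature_flags_from_env(environ: Mapping[str, str]) -> Dict[str, bool]:
    """Hypothetical helper: turn SUPERSET_FEATURE_* variables into feature flags."""
    flags = {}
    for key, value in environ.items():
        # Only the prefix followed by at least one word character counts as a flag
        if re.match(rf"^{PREFIX}\w+", key):
            flag_name = key[len(PREFIX):]
            flags[flag_name] = value.strip().lower() in TRUTHY
    return flags


if __name__ == "__main__":
    # Values as they would appear in docker/.env-non-dev
    env = {
        "SUPERSET_FEATURE_EMBEDDED_SUPERSET": "true",
        "SUPERSET_LOAD_EXAMPLES": "no",
    }
    print(feature_flags_from_env(env))  # {'EMBEDDED_SUPERSET': True}
```

SUPERSET_LOAD_EXAMPLES is ignored here because it is not a feature flag; it only controls whether example data is loaded.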

Superset options that enable embedding dashboards

After updating the forked repository and following the steps described in 'Local instance for local development', Superset provided the options required to embed a dashboard. Thanks to this, we could focus on our goal: creating visual reports and showing them in the TransIT application.

Superset dashboard page: on the left there is a 'three dots' button that opens the context menu, and the 'Embed dashboard' option should be visible after setting the options mentioned above. This feature is not available by default.

How to set up a local instance for local development

The following steps will guide you through the process of setting up a local instance:

  1. Clone the repository from your fork or the original repository.
  2. Install all necessary requirements for Superset if they are not present in your local environment (node, npm, docker, docker-compose).
  3. Build an image from this repository (in the future, this image will be published, for example, on Docker Hub):
    docker build --target lean -t transit/superset:latest .
  4. To integrate with the TransIT app, create two Docker networks with the following commands:
    docker network create transit-network
    docker network create superset-network
    The superset-network network is used for integrating Superset dashboards embedded in front-end applications.
  5. Run docker-compose up.
  6. Superset should now be available on your local machine on port 8088.
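
Because `docker network create` fails when a network with the same name already exists, repeating step 4 on a machine that already has the networks produces an error. The following is a small Python sketch, assuming the Docker CLI is on the PATH, that creates only the missing networks; the script and its function names are our own illustration, not part of the Superset repository.

```python
import subprocess
from typing import Iterable, List, Set

# Network names taken from the steps above
REQUIRED_NETWORKS = ("transit-network", "superset-network")


def networks_to_create(existing: Set[str], required: Iterable[str] = REQUIRED_NETWORKS) -> List[str]:
    """Return the required networks that do not exist yet, in order."""
    return [name for name in required if name not in existing]


def existing_networks() -> Set[str]:
    """List the names of the Docker networks on this machine."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def main() -> None:
    for name in networks_to_create(existing_networks()):
        # Only missing networks are created, so re-running the script is safe
        subprocess.run(["docker", "network", "create", name], check=True)
        print(f"Created network {name}")


if __name__ == "__main__":
    main()
```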

How to set up a dev instance using an AWS EC2 instance

Some steps are the same as the ones above. The additional steps cover setting up the EC2 instance and securing it by changing the Admin password:

  1. Create an AWS EC2 Ubuntu instance for deploying Superset.
  2. Connect to the AWS EC2 Ubuntu instance via SSH.
  3. Clone the repository from your fork or the original Apache repository.
  4. Install all necessary requirements on your server, such as docker, docker-compose, npm and node.
  5. Build an image from this repository (in the future, this image will be published, for example, on Docker Hub):
    docker build --target lean -t transit/superset:latest .
  6. To integrate with the TransIT app, create two Docker networks with the following commands:
    docker network create transit-network
    docker network create superset-network
    The superset-network network is used for integrating Superset dashboards embedded in front-end applications.
  7. Run docker-compose up -d (to run in the background).
  8. Superset should now be deployed and running on port 8088 on your instance.
  9. An Admin user is created with default credentials. Change the password as follows:
    docker exec -it superset_app /bin/bash
    superset fab reset-password --username admin --password {yourpassword}
  10. Go to "{instance_http_address}/login/" and sign in as Admin using the credentials updated in the previous step.
  11. Store the Admin credentials in a safe place with restricted access.
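
Before step 9, it can help to wait until the container has finished starting. Superset serves a /health route that answers with HTTP 200 once the web app is up, so a small poller like the one below can be used. It is a sketch: the function names, the 30 attempts and the 10-second delay are arbitrary assumptions.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def wait_until_healthy(
    url: str,
    check: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay_seconds: float = 10.0,
) -> bool:
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if check(url):
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Replace localhost with your instance address when running remotely
    healthy = wait_until_healthy("http://localhost:8088/health")
    print("Superset is up" if healthy else "Superset did not become healthy in time")
```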

What the Superset application looks like

In this section, we show what the Superset application looks like and briefly describe the most important features for creating reports for the TransIT project. This matters when you create the first reports for your app; later on, you can reuse them (see the section on exporting and importing results below).

Login page of the Superset application.

Main page of the Superset application after logging in. The top bar shows the most important options for creating reports: SQL, Datasets, Charts and Dashboards.

List of users, available by clicking 'Settings' in the top left and going to 'List users'.

SQL Lab feature, which allows you to experiment with queries that you would like to use as a dataset in the 'Datasets' feature.

Datasets feature, where we define the sources on which reports are based. To access a particular source from the database, go to 'Settings' and then 'Database Connections'.

'Database Connections' feature, where you can add or edit database connections.

Modal for adding a new connection in 'Database Connections'.

Modal for editing a dataset in the 'Datasets' feature. As you can see, we decided to use a 'Virtual (SQL)' dataset. Below it there is a SQL editor for the query that serves as the source for reports. Thanks to this approach, we could modify our dataset using SQL.

List of charts defined in the 'Charts' feature. These charts are based on the 'ShipmentDetails' dataset.

Example of a chart (bar chart) definition in Superset.

The TransIT dashboard defined in Superset. It consists of three charts based on the ShipmentDetails dataset.

How to export and import the work done in the Superset application

Superset enables us to export our work and reuse it in another build or instance by importing it. We can create the target dashboard with all the charts needed to visualize data in the local environment, and export it as a ZIP file.

The only thing that needs to be configured after importing is the database connections: for security reasons, the password and host must be updated to gain access to your imported dataset.

You can export datasets, charts and dashboards. If you want to have everything in one place, exporting the dashboard is the best option, because the dashboard export also contains its charts and datasets.
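
Database passwords are not part of the export. If you import through Superset's REST API (POST /api/v1/dashboard/import/), they can be supplied in a passwords field: a JSON map keyed by each database config path inside the ZIP, for example {"databases/MyDatabase.yaml": "my_password"}. The sketch below builds that map from an exported ZIP. It assumes all YAML files sit under a single top-level export folder and that one password fits all connections; the function names are ours.

```python
import io
import json
import sys
import zipfile
from typing import Dict, List


def database_configs(zip_bytes: bytes) -> List[str]:
    """List database config files in an export, relative to its top-level folder."""
    paths = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumed layout: "<export_folder>/databases/<database>.yaml"
            _, _, relative = name.partition("/")
            if relative.startswith("databases/") and relative.endswith(".yaml"):
                paths.append(relative)
    return sorted(paths)


def passwords_payload(zip_bytes: bytes, password: str) -> Dict[str, str]:
    """Map every database config path to a password (one shared password, for brevity)."""
    return {path: password for path in database_configs(zip_bytes)}


if __name__ == "__main__":
    # Usage: python export_passwords.py dashboard_export.zip
    with open(sys.argv[1], "rb") as handle:
        print(json.dumps(passwords_payload(handle.read(), "<db password>")))
```

If your connections use different passwords, fill in the values per path by hand instead of passing one shared password.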

When you hover over the row with the dashboard, icons appear under 'Actions'. Click the 'Export' icon. The export is triggered, and once it finishes you should get a ZIP file. This ZIP can be used as the default source for Superset; see below how this is done in the TransIT application.

Moreover, in our Superset fork repository we push the unpacked contents of this ZIP into 'transit_data'. This is helpful when deploying to a particular instance: the dashboards, charts and datasets from the ZIP are available right after such a Superset application is deployed. Thanks to this, we saved a lot of time otherwise spent repeating the same actions when creating or updating Superset instances.
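
One way to unpack the export into a repository folder such as 'transit_data' is to drop the top-level folder (named after the export) that Superset puts in the ZIP. The following sketch does that with the standard zipfile module and refuses entries that would escape the target folder. The script, its function names and the assumed single top-level folder are illustrative; how our fork then loads these files is not shown here.

```python
import io
import sys
import zipfile
from pathlib import Path, PurePosixPath
from typing import Dict


def strip_export_root(zip_bytes: bytes) -> Dict[PurePosixPath, bytes]:
    """Map each file in an export ZIP to its path without the top-level export folder."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            relative = PurePosixPath(*PurePosixPath(info.filename).parts[1:])
            # Skip files at the archive root and refuse paths that climb upwards
            if not relative.parts:
                continue
            if ".." in relative.parts:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            files[relative] = archive.read(info)
    return files


def unpack_export(zip_path: Path, target_dir: Path) -> int:
    """Write the export's files under target_dir (e.g. transit_data) and return the count."""
    files = strip_export_root(zip_path.read_bytes())
    for relative, content in files.items():
        destination = target_dir / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_bytes(content)
    return len(files)


if __name__ == "__main__":
    # Usage: python unpack_export.py dashboard_export.zip transit_data
    count = unpack_export(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"Extracted {count} files")
```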

How to embed a Superset dashboard into a front-end React application

After completing the steps described in the sections above, only a few more steps are needed to display reports from Superset in your app:

  1. Go to the dashboard that you want to embed.
  2. On this page, on the left, there is a 'three dots' button that opens the context menu. The 'Embed dashboard' option should be visible after completing the previously described instructions; this feature is not available by default.
  3. Click this option and a modal appears. Click 'Activate'; you should then see the dashboard's uuid. You can also deactivate embedding here. This uuid is needed in the next steps in the front-end React app.

‘Embed Dashboard’ option

Embed dashboard modal with the uuid that will be used in the front-end React application.

4. The next step is to create a new user, which will be needed to obtain a guest token automatically on the back-end side. What matters are its roles: 'Public' and 'Gamma'. You can create a new user with these roles via 'Settings' -> 'List users' -> 'add new User'.

New user that will be used on the back-end side to access the dashboard.

5. On the back-end side, you have to add a mechanism to access the dashboard with a guest token. The best solution is to add an endpoint to your back-end app that returns a guest token; this is what we did in the TransIT project. (A guest token is a short-lived token issued by Superset that gives access only to the resources listed in it, such as one embedded dashboard.) This endpoint should consist of the following parts:

  • Obtain an access token by calling the Superset endpoint '/api/v1/security/login'. This requires the admin user's credentials. Below is an example payload for obtaining the access token:

{
    "password": {SUPERSET_ADMIN_PASSWORD},
    "provider": "db",
    "refresh": True,
    "username": {SUPERSET_ADMIN},
}

  • Use the access token in the next API call, '/api/v1/security/guest_token/', by adding it to the request header as a Bearer token. You have to pass the username of the guest user defined above, and the dashboard uuid must be included in the request payload. Below is an example payload for obtaining a guest token:

{
    "user": {
        "username": {SUPERSET_USERNAME_DASHBOARD},
        "first_name": {SUPERSET_FIRSTNAME_DASHBOARD},
        "last_name": {SUPERSET_LASTNAME_DASHBOARD}
    },
    "resources": [{
        "type": "dashboard",
        "id": {embedded_id}
    }],
    "rls": []
}

  • Return the guest token from your guest token endpoint.
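
The three parts above can be sketched in Python with only the standard library. This is a minimal illustration under our own assumptions (function names, a 10-second timeout, placeholder first and last names for the guest user), not the TransIT back-end code; in a real service the admin credentials would come from environment variables or a secret store.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def post_json(url: str, payload: Dict[str, Any], access_token: Optional[str] = None) -> Dict[str, Any]:
    """POST a JSON body to Superset and decode the JSON response."""
    headers = {"Content-Type": "application/json"}
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def guest_token_payload(username: str, first_name: str, last_name: str,
                        dashboard_uuid: str) -> Dict[str, Any]:
    """Request body for /api/v1/security/guest_token/ (same shape as the example above)."""
    return {
        "user": {"username": username, "first_name": first_name, "last_name": last_name},
        "resources": [{"type": "dashboard", "id": dashboard_uuid}],
        "rls": [],
    }


def fetch_guest_token(superset_url: str, admin_username: str, admin_password: str,
                      guest_username: str, dashboard_uuid: str,
                      guest_first_name: str = "Guest", guest_last_name: str = "User") -> str:
    """Log in as admin, then exchange the access token for a dashboard guest token."""
    login = post_json(f"{superset_url}/api/v1/security/login", {
        "username": admin_username,
        "password": admin_password,
        "provider": "db",
        "refresh": True,
    })
    # If Superset rejects the next call with a CSRF error, fetch
    # /api/v1/security/csrf_token/ first and send it as X-CSRFToken together
    # with the session cookie; that step is omitted in this sketch.
    result = post_json(
        f"{superset_url}/api/v1/security/guest_token/",
        guest_token_payload(guest_username, guest_first_name, guest_last_name, dashboard_uuid),
        access_token=login["access_token"],
    )
    # This is the value your back-end endpoint returns to the front end
    return result["token"]
```

A call would look like fetch_guest_token("http://localhost:8088", admin_user, admin_password, "dashboard_guest", "<embedded uuid>"), where "dashboard_guest" is a hypothetical username for the guest user created in step 4.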

6. The next step concerns the front end. Add the 'embed dashboard' package (@superset-ui/embedded-sdk) to your React application; it is licensed under Apache-2.0.
7. After installing this package in your React application, call its embedDashboard function in the component where you want the Superset reports to be displayed, passing the following parameters:

  • id: the uuid of the embedded dashboard;
  • supersetDomain: the Superset host;
  • mountPoint: the place in your app where the Superset dashboard will be mounted; it should be a <div> HTML element;
  • fetchGuestToken: a function that calls the endpoint in your back-end API that returns a guest token;
  • dashboardUiConfig: additional optional configuration to show or hide some elements of the embedded dashboard.
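
fetchGuestToken needs a back-end route to call. The following is a minimal standard-library sketch of such an endpoint; the route path, port and the injected get_token callable are hypothetical, and a real back end would plug in its own framework and the guest-token flow from step 5.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Type

# Hypothetical route that the front end's fetchGuestToken would call
GUEST_TOKEN_PATH = "/api/superset/guest-token"


def token_body(token: str) -> bytes:
    """Serialize the guest token as the JSON body returned to the browser."""
    return json.dumps({"token": token}).encode("utf-8")


def make_handler(get_token: Callable[[], str]) -> Type[BaseHTTPRequestHandler]:
    """Build a handler that answers GET GUEST_TOKEN_PATH with a fresh guest token."""

    class GuestTokenHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != GUEST_TOKEN_PATH:
                self.send_error(404)
                return
            body = token_body(get_token())  # obtain a new token on every request
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return GuestTokenHandler


if __name__ == "__main__":
    # Replace the lambda with a real call to Superset; "dummy-token" only tests the wiring
    handler = make_handler(lambda: "dummy-token")
    ThreadingHTTPServer(("127.0.0.1", 8000), handler).serve_forever()
```

On the React side, fetchGuestToken would then request this route and return the token field of the response. If the React app is served from a different origin than this endpoint, the response also needs CORS headers, which the sketch leaves out.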

8. You should now see the Superset dashboard in the React app on your local/dev environment.

This is what the embedded dashboard looks like in the TransIT app (TransIT dev instance) after completing all the steps described in this article.

How to configure MapBox in Superset

MapBox is an American provider of custom online maps for websites and applications. To visualize maps, you need an API token from MapBox; otherwise, you will only see dots on your map, without the outlines of countries and continents.

In the TransIT project, we had to deliver a report based on points on a map representing places where drivers delivered shipments. Based on the requirements, we decided to use the 'deck.gl Scatterplot' visualization type. Its map engine turned out to be MapBox, and it was quite surprising when we suddenly saw dots on an empty map. To get a map with countries and continents, you have to go to the MapBox website, register an account and get an API key. Moreover, free use of MapBox is limited.

The MapBox free tier covers:

– up to 25,000 monthly active mobile users,

– up to 50,000 web map loads per month.

That was quite surprising for us, and we started thinking about how to solve this problem, since it prevented us from creating a fully open-source application. In the next stage of the TransIT project, we will probably either add a custom visualization based on OpenStreetMap maps to our fork repository or implement such a visualization at the React app level.

'Delivery points' chart in Superset without a MapBox API key.

However, to proceed with the MapBox configuration, you have to add this API key to your Superset instance by updating the file docker/pythonpath_dev/superset_config.py.

The API key goes into the MAPBOX_API_KEY setting in that file. After updating the Superset instance, you should see points on a map with countries and continents in the 'deck.gl Scatterplot' visualization.
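
A minimal sketch of that setting in superset_config.py, assuming you prefer to pass the key through an environment variable (for example a MAPBOX_API_KEY=... line in docker/.env-non-dev) rather than hard-code it in the repository:

```python
import os

# Superset reads the Mapbox token from the MAPBOX_API_KEY setting.
# Taking it from the environment keeps the key out of the repository;
# an empty string means map charts render without the base map.
MAPBOX_API_KEY = os.environ.get("MAPBOX_API_KEY", "")
```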

Challenges

  • Setting up local instances based on the fork repository. Many minor changes were required to get Superset working in the local environment. Developers also need to consider factors such as using the proper versions of npm and node to avoid problems with the Superset UI; our team faced many node/npm problems while setting up the local version of Superset.
  • MapBox issue. As described above, there was a problem with blank maps. It turned out to be a serious issue because it breaks the idea of fully open-source software. To handle it, we need to discuss possible solutions with our client, and we have already started investigating alternative approaches.
  • Problems with access for the 'Public'/'Gamma' roles on the dev server, caused by an internal problem on the Superset side. It resulted in a 403 (Permission Denied) error on the TransIT front-end side, so we could not see the charts. We had to update the list of permissions available to the 'Gamma' role, after which everything worked as expected on the dev instance. If you run into a similar issue, check the permissions available to the 'Public'/'Gamma' roles. You can also look into this GitHub issue for more information on handling similar display errors with Superset dashboards in external apps.

Possible changes in the near future

  • Adding a new custom visualization for the 'Delivery points' chart, probably in our Superset fork in the TransIT GitHub workspace.
  • Publishing a Docker image of our Superset fork from the TransIT GitHub workspace.
  • Improving the EC2 instance where Superset for the dev TransIT app is deployed, e.g. by adding a domain.

Conclusions

Superset is a powerful BI solution. The biggest advantages are:

  • open-source software;
  • it is easy to explore your queries and turn them into data sources for charts;
  • a non-technical person can work with the data provided by a dataset once the tool is set up.

There are some drawbacks, such as:

  • MapBox as the engine for map visualizations;
  • fairly complex deployment and configuration when setting up the software;
  • a limited number of visualizations;
  • long response times to issues raised by the community on the Apache Superset GitHub pages.

Generally speaking, Superset turned out to be a good choice, apart from the 'Delivery points' chart. When our organization handles similar tasks in other projects, setting up this software for report visualization should not be a problem, thanks to the experience and knowledge we gained in the TransIT project.