There was a time when setting sail to known or unknown destinations was a journey with an uncertain outcome, to say the least. For a lot of watery places (think lakes, seas, oceans) this is still the case nowadays, and quite a few of the expressions we use in our daily language refer to what could happen in those unpredictable scenarios. One of these expressions is ’rounding Cape Horn’, meaning successfully finishing a grueling task. Key is that something exciting is about to happen, for sure not easy, and a strong belief in a positive outcome urges the brave to try it anyway (often multiple times before succeeding).
Well, that is maybe a bit of a long introduction to say we are on a similar journey with Dataether, and we believe we successfully rounded our first really challenging cape: growing bigger with the data in a real-life use case. Probably one of a few more capes to come.
So, what is going on? In the Proof of Concept we are running, we passed the 21 million documents mark in the main data collection. We are close to a 70 GB database size (and growing with every scan we run) with a current total of 36 collections: a few big ones with source scan data, the others with derived and aggregated data to speed up analytics, including integrated full text search. Oh, and the data gets enriched too, using both default and custom(er)-defined rules to make it both actionable (data lives!) and traceable for compliance purposes.
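To give a feel for how those derived collections come about, here is a minimal sketch using the MongoDB driver for Python. The collection and field names (scan_documents, scan_stats_by_type, filetype, size) are made up for illustration and are not our actual schema; the pattern is simply an aggregation pipeline that writes its results to a separate collection with $merge, so analytics queries never have to plough through the big source collection:

```python
from pymongo import MongoClient

# Placeholder Atlas connection string and database name.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
db = client["dataether"]

# Roll the raw scan documents up into a small, query-friendly collection.
# $merge writes the results into a separate collection that analytics
# queries can hit directly.
db["scan_documents"].aggregate([
    {"$group": {
        "_id": {"scanid": "$scanid", "filetype": "$filetype"},
        "documents": {"$sum": 1},
        "totalSize": {"$sum": "$size"},
    }},
    {"$merge": {"into": "scan_stats_by_type", "whenMatched": "replace"}},
])
```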
With the expansion of the data we also see the need for additional or different indexes to speed up end user queries, and the good thing about the MongoDB Atlas application data platform is that it helps us understand the data better while work is in progress. As an example I’m adding some screenshots of the Performance Advisor. As you can see, it provides useful recommendations to improve performance and save resources.
In this case the Performance Advisor suggests schema improvements: adjusting the document structure and inspecting the document size to improve database performance.
I’ll follow up on these schema improvements later, but first I want to focus on the index recommendations. And look: with the introduction of the new scanid_checksum compound index (we need it now for quick document lookups based on checksum over multiple scans), another index that already existed became redundant.
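For reference, creating such a compound index from a driver is a one-liner. Below is a sketch with the Python driver; the connection string, database and collection names are placeholders, not our real ones:

```python
from pymongo import MongoClient, ASCENDING

# Placeholder connection string, database and collection names.
coll = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["dataether"]["scan_documents"]

# Compound index for quick lookups by checksum within and across scans.
# The default index name becomes scanid_1_checksum_1.
coll.create_index([("scanid", ASCENDING), ("checksum", ASCENDING)])
```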
Getting rid of the scanid index does not affect performance, because scanid is the leading (prefix) field of the new compound index, so queries that filter on scanid alone can still use it. Let’s hit the [ DROP INDEX ] button to see how this works.
The window that pops up informs us how we can remove the index either in the Atlas portal, by using the MongoDB Shell, or through the MongoDB drivers. If you don’t feel like dropping the index directly, you can also hide it first, as the ( TIP ) explains. Should you want to revert the change, just unhide the index; it does not need to be rebuilt in case you decide to keep it after all. A nice addition that makes life even more developer-friendly, in my opinion. To stay aligned with the marine terminology in this post: the modern platform we run on enables us to focus on shipping features (and yes, we scrub the deck too).
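For completeness, this is roughly what the Shell/driver route looks like. Again a sketch with the Python driver and placeholder names, assuming the redundant index is called scanid_1 and lives in a collection called scan_documents:

```python
from pymongo import MongoClient

db = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["dataether"]
coll = db["scan_documents"]

# Option 1: drop the now-redundant single-field index outright.
coll.drop_index("scanid_1")

# Option 2 (instead of dropping): hide it first. A hidden index is still
# maintained but ignored by the query planner, so unhiding it later does
# not require a rebuild.
db.command("collMod", "scan_documents", index={"name": "scanid_1", "hidden": True})

# Changed your mind? Unhide it again.
db.command("collMod", "scan_documents", index={"name": "scanid_1", "hidden": False})
```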
As we progress on our journey, we will definitely round more capes and encounter a few other challenges. Luckily, we have the early-warning assistance of the intelligent Atlas platform we are building our solution on.
Writing this post reminded me of a quote from a famous teacher at Larenstein, where I spent most of my student life. It came down to: don’t mess around on board a dredging vessel, or “flikkert de schipper je zo in het majem” (“the skipper will chuck you straight into the drink”) 🌊🌊🌊. Just so you know 🤓