Hello everybody, welcome to The Curated Data Platform. My name is Kevin Feasel.
I'm a Microsoft Data Platform MVP out of Durham, NC, where I have a blog called Curated SQL. The idea of Curated SQL is that I want to try to find and link to five to ten interesting posts per day all across the data platform space. Check that out at curatedsql.com.
So, what we're going to do over the course of this training is answer one question: Which data platform is right for me?
So, according to the website DB-Engines, there are at least 350 separate data platform technologies. I'm not an expert in all 350. You're probably not an expert in all 350. In fact, I don't think it's possible to be an expert in all 350 data platform technologies. That's a little insane to me.
This also runs the gamut from relational databases to analytical columnar databases, document databases, and key-value stores. There's a lot in here. And, if I take you to the DB-Engines website right here, this is a ranking as of June 2021 of the most popular databases. Now, they have a logarithmic scale here for their score of just how popular a particular data platform technology is.
So, for example, at the top we have Oracle, MySQL, SQL Server, PostgreSQL. Those are all relational databases, all interesting technologies. As you move further down, you get other types of technologies, and you see that there's a massive drop-off after the first six or so data platform technologies.
But, there's a lot of interesting stuff below the fold, and what we're going to do throughout this training is: learn a bit about those technologies, try to understand where some of them may fit in and possibly make your environment better as a result.
So, that's the high-level motivation for what we're going to talk about throughout this training. We'll go through several scenarios that I think are pretty realistic in terms of taking a company and expanding out its data utilization and its data storage. We're going to show where different types of data storage make sense, but also where they don't make sense.
We will look not only on premises but also in two of the major clouds: AWS and Azure. In the United States, they're responsible for somewhere around 90% of public cloud utilization, so understand those two platforms and you've got most of the market understood.
And we'll describe some technologies that pull everything together.
I do want to give you a brief warning before we go too far into this and that is, like I said, I'm not an expert on every platform technology. Nobody is an expert on all of them. So, we are going to talk in broad strokes. We're going to give you just the high level, kind of an architect's understanding of the technologies, more than the individual products themselves. I won't spend a lot of time, for example, comparing: here is Oracle, here is SQL Server, here's where this is better than that, and vice versa. That's not the point of this training. The point of the training is more of: Oracle and SQL Server have these characteristics, which differentiate them from MongoDB or Cosmos DB, which have those characteristics. It's about understanding what those characteristics are, what the valuable use cases are for each of them, and where the cases are in which it just goes wrong.
One other thing that I want to talk about for just a moment here: I do want to harp on the fact that you don't have to take on all of the data platform technologies. You don't need to have everything in your toolbox. You don't necessarily even have to understand all of them. Instead, focus on, for example, the ones that you have, because the data platform technology you have is probably a good answer to: which data platform technology is right for me?
If you have a shop running SQL Server or running Oracle or PostgreSQL, that's a viable technology. Even if you're running something like IBM® Db2, that's still a viable technology. I'm not going to tell you: go switch off of that because it's horrible.
But, what we will talk about is: well, here is a viable answer, and here is another viable answer that, if you're building something new, maybe you want to think about.
At the end of the day though, understand that the technologies that you have are the ones that your team will understand the best. They're the technologies that you've written the most code for, they're the technologies that you've vetted the most, and unless you have a serious problem with that technology as it is today, I think it's a fair answer to say: let's stick with the one that got us here. So, I do want to bias you toward the data platform technology you have today, because that's probably going to be, if not necessarily the right technology, a reasonable technology.
And with that, I hope you'll come with me for the rest of this course, where we're going to dive into a lot of different data platform technologies.