How data analytics works in 2021 and how to profit from it
Data-driven technologies. Is magic or craft more important? Honest interview with the head of the Big Data team at Creative Dock.
Big data, machine learning, AI, NLP and data science. Concepts shrouded in mystery that become annoying buzzwords before most people can understand them. To prevent this from happening to you, Hynek Jina, the head of the Big Data team at Creative Dock, demystifies them in our interview and explains how they can all be used in business.
There’s a mysterious data department at Creative Dock that you’re the head of. How would you describe what you actually do? Our industry is a living organism that consists of four parts: Risk (probabilistic decision making), Data Science (magic), Reporting (images), and CRM (understanding the customer). The first area is risk analysis. This means decision-making processes, for example when you want to assess risk or evaluate a loan. We use this on a lot of our projects.
Risk Analysis and search for contexts
Are you analyzing a particular person’s data to see if, for example, a bank should give them a loan? Yes, but it’s not just analyzing, it’s also finding that data. We’re trying to find out something else about that person that companies don’t normally use.
So you’re on Facebook all day. Well, basically yes (laughs). When you have your own business and you apply for a loan through our P2P project Nafirmy, we look at all the classic registers to see if you have any debts or anything like that. But here at Creative Dock we also deliver other information. We call it alternative scoring. We look at other things about your company: whether it has a Facebook page, whether you started it last week, how many followers it has, how active you are, what your website looks like. That’s the first layer, let’s say. But then you can go on and explore what you’re actually saying there, whether there are reviews, and so on.
Well, yes, I know that, I do that as a journalist. But we just do it by machine. Basically, anything you do more than once, we automate. We try to generalize these things.
Automate anything that repeats more than once
But the world is so complicated! A lot of things seem like you do them differently every time, but if you break them down into smaller tasks, you can find the same patterns. So we’re looking for more general things and smaller tasks that are somehow graspable. And then they’re kind of packages that you can put together.
Does automation always pay off? Definitely not always. If you only want to look at one company, it’s easier to look “manually”: see if they have a FB page and what’s on it. But then you don’t have a comparison to the whole, and you want to find differences or points of convergence so you can relate it to other companies. And part of our product is that it doesn’t just download information for you, it interprets it.
Is that how robots now know more about me than my family? We’ll say it’s cool that you have a FB page, and it’s cool that you have this many views. But we can also automatically recognize when people berate you in the comments.
But now we’re talking about you evaluating millions of people. We’re gonna put it into one example. You have a business and you want to take a loan. And we want to know if you’re a good company. If there are reviews about you somewhere, that’s great, it’s easy to interpret. One star is bad, five is good. But then when someone writes a comment, you as a person can see if they’re berating you or praising you. But when there are hundreds of comments, even if we’re only talking about one company, to rate it manually is already quite tedious. Plus, it’s constantly changing over time.
And then machine learning comes in, as an even higher stage. Yes, you can then get to the super-sophisticated machine learning. You train algorithms on data. But then it can become a black box where you don’t even know exactly why it gave that particular result. You don’t want that in risk assessment, because with a bank you need to know why the result is what it is. On the other hand, with Google Translate, for example, you don’t need to know why it gave you that particular sentence. You don’t analyze it. This is the result and nobody knows why. And if you don’t like it, you give it different data so it learns differently. But here at our company, we want it to be understood. Then when somebody says, “Hey, why didn’t you give me credit when I’m actually great?”, we can tell them, “But you’re bad at this.”
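To make the contrast with a black box concrete, here is a minimal sketch of a transparent, rule-based score that returns its reasons along with the number. It is an illustration only, not Creative Dock’s actual model: the signal names, thresholds, and point values are all hypothetical, loosely inspired by the alternative-scoring signals mentioned earlier in the interview.

```python
from dataclasses import dataclass


@dataclass
class CompanySignals:
    """Hypothetical public signals about a company applying for a loan."""
    has_facebook_page: bool
    page_age_days: int
    followers: int
    negative_comment_share: float  # fraction of comments judged negative, 0.0 to 1.0


# Each rule: (condition, points added to the score, human-readable reason).
# All thresholds and point values are made up for illustration.
RULES = [
    (lambda s: not s.has_facebook_page, -20, "no Facebook page found"),
    (lambda s: s.has_facebook_page and s.page_age_days < 30, -15,
     "page created less than 30 days ago"),
    (lambda s: s.has_facebook_page and s.followers < 100, -10,
     "fewer than 100 followers"),
    (lambda s: s.negative_comment_share > 0.5, -25,
     "most comments are negative"),
]


def score(signals: CompanySignals, base: int = 100) -> tuple[int, list[str]]:
    """Return the score together with every reason that lowered it."""
    total = base
    reasons = []
    for applies, points, reason in RULES:
        if applies(signals):
            total += points
            reasons.append(reason)
    return total, reasons


if __name__ == "__main__":
    new_company = CompanySignals(True, page_age_days=7, followers=50,
                                 negative_comment_share=0.1)
    print(score(new_company))
```

Because every deduction carries its own reason, the answer to “why didn’t you give me credit?” can be read straight from the result, which is exactly what a trained black-box model does not offer by default.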
So when Facebook bans me, their people might not even know why they did it? That’s possible. But if you complain, they can look at it and analyze it retroactively.
So user-wise, ML is already affecting everyone today? Absolutely. I was amused when I recently gave the German translator DeepL a quick try. It has now started to beat Google Translate, not in usage but in translation quality. A colleague told me that he put some of his Czech text in there, translated it into English, liked it, then translated it from English back into Czech and thought the result was better Czech than the original (laughs).
I understand that you, as Creative Dock, use these things. But how accessible are these technologies like machine learning for small businesses? It depends. In Denmark, we had a project called Eat Grim. That’s a company that grew up pretty much from nothing, they did it by hand for a couple of months. The principle is that they had crooked fruits and vegetables that didn’t match the big market’s idea of what a cucumber or an apple should look like. When you want a machine to spot a “wrong” crooked banana or vegetable, neural networks are used to do it. For example, you’ve got a conveyor belt that it runs on, a camera, and software that compares: Is it the right color? Shape? Is it ripe?
Data Science is the sexiest thing
So after risk and automation, your next business is what? It’s Data Science, which we mentioned a while ago. That’s the sexiest stuff to write in presentations afterward (laughs). For example, we did the Crash project. For an insurance company, we were identifying parts of a damaged car from a photo and estimating the amount of damage. Or a project for car insurance based on the quality of driving, where we use the position sensors in the mobile phone to find out if the driver is driving safely. But it also includes more common things like the recommendation engine. You already know it from Netflix or YouTube, which will offer you the next thing based on your behavior. That’s something we’re working on right now in a project for Albert, for example. Based on what you buy and your preferences, we can recommend the most suitable products and recipes. But again, there are many layers. Even in such a big and very successful project, the recommendation engine is basically made with the simplest logic. And at Albert, they’re happy that they understand it, that they know what results it gives and why. They don’t want any magic. We’ve already come up with about three levels of how it could be made more robust and more accurate on specific metrics, but for now, the way it works is just fine.
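The interview does not say what the “simplest logic” behind the recommendation engine is. One common simple approach, shown here purely as an assumed illustration, is to count which products appear together in the same basket and recommend the most frequent companions. The function names and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_co_purchase_counts(baskets):
    """Count, for each product, how often every other product shares a basket with it."""
    counts = {}
    for basket in baskets:
        # set() ignores duplicates inside one basket; sorted() keeps pairs deterministic.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, product, k=3):
    """Return up to k products most often bought together with `product`.

    Ties are broken alphabetically so the output is reproducible.
    """
    companions = counts.get(product, Counter())
    ranked = sorted(companions.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    baskets = [
        ["pasta", "tomato sauce", "parmesan"],
        ["pasta", "tomato sauce"],
        ["pasta", "basil"],
        ["bread", "butter"],
    ]
    counts = build_co_purchase_counts(baskets)
    print(recommend(counts, "pasta"))
```

The appeal of logic like this is the one described above: anyone can trace a recommendation back to the baskets that produced it.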
How else have you used Machine Learning? For example, for one client, we were determining if the roof of your house was suitable for solar panels. Or what the payback would be, how much you would have to put in, etc. The idea was that you put in your address, we’d pull up pictures from Google Maps, the land registry, and a few other databases, and say, yes, we’re estimating such and such potential. So you need an investment of maybe half a million, you’ll get a return in seven years and blah blah blah. And all without anyone having to go there.
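The arithmetic behind a figure like “half a million, back in seven years” can be sketched as a simple payback calculation. This is an assumed simplification (no discounting, no energy price changes, no panel degradation), not the method used in the prototype, and the function names are hypothetical.

```python
def simple_payback_years(investment: float, annual_savings: float) -> float:
    """Years until cumulative savings equal the upfront investment."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return investment / annual_savings


def required_annual_savings(investment: float, payback_years: float) -> float:
    """Annual savings needed to recover the investment in the given number of years."""
    if payback_years <= 0:
        raise ValueError("payback_years must be positive")
    return investment / payback_years


if __name__ == "__main__":
    # Figures from the interview: about half a million invested, returned in seven years.
    needed = required_annual_savings(500_000, 7)
    print(f"Implied savings per year: {needed:,.0f}")
    print(f"Payback: {simple_payback_years(500_000, needed):.1f} years")
```

In other words, the quoted numbers imply savings of roughly 71,000 per year in the same currency as the investment.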
But we only had it at the prototype stage and that was the end of it. That’s the toughness of the business. Often you make a move but it doesn’t bring in as much as you expected, so you pause. Sometimes, maybe after a while, it gets pulled out again, or maybe it doesn’t.
So give us a cheerier example. We’ve done something similar on the Refinanso project. When you want to refinance your mortgage, you tell us what kind of property you have, and we say we’re 80% sure you can do it. And that the price of your property is in the range of, say, 6 to 7 million. We do machine analysis of various maps, land registries, and price maps, but also on what level the listings of similar properties in the area are. There’s a lot of things to assess. And as a result, the bank doesn’t have to send an agent out there to price it.
So the advantage of a larger company like Creative Dock is a library of some ready-made solutions? It’s two opposing forces. Every company is always looking for its optimal size. The more experience you have, and the better you document it, the more your know-how grows. The whole evolution can be faster. That’s where we already have a pretty good advantage at Creative Dock. We’ve already tried a lot of things. But the market today is so fast that you’re routinely doing things that nobody in the world has done before, or that you can’t find anywhere anyway. Because the technology today is completely different than it was, say, two years ago. Some of the software you use today didn’t even exist two years ago. So if you’re a company that’s been around for 20 years, that’s not necessarily that big of an advantage. That 20-year-old experience is not very relevant today and it can be a burden on you.
What’s a sexy data project you currently have at Creative Dock? It’s always the sensational stuff that people are talking about right now, and I get it. But a lot of times one person works on something for six months and then it gets put in a drawer because it turns out the business case isn’t that strong. We’re glad we were able to get our hands on it and do some nice stuff, but often more trivial things are more effective. For example, if you automate something and simplify things, a person may only need to check the result. Take our Fairo project. This is a bank in Ukraine where we use various kinds of data matching. We link bank data and customer data. So, for example, when users of the parent bank and users of the product overlap, it simplifies the login process. When you sign up, you don’t have to go through so many steps and answer the same questions again, because your bank already knows those things.
That’s something that’s sorely lacking in our government. Oh, yeah. Because here at Creative Dock, things follow logic (laughs). So you try not to bother people with bullshit. In the free market, if you bother someone, you hurt yourself.