This morning you almost dropped your cup of coffee when you learned that your data scientist was not talking about her holidays but about massive databases.
Yes, sad news. We won’t discuss travel today either. Instead, we will look at big data storage, to help you manage all the spreadsheets, user information and papers that fly around your office.
A data lake is nothing like a giant puddle of water. It is far more useful to your company than that. If you have reached a saturation point with approximate data, it may well interest you. Your data scientists may be good, but without the appropriate tools, they can lack precision.
A data lake works as cloud storage where you can offload your raw, semi-structured, and structured data. In a data lake, you can collect data from all sources. It is a single store for your CRM data, customer purchases, or even accounting spreadsheets.
A data lake in itself is not a work tool in which you process your data. It is closer to plain cloud storage. Big data needs to be simplified and reviewed, but a data lake is not the appropriate place to do so. Data warehouses are better suited to this task.
Whether you need to store your accounting spreadsheets to set up your budget or you want to generate new leads from your current customer database, you need to have that data within easy reach. A data lake combines security with accessibility.
A data lake is one of the best solutions for storing your data. It is the one step that stands between you and your processed data. But why do you need it?
The answer depends on the size of your company. A hard disk and a spreadsheet can be enough for a small company, but they might not cope if you plan to grow your business.
Whether your data is raw or structured, the processing pattern is the same. Your data is stored so that it can be classified and organized, which makes it accessible to business professionals. But how?
After you have loaded your data into a data lake, it needs to be cleaned so that you can use it for business purposes. Cloud storage often contains errors, useless information and inappropriate data. These need to be removed so that your analysis can be trusted.
To do so, you can use a processing tool such as Google BigQuery. BigQuery, Google's data warehouse, is serverless. It lets you clean and structure your data.
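To make the idea of cleaning more concrete, here is a minimal sketch in plain Python. It is not BigQuery code and not a description of how BigQuery works internally. The sample rows, the field names (`customer_id`, `email`, `amount`) and the `clean_rows` helper are all hypothetical, chosen only to illustrate removing duplicates, incomplete rows and invalid values.

```python
# Hypothetical raw rows, as they might land in a data lake from a CRM export.
raw_rows = [
    {"customer_id": "001", "email": " Ana@Example.com ", "amount": "120.50"},
    {"customer_id": "001", "email": "ana@example.com", "amount": "120.50"},  # duplicate
    {"customer_id": "002", "email": "", "amount": "80"},                     # missing email
    {"customer_id": "003", "email": "li@example.com", "amount": "n/a"},      # invalid amount
    {"customer_id": "004", "email": "sam@example.com", "amount": "42"},
]


def clean_rows(rows):
    """Return rows with normalized emails, numeric amounts and no duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize the email so that equivalent values compare equal.
        email = row["email"].strip().lower()
        if not email:
            continue  # drop rows with no usable email
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        key = (row["customer_id"], email, amount)
        if key in seen:
            continue  # drop exact duplicates after normalization
        seen.add(key)
        cleaned.append(
            {"customer_id": row["customer_id"], "email": email, "amount": amount}
        )
    return cleaned


cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

In a data warehouse, the same rules would typically be expressed as queries over tables rather than as a loop, but the logic is the same: decide what counts as invalid, then filter it out before analysis.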
With its machine learning features, BigQuery lets you automate the classification of your data. If you are used to working with Google tools, it may be the best choice for you. Keep in mind: data lakes and data warehouses are complementary. They are not designed for the same purpose. A data lake will never replace a data warehouse.
How much does a data lake cost? It depends. What a silly way to answer your question, you may think. Well, as a general rule, the more queries you make, the more you pay. Let’s dig into the pricing of three data lakes.
As the grid below shows, prices vary with the storage option you choose. The calculator on Azure’s website estimates the cost per GB of your storage, with values adjusted according to your location and currency. Depending on the tier you subscribe to, you can access your data more or less often.
Storage volume | Premium | Hot | Cool | Archive |
For the first 50 TB/month | $0.18 per GB | $0.0184 per GB | $0.01 per GB | $0.002 per GB |
For the next 450 TB/month | $0.18 per GB | $0.0177 per GB | $0.01 per GB | $0.002 per GB |
For an additional 500 TB/month | $0.18 per GB | $0.0169 per GB | $0.01 per GB | $0.002 per GB |
Source: https://azure.microsoft.com/en-us/pricing/details/data-lake-storage-gen1/
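To show how the grid above turns into a monthly bill, here is a small Python sketch using the Hot and Archive columns. It rests on several assumptions that the table itself does not confirm: that the tiers are graduated (each slice of data is billed at its own tier's rate), that the third row applies to everything beyond the first 500 TB, and that 1 TB equals 1,024 GB. The function name `monthly_storage_cost` is hypothetical, and the result covers storage only, not queries or data transfer.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in TB, price per GB) taken from the grid above.
# None means "everything beyond the previous tiers" (an assumption for the last row).
HOT_TIERS = [(50, 0.0184), (450, 0.0177), (None, 0.0169)]
ARCHIVE_TIERS = [(50, 0.002), (450, 0.002), (None, 0.002)]


def monthly_storage_cost(stored_tb, tiers):
    """Estimate the monthly storage cost in USD, assuming graduated tiers."""
    remaining_gb = stored_tb * GB_PER_TB
    total = 0.0
    for size_tb, price_per_gb in tiers:
        if remaining_gb <= 0:
            break
        if size_tb is None:
            billed_gb = remaining_gb  # last tier absorbs whatever is left
        else:
            billed_gb = min(remaining_gb, size_tb * GB_PER_TB)
        total += billed_gb * price_per_gb
        remaining_gb -= billed_gb
    return total


for volume_tb in (10, 60, 600):
    hot = monthly_storage_cost(volume_tb, HOT_TIERS)
    archive = monthly_storage_cost(volume_tb, ARCHIVE_TIERS)
    print(f"{volume_tb} TB: hot ${hot:,.2f}, archive ${archive:,.2f}")
```

Under these assumptions, the gap between tiers comes mostly from the per-GB rate rather than from the volume discounts, which is why the choice of access tier matters more than the exact volume bracket.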
Data processing is not easy. Your IT team handles it so that you and your business teams receive data that is processed and ready to use.
A data lake is designed to store raw data. On its own, it does not help your board, because it stores all sorts of data from every source you can think of. It is of little real value to you until that data is processed in a data warehouse. To sum things up:
Remember: your databases are worth analyzing, but first they need to be stored. Do you have other questions on this topic? Don't hesitate to contact our dedicated team; we'll be pleased to help you.
By Emma Jeanpierre
03 Jan, 2022