Factors Blog
Insights Across All Things B2B Account Intelligence & Analytics
Revenue Marketing: New and Improved
I recently came across an article that placed a great deal of emphasis on getting your definitions right. Of course, ‘defining’ things — roles, processes, objectives — holds plenty of value. From providing clarity and purpose to qualifying breakthrough ideas, a good definition can help teams go a long way in reaching their goals. And yet, even the most precise definitions are bound to change.
With that in mind, this post discusses the elements that define the new and improved Revenue Marketer. In particular, we explore six pillars of Revenue Marketing and highlight the value of data, technology, and organisational alignment in effectively driving revenue growth.
But first, let’s quickly run over the fundamentals of Revenue Marketing.
Like many others, I learned about the term ‘Revenue Marketing’ through Dr. Debbie Qaqish. About 10 years ago, during her transition from a long career in sales to a role in marketing, her CEO sat across her desk and posed a single question: “What are you going to do about revenue?” Long story short, this set off the development of an approach that transforms marketing teams from flowery cost centers into high-performing revenue machines. This approach we've come to know as ‘revenue marketing’.
“Revenue marketing is the combined pillars of strategies, processes, people, technologies, content, and results across marketing and sales that drop leads to the top of the funnel, accelerates sales opportunities through the pipeline, and measures marketing based on repeatable, predictable, and scalable contribution to pipeline, revenue, and ROI”
Phew.
That was a mouthful.
Now don’t get me wrong; this continues to remain the foundation upon which Revenue Marketing is built. But back then, the market looked very different from what it is today. We’ve had major changes that mandate an updated definition of revenue marketing. Accordingly, here are three additional challenges that redefine what it means to be a revenue marketer today.
Challenge #1 - Digital transformation
In 2011, about 150 technologies were available to the marketing industry. Today, that figure stands at an astonishing 7,000. It’s becoming increasingly normal for marketing teams to employ upwards of 30, or even 40, different MarTech products. But digital transformation isn’t just about getting your hands on the hottest new tech toy. Marketers now have to choose between all-encompassing platforms like Salesforce and specialised best-in-class solutions for each use case. The key challenge here is to centralise customer data and orchestrate these platforms to deliver a personalised customer experience.
Challenge #2 - Customer centricity
It's no secret that, as an industry, marketing has been progressing towards customer-centricity. Now more than ever, a firm’s customer experience signals its competitiveness in the market. Again, at the root of this change is digitalisation and technology. Digital customers are in control because your competition is now a single click away from you. Accordingly, identifying and employing the appropriate marketing channels, and distributing relevant content within those channels, becomes a key challenge.
Challenge #3 - Revenue accountability
A 2019 report by Duke University found that 80% of CMOs are under pressure to deliver ROI, revenue, and growth. However, only about a third provide any financial reports as a result of technological inaccessibility and an overall lack of training. Though we have countless programs and platforms to crunch marketing data and derive revenue metrics, they can be a little too inaccessible for marketers without analytical backgrounds to make effective use of.
And so, we arrive at three challenges — each one based to varying extents in data, technology, and alignment — that are driving the new definition of revenue marketing.
The new and improved Revenue Marketer
Teams in leading B2B companies continue to transform themselves from cost centers to predictable and scalable revenue machines. Except now, they have an additional focus on digital transformation, customer-centricity, and revenue accountability. As an outcome, marketing is driving non-linear growth in a world where buyers are averse to direct sales.
Okay - so far, we’ve established our basis for the contemporary definition of revenue marketing. But let’s go even further. Not only are data, technology, and alignment fundamental in defining revenue marketing; they are essential to every capability within every pillar associated with the approach as well.
Strategy
In revenue marketing, strategy involves understanding your team’s readiness for change, aligning your company’s key business initiatives, and most importantly — forming revenue synergy with sales. While a large part of this ‘getting everyone on the same page’ process involves planning, communication, and leadership; technology is playing an increasingly important role as well. Though instinct and qualitative responses can complement strategy, data, metrics, and indicators are crucial ingredients in developing accurate customer profiles and journeys. And as all three merge across sales and marketing, teams require ecosystems that are conducive to a symbiotic, well-aligned workflow. An easily accessible analytics platform (*ahem* Factors.AI) enables sales and marketing folk to speak the same language — revenue.
//Factors.AI is an AI-powered marketing analytics platform that provides critical insights into your marketing activities, decodes customer behaviour, and empowers your marketing team to focus on real strategic decisions. In short - we do all the analytical heavy lifting for you.//
Process
The process pillar isn’t dissimilar to traditional marketing. In general, Process primarily involves campaigns and data. Accordingly, there are two aspects worth highlighting — campaign management and data management.
Campaign management involves executing, tracking, analysing, and measuring digital conversions in terms of business impact. There has been tremendous progress in the MarTech space within each of these functions, not simply to automate the process but to derive detailed insights as well. It’s a similar story with data management. Easy access and insight into your marketing data can make all the difference in the world. Implementing this process could be as simple as consolidating all your data under a single roof or automating any recurring analysis.
//Factors.AI enables your marketing team to consolidate and crunch marketing data from across all your sources - Google, LinkedIn, Facebook, and more. Our integration process is completely code-free as well. In fact, we could have your marketing team onboarded in a single week.//
People
The people pillar consists of broad capacities involving the management of people in and outside of marketing. Stakeholder alignment, resource planning, and talent acquisition are important, but talent management, in particular, is an aspect worth highlighting. A firm can employ all the data and technology in the world, but if the marketing team doesn’t have sound control over these tools, they won't be of much use at all. One way to avoid this issue is to keep things simple.
//Factors.AI is simple by design. Our platform has been tailored to make the user experience very, very intuitive. In fact, our AI-powered analytics platform does all the work behind the scenes, so getting detailed insights into your data becomes as straightforward as a Google search.//
A training program with a specific focus on revenue marketing tools can also go a long way in improving technical fluency and ensuring your team has a good grasp of revenue-oriented data.
Customer
As a revenue marketer, it is important to understand your customer across their entire life cycle. It’s no longer sufficient for marketers to get a customer through the door and call it a day. Revenue marketing encourages you to keep tabs on all the touchpoints a customer goes through. Additionally, a revenue marketer aims to optimize their customer data - not only to improve campaign performance but to access valuable business insights as well. A second aspect that’s closely tied to the customer is content management. The batch and blast approach simply doesn’t make the cut anymore. It’s just as important for content to be relevant to the intended audience as it is for that content to travel through the right channels.
//Multi-touch attribution, End-to-end customer insights, and Automated analysis are but a few of the several features Factors.AI has to offer. When coupled with highly customisable campaign analytics - our platform makes for a very simple, very powerful marketing tool.//
Results
Finally, we arrive at Results. For a revenue marketer, results involve a variety of measures associated with financial outcomes (Shocker!). But it doesn't end there. Along with delivering an impressive ROI, revenue marketers also aim to accurately forecast their revenue. In essence, they construct a marketing machine that drives repeatable, predictable, and scalable revenue. I probably sound like a broken record at this point, but analysing data, utilising the right tools, and ensuring organisational alignment are crucial elements at this stage. Needless to say, sufficient training and practice won’t do any harm either.
//Factors.AI’s explain feature differentiates us from the rest of the game. Along with consolidating your data and performing automated analytics, our AI-powered platform provides actionable insights in a matter of minutes.//
Over the course of this post, we’ve discussed what it means to be a Revenue Marketer today, briefly explored the six pillars associated with revenue marketing, and highlighted the value of utilising data, ensuring alignment, and employing the right tools and technologies. At the end of the day, revenue marketing is a pretty straightforward idea: a well-organised, well-equipped approach that empowers marketing teams to bring in money in a predictable, scalable manner. So as a marketer, the only question left to ask yourself is this:
“What are you going to do about revenue?”
Intuition can only take us so far: Fun with Factors (Part 2)
Continuing with our series on “Fun with Factors” (please find the first part here), we had another session on “Intuition can only take us so far”, wherein we discussed how non-intuitive concepts such as irrational numbers are very much real. Furthermore, we established the importance of grounding ideas to their bare-bones structure, lest we confuse ourselves and fall into paradoxes.
The Irrational Route
For a number to be rational is to possess the ability of being expressed in the form of a fraction -- or the well-known p-by-q (p/q). Now, just for completeness, recall that ‘p’ and ‘q’ should be integers. And ‘q’ should be non-zero.
That said, is it not easy to see that every number is rational? What’s the big deal? Wait, prepare to be challenged! You need to prove (or disprove) that the square root of 2 (i.e., √2) is a rational number. Oh, I heard you! You say √2 is an "imaginary" concept with no practical existence. Smart; you took the challenge to another level! So let’s first see what √2 looks like, and how it’s very real!
Take a square piece of cloth ABCD, each side of which measures 1 m. Now cut it into two pieces along one of its diagonals (say, AC). What you get are two right-angled triangles ABC and A’DC’. Let’s take one of them -- ABC. How much do its sides measure? We know AB = 1 m and BC = 1 m; but AC = ?.
Following Pythagoras’ advice, we could compute AC = √(AB² + BC²) = √(1+1) = √2. Bingo! We have a triangular cloth with one side measuring √2 metres. But you might object! “Why √2? I used a ruler and measured it to be 1.414 m.” Are we in a fix? Not yet. Analytically, we have AC = √2, but on measuring it using a ruler, we get 1.414. One can deduce that the value of √2 is 1.414. That is a smart move because if you could prove that, you would have √2 = 1.414 = 1414/1000, a rational number indeed! Let us see.
So what sorcery is this entity called √2? Simply speaking, it’s the number whose square should be 2. So, we should expect the square of 1.414 to be 2. Alas! It turns out that 1.414² = 1.999396, a little short of 2, isn't it?
Never mind, you procure a better ruler with more precise scale markings and measure the diagonal side of the cloth (AC) to be 1.41421356237 m. But on squaring it, we get 1.41421356237² = 1.9999999999912458800169, again, short of 2.
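The two squarings above can be checked with exact arithmetic. Here is a small sketch using Python's standard `fractions` module, so that no floating-point rounding sneaks in; the helper name `square_of` is ours and purely illustrative.

```python
from fractions import Fraction

def square_of(decimal_text):
    """Return the exact square of a decimal written as text, e.g. "1.414"."""
    value = Fraction(decimal_text)  # "1.414" becomes exactly 1414/1000
    return value * value

for reading in ("1.414", "1.41421356237"):
    squared = square_of(reading)
    # Fractions are compared exactly, so a positive gap means strictly short of 2
    print(reading, "squared falls short of 2 by", 2 - squared)
```

Both gaps come out positive, however many digits the ruler provides.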
The fact of the matter is that no matter how precisely you measure the diagonal, the reading is only an approximation: √2 itself cannot be expressed as a fraction. But how do I convince you of that? You should demand a proof. A proof that √2 is not a rational number.
Let’s see what we could do:
Assume √2 to be a rational number; and let’s give this assumption a name: "The Rational Root Assumption" (TRRA). Now, if TRRA were to be true, we should be able to find two integers p and q such that √2 = p / q. In addition, let us demand p and q to meet a condition: that they have no common factors except 1. Let us call this the “no common factors” condition (NCFC). Now, “√2 = p/q” simply means that p = q√2, or p² = 2q². As soon as you multiply something by 2, the product becomes an even number. So we have 2q² to be an even number, and hence p² is an even number as well. This leads to our first conclusion: that p is an even number (because if it were not, then it would be odd, and if it were odd, then p would be equal to 2k+1 for some integer k, and this would mean (2k+1)² = 4k²+4k+1 = 2(2k²+2k) + 1 would be odd, and so would p² be, which is not possible since we showed p² is even). Let’s call it the “p is an even number” conclusion (PENC). But what does PENC mean? That p could be written as 2m for some suitable integer m. Let’s replace this in the equation p² = 2q². We get (2m)² = 2q², or 4m² = 2q² or q² = 2m². Oh, we have seen this before. This means q² is even, and hence q is even (for reasons made clear above). Let us call this the “q is an even number” conclusion (QENC).
The summary of the foregoing discussion is this: [TRRA and NCFC] implies [PENC and QENC]. In other words, if √2 is a rational number with numerator p and denominator q, and p & q have no common factors, then both p and q are even numbers. Wow, isn't that hard to believe? How could p and q both be even and not have any common factors? If they are even, they would have 2 as a common factor. Now, this is what we call a contradiction! And since the logical flow was flawless, there is only one explanation for the contradiction: the TRRA assumption -- that √2 is rational -- must be false. Hence, we have proved that √2 is irrational. Period!
Was this discussion easy to follow? Yes.
Was it easy to write? No, because we used full English sentences to express the proof.
In fact, proofs are best expressed using shorthand symbols. To illustrate, the following would be a shorter version of the same argument:
To prove √2 ∉ ℚ.
Proof: Assume √2 ∈ ℚ.
⇒ ∃ p, q ∈ ℤ with p⊥q and q ≠ 0 s.t. √2 = p/q.
⇒ p² = 2q² ⇒ 2|p² ⇒ 2|p ------------------> (1)
⇒ ∃ m ∈ ℤ s.t. p = 2m ⇒ (2m)² = 2q² ⇒ q² = 2m² ⇒ 2|q² ⇒ 2|q ------> (2)
Now from (1) and (2) above, we have 2|p and 2|q.
⇒ p⊥q is not true. Hence, we have a contradiction.
So, √2 ∉ ℚ. Hence, proved.
So √2, after all, is an irrational number and hence cannot be written as a fraction of two integers.
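For the sceptical reader, here is a brute-force sketch that hunts for a fraction p/q equal to √2 by checking whether p² = 2q² has an integer solution. A finite search proves nothing on its own (that is what the proof is for); it only illustrates the claim. The function name and the search bound are our own choices.

```python
from math import isqrt

def fractions_equal_to_sqrt2(max_q):
    """Return every (p, q) with 1 <= q <= max_q and p*p == 2*q*q."""
    found = []
    for q in range(1, max_q + 1):
        target = 2 * q * q
        p = isqrt(target)       # largest integer whose square is <= target
        if p * p == target:     # would mean p/q is exactly the square root of 2
            found.append((p, q))
    return found

print(fractions_equal_to_sqrt2(100_000))
```

The list comes back empty for every bound you try, just as the proof demands.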
Impossible Probabilities
To find the probability of an event is to measure something, and the prerequisite for any measurement is to define what is being measured. Imagine what happens if what you want to measure is not well-defined. When asked to compute the conversion ratio of a campaign, your first question should be what counts as a conversion event. Let us understand the importance of defining concepts explicitly and clearly with the following example from the book on Probability and Statistics by Vijay K. Rohatgi et al., referred to as one of Bertrand’s paradoxes.
Question: A chord is drawn at random in the unit circle. What is the probability that the chord is longer than the side of the equilateral triangle inscribed in the circle?
To understand the question more clearly, consider the circle as follows.
We have a circle (in red) centered at C with radius r = 1. Inscribe into it an equilateral triangle PQR (blue). If we now randomly draw a chord on this circle (call it chord AB), what is the probability that it is longer than the side (say s = PQ = QR = RP) of the triangle PQR?
Do you see any problem in the question formulation? If not, then you might be surprised to know that there are at least three solutions depending on how one defines the concept “a chord at random”.
Solution 1: Every chord on the circle could be uniquely defined by its end-points. Let us fix one of the end-points -- A -- on the circumference of the circle. This also defines a unique inscribed equilateral triangle APQ. The choice of the other end-point (B) dictates the length of the chord AB.
If B lies on the arc between A and P (Case 1 below), we get a chord shorter than the side of the triangle. Similar is the case when B is chosen on the circumference of the circle between A and Q (Case 2 below). But when we choose B to be somewhere on arc PQ (Case 3), we get a longer chord.
Hence, the favourable points that could act as B (i.e., in a way that AB is longer) are the points on the circumference between P and Q (Case 3). Now, since points A, P, and Q divide the circumference of the circle into three equal arcs AP, PQ, and AQ, we have length(arc AP) = length(arc PQ) = length(arc AQ) = 2𝜋/3. Hence, we get the desired probability as length(arc PQ) / circumference = (2𝜋/3) / 2𝜋 = 1/3.
Solution 2: Another way in which the length of a random chord is uniquely determined is by the distance of the chord’s midpoint from the circle’s centre. If we fix a radius OC, we would have an equilateral triangle PQR cutting OC at S. Moreover, length(OS) = length(SC) = length(OC) / 2 = 0.5. Our problem could be solved by picking a point X on OC and drawing a perpendicular line AXB as a chord.
Now, where that X is picked decides how long the chord would be. If X is picked on line SC, we have a shorter chord; and the same done on line OS gives a longer one. So our favourable region to pick X is line OS. In other words, the desired probability would be length(OS) / length(OC) = 0.5 / 1 = 1/2.
In conclusion, we have that the same question has two solutions -- 1/3 and 1/2 -- based on our interpretation of the concept of a “random chord”. If you refer to the book, there is another solution that gives a probability of 1/4. This shows how important the exercise of “defining” a concept could be.
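The competing answers can be checked by simulation. Below is a rough Monte Carlo sketch, not a derivation: each function turns one reading of “random chord” into a sampling recipe and returns the chord's length. The third recipe (choosing the chord's midpoint uniformly inside the disc) is our assumption about the solution that yields 1/4; all function names are illustrative.

```python
import math
import random

SIDE = math.sqrt(3)  # side of an equilateral triangle inscribed in the unit circle

def chord_from_endpoints(rng):
    # Solution 1: fix A, pick B uniformly on the circumference
    angle = rng.uniform(0, 2 * math.pi)   # angle AOB subtended at the centre
    return 2 * math.sin(angle / 2)

def chord_from_radius_point(rng):
    # Solution 2: pick X uniformly on a fixed radius, chord perpendicular at X
    d = rng.uniform(0, 1)                 # distance of X from the centre
    return 2 * math.sqrt(1 - d * d)

def chord_from_midpoint(rng):
    # Assumed third reading: chord midpoint uniform over the disc's area,
    # so the squared distance from the centre is uniform on [0, 1)
    d_squared = rng.random()
    return 2 * math.sqrt(1 - d_squared)

def estimate(sampler, trials=200_000, seed=0):
    """Fraction of sampled chords that are longer than the triangle's side."""
    rng = random.Random(seed)
    longer = sum(sampler(rng) > SIDE for _ in range(trials))
    return longer / trials

for sampler in (chord_from_endpoints, chord_from_radius_point, chord_from_midpoint):
    print(sampler.__name__, estimate(sampler))
```

The three estimates settle near 1/3, 1/2, and 1/4 respectively: same question, three definitions, three answers.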
At Factors, we support the philosophy of crunching numbers (rather than intuition) to provide intelligent marketing insights, which are only a click away for you to experience: click here to schedule a demo with us. To read more such articles, visit our blog, follow us on LinkedIn, or read more about us.
Intuition can only take us so far: Fun with Factors (Part 1)
“Trust your intuition; it never lies.”, a saying most of us have heard and might strongly agree with. But at Factors this week, things were quite different when we had a session on “Intuition can only take us so far”. The idea was to relook at known concepts -- concepts we use more often than not -- and reimagine their implications from different perspectives. This article is an account of the one-hour discussion. We associate the word “factors” with different concepts at different times. Here, we associate it with maths!
Mathematics: Sturdy yet fragile
We started with the following story from “How Mathematicians Think” by William Byers:
A mathematician is flying non-stop from Edmonton to Frankfurt with Air Transat. The scheduled flying time is nine hours. Sometime after taking off, the pilot announces that one engine had to be turned off due to mechanical failure: "Don't worry -- we're safe. The only noticeable effect this will have for us is that our total flying time will be ten hours instead of nine." A few hours into the flight, the pilot informs the passengers that another engine had to be turned off due to mechanical failure: "But don't worry -- we're still safe. Only our flying time will go up to twelve hours." Sometime later, a third engine fails and has to be turned off. But the pilot reassures the passengers: "Don't worry -- even with one engine, we're still perfectly safe. It just means that it will take sixteen hours total for this plane to arrive in Frankfurt." The mathematician remarks to his fellow passengers: "If the last engine breaks down, too, then we'll be in the air for twenty-four hours altogether!"
Well, from basic math knowledge, you might find the next number in the sequence 9, 10, 12, 16 to be 24. Here’s how you find it. The first four numbers could be broken down as follows:
9 = 9
10 = 9+2⁰
12 = 9+2⁰+2¹
16 = 9+2⁰+2¹+2²
Pretty clearly, the next number in the sequence has to be 9+2⁰+2¹+2²+2³ = 24.
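The pattern is easy to put into code. This is a tiny sketch of the mathematician's extrapolation (the function name is ours), which happily returns an answer even where the physics has stopped cooperating.

```python
def extrapolated_flying_time(failed_engines):
    """Hours in the air according to the pattern 9 + 2^0 + 2^1 + ... ."""
    return 9 + sum(2 ** k for k in range(failed_engines))

print([extrapolated_flying_time(n) for n in range(5)])  # prints [9, 10, 12, 16, 24]
```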
But does that mean the plane will stay in the air for 24 hours? No. It has only four engines. And if the last one breaks down too, the pilots would either perform an emergency landing or, in the unfortunate case, it would lead to a fatal crash. This shows both the strength and the fragility of maths. While in the first four cases we could accurately measure how long the journey would take, as soon as the conditions change (i.e., gliding through the air instead of being thrust forward by engines), the dynamics of motion change too.
Intuition could misdirect
The following is an example that the “professor of professors”, Prof. Vittal Rao, gave in one of his talks: Imagine you have some identical coins you are supposed to distribute among some identical people. How would you do that? Or more mathematically: In how many different ways P(n) can you distribute n identical coins to any number of identical people? Let us understand the problem by taking cases:
n = 1
- The only way to do that is to give it to a single person: o. Hence, P(1) = 1.
n = 2
Distribute 2 coins. Here are two different ways:
- You either give both coins to one person: oo
- Or you take two people and hand them a coin each: o|o
Hence, P(2) = 2.
n = 3
Distribute 3 coins. What do you think P(3) should be? If P(1) = 1, P(2) = 2, we could expect P(3) to be 3, right? Let’s see.
- ooo
- oo|o
- o|o|o
And 3 it is! Hence, P(3) = 3.
n = 4
Now this drives our intuition even further. The sequence we have seen until now has been 1, 2, 3. So it’s natural to assume P(4) to be 4. Let us enumerate all cases again.
- oooo
- ooo|o
- oo|oo
- oo|o|o
- o|o|o|o
We have 5 ways to distribute 4 coins -- this beats our intuition. We get P(4) = 5.
n = 5
With new information in hand (i.e., the sequence being 1, 2, 3, 5), we could update our intuition, say this matches the Fibonacci sequence, and expect it to follow 1, 2, 3, 5, 8, 13, ... Let’s see what happens with 5 coins in hand:
- ooooo
- oooo|o
- ooo|oo
- ooo|o|o
- oo|oo|o
- oo|o|o|o
- o|o|o|o|o
We get P(5) = 7 (not 8 as we had expected).
n = 6
Now what? We could now turn to a different logic: They are either odd numbers (barring the extra ‘2’) following 1, 2, 3, 5, 7, 9, 11, …, or prime numbers (barring the extra ‘1’) following 1, 2, 3, 5, 7, 11, 13, ..., giving P(6) to be either 9 or 11 respectively. Taking n = 6, we have:
- oooooo
- ooooo|o
- oooo|oo
- ooo|ooo
- oooo|o|o
- ooo|oo|o
- oo|oo|oo
- ooo|o|o|o
- oo|oo|o|o
- oo|o|o|o|o
- o|o|o|o|o|o
That’s 11 ways! The prime-number logic worked.
n = 7
Going by the same logic, we would expect P(7) to be 13 (the next prime number). But if you go on and calculate it, P(7) turns out to be 15 (please go ahead and enumerate them).
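If enumerating by hand gets tedious, the listings above can be generated mechanically. The following is a small recursive sketch (the names `distributions` and `render` are ours) that hands out the largest pile first and never lets a later pile exceed an earlier one, so each distribution appears exactly once.

```python
def distributions(coins, largest_pile=None):
    """Yield every way to split `coins` into piles of non-increasing size."""
    if largest_pile is None:
        largest_pile = coins
    if coins == 0:
        yield []
        return
    for first in range(min(coins, largest_pile), 0, -1):
        for rest in distributions(coins - first, first):
            yield [first] + rest

def render(piles):
    """Draw a distribution in the o|o notation used above."""
    return "|".join("o" * pile for pile in piles)

for piles in distributions(4):
    print("-", render(piles))

print(len(list(distributions(7))))  # prints 15
```

For 4 coins this prints the same five lines as the n = 4 case above, in the same order.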
In fact, it turns out that the sequence P(n) expands as follows: 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 101, 135, 176, 231, 297, 385, 490, etc. You could take a moment and think about it intuitively, but chances are slim that you would come up with the following formula:
approximating P(n), where we have:
The foregoing formula was derived by the well-renowned mathematician Srinivasa Ramanujan (along with G. H. Hardy). This illustrates the fact that intuition could take us only so close to the solution, and formal maths might have to be invoked in some cases.
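To get a feel for how good such an approximation is, here is a minimal sketch that counts P(n) exactly and compares it with an estimate. It assumes the approximation in question is the well-known leading-order Hardy-Ramanujan estimate, P(n) ≈ exp(π√(2n/3)) / (4n√3); that choice, and both function names, are our assumptions.

```python
import math

def partition_count(n):
    """Exact P(n): number of ways to split n identical coins into piles."""
    ways = [1] + [0] * n                 # ways[t] = distributions of t coins so far
    for pile in range(1, n + 1):         # allow piles of this size, one size at a time
        for total in range(pile, n + 1):
            ways[total] += ways[total - pile]
    return ways[n]

def hardy_ramanujan_estimate(n):
    """Leading-order asymptotic estimate of P(n) (assumed form, see above)."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))

for n in (10, 50, 100):
    exact = partition_count(n)
    estimate = hardy_ramanujan_estimate(n)
    print(n, exact, round(estimate), f"ratio {estimate / exact:.3f}")
```

The estimate overshoots for small n, and the ratio drifts towards 1 as n grows, which is all an asymptotic formula promises.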
At Factors, we support the philosophy of crunching numbers (rather than intuition) to provide intelligent marketing insights, which are only a demo away for you to experience. To read more such articles, visit our blog, follow us on LinkedIn, or read more about us.
Find the next article in this series here.
What's next in Big Data and Analytics? (Part 2)
In the previous blog, we very briefly went over the history of Big Data technologies. We saw how, with the rise of the internet, databases evolved from relational databases to NoSQL databases like Bigtable, Cassandra, and DynamoDB, along with the development of technologies like GFS and MapReduce for distributed file storage and computation. These technologies were first developed by companies like Google and Amazon and later picked up in a big way by the open source community.
Big Data and Enterprises
Soon enough, commercial versions of these open source technologies were being distributed by companies like Cloudera and Hortonworks. Traditional enterprises started adopting these technologies for their analytics and reporting needs.
Prior to this, enterprises built data warehouses, which were actually large relational databases. This involved combining data from multiple databases (ERP, CRM, etc.) and building a unified and relatively denormalized database. Designing the data warehouse was complex and required careful thought. Data was updated periodically. Updating involved a three-stage process of extracting data from various sources, combining and transforming it into the denormalized format, and loading it into the data warehouse. This came to be known as ETL (Extract, Transform and Load).
With the adoption of Hadoop, enterprises could now just periodically dump all their data into a cluster of machines and run ad-hoc MapReduce jobs to pull out any report of interest. Visualization tools like Tableau, Power BI, and Qlik could connect directly to this ecosystem, making it seamless to plot graphs from a simple interface while large volumes of data were crunched in the background.
Customer Centric View of Data
Databases are a final system of record, and analytics on databases only gives information on the current state of customers, not on how they got there. With the rise of the internet, a lot of businesses are now online or have multiple digital touchpoints with customers. It's now easier to instrument and collect customer data as a series of actions, be it clickstream or online transactions. This customer-centric model of data enables richer analytics and insights. Additionally, the data is incremental and can be made available immediately in reports, instead of being updated only periodically. More enterprises are moving to this model, and datastores and technologies that cater specifically to these kinds of use cases, like TimescaleDB, Druid, and Snowplow, are actively being developed.
So what’s next?
To summarize, the bulk of the big data revolution of the last 15 years has been about building systems capable of storing and querying large amounts of data. The queries are raw, i.e., if X and Y are variables in the data and x1 and y2 are two corresponding values of interest, then the system can return all data points where the variable X matches x1 and Y matches y2, or some post-processed result on all the matching data points. Along the way, we also got systems that can compute on large amounts of data in a distributed fashion.
So what’s next in analytics from here? Is it building machine learning models? Certainly, the availability of all this data enables organizations to build predictive models for specific use cases. In fact, the recent surge of interest in machine learning has actually been because of the better results we get by running the old ML algorithms at larger scale in a distributed way. While most ML techniques can be used to build offline models to power predictive features, they are not useful in the context of online or interactive analytics. Most techniques are designed particularly for high-dimensional unstructured data like language or images, where the challenge is to build models that not only fit well on seen data points but also generalize well to hitherto unseen data points.
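As a toy illustration of what “raw” means here, consider the following sketch. The data points, the `amount` field, and the function name are all invented for illustration; a real datastore would of course do this at scale and with indices.

```python
def raw_query(rows, x_value, y_value):
    """Return every data point where X matches x_value and Y matches y_value."""
    return [row for row in rows if row["X"] == x_value and row["Y"] == y_value]

# Hypothetical data points, made up for this sketch
rows = [
    {"X": "x1", "Y": "y1", "amount": 10},
    {"X": "x1", "Y": "y2", "amount": 25},
    {"X": "x2", "Y": "y2", "amount": 40},
    {"X": "x1", "Y": "y2", "amount": 5},
]

matches = raw_query(rows, "x1", "y2")
total_amount = sum(row["amount"] for row in matches)  # a post-processed result
print(len(matches), total_amount)
```

The system answers exactly what was asked and nothing more; deciding which X and Y are worth asking about is left entirely to the analyst.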
Datastores that make sense of data
The next logical step would be datastores and systems that can make sense of data. Making sense of data would mean that instead of blindly pulling out data points such that variable X is x1 and Y is y2, the system should also be able to interactively answer a different class of queries, like:
- Give the best value for variable Y, that maximizes the chance that X is x1.
- Find all the variables or combination of variables, that influence X most when X is x1.
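To make the first kind of query concrete, here is a naive sketch that estimates the chance of X being x1 for each value of Y by simple counting and returns the best one. It recomputes everything on every call, uses invented data, and is only an illustration of the question being asked, not of how any particular system answers it; the function name is ours.

```python
from collections import Counter

def best_value(rows, target_var, target_value, candidate_var):
    """Value of candidate_var with the highest observed chance that
    target_var equals target_value, together with that chance."""
    seen = Counter()   # how often each candidate value occurs
    hits = Counter()   # how often it occurs together with the target value
    for row in rows:
        seen[row[candidate_var]] += 1
        if row[target_var] == target_value:
            hits[row[candidate_var]] += 1
    chances = {value: hits[value] / seen[value] for value in seen}
    best = max(chances, key=chances.get)
    return best, chances[best]

# Hypothetical data points, made up for this sketch
rows = [
    {"X": "x1", "Y": "y1"}, {"X": "x2", "Y": "y1"},
    {"X": "x1", "Y": "y2"}, {"X": "x1", "Y": "y2"}, {"X": "x2", "Y": "y2"},
    {"X": "x2", "Y": "y3"},
]

print(best_value(rows, "X", "x1", "Y"))
```

Here y2 wins, with X being x1 in two of its three data points.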
Such a system would continuously build a complete statistical or probabilistic model as and when data gets added or updated. Models would be descriptive and queryable. The time taken to infer or answer the different classes of queries should also be tractable. But just as there is a host of databases, each tuned differently for:
- Data Model
- Scale
- Read and Write Latencies
- Transaction guarantees
- Consistency, etc
We could possibly have different systems here, tuned for:
- Assumptions on Data Model
- Accuracy
- Ability to Generalize
- Scale of the data
- Size of the models
- Time taken to evaluate different types of queries.
Autometa is one such first-of-its-kind system that we are building at factors.ai. It continuously makes sense of customer data to reduce the work involved in inferring from data. Drop a mail to hello@factors.ai to know more or to give it a try.
Big Data and Analytics - What's next? (Part 1)
Apache Hadoop, Hive, MapReduce, TensorFlow etc. These and a lot of similar terms come to mind when someone says Big Data and Analytics. It can mean a lot of things, but in this blog we will restrict it to the context of analytics done on relatively structured data, collected by enterprises to improve the product or business.
When I started my career as an engineer at Google around a decade back, I was introduced to MapReduce, Bigtable, etc. in my very first week. These were completely unheard of outside and seemed like technologies accessible and useful to only a select few in big companies. Yet, within a few years, there were small shops and training institutes springing up to teach Big Data and Hadoop, even in the most inaccessible lanes of Bangalore.
It’s important to understand how these technologies evolved or rather exploded, before we dwell upon the next logical step.
Dawn of time
Since the dawn of time (or rather the Unix timestamp), the world was ruled by relational databases. Relational databases are something that most engineers are familiar with. Data is divided (or normalized) into logical structures called tables. But these tables are not completely independent; they are related to each other using foreign keys. Foreign keys are data entries that are common across tables.
Take the example of data from a retail store. The database could have 3 tables: one for the Products it sells, one for Customers of the store, and one for Orders of the products bought in the store. Each entity can have multiple attributes, which are stored in different columns of the corresponding table. Each data point is stored as a row in the table. The Orders table contains entries of products bought by different customers and is hence related to both the Products and Customers tables, using the columns product_id and customer_id.
A few implications of this structure are:
- Since each data unit is split across tables, most updates would involve updating multiple tables at once. Hence transaction guarantees are important here, wherein you either update all the tables or none at all.
- Data can be fetched almost any way you want. For example, we can fetch all orders placed by a specific customer or all customers who bought a specific product. Additional indices can be defined on columns to speed up retrieval. But since data is split across tables, retrieval can sometimes involve costly joins to match the related items across tables.
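To make the retail store example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the description above, but the sample rows, the index name and the helper functions are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database; the three tables mirror the retail store example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
-- An additional index to speed up lookups of orders by customer.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

# "with conn" wraps the statements in one transaction: all commit, or none do.
with conn:
    conn.execute("INSERT INTO products VALUES (1, 'Kettle', 25.0)")
    conn.execute("INSERT INTO products VALUES (2, 'Toaster', 40.0)")
    conn.execute("INSERT INTO customers VALUES (10, 'Asha')")
    conn.execute("INSERT INTO customers VALUES (11, 'Ravi')")
    conn.executemany(
        "INSERT INTO orders (product_id, customer_id) VALUES (?, ?)",
        [(1, 10), (2, 10), (1, 11)],
    )


def products_bought_by(customer_id):
    """Names of all products ordered by one customer (join via product_id)."""
    rows = conn.execute(
        "SELECT p.name FROM orders o "
        "JOIN products p ON p.product_id = o.product_id "
        "WHERE o.customer_id = ? ORDER BY p.name",
        (customer_id,),
    ).fetchall()
    return [name for (name,) in rows]


def customers_who_bought(product_id):
    """Names of all customers who ordered one product (join via customer_id)."""
    rows = conn.execute(
        "SELECT c.name FROM orders o "
        "JOIN customers c ON c.customer_id = o.customer_id "
        "WHERE o.product_id = ? ORDER BY c.name",
        (product_id,),
    ).fetchall()
    return [name for (name,) in rows]


# A multi-table update that fails halfway: the order points at a product that
# does not exist, so the whole transaction (including the new customer) rolls back.
try:
    with conn:
        conn.execute("INSERT INTO customers VALUES (12, 'Meera')")
        conn.execute("INSERT INTO orders (product_id, customer_id) VALUES (99, 12)")
except sqlite3.IntegrityError:
    pass

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

The same data is fetched in two different ways through joins, and the failed update leaves no partial rows behind, which is the all-or-nothing guarantee described above.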
SQL (Structured Query Language) became the de facto standard to query these databases, and thus relational databases also came to be known as SQL databases. These served the needs of all enterprises. As the data grew, people moved to bigger and better database servers.
Rise of Internet
Then in the 90’s came the internet. One of the limitations of a traditional SQL database is that it needs to reside on one machine to provide the transactional guarantees and to maintain relationships. Companies like Google and Amazon that were operating at internet scale realized that SQL databases could no longer scale to their needs. Further, their data models did not need to maintain complex relationships.
If you were to store and retrieve the data unit as a whole, rather than in parts across tables, then each data unit is self-contained and independent of other data. The data can now be distributed to different machines, since there are no relationships to maintain across machines.
Google, for instance, wanted to store and retrieve the information about a webpage only by its URL, and Amazon product information by product_id. Google published a paper on Bigtable in 2006 and Amazon one on Dynamo in 2007, describing their in-house distributed databases. While Dynamo stored data as key-value pairs, Bigtable stored data by dividing it into rows and columns. Lookups can be done by row key in both databases, but in Bigtable only the data in the same column family was co-located and could be accessed together. Given a list of rows and columns of interest, only those machines which held the data were queried and scanned.
Now you no longer needed bigger and better machines to scale. So the mantra changed from bigger and better machines to cheap, commodity hardware with excellent software. And since hardware was assumed to be unreliable, the same data had to be replicated and served from multiple machines to avoid loss of data.
Open source projects soon followed suit. Based on different tradeoffs of read and write latencies, assumptions in the data model and flexibility when retrieving data, we now have a plethora of distributed databases to choose from: HBase, MongoDB and Cassandra, to name a few. Since these databases were not relational or SQL, they came to be known as NoSQL databases.
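The idea of spreading self-contained data units across unreliable machines can be sketched in a few lines of Python. This is a toy illustration only: the ShardedStore class, the modulo hashing and the "next machine" replica placement are assumptions made for this example, and are not how Bigtable or Dynamo actually partition data.

```python
import hashlib


class ShardedStore:
    """Toy key-value store spread over several in-process 'machines' (dicts)."""

    def __init__(self, num_machines=4, replicas=2):
        if replicas > num_machines:
            raise ValueError("cannot have more replicas than machines")
        self.machines = [dict() for _ in range(num_machines)]
        self.replicas = replicas

    def owners(self, key):
        """Machines responsible for a key: a hashed start, then its neighbours."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.machines)
        return [(first + i) % len(self.machines) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the whole data unit to every replica; no cross-machine relations.
        for machine in self.owners(key):
            self.machines[machine][key] = value

    def get(self, key, down=()):
        # Try each replica in turn, skipping machines that are marked as down.
        for machine in self.owners(key):
            if machine not in down and key in self.machines[machine]:
                return self.machines[machine][key]
        raise KeyError(key)


store = ShardedStore(num_machines=4, replicas=2)
store.put("cnn.com", {"title": "CNN", "links": ["cnn.com/world"]})

# Even if the first machine holding the key fails, a replica still serves it.
primary = store.owners("cnn.com")[0]
page = store.get("cnn.com", down={primary})
```

Because each key maps to a fixed set of machines, a lookup touches only those machines, and adding machines adds capacity without needing a bigger server.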
Related Big Data Technologies
This fundamental change in databases also came with auxiliary changes in how data was stored and used for computation. Most data is stored in files. But now, these files should be accessible from any machine. These files could also grow to be very large. And files should not be lost when a machine goes down.
Google solved it by breaking files into chunks of almost equal sizes and distributing and replicating these chunks across machines. Files were accessible within a single namespace. A paper on this distributed file system, called GFS, was published way back in 2003. Bigtable was in fact built on top of GFS.
Distributed databases allowed you to access data only in one way (or a couple of ways) using keys. It was not possible to access data based on the values present inside the data units. In SQL you can create an index on any column and access data based on the values in it. Take the example of Google storing web pages: you could access information about a webpage using the URL cnn.com (row key). Or you could get the links in a given webpage using a row key (cnn.com) and a column key (links). But how do you get the URLs of web pages that contain the phrase, say, “Captain Marvel”?
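As a rough sketch of the chunking idea, the snippet below splits a byte string into chunks of (almost) equal size and assigns each chunk to several machines. The function names and the round-robin placement are assumptions for illustration; a real distributed file system decides placement with far more care.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Break a file's contents into chunks; only the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(num_chunks: int, num_machines: int, replicas: int) -> Dict[int, List[int]]:
    """Assign every chunk to `replicas` distinct machines, round-robin style."""
    if replicas > num_machines:
        raise ValueError("cannot have more replicas than machines")
    return {
        chunk: [(chunk + r) % num_machines for r in range(replicas)]
        for chunk in range(num_chunks)
    }


data = bytes(range(10))                  # stand-in for a large file
chunks = split_into_chunks(data, chunk_size=4)
placement = place_chunks(len(chunks), num_machines=5, replicas=3)

# Reading the file back is just concatenating its chunks in order.
restored = b"".join(chunks)
```

With three copies of every chunk on different machines, losing any single machine loses no data.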
So if the data needed to be accessed in a different way, it had to be transformed, such that data units that are related to each other by the values they hold come together. The technology used to do that was MapReduce. It has two phases. First, the data is loaded in chunks onto different machines, where a process called the Mapper emits the URL of every page that contains the phrase “Captain Marvel”. All these URLs are then sent to another process called the Reducer, which collects and outputs all the matched URLs. More complex data transformations, and joins of data across different sources, usually require pipelines of MapReduces. The MapReduce framework was generic enough to perform various distributed computation tasks and became the de facto standard for distributed computing. The paper on MapReduce was published by Google in 2004.
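The two phases can be imitated on a single machine in plain Python. This is a sketch under stated assumptions: the sample pages, the function names and the in-memory "shuffle" step are invented for illustration, and a real framework would run the mappers and reducers as separate processes on different machines.

```python
from collections import defaultdict


def map_phase(url, text, phrase):
    """Mapper: emit (phrase, url) if the page text contains the phrase."""
    if phrase.lower() in text.lower():
        yield phrase, url


def shuffle(pairs):
    """Group emitted values by key, so each key reaches a single reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(phrase, urls):
    """Reducer: collect and output all matched URLs for one phrase."""
    return phrase, sorted(set(urls))


def run_mapreduce(pages, phrase, num_chunks=2):
    items = list(pages.items())
    # Split the input into chunks, as if each were loaded onto its own machine.
    chunks = [items[i::num_chunks] for i in range(num_chunks)]
    emitted = []
    for chunk in chunks:
        for url, text in chunk:
            emitted.extend(map_phase(url, text, phrase))
    return dict(reduce_phase(key, urls) for key, urls in shuffle(emitted).items())


# Hypothetical page contents, keyed by URL (the row key).
pages = {
    "cnn.com": "Review: Captain Marvel opens this week",
    "imdb.com": "Captain Marvel (2019) cast and crew",
    "example.org": "Weather forecast for Bangalore",
}

result = run_mapreduce(pages, "Captain Marvel")
```

The output is keyed by the phrase rather than by URL, which is exactly the kind of reorganised access path that the key-based lookups could not provide.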
Yahoo soon took the cue and developed and open sourced these technologies, which we all know as Hadoop, later adopted by Apache. Now if MapReduces can be used to transform data, they can also be used to retrieve data that matches a query. Technologies like Apache Hive were developed, which allowed users to fire SQL queries on large amounts of structured data, while the results were actually delivered by running MapReduces in the background. Systems like Dremel and BigQuery offer similar SQL access to large datasets using their own distributed query execution. An alternative to loading data onto a different machine and then computing on top of it is to take the computation closer to where the data resides. Frameworks like Apache Spark were developed broadly on this philosophy.
In the next blog, we will see some of the current trends in these technologies and discuss how we think they will evolve.
FactorsAI + Segment: Easy and instant analytics to drive growth
We are excited to announce our integration with Segment, further enabling companies to easily instrument user interactions across platforms and push different types of customer data from any third-party source to FactorsAI in real time.
FactorsAI provides advanced and intuitive analytics for marketers and product managers, to help drive growth. With FactorsAI you get immediate insights to optimize marketing campaigns, improve conversions and understand user behaviours that drive feature adoption and retention.
A good analytics setup requires detailed tracking of user actions like page views, Signups and AddedToCart, with different attributes. The quality of insights on user behaviour shown by FactorsAI depends on the level of detail in tracking. With the Segment integration this is a one-time setup, and you can send the same events to other tools for marketing automation, CRM etc.
Further, with the Segment integration you can send data from different sources like email and live chat, which produce events like Email Delivered, Email Clicked, Live Chat Started etc. These additional events are useful when analyzing user conversions, and with Segment this can be done without writing custom code to hit our APIs.
Segment can perform all data collection tasks for FactorsAI. It captures all the data that FactorsAI needs and sends it directly to FactorsAI in the right format, all in real time. So, if you are on Segment, you can now start getting insights on how to grow your customer base in no time.
To integrate with Segment, follow the steps here. Happy Analyzing!