Breaking technical requirements down into pieces small enough that data warehousing and business intelligence (DW/BI) teams can deliver something of value to end users every few weeks is a difficult challenge. DW teams in particular regularly get stuck trying to deliver something "valuable" in a short time frame. I don't think we are significantly different from other development teams that want to learn about Agile and start their Agile journeys, but the available articles, books, and training materials are usually not written in DW/BI-focused domain language, so it can be hard for us to make the leap of imagination and see how to apply these established ideas to our work.
Many DW/BI teams investigate "Agile" just enough to decide it's not right for them, since they can't imagine how to get something valuable from the source to the DW and finally to the BI layer in such a short amount of time. We are used to delivering large, complex solutions to our users, where "Value" is perceived to be highest when executives gain keen business insights that drive revenue and reduce costs. We, and our users, are not used to thinking about value with a small "v" (something useful and necessary on the way to Value with a capital "V") that can be demonstrated to our users in a matter of weeks.
Ideally, we would slice the end request into pieces of work so small that the delivery team could pull data from the source, through all the DW layers, and present it in the BI front end in just a few weeks. That is the goal. However, a team that can't figure out how to do that after reading a few books or taking some training shouldn't give up and never start its Agile journey.
It is possible to pull fields through from source to BI one at a time. Doing so will most likely not be technically efficient until the DW/BI team has the right enabling infrastructure (configuration management, version control, continuous integration, and test automation) to deliver "thin steel wires" of value safely and quickly on a regular, frequent cadence. However, the question DW/BI teams and their stakeholders need to ask themselves is whether it's more important to be technically efficient at the cost of business efficiency, or business-efficient at the cost of technical efficiency. This is a huge decision! In the following case study, drawn from my experience, we see how one organization answered this question.
Scenario
Company A is a 1,000-person Web-based marketing company that had recently acquired a competitor, Company B. Since customers were likely to buy from both Company A and its competitors, the executive team wanted to understand how many new customers this purchase had brought in, versus how many were already customers of Company A.
Request
The initial request to the DW/BI team was to provide a report identifying the number and percentage of total Company B customers who were new to Company A, so that the marketing, sales, billing, and support teams could understand the scale of effort it would take to integrate these new customers into the company.
Feasibility
In terms of DW/BI requirements, this is already a fairly "small" request, yet the team was understandably hesitant to commit to it within the company's standard two-week Scrum sprint, for the following reasons:
- The team was fairly new to working together. It consisted of recently hired contractors brought on board at various times over the prior few months, so team members did not know what they could or could not accomplish together in a few weeks.
- The team knew nothing about the source systems of the newly acquired Company B: not even how many sources of customer data there were, let alone how to connect to them, what the data quality would be like, or which source system SMEs could explain where to find the data needed to answer this question.
- The DW/BI team suspected that the request would soon lead to many more detailed requests beyond just "What number and percentage of the total Company B customer set is new to Company A?" Consequently, they felt they should really do a full customer data integration in order to answer future, as-yet-unknown needs quickly (a very common perspective among DW/BI teams that have been successful with traditional project management approaches).
Conversation
I encouraged the team and the product owner to have an exploratory conversation to figure out whether there was anything the team could do in two weeks that would provide value to the organization. After an interesting (and challenging) conversation, they agreed on the approach outlined below.
Commitment
The scope of this first story would be limited to:
- A single new source of Company B customer data, identified either by an SME at the newly acquired company or by a quick profile of candidate sources
- A single email field within that source, identified by data profiling as being the most populated with customer email addresses (no data quality work would be done to improve the values in this field)
- De-duplication against Company A's existing customer database, by comparing the email field pulled from the new source with Company A's "Primary_Email" field
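The profiling step in the second item can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual tooling: it assumes source records arrive as dictionaries and uses a deliberately loose regular expression as the test for "looks like an email address." The function names are hypothetical.

```python
import re

# Deliberately loose "looks like an email" test: text@text.text with no spaces.
# An assumption for profiling purposes only, not a full address validator.
EMAIL_LIKE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_fill_rates(records, candidate_fields):
    """Return {field: share of records whose value in that field looks like an email}."""
    total = len(records)
    rates = {}
    for field in candidate_fields:
        hits = sum(
            1 for rec in records
            if EMAIL_LIKE.match((rec.get(field) or "").strip())
        )
        rates[field] = hits / total if total else 0.0
    return rates


def pick_email_field(records, candidate_fields):
    """Pick the candidate field most populated with email-like values.
    Ties go to the field listed first in candidate_fields."""
    rates = email_fill_rates(records, candidate_fields)
    return max(candidate_fields, key=lambda f: rates[f])
```

For example, if a hypothetical `contact_email` field holds an email-like value in every record while `alt_email` and `notes` do so less often, `pick_email_field(records, ["contact_email", "alt_email", "notes"])` returns `"contact_email"`. The check only measures how populated a field is with plausible addresses and does no cleansing, which fits the story's "no data quality work" constraint.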
The demonstration of this first story would show a simple tabular report (see Table 1). The executives agreed this first step would provide value in that it would give them a "directionally correct" idea of how many new customers they had acquired.
| Measure | Value |
|---|---|
| Total Company B customers (records evaluated) | 1,000 |
| Number of email duplications with Company A customers | 350 |
| Percentage of duplication with Company A customers | 35% |
| Number of potentially new Company B customers | 650 |
| Percentage of potentially new Company B customers | 65% |

Table 1: Report showing the number and percentage of new customers from Company B.
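The arithmetic behind Table 1 is simple enough to sketch. The Python below is an illustrative assumption about how the comparison might work, not the team's implementation: it counts Company B records whose email appears among Company A's `Primary_Email` values, treats records with a blank email as potentially new (they cannot match), and compares addresses case-insensitively after trimming whitespace. That normalization is an assumption and is applied only at comparison time; the stored values are left untouched.

```python
def duplication_report(company_b_emails, company_a_primary_emails):
    """Build the Table 1 figures from a single Company B email field.

    company_b_emails: one value per Company B customer record (None/"" allowed).
    company_a_primary_emails: values of Company A's Primary_Email field.
    """
    def norm(value):
        # Assumption: compare case-insensitively, ignoring surrounding whitespace.
        return (value or "").strip().lower()

    known = {norm(e) for e in company_a_primary_emails if norm(e)}
    total = len(company_b_emails)

    def pct(n):
        return round(100 * n / total, 1) if total else 0.0

    # A record with a blank email cannot match, so it stays "potentially new".
    dupes = sum(1 for e in company_b_emails if norm(e) and norm(e) in known)
    new = total - dupes
    return {
        "total_records_evaluated": total,
        "email_duplicates": dupes,
        "pct_duplicates": pct(dupes),
        "potentially_new": new,
        "pct_potentially_new": pct(new),
    }
```

With 1,000 Company B records of which 350 emails also appear in Company A's `Primary_Email` field, the function reports 350 duplicates (35.0%) and 650 potentially new customers (65.0%), the same shape as Table 1. Counting records rather than distinct addresses is another assumption: if Company B's source holds the same customer twice, both records are counted.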
Subsequent Stories
Subsequent sprints would make the answer progressively more accurate by:
- Adding a phone number from this same source to the de-duplication rules
- Adding other email and phone number fields from this source
- Adding other sources of Company B customer data, if any
Eventually, the DW/BI team would answer the question of how many new customers had been acquired from Company B by eliminating duplicates identified by a match on an email and a phone number on a customer record.
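Extending the match rules across additional email and phone fields, as the subsequent stories describe, might look like the sketch below. The text does not say whether a duplicate requires the email and the phone number to match together or whether either one is enough, so the hypothetical `require_both` flag supports both readings; when it is set, both values must match the same Company A record. The function names, the digits-only phone comparison, and any Company A phone field name are assumptions for illustration.

```python
import re
from itertools import product


def norm_email(value):
    """Comparison key for emails: trimmed and lowercased (an assumption)."""
    return (value or "").strip().lower()


def norm_phone(value):
    """Comparison key for phones: digits only (an assumption),
    so '(555) 010-2000' and '555.010.2000' both become '5550102000'."""
    return re.sub(r"\D", "", value or "")


def match_keys(record, email_fields, phone_fields):
    """Collect the non-blank normalized emails and phones on one record."""
    emails = {norm_email(record.get(f)) for f in email_fields} - {""}
    phones = {norm_phone(record.get(f)) for f in phone_fields} - {""}
    return emails, phones


def count_duplicates(b_records, a_records, b_fields, a_fields, require_both=False):
    """Count Company B records that match at least one Company A record.

    b_fields and a_fields are (email_fields, phone_fields) pairs of field names.
    require_both=False: a match on any email OR any phone is enough.
    require_both=True: an email AND a phone must match the same Company A record.
    """
    a_emails, a_phones, a_pairs = set(), set(), set()
    for rec in a_records:
        emails, phones = match_keys(rec, *a_fields)
        a_emails |= emails
        a_phones |= phones
        a_pairs |= set(product(emails, phones))  # (email, phone) per A record

    dupes = 0
    for rec in b_records:
        emails, phones = match_keys(rec, *b_fields)
        if require_both:
            hit = any(pair in a_pairs for pair in product(emails, phones))
        else:
            hit = bool(emails & a_emails) or bool(phones & a_phones)
        if hit:
            dupes += 1
    return dupes
```

For example, with `a_fields = (["Primary_Email"], ["Primary_Phone"])` and `b_fields = (["email", "email2"], ["phone"])` (the phone and Company B field names are hypothetical), a Company B record whose email matches one Company A customer and whose phone matches a different one counts as a duplicate under the either-field rule but not under `require_both=True`.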
The Team's Reflection
Pros
This approach worked well for this new team: it let them quickly understand what they could accomplish in a two-week sprint and built their confidence as a team. They also appreciated the clear understanding of what the organization was going to do with the information and how much was "sufficient" to meet the organization's needs at this time. Furthermore, no one on the team had deep testing experience, so the very small scope of the effort was a relief as they designed the tests for each field together. They also found they could automate such simple tests in a rudimentary way and reuse them in subsequent sprints.
Cons
On further reflection, however, they still felt that it was quite technically inefficient to pull in only one field at a time. If they'd had four to six weeks of focused time, they could have completed a much more elegant and efficient customer data integration that would have included more than just email and phone data.
As data professionals (and as contractors!), they were uncomfortable "moving so slowly" given all of the work the company wanted from them. Although the business was seeing results much more quickly than it would have with a traditional approach, the relative technical inefficiency of this approach made the team feel that the whole project would take much longer to finish.
The Business's Reflection
Pros
The executives started to get a feel for the scale of their customer integration efforts once they understood roughly how many new customers they had acquired. As an Agile company, the business was thrilled to see the DW/BI team making strides toward embracing Scrum and learning to deliver small bits of business value regularly. This team had been the last of all the company's IT teams to convert from waterfall to Scrum and had experienced a lot of turnover in the process.
Cons
The executives also agreed that the DW/BI team had a lot of work ahead of it because of the disruptions caused by the team's instability while adopting Agile. Since the enabling infrastructure used by the rest of IT (code management, version control, continuous integration, and test automation) was not yet easily accessible to the DW/BI team, this technically inefficient approach concerned the business as well.
They were also worried about the cost of paying contractors for such detailed, thorough work when there was such a huge backlog of DW/BI needs. They wondered whether there was a balance that would let the team be more efficient, both technically and in terms of budget, while still delivering business value in small increments. So far, they are still trying to find that balance.
[For more from the author on this topic, see "Agile Analytics: Slicing Data Warehousing User Stories for Business Value."]