10 Sep 2018
this is the text from a blog originally written in 2014
Pydata Berlin 2014 has come and gone. And I gave my talk about what Pandas can do for tax. If you want to be spared the full 35-min video, then here is the potted version.
The key points in relation to this are that:
- there is increasing imperative for tax practitioners to commoditise their products;
- to do so will inevitably lead them to rub up against particular difficulties and limitations with spreadsheets, both as tools in and of themselves, and more particularly how they can interface well with other software (in particular databases); but
- Pandas can liberate them from these problems; and
- their pre-existing skillsets make them extremely able to pick up Pandas and do cool stuff with it.
IMPERATIVE
The courts - and the courts of popular opinion - have stopped what the tax industry might have called its salad days. Aggressive planning is frowned upon socially and legally. The pressure is to be ‘compliant’ - to implement acceptable tax structuring well, and to have total transparency and accountability in how this is then reported to the tax authorities.
COMMODITISATION
My experience from talking to people at Pydata is that they come to Pandas via statistical software and languages. My own experience was (and that of others in my field will be) informed by a very different frame of reference - spreadsheets.
The spreadsheet is the tool of choice for tax professionals. Some excellent research by PwC suggests that they are not going to give up using them, but equally that they are desperate for something better. Can they have their cake and eat it?
There are three particular issues that spreadsheets have that might compel them to use something conceptually similar, but practically better.
-
Firstly, they’re not great when it comes to handling time and dates - which is an extremely important ability to have when you work in tax. Getting them to work in that way that you need is difficult and requires either a lot of manual data entry, plugins, or a working knowledge of Visual Basic.
-
Secondly, spreadsheets don’t lend themselves to easy-in, easy-out manipulation of data. Of course, Pandas can act very well as an intermediary - but since it can replicate the key functionality of the spreadsheet - why bother with the unnecessary intervening step?
-
Thirdly, spreadsheets are used for immutable data storage, when they represent nothing of the sort.
PANDAS & PYTHON
Pandas excels (if you’ll pardon the pun) at handling time and dates, and series of data that ascribe values to time and dates. This is in part because they operate ‘off the grid’. That’s to say, they aren’t bound in the same way by principles of cell-by-cell analysis. You can rescale things very effectively. The ability to refer only to key dates for each set of numbers (e.g. when the numbers being taxed change, when the applicable tax rates change, and when the applicable relief rates change), and leave the software to fill in the gaps without breaking a sweat, then dealing with tax numbers becomes a lot easier.
And because of Pandas’ excellent interoperability with file formats, and Python’s universality, it’s far easier to enjoy the benefits of a whole host of other software in the ecosystem.
Just as I feel tax practitioners would be foolish to continue to ignore Pandas, it would be equally foolish for current Pandas users to be dismissive of just what an impact this functionality has for people in tax, simply because it appears so effortless and is taken for granted these days (which I feel would do a grave disservice to the phenomenal work of Wes McKinney and the Pandas team). As I said in my talk, something might be capable of large-scale application, but this doesn’t mean that it cannot work wonders on small-scale application in the right context.
TRANSFERABLE SKILLS AND GATEWAYS
I am very firmly of the view that learning Python - and learning to use Pandas - is well within the grasp of any competent tax adviser - particularly those with legal training. The reasons are (based on my own experience) as follows.
Transferable skills
Drafting and coding are surprisingly similar skills - each attempts to condense a series of rules and behaviours into a mutually-exclusive, collectively-exhaustive set of conditions and actions. Code declares variables; contracts define terms. Code imports libraries that cover a point; contracts co-opt statute. Code’s mantra of DRY (“don’t repeat yourself”) - meet law’s precedents. Git diffs, blacklines. Pair programming, four-eye review. Test-driven development is basically worst-case drafting in hipster clothing.
I would go so far as to say that coding and drafting are actually mutually-reinforcing skills. Learning to code has improved my drafting, and I’m sure that drafting helped me to get my head around coding more quickly. I’d probably go further and say that there’s a lot that programming and legal practice could learn from each other. No lawyer would use the term ‘imposter syndrome’ - but any of them would recognise it instantly. (Each profession, sadly, has a lot to learn about diversity).
As code is similar to law, so one important aspect of the Pandas repertoire (DataFrames) is similar to spreadsheets. That gives a handy frame of reference for explaining people what they can do. Of course, Pandas is about a great deal more than these - but it’s not a bad way to start getting your head around the power of that Pandas can deliver. Having shown friends what Pandas can do, they tend to refer to DataFrames as “elastic spreadsheets”, “smart spreadsheets” or (my particular favourite), “spreadsheets off the grid”.
The linkage to Excel is supported by the fantastic xlwings by Felix Zumstein, which provides a bridge between Excel and Pandas.
Safe experimentation
In my experience, the excellent Anaconda scientific Python distribution allows the installation of pretty much all you need on conventional computer desktop environments. For an extra layer of comfort for those who are still worried that their computer might explode if they mess around with things outside the world of Microsoft, a Raspberry Pi can be picked up for next to nothing. And it will run enough Python and Pandas to learn from.
Another option is to use pythonanywhere, well, anywhere - certainly this gives a nice environment to explore Pandas.
Excellent resources
Subject to the very BBC-like proviso that ‘other books are available’, I struggle to see how anyone could top Zed Shaw’s Learn Python the Hard Way for a primer for a non-programmer on the world of programming. Wes McKinney’s Python for Data Analysis is then beautifully self-explanatory (which I fear it might not quite be for someone with zero-programming experience, compared to - say - an R refugee). I cannot recommend either book highly enough for someone looking to explore this field - although in contrast to the code bootcamps I would strongly advise against cherry-picking chapters you like or think are relevant in either book. There may one day be a market for a companion piece that pitches this stuff directly to tax people. But maybe not yet.
My talk finished with a demonstration of ORVILLE - that much at least is for another post - but you can catch a brief glimpse towards the end of the video of my talk.
16 Feb 2016
(This is a repost from an old blog dated 26 July 2014)
I built an SDLT NPV calculator - a little “tax bot” that takes rents due under a lease, calculates the net present value of those rents, and then calculates the UK stamp duty land tax (SDLT) that applies to the aggregate net present value.
Many of my subsequent postings on this blog - and my talk at Pydata Berlin - are likely to set out how.
This post is a guide as to why I built the Overlap, Rent-free and VAT Integrated Lease Liability Engine (or “ORVILLE”, for short).
As a project, this might appear to be an odd thing to do with perfectly good spare time (and two kids to keep entertained of a weekend). After all, HM Revenue & Customs (or “HMRC”, the UK taxation authority) already provide a perfectly serviceable webapp calculator, and it will eventually get you to the correct figure (eventually). So why bother to reinvent the wheel?
It’s a perfectly legitimate question which I asked myself, and which my colleagues asked of me when I mentioned to them that I was doing this (or when they caught me hacking away on the train). But there were, I felt, a few reasons, and I have set these out below.
1) Papercuts
SDLT is a tax on land transactions, designed by people who had no great grasp of the practice of property law, even though their grasp of the theory of property law is better than that of most conveyancers. Put simply, SDLT looks at leases in the way that a legal textbook looks at them - not like a legal practitioner does. But SDLT is a tax that must be handled by legal practitioners.
One of the funny things about SDLT - and specifically about the net present value calculator, is that it does work well on its own terms: you enter the term commencement date of the lease, its end date, and information on the highest rent in each of the first five years of the term. So far, so good - but one of the important points is that there are a surprising number of steps that you have to go through before you can ever get to the stage of undertaking this manual data-entry exercise.
The process leaves a lot of the work to the person doing the calculation. This is for several reasons, chief amongst which is that SDLT counts the term of a lease (for calculatory purposes) as running from the later of (a) the date of grant and (b) the term commencement date, in accordance with the venerable case of Bradshaw -v- Pawley (which is perfectly reasonable), whereas the lease documentation often sets an artificial back-date to ensure that the lease expires at the right time (perhaps in sync with the landlord’s other leases at the property, for various legal and administrative conveniences). This artificial date is typically the one from which liability is then counted. The first five years from the term commencement date of the lease (which property lawyers count from) is almost never over the same window of time as the first five years from the effective date of the grant of the lease (which the tax man counts from).
The consequence is that people often have to undertake laborious manual calculations on paper to reconcile the differences between those two dates, and to determine the rent in each of the first five years of the lease - and only then can the relevant data be entered into HMRC’s webapp and the NPV in question be determined.
2) Rent-frees: not hassle free
Another particular ‘pleasure’ involved in getting to the basic data-entry is that there are, often, rent-free periods in leases. Again, in practical terms the HMRC webapp doesn’t have any meaningful way to reduce rent payable by the amount of the rent-free period: again, you have to work it out yourself first. Typically this is done by working out the amount of rent that is ordinarily due, the number of days of the first year of the lease for which no rent is payable (manually, using a calendar, or perhaps alternatively using a website such as the excellent timeanddate.com).
Again, this results in another round of back-of-the-envelope calculations. One slip, and if this element of the calculation is incorrect, then the whole calculation again needs revisiting.
3) The VAT of the land
SDLT is payable on VAT-inclusive rents, and VAT is charged on commercial rents with-ever increasing frequency. But the problem with that is that the VAT rate has been up and down over the last few years (moving from 17.5%, to 15%, back to 17.5% and then up to 20%, all within the space of a few years like the proverbials), and, again, manual calculation is required when looking back at compliance in relation to old leases (more of that later). Furthermore the way in which VAT, and VAT rate changes interact with rent payment dates gives another headache: on 4 January 2011, the UK VAT rate increased to 20%, but it was only on the next ‘usual quarter day’ for the payment of rent (25 March 2011) that many tenants parted with VAT at the increased rate. The only thing worse than a conveyancing solicitor’s face when you tell them that they have forgotten to apply VAT is the crestfallen look when they have tried, but forgotten to apply the correct rates at the right time.
4) Overlap hassle
SDLT has a funny sort of relief that is commonly called overlap relief, which never works in the way that anyone expects that it will. For starters, is isn’t a relief, in the sense of being claimed in the return. What overlap relief does is to take historic rents on which SDLT has already been paid, and then offset these against the rents payable under the new lease (so that only the additional rent in the ‘overlap’ period is taxed). Overlap relief can be a particular hassle from a calculatory perspective, because you have to reconcile two different rent streams on a daily basis - cue more tears for your typical conveyancer.
5) The importance of compliance
The rate of SDLT on the net present value of rents is 1%: in other words, it is practically nothing compared to bigger taxes like VAT or corporation tax. But leases are an increasingly prevalent form of property ownership for business (I like to think of them as the original outsourcing) - but just because there is a relatively small amount of tax at stake does not mean that the you haven’t (a) got to get it right; (b) tell HMRC about it properly and promptly; and (c) keep hold of a sufficiently detailed set of records to allow you to deal with any future compliance obligations satisfactorily.
6) Control over cost
This is another area. SDLT is a very difficult area in which to be both correct and profitable. There are lots of historical reasons for this, but a key one is that the pricing model for SDLT compliance is still based on SDLT’s predecessor, stamp duty, which involved very simple arithmetic, and a runner going down to the local stamp office. Lawyers were traditionally saddled with this duty: trips to the stamp office were almost a right of passage for trainee solicitors. But lawyers go into the law as fugitives from arithmetic, so the complexity of SDLT lease compliance left them with a dreadfully difficult task that could take hours to deliver on.
A well-crafted tax compliance engine can turn questions of the billable hour into questions of the billable minutes. Which is about all you can afford to have. I wanted to experiment with keeping time costs down in compliance - and freeing myself and others up to do more interesting stuff.
7) The control over the calculation and compliance stack
This is the big one. By creating a calculation engine that replicates (and then improves) the calculatory functionality of the HMRC webapp, it is possible to then scale it or increase its scope - which goes to the very heart of the exercise here: to enable rapid, comprehensive audit of large portfolios of leases for occupational tenants. If the data prep, and the data entry, do not have to be done by hand, then this can present signficant savings in relation to the time it takes to deliver an audit to a client who needs comfort that their portfolio of leases has been dealt with properly.
It is not difficult to imagine that this code has been dropped into a slightly wider code-base that can take significant volumes of input, and spit out a full SDLT analysis in a lot shorter time. Which was the second project that then derived from the original exercise.
8) The intellectual challenge
Yep, I’ll be honest, I thought that it would be an interesting exercise - someone always said that it’s a lot easier to to learn to program if you have an ‘itch to scratch’. And this one bugged the hell out of me much more than it really should have.
After another bundle of hassle on a calculation, I decided to ask the brightest person I know (Safe Hammad). He pointed me to Python as a useful tool. The Internet pointed me to the phenomenal Zed Shaw. And Zed’s excellent book, Learn Python the Hard Way, pointed me to Wes McKinney’s equally fantastic work Python for Data Analysis. And these books gave an introduction to the world of Python and the excellent Pandas data analytics library.
Now, using these tools for non-‘big data’ issues does, in essence, represent the use of a sledgehammer to crack a walnut. But just because something is built for big data doesn’t mean that you can’t use it on little data, too.
And when the sledgehammer is so awesome, and you really hate that walnut, then it’s hard to resist having a swing.