In-house lawyers, equity-based remuneration, and improved governance

In a fascinating paper, Executive lawyers: gatekeepers or strategic officers?, Morse, Wang and Wu seek to measure the influence of individual lawyers on compliance, monitoring, and business development at public corporations in the US. Their claims are eye-catching and important.

Firstly, they find that, “hiring lawyers into executive positions associates with 50% reduction in compliance breaches and 32% reduction in monitoring breaches.” Hiring a GC or placing lawyers into very senior positions in the corporate hierarchy (the top 5 by remuneration in any company) has a significant and measurable impact on compliance and monitoring, especially (it seems) on more poorly performing companies.

Their methods are explained below.*

So far, so good; lawyers make a difference, and a positive one. It’s not a story you see empirically as often as one might like. They also explore whether the impacts they attribute to the hiring of senior lawyers are attributable to the lawyers themselves or to, “an overall strategy implemented by boards or CEOs to improve governance on many dimensions.” They are naturally limited in what they can do here, but as far as they can tell, “We show that individual lawyers matter.”

Secondly, they examine whether the “contracting of lawyers into strategic activities” reduces that gatekeeping effort. Those paid proportionately more by way of equity-based remuneration do less monitoring, “preventing 25% fewer breaches than are typically mitigated by having an executive gatekeeper.” Compliance levels were not significantly affected, though (see below for why). So, on monitoring, executive lawyers appear to do less, “when they have incentive contracts designed to reward business development effort”.

This faces up to the classic guardian-partner tension that Heineman writes so thoughtfully about. Put another way, it suggests that, in the desire to be commercially focused and ‘deliver for the business’, lawyers can take their eye off the gatekeeping ball. Culturally, the pressure on lawyers to be commercially focused is ubiquitous. This impacts on the management of, and behaviour by, lawyers. Morse, Wang and Wu point to a Reynolds Associates survey report which supports the successful mainstreaming of the commercial awareness orientation: “contrary to conventional wisdom, the legal executives go well beyond spotting legal issues to helping the business actually take risks and find creative solutions.”

The pressure to be commercially aware, or business focused, goes well beyond remuneration incentives. It can lead to a higher risk appetite. We are told – from the same report – that, “executive lawyers that receive the best performance ratings are 11% more willing to take risks than the average executive lawyers, and they are as likely to take risks as any other executive.” Such analyses are concerningly one-sided: there is sometimes a machismo in the ‘risk is good’ analysis which is painfully dangerous. What Morse, Wang and Wu have done is show how equity-based remuneration can discount the gatekeeping function.

That risk appetite can sometimes be harmfully high, leading lawyers, as part of their ‘management’ of legal risk, to recharacterise unacceptable conduct as higher-risk strategy. This can be seen reasonably clearly in a whole host of corporate scandals involving lawyers (for the latest in a long line on this blog see here, or scan the pages of the Ethics Room for a broader sense of what goes wrong when gatekeeping fails).

Our Mapping the Moral Compass work points to two different orientations at work when lawyers think ‘commercially’: one is a broadly sensible commercial orientation, thinking about how legal work is fit for purpose in commercial contexts. The other is an exploitation orientation which treats law’s uncertainty as an opportunity to be managed for commercial advantage. The latter, in particular, was associated with weaker ethical inclination. The exploitation of uncertainty orientation stood in contrast to orientations that emphasised independence and ethicality.

In that light, it is worth noting why Morse, Wang and Wu think the data is different for the compliance function. Equity-based incentives did not impact on compliance levels. They think this is because compliance breaches were in areas where lawyers had a “signing-off function”. They had to take personal responsibility for (say) SEC disclosures and, as a result, the lawyers faced personal exposure to liability. Interestingly, the FCA/PRA are in the process of (probably) making clear that the Senior Managers’ Regime should extend to those heading up a bank’s legal function. Depressingly, when lawyers discuss this, they talk only of legal professional privilege concerns and conflict of interest concerns (see here for an example). They do not mention their obligations to the rule of law or how a regime like the SMR can provide important sustenance to their independence. Morse, Wang and Wu’s study is evidence, though, that personal accountability can work.

The bottom line is really this:

We find that higher equity incentives imply materially lower monitoring performance.

…We interpret the magnitude relative to the benefit of having an executive lawyer gatekeeper: when firms strongly contract executive lawyers to be strategic officers, lawyers do less monitoring, preventing 25% fewer breaches than are typically mitigated by having an executive gatekeeper.

Or put another way:

Our results imply that whereas on average the hiring of a [senior] lawyer implies a 31.4% reduction in securities fraud (in column 2 of Table 7), when the lawyer is hired with high equity incentives, she only reduces fraud by 6.6%.

Incentives will only be part of the picture. The lawyers’ own inclinations and the approach of their host organisation may be as or more important influences. But this study supports the claim that lawyers can and do improve compliance, monitoring and even perhaps business performance, but must be managed appropriately. Incentives tied to the bottom line harm their gatekeeping functions whilst being only weakly related to business development.  Both gatekeeping and business development are legitimate aims, but the balance has to be well struck. The challenge to equity linked remuneration for lawyers practising in-house is made stronger by this study and the case for external, individual accountability for leaders of in-house functions is strengthened.


* Let me spend a moment explaining their terms. Compliance breaches occur in areas where there are “gatekeeping tasks that require a lawyer to sign off to be SEC compliant” related to accounting and insider trading regulation. Monitoring, as they define it, encompasses a broader range of legal risk management, such as guarding against potential breach of law or contract, antitrust, disclosure and so on. In the study, they measure compliance failures from Accounting and Auditing Enforcement Releases (AAERs) and SEC allegations of insider trading. Monitoring failures are measured from, “securities fraud, securities lawsuits purged of AAERs, and general lawsuits.” They also seek to measure the commercial impact of these lawyers by measuring “business development”: “capital expenditure intensity, R&D, business segments, and filing of patents”. They find a modest (they say weak) positive relationship between hiring executive lawyers and increases in business development.

The sample period for their data is extensive (1995-2012) and includes 32,372 firm-year observations for more than 3,000 unique firms. What they essentially do is look at significant hiring decisions (GCs and GCs/lawyers being hired in the very top bracket of corporate officers by pay) and see if there are any differences in compliance, monitoring or business development performance they can attribute through statistical analysis to such hires.

Interestingly also, Morse, Wang and Wu’s analysis suggests that senior in-housers coming from private practice may, initially, be less prone to dial down their gatekeeping responses than those who have moved from other in-house positions.



Gross earnings at the bar

Legalfutures report today on the Bar Council getting permission to plug a pensions gap through raised Practising Certificate fees. The story contains interesting data on barristers’ earnings. It seems these must be gross earnings (I assume, therefore, the numbers do not include deductions for chambers and expenses and, of course, tax). The Bar Council’s original paper predicts what they think barristers will be earning this year, and so I have taken the data and put it in a graph to illustrate the distribution.


A second interesting point emerging from the LF story is that the £30-60k band contains a ‘significant minority’ who are from the employed Bar, so in broad (and crude) terms, if we were imagining this as a distribution of the self-employed Bar, we might want to depress the second column a bit more than the others.

A barristers’ clerk tells me that, “working on Chambers expenses of 15 to 25% plus travel, books, income replacement insurance etc,” one would take off 30 to 40% from gross earnings to calculate a crude figure, “to be safe”. Though that figure has been disputed.

So using 30% as a cautious guide, [it looks like overheads will bite harder at the lower end]* the columns above would be £0-20k, £20-40k, £40-60k and so on. I have not taken much interest in Bar earnings, so it may be readers can point me towards better data. This suggests about half earn £60k or more before tax (but query how taking out employed barristers would affect this). Given the risks posed in seeking to qualify and establish as a barrister, and the varied earnings associated with certain types of work, these are interesting numbers. Law students tend to focus on the earnings of solicitors at the very visible and well-paid end of the profession when thinking of career trajectories, and that would be a mistake, partly because there is a harmful tendency to equate success with money and money with prestige.
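The band shift is just gross earnings multiplied by (1 − 0.30). A quick sketch of the arithmetic, assuming (as the text implies) that the graph’s gross bands run £0-30k, £30-60k, £60-90k and so on; rounding the results down to neat £20k steps is the post’s crude approximation:

```python
# Knock 30% off each hypothetical gross band (figures in £000s) to get
# the approximate net-of-overheads bands quoted above.
gross_bands = [(0, 30), (30, 60), (60, 90)]
net_bands = [(round(lo * 0.7), round(hi * 0.7)) for lo, hi in gross_bands]
print(net_bands)  # -> [(0, 21), (21, 42), (42, 63)], i.e. roughly £0-20k, £20-40k, £40-60k
```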

*The Legal Services Board published research and had this to say about estimating overheads:

With regards to costs, a 2007 survey of barristers, that achieved a 35% response rate, found, “that overheads and expenses accounted for between 11% and 30% of barristers turnover. Median billings were between £100,000 and £125,000 with 20% billing less than £80,000 and 20% more than £200,000 in the year. About 13% bill more than £300,000, and 11% less than £40,000.” In 2009 it was reported that overheads at chambers are around 35% of turnover. In 2010 it was reported that: “The ratio of overheads to total fee income in specialist commercial and civil sets was circa 8-14%. In more middle of the road mixed common law sets the ratio was circa 15-22%. In family sets the ratio was 18-25%. In criminal sets the ratio was circa 18-30%. The figures exclude individual barristers’ expenses such as travel and professional insurance.”


Innovation and the Big Mo

Political campaigns, and the journalists that surround them, love to talk of the big Mo. Momentum. Legal aid’s very own Jeremy Corbyn lookalike, Roger Smith,* has prayed in aid the idea that innovation momentum is on the up (follow his blog here, it is terrific). And so it is. I would agree.  He develops the theme in his latest report for the Legal Education Foundation. I have been having a read, and thought I would give you my quick thoughts – partly to distract me from mocking some of the quotes and statistics in this week’s other offering of similar ilk from the Law Society (which mocking aside, has some things to commend it: read it here).

The trouble seems to be that, for all that the innovation world is awash with ideas, and stories, and hopes, when it comes to access to justice it is not delivering. There are stories, many of which quickly become quite familiar, but there are no real success stories yet and very little data. There are nearly men, and promising new kids on the block, but none have so far delivered at anything that looks like scale. Rechtwijzer – for all that I totally agree it is inspiringly designed by the marvellous folks at HiiL – has been in the ‘paradigm-busting new hope’ category for a while now and perhaps needs to show us that it is delivering the transformation it has promised.

This lesson is written, I think, carefully and quietly throughout Roger’s report. It is one reason why it will repay careful reading. He camouflages some of his concerns in a necessary and genuine optimism but from his report I take these as some of the key difficulties for innovation in access to justice:

  • A lack of leadership and resources (especially in the UK where it contrasts poorly with parts of the Netherlands, Canada and Australia) – governments and universities, amongst others, have failed to step up to this challenge;
  • The growing (need for?) likelihood of, and threats posed by, compulsion into online dispute resolution models.
  • The lack of performance standards and objective evaluations of innovation. This is one reason why stories are often all we have to go on, and why we should be sceptical that innovation is delivering (yet) in A2J terms, and often elsewhere.
  • And the apparently entrenched (I sometimes think somewhat exaggerated, but still very serious) problems of the digital divide. This is mission critical. Great systems have to interface with less great humans. Roger’s section on chatbots vs websites is important: he likes the websites because he can read and judge them. Chatbots – well, who knows whether they work – but he can see people will use them.

Anyway, Roger’s report. Read it. And the Law Society’s too (check out the case study on Hodge Jones and Allen for instance). And notice the absence of data, and the quality of the data that does exist, and the hopes, and wonder at what might be realised but also how.


*To be fair, Roger’s beard aside, he doesn’t really look that much like Jezza, but it is Friday, and I thought it would be funny.


Rolls Royce Service – risk, compliance, and ethics: where were the lawyers?

Readers of this blog will be familiar with my posts on the tensions inherent in lawyers’ duties to their clients and their duties to uphold the rule of law and the proper administration of justice. Readers interested in the story of Rolls Royce’s DPA will already have picked up:

  • the reported levels of “full and extraordinary cooperation” with the SFO that Rolls Royce under its current leadership have provided, and,
  • the potential for individual prosecutions to flow from the events covered by the DPA.

That cooperation included the waiving of legal professional privilege over certain information (a reminder that privilege does not provide the fundamental guarantee it purports to for those working within organisations), a point of significant interest to lawyers but not my focus today.

It is perhaps no surprise that cooperation needed to be so fulsome given the litany of events described in the agreed statement of facts (see here for the DPA, the agreed statement and Leveson’s judgment). A series of corrupt deals, between 1989 and 2013, involved intermediaries assumed to have paid bribes to ensure Rolls Royce got orders it bid for. Or they involved the disguising of payments to executives (for example, to service that executive’s private jet) within parts of the organisation. One such deal seems to have ensnared a leading US University (whether blamelessly so or not, we do not know – but it is a salutary warning to University administrators and entrepreneurial academics).

These processes involved (amongst other things):

  • Drafting consultancy agreements, side-letters and contracts to give dubious credits to contracted parties which had the effect of (known or unknown to the individuals) facilitating bribery;
  • Advising on tendering processes, and compliance with Indian restrictions on the use of Intermediaries;
  • Inadequate due diligence especially where suspicions of bribery were raised by RR employees;
  • Deliberate advice to destroy documents when they feared an investigation by Indian authorities;
  • Modification of the language of risk reports to minimise understandings of payment regimes;
  • Risk procedures not having the benefit of the full information available in the company, and evidence of deliberate vagueness in some of the information (e.g. about the nature of intermediaries business arrangements); and,
  • Seeking legal advice from an external law firm without disclosing all the material facts to their instructed solicitors.

Whether such work involved lawyers in Rolls Royce is not generally at all clear from the published information. In the Statement of Facts we are told that Corporate Headquarters, where control was supposed to be exercised, “included, inter alia, the office of the Chief Executive Officer (“CEO”), and General Counsel.” Beyond that, individuals are not generally mentioned and very little of the work described in my bullet points is specifically attributed to Rolls Royce Legal or firms instructed by them. Much of the work might be presumed to involve lawyers, internally and occasionally externally, but equally much of it might not have involved them. Even for work that clearly did, or can be assumed to have, involved lawyers, we do not know what they were told, or how they responded to any red flags that may have been raised. We also do not know whether lawyers should have been more involved to meet their professional and contractual obligations. In other words, we cannot see clearly what the sins of omission or commission, if there were any, were.

When looking to answer the obvious question that would concern you and me, we would, for the most part, have to guess where the lawyers were. Perhaps this is by design, to protect the conduct of future prosecutions. RR Legal plainly was involved in the contractual arrangements designed to mitigate the risk of, or recharacterise, the provision of funds for an MBA course and associated hospitality which was the subject of one count of failure to prevent bribery; but whether this involvement was problematic or not, we would have to guess at. We do get a few tantalising glimpses of in-house and outside lawyers describing transactions now labelled as corrupt in terms of risk appetites. The advice is sometimes, it seems, framed in terms of risk rather than clear advice that certain things cannot be done. The implication might be, but it would be speculation, that dodgy deals were deemed ‘high risk’ rather than plainly illegal. If I am right – and I emphasise, I have to read between the lines of the DPA here, and am not prejudging – some of the lawyers may have found themselves unable or unwilling to say no.

So, from my point of view, which is hardly the most important view even to me, the DPA is a frustrating document. There is, however, enough in the details to raise a strong presumption that the conduct of at least some of the professionals involved should be scrutinised by the regulator. This leads me to my final points. A series of questions.

  • Has Leveson reported any of the personnel involved to the SRA or BSB to have potentially serious misconduct investigated? He has not said he did, but perhaps this is for the same reason that individuals are not focused upon in the DPA. I do not criticise him for not making a public referral, even of unnamed individuals. And of course, being more seized of the facts than I, he may have formed the view that there is no serious misconduct here that should be investigated.
  • Have any of the lawyers involved, especially within Rolls Royce Legal and/or Compliance, reported any of their colleagues or former colleagues for serious misconduct?
  • Indeed, does this kind of case act as a salutary reminder that in-house legal teams may need the equivalent of a COLP?
  • Have the firms who gave advice on anti-bribery and corruption also considered their own exposure to criticism and investigation and, if appropriate, considered making a report?

The answers to these questions may all be anodyne. I am not seeking to raise further speculative criticisms of Rolls Royce. I am, however, suggesting all in-house teams and law firms caught up in scandals like these need to think very carefully about their own exposures and conduct before the regulator comes knocking at their door. Rolls-Royce have partly minimised the damage as a result of doing something similar. That’s just one of the lessons of this sorry tale. The second is, at some point, the detail needs to be more widely known and thought about. In-house lawyers need to learn from the mistakes, if any were made.

Postscript: This from Transparency International evaluates the DPA in broader terms, pointing out how generously Rolls Royce may have been treated.


Lawyers learning about prediction

Just before Christmas, the Lord Chief Justice gave his senior brothers in the legal profession a bit of a Christmas present. In a speech to legal practitioners in his homeland of Wales, he warned them of their impending redundancy, saying:

It is probably correct to say that as soon as we have better statistical information, artificial intelligence using that statistical information will be better at predicting the outcome of cases than the most learned Queen’s Counsel.

Not just any old Queen’s Counsel, but the most learned. Even, shock horror, some who perhaps practise in London might be the implication. Even, EVEN, and here my friends in the De Keyser Massive may start humming Say it ain’t so, David Pannick QC.

So, as I happened to be teaching my Future of Legal Practice Students about Quantitative Legal Prediction (they start by reading this) now seemed like a good time to look quite hard at what the research actually says about quantitative legal prediction. This is a long post as a result. You may need beverages and biscuits. I should also say, I am not a machine learning expert. I am feeling my way, learning as I go, reading and listening to the likes of Jan Van Hoecke at RAVN and Noah Waisberg at Kira. I may well have got things wrong or misdirected the reader. I would, even more than usual, warmly welcome dialogue on these subjects from those who understand them more fully and from those who, like me, want to learn. So…

Dan Katz, a leading legal tech academic and founder of a predictions business, opines that QLP will “define much of the coming innovation in the legal services industry” and, with more circumspection, that, “a nontrivial subset of tasks undertaken by lawyers is subject to automation.” As we will see, I think, those last words provide an important coda to those drinking unreflectively from the innovation KoolAid. Yet the developments are exciting too: a recent study suggests text scraping and machine learning enabled impressive ‘predictions’ of ECtHR decisions (79% accuracy) and Katz and his colleagues have this week updated their paper claiming a machine learning approach can predict US Supreme Court Cases in a robust and reliable way (getting decisions right about 70% of the time) over decades and decades. And Ruger et al, as we will see, claim machine accuracy higher than experts (perhaps we can lobby them to call their algorithm GoveBot). So there may be something in the Lord Chief’s predictions. Let’s take a look at what that something may be.

We will start with Theodore Ruger, Andrew Martin, Kevin Quinn, and Pauline Kim’s study of the Supreme Court. This study is fascinating because a computer model predicted the outcome of US Supreme Court cases for a year, using only six variables, with good accuracy. These variables are simple things like which Circuit the case came from and a basic coding of the subject matter of the case; the kind of variables one would not need to be a highly qualified lawyer to code. Feed those variables into an algorithm and the algorithm predicts whether the case is likely to be unanimously decided or not, and if it is not, how each of the Justices is likely to vote. The algorithm uses simple decision trees like the one below for each judge (each judge has his or her own tree – paying attention to different variables and in different ways). The decision tree provides a pretty good rule of thumb as to how the particular judge will decide a case. If you answer each of these questions about a case before it is decided, the likelihood is you could accurately predict the decision of that particular judge. Simples. And no need to worry about what the law said. Or detailed facts. Or what time the judge would have their lunch.
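A per-judge tree of this kind is, in code, just a handful of branching questions. Here is a minimal sketch – the splits, variable names and labels are invented for illustration, not Ruger et al’s actual trees:

```python
def predict_judge_vote(circuit, lower_court_liberal, issue_area):
    """Toy rule-of-thumb decision tree for one hypothetical Justice.
    Every split below is made up; the point is only the shape of the model."""
    if circuit in ("2nd", "3rd"):      # first split: where the case came from
        return "affirm"
    if lower_court_liberal:            # second split: direction of the court below
        return "reverse"
    # final split: a crude subject-matter coding
    return "reverse" if issue_area == "economic activity" else "affirm"

print(predict_judge_vote("9th", True, "civil rights"))  # -> reverse
```

Ruger et al fitted one such tree per Justice from past votes; answering the six questions for a new case yields that Justice’s predicted vote, with no law and no detailed facts involved.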

Ruger and his colleagues built these decision trees having analysed just over 600 cases between 1994 and 2002; a very modest data set that produces impressive results. Having built the decision trees, they then tested them. They used their algorithms to predict the outcome of each new case appearing before the Supreme Court (using those same six variables). And they got a panel of experts (a mixture of top academics and appellate practitioners, some with experience of clerking for Supreme Court justices) to predict the outcome of cases in areas where they had expertise (they usually managed to get three experts to predict a decision). This real test enabled them to compare the judgment of the machine and the judgment of experts on 68 cases. Who would win? Machine or Expert?

In broad terms, the machine did. It predicted 75% of Supreme Court decisions accurately. The experts got it right only 59% of the time. Even where three experts predicted the outcome of a particular case and the majority view of the three was taken, accuracy climbed only to a still-not-as-good 66%. Only at the level of individual judicial votes did the experts do better than the machine (experts got votes right 68% of the time compared with the machine’s 67% – let’s call that a tie). The rub was that the machine was much better at predicting the moderate, swing voters; precisely the task you might think the experts would be better at.

Yet all is not quite what it seems. There is, at the very least, a glimmer of hope for our professional experts. If we look separately at the predictions of appellate attorneys in Ruger’s sample of experts, their accuracy of prediction was a mighty 90% plus. Because there were only 12 such appellate attorneys in the panel used for the study, Ruger et al were not able to test separately (in a statistically robust way) whether the difference between appellate attorneys and other experts was significant. Nonetheless, we should be very wary of accepting the study as proving machines are better than experts at predicting the outcome of legal cases – or, more particularly, better than the kind of experts the Lord Chief has in mind – because the kind of experts we might expect to do very well at predictions really did seem to do very well, even if Ruger et al cannot prove so with high levels of confidence.

As Ruger et al also indicate there are other limitations on their, nonetheless very impressive, experiment. In particular, their algorithms are built to reflect a stable court: the same justices sat on the court through the entire period. They know that this is important because some of their Justices’ decision trees showed inter-judicial dependencies: Judge A would be influenced in their decision by the decision of Judge B. Take away Judge B, and the decision trees would not work so well. In the time period of their experiment Judge A, B, C and so on all sat so this was not a problem.

It is at this point that Dan Katz, Michael Bommarito and Josh Blackman pick up the story. Their model uses a far greater number of variables, across more than two centuries of data, and seeks to show that it is possible to predict judicial decision making, using information on cases that could be known before the case is heard, with high levels of reliability and consistency. In this way, they seek to go further than Ruger et al. They show that it may very well be possible to develop an approach which can adapt to changes in the personnel on courts, the kinds of cases before them, the evolving state of the law, and so on. In this way they hope to develop a model which is “general – that is, a model that can learn online [as more decisions are taken]”, that has “consistent performance across time, case issues, and Justices”, and has “future applicability” – that is, it could be used in the real world and “should consistently outperform a baseline comparison.”

Their approach is complicated for those, like me, not expert in data science and machine learning – reading it is a great way to start to peer beneath the machine learning hood. They rely on a pre-existing political science database of Supreme Court decisions which has coded the characteristics and outcomes of Supreme Court cases on dozens and dozens of variables across more than two centuries. To simplify a bit, from that database they set about, through machine learning, developing algorithms which can predict the outcomes of a ‘training set’ of cases in the early years of the Supreme Court; testing whether the algorithms can predict the outcome of cases held back from this training set; then – through machine learning again – adapting those algorithms to take account of a training set for the next year, testing them again, and moving forward. Ingeniously, then, they can build models, test those models, and move forward through a very large slice of the US Supreme Court’s history. Because the approach does not rely on simple decision trees, but on random forests (ensembles of many varied decision trees), which develop and change over the life of the court, it is not possible for the authors to generalise very clearly about how the machine takes its decisions: but can they get decisions right?
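The train-test-roll-forward loop described above can be sketched as walk-forward validation. In this sketch `fit` stands in for whatever learner one likes (theirs is a random-forest-style ensemble; the stand-in below is a trivial majority-class ‘learner’, and the data is invented):

```python
def walk_forward(cases_by_year, fit):
    """Train on all years seen so far, predict the next year's cases,
    fold that year into the history, and repeat; returns overall accuracy."""
    history, correct, total = [], 0, 0
    for year in sorted(cases_by_year):
        if history:                      # need at least one year of training data
            model = fit(history)
            for features, outcome in cases_by_year[year]:
                correct += int(model(features) == outcome)
                total += 1
        history.extend(cases_by_year[year])
    return correct / total

def majority_fit(history):
    """Stand-in learner: always predict the most common past outcome."""
    outcomes = [o for _, o in history]
    most_common = max(set(outcomes), key=outcomes.count)
    return lambda features: most_common

cases = {1994: [({}, "reverse"), ({}, "reverse"), ({}, "affirm")],
         1995: [({}, "reverse"), ({}, "affirm")]}
print(walk_forward(cases, majority_fit))  # 0.5: one of the two 1995 cases called correctly
```

The key property, as in the paper, is that the model never sees a year’s outcomes until after it has predicted them.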

They test their predictions against the actual outcomes of the cases and find about 70% accuracy overall across decades. They build a way of looking at the cases in (say) years 1-50, then predicting cases in year 51, then again year 52, and so on. The computer programme does not have the data for year 52 until after it has made the predictions. It is, thus far, an extraordinary achievement – a way of analysing cases, from quite basic (if detailed) data, which enables them to predict the outcome of cases at many, many points in the Supreme Court’s history, with high accuracy – even though the law, judges, and nature of the cases have – I would surmise – changed beyond all recognition over that period.

The next question for them is what they should compare that accuracy level against. They note, for instance, that if one were to predict the outcome of a Supreme Court decision now – without any algorithm or legal insight – one would predict a reversal (because about 70% of cases are now reversals; but in 1990 a reversal would have been the counter-intuitive prediction because only about 25% of cases were reversals). They need a null hypothesis against which to compare their 70% accuracy. If the computer can’t beat the null hypothesis, it is not that smart. They opt for the best ‘rule of thumb’ prediction rule as being a rolling average of the reversal rate for the last ten years (if I have understood it right). This is actually quite a good prediction rule (it enables an accurate prediction of cases or votes in 66-68% or so of instances – one can be pretty smart at predicting, it turns out, with data from one variable: reversal rates over time).
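As I read it, that baseline reduces to a few lines: predict ‘reverse’ whenever the trailing ten-year reversal rate tops 50%. A sketch on that understanding (the yearly rates below are invented, chosen only to echo the reversal-heavy and reversal-light eras mentioned above):

```python
def baseline_prediction(reversal_rate_by_year, year):
    """Rule-of-thumb null model: predict 'reverse' if the mean reversal
    rate over the previous ten years exceeds 50%, else 'affirm'."""
    window = [reversal_rate_by_year[y]
              for y in range(year - 10, year) if y in reversal_rate_by_year]
    return "reverse" if sum(window) / len(window) > 0.5 else "affirm"

modern = {y: 0.70 for y in range(2000, 2010)}   # a reversal-heavy decade
early = {y: 0.25 for y in range(1980, 1990)}    # a reversal-light decade
print(baseline_prediction(modern, 2010))  # -> reverse
print(baseline_prediction(early, 1990))   # -> affirm
```

A null model this simple is the bar the machine learning approach has to clear.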

On this basis the difference between the machine learning approach and the ten-year rule of thumb approach is 5% or thereabouts. The machine learning approach gets one more case right out of twenty than guessing on the basis of the ten-year reversal rate. This is still really interesting, and suggestive of the potential for predictive analytics, but it seems to be some way off the sort of system that one would base case decisions on. Perhaps more worrying for Katz et al, especially given the need to be consistent over time and for different justices, is that the model seems to work much better in the middle years of the data range, but, for reasons which as yet remain a mystery, its prediction of decisions under the Roberts Court (2005-) is poorer than the rule of thumb. You get a nice sense of this from Katz et al’s graphs in their article, especially the one which looks at the decisions of judges.

The clusters of pink indicate where the model is less successful at predicting than the 'null model', the rule of thumb. Perhaps understandably, the model struggles in the early years, when it has less data to learn from. Also, as one of my students pointed out, Oliver Wendell Holmes was being Oliver Wendell Holmes. But the really interesting thing is that the model is not working so well in recent years either. One thought is that this is something to do with the 'newness' of the Roberts Court. It is anyone's guess what is going on, but one thought I had is that the variables in the Supreme Court database may no longer be as fit for purpose as they once were. Remember, the data comes from a political science dataset, where cases are coded retrospectively and where, I am assuming, the major intellectual legwork of deciding what kinds of data should be coded was generally done some time ago. The analytical framework it provides may be degrading in utility: the judges may be deciding new kinds of cases, or deciding them in ways which somehow escape the variables used to date.

This leads me on to the third, and last, study I want to concentrate on, by Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro and Vasileios Lampos. It does not rely on the same kind of coding of cases: instead of using someone else's variables, it seeks to predict cases from the text associated with them. The study employed Natural Language Processing and machine learning to analyse decisions of the European Court of Human Rights, to see if there were patterns of text in those cases which helped predict whether the Court would find, or not find, a violation of the Convention. What did they do? I am going to try and speed up a bit here for readers who have patiently made it this far, and hope I do not do the study a disservice in the process.

In broad terms, they took a sample of cases from the European Court of Human Rights dealing with Articles 3, 6, and 8 of the Convention, because these articles yield the most data on potential violations. They used only cases in English, and randomly selected equal numbers of violation and non-violation cases for each of the three articles. This creates a fairly clear null hypothesis within their dataset: there is a 50:50 chance of being right if you pick a case at random from the database.

After some cleaning up of the data, they use machine learning and natural language processing techniques to count the frequency of words, word sequences, and clusters of words/word sequences that appear to be semantically similar (the machine learning approach works on the assumption that similar words appear in similar contexts, and defines semantic similarity in this way). They do this analysis across different sections of the cases – Procedure, Circumstances, Facts, Relevant Law – and calculate some amalgams of these categories (such as one which looks at the frequency of words and clusters across the full case, law and facts together). On one level that analytical task is as basic as identifying the 2,000 most common words and short word sequences, but the clustering of topics is more complicated. Then – in the broadest, somewhat simplified terms – they look for relationships between the words, word sequences, and clusters of words that predict the outcome of a sample of the cases (a learning set), build a model of those cases, and test it on the remaining cases to give an indication of the model's predictive power. It is very similar, in some ways I think, to the way spam filters work.
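The spam-filter analogy can be made concrete with a toy word-count classifier. This is naive Bayes – a simpler stand-in for the n-gram, topic-cluster and support vector machine model the paper actually uses – and the case snippets and labels below are entirely invented:

```python
import math
from collections import Counter

# Toy spam-filter-style classifier: count word frequencies per outcome
# in a training set, then label a new case with whichever outcome makes
# its words most probable (naive Bayes with add-one smoothing).

def train(labelled_texts):
    counts = {}          # outcome -> Counter of word frequencies
    totals = Counter()   # outcome -> number of training documents
    for text, label in labelled_texts:
        counts.setdefault(label, Counter()).update(text.lower().split())
        totals[label] += 1
    return counts, totals

def predict(model, text):
    counts, totals = model
    vocab = {w for c in counts.values() for w in c}
    best_label, best_score = None, -math.inf
    for label, wordcounts in counts.items():
        n = sum(wordcounts.values())
        # log prior: how common this outcome is in training
        score = math.log(totals[label] / sum(totals.values()))
        for word in text.lower().split():
            # add-one smoothing so unseen words don't zero the score
            score += math.log((wordcounts[word] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Just as a spam filter learns that "viagra" and "lottery" predict spam, a model like this learns which words co-occur with findings of violation – without any understanding of why.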

The predictions from this modelling are 79% accurate – a good deal better than Ruger's ordinary experts, and not very far off his best ones. In numerical terms that is a little better than Katz's model (though looking at more similar kinds of cases, over a narrower period of time).

There are some wrinkles in the study design. In particular, the textual analysis of 'facts' and 'outcomes' cannot be kept very separate. It would be unsurprising if the judges described the facts in a particular way when they were going to find a violation; the algorithms might be picking up a shift in judicial tone, rather than whether certain fact patterns are likely to lead to certain outcomes. As the researchers make clear, it would be better to test the analysis of facts on documents other than the judgements themselves (such as the appeal pleadings and submissions – something which has been done on US Supreme Court briefs, but in a different context). Nonetheless it is a study which suggests the power of machine learning to glean information from legal documents, rather than read data from the kind of database Katz et al were using. Perhaps this suggests a way for Katz et al's general model to develop? It would be interesting to know whether certain word clusters predict which cases Dan Katz's model gets wrong, for instance. And if they can find a way for the machine to learn new variables over time, from judgments say, well…

As things stand, then, we seem some way off – but also quite close to – artificial intelligence being better at predicting the outcome of cases than the most learned Queen's Counsel. No one gets close to the 90%+ performance of Ruger et al's appellate attorneys. We don't know if that is the fairest benchmark, but we certainly need to hold the possibility that it is in mind. The likelihood is, as Aletras et al note, that machine learning will be used in the lower reaches of the legal system. If I were a personal injury firm, or an insurer, for instance, I would be thinking about engaging with these kinds of projects to improve my risk analysis, by scraping data from files on which cases are accepted under no win no fee agreements.

And even then, predictions contain only modest information. If you looked at Katz et al's code (you can, btw: it is on GitHub) you would be hard pushed to find an explanation of why a case was likely to win that was useful to you as a lawyer or a client (although this might matter less if you were looking at baskets of cases, like, say, an insurance company). Similarly, if you read Aletras et al's paper, you will see clusters of words which seem to predict judicial outcomes, but they are not, to my mind, very informative about what is going on. The opacity of algorithms is a big problem, sometimes inhibiting their legitimacy or raising regulatory issues. That said, utility will be what makes or breaks them.

And whilst we should worry about opacity, and power, and systematic malfunctions, and the like, we should perhaps not think of them as new problems. After all, we think we know what influences judges (the law and the facts), we try to work out what arguments are persuasive, and we know what judges say influences them. But if we compare what we think with what Ruger et al found (six variables!), or with Roger Hood's work on sentencing (judges say they do not discriminate but…), things start to look simpler, or the explanations of judges more unstable and tenuous. If our complex, fact- and law-based thinking turns out to be poorer at predicting winners than forests of difficult-to-understand decision trees, is human or quantitative thinking the more opaque in these circumstances? That is not all that matters, but it is very important.


Women QCs: a quick look at the data

The MoJ’s press release on recent QC appointments says this, “More female and black and minority ethnic candidates have been appointed Queen’s Counsel than ever before.” And the QC appointments panel data says this, “We are pleased that the number of women applying and being successful continues to rise, and that the proportion of women amongst those appointed is at its highest level ever.” (see the press release on its site).

So it is worth pointing out the following.

The year the most women were appointed as QCs in absolute terms was in 2006 (there were 68 compared with this year’s 56). You can see the graph of the data here.


And in terms of success rates, 66% of women applicants succeeded in 2011/12, whereas this year it was 55%. Another picture…


And if we turn to the data that the press departments want us to focus on, we do indeed see that this year there was a higher proportion of women appointed.


That's an increase from 23.4% last year to 27.4% this year; but it was 26.9% the year before that, so this year is just 0.5 percentage points above the previous best for that statistic.*

Of course, the most important thing about those two lines is how far apart they are; and the most important lessons from the data are probably learnt from the number of applicants and their success rates. Oh, and the length of time this is taking.

*n.b. a previous draft of this post used the wrong data. The graph and the data have been corrected here.


Law: It’s all a game

Happy New Year! I am tempted out of my accidental blogging purdah by a genuinely fascinating story in Legal Week on Taylor Wessing's use of Cosmic Cadet. Cosmic Cadet is not a replacement term for trainee solicitors indicating the uber-global commercial awareness of the modern-day law student. No. It is a possibly cringe-worthy test designed to measure (per the maker's website):

  • Cognition. How an individual processes and uses information to perform mental operations.
  • Thinking Style. How an individual tends to approach and appraise problems and make decisions.
  • Interpersonal Style. An individual’s preferred approach to interacting with other people.
  • Delivering Results. An individual’s drive to cope with challenges and see a task through to completion.
With such an awful title, there must be something in it, no? Arctic Shores claim strong levels of scientific support for their approach, including that all of their ‘research’ (not all of their testing or application or interpretation, I note in passing [NB, I am reassured since posting this that, “our testing, interpretation and general validity has been independently reviewed” – see below in comments section]) is reviewed (with what results we know not) by “independent subject matter experts“.
If I am sounding sceptical, in fact, I am more interested than sceptical. The attributes that Legal Week highlighted as measured by the test are particularly worthy of scrutiny:

Thinking style
Risk appetite
Managing uncertainty
Potential to innovate
Learning agility

Interpersonal style
Social confidence

Processing capacity
Executive function
Processing speed
Attention control

Delivering results
Performance under pressure

No mention of ethics was my first reaction – and remains my strongest one. Risk appetite is likely to be related to ethical inclination and some of the other measures may be too. It would be especially interesting to know what kinds of risk appetite users of the test want. The rather weakly evidenced assumption in the industry is that lawyers are risk averse, in the same way as lawyers are seen as both show-offs and introverts. An interesting part of the test will be the capacity of Arctic and the firms to learn more about the truth of such claims.

Fascinating too would be an explanation of how would-be trainees are supposed to manage uncertainty. There is an uncomfortable impression given by this list of the trainee as a machine, a resilient robot, a chip in the supercomputer that is big law. That’s an unfair impression, I am sure, but it is one which I hope the firms who are thinking along these lines think carefully about. Taylor Wessing, to be clear, seem to be thinking carefully about how the tests integrate with their wider processes of assessment.

Resilient, high performing people are one thing; systems that break them or lead them astray are another. I would not say law firms are broken, but there is plenty of evidence that they can and do lead some people astray. And it is absolutely vital that if firms are thinking along these lines they pay more than lip service to the moral capacities of their candidates and the ethical resilience of their systems and culture. I don’t see that in these tests. Perhaps it is to be found elsewhere.


Postscript: there’s another excellent story on this here
