A data scientist in Kenya pt.2

Oct 31

TL;DR

I work at Tulaa now, and our specialty is assisting smallholder farmers end-to-end.
Tulaa has the potential to create the most granular dataset on individual smallholder farmers, bar none - which is rad!
Data collection in this environment is no mean feat; every channel has its challenges, but SMS polling is probably the best option for us.
To make our teams work smarter, we're implementing Jupyter Notebooks and Amazon QuickSight and, courtesy of Rippleworks, redesigning our data architecture.

Something that has stuck with me in the brief time I’ve worked with smallholder farmers is how easy it is to overlook trust, a necessary ingredient - and perhaps even a precursor - of doing any business with this community. In the past few days alone, I’ve attended presentations by firms whose entire business model is dedicated to this issue (one, for example, sets up a blockchain ledger to ensure equitable contracts for farmers) and by others, such as a smallholder crop insurer, whose product adoption was driven mainly by proving to farmers that the offer was not a total scam.

It was a welcome surprise, then, when in my first week at work I read a report on Tulaa by Busara - a behavioral economics firm focusing on the Global South - in which one of the farmers interviewed said, "Tulaa cares about me when no one else does." So what is Tulaa, and why did I choose to join this particular team for my sabbatical?

Tulaa - whose name, fittingly, means ‘balance’ in Sanskrit - is an AgTech startup based in Kenya whose mission is to “bring together input suppliers, financial institutions and farmers in a virtual marketplace to provide a new way for rural communities to get what they want when they need it.” In practice, this means we support farmers through the entire lifecycle: from accessing quality inputs and the credit to purchase them, through to selling their goods in an equitable and transparent marketplace. This model differs from those of many other players in the AgTech space in that we are one of the very few - and possibly the only one in Africa - to offer such an end-to-end service.

You may justifiably wonder why any small firm would attempt to offer so many things at once rather than doing one of them very well - surely this is a recipe for disaster? Well, the ‘simple’ answer is that none of the myriad problems farmers face live in isolation, so without addressing all of the major pain points - many of which I alluded to in my previous post - it is hard to guarantee a farmer’s success (and even then, success is not a given). Put another way, a farmer’s chance of success is maximized only if she is able to sell her goods, and often the precursors to this are 1) quality inputs, 2) a means to purchase said inputs, and 3) knowledge of how to maintain her crop and use her inputs.

Alright then, but if this is a blog about data science, why is this interesting from a data perspective? Well, because Tulaa’s model is an end-to-end service, we have the opportunity to collect a data stream for each smallholder farmer that could be unrivaled in its granularity which, unsurprisingly, opens the door to a whole pantheon of data-driven projects. Not only that, but because we work with many other stakeholders - agents, retailers, input suppliers, aggregators, and traders (yes, vertical integration is very much a pipe dream at this point) - we also have the chance to collect many parallel data streams and see how they interact. For example, how much are a small retailer’s sales boosted by the availability of credit to our farmers? Not convinced this could be useful? Imagine how much more effective a trader-to-farmer matching algorithm would be if we knew our farmers’ estimated yields and harvest dates well in advance: we could run multiple matching scenarios before any trades are made.
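To make the matching idea a little more concrete, here is a minimal Python sketch of a greedy trader-to-farmer matcher. Everything in it is hypothetical: the `Farmer`/`Trader` fields, the assumption that a trader can only collect within a date window, and the `yield_factor` parameter used to rerun the same matching under optimistic or pessimistic yield scenarios. A real matcher would more likely use an optimization solver; greedy assignment is simply the easiest version to read.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Farmer:
    name: str
    est_yield_kg: float   # hypothetical: estimated harvest volume
    harvest_date: date    # hypothetical: estimated harvest date


@dataclass
class Trader:
    name: str
    demand_kg: float      # how much the trader wants to buy
    window_start: date    # earliest date the trader can collect (assumed constraint)
    window_end: date      # latest date the trader can collect (assumed constraint)


def match(farmers, traders, yield_factor=1.0):
    """Greedily assign each farmer's (scaled) estimated yield to traders whose
    collection window covers the farmer's harvest date.

    yield_factor rescales every estimate, so the same data can be matched under
    several yield scenarios (e.g. 0.8 for a poor season).
    Returns a list of (farmer_name, trader_name, kg) tuples.
    """
    remaining = {t.name: t.demand_kg for t in traders}
    matches = []
    # Place the earliest harvests first.
    for f in sorted(farmers, key=lambda f: f.harvest_date):
        supply = f.est_yield_kg * yield_factor
        for t in traders:
            if supply <= 0:
                break
            in_window = t.window_start <= f.harvest_date <= t.window_end
            if in_window and remaining[t.name] > 0:
                kg = min(supply, remaining[t.name])
                matches.append((f.name, t.name, kg))
                remaining[t.name] -= kg
                supply -= kg
    return matches


if __name__ == "__main__":
    farmers = [
        Farmer("Wanjiru", 500, date(2019, 12, 1)),
        Farmer("Achieng", 300, date(2019, 11, 20)),
    ]
    traders = [Trader("T1", 600, date(2019, 11, 15), date(2019, 12, 5))]
    # Compare the base case with a pessimistic yield scenario.
    for scenario in (1.0, 0.8):
        print(scenario, match(farmers, traders, yield_factor=scenario))
```

Running the matcher over several `yield_factor` values is one cheap way to "run multiple matching scenarios" before any trade happens.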

But therein lies the challenge - how exactly does a small outfit like Tulaa (17 Tulaninis and counting!) collect all of this data and, for f-sake, where do we start? Welcome to my world! Broadly speaking, my time at Tulaa is defined by two major imperatives:

How do we measure the value we’re delivering to our stakeholders?
How can we ensure Tulaa’s teams are working smarter?

Let’s break this down.

How do we measure the value we’re delivering to our stakeholders?

This may betray my consulting roots, but as the oft-quoted Peter Drucker said, “if you can’t measure it, you can’t improve it” (or, for the statisticians out there, “In God we trust, all others must bring data”). The first step to knowing whether we are delivering any value is to have a clear and measurable definition of what that value is. In the case of farmers, we propose to offer them access to better inputs, access to credit, farming advice and a better price for their goods. To determine, then, whether we are actually improving the farmers’ well-being, we would need to check whether the farmers had access to these items before we approached them or, in the case of sales, to measure what proportion of the trader’s purchase price actually goes to the farmer.
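Both checks boil down to simple ratios. The sketch below assumes hypothetical survey records (one dict per farmer, with boolean fields such as `credit`) and per-kilogram prices; the field names and the KES figures in the usage lines are illustrative, not Tulaa data.

```python
def farmer_share(trader_price_per_kg: float, farmer_price_per_kg: float) -> float:
    """Fraction of the trader's purchase price that actually reaches the farmer."""
    if trader_price_per_kg <= 0:
        raise ValueError("trader price must be positive")
    return farmer_price_per_kg / trader_price_per_kg


def access_rate(records: list, item: str) -> float:
    """Share of surveyed farmers who report having access to `item` (e.g. "credit")."""
    if not records:
        raise ValueError("no survey records")
    return sum(1 for r in records if r.get(item)) / len(records)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(farmer_share(trader_price_per_kg=40, farmer_price_per_kg=28))  # 0.7
    baseline = [{"credit": False}, {"credit": True}, {"credit": False}, {"credit": False}]
    follow_up = [{"credit": True}, {"credit": True}, {"credit": False}, {"credit": True}]
    print(access_rate(baseline, "credit"), access_rate(follow_up, "credit"))  # 0.25 0.75
```

Comparing the baseline rate (before we approached the farmers) with a follow-up rate is the simplest version of the "did they have access before?" question; the hard part, as the next paragraphs explain, is collecting the records at all.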

This sounds relatively straightforward, and for certain measures it is. However, when it comes to collecting data directly from farmers, several obstacles quickly become apparent. First, how exactly do we get this information? Presently, most of our data is gathered in person by our field agents, and though this provides quality data, it does not lend itself well to scaling. Most of our farmers, unsurprisingly, do not have smartphones, which immediately rules out any web-linked polling software such as Actimo.

Another frequently used channel in East Africa is USSD, but this too has several limitations. First, USSD sessions time out after 2 minutes, which severely limits the number of questions you can ask in one session; second, some programs, such as the Kudu market-matching program in Uganda, have seen very low engagement on this channel (on the order of 1-2%); and finally, research conducted by Viamo showed that smallholder farmers can have very low literacy rates. In the case of women farmers in Malawi, only 40% could read a full sentence (Viamo, however, was promoting an IVR service, so this last statistic should perhaps be taken with a grain of salt).
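A back-of-the-envelope way to see why the timeout bites: if a farmer needs a fixed number of seconds to read and answer each menu, the 2-minute limit caps the questions per session, and a longer survey has to be split across several sessions. The per-question timings below are assumptions, not measurements, and real sessions also lose time to network latency.

```python
import math

USSD_TIMEOUT_SECS = 120  # the 2-minute session limit mentioned above


def questions_per_session(secs_per_question: float,
                          timeout_secs: float = USSD_TIMEOUT_SECS) -> int:
    """How many questions fit in one session, assuming a fixed time per question."""
    if secs_per_question <= 0:
        raise ValueError("secs_per_question must be positive")
    return int(timeout_secs // secs_per_question)


def sessions_needed(n_questions: int, secs_per_question: float,
                    timeout_secs: float = USSD_TIMEOUT_SECS) -> int:
    """Number of separate USSD sessions a survey would have to be split across."""
    per_session = questions_per_session(secs_per_question, timeout_secs)
    if per_session == 0:
        raise ValueError("a single question does not fit in one session")
    return math.ceil(n_questions / per_session)


if __name__ == "__main__":
    # Hypothetical timings: a confident reader vs. a farmer who needs more time.
    for secs in (20, 40):
        print(secs, questions_per_session(secs), sessions_needed(15, secs))
```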

Alright then, so what about SMS-based polling? Well, this is probably the most promising option - low response rates notwithstanding - if you have the means to design and effectively deploy the survey yourself. Certain firms, such as GeoPoll, do specialize in this service, at a coffer-draining $5/person/survey... ya, you read that right. If you’re waiting for me to propose a neat solution, I may have to disappoint you: there’s no cheap or easy way to do this that I know of, but it is doable.
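For anyone designing their own SMS poll, the core logic is a small per-phone-number state machine: send a question, validate the reply, then advance or re-ask. The sketch below is an assumption-laden illustration, not our production system: the questions are invented, sessions live in an in-memory dict rather than a database, and the actual sending and receiving through an SMS gateway is left out entirely (`start_survey` and `handle_reply` are hypothetical helpers that just return the text you would send).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical questions: (text sent by SMS, accepted replies).
# None means "any whole number".
QUESTIONS = [
    ("Did you receive seeds this season? Reply 1=Yes 2=No", {"1", "2"}),
    ("Did you get credit to buy inputs? Reply 1=Yes 2=No", {"1", "2"}),
    ("How many bags did you harvest? Reply with a number", None),
]


@dataclass
class Session:
    step: int = 0                                     # index of the question awaiting a reply
    answers: List[str] = field(default_factory=list)


# In-memory store keyed by phone number (a real system would persist this).
sessions: Dict[str, Session] = {}


def start_survey(phone: str) -> str:
    """Open a session for this number and return the first question to send."""
    sessions[phone] = Session()
    return QUESTIONS[0][0]


def handle_reply(phone: str, text: str) -> Optional[str]:
    """Record an inbound reply and return the next message to send, or None to stay silent."""
    s = sessions.get(phone)
    if s is None or s.step >= len(QUESTIONS):
        return None  # unknown number, or survey already finished
    prompt, accepted = QUESTIONS[s.step]
    reply = text.strip()
    valid = reply.isdigit() if accepted is None else reply in accepted
    if not valid:
        return "Sorry, we did not understand. " + prompt  # re-ask the same question
    s.answers.append(reply)
    s.step += 1
    if s.step == len(QUESTIONS):
        return "Thank you!"
    return QUESTIONS[s.step][0]
```

Restricting replies to a single digit or a number is meant to keep things easy on feature-phone keypads and for low-literacy respondents, and it keeps validation trivial.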

How can we ensure Tulaa’s teams are working smarter?

Data, in and of itself, is not terribly useful - it must be made easily accessible and digestible. Inevitably, then, as a business that aspires to be data-driven, our conversation turned to which business intelligence (BI) and statistics tools are appropriate for a startup like Tulaa. The latter question, I believe, is relatively straightforward: Jupyter notebooks. Yes, Python may not (yet) have the breadth of stats packages that R has, but it’s a powerful tool, Jupyter has a great interface for sharing and visualizing analyses, and Python is easier to operationalize (IMO). As for the appropriate BI tool, I won’t bore you with the details, but suffice it to say that many software companies are stuck in the binary world of for-profit or non-profit, so the idea of pricing for a social enterprise is quite alien to them. Personally, I'm a fan of Tableau; however, the $850 license cost per developer per year is prohibitive for a small firm. For the time being, we're working with Amazon QuickSight, primarily because our needs are quite basic and it comes at no additional cost with our AWS suite.

Of course, a flashy front end is worth stuff-all if we don't have the back-end architecture to support it. Fortunately for us, we were approached by Rippleworks, an NGO that connects promising social ventures with industry experts, and we are now under the guidance of a seasoned data architect from Apple. Our goal, broadly speaking, will be to shift from our current fragmented state of PostgreSQL, Google Sheets, and arbitrary CSV files to something orderly enough to avoid orphaned datasets but flexible enough to follow the many directions our journey may take us.
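As a small illustration of what "orderly enough to avoid orphaned datasets" can mean in practice, the sketch below loads a CSV export into a relational schema where every loan must point at an existing farmer, so the database itself rejects orphan records. It uses SQLite's in-memory mode purely so the example is self-contained (our current stack includes PostgreSQL, which enforces foreign keys in the same spirit), and the table names, columns and CSV contents are hypothetical; this is not the design we are working on with Rippleworks.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export, standing in for one of the ad-hoc spreadsheets.
FARMERS_CSV = """farmer_id,name,county
1,Wanjiru,Nakuru
2,Achieng,Kisumu
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
    conn.executescript("""
        CREATE TABLE farmers (
            farmer_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            county    TEXT
        );
        CREATE TABLE loans (
            loan_id    INTEGER PRIMARY KEY,
            farmer_id  INTEGER NOT NULL REFERENCES farmers(farmer_id),
            amount_kes REAL NOT NULL
        );
    """)
    # Each CSV row becomes a dict, bound to the named placeholders below.
    rows = csv.DictReader(io.StringIO(FARMERS_CSV))
    conn.executemany(
        "INSERT INTO farmers (farmer_id, name, county) VALUES (:farmer_id, :name, :county)",
        rows,
    )
    return conn


if __name__ == "__main__":
    conn = build_db()
    conn.execute("INSERT INTO loans (farmer_id, amount_kes) VALUES (1, 5000)")  # accepted
    try:
        # Farmer 99 does not exist, so this loan would be an orphan.
        conn.execute("INSERT INTO loans (farmer_id, amount_kes) VALUES (99, 5000)")
    except sqlite3.IntegrityError as exc:
        print("rejected orphan record:", exc)
```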

So that's it, folks - now you know what will be keeping me busy for the next few months!

Tristan Aubert
