How a US firm Apriori is creating a database of Indians using publicly available datasets

December 19, 2017 9:04 AM

A data aggregation startup in the suburbs of Washington DC is aiming to be a specialist provider of demographic data on Indians for commercial use. Apriori, founded by Igor Kyrylenko, sifts through publicly available data to craft profiles used to evaluate creditworthiness and to sell targeted consumer products in one of the world’s fastest-growing economies, which is also digitising rapidly.

The American company, which relies on information such as voter registration data and postal service records, analyses the data, draws inferences on migration patterns and creates consumer profiles for vast swathes of the population.

“We are proud to host over 850 million unique records for adult citizens in India in 14 publishing languages from every state and union territory,” said Kyrylenko, who speaks Russian and Ukrainian and holds a patent in identification verification systems.

“We refresh data on a regular basis and operate with over 60 data attributes for each record which we have acquired through information publicly available on internet. The data is used for legal and permissible purposes by financial organizations, credit bureaus and insurance companies,” he added.


The World Bank estimates India’s population at around 1.3 billion people, who speak 22 major languages with over 700 dialects. Government identification platforms such as Aadhaar, the permanent account number (PAN), passports and voter ID cards have, over the years, attempted to capture accurate details on citizens.

Startups like Apriori use such data to estimate, for instance, how many people have left smaller towns and settled in cities, and to check whether voter ID details such as addresses are correct for each individual.

The Election Commission of India and India Post did not reply to email queries from ET for this story.

Kyrylenko said his company, which has built specialised algorithms to sift and analyse data, can even track name changes due to marriage or legal action, determine customers’ geographic coordinates (latitude and longitude) based on their India Post pin code, and ascertain gender.

The company’s main office is in McLean, VA (a Washington, DC suburb), and it has representatives in the UK, India, Hong Kong and Singapore. It has 12 full-time employees and constantly hires linguists, native language speakers, and business and marketing advisors in India to strengthen its services. “The list grows every month when new countries are added to our coverage. Overall, Apriori handles over 2 billion records annually in hundreds of formats and dozens of languages.”


Kyrylenko said the idea for his startup came from a client inquiry seeking India-specific data while he was working at another IT firm.

“Our customers want to reach and target areas which have still not been correctly recorded. Our data sets on Indians supplement other identity platforms.” The company claims that financial organizations or credit bureaus can use such accurate data to predict the creditworthiness of individuals.

He is still talking to large organizations in India to break into the market and says he is working with credit bureaus and insurance companies that want to make better customer-related decisions. “We get data from multiple sources that have data sets strewn across the internet. Three out of four credit bureaus in India already use our data. But data is never perfect and we are continuously trying to improve it.”

As more Indians move online and more data sets are strewn across the internet, initiatives like Open Government Data, which provides data assets on agriculture, health performance, legislation and government budgets, among others, offer a platform for organizations to use government data for commercial or non-commercial purposes.

Even India-based entrepreneurs who utilize publicly available data find there is little clarity on what can be used for commercial purposes.

John Samuel Raja, whose startup How India Lives uses data pointers made available in searchable format by various government departments to offer its customers data-based solutions, said, “Data today is in silos and we bring it out to the open and examine the possibilities of its usage.” And although the government has an open data policy, “there is still ambiguity on what data can be used for commercial purposes and vice versa,” he said.

Thejesh GN, founder of Data Meet, which is a community of data science and open data enthusiasts, said that getting data sets on Indians is easier but there “needs to be clarity on what is available as Open Data under the Government of India can be used for commercial purposes or not.”

Cyber law expert Pavan Duggal is of the view that there is nothing specific in the IT Act which can stop companies from working with citizen or government data for commercial purposes. “As we get more digital as a country, amount of data per citizen is generated, we need to have laws which protect the sovereignty of such data and check what can be used for monetization by private companies,” he said.

Anupam Saraph, a renowned expert in the governance of complex systems who advises governments and businesses across the world, pointed out that election data, bank customer data and mobile customer records that are generated, updated, certified or authenticated by parties who have no role in generating such data raise the question of how to differentiate authentic records from fake ones.

