One woman’s work using data science to break down barriers – and breaking down barriers to data science

You will only ever meet one woman like Kim Eng Ky – and that’s Kim Eng Ky. As an undergraduate, she double majored in economics/applied math and computer science, graduating with honors. In her professional life, she has gone on to do exacting work implementing sophisticated machine learning algorithms at several private and public sector organizations (including as principal data scientist at a prestigious Fortune 6 company). In her spare time, she has founded multiple initiatives whose mission is to foster greater unity across diverse ethnic, racial and gender backgrounds within data science and machine learning. And she is perhaps one of the humblest people one could have the pleasure of meeting. In fact, it is highly probable that, in hours of conversation, not one of those accomplishments will cross her lips. But in conversation with someone else – anyone else – who knows her, the truth quickly comes out: she’s smart, amicable, and easy to work with, and past and present co-workers and superiors alike sing her praises.

DS for good in transit

Kim cut her teeth in the data science world at Metro Transit, one of the local transit companies serving Minneapolis/St. Paul and the surrounding region. She and a team of data scientists were tasked with looking into factors during a bus trip that increase the risk of a “responsible accident” – a collision where a Metro Transit driver, not the weather or a fellow commuter, is at fault. No bus operator is immune to crashes, and, in search of more ways to prevent future incidents, Metro Transit wanted to know whether commonalities among past crashes would shed light on overlooked risk factors.

However, there were two major obstacles the team had to clear to unlock those insights: the size of the data, and the complexity of the relationships therein. Metro Transit captures hundreds of millions of data points on its trips every month – about the route, the bus, the time of day, the day of week, and the driver as well – and often the relationship between the variables isn’t a simple one. One factor’s influence on the risk of an accident may depend on several other factors or intertwine with the effects of another factor, and these relationships can take on a myriad of forms (a straight line, a U-shape, a sigmoid S-like shape, an exponential curve, and many more). With so many variables interacting, the relationships become high-dimensional and are often difficult to draw out with ordinary statistical methods. Furthermore, the sheer size of the data – hundreds of millions of data points at such a granular level – had challenged predecessors and kept earlier studies from examining the data at the trip level.

Instead, Kim and the team used a popular machine learning technique called a random forest, where thousands of ‘decision trees’ are created and then ‘averaged’ (in essence) to highlight the strongest indicators of responsible bus accident risk. The team found their study aligned with previous findings – but they also found two important, preventable factors that had not previously been uncovered: “bus drivers’ risk greatly increases if they (a) did not work the previous day and (b) worked longer hours the previous week.” The work became a cornerstone of the preventative safety measures Metro Transit established to schedule and train drivers in ways that minimized crash-risk factors within the agency’s control.
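
For readers curious what “creating and averaging decision trees” looks like in practice, here is a minimal sketch in plain Python. It is not the team’s code or data: the trips are randomly generated, the feature names (hours_prev_week, worked_prev_day, route_km) and the risk rule inside make_trips are assumptions invented for illustration, and the forest is deliberately tiny (40 shallow trees). Each tree is grown on a random resample of the trips and considers a random subset of features at every split; the forest then averages how much each feature reduced impurity across all trees.

```python
import random

# Hypothetical feature names; the article does not list the real variables.
FEATURES = ["hours_prev_week", "worked_prev_day", "route_km"]

def make_trips(n, rng):
    """Generate made-up trips. By construction, risk depends only on the
    first two features; route_km is pure noise."""
    trips = []
    for _ in range(n):
        hours = rng.uniform(20, 60)
        worked = rng.choice([0, 1])
        route = rng.uniform(5, 40)
        risk = 0.05 + (0.30 if hours > 45 else 0.0) + (0.30 if worked == 0 else 0.0)
        trips.append(([hours, worked, route], 1 if rng.random() < risk else 0))
    return trips

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, feature_ids):
    """Return (gain, feature, threshold) for the best single split."""
    parent = gini([y for _, y in rows])
    best = (0.0, None, None)
    for f in feature_ids:
        values = sorted({x[f] for x, _ in rows})
        picks = values[::max(1, len(values) // 10)]  # about 10 candidate cut points
        for lo, hi in zip(picks, picks[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best[0]:
                best = (parent - child, f, t)
    return best

def grow(rows, depth, rng, importance, n_root):
    """Grow one tree recursively, crediting each split's impurity
    reduction (weighted by node size) to the feature that made it."""
    if depth == 0 or len(rows) < 20:
        return
    subset = rng.sample(range(len(FEATURES)), 2)  # random features per split
    gain, f, t = best_split(rows, subset)
    if f is None:
        return
    importance[f] += gain * len(rows) / n_root
    grow([r for r in rows if r[0][f] <= t], depth - 1, rng, importance, n_root)
    grow([r for r in rows if r[0][f] > t], depth - 1, rng, importance, n_root)

def forest_importance(trips, n_trees=40, max_depth=3, seed=0):
    """Average impurity-based importance over many bootstrapped trees."""
    rng = random.Random(seed)
    importance = [0.0] * len(FEATURES)
    for _ in range(n_trees):
        sample = [rng.choice(trips) for _ in trips]  # bootstrap resample
        grow(sample, max_depth, rng, importance, len(sample))
    total = sum(importance) or 1.0
    return {name: imp / total for name, imp in zip(FEATURES, importance)}
```

Calling forest_importance(make_trips(800, random.Random(42))) returns a score per feature that sums to one. Because the made-up risk rule ignores route_km, that feature should rank last. A production study would more likely rely on an established library and far more trees, but the averaging idea is the same.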

Breaking down barriers in DS

From there, Kim was hired as a senior data scientist in healthcare, and quickly promoted to principal data scientist. While there, she founded her second initiative intended to build bridges across backgrounds of diverse races, ethnicities and genders (her first was a local chapter of Women in Machine Learning and Data Science (WiMLDS) that had taken off in short order). Upon coming to her new position, she had found no similar group inside a company of almost 300,000 workers with hundreds of data scientists. In characteristic Kim fashion, she partnered with a friend and, with the encouragement of another friend, the male lead of another Twin Cities analytics networking group, started Women in Analytics and Data Science, or WiADS for short. WiADS is an internal employee resource group (ERG) that connects women in machine learning and data across the organization, and it has grown into much more in the years since its launch. The Minneapolis WiADS Conference, an offshoot of the WiADS ERG, is a region-wide forum at the company that unites both men and women, internal and external to the org, and elevates the voices of women and people of color to showcase what diversity in machine learning does for the domain. Of her ardor to eliminate disparities in the data science world, Kim said, “People still say they can’t find qualified women and people of color in machine learning. We’re here to prove them wrong.”

DS breaking down barriers

Today, Kim is working at the Federal Reserve Bank of Minneapolis as a data scientist in its community development division, conducting studies on issues affecting low- and moderate-income communities and individuals, as well as racial and ethnic disparities therein, in order to advance economic outcomes for all. “Even as people let go of racism from their hearts, it does not go from our system. Our systems of education, our economic system, our health systems all have racism baked into them,” Angela Glover Blackwell, the founder of PolicyLink, reminded viewers during the kick-off of a series of talks on the Fed’s work on the issue, called Racism and the Economy. The series – created through the partnership of three Federal Reserve Banks: Atlanta, Boston and Minneapolis – launched in October of this year, and the work of Kim and the economists at the Fed is crucial to these conversations.

Unlike Kim’s first position at Metro Transit, where she and the team had to contend with a plethora of data, the Fed often deals with data scarcity. To illuminate root causes of lingering inequities amid this scarcity, they use techniques like causal inference and economic models of reasoning, which can handle sparse data and, if done properly, can go beyond correlation to causation. That work then informs policymakers throughout the region as they craft policies intended to target vestiges of racism and drive sustainable economic growth for BIPOC communities (BIPOC stands for Black, Indigenous, and people of color). Kim says that it’s especially important to conduct micro-studies like this – at state or even more granular levels – because each locality is plagued with unique forms of discrimination and inequity; there’s no panacea for racism. “In the US we talk a lot about this now, after the murder of George Floyd,” Kim said. “But a lot of people don’t believe [racism is still an issue]. But the data don’t lie. That’s why we’re working to help the public understand the evidence.”

And extending that understanding to the public is key – because the Fed understands that the best policy interventions aren’t thought up in a vacuum. Non-profits, local governments, community groups and more all work in symbiosis with the Fed to make decisions, pooling their myriad backgrounds, experiences, and thoughts to come up with the best possible solution – a solution that represents everyone, not just a privileged few. And while creating an inclusive economy is the right thing to do morally, it’s also economically sound – because when everyone thrives, the result is a healthy, competitive national economy that benefits everyone right back. So, despite the economy’s current shortcomings, the future is full of hope – because with Kim and others from across different backgrounds, areas of expertise, sexes, ethnicities, and nationalities working on it, we can and will put an end to racism – for good.

The first of Kim and the team’s Racism and the Economy papers is set to come out in mid-November, and will be available at
