Tamanna Haque studied maths with a focus on statistics at The University of Manchester and is now a lead data scientist for JLR, working in the field of connected cars, i.e. cars that are connected to the internet (that’s about 97% of electric vehicles).
After your maths degree, what made you choose industry over further study?
My maths degree was the perfect amount of education for me to build my analytics and later AI career upon. For me personally, the Pareto principle sort of applies here: further study wouldn’t have helped me to achieve more than I have to date. My maths degree has given me excellent and necessary foundational knowledge for my data science work, helping me to think about AI in a statistically sound and responsible manner, but what helps me to turn an idea into an actual product is my commercial lens and cumulative work experience. More education would have shifted the balance of the skills I currently have. Plus, I couldn’t wait to start working towards achieving other goals in life whilst also feeling like I’m part of a bigger purpose.
You have many data experts’ dream job, working for a luxury car company. How did you get it?
In a sense it was 15 years in the making! I’ve been a fan of Jaguar since I was 9 and received my offer to join Jaguar Land Rover (JLR) at 24. My interest in the brand and corporate awareness made the interview easier because I didn’t have to ‘revise’ these areas. In my interview I remember having a very energetic conversation about the then-latest electrification announcements for Jaguar, as well as showcasing my AI portfolio and knowledge about the connected car. My awareness of the connected car began over a decade ago thanks to some of Michael Schumacher’s work! So many aspects of the role naturally aligned. In the hiring seat (which I’m now in) it’s easy to know when someone genuinely has an interest in your business. Being able to demonstrate passion helps – you’re already aligning with some of the company values.
What kind of datasets do you work with?
I work with data from connected vehicles (of participating and consenting clients). Getting to grips with this style of data was one of my biggest learning curves – previously I worked with transactional-style data at a digital fashion retailer. Connected vehicle data is structured differently and is typically much, much bigger! It captures various interactions and signals from a vast number of sensors and touchpoints on the car, to give a comprehensive view of a client’s relationship with (and state of) their car. This can provide insights about vehicle health, journey habits, driving style, feature usage, diagnostics, charging habits and more. Vehicles themselves, and the information offboarded from them, are ever progressing, which is exciting because there will always be a place for new ideas and possibilities.
What statistical techniques do you use most in your everyday work?
Typically, a machine learning model is one which is trained to learn intricate relationships and patterns between several independent (explanatory) variables and a dependent (target) variable, to be able to predict outcomes when it sees new data consisting of independent variables only. I use a variety of statistical techniques day-to-day when leading the design and delivery of machine learning models.
Statistical methods help me determine what a good sample size is for my model’s training data, or if the data sampled sufficiently represents the population. This is important because a representative dataset helps to ensure that good predictions can be made for a majority of expected scenarios.
Descriptive statistics is a part of exploratory data analysis, the stage where I’ll try to understand the data at play before moving on to its treatment and preprocessing. This also helps downstream when I am monitoring my model live for things like data drift, which can indicate a change in the characteristics of the prediction data and perhaps lead to the model needing a rework. I also use techniques such as correlation coefficients to look for relationships between independent variables, and I learned the hard way to do this with the dependent variable as well (to spot signs of ‘target leakage’). There are other statistical techniques used to support our test designs and experimentation.
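For readers who want to see the correlation check in concrete terms, here is a minimal sketch in Python. It is an illustration only, not a description of the approach used at JLR: the feature names, the toy values and the 0.95 threshold are all hypothetical assumptions. The idea is that a feature which correlates almost perfectly with the target deserves a second look, because it may carry information that would not be available at prediction time.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flag_possible_leakage(features, target, threshold=0.95):
    """Return names of features whose |correlation| with the target meets the threshold.

    The threshold is an illustrative assumption; a flagged feature is a prompt
    for investigation, not proof of leakage.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical toy data: the target is whether a vehicle needed a repair.
target = [0, 1, 0, 1, 1, 0]
features = {
    "avg_trip_km": [12, 9, 30, 25, 14, 40],        # ordinary explanatory variable
    "repair_invoice_flag": [0, 1, 0, 1, 1, 0],     # recorded after the outcome
}

suspects = flag_possible_leakage(features, target)
print(suspects)
```

In this toy data the second feature is effectively a copy of the target, so it is the one flagged.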
When faced with a new domain I tend to onboard through using cluster analysis. Its ability to form groups, without being led by prior belief, can be very revealing whilst helping you to ramp up quickly. Challenging previous conceptions (with the backing of data and statistics) can be quite fun too!
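As a rough illustration of how cluster analysis forms groups without prior labels, the sketch below runs a bare-bones k-means on made-up journey data. Everything here is an assumption for demonstration: the two features, the values, the choice of k-means itself and the simple deterministic initialisation. In practice a library implementation with scaled features would be the more usual route.

```python
import math


def kmeans(points, k, iterations=20):
    """Bare-bones k-means: returns (labels, centroids) for a list of numeric tuples."""
    # Deterministic initialisation (an assumption for reproducibility):
    # pick k evenly spaced points from the input as starting centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical journeys: (average trip length in km, trips per week).
journeys = [(5, 14), (6, 12), (4, 15), (80, 2), (95, 1), (70, 3)]

labels, centroids = kmeans(journeys, k=2)
print(labels)
print(centroids)
```

Here the algorithm separates frequent short trips from occasional long ones without being told that such groups exist, which is the "not led by prior belief" property described above.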
You can see that a strong and baked-in ability for statistics and mathematics is valuable in data science, especially if you’re working in a fast-paced, delivery-focussed environment where your predictions drive decision-making and reach real clients.
What is the biggest misunderstanding people have about connected cars?
I’m not sure this is a misunderstanding, but when people think about ‘connected cars’ a lot of them will first think of ADAS (advanced driver-assistance systems) capabilities, which can support various levels of autonomous driving. This is true. However, data from the connected car also enables many more aspects of the vehicle experience to be improved and made more intelligent through the application of AI, spanning aftercare, maintenance, infotainment, vehicle apps and more. Using the breadth of vehicle data available we are able to understand and optimise for vehicle health and ensure we are delivering a modern luxury client experience. It’s very cool what we can do with AI and the connected car.