How to Source AI and Machine Learning Startups Early

Artificial intelligence has become the dominant theme in venture capital. The pace of model development, the breadth of application areas, and the scale of capital flowing in have created an environment where early detection of promising founders is both more valuable and more difficult than in almost any other sector. Investor demand is high, competition for the best deals is intense, and the best founders are often not the ones who announce themselves first.

Where AI Founders Come From

Research lab alumni are the most prominent source. Founders with direct experience building at the frontier of model development, at organisations like Google Brain, Google DeepMind, OpenAI, Meta AI, Microsoft Research, and various university labs, have the technical credibility and network to attract talent and capital. Tracking departures from these organisations, particularly senior researchers and engineering leads, is a high-value sourcing activity.

PhD graduates in machine learning and AI from top programmes are a second important population. MIT, Stanford, CMU, Berkeley, Oxford, Cambridge, ETH Zurich, and a small number of other institutions produce the majority of globally relevant ML PhD talent. Tracking graduation timelines and post-graduation activity from these programmes surfaces potential founders at the moment they enter the commercial world.

Domain experts who have applied ML inside large organisations represent a third population. A machine learning engineer who has spent five years building production recommendation systems, fraud detection models, or drug discovery pipelines inside a large company brings operational expertise and domain knowledge that a pure researcher typically lacks.

The Signals That Indicate an AI Startup Is Forming

GitHub activity is one of the strongest early signals. New repositories containing model training infrastructure, inference optimisation code, data pipeline architecture, or domain-specific ML application scaffolding are meaningful when they come from individuals with AI expertise. The shift from contributing to employer repositories to building independent infrastructure is particularly telling.
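As a rough illustration of the repository signal, the sketch below filters a list of repository records for recent, non-fork projects whose name or description mentions ML infrastructure. The field names (name, description, created_at, fork) follow the repository objects returned by the GitHub REST API, but fetching them is left out, and the keyword list and the function name independent_ml_repos are assumptions made for this example.

```python
from datetime import datetime, timezone

# Hypothetical keyword list; tune it to the subfields you care about.
ML_KEYWORDS = ("training", "inference", "fine-tun", "pipeline", "embedding", "llm")


def independent_ml_repos(repos, since, keywords=ML_KEYWORDS):
    """Return names of non-fork repos created on or after `since` that mention a keyword.

    `repos` is a list of dicts shaped like GitHub REST API repository objects.
    `since` must be a timezone-aware datetime.
    """
    hits = []
    for repo in repos:
        if repo.get("fork"):
            continue  # forks say little about independent building
        # GitHub timestamps look like "2024-05-02T10:00:00Z".
        created = datetime.fromisoformat(repo["created_at"].replace("Z", "+00:00"))
        if created < since:
            continue
        text = f"{repo['name']} {repo.get('description') or ''}".lower()
        if any(keyword in text for keyword in keywords):
            hits.append(repo["name"])
    return hits
```

Keyword matching of this kind is deliberately crude. It is a first-pass filter that narrows what a person then reviews, not a judgement about the founder.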

Preprint publications on arXiv are a high-value academic signal. A researcher who publishes a preprint on a novel architecture or application and then registers a domain and incorporates a company within the following months is exhibiting a clear commercial trajectory.

Research lab departures, when publicly visible, are strong sourcing signals. Senior researchers who announce they are leaving organisations like Google DeepMind or OpenAI without announcing a new employer are worth monitoring. The window between departure and first public announcement of a new company is typically two to twelve months.

Professional profile changes on social platforms, while less timely than other sources, provide confirmation signals. A researcher who removes their affiliation with a research lab without adding a new employer is exhibiting the departure signal associated with company formation.
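The departure signal can be sketched as a comparison between two snapshots of the same profile. Everything here is hypothetical: the snapshot format (a dict with an "employer" key), the function names, and the lab list, which is only a subset of the organisations named in this article. The twelve-month cutoff takes the upper end of the two-to-twelve-month window mentioned above.

```python
from datetime import date

# Illustrative subset of the labs named in this article; extend as needed.
TRACKED_LABS = {"Google DeepMind", "OpenAI", "Meta AI", "Microsoft Research"}


def is_unannounced_departure(before, after, tracked_labs=TRACKED_LABS):
    """True when a tracked lab affiliation disappears and no new employer is listed.

    `before` and `after` are hypothetical profile snapshots: dicts with an
    "employer" key holding a string, or None when no employer is shown.
    """
    return before.get("employer") in tracked_labs and not after.get("employer")


def still_worth_monitoring(departure, today, max_days=365):
    """True while `today` is within `max_days` of the departure date.

    The default of 365 days is an assumption based on the upper end of the
    typical two-to-twelve-month window.
    """
    elapsed = (today - departure).days
    return 0 <= elapsed <= max_days
```

A researcher who has simply moved to another employer fails the first check, which keeps ordinary job changes out of the watchlist.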

Where to Monitor

arXiv is the primary venue for AI research and should be monitored daily for papers in relevant subfields. Semantic Scholar and Papers With Code provide structured access with additional enrichment, including citation data and code availability.

GitHub is essential for tracking technical founders building in AI. New organisations created by researchers with ML backgrounds, new repositories with AI application architecture, and increased independent activity from engineers at AI labs are all observable.

Trade registries surface the formal company formation event. In the UK, US, and European markets, new company registrations from individuals with AI research backgrounds are identifiable through background enrichment applied to director data.
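For the daily arXiv check, one possible starting point is the public arXiv API, which returns an Atom feed. The sketch below builds a query URL for the newest submissions in a set of categories and parses a feed into plain dictionaries. The helper names build_arxiv_query and parse_arxiv_feed are made up for this example, the category choice is an assumption, and the HTTP request itself is omitted so the code runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_query(categories, max_results=50):
    """Build a URL for the newest submissions in the given arXiv categories."""
    search = " OR ".join(f"cat:{category}" for category in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_arxiv_feed(xml_text):
    """Extract id, title, published date, and author names from an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
            "title": " ".join(title.split()),  # collapse line breaks in titles
            "published": entry.findtext("atom:published", default="", namespaces=ATOM).strip(),
            "authors": [
                author.findtext("atom:name", default="", namespaces=ATOM).strip()
                for author in entry.findall("atom:author", ATOM)
            ],
        })
    return papers
```

A call such as build_arxiv_query(["cs.LG", "stat.ML"]) produces the URL to fetch on a schedule. The author names in the parsed entries are what would then be matched against a watchlist of researchers.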

Approaching AI Founders

AI founders receive a high volume of investor outreach, and the quality varies enormously. Messages that reference specific technical work, ask a genuine question about a specific aspect of their research, and connect it to a specific commercial problem the investor has observed are far more likely to generate a response. The most credible outreach to AI founders comes from people who can demonstrate understanding of the technical choices involved. Timing matters: a researcher who has just departed a lab and is in the exploration phase is open to investor conversations in a way they will not be once they are in active build mode and attracting growing investor attention.

How Evertrace Surfaces AI and ML Founders

Evertrace monitors GitHub activity from engineers and researchers with AI and ML backgrounds, academic publications and preprints in ML research areas, company formation events from individuals with relevant backgrounds, patent filings in AI-related technology areas, and social platform signals from the AI research and engineering community. Signals are combined and scored to surface founders at the earliest observable stage globally.

175+ VC firms globally use Evertrace to find AI and machine learning founders before their competitors do.

Frequently Asked Questions

Where do most high-quality AI startup founders come from?
The majority come from research labs at major technology companies (Google DeepMind, OpenAI, Meta AI, Microsoft Research), top ML PhD programmes (MIT, Stanford, CMU, Berkeley, Oxford, Cambridge, ETH Zurich), and senior ML engineering roles at companies with sophisticated AI deployments.

How do you find AI founders before they announce?
The most reliable approach combines GitHub activity monitoring, arXiv preprint monitoring, trade registry monitoring for new incorporations from individuals with AI backgrounds, and tracking departures from research labs.

What makes AI sector sourcing harder than other sectors?
Competition for the best AI founders is extremely high, outreach volumes are enormous, and the best founders are often selective about their investors. Technical credibility in outreach is more important in this sector than in almost any other.

How do you differentiate between an AI research project and a company-in-formation?
The combination of formal founding signals (company registration, domain registration, GitHub organisation creation) alongside the research activity confirms company formation. Research activity alone indicates potential; the formal signals confirm commitment.
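
The rule in the last answer can be written down as a small classifier. This is a minimal sketch under stated assumptions: the signal labels and the function name classify_activity are invented for illustration, and the "unclassified" fallback for formal signals without research activity is a choice made here, not something the article specifies.

```python
# Hypothetical signal labels; the groupings follow the answer above.
FORMAL_SIGNALS = {"company_registration", "domain_registration", "github_org_created"}
RESEARCH_SIGNALS = {"preprint", "new_repository"}


def classify_activity(observed):
    """Label a set of observed signals for one person.

    Research activity plus at least one formal founding signal is treated as
    a company in formation; research activity alone is treated as potential.
    """
    observed = set(observed)
    has_formal = bool(observed & FORMAL_SIGNALS)
    has_research = bool(observed & RESEARCH_SIGNALS)
    if has_formal and has_research:
        return "company-in-formation"
    if has_research:
        return "research project"
    return "unclassified"  # assumption: no rule is given for this case
```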

Simon Bøttkjær
Co-founder