Anyone have a good (open) corpus or generator of human names that covers a wide variety of the different types of names people can have?

Preferably tagged with ethnicity or nationality. The names don’t have to be real, just representative.

@eumiro maybe?

@pganssle @eumiro hmm, it seems like this might be a good job for a Wikidata query? Assuming that the SPARQL query could be divined...

@pybonacci @pganssle @eumiro
I assume that by 'type' he means names in general. So basically he's looking for a big list/DB of every possible name in existence, with ethnic/cultural/etc. data attached to each if possible.

@mazieres @eumiro Very nice data set, and pretty cool analysis, though this does seem to be only surnames, and it doesn’t preserve capitalization.
Why don’t you look at “baby names” websites? They have all that. E.g., Although it’s probably not free to scrape… But they also list some sources at

@mazieres @eumiro

@FailForward @mazieres @eumiro That is for given names. It’s not terribly difficult to find lists of given names or lists of surnames, but I’d like more variety. Many people have multiple given names, multiple last names, no last name, no given name, patronymics, etc.