@pganssle @eumiro Hi there. I built a corpus of 650k names, with countries and ethnicities, from PubMed. Hope it helps! https://gist.github.com/mazieres/0b905a30b1fc9bdbb36237575fe276c8#file-namograph-ipynb
Why don’t you look at “baby names” websites? They have all that, e.g. https://www.behindthename.com/. It’s probably not free to scrape, though, but they also list some of their sources at https://www.behindthename.com/info/copyright
@mazieres @eumiro@fosstodon.org
@FailForward @mazieres @eumiro@fosstodon.org That is for given names. It’s not terribly difficult to find lists of given names or lists of surnames, but I’d like more variety. Many people have multiple given names, multiple last names, no last name, no given name, patronymics, etc.