@pganssle @eumiro Hi there. I crafted a corpus of 650k names with countries and ethnicities out of PubMed. HIH! https://gist.github.com/mazieres/0b905a30b1fc9bdbb36237575fe276c8#file-namograph-ipynb
@mazieres @eumiro@fosstodon.org Very nice data set, and pretty cool analysis, though this does seem to be only surnames, and it doesn’t preserve capitalization.
@FailForward @mazieres @eumiro@fosstodon.org That is for given names. It’s not terribly difficult to find lists of given names or lists of surnames, but I’d like more variety. Many people have multiple given names, multiple last names, no last name, no given name, patronymics, etc.
“