Wonder if we should get rid of automatic language detection given how often it is inaccurate...

#mastodev


@Gargron It seems pretty critical to me. How about fix it?


@freemo I don't think that's possible. Aren't you a Machine Learning innovator? You should know what this problem entails.

@Gargron I am, yes. I didn't mean to suggest it would be trivial to solve with 100% accuracy. I am just suggesting you work towards improving the error rate (it's OK if there is some) rather than eliminating a vital feature altogether.

@freemo Well, for a start I am not a C developer and not an ML expert. CLD3 is developed by Google and I seriously doubt that I could do something that they can't.

@Gargron Perhaps try a different third-party library? Or perhaps improve the way the library is applied. I haven't looked at your code, but I'm not suggesting you do the ML yourself at all. There can be a huge difference in how you apply it.

Just an off-the-cuff example (not saying this is viable, as I don't know enough): I'd imagine a LOT of the error comes from shorter posts analyzed in isolation. However, if your library is uncertain what language a particular post is in, then it can do one of two things:

1) Display it anyway. No harm is done if you display an unwanted language; harm is done only when you don't display a wanted language. So make the error of an acceptable nature if you can't improve it.

2) Use more context. For example, if 100% of a user's identified posts are Chinese and the language of a short post is undetermined, then assume it is Chinese, since the guess should be weighted on context.

1 is the easy path, and probably the one I'd suggest, since I don't see a need for perfection here. But 2 might be a decent incremental step if you really feel perfection is needed.
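
To make the two options concrete, here is a rough sketch in Python. It is only an illustration, not Mastodon's code or any library's API: `resolve_language`, `is_visible`, the thresholds, and the idea that the detector returns a usable confidence score are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional, Sequence, Set


def resolve_language(
    detected: Optional[str],
    confidence: float,
    history: Sequence[str],
    threshold: float = 0.9,   # hypothetical cut-off for trusting the detector
    min_history: int = 5,     # hypothetical minimum number of earlier posts
    dominance: float = 0.8,   # hypothetical share one language must reach
) -> Optional[str]:
    """Return a language code, or None when the language stays undetermined."""
    # A confident detection is taken at face value.
    if detected is not None and confidence >= threshold:
        return detected

    # Option 2: weight on context. If the author's earlier posts are
    # overwhelmingly in one language, assume the new post is too.
    if len(history) >= min_history:
        language, count = Counter(history).most_common(1)[0]
        if count / len(history) >= dominance:
            return language

    # Option 1: give up on labelling the post rather than guess.
    return None


def is_visible(post_language: Optional[str], reader_languages: Set[str]) -> bool:
    """Option 1 in practice: an undetermined post is shown to everyone."""
    return post_language is None or post_language in reader_languages
```

With these made-up defaults, a low-confidence guess on a short post falls back to the author's dominant language when there is enough history, and otherwise the post is left unlabelled and shown to every reader. The whole approach depends on the confidence score meaning something, which is the part that would need checking against whatever detector is actually used.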

@freemo CLD3 doesn't offer a reliable confidence rating. You can give it a short string and it will be 95% confident about its wrong result. So while 1 is the better option it is not possible.

@Gargron Perhaps use a library that uses a confidence rating instead, or consider options beyond 1 and 2.

If you'd like me to provide some more serious help and suggestions, I can review the code and library options more closely.

@freemo even from the sidelines it's infuriating to see how you're posing the problem. I mean, even if you were right, you still sound like an asshole. Maybe you should tone down the entitlement a bit. @Gargron is a saint for even replying to you.

@mariusor

I'm sorry if I worded it in a way that gave you that impression; it wasn't my intention. It was intended to be a straightforward reaction, not a critical or emotional one.

What about my wording do you feel made it sound like I was being an "asshole"? I'd like to know so I can try harder in the future to avoid that language. I'd hate for someone to misinterpret my intention again or be hurt by it.

Also note that I explicitly offered my help and time to do the fixing, since, as he noted, I am an expert in that field. I would hope that makes my good intentions clear.

@Gargron

@freemo
"How about fix it" - sounds entitled
"Why not trying to fix it?" - sounds reasonable.

I am however not a native English speaker, maybe something got lost in translation. Please accept my apologies.

@mariusor No, that's fair. Rereading what I said, I can see how someone might see it that way. As I said, it wasn't my intent, but I do agree your wording would have been more tactful. My apologies for the misunderstanding.

@Gargron
