uspol, doubting one's sanity, empiricism, wasting resources?
So yesterday I ended up in a situation where I was in disagreement about what I thought I could clearly hear in a video. Since it sounded perfectly clear to me, and the topic of the related discussion was politically charged, _and_ I have no reason to doubt the other participant's honesty about what they say they are hearing, this is pretty concerning. I see three options:
1. I am so influenced by propaganda my basic senses are broken.
2. The above, but for the other participant.
3. This specific video is an auditory case of blue/black vs white/gold dress.
I think the odds are about 5/80/15. I kind of hope it's 3 though, it would mean the propaganda is not strong enough to warp the minds of intelligent people that badly. If it is 1, I obviously need to at least make a drastic change in the media I am consuming, and probably re-evaluate a lot of stuff.
This toot is mostly a pre-commitment, so that I follow up on my attempt to settle this. My plan is as follows, mostly in order of effort needed:
0. Look at the auto-generated captions on the YT video. If this confirms what I hear this would be _extremely weak_ evidence against 1. There might not even be auto-captions enabled for the video and I am not sure if manual captions can be distinguished from automatic ones.
1. Extract the crucial part of the sound from the video and re-upload it to YT with no real visuals attached and no suggestive title. Check the auto-captions there. This could be weak to moderate evidence for any of the above.
2. Same but with a different system than YT. I'll probably pick a couple options from this page: https://fosspost.org/open-source-speech-recognition/ . They all would be weak to moderate evidence for any of the above, in aggregate they are strong if in agreement.
3. Use Mechanical Turk to ask people about what they hear. **If anyone knows a reasonable non-amazon alternative, let me know.** This would be strong evidence towards something, with the possibility of bias due to people being familiar with the content.
4. Same as above, but cut the audio into separate words to limit bias.
If too many of the steps fail (producing no reasonable output) I can fall back on using the single words to ask friends who are hopefully unfamiliar with the context, but this would be kind of weak. I might skip some later steps if previous steps produce sufficient agreement or if they turn out to be too expensive (I don't really know the rates on mturk...).
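To make the "weak" versus "strong" evidence labels in the plan above a bit more concrete, here is a minimal sketch of a Bayesian update over the three options. The prior is the 5/80/15 guess from above; the likelihood numbers are made-up placeholders for illustration only, not measurements, and the names `update`, `weak_evidence` and `strong_evidence` are hypothetical.

```python
# Options: 1 = my hearing is biased, 2 = the other participant's is,
# 3 = the audio is genuinely ambiguous (the "dress" case).
PRIOR = {1: 0.05, 2: 0.80, 3: 0.15}  # the 5/80/15 guess from the post


def update(prior, likelihood):
    """Return the posterior, given P(observation | option) for each option."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


# ASSUMPTION: illustrative likelihoods for the observation
# "an independent transcription agrees with what I hear".
weak_evidence = {1: 0.4, 2: 0.6, 3: 0.5}      # likelihoods close together
strong_evidence = {1: 0.05, 2: 0.9, 3: 0.3}   # likelihoods far apart

if __name__ == "__main__":
    for name, likelihood in [("weak", weak_evidence), ("strong", strong_evidence)]:
        posterior = update(PRIOR, likelihood)
        print(name, {h: round(p, 3) for h, p in posterior.items()})
```

The point of the sketch is only that evidence is "weak" when the likelihoods are close to each other across the options, so the posterior barely moves from the prior, and "strong" when they differ a lot.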
Crucially, what my specific claims about what I clearly hear are (which are incompatible with what the other person hears), in order of how confident I am of them:
1. The second word starts with an 'm', not a 'w'.
2. The first word ends with a consonant, most likely an 'ng' sound.
3. The first word starts with 'ha'.
4. The second word starts with a 'my' sound.
This might take a couple of days...
uspol, doubting one's sanity, empiricism, wasting resources?
# Test 0
No captions on the original video. Not a huge disappointment, it wouldn't have been strong evidence anyway.
Before I get to Test 1, I wanted to point out that if it correctly reconstructs the given name present in the chant this would be _weaker_ evidence of whatever gets recognized, because it might suggest the captioning system recognized the chant and assigned known captions to it (I don't know whether anything like that actually happens). Something like "Hang my pants!" (which is actually what I heard before I corrected for context) would be stronger evidence. Thankfully this won't be an issue in Test 2.
uspol, doubting one's sanity, empiricism, wasting resources?
# Test 1
Let's document this one properly.
## Preparation
Downloaded the video using `youtube-dl`.
Extracted the relevant part of the sound, from the moment it becomes clear (IMO) to when the video cuts to another part of the crowd.
```
ffmpeg -i Rioters\ chant\ \'hang\ Mike\ Pence\'\ as\ they\ breach\ Capitol-ba0UR7gITrU.mp4 -vn -acodec copy chant.aac
```
Created a video out of the sound file with an irrelevant name and the least political picture I could find on short notice (a drawing of a mathematical pun in Polish).
```
ffmpeg -loop 1 -y -i ../kurakLematowskiegoZorna.jpg -i chant.aac -shortest -acodec copy -vcodec libx264 sillyTestVideo.avi
```
Uploaded the result to YT, as of now there are no auto-generated captions present, but the instructions suggest this might take a while.
uspol, doubting one's sanity, empiricism, wasting resources?
On second look, if I'm understanding the UI correctly it generated captions already, but they are _empty_. There is a warning it might not generate proper captions if there are multiple people speaking, so maybe that's a problem. That would make the results inconclusive again. Oh well, I can wait just to make sure before declaring that.
uspol, doubting one's sanity, empiricism, wasting resources?
Well, that ended up silly. YT managed to autogenerate captions, but not for the chant, but for some barely audible person talking close to the person recording. And all the words it identified were "el bote no". Waiting for Q theories how this proves these were Mexican antifa who entered the capitol by ship and had problems escaping.
At least this is a very clear inconclusive result. I'll continue tomorrow with the other tests, but the odds of me needing to use actual money on this are rising.
uspol, doubting one's sanity, empiricism, wasting resources?
# Test 2
Apparently speech-to-text is something only professionals usually do, because the tools I managed to find are not especially easy to use. For now I managed to get julius, followed instructions on its GitHub substituting the file I wanted for the test file. It needed to be converted as follows:
```
ffmpeg -i chant.aac -ar 16000 -map_channel 0.0.0 chantL.wav -ar 16000 -map_channel 0.0.1 chantR.wav
```
The two channels were actually indistinguishable as far as I (and julius btw) can tell. Unfortunately all it recognized was "details had", which means it probably also picked up some random person talking, treating the chant as background noise.
I'll try cutting the file into smaller bits (bit per word, where I think they are most clear), since I will need to do this for further steps anyway, and check whether this helps.
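A minimal sketch of how the per-word cutting could be scripted, assuming `ffmpeg` is on the path. The word boundaries below are hypothetical placeholders (the real ones have to be found by ear), and `cut_command` is an illustrative helper, not part of any tool. The script only prints the commands, so nothing is overwritten by accident.

```python
import shlex

# ASSUMPTION: hypothetical word boundaries in seconds, for illustration only.
WORD_SPANS = [("word1", 0.0, 0.4), ("word2", 0.4, 0.8), ("word3", 0.8, 1.3)]


def cut_command(source, name, start, end):
    """Build an ffmpeg call cutting start..end from source into a 16 kHz mono WAV."""
    return [
        "ffmpeg", "-i", source,
        "-ss", f"{start:.2f}", "-to", f"{end:.2f}",  # output-side seek: decode, then trim
        "-ar", "16000", "-ac", "1",                  # same sample rate as above, one channel
        f"{name}.wav",
    ]


if __name__ == "__main__":
    for name, start, end in WORD_SPANS:
        print(shlex.join(cut_command("chantL.wav", name, start, end)))
```

Placing `-ss`/`-to` after `-i` makes ffmpeg decode and then trim, which is slower than input seeking but accurate enough for clips this short.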
uspol, doubting one's sanity, empiricism, wasting resources?
tl;dr Did not help.
The first word is recognized as "oh", the second as "five", the last as "but added". These are so nonsensical (especially the last one) that I believe they provide no evidence one way or another ("five" kinda sounds like "Mike"? pfffft), except for julius being terrible at transcribing chants. _Maaaybe_ this is tiny evidence towards 3., since a chant that's incomprehensible to programs might also be incomprehensible to humans.
I'll try at least one more software of this kind, but at this point I believe mturk will be necessary.
uspol, doubting one's sanity, empiricism, wasting resources!
Next I used this: https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr/tutorial
It did not detect any words in the first clip and the word "one" in both the other clips. This suggests it again was picking up on noise different from the chant. It also didn't detect anything on the full chant.
Finally I tried Vosk. Did not detect anything on any file.
Welp, MTurk it is. But not today.
uspol, doubting one's sanity, empiricism, wasting resources!
There we go, sent both the full chant (without repetitions, I just picked the IMO clearest sounding instance) and single words (cut from the full chant). I requested 20 answers for every data piece, which should be enough for reasonable evidence (unless the responses are atrocious quality). I expect some people will be familiar with the full chant, so answers which correctly identify the given name present there are weaker evidence. With the single words this problem should be somewhat mitigated.
Still hoping I'm not the only one hearing "Hang my pants!" when ignoring the context.
uspol, doubting one's sanity, empiricism, wasting resources!
So let's start with the predictably most disappointing, the full chant. Four people were clearly familiar with the chant, divided equally between the "standard" interpretations. One more person had an interpretation that was not exactly one of the standard ones, but close enough to make me suspect they were also familiar. Two further people had interpretations that were clearly made through careful listening, somewhat phonetically close to one of the standard interpretations (one each, lol). Ten people had done a terrible job and returned nonsense, wild guesses or just claims it's unintelligible. Three people had tried, but their interpretations are not close to either of the "standard" ones, and phonetic similarities are unclear.
This is relatively strong evidence for 3, and against both 1 and 2.
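For bookkeeping, the breakdown above can be tallied as in the following sketch. The counts are the ones reported above; the category labels are shorthand introduced here, and `share` is a hypothetical helper.

```python
from collections import Counter

# Counts as reported above for the full chant; labels are shorthand.
responses = Counter({
    "clearly familiar with the chant": 4,
    "probably familiar": 1,
    "careful, close to a standard reading": 2,
    "nonsense, guesses or 'unintelligible'": 10,
    "careful, not close to either reading": 3,
})


def share(counter, key):
    """Fraction of all responses that fall into the given category."""
    return counter[key] / sum(counter.values())


if __name__ == "__main__":
    print("total responses:", sum(responses.values()))
    for label, count in responses.most_common():
        print(f"{count:2d}  {share(responses, label):.0%}  {label}")
```

The categories add up to the 20 requested answers, and half of them fall into the lowest-quality bucket.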
uspol, doubting one's sanity, empiricism, wasting resources!
@timorl I'd be curious to see the specific responses given, but that's not surprising. When I heard it, while it did seem to sound closest to "we want pence", it also wasn't the clearest audio. I noticed this more when I tried to listen for your version of the audio: while I couldn't hear what you were hearing, I could fool (or not fool, as the case may be) my brain into hearing it as unintelligible in certain parts.
Well at least we settled the original issue that it was not a clear cut case of chanting for death.
uspol, doubting one's sanity, empiricism, wasting resources!
@freemo Eh, since you brought up the previous discussion I was curious about one more thing, still related to the chant, I hope you don't mind me asking, but if you do maybe stop reading or feel free to ignore.
How does "We want Pence" make sense as a chant? The closest I see in the context is a thinly veiled threat, instead of an explicit one. Do you see something better or is this how you understood it?
Interestingly, one of the interpretations in the file imo makes more sense as a completely non-confrontational chant (and in general as a chant in the context) -- "We want best". It's a bit stretched, but "best" as a kind-of-nickname for Trump who often insisted on being best at many things does make some sense. On the other hand, his supporters rarely call him by anything other than his name, so ehhh... a bit stretched.
uspol, doubting one's sanity, empiricism, wasting resources!
> Eh, since you brought up the previous discussion I was curious about one more thing, still related to the chant, I hope you don't mind me asking, but if you do maybe stop reading or feel free to ignore.
I don't mind you asking, and in general I don't mind you casually bringing up points from it. I just didn't want to continue a point by point that was unproductive.
> How does "We want Pence" make sense as a chant? The closest I see in the context is a thinly veiled threat, instead of an explicit one. Do you see something better or is this how you understood it?
Hard to say, since as a group they didn't exhibit any violence towards people or politicians (as we covered), and that even when violence was done against them. In fact, of the 5 people killed, 4 of them were unarmed pro-Trump protesters; the 5th was a cop who was actively beating the crowd with his baton when he was struck. So even the data we have from the day shows that they were not violent and were far more the victim than the aggressor in terms of engaging in human violence.
So with that said it paints a picture for me what the crowd likely would have done should Pence have shown up, probably surrounded him and yelled profanities at him until he slinked away out of sight, presuming the cops didn't escalate things, as they tend to do, and start shooting or beating even more people (at which point its hard to say if he would have remained safe).
They clearly didn't like Pence though, and they clearly wanted to let him know that face to face. I was not implying that it was a chant to show their **support** of Pence, I was only claiming that it was not a chant of murder.
> Interestingly, one of the interpretations in the file imo makes more sense as a completely non-confrontational chant (and in general as a chant in the context) -- "We want best".
I personally would dismiss this as an explanation, too stretched.
uspol, doubting one's sanity, empiricism, wasting resources!
This is the first time you referred to it as threatening/aggressive behavior rather than violence. That is a big difference in my eyes. There is a huge chasm between being physically violent towards a human being vs breaking a window, one I will call violence, the other I would agree is aggressive and even threatening, but isn't violence.
There is also the matter of considering who strikes a human first in an encounter of mutual violence to determine who is in the right in my mind.
So take the example of the police officer being beaten by the flagpole.. had he at any point earlier in the day beaten people in the crowd with his baton? Every single incident I saw where a police officer was even threatened (such as the one running through the halls as people chased him while he periodically stopped to beat them a little with his baton)... in every case it can be seen the police officer initiating the violence by beating them with their baton or similar, and only after that (and usually repeated examples of that) did the crowd respond with any level of violence.
Even then those examples are very rare, in one case we have the officer who died who was reported and admitted to be actively beating the crowd with his baton before finally being struck, and in the other case with the flagpole, which we don't have video evidence to know, but seems possible he was doing the exact same thing.
Only other physical violence of any kind I know of from the events were explicitly non-lethal but also in like kind. For example they were being actively tear/pepper gassed by police and one protestor responded by pepper spraying the police in return.
Regardless, overall it appears far more violence was done to them than they did in return, as is evidenced by the death toll being almost exclusively Trump supporters.
uspol, doubting one's sanity, empiricism, wasting resources!
@timorl depends on how they do it.. breaking a window isn't violence against a person, someone standing in your way and you simply walking past them anyway (pushing through them) also isn't violence...
Now if someone stands in your way and you beat them with a stick till they move, then that IS violence.
Ironically the only ones (for the most part) beating anyone with sticks were the cops.
I would say that they "physically overpowered" the cops, I would also say that a small number of people (the ones breaking glass) were "destructive of property"... but violence towards an individual is not something that is guaranteed as a matter of course simply by walking into a building you aren't allowed in or walking through a line of people standing in your way, particularly when no one actually struck a cop to do it.
uspol, doubting one's sanity, empiricism, wasting resources!
@freemo Differentiating "physically overpowered" and deliberately walking into someone from violence seems wrong to me. You think these are things to which one cannot respond with violence? In particular this principle applied to the former seems like it would put people initiating "physical overpowering" at a very advantageous position, while the victim of that overpowering in a dangerous one. And applied to the latter, especially in the case of crowds, it would make any kind of crowd control impossible.
And I think we might have gotten to the crux of our disagreement, strangely. I think the above paragraph pretty well explains why I think the whole debacle was so dangerous. The insurrectionists were trying to put themselves in positions in which they were physically advantaged over congresspeople through the fact that there were many of them (a whole mob in a room). I think that preventing angry entities (whether people or mobs) from ending up in positions in which their possible victims can no longer protect themselves even through violence, can be done using violence. I mentioned this once as a basic principle of policing in most modern countries, but this is also a general rule -- if a person is forcing themselves on another person I wouldn't wait for them to throw the first punch before attempting to stop them, and I would consider the latter person already a victim of violence.
uspol, doubting one's sanity, empiricism, wasting resources!
@timorl In my view if I were walking down the street and you happened to be standing in my way and I said "move" and you didn't, and I walked through you, causing you to stumble to the side unharmed, then no, that isn't violence; violence implies you have been harmed in some way. If you were to respond to that by taking a baton and bashing me over the head then I would say you were the one initiating violence.
> And applied to the latter, especially in the case of crowds, it would make any kind of crowd control impossible.
Non-violent crowd control is not just possible, it's far more effective: you have pepper sprays, physical barriers, sound weapons, and a host of non-violent tools that are far, far more effective than physical beatings.
Keep in mind I did say someone being pushed aside, such as a cop in this case, is inappropriate and wrong.. but it isn't violence.
Ultimately it's about escalation, it is absolutely unacceptable to respond to one act with another significantly more violent act.. you do not respond to someone pushing through you by beating them over the head with a stick as they did, you do not shoot them to death when they are unarmed and haven't struck anyone, as they did, etc.
We even uphold this principle in our own laws (though sadly as of late cops are above the law), it is the law of proportionality, you are allowed to defend yourself but only with similar force as that with which you are threatened. So for example shooting an unarmed non-violent protester dead for bashing in a window, however wrong that may be, is not proportional and makes YOU the bad guy, not them.
uspol, doubting one's sanity, empiricism, wasting resources!
@freemo Then we clearly have a very different model of what constitutes violence, or rather when violence might be an appropriate response. (Whether you think a specific action is violence is just semantics, but the response criterion is crucial.) I believe in situations in which someone with more resources is doing things which lower my chances of defending myself from unfairness I can respond disproportionately to their actions. Agh, I think I am a bit too tired to explain this properly with examples, because it's a delicate thing and definitely needs to be carefully defined to work. And my laptop battery is running out, aaaa, see you tomorrow (written after the below obviously lol)
And I agree that escalations are not good in general, and better ways of crowd control exist. Unfortunately the policemen seemed to be extremely underprepared for what happened, and couldn't have employed less violent and more appropriate measures because of that. That's also why I was stressing multiple times that an investigation into why they were so underprepared is crucial -- were they better prepared, almost surely fewer people would have died.
uspol, doubting one's sanity, empiricism, wasting resources!
@freemo And to express my thoughts a bit more clearly here too:
I wanted to write a very long post about general philosophy and sociology of law here, but I think it's a bit much for a toot (even with my love for creating walls of text). So the very short version is – we, as humanity, have worked very hard to figure out when escalations of violence (treated as a spectrum not a binary thing in this case; maybe aggression would be a better word? I hope you know what I mean) are a good thing to allow within a society. At this point it's relatively clear that in most cases it's much better to deescalate rather than escalate, but there are several relatively obvious exceptions. The clearest one is probably theft – it seems generally agreed upon that someone attempting to steal something can be stopped through violence, at least of the restraining kind. With pure proportional response, the only option would be to be allowed to steal something back, but a society in which such payback-stealing would be the only recourse for theft victims would not be a nice one to live in. And that's without even going into problems around the value of stolen items, and how it differs between the people involved.
Oh, and in most modern societies one of the roles of police is to handle all the cases where escalation was deemed appropriate, to prevent vicious circles of escalation. Just a sidenote to avoid some confusion that might have appeared here.
To get closer to the problem at hand and clarify what my general position on escalations in this case is, let me give an example. Say someone is trying to push through me while I'm standing in the door to my house. If that person is not stronger than me, I can just prevent them from entering by standing my ground, and this is the ideal situation – no escalation. If they are much weaker than me, I can even let them in to possibly deescalate and then figure out how to get them out later, possibly with use of some violence if necessary, but hopefully without much conflict at all. But if they are much stronger than me, then preventing them from entering becomes much more important – after they entered I (and additionally people present in my house) would be in an even worse position to do anything about that person. In such a case I would say escalating violence might be the correct course of action, depending on the difference in strength and the expected intentions of the pushing person – if I don't escalate now, the next time I could escalate I would not only be in a worse position, but also the person might already have done some damage. If that person already expressed anger at the people in my house, I would most likely feel justified in using violence to prevent them from entering. If they issued direct threats towards the people in my house the case is so clear cut it can probably be used to teach the art of jewelry making.
I would say the Capitol situation is pretty analogous at the very least to the anger case, up to the threat case, depending on what exactly they were chanting. The mob was big enough that it was stronger than the police, and so violent response was justified. It would _not_ have been justified if police presence was adequate, since this would have been analogous to the situation where I can just stand my ground, which is again why I am stressing why the investigation into _why_ the police presence was so low is so important.
uspol, doubting one's sanity, empiricism, wasting resources!
@freemo I still think they were violent, I would say trying to enter a building by brute force with people standing in your way is necessarily violent. They weren't violent only when they were not being opposed. The fact that they were aggressive/threatening follows from both the violence and the (explicit or implicit) threats.
Also, I saw you linked an article claiming that Twitter blocked the "Hang Mike Pence" hashtag -- do you happen to know if "We want Pence" was trending at the same time? This seems like relevant evidence as to what was being chanted, if we have the data.