Need emojis for a project?
OpenMoji is a collection of free open source emojis. 🥳
Further, while trying to extract and format data from PDFs using #python #PyMuPDF:
I was trying to create a perfect chain of functions that would format all the edge cases into the final desired #HTML format. I quickly realized that running every tweaked version of the functions on the 100-page PDF is quite time-consuming.
Instead, I can run it once and save the results in a #sqlite database, then write #sql queries to post-process the edge cases. This also gives me a better way to inspect the contents of each page than the previous method of printing the output to the #terminal and scrolling to the desired page. And in the end, I am one step closer to having the data in a #csv file, which is easily exported with #Dbeaver.
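A minimal sketch of the "extract once, query later" idea, not the actual project code. The table name `pages`, its columns, and the helper names `iter_pdf_pages`, `save_pages` and `fetch_page` are all made up for illustration. PyMuPDF is imported lazily inside `iter_pdf_pages`, so the storage part runs without it.

```python
import sqlite3
from typing import Iterable, Iterator, Optional, Tuple


def iter_pdf_pages(pdf_path: str, fmt: str = "text") -> Iterator[Tuple[int, str]]:
    """Yield (1-based page number, extracted text) for every page of the PDF."""
    import fitz  # PyMuPDF; imported here so the rest of the sketch runs without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            # page.number is 0-based, so shift it to match the printed page count
            yield page.number + 1, page.get_text(fmt)


def save_pages(conn: sqlite3.Connection, pages: Iterable[Tuple[int, str]]) -> None:
    """Store the extracted pages; re-running replaces rows with the same page number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "page_number INTEGER PRIMARY KEY, content TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (page_number, content) VALUES (?, ?)",
        pages,
    )
    conn.commit()


def fetch_page(conn: sqlite3.Connection, page_number: int) -> Optional[str]:
    """Return the stored content of one page, or None if it was never saved."""
    row = conn.execute(
        "SELECT content FROM pages WHERE page_number = ?", (page_number,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    # Hypothetical sample rows stand in for iter_pdf_pages("some.pdf")
    conn = sqlite3.connect(":memory:")
    save_pages(conn, [(1, "first page"), (2, "second page")])
    print(fetch_page(conn, 2))
```

With a real file, `save_pages(conn, iter_pdf_pages("some.pdf"))` would do the slow extraction once. Every later tweak is then a query against the `pages` table instead of a full re-run.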
Currently trying to extract and format data from PDFs using #python #PyMuPDF.
Initially I used the `get_text(value)` method with the `"text"` value, only to learn that I could potentially have saved time by using the `"html"` value directly, since I have been creating pattern matchers to format the text into #HTML.
After investigating, although the `"html"` option exists, its post-processing is more strenuous than the initial approach.
What fascinates me about the `get_text(value)` method is that each value packages the data differently. Whereas `"html"` puts the text in `<p><span>text</span></p>`, `"xhtml"` puts it in `<h1>text</h1>` instead.
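To compare how each value packages the same page, something like the sketch below could work. The helper names `extract_variants` and `first_page_variants` are hypothetical. The only PyMuPDF calls assumed are `fitz.open()` and `page.get_text()` with the `"text"`, `"html"` and `"xhtml"` values mentioned above.

```python
from typing import Dict, Sequence


def extract_variants(
    page, formats: Sequence[str] = ("text", "html", "xhtml")
) -> Dict[str, str]:
    """Call page.get_text() once per format and collect the raw outputs."""
    return {fmt: page.get_text(fmt) for fmt in formats}


def first_page_variants(pdf_path: str) -> Dict[str, str]:
    """Open a PDF with PyMuPDF and compare the formats on its first page."""
    import fitz  # PyMuPDF; imported lazily so extract_variants works without it

    with fitz.open(pdf_path) as doc:
        return extract_variants(doc[0])
```

Printing the three results side by side for one page makes it easier to judge which output needs the least post-processing.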
When starting a new #programming project, my preferred methods are 'cowboy coding' and 'jumping in at the deep end'. This way I can get a feel for the ecosystem and learn all the ways not to do it.
The initial goal is to get it to work and make it maintainable. Later one can always improve it and automate lots of processes.
The downside of such an approach, especially if one already knows another #coding language, is that one is more likely than not to ignore the best practices and thereby create a Frankenstein project.
This is where documentation should be added, so that if one comes back to the project, one can more easily pick up where one left off.
I'm afraid of a world where we have effectively lost democracy and individual agency.
There is enough to go around to allow everyone to live a good life. And AI has the opportunity to add even more value to the world. But this will come with huge disruptions. How we distribute the wealth, value and power in the world is going to be one of the major questions of the 21st century. Again.
7/7
Further, it seems that lots of #LLM are not aware that the packaging tool is unmaintained, so when asked how to set something up in #LazyNvim, they will try to answer using #PackerNvim.
Just realized that #PackerNvim has been unmaintained since August 2023 and that it suggests using either #LazyNvim or #PckrNvim. I went with the former, which could have spared me a lot of headaches over the past couple of months of dealing with breaking plugins.
After backing up and tagging the final packer #NeoVim config version, I took the opportunity to set up the starter bundle in the existing git repo.
What astonished me is how closely the starter config aligned with the previous config, especially the keymappings.
Sabot in the Age of AI
Here is a curated list of strategies, offensive methods, and tactics for (algorithmic) sabotage, disruption, and deliberate poisoning.
🔻 iocaine
The deadliest AI poison—iocaine generates garbage rather than slowing crawlers.
🔗 https://git.madhouse-project.org/algernon/iocaine
🔻 Nepenthes
A tarpit designed to catch web crawlers, especially those scraping for LLMs. It devours anything that gets too close. @aaron
🔗 https://zadzmo.org/code/nepenthes/
🔻 Quixotic
Feeds fake content to bots and robots.txt-ignoring #LLM scrapers. @marcusb
🔗 https://marcusb.org/hacks/quixotic.html
🔻 Poison the WeLLMs
A reverse-proxy that serves dissociated-press style reimaginings of your upstream pages, poisoning any LLMs that scrape your content. @mike
🔗 https://codeberg.org/MikeCoats/poison-the-wellms
🔻 Django-llm-poison
A django app that poisons content when served to #AI bots. @Fingel
🔗 https://github.com/Fingel/django-llm-poison
🔻 KonterfAI
A model poisoner that generates nonsense content to degenerate LLMs.
🔗 https://codeberg.org/konterfai/konterfai
While looking into how to drop a #git commit, I realized that rewriting the history might be a better option. This option is typically used if one wants to change the email or name of the author.
https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#_changing_email_addresses_globally
The example code from the site is:
```
$ git filter-branch --commit-filter '
        if [ "$GIT_AUTHOR_EMAIL" = "schacon@localhost" ];
        then
                GIT_AUTHOR_NAME="Scott Chacon";
                GIT_AUTHOR_EMAIL="schacon@example.com";
                git commit-tree "$@";
        else
                git commit-tree "$@";
        fi' HEAD
```
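One might need to force the command with `-f` if one decides to run it multiple times for various `$GIT_AUTHOR_EMAIL` values, since `git filter-branch` refuses to overwrite the backup refs left by a previous run. Alternatively, one could add the other emails to the condition with the OR operator (`||`).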
TIL big specialized forums have started backdating millions of LLM-generated posts. Now you cannot be sure that a reply from 2009 on some forum for physics or maps or flower or drill enthusiasts hasn't been machine-generated and totally wrong.
Creating a star rating system using a single HTML element, CSS, and a single JS command.
https://alvaromontoro.com/blog/68069/single-element-star-rating-system
The idea of using just #CSS to fingerprint email clients and browsers is wild. The approach suggested in this repo
https://github.com/cispa/cascading-spy-sheets
and further #research paper
explain the technique that works even if #javascript is disabled.
They further explain that they reached out to both #Tor and #BraveBrowser, where such exploits should be mitigated.
One example where such an exploit enables even more precision is when it is incorporated into #phishing attacks. Since the exploit was also able to detect the operating system, one could combine it with existing exploits for a more targeted attack.
Digital-free #cardio is something we should all enjoy more often, especially in today's society, where we are constantly bombarded with information. Spending a couple of hours in the present raises the question of how much time we put into distractions just because we don't want to be alone with our own thoughts and/or don't like being bored.
And for those who need to track their fitness activity, one just needs to remember the start and end times plus the route.
After a good 2 months of programming in #python, I finally found the tool #poetry, which is quite similar to how #nodejs bundles libraries into a directory, either locally or globally in the cache directory.
I had tried the other tools, from #pyenv to #venv and/or #virtualenv. I thought they were used for library dependency management, only to realize that they are more like #nvm.
I did use #conda for some time, though I preferred a Python-only solution. I do realize that poetry won't resolve all issues and that I might need to look into containerization later on, though for the time being it looks like a good solution.
I've been a full-time developer for over 10 years, and blindly copy-pasting shell commands from the internet is still my go-to mechanism for solving new problems 💪
To this day I still don't know a single character of awk or sed syntax, and I'd like to thank the community for letting me keep it that way.
Raising $400K in under 48 hours has been absolutely unreal—thank you so much for the support! We’ve added new stretch goals: new colors and transparent cases for both Pilet 5 and Pilet 7, and a detachable keyboard/gamepad module for Pilet 7. Let’s make it happen. https://www.kickstarter.com/projects/soulscircuit/pilet-opensource-modular-portable-mini-computer
I am a strong proponent of leaving this planet better than it was when I arrived on it. Thus, to get the most bang out of a lifetime, my key focus is #longevity, which I attempt to achieve with #nutrition, specifically #plantbased.
Longevity is good and all, as long as you are not frail and weak. The ideal would be to die young at an old age. Thus I incorporate tactics from #biohacking and #primalfitness. Additionally, I am an advocate of #wildcrafting, which is a superset of #herbalism.
Studied many fields of science like maths or statistics, though the constant was always computer science.
Currently working as a fullstack web developer, though I prefer to call myself a #SoftwareCrafter.
The goal of my side projects is to practice #GreenDevelopement, meaning to create mainly static websites, the way the internet was intended to be.
On the artistic side, the goal is to release all content under a Creative Commons license, ideally using only tools and resources that are #FLOSS #OpenSource. #nobot