From Money Laundering to Data Laundering: How Big Tech’s genAI has become the World’s Biggest Data Laundering Machine.
How Silicon Valley’s tech titans use transformer models to systematically launder user and business data.
How they use model replication through fractal fine-tuning to endlessly clean data and bleach information, much like financial money laundering schemes.
We argue that this illicit data trafficking and transformation goes far beyond the harm done to the current data economy. Beyond data scraping, copyright infringement and IP violations, we are witnessing an attack, a techno-putsch, on our human heritage.
From Meaning to Means. From Connectors to Disruptors.
Beyond data, our words and thoughts, with their social and cultural connotations, give us the meaning we need to live together. They shape us as humans: linguistically, semiologically, behaviorally, cognitively and emotionally.
If words and meanings are hollowed out, they drift without sense or direction from sender to receiver to sender to receiver, and become easy targets for manipulation and disorientation.
Like many authors and businesses, we at The House of Ethics™ have become extremely careful about publishing articles, since proprietary work, words, thoughts and concepts are systematically scraped by big-data hoovers.
As with many businesses and independent entrepreneurs, it happened to us too. Our novel concept of collective, emerging and agile ethics, Swarm Ethics™, has been lavishly reformulated and re-attributed to … nobody … by genAI-powered search engines.
GenAI-powered search engines and LLM-based models like ChatGPT "legitimately" incorporate and appropriate proprietary work, en masse.
The strategically planned system behind these models uses an inverted data processing scheme similar to basic money laundering constructs.
The modus operandi of data laundering
There are several programs to clean data and bleach information. How has traditional Machine Learning (ML) been turned into "de-genius" Data Machine Laundering (DML) by LLMs?
Several methods are at work:
- Plain old 1:1 plagiarism, which is the most common;
- Data Laundering – Step 1: producing phantom data. When scraped data is processed in the genAI "transformer washing drum", the model's inherent collaging of tokenized words yields "stateless" or "ownerless" words and text. By removing any trace of the original authors or deleting source links, the initial data is simply turned into washed, cleaned data.
- Data Laundering – Step 2: generating orphan information. Once referenced data has been turned into phantom data, the next step is morphing it into orphan information. Some data simply remains at the phantom-data stage. LLMs use no traditional referentials but rely on tokenization; the output may or may not be understood, contextualized or meaningful.
Orphan information is easy prey on the all-you-can-eat free genAI data buffet.
- Data Laundering – Step 3: creating an infosphere no-man's-land where mis- and disinformation, infringements and IP violations spin endlessly in virtuous drum rounds. Many refer to the new genAI playground as the "Digital Wild West": a no-man's-land with neither strict nor universal rules (yet).
So by first bleaching original, referenced data, then meshing it into bleached information and finally recoloring it (disentangling and re-entangling it) inside generative AI black boxes, the original data ownership, references, sources and authorship are simply wiped out.
Everything reappears as content newly generated by ChatGPT, Bing, Google, Anthropic, Ernie, Mixtral and the like, which not only claim new data ownership but also pretend to referential knowledge.
How could this worldwide data laundering coup succeed?
Because it operates systematically, at a systemic level. And it happened at high speed and at large scale, in a decentralized way, and simultaneously.
Like money laundering, data laundering is a decentralized, tentacular scheme that reaches far beyond any single, localized organization.
Data laundering follows the easy-to-observe but hard-to-trace-and-prove three-step cycle of traditional money laundering:
1) concealment of the illicit origin and unlawful activities;
2) conversion into cleaned money (data) assets which appear to have a legitimate origin;
3) re-integration into the legitimate financial (data) economy.
The last step, re-integrating the freshly cleaned (but illegitimately acquired) data into the legitimate Data Economy (and now into the press, through major deals such as The Atlantic-OpenAI deal and many more), seals the final round of successful data laundering.
Analyzing this sophisticated scheme further, one might point to the perversely brilliant way genAI data laundering paradoxically uses privacy-preserving concepts such as de-identification or tokenization to clean data and bleach information.
Finally, beyond appeals to fairness, responsibility or integrity, it is obvious that the legal dials urgently need to be recalibrated to prevent further and deeper hacks of our unique human heritage.
Hopefully the smoke screens of human-extinction narratives will no longer blindfold people and distract them from recognizing the imminent risks at hand.
There are so many beneficial and purposeful goals in which we can invest our energy, money and passion, and so build a thriving, responsible, innovative and sustainable future for All.