Microsoft Copilot+ Recall AI Looks Awesome!


First of all, I don’t trust Microsoft to do a good job keeping our data safe. My recollection of history makes me feel that Microsoft’s execution regarding security often starts off poorly, and the potential for horribly incriminating or financially costly information to leak accidentally will be immense.


There are a lot of social media posts and news stories implying that the sky is falling. These posts remind me of the arguments people made when everyone started syncing everything to the cloud, and again when people started posting everything to social media.

I don’t think this is fear mongering, and I don’t think these posts are wrong. There is absolutely potential here for both new types of cybercrime and user error, and I can understand being fearful that every minute an employee spends on their computer could be subpoenaed. It is even more terrifying that this could let employers comb through years of each employee’s work looking for excuses to fire them.

Not only that, but I think a lot of the fears about storing your data in the cloud are extremely valid! I only store my important, private data in my own private cloud. Seafile works very much like Dropbox, except I own my Seafile server, and all my data is encrypted by the open-source Seafile client before being transmitted to my server.

I am not worried about any of this today. I don’t run Windows. I don’t use any Microsoft products.

I won’t be able to use Recall, but I am extremely excited about the concept and the future potential. This is the first time I have seen anything come out of the machine learning hype that could realistically save me significant time and help me avoid monotony.

I lost my trust in Microsoft in the early nineties when they put Stac Electronics out of business, and they left us stuck with Microsoft’s worse and much less reliable DoubleSpace in Stacker’s place.

Microsoft did a lot of shady things during the next couple of decades, and I have no idea if they could ever do enough to regain my trust.

Even though the title seems to imply otherwise, this post isn’t really about Microsoft or Microsoft’s specific new product. It is about the idea of compiling a database of all the text and images from essentially every moment of your computing life, and what that might be able to do for you.

I could save so much time researching blog posts!

You have absolutely no idea how much time I spend flipping between Emacs and a dozen browser tabs while writing a blog post. There are so many numbers involved in many of the things I write about, and I have to get those numbers correct.

I have to remember which tab has the documentation or notes. I have to skim the page to find the numbers that I need. Sometimes I have to do math.

Can you imagine if I had a local large language model (LLM) with access to the contents of all the web pages I recently visited and all of my notes? How awesome would it be if that assistant were voice enabled?

  • “Hey, You! What is the max clock speed of an Intel N100?”
  • “How many kilowatt hours did I measure on my N100 server at idle?”
  • “Paste a Markdown table with the Geekbench scores of the N5095, N100, and Ryzen 5700U.”
  • “How long have I been running the Seafile Pi at Brian Moses’s house?”
  • “How long ago did I write about upgrading to a Radeon 6700 XT?”

Any of these functions would save me a minute or two, and that adds up fast!

This is a fantasy today, and I don’t think any open-source projects are working on it yet, but it sure feels like this is where things are headed.

This isn’t an entirely new idea

There was a Gnome project 20 years ago that used inotify to watch your emails, chat logs, and the text files you were editing to pop up related information. It might even be so old that it used dnotify!

If you got an ICQ message from your friend Brian asking you to meet up at the new pizza shop he’s been telling you about, it might have shown you a link to Brian’s recent email that talked about that pizza shop.

This was a really cool idea two decades ago. It was the very first thing that came to mind when I read the headline about Windows Recall, but it took me SO MUCH digging to find any sort of information about this project. I couldn’t even remember the name! It was Nat Friedman’s Dashboard.

Finding ways to connect instant messages to emails, and emails to text documents, and text documents to instant messages was extremely rudimentary when all the logic had to be built by hand with regular expressions.

I hope we see a new Gnome Dashboard in 2024!

I am hopeful that there will be a modern implementation of something like Gnome Dashboard. I want to collect regular screenshots. I want that information converted to text. I want it indexed, and I want to be able to query it!
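To make the “indexed and queryable” part a bit more concrete, here is a minimal Python sketch. It assumes the screenshots have already been converted to text by some OCR step that isn’t shown, and the `Capture` and `ScreenIndex` names are made up for illustration. A real tool would more likely lean on a proper full-text search engine, but a tiny inverted index shows the idea: map each word to the captures that contain it, then intersect those sets at query time.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Capture:
    timestamp: float  # epoch seconds when the screenshot was taken
    text: str         # text recognized in that screenshot (OCR not shown)


class ScreenIndex:
    """A tiny inverted index over OCR'd screenshot text (illustrative only)."""

    def __init__(self) -> None:
        self.captures: list[Capture] = []
        # word -> set of positions in self.captures containing that word
        self.postings: dict[str, set[int]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercased runs of letters and digits; crude, but fine for a sketch.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, capture: Capture) -> None:
        doc_id = len(self.captures)
        self.captures.append(capture)
        for token in self.tokenize(capture.text):
            self.postings[token].add(doc_id)

    def search(self, query: str) -> list[Capture]:
        """Return captures containing every query word, newest first."""
        terms = self.tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted((self.captures[i] for i in hits),
                      key=lambda c: c.timestamp, reverse=True)
```

Calling `index.search("N100 idle")` would return every capture that mentions both words, with the most recent one first.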

I don’t need something as powerful as GPT-4o processing queries against my personal database. Give me a little LLM that fits in 4 GB or 8 GB of VRAM. It is OK if it sometimes thinks that March has thirty days. Even the best assistants goof up like this from time to time.
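A rough rule of thumb for whether a model fits in a given amount of VRAM: the weights alone take about (parameters × bits per weight ÷ 8) bytes. This sketch only does that arithmetic. It ignores the context (KV) cache and runtime overhead, which need extra VRAM on top, and the 7-billion-parameter model in the example is just an assumed size rather than any specific model.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights alone, in GB (10**9 bytes).

    Ignores the KV cache, activations, and runtime overhead, which all
    need additional VRAM on top of this figure.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Hypothetical 7B-parameter model at a few common weight precisions.
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

By this estimate, that model needs about 14 GB of weights at 16 bits but only about 3.5 GB when quantized to 4 bits, which is why quantized models are what make the 4 GB and 8 GB range plausible.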

I don’t need my assistant to be Albert Einstein, and the quality of the small LLM will improve over time. The software will be optimized, and I will upgrade to a newer GPU with more VRAM every few years. Even a simple brain of an assistant would save me time today.

There are already some open-source projects popping up!

The first one I found is the rem project for macOS, whose first release was in 2023. Rem has a pretty good list of features, including capturing the screen and recognizing text every 2 seconds. There is no built-in LLM capability, but there is a way to grab recent text from your screenshots to feed to an LLM as context.
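The “grab recent context” idea is simple enough to sketch. This is not rem’s code or API; it is a hypothetical Python function that assumes you already have a list of timestamped OCR results, one every couple of seconds. It keeps only the last few minutes, collapses consecutive frames whose text didn’t change, and trims from the oldest end so the result fits a small model’s prompt.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float  # epoch seconds; captures assumed every ~2 seconds
    text: str         # OCR output for that frame


def recent_context(frames: list[Frame], now: float, window_s: float = 300.0,
                   max_chars: int = 4000) -> str:
    """Collect text from the last `window_s` seconds for an LLM prompt.

    Consecutive frames with identical text are collapsed, since a screen
    captured every couple of seconds often doesn't change at all.
    """
    recent = [f for f in sorted(frames, key=lambda f: f.timestamp)
              if now - window_s <= f.timestamp <= now]
    chunks: list[str] = []
    for frame in recent:
        if not chunks or chunks[-1] != frame.text:
            chunks.append(frame.text)
    context = "\n".join(chunks)
    # If it's too long, keep the newest text and drop the oldest.
    return context[-max_chars:] if max_chars > 0 else ""
```

Collapsing duplicate frames matters more than it looks: at one capture every 2 seconds, five minutes is 150 frames, and most of them are usually the same editor window.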


There’s also Windrecorder for Microsoft Windows. The feature list seems less complete than rem’s, and I didn’t see a roadmap. Rem definitely has some of Recall’s features on their roadmap.

I don’t have macOS or Windows here, so I can’t try any of these. I haven’t yet found a project like this that runs on Linux, though the names and descriptions of these projects have made me realize that the terminology used to describe them is very different from Recall’s, even though they do very similar things!

I understand that it isn’t that simple

I know that you can’t just feed a tiny local LLM with all the text I have ever written, all the text from months of screenshots of my activities, and all the text of the Wikipedia pages and other sites I have visited in that time. That is just way too much context to load into my little 12 GB Radeon 6700 XT.

All sorts of work has to be done to make the software summarize this stuff, then summarize the summaries, and then organize those summaries of summaries. Someone probably needs to figure out how to tie old-school text search together with the LLM: find the relevant documents, summarize those documents, then feed the summaries into the LLM every time you ask a question.
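That search-then-summarize-then-ask flow is usually called retrieval-augmented generation. Here is a minimal Python sketch of the plumbing. It assumes plain TF-IDF scoring as the “old-school text search” and a `summarize` callable standing in for whatever local LLM call you would actually use; both are assumptions for illustration, not anything an existing project ships.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query: str, docs: list[str], k: int = 3) -> list[int]:
    """Indices of up to k docs that best match the query, best first."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for counts in doc_counts for term in counts)
    n = len(docs)
    query_terms = set(tokenize(query))
    scored = []
    for i, counts in enumerate(doc_counts):
        # Term frequency weighted by how rare the term is across all docs.
        score = sum(counts[t] * math.log((n + 1) / df[t])
                    for t in query_terms if t in counts)
        scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored[:k] if score > 0]


def build_prompt(question: str, docs: list[str],
                 summarize: Callable[[str], str], k: int = 3) -> str:
    """Search, summarize each hit, and pack the summaries into one prompt."""
    hits = tfidf_rank(question, docs, k)
    context = "\n".join(f"- {summarize(docs[i])}" for i in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Summarizing each hit before packing it in is one way to keep the prompt within a small model’s context; a “summaries of summaries” layer could apply the same `summarize` function again to groups of these.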

What if someone steals your Microsoft Windows Recall database?!

No one will steal MY Recall database, because I won’t be able to run Windows Recall at all. I am assuming that if any sort of open-source equivalent shows up, I will have some amount of control over when it is active and what gets stored in the database.

It seems that Windows Recall makes no effort to keep usernames, passwords, and login screens out of the recording of your keystrokes and screenshots. If you log into your bank, it will most likely be in that database, and this makes the Recall database a juicy target.

I wouldn’t complain too much if I had to manually turn my AI tracker on when I start working, then turn it off when I start doing personal things. Maybe it would be interesting to have different databases for different projects and for personal stuff.

I would also bet that it would be pretty easy to tie in certain activities to the tracker. Maybe when you click on Bitwarden, your tracker would stop recording, and hopefully it would show you that in an extremely obvious way. If it detects that the processed text looks like it might be a private encryption key, your tracker could scrub that out of your database immediately.
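Both of those ideas are easy to sketch. This hypothetical Python hook assumes the recorder already knows the name of the focused application and has the OCR text for each capture. It drops the capture entirely while a password manager like Bitwarden has focus, and it replaces anything that looks like a PEM-formatted private key block with a placeholder. OCR can garble text, so a real scrubber would need to be far more forgiving than this one regular expression.

```python
import re
from typing import Optional

# Hypothetical deny list: apps whose windows should never be captured.
PAUSE_APPS = {"bitwarden"}

# Matches PEM private key blocks such as "BEGIN PRIVATE KEY",
# "BEGIN RSA PRIVATE KEY", or "BEGIN OPENSSH PRIVATE KEY".
PRIVATE_KEY_BLOCK = re.compile(
    r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----"
    r".*?"
    r"-----END (?:[A-Z0-9]+ )*PRIVATE KEY-----",
    re.DOTALL,
)


def should_record(focused_app: str) -> bool:
    """False while a denied app (e.g. the password manager) has focus."""
    return focused_app.strip().lower() not in PAUSE_APPS


def scrub(text: str) -> str:
    """Replace anything shaped like a private key block with a placeholder."""
    return PRIVATE_KEY_BLOCK.sub("[REDACTED PRIVATE KEY]", text)


def process_capture(focused_app: str, ocr_text: str) -> Optional[str]:
    """Return the text to store, or None if this capture must be dropped."""
    if not should_record(focused_app):
        return None
    return scrub(ocr_text)
```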

Even just having an “oops button” that you can hit to have the last five minutes of activity expunged from the record would be nice.
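An “oops button” mostly boils down to one DELETE query against wherever the tracker stores its captures. Here is a minimal sketch assuming a SQLite database with a hypothetical `captures` table of timestamps and OCR text. It also turns on SQLite’s `secure_delete` pragma, which overwrites deleted content with zeros in the database file instead of leaving it sitting in free pages.

```python
import sqlite3
import time
from typing import Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Zero out deleted content rather than leaving it in free pages.
    db.execute("PRAGMA secure_delete = ON")
    db.execute("CREATE TABLE IF NOT EXISTS captures "
                "(ts REAL NOT NULL, text TEXT NOT NULL)")
    return db


def record(db: sqlite3.Connection, text: str, ts: Optional[float] = None) -> None:
    """Store one capture; ts defaults to the current time in epoch seconds."""
    db.execute("INSERT INTO captures (ts, text) VALUES (?, ?)",
               (time.time() if ts is None else ts, text))
    db.commit()


def oops(db: sqlite3.Connection, minutes: float = 5.0,
         now: Optional[float] = None) -> int:
    """Expunge everything captured in the last `minutes`; return rows deleted."""
    cutoff = (time.time() if now is None else now) - minutes * 60
    cur = db.execute("DELETE FROM captures WHERE ts >= ?", (cutoff,))
    db.commit()
    return cur.rowcount
```

Bind `oops()` to a global hotkey and the last five minutes would be gone with one keypress.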

I am ready to start recording every aspect of my computing life!

I have an incongruity here in my brain. I don’t like surveillance cameras. I don’t understand why so many people are pointing cameras at their backyards and living rooms, then plumbing those into Home Assistant. It feels icky. I don’t want to record myself lumbering around the house!

Yet I am excited about recording every second of my digital life. I believe this is because I can see an immediate value in being able to ask a virtual assistant to mine this data, and it will provide me with valuable time savings and possibly even improvements to the quality of my work from day one.

I will almost definitely be jumping on board with the first open-source screen-recording virtual assistant that shows up. I am excited to try something like this to see how much it might improve my workflow, and I am excited to see where this technology goes as it improves!

What do you think? Are you skeeved out by Microsoft recording everything you do? Would you be willing to let an open-source AI assistant mine your screen for nuggets of useful data? Are you as excited about the prospects as I am? Let me know in the comments, or stop by the Butter, What?! Discord server to chat with me about it!
