Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, and Apple Shortcuts. Rust + WASM.

Screen to action using LLMs

Here's an example of server-side code written in TypeScript that takes the streamed data from ScreenPipe and uses a Large Language Model like OpenAI's to process text and images for analyzing sales conversations:

```typescript
import { ScreenPipe } from "screenpipe";
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const screenPipe = new ScreenPipe();

export async function onTick() {
  const data = await screenPipe.tick([1], { frames: 60 }); // or screens [1, 2, 3, ...]
  // [{frame: [...], text: [...], metadata: [...]}, ...]

  const { object } = await generateObject({
    model: openai("gpt-4o"),
    schema: z.object({
      leads: z.array(
        z.object({
          name: z.string(),
          company: z.string(),
          role: z.string(),
          status: z.string(),
          messages: z.array(z.string()),
        })
      ),
    }),
    prompt:
      "Fill salesforce CRM based on Bob's sales activity (this is what appeared on his screen): " +
      data.map((frame) => frame.text).join("\n"),
  });

  // Add to Salesforce API ...
}
```

Status

Alpha: runs on my computer. It captures things and does things.

Usage

Keep in mind that it's still experimental, but there is a working prototype; see the Related projects section.

To try the current version, which captures your screen and extracts the text, run:

```shell
git clone https://github.com/louis030195/screen-pipe
cd screen-pipe
```

Then, in one terminal, run the OCR API (a temporary hack until something cleaner replaces it):

```shell
virtualenv env
source env/bin/activate
pip install fastapi uvicorn pytesseract pillow
uvicorn main:app --reload
```

And, in another terminal, run the Rust CLI (you need Rust + Cargo installed):

```shell
cargo install --path screenpipe
screenpipe
```

Check the target/screenshots directory now :)

Sample file

You basically end up with a bunch of image + JSON pairs, where each JSON file holds the OCR output for its screenshot (in the sample below, the OCR happened to capture the project's own source code open in an editor):

```json
{"text": "e P& screen-pipe ... (raw OCR of whatever was on screen) ..."}
```

And the idea is to feed this to an LLM that does the rest of the work.
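As a sketch of that hand-off, the per-screenshot JSON files could be gathered into a single LLM prompt like this (the function name and directory layout are illustrative, not part of the repo):

```python
import json
from pathlib import Path

def build_prompt(shots_dir: str, task: str) -> str:
    """Concatenate the OCR text from every screenshot's JSON sidecar into one prompt."""
    texts = []
    for path in sorted(Path(shots_dir).glob("*.json")):
        data = json.loads(path.read_text())
        texts.append(data.get("text", ""))
    return task + "\n\n" + "\n".join(texts)

# e.g. build_prompt("target/screenshots", "Fill the CRM from Bob's screen:")
```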

screen-pipe.mp4

Why open source?

Recent breakthroughs in AI have shown that context is the final frontier. AI will soon be able to incorporate the context of an entire human life into its 'prompt', and the technologies that enable this kind of personalisation should be available to all developers to accelerate access to the next stage of our evolution.

Principles

This is a library intended to stick to a simple use case:

record the screen & associated metadata (generated locally or in the cloud) and pipe it somewhere (local, cloud)

Think of this as an API that lets you do this:

```shell
screenpipe | ocr | llm "turn what I see into my CRM" | api "send data to salesforce api"
```
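In code, that shell-style pipe is just function composition; a toy sketch with stand-in stages (all names and stage bodies are hypothetical):

```python
from functools import reduce
from typing import Callable

def pipe(value, *stages: Callable):
    """Feed each stage's output into the next, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, value)

# Stand-ins for the real stages:
ocr = lambda frames: "\n".join(frames)               # screen frames -> text
llm = lambda text: {"notes": text.splitlines()}      # text -> structured data
api = lambda obj: f"sent {len(obj['notes'])} notes"  # structured data -> CRM call

frames = ["Bob: deal closed with Acme", "Alice: follow up Tuesday"]
result = pipe(frames, ocr, llm, api)  # -> "sent 2 notes"
```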

Any interfaces are out of scope and should be built outside this repo, for example:

UI to search on these files (like rewind)

UI to spy on your employees

etc.

Contributing

Contributions are welcome! If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Say hi in our public Discord channel. We discuss how to bring this lib to production, help each other with contributions and personal projects, or just hang out ☕.

Licensing

The code in this project is licensed under the MIT license. See the LICENSE file for more information.

Related projects

This is a very quick & dirty example of the end goal that works in a few lines of Python: https://github.com/louis030195/screen-to-crm