Friday, December 20, 2024

Create searchable Bluesky bookmarks with R


new_my_likes 

Combine the new and old data:


deduped_my_likes <- bind_rows(new_my_likes, my_likes) |>
  distinct(URL, .keep_all = TRUE)

And, finally, save the updated data by overwriting the old file:


rio::export(deduped_my_likes, 'my_likes.parquet')
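If the combine-and-dedupe step is unclear, here is a minimal base-R sketch using small made-up data frames (the URLs and posts are hypothetical). It assumes each liked post is uniquely identified by its URL column and that new likes are stacked above old ones, so the first copy of any post that appears in both is the one kept.

```r
# Toy stand-ins for the previously saved likes and a fresh download (hypothetical data)
old_likes <- data.frame(
  URL  = c("https://bsky.app/profile/a/post/1", "https://bsky.app/profile/b/post/2"),
  Post = c("Old post one", "Old post two")
)
new_likes <- data.frame(
  URL  = c("https://bsky.app/profile/c/post/3", "https://bsky.app/profile/a/post/1"),
  Post = c("Brand-new post", "Old post one")
)

# Put new rows on top, then drop any row whose URL has already appeared above it
combined <- rbind(new_likes, old_likes)
deduped  <- combined[!duplicated(combined$URL), ]
rownames(deduped) <- NULL

print(deduped)
```

In dplyr terms, `distinct(URL, .keep_all = TRUE)` after `bind_rows()` behaves the same way: it keeps the first row it sees for each URL.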

Step 4. View and search your data the conventional way

I like to create a version of this data specifically for use in a searchable table. It adds a link at the end of each post's text pointing to the original post on Bluesky, so I can easily view any images, replies, parent posts, or threads that aren't in a post's plain text. I also remove some columns I don't need in the table.


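library(dplyr)
library(stringr)

my_likes_for_table <- deduped_my_likes |>
  mutate(
    # Append a clickable ">>" link to the original post on Bluesky
    Post = str_glue("{Post} <a href='{URL}' target='_blank'>>></a>"),
    # Show only the first 25 characters of any external URL
    ExternalURL = ifelse(!is.na(ExternalURL), str_glue("{substr(ExternalURL, 1, 25)}..."), "")
  ) |>
  select(Post, Name, CreatedAt, ExternalURL)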
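To see what the Post and ExternalURL columns end up containing, here is a small base-R sketch for one hypothetical post, with `sprintf()` standing in for `str_glue()`. The link is raw HTML, which is why the table needs `escape = FALSE` to render it as a clickable link instead of literal tag text.

```r
# One hypothetical liked post
post <- "New R package for science-inspired color palettes"
url  <- "https://bsky.app/profile/example.bsky.social/post/abc123"
ext  <- "https://github.com/example/sciencepalettes"

# Append a clickable ">>" link pointing back to the original Bluesky post
post_html <- sprintf("%s <a href='%s' target='_blank'>>></a>", post, url)

# Keep only the first 25 characters of the external URL for display
ext_short <- if (!is.na(ext)) paste0(substr(ext, 1, 25), "...") else ""

cat(post_html, "\n")
cat(ext_short, "\n")
```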

Here's one way to turn my_likes_for_table into a searchable HTML table, using the DT package:


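DT::datatable(my_likes_for_table, rownames = FALSE,
  filter = "top",   # per-column search boxes above the table
  escape = FALSE,   # render the HTML links rather than showing raw tags
  options = list(pageLength = 25, autoWidth = TRUE,
                 lengthMenu = c(25, 50, 75, 100),
                 searchHighlight = TRUE,        # highlight matching text
                 search = list(regex = TRUE))   # allow regular expressions in searches
)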

This table has a table-wide search box at the top right and a search filter for each column, so I can search for two terms at once: for example, the #rstats hashtag in the main search box and LLM in the Post column's filter box (the table's search isn't case sensitive). Or, because I enabled regular expression searching with the search = list(regex = TRUE) option, I could use a single regexp lookahead pattern, (?=.*rstats)(?=.*(LLM)), in the main search box.
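Before pasting a lookahead pattern into the table, you can sanity-check it in R against a few sample posts. This sketch uses hypothetical posts and `grepl()` with `perl = TRUE` (needed for lookaheads) and `ignore.case = TRUE` to mimic the table's case-insensitive search. The table itself runs the regex in the browser's JavaScript engine, which also supports lookaheads, but the two engines aren't identical, so treat this as a quick check rather than a guarantee.

```r
# Hypothetical sample posts
posts <- c(
  "New #rstats package wraps several LLM APIs",
  "Trying an llm for code review #RStats",
  "#rstats tip: use the native pipe",
  "LLM benchmarks are hard"
)

# Both lookaheads must succeed from the same starting point, so a post matches
# only if it contains both terms, in either order
pattern <- "(?=.*rstats)(?=.*LLM)"
matches <- grepl(pattern, posts, perl = TRUE, ignore.case = TRUE)

print(matches)
```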

Generative AI chatbots like ChatGPT and Claude can be quite good at writing complex regular expressions. And with match highlighting turned on in the table, it's easy to see whether a regexp is doing what you want.

Query your Bluesky likes with an LLM

The easiest free way to use generative AI to query these posts is to upload the data file to a service of your choice. I've had good results with Google's NotebookLM, which is free and shows you the source text for its answers. NotebookLM has a generous file limit of 500,000 words or 200MB per source, and Google says it won't train its large language models (LLMs) on your data.

The query "Someone talked about an R package with science-related color palettes" pulled up the exact post I was thinking of: one that I had liked and then re-posted with my own comments. And I didn't have to give NotebookLM my own prompts or instructions telling it that I wanted to 1) use only that document for answers and 2) see the source text it used to generate its response. All I had to do was ask my question.

I formatted the data to be a bit more useful and less wasteful by limiting CreatedAt to dates without times, keeping the post URL as a separate column (instead of a clickable link with added HTML), and deleting the external URLs column. I saved that slimmer version as a .txt rather than a .csv file, since NotebookLM doesn't handle .csv extensions.


my_likes_for_ai <- deduped_my_likes |>
  mutate(CreatedAt = substr(CreatedAt, 1, 10)) |>   # keep just the YYYY-MM-DD date
  select(Post, Name, CreatedAt, URL)

rio::export(my_likes_for_ai, "my_likes_for_ai.txt")
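Here is a base-R sketch of what the date trim does, assuming CreatedAt holds ISO 8601 timestamps like the hypothetical ones below, so the first 10 characters are the YYYY-MM-DD date. It writes a tab-delimited text file with `write.table()` as a stand-in for `rio::export()`, using a temporary file so nothing real is overwritten.

```r
# Hypothetical rows shaped like the AI-ready export
likes_ai <- data.frame(
  Post      = c("Color palettes inspired by science", "Parquet files in R"),
  Name      = c("Example Author", "Another Author"),
  CreatedAt = c("2024-12-18T14:03:22.123Z", "2024-12-19T08:15:00.000Z"),
  URL       = c("https://bsky.app/profile/a/post/1", "https://bsky.app/profile/b/post/2")
)

# Keep only the date part of each timestamp
likes_ai$CreatedAt <- substr(likes_ai$CreatedAt, 1, 10)

# Write tab-separated text; text fields are quoted so embedded line breaks stay intact
out_file <- tempfile(fileext = ".txt")
write.table(likes_ai, out_file, sep = "\t", row.names = FALSE)

# Read it back to confirm the round trip
check <- read.delim(out_file)
print(check)
```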

After uploading your likes file to NotebookLM, you can start asking questions as soon as the file is processed.

If you really want to query the document within R instead of using an external service, one option is the Elmer Assistant, a project on GitHub. It should be fairly straightforward to modify its prompt and source information for your needs. However, I haven't had great luck running it locally, even though I have a fairly robust Windows PC.

Update your likes by scheduling the script to run automatically

For this to be useful, you'll need to keep the underlying "posts I've liked" data up to date. I run my script manually on my local machine periodically when I'm active on Bluesky, but you can also schedule the script to run automatically every day or once a week. Here are three options:

  • Run a script locally. If you're not too worried about your script always running on an exact schedule, tools such as taskscheduleR for Windows or cronR for Mac or Linux can help you run your R scripts automatically.
  • Use GitHub Actions. Johannes Gruber, the author of the atrrr package, describes how he uses free GitHub Actions to run his R Bloggers Bluesky bot. His instructions can be modified for other R scripts.
  • Run a script on a cloud server. You could also use an instance on a public cloud such as Digital Ocean plus a cron job.
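For the cron-based options, the schedule is a single crontab line. The sketch below uses a hypothetical helper, cron_line(), to assemble one that runs an R script every day at 7:00 and appends its output to a log file; the paths are made up, and on a real server you may need the full path to Rscript. The five leading fields are minute, hour, day of month, month, and day of week, so "0 7 * * 1" would run weekly on Mondays instead. On Mac or Linux, cronR's cron_rscript() and cron_add() can build and install an entry like this for you.

```r
# Hypothetical helper: build a daily crontab entry that runs an R script
# and appends both normal output and errors to a log file
cron_line <- function(script, log, hour = 7, minute = 0) {
  sprintf("%d %d * * * Rscript %s >> %s 2>&1",
          minute, hour,
          shQuote(script, type = "sh"),  # single-quote paths for the Unix shell
          shQuote(log, type = "sh"))
}

entry <- cron_line("/home/me/bluesky/update_likes.R",
                   "/home/me/bluesky/update_likes.log")
cat(entry, "\n")
```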

You may want a version of your Bluesky likes data that doesn't include every post you've liked. Sometimes you may click like just to acknowledge that you saw a post, to let the author know people are reading, or because you found the post amusing but don't expect you'll want to find it again.

One caution, though: It can get onerous to manually mark bookmarks in a spreadsheet if you like a lot of posts, and you need to be committed to keeping it up to date. There's nothing wrong with searching through your entire database of likes instead of curating a subset with "bookmarks."

That said, here's a version of the process I've been using. For the initial setup, I suggest using an Excel or .csv file.

Step 1. Import your likes into a spreadsheet and add columns

I'll start by importing the my_likes.parquet file, adding empty Bookmark and Notes columns, and then saving that to a new file.


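my_likes <- rio::import("my_likes.parquet")

likes_w_bookmarks <- my_likes |>
  mutate(Bookmark = as.character(""), .before = 1) |>   # Bookmark becomes the first column
  mutate(Notes = as.character(""), .after = Bookmark)   # Notes goes right after it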

rio::export(likes_w_bookmarks, "likes_w_bookmarks.xlsx")

After some experimenting, I opted to make the Bookmark column character rather than logical, so I can type just "T" or "F" in a spreadsheet instead of TRUE or FALSE. With characters, I don't have to worry about whether R's Boolean fields will translate properly if I decide to use this data outside of R. The Notes column lets me add text explaining why I might want to find something again.
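Keeping "T" and "F" as text costs little when you do want logicals back in R: `as.logical()` understands "T" and "F" (as well as "TRUE" and "FALSE"), and blank or missing cells become NA. This sketch uses a hypothetical Bookmark column and shows both converting it and filtering on it directly.

```r
# Hypothetical Bookmark values as they might come back from the spreadsheet;
# empty cells typically arrive as NA
bookmark <- c("T", "F", NA, "T")

# Convert the text flags to R logicals in one step
bookmark_lgl <- as.logical(bookmark)
print(bookmark_lgl)

# Or filter on the text column directly; %in% treats NA as "not a match"
likes <- data.frame(Post = c("a", "b", "c", "d"), Bookmark = bookmark)
bookmarked <- likes[likes$Bookmark %in% "T", ]
print(bookmarked$Post)
```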

Next comes the manual part of the process: marking which likes you want to keep as bookmarks. Opening the file in a spreadsheet is convenient because you can click and drag F or T down multiple cells at a time. If you already have a lot of likes, this may be tedious! You could decide to mark them all "F" for now and start bookmarking manually going forward, which may be less onerous.

Save the file manually again to likes_w_bookmarks.xlsx.

Step 2. Keep your spreadsheet in sync with your likes

After that initial setup, you'll want to keep the spreadsheet in sync with the data as it gets updated. Here's one way to do that.

After updating the deduped_my_likes data, create a bookmark-check lookup from your spreadsheet, and then join it to your deduped likes.


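bookmark_check <- rio::import("likes_w_bookmarks.xlsx") |>
  select(URL, Bookmark, Notes)

my_likes_w_bookmarks <- deduped_my_likes |>
  left_join(bookmark_check, by = "URL") |>   # new likes get NA for Bookmark and Notes
  relocate(Bookmark, Notes)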

Now you have a data frame with the new likes joined to your existing bookmarks data, with the new entries at the top not yet having Bookmark or Notes values. Save that to your spreadsheet file.


rio::export(my_likes_w_bookmarks, "likes_w_bookmarks.xlsx")
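The sync step is a left join on URL. Here is a base-R sketch of the same idea with hypothetical data, using `match()` as a lookup: every like is kept in its original order, likes with no saved bookmark get NA, and the two hand-edited columns are moved to the front, as `relocate(Bookmark, Notes)` does. One design difference worth knowing: if a URL somehow appeared twice in the lookup, `match()` would use the first copy, while `left_join()` would duplicate the like.

```r
# Hypothetical refreshed likes (newest first) and the bookmark lookup from the spreadsheet
deduped <- data.frame(
  Post = c("Brand-new post", "Older post", "Oldest post"),
  URL  = c("u3", "u2", "u1")
)
bookmark_lookup <- data.frame(
  URL      = c("u2", "u1"),
  Bookmark = c("T", "F"),
  Notes    = c("palette package", "")
)

# Find each like's row in the lookup; unmatched (new) likes get NA
idx <- match(deduped$URL, bookmark_lookup$URL)
deduped$Bookmark <- bookmark_lookup$Bookmark[idx]
deduped$Notes    <- bookmark_lookup$Notes[idx]

# Put Bookmark and Notes first, keeping the remaining columns in order
front <- c("Bookmark", "Notes")
with_bookmarks <- deduped[, c(front, setdiff(names(deduped), front))]
print(with_bookmarks)
```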

An alternative to this somewhat manual and labor-intensive process is to use dplyr::filter() on your deduped likes data frame to remove items you won't want again, such as posts mentioning a favorite sports team or posts from certain dates when you focused on a topic you don't need to revisit.
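A minimal base-R sketch of that kind of filtering, with hypothetical posts, a made-up team name, and a made-up date to skip; the same two conditions could go inside dplyr::filter(). It assumes CreatedAt is already a YYYY-MM-DD string; if yours holds full timestamps, compare substr(CreatedAt, 1, 10) instead.

```r
# Hypothetical deduped likes with dates as YYYY-MM-DD strings
likes <- data.frame(
  Post      = c("Great game by the Hypothetical Hawks!", "New R package for maps",
                "Conference live-post", "Tidy data tip"),
  CreatedAt = c("2024-11-02", "2024-11-03", "2024-11-10", "2024-11-12")
)

# Posts mentioning the (made-up) favorite team, matched case-insensitively
about_team <- grepl("hawks", likes$Post, ignore.case = TRUE)

# Dates when you mostly liked posts on a topic you won't revisit (also made up)
skip_dates   <- c("2024-11-10")
on_skip_date <- likes$CreatedAt %in% skip_dates

# Keep everything that matches neither condition
kept <- likes[!about_team & !on_skip_date, ]
print(kept$Post)
```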

Next steps

Want to search your own posts as well? You can pull them via the Bluesky API in a similar workflow using atrrr's get_skeets_authored_by() function. Once you start down this road, you'll see there's a lot more you can do. And you'll likely have company among R users.
