Obsidian-Copilot: An Assistant for Writing & Reflecting
Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries.
What would a copilot for writing and thinking look like? To try answering this question, I built a prototype: Obsidian-Copilot. Given a section header, it helps draft a few paragraphs via retrieval-augmented generation. Also, if you write a daily journal, it can help you reflect on the past week and plan for the week ahead.
Obsidian Copilot: Helping to write drafts and reflect on the week
Here’s a short 2-minute demo. The code is available at obsidian-copilot.
How does it work?
We start by parsing documents into chunks. A sensible default is to chunk documents by token length, typically 1,500 to 3,000 tokens per chunk. However, I found that this didn’t work very well. A better approach might be to chunk by paragraphs (e.g., split on \n\n).
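To make the paragraph-based alternative concrete, here is a minimal sketch that splits a note on blank lines. The helper name `chunk_by_paragraphs` is hypothetical, and real notes may need extra handling (e.g., fenced code that itself contains blank lines).

```python
def chunk_by_paragraphs(text: str) -> list[str]:
    """Split a note into paragraph chunks on blank lines (hypothetical helper)."""
    # Split on '\n\n', strip surrounding whitespace, and drop empty pieces
    return [p.strip() for p in text.split('\n\n') if p.strip()]


note = "First paragraph.\nStill first.\n\nSecond paragraph.\n\n\n"
print(chunk_by_paragraphs(note))
# ['First paragraph.\nStill first.', 'Second paragraph.']
```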
Given that my notes are mostly in bullet form, I chunk by top-level bullets: Each chunk is made up of a single top-level bullet and its sub-bullets. There are usually 5 to 10 sub-bullets per top-level bullet, making each chunk similar in length to a paragraph.
```python
def chunk_by_top_level_bullets(lines, min_chunk_lines):
    chunks = {}  # chunk_idx -> list of lines
    current_chunk = []
    chunk_idx = 0
    current_header = None

    for line in lines:
        if '##' in line:  # Chunk header = Section header
            current_header = line

        if line.startswith('- '):  # Top-level bullet
            if current_chunk:  # If lines accumulated, add them as a chunk
                if len(current_chunk) >= min_chunk_lines:
                    chunks[chunk_idx] = current_chunk
                    chunk_idx += 1
                current_chunk = []  # Reset current chunk
                if current_header:  # Prefix the new chunk with its section header
                    current_chunk.append(current_header)

        current_chunk.append(line)

    # Keep the last chunk too, since no further top-level bullet triggers it
    if len(current_chunk) >= min_chunk_lines:
        chunks[chunk_idx] = current_chunk

    return chunks
```
Next, we build an OpenSearch index and a semantic index on these chunks. In a previous experiment, I found that embedding-based retrieval alone might be insufficient and thus added classical search (i.e., BM25 via OpenSearch) in this prototype.
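OpenSearch computes BM25 scores internally, so nothing here needs implementing. Still, a toy sketch of the textbook Okapi BM25 formula helps show why it complements embeddings: it rewards exact term matches, dampens repeated terms via `k1`, and penalizes long documents via `b`. The function name and toy documents are hypothetical; `k1=1.2` and `b=0.75` are common defaults, and OpenSearch's exact scoring may differ in minor details.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against query_terms with Okapi BM25 (toy sketch)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF, which is always non-negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [["weekly", "review", "journal"], ["draft", "blog", "post"], ["journal", "entry"]]
print(bm25_scores(["journal"], docs))
```

Note that the shorter third document outscores the first one for the same single match, because of length normalization.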
For OpenSearch, we start by configuring filters and fields. We include filters such as stripping HTML, removing possessives (i.e., the trailing ’s in words), removing stopwords, and basic stemming. These filters are applied to both documents (during indexing) and queries. We also specify the fields we want to index and their respective types. Types matter because filters are applied to text fields (e.g., title, chunk) but not to keyword fields (e.g., path, document type). We don’t apply preprocessing to file paths so that they are kept as-is.
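The post doesn't show the analyzer settings, so the following is a sketch under assumptions: it defines an analyzer named `english_custom` (the name used in the mappings) from OpenSearch's built-in `html_strip` character filter and `stemmer`/`stop` token filters, following the common pattern for rebuilding the stock `english` analyzer. The custom filter names are illustrative, and the stemmer language the author actually used is unknown.

```python
# Hypothetical analysis settings; only the analyzer name comes from the post
index_settings = {
    'analysis': {
        'filter': {
            # Removes the trailing 's (possessives)
            'english_possessive_stemmer': {'type': 'stemmer', 'language': 'possessive_english'},
            # Removes common English stopwords
            'english_stop': {'type': 'stop', 'stopwords': '_english_'},
            # Basic stemming (assumed language; lighter stemmers also exist)
            'english_stemmer': {'type': 'stemmer', 'language': 'english'},
        },
        'analyzer': {
            'english_custom': {
                'type': 'custom',
                'char_filter': ['html_strip'],  # Strip HTML before tokenizing
                'tokenizer': 'standard',
                'filter': [
                    'english_possessive_stemmer',
                    'lowercase',  # Built-in filter
                    'english_stop',
                    'english_stemmer',
                ],
            }
        },
    }
}
```

With the opensearch-py client, a dict like this would typically go under the `settings` key of the body passed to `client.indices.create`, alongside the mappings.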
```python
'mappings': {
    'properties': {
        'title': {'type': 'text', 'analyzer': 'english_custom'},
        'type': {'type': 'keyword'},
        'path': {'type': 'keyword'},
        'chunk_header': {'type': 'text', 'analyzer': 'english_custom'},
        'chunk': {'type': 'text', 'analyzer': 'english_custom'},
    }
}
```

When querying, we apply boosts to make some fields count more towards the relevance score. In this prototype, I arbitrarily boosted titles by 5x and chunk headers (i.e., top-level bullets) by 2x. Retrieval can be improved by tweaking these boosts as well as other features.
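As a sketch of how those boosts might be expressed: OpenSearch's `multi_match` query accepts a `field^N` suffix that multiplies that field's score by N. The helper name is hypothetical and the author's actual query may include other clauses; only the boost values come from the text above.

```python
def build_bm25_query(query: str, n_results: int = 10) -> dict:
    """Build a multi_match query body with field boosts (hypothetical helper)."""
    return {
        'size': n_results,  # Number of hits to return
        'query': {
            'multi_match': {
                'query': query,
                # field^N multiplies that field's relevance score by N
                'fields': ['title^5', 'chunk_header^2', 'chunk'],
            }
        },
    }


print(build_bm25_query('weekly review', n_results=5))
```

The returned dict would serve as the request body for OpenSearch's search API; tuning the boosts then becomes a one-line change.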
For semantic search, we start by picking an embedding model. I referred to the Massive Text Embedding Benchmark Leaderboard, sorted it in descending order of retrieval score, and picked a model that had a good balance of embedding dimension and score.
This led me to e5-small-v2. Currently, it’s ranked a respectable 7th, right below text-embedding-ada-002. What’s impressive is its embedding size of 384, which is far smaller than that of most models (768 to 1,536). And while it supports a maximum sequence length of only 512 tokens, this is sufficient given my shorter chunks. (More details in the paper Text Embeddings by Weakly-Supervised Contrastive Pre-training.) After embedding these documents, we store them in a numpy array.
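The query code in this section relies on an `average_pool` helper and PyTorch's `F.normalize`. To show what those two steps do to document embeddings before they land in the numpy array, here is a numpy-only sketch with toy arrays standing in for e5-small-v2 outputs (real ones would be 384-dimensional, and each passage would carry the “passage: ” prefix). `mean_pool` and `l2_normalize` are hypothetical names.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)  # (batch, dim)
    counts = mask.sum(axis=1)  # (batch, 1): number of real tokens per passage
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Toy stand-in for model outputs: 2 passages, 3 tokens each, 4-dim embeddings
hidden = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # First passage has one padding token
doc_embeddings_array = l2_normalize(mean_pool(hidden, mask))
print(doc_embeddings_array.shape)  # (2, 4)
```

Normalizing once at indexing time is what lets the query step use a plain dot product as cosine similarity.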
During query time, we tokenize and embed the query, do a dot product with the document embedding array, and take the top n results (in this case, 10).
```python
import numpy as np
import torch.nn.functional as F


def average_pool(last_hidden_states, attention_mask):
    # Mean-pool token embeddings while ignoring padding (as in the e5 model card)
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


def query_semantic(query, tokenizer, model, doc_embeddings_array, n_results=10):
    query_tokenized = tokenizer(f'query: {query}', max_length=512, padding=False,
                                truncation=True, return_tensors='pt')
    outputs = model(**query_tokenized)
    query_embedding = average_pool(outputs.last_hidden_state, query_tokenized['attention_mask'])
    query_embedding = F.normalize(query_embedding, p=2, dim=1).detach().numpy()

    # Embeddings are L2-normalized, so the dot product equals cosine similarity
    cos_sims = np.dot(doc_embeddings_array, query_embedding.T)
    cos_sims = cos_sims.flatten()

    top_indices = np.argsort(cos_sims)[-n_results:][::-1]  # Highest similarity first

    return top_indices
```
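One design note on the last step: `np.argsort` sorts all N similarities, which is fine for a personal vault. If the document array grows large, a common alternative (not what the post uses) is `np.argpartition`, which selects the top n in linear time and then sorts only those n. `top_n_indices` is a hypothetical helper; ties may come back in a different order than with `argsort`.

```python
import numpy as np


def top_n_indices(cos_sims: np.ndarray, n_results: int = 10) -> np.ndarray:
    """Return indices of the n highest scores, best first, without a full sort."""
    n = min(n_results, cos_sims.size)
    if n <= 0:
        return np.array([], dtype=np.intp)
    # argpartition moves the n largest scores into the last n slots (unordered)
    candidates = np.argpartition(cos_sims, -n)[-n:]
    # Sort only those n candidates, highest score first
    return candidates[np.argsort(cos_sims[candidates])[::-1]]


scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
print(top_n_indices(scores, n_results=3))  # [1 3 2]
```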
If you’re thinking of using the e5 models, remember to add the necessary prefixes during preprocessing: prefix documents with “passage: ” and queries with “query: ”.
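Because a missing prefix silently degrades retrieval rather than raising an error, it can help to centralize the prefixes in one place. A minimal sketch; `add_e5_prefix` is a hypothetical helper, not part of any library.

```python
def add_e5_prefix(texts: list[str], kind: str) -> list[str]:
    """Prepend the prefix e5 models expect: 'passage: ' for docs, 'query: ' for queries."""
    prefixes = {'passage': 'passage: ', 'query': 'query: '}
    if kind not in prefixes:
        raise ValueError(f"kind must be 'passage' or 'query', got {kind!r}")
    return [prefixes[kind] + t for t in texts]


print(add_e5_prefix(['Weekly review notes'], 'passage'))  # ['passage: Weekly review notes']
```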
The retrieval service is a FastAPI app. Given an input query, it performs both BM25 and semantic search, deduplicates the results, and returns the documents’ text and associated title. The latter is used to link source documents when generating the draft.
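The post doesn't specify how results are merged or deduplicated. A minimal sketch, assuming BM25 hits are listed before semantic hits and that two hits are duplicates when they share the same `path` and `chunk` text (field names borrowed from the index mapping); `merge_results` is hypothetical.

```python
def merge_results(bm25_hits: list[dict], semantic_hits: list[dict]) -> list[dict]:
    """Combine two ranked hit lists, keeping the first occurrence of each chunk."""
    seen = set()
    merged = []
    for hit in bm25_hits + semantic_hits:
        key = (hit['path'], hit['chunk'])  # Assumed dedup key: same file, same text
        if key not in seen:
            seen.add(key)
            # Return the chunk text plus its title, used to link source documents
            merged.append({'title': hit['title'], 'chunk': hit['chunk']})
    return merged


bm25 = [{'title': 'Weekly review', 'path': 'a.md', 'chunk': '- Shipped v1'}]
semantic = [{'title': 'Weekly review', 'path': 'a.md', 'chunk': '- Shipped v1'},
            {'title': 'Ideas', 'path': 'b.md', 'chunk': '- Try BM25 boosts'}]
print(merge_results(bm25, semantic))
```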
To start the OpenSearch node and the semantic search + FastAPI server, we use a simple docker-compose file. Each runs in its own container, bridged by a common network. For convenience, we also define common commands in a Makefile.
Finally, we integrate with Obsidian via a TypeScript plugin. The obsidian-plugin-sample made it easy to get started and I added functions to display retrieved documents in a new tab, query APIs, and stream the output. (I’m new to TypeScript so feedback appreciated!)
What else can we apply this to?
While this prototype uses local notes and journal entries, it’s not a stretch to imagine the copilot retrieving from other documents (online). For example, team documents such as product requirements and technical design docs, internal wikis, and even code. I’d guess that’s what Microsoft, Atlassian, and Notion are working on right now.
It also extends beyond personal productivity. Within my field of recommendations and search, researchers and practitioners are excited about layering LLM-based generation on top of existing systems and products to improve the customer experience. (I expect we’ll see some of this in production by the end of the year.)
Ideas for improvement
One idea is to try LLMs with larger context sizes that allow us to feed in entire documents instead of chunks. (This may help with retrieval recall but puts more onus on the LLM to identify the relevant context for generation.) Currently, I’m using gpt-3.5-turbo, which strikes a good balance of speed and cost. Nonetheless, I’m excited to try claude-1.3-100k and provide entire documents as context.
Another idea is to augment retrieval with web or internal search when necessary. For example, when documents and notes go stale (e.g., based on last updated timestamp), we can look up the web or internal documents for more recent information.
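As a rough sketch of such a staleness check, assuming the file's modification time serves as the “last updated” timestamp: `is_stale` and the 180-day threshold are hypothetical, and a stale note could then trigger the web or internal lookup.

```python
import os
import tempfile
import time


def is_stale(path: str, max_age_days: float = 180.0) -> bool:
    """Return True if the file was last modified more than max_age_days ago."""
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_days * 24 * 60 * 60


# Usage: a freshly created file is not stale; backdating its mtime makes it stale
with tempfile.NamedTemporaryFile(suffix='.md', delete=False) as f:
    note_path = f.name
fresh = is_stale(note_path)
year_ago = time.time() - 365 * 24 * 60 * 60
os.utime(note_path, (year_ago, year_ago))  # Set (atime, mtime) to a year ago
backdated = is_stale(note_path)
os.remove(note_path)
print(fresh, backdated)  # False True
```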
• • •
Here’s the GitHub repo if you’re keen to try. Start by cloning the repo and updating the paths to your Obsidian vault and Hugging Face Hub cache. The latter saves you from downloading the tokenizer and model each time you start the containers.
```shell
git clone https://github.com/eugeneyan/obsidian-copilot.git
```

Open the Makefile and update the following paths:

```makefile
export OBSIDIAN_PATH = /Users/eugene/obsidian-vault/
export TRANSFORMER_CACHE = /Users/eugene/.cache/huggingface/hub
```
Then, build the image and indices before starting the retrieval app.
```shell
# Build the docker image
make build

# Start the opensearch container and wait for it to start.
# You should see something like this: [c6587bf83572] Node 'c6587bf83572' initialized
make opensearch

# In ANOTHER terminal, build your artifacts (this can take a while)
make build-artifacts

# Start the app. You should see this: Uvicorn running on http://0.0.0.0:8000
make run
```
Finally, install the copilot plugin, enable it in community plugin settings, and update the API key. You’ll have to restart your Obsidian app if you had it open before installation.
```shell
make install-plugin
```
If you tried it, I would love to hear how it went, especially where it didn’t work well and how it can be improved. Or if you’ve been working with retrieval-augmented generation, I’d love to hear about your experience so far!
If you found this useful, please cite this write-up as:
Yan, Ziyou. (Jun 2023). Obsidian-Copilot: An Assistant for Writing & Reflecting. eugeneyan.com. https://eugeneyan.com/writing/obsidian-copilot/.
or
```bibtex
@article{yan2023copilot,
  title   = {Obsidian-Copilot: An Assistant for Writing & Reflecting},
  author  = {Yan, Ziyou},
  journal = {eugeneyan.com},
  year    = {2023},
  month   = {Jun},
  url     = {https://eugeneyan.com/writing/obsidian-copilot/}
}
```