Simon (sfarshid) and I spend a lot of time on GitHub. As data nerds, we put together a quick tool to explore your repository’s data.
How it works:
- Data Loading: We use dlt to pull data (issues, PRs, commits, stars) from GitHub
- Semantic Layer: Relta wraps the underlying dataset into a semantic layer so the LLM doesn’t hallucinate.
- Text-to-SQL: A text-to-SQL agent transforms your plain-English question into a query using the semantic layer
- Generative Charts: assistant-ui dynamically generates a chart based on the SQL query
- Refinements: If the semantic layer can’t handle your question, our agent submits semantic layer improvements via pull requests
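To make the guardrail idea concrete, here is a minimal sketch in Python. It is not Relta's API or implementation: the `SEMANTIC_LAYER` dict, the metric names, the `answer` helper, and the demo `issues` table are all hypothetical. It assumes the text-to-SQL agent has already mapped the question to a metric name, and it only shows the behavior described above: run vetted SQL when the layer covers the question, otherwise refuse and propose an improvement instead of guessing.

```python
import sqlite3

# Hypothetical semantic layer: named metrics mapped to vetted SQL.
# (Illustrative only; this is not how Relta stores its definitions.)
SEMANTIC_LAYER = {
    "open_issue_count": "SELECT COUNT(*) AS n FROM issues WHERE state = 'open'",
    "issues_per_author": (
        "SELECT author, COUNT(*) AS n FROM issues "
        "GROUP BY author ORDER BY n DESC, author"
    ),
}


def answer(conn, metric):
    """Run a metric only if the semantic layer defines it; otherwise refuse."""
    sql = SEMANTIC_LAYER.get(metric)
    if sql is None:
        # No free-form SQL is generated: report the gap so the layer can be
        # extended (in the real tool this would become a pull request).
        return {"ok": False, "suggestion": f"add metric '{metric}' to the semantic layer"}
    return {"ok": True, "rows": conn.execute(sql).fetchall()}


def demo_connection():
    """Build a tiny in-memory dataset standing in for the loaded GitHub data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issues (id INTEGER, author TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO issues VALUES (?, ?, ?)",
        [(1, "alice", "open"), (2, "bob", "closed"), (3, "alice", "closed")],
    )
    return conn
```

Calling `answer(demo_connection(), "top_contributors")` returns a refusal with a suggestion rather than an invented query, which is the trade-off the semantic layer makes: fewer answerable questions, but no made-up ones.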
Hosted version: https://github-assistant.com
Demo Video: https://youtu.be/ATaf98nID5c
Check out the repo + hosted version and let us know what you think.
Is there any information you can get out of this that isn't already available in the GitHub.com UI? I tried asking things like "What could the most interesting information you can tell me about this repository?" but it seems like most of the data is already available in the UI in pretty much the same format, except you click a link to see it instead of writing a question and waiting for a reply.
Same thoughts
We pull data from the GitHub API, which includes data that is not available on GitHub.com pages. Currently only PR, issue, commit, and star data is loaded. You can also read more here: https://medium.com/relta/github-assistant-49ae388ad758
New data from the GraphQL API will be added over time. We would love your feedback on which data you'd like to see added: https://docs.github.com/en/graphql
Maybe a better question: What questions could be answered with your service, that could not be answered with just cURL + Git + the GitHub API?
Great question! The purpose of github-assistant is to showcase the technologies that make it easy to build a tool/feature like this, not necessarily for it to be a stand-alone service. With dlt/Relta/LangGraph/assistant-ui we spun this up in about 10 days. For example:
- The GitHub GraphQL API limits queries to 100 items at a time and has pretty opaque secondary rate limits. Building this with cURL would take effort. dlt handles all this complexity and sets up a robust pipeline by providing a connector to the GitHub API.
- Creating semantic layers manually from a relational dataset and leveraging them in a text-to-SQL pipeline to prevent hallucinations (similar to those we highlighted in our Medium post) would take lots of manual effort, which Relta streamlines.
- Creating a chat front-end with charts was made easy by assistant-ui.
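To give a sense of the first point, this is roughly the cursor-pagination loop you would have to write by hand against the GraphQL API. It is a minimal sketch, not what dlt does internally: `run_query` is a hypothetical callable you would supply (it should POST the query and variables to the GitHub GraphQL endpoint and return the `data` part of the response), and the sketch does not handle secondary rate limits or retries at all.

```python
from typing import Any, Callable, Dict, Iterator, Optional

PAGE_SIZE = 100  # the per-request cap mentioned above

QUERY = """
query($owner: String!, $name: String!, $first: Int!, $after: String) {
  repository(owner: $owner, name: $name) {
    issues(first: $first, after: $after) {
      nodes { number title }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""

# Hypothetical signature: (query, variables) -> the "data" dict of the response.
RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_issues(run_query: RunQuery, owner: str, name: str) -> Iterator[Dict[str, Any]]:
    """Yield every issue of a repository, fetching one page of up to 100 at a time."""
    cursor: Optional[str] = None
    while True:
        variables = {"owner": owner, "name": name, "first": PAGE_SIZE, "after": cursor}
        issues = run_query(QUERY, variables)["repository"]["issues"]
        yield from issues["nodes"]
        page_info = issues["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        # Resume after the last item of the page we just read.
        cursor = page_info["endCursor"]
```

Even this simple version needs one round trip per 100 items, and a production pipeline still needs backoff, incremental loading, and schema handling on top of it.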
Hope this makes sense.
I really like the UI, great work, I have to dig through the code :) thanks for sharing!
Pretty nifty, is Relta going to be OSS as well?
Yes, in the future. We already share the source code in both commercial and non-commercial engagements. Drop me a line at amir [at] relta.dev if interested.
I am building an AI Slack Moderator bot [0] as a side project. I was thinking that this could be a cool intermediate layer to allow a user to ask questions about moderation logs. However I am not ready to build this for now. Feel free to add me to an email list for people who want to know when you OSS it down the line.
[0] - https://popsia.com
Tried adding a repo I work on, import worked after failing first, but then the query result was that there was no data on top contributors.
Put the video in the Readme!
Hi -- strange that didn't work. Overall, the semantic layer is designed to provide very tight guardrails and not hallucinate. You can see the agent suggest changes to the semantic layer if you give the produced answer a thumbs down.
The idea is for the system to provide answers that have close to 100% accuracy, while making it a single click for developers to improve the semantic layer.
Was able to reproduce and pushed an update. Thanks for calling this out.
Just updated the README, thanks for the suggestion!
Can this help in explaining how the code works, its schematics, or the HLD of a given GitHub repo?
No, this currently only answers questions using data from the GitHub GraphQL API.