
By alx-net, 5 days ago

Hey HN, I am Alex. I am open sourcing Data Studio, a lightweight data exploration IDE in your browser that runs locally.

Try it: https://local.dataspren.com (no account needed, runs locally)

More information: https://github.com/dataspren-analytics/data-studio

I love working with data (Postgres, SQL, DuckDB, DBT, Iceberg, ...). I always wanted a data exploration tool that runs in my browser and just works, without any infra or privacy concerns (DuckDB UI came quite close).

Features:

  - Data Notebooks
    - SQL cells work like DBT models (they materialize to views)
    - Use Python functions inside of SQL queries
    - Use DB views directly in Python as dataframes
  - Transform Excel files with SQL
  - You can open .parquet, .csv, .xlsx, .json files nicely formatted

If you like what you see, you can support me with a star on GitHub.

Happy to hear your feedback <3

URL: github.com

1 comment

By cxr, an hour ago

Neat. Some things I noticed:

1. Dragging and dropping a CSV (from the system file manager) onto the "Upload Data" button doesn't do anything. This is something that a vanilla <input type="file"> does for free. In reality, someone should be able to drag and drop a file anywhere in the window to "upload" it. PS: Don't use the word "upload" since one of your selling points is that this is _not_ using cloud storage.
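To make point 1 concrete, here is a minimal sketch of accepting drops anywhere in the window. Everything in it is an assumption for illustration: `acceptDropsAnywhere`, `pickSupportedFiles`, and the small `*Like` interfaces are hypothetical names, not anything from the Data Studio codebase. The extension list is taken from the formats the post says it can open.

```typescript
// Extensions the post says Data Studio can open.
const SUPPORTED_EXTENSIONS = [".parquet", ".csv", ".xlsx", ".json"];

// Minimal structural types so the sketch does not depend on DOM typings.
interface FileLike {
  name: string;
}
interface DropEventLike {
  preventDefault(): void;
  dataTransfer?: { files: ArrayLike<FileLike> } | null;
}
interface DropTargetLike {
  addEventListener(type: string, listener: (event: DropEventLike) => void): void;
}

// Keep only files whose name ends in a supported extension (case-insensitive).
function pickSupportedFiles(files: ArrayLike<FileLike>): FileLike[] {
  return Array.from(files).filter((file) => {
    const name = file.name.toLowerCase();
    return SUPPORTED_EXTENSIONS.some((ext) => name.endsWith(ext));
  });
}

// Register drag-and-drop handlers on a target (in a browser, typically `window`).
function acceptDropsAnywhere(
  target: DropTargetLike,
  onFiles: (files: FileLike[]) => void,
): void {
  // Cancelling "dragover" marks the target as a valid drop zone.
  target.addEventListener("dragover", (event) => event.preventDefault());
  target.addEventListener("drop", (event) => {
    // Stop the browser from opening the dropped file itself.
    event.preventDefault();
    const files = pickSupportedFiles(event.dataTransfer?.files ?? []);
    if (files.length > 0) onFiles(files);
  });
}
```

In a page this would be wired up as something like `acceptDropsAnywhere(window, openLocally)`, where `openLocally` is whatever the app already does with files picked through the button.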

2. Let people use cloud storage if they want. Please, please use remoteStorage if/when you do. (See <http://remotestorage.org/>)

3. If I try to open a second tab, I get a message "DataStudio uses the Origin Private File System for local data processing, which only supports a single active session. Please close the other tab". You should do whatever you can to mitigate this, incl. not using those APIs unless you absolutely have to. (In this case, where all I've done is visit the landing page and open a CSV to see how it looks, you don't have to.)

4. Compiling to WASM is cool and all, but in the aforementioned case where I open a CSV and then click it, what ends up happening is I get a "Loading runtime..." message and a spinner for a really long time (tens of seconds) before the data appears. Again: you should do whatever you can to mitigate this (incl. not "loading the runtime" unless you absolutely have to—and, again, this is not a case where you absolutely have to).
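One way to read point 4 is "start the runtime on first real use, not on page load". A minimal sketch of that idea follows, assuming the expensive start-up can be wrapped in a single async function. `lazy`, `Runtime`, `loadRuntime`, and `runSql` are hypothetical names, and the loader body is a stand-in rather than how Data Studio actually boots its WASM runtime.

```typescript
// Wrap an expensive async loader so it runs at most once, on first use.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((err: unknown) => {
        // Forget a failed attempt so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical runtime shape; the real one is not described in the thread.
interface Runtime {
  query(sql: string): Promise<unknown[]>;
}

// Stand-in for the slow "Loading runtime..." step.
async function loadRuntime(): Promise<Runtime> {
  return { query: async () => [] };
}

const getRuntime = lazy(loadRuntime);

// Only code paths that truly need the runtime (e.g. running SQL) await it.
async function runSql(sql: string): Promise<unknown[]> {
  const runtime = await getRuntime();
  return runtime.query(sql);
}
```

With this shape, simply previewing a CSV never calls `getRuntime()`, so the spinner would only appear once the user runs a query or a notebook cell.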

5. There's a "Reset runtime" button in the top right. This suggests a fundamental problem somewhere fairly deep down; this button shouldn't exist.

6. When I open a three-column CSV in a half-width window and then resize my browser to take up the full screen, the data cells keep the same auto-sized column widths as before, but the data table itself expands to fill the extra space, and the column headers spread out to share it. So now I have column data appearing beneath an unrelated column header.

7. There's evidently no splitter to grab to resize the columns manually. This is strange and unexpected.

8. Eventually you will probably decide that it's a good idea to be able to export data in a portable format that isn't just JSON or Parquet (the two options currently available from the context menu in the file explorer sidebar). I strongly urge you to consider choosing semi-self-contained HTML as that format. The value proposition is pretty clear: someone should be able to edit/munge the data in their browser at local.dataspren.com, then export it to a local file that they can email to someone else, and that someone-else should be able to double-click the saved attachment in order to open up the file on their own machine in their own browser to at least look (and maybe even poke) at it. It would be a very good idea to have an "Open in Data Studio" button somewhere in this file. (You can also use the "@render" trick for exported JSON: make sure you escape all the angle brackets in the data (\u003c is good enough), and make the root container an object (not an array) with a "@render" key as the very first property which has a value of "<script src='https://cdn.example.com/datastudio/v0.whatever.js'></script>" stub or whatever, and make sure to export the file with a .json.html extension. This means that the raw contents of the file remain valid JSON, but when treated by a Web browser as HTML (because of the .html extension), then the script in the @render stub will have a chance to load, giving it the ability to "hydrate" the data with rich controls, incl. a very similar "Open in Data Studio" CTA button.)
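The "@render" trick in point 8 can be sketched roughly as below. This is an illustration under assumptions, not an existing exporter: `toRenderableJson` and `exportFileName` are hypothetical names, the `rows` key is an arbitrary choice for where the data lives, and the script URL is whatever hydration script the project would eventually host. The data is serialized and escaped separately from the stub so that only the stub keeps literal angle brackets.

```typescript
// Build the text of a ".json.html" export: valid JSON whose first property,
// "@render", holds a script stub that a browser will load when the file is
// opened as HTML.
function toRenderableJson(rows: unknown[], scriptUrl: string): string {
  // Refuse URLs that could break out of the single-quoted src attribute.
  if (/['"<>\s]/.test(scriptUrl)) {
    throw new Error("scriptUrl contains characters that are unsafe in the stub");
  }
  const stub = `<script src='${scriptUrl}'></script>`;

  // Escape every angle bracket in the serialized data so none of it can be
  // parsed as markup. In JSON text, "<" and ">" only occur inside strings,
  // so the \u003c / \u003e escapes leave the JSON valid.
  const data = JSON.stringify(rows)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e");

  // Assemble by hand so "@render" is guaranteed to come first and the stub
  // itself is not escaped.
  return `{"@render":${JSON.stringify(stub)},"rows":${data}}`;
}

// The double extension lets the same bytes be read as JSON or opened as HTML.
function exportFileName(baseName: string): string {
  return `${baseName}.json.html`;
}
```

A consumer that only wants the data can still `JSON.parse` the file and read `rows`, ignoring the `@render` property entirely.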

9. Switching back and forth to/from a three-column CSV with fewer than 3,000 rows takes a noticeable amount of time for the data to show up. (This is after the whole "Loading runtime" step. A spinner shows up and stays there for about a second.) And then scrolling all the way to the bottom reveals that (a) this is a virtualized list where there are no more than 26 rows on my screen at a time, and (b) even then the table is limited to "Showing 500 of 2,xxx rows". Web browsers are fast. It shouldn't take anywhere near this long even to display the whole table, let alone a table with virtualized rows. (Hint: All the tree-/listviews in the Firefox (and Thunderbird) UI are implemented as Web Components; steal that code.)

Show HN: Data Studio – Open-Source Data Notebooks