Setup Guide: Create a data catalog to display and share your GitHub datasets with PortalJS
Luccas Mateus
The github-backed example added to PortalJS gives users an easy way to set up a data catalog for displaying and sharing data stored in GitHub repositories. With this example, users can quickly set up a web-based portal that showcases their data and makes it accessible to others, all through the configuration of a simple datasets.json file.
Demo
To get a feel for the project, users can check the live deployment.
Below are some screenshots:
Front page
Individual dataset page
How to use this example as a template
Tip: You can also create a new project by clicking on the "Deploy" button below. Vercel will clone the example into a new repo under your user or organization on GitHub and set up a deployment for it.
Then, you can clone the new repo on your machine and start editing it.
Create a new app with create-next-app
Run the following commands:
npx create-next-app <app-name> --example https://github.com/datopian/datahub/tree/main/examples/github-backed-catalog
cd <app-name>
Set up a GitHub token
This project uses the GitHub API, which caps anonymous users at 60 requests per hour, so you might want to get a Personal Access Token and add it to a .env file inside the project folder, like so:
GITHUB_PAT=<github token>
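To illustrate why the token matters, here is a minimal TypeScript sketch of how a token could be attached to GitHub API requests. The helper name githubHeaders is hypothetical and is not taken from the example's source code; the way the example actually reads GITHUB_PAT may differ.

```typescript
// Hypothetical helper (not part of the example's source):
// builds request headers for the GitHub REST API.
// With a token the request is authenticated and gets a higher rate limit;
// without one the request is anonymous.
function githubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
  };
  if (token !== undefined && token.trim() !== "") {
    headers["Authorization"] = `Bearer ${token.trim()}`;
  }
  return headers;
}
```

In a Next.js app, a helper like this would typically be called on the server side as githubHeaders(process.env.GITHUB_PAT), so the token never reaches the browser.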
Set up the datasets list
The datasets.json file serves as the list of datasets that should appear in your data portal. Edit this file to your liking. Some examples can be found inside this repo:
Run the app
Run the app by executing the following command:
npm run dev
Congratulations, your new app is now running at http://localhost:3000.
Deployment
By clicking on this button, you will be redirected to a page which will allow you to clone the example into your own GitHub/GitLab/BitBucket account and automatically deploy it.
GitHub token
You have to set up GITHUB_PAT as an environment variable on Vercel. To do that, go to the project's page on Vercel, click "Settings", look for "Environment Variables" and create a new environment variable. Refer back to the GitHub token setup section above if you are not sure how to create a GitHub token.
Editing the new deployment
You can now clone the new repo on your machine and start changing it. Simply follow the "How to use this example as a template" section, skipping the first step.
Tip: Note that whenever you push changes to the new repo, they are automatically deployed by Vercel.
Structure of datasets.json
The datasets.json file is simply a list of datasets. Below is a minimal example of a dataset entry:
{
  "owner": "fivethirtyeight",
  "repo": "data",
  "branch": "master",
  "files": ["nba-raptor/historical_RAPTOR_by_player.csv", "nba-raptor/historical_RAPTOR_by_team.csv"],
  "readme": "nba-raptor/README.md"
}
It has:
- An owner, which is the GitHub repo owner
- A repo, which is the GitHub repo name
- A branch, which is the branch from which the files and the readme are fetched
- A files list, which is a list of paths to the files that you want to show to the world
- A readme, which is the path to your data description; it can also be a subpath, e.g. example/README.md
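The required fields listed above can be sketched as a TypeScript type with a small check. This is an illustrative sketch only: DatasetEntry, missingFields and rawFileUrls are hypothetical names, and resolving files through raw.githubusercontent.com is an assumption; the example may fetch files differently.

```typescript
// Shape of one datasets.json entry, based on the required fields described above.
interface DatasetEntry {
  owner: string;
  repo: string;
  branch: string;
  files: string[];
  readme: string;
}

// Hypothetical check: returns the names of required fields
// that are missing or have the wrong type.
function missingFields(entry: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of ["owner", "repo", "branch", "readme"]) {
    const value = entry[key];
    if (typeof value !== "string" || value === "") {
      problems.push(key);
    }
  }
  const files = entry["files"];
  if (!Array.isArray(files) || !files.every((f) => typeof f === "string")) {
    problems.push("files");
  }
  return problems;
}

// Assumption: each file is reachable through raw.githubusercontent.com
// at <owner>/<repo>/<branch>/<path>.
function rawFileUrls(entry: DatasetEntry): string[] {
  return entry.files.map(
    (path) =>
      `https://raw.githubusercontent.com/${entry.owner}/${entry.repo}/${entry.branch}/${path}`
  );
}
```

Running a check like this over every entry before starting the app can catch a misspelled key early, instead of finding out from a failed request.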
You can also add:
- A description, which is useful if you have more than one dataset per repo; if not provided, the repo description is used
- A Name, which is useful if you want to give your dataset a nice name; if not provided, the name is built by joining the owner, the repo and the path of the README (without the file name); in the example above it would be fivethirtyeight/data/nba-raptor
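The fallback name described above can be sketched in TypeScript as follows. The function defaultDatasetName is a hypothetical illustration, not the example's actual code. In particular, its behavior for a README at the repo root (no folder) is an assumption.

```typescript
// Hypothetical helper mirroring the fallback described above:
// owner + repo + the README's folder, joined with "/".
function defaultDatasetName(owner: string, repo: string, readme: string): string {
  const lastSlash = readme.lastIndexOf("/");
  // Assumption: a README at the repo root contributes no path segment.
  const readmeDir = lastSlash === -1 ? "" : readme.slice(0, lastSlash);
  return [owner, repo, readmeDir].filter((part) => part !== "").join("/");
}
```

For the minimal entry shown earlier, defaultDatasetName("fivethirtyeight", "data", "nba-raptor/README.md") yields fivethirtyeight/data/nba-raptor, matching the name given in the text.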
Extra commands
You can also build the project for production with
npm run build
And run using the production build like so:
npm run start