This post was originally published on the Datopian blog.
Storing metadata might seem like a backstage operation, but it is pivotal. For this purpose, we chose Frictionless Data Packages, each housed as a repository in the os-data GitHub organization. Frictionless Data Packages offer a simple but powerful format for cataloging and packaging a collection of data; in our case, that is primarily tabular data. They are more than storage bins: they support the FAIR principles, helping make data Findable, Accessible, Interoperable, and Reusable. This makes them well suited to publishing datasets that are meant to be both openly accessible and highly usable. Learn more in the official Frictionless Data documentation.
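For readers new to the format, here is a minimal TypeScript sketch that models a `datapackage.json` descriptor and runs a few basic checks against it. It covers only a small subset of the Data Package specification: a package lists its `resources`, each resource has a `name` and either a `path` or inline `data`, and tabular resources can carry a Table Schema. The type names, the `checkDescriptor` helper, and the example dataset are hypothetical illustrations, not code or data from the os-data repositories.

```typescript
// A small subset of the Frictionless Data Package descriptor (datapackage.json).
// The full specification defines many more properties than are modelled here.
interface Field {
  name: string;
  type?: string; // e.g. "string", "number", "date"
}

interface Resource {
  name: string;
  path?: string | string[]; // location of the data file(s)
  data?: unknown; // or the data itself, inline
  schema?: { fields: Field[] }; // Table Schema describing tabular data
}

interface DataPackage {
  name?: string;
  title?: string;
  licenses?: { name?: string; path?: string }[];
  resources: Resource[];
}

// Basic structural checks; returns a list of problems (empty means they passed).
// This is a quick sanity check, not a substitute for a full spec validator.
function checkDescriptor(pkg: DataPackage): string[] {
  if (!Array.isArray(pkg.resources) || pkg.resources.length === 0) {
    return ["a data package must contain at least one resource"];
  }
  const problems: string[] = [];
  pkg.resources.forEach((res, i) => {
    if (!res.name) problems.push(`resource ${i} is missing a name`);
    if (res.path === undefined && res.data === undefined) {
      problems.push(`resource ${i} needs either "path" or "data"`);
    }
    // Every Table Schema field needs a name.
    res.schema?.fields.forEach((field, j) => {
      if (!field.name) problems.push(`resource ${i}, field ${j} is missing a name`);
    });
  });
  return problems;
}

// Hypothetical descriptor for a budget dataset (not an actual os-data package).
const example: DataPackage = {
  name: "example-city-budget-2023",
  title: "Example City Budget 2023",
  resources: [
    {
      name: "budget",
      path: "data/budget.csv",
      schema: {
        fields: [
          { name: "department", type: "string" },
          { name: "amount", type: "number" },
        ],
      },
    },
  ],
};

console.log(checkDescriptor(example)); // []
```

Returning a list of problems rather than throwing on the first one makes it easier to report everything wrong with a descriptor in a single pass.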
Can you imagine manually gathering the metadata for each dataset from multiple GitHub repositories? Sounds tedious, right? That's why we used Octokit, a GitHub API client for Node.js. It does the heavy lifting by automating metadata retrieval for us. To learn more about Octokit, see its GitHub repository. To explore the datasets we've been working on, take a look at OpenSpending Datasets.
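As an illustration of the retrieval step, here is a minimal TypeScript sketch, assuming each repository keeps its descriptor in a `datapackage.json` file at the repository root. It is not the project's actual code. To keep it self-contained and testable, the function accepts any object with a `getContent` method shaped like Octokit's `octokit.rest.repos.getContent`; `ReposApi` and `fetchMetadata` are hypothetical names introduced for this sketch. Requesting `mediaType: { format: "raw" }` asks GitHub to return the file body directly instead of base64-encoded content.

```typescript
// Narrow, structural view of the one Octokit method this sketch uses (an assumption:
// the real `octokit.rest.repos` object should satisfy it, and a stub can stand in for tests).
interface ReposApi {
  getContent(params: {
    owner: string;
    repo: string;
    path: string;
    mediaType?: { format?: string };
  }): Promise<{ data: unknown }>;
}

// Fetch and parse datapackage.json from each repository in parallel.
// Repositories without the file (HTTP 404) are skipped; other errors are re-thrown.
async function fetchMetadata(
  repos: ReposApi,
  owner: string,
  repoNames: string[],
): Promise<Map<string, unknown>> {
  const results = await Promise.all(
    repoNames.map(async (repo) => {
      try {
        const { data } = await repos.getContent({
          owner,
          repo,
          path: "datapackage.json",
          // Ask for the raw file body rather than base64-encoded JSON metadata.
          mediaType: { format: "raw" },
        });
        if (typeof data !== "string") {
          throw new Error(`${repo}: expected raw file contents`);
        }
        return [repo, JSON.parse(data)] as const;
      } catch (err) {
        const status =
          typeof err === "object" && err !== null
            ? (err as { status?: unknown }).status
            : undefined;
        if (status === 404) return null; // no datapackage.json in this repo
        throw err;
      }
    }),
  );

  // Collect the successful lookups into a repo-name -> descriptor map.
  const metadata = new Map<string, unknown>();
  for (const entry of results) {
    if (entry) metadata.set(entry[0], entry[1]);
  }
  return metadata;
}
```

With the `octokit` package, the real client could be passed in as `octokit.rest.repos`, and the repository names could come from `octokit.paginate(octokit.rest.repos.listForOrg, { org: "os-data" })`. Firing all requests at once with `Promise.all` is fine for a modest number of repositories; for a large organization you may want to limit concurrency to stay within GitHub's rate limits.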
For data storage, we chose Cloudflare R2 for its combination of speed and reliability. R2 lets developers securely store large amounts of blob data without paying the egress bandwidth fees that typical cloud storage services charge. For a thorough introduction to Cloudflare R2, see their blog post.
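Because R2 exposes an S3-compatible API, one common way to reach it from Node.js is through an S3 client such as `S3Client` from `@aws-sdk/client-s3`, pointed at an account-specific endpoint with the region set to `"auto"`. The sketch below is an assumption about how such a setup might look, not the project's code: it builds those client options plus a hypothetical object-key layout (`datasets/<dataset>/<file>`). `r2ClientConfig` and `datasetObjectKey` are illustrative names.

```typescript
// Options in the shape accepted by S3-compatible clients such as S3Client
// from @aws-sdk/client-s3 (R2 speaks the S3 API).
interface R2ClientConfig {
  region: string;
  endpoint: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

// Build client options for a Cloudflare R2 account: region "auto" and an
// account-specific endpoint.
function r2ClientConfig(
  accountId: string,
  accessKeyId: string,
  secretAccessKey: string,
): R2ClientConfig {
  if (!accountId || !accessKeyId || !secretAccessKey) {
    throw new Error("R2 account ID and API token credentials are required");
  }
  return {
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials: { accessKeyId, secretAccessKey },
  };
}

// Hypothetical key layout: one prefix per dataset, so the files of a single
// Data Package sit together in the bucket.
function datasetObjectKey(datasetName: string, filePath: string): string {
  if (!datasetName || datasetName.includes("/")) {
    throw new Error(`invalid dataset name: "${datasetName}"`);
  }
  const cleanPath = filePath.replace(/^\.?\/+/, ""); // drop a leading "./" or "/"
  return `datasets/${datasetName}/${cleanPath}`;
}
```

With the SDK installed, the options can be passed straight to the client, e.g. `new S3Client(r2ClientConfig(accountId, accessKeyId, secretAccessKey))`, after which objects can be written with `PutObjectCommand` using `Bucket`, `Key`, and `Body`. Reading the credentials from environment variables rather than hard-coding them is advisable.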
In closing, we invite you to explore the architecture and code behind this project; it's all openly available in our GitHub repository. To see the end result firsthand, visit openspending.org. If you run into any issues or have suggestions for improving the project, we welcome your contributions via our GitHub issues page. For real-time help and to connect with our community, join our Discord channel. Thank you for taking the time to read about our work! We look forward to fostering a collaborative environment where knowledge is freely shared and continually enriched. ♥️