The problem:
At my company, Conio, we use a well-known headless CMS for our support/FAQ page and our jobs page. We needed a way to back up all of its content, since the data lives with an external provider.
Since a daily S3 backup requires the pro plan, I looked into switching from the "light" to the "pro" version. The price would have jumped from a few euros to 400€/month. Since my company uses about 2% of the features available on the pro plan, I thought that building a custom system could be cheaper and cool 😀.
To build my own CMS backup system I needed:
1. an S3 bucket (or a similar place to store the data);
2. a script that handles the fetch-and-save flow;
3. a tool where I can trigger pipelines to automate everything.
To achieve these points I asked my CTO to create a new S3 bucket named "cms". The script for point 2 is relatively simple: it fetches all stories/folders from the CMS, saves them locally, and syncs those folders with the remote bucket. For point 3 I used a Bitbucket pipeline that runs the script every morning. If one day our CMS goes down for some reason, we can decide to upload all the saved entities to a new CMS database, maybe even on a different provider.
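For the scheduling part, a minimal bitbucket-pipelines.yml could look roughly like the sketch below. The step name, Node image and npm script are placeholders I made up for illustration, and the actual "every morning" trigger is configured in the repository's Pipelines schedules, not in the file itself:

pipelines:
  custom:
    backup-cms:                  # custom pipeline, triggered by a schedule
      - step:
          name: Backup CMS to S3
          image: node:20
          script:
            - npm ci
            - npm run backup     # the fetch-and-save script described above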
The solution (my solution):
Probably, in the real world, a company that uses a CMS has 100+ records of the same entity type. That is my case: my company has more than 100 records used for the FAQ page alone. In my scenario the script runs inside a pipeline where only Node.js is required, so I need to fetch all entities sequentially, for a few reasons:
The CMS may have a requests/second limit, and we need to handle that scenario.
Promise.all/Promise.allSettled can introduce problems (all requests fire at once) and I don't want that. Also, a CMS normally limits the maximum number of records per request, so we need pagination anyway.
I want a log for each request.
Here is the function to fetch entities. In the real world the famous CMS probably provides a JS client to simplify these operations.
export async function fetchAllStories(page: number = 1) {
  const res = await fetch(`https://famous-cms.com/?page=${page}&limit=100`);
  // Parse the JSON body; the exact shape depends on the CMS API
  return res.json();
}
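The iterator below reads res.data.stories, so here I'm assuming the parsed body looks roughly like this (purely illustrative; real CMS responses differ and often also include a "total" field):

// Assumed response shape, for illustration only
interface StoriesResponse<T> {
  data: {
    stories: Array<T>;
  };
}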
With this function we can build an iterator that cycles through all stories in blocks of 100, which is the maximum number of records my CMS returns per call. If a call returns fewer than 100 records (say 99), we know there is nothing left to fetch; if it returns exactly 100, there may be more (with 101 total stories, the next call would return just 1). Other CMSes may also add a "total" field to the response, in which case the logic should be adapted a little. Our CMS handles the requests/second limit by itself: when I hit the limit, the next API call is automatically delayed by 3 seconds. This behavior may not be handled the same way by other providers, in which case we can intentionally add an await delay() between calls (see the commented line in the generator below).
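The delay() helper itself is not defined anywhere in the post; a minimal sketch could be:

// Minimal helper: resolves after the given number of milliseconds
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}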
export async function* pageThroughStories<T>(): AsyncGenerator<Array<T>> {
  async function* makeRequest(page: number): AsyncGenerator<Array<T>> {
    const res = await fetchAllStories(page);
    const stories: Array<T> = res.data.stories;
    yield stories;
    // A full page (100 records) means there may be more stories to fetch
    if (stories.length === 100) {
      // await delay(3_000); // uncomment if your CMS does not throttle for you
      yield* makeRequest(page + 1);
    }
  }
  yield* makeRequest(1);
}
With this iterator we can do whatever we want. In my scenario, for example, I can upload a batch of up to 100 entities to my S3 bucket on each iteration.
interface MyStory {
  uuid: string;
  title: string;
}

for await (const stories of pageThroughStories<MyStory>()) {
  // A plain for...of lets us actually await each upload (forEach would not)
  for (const s of stories) {
    console.log("upload on S3", s.title);
    await customUploadMethod(s);
  }
}
Details about what to do with these 100 entities are out of scope. But if, like me, you want to use an S3 bucket, I can suggest the s3-sync-client library: it simplifies a lot of operations.
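As a rough idea of how the sync step could look (the exact API depends on the s3-sync-client version you install, and the region, bucket name and local folder here are just placeholders):

import { S3Client } from "@aws-sdk/client-s3";
import { S3SyncClient } from "s3-sync-client";

// Placeholder region; adjust to your setup
const client = new S3Client({ region: "eu-west-1" });
const { sync } = new S3SyncClient({ client });

// Mirror the locally saved stories into the "cms" bucket
await sync("./backup", "s3://cms");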