Bing Chat for All Browsers. Verified at cho.sh. Featured on Chrome Web Store. Productivity. 200,000+ Users.

The new Bing Chat, Microsoft's search AI powered by ChatGPT, only works on the Edge browser, so I built an extension to make Bing Chat work on Chrome and Firefox. After a long and winding road, I reached 750K visitors, 500K installs, and 230K weekly active users, but I had to take down the project because of a trademark complaint from Microsoft. Now let's talk about what happened.

Coming on the Market​

Since ChatGPT was first released, many apps have come on the market. What these apps all have in common is that they started small, found their footing, gained hundreds of thousands of users, and grew almost forcibly. We've seen that timing is everything and that apps used by many people evolve naturally. I had been observing these market patterns and looking for opportunities, and then Bing Chat came along. Since most consumers use Chrome anyway and Edge is a Chromium browser, it would take Microsoft a lot of work to reliably distinguish Edge from Chrome. If a program naturally bypassed that check, people would use it. With that in mind, I developed and deployed it in a day. I had also kept up-to-date with the recent AI boom, learned about Bing Chat very early on, and signed up for the waitlist. We'll talk about that later.

Making Front Page on Yahoo! JAPAN​

I saw no significant user growth in the first few days, likely because Microsoft initially waitlisted Bing Chat due to a lack of GPUs. Then at the end of February 2023, there was a dramatic increase in users. Interestingly, it was all Japanese users.

"Bing Chat for All Browsers" has appeared, which makes it possible to use the chat AI function of "Bing", which is currently available only in "Microsoft Edge", in browsers other than Edge. Developer Sunghyun Cho has released extensions for Chrome and Firefox.

Made Front Page on Yahoo! JAPAN IT Section. Archive

As it turned out, Yahoo! JAPAN, the number one portal in Japan, featured me on their front page and brought in many people. Since then, the number of users has grown organically. Research Note on Bing Chat and Japan

✅ As a reminder

I am Korean, not Japanese, so this event was very unexpected and random for me. We have different search engines (Naver vs. Yahoo!), different messengers (Kakao vs. Line), and different online cultures and lives, so to speak. In the end, I was very happy that a lot of users from Japan loved my extension.

230K Users and an Acquisition Offer​

I met many people along the way as my user base grew. I even hit 1.5K 🌟 on GitHub. People created 120 issues on GitHub, and I responded to about 1K cases via email, Chrome Web Store, and Firefox Add-ons. I averaged about ten cases daily, with a much higher volume in the second half of the extension's life cycle.

GitHub 1.5K Stars Screenshot

Several people have approached me through various media outlets interested in acquiring the extension. They were either AI companies looking to snowball or companies that brokered advertising deals for tech companies. Apps that have established themselves with a simple feature like this are great acquisition targets. They have a clear interest (generative AI), few components, low technical debt (easy to move in your features), a large user base, and tons of media and blogs already linking to the extension install page.

Acquisition Quotation Request

Advertisement Quotation Request

Acquisition Quotation Request

Different Acquisition/Advertisement Offers with varying prices.

For various reasons, I did not sell or get financially involved. However, the negotiation experience will come in handy later. Interestingly, they were looking to buy the 230K+ active users rather than the product, but they put values of very different magnitudes on them.

Shield and Spear​

Meanwhile, Microsoft was updating its browser detection logic weekly. Dozens of GitHub issues were being registered, and I got tons of bug report emails daily. Most of the fixes were relatively easy, requiring a few bureaucratic steps like refreshing or tweaking the browser User-Agent value.

Then, an event changed my perception in an instant. At 11:00 p.m., dozens of users contacted me simultaneously. I'm attaching a link to the GitHub Issue. The biggest problem was that I could use the service without any issues, so I couldn't reproduce the phenomenon on my device or on the machines of people around me. Nevertheless, within two hours, GitHub and my inbox were saturated.

After a sleepless night and 12 hours of debugging, I realized that Microsoft had created a proprietary header specification. A sophisticated front-end website like Bing or Google has hundreds of headers, cookies, and local storage, so it takes work to determine which headers, if any, affect access to the service.

To use a dramatic analogy, a reverse engineering or CTF hack like this is not unlike a cold case investigation. There is no way to know what information is helpful and what is unnecessary, coincidental, person-to-person, false positive, or undetected.
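One way to narrow down which headers matter is to drop them one at a time and re-test. The sketch below is only an illustration of that idea, not the procedure I actually followed: `probe` is a hypothetical stand-in for a real request, and `x-hypothetical-flag` is a made-up header name rather than anything Microsoft uses.

```typescript
type HeaderMap = Record<string, string>

// Hypothetical probe: true when a request sent with exactly these headers
// gets the full experience. A real probe would wrap an HTTP call.
type Probe = (headers: HeaderMap) => boolean

// Greedy elimination: try removing each header in turn and keep the
// removal whenever the probe still passes without it.
function minimalHeaders(all: HeaderMap, probe: Probe): HeaderMap {
  if (!probe(all)) throw new Error('the full header set must pass the probe')
  const kept: HeaderMap = { ...all }
  for (const name of Object.keys(all)) {
    const candidate: HeaderMap = { ...kept }
    delete candidate[name]
    if (probe(candidate)) delete kept[name]
  }
  return kept
}

// Made-up capture and a fake server rule, for illustration only.
const captured: HeaderMap = {
  'accept-language': 'en-US',
  'user-agent': 'Mozilla/5.0 (hypothetical) Edg/0.0',
  'x-hypothetical-flag': '1',
  cookie: 'session=abc',
}
const fakeProbe: Probe = (h) =>
  (h['user-agent'] ?? '').includes('Edg/') && h['x-hypothetical-flag'] === '1'

const needed = minimalHeaders(captured, fakeProbe)
console.log(Object.keys(needed))
```

This costs one request per header and assumes the server's answer is stable between requests. When a problem shows up only for some users, as it did here, even that assumption may not hold.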

Only Bing Uses UserAgentReductionOptOut

Less than 0.1% of the web uses UserAgentReductionOptOut.

UserAgentReductionOptOut, only used on bing.com globally.

I was impressed with Microsoft's determination. If you think about it, giving away GPT-4 services in Bing and restricting them to Edge is about using AI to compete with Google. It must have been a pain in the ass for Microsoft to have all of its features opened up to its biggest competitor, Chrome, with 230K users kept from moving to Edge. At this point, I realized my project could not continue indefinitely. Microsoft would continue to develop different tactics for its products, and it's hard to work for free indefinitely, especially overnight like this. I speculated that my project would end when Microsoft allowed access to third-party browsers because the product would lose market validity at that point.

Trademark Complaints​

However, my project's end was closer than expected. Microsoft complained that using Bing's logo and name was problematic, and they threatened to ask Google's legal team to shut my extension down unless I did it myself within the next five days. I was torn between confusion and feeling like I had been awarded a badge of honor because Microsoft had made a move against me. According to the news, Microsoft had been aware of my extension since March, so the company had been ignoring it for a while.

Letter from Microsoft Legal

tracer.ai is the Brand Protection Software Company that Microsoft hired.

It's not every day that an indie app uses the name of a famous brand in its name, especially when it comes to extensions such as 'Enhancer for Blah Blah.' Since extensions are essentially programs that assist the service, the fact that users install them affirms that they will continue to use the service. Businesses will encourage more developers to make extensions for their apps and, at the very least, will not stop users from creating their own extensions. If anything, I brought benefits to Microsoft by making Bing Chat more accessible. Did Microsoft think people were not switching browsers because of my small extension? I think there's more of a path dependency of browsers, so I'm guessing that I provided more customers for them (people who would have tried Bing Chat a few times in Chrome out of curiosity) than the opposite (people who would have switched to Edge but didn't because of my extension).

💭 Maybe they're just scaring you off?

Could be. Maybe Microsoft is just automatically sending documents. But it's unlikely that they would have sent it to all the Bing AI-related apps that are popping up in the first place. In any case, internally, Microsoft must have officially decided to take action against apps that have a lot of users.

After all, there's precedent in the past...

But This Isn't the First Time: MikeRoweSoft v. Microsoft​

Microsoft has a reputation for being a stickler for trademarks and rights. A case in point is the 2004 case where it filed a trademark infringement lawsuit against student Mike Rowe's domain, MikeRoweSoft.com, where Mike got a trademark complaint just like me. He declined first and settled for a small amount after a lengthy court battle. My situation had a lot of similarities.

Conclusion​

When I received the documents, I had a lot of thoughts. It's not easy to give up an app with 230K users, which is a first for me, but as I mentioned before, I was already mentally saturated. People started making feature and enhancement requests on GitHub, bug reports poured in endlessly via email, and the biggest problem was the inability to debug. Microsoft would have continued to do something, whether a technical tactic or a legal tactic, even if this incident had passed.

Moreover, Microsoft announced they will open it to all browsers soon. They may keep developing the app by adding more features, but that's not in the interest of the product because they want to retain users. In many ways, the product didn't have much of a life left. There was no point in being unresponsive and causing problems.

I was happy to have learned my lesson and called it a day. I have since unpublished the app from the store and archived the repository.

Lessons Learned​

Unpaid open source is an emotional labor​

That was my biggest takeaway.

In a sense, these GitHub notifications are a constant stream of negativity about your projects. Nobody opens an issue or a pull request when they're satisfied with your work. They only do so when they've found something lacking. Even if you only spend a little bit of time reading through these notifications, it can be mentally and emotionally exhausting. What it feels like to be an open-source maintainer

I was no different. My email was flooded with anonymous complainers. Many times it was just pure name-calling and complaining. It's impossible to please everyone. If I tried to make a fix, three people protested; if I didn't, the people who wanted the feature protested again. I couldn't turn off my daily GitHub notifications. If this is how it is for a small app, how is a medium to large-scale project maintained? Regarding unpaid open source, honor and self-enthusiasm are short-lived.

Linus, Genovese, and the Matthew Effect​

There's a phrase that is a staple of open-source advocacy.

Linus's law is the assertion that "given enough eyeballs, all bugs are shallow". Linus's law

But we all know of a social phenomenon that is precisely the opposite.

The bystander effect, or bystander apathy, is a social psychological theory that states that individuals are less likely to offer help in presence of other people. Bystander effect

Which is right? The conclusion I came to is closer to the latter. The scientific community already has a name for it: The Matthew Effect.

In the sociology of science, "Matthew effect" was a term coined by Robert K. Merton to describe how, among other things, eminent scientists will often get more credit than a comparatively unknown researcher, even if their work is similar; it also means that credit will usually be given to researchers who are already famous. Matthew effect

A tiny percentage of contributors contribute, and they do so repeatedly; even then, they are heavily skewed toward a few popular repositories. Thus, the contributors and the repositories they contribute to are heavily skewed.

The rarest resource is human will​

I'm not good at running and developing long-term projects. For a hobbyist, the deciding factor is willpower. Speed is essential, not because of time to market, but because you need a meaningful metric before running out of willpower, which is the most common deficiency for individuals. If that doesn't work, you must purchase willpower (i.e., pay salaries) to sustain projects in the long run.

If you try to do something in one fell swoop, you'll only get as much done as you can in that one swoop. While you can increase your fitness for short bursts of exercise, a more fundamental solution is to balance your breathing via aerobic rather than anaerobic exercise. I wanted to build lasting services that continue to evolve and create new value even without my goodwill and passion: products with a life of their own.

Statistics​

I finish this post with some memorable statistics.

🪦 Monuments

Weekly users over time. I had about 210K Chrome users and 20K Firefox users.

Weekly users by region

Weekly users by language

Weekly users by OS

Daily users by item version

Rating distribution over time

Page views

Impressions across the Chrome Web Store

Top 3 sources by page views

Page views by source

Enabled vs. disabled
Image: Hero image of the iPhone's Korean keyboard: "Sky, Earth, and Human."

💎 Give me the App Store Link first!

Of course! Here is the App Store Link. Also available on GitHub.

More and more Korean citizens are considering iPhones. Interestingly, the elderly Korean generation is purchasing iPhones more than ever. While many state the primary reasons for choosing a Galaxy to be call recording and Samsung Pay, my observation after my parents switched to iPhones differed.

Unexpectedly, the most significant difficulty for older generations was the keyboard. Korean customers have had no problem typing Korean with a 10-key dial pad since the very early days because there has been a powerful input method known as "천지인" (Cheon-Ji-In, which translates to "Sky, Earth, and Human") to input Hangul, the Korean characters. This was unlike Roman alphabet keyboards, which required several characters to be crammed onto a single button. Koreans had far less of a need to switch to QWERTY keyboards because of this. Many people still use Cheon-Ji-In unless they are members of Gen Z who grew up with smartphones.

The patent for Cheon-Ji-In entered the public domain in 2010. iPhone added support for Cheon-Ji-In in 2013, but its shape differed from that of the standard Cheon-Ji-In. The starkest difference was that the space button and the next character buttons were separate.

The space button and the next character button

💎 For example, to type "오 안녕"...
  • Galaxy: ㅇ ᆞ ㅡ → Space → ㅇ ㅣ ᆞ ㄴ → Space → ㄴ ᆞ ᆞ ㅣ ㅇ
  • iPhone: ㅇ ᆞ ㅡ → Space → ㅇ ㅣ ᆞ ㄴ → Next Character → ㄴ ᆞ ᆞ ㅣ ㅇ

Moreover, the size of each button was smaller, making people produce more typos than ever. For these reasons, I decided to replicate the Galaxy Cheon-Ji-In experience on iPhones.

💎 Goal

Let's recreate the original Cheon-Ji-In for iPhones!

🍯 Extra Tip

I also open-sourced the research notes for this project.

First of all, I checked the legal rights. I found that the patent holder, 조관현 (Cho Kwan-Hyeon), had donated the patent to the Korean government, and Cheon-Ji-In had become the national standard for input methods, which made the rights to the keyboard public. I confirmed these details and then moved on to the development process.

🛠 Readying the Tech

I first read through Apple's Creating a Custom Keyboard document. It was similar to creating a regular iOS app: create ViewControllers and embed the logic inside. However, since this was my first time using SwiftUI, I wanted to try it. I also thought SwiftUI Grid would be a clean way to organize buttons. Still, I figured that Grid is better suited to things like the Photos app, which has numerous elements to lay out, and that a simple HStack and VStack (similar to display: flex in the Web ecosystem) would suffice for my needs.

iPhone third-party keyboards use a unique structure known as extensions. Anything not running in the main iOS app is an extension: custom keyboards are extensions, iOS widgets are extensions, and parts of Apple Watch apps are extensions. I read through Ray Wenderlich and understood how keyboard extensions worked.

Keyboard images with a gray background around ㅇ

A few early prototypes

The gray background of "ㅇ" came from iOS's NSRange and setMarkedText. It helped enter the text by marking the currently edited characters, but such methods seemed more suitable for Pinyin in Chinese, not Cheon-Ji-In for Hangul.

Another interesting observation was that the colors of the default iPhone keyboards differed from any default system colors provided with iOS. I had to extract the color with Color Meters one by one.

😶‍🌫️ But how do we make Cheon-Ji-In?

Supplementary YouTube video on how Hangul system works.

I first thought of individual cases to figure out the input logic of Cheon-Ji-In, then figured that this is tremendously difficult. For example, take:

  • To input 않, we start with 안ㅅ and press ㅅㅎ to acquire 않. That is, we must check if the characters are "re-mergeable" with the character before.
  • From 앉, when we input ㅡ, it must be 안즈. Therefore, we must check if the last consonant is extractable from the previous character.
  • From 깚, when we input ㅂㅍ, it should result in 깔ㅃ. We must check if the consonants are extractable and switch between fortis and lenis (strong and weak sounds, like '/p/ and /b/', '/t/ and /d/', or '/k/ and /ɡ/' in English).
  • From 갌, when we input ㅅㅎ, it should result in 갏. More than switching between ㅅ, ㅎ, and ㅆ, we must consider double consonant endings like ㄽ.

These are just a few examples. Even with KS X 1001 Combinational Korean Encodings, considering all cases would take a lot of work. I concluded that a Finite State Machine would require more than 20 data stacks and dozens of states. (I am unsure of this calculation because I guessed some parts of it; there may be a more straightforward implementation.) If you want to try building such an algorithm, refer to this patent's diagrams. I found some implementations online, but they were long and spaghettified. Translating them to Swift and understanding the code would take significant time.

But then I came to an epiphany:

💎 If there are too many cases...

why don't I hardcode every combination?

After all, aren't keyboards supposed to input the same character, given the input sequence is the same? What if I generate all possible combinations and put them into a giant JSON file? Korean character combinations are around 11,000. Even considering previous characters, the combinations seemed to be at most 100K levels. The size of the JSON file will not exceed 2MB.

We are not living in an era where we must golf with KBs of RAM on embedded hardware. As long as Hangul coexists with the human species, someone will recreate Cheon-Ji-In in the future, making constructing the complete Hangul map worth it.
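The "around 11,000" figure can be checked against the Unicode Hangul Syllables block, which arranges 19 initial consonants, 21 vowels, and 28 final slots (27 consonants plus "no final") in a regular grid. The snippet below is a small sketch of that arithmetic; `compose` and `decompose` are illustrative helper names and are not part of Hwalja.

```typescript
// Counts from the Unicode Hangul Syllables block (U+AC00..U+D7A3).
const INITIALS = 19 // leading consonants
const MEDIALS = 21 // vowels
const FINALS = 28 // 27 trailing consonants plus "no final"
const HANGUL_BASE = 0xac00

const syllableCount = INITIALS * MEDIALS * FINALS

// Compose a precomposed syllable from zero-based jamo indices.
function compose(initial: number, medial: number, final = 0): string {
  return String.fromCharCode(HANGUL_BASE + (initial * MEDIALS + medial) * FINALS + final)
}

// Split a precomposed syllable back into its three indices.
function decompose(syllable: string): [number, number, number] {
  const offset = syllable.charCodeAt(0) - HANGUL_BASE
  const final = offset % FINALS
  const medial = Math.floor(offset / FINALS) % MEDIALS
  const initial = Math.floor(offset / (FINALS * MEDIALS))
  return [initial, medial, final]
}

console.log(syllableCount) // 11172
console.log(compose(11, 6, 1)) // 역 (ㅇ + ㅕ + ㄱ)
console.log(decompose('여')) // [11, 6, 0]
```

So the exact number of final characters is 11,172, which is what makes enumerating every reachable state practical.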

🖨️ Hwalja: The most straightforward Cheon-Ji-In implementation

Therefore, I created Hwalja: the complete map 🗺️ of Hangul, containing all such states and combinations of Cheon-Ji-In. There are around 50,000 states, and the minified JSON is about 500 KB. (Note: Hwalja means movable type in Korean.)

To implement additional higher-level features (such as removing consonants, not characters, on backspace or using timers to auto-insert "next character" buttons), we need more functional workarounds; however, the critical input logic is as simple as the following:

const type = (prev: string, Hwalja: hwalja, key: string, editing: boolean) => {
  const last_one_char = prev.slice(-1)
  const last_two_char = prev.slice(-2)
  // Try the longer match first so that two-character states are merged before one-character states.
  if (editing && last_two_char in Hwalja[key]) return prev.slice(0, -2) + Hwalja[key][last_two_char]
  if (editing && last_one_char in Hwalja[key]) return prev.slice(0, -1) + Hwalja[key][last_one_char]
  // No mergeable state: append the key's standalone output.
  return prev + Hwalja[key]['']
}

I boldly claim this is the simplest implementation of Cheon-Ji-In, given that it is a five-liner.
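To make the lookup concrete, here is a self-contained sketch that runs the same logic against a tiny hand-written map. The `toyMap` entries and the `typeKey` and `HwaljaMap` names are assumptions for illustration only; the real Hwalja JSON is far larger, and its key names may differ.

```typescript
// key -> (tail of the text so far -> replacement); '' is the fallback entry.
type HwaljaMap = Record<string, Record<string, string>>

// Toy data for illustration only, not taken from the real Hwalja file.
const toyMap: HwaljaMap = {
  'ㅇ': { '': 'ㅇ' },
  'ㅣ': { '': 'ㅣ', 'ㅇ': '이' },
  'ㄱ': { '': 'ㄱ', '이': '익', '여': '역' },
}

const typeKey = (prev: string, map: HwaljaMap, key: string, editing: boolean): string => {
  const lastOne = prev.slice(-1)
  const lastTwo = prev.slice(-2)
  // Longer tails win, so a two-character state is merged before a one-character state.
  if (editing && lastTwo in map[key]) return prev.slice(0, -2) + map[key][lastTwo]
  if (editing && lastOne in map[key]) return prev.slice(0, -1) + map[key][lastOne]
  return prev + map[key]['']
}

// Feed three key presses through the map, starting from empty text.
const typed = ['ㅇ', 'ㅣ', 'ㄱ'].reduce((text, key) => typeKey(text, toyMap, key, true), '')
console.log(typed) // 익
console.log(typeKey('여', toyMap, 'ㄱ', true)) // 역
console.log(typeKey('여', toyMap, 'ㄱ', false)) // 여ㄱ
```

With `editing` set to false, which is roughly what a "next character" press would do, the key is appended instead of merged.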

Some may ask how I preprocessed such large combinations; I set the 11,000 final Hangul characters as the destination and traced back what would've been the previous state and what button the user must have entered last. For example, to input 역, the previous state must have been 여, and the keypress must have been ㄱ. Of course, there were many more edge cases. My work from four years ago helped a lot. The following is an interactive example of Cheon-Ji-In, made with Hwalja.

🧪 Try it out!

This is an interactive demo of Cheon-Ji-In, made with Hwalja.

I open-sourced Hwalja for platform-agnostic usage.
Please try out the above demo!

💎 Don't be mistaken...

Hwalja is the simplest implementation, not the lightest.

Can't we use combinatory Hangul sets and normalize the combinations to reduce the case count?

On the Hwalja project, Engineer 이성광 (Lee Sung-kwang) pointed out that using Normalization Form D and decomposing consonants will reduce the case count. I only considered Normalization Form D, but Engineer 이성광 is correct. For example, we decompose 안녕 as 안 ᄂᆞᆞㅣㅇ and use Hwalja to gather ᆞᆞㅣ into ㅕ and then normalize ㄴㅕㅇ into 녕.

I decided to maintain Hwalja's current approach because it aims for the easiest and simplest Cheon-Ji-In implementation. The current system enables developers to stick with "substring" and "replace." If I add dependencies on Normalization Form D and Unicode Normalization, the Hwalja project may be lighter, but the developers using Hwalja must add additional handlers for normalizations. I created Hwalja because using Automata and Finite State Machines had steep learning curves. Thus, requiring any learning curves to use Hwalja violates the original purpose. Also, the final minified version is already 500KB, which is manageable for a full-fledged input engine.
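For readers unfamiliar with Unicode normalization, the following sketch shows what Normalization Form D does to a single syllable, using the standard `String.prototype.normalize`. It only illustrates the extra handling a normalization-based design would need and is not code from Hwalja.

```typescript
const syllable = '\ub155' // 녕

// NFD splits a precomposed syllable into conjoining jamo (U+11xx).
const jamo = Array.from(syllable.normalize('NFD'))
const codePoints = jamo.map((ch) => ch.codePointAt(0)!.toString(16))
console.log(codePoints) // ['1102', '1167', '11bc']

// NFC recombines conjoining jamo into the original syllable.
const recomposed = jamo.join('').normalize('NFC')
console.log(recomposed === syllable) // true

// Keycap-style compatibility jamo (U+31xx) are left alone by NFC,
// so they would need their own mapping step first.
const compat = '\u3134\u3155\u3147' // ㄴㅕㅇ
console.log(compat.normalize('NFC') === compat) // true
```

The conjoining jamo that NFD produces are different code points from the compatibility jamo printed on the keys, which is one example of the extra handlers a normalization-based design would push onto developers.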

🤖 Implementing Keyboard Autocompletes

Cheon-Ji-In users can type at blazing speeds because of their active use of autocompleted texts (Apple QuickType). In addition, these autocompleted texts continuously learn from the user to assist with typing.

Fortunately, Apple's UIKit supports UITextChecker, which frees us from going down to Core ML and Neural Engine levels. Korean is also supported, and we can use learnWord() and unlearnWord() to record data on user activities.

import UIKit

let uiTextChecker = UITextChecker()
let input = "행복하"
// NSRange counts UTF-16 code units, so measure the input the same way.
let guesses = uiTextChecker.completions(
  forPartialWordRange: NSRange(location: 0, length: input.utf16.count),
  in: input,
  language: "ko-KR"
)

/*
[
  "행복한", "행복합니다", "행복하게", "행복할", "행복하다", "행복하고", "행복하지",
  "행복하다고", "행복하다는", "행복하기", "행복하면", "행복할까", "행복하길",
  "행복함을", "행복하기를", "행복함", "행복하니", "행복한테", "행복하자", "행복하네"
]
*/

I used such features to implement the autocomplete feature. Sometimes the flow feels unnatural, or the keyboard does not suggest anything, but this is a perfect implementation for an MVP.

Happy 2023 💙

⌨️ Advancing Keyboard Functionalities​

Cheon-Ji-In, rooted in the 10-key keypad, has many higher-level functionalities, such as long-pressing backspace to delete multiple characters until you release the key or holding any key to input the corresponding number. I used Swift closures to extend the keyboard component.

struct KeyboardButton: View {
  var onPress: () -> Void
  var onLongPress: () -> Void
  var onLongPressFinished: () -> Void

  var body: some View {
    Button(action: {}) { /* label omitted for brevity */ }
      .simultaneousGesture(
        DragGesture(minimumDistance: 0) // <-- A
          .onChanged { _ in
            // Code to be executed when long pressed or dragged
            onLongPress()
          }
          .onEnded { _ in
            // When long press or drag gesture finishes
            onLongPressFinished()
          }
      )
      .highPriorityGesture(
        TapGesture()
          .onEnded { _ in
            // Code to be executed on tap
            onPress()
          }
      )
  }
}

Code simplified for explanation. KeyboardButton.swift

I found an ingenious implementation on the part marked A. With this, I can implement two features with one piece of code.

  • Flicking (swiping) on a button to input numbers.
  • Long-pressing on a button to input numbers.

It utilizes iOS's behavior that when the minimum distance of DragGesture is set to 0, iOS cancels the highPriorityGesture when it recognizes long-press and falls back to DragGesture.

Furthermore, I used Combine, introduced with iOS 13. The Combine framework is a declarative Swift API for implementing asynchronous operations. With this, we can create timers to implement the "long press backspace" action.

import Combine
import SwiftUI

// `options` is provided elsewhere in the app and is omitted here.
struct DeleteButton: View {
  @State var timer: AnyCancellable?

  var body: some View {
    KeyboardButton(systemName: "delete.left.fill", primary: false, action: {
      // on tap, execute the default delete action.
      options.deleteAction()
    },
    onLongPress: {
      // when long pressed, create a timer that will trigger every 0.1 seconds.
      timer = Timer.publish(every: 0.1, on: .main, in: .common)
        .autoconnect()
        .sink { _ in
          // while pressing the button, execute the delete action every 0.1 seconds.
          options.deleteAction()
        }
    },
    onLongPressFinished: {
      // when the long press finishes, cancel the timer.
      timer?.cancel()
    })
  }
}

Code simplified for explanation. HangulView.swift

With this code, I implemented particular functionalities using long-press or drag gestures.

🦾 Accessibility and usability

I added a few helpful accessibility features. For example, if the user enables "bold text," the keyboard button will reflect the change. The following code implements such behavior.

let fontWeight: Font.Weight = UIAccessibility.isBoldTextEnabled ? .bold : .regular

Bold Text Enabled

Bold Text Disabled

Also, I found one feature particularly inspirational. This keyboard is primarily for people used to Galaxy Android devices, which have a "back" button in the bottom right corner. Galaxy users are used to dismissing the keyboard with the "back" button. So I placed the keyboard's dismiss button in the bottom right corner to resemble this.

Pressing the bottom right corner button dismisses the keyboard.

🧑🏻‍🎨 Using Midjourney to create the app icon

Midjourney Images

Images created with Midjourney

I used Midjourney, a text-to-image AI program, to create the app icon. This is called prompt engineering. Creating paintings with various keywords was amusing.

☁️ CI/CD with Xcode Cloud​

Finally, I built CI/CD using Xcode Cloud (released in 2022). It works much like Vercel, where pushing your React code to GitHub makes Vercel build and deploy it on its own. In the same way, iOS apps are compiled and stored on the Apple Xcode Cloud servers. For Apple iPhone apps, there is an App Store review process, so they are not automatically distributed. (You must select a build in the App Store console and hit the "request review" button.) Still, it's much easier than creating an archive file in Xcode and manually uploading it.

You can check the build linked with GitHub on the App Store console

Push notifications are supported.

🏁 Finishing up​

It has been a while since I did iOS development; it was a thrilling experience. The iOS platform has greatly matured. In particular, while working on Hwalja, I felt that Hangul was meticulously engineered. Most of all, I felt good because I made this app for my parents as a present. I will finish this article by attaching the links.

💙 A five-star review on the App Store and a star on GitHub would really help me!

🗣 Talk is cheap; show me right now!

Of course. Click on the black oval below. It will display a song I am currently listening to or any of my 30 most recently played songs.

Good artists copy; great artists steal, and I am now replicating an idea from LeeRob.io, the website of Vercel's DX VP Lee Robinson. Known for being an excellent testing bed for new Next.js features, the site has one outstanding functionality: it displays the song its owner is currently playing.

leerob.io

Now Playing β€” Spotify @ leerob.io

I have an unmistakable taste in music genres and longed to implement this one day on my website. However, I wanted it to be a technical challenge rather than simply recreating it. I also tried out various music services, making me postpone the development. I kept delaying the action until Apple released an exciting feature in 2022 called the Dynamic Island.

The Dynamic Island

The punch-hole camera at the top will reshape itself into different widgets.

The Dynamic Island perfectly satisfied my desire for a technical hurdle, so I planned to implement it with Web technologies. I also checked out different copies of the Dynamic Island for Android products, which all had awkward animation curves, further getting me interested in learning such details.

💡 Goal

Let's recreate the Dynamic Island on the Web!

💰 Extra Tip

I also open-sourced my research notes for this project.

🛠 Readying the Tech

I went with the most familiar choice for the framework: Next.js and Tailwind. However, the animation troubled me. I had never dealt with anything more complicated than ease-in-ease-out CSS animations. I learned about a production-ready motion library named Framer Motion and opted for it.

Framer Motion

🧑🏻‍🏫 The Physics of Animations

We first want to understand why Apple's animations look different from the others. We can classify animations into two big categories. (At least Apple classifies theirs into these two categories in their platform.)

Parametric Curve. Given a start and an endpoint, place a few control points and interpolate the curve in between using mathematical formulas. Depending on the type of interpolation formula, it can be a linear curve, polynomial curve, spline curve, etc. The Bézier curve that many developers often use falls under this category.

Spring Curve. Based on Newtonian dynamics (Hooke's law, the law that governs a spring's physical motion), we calculate the physical trajectory using stiffness and damping. Learn More: Maxime Heckel

Any further discussions on animation curves will be out of the scope of this post. Most replications of the Dynamic Island choose parametric curves (it's the easiest, standardized in CSS). Apple uses spring motion, supposedly to mimic real-world physics. Framer Motion, the library I chose for this project, also provides a React hook named useSpring() to give control of such physical animations.

import { useSpring } from 'framer-motion'
useSpring(x, { stiffness: 1000, damping: 10 })

🛥 To the Dynamic Island

Source: Apple

I had to study the different behaviors of the Dynamic Island with Apple's official documents. The Dynamic Island can be any of the following forms:

Minimal: The widget takes one side of the Dynamic Island when two or more background activities are ongoing.

Compact: The standard form, where the widget takes both sides of the Dynamic Island when there is one ongoing background activity.

Expanded: The biggest size of the Dynamic Island, shown when the user long-presses on the Dynamic Island. It cannot display content in the red area.

Furthermore, I found the following image on the Web. Apple puts Expanded for all big sizes, but this image describes the Dynamic Island's expanded states.

Different sizes of the Dynamic Island. Considering that there is a typo on the image, it doesn't seem like an official document.

I declared the type as the following, reflecting the earlier information.

export type DynamicIslandSize =
| 'compact'
| 'minimalLeading'
| 'minimalTrailing'
| 'default'
| 'long'
| 'large'
| 'ultra'

Then I spent a whole night (2022-10-16) figuring out how to shift sizes naturally with Framer Motion, using the following code. I experimented with many stiffness and damping values; the sweet spot was const stiffness = 400 and const damping = 30.

<motion.div
  id={props.id}
  initial={{
    opacity: props.size === props.before ? 1 : 0,
    scale: props.size === props.before ? 1 : 0.9,
  }}
  animate={{
    opacity: props.size === props.before ? 0 : 1,
    scale: props.size === props.before ? 0.9 : 1,
    transition: { type: 'spring', stiffness: stiffness, damping: damping },
  }}
  exit={{ opacity: 0, filter: 'blur(10px)', scale: 0 }}
  style={{ willChange }}
  className={props.className}
>

As of Oct 16th, 2022

📞 Hello?

Before connecting with external APIs, I mimicked Apple's incoming phone call widget. There's no big reason for this; it was just to get used to the animations. I love how it turned out; it looks exactly like the official Apple animation! Finished on Oct 20th, 2022.

🍎 Apple Music API​

Then I needed to integrate with Apple Music's API. I previously made a technical demo with Spotify's API at the beginning of 2021. Spotify officially has a Now Playing API, so naturally, I expected a similar Now Playing API at Apple Music.

I was a huuuuge fan of IZ*ONE back then... 😅

Spotify Now Playing API

Spotify for Developers

When Apple Music API 1.1 was released, Apple released an API named Get Recently Played Tracks, the closest we ever got to a Now Playing API. FYI, such an API did not even exist two years ago.

Apple Music Get Recently Played Tracks

Now we need to issue and save the different tokens needed for OAuth 2.0. Spotify almost precisely followed the OAuth 2.0 standards, while Apple required a little more processing. This API, in particular, accesses both the data on the Apple Music server and the user's private data, so I needed separate access control backed by the user's grant. Moreover, none of this was well documented, which made it significantly more complicated. I needed the following:

| Aviation Industry | Same Concept at Apple | Explanation |
| --- | --- | --- |
| Establishing your Aviation Company | Apple Developer Paid Account | $99 |
| Pilot License | Apple Music Key from Apple Developer Website | Ensures that you have permissions to get data from Apple Servers |
| Air Carrier Operating Permit | Apple Music Main Token from requesting to Apple Server | A permit to attach when requesting to Apple |
| Airline tickets for Passengers | Apple Music User Token from user's grant | Ensures if the user wants to use my service |

All four pieces of information must work together to retrieve users' data (what they were listening to). The others were pretty straightforward (More Info: Research Note). The trickiest one was the User Token. The User Token was designed for iOS, macOS, and MusicKit on the Web. MusicKit on the Web was intended for Apple Music web clients, like music.apple.com, Cider, and LitoMusic, and was not designed for API request bots like mine. Apple only states that MusicKit on the Web will take care of the token automatically, without documenting how. So what are we going to do? Reverse engineer the API.

Apple: Documentation? Nah.

MusicKit on the Web

MusicKit on the Web. Is Apple using Storybook? Based on Apple's track record, this MUST be an Alpha of Alpha.

🦾 Cracking the MusicKit

First, I mimicked the specs of MusicKit on the Web, creating a website.

This website does nothing other than call the authorization process.

It will show the Apple Music access grant page like this.

Then digging into the request headers of the website will reveal the media-user-token.

There we go.

Finally, I can successfully get a JSON response from the Apple server by filling in other information with Postman software. Finished on Oct 28th, 2022.
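
For reference, the request I assembled in Postman can be sketched roughly as follows. This is a sketch under assumptions: it uses the Get Recently Played Tracks endpoint and the Music-User-Token header as I understand them from Apple's public documentation (the header observed in the web client's traffic was media-user-token), and buildRecentlyPlayedRequest and both token values are hypothetical placeholders.

```typescript
interface AppleMusicRequest {
  url: string
  headers: Record<string, string>
}

// Builds (but does not send) a request for the most recently played tracks.
// developerToken: the signed token created from the Apple Music key (assumed to exist already).
// userToken: the value captured from the authorization flow above.
function buildRecentlyPlayedRequest(developerToken: string, userToken: string, limit = 1): AppleMusicRequest {
  const url = new URL('https://api.music.apple.com/v1/me/recent/played/tracks')
  url.searchParams.set('limit', String(limit))
  return {
    url: url.toString(),
    headers: {
      Authorization: `Bearer ${developerToken}`,
      'Music-User-Token': userToken,
    },
  }
}
```

Sending it is then a matter of calling fetch(request.url, { headers: request.headers }) and parsing the JSON body.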

It sounds straightforward, but it took me days to figure it out. 😭

Requesting this information whenever someone visits the website would deplete my API quota in minutes, so I wanted some kind of cache server. But remember, the best database is no database.

Don't use the database when avoidable. Which is always more often than I think. I don't need to store the 195 countries of the world in a database and join when showing a country-dropdown. Just hardcode it or put in config read on boot. Hell, maybe your entire product catalogue of the e-commerce site can be a single YAML read on boot? This goes for many more objects than I often think. It's not Ruby that's slow, it's your database

So I stored my private keys in GitHub Secrets and set up GitHub Actions to retrieve the data every few minutes and publish it on GitHub.

I don't know how long I struggled to find this typo.

🎼 Equalizers​

Similar to finishing the phone call component, I completed the music player component.

But something felt empty.

There were no equalizers! I searched for good equalizers for React but later decided to implement this with Framer Motion. So these are the few iterations of the product.

FANCY by TWICE

After Like by IVE

Lavender Haze by Taylor Swift

Hype Boy by NewJeans

Each stick of the equalizer has a random length. But as seen in the last song, something still felt awkward. Usually, vocal music has smaller amplitudes at low and high frequencies, but completely randomizing the amplitude gives those frequencies the same ups and downs as the rest. So I set a base length for each stick as an anchor and let the randomized values shake it only slightly. Finally, I set the equalizer color to match the album cover's key color. That needed no additional work; it came with the Apple Music API.
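
The anchored randomization can be sketched as below. The base profile, the jitter amount, and the function name equalizerHeights are illustrative assumptions, not the values used on the site.

```typescript
// Hypothetical base profile: shorter bars at the low and high ends, taller in the middle.
const BASE_HEIGHTS = [0.3, 0.55, 0.8, 1.0, 0.8, 0.55, 0.3]

// Returns one height per bar, each within `jitter` of its anchor and clamped to [0, 1].
// `random` is injectable so the output can be made deterministic.
function equalizerHeights(
  base: number[] = BASE_HEIGHTS,
  jitter = 0.2,
  random: () => number = Math.random,
): number[] {
  return base.map((anchor) => {
    // Shift each bar around its anchor instead of fully randomizing it.
    const offset = (random() * 2 - 1) * jitter
    return Math.min(1, Math.max(0, anchor + offset))
  })
}
```

Calling equalizerHeights() on every animation tick and feeding the values into each bar's height keeps the overall silhouette stable while the individual bars still wiggle.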

Much smoother.

🔎 The Physics of Squircles

We're not done yet! Such completed widgets still felt slightly off; the curves felt too sharp. We needed squircles.

Squircle

Source: Apple's Icons Have That Shape for a Very Good Reason @ HackerNoon

A standard rounded corner made with border-radius is a circular arc with constant curvature. The curvature therefore jumps abruptly from zero on the straight edge to a fixed value on the arc, which makes the corner feel sharp. In contrast, gradually increasing and then decreasing the curvature produces a much more natural curve.

For those AP Physics nerds, it's like uniformly increasing the jerk instead of uniformly accelerating.

For those AP Calc nerds, a squircle is a superellipse: the set of points satisfying the following equation, where $n$ controls the curvature, $a$ is the half-length along the $x$ axis, and $b$ is the half-length along the $y$ axis. For a deeper dive, check out Figma's article on squircles.

$$\left\lvert \frac{x}{a} \right\rvert^n + \left\lvert \frac{y}{b} \right\rvert^n = 1$$
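
As a rough illustration of the superellipse equation, the sketch below samples points on the curve and joins them into an SVG path. It is a plain superellipse, not the smoothing algorithm that figma-squircle implements, and the function names and the step count are my own assumptions.

```typescript
// Samples points on the superellipse |x/a|^n + |y/b|^n = 1, centered on the origin.
function superellipsePoints(a: number, b: number, n: number, steps = 64): Array<[number, number]> {
  const points: Array<[number, number]> = []
  for (let i = 0; i < steps; i++) {
    const t = (2 * Math.PI * i) / steps
    const c = Math.cos(t)
    const s = Math.sin(t)
    // Parametric form: raising |cos t| and |sin t| to 2/n keeps each point on the curve.
    const x = a * Math.sign(c) * Math.pow(Math.abs(c), 2 / n)
    const y = b * Math.sign(s) * Math.pow(Math.abs(s), 2 / n)
    points.push([x, y])
  }
  return points
}

// Joins the sampled points into a closed SVG path string.
function toSvgPath(points: Array<[number, number]>): string {
  return points.map(([x, y], i) => `${i === 0 ? 'M' : 'L'}${x.toFixed(2)} ${y.toFixed(2)}`).join(' ') + ' Z'
}
```

With n = 2 this degenerates to an ordinary ellipse; larger values such as 4 or 5 push the corners outward toward the squircle look.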

I used this tienphaw/figma-squircle to create an SVG squircle and cut the Dynamic Island with the clipPath property.

I saw a similar bug at the iOS 16 Notification Center. Maybe Apple is also clipping?

However, to clip all frames of the animations, we would have to create squircles for every frame, risking speed. Therefore, I opted to use borderRadius for the animation and clipped it right after the animation finished. It was barely noticeable, even if you looked very closely, so it was a good trade-off between performance and detail. Finished on Nov 11th, 2022.

Look closely. The border cuts into the squircle when the animation exits.

💨 Optimizing Performance

CSS has a will-change property. It tells the browser which properties of an element are about to change so that the browser can prepare beforehand. Without will-change, the browser re-rasterizes the element on every frame; with it, the browser can reuse a static image while the animation runs and rasterize again only when the animation finishes. The animation may therefore look blurry in some cases, but transform animations such as scale or rotate become more fluid.

The Dynamic Island usually modifies scale and opacity, so it was perfect for will-change. We can apply the property in Framer Motion, as in the following example:

import { motion, useWillChange } from 'framer-motion'

// ...

const willChange = useWillChange()

// ...

<motion.div style={{ willChange }}/>

🔗 Integration

Last but not least, I made pages for integration purposes (/embed-player and /embed-phone-call). I did not want to add Tailwind or Framer Motion as a dependency on other websites, so I used the iframe method. I used davidjbradshaw/iframe-resizer to make a responsive iframe. I also used CSS's position: sticky property to make it stick on specific pages; it's on this website, too!

💭 Postmortem

This completes the project. Here are some thoughts:

First of all, I succeeded in managing a mid-to-long-term side project. I have always respected people with persistence, and I was very happy to finally complete the project after working on it for more than a month. I was also delighted that I successfully juggled 🤹 CS Major classes, job searching, and side projects simultaneously (although they still need to be completed).

Second, I would like to express my gratitude to Tim (cometkim), whom I met during my previous internship. I had a memorable experience during this internship when Tim showed me that it is possible to reverse engineer a compiled webpack codebase. It was indeed a spiced-up 🌶 and intense learning environment, and that experience gave me confidence when I was blocked by Apple's undocumented API services.

I am also developing the habit of note-taking. There's a saying that people's will is weaker than we think, so it's better to reshape the environment. I did a decent job remodeling my website as a digital garden (or Extracranial Memex) that is optimized for note-taking. I want to continue taking notes and learning new stuff. Tim also had a significant effect on my note-taking by showing his workspace on Roam Research.

Anyhow, this concludes the project. Thank you, everyone!

I worked as a full-time Mini App researcher intern at Karrot (Korean Unicorn Company 🇰🇷🦄). This is what I found and learned from it.

📱 Mini Apps

Mini Apps are a collection of third-party services that run on top of a native Super App.

info

Imagine the Shopify app hosting thousands of small shopping mall web apps. You sign in once, and you can access all the apps. No need to log in, no need to download, no need to update; it goes beyond Shop Pay, which simply provides a payment gateway. There could be a Game Super App that hosts thousands of mini-games, a Shopping Super App that hosts thousands of mini-shopping malls, a Social Super App that hosts thousands of mini-social networks, and so on.

How is this different from the status quo? You can get the best of both worlds: deploy it like an app (which gets the best retention and metrics) while building it like a website (simple JavaScript development).

At the same time, you can use Super App's complete account and wallet information (no need to sign up or bother to enter data)

Therefore,

  • it is faster than making an app
  • it reaches a broader demographic than making a website
  • it can target a larger user base than making an app
  • it guarantees unparalleled reachability, retention, and payment conversions.

The so-called BAT (Baidu, Alibaba, and Tencent) is already dominating the Chinese market. WeChat, the first player in the market, already has a Mini App ecosystem of 400 million active daily users and 900 million active monthly users. Apple and Google are struggling to maintain their platform power in the Chinese market because of these Mini Apps. For Chinese users, the App Store and the Play Store are like Internet Explorer. Just as IE only exists to download Chrome, so the App Store and the Play Store are simply gateways for downloading WeChat.

Of course, international businesspeople have reacted by replicating this outside of China. Snap tried to create Snap Mini, and Line tried to implement Line Mini Apps. Karrot, a Korean Unicorn company that has 60% of Korean citizens as their user base, also wants to become a Super App and create a Mini App environment. Offering more information on the Mini App system is out of the scope of this post; please refer to Google's in-depth review on Mini Apps.

💡 So far
  • A Mini App is easy to make (web-like developer experience) while having powerful business effects (app-like user experience).
  • Karrot wants its internal and external partners to provide service through the Mini App within the Karrot App.
  • Karrot thinks that all Super Apps will want to make Mini App Systems and that there will be repeated work and fragmented developer experience if all the Super Apps make their own Mini App systems.
  • Goal. Figure out a Mini App Model that will succeed in Korea, Japan, United States, United Kingdom, and so on. (Karrot's business regions)

🔥 For Thriving Ecosystems

The previously mentioned BAT companies have created their proprietary languages and browsers, seemingly inspired by the web. These three companies possess immense platform power; they can ask whatever they want from developers. However, most Super App services cannot justify making such demands, like asking devs to use non-standard SDKs or to add branching logic for detecting a Mini App environment. In that case, developers will give up on creating a Mini App and spend that effort on creating iOS and Android apps (which have a much higher chance of success). If you think otherwise, ask yourself why PWAs are still stagnating. Therefore, a standard Mini App should follow web standards. Developers should be able to deploy their web app as a Mini App with little to no change.

😻 For Beautiful Interfaces​

Having a pretty design is much more important than you think. This statement is especially true for permission request screens. If, for example, a service requires location without context, the user will likely decline, affecting the service's stability. I mean that permission requests should make sense, for which we require persuasive interfaces and designs. Therefore, it needs to be pretty.

Let us take Starbucks as an example. The following image shows permission requests from Starbucks Web, App, and Mini App. Which one do you think you will grant? Which one will you decline?

Web

Mini App

App

Users become more likely to grant the request as we move to the right, because each screen gives more detail. A standard Mini App should at least provide the level of context shown in the middle screenshot.

📨 For Prettier Permission Requests

The geolocation permission requests mentioned above display whenever JavaScript calls the Geolocation API. It's not magic: executing the following code will prompt the permission request.

navigator.geolocation.getCurrentPosition((position) => console.log(position.coords))

Based on the two requirements above (following web standards and offering persuasive interfaces), we need to show a more persuasive and prettier permission request when the above code runs, while staying within Web Standards.

🌐 But Isn't That the Browser's Job?​

Yes, displaying such a request screen falls under the browser's responsibility. Therefore, we will meet the above permission request if we call the Geolocation API inside a Web View (specifically, WKWebView for iOS). This behavior also happens inside Karrot Mini, an intermediary version of the Mini App system built by Karrot. So, how can we solve this? Do we plan on making a new browser?

Even worse, an unknown URL can urge people to deny such a request.

🎭 We don't care who's who​

For web apps, 99.99% don't care who's who. They call the function wherever they need it. So, what if we make a fake navigator like the following?

const navigator = {
  geolocation: {
    getCurrentPosition(success, error) {
      // do some random stuff...
    },
  },
}

JavaScript does not check the authenticity of the navigator object. Therefore, we can inject whatever behavior we want. This technique is called a shim.

In computer programming, a shim is a library that transparently intercepts API calls and changes the arguments passed, handles the operation itself, or redirects the operation elsewhere. (Source: Shim (computing))
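
A slightly fuller sketch of such a shim is shown below. It assumes that the Super App supplies its own consent screen through a hypothetical askUser hook and only forwards the call to the real Geolocation API once the user agrees. The small type definitions stand in for the DOM types so the sketch stays self-contained.

```typescript
// Minimal stand-ins for the DOM geolocation types (assumption: only these fields matter here).
type PositionCallback = (position: { coords: { latitude: number; longitude: number } }) => void
type ErrorCallback = (error: { code: number; message: string }) => void

interface GeolocationLike {
  getCurrentPosition(success: PositionCallback, error?: ErrorCallback): void
}

// Hypothetical hook: the Super App renders its own consent UI and reports the decision.
type AskUser = (decide: (granted: boolean) => void) => void

function createGeolocationShim(real: GeolocationLike, askUser: AskUser): GeolocationLike {
  return {
    getCurrentPosition(success, error) {
      askUser((granted) => {
        if (granted) {
          // Delegate to the real implementation only after our own prompt was accepted.
          real.getCurrentPosition(success, error)
        } else if (error) {
          // Code 1 mirrors PERMISSION_DENIED in the Geolocation API.
          error({ code: 1, message: 'User denied the request' })
        }
      })
    },
  }
}
```

The Mini App keeps calling getCurrentPosition exactly as the web standard describes; only the object behind the name changes.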

I have created a demo website where a cat gif asks for location permission.

Default behavior

Injected behavior

If we advance this methodology and implement the Document Object Model in JavaScript, we can inject all behaviors that are deemed suitable for Mini Apps.

🗿 For Consistent Experiences

A Mini App is all about a consistent experience. It's akin to universal components like Refresh, Favorite, or Close buttons not changing in browsers when you navigate different websites. For more information on consistent experiences, please refer to Google's Mini App User Experiences document. Of course, this consistency will require us to inject standard components.

⚡️ For Snappy Experiences

Opening and closing different Mini Apps should at least be faster than websites, if not faster than their app versions. For this, we would need prefetching policies for Mini Apps. We also want data to persist across opening and closing apps, so we can contain each Mini App inside an iframe and delegate its management to the Super App's web view. This procedure will also require setting the Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers (so that crossOriginIsolated becomes true) so that code inside the iframes cannot access data outside.

🥶 How'd You Solve the Icing Problem?

Super App force-quitting frozen Mini App

There's another problem here: the iframe runs on the same thread as its host page, so when the Mini App freezes, the entire Super App also freezes, including the quit button.

🕸 Multi-threaded Web

🤔 Isn't JavaScript Single-Threaded?

Yes and no.

  • JavaScript inside a browser is single-threaded.
  • We can, however, create multiple threads with web workers.

Then, if we run the Mini App's code inside a web worker, a frozen Mini App can no longer freeze the Super App, which effectively solves the icing problem.

🧑‍🔧 No DOM APIs in Workers

Web workers do not have access to DOM APIs. However, just as we shimmed the Geolocation API, the DOM API is also an object model that can be written in JavaScript. Therefore, we can effectively solve this problem if we provide a fake DOM API inside the web worker and mirror its manipulations to the real DOM. We can also police the manipulations between the two DOMs by verifying whether each operation is permitted.
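
The mirroring and policing idea can be sketched as follows. This is a simplified illustration, not how WorkerDOM or Partytown work internally: the Mutation shape, the allowlist, and both function names are assumptions, and plain Maps stand in for real DOM elements so that the sketch runs outside a browser. In a real setup, post would be the worker's postMessage.

```typescript
// A mutation message that the worker-side fake DOM sends to the main thread.
interface Mutation {
  nodeId: number
  property: string
  value: string
}

// Worker side: a fake element that records writes instead of touching a real DOM.
function createFakeElement(nodeId: number, post: (m: Mutation) => void) {
  return {
    setAttribute(property: string, value: string): void {
      post({ nodeId, property, value })
    },
  }
}

// Main-thread side: a hypothetical allowlist that polices what Mini Apps may change.
const ALLOWED_PROPERTIES = new Set(['class', 'style', 'title'])

// Applies a mutation to the "real" node if the policy permits it; returns whether it was applied.
function applyMutation(realNodes: Map<number, Map<string, string>>, m: Mutation): boolean {
  const node = realNodes.get(m.nodeId)
  if (!node || !ALLOWED_PROPERTIES.has(m.property)) return false // reject unknown nodes and forbidden writes
  node.set(m.property, m.value)
  return true
}
```

Because every write passes through applyMutation, the Super App gets a single choke point where it can log, throttle, or refuse what a Mini App tries to do.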

👻 Mission Impossible

In the film Mission Impossible 4, the protagonist, Ethan, stands between two terrorist groups and impersonates each one to the other, steering the negotiation in his favor.

Luckily, there is prior work. Google created WorkerDOM for Accelerated Mobile Pages, and BuilderIO created Partytown to move third-party code into web workers. However, neither is fully appropriate for Mini Apps. Google started WorkerDOM when the Spectre security vulnerability was a pressing concern, so it did not utilize SharedArrayBuffer and Atomics. Therefore, WorkerDOM cannot make synchronous data transfers (elaborated later). Partytown cannot call event.preventDefault(). But fundamentally, we can use this Mission Impossible model to isolate and quarantine third-party code.

💽 No Synchronous Data Transfer

Web Workers do not have synchronous data transfer by default. Synchronous data transfer is essential in many places; for example, drawing animations or displaying a map requires it because we need to measure elements on the screen to render the next frame. Since we do not have synchronous DOM APIs inside Workers, animation code that relies on them will not work.

🤝 Then Make It Synchronous!

JavaScript was designed to be asynchronous from the beginning because of user interactions. That is why we have the notorious triumvirate: callbacks, promises, and async/await. Running such asynchronous JavaScript synchronously means that when I call a specific function, execution sits there and waits until it gets the response.

We can make this synchronous using the following two methods.

  1. Synchronous XMLHttpRequest
  2. SharedArrayBuffer and Atomics
    • SharedArrayBuffer is a shared data channel between the Web Worker and the main thread, and Atomics operations ensure thread safety on that shared memory. Atomics also lets us pause the Worker thread until a response arrives. Mini Apps already use Web Workers, so using SharedArrayBuffer and Atomics seems more suitable.
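
The second option can be sketched as below. It is a minimal illustration under assumptions: the two-slot buffer layout and the function names are my own, and only a single number is transferred. In practice, waitForResponse would run inside the worker and respond on the main thread after handling a DOM request.

```typescript
// Assumed layout of the shared Int32Array: index 0 is a ready flag, index 1 is the payload.
const FLAG = 0
const PAYLOAD = 1

function createChannel(): Int32Array {
  return new Int32Array(new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT))
}

// Main-thread side: publish an answer and wake up the sleeping worker.
function respond(channel: Int32Array, value: number): void {
  Atomics.store(channel, PAYLOAD, value)
  Atomics.store(channel, FLAG, 1)
  Atomics.notify(channel, FLAG)
}

// Worker side: block until the main thread has answered, then read the value.
function waitForResponse(channel: Int32Array, timeoutMs = 1000): number {
  // Atomics.wait only sleeps while the flag is still 0; otherwise it returns immediately.
  const status = Atomics.wait(channel, FLAG, 0, timeoutMs)
  if (status === 'timed-out') throw new Error('main thread did not answer in time')
  const value = Atomics.load(channel, PAYLOAD)
  Atomics.store(channel, FLAG, 0) // reset the flag for the next request
  return value
}
```

From the Mini App's point of view, a call such as a layout measurement then looks synchronous, even though the answer was computed on another thread. Browsers only expose SharedArrayBuffer on cross-origin isolated pages, so this approach depends on the headers mentioned earlier.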

✂️ Oops, You Got Disconnected

We cannot access the regular web environment offline. Yet if we have, for example, a calculator Mini App, we expect it to work without network access. This condition also relates closely to initial loading speed. Although we can use progressive web app techniques to cache the website for offline use, caching requires plenty of initial network requests, making it inefficient.

📦 Pack it up!

Source: web.dev/web-bundles

There is a solution for this as well. Google is already experimenting with WebBundle, based on the CBOR file format. A WebBundle packs all the files a website needs, including HTML, CSS, JS, and images, into one file. WebBundle is already enabled in Chrome, and Google is experimenting with this technology in various ways. But sadly, Google's hidden goal is to disarm and bypass URL-based adblocking technologies. Related Thread.

🦠 What if it gets malicious code?​

Perfectly fine code on GitHub can suddenly become malicious code on NPM. For example, UAParser.js, a popular library with 40M+ monthly downloads, was once hijacked and used to distribute malicious code. Accident Records.

Even a trusted, big-name library can suddenly turn on you.

In any case, it is essential that the Super App provider receives the package from Mini App providers, audits it, and hosts it itself so that others cannot swap out the code. There is little more to say here because this part of the system is already almost fully developed.

😊 Conclusion.​

If we solve all the abovementioned problems, we can finally construct a proper Mini App environment. However, as you can tell, each issue involves a vast range of technical and administrative challenges. I focused on problems #2 and #3 during my internship, but resources were extremely scarce since the work delved into such a niche area. I envision a Mini App environment that is ① internationally accessible, ② scalable, ③ interoperable with Web Standards, and ④ maximizes value for creators and users without being confined to a specific geographic region like China.

But the challenges will only delay our joyful union.

Heads Up!
  • I wrote this post more than 2 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.
Google Latest Articles Instead

tossface.cho.sh

info

I would like to thank @sudosubin and the Tossface team for reviving Korean emojis with Unicode PUA!

Background​

Tossface is an emoji typeface created by Viva Republica, a Korean (almost) decacorn company. Tossface initially included a series of intentionally divergent emoji designs, replacing culturally specific Japanese emojis with designs representing related Korean concepts and outdated technologies with contemporary technologies.

Tossface's first release. Toss: "Right Now, The Right Us (Hinting Modern & Korean Values)"

Unfortunately, these replacements caused backlash from multiple stakeholders, and Viva Republica had to remove the emojis.

Unicode Private Use Area​

However, Unicode has a hidden secret: the ranges U+E000-F8FF, U+F0000-FFFFD, and U+100000-10FFFD are reserved as the Unicode Private Use Area. These code points will never be assigned to standard characters or emojis, and companies can use them as they wish.
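
To make the ranges concrete, here is a small sketch that checks whether a character falls inside any of the three Private Use Area ranges. The helper name isPrivateUse is my own.

```typescript
// The three Private Use Area ranges, as inclusive [start, end] code point pairs.
const PRIVATE_USE_RANGES: Array<[number, number]> = [
  [0xe000, 0xf8ff],
  [0xf0000, 0xffffd],
  [0x100000, 0x10fffd],
]

// Returns true if the first code point of `char` lies in a Private Use Area range.
function isPrivateUse(char: string): boolean {
  const codePoint = char.codePointAt(0)
  if (codePoint === undefined) return false // empty string
  return PRIVATE_USE_RANGES.some(([start, end]) => codePoint >= start && codePoint <= end)
}
```

A check like this is handy for telling vendor-specific glyphs apart from standard emojis, since their meaning depends entirely on the font in use.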

Regrettably, those glyphs, with their Korean and contemporary style in a clean and neat tone and manner, disappeared into history. Therefore, I proposed bringing the emojis back using a standard mechanism, the Unicode Private Use Area.

@toss/tossface/issues/4

After about three months, Viva Republica accepted the request. They redistributed those emojis in Tossface v1.3, from PUA U+E10A to U+E117.

But how shall I type?​

However, these emojis remained uncharted in the Unicode standard. PUA U+E10A to U+E117 cannot be typed with a standard keyboard, nor do they appear on the emoji chart. It is ironic that we finally got the glyphs back but can't type them.
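
Since the characters cannot be typed, they have to be generated from their code points. The sketch below shows one way to list them for copying; it is an illustration rather than the site's actual code, and glyphsInRange is a hypothetical helper.

```typescript
// Builds a label and a copyable string for every code point in an inclusive range.
function glyphsInRange(start: number, end: number): Array<{ code: string; glyph: string }> {
  const glyphs: Array<{ code: string; glyph: string }> = []
  for (let cp = start; cp <= end; cp++) {
    glyphs.push({
      code: `U+${cp.toString(16).toUpperCase()}`,
      glyph: String.fromCodePoint(cp),
    })
  }
  return glyphs
}

// The range that Tossface v1.3 uses for the restored emojis.
const tossfaceGlyphs = glyphsInRange(0xe10a, 0xe117)
```

Each glyph string only renders as the intended emoji when the Tossface font is applied; in a browser, a click handler can pass it to navigator.clipboard.writeText to copy it.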

So I have created a small website where you can check the glyphs and copy them. I call these Microprojects. They're perfect for trying out new technologies; I wanted to try Astro, but it kept giving me unrecognizable errors primarily because the platform was still in an early stage, so I used Next.js, Vercel, and Tailwind.

Now, it somehow became a Museum of Korean Culture​

Once the website was done, it looked like a Museum of Korean Culture, so I added some text in English and shared it publicly.

Page View Nationality Break Down

Postmortem​

It was a fast and fun project before the beginning of school!

Banner image showing onboarding goods such as MacBook, charger, sticker, guide, etc.

It's already been a week (as of 2022-05-22) since I started as an intern at Karrot, a Korean Unicorn Company. The internship lasts for three months, but I wanted to write up the interview and onboarding before it's too late.

Application and Interview​

A Great Start​

It starts with the Karrot Market Team Recruiting Site. I got the strong impression that Karrot puts a lot of energy into discovering good talent. The recruitment website was neatly run, and Karrot wrote down all the information applicants might be curious about. Above all, Karrot wrote the JD (Job Description) specifically and clearly. Some companies I interviewed with did not disclose the JD, so Karrot was much more considerate.

Karrot Mini R&D Engineer Intern JD

Who we are looking for.

Karrot Market is still actively using web technology to create mobile apps. The web is a great tool, but it still has a lot of limitations when it comes to native platform support. The OS's WebView environment is unsuitable for running multiple apps simultaneously. Due to the difference between the web security model and the basic OS security model, it is challenging to replicate the native experience. For example, if you request user location information through the web API, you will experience a different UI/UX from the user consent seen in native. The Karrot Mini team is looking for a breakthrough from the modern web, not the OS WebView. We are looking for someone who will break through what was initially thought to be challenging to achieve on the web and create an OS-level experience that can run entirely in the browser.

Specifically, they will

  • Study the next-generation web-based execution environment to be used in Karrot Market
  • Provide a sandbox environment to isolate multiple apps
  • Provide Karrot Market integration through web standard interfaces
  • Implement a scheduler that can observe and control the running state of multiple apps

We are looking for someone who

  • Is familiar with HTML, CSS, and JavaScript-based web development
  • Is skilled in program development using JavaScript and TypeScript
  • Is interested in reading the DOM standard and implementing it themselves
  • Is interested in various web standard APIs
  • Has a basic understanding of the security model of web browsers
  • Wants to operate an open-source project from the beginning

Even better if you

  • Have experience contributing to or operating an open-source project in which many people participate.
  • Have good knowledge of OS, scheduling, and concurrent programming
  • Know how to handle various programming languages
  • Have experience with system programming languages such as C/C++, Go, Rust, or Zig

Please Note...

  • This position is held for three months, and in some cases, a 6-month extension is possible

Procedure...

  1. Document submission
  2. Job interview
  3. Final acceptance

Document Screening

Karrot Market is accepting freestyle applications. Please freely express various information that shows your strengths. You can freely select the document format, such as word, pdf, or web link, excluding hwp files. Please forward your portfolio, GitHub link, etc., as needed.

Job Interview

This is the stage where you have an in-depth talk about your job-related experiences and competencies based on your resume and assignments. The job interview lasts from 1 hour to 1 hour 30 minutes with the Karrot Market team members who are highly related to the job.

Doesn't it look detailed and thoughtful? The information transparency was excellent, allowing me to predict what position I would hold and what responsibilities I would be given even before the interview. The application process was also straightforward. I didn't have to write a cover letter, etc.; I only had to attach my existing resume. It took less than 15 minutes to apply.

The Interview​

As mentioned in the JD, the interview was scheduled for 1 hour and 30 minutes. I had interviewed with several companies before. Up to this point, the interviews I had experienced could be divided into two types.

Example of Behavioral Interview
  • If this happened within your team, how would you deal with it?
  • What do you think is the most important thing as a PM or developer?
  • Please describe this project written on your resume. What did you learn? What did you miss the most?
Example of Technical Interview
  • ~ Please solve this problem.
  • (In case of Web3 company interview) Please explain the concept of blockchain Proof of Stake. How is it different from Proof of Work? What problem are you trying to solve?
  • Please explain the difference between HTTP POST/GET/PUT, etc.

Among them, if you were applying for a Computer Science internship, you had to prepare well for the second type, the technical interview. Most of the companies I had interviewed with so far asked questions you could prepare for, as above. Karrot's interview was different. They did not quiz me on my knowledge, and we were discussing practical work within 5 minutes of starting. I felt like I was in a team meeting rather than an interview. First, the interviewer explained the team's current problem. Then, he asked me to analyze the candidate solutions and their strengths and weaknesses. During the actual interview, he explained the following information.

On Mini-Apps​

  • In China's WeChat, there is a feature called Mini Programs (Xiaochengxu).
  • It allows small programs to be loaded and run inside WeChat.
  • You don't need to install an app: scan a QR code, and the mini-app loads super-fast, giving you an experience similar to a native app.
  • At the same time, membership registration and payment connection are not required. Since WeChat ID and WeChat Pay are automatically linked, no roadblocks hinder the user's flow.
  • In China, the mini-app ecosystem already dominates the market, and Line and Snap are also preparing for this trend.
  • Apple has also launched its own take on mini-apps, App Clips.

If you are curious about the mini-app, please refer to this article!

Karrot

So, how did it go?​

Each answer led to the next question.

Interview Questions
  • In the case of WeChat, they built their own native client, and the native client runs the mini-apps. However, these mini-apps do not comply with web standards and use their own security model, making them difficult to introduce globally. Karrot Market is also envisioning a similar mini-app environment. What is the appropriate strategy for this?
  • → It would be sufficient to implement a general-purpose mini-app that complies with standard web specifications and fully follows the web security model. In other words, you want to run a WebView inside the web. The first method that comes to mind is an iframe. What's the problem with implementing this in an iframe?
  • → Since the code outside and inside an iframe runs on the same thread, the client app also freezes if the mini-app freezes. What should I do to solve this?
  • → With a Web Worker, it is possible to separate the mini-app and the client app into separate threads. However, a Web Worker cannot access the DOM API. For example, you cannot use the DOM API called getBoundingClientRect. What should I do to solve this?
  • → Provide a virtual DOM API that Web Workers can access. To solve this problem, Google developed a library called WorkerDOM. And an open-source project called Partytown, which moves third-party JS code into a separate Web Worker, was recently released. So how can we implement a mini-app system using this?
  • → Let's assume that the mini-app system is implemented using the underlying technologies of Web Worker and WorkerDOM. Then, can we implement forced shutdown and multitasking for this web within the web? What should I do?

It was not a typical interview; it aimed to find out how I come up with ideas and find solutions at a practical level. It felt like a coffee chat, and I felt I was treated with great consideration even though I was the interviewee. The people team's efforts were evident in many aspects, such as promising to inform both successful and unsuccessful applicants of the results within three days and asking for understanding via e-mail when the announcement of the results was delayed.

If you are curious about the questions above, you can learn the answers by looking at the articles below.

Interesting Things about Karrot​

Onboarding

Interns with Power​

Our team consists of 8 people, and I felt like a core member of a tiny startup, not an intern at a unicorn company. Interns were given a fair amount of voice and power, and information and opportunities were unrestricted. Even as an intern, I could develop ideas, contribute to production-level products, and suggest new directions for product design. From the first day, the team leader encouraged me to voice my opinions, which helped me tremendously. I was learning on the job in the actual field.

Great Power and Responsibility​

Karrot showed me trust first by giving me as much freedom as possible. For example, I could come to work any time between 9:00 and 11:00 and did not need to announce when I left. I did not have to record my work hours; I only had to prove myself through performance. As an intern, I asked a lot of questions, and I was impressed when team members answered: "We trust you, proceed as you wish."

Working Anywhere​

This is related to the above: our whole team has never gathered offline. For the last few days, I have gone to work on my own, apart from my team. Currently, one of our team's developers is living on Jeju Island for a month. Nevertheless, all team members maintain the best performance. The concept of asynchronous communication also impressed me. As we move increasingly to remote work, it takes too much energy to hold meetings where everyone gathers in real time. Instead, we document and record everything and use corporate messengers like Slack, so each person can handle things during their own working hours. Of course, this is based on trust between members and on the freedom and responsibility described above. (In other words, it is a system that runs on the belief that no employee will free-ride, as happens in group assignments.)

Transparent Information​

Weekly team meeting where all information is shared

There are no restrictions for interns. You can view the server code of Karrot Market, check the sales volume of local advertisements, and view the minutes of meetings with Karrot investors. Reading this, you will probably think of Reed Hastings's book No Rules Rules.

In the corporate meeting every Monday, we share updates from each team. It was also impressive that no one used presentation slides; they were all written in a shared document in Notion for easier future reference. Overall, the culture promoted creative and powerful expression of opinions. In addition, it induced a responsible attitude by first trusting the employees.

Endless Debates​

Our team wrote all 277 replies that morning 😔

I put the one that impressed me the most last. The discussion culture based on mutual respect was awe-inspiring. Two days after I arrived, we had a 6-hour meeting, and our team exchanged hundreds of Slack messages until dawn to discuss the direction of the product. While I've done various group assignments and student startup projects, I had never seen such deep and delicate affection for a product or such heated discussions. Debating how to allocate limited resources to succeed in the market inspired me. Everyone communicated logically to understand other people's points of view and find a middle ground. Even so, it was cool to see that the discussion never got personal and that everyone respected each other.

Moving Forward​

I will be working on sandboxing for the mini-app standard in the future. Simply put, it's about creating a web within the web and the basis for the mini-app environment. I have a variety of technical & product goals. I plan to write another article after finishing my internship. Please look forward to the Karrot Mini Team!

Heads Up!
  • I wrote this post more than 2 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

After a few years of technical writing, I felt that the limitations of writing platforms hindered me from writing best-in-class articles. Technological knowledge is so dynamic and intertwined that none of the current formats (academic papers, lecture videos, code examples, or straightforward posts) can best represent it. I have examined and observed some attempts that addressed this issue, namely, things called the second brain or digital gardens, but none of them seemed to correctly solve the problem. Therefore, I have distilled my inconveniences into this huge mega-post and imagined what I would've done if I had created the new incarnation of digital brains.

Update 2022-06-12

Since this post, I have extensively studied non-linear PKM software, such as Roam, Obsidian, Logseq, and Foam. I acknowledge that I misunderstood the concept of manual linking: PKM software performs a fuzzy search to intelligently identify linked and unlinked references. I found some PKM software with automatic linking, such as Saga or Weavit, but none of them worked how I expected. Manual linking helps refine the database. So, even if I make a next-gen digital brain, I will not remove the linking process.

Update 2022-07-01

Well, you're now watching my next-gen digital brain! For the past two weeks, I have worked on the WWW project that built this website. It checks off almost all of the marks detailed in this post!

TL;DR
  • Create an aesthetic-interactive-automatic pile of code-image-repo-text that organizes-presents-pitches itself.
  • There is no manual tagging, linking, image processing, and so on.
  • You just throw in random pieces of knowledge, and they form a knowledge mesh network.
  • The algorithm operates everything. The content will be contained, processed, organized, and distributed all around the world in different languages.
  • You don't tend knowledge. The algorithm penalizes outdated content (you can mark a post as evergreen to avoid this).

So what's the issue?​

Contrary to popular belief, I noticed the best method for managing a digital garden is not tending it. Instead, try to make a digital jungle: you don't take care of it; nature raises it automatically. In other words, the digital brain should create as little friction as possible. The less you tend, the more you write.

Especially,​

I despise the [[keyword]] pattern prevalent in so-called second brains (Obsidian, Dendron, ...). Not only does it perform poorly for non-alphabetical documents, it is also manual, which creates a lot of friction. Having to explicitly wrap terms with brackets doesn't make sense... What if you realize you want to link a term you've been writing about for 200 posts? Do you go back and link them all one by one? No! The solution must lie in algorithmic keyword extraction.

Organizing Contents​

Interconnected entities​

Practical knowledge does not exist in simple posts (though they might be straightforward). Create a knowledge bundle that interconnects GitHub repositories, code, READMEs, and other posts in the same brain network. Examine how Victor's post has rich metadata for the paper, dataset, demo, and post. This is what I mean by interconnected entities.

Interactive Contents & Animations​

victordibia.com. Seems like using MDX.

bluewings.github.io. Confirmed using MDX.

pomb.us. Reacts to user scroll.

qubit.donghwi.dev. This isn't a blog; it's a web app that demonstrates key concepts of Quantum Computers. But still interesting.

Unorganized Graphing.​

Trust me, manually fiddling with tags sucks. Tagging posts and organizing them into subdirectories resembles organizing files on your computer. However, you wouldn't want to do this if you have thousands of posts, and the borders between categories get blurry. What if a post has two properties? Which becomes the primary tag and which the secondary?

Notable trend: Gen Z doesn't organize folders anymore! The recent trend, I would say, is dumping everything into a mega folder and searching for things whenever needed. I also used to organize folders a lot more, but as search tools like Spotlight and Alfred have improved, I don't see the need to manage them all by hand, considering I always pull up those search commands to open a file anyway.

You don't need to manually organize all of the files when algorithms can read all the texts and organize them for you! Use algorithmic inspection to analyze how the posts interrelate.

Velog.io, the Korean version of dev.to, links relevant posts for every post.

Therefore, I want clusters of posts classified not by me but by bots and algorithms. WordPress also has a plugin for this. This is similar to backlinking, which most so-called digital brains such as Obsidian and Dendron do.

Example of backlinking from Dendron

I agree with the importance of interlinking knowledge crumbs, but I can't entirely agree with their method. Manually linking posts is inconsistent and troublesome; it can only be done on a massive communal scale, like Wikipedia. You cannot apply the same logic to individual digital brain systems.
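As a rough illustration of what "classified by algorithms" could look like, here is a minimal sketch that ranks related posts by bag-of-words cosine similarity. Everything in it is an assumption made for illustration: the function names, the sample posts, and the ASCII-only tokenizer (which would not work for non-alphabetical documents and would need a proper tokenizer there). It is not how this site or any of the tools mentioned above actually work.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_posts(posts: dict[str, str], slug: str, top_n: int = 3) -> list[str]:
    """Return up to top_n other posts, most similar first, skipping zero scores."""
    vectors = {key: Counter(tokenize(body)) for key, body in posts.items()}
    scores = [
        (cosine_similarity(vectors[slug], vec), other)
        for other, vec in vectors.items()
        if other != slug
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [other for score, other in scores[:top_n] if score > 0]


# Hypothetical posts, only to show the call shape.
posts = {
    "braille-video": "convert video frames into braille subtitle text",
    "sami-subtitles": "write subtitle text files in the sami format",
    "emoji-fonts": "choosing emoji fonts for a website",
}
print(related_posts(posts, "braille-video"))
```

A real system would likely weight rare terms higher (for example with TF-IDF) or use embeddings, but the idea is the same: the links come from the text, not from manual brackets.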

SEO and Open Graphs​

Precis Bots for Meta description​

The same technique I use for crosslinking can also power TL;DR bots that write the meta tag descriptions.

Automatic Open Graph Image Insertion​

For example, GitHub creates automatic open graph images with their metadata.

Example open graph image from GitHub

There are quite a few services using this technique. GitHub wrote an excellent post on implementing this feature. I also tried to implement this on top of Ghost CMS, but I gave up after figuring out that the Ghost core engine itself would need to support it. However, I have created a fork that I can extend later. http://og-image.cho.sh/

GitHub - anaclumos/cho-sh-og-image: Open Graph Image as a Service - generate cards for Twitter, Facebook, Slack, etc

Multilanguage​

Proper multilanguage support​

Automatic language detection. The baseline is to reduce my workload: I write random things, and the algorithm automatically organizes the corresponding data. This also means proper hreflang tags and HTTP content negotiation. I have found no service that uses these properly (outside of megacorporate i18n products).
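To make the content negotiation part concrete, here is a minimal sketch of picking a language from the HTTP Accept-Language header. The function names, the default language, and the list of available languages are assumptions for illustration; a production server would normally rely on its framework's negotiation support instead.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without q= has quality 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        entries.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal-quality entries keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def pick_language(header: str, available: list[str], default: str = "en") -> str:
    """Pick the best available language, falling back to the primary subtag."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in available:
                return candidate
    return default


print(pick_language("ko-KR,ko;q=0.9,en;q=0.8", ["en", "ko"]))
```

The chosen language can then decide which translation to serve, while hreflang link tags advertise the other language versions to crawlers.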

Translations​

At this point, I might write one English post and let Google Translate do the heavy lifting. Also, I can get contributions from GitHub.

While supporting multilanguage and translations, I want to put some 3D WebGL globe graphics. Remember infrastructure.aws in 2019? It used to show an awesome 3D graphic of AWS's global network. AWS Edge Cloud Continuum

I kind of want this back too. Meanwhile, this looks nice:

Also made some contributions...

Fonts and Emoji​

I want to go with the standard SF Pro series with a powerful new font Pretendard.

font-family:
  ui-sans-serif,
  -apple-system,
  BlinkMacSystemFont,
  'Apple SD Gothic Neo',
  Pretendard,
  system-ui,
  -system-ui,
  sans-serif,
  'Apple Color Emoji';

However, I am exploring other options. For emoji, I liked Tossface's bold attempt to infuse Korean values into the Japan-originated emoji system. (lol, but they canceled it.)

Tossface Original Emojis

Honestly, I want this back. They could use the Unicode Private Use Area. But Toss seems too lazy to do that, considering they still haven't made a WOFF version of the webfont. So I might use Twemoji.

Domains and Routes​

URL Structures​

Does URL structure matter for SEO? I don't think so, as long as the exhaustive URL list is provided through sitemap.xml. For SEO purposes (although I still doubt the effectiveness), automatically inserting the URLified title at the end might help (like Notion).

Nameless routes​

Autolinks with alphanumeric IDs | GitHub Changelog

I don't like naming routes like cho.sh/blog/how-to-make-apple-music-clone. What if I need to update the title and want to update the URL structure? Changing the URL structure affects SEO, so I would need to stick to the original URL even after changing the entity title. But then the title and URL would be inconsistent. Therefore, I would give each interconnected entity a UID in the form of a hash. Maybe the randomized hash UID could be a color hex that doubles as the theme color for the entity? Emoji routes seem cool, aye? I would also need the Web Share API since Chrome doesn't support copying Unicode URLs. Some candidates I am thinking of:

  • cho.sh/♥/e5732f/ko
  • cho.sh/🧠/e5732f/en
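One way to sketch the "hash UID that doubles as a color hex" idea: derive six hex characters from a stable seed. The seed choice (for example, an internal ID or creation timestamp rather than the title), the use of SHA-256, and the function names are all assumptions for illustration. Note that six hex characters give only about 16.7 million values, so a real system would need a collision check.

```python
import hashlib


def entity_uid(seed: str) -> str:
    """Return a 6-character hex UID derived from a stable seed string."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:6]


def entity_route(seed: str, lang: str, prefix: str = "🧠") -> str:
    """Build a title-independent route such as cho.sh/🧠/<uid>/<lang>."""
    return f"cho.sh/{prefix}/{entity_uid(seed)}/{lang}"


def theme_color(seed: str) -> str:
    """Reuse the UID as a CSS hex color for the entity."""
    return f"#{entity_uid(seed)}"


# Hypothetical seed: an internal ID that never changes when the title does.
seed = "entity-2022-06-01T09:00:00Z"
print(entity_route(seed, "ko"), theme_color(seed))
```

Because the UID comes from the seed and not the title, renaming a post changes neither its route nor its theme color.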

Also found that Twitter doesn't support Unicode URLs.

Miscellany​

Headline for Outdated Posts​

There should be a method to penalize old posts; they should exist in the database but wouldn't appear as much on the data chain. i.e., put a lifespan or "valid until" for posts.
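A simple way to express such a penalty is an exponential decay on the post's age, with an evergreen flag that opts out. This is only a sketch: the half-life of one year, the function name, and the idea of multiplying a ranking score by this factor are assumptions, not a description of an existing system.

```python
from datetime import date


def freshness(
    published: date,
    today: date,
    half_life_days: float = 365.0,
    evergreen: bool = False,
) -> float:
    """Return a weight in (0, 1]: 1.0 for new or evergreen posts, halving every half-life."""
    if evergreen:
        return 1.0
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


# A post published exactly 365 days ago gets half the weight of a new one.
print(freshness(date(2022, 1, 1), date(2023, 1, 1)))
```

A "valid until" date would be the hard-cutoff variant of the same idea: the weight stays at 1.0 until the date and drops afterwards.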

홍민희 λΈ”λ‘œκ·Έ

홍민희 λΈ”λ‘œκ·Έ

Kat Huang

Footnotes​

An excellent addition, but not necessary. If I ever have to make a footnote system, I want to make it hoverable, which namu.wiki did a great job of. I do not want it to jump down to the bottom with a cringy ↩️ icon to link back.

ToC​

A nice addition. But not necessary.

Comments​

Will go with Giscus.

Heads Up!
  • I wrote this post more than 2 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

I recently saw this Gist and Interactive Page, so I thought it would be cool to update it for the 2020s. This can serve as a visualization of how fast a modern computer is.

How to read this calendar​

Imagine 1 CPU cycle took 1 second. In reality, a modern 4.0 GHz CPU has a cycle time of approximately 0.25 ns, a 4,000,000,000-fold difference. Now, imagine how one second of real life would feel to that CPU.

| Action | Physical Time | CPU Time |
| --- | --- | --- |
| 1 CPU Cycle | 0.25 ns | 1 second |
| L1 cache reference | 1 ns | 4 seconds |
| Branch mispredict | 3 ns | 12 seconds |
| L2 cache reference | 4 ns | 16 seconds |
| Mutex lock | 17 ns | 68 seconds |
| Send 2KB | 44 ns | 2.93 minutes |
| Main memory reference | 100 ns | 6.67 minutes |
| Compress 1KB | 2 μs | 2.22 hours |
| Read 1MB from memory | 3 μs | 3.33 hours |
| SSD random read | 16 μs | 17.78 hours |
| Read 1MB from SSD | 49 μs | 2.27 days |
| Round trip in the same data center | 500 μs | 23.15 days |
| Read 1MB from the disk | 825 μs | 38.20 days |
| Disk seek | 2 ms | 92.60 days |
| Packet roundtrip from California to Seoul | 200 ms | 25.35 years |
| OS virtualization reboot | 5 s | 633 years |
| SCSI command timeout | 30 s | 3,802 years |
| Hardware virtualization reboot | 40 s | 5,070 years |
| Physical system reboot | 5 m | 38,026 years |
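The conversion behind the table is a single multiplication by the clock frequency. Here is a minimal sketch; the helper names and the 365.25-day year are my assumptions (the year length was chosen because it reproduces the 25.35 years figure).

```python
CLOCK_HZ = 4.0e9  # 4.0 GHz: one cycle takes 0.25 ns


def cpu_seconds(physical_seconds: float) -> float:
    """Scale a real duration so that one CPU cycle lasts one second."""
    return physical_seconds * CLOCK_HZ


def humanize(seconds: float) -> str:
    """Format a duration using the largest unit that fits."""
    units = [
        ("years", 365.25 * 86_400),
        ("days", 86_400),
        ("hours", 3_600),
        ("minutes", 60),
    ]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.2f} {name}"
    return f"{seconds:.0f} seconds"


print(humanize(cpu_seconds(100e-9)))  # main memory reference
print(humanize(cpu_seconds(200e-3)))  # California to Seoul round trip
```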
Heads Up!
  • I wrote this post more than 3 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

OK, I admit. The title is slightly misleading. You are reading a technical post about converting any video into an ASCII Art text stream that one can play on the terminal. The text stream here is a subtitle file. You can use any video player or terminal program to parse and display subtitles to play the music video. But the playing part is out of the scope of this post. Still don't get it? Here's a demo:

Enable subtitles and wait for a couple of seconds. If the video errors out, check out the following screen recording:

My text streams use braille to represent pixels. And to display consecutive streams of texts paired with music playback, what would be more suitable than the subtitle format? Therefore, I aim to convert any video into a YouTube subtitle. The tech stack is:

  • OpenCV (Python cv2): used to convert video into frames
  • Python Imaging Library (Pillow for Python 3): used to convert frames into ASCII art (braille)
  • Python Standard Library (sys, os, pathlib): used to read and write files
  • ffmpeg (optional): used to pack everything into a video

Open-sourced on GitHub: anaclumos/video-in-dots.

note

Technically, braille characters are not ASCII characters. They are Unicode, but let's not be too pedantic.

Design​

We need to first prove the concept (PoC) that the following technologies achieve our goal:

  1. Converting any image into a monochrome image
  2. Converting any monochrome image into ASCII art
  3. Converting any video into a series of images
  4. Converting any frames into a series of ASCII art and then packaging them into a subtitle file.
  5. (Figured out later) Compressing the subtitle files under a specific size.
  6. (Figured out later) Dithering the images to improve the quality of the ASCII art.

1. Converting images into monochrome images​

A monochrome image is an image with 1-bit depth, comprised of #000000 and #FFFFFF colors. Note that grayscale images are not monochrome images: grayscale images also contain a wide range of grays between #000000 and #FFFFFF. We can use pure black and white to represent the lowered and raised dots of the braille characters and visually distinguish borders and shapes. Therefore, we first convert an image into a grayscale image and then convert that into a 1-bit image. One detail to note is that subtitles are usually white, so we want the white pixel in the monochrome image to represent 1, the raised dot in braille.

As you can see in the right three images, you can represent any image with border and shape with pure black and white. DemonDeLuxe (Dominique Toussaint), CC BY-SA 3.0, via Wikimedia Commons.

The leftmost image has 256 shades of gray, and the right three images have only two colors (black and white), each produced by a different monochrome conversion algorithm. I used the Floyd-Steinberg dithering algorithm in this project.

Converting the image​

There are many ways to convert an image into a monochrome image. However, this project only uses the sRGB color space, so I used the CIE 1931 sRGB luminance conversion (see Wikipedia). Sounds fancy, but it's just a formula:

def grayscale(red: int, green: int, blue: int) -> int:
    return int(0.2126 * red + 0.7152 * green + 0.0722 * blue)

red, green, and blue are the RGB values of the pixel, represented as integers from 0 to 255. If their weighted sum (the value grayscale returns) goes over hex_threshold, the pixel is white (1); otherwise, it is black (0). We can now run this code for every pixel. This grayscale code is only for understanding the fundamentals. In practice, we will use PIL's convert function to turn the image into a monochrome image; with mode "1", it also applies Floyd-Steinberg dithering by default.

resized_image_bw = resized_image.convert("1")  # apply dithering
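For comparison, the plain thresholding described above (without dithering) can be sketched in pure Python. The threshold value of 128 and the nested-list pixel format are assumptions for illustration; the post names hex_threshold but does not state its value.

```python
def grayscale(red: int, green: int, blue: int) -> int:
    """Relative luminance of an sRGB pixel, as in the formula above."""
    return int(0.2126 * red + 0.7152 * green + 0.0722 * blue)


def to_monochrome(
    pixels: list[list[tuple[int, int, int]]],
    hex_threshold: int = 128,  # assumed value, not taken from the project
) -> list[list[int]]:
    """Map every RGB pixel to 1 (white, raised dot) or 0 (black)."""
    return [
        [1 if grayscale(*rgb) > hex_threshold else 0 for rgb in row]
        for row in pixels
    ]


row = [(255, 255, 255), (0, 0, 0), (200, 200, 200), (30, 30, 30)]
print(to_monochrome([row]))
```

Thresholding treats each pixel independently, so smooth gradients collapse into flat black or white areas. Dithering spreads the rounding error to neighboring pixels, which is why the project relies on convert("1") instead.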

2. Converting any monochrome image into arbitrary-sized ASCII arts​

The above sentence has three parts. Let's break them down.

  1. Converting any monochrome image into
  2. Arbitrary-sized
  3. ASCII arts

We figured out the first, so now let's explore the second.

Resizing images with PIL​

We can use the following code to resize an image in PIL:

from PIL import Image


def resize(image: Image.Image, width: int, height: int) -> Image.Image:
    # height == 0 means "derive the height from the aspect ratio"
    if height == 0:
        height = int(image.height / image.width * width)
    # round both dimensions down to a whole number of braille cells
    if height % braille_config.height != 0:
        height = int(braille_config.height * (height // braille_config.height))
    if width % braille_config.width != 0:
        width = int(braille_config.width * (width // braille_config.width))
    return image.resize((width, height))

I will use two-by-three braille characters, so I slightly adjust the size of the image to make the width divisible by 2 and the height divisible by 3.

Converting the image​

Seeing the image will help you better understand. For example, let's say we have the left image (6 by 6). We would cut the image into two-by-three pieces and convert each piece into a braille character.

Left → Right

The key here is to find the correct braille character to represent the two-by-three piece. A straightforward approach is to map all the two-by-three pieces into an array, especially since two-by-three braille characters only have 64 different combinations. But we can do better by understanding how Unicode assigns the character codes.

Note: Braille Patterns from Wikipedia and Unicode Tables

To convert a two-by-three piece into a braille character, I made a simple util function. This code uses the above logic to resize the image, convert it into braille characters, and color them on the terminal. You can color the terminal output with "\033[38;2;{};{};{}m{}\033[38;2;255;255;255m".format(r, g, b, chr(output)). For more information, see ANSI Color Escape Code. If you want to try it out, here is the code: anaclumos/tools-image-to-braille
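The Unicode trick mentioned above is that the Braille Patterns block starts at U+2800 and every dot is one bit of the offset: dots 1, 2, 3 (left column, top to bottom) are 0x01, 0x02, 0x04, and dots 4, 5, 6 (right column) are 0x08, 0x10, 0x20. The sketch below is my own minimal version of that mapping, not the code from the linked repository; the function names and the nested-list input format are assumptions.

```python
BRAILLE_BASE = 0x2800

# DOT_BITS[row][col] is the bit that raises the dot at that position.
DOT_BITS = (
    (0x01, 0x08),
    (0x02, 0x10),
    (0x04, 0x20),
)


def piece_to_braille(piece: list[list[int]]) -> str:
    """Convert a 3-row by 2-column block of 0/1 pixels into one braille character."""
    offset = 0
    for row in range(3):
        for col in range(2):
            if piece[row][col]:
                offset |= DOT_BITS[row][col]
    return chr(BRAILLE_BASE + offset)


def image_to_braille(pixels: list[list[int]]) -> list[str]:
    """Convert a 0/1 image whose width is divisible by 2 and height by 3."""
    lines = []
    for top in range(0, len(pixels), 3):
        line = ""
        for left in range(0, len(pixels[0]), 2):
            piece = [row[left:left + 2] for row in pixels[top:top + 3]]
            line += piece_to_braille(piece)
        lines.append(line)
    return lines


print(image_to_braille([[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0]]))
```

With this, no 64-entry lookup table is needed: the bit pattern of the piece is the character code offset.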

tip

This code uses an ANSI True Color profile with 16M colors. At the time of writing, the built-in macOS Terminal did not support 16M colors; it only supported 256. You can use iTerm2 or VS Code's integrated terminal to see the full color.

3. Converting any video into a series of images​

I planned to experiment with different dimensions on the same frames, so I wanted to cache the frames on disk as image files. I decided to use Python OpenCV to do this.

  1. Set basic configurations and variables.
  2. Read the video file.
  3. Create a directory to store the images.
  4. Loop through the video frames.
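The four steps above can be sketched as follows. This is not the project's actual script: the file naming scheme and function names are assumptions, and it requires the opencv-python package, which is imported lazily so that the naming helper can be used without it.

```python
from pathlib import Path


def frame_path(output_dir: Path, index: int) -> Path:
    """Hypothetical naming scheme: zero-padded so files sort in frame order."""
    return output_dir / f"frame_{index:06d}.png"


def extract_frames(video_path: str, output_dir: str) -> int:
    """Write every frame of the video as a PNG and return the frame count."""
    import cv2  # opencv-python; imported here so frame_path works without it

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # step 3: create the cache directory

    capture = cv2.VideoCapture(video_path)  # step 2: read the video file
    count = 0
    try:
        while True:  # step 4: loop through the frames
            ok, frame = capture.read()
            if not ok:  # no more frames (or the file could not be read)
                break
            cv2.imwrite(str(frame_path(out, count)), frame)
            count += 1
    finally:
        capture.release()
    return count
```

Calling extract_frames("video.mp4", "frames") would fill the frames directory once; later experiments with different sizes can then read the cached images instead of decoding the video again.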

An example screenshot. I didn't use GPU acceleration, so it took about 19 minutes. I could've optimized this, but this function runs only once for any video, so I didn't bother.

4. Convert text streams into formalized subtitle files​

I already had the braille conversion tool from section 2; now, I needed to run this function for every cached image. I first tried to use the .srt (SubRip) format. The .srt file looks like this:

1
00:01:00,000 --> 00:02:00,000
This is an example
SubRip caption file.

The first line is the sequence number, and the second is the time range in the Start --> End format ( HH:mm:ss,SSS ). The remaining lines are the subtitle text itself. I chose SubRip because it supported colored subtitles.
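As a small illustration of the format, the following sketch builds one SubRip cue from millisecond timestamps. The function names are my own; only the layout (sequence number, time range, text) comes from the format described above.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as HH:mm:ss,SSS."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one cue: sequence number, time range, then the subtitle text."""
    return f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"


print(srt_cue(1, 60_000, 120_000, "This is an example\nSubRip caption file."))
```

Cues are separated by a blank line, so a full file is these blocks joined with "\n".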

It turned out that SubRip's text stylings are non-standard. Source: en.wikipedia.org

I made several SubRip files with different colors, but YouTube wouldn't recognize the colors; it turned out SubRip's color styling is nonstandard.

Types of subtitles YouTube supports​

No style info (markup) is recognized in SubRip.

Simple markups are supported in SAMI.

The YouTube docs show the above table. I figured that SAMI files supported simple markup, so I used SAMI. (Oddly enough, I am very familiar with SAMI because .smi is the standard file format for Korean subtitles.) Creating subtitles was simple because it only means appending text to a file in a specific format, so the switch didn't require many code changes. The Microsoft docs show the structure of SAMI files.

<SAMI>
<HEAD>
<STYLE TYPE = "text/css">
<!--
/* P defines the basic style selector for closed caption paragraph text */
P {font-family:sans-serif; color:white;}
/* Source, Small, and Big define additional ID selectors for closed caption text */
#Source {color: orange; font-family: arial; font-size: 12pt;}
#Small {Name: SmallTxt; font-size: 8pt; color: yellow;}
#Big {Name: BigTxt; font-size: 12pt; color: magenta;}
/* ENUSCC and FRFRCC define language class selectors for closed caption text */
.ENUSCC {Name: 'English Captions'; lang: en-US; SAMIType: CC;}
.FRFRCC {Name: 'French Captions'; lang: fr-FR; SAMIType: CC;}
-->
</STYLE>
</HEAD>
<BODY>
<!-- The closed caption text displays at 1000 milliseconds. -->
<SYNC Start = 1000>
<!-- English closed captions -->
<P Class = ENUSCC ID = Source>Narrator
<P Class = ENUSCC>Great reason to visit Seattle, brought to you by two out-of-staters.
<!-- French closed captions -->
<P Class = FRFRCC ID = Source>Narrateur
<P Class = FRFRCC>Deux personnes ne venant la r&eacute;gion vous donnent de bonnes raisons de visiter Seattle.
</BODY>
</SAMI>

You can see it's just a simple HTML-like markup file. Looking closely, you can also see how multi-language subtitles are handled in one SAMI file.
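Since a SAMI file is plain text, generating one is mostly string concatenation. The sketch below is a minimal writer under my own assumptions: the function names, the trimmed-down header, and the KOKRCC class name (which mirrors the ENUSCC/FRFRCC naming in the sample and matches the output shown later in this post) are illustrative, not the project's exact code.

```python
def sami_sync(start_ms: int, rows: list[str], css_class: str = "KOKRCC") -> str:
    """One SYNC block: every text row is followed by a <BR>."""
    body = "".join(f"{row}<BR>" for row in rows)
    return f"<SYNC Start={start_ms}><P Class={css_class}>{body}</SYNC>"


def sami_document(frames: list[tuple[int, list[str]]], css_class: str = "KOKRCC") -> str:
    """Wrap (start_ms, rows) frames in a minimal SAMI skeleton."""
    head = (
        "<SAMI>\n<HEAD>\n<STYLE TYPE=\"text/css\">\n<!--\n"
        "P {font-family: monospace; color: white;}\n"
        f".{css_class} {{Name: 'Korean Captions'; lang: ko-KR; SAMIType: CC;}}\n"
        "-->\n</STYLE>\n</HEAD>\n<BODY>\n"
    )
    body = "\n".join(sami_sync(start, rows, css_class) for start, rows in frames)
    return f"{head}{body}\n</BODY>\n</SAMI>\n"


print(sami_document([(125, ["ab", "cd"]), (250, ["ef", "gh"])]))
```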

5. Compressing the text files​

You would never imagine compressing a text file...

I finally got my hands on the SAMI file, only to discover that it was over 70MB. I couldn't find any official size limit for YouTube subtitles, but empirically, I discovered the file size limit was around 10MB. So I needed to compress the files.

I thought of three ways to compress the files:

  1. Reduce the width and height.
  2. Skip some frames.
  3. Use color stacks.

I already separated the configurations from the main code, so I could easily change the width, height, and frame rate. However, after many experiments, I figured that YouTube only supports 8 to 10 frames per second for subtitles, so I decided to skip some frames to reduce the file size.

class braille_config:
    # 2 * 3 braille
    base = 0x2800
    width = 2
    height = 3


class video_config:
    width = 56
    height = 24
    frame_jump = 3  # jumps 3 frames
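To show how frame skipping relates to subtitle timing, here is a small sketch. It assumes that frame_jump = 3 means "keep every third frame" and that the source video runs at 24 frames per second; neither is stated in the post, although both are consistent with the 125 ms spacing (8 frames per second) of the SYNC blocks in the final output.

```python
def kept_frames(total_frames: int, frame_jump: int) -> list[int]:
    """Indices of the frames that survive when keeping every frame_jump-th frame."""
    return list(range(0, total_frames, frame_jump))


def start_ms(frame_index: int, source_fps: float) -> int:
    """Timestamp of a source frame in milliseconds."""
    return round(frame_index * 1000 / source_fps)


frames = kept_frames(total_frames=10, frame_jump=3)
print([(index, start_ms(index, source_fps=24)) for index in frames])
```

Keeping every third frame of a 24 fps video yields 8 subtitle frames per second, which sits inside the 8 to 10 range that YouTube appeared to handle.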

What I mean by "color stacks" is that I keep pushing consecutive characters of the same color onto a stack and flush them as a single tag only when the color changes. Let's take a look at the original SAMI file:

<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<FONT color="#FFFFFF">β Ώ</FONT>
<!-- Text Length: 371 -->

Although they are all the same color, the code appended the color tag for every character. Therefore, I can reduce the repetition by using color stacks:

<FONT color="#FFFFFF">⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿</FONT>
<!-- Text Length: 41. Reduced by 89% -->

It's not the complete-search-maximal-compression you usually see when Leetcoding, but it's still an excellent compression to make it under 10MB. This simple algorithm is especially good when you have black-and-white videos.
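The color-stack idea is essentially run-length grouping by color. A minimal sketch using itertools.groupby follows; the function name and the (color, character) input format are assumptions for illustration, not the project's actual data structures.

```python
from itertools import groupby


def compress_row(cells: list[tuple[str, str]]) -> str:
    """Merge consecutive (color, char) cells of the same color into one FONT tag."""
    parts = []
    for color, run in groupby(cells, key=lambda cell: cell[0]):
        text = "".join(char for _, char in run)
        parts.append(f'<FONT color="{color}">{text}</FONT>')
    return "".join(parts)


white_row = [("#FFFFFF", "⠿")] * 12
print(compress_row(white_row))
print(len(compress_row(white_row)))
```

For a row of twelve white characters, this produces one tag of 41 characters, matching the compressed example above. Rows that change color often benefit less, since every color change still opens a new tag.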

<SYNC Start=125><P Class=KOKRCC><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR></SYNC>
<SYNC Start=250><P Class=KOKRCC><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR></SYNC>
<SYNC Start=375><P Class=KOKRCC><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR></SYNC>
<SYNC Start=500><P Class=KOKRCC><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR></SYNC>
<SYNC Start=625><P Class=KOKRCC><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR><FONT color="#FFFFFF">β Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώβ Ώ</FONT><BR></SYNC>

The file completed so far (No Dithering)

6. Dithering

I uploaded the file I had created so far, but something was off. The problem seemed to be how mobile devices render braille characters: an empty braille character appeared as empty circles on computers but as a blank space on mobile devices. (Maybe for legibility?) Resolving this required one more modification: dithering.

Mobile devices show space instead of an empty circle. On the left, you can see almost no details, but on the right, you can see more gradients and details. The right one is the dithered version. Dithering especially shines when you have a black background or color gradients.

The original image from the video. BTS Jimin

Dithering is a technique that compensates for the quality lost when converting an image to a lower color depth by deliberately adding noise to the picture. Let me explain it with an example from Wikipedia:

The first image uses 16M colors, and the second and third use 256 colors. Dithered images use compressed color space, but you can feel the details and gradients. Image from en.wikipedia.org

Can you see the difference between the second and third images? Both use 256 colors, but the third image shows more detail and smoother gradients. By placing pixels carefully, dithering represents the image far more faithfully within the same limited palette.

Dithering is also used in GIF conversion, which is why most GIF images show dotted patterns. Digital artifacts are related to dithering as well: converting an image to a lower color depth loses detail, and an image that has been dithered repeatedly accumulates visible artifacts. (Of course, digital artifacts have many other causes. See dithering and color banding for more information.)

Monochrome conversion also requires dithering because we are compressing the 16M color space into two colors. We can do this with the PIL library mentioned above.

resized_image_bw = resized_image.convert("1")  # apply dithering
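
Pillow's convert("1") dithers with the Floyd-Steinberg algorithm by default. To show roughly what that involves, here is a minimal sketch of Floyd-Steinberg error diffusion in TypeScript. It is an illustration rather than Pillow's implementation: the function name ditherToMonochrome, the 128 threshold, and the plain number[][] grayscale input are all assumptions made for this example.

```typescript
// Floyd-Steinberg error diffusion on a grayscale image (values 0..255).
// Returns a same-sized grid of 0 (black) and 1 (white).
// Hypothetical helper for illustration; not part of PIL/Pillow.
function ditherToMonochrome(gray: number[][]): number[][] {
  const height = gray.length;
  const width = height > 0 ? gray[0].length : 0;
  // Work on a copy so the caller's data is not modified.
  const work = gray.map((row) => row.slice());
  const out: number[][] = [];

  // Add a share of the quantization error to a neighbour, if it exists.
  const spread = (y: number, x: number, amount: number): void => {
    if (y >= 0 && y < height && x >= 0 && x < width) {
      work[y][x] += amount;
    }
  };

  for (let y = 0; y < height; y++) {
    const outRow: number[] = [];
    for (let x = 0; x < width; x++) {
      const oldValue = work[y][x];
      // Snap to the nearer of the two available colors (assumed threshold: 128).
      const newValue = oldValue >= 128 ? 255 : 0;
      const error = oldValue - newValue;
      outRow.push(newValue === 255 ? 1 : 0);

      // Push the error onto pixels that have not been visited yet.
      spread(y, x + 1, (error * 7) / 16);
      spread(y + 1, x - 1, (error * 3) / 16);
      spread(y + 1, x, (error * 5) / 16);
      spread(y + 1, x + 1, (error * 1) / 16);
    }
    out.push(outRow);
  }
  return out;
}
```

Because each pixel hands its rounding error to its neighbours, a flat mid-gray area comes out as a mix of black and white dots instead of a single solid color, which is what keeps gradients visible in the braille output.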

Let us check this in action.

Can you perceive the difference, especially from 1:33?

Results​

I completed the project and uploaded the video to YouTube. I plan to study computer graphics and image processing further. If you are interested in this topic, please check out my previous post: How Video Compression Works

Butter​

Fiesta​

Added 2021-07-09: Irregular Subtitle Specs?​

I tested the subtitle file on the YouTube app on iOS/iPadOS, and macOS Chrome, Firefox, and Safari. However, I heard that the subtitle file does not work on some devices, like the Android YouTube app and Windows Chrome. I have attached a screen recording of the subtitle file on macOS 11 Chrome 91. You can expect the subtitle file to work when using an Apple device.

I also made the screen recording in 8K to show crisp dots in motion 😉

Heads Up!
  • I wrote this post more than 3 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

Woowa Tech Camp 3rd Review

It may seem late to write a review in 2021 for something that ended in August 2020, but with the 4th recruitment underway, I felt that if I didn't post it now, I might never do it. Most of the information available online only touches on the camp briefly, so I will focus on the things I was curious about when I applied. Woowa Tech Camp is a programmer training course in which participants spend the summer as interns at Woowa Brothers, the tech unicorn that operates Baedal Minjok (배달의민족, a.k.a. Baemin), studying development close to real-world practice. 30 people are selected, with a competition ratio of approximately 43 to 1.

No shoes allowed - the so-called Sushi Bar Lounge. It's first-come, first-served due to the great view.

🔋 Selection Process

Application​

Each question had a 700-character limit.

  • What do you think are the virtues of a developer, and in light of this, what aspects of yourself do you think make you suitable to work as a developer?
  • Please freely describe why you want to participate in Woowa Tech Camp.
  • If you have your own programming learning method outside of the curriculum, please describe it.
  • Describe an experience where you faced difficulties in the collaboration process and what efforts you made to overcome those difficulties.

2020 3rd Recruitment Banner

1st Coding Test​

The problems were typical coding test questions. I remember solving them in JavaScript, and since I had practiced a lot for coding tests due to Programmers Summer Coding and Woowa Tech Camp at the time, the difficulty level wasn't too burdensome. There were a total of 4 problems with a time limit of 150 minutes.

2nd Coding Test​

It was a project to develop an admin tool that performs specific functions on a provided web-based VS Code platform. The basic boilerplate, build configuration, and CI/CD were pre-implemented, so it could be run quickly as described in the README, and we had to implement 3 core features on top of that. External libraries were prohibited, so everything had to be solved with vanilla JS. The time limit was fairly long at 4 hours, but it still felt insufficient. I will omit the detailed problem-solving strategy because the staff said it couldn't be shared... 😭

Interview​

Due to COVID-19, it was conducted online for 30 minutes. Since it wasn't a developer recruitment interview, rather than asking in-depth technical questions, they mainly asked about whether we had a solid foundation as programmers, were ready to learn at Woowa Tech Camp, and could be good camp members. In my case, I mentioned this technical blog in my application, and they asked detailed questions about one specific post.

Competition Ratio​

I was also curious about the competition ratio, and according to what I learned later, it was as follows:

  • Total applicants: 1,300+ (more than 43 times the final intake)
  • Passed application and 1st coding test: 500+ (more than 17 times)
  • Passed 2nd coding test: 90 (3 times)
  • Passed interview and final acceptance: 30

The interview room at Woowa Brothers. Originally, the interviews were supposed to be held here, but due to the spread of COVID-19, they were conducted via Google Meet.

🏫 Curriculum​

  1. Orientation Period (3 days)
  • Mini Project: Implementing a web server without Express and HTTP.
  • Study Keywords: Node.js, JS OOP, Asynchronous Programming, Async Cafe, HTTP Specification, HTTP Basics.
  • Our Team's GitHub
  • My Blog Post

Small House where the orientation took place. Located in Jamsil.

  1. Baemin Mart Project implementing login (1 week)
  • Conditions
    • Use only vanilla JavaScript.
    • Implement authentication directly without using authentication systems like Passport.
    • Implement the DB directly using the file system without using a commercial DB.
  • Study Keywords: HTML, CSS, CSS Layout, Express.
  • Our Team's GitHub

Big House which became the main stage for Woowa Tech Camp. Located a 12-second walk from Mongchontoseong Station.

  1. Trello Project directly implementing a Kanban board (2 weeks)
  • Conditions
    • Use only vanilla JavaScript.
    • Directly configure and utilize Webpack.
    • Drag & drop must be implemented, but it should be done directly using event bubbling, event capture, and event delegation without the HTML Drag and Drop API.
  • Study Keywords: Webpack, ES Module, DOM API, Templating, Fetch-Promise pattern, JS Event Delegation, DBMS, MySQL, SQL Syntax.
  • Our Team's GitHub

The cafe on the 18th floor of the Big House and the view of Olympic Park from the cafe.

  1. Bank Salad Project directly implementing a household ledger app (2 weeks)
  • Conditions
    • Use only vanilla JavaScript.
    • Directly implement a single-page application using vanilla JavaScript and the History API.
    • Implement CI/CD directly without using commercial solutions.
    • Implement OAuth.
    • Draw graphs using SVG, canvas, etc.
  • Study Keywords: Observer Pattern, ERD, OAuth, Passport, State Management, Immutability, Transactions, Shell Scripts, CI/CD, CSS Animations & Optimizations (requestAnimationFrame & requestIdleCallback), SVG, Canvas.
  • Our Team's GitHub

We occasionally did pair programming. The code you see now is...

  1. B Mart Project directly implementing Baedal Minjok's B Mart (3 weeks)
  • Conditions
    • Use Vanilla React.
    • Utilize AWS VPC.
    • Utilize S3 image storage.
    • Utilize the Elastic Search, Logstash, Kibana (ELK) combination.
  • Study Keywords: React Hooks, AWS VPC, React Router, React Context API, React useReducer, AWS IAM, AWS S3, React Test Codes (Jest, Enzyme, ...), Elastic Search, Logstash, Kibana, ELK.
  • Our Team's GitHub

During the 4th project, social distancing was raised to level 2, so we had to proceed remotely.

✨ The Good Parts​

First of all, an activity stipend of about 1.5 million won per month and activity equipment (MacBook Pro πŸ’» and monitor πŸ–₯) were provided.

Everyone was loaned a high-end 2019 16-inch MacBook Pro (i9, 16GB RAM, 1TB SSD, Radeon 5500M 4GB GPU). As of the 2020 camp, it was the highest-spec MacBook Pro that could be ordered without configure-to-order (CTO) options. One monitor was provided for every two people, for a total of 15 monitors. The monitors were ThinkVision QHD monitors. I thought there wouldn't be enough monitors, but there were plenty.

👨‍💻 What the heck is (for beginners) good code?

// Load activity log to right sidebar
async function addActivityLogToActivityLogList() {
  let activityLogList = document.getElementById('activity-log-list')
  activityLogList.classList.add('activityLog')
  activityLogList.innerHTML = ''
  let userList = await api.User().getAllUsers()
  userList.reverse()
  console.log('There are currently [', userList.length, '] users.')
  userList.forEach((user) => {
    let activityLog = document.createElement('li')
    activityLog.classList.add('activityLog')
    let date = new Date(moment(user.created_at).format('YYYY-MM-DD HH:mm:ss'))
    activityLog.innerText = user.userId + ' joined on ' + date
    activityLogList.appendChild(activityLog)
  })
}

The original code can be found here.

Uh - this is not in a state to be reviewed. Who wrote this? Let's have a talk.

This was the feedback on my code, which was randomly selected to be shown on screen during the code review session on Friday afternoon, July 25th, at the end of the 2nd project. At the time, I thought I had handled the extreme time pressure quite well and built a page that worked. Receiving such direct criticism left me in shock. It doesn't come across well in writing, but the room truly froze. On the train ride home that day, countless thoughts ran through my head.

After calming down and thinking it over, I realized that a camp where everyone just said, "Okay, okay~ We all did well and worked hard~" wouldn't have been a good camp, just as a good workbook should contain problems that you get wrong. So I decided to make the most of what I was relatively good at and absorb as much as I could during the remaining month.

Juniors interested in studying programming often hear phrases like "clean code" and "good patterns". The problem is that a beginner who hears these phrases too often ends up repeating them as if memorized, without a realistic sense of what level actually counts as good. Looking back at the code above:

  • The code does two tasks at once: ① fetching information and ② displaying it. This makes the code tightly coupled, and tight coupling can lead to major surgery when part of the code needs to be replaced later.
  • Overall, logic and view are mixed in the same file, and readability is poor. Based on this advice, I paid close attention to these patterns from the 3rd project onwards. I turned part of the 3rd project into a mini-project, which should give you a sense of the result:
  • Creating a Calendar with Vanilla JS
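
To make the criticism concrete, here is one way the fetching and displaying in the snippet above could be pulled apart. This is a minimal sketch in TypeScript, not the code from the camp or the mini-project: the User shape, the injected fetchUsers function, and the LogList interface (a stand-in for the DOM list element) are all assumptions introduced for illustration.

```typescript
// Hypothetical shape of a user record; only the fields the original snippet reads.
type User = { userId: string; created_at: string };

// 1) Data access: the only place that knows where users come from.
//    `fetchUsers` is injected, so the API client can be swapped out later.
async function loadUsersNewestFirst(
  fetchUsers: () => Promise<User[]>
): Promise<User[]> {
  const users = await fetchUsers();
  return [...users].reverse(); // copy first so the caller's array is untouched
}

// 2) Pure logic: turn one user into the text of a log entry.
function formatActivityLog(user: User): string {
  return `${user.userId} joined on ${user.created_at}`;
}

// 3) View: the only place that touches the output target.
//    `LogList` is a minimal, assumed stand-in for a DOM element.
interface LogList {
  clear(): void;
  append(text: string): void;
}

function renderActivityLogs(list: LogList, users: User[]): void {
  list.clear();
  for (const user of users) {
    list.append(formatActivityLog(user));
  }
}
```

In a browser, a LogList implementation would wrap the 'activity-log-list' element and create one li per entry. With this split, replacing the API or the markup touches only one of the three functions.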

🛷 Dunning-Kruger Sledding

It may be a bit cliché, but I got to experience the peak of ignorance firsthand. Of course, I never thought I knew everything, but since I had worked on various JavaScript projects, I dared to think, "Of course I'll have to work hard, but won't I be able to keep up without too much difficulty?"

I don't know anything about JS. I'm just a talking potato. You know what I mean?

Naturally, Woowa Tech Camp was extremely challenging. The original goal of the curriculum was to impose constraints on each project and then resolve the regrets caused by those constraints in the next project. For example, after implementing authentication without using Passport, that thirst would be quenched in the next project by using Passport. However, on the flip side, this process occurred every 1-2 weeks, meaning that as soon as you barely grasped the previous technology, you had to immediately move on to the next technology and experience a steep learning curve again. It felt like I experienced the Dunning-Kruger sledding at Woowa Tech Camp. Since I wasn't proficient enough to freely handle JavaScript, I really had to work hard to keep up.

🌎 What does it mean to know in the internet age?​

I also pondered a lot about what it means to know something in an age where search exists. If I limit this to programming, I think I found a bit of an answer. It's the concept of GSPH, which stands for Googling Session Per Hour. A Googling Session refers to a deep search task lasting more than 5 minutes. For example, if you couldn't remember the name of a JavaScript property function and completed the search in 2 minutes, it wouldn't count as a Googling Session, but if OAuth doesn't come to mind easily and you have to look at the documentation for 10 minutes, it would count as a Googling Session.

Search for it... but every time?

When doing a task, if the Googling Sessions per hour are (roughly) 3 or less, it seems you can say you know that concept. In other words, doing short searches in the middle of a task doesn't directly mean you don't know the concept. However, if you have to look up every detail of the task one by one, it means you still need more study.
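
The rule of thumb above can be written down as a small calculation. This is only a sketch of the idea in TypeScript: the function names and the way searches are recorded (a list of durations in minutes) are assumptions, while the 5-minute and 3-per-hour thresholds come from the text.

```typescript
// A search counts as a "Googling Session" only if it lasts more than 5 minutes.
const SESSION_THRESHOLD_MINUTES = 5;
// Rough cut-off from the text: about 3 sessions per hour or fewer.
const KNOWN_THRESHOLD_GSPH = 3;

// searchMinutes: duration of each search made during the task, in minutes.
// taskHours: how long the whole task took, in hours.
function googlingSessionsPerHour(
  searchMinutes: number[],
  taskHours: number
): number {
  if (taskHours <= 0) {
    throw new Error("taskHours must be positive");
  }
  const sessions = searchMinutes.filter(
    (minutes) => minutes > SESSION_THRESHOLD_MINUTES
  ).length;
  return sessions / taskHours;
}

// True when the GSPH is at or below the rough "I know this" threshold.
function seemsToKnow(searchMinutes: number[], taskHours: number): boolean {
  return (
    googlingSessionsPerHour(searchMinutes, taskHours) <= KNOWN_THRESHOLD_GSPH
  );
}
```

For example, in a two-hour task with searches of 2, 10, 1, 7, and 3 minutes, only the 10- and 7-minute searches count as sessions, which gives a GSPH of 1.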

👾 Library ≠ Alien Technology

Sometimes frameworks and libraries are treated like alien technology. Of course, well-known frameworks and libraries are collections of proven, efficient code, but thinking of them as unapproachable alien technology and relying on libraries for all considerations can be a bit dangerous.

A developer doing npm i

In particular, the technology underlying web libraries is plain JavaScript, which we can use too. Throughout the camp, it was emphasized that rather than relying blindly on external libraries, we need to know how a library works and what risks it carries. In other words, you should study a library carefully enough that you could implement something similar yourself if necessary.

Libraries are Terran 🧑‍🔧 technology, not Protoss 👽 technology.

One example is the left-pad incident that occurred in 2016. An 11-line library called left-pad was removed from npm, and as a result, the dependency chain collapsed like dominoes, rendering the babel transpiler unusable. If you think about it, wasn't this problem also caused by excessive reliance on simple code that could be written quickly?

From the perspective of a hobby developer, you might think, "Huh? Babel is a really reliable library used by hundreds of thousands of people. I should focus on the safety of my own code instead of worrying about that." However, for a company that suffers enormous financial losses even if the service is down for just 30 minutes, this consideration is essential. In other words, libraries are not unknowable alien technology, nor are they something we should pray to, and we should keep in mind that they are also services that can be damaged at any time.
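
To give a sense of how small the code at the center of the left-pad incident was, here is a rough TypeScript sketch of what such a utility does. It is written for illustration and is not the package's actual source; the name leftPad and the single-character restriction are assumptions of this sketch.

```typescript
// Pads `value` on the left with `padChar` until it is `length` characters long.
// Strings that are already long enough are returned unchanged.
function leftPad(value: string, length: number, padChar: string = " "): string {
  if (padChar.length !== 1) {
    // Guard added for this sketch: an empty pad string would never terminate.
    throw new Error("padChar must be exactly one character");
  }
  let padded = value;
  while (padded.length < length) {
    padded = padChar + padded;
  }
  return padded;
}
```

Since ES2017, the built-in String.prototype.padStart covers the same need, so leftPad("5", 3, "0") and "5".padStart(3, "0") both produce "005".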

🥳 Fun Experiences

🧩 Crawling Baemin's Image Server

When creating the final B Mart service, there was a time when we needed a huge amount of Baemin's B Mart data. We needed photos to put in the photo slots to give it an app-like feel. Using my previous experience, I (with permission from the staff) scraped the image resources on Baemin's server.

A photo sneakily taken from Baemin's B Mart image server

Strictly speaking, the images are served from a publicly accessible CDN, so this wasn't hacking Baemin's server. The problem was that the endpoints and the image addresses are obfuscated.

http://CDNdomain.baemin.com/some/thing/1abcde23-very-long-alphanumeric-address.jpg

The final image CDN URI looks roughly like this, and the image appears when you access it. This wasn't shallow crawling where you simply open the B Mart web view and use CSS selectors, nor did Woowa Tech Camp share internal resource server data with us, so it took quite a bit of effort. Briefly, I intercepted the iOS Baemin app's traffic to find the endpoints and image addresses, and with a bit of CTF-style digging, I was able to work out the list of image addresses. I scraped about 1,000 images, icons, sound effects, etc. from that image server and shared them in a private repository for other Woowa Tech Camp participants to use.

Crawled result

🏒 Whirlwind Corporate Night History​

We were able to hear behind-the-stories of Korean companies in between. Stories like how a certain game company doesn't use RDB much and uses binary dumps because hundreds of thousands of items are mass-produced, how an incident occurred where someone manipulated the game DB fields to duplicate items worth hundreds of millions of won, so DB access rights management became very strict afterward, how developers at a certain accommodation company could access the personal information of all members, so for a while, developers freely accessed celebrity member information... For me, who is very interested in the corporate ecosystem, these were truly fascinating stories.

The day the neighboring N company's C Foundation's B Camp exploded

⚑️ Synergy x Synergy = Synergy3​

Nevertheless, I think the greatest advantage was meeting other Woowa Tech Camp participants. I kind of understood what it means when people say the best welfare is great colleagues. Most notably, I want to talk about jhaemin. At Woowa Tech Camp, when starting each project, they provide design drafts and planning documents. However, those contents are just recommendations, and the actual implementation can be done freely. In other words, improving the design to enhance usability and aesthetics is entirely up to the camp participants. At first, I thought this design was something that must be followed, but that wasn't the case. In the end, as if hinting that good colleagues are those who find and do the work, everything was freely open. Whether it was improving the design, adding features, or conversely, deleting something, great power was given and great responsibility was taken. In terms of the design mindset of a front-end developer, I was greatly influenced by my fellow camp participant jhaemin. I learned a lot by watching him quickly create usable web apps with his own solid design system. If you directly look at the 2 sites that truly shocked me, you'll probably understand what I mean.

Greatly influenced, I also tried improving the design from the 3rd Bank Salad project. This is how I completely modified the design ↓

The provided household ledger design draft. This draft is just a starting point, and it's entirely up to the camp participants to unleash their creativity from here.

I realized design is a more fun task than I thought (quite satisfied)

In my opinion, the advantages of my design are that ① it makes full use of a wide screen by arranging the screen elements in 3 columns, and ② the activity log window on the right operates independently, so its contents are preserved while you move freely between the calendar, statistics, and payment method management. Kind of like Slack?

There was also no shortage of people to learn from, such as naamoonoo, who created a pseudo-React with vanilla JavaScript; pigrabbit, who got Elastic Search done over the weekend; dnacu, who handled React as comfortably as breathing; younho9, who quickly implemented the SPA structure and singleton pattern with just JS; 0407chan, who systematically implemented the data access strategy; Jenny, who finished the design structure overnight; and more.


🎬 Miscellaneous and Conclusion​

  • I tried in-depth collaboration with Git and GitHub for the first time. When doing 1-person development, there aren't many occasions to properly utilize Git's branch and checkout features. In terms of Git collaboration, I think I learned really well with GSPH < 3.
  • The lectures in between were really good. In the lectures held every Wednesday, interesting aspects of the development and operation of Baemin's services were shown. In particular, the lecture by developer Kim Min-tae was truly impressive.
  • It was great to indirectly experience company life. To be able to experience company life at the age of 21!
  • The kind hyungs and noonas who taught us had a deep understanding. I felt that they were considerate of me. There were also many recreational activities in between, but it was a shame that the recreational activities were reduced due to COVID-19.
  • It was a truly precious experience and I gained a lot of new teachings. If my fundamentals were a bit more outstanding, I could have studied much deeper contents, so I have a bit of regret for not being able to do that. I think it has become a really good asset for my future journey.
  • Of course, I won't be perfect right away. I'm also feeling a bit of a gap between ideals and reality these days. Still, if we align our compass with our ideals and walk forward, I have a bit of faith that we will reach them someday 🧭
📚 More Resources

The End!

note

This document is machine-translated. It is readable, but may contain awkward phrasing. Please let me know if you find any errors!

Recently, I applied to a very competitive on-campus technology start-up club (with a pass rate of around 5%), and the application included the following question.

Here, a meme means a short video that makes you laugh.

It was a club I had always wanted to join, so I agonized over the answer. Everyone has different interests, and what is funny to one group can be offensive to another.

Then I thought, "What if I could create a service that recommends memes based on choices?" Because it would not be a responsive recommendation system (one that dynamically changes its recommendations based on newly accumulated data), the technical complexity did not seem that high. I only had a weekend, though, so I decided to build it quickly.

🎬 Designing the System

First, I listed the videos I had enjoyed watching. (I thought it would be okay to use YouTube instead of TikTok, Twitter, and Instagram.)

I could easily find the videos I thought were funny by searching for the keyword youtu in a group chat room.

I then sorted them by category on a Notion page: Music, Movies, Games, Coding, and General memes.

As I envisioned the choice-based recommendation test, there seemed to be two approaches. One is a system that assigns a weight to each answer, calculates a final score, and recommends a result. The other is to lay out the entire scenario tree and recommend a result according to the combination of choices. Popular MBTI tests use the first, score-based approach. However, I used the second, scenario tree-based approach. Here's why:

The score flag system was too complex to configure.

  • MBTI has simple score flags. Since there are only four (E/I, N/S, T/F, and J/P), it is relatively easy to manage the score state.
  • I already had five categories (Music, Movies, Games, Coding, and General), and each had various subcategories, so it wasn't easy to pin down the flags.
  • For example, I cannot recommend LoL meme videos simply because the game flag score is high: you may not know the rules of LoL, or you may not find the jokes relatable. To fix this, you would need to either add a LoL score flag or set a separate "favorite game" flag.
  • In this case, state management becomes very complicated, and I didn't want to increase the technical difficulty.
  • The design difficulty rises even more than the technical difficulty. Above all, it was tough to plan precisely which score range on each flag should lead to which recommendation. In other words, it is difficult to make a well-fitting meme recommendation from scores alone.
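
For contrast, a score-based test of the kind described above could be sketched roughly as follows. This TypeScript example is hypothetical and only illustrates the bookkeeping involved: the Axis names mirror the four MBTI flags from the list, while the weights, the tie-breaking rule, and the function names are assumptions, not how any real MBTI test scores answers.

```typescript
// The four MBTI-style axes; a positive total leans toward the first letter.
type Axis = "EI" | "NS" | "TF" | "JP";
// Each answer nudges one or more axes by some (assumed) weight.
type Weights = Partial<Record<Axis, number>>;

// Sum the weights of every chosen answer, axis by axis.
function totalScores(chosen: Weights[]): Record<Axis, number> {
  const totals: Record<Axis, number> = { EI: 0, NS: 0, TF: 0, JP: 0 };
  for (const weights of chosen) {
    for (const axis of Object.keys(weights) as Axis[]) {
      totals[axis] += weights[axis] ?? 0;
    }
  }
  return totals;
}

// Map the totals to a four-letter result (ties go to the first letter).
function resultType(totals: Record<Axis, number>): string {
  return (
    (totals.EI >= 0 ? "E" : "I") +
    (totals.NS >= 0 ? "N" : "S") +
    (totals.TF >= 0 ? "T" : "F") +
    (totals.JP >= 0 ? "J" : "P")
  );
}
```

With four fixed axes this stays manageable. Every extra category or subcategory (a game flag, then a flag per game, and so on) would add another axis plus another set of score ranges to map to videos, which is the complexity the list above describes.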

I wanted to make it possible to check all endings.

  • In a choice-based game, you may want to see a different ending by changing only the final decision (especially with meme recommendations like these, as opposed to an MBTI test).
  • But score-based systems usually require restarting the test from scratch, and adding an optional 'undo' action takes more engineering.
  • With a scenario tree, this becomes easier, because undoing a choice only requires navigating to the parent node.
  • As described later, because I used Next.js's Link, simply going back in the browser acts as the undo action.

I wanted to write curated choices rather than generic options.

  • In a score-based system, you can only ask questions and offer answers in a generic form. That is, you cannot ask follow-up questions.
  • I used the scenario tree to make each question and its answers fit each other exactly, so the test feels like an everyday conversation.
  • After all, this system was meant to be attached to my club application.
  • Even if I recommend a funny video, it serves no purpose if you remember only the video and not my name!

In the process, I wanted to give the feeling that I want to join this club!!!

🥞 Choosing the Stack

I didn't worry too much about the front end. I had recently fallen in love with TypeScript and Next.js, so they were a natural choice, and knowing how well Vercel works with Next.js, I decided to host it on Vercel. For styling, I used styled-components.

Where to store the data was a problem. Since the meme data is not dynamic and there is no need to store user information, I decided to hard-code all the data in modules instead of using a separate DB or DBaaS. You can see the hardcoded data here.

The backend likewise didn't need to be configured. So I decided to make it serverless.

💻 Dev

It can be summarized as follows:

  1. Each Question and each Video has its own unique link, and selecting an option takes you to the corresponding link.
  2. Each option is an object with a 'next question' or 'result video' field, and the interface is built from this.
  3. Use getStaticProps and getStaticPaths to make the site respond super fast.

Each Question and Video has the following URI structure:

https://smile.cho.sh/question/[id]
https://smile.cho.sh/video/[id]

💬 2. Type Definitions

To take advantage of TypeScript, I have predefined the type structure.

export type Question = {
  id: number
  contents: string[]
  answers: Answer[]
}

export type Answer = {
  id: number
  content: string
  result: number | null // id of the video to show, if this answer ends the path
  nextQuestion: number | null // id of the next question otherwise
}

export type Video = {
  id: number
  title: string
  uploader: string
  desc: string
  youtubeId: string
}

In type Answer, only one of result and nextQuestion holds a value, and the link for each option is created from whichever is set. Keeping these as two separate fields helped me avoid confusing questions with videos. I also wanted to avoid unintentional null errors by defaulting to 0 when writing the data, so you can see traces of this at /question/0.
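
As an illustration of how a link could be derived from these two fields, here is a small TypeScript sketch. The function name answerHref is hypothetical, and the sketch assumes the unused field is null, as the type allows; it does not model the 0 defaults mentioned above, and it is not taken from the site's source.

```typescript
type Answer = {
  id: number;
  content: string;
  result: number | null;
  nextQuestion: number | null;
};

// Build the link for an answer: a video page if it ends the path,
// otherwise the next question page.
function answerHref(answer: Answer): string {
  if (answer.result !== null && answer.nextQuestion !== null) {
    throw new Error(`Answer ${answer.id} sets both result and nextQuestion`);
  }
  if (answer.result !== null) {
    return `/video/${answer.result}`;
  }
  if (answer.nextQuestion !== null) {
    return `/question/${answer.nextQuestion}`;
  }
  throw new Error(`Answer ${answer.id} has neither result nor nextQuestion`);
}
```

Because every choice is just a link to another page, the path the user took lives in the browser history, which is why going back works as an undo without any extra state.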

🚀 3. Making it Blazing Fast

For example, pages corresponding to /question/[id] are statically created at build time through the following code:

import { GetStaticPaths, GetStaticProps } from 'next'

// Pre-render one page per question id at build time.
export const getStaticPaths: GetStaticPaths = async () => {
  const paths = questionData.map((question) => ({
    params: { id: question.id.toString() },
  }))
  return { paths, fallback: false }
}

// Look up the question matching the path and pass it to the page as props.
export const getStaticProps: GetStaticProps = async ({ params }) => {
  try {
    const id = params?.id
    const item = questionData.find((data) => data.id === Number(id))
    return { props: { item } }
  } catch (err) {
    return { props: { errors: (err as Error).message } }
  }
}

Here, getStaticPaths returns the list of paths to generate statically, and getStaticProps retrieves the question data matching each path and passes it to the React app as props. This lets you statically pre-generate every question and video page. Furthermore, if you navigate with <Link> from next/link, pages are prefetched, which makes interactions very fast. (Literally, I don't see any loading or unloading in the browser favicon!)

💅 4. Styling and Tidying up

This step meant creating the intro and ending pages and adding the missing details. Next, I worked on handling different types of views for exceptional cases. For instance, if the user answers "I don't know" to every question, the following result is displayed. While the other views embed the video right away, only this case shows it as a button.

Fallback Video

See for yourself what kind of video it is!

✨ Results

  • smile.cho.sh
  • Try it yourself, and let us know what you think!
  • I finally got into the club!

🔥 Postmortem

  • There seems to be a good balance between the design and the technical difficulty.
  • I am happy that I learned the map function of ES6+ properly!
  • I built a good understanding of how to use static generation with TypeScript and Next.js.
  • It's a bit disappointing that I ignored the favicon, metadata, SEO, etc., but I don't think I'll add them separately because the site doesn't depend on search or social media traffic.
  • Grinding over the weekend delivers the product... 😉
Heads Up!
  • I wrote this post more than 3 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

Let's create a calendar with JavaScript but without any external library. This project is based on my previous internship at Woowa Bros, a unicorn food-delivery startup in Seoul.

Show me the code first.

GitHub - anaclumos/calendar.js: Vanilla JS Calendar

Show me the demo first.

Goals

  • Use functional programming instead of object-oriented programming.
  • No DOM manipulation after initializing. This philosophy is borrowed from React (and other Single Page Application libraries). DOM manipulation can get highly confusing if 30 different pieces of code are trying to edit the same thing. So instead, we will rerender the components whenever we need to change something.

💡

"Don't fix it. Buy a new one." (Rerendering in the front end)

Stack

  • JavaScript Date Object
  • CSS display: grid will be useful.

Basic Idea

  • There will be a global displayDate object that represents the month being displayed.
  • navigator.js will change this displayDate object and call renderCalendar() with displayDate as an argument.
  • renderCalendar() will rerender the calendar.

Before anything, Prettier!

Prettier helps keep code clean and neat with automatic formatting.

// `.prettierrc`
{
  "semi": false,
  "singleQuote": true,
  "arrowParens": "always",
  "tabWidth": 2,
  "useTabs": false,
  "printWidth": 60,
  "trailingComma": "es5",
  "endOfLine": "lf",
  "bracketSpacing": true
}

Now throw in some HTML.

<!-- `index.html` -->
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>JavaScript Calendar</title>
  </head>
  <body>
    <div id="navigator"></div>
    <div id="calendar"></div>
    <script>
      // code for rendering
    </script>
  </body>
</html>

I generated this boilerplate with VS Code.

Then trick VS Code into reading JS Strings as HTML Tags.

Since we use Vanilla JavaScript, we don't have access to fancy JSX-style highlighting. Instead, our generated HTML will live inside JavaScript strings, which get no syntax highlighting or IntelliSense. Therefore, let's create a function that tricks VS Code into recognizing a JavaScript string as HTML.

// `util.js`
// Tagged template that simply reassembles the string.
// `??` (rather than `||`) keeps falsy values such as 0 in the output.
const html = (s, ...args) => s.map((ss, i) => `${ss}${args[i] ?? ''}`).join('')
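A small usage sketch of such a tagged template, with made-up sample values. This version uses `?? ''` so that a numeric 0 is still rendered; the variable names are illustrative only.

```javascript
// Tagged template: interleave the literal chunks with the interpolated values.
const html = (strings, ...values) =>
  strings.map((chunk, i) => `${chunk}${values[i] ?? ''}`).join('')

const day = 0 // a falsy but valid value
const cell = html`<div class="calendar-cell">${day}</div>`
console.log(cell) // <div class="calendar-cell">0</div>
```

The function does nothing beyond what a plain template literal would do; its only purpose is to give the string a tag named html that editor tooling can latch onto.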

to be added - screenshot of highlighting

calendar.js

Then we connect calendar.js and index.html.

<!-- `index.html` -->
<script src="calendar.js"></script>

Defining constants will help before writing renderCalendar().

// `calendar.js`
const NUMBER_OF_DAYS_IN_WEEK = 7
const NAME_OF_DAYS = ['sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat']
const LONG_NAME_OF_DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
const ACTUAL_TODAY = new Date()

Note that we use NUMBER_OF_DAYS_IN_WEEK to remove magic numbers from our code. A random 7 in the middle of the code can be tough to decipher. Using such a constant makes the code easier to maintain.

for (let d = 0; d < NUMBER_OF_DAYS_IN_WEEK; d++) {
  // do something
}

If there were a random 7, who knows whether we are iterating through the number of Harry Potter books?

This code block will be the baseline for our calendar generation. We pass in the HTML target and a Date object. today represents the month being displayed. The today object will come from navigator.js. The navigator returns the actual date for the current month and the first day of the month for other months.

// `calendar.js`
const renderCalendar = ($target, today) => {
  let html = getCalendarHTML(today)
  // minify html
  html = html.replace(/\n/g, '')
  // replace multiple spaces with single space
  html = html.replace(/\s{2,}/g, ' ')
  $target.innerHTML = html
}
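The navigator behavior described above (actual date for the current month, first day of the month otherwise) could be sketched as follows. navigator.js is not shown in this post, so the helper name shiftMonth and its parameters are assumptions for illustration.

```javascript
// Hypothetical helper for navigator.js: move `displayDate` by `offset` months.
// Returns the real "today" when the target month is the current month,
// otherwise the first day of the target month.
const shiftMonth = (displayDate, offset, actualToday = new Date()) => {
  const target = new Date(
    displayDate.getFullYear(),
    displayDate.getMonth() + offset,
    1
  )
  const isCurrentMonth =
    target.getFullYear() === actualToday.getFullYear() &&
    target.getMonth() === actualToday.getMonth()
  return isCurrentMonth ? new Date(actualToday) : target
}

const actualToday = new Date(2021, 0, 15) // 15 Jan 2021, fixed for the example
const february = shiftMonth(actualToday, 1, actualToday)
const backToJanuary = shiftMonth(february, -1, actualToday)
console.log(february.getMonth(), february.getDate()) // 1 1
console.log(backToJanuary.getMonth(), backToJanuary.getDate()) // 0 15
```

The Date constructor normalizes out-of-range months, so an offset that crosses a year boundary needs no special handling.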

Now, we need four different Date objects to display the calendar. We could have used fewer, but that is an implementation choice. I think reducing the number of Date objects here would yield a minimal performance gain while making the code much harder to understand, so using four objects seems like a fair middle ground.

Four Date objects we need

  • The last day of last month: needed to highlight last month's weekend and display the correct date for last month's row.
  • The first day of this month: needed to highlight this month's weekend and figure out how many days of last month we need to render.
  • The last day of this month: needed for rendering this month with iteration.
  • The first day of next month: needed to highlight the weekend of next month.

I made a function that computes these four dates from a given Date.

// `calendar.js`
const processDate = (day) => {
  const month = day.getMonth()
  const year = day.getFullYear()
  return {
    lastMonthLastDate: new Date(year, month, 0),
    thisMonthFirstDate: new Date(year, month, 1),
    thisMonthLastDate: new Date(year, month + 1, 0),
    nextMonthFirstDate: new Date(year, month + 1, 1),
  }
}

This function bundles the four dates into one object and returns it. It receives a Date object as its argument; in this calendar, that is the Date object corresponding to "today".
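To see why day 0 and month + 1 work, here is a worked example with a fixed date (the sample dates are chosen for illustration). The Date constructor treats day 0 as "the day before the 1st" and rolls months past December into the next year.

```javascript
const processDate = (day) => {
  const month = day.getMonth()
  const year = day.getFullYear()
  return {
    lastMonthLastDate: new Date(year, month, 0), // day 0 = last day of previous month
    thisMonthFirstDate: new Date(year, month, 1),
    thisMonthLastDate: new Date(year, month + 1, 0),
    nextMonthFirstDate: new Date(year, month + 1, 1),
  }
}

const dates = processDate(new Date(2024, 1, 15)) // 15 Feb 2024 (leap year)
console.log(dates.lastMonthLastDate.getDate()) // 31 (Jan 31)
console.log(dates.thisMonthLastDate.getDate()) // 29 (Feb 29)
console.log(dates.thisMonthFirstDate.getDay()) // 4 (Thursday)
console.log(dates.nextMonthFirstDate.getMonth()) // 2 (March)
```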

2-2. Create getCalendarHTML

Now let's draw the calendar in earnest. I created a getCalendarHTML function that returns the contents of the calendar as HTML. The function is a bit bulky, so I'll sketch its frame first.

// `today` is passed in by renderCalendar; it defaults to the current date.
const getCalendarHTML = (today = new Date()) => {
  const { lastMonthLastDate, thisMonthFirstDate, thisMonthLastDate, nextMonthFirstDate } = processDate(today)
  const calendarContents = []

  // ...

  return calendarContents.join('')
}

Add a row at the top to display the days of the week. Use the constant we defined at the beginning to avoid a magic number.

for (let d = 0; d < NUMBER_OF_DAYS_IN_WEEK; d++) {
  calendarContents.push(html`<div class="${NAME_OF_DAYS[d]} calendar-cell">${NAME_OF_DAYS[d]}</div>`)
}

Then let's draw last month. For example, if the first day of this month is a Wednesday, this loop draws the last days of last month that fall on Sunday, Monday, and Tuesday. The sun HTML class is added to days that fall on a Sunday.

for (let d = 0; d < thisMonthFirstDate.getDay(); d++) {
  calendarContents.push(
    html`<div
      class="
        ${d % 7 === 0 ? 'sun' : ''}
        calendar-cell
        past-month
      "
    >
      ${lastMonthLastDate.getMonth() + 1}/${
        // + 1 so that the final leading cell shows the last day of last month
        lastMonthLastDate.getDate() - thisMonthFirstDate.getDay() + d + 1
      }
    </div>`
  )
}

Let's draw this month on a similar principle. For today's date, the today HTML class and the string "today" are added. Similarly, the sat and sun HTML classes are added for Saturday and Sunday respectively.

for (let d = 0; d < thisMonthLastDate.getDate(); d++) {
  calendarContents.push(
    html`<div
      class="
        ${today.getDate() === d + 1 ? 'today' : ''}
        ${(thisMonthFirstDate.getDay() + d) % 7 === 0 ? 'sun' : ''}
        ${(thisMonthFirstDate.getDay() + d) % 7 === 6 ? 'sat' : ''}
        calendar-cell
        this-month
      "
    >
      ${d + 1} ${today.getDate() === d + 1 ? ' today' : ''}
    </div>`
  )
}

Finally, draw the days of next month in the remaining cells.

// The outer % 7 yields 0 (not 7) when the last row is already full.
let nextMonthDaysToRender = (7 - (calendarContents.length % 7)) % 7

for (let d = 0; d < nextMonthDaysToRender; d++) {
  calendarContents.push(
    html`<div
      class="
        ${(nextMonthFirstDate.getDay() + d) % 7 === 6 ? 'sat' : ''}
        calendar-cell
        next-month
      "
    >
      ${nextMonthFirstDate.getMonth() + 1}/${d + 1}
    </div>`
  )
}
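The arithmetic for the remaining cells can be checked with a small worked example. The helper name countTrailingCells is hypothetical. Note the edge case: when the cell count is already a multiple of 7, a naive `7 - (count % 7)` gives 7 rather than 0, so the sketch applies a second modulo.

```javascript
const NUMBER_OF_DAYS_IN_WEEK = 7

// Hypothetical helper: how many next-month cells complete the last row.
const countTrailingCells = (cellsSoFar) =>
  (NUMBER_OF_DAYS_IN_WEEK - (cellsSoFar % NUMBER_OF_DAYS_IN_WEEK)) %
  NUMBER_OF_DAYS_IN_WEEK

// 7 header cells + 4 leading days + 29 days (February 2024) = 40 cells
console.log(countTrailingCells(40)) // 2
// 7 header cells + 0 leading days + 28 days (February 2026) = 35 cells
console.log(countTrailingCells(35)) // 0
```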

3. Writing CSS

3-1. Using display: grid

If you use display: grid on an element, you can neatly lay out its child elements in a grid (table).

  • grid-template-columns: defines how the columns are arranged. 1fr means 1 fraction, and since it is written 7 times in total, 7 columns of equal width are created.
  • grid-template-rows: defines the size of rows. Here, there is only one 3rem, so the first row is 3rem tall.
  • grid-auto-rows: defines the size of the remaining, automatically created rows. Here, it is 6rem, so all subsequent rows are 6rem tall.

Below we define additional styles.

#App {
  /* grid */
  display: grid;
  grid-template-columns: 1fr 1fr 1fr 1fr 1fr 1fr 1fr;
  grid-template-rows: 3rem;
  grid-auto-rows: 6rem;

  /* style */
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
  border: 1px solid black;
  max-width: 720px;
  margin-left: auto;
  margin-right: auto;
}

  • When drawing a table, you usually want every cell wrapped in a uniform border, just like Excel, but sometimes only the outermost edges end up with thinner lines. This happens when borders are applied only to the cells (th and td, in HTML terms).
  • I prefer to apply n px to every cell border and n px to the table border. This gives a uniform border of 2n px overall.

.calendar-cell {
  border: 1px solid black;
  padding: 0.5rem;
}

3-2. Highlighting Saturday, Sunday, and Today

.past-month,
.next-month {
  color: gray;
}

.sun {
  color: red;
}

.sat {
  color: blue;
}

.past-month.sun {
  color: pink;
}

.next-month.sat {
  color: lightblue;
}

.today {
  color: #e5732f;
}

Takeaways

  • At first, I got a little lost when hooking up the JS to "initialize" the calendar. This was because I had connected renderCalendar at the top of body. Since the document is processed sequentially, a script placed at the top runs before the #App div exists, so renderCalendar cannot find the DOM element.
  • Also, I couldn't remember how to render HTML built in JS onto the screen. It turned out to be as simple as using querySelector to grab the app element in the JS file that plays the role of index.js, and then assigning to its innerHTML.
  • In the Woowa Tech Camp project, I used magic numbers. This time, I removed them to improve readability.
  • The Woowa Tech Camp project was written in object-oriented JavaScript (more precisely, the Singleton pattern), but this time I wrote small functions.
  • I tried to use ES6+ syntax. For example, I interpolated variables inside backticks and destructured the return value of processDate. I also mainly used let and const.
  • I regret that I could not make getCalendarHTML a little shorter.
Heads Up!
  • I wrote this post more than 3 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

Recently I came across The Noun Project's API. Combined with a download function I wrote in the past, it lets you download hundreds of icons within seconds.

Beware

Do not use this tool to pirate others' intellectual property. Beware of what you are doing with this code and The Noun Project's API. Read the license and API documents thoroughly. Unauthorized use cases are listed here. This entire post and its code are MIT licensed.

Importing libraries

import requests
import os
from tqdm import tqdm
from requests_oauthlib import OAuth1

If you do not have these libraries, you will need to install them with pip3 install.

The download function

def download(url, pathname):
    """Download the file at `url` into the directory `pathname`."""
    os.makedirs(pathname, exist_ok=True)
    response = requests.get(url, stream=True)
    file_size = int(response.headers.get("Content-Length", 0))
    # Use the last URL segment as the filename, dropping any query string.
    filename = os.path.join(pathname, url.split("/")[-1].split("?")[0])
    progress = tqdm(
        desc=f"Downloading {filename}",
        total=file_size,
        unit="B",
        unit_scale=True,
        unit_divisor=1024,
    )
    # Advance the bar by the number of bytes written, once per chunk.
    with open(filename, "wb") as f, progress:
        for data in response.iter_content(256):
            f.write(data)
            progress.update(len(data))

This code fetches the URL and saves it as a file at pathname.

The Noun Project API

# ---

DOWNLOAD_ITERATION = 3
# Returns 50 icons per iteration.
# Three iterations equal 150 icons.

SEARCH_KEY = "tree"  # Search Term
SAVE_LOCATION = "./icons"
auth = OAuth1("API_KEY", "API_SECRET")

# ---

for iteration in range(DOWNLOAD_ITERATION):
    endpoint = (
        "http://api.thenounproject.com/icons/"
        + SEARCH_KEY
        + "?offset="
        + str(iteration * 50)
    )
    response = requests.get(endpoint, auth=auth).json()
    for icon in response["icons"]:
        download(icon["preview_url"], SAVE_LOCATION)

For more advanced uses, please visit this docs page. In addition, you can get your API Key and API secret by registering your app here.

Result

I have run some benchmarks and found that downloading ~5k icons shouldn't be a problem.

However, The Noun Project's API has a call limit, so beware of that.

Heads Up!
  • I wrote this post more than 4 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.
  • This is a personal analysis and there is no way to verify this article until YouTube releases the source code. Please keep that in mind when reading this article.

YouTube, a strange video platform where a Korean singer sings a Korean song on a Korean broadcast, but there are hardly any Korean comments

YouTube does not have the ability to filter comments by language. This article tells the story of developing "YouTube Comment Language Filter" through its alpha, beta, and general release versions.

0. The idea

One day, out of frustration and curiosity, I decided to take a look at how YouTube's comment HTML is structured. After trying various values in the console, I wrote a JavaScript file that removes all comments that do not contain Korean characters. My past experience with a project called HangulBreak.py, which dealt with Hangul and Unicode conventions, was helpful.

🧼 Initial filter script. The idea is to remove YouTube comments if they don't contain Hangul. It's also stored in the first commit of the GitHub repository.

var commentList = document.getElementsByTagName('ytd-comment-thread-renderer')
var comment

// Returns true if any character of str lies between startUnicode and endUnicode.
function containsUnicode(str, startUnicode, endUnicode) {
  for (var i = 0; i < str.length; i++) {
    if (startUnicode.charCodeAt(0) <= str.charCodeAt(i) && str.charCodeAt(i) <= endUnicode.charCodeAt(0)) {
      return true
    }
  }
  return false
}

// commentList is a live collection, so iterate backwards while removing elements.
for (var x = commentList.length - 1; x >= 0; x--) {
  comment = commentList[x].childNodes[1].childNodes[1].childNodes[3].childNodes[3].innerText
  if (containsUnicode(comment, '가', '힣')) {
    // comment = "Contains Hangul \n" + comment;
  } else {
    // console.log(typeof commentList[x]);
    commentList[x].parentNode.removeChild(commentList[x])
  }
}

for (var x = 0; x < commentList.length; x++) {
  console.log(commentList[x].childNodes[1].childNodes[1].childNodes[3].childNodes[1].innerText)
  // The author's name and date of creation are concatenated together. "Name\nDate Created"

  console.log(commentList[x].childNodes[1].childNodes[1].childNodes[3].childNodes[3].innerText)
  // Comment!
}
// End of code

/* Notes: This is the initial code, and while it works in some cases,
 * there are a number of errors and performance issues.
 * The code, which has been debugged and improved, is uploaded to GitHub.
 */

At this point, I thought I could grow this code into a project. However, there were a few problems with this script.

  • First, I had to paste the code from the console, which made it difficult and inconvenient for others to use.
  • It didn't run automatically, so I had to re-run it in the console when the comments reloaded.
  • Since it could only filter out Korean characters, it was very limited in use, and it was slow to run.

Solving these problems became a natural development goal. I thought it would be most convenient to create a Chrome extension first.

  • Edge and Whale are also Chromium-based, and a Chrome extension can be ported to a Firefox add-on.
  • Safari doesn't play YouTube 4K videos, so many Safari users also use Chromium to view YouTube.
  • Internet Explorer has lost support for YouTube.

I also felt that, to make it easy for people who are not computer-savvy, the control should sit between the video and the comments, "just like YouTube does," so that users can move their mouse cursor from the video to the comments and use it naturally. So rather than a separate extension popup, I wanted to embed the menu directly into the YouTube page itself.

We have a spy in our midst...

Now, the goal became concrete.

  • This Chrome extension would insert a language control interface between the video and the comments.
  • The interface will allow all comments to be filtered automatically. It will continue to work even when more comments are loaded.
  • It should be scalable so that adding new languages is not a major challenge.
  • It should be reasonably fast.

See also: versioning

Different people and organizations assign version numbers in different ways. For the purposes of this article, we'll use the following scheme:

  • The first digit indicates a major update.
  • The first decimal place indicates the addition of a feature; the second decimal place indicates a bug fix.
  • The third decimal place is used for re-uploads to the store without any code modifications.

Name | Version | Description
Alpha | 0.1+ | Minimal functionality, released to a small number of people for feedback
Beta | 0.9+ | Most of the original concept implemented, distributed to a large number of people for feedback
Official | 1+ | Continuous improvement, distributed to anyone with feedback

1. Alpha versions should be released as soon as possible

To create an alpha (or more precisely, a Minimum Viable Product), a minimum of basic functionality must work. More specifically, the following goals had to be achieved:

  1. Automatically re-run the filter every time new comments load.
  2. Be able to see all comments again at any time.
  3. Allow for a one-click installation.

The first question was, "How do we make it happen automatically?".

Method 1. Time-based re-run

I thought running the filter on a timer would waste performance for people who just watch the video as soon as the page opens. At the same time, I didn't want the interval between runs to be so long that reading the comments became annoying. This was the first option I thought of, and also the first one I discarded.

Method 2. Detect YouTube's loading icon

YouTube's loading icon

When I checked, the <can-show-more> tag appeared and disappeared under the comment whenever the loading icon appeared, so I thought I could catch the moment when YouTube's obfuscated javascript code inserted the <can-show-more> tag and re-run the filter accordingly.

But I think I had that idea because I forgot what obfuscation means... After a few hours of struggle, I gave up

Method 3. MutationObserver

Then I discovered MutationObserver in JavaScript. The idea is to set a target to observe and a config to observe, and then run a callback function when a change occurs that meets the config. I used the YouTube comment HTML as the target and made it react to changes in the childNodes and attributes in the HTML. Just as we wanted, it was re-run every time the comment loaded. Problem solved.
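A minimal sketch of this target/config/callback setup, not the extension's actual code. The helper name watchComments, the runFilter callback, and the 'ytd-comments' selector in the usage comment are assumptions; the Observer parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: call `onChange` whenever nodes are added or removed
// under `target`, or attributes change. `Observer` defaults to the
// browser's MutationObserver.
const watchComments = (target, onChange, Observer = globalThis.MutationObserver) => {
  const observer = new Observer((mutations) => {
    if (mutations.length > 0) onChange(mutations)
  })
  observer.observe(target, { childList: true, subtree: true, attributes: true })
  return () => observer.disconnect() // call to stop watching
}

// In a browser (selector and runFilter are assumptions):
// const stop = watchComments(document.querySelector('ytd-comments'), () => runFilter())
```

One caveat worth keeping in mind: if the callback itself edits attributes inside the observed subtree, those edits trigger the observer again.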

However, performance dropped significantly. First, I realized that console.log was running tens of thousands of times, so I removed the calls, and it became usably fast. To speed things up a bit more, instead of testing the language of a comment on every run, I hid non-matching comments with display:none; and recorded the result on the comment's HTML element. Clicking the button to view all comments removed the display:none;, and running the filter again only used the results I had recorded. That took care of goal 2. (I didn't realize this until later, but it wasn't a complete solution. Recording results this way didn't improve speed much and created more problems. See issue ② in the beta section, on YouTube behaving as a SPA.)

I spent the next three days learning about the structure of Chrome extensions and porting my existing code into that framework. I used the official documentation, Stack Overflow, and a Udemy course on Chrome extensions as my main sources. That took care of goal 3.

Finishing the alpha version

User review of the alpha version. "Slow performance, but it works" is an accurate description of the alpha version

We started development on March 6, 2020, and finished the first alpha version on March 11. We shared the alpha with a small group of testers and spent about a week collecting user reviews and researching improvements and bugs.

Screenshot from the alpha version. The alpha version had a button to switch between Korean and full comments

2. Beta version

We found and fixed several problems in the alpha version.

① YouTube's Lazy Loading Issue

YouTube lazily loads content. Lazy loading refers to the practice of waiting and fetching information when it's needed, rather than loading everything at once. Lazy loading is usually applied to large images, but YouTube has taken it a step further and lazily loads the HTML itself! (I don't know if that's accurate, but that's what I saw at the time of development).

When you first land on a video page, the HTML for the comment interface doesn't exist. When the user starts scrolling to view the comments, a gray loading icon appears and the comment HTML is loaded. (Why? Presumably because it avoids querying the comment DB for viewers who never scroll down. Seems like a clever solution.) Since the comment HTML doesn't exist yet when the extension runs at document_end, the extension throws an error and exits shortly after.

Lazy Loading on YouTube

Solution

Although YouTube's lazy loading prevented me from seeing the content of the comments right away, inserting the filter menu into the comment box itself worked fine. Fortunately, the comment box loads shortly after the document loads (at document_end), so if the comment HTML doesn't exist yet, I set the extension to look for the comment box by XPath again every 0.5 seconds.

This may seem a bit puzzling, as I had previously ruled out retrying at regular intervals as a waste of computation. In that earlier case, the timer would loop indefinitely because it re-checked the comments at a fixed interval, whereas here it only retries until the comment box loads. In practice, this workaround finishes after one or two retries. If it takes longer than that, your internet is too slow to watch YouTube in the first place.
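A bounded retry of this kind can be sketched with setInterval. This is not the extension's actual code: the helper name waitFor, the maxTries cap, the insertFilterMenu callback, and the use of querySelector instead of XPath in the usage comment are all assumptions for illustration.

```javascript
// Hypothetical helper: poll `find()` every `intervalMs` until it returns
// something truthy, then call `onFound` once and stop.
// Gives up after `maxTries` attempts.
const waitFor = (find, onFound, intervalMs = 500, maxTries = 20) => {
  let tries = 0
  const timer = setInterval(() => {
    tries += 1
    const found = find()
    if (found) {
      clearInterval(timer)
      onFound(found)
    } else if (tries >= maxTries) {
      clearInterval(timer)
    }
  }, intervalMs)
  return timer
}

// In a browser (selector and insertFilterMenu are assumptions):
// waitFor(() => document.querySelector('ytd-comments'), insertFilterMenu)
```

Because setInterval only schedules the check and returns immediately, the page's other scripts keep running between attempts.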

Bypassing YouTube's Lazy Loading

Secondary bugs then emerged. Because I wasn't very good with JavaScript, I didn't know how to efficiently retry a function after a delay. YouTube would sometimes load forever when a video was opened by URL or directly from search results. It turned out that the way I made the script wait before retrying blocked other JS from running. This bug is tracked in GitHub Issue #4 and was finally fixed in v1.1.4. (The bug showed up about once in 20 times, and reproducing the error was harder than finding the bug once it was reproduced.)

② YouTube's HTML component recycling issue

In order for a Chrome extension to access a website and make changes, you need to list all the addresses that the extension will access in manifest.json. Initially, I used https://*.youtube.com/watch* because I thought it would only need to run on the video page. However, I ran into a problem when I navigated to a video from the YouTube home screen.

YouTube reuses a lot of HTML components. If you've ever watched a YouTube video and pressed the i button to launch the miniplayer, you'll recognize this. YouTube doesn't actually take you to the video page, it just (1) covers the existing screen, (2) puts a new page on top, and (3) replaces the address in the web address bar. So naturally, when you press the i button to open the miniplayer, you'll see the same window you saw before playing the video.

However, using https://*.youtube.com/* in manifest.json caused the filter menu to be inserted in the wrong place, even on pages without a comment box.

Apparently the Chrome extension only checks the domain when a new page is loaded

Also, on video pages, even if the filter menu was inserted correctly, if you navigated to another video with the filter enabled, it would sometimes filter the "current video's comments" based on the "previous video's comment language". This also seemed to be caused by YouTube's recycling of the comment component. I mentioned earlier that I was jotting down the results in the comment HTML to speed things up, but this also scrambled the results and made the error bigger.

Solution

The extension monitors page navigation. If the domain being accessed is YouTube, it gets ready to insert the filter, checks the address on a case-by-case basis, and inserts the filter only when the page is youtube.com/watch.
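The address check itself can be sketched as a small pure function. The name isWatchPage is hypothetical, and how the extension obtains the URL is not shown in this post; one possibility, stated here as an assumption, is a chrome.tabs.onUpdated listener.

```javascript
// Hypothetical check: is this URL a YouTube video (watch) page?
const isWatchPage = (href) => {
  try {
    const url = new URL(href)
    const isYouTube =
      url.hostname === 'youtube.com' || url.hostname.endsWith('.youtube.com')
    return isYouTube && url.pathname === '/watch' && url.searchParams.has('v')
  } catch {
    return false // not a valid absolute URL
  }
}

console.log(isWatchPage('https://www.youtube.com/watch?v=abc123')) // true
console.log(isWatchPage('https://www.youtube.com/feed/trending')) // false
console.log(isWatchPage('https://example.com/watch?v=abc123')) // false
```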

YouTube is a SPA

In this case, the extension needs the "Can read user's history" permission at install time. For this reason, I've included a note about the permissions used in the corresponding commit and on the installation completion page. As you can see in the open-source code on GitHub, the browsing history is not sent anywhere externally.

I also set it up so that when you navigate to a new page with filters on, it will (1) reset all filters and filter results, (2) reload the comments on the new page, and (3) wait.

Sometimes the problem would reappear. However, this seems to be more of a YouTube bug. Even without this extension, YouTube's web commenting system is notoriously buggy, with incorrect comments appearing and comments on videos getting mixed up. In this case, a refresh should fix it.

③ Speed issues

In the alpha version, the filter's performance was very bad, but removing every console.log call brought a significant speed improvement. As long as the filter runs faster than users can read comments, it's usable, so I prioritized fixing the issues above over speed. Later, though, I realized that the following code was the problem.

for (var comment of commentList) {
  if (comment.id === '') {
    var commentString =
      comment.childNodes[1].childNodes[1].childNodes[1].childNodes[3].childNodes[3].childNodes[1].innerText
    if (containsSelectedLang(commentString)) {
      comment.id = 'contains-SelectedLang'
    } else {
      comment.id = 'no-SelectedLang'
    }
  }
  if (comment.id === 'no-SelectedLang') {
    comment.style = 'display: none'
  }
}

The node path used for commentString doesn't currently work because YouTube's markup has changed several times since.

Solution...?

We haven't completely fixed it yet. For now, v1.2 uses the names of writing systems (scripts) rather than the names of languages, to make explicit that the filter classifies characters. There are plans to eventually include a natural language processing module. However, due to a policy issue with Chrome extensions, this would require reworking a significant portion of our code, so we're still working on it.
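Classifying by script rather than by language can be sketched with a table of Unicode ranges. The table layout, the names SCRIPTS and containsScript, and the choice of ranges are illustrative assumptions, not the extension's actual data; the ranges shown are only a small subset of each script's blocks.

```javascript
// Hypothetical script table: each entry is a name plus Unicode code point
// ranges. Supporting another script means adding one entry.
const SCRIPTS = {
  hangul: [[0xac00, 0xd7a3], [0x1100, 0x11ff], [0x3130, 0x318f]],
  hiragana: [[0x3040, 0x309f]],
  katakana: [[0x30a0, 0x30ff]],
  latin: [[0x0041, 0x005a], [0x0061, 0x007a]],
}

// Returns true if any character of `text` falls in one of the script's ranges.
const containsScript = (text, scriptName) => {
  const ranges = SCRIPTS[scriptName] ?? []
  for (const char of text) {
    const code = char.codePointAt(0)
    if (ranges.some(([start, end]) => start <= code && code <= end)) return true
  }
  return false
}

console.log(containsScript('노래 최고!', 'hangul')) // true
console.log(containsScript('Great song!', 'hangul')) // false
console.log(containsScript('すごい', 'hiragana')) // true
```

A script match says nothing about the language itself: Latin letters alone cannot tell English from French, which is the limitation a natural language processing module would address.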

3. Full version

While there are no performance or functionality improvements to the filter itself, we have made usability improvements to the extension. We've built a settings window to hide unused languages and created a landing page that opens immediately after installation. We've also created a short guide for reporting bugs. I also redid the promo image.

You might think this is a pointless task, but it works wonders for attracting more users than expected

I did a small experiment a while back, and the difference between the number of people who came in with and without the promo image was about 4x.

Finally, I ported the Chrome extension to Firefox as a Firefox add-on. I covered this in my previous article Porting a Chrome Extension to Firefox Add-on (https://blog.chosunghyun.com/porting-a-chrome-extension-to-firefox-add-on/).

Overall thoughts on v1

My goal in releasing v1 was to create a deliverable that I could use without thinking about it. If an extension breaks along the way and you start to care about it, it ruins the fun of watching YouTube.

But the current v1 doesn't break the fun. I can just use it without thinking about it and it works fine. (It just all... works!)

Review from a friend who has been using it since the beta version

In addition, the original purpose of finding Korean comments works almost flawlessly. After all, Korean comments will have at least one Hangul character in them.

In the future

I'm working on v2. The most popular feedback on v1 was "more styles" and "better language detection". A fundamental problem with Chrome extensions is that it is very complicated to add external files or modules from within the extension itself. Using npm libraries is also cumbersome. One solution is to use a bundler called Webpack, which requires a significant rewrite of the code. Nevertheless, I can think of a lot of fun ways to use it, so I'm working on v2.

There's also talk that YouTube is experimenting with the percentage of foreigners viewing Korean videos by deliberately pushing Korean comments down in the comment rankings, so I don't think this project will ever end.

Hence the title Part 1. Someday, if v2 is completed, or if something happens that deserves to be remembered, you'll see part 2 of this post.

Heads Up!
  • I wrote this post more than 4 years ago.
  • That's enough time for things to change.
  • Possibly, I may not endorse the content anymore.

My Ghost blog is not serverless. Although it requires continuous management, there are many advantages to operating a blog through a server. However, managing a blog through a server has one major drawback. If the server crashes, it becomes very difficult to restore the posts stored inside. I thought it would be too cumbersome to manually copy and backup each post and photo as the amount of content increases in the future. I wanted to come up with a solution to improve this situation.

Problems with Ghost's Built-in Backup​

Ghost provides a feature to download a blog backup file in .json format. It's like a complete copy of the blog's soul. Everything that can be set within Ghost, such as the author's name, tags used, post content and format, upload time, and even the summary in the HTML meta tags, is backed up as is.

However, there are two problems.

  • Ghost's built-in backup files are difficult for humans to read. Not only are they minified JSON, but the file structure is also complex due to the vast amount of information it contains, and the posts are compressed.
  • Also, Ghost's built-in backup does not back up photos. Therefore, when restoring the blog, all photo files will display "not found" (commonly known as broken images). If the blog server is alive or you have copied photos, you're in luck, but there may be cases where photos cannot be restored.

Goals​

Main Goal​

  • Both posts and photos should be backed up.

Bonus Goals​

  • It should be in a human-readable format. (Human-Readable Medium)
  • It should be clear which photo goes into which location of which post, in preparation for restoring the blog.
  • Backup should be convenient.
  • It should be possible to create a replica outside the blog.

Idea​

The answer is RSS. RSS is a technology that emerged in the early 2000s during the blogging boom, serving as a "subscription" service. Sites or blogs that support RSS provide an RSS feed address. The RSS feed address contains updated content from that site in a machine-readable format. When users enter the RSS feed address into an RSS reader, the reader fetches new content from the RSS feed address each time.

In modern times, with the rise of social media, RSS technology has become obsolete, but it is sufficient to achieve my goals. The RSS feed acts as an API for retrieving posts. Since Ghost supports RSS by default, I decided to utilize it.

General Idea​

  1. Copy the entire RSS feed by entering the blog's RSS address.
  2. Parse the RSS and extract the HTML of each post.
  3. Create a folder for each post and save the post's HTML.
  4. Download the photos by accessing the src address of the img tags included in the post's HTML.
  5. For posts containing photos, create an images folder in each post folder, save the photos, and change the src of the img tags in the HTML to the relative path of the saved images.

Development​

Reference​

All the examples below are based on v1 of [anaclumos/backup-with-rss](https://github.com/anaclumos/backup-with-rss). By the time you read this post, there may have been new features or bug fixes added.

Also, the code included in this post is intended to show a general flow, not the entire code. If you try to copy and run it as is, it probably won't work! The complete code is available in the GitHub repository.

1. Copying the RSS Feed Using Feedparser​

The RSS feed is copied using the Feedparser module in Python.

# -*- coding: utf-8 -*-
import feedparser


class RSSReader:
    origin = ""
    feed = ""

    def __init__(self, URL):
        self.origin = URL
        self.feed = feedparser.parse(URL)

    def parse(self):
        return self.feed.entries

RSSReader is used to load the RSS feed and pass the entries item.

What this code does is:

  1. When the RSSReader Object is created, it stores the RSS address in self.origin and parses the RSS address and stores it in self.feed.
  2. When the parse() function is executed, it returns the entries from the stored value in self.feed.

The entries contain the posts from the RSS feed in the form of a list. The following example is the RSS of this post.

Structure of self.feed in parse() (entries omitted)

// Some parts omitted
{
  "bozo": 0,
  "encoding": "utf-8",
  "entries": [],
  "feed": {
    "generator": "Ghost 3.13",
    "generator_detail": {
      "name": "Ghost 3.13"
    },
    "image": {
      "href": "https://blog.chosunghyun.com/favicon.png",
      "link": "https://blog.chosunghyun.com/",
      "links": [
        {
          "href": "https://blog.chosunghyun.com/",
          "rel": "alternate",
          "type": "text/html"
        }
      ],
      "title": "Sunghyun Cho",
      "title_detail": {
        "base": "https://blog.chosunghyun.com/rss/",
        "language": "None",
        "type": "text/plain",
        "value": "Sunghyun Cho"
      }
    },
    "link": "https://blog.chosunghyun.com/",
    "links": [
      {
        "href": "https://blog.chosunghyun.com/",
        "rel": "alternate",
        "type": "text/html"
      },
      {
        "href": "https://blog.chosunghyun.com/rss/",
        "rel": "self",
        "type": "application/rss+xml"
      }
    ],
    "subtitle": "Sunghyun Cho's Blog",
    "subtitle_detail": {
      "base": "https://blog.chosunghyun.com/rss/",
      "language": "None",
      "type": "text/html",
      "value": "Sunghyun Cho's Blog"
    },
    "title": "Sunghyun Cho",
    "title_detail": {
      "base": "https://blog.chosunghyun.com/rss/",
      "language": "None",
      "type": "text/plain",
      "value": "Sunghyun Cho"
    },
    "ttl": "60"
  },
  "href": "https://blog.chosunghyun.com/rss/",
  "namespaces": {
    "": "http://www.w3.org/2005/Atom",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "media": "http://search.yahoo.com/mrss/"
  },
  "status": 200,
  "version": "rss20"
}

2. Creating Markdown Files from RSS Data​

I thought I could extract only the necessary values from the self.feed.entries returned by RSSReader. I created an MDCreator class to process the information provided by RSSReader.

import codecs
import os


class MDCreator:
    def __init__(self, rawData, blogDomain):
        self.rawData = rawData
        self.blogDomain = blogDomain

    def createFile(self, directory):
        # One folder per post, named after the post title
        self.directory = directory + "/" + self.rawData.title
        try:
            os.makedirs(self.directory)
            print('Folder "' + self.rawData.title + '" Created ')
        except FileExistsError:
            print('Folder "' + self.rawData.title + '" already exists')

        # codecs.open forces UTF-8 regardless of the platform default
        MDFile = codecs.open(self.directory + "/README.md", "w", "utf-8")
        MDFile.write(self.render())
        MDFile.close()

The blogDomain parameter is used later.

What this code does is:

  1. When the MDCreator Object is created, it stores the blog address in self.blogDomain and the raw RSS data of a single post in self.rawData. This raw data is one item of the self.feed.entries list returned by RSSReader's parse().
  2. When the createFile() function is executed, it creates a folder for each post in the backup folder. The folder title is the title of the post. It creates a README.md in each folder and puts the post content inside.

The reason for creating files using the codecs library is to make it use Unicode instead of the CP949 codec on Windows. This way, emojis included in the RSS are displayed correctly 🚀🥊
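As a side note, Python 3's built-in open() also accepts an encoding argument, so the same effect can be achieved without codecs. The sketch below is only an illustration: write_utf8, read_utf8, and the temporary file path are hypothetical names, not part of backup-with-rss.

```python
import os
import tempfile


def write_utf8(path, text):
    # Passing encoding explicitly avoids falling back to the platform
    # default (for example CP949 on a Korean Windows install).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_utf8(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical demo file in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "README.md")
write_utf8(demo_path, "Backup complete \U0001F680")
```

Reading the file back with read_utf8(demo_path) should return the same string, emoji included.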

3. Adding Post Information to the Generated Markdown File​

I wanted to use Jekyll-style Front Matter when displaying post information. I thought it would be the easiest way to check the post's title, tags, link, author, etc.

    def render(self):
        # Each field falls back to a placeholder when the RSS entry lacks it
        try:
            postTitle = str(self.rawData.title)
        except AttributeError:
            postTitle = "Post Title Unknown"
            print("Post Title does not exist")

        try:
            postTags = str(self.getValueListOfDictList(self.rawData.tags, "term"))
        except AttributeError:
            postTags = "Post Tags Unknown"
            print("Post Tags does not exist")

        try:
            postLink = str(self.rawData.link)
        except AttributeError:
            postLink = "Post Link Unknown"
            print("Post Link does not exist")

        try:
            postID = str(self.rawData.id)
        except AttributeError:
            postID = "Post ID unknown"
            print("Post ID does not exist")

        try:
            postAuthors = str(self.rawData.authors)
        except AttributeError:
            postAuthors = "Authors Unknown"
            print("Authors does not exist")

        try:
            postPublished = str(self.rawData.published)
        except AttributeError:
            postPublished = "Published Date unknown"
            print("Published Date does not exist")

        # Assemble the Jekyll-style Front Matter
        self.renderedData = (
            "---\nlayout: post\ntitle: "
            + postTitle
            + "\ntags: "
            + postTags
            + "\nurl: "
            + postLink
            + "\nauthors: "
            + postAuthors
            + "\npublished: "
            + postPublished
            + "\nid: "
            + postID
            + "\n---\n"
        )

What this code does is:

  1. It checks if the post's title, tags, link, ID, author names, and publication date exist in the RSS code, and if they do, it enters those values into the Front Matter.
  2. If a value doesn't exist, it enters ~ Unknown.

The reason for adding tags using code like self.getValueListOfDictList(self.rawData.tags, "term") is because tags are specified in the following format in Ghost. This is the same for Gatsby and WordPress as well.

'tags': [{'label': None, 'scheme': None, 'term': 'English'},
         {'label': None, 'scheme': None, 'term': 'Code'},
         {'label': None, 'scheme': None, 'term': 'Apple'}],

    def getValueListOfDictList(self, dicList, targetkey):
        # Collect the value stored under targetkey from every dict in the list
        arr = []
        for dic in dicList:
            for key, value in dic.items():
                if key == targetkey:
                    arr.append(value)
        return arr

In this way, only the term item is extracted from tags and added to the Front Matter. When executed, the following Jekyll-style Front Matter is completed.

---
layout: post
title: Apple's Easter Egg
tags: ['English', 'Code', 'Apple']
url: https://blog.chosunghyun.com/apples-easter-egg/
authors: [{ 'name': 'S Cho' }]
published: Sun, 19 Jan 2020 17:00:00 GMT
id: /_ Some Post ID _/
---

Jekyll Style Front Matter on GitHub

Front Matter is rendered like this on GitHub.
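As an alternative sketch to the repeated try/except blocks in render(), getattr with a default value expresses the same fallback more compactly. This is not the code in the repository: front_matter is a hypothetical helper, it covers only a subset of the fields, and SimpleNamespace stands in for a feedparser entry. feedparser entries raise AttributeError for missing fields, which is what the original try/except relies on, so getattr should behave the same way on them.

```python
from types import SimpleNamespace


def front_matter(entry):
    # (label in the Front Matter, attribute name on the entry)
    fields = [
        ("title", "title"),
        ("url", "link"),
        ("published", "published"),
        ("id", "id"),
    ]
    lines = ["---", "layout: post"]
    for label, attr in fields:
        # getattr's third argument is returned when the attribute is missing
        value = getattr(entry, attr, label.capitalize() + " Unknown")
        lines.append(label + ": " + str(value))
    lines.append("---")
    return "\n".join(lines) + "\n"


# Stand-in for a feedparser entry (an assumption for this demo);
# it deliberately has no "id" attribute.
entry = SimpleNamespace(
    title="Apple's Easter Egg",
    link="https://blog.chosunghyun.com/apples-easter-egg/",
    published="Sun, 19 Jan 2020 17:00:00 GMT",
)
print(front_matter(entry))
```

The missing id shows up as "id: Id Unknown" instead of raising an error.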

4. Adding Post Summary and Content to the Generated Markdown File​

The Summary and Content items from the RSS data are added to renderedData.

        # (continues inside render())
        self.renderedData += "\n\n# " + postTitle + "\n\n## Summary\n\n"
        try:
            self.renderedData += self.rawData.summary
        except AttributeError:
            self.renderedData += "RSS summary does not exist."

        self.renderedData += "\n\n## Content\n\n"
        try:
            for el in self.getValueListOfDictList(self.rawData.content, "value"):
                self.renderedData += "\n" + str(el)
        except AttributeError:
            self.renderedData += "RSS content does not exist."

One interesting thing was that while Ghost and WordPress-based blogs populate both the RSS Summary and Content, Jekyll-based GitHub Pages and Tistory put the entire post content in the RSS Summary. (...) Ghost has a built-in feature to set the Excerpt of a post, and this Excerpt value is used as the RSS Summary.

5. Adding Images to the Generated Markdown File​

For backup, images must be completely preserved. Unless the images are embedded in base64 in the HTML, they are all currently in the form of img tags with only src specified. If the server goes down, it won't be able to load images from the img src, so all images need to be downloaded at the time of backup.

I referred to How to Download All Images from a Web Page in Python by PythonCode.

        # (continues inside render(); assumes: from bs4 import BeautifulSoup as bs)
        soup = bs(self.renderedData, features="html.parser")
        for img in soup.find_all("img"):
            # Prefer src; fall back to data-src
            for imgsrc in ["src", "data-src"]:
                try:
                    remoteFile = img[imgsrc]
                    break
                except KeyError:
                    continue
            else:
                # Neither attribute exists, so there is nothing to download
                continue

            if not self.isDomain(remoteFile):
                print("remoteFile", remoteFile, "is not a domain.")
                remoteFile = self.blogDomain + "/" + remoteFile
                print("Fixing it to", remoteFile)

            print('Trying to download "' + remoteFile + '" and save it at "' + self.directory + '/images"')
            self.download(remoteFile, self.directory + "/images")

            # Match the filename that download() saves (query string removed)
            img["src"] = "images/" + remoteFile.split("/")[-1].split("?")[0]
            img["srcset"] = ""
            print(img["src"])

        self.renderedData = str(soup)
        return self.renderedData

What this code does is:

  1. It reads the string renderedData as HTML and finds all img tags.
  2. It checks if there are src or data-src attributes. data-src is an attribute for WordPress compatibility.
  3. It creates an images folder inside each post folder and saves the images there. The image name is the last path segment of the img src. For example, if the img src is https://blog.someone.com/images/example.png, it is saved as images/example.png.
  4. It changes the existing img src to the relative path of the images folder.
  5. If it has a srcset attribute, it removes it (for Gatsby compatibility).

    def download(self, url, pathname):
        # assumes: import os, import requests, from tqdm import tqdm
        if not os.path.isdir(pathname):
            os.makedirs(pathname)

        response = requests.get(url, stream=True)
        file_size = int(response.headers.get("Content-Length", 0))
        filename = os.path.join(pathname, url.split("/")[-1])

        # Drop any query string from the saved filename
        if filename.find("?") > 0:
            filename = filename.split("?")[0]

        progress = tqdm(
            desc=f"Downloading {filename}",
            total=file_size,
            unit="B",
            unit_scale=True,
            unit_divisor=1024,
        )

        with open(filename, "wb") as f:
            for data in response.iter_content(256):
                f.write(data)
                # Advance the bar manually by the number of bytes written
                progress.update(len(data))
        progress.close()

One problem is that the image addresses are not consistent. Some sites write the full domain like <img src = "https://example.com/images/example.png">, while others write from the subdirectory like <img src = "/images/example.png">. There were also places with <img src = "example.png">. To handle as many cases as possible, I created a function isDomain() to detect the domain. Other libraries recognized file extensions like .png as top-level domains like .com, so I added some exception handling.

    def isDomain(self, string):
        # assumes: import validators
        if string.startswith("https://") or string.startswith("http://"):
            return True
        elif string.startswith("/"):
            return False
        else:
            # validators.domain returns a falsy failure object when invalid
            return bool(validators.domain(string.split("/")[0]))

If the src cannot be accessed on its own, as with <img src = "/images/example.png">, the code prepends the blog's domain. This is where the previously set self.blogDomain is used.
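For comparison, the standard library's urllib.parse.urljoin can resolve most of these cases without a custom check. The sketch below is a swapped-in technique, not what backup-with-rss does; resolve_image_url and local_image_name are hypothetical helpers. One caveat: a src that starts with a bare domain and no scheme (the case isDomain() catches with validators) would be treated as a relative path here.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def resolve_image_url(blog_domain, src):
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the base; the trailing slash makes the base act as a folder.
    return urljoin(blog_domain.rstrip("/") + "/", src)


def local_image_name(url):
    # Use only the path component so a query string such as ?w=300
    # does not end up in the saved filename.
    return posixpath.basename(urlparse(url).path)


print(resolve_image_url("https://blog.example.com", "/images/example.png"))
print(resolve_image_url("https://blog.example.com", "example.png"))
print(local_image_name("https://blog.example.com/images/example.png?w=300"))
```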

Results​

I tried backing up this blog, which is a self-hosted Ghost blog. Running main.py is all it takes to start the backup.

The backed-up posts. The folder names are set to the post titles.

The appearance of a backed-up post on GitHub. The photos are also stored directly in the folder instead of the blog server.

The photos used in the post are saved in the folder.

Based on testing, the following services are supported. The format or arrangement of posts may be slightly different, but the purpose of backup is sufficiently achieved.

  • Ghost
  • WordPress
  • Jekyll-based GitHub Pages
  • Gatsby-based GitHub Pages
  • Medium
  • Tistory

Goal Achievement Evaluation​

Main Goal​

  • Both posts and photos should be backed up. ★★★ The goal was fully achieved. Videos are not backed up, but since videos are usually embedded through YouTube anyway, there is a much lower probability of information loss. That's why it was excluded from the goal from the beginning.

Bonus Goals​

  • It should be in a human-readable format. (Human-Readable Medium) ★★☆ Compared to Ghost's built-in backup, important information can be seen at a glance in the Front Matter, and posts are rendered in almost the same form as the blog. It is also convenient to find desired materials as posts and photos are organized by folder. However, even though Markdown is used, the post body is in HTML, so it is inconvenient to edit posts. It's a backup that achieves just the purpose of "Lots of copies keep stuff safe."

  • It should be clear which photo goes into which location of which post. (In preparation for restoring the blog) ★★★ It is clear which photo goes into which location of which post.

  • Backup should be convenient. ★★☆ main.py needs to be executed manually. I'm thinking of automating it with crontab someday.

Also, due to the nature of using RSS, only posts included in the RSS feed are backed up. RSS feeds often include only the latest posts to reduce bandwidth usage, and each blog has an option to adjust this. Ghost blogs include 15 of the latest posts in the RSS feed by default. The number of posts in the RSS feed of a Ghost blog cannot be manipulated within the Ghost CMS and requires modifying the code of Ghost Core.

  • It should be possible to create a replica outside the blog. ★★☆ When repeatedly downloading numerous photos from a WordPress blog, access may be temporarily blocked.

Future Plans​

After completing it and giving it some thought, I realized it could be a good tool for people who are planning to migrate their blog but are struggling with too much accumulated data. I plan to further improve it to become a tool that can help with blog migration.

Example of Video Ghosting

In this article, we will learn about the principle of video compression and discuss why the above phenomenon occurs.

Videos are too big​

A video is a collection of photos. However, the file size becomes surprisingly large if we store a video as a series of full images. For example, if the 1920 x 1080 60FPS video we often watch on YouTube is not compressed, its size approaches 7GB per minute. However, if you watch a video with the same specifications on YouTube, at most about 40MB per minute is used. This compression reduces the size by almost 200 times. Still, we don't notice much of a difference. What happened?
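The figures above can be checked with a quick back-of-the-envelope calculation. The bytes-per-pixel values are assumptions, since no pixel format is stated: roughly 7GB per minute corresponds to one byte per pixel, while 24-bit RGB would be three times that.

```python
def raw_video_bytes_per_minute(width, height, fps, bytes_per_pixel):
    # Uncompressed size = pixels per frame * bytes per pixel * frames per minute
    return width * height * bytes_per_pixel * fps * 60


GB = 10 ** 9
MB = 10 ** 6

one_byte = raw_video_bytes_per_minute(1920, 1080, 60, 1)  # 8 bits per pixel
rgb24 = raw_video_bytes_per_minute(1920, 1080, 60, 3)     # 24-bit RGB

print(round(one_byte / GB, 2))        # 7.46
print(round(rgb24 / GB, 2))           # 22.39
print(round(one_byte / (40 * MB)))    # 187, the ratio against 40MB per minute
```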

So we encode​

Because raw video is so large, most videos are compressed to some degree. We call this video encoding, and the world of encoding algorithms is amazingly sophisticated and beautiful.

Video encoding finds the key to saving capacity in redundancy. For example, imagine a singer standing still and singing. Only the singer's mouth moves, and the background and the singer's body do not move at all. If so, is it necessary to provide information about the black pixels in the background and the body movements of the singer every time? No. Because those parts overlap.

Video data is redundant in both space and time. Removing spatial redundancy is called intra-frame coding (intra-frame compression), and removing temporal redundancy is called inter-frame coding (inter-frame compression). Specific techniques include the Discrete Cosine Transform, which compacts the data of adjacent pixels, prediction using motion vectors, and in-loop filtering.

Intra-frame coding​

Reduce the size of the photo itself!

A video is a collection of photos, and a photo is a set of pixels. We can reduce spatial redundancy by cutting down the information stored for similar pixels within the same image. One of the most straightforward implementations uses averages. Suppose the data for one pixel is left empty while the data for the surrounding pixels is kept. When playing the video, the computer takes the values of the adjacent pixels and fills the gap with their average.

What's interesting here is that the adjacent pixels are not the ones above, below, left, and right. Pixel data in a video is stored in order from left to right and top to bottom. If the decoder averaged the top, bottom, left, and right pixels, it would have to wait until the right and bottom pixels were read and then come back to fill in the pixel. Since that is inefficient when a video must be displayed quickly, intra-frame coding temporarily stores the upper-left, upper, upper-right, and left data. When it encounters blank data, it calculates the average from those stored values.
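The averaging idea can be sketched in a few lines. This is a deliberately simplified illustration, assuming the four already-decoded neighbors are the upper-left, upper, upper-right, and left pixels; real codecs use more elaborate directional prediction. The function names and the use of None to mark a blank pixel are made up for this example.

```python
def predict_from_causal_neighbors(image, row, col):
    # Neighbors that were already decoded in raster order:
    # upper-left, upper, upper-right, and left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]
    values = []
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        if 0 <= r < len(image) and 0 <= c < len(image[0]):
            values.append(image[r][c])
    if not values:
        return 0  # the top-left corner has no decoded neighbors; arbitrary default
    return sum(values) // len(values)


def fill_missing(image):
    # None marks a pixel whose value was left out by the encoder.
    out = [list(r) for r in image]
    for r in range(len(out)):
        for c in range(len(out[0])):
            if out[r][c] is None:
                out[r][c] = predict_from_causal_neighbors(out, r, c)
    return out


frame = [
    [10, 20, 30],
    [40, None, 60],
    [70, 80, None],
]
print(fill_missing(frame))  # [[10, 20, 30], [40, 25, 60], [70, 80, 55]]
```

Because pixels are filled in raster order, every neighbor the function looks at has already been decoded or predicted, so the decoder never has to wait for data that comes later.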

Inter-frame coding​

Don't resend information you've sent in the past; let's recycle it!

Remember the award ceremonies before school vacation? Imagine that the same award is given to 30 people. How long would the ceremony be if the principal read out every certificate in full? How boring and painful would it be? But the principal doesn't do that. The principal just says "the contents are the same as above" and moves on. We get a lovely vacation afternoon simply because each award is expressed as being the same as the previous one. The principal did inter-frame compression.

The same goes for videos. Since many consecutive frames are similar, a frame can be expressed through its relationship to the frames before and after it, or the repeated information can be omitted altogether. This reduces temporal redundancy.

Who's better? Principal announcing for 2 hours or 2 minutes?

#1. I-Frame being the standard​

An I-Frame (Intra-coded picture) is a complete photograph. All information in the I-Frame is new information. The I-Frame becomes the reference for expressing the frames before and after it.

#2. P-Frame expressing only the amount of change​

P-Frames (Predicted Pictures) are inserted between I-Frames. A P-Frame expresses the amount of change from the previous picture. If the current frame has something in common with the previous frame, information from that earlier frame is retrieved and reused. It is easier to understand by looking at the picture.

Copyright: Blender Foundation 2006. Netherlands Media Art Institute. www.elephantsdream.org

Each arrow represents a motion vector, which describes the amount of change. In addition, a P-Frame includes values for correcting the prediction. In some cases, new image information is also included in the P-Frame. A P-Frame takes up only about half the size of an I-Frame. Of course, actual video encoding does not compare every pixel individually; the picture is divided into blocks, and the blocks are compared. Such a block is called a macroblock, and in HEVC, a more recent video codec, it is called a coding tree unit.
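A toy version of block-based prediction makes the roles of the motion vector and the correction values concrete. This is a sketch under simplifying assumptions (tiny grayscale frames as nested lists, whole-pixel motion, hypothetical function names), not how any particular codec stores its data.

```python
def get_block(frame, top, left, size):
    # Cut a size x size block out of a frame stored as a list of rows
    return [row[left:left + size] for row in frame[top:top + size]]


def reconstruct_block(reference, top, left, size, motion_vector, residual):
    # motion_vector = (dy, dx): where in the reference frame the block came
    # from, relative to its position in the current frame.
    dy, dx = motion_vector
    predicted = get_block(reference, top + dy, left + dx, size)
    # The residual corrects whatever the motion-compensated prediction got wrong.
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


reference = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
# The 2x2 object moved one pixel right and one pixel down in the current
# frame, so the block at (2, 2) is predicted from (1, 1) in the reference.
block = reconstruct_block(reference, 2, 2, 2, (-1, -1), [[0, 0], [0, 1]])
print(block)  # [[9, 8], [7, 7]]
```

Only the motion vector and the small residual need to be transmitted for this block, which is why a P-Frame can be much smaller than an I-Frame.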

#3. B-Frame saving data​

B-Frames (Bidirectionally Predicted Pictures) are inserted between I-Frames and P-Frames. A B-Frame computes the picture using the I-Frames or P-Frames before and after it. It works much like a P-Frame, but B-Frames are used because they are smaller. Since a B-Frame draws on the data of both the preceding and following frames, it can omit that much more information. So a B-Frame takes up only about 25% of the size of a P-Frame.

Like P-Frame, B-Frame also uses Motion Vector and conversion values for prediction correction. B-Frame refers to I-Frame and P-Frame, but in the latest video codecs such as HEVC and VVC, B-Frame can also refer to other B-Frames.

Copyright: Cmglee, CC BY-SA 4.0.

Reason for ghostings​

The problem occurs when the network packet containing an I-Frame is lost. As a result, the reference values for calculating the surrounding P-Frames and B-Frames disappear. Of course, a good video streaming program uses various algorithms to detect packet loss and request the packets again. Still, the I-Frame loss may go undetected if the server is unstable or the streaming program is poor.

If the I-Frame is lost and only the next P-Frame and B-Frame arrive, the change value is applied to the wrong I-Frame.
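The effect can be imitated with a toy decoder. As a simplification, each P-Frame here is just a per-pixel difference on a four-pixel "picture"; real P-Frames carry motion vectors and correction values. The decode function and the stream format are made up for this illustration.

```python
def decode(stream):
    # stream: list of ("I", pixels) or ("P", deltas) tuples, starting with an I-Frame
    current = None
    frames = []
    for kind, data in stream:
        if kind == "I":
            current = list(data)  # full picture, replaces everything
        else:
            current = [c + d for c, d in zip(current, data)]  # apply the change
        frames.append(current)
    return frames


scene_a = ("I", [1, 1, 1, 1])
scene_b = ("I", [5, 5, 5, 5])  # scene cut: a brand-new picture
move = ("P", [0, 1, 0, 0])

intact = [scene_a, move, scene_b, move]
lossy = [scene_a, move, move]  # the second I-Frame never arrived

print(decode(intact)[-1])  # [5, 6, 5, 5]
print(decode(lossy)[-1])   # [1, 3, 1, 1]
```

In the lossy stream, the movement meant for the new scene is painted onto the old scene, which is the ghosting seen in the example video.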

A picture is worth a thousand words​

If you still don't understand, check it out with your own eyes. Using a widely used video library such as FFmpeg, the frame information of a video file can be intentionally corrupted. This type of art is called Datamoshing.

Using Python and FFmpeg, I damaged the I-Frames in a music video to cause ghosting artificially.

  • All I-Frames in the video were overwritten with the values of the previous frame (probably a P-Frame or B-Frame). Therefore, no new information arrives through the I-Frames. So the background does not change, yet the characters' movements still appear. This is because the amount of change (P-Frames and B-Frames) is applied to the wrong reference point (I-Frame).
  • There are times when part of the screen looks clean for a moment. This is because a P-Frame may also carry new image information. However, since the I-Frames, which consist entirely of new information, have been deleted, the entire screen never becomes clean even if part of it looks clean temporarily.
  • You will notice that the broken video does not scatter like fine sand but breaks into large, easily visible square units. This is because image data compression is not calculated for each pixel but in units of macroblocks (coding tree units) that bundle several pixels. When this phenomenon occurs during a broadcast, it is commonly referred to as "Pixelated Videos".

Keeping in mind that there is no I-Frame, you will understand the relationship between I-Frame, P-Frame, and B-Frame more clearly after watching the video. When the opportunity arises, we will discuss how to damage a video using FFmpeg.


  • If there is an error in the article, please report it to mail@chosunghyun.com.
  • "However, to prevent the error from getting bigger, B-Frames do not refer to other B-Frames. They only refer to I-Frames or P-Frames" is incorrect. In video codecs such as HEVC and VVC, B-Frames can reference other B-Frames. Thank you so much for reporting. Credit: (anonymous)

Hero Image. Building a payment system for school festivals

MinsaPay is a payment system that was built for the Minjok Summer Festival. It works like a prepaid tap-to-pay card. Every source and piece of anonymized transaction data is available on GitHub.

Stats​

But why does a school festival need a payment system?​

My high school, Korean Minjok Leadership Academy (KMLA), had a summer festival like any other school. Students opened booths to sell food and items they created. We also screened movies produced by our students and hosted dance clubs. The water party in the afternoon is one of the festival's oldest traditions.

Because there were a lot of products being sold, it was hard to use regular paper money (a subsequent analysis by the MinsaPay team confirmed that the total volume of payments reached more than $4,000). So our student council created proprietary money called the Minjok Festival Notes. The student council had a dedicated student department act as a bank to publish the notes and monitor the currency's flow. Also, the Minjok Festival Notes acted as festival memorabilia since each year's design was unique.

The Minjok Festival Note design for 2018 had photos of the KMLA student council members at the center of the bill. The yellow one was worth approximately $5.00, the green one was worth $1.00, and the red one was worth 50 cents.

But there were problems. First, it was not eco-friendly. Thousands of notes were printed and disposed of annually for just a single day. It was a waste of resources. The water party mentioned above was problematic as well. The student council made Minjok Festival Notes out of nothing special, just ordinary paper. That made the notes extremely vulnerable to water, and students lost a lot of money after the water party. Eventually, the KMLA students sought a way to resolve all of these issues.

Idea​

The student council first offered me the chance to develop a payment system. Because I had thought about the case beforehand, I thought it made a lot of sense. I instantly detailed the feasibility and possibilities of the payment system. But even after designing the system in such great detail that I could immediately jump into the development, I turned down the offer.

I believe in the social responsibilities of the developer. Developers should not be copy-pasters who meet the technical requirements and deliver the product. On the contrary, they are the people with enormous potential to open an entirely new horizon of the world by conversing with computers and other technological media. Therefore, developers have started to possess the decisive power to impact the daily lives of the rest of us, and it is their bound responsibility to use that power to enhance the world. That means developers should understand how impactful a single line of code can be.

Of course, I was tempted. But I had never done a project where security was the primary interest. It was a considerable risk to start with a project like this without any experience or knowledge in security. Many what-ifs flooded my brain. What if a single line of code makes the balance disappear? What if the payment record gets mixed up? What if the server is hacked? More realistically, what if the server goes down?

People praise audacity, but I prefer prudence. Bravery and arrogance are just one step apart. A financial system should be flawless (or as flawless as possible). It should both be functional and be performing resiliently under any condition. It didn't seem impossible. But it was too naïve to believe nothing would happen, as I was (and am still) a total newbie in security. So I turned it down.

Wait, payment system using Google Forms?​

The student council still wanted to continue the project. I thought they would outsource the task to some outside organization. It sounded better since they would at least have some degree of security. But the council thought differently. They were making it themselves with Google Forms.

When I was designing the system, the primary issue was payment authorization. The passcode shouldn't be shared with the merchant, while the system could correctly authorize and process the order. The users can only use the deposited money in their accounts. This authorization should happen in real-time. But I couldn't think of a way to nail the real-time authorization with Google Forms. So I asked for more technical details from one student council member. The idea was as follows:

Abstract of a Google-Form-Powered Payment System
  • Create one Google Form per user. (We have about 400 users in total.)
  • Create QR codes with links to the Google Form. (So it's 400 QR codes in total.)
  • Create a wristband with the QR code, and distribute them to the users.
  • Show that wristband when purchasing something.
  • The merchant scans the QR code and opens the link in incognito mode.
  • Input the price and the name of the booth.
  • Confirm with the user (customer) and submit the response.
  • Close the incognito tab.

So the idea was to use the Google Form's unique address as a password. Since the merchants are supposed to use incognito mode, there should be a safety layer to protect the user's Google Form address (in theory). They will need to make a deferred payment after the festival. But as a developer, this approach had multiple problems:

Potential Problems I found
  • How are we going to manage all 400 Google Forms?
  • Intended or not, people will lose their wristbands. In that case, we will need to note the owner of the wristband in every Google form to calculate the spending. Can we deliver those QR codes to the correct owner if we do?
  • If the merchant doesn't use incognito mode, it will be hard for an ordinary person to tell the difference. If that happens, it is possible to attack the exposed Google form by submitting fake orders. We could also add a "password," but in that case, we cannot stop the customer from providing an incorrect password and claiming that they were hacked by someone else.
  • If the merchant has to select the booth and input the price manually, there will be occasions where they make a typo. Operators could fix a typo in the price value relatively quickly, but a typo or misselection in the booth value would be a pain since we would have to find out who made a mistake and the original order. Imagine there were 20 wrong booth values. How are we going to trace the real booth value? We could guess, but would that sort of record have its value as reliable data?
  • How are we going to make the deferred payment? How will we extract and merge all 400 of the Google Forms response sheets? Even worse, the day after the festival is a vacation. People care about losing money but not so much about paying their debts. There could be students who just won't come back. It would be excruciating to notify all those who didn't deliver. But if the money is prepaid, the solution is comparably easy. The council members could deposit the remaining balance to their phone number or bank account. We don't need to message dozens of students; we could do the work ourselves.
  • The student council will make the Google Form with the student council's Google account. That Google account will have restricted access, but a few students will be working together to create all 400 Google forms. Can we track who makes the rogue action if someone manipulates the Google form for their benefit?
  • Can this all be free from human error?

It could work in an ideal situation. But it would bring a great deal of confusion and noticeable inconvenience on the festival day. That made me think that even though my idea had its risks, it would still be the better option. So, I changed my mind.

Development​

Fortunately, I met a friend with the same intent; our vision and ideas about the project aligned. I explained my previous concept, and we talked it through and co-developed the actual product. We also met at a cafe several times. I set up and managed the DNS and created the front-end side. Below are the things we thought about while making the product.

Details that my team considered
  • We won't be able to use any payment gateway or third-party payment service since we are not officially registered, and we will use it for a single day. Some students don't own smartphones, so we won't be able to use Toss or KakaoPay (Both are well-known P2P payment services in Korea, just like Venmo). Therefore, there cannot be any devices on the client-side. We would need to install computers on the merchant's side.
  • It is impossible to build a completely automated system. Especially in dealing with cash, we would need some help from the student council and the Department of Finances and Information. Trusted members from the committee will manually count and deposit the money.
  • There must be no errors in at least the merchant and customer fields since they would be the most difficult errors to fix later. But, of course, we cannot expect that people will make no mistakes. So, instead, we need to engineer an environment where no one can make a mistake even if they want to.
  • The booths may be congested. If each customer needs to input their username and password every time, that will pose a severe inconvenience. For user experience, some sort of one-touch payment would be ideal.
  • For this, we could use the Campus ID card. Each card has a student number (of course) and a unique value for identifying students at the school front door. We could use the number as the username and the unique value as the password. Since this password is guaranteed to be different for each student, we would only need the password for identification purposes.
  • The final payment system would be a prepaid tap-to-pay card.
  • Developers would connect each account with its owner's student ID.
  • Students could withdraw the remaining money after the festival.
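
The prepaid tap-to-pay idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual MinsaPay code: the class name `PrepaidLedger`, its methods, and the in-memory dictionaries are all hypothetical, and a real system would persist this data in a database.

```python
from dataclasses import dataclass, field


class PaymentError(Exception):
    """Raised when a charge or payment cannot be applied."""


@dataclass
class PrepaidLedger:
    # Hypothetical schema: card's unique RFID value -> student ID.
    card_owner: dict = field(default_factory=dict)
    # Student ID -> prepaid balance (integer currency units).
    balance: dict = field(default_factory=dict)
    # Append-only log of (student_id, booth_id, amount) payments.
    payments: list = field(default_factory=list)

    def register(self, rfid: str, student_id: str) -> None:
        """Connect a card's unique value with its owner's student ID."""
        self.card_owner[rfid] = student_id
        self.balance.setdefault(student_id, 0)

    def _owner(self, rfid: str) -> str:
        # The unique card value alone identifies the student (one tap).
        if rfid not in self.card_owner:
            raise PaymentError("unknown card")
        return self.card_owner[rfid]

    def charge(self, rfid: str, amount: int) -> int:
        """Add prepaid money to the card owner's account."""
        if amount <= 0:
            raise PaymentError("charge amount must be positive")
        student = self._owner(rfid)
        self.balance[student] += amount
        return self.balance[student]

    def pay(self, rfid: str, booth_id: str, amount: int) -> int:
        """Deduct a purchase; refuse it if the balance is insufficient."""
        if amount <= 0:
            raise PaymentError("payment amount must be positive")
        student = self._owner(rfid)
        if self.balance[student] < amount:
            raise PaymentError("insufficient balance")
        self.balance[student] -= amount
        self.payments.append((student, booth_id, amount))
        return self.balance[student]
```

Whatever remains in `balance` after the festival is what a student could withdraw. In this sketch, `booth_id` is meant to come from the merchant's terminal rather than being typed per sale, which is one possible way to keep the merchant field free of typos.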

We disagreed on two problems.

  1. One was the platform. While my partner insisted on using Windows executable programs, I wanted the system to be multi-platform and asked to use web apps. (As you might expect, I use a Mac.)
  2. The other was the method of reading data from the Campus ID card. The card has an RFID chip and a bar code storing the same value. If we chose RFID values, we would have to purchase ten RFID readers, spending an additional $100. Initially, I insisted on using the embedded laptop webcam to scan the barcode because MinsaPay was a pilot experiment at that time. I thought that such an expense would make the entire system questionable in terms of cost-effectiveness. (I said _"Wait, we need to spend an additional $100 even though we have no idea if the system will work?"_)

We settled on a web app with RFID, each of us conceding one point. I agreed to use RFID after learning that using a camera to read bar codes wasn't that fast or efficient.

Main Home, Admin Page, and Balance Check Page of the product.

And it happened​

Remember that one of the concerns was about the server going down?
On the festival day, senior students had to self-study at school. Then at one moment, I found my phone had several missed calls. The server went down. I rushed to the festival and sat in a corner, gasping and trying to find the reason. Finally, I realized the server was intact, but the database was not responding.
It was an absurd problem. (Well, no problem is absurd, per se, but we couldn't hide our disappointment after figuring out the reason.) When we set up our database, we thought the free plan would be more than enough. However, the payment requests surged and exceeded the database free tier. So we purchased a $9.99 plan, and the database went back to work. It was one of the most nerve-wracking events I have ever experienced.

The moment of upgrading the database plan. $10 can cause such chaos!

While the server was down, each booth made a spreadsheet and wrote down who needed to pay how much. Afterward, we settled the problem by opening a new booth for making deferred payments.

The payment log showed that the server went down right after 10:17:55 AM and returned at 10:31:10 AM. It was evident yet intriguing that the payments made per minute were around 10 to 30 before the crash but went down to almost zero right after restoring the server. If you are interested, please look here.
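
As a quick check of the outage arithmetic, here is a small sketch that computes the downtime from the two timestamps above and counts payments per minute. The helper names and the `"%I:%M:%S %p"` timestamp format are assumptions for illustration; the actual payment log may be stored differently.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed timestamp format, e.g. "10:17:55 AM".
FMT = "%I:%M:%S %p"


def downtime(last_before: str, first_after: str) -> timedelta:
    """Time between the last payment before the crash and the first one after."""
    return datetime.strptime(first_after, FMT) - datetime.strptime(last_before, FMT)


def payments_per_minute(timestamps: list) -> Counter:
    """Count payments in each minute, keyed by 24-hour "HH:MM"."""
    return Counter(datetime.strptime(t, FMT).strftime("%H:%M") for t in timestamps)


gap = downtime("10:17:55 AM", "10:31:10 AM")
minutes, seconds = divmod(int(gap.total_seconds()), 60)
print(f"Downtime: {minutes} min {seconds} s")  # Downtime: 13 min 15 s
```

Running `payments_per_minute` over the full log would show the drop in payments per minute once the server came back.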

Due to exceeding the database free tier, the server went down for 13 minutes and 15 seconds after payment #1546.

Results​

1. MinsaPay​

The entire codebase for MinsaPay is available on GitHub. First, though, I must mention that I still question the integrity of this system. One developer reported a security flaw that we managed to fix before launch. However, the system has unaddressed flaws; for example, though unlikely, merchants can still copy RFID values and forge ID cards.

2. Payment Data​

I wanted to give students studying data analysis more relatable and exciting data. Also, I wanted to provide financial insights for students planning to run a booth the following year. Therefore, we made all payment data accessible.

However, a data privacy problem arose. So I wrote a short script to anonymize personal data. If a CSV file is provided, it will anonymize a selected column. Identical values will have the same anonymized value. You can review the anonymized data here.
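
The sketch below shows one way such a script could work. It is not the original script: the function names are hypothetical, and the use of a keyed hash (HMAC-SHA256, truncated to 12 hex characters) is my assumption about a reasonable technique. Identical values map to the same pseudonym, and without the key the pseudonyms cannot be recomputed from guessed names.

```python
import csv
import hashlib
import hmac
import io
import secrets


def anonymize_value(value: str, key: bytes) -> str:
    """Return a stable pseudonym for a value under a secret key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]  # shortened for readability; raises collision risk slightly


def anonymize_csv(text: str, column: str, key: bytes = None) -> str:
    """Anonymize one column of CSV text and return the new CSV text."""
    # A fresh random key per run means pseudonyms differ between runs.
    key = key or secrets.token_bytes(32)
    reader = csv.DictReader(io.StringIO(text))
    rows = [{**row, column: anonymize_value(row[column], key)} for row in reader]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = "user,booth,amount\nalice,A,1000\nbob,B,500\nalice,B,200\n"
print(anonymize_csv(sample, "user"))
```

Note that this only hides the chosen column. Spending patterns in the other columns could still identify someone in a group of 400 people, so the result deserves a manual review before publishing.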

Note for Developers​

I strongly recommend thoroughly auditing the entire code or rewriting it if you use this system. MinsaPay is under the MIT license.

What I Learned​

There is ample room for improvement.

First, the code contains numerous compromises. For example, we made a lot of trade-offs so as not to miss the product deadline (the festival day). We also wanted to include safety features, such as canceling payments, but we didn't have time. More time and development experience would have improved the product.

Since I wasn't comfortable with the system's security, I initially kept the repository quiet and undisclosed. Afterward, however, I realized this was a contradiction, as I knew that security without transparency is not the best practice.

Also, we were not free from human errors. For example, RFID values were long strings of digits, and a few times someone entered one into the charge amount field by mistake, making the charge amount something like Integer.MAX_VALUE. We could've added a simple confirmation prompt, but we didn't anticipate such mistakes at the time.
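
In hindsight, a small input check would have caught this kind of mistake. The following is a minimal sketch, not code from MinsaPay: the cap `MAX_CHARGE` and the function name are hypothetical, and the right limit depends on how much a student could plausibly charge at once.

```python
# Hypothetical upper bound for a single charge, in currency units.
MAX_CHARGE = 100_000


class InvalidAmount(ValueError):
    """Raised when a typed charge amount looks wrong."""


def parse_charge_amount(raw: str, max_amount: int = MAX_CHARGE) -> int:
    """Parse a typed charge amount, rejecting implausible input."""
    text = raw.strip()
    # Only plain ASCII digits are accepted (no signs, decimals, or letters).
    if not (text.isascii() and text.isdigit()):
        raise InvalidAmount(f"not a whole number: {raw!r}")
    amount = int(text)
    if amount == 0:
        raise InvalidAmount("amount must be greater than zero")
    if amount > max_amount:
        # A scanned RFID value is a long digit string, so it lands here.
        raise InvalidAmount(f"amount {amount} exceeds the limit of {max_amount}")
    return amount
```

With a cap like this, a card value scanned into the amount field is rejected before any balance changes, and a confirmation prompt could then handle the remaining borderline cases.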

In hindsight, it was a great experience for someone like me, who had never done a large-scale real-life project. I found myself compromising even after recognizing the anti-patterns. I also understood that knowing and doing are two completely different things: knowing has no barriers, but doing comes with extreme stress from both time pressure and the environment.

Still, it was such an exciting project.

Lastly, I want to thank everyone who made MinsaPay possible.

  • Jueon An, a talented developer who created MinsaPay with me
  • The KMLA student council and Department of Finances and Information, who oversaw the entire MinsaPay initiative
  • The open-source developers who reported the security flaws
  • Users who experienced server failures during the festival day
  • And the 400 users of MinsaPay

Thank you!

👋