My experience using Python to support my long journey in learning Mandarin as a native French speaker
Article originally published on Medium.
Learning a language can be a long journey, so staying motivated and having clear goals to aim for along the way are essential.
Because Mandarin is written with hanzi (汉字), a logographic script in which each character stands for a word or syllable rather than a letter, the journey is even more challenging for any learner without a background in a similar language.
In my quest for Chinese fluency, flashcards have been my best ally in improving my reading and pronunciation.
In this article, I will share my experience using data analytics tools with Python to automate flashcard creation and support my learning process.
I am a French guy who moved to China to study engineering for a two-year double degree program.
In the end, I stayed for more than six years, and my main challenge was learning Mandarin for daily life and work.
前车之鉴：lessons drawn from others’ mistakes
The main mistake I made when I started learning Mandarin was not following the advice of smart people who were promoting the use of flashcards.
Do you remember, as a kid, when one of your parents or a tutor held your book to help you prepare for the next day's history test?
They would ask you questions about the lesson:
- If you answered well: they would consider you ready for the test.
- If you made mistakes: they would ask you to read the lesson again and come back when you were ready.
Now there is an open-source app for this, and it’s called Anki.
A personal teacher on your phone
In the picture above, you can find an example of a card for learning how to say ‘Hello!’ in Mandarin.
Step 1: it first shows you the word in Chinese characters (hanzi)
Step 2: it shows you the answer with:
- The pronunciation using the romanization system pinyin: nǐ hǎo
- The translation in English: Hello!
- The spoken pronunciation as a short mp3 audio clip
Step 3: Perform your self-assessment
- If you guessed well, press ‘Good’: the card will reappear in 10 min
- If you found it easy, press ‘Easy’: Anki will wait 4 days before asking you again
- If you did not guess well, press ‘Again’: the card will reappear shortly
To support your learning journey, you want to feed Anki with thousands of cards and practise 2 hours per day during your commute and other idle moments.
In this section, I will explain how to use Python to build these cards with…
- Common words or sentences for daily life or work
- A phonetic transcription added with a Python library
- An audio pronunciation generated with the Google TTS API
This framework can be applied to any language, not only Mandarin Chinese.
I. Build Your Vocabulary List
As a foreigner working in China, my main priority was to have a basic vocabulary to communicate with my colleagues at work.
1. Read emails with pywin32
Because my first objective was to read emails in Mandarin, I planned to extract the most frequently used words from the emails in my Outlook mailbox.
Using the piece of code below, you can extract the body of all your emails and store them in a list.
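A minimal sketch of this step could look like the following. It assumes a Windows machine with Outlook and pywin32 installed; the function names extract_bodies and read_inbox are illustrative, and 6 is Outlook's constant for the default inbox folder.

```python
def extract_bodies(folder):
    """Return the non-empty body text of every item in an Outlook folder."""
    bodies = []
    for item in folder.Items:
        # Some items (for example certain notifications) may expose no Body
        body = getattr(item, "Body", None)
        if body:
            bodies.append(body)
    return bodies


def read_inbox():
    """Read the default inbox through the Outlook COM interface (Windows only)."""
    import win32com.client  # provided by pywin32

    namespace = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    inbox = namespace.GetDefaultFolder(6)  # 6 = olFolderInbox
    return extract_bodies(inbox)
```

Calling read_inbox() returns a plain list of strings, one per email, which can then be passed to the word-counting step. Subfolders are not visited in this sketch.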
2. Extract keywords from PDF reports
Reports and documentation I received from suppliers can be a good source of technical vocabulary.
Therefore, I have built this simple piece of code to extract the text from any PDF report.
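As an illustration, here is one way this could be done. The article does not name a PDF library, so this sketch assumes pypdf; the helper names pdf_text and chinese_runs are hypothetical, and the regular expression only covers the basic CJK Unified Ideographs block.

```python
import re

# Runs of characters from the basic CJK Unified Ideographs block
HANZI_RUN = re.compile(r"[\u4e00-\u9fff]+")


def chinese_runs(text):
    """Return every consecutive run of Chinese characters found in the text."""
    return HANZI_RUN.findall(text)


def pdf_text(path):
    """Concatenate the text of all pages of a PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for scanned or image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

For example, chinese_runs(pdf_text("report.pdf")) would keep only the Chinese fragments of a mixed-language report, where "report.pdf" is a placeholder path. Scanned PDFs would need OCR first, which this sketch does not cover.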
3. Other Sources
Another major source was the monthly financial reports in Excel, which can be processed with the pandas library.
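A rough sketch of how the text cells of a workbook might be collected with pandas is shown below. The function names are illustrative, and reading .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
def text_cells(rows):
    """Keep the non-empty string cells from an iterable of rows."""
    return [
        cell.strip()
        for row in rows
        for cell in row
        if isinstance(cell, str) and cell.strip()
    ]


def excel_text(path):
    """Collect every text cell from every sheet of an Excel workbook."""
    import pandas as pd

    # sheet_name=None loads all sheets into a dict of DataFrames;
    # header=None treats the first row as data so labels are not lost
    sheets = pd.read_excel(path, sheet_name=None, header=None)
    cells = []
    for frame in sheets.values():
        cells.extend(text_cells(frame.values.tolist()))
    return cells
```

Numbers and empty cells are skipped, so only labels and comments remain as candidate vocabulary.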
4. Final Results
After processing, I get a list of words like the one below
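The processing itself can be sketched as a word-frequency count over all collected texts. This example assumes jieba for word segmentation; top_words, jieba_tokenize and the min_len threshold are illustrative choices, not the exact method used for the list in this article.

```python
from collections import Counter


def top_words(texts, tokenize, n=10, min_len=2):
    """Return the n most frequent Chinese tokens across all texts."""
    counts = Counter()
    for text in texts:
        for token in tokenize(text):
            token = token.strip()
            # Keep tokens made only of Chinese characters, at least min_len long
            if len(token) >= min_len and all("\u4e00" <= ch <= "\u9fff" for ch in token):
                counts[token] += 1
    return counts.most_common(n)


def jieba_tokenize(text):
    """Split Chinese text into words (assumes jieba is installed)."""
    import jieba

    return jieba.lcut(text)
```

With real data, top_words(all_texts, jieba_tokenize, n=500) would give the 500 most frequent words to turn into cards, where all_texts is the combined list of email, PDF and Excel texts.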
II. Add the phonetic transcription
To practise your pronunciation and use the tones correctly, you need a phonetic transcription.
For Mandarin, I use the jieba library to split the Chinese text into words; a dedicated library such as pypinyin can then convert the characters into their phonetic transcription (pinyin).
You can find an equivalent library for your target language.
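As a hedged example of this step: jieba itself focuses on word segmentation, so this sketch assumes the separate pypinyin package for the conversion. The function names are illustrative, and the converter is passed in as a parameter so that another language's transcription library can be swapped in.

```python
def pypinyin_converter(word):
    """Convert Chinese characters to pinyin with tone marks (assumes pypinyin)."""
    from pypinyin import lazy_pinyin, Style

    return " ".join(lazy_pinyin(word, style=Style.TONE))


def add_pinyin(words, to_pinyin=None):
    """Pair each word with its phonetic transcription."""
    if to_pinyin is None:
        to_pinyin = pypinyin_converter
    return [(word, to_pinyin(word)) for word in words]
```

Calling add_pinyin(words) on the vocabulary list returns (hanzi, pinyin) pairs ready to be placed on the answer side of each card.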
III. Add the pronunciation
To improve your speaking ability, you want to add an audio pronunciation to each card.
One solution for this is the gTTS library.
This is a Python library and CLI tool to interface with Google Translate’s text-to-speech API.
You can find more details and instructions to use it in the official documentation.
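A minimal sketch of generating one mp3 per word with gTTS follows. The file-naming scheme and function names are assumptions made for illustration, the language code "zh-CN" should be checked against gtts.lang.tts_langs(), and the calls need an internet connection.

```python
from pathlib import Path


def audio_name(index, prefix="card"):
    """Build a predictable file name such as card_0007.mp3."""
    return f"{prefix}_{index:04d}.mp3"


def synthesize(words, out_dir="audio", lang="zh-CN"):
    """Save one mp3 per word and return the file names in order."""
    from gtts import gTTS  # requires network access

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for index, word in enumerate(words, start=1):
        name = audio_name(index)
        gTTS(text=word, lang=lang).save(str(out / name))
        names.append(name)
    return names
```

Keeping the file names predictable makes it easy to reference each clip from its card later on.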
Now you have a list of words or sentences with the translation in English, the phonetic transcription, and a short mp3 audio clip with the pronunciation.
These cards can be used to practise your…
- Reading Comprehension using the translation
- Pronunciation using the phonetic transcription
- Listening comprehension using the short audio
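To get these elements into Anki, one option is a tab-separated text file, which Anki can import with one field per column. The sketch below assumes four fields per card (hanzi, pinyin, English, audio file name); cards_to_tsv is a hypothetical helper, while [sound:...] is the tag Anki uses to play a media file.

```python
import csv
import io


def cards_to_tsv(cards):
    """Format (hanzi, pinyin, english, audio_file) tuples as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for hanzi, pinyin, english, audio_file in cards:
        # Anki plays the referenced file when the field contains a sound tag
        writer.writerow([hanzi, pinyin, english, f"[sound:{audio_file}]"])
    return buffer.getvalue()
```

The returned text can be written to a UTF-8 file and imported into a deck, with the mp3 files copied into the media folder of your Anki profile so the sound tags resolve.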
Apply the process presented in the visual above and I promise you will see improvements in your language mastery with the help of Python!