February 2024 status
This status post is a bit late, to make up for it we are treating you to a larger-than-usual serving of news and content.
5.7 Documentation with a new accessibility section
We have updated our documentation to version 5.7. Alongside documenting new features we are introducing a new accessibility section. The first thing in this section is a list of channels to provide accessibility feedback including a new dedicated chat room in our Matrix space, and a dedicated email address. We plan to expand this in future, for example with a thorough list of keyboard shortcuts and navigation.
Le Monde is using CryptPad again
We are delighted to see CryptPad used once again by French newspaper Le Monde. This time sharing the list of references behind a video analysing the disproportionate contribution of rich people to climate change.
FOSDEM
As stated in the previous status report, CryptPad was present at FOSDEM along with XWiki. It is the biggest free and open-source European event, where communities of different projects can meet, exchange ideas and collaborate.
CryptPad was present in multiple forms, first with presentations, which recorded videos are available in the following links:
- Fabrice presented “Securely collaborate with CryptPad”;
- Clément and Wieland presented “openDesk - The Open Source collaborative suite“
- Ludovic, our CEO shared his experience in “20 Years of Open Source building XWiki and CryptPad”
In a more relaxed fashion, XWiki and CryptPad organized a community meetup with their team on Saturday night, where we exchanged with the community more casually.
All in all we were very glad to exchange with the community during this event and we hope to see you again in future events!
Post-mortems for recent CryptPad.fr outages
Users of our flagship instance CryptPad.fr may have noticed some unexpected downtime in the last couple of months. There are 4 recent outages that we'd like to provide some information about, including steps that we are taking to prevent these issues from happening again. Sorry the details below get quite technical, that can't really be avoided if we want to provide a transparent account of these incidents.
For context, we started migrating cryptpad.fr user data to a new storage back-end system in October 2023. This was a long process where we needed to synchronize more than 4,5 TB of data from our old inefficient file-system to a new one based on BTRFS. The final step of this migration to the new storage system took place on December 13th, 2023 (all dates and times are CET). Some time later we ran into issues:
Friday January 12th 2024, 16:00
- No log-in for accounts with two-factor authentication (2FA) enabled
- No access to uploaded static files (blobs: images, PDFs, etc) and no upload of new files
- Spreadsheet documents opening as blank documents
Due to human error, blobs weren't served anymore by the Nginx web server. The running user was changed during a maintenance task but we forgot to update the /var/cache/nginx/
directory access rights. We realized our mistake when users reported the issue during the weekend and fixed it on Monday January 15th at 10:30.
Friday January 19th 2024, 7:30
- No log-in for accounts with 2FA enabled
- Impossible to create new documents
- Existing documents and drives only shown as read-only
Our backup system was creating too many snapshots of the BTRFS filesystem because of a race condition. These snapshots took all the available space in the BTRFS pool which made the storage become read-only. We did not have appropriate monitoring in place on the new system to spot this mistake in time to prevent an outage.
The issue was fixed by identifying and fixing the race condition in our in-house backup software, and adding more space to the server. Service resumed around 9:30 that morning. We then added some monitoring scripts to our Icinga instance with regular, automated checks to be sure the issue wouldn't arise again.
Tuesday January 23th 2024, 13:00
- No log-in for accounts with 2FA enabled
- Impossible to create new documents
- Existing documents and drives only shown as read-only
With similar symptoms to the previous outage, this one was caused by our failure to make the disk we added the previous week fully available to BTRFS (a filesystem balancing operation was needed to spread the data among the different disks). This resulted in space running out again and producing the same "read-only" effects for our users.
This took us a bit longer to figure out, service was only restored after 2.5h at around 15:30. Following this we adapted the monitoring scripts on our Icinga instance so the metadata part of the pool is also checked.
Tuesday February 13th 2024, 14:00
- Disconnections when editing documents and accessing the drive
In this case the server was rebooted while a resource-intensive BTRFS balancing operation was running (still recovering from the last outage). Because this operation was paused but not canceled, it prevented the BTRFS pool from being mounted again.
The issue was fixed by adding the skip_balance
option to the mount command, with service resuming at around 15:00.
In summary you will have spotted a pattern of compounded teething issues as we got used to the new storage back-end in our production conditions. Beyond adding monitoring, we plan to be more mindful of the load applied on our server and eventually refactor its architecture to be better suited to the needs of our flagship instance.
We'd like to apologise to our users, free and paying, for the inconvenience caused by these issues. We are grateful for your patience and your trust.
Contributions to cryptographic research
CryptPad is also contributing in theoretical research with contributions accepted at Eurocrypt one of the most reputable conferences in cryptography, hosted by the International Association for Cryptologic Research (IACR), and at the NIST Post-Quantum standardization Conference, which is the organism leading the current cryptographic effort toward post-quantum transition.
Both results present a scalable post-quantum signature scheme, named Raccoon, that offers nice properties with regard to thresholdization and masking. Threshold signatures allow guaranteeing that a quorum of users collaborated to produce a signature, henceforth alleviating the single point of failure that certificate authorities represents for instance: some companies already attempted to exploit this loophole in the past.
Moreover, masking is important for hardware implementations of cryptography, especially on embedded devices, as a popular mitigation to side channel cryptanalysis.
The full version of this work is available in the IACR preprint archive and more information can be found on the Raccoon signature website. This is the result of the joint effort of Shuichi Katsumata, Mary Maller, Fabrice Mouhartem, Rafael del Pino, Thomas Prest and Markku-Juhani Saarinen.
Up next
- We are nearing full-draft status on the Blueprints R&D website which we are hoping to launch before the next status update
- A new blog post in the tutorial series is to be released on how to make the most of CryptPad’s security from a user standpoint
- The team is working hard on features for 5.8 which is planned for the end of Q1 2024 (also the end of this month!)