At Signeasy, we are transforming how fast-growing companies sign, send, and manage their contracts. With its intuitive, cross-platform, cloud-based software, Signeasy empowers over 48,000 companies globally to digitize paperwork, accelerate business operations, and enhance team productivity.
The rapid growth of Signeasy’s customer base is a testament to its effectiveness and popularity. Handling thousands of eSign requests per hour, the platform has evolved significantly to meet increasing demands. However, this growth brought challenges in scalability, reliability, and performance.
This article explores the issues faced with the previous architecture and the steps taken to create a faster, more scalable, and resilient solution for eSignature and contract workflows.
Challenges in the previous architecture
Signeasy has consistently evolved, introducing various microservices to meet business and customer demands. Because new features were added rapidly, the platform accumulated technical debt, which led to performance and reliability issues, especially in critical workflows.
This growth phase has been a learning experience, highlighting areas for improvement and optimization. Some of the notable problems we faced with the old architecture include:
- Single point of failure: Due to tight coupling between backend services, a problem in one service led to widespread disruptions in the entire workflow, especially during the signing process.
- Shared database across services: This anti-pattern in microservices architecture hindered long-term scaling.
- Lack of transaction atomicity: Failures during signature requests resulted in irrecoverable document states, necessitating manual intervention.
- Performance issues: The third-party PDF engine and other backend services suffered frequent failures and high latency. Because the services were tightly coupled, each workflow required many server-to-server calls, which slowed it down.
- No retry mechanisms: Essential events such as customer callbacks (webhooks), email notifications, and other marketing/tracking events lacked a retry feature, which is crucial for business continuity.
- Document list latency: Users with thousands of documents experienced slow loading in the web client. Elasticsearch, which we used for indexing and listing documents, was enabled only for a subset of users, with restrictions.
In our analysis, we found that certain issues, though intermittent, directly affected the execution of legally binding contracts within Signeasy’s workflows. These anomalies not only tested the limits of our system’s scalability but also raised questions about its overall reliability.
The significant amount of time our support team and engineering staff dedicated to resolving these issues underscored a fundamental necessity: a comprehensive redesign of the system architecture. This redesign aims to enhance the robustness and dependability of the system, ensuring smoother operation and fewer disruptions in the future.
Now, let’s break down and examine the specific issues impacting the critical components of Signeasy’s system:
1. PDF engine service
This essential service manages all PDF operations, such as adding fields and annotations, merging envelope documents in signature requests, and flattening documents. Because it was hosted externally, our control over it was limited. It used Amazon S3 for storage and a separate database for metadata. Its biggest weakness was a lack of resilience: any failure in this service broke the entire process, and troubleshooting was complex and time-consuming, which delayed our response to customer issues.
2. Signature service
The signature service, a core element of Signeasy, manages all signature-related tasks. We encountered several problems here. The database transactions lacked atomicity, leading to potential inconsistencies in the signature process and necessitating manual corrections. Additionally, the absence of a retry mechanism for failed post-processing tasks often resulted in total operation failures. Another challenge was the shared database structure across all services, including the signature service, which complicated maintenance and scalability. Moreover, storing PDF document metadata in the same MySQL database was suboptimal.
3. Notification and auxiliary services
These services are integral for internal operations like sending emails, activating customer webhooks, and initiating third-party events post-signature. However, their tight integration with the main Signature service presented a significant risk, as it created a single point of failure. This close coupling was particularly unnecessary for non-immediate tasks, adding to the system's vulnerability.
4. Elasticsearch indexing
Primarily used for indexing document records, Elasticsearch’s role was to enable document listing and search capabilities for clients. The setup involved all services sending data asynchronously to Elasticsearch via Amazon SQS and Lambda. Although this arrangement was initially effective, it demanded continuous maintenance and frequent updates. The necessity of implementing an event hook across multiple services for data transfer posed additional complexity. Moreover, the high volume of data from our user base presented bulk indexing challenges, leading to a strategic decision to limit document indexing to a certain threshold rather than indexing for all users.
How did Signeasy overcome these challenges?
1. Ensure a reliable and high-performance PDF Engine
The primary challenge was to make the PDF Engine faster and more reliable. Initial optimization efforts helped but did not fully resolve the issues, so we moved to a new PDF Engine, Apryse (formerly known as PDFTron). Apryse promised more control and dependability, allowing us to manage PDF operations and metadata ourselves, without the constraints of an externally hosted, server-based system.
2. Implement a dedicated database per service to reduce dependencies
To further address reliability and performance concerns, we revamped our core platform. This involved transitioning from legacy services to new ones that adopted a database-per-service architecture, carefully selecting the most suitable technology for each service. Our objective here was to ensure that each service operated reliably and independently, particularly concerning critical database transactions.
3. Eliminate single point of failure in key workflows
For ancillary services like email notifications and webhook triggers, independence was key to enabling separate development and scaling. We embraced the AWS serverless architecture, using SNS, SQS, and Lambda. This strategy was instrumental in minimizing the risk of system-wide failures, especially in tasks like sending email notifications and managing webhooks.
4. Select suitable storage for PDF metadata
A significant part of our update was rethinking the storage solution for PDF metadata (XFDF), moving away from MySQL. After considering various NoSQL databases, MongoDB was chosen for its flexibility and cost-effectiveness, marking a crucial step in our system’s overhaul.
5. Speed up document loading for users
Lastly, we completely revamped our Elasticsearch indexing system. A key part of this revamp was adopting Logstash, Elastic’s open-source data processing tool, which can ingest data from sources such as MySQL, transform it, and ship it to Elasticsearch through pipelines. The goal was to streamline indexing and, in turn, speed up document loading for users.
Introducing new platform features
The re-architected platform introduced several new components:
1. PDF engine service with Apryse
The new PDF Engine service, built in Go (Golang) with Apryse, delivers a marked performance boost. Go’s speed and its lightweight concurrency model, based on goroutines, significantly improve our ability to run concurrent PDF operations within signature workflows.
Apryse enhances the efficiency of our PDF operations, with speed improvements exceeding 30%. We also refined our handling of PDF metadata (XFDF) by adopting MongoDB as the data store.
These enhancements let us move away from a server-based system for PDF documents. In addition, the PDF Engine service is now directly accessible via Amazon API Gateway, which lets us break requests into smaller, more efficient operations.
2. Document and transaction services
We built new Document and Transaction services from scratch in Go. Each has its own data model and schema, and both are internal services exposed through Amazon API Gateway.
Clients call these services through authenticated API Gateway endpoints, which proxy the requests. A crucial improvement is that all database transactions in these services are now atomic, so a failure can no longer leave a signature document in a partial or incomplete state.
3. AWS serverless components
We use AWS serverless services, with Amazon SNS as the central component, to handle asynchronous events in a decoupled manner. Email notifications are delivered through HTTP subscriptions on SNS topics that point to our Notification service, and SNS retries failed deliveries for up to 3600 seconds.
For triggering other events, like developer webhook endpoints and third-party events, we use Amazon SQS subscriptions linked to SNS topics, with Lambda acting as the consumer for SQS messages. Amazon SQS can retain messages for up to 14 days and offers retry capabilities for messages processed in Lambda.
Lambda, an event-driven serverless computing service, handles code deployment and execution and integrates logging and monitoring via Amazon CloudWatch. This serverless model greatly reduces the risk of any single point of failure in event processing.
4. Elasticsearch with Logstash pipeline
We introduced a new Logstash pipeline system for indexing documents in Elasticsearch. This system removes the need for direct code integration in services. Data from MySQL is now automatically forwarded to Elasticsearch in real-time via pipelines. Our web client fully integrates with Elasticsearch, providing advanced search capabilities and paginated document lists. Plans are also underway to integrate Signeasy’s mobile app with Elasticsearch.
Outcomes of the new architecture
Following the implementation of the new architecture, Signeasy achieved several positive outcomes:
1. Improved API response time
A notable advancement is the significant reduction in API response times for both sending and signing signature requests, which markedly improves the user experience. The average time to send a signature request has dropped from 4 seconds to under 1.5 seconds, and the average time to sign one has dropped from 5 seconds to under 2 seconds.
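The figures above can also be expressed as relative reductions in average response time. Because the new averages are stated as upper bounds, these percentages are lower bounds on the improvement:

```latex
\Delta_{\text{send}} > \frac{4\,\text{s} - 1.5\,\text{s}}{4\,\text{s}} = 62.5\%,
\qquad
\Delta_{\text{sign}} > \frac{5\,\text{s} - 2\,\text{s}}{5\,\text{s}} = 60\%
```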
2. Enhanced service reliability
The new architecture has successfully eliminated intermittent service failures, leading to a much more stable and reliable platform for our users. This improvement ensures uninterrupted service availability, greatly increasing user trust and satisfaction.
3. Efficient document loading
We’ve achieved a considerable enhancement in document loading speeds through our revamped Elasticsearch implementation. This results in a noticeably faster and smoother user experience when accessing documents.
4. Rapid development timeline
The entire re-architecture project was completed in a short timeframe of six months. This achievement not only reflects the team’s technical prowess and dedication but also underscores our commitment to continuous improvement and innovation in the platform. This rapid development and deployment have allowed us to offer these enhanced capabilities to our users quickly.
Signeasy’s newly restructured platform marks a substantial leap in terms of efficiency, reliability, and user experience, effectively setting the stage for the company’s ongoing growth and continued innovation in the eSignature and contract workflow domain.
This revamped platform has notably improved performance metrics, most visibly in the reduced API response times for both initiating signature requests and signing them. Adopting AWS serverless components has simplified infrastructure management and effectively eliminated intermittent service failures.
The accelerated document loading times, coupled with the remarkably swift development phase of just six months, underscore the platform’s readiness to not only meet current operational demands but also adapt to and embrace future growth and technological advancements.