2024-12-30
LOTTE Shopping E-Commerce’s MSA Monitoring Optimization with WhaTap
Company name
LOTTE Shopping
Industry
Retail
Website
https://www.lotteon.com/p/display/main/lotteon
The main reason I chose WhaTap was its intuitive and familiar UI. When it comes to APM, I believe it should be easy for developers and anyone else to access and understand.
LOTTE Shopping
IT Operations Manager

LOTTE Shopping E-Commerce, a WhaTap monitoring customer, operates LOTTE ON, a platform that offers a convenient customer experience by integrating online and offline services. Built by combining seven online and offline business divisions, LOTTE ON leverages the company’s strengths as a traditional retail powerhouse. LOTTE ON surpassed 2 million monthly active users (MAU) for the first time last December and this January, and maintains average daily traffic of 330,000.

LOTTE ON previously used a different monitoring service but found it inconvenient, which led the team to switch to WhaTap. We interviewed CEO Jung Sung-min to learn more about why the enterprise LOTTE Shopping E-Commerce chose WhaTap and how the team uses it.

Introduction: The customer and its infrastructure

Please give us a brief introduction to LOTTE ON and its services.

LOTTE ON is an integrated e-commerce platform that brings together LOTTE’s leading shopping malls so users can access them easily. It offers a convenient customer experience by making LOTTE HOME SHOPPING, Hi-Mart, and Super Fresh available on a single platform, and we are working to deliver fresh stories to customers by integrating online and offline.

LOTTE companies offer a wide range of services. We would like to know more about the environments in which they have operated.

Our services have evolved through several stages of digital transformation. We started by building the Lotte Internet Department Store service in an on-premises environment in 1996, then moved to the cloud in 2015–2016 to launch the Nike and UNIQLO services. After that, we built Lotte Internet Duty Free as an MSA in the cloud. In 2018, we built ELOTTE in a cloud-native environment, and finally launched the LOTTE ON service.

Challenge: From adopting WhaTap to hands-on experience

Please tell us why you introduced a monitoring service and why you chose WhaTap.

The LOTTE ON service runs on Amazon EKS in the AWS cloud and is built on a microservices architecture (MSA). Above all, LOTTE ON’s architecture is very complex, so deciding how to monitor it was a task we faced from the very start of the build. On top of the complex architecture, we also needed to monitor Kubernetes itself. At the time, few Kubernetes monitoring services were available, and a major concern was how to quickly convey the state of each MSA-separated area to the engineers responsible for it.

We also evaluated monitoring products from overseas vendors but ultimately chose WhaTap.
The main reason I chose WhaTap was its intuitive and familiar UI. When it comes to APM, I thought it should be easy for developers and anyone else to access and check. Its intuitive dashboard and familiar UI make it very powerful for quickly checking and sharing issues.

Please tell us how you have been using WhaTap since adopting it.

There are three features we use most frequently.

First, the dashboard. We monitor a dashboard that graphs key business indicators for the LOTTE ON service, such as the number of payments and orders. This helps us respond quickly when an issue may affect other business services. For example, if orders fail because of a problem at a credit card company, we can handle it quickly by restricting that payment method.

The second is the flexible alerting function. Because our service is split into MSA units and different people are responsible for different parts, we maintain separate Slack channels. Thresholds can be set for each MSA service, and its alerts are routed to the matching channel, so each person in charge receives the notifications relevant to them.

The third and final one is the statistics and reporting function. We review the exceptions that occur in each MSA weekly and share them with the people in charge, who verify them and follow up where action is needed. Alongside issue prevention, this statistical information is also very helpful for identifying the cause of problems, because it shows which exceptions occurred most often at the time of a failure.

Please tell us about your experience solving problems while using WhaTap.

I can share two examples.

The first was an issue that occurred during a point-earning event, in which customers who wrote reviews for specific products earned points. The event was misconfigured and applied to all products, so customers received 3,000 points for every product they reviewed. As word spread rapidly across online community sites, a massive amount of traffic was generated almost instantly. As a result, CPU usage on a specific pod rose sharply, transactions could not be processed, and delays occurred.

We resolved the situation by making good use of WhaTap’s EKS pod monitoring dashboard. The dashboard showed the resource status of each pod’s containers in real time, which made monitoring intuitive. In addition, because thresholds can be set, alerts fired according to the configured conditions and the relevant engineer could check immediately, which was very helpful. Using the statistics, we also identified the specific URLs that were causing problems at the time and put measures in place to control those URLs in similar situations.

The second issue occurred during a large-scale event held about twice a year. The event lasts about a week, and a large number of coupons are issued to customers during that period. As customers accumulated more coupons, the logic that calculates the maximum applicable discount came under heavy load. As a result, out-of-memory (OOM) errors occurred in the pods responsible for applying discounts.

In this case, we used WhaTap’s heap monitoring. When we compared the graphs from the time of the issue with those from normal operation, we found that the SQL fetch count had increased significantly. By monitoring the SQL fetch count and comparing it with its normal level, we were able to prevent the same issue from recurring. We also set a heap memory threshold for each container so that we can act immediately when signs of OOM appear, and this setup has remained useful to this day.

Management: WhaTap customer support services and future plans

How do you plan to use WhaTap in the future?

Beyond the examples above, we have run into various other issues since switching to an MSA structure. In an MSA, a single transaction spans multiple services, so potential risk factors are scattered across the system. As the architecture grows more complex, simply detecting issues on APM dashboards is not enough, and we often need deeper analysis. Our mission is to move from simple monitoring to observability.

To move toward observability, LOTTE ON uses all the metrics WhaTap collects and is actively working to secure visibility. We are also continuing to work with WhaTap engineers to make use of additional metrics.

Please tell us why companies should use WhaTap’s monitoring solutions.

As a line often attributed to the renowned management thinker Peter Drucker puts it, “If you can’t measure it, you can’t manage it.” Whether for basic quality management or, beyond that, for achieving observability, you need to collect as much data as possible and use it in the right places. WhaTap’s monitoring solution is essential to securing and using that observability.

WhaTap, the integrated monitoring platform trusted by over 1,200 companies. Experience it today.