GitHub Project

Getting Started

The following assumes you're on macOS or Linux when executing commands. If you are running Windows, use gradlew.bat to execute Gradle commands.

Project Dependencies

Java 25

This project uses JDK 25 and has been tested with the OpenJDK distribution.

General Setup

Follow the OpenJDK installation instructions that correspond to your operating system.

Brew and macOS

To install OpenJDK 25 with brew, run:

brew install openjdk@25

After installation, you may need to symlink the JDK so macOS's system Java wrappers can find it:

ln -sfn /opt/homebrew/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk

To verify this installation, both of these commands should return information about your JDK 25 install:

/usr/libexec/java_home -v 25
java --version

Build and Test

./gradlew build

Running for Development

./gradlew bootRun

Running the Packaged JAR

First, produce the jar:

./gradlew bootJar

Then run it from the command line:

java -jar build/libs/branch-0.0.1-SNAPSHOT.jar

Note that in production, we'd publish fixed (non-SNAPSHOT) versions and update the run command to point at the most recent one.

Quick cURL

The server starts on port 8080. You can exercise the service locally with the following cURL command:

curl localhost:8080/profiles/octocat

where the last path parameter (ex. octocat) is the username to look up.

Decision Log

Java Version and Distribution

I wanted a readily available distribution that supports virtual threads. The Spring Boot docs recommend using at least Java 24. Since Java 25 is an LTS release and OpenJDK is free, I figured OpenJDK 25 would be easy to set up and would work well with Spring.

At a previous company, we used Amazon's Corretto distribution without issue and picked it specifically because the base image was produced by our cloud provider. The Lambda cold start times were shorter, and their base Docker image had better integration with ECR, AWS's container registry. To extend this project past a take-home, I'd investigate how different distributions perform with the intended cloud provider.

Virtual Threads and Concurrency

The throughput of this project is limited by waiting for responses from the GitHub API. This I/O-bound setting is a primary use case for virtual threads, which allow the application to handle more connections with the same memory allocation compared to platform threads.

This project configures Spring to use virtual threads for handling concurrent inbound requests. The GitHub API calls themselves are made sequentially per request, following GitHub's suggestion to avoid concurrent requests so we don't burn through the application's quota. This comes at the cost of latency - we have to wait for the /users/{username} request to finish before making the call to /users/{username}/repos. In the interest of time, this project does not use the rate limit response headers returned with each API call; those headers would be useful for implementing backpressure or for telling clients how long to wait before the limit resets.

If the client needed lower latency, virtual threads would provide a simple way to make the two required GitHub API calls concurrently. Because there are fewer system parameters to tune - we don't need to size or manage a request thread pool - the overall maintenance cost stays low. This benefit compounds if we were to add other hosted git providers, like GitLab or BitBucket, since each integration would not require tuning a dedicated thread pool.
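For illustration, here is a minimal sketch of what the concurrent variant could look like with a virtual-thread-per-task executor. GithubClient, GithubUser, GithubRepo, and Profile are placeholder names, not the actual types in this repo:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentLookupSketch {

    // Placeholder types standing in for whatever the real adapter returns.
    record GithubUser(String login, String name) {}
    record GithubRepo(String name, String url) {}
    record Profile(GithubUser user, List<GithubRepo> repos) {}

    // Hypothetical client with one method per GitHub endpoint used by this service.
    interface GithubClient {
        GithubUser getUser(String username);          // GET /users/{username}
        List<GithubRepo> getRepos(String username);   // GET /users/{username}/repos
    }

    Profile lookupProfile(GithubClient client, String username) throws Exception {
        // One cheap virtual thread per task: both GitHub calls are in flight at once.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<GithubUser> user = executor.submit(() -> client.getUser(username));
            Future<List<GithubRepo>> repos = executor.submit(() -> client.getRepos(username));
            // Blocking get() only parks the calling virtual thread, not an OS thread.
            return new Profile(user.get(), repos.get());
        }
    }
}
```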

Connection Pooling

In order to get the most out of the virtual threads, requests to GitHub need to go through a connection pool. Unauthenticated requests are limited to 60 per hour (see GitHub's rate limit docs), so the connection pool for this take-home doesn't need to be large - I went with 10 for this single instance.

In production, or if we were to add a personal access token, which increases the limit to 5000 per hour, we'd want to tune the connection pool based on the expected requests per second and SLAs. With 10 connections and ~500 ms per GitHub request, we'd exhaust the 5000 request/hour quota in about 4 minutes of continuous requests for non-cached usernames.

500 ms/call × 2 calls/username = 1 s/username, i.e. 1 username/s per connection
10 connections => 10 usernames/s
5000 calls/hour ÷ 2 calls/username = 2500 usernames/hour
2500 usernames ÷ 10 usernames/s = 250 s ≈ 4.16 minutes

This is significantly below the one-hour reset period, suggesting we'd need to investigate a GitHub Enterprise account for higher throughput. Capping the connection pool at 10 lets us burst up to ten usernames at once, which seems reasonable for this project. Increasing past 10 would waste resources since the pool would quickly sit idle due to the rate limit.

Connecting this back to the previous section on virtual threads: we can accept more concurrent inbound requests than we can forward to GitHub. Those requests queue up while waiting for a connection from the pool, so GitHub's rate limiting becomes the system's bottleneck.
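As a sketch, here is one way the pooled client could be wired, assuming Apache HttpClient 5 sits underneath Spring's RestClient; the request factory in this project may be configured differently:

```java
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManagerBuilder;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestClient;

class PooledGithubClientSketch {

    RestClient gitHubRestClient() {
        // Cap the pool at 10 connections to match the sizing discussion above.
        PoolingHttpClientConnectionManager pool = PoolingHttpClientConnectionManagerBuilder.create()
                .setMaxConnTotal(10)
                .setMaxConnPerRoute(10) // every request goes to api.github.com, i.e. a single route
                .build();
        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(pool)
                .build();
        return RestClient.builder()
                .baseUrl("https://api.github.com")
                .requestFactory(new HttpComponentsClientHttpRequestFactory(httpClient))
                .build();
    }
}
```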

Spring RestClient

I stuck with standard Spring dependencies, namely RestClient. This made in-memory caching, serialization, and deserialization straightforward to set up.

For systems requiring deployment flexibility (CLI tools, Lambda functions), a standalone HTTP client would avoid the Spring Web dependency. I've used Retrofit + OkHttp for banking integrations without issue.

I opted for RestClient over WebFlux because virtual threads provide sufficient concurrency for this I/O-bound workload. WebFlux requires reactive types throughout your contracts (ex. Mono, Flux) and reactive versions of all of your clients. Reactive clients (ex. WebFlux, R2DBC) bring more operational complexity than throughput benefit for this problem. The juice isn't worth the squeeze.
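As an example of why the setup is straightforward, a typical RestClient call for the /users/{username} endpoint looks roughly like this. GithubUserResponse is an illustrative record, not the repo's actual DTO:

```java
import org.springframework.web.client.RestClient;

class RestClientUsageSketch {

    // Hypothetical DTO; field names follow GitHub's JSON, but this is not the repo's actual class.
    record GithubUserResponse(String login, String name) {}

    GithubUserResponse fetchUser(RestClient restClient, String username) {
        // RestClient handles the HTTP call and Jackson deserialization in one chain.
        return restClient.get()
                .uri("/users/{username}", username)
                .retrieve()
                .body(GithubUserResponse.class);
    }
}
```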

Code Organization

For simplicity, I kept this as a single-module project.

$ tree -d src/main/java/xyz/sanfordtech/branch/
src/main/java/xyz/sanfordtech/branch/
├── common_errors
├── github_adapter
├── profile_api
│   ├── dtos
│   └── errors
├── profile_service
└── web
    └── profile

The profile_api module prevents a cyclic dependency between github_adapter and profile_service if we moved to a multi-module Gradle project. If we eventually added support for other git hosting services, like GitLab and BitBucket, each of those services would need to:

  • implement the ProfileAdapter defined in profile_api
  • add persistence to the profile_service to keep track of supported platforms per username.

A multi-module Gradle project would be over-engineering here, but Gradle's module-level build caching makes that structure appealing for larger projects.
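A rough sketch of what that adapter contract might look like; the real ProfileAdapter in profile_api may use different names and return types:

```java
import java.util.Optional;

// Rough sketch only: the real ProfileAdapter in profile_api may differ.
public interface ProfileAdapter {

    // Placeholder DTO standing in for whatever profile_api actually exposes.
    record Profile(String username, String displayName) {}

    // Which hosting service this adapter integrates, e.g. "github" or "gitlab".
    String platform();

    // Look up a public profile; empty if the username does not exist on that platform.
    Optional<Profile> lookupProfile(String username);
}
```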

Username Validation

I based the username validation logic on the Enterprise Administrator username documentation.

The key requirements are that usernames:

  • contain at least one character
  • are no more than 39 characters and
  • only contain alphanumeric characters and dashes

Performing these checks before making an HTTP call to GitHub helps conserve our meager request limit.
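A sketch of that pre-flight check, implementing exactly the rules listed above; the actual validation code in this repo may differ:

```java
import java.util.regex.Pattern;

final class UsernameValidator {

    // 1 to 39 characters, alphanumeric or dash, per the requirements listed above.
    private static final Pattern GITHUB_USERNAME = Pattern.compile("^[A-Za-z0-9-]{1,39}$");

    static boolean isValid(String username) {
        return username != null && GITHUB_USERNAME.matcher(username).matches();
    }
}
```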

If the core modules (ex. profile_service, github_adapter) were going to be used outside of this Spring application, the validation would also need to be performed in those modules.

Caching

This project uses Caffeine as an in-memory cache to reduce calls to GitHub's rate-limited API. The cache is configured with Spring's @Cacheable and applied to the lookupProfile method in GithubAdapter.java.

I picked it because it was simple to wire into Spring and allowed me to configure a TTL.

Configuration

The cache is configured in application.properties with the following settings:

  • TTL: 12 hours (gh.adapter.cache-ttl-hrs)
  • Maximum Size: 10,000 entries (gh.adapter.cache-size)
  • Statistics: Enabled via recordStats() for monitoring cache performance

These defaults balance freshness with API rate limit conservation. A user's profile is unlikely to change within a 12-hour window, and 10,000 entries should accommodate a reasonable working set for this service.
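As a sketch, the CacheManager wiring for these properties could look like the following; the real GithubAdapterConfig may differ in structure and cache names, and the @Cacheable annotation on lookupProfile would reference the same cache name:

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Sketch of how the CacheManager could be built from the properties above.
@Configuration
@EnableCaching
class CacheConfigSketch {

    @Bean
    CacheManager cacheManager(
            @Value("${gh.adapter.cache-ttl-hrs}") long ttlHours,
            @Value("${gh.adapter.cache-size}") long maxSize) {
        CaffeineCacheManager manager = new CaffeineCacheManager("profiles"); // cache name is illustrative
        manager.setCaffeine(Caffeine.newBuilder()
                .expireAfterWrite(Duration.ofHours(ttlHours))
                .maximumSize(maxSize)
                .recordStats());
        return manager;
    }
}
```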

Limitations

The current implementation has two key limitations:

  1. No Persistence: The cache exists only in memory and is lost on service restart. All cached data must be re-fetched after deployment or crashes.
  2. No Distribution: Each service instance maintains its own cache. Multiple instances cannot share cached data, meaning the same GitHub profile could be fetched multiple times across different instances, wasting rate limit quota.

Production Considerations

For production deployment, especially with multiple service instances, a distributed cache like Redis or Memcached would be recommended. This would:

  • Share cached data across all service instances
  • Persist cache data across service restarts
  • Provide better utilization of GitHub's rate limit quota
  • Support cache invalidation strategies (e.g., webhook-triggered updates when GitHub profiles change)

The abstraction provided by Spring's caching annotations makes swapping to Redis straightforward: the CacheManager bean in GithubAdapterConfig would need to be reconfigured for the Redis client, and the appropriate dependencies added. The bigger lift is operating and maintaining Redis in your environment.
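As a sketch, the Redis-backed replacement could look like the following, assuming spring-boot-starter-data-redis is on the classpath; this is not the project's actual configuration:

```java
import java.time.Duration;

import org.springframework.cache.CacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;

// Sketch of swapping the in-memory cache for Redis while keeping @Cacheable untouched.
@Configuration
class RedisCacheSketch {

    @Bean
    CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration defaults = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofHours(12)); // mirror the in-memory TTL above
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(defaults)
                .build();
    }
}
```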

Observability

This project would benefit from configuring Spring Boot Actuator for monitoring and observability. The cache is already set up to expose its statistics through Actuator's metrics endpoint. Care needs to be taken to ensure sensitive details are not exposed in production.

Production Setup

For production deployments, consider:

  • Enabling only necessary Actuator endpoints and securing them with authentication
  • Integrating with Prometheus + Grafana for metrics visualization
  • Setting up alerts on key metrics:
    • High cache miss rates (may indicate TTL is too short)
    • GitHub API rate limit exhaustion
    • Elevated error rates or latencies
  • Adding distributed tracing (e.g., OpenTelemetry, Zipkin) to track request flows across service boundaries
  • Structured logging with correlation IDs for request tracing
