Designing Location-Intelligent Property Search with PostGIS


January 4, 2026

13 min read

case-study · postgis · spatial-data · architecture · database · real-estate

Geobucket Challenge


A case study in database-first spatial design

TL;DR

Property search fails when location names vary but coordinates don't. This case study explores how treating location as geography rather than text solved inconsistent search results in Nigeria's real estate market. The solution uses PostGIS spatial indexing to group properties by physical proximity, ensuring "Sangotedo," "Sangotedo, Ajah," and "sangotedo lagos" all return the same relevant results. The approach is database-first, framework-agnostic, and scales without operational overhead.

Context & Problem

Real Estate Search in Nigeria

Property search in Lagos presents a unique challenge: location names are inconsistent. Not because of poor data entry, but because of how places are actually named and referenced. "Sangotedo" might appear as "Sangotedo, Ajah" in one listing and "sangotedo lagos" in another. These aren't typos. They're legitimate variations reflecting how people actually refer to places.

The inconsistency comes from several factors: informal naming conventions, historical place names that have evolved, and the way neighborhoods are described in conversation. A property agent might list something as "Sangotedo" while another uses "Sangotedo, Ajah" to provide more context. Both are correct. Both refer to the same physical area.

Traditional text-based search treats these as completely different locations. Search for "Sangotedo" and you get one set of results. Search for "Sangotedo, Ajah" and you get a different set, even though the properties might be meters apart. This fragmentation creates real problems.

After researching existing property platforms, I found this issue persists across many websites. The same location name variations that fragment results in one system appear in others. This isn't a problem unique to a single platform. It's a systemic issue with how location search is typically implemented.

User Trust + Discovery Issues

When users search for properties, they expect to see everything relevant to their query. Missing properties because of naming variations erodes trust. Users start questioning whether the platform is comprehensive, whether they're searching correctly, or whether they need to try multiple search terms to find what they're looking for.

For property agents, this creates an uneven playing field. Properties listed with more common or standardized names get more visibility. Properties with less conventional naming get buried. This isn't just about fairness. It's about ensuring all legitimate listings get discovered.

The business impact is straightforward: reduced conversion rates, frustrated users, and agents who feel the system doesn't work for them. This isn't a technical curiosity. It's a real business problem that affects revenue and user satisfaction.

Constraints

Existing Stack (Django)

The solution needed to work within an existing Django infrastructure. This wasn't a greenfield project where we could choose the perfect stack. We had to integrate with what was already there, consider team familiarity, and ensure the solution could be maintained by developers who might not have deep PostGIS experience.

This constraint actually helped. It forced us to think about the solution in terms of what the database could do, rather than what we could build in application code. The framework became less important than the data layer.

No Real-Time Requirement

The workload is read-heavy. Users search for properties far more often than new properties are added. Write operations are infrequent enough that we didn't need to optimize for real-time updates. Search results could be slightly stale without impacting user experience.

This constraint simplified the architecture. No need for complex caching invalidation strategies, no need for real-time synchronization, no need for event-driven updates. The system could be simpler.

Scale Considerations

We needed to handle expected property volumes efficiently. Query performance mattered more than write performance. The database needed to handle spatial queries quickly, even as the property catalog grew. This pointed toward database-level optimization rather than application-level tricks.

Solution Overview

Treat Location as Geography, Not Text

The core insight is simple: coordinates are truth, names are labels. A property at latitude 6.4698, longitude 3.6285 is physically located there regardless of whether someone calls it "Sangotedo," "Sangotedo, Ajah," or "sangotedo lagos." The location doesn't change based on how it's described.

This shifts the problem from text matching to spatial relationships. Instead of asking "does this location name match the search query," we ask "are these properties physically close to each other?" Location names become secondary metadata, useful for display and user understanding, but not the primary mechanism for grouping results.

PostGIS Spatial Indexing

PostGIS provides the foundation. It's not just about storing coordinates. It's about having spatial data types, spatial functions, and spatial indexes that understand geographic relationships. A GEOGRAPHY column with a GIST index enables proximity queries that are both fast and accurate.
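As an illustrative sketch (the table and column names are assumptions, not taken from the repository), the schema can be as small as this, with the spatial index added separately:

```sql
-- Enable spatial types and functions once per database.
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE properties (
    id            BIGSERIAL PRIMARY KEY,
    title         TEXT NOT NULL,
    location_name TEXT NOT NULL,                  -- free-text label, e.g. 'Sangotedo, Ajah'
    location      GEOGRAPHY(Point, 4326) NOT NULL -- coordinates are the source of truth
);
```

The GEOGRAPHY type (rather than GEOMETRY) is what makes distance arguments mean meters on the spheroid instead of planar units.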

The database can answer questions like "find all properties within 500 meters of this point" efficiently. It handles the Earth's curvature, accounts for coordinate systems, and uses spatial indexes to make these queries fast. This isn't something you want to implement in application code.

The exact SQL implementation is available in the repository. The key idea is using PostGIS functions like ST_DWithin for proximity checks and ST_Point for coordinate handling, with GIST indexes making these queries fast even at scale.

Properties within approximately 500 meters of each other are treated as being in the same area. This distance is arbitrary but practical. It's close enough that properties are genuinely in the same neighborhood, but far enough that we're not creating too many micro-buckets.
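A minimal sketch of such a proximity query, assuming the hypothetical properties table above with a GEOGRAPHY location column (the coordinates are the Sangotedo example from earlier):

```sql
-- Find every property within 500 m of a point. Note that
-- ST_MakePoint takes longitude first, then latitude.
SELECT id, title, location_name
FROM properties
WHERE ST_DWithin(
    location,
    ST_SetSRID(ST_MakePoint(3.6285, 6.4698), 4326)::geography,
    500  -- meters, because both arguments are GEOGRAPHY
);
```

Because ST_DWithin is index-aware, the GIST index filters candidates before any exact spheroidal distance is computed.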

When a user searches for "Sangotedo," the system finds properties whose location names are similar to the search term, then returns all properties that are physically close to those matches. The location name matching is a starting point, but the spatial relationship is what determines the final results.

In practice, the system uses spatial proximity as the primary grouping mechanism, with lightweight text similarity only as a discovery aid. The spatial relationship determines inclusion. Text similarity helps identify the initial anchor point, but once properties are grouped by physical distance, they're returned together regardless of naming variations. This hybrid approach ensures both discovery (finding the right area) and consistency (returning all relevant properties in that area).
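One way to sketch this hybrid flow in SQL, assuming the pg_trgm extension for the fuzzy-matching step (the 0.3 similarity threshold, the 500 m radius, and all names are illustrative, not the repository's actual implementation):

```sql
-- Assumes: CREATE EXTENSION pg_trgm; and a properties table with a
-- GEOGRAPHY(Point, 4326) location column.
WITH anchor AS (
    -- Step 1 (discovery): best fuzzy match for the raw search term.
    SELECT location
    FROM properties
    WHERE similarity(lower(location_name), 'sangotedo') > 0.3
    ORDER BY similarity(lower(location_name), 'sangotedo') DESC
    LIMIT 1
)
-- Step 2 (consistency): every property near the anchor, however named.
SELECT p.id, p.title, p.location_name
FROM properties p, anchor a
WHERE ST_DWithin(p.location, a.location, 500);
```

Text similarity only picks the anchor; the ST_DWithin predicate decides membership, which is why "Sangotedo" and "sangotedo lagos" converge on the same result set.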

Thin API Layer

The database does the heavy lifting. The API layer orchestrates requests, handles validation, and formats responses, but it doesn't compute spatial relationships or perform complex matching. The framework choice becomes less important because the core logic lives in database functions.

The search logic is encapsulated in a database function that handles normalization, fuzzy matching, and spatial queries. The API just calls this function and formats the response. The framework doesn't matter because the intelligence is in the database.

This approach has several benefits: the logic is centralized, performance is optimized at the database level, and the solution works regardless of whether the API is built with Django, Fastify, or anything else. The problem is spatial, not framework-specific.
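In that shape, the API's entire data-access layer collapses to one parameterized statement (search_properties is a hypothetical name for the database function described here, not a confirmed identifier from the repository):

```sql
-- $1 = the raw user query, e.g. 'sangotedo lagos'. Normalization,
-- fuzzy matching, and the spatial query all live inside the function.
SELECT id, title, location_name
FROM search_properties($1);
```

Swapping Django for Fastify changes only how this statement is issued, not what it does.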

Key Technical Decisions

Why PostGIS

PostGIS is built-in, battle-tested, and designed specifically for this kind of problem. It's not a library you add on top of PostgreSQL. It's an extension that adds spatial capabilities at the database level. This means spatial operations are first-class citizens, not afterthoughts.

The alternative would be implementing spatial calculations in application code or using external services. Both add complexity, latency, and points of failure. PostGIS keeps everything in the database where it belongs. Distance calculations account for the Earth's curvature. Spatial indexes (GIST) provide O(log n) query performance. It's the industry standard for a reason.

Why Spatial Indexes

Without proper indexing, spatial queries become expensive. Checking whether a point is within 500 meters of another point requires distance calculations, and doing that for every property in the database doesn't scale. GIST indexes make these queries fast by organizing spatial data in a way that proximity searches can use efficiently.

A single GIST index on the location column enables O(log n) proximity queries. The database can use the index to quickly identify candidate properties before doing the expensive distance calculations. Without it, every query would scan the entire table. For a read-heavy workload, this is critical.
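The index itself is one statement (names are illustrative, matching the sketch schema):

```sql
-- One GIST index lets the planner prune candidates spatially before
-- computing exact spheroidal distances for the survivors.
CREATE INDEX idx_properties_location
    ON properties USING GIST (location);
```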

Why No Geo-Bucketing Initially

The assessment mentioned geo-bucketing, but we didn't implement it in the initial solution. Direct spatial queries with proper indexing were sufficient for the problem at hand. Bucketing adds a layer of complexity. You need to manage bucket creation, handle edge cases where properties are near bucket boundaries, and maintain the bucket structure.

Starting with direct spatial queries keeps the solution simple. If scale requires it later, bucketing can be added as an optimization layer. But premature optimization is still premature, even when it sounds sophisticated. The simpler solution is easier to understand, easier to maintain, and easier to debug.

Why No Real-Time

The read-heavy workload doesn't need real-time updates. When a new property is added, it's fine if search results reflect it within seconds rather than milliseconds. Users aren't waiting for live updates. They're performing searches that can work with slightly stale data.

This constraint simplifies the architecture significantly: no cache-invalidation logic, no real-time synchronization, no event-driven plumbing. The system can be simpler, and simpler systems are easier to reason about and maintain.

Why Framework Choice is Secondary

The problem is spatial, not framework-specific. PostGIS works with Django, Fastify, Rails, or any framework that can talk to PostgreSQL. The database functions handle the spatial logic, and the API layer just needs to call them and format responses.

This is where technical maturity shows: recognizing that the framework is just transport. The real intelligence is in how we structure the data and the queries. Focusing on framework choice would miss the point entirely.

Business Impact

Consistent Search Results

Users searching for "Sangotedo," "Sangotedo, Ajah," or "sangotedo lagos" now get the same set of results. The system understands that these are different ways of referring to the same area, and it returns all relevant properties regardless of how they're named. This builds trust. Users can search naturally without worrying about using the "right" search term.

Increased User Confidence

Predictable search behavior means users don't second-guess themselves. They don't wonder if they searched wrong, if they need to try different terms, or if the platform is incomplete. The search works the way they expect it to work, which improves the overall experience.

Fair Agent Exposure

Properties get discovered based on their physical location, not their naming convention. An agent who lists a property as "sangotedo lagos" gets the same visibility as one who lists it as "Sangotedo, Ajah" if they're in the same area. This creates a more equitable system and reduces the incentive to game naming conventions.

Scales Without Operational Cleanup

The database handles the complexity. There's no manual location normalization needed, no ongoing maintenance of name mappings, no operational overhead. The system self-organizes based on spatial relationships. As new properties are added, they automatically get grouped with nearby properties, regardless of how they're named.

This is the differentiator: a technical solution that creates real business value. It's not just about implementing PostGIS correctly. It's about understanding how the solution serves users, agents, and the business. The technical work enables better outcomes for everyone.

While this case focuses on real estate, the same principles apply to any system where humans describe locations inconsistently but expect consistent results. Ride-hailing apps need to match "Airport Road" with "Murtala Mohammed International Airport." Delivery services must understand that "Downtown" and "City Center" might be the same area. Logistics platforms face similar challenges with warehouse and pickup locations. The core insight—treat location as geography, not text—transcends the specific domain.

Trade-offs & Future Work

When H3 Would Be Added

H3 (hexagonal hierarchical spatial index) is a more sophisticated approach to spatial indexing. It provides uniform cell sizes, hierarchical relationships, and efficient spatial queries. We didn't need it for the initial solution, but it would be valuable at larger scales.

The trade-off is complexity. H3 adds another layer to understand and maintain. It's worth it when you have millions of properties and need to optimize query performance further, but it's overkill when direct spatial queries with GIST indexes are sufficient.

Caching Opportunities

Popular searches could be cached. Location name to bucket mappings could be cached. But caching adds complexity. You need invalidation strategies, you need to decide what's worth caching, and you need to handle cache misses gracefully.

For the initial solution, the database queries were fast enough that caching wasn't necessary. As traffic grows, caching becomes more valuable, but it's a future optimization, not a current requirement.

Canonical Locations

Standardizing location names would improve the user experience. Instead of showing "Sangotedo," "Sangotedo, Ajah," and "sangotedo lagos" as separate options, we could normalize to a canonical name like "Sangotedo, Lagos" for display purposes.

This is a nice-to-have, not a need-to-have. The spatial search works without it. Canonical names would make the UI cleaner and help users understand that these are the same area, but they're not required for the core functionality.
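If it were added, one hypothetical shape is a small lookup table resolved at display time (table, columns, and the KNN approach are all assumptions for illustration):

```sql
CREATE TABLE canonical_locations (
    id     BIGSERIAL PRIMARY KEY,
    name   TEXT NOT NULL,                     -- e.g. 'Sangotedo, Lagos'
    center GEOGRAPHY(Point, 4326) NOT NULL    -- representative point
);

-- Label each property with its nearest canonical name for display only;
-- search still runs on raw coordinates.
SELECT p.id, c.name AS display_location
FROM properties p
CROSS JOIN LATERAL (
    SELECT name
    FROM canonical_locations
    ORDER BY center <-> p.location            -- KNN distance operator
    LIMIT 1
) c;
```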

Map UX Improvements

Visual representation of the spatial buckets would help users understand how the system works. Interactive maps showing property clusters, bucket boundaries, and search results would improve the user experience.

These are user-facing improvements that come after the core functionality is solid. They make the system more understandable and engaging, but they're enhancements, not requirements.

The key is thinking ahead without over-engineering. Understanding what might be needed later helps inform current decisions, but it doesn't mean building everything now. Mature technical planning balances current needs with future possibilities.

Validation & Outcome

Technical Validation

The approach was validated through a technical assessment and follow-up discussions. The test case of three properties with different location names in the same area all being returned by a single search worked as expected. The technical approach was sound, the implementation was clean, and the results demonstrated that the problem was solved.

Strong Alignment Discussion

The technical decisions resonated with the stakeholders. The focus on spatial relationships over text matching, the database-first approach, and the emphasis on simplicity over premature optimization all aligned with how they thought about the problem. The discussion confirmed that we understood both the technical challenge and the business context.

Confirmed Production Viability

The solution is production-ready. It's scalable, maintainable, and performs well. The architecture is straightforward enough that other developers can understand and extend it. The database functions are well-tested, the indexes are properly configured, and the API layer is thin and focused.

This closes the loop: problem identified, solution designed with constraints in mind, implementation focused on what matters, business impact understood, and validation that the approach works. It's not just about writing code. It's about solving a real problem in a way that serves users, agents, and the business.

The implementation is available on GitHub for reference.
