Update : I made a follow-up post containing a Nginx-based solution to cache map tiles from OSM and limit the amount of PII you send

While monitoring the logs in Rethink DNS (awesome app BTW) today, I noticed the Immich app making requests to api-l.cofractal.com.

After reaching out on Immich’s discord, the devs explained to me that it is used as a tile provider for the map feature. I can confirm it is not realistic to self-host a tile provider without heavily tuning down the level of details on the map (which would still require a lot of disk space and CPU time). I understand the need for a third-party service to provide the map tiles, but I’m concerned by this one.

Visiting cofractal.com only tells us that they’re selling APIs. I did not find any details about the company, not even the country they’re registered in. The website is also missing informations about what they are logging or not. Everything else seems gated behind a login page, but they “are not currently accepting new customers”. The whois for the domain says they’re in California. Digging a bit more, I find AS26073 which apparently is the same company.

This bothers me, because Cofractal gets sent every location you viewed (and the zoom level) on Immich’s map, along with your client’s IP address and a “Referrer” header pointing to your Immich instance. This sounds like a lot of PII to me. It’s also behind cloudflare which gets to see the same stuff.

When asked about it, one dev (thanks to them for almost instantly replying to every concern/question I threw at them) explained that they personally know the people behind Cofractal. According to this Immich dev, Cofractal provides free access to its paid service to Immich’s user base as a way to support the project, with the side benefit of load testing their platform.

This explanations seems plausible and reasonable to me. However, I do not personally know the people behind Cofractal, and by default, I do not trust for-profit companies to act in an altruistic way. Here’s a summary of everything that makes me uneasy about this company :

  • it does not say anything about the kind of data they are logging or not
  • it requires digging through whois records to find the most basic info about the company
  • it freely provides access to its normally paid service (for the whole Immich user base), but it does not communicate about it (or it is really hard to find)
  • it does not communicate about anything : searching for its name only returns its home page and websites with informations on Autonomous Systems
  • it is “not currently accepting new [paying] customers” while providing the service for free to a quite large user base (Immich v1.109.2 got 170k downloads in 5 days, v1.108.0 got 438k downloads in 13 days )
  • It is not mentioned anywhere in the whole immich.app website (searching for site:immich.app "cofractal" gave me no result). Not even a “Thank You” or “Sponsor” note on the homepage for the free API
  • (it is behind cloudflare)

The dev I talked to encouraged me to create a feature request, and seemed favorable to adding a switch for disabling maps client side. It is already possible to disable it server-wide, and the “URL to a style.json map theme” option seems to provide a way to customize the tile provider. Which leads to this post : I’m trying to collect feedback on this before creating the feature request.

  • It should be made prominently clear to server admins that leaving maps enabled causes clients to send requests to a third party-server and give details about what is sent (viewed locations, zoom level, IP address, Immich instance URL). The Post Install Steps in the docs and a paragraph above the switch on the config page seem like good places to me. Are there other/more appropriate place for such a warning ?
  • The “URL to a style.json map theme” option should probably be renamed to make it clearer that it allows changing tile providers. Or better yet, it could be reworked to make it easier to choose which third-party you decide to trust
  • What do you think about the idea of providing instance admins with a list of choices for tile providers ? Maybe with a short pros/cons list in the docs for each provider. I’d be fine with using a more reputable provider with the extra step of configuring my own API key (which would probably require proxying requests to the tile provider to not share the API key with all clients)
  • Should the Immich server proxy requests to the tile provider in any case ? Since the tile provider has access to the Referrer and Origin headers (which is probably required for CORS), they are currently able to link user IP addresses with Immich instances. Proxying requests with the Immich server should prevent that.
  • I would go as far as making maps disabled by default for new installs. I understand that “disabling by default would be a significant downgrade for a majority of users”, but I feel like there’s a strong overlap between the self-hosting and privacy communities. So we should at least have some debate about it

I’ve also been told that I’m the first one to raise concerns about this, which leads to one more question : Did nobody complain because nobody noticed ? Or are my concerns unjustified ?

  • sorter_plainview@lemmy.today
    link
    fedilink
    English
    arrow-up
    17
    ·
    edit-2
    4 months ago

    I read through your comments and the reply from devs regarding OSM. I will add a few points that can be part of the feature request. I have some experience dealing with maps, and my understanding is you can set up an offline version of OSM, which will get updated only when required.

    leaflet.offline is a library which provides a similar functionality. I think with some modifications this can be implemented to significantly reduce the load on OSM that using it directly.

    Even with a very large zoom level say 11 to 15, a large area of maps takes like a few hundred MBs. We once cached the entire region of California with all the details and it was around 240 MB IIRC. But Immich does not need this much details and it is possible to restrict zoom levels to certain details.

    For someone self hosting several hundreds of GBs of photos, this should be doable without using too much storage. I think the problem will be that this is a huge engineering effort. Depending on the priority of the feature it may not be easy to do this.

    There is a site called Switch2OSM which details almost everything you need to know. The previous link is on how to serve map tiles on your own. Again it is a daunting task and not suitable for everyone.

    If anyone needs a live update of OSM as things get added, look into the commercial offerings.

    In conclusion, it is possible to include a highly optimised version of OSM, instead of putting their servers under heavy load. The catch is, it is not easy and will need a huge engineering effort. I think developers should take a call on this.

    • pcouy@lemmy.pierre-couy.frOP
      link
      fedilink
      English
      arrow-up
      4
      ·
      4 months ago

      Thanks for sharing your experience and for the links.

      Do you think it would be doable to make/host a tileserver that only generates the first few zoom levels for the whole planet by default, and is able to generate tiles for more detailed zoom levels only for specific locations ? I’m thinking of a feature where Immich asks the tile server to generate the appropriate tiles based on the locations of photos. Since we only ever zoom on locations where photos have been taken, and we often take several photos at the same locations, could this decrease the requirements enough for self-hosting ?

      • sorter_plainview@lemmy.today
        link
        fedilink
        English
        arrow-up
        2
        ·
        4 months ago

        If you are asking about vector maps, I am not really sure, because I have no experience with it. So can’t really comment on that. On raster maps, as you already know every tile is a PNG. The behaviour you described is very similar to the client side caching that usually happens in the browser. Depending on the coordinates in the viewport and zoom level the server provides the tiles.

        Usually to save the map most offline map making tools will ask you to draw a rectangle and select the required zoom levels. In an interactive map, the rectangle is the viewport of the device. So there can be a feature which will download and store the tiles around a specific gps location for a fixed geographical area. That should be doable without much issue. But in this case that may not be a good idea.

        If you visualise all zoom levels stacked over each other, the images need to be retrieved when the user zooms into a point the geographical area will not stay the same. Smaller geographical area is only needed with higher zoom levels. If we only take all the tiles that get downloaded in every layer, it may produce a shape similar to an inverted pyramid. So saving the images as a user zooms in for the first time, may be the best idea.

        Then the saved tiles need to be used again when users zoom in the same area. Also these tiles need not be updated frequently and maybe even once in every 3 months might be enough, that too only when the user zooms in again in that area.

        This can be a little tricky as almost all the tools that create offline maps do it for a fixed area and selected zoom levels, every point in that area gets equal priority. But in this case the point is the important element. The area nearby may not be relevant at all. So that is the part that needs some exploration.

        • pcouy@lemmy.pierre-couy.frOP
          link
          fedilink
          English
          arrow-up
          2
          ·
          edit-2
          4 months ago

          I’ll try clarifying what I had in mind :

          I tried running maptiler to generate tiles from OSM’s data, which required an insane amount of time and resources (not doable for most self-hosters including myself, even for a single country) to process the data and store the results. I was wondering if there would be a way to ask maptiler (or another equivalent tool) to only generate tiles that contain points from a given set (in this case, photos) and maybe the tiles adjacent to them. What about doing this for every zoom level ? This would require generating at most zoom_levels * n_photos (* 9) if we include adjacent) tiles, and a lot less for the typical person taking several photos at the same place.

          • sorter_plainview@lemmy.today
            link
            fedilink
            English
            arrow-up
            2
            ·
            edit-2
            4 months ago

            Hey I understood what you meant. The result that you are trying to achieve is very close to the browser caching normally present is what I meant. When you zoom in it will only load that area. And I don’t think you can specify the number of tiles to be a specific number, since the zoom levels are not linear.

            The offline leaflet I shared in the previous comment actually does the same thing you want to achieve. The difference is the offline mode is discarded immediately when the system is back online. So that library could be modified to incorporate the time dependency and users visiting a point again I specified in the last comment, at least in theory.

            Regarding OSM data, there are zip files available for downloading. Geofabrik and openstreetmap.fr are examples. Another tool is Protomaps, where you can download by drawing a polygon. But these are not going to be the ideal solution for a product like Immich.

            By the way I saw your update. Great job on following up and providing a fix for others. I really really appreciate it.