INNER JOIN ON vs WHERE clause (2009)

72 points | by thunderbong 2 days ago

58 comments

santiagobasulto 2 days ago
Crazy that there are no in-depth answers with some EXPLAINs and profiling. Just mentions to ANSI standards.
EDIT: I realized I completely misread the question. I thought it was asking the difference of using a WHERE clause or putting conditionals in the JOIN (see my reply above for an example). The original SO questions is about an old-school CROSS JOIN `table1,table2` (which is a cartesian product in relational algebra) and a JOIN.
Edit 2: "old school" in the sense of joining tables before JOINs existed. I think it was System R the first engine to introduce efficient JOINs vs cross products.
[-]
- santiagobasulto 2 days ago
  I did a quick test in Postgres using the sample Airlines database.
  Here are the two tested queries:
  Query 1:
```
    SELECT 
        t.passenger_name, 
        t.ticket_no, 
        bp.seat_no
    FROM 
        Flights f
    JOIN 
        Ticket_flights tf ON f.flight_id = tf.flight_id
    JOIN 
        Tickets t ON tf.ticket_no = t.ticket_no
    JOIN 
        Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
    WHERE 
        f.arrival_airport = 'OVB';
```
  Query 2:
```
    SELECT 
        t.passenger_name, 
        t.ticket_no, 
        bp.seat_no
    FROM 
        Flights f
    JOIN 
        Ticket_flights tf ON (f.flight_id = tf.flight_id AND f.arrival_airport = 'OVB')
    JOIN 
        Tickets t ON tf.ticket_no = t.ticket_no
    JOIN 
        Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
```
  Then I ran EXPLAIN for both of them and the query plan is THE same. So there's not a big difference at least in Postgres.
  Here's the GPT conversation: https://i.imgur.com/dIzcfnc.jpeg
  It doesn't let me share it because it contains an image
  [-]
  - shrx 2 days ago
    To me this is way more readable:
```
    SELECT 
        t.passenger_name, 
        t.ticket_no, 
        bp.seat_no
    FROM 
        Flights f, 
        Ticket_flights tf, 
        Tickets t, 
        Boarding_passes bp
    WHERE 
        f.flight_id = tf.flight_id 
        AND f.arrival_airport = 'OVB'
        AND tf.ticket_no = t.ticket_no
        AND t.ticket_no = bp.ticket_no
        AND tf.flight_id = bp.flight_id;
```
    [-]
    - wild_egg 2 days ago
      No love for JOIN ... USING in this thread eh
      SELECT t.passenger_name, t.ticket_no, bp.seat_no FROM Flights f JOIN Ticket_flights tf USING (flight_id), JOIN Tickets t USING (ticket_no), JOIN Boarding_pass bp USING (ticket_no, flight_id) WHERE f.arrival_airport = 'OVB';
      [-]
      - goodlinks 2 days ago
        Never seen this before, always thought it would be tasty sugar though.. thanks for making me aware of it!
      - yen223 2 days ago
        `using` works really well, but only when the two column names are the same.
        That's why it's not a bad idea to include the table name in the id column name: `flight.flight_id` instead of `flight.id`.
      - santiagobasulto 2 days ago
        For me, idk why, it feels too "ORACLE-ly". I stopped using Oracle after administering an 9i until ~2010 and I never want to go back
        But yes, `USING` is convenient and pleasant to the eyes.
      - Suppafly 2 days ago
        I've never seen USING before, is that available in mssql or just the various open source ones?
        [-]
        wild_egg a day ago
        I haven't used mssql since ~2013 but I think you're right that it's not available there.
        Works great in MySQL, PostgreSQL, and SQLite though.
        yen223 2 days ago
        It doesn't look like mssql (aka sql server) supports USING.
      - dhc02 2 days ago
        I like this and haven't used it before. Thanks for sharing.
    - pophenat 2 days ago
      To me placing the join predicates immediately after the tables is more readable as I don’t have to switch between looking at the from and where clauses to figure out the columns on which the tables are joined.
      [-]
      - buttercraft 2 days ago
        Yep, nothing is harder to read than joins scattered in random order throughout the where clause.
        Additionally, putting joins in the where clause breaks the separation of concerns:
        FROM: specify tables and their relationships
        WHERE: filter rows
        SELECT: filter columns
    - Suppafly 2 days ago
      I guess as long as you're giving it some criteria to join on, I had a coworker do these sorts of joins but never specified any real criteria for joining and the queries were always a mess and returned tons of extra junk. Personally I prefer to explicitly do the joins first.
    - mmcdermott 2 days ago
      I've usually found that this breaks down when there are a lot of filtering conditions besides the join condition, and multiple columns used in the joins. The WHERE clause gets long and jumbled and it is much easier to separate join conditions from filtering conditions.
  - notachatbot123 2 days ago
    > Then I ran EXPLAIN for both of them and the query plan is THE same.
    *according to a LLM. Did you verify this?
    [-]
    - sgarland 2 days ago
      My god, this is where we’re at? Asking an LLM to hallucinate a schema and the resultant EXPLAIN plans for given queries?
      Postgres is incredibly easy to spin up in a container to test things like this, and the mentioned schema (Postgres Air) is also freely available.
      [-]
      - 2 days ago
        [deleted]
    - santiagobasulto 2 days ago
      Guys, I ran an EXPLAIN in a dockerized postgres server. You can see it in the screenshot I shared. Why did you assume I was just trusting the LLM? Jeez, Hn.
      [-]
      - notachatbot123 a day ago
        I mean the actual EXPLAINs. Were they actually exactly 100% the same?
    - jgalt212 2 days ago
      seriously. I'd never ask an LLM a question I had no idea what the answer was.
    - 0cf8612b2e1e 2 days ago
      Glad you pointed this out. I was assuming author ran an actual explain. Not that the LLM made a guess.
  - serpix 2 days ago
    Both examples are (to my delight) using aliased table names for all columns which is already a major step up in readability.
    [-]
    - Izkata 2 days ago
      I tend to find table aliases a step down in readability, and only use them as necessary, because now your eyes have to jump up and down to see where the columns come from.
      [-]
      - ausp 2 days ago
        The aliases don't force you to follow each one through; eyes can do no jumping if you like in either case.
        But if you can't infer from the column name which table they will come from, I find having the option to check far more preferable to that of having no way of knowing.
        [-]
        rbanffy 2 days ago
        I think the aliases in the example are very intuitive - it's easy to correctly guess where they come from.
      - Suppafly 2 days ago
        >I tend to find table aliases a step down in readability
        I suppose it depends on your database, one of the ones I work with all the time has crazy long table and view names and aliases make resulting SQL more readable.
  - kragen 2 days ago
    That's a really impressive GPT conversation; I appreciate you sharing it!
goshx 2 days ago
There used to be a difference, if I am correct. Many years ago, we had to rewrite a lot of implicit joins before we could upgrade MySQL because they changed the precedence of tables in implicit joins between major versions. I couldn't easily find this reference, but it was well over 10 years ago.
EDIT: 10 years ago... lol it was over 20! http://download.nust.na/pub6/mysql/doc/refman/5.0/en/join.ht... search for 5.0.12 on this page.
inanutshellus 2 days ago
... and once you've switched to INNER JOIN, the next question is "when is it appropriate to add to the JOIN clause vs WHERE clause?"
as in
```
    join orders returns on returns.original_order_id = orders.order_id
    WHERE       <-------- Should order type be in the WHERE clause?
        returns.order_type = 'RETURN'
```
vs
```
    join orders returns on returns.original_order_id = orders.order_id
    AND       <-------- Should order type be in the JOIN clause?
        returns.order_type = 'RETURN'
```
One argument was that JOINs are specifically for tickling indexes and only for structural joins between tables, not for describing the table itself, so it should go in the WHERE clause. ("ALL context should be in the where clause.")
One argument was that sometimes a definition is limited by context, so strategically it makes sense to add to the JOIN clause. ("the `returns` alias MUST be limited to return orders, so it should be in the JOIN"; you'd have to adjust your indexes to account for the "order_type".)
[-]
- 0cf8612b2e1e 2 days ago
  Reading the title, I thought this was the actual question at hand.
  I try to keep JOIN logic to only on the keys. I like to think this makes reading the query more linear: now I have all of these tables, what data gets retained/removed?
  [-]
  - inanutshellus 2 days ago
    I see it as WHERE clause joining is the "can't mess it up" way to do it, and JOIN clause joining is "better in some situations".
    Imagine you're using SQL fragments. Using JOIN clauses make your code extremely clean:
```
    <sql id="getRecentReturns">
       <fragment id="ReturnOrderFragment">
       where returns.order_date between sysdate - 30 and sysdate  <------- Nice, clean code. No chance bugs from forgetting to add `returns.order_type = "Return"` or `employee.customer_type = "Employee"` or whatever.
    </sql>
```
    That said - if the guy making the table didn't make an index that includes your attribute (order_type and customer_type above), don't use it."
shrx 2 days ago
I disagree with the apparently more popular notion that INNER JOIN is more readable. Sure, it's more verbose, but that doesn't make it more readable.
[-]
- stravant 2 days ago
  The join version is also certainly less readable for anyone who doesn't normally work with databases and is just dipping into the database handling code for some reason.
  [-]
  - zzzeek 2 days ago
    I dont usually buy the argument "make this code no longer state the intent that would normally be understood by people who know the programming language, so that it's more readable for people who don't know the programming language"
- ndepoel 2 days ago
  It's all about clearly stating your intent. With INNER JOIN you're literally saying "I want to join these two tables together on this particular relation and work on the result", while with the more basic WHERE form you're saying "just lump these two tables together and then we'll filter out the rows that we actually want to see". The join becomes more of a happy side-effect with that, rather than the thing you clearly want to do.
  Not only does writing your code in such a way that it states your intent make it easier to read for other humans, it also makes it easier for compilers/query planners to understand what you're trying to do and turn it into a more efficient process at run-time. Now query planners are usually pretty good at distilling joins from WHERE clauses, but that form does also make it easier for mistakes to creep in that can murder your query performance in subtle and hard-to-debug ways.
  [-]
  - jmull 2 days ago
    > it also makes it easier for compilers/query planners to understand what you're trying to do
    Hopefully that's not true. SQL's a declarative language, where you describe what you want and the system figures out how to do it. If you describe exactly the same thing using two different syntaxes, the system shouldn't really do anything different. That just makes the programmer's job harder.
    [-]
    - sgarland 2 days ago
      Ideally, but not always true. For various versions of MySQL and Postgres, the planner will infer when a semi/anti-join can replace a WHERE [NOT] IN clause, but not always. IIRC there are still edge cases where neither one picks up on it, and so it’s always safer (and much more clear to the reader) to explicitly use WHERE [NOT] EXISTS if that’s what you want.
  - renhanxue 2 days ago
    Also, using the ON clause is consistent with the syntax for outer joins, where you have to use it (because there's a very important logical difference between putting the condition in the ON clause vs in the WHERE clause).
- qsort 2 days ago
  I think it's better at specifying intent, similarly to how you would use "for" and "while" in a programming language even though they are literally the same thing and more often than not they compile to, respectively, identical query plans and identical asm/bytecode.
  Also if you work a lot with databases you often need to do outer joins, a full cartesian product is almost never what you want. The join syntax is more practical if you need to change what type of join you are performing, especially in big queries.
- yen223 2 days ago
  It's been a very long time since I've seen a query that uses the `table1, table2` cross-join syntax.
  I'd be curious to know how many SQL people nowadays know what that does.
  [-]
  - sebastiansm 2 days ago
    I learned that syntax 5 years ago from my boss. I still prefer to use CROSS JOIN just to make it clear to other noobs like me what I'm expecting with the query.
- forinti 2 days ago
  I think it matters specially with big queries. When you have dozens of tables, it really helps to separate the joins from the where clause.
  [-]
  - chrisandchris 2 days ago
    For myself, this is even with a couple of tables valid. I think of the query as "looking up references in a book". Therefore I write global filters in the WHERE and stuff specific to a table in the ON condition.
- chasil 2 days ago
  I find it to be especially more readable when I can use JOIN ... USING().
  This is assuming that your SQL variant supports it.
robertclaus 2 days ago
For how much the standards were referenced, nobody seemed to actually explain the history of the two syntaxes or the reasoning that led to the introduction of a dedicated join statement. If anyone has an article that summarizes the actual history (or even the RFC for introducing the JOIN syntax), I would love to read it.
yyx 2 days ago
Wonder what's the difference in PRQL...
iblaine 2 days ago
Is there a difference between implicit and explicit joins? No, but use an explain plan to be sure.
(Saved you a click)
icsrutil 2 days ago
From my understanding, those two are different. the result maybe the same, but the the dataset is different after the FROM is executed.
> ON table1.foreignkey = table2.primarykey
The calculated dataset is the result, after the ON clause.
[-]
- paol 2 days ago
  No, they are the same according to SQL semantics.
  In other words it's not just that this particular case returns the same result, but that in all cases the result must be the same.
- skeeter2020 2 days ago
  I've only looked at the execution plan in SQL server, but they're the same. Because of the way they're set up you'll get an index seek and an index scan when using the ANSI syntax or the older style. The WHERE clause join does not apply a bunch of filtering on an intermediate state.
  [-]
  - sumtechguy 2 days ago
    They are effectively the same but only because the set theory algebra lets the optimizer move things around.
    The first one says 'put these two tables together then apply the where conditions that link them together and other stuff'. The second one says 'put these together but pay attention to this on condition when doing so and some other stuff'.
    Now the set theory algebra lets the optimizers pick out parts of the where condition and move them up to the matching part. You see that in your query plan. The optimizer will also add other coditions such as what you see with the index pick and the type of matching/filtering it is using.
    I personally have used/abbused the on condition to make things more clear things that really only apply to that table and matching.
    Now he does ask is it the same with MySQL. Probably but someone would need to look at the query plan it spits out. But my guess is yes.
    The question is sort of like asking is x + y = z the same as y + x = z. Well yeah sorta mostly. You can get weird cases where doing the order of operations differently gets you different speed profile results though. But the output is the same. But that is the fun of playing with optimizers and your knowing your platform characteristics.
Ingaz 2 days ago
Discussion about differences SQL-89 vs SQL-91?
In 2024.
Really?
a2800276 2 days ago
[flagged]
[-]
- red_admiral 2 days ago
  With a good enough query planner and optimizer, yes. I'm not sure if that was always the case. I can imagine historically if you were joining on fields that were only indexed in one table and not the other - I'm not saying this is an intelligent thing to do, but it sometimes happens - then controlling the order of the join yourself if the database didn't optimize this for you would be important.
  [-]
  - skeeter2020 2 days ago
    You're setting up a scenario where you'd be in trouble regardless though; controlling the join order would be a local optimization and query would still probably have a table scan, while appropriately indexing would solve any query planner mistakes and give you a more efficient execution in both styles.
- coretx 2 days ago
  But we still don't know if it's more expensive or not... I'm hitting F5 on this page hoping to find out.
  [-]
  - gsck 2 days ago
    Run explain and find out for yourself. Its highly dependant on your DB and its execution planner