11/21/2023 0 Comments Athena json queryGlue Crawlers can, for example, flip-flop the schema of a table from run to run if the sample they look at each time is different enough. In some situations tools like these might be helpful, but in other situations they might just make it worse. Glue Crawlers, for example, will read a sample of your data and figure out which properties exist, and their types, and at re:Invent 2019, AWS launched a feature that detects the schemas of events sent through EventBridge. Lots of tools exist that are aimed at helping you figure out the schema of JSON data. There’s very rarely need to fix the data, Athena is all about making it possible to read the data you have. If you get it wrong you can just drop the table and recreate it with a new schema that captures some aspect you had previously missed. Experimenting with different table designs in Athena is cheap, though. If you don’t control the code that produces a data set it can take some time to figure out what its schema is. This is roughly what the term “schemaless” means there’s no formal schema, and it’s possible for documents to have slightly different schemas, but when zooming out and looking at all documents together they look more alike than not. ![]() JSON doesn’t require a schema, although most JSON data probably has some kind of informal schema defined by the code that produces it. More importantly, if you define a struct column, the fields of the struct work just like the columns of a table: if they’re not found in the data the value defaults to NULL, and if there are more properties in the data than in the table definition, those properties are ignored. If you define a column as array and the JSON document has a list of numbers, that works too, as long as type coercion works there is usually no problem. ![]() What’s good to know is that Athena is fairly forgiving when it comes to describing complex types. The official Athena documentation makes a fairly good job at describing how to create tables for JSON data. Most of the properties are usually scalar, with one or two lists of either strings or fairly simple objects with one or two properties each. In my experience, most JSON data isn’t very hierarchical. Being able to describe most JSON data in table form is one of the most powerful features of Athena. If you are using Athena to query JSON data you have most likely already worked with complex types in your data in the form of an array property or an object property. Complex types can have arbitrarily complex structure, for example array,tags:map> could be a way to describe a list of article metadata. Complex types are types that refer to other types – or more concretely in the case of Athena, array (lists of elements), map (key/value associations), and struct (key/value associations with a fixed schema).Ĭomplex types can be for example array (an array of strings), map (a map with string keys and boolean values), or struct (a record with a string property called “name” and an integer property called “age”). Simple types, or “scalars”, are things like number, boolean, string, and timestamp. In this article I’m going to cover how to use complex types in three different contexts: how to create tables when you have complex types in data, how to work with complex types in queries, and how to deal with complex types in results.įirst, let’s define complex types so that we’re on the same page. With Athena this is almost never an issue. ![]() The data I worked with almost always contained lists of strings and other similarly “simple” complex types. When you try to describe the world you often end up with lists of things, the one-or-more type of relationship is very common, and things like lists of tags, or key/value pairs of metadata is more or less standard.īefore Athena I worked a lot with Redshift, and was often frustrated with the lack of complex types. Developers like data formats like JSON because they allow for expressing things like one-to-many relationships naturally and in a self-contained manner. The relational model wasn’t made with complex types in mind, but modern data is very rarely flat. It even supports lists and maps in CSV files, if you really want it to. Most of the data formats that Athena supports have support for complex types in the form of lists and maps.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |